...
Performance
These benchmark numbers were obtained by connecting Xilinx boards to Linux PCs/server machines (Ubuntu/Red Hat Enterprise). The tool used is netperf (refer to the tool information below).
The protocol, MTU size and the option to report CPU load can all be selected through netperf/netserver options.
Zynq
Board: ZC706
CPU Freq: 666MHz (A9)
Link Speed: 1000Mbps, Full duplex
Linux version: 4.19
MTU | TCP TX (Mbps) | CPU(%) | TCP RX (Mbps) | CPU(%) | UDP TX (Mbps) | CPU(%) | UDP RX (Mbps) | CPU(%) |
---|---|---|---|---|---|---|---|---|
1500 | 728.76 | 97.29 | 548.70 | 95.96 | 565.6 | 65.00 | 444.8 | 99.55 |
Linux version: 5.4
NOTE: There is a ~10% drop in performance (compared to 2019.2) for 1500 MTU.
The drop is due to this commit forcibly enabling CONFIG_OPTIMIZE_INLINING in the Linux kernel. It is observed with the GEM and Xilinx AXI Ethernet drivers on Zynq.
The kernel and networking stack have a large number of inline functions, and an unoptimized inline function (possibly dependent on the gcc version) could be causing the performance drop.
The plan is to document this performance drop on Zynq and initiate a discussion with the mainline community so that it is analyzed by the respective kernel maintainers.
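To check whether a given kernel build has this option enabled, its configuration can be inspected. The following is a minimal sketch, assuming the running kernel exposes its configuration at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC). The function name kconfig_state is illustrative and not part of any Xilinx tooling.

```shell
# Print how a kernel config symbol is set, reading a kernel .config from stdin.
# Prints the value after "=" (for example "y" or "m"), or "not set".
# kconfig_state is a hypothetical helper name.
kconfig_state() {
    awk -v sym="$1" '
        $0 == "# " sym " is not set" { print "not set"; found = 1; exit }
        index($0, sym "=") == 1      { print substr($0, length(sym) + 2); found = 1; exit }
        END { if (!found) print "not set" }
    '
}
```

On a board this could be used as `zcat /proc/config.gz | kconfig_state CONFIG_OPTIMIZE_INLINING`.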
MTU | TCP TX (Mbps) | CPU(%) | TCP RX (Mbps) | CPU(%) | UDP TX (Mbps) | CPU(%) | UDP RX (Mbps) | CPU(%) |
---|---|---|---|---|---|---|---|---|
1500 | 636654.8579 | 9993.9711 | 737.63 | 81.43 | 487486.8 | 6463.1756 | 303 | 96.23 |
ZynqMP
Board: ZCU102
CPU Freq: 1100MHz (A53)
Link Speed: 1000Mbps, Full duplex
DDR: 533MHz
CCU: No
Linux version: 5.4
MTU | TCP TX (Mbps) | CPU (%) | TCP RX (Mbps) | CPU (%) | UDP TX (Mbps) | CPU (%) | UDP RX (Mbps) | CPU (%) |
---|---|---|---|---|---|---|---|---|
1500 | 941.2723 | 5.0 | 941939.3517 | 2754.0594 | 961.5 | 20.3 | 961.4 | 22.07 |
8192 | 988.85 | 4.9 | 988989.8609 | 1117.6401 | 991.9 | 5.29 | 978985.1 | 1128.7117 |
Test Procedure
Diagnostic and Protocol Tests
PING
This utility is used to test the reachability of a host on an Internet Protocol (IP) network and to measure the round-trip time for messages sent from the originating host to a destination computer.
How to run:
```
ping <Remote IP Address>
```
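When the round-trip time needs to be recorded rather than read off the screen, the summary line printed by ping can be parsed. This is a minimal sketch, assuming the summary has the usual `min/avg/max` form printed by the iputils and BusyBox versions of ping. The function name ping_avg_rtt is illustrative.

```shell
# Extract the average round-trip time (ms) from ping output read on stdin.
# Assumes a summary line such as:
#   rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms
# ping_avg_rtt is a hypothetical helper name.
ping_avg_rtt() {
    awk '/min\/avg\/max/ {
        split($0, parts, "= ")      # parts[2] holds "min/avg/max[/mdev] ms"
        split(parts[2], vals, "/")  # vals[2] is the average
        print vals[2]
    }'
}
```

A possible invocation is `ping -c 10 <Remote IP Address> | ping_avg_rtt`.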
WebServer
Connect the Zynq board to a Linux x86 machine. Ensure that the telnet server is running on the Zynq board. This tests remote access to the Zynq board from the host machine.
Open a web browser on the host machine and enter the static IP address assigned to the Zynq board. The webpage is expected to be displayed properly.
Telnet
```
telnet <Server IP Address>
```
FTP & TFTP
How to run:
On the host, open an FTP client session to the Zynq board.
```
x86> ftp 192.168.1.10
```
```
x86> mput <file_name>
```
Pkt Generator
Refer to the link below for how to run pktgen and its various options:
https://www.kernel.org/doc/Documentation/networking/pktgen.txt
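As a rough illustration of the procfs interface described in that document, the sketch below configures one device on the first pktgen kernel thread and starts a run. It assumes the pktgen module is already loaded (`modprobe pktgen`) and that the commands run as root. PKTGEN_ROOT, pg_write and pktgen_run are illustrative names, and the packet size is an arbitrary choice.

```shell
# Assumption: PKTGEN_ROOT, pg_write and pktgen_run are hypothetical names.
# PKTGEN_ROOT can be overridden, for example to dry-run against a scratch directory.
PKTGEN_ROOT="${PKTGEN_ROOT:-/proc/net/pktgen}"

# pg_write <file under PKTGEN_ROOT> <pktgen command>
pg_write() {
    echo "$2" > "$PKTGEN_ROOT/$1"
}

# pktgen_run <interface> <destination IP> <destination MAC> <packet count>
pktgen_run() {
    dev="$1"
    pg_write kpktgend_0 "rem_device_all"   # detach devices from thread 0
    pg_write kpktgend_0 "add_device $dev"  # attach the interface under test
    pg_write "$dev" "count $4"             # number of packets to send
    pg_write "$dev" "pkt_size 60"          # packet size in bytes (arbitrary)
    pg_write "$dev" "dst $2"
    pg_write "$dev" "dst_mac $3"
    pg_write pgctrl "start"                # returns when the run has finished
}
```

After a real run, the results can be read back with `cat /proc/net/pktgen/<interface>`.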
Performance Tests
Netperf
How to run:
Server:
```
netserver
```
Client:
```
taskset 2 ./netperf -H <Server IP> -t TCP_STREAM
taskset 2 ./netperf -H <Server IP> -t UDP_STREAM
```
http://www.netperf.org/netperf/
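The CPU load columns in the performance tables come from netperf's CPU utilization reporting. The sketch below builds the client command line with the standard netperf options `-c` and `-C` (local and remote CPU utilization) and `-l` (test length in seconds). The function names and the 10 second default are assumptions, not the exact settings used for the published numbers.

```shell
# Print the netperf client command line for one test.
# netperf_cmd <server ip> <test name> [duration in seconds]
# "taskset 2" pins the client to CPU mask 0x2 (the second core), as in the commands above.
# netperf_cmd and run_netperf_suite are hypothetical helper names.
netperf_cmd() {
    echo "taskset 2 ./netperf -H $1 -t $2 -l ${3:-10} -c -C"
}

# Run the TCP and UDP stream tests back to back against one server.
run_netperf_suite() {
    for t in TCP_STREAM UDP_STREAM; do
        cmd=$(netperf_cmd "$1" "$t")
        echo "+ $cmd"
        $cmd    # intentionally unquoted so the string splits into arguments
    done
}
```

With netserver running on the remote machine, `run_netperf_suite <Server IP>` would run both tests in sequence.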
Iperf
How to run:
Server:
```
./iperf_arm -s -u
./iperf_arm -s
```
Client:
```
./iperf_arm -c <Server IP> -u -b <bandwidth>
./iperf_arm -c <Server IP>
```
http://en.wikipedia.org/wiki/Iperf
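For UDP, the offered rate given with `-b` determines how hard the link is driven, so it can help to step through several rates. The sketch below only prints the client commands for such a sweep. The function name, the rates and the 10 second duration (`-t 10`) are illustrative choices.

```shell
# Print one iperf UDP client command per offered rate.
# iperf_udp_sweep <server ip> <rate>...
# iperf_udp_sweep is a hypothetical helper name.
iperf_udp_sweep() {
    server="$1"
    shift
    for rate in "$@"; do
        echo "./iperf_arm -c $server -u -b $rate -t 10"
    done
}
```

The printed commands can be reviewed and then piped to `sh`, for example `iperf_udp_sweep <Server IP> 100M 500M 900M | sh`, with `./iperf_arm -s -u` running on the server side.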
Stress Test
Iperf with option -d
Run iperf in dual testing mode. This causes the server to connect back to the client on the port specified in the -L option (or, by default, the port on which the client connected to the server). This is done immediately, so the tests run simultaneously.
```
./iperf_arm -c <Server IP> -d
```
Ping flood test
Users can send a hundred or more packets per second using the -f option. It prints a ‘.’ when a packet is sent and a backspace when a packet is received.
```
ping -f localhost
```
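For a stress run, the figure of interest is usually the packet loss reported in the ping summary. This is a minimal sketch that extracts it, assuming the summary contains a field such as `0% packet loss`, as printed by common ping implementations. The function name ping_loss_pct is illustrative.

```shell
# Extract the packet loss percentage from ping output read on stdin.
# Assumes a summary line such as:
#   1000 packets transmitted, 1000 received, 0% packet loss, time 12ms
# ping_loss_pct is a hypothetical helper name.
ping_loss_pct() {
    awk '/packet loss/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /%$/) {        # the field that ends in "%"
                sub(/%/, "", $i)
                print $i
                exit
            }
        }
    }'
}
```

A possible invocation is `ping -f -c 1000 <Remote IP Address> | ping_loss_pct` (flood mode normally requires root).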
PTP
1588 synchronization can be tested on ZynqMP using the open source linuxptp application.
http://linuxptp.sourceforge.net/
The setup requires a master with a precise clock and timestamping capabilities, typically a NIC or another 1588-capable device.
How to run:
master:
```
#ptp4l -i <interface name> -m
```
slave:
```
#ptp4l -i <interface name> -s -m
```
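With `-m`, ptp4l prints its log to the console, and on the slave side the log normally includes periodic `master offset` lines. The sketch below reports the largest absolute offset seen in such a log. It assumes the line format shown in the comment, which can differ between linuxptp versions and configurations (for example when summary statistics are enabled). The function name ptp_max_offset is illustrative.

```shell
# Report the largest absolute master offset (ns) found in ptp4l log text on stdin.
# Assumes lines of the form:
#   ptp4l[10.1]: master offset        -25 s2 freq   -3972 path delay      9153
# ptp_max_offset is a hypothetical helper name.
ptp_max_offset() {
    awk '/master offset/ {
        for (i = 1; i < NF; i++) {
            if ($i == "offset") {
                v = $(i + 1) + 0        # value following the word "offset"
                if (v < 0) v = -v
                if (v > max) max = v
            }
        }
    }
    END { print max + 0 }'
}
```

A captured slave log can be checked with `ptp_max_offset < ptp4l.log`, where `ptp4l.log` is a placeholder file name.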
Mainline status
The macb driver is currently at mainline kernel 5.4 .19 with some patches pulled in from 5.0. Xilinx local patches are in upstream 5.0, 5.1 and net-next 5.2later kernels. The patches that are not yet in any mainline kernel are as follows:
- WOL via ARP support (~70 lines)
- Partial store and forward support (~80 lines)
- Versal support (~50 lines)
- Minor differences including mdio phy node support (gmii2rgmii), PCS autoneg and CAPS change, gem_rx_refill skbuff error handling, optimized HW timestamp reading, high DDR handling and other bugfixes (~50 lines altogether).
Any further changes will be upstreamed.
PHY details
The following PHYs were tested with ZynqMP GEM:...