Zynq-7000 AP SoC Benchmark - LMbench Tech Tip
Document History
Date | Version | Author | Description of Revisions |
16/05/13 | 0.0 | Prush Palanichamy | Initial revision |
Summary
This document shows the steps to download, compile, and run LMbench.
Theory
LMbench is a suite of simple, portable, ANSI/C microbenchmarks for UNIX/POSIX. In general, it measures two key features: latency and bandwidth. LMbench is intended to give system developers insight into basic costs of key operations.
The LMbench suite includes the following benchmarks:
- Bandwidth benchmarks
  - Cached file read
  - Memory copy (bcopy)
  - Memory read
  - Memory write
  - Pipe
  - TCP
- Latency benchmarks
  - Context switching
  - Networking: connection establishment, pipe, TCP, UDP, and RPC
  - File system creates and deletes
  - Process creation
  - Signal handling
  - System call overhead
  - Memory read latency
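To make the distinction between the two kinds of measurement concrete, here is a minimal Python sketch. It is not LMbench code and is far less careful than the real suite. The function names `pipe_latency_us` and `copy_bandwidth_mb_s` are hypothetical, and the pipe test runs inside a single process, whereas LMbench's pipe latency test passes data between two processes. The point is only that a latency figure is time per operation, while a bandwidth figure is data moved per unit of time.

```python
import os
import time


def pipe_latency_us(round_trips=1000):
    """Average time for a 1-byte write+read through a pipe, in microseconds."""
    r, w = os.pipe()
    try:
        start = time.perf_counter()
        for _ in range(round_trips):
            os.write(w, b"x")   # one small operation ...
            os.read(r, 1)       # ... completed before the next one starts
        elapsed = time.perf_counter() - start
    finally:
        os.close(r)
        os.close(w)
    return elapsed / round_trips * 1e6


def copy_bandwidth_mb_s(size_mb=8, repeats=5):
    """Best-of-N rate for copying a buffer of size_mb megabytes, in MB/sec."""
    src = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)        # bulk copy of the whole buffer
        best = min(best, time.perf_counter() - start)
    return size_mb / best


if __name__ == "__main__":
    print("pipe round trip: %.2f microseconds" % pipe_latency_us())
    print("memory copy: %.2f MB/sec" % copy_bandwidth_mb_s())
```

The numbers this prints include Python interpreter overhead, so they are not comparable with LMbench results; the sketch only illustrates what each unit means.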
Hardware Setup
Implementation Details |
Design Type | PS |
SW Type | Linux |
Boards/Tools | ZC702 |
Xilinx Tools Version | IDE 14.5 |
Files Provided |
| Pre-built LMbench binaries and shell script to run the benchmark application. If you are using these files, you can skip the Download LMbench and Build LMbench steps and jump right to the Run LMbench section. |
Build System | Linux |
1. Download LMbench
1) Go to http://sourceforge.net/projects/LMbench/ and download the tar file
2. Build LMbench
1) Untar LMbench-3.0-a9.tgz
2) cd to the LMbench directory
3) Build with the command
>make CC=arm-xilinx-linux-gnueabi-gcc
Note: Version 4.7.2 of the compiler "arm-xilinx-linux-gnueabi-gcc" is used for this tech tip.
4) Copy the folder lmbench (unzip the lmbench.zip file) to an SD card
5) Copy the shell scripts basic.sh, bw.sh and lat.sh to the SD card
6) Copy the 14.5 released kernel to the SD card. You can find the released files at http://www.wiki.xilinx.com/Zynq+14.5+-+2013.1+Release
7) Power up the board in SD boot mode
a. login : root
b. Password : root
8) Mount the SD card using the mount command shown in step 2 of the Run LMbench section
3. Run LMbench
1) Power up the ZC702 board in SD boot mode
2) Mount the SD card
>mount /dev/mmcblk0p1 /mnt
3) cd to lmbench
4) Run the full.sh script
>./full.sh
4. Result
# ./full.sh
BANDWIDTH MEASUREMENTS
File read bandwidth with openclose
7.00 233.00
File read bandwidth with io only
7.00 230.94
Memory read
7.00 549.67
Mem write
7.00 382.05
Mem Read/write
7.00 334.61
Memory copy
7.00 238.26
Mem- file write
7.00 1640.50
Mem- File read
7.00 350.67
Mem- File cp
7.00 247.42
Mem- bzero
7.00 1281.58
Mem- bcopy
7.00 257.14
Mmap with openclose
7.00 261.18
Mmap only
7.00 350.35
pipe bandwidth
Pipe bandwidth: 165.48 MB/sec
Socket
AF_UNIX sock stream bandwidth: 289.56 MB/sec
LATENCY MEASUREMENTS
latency for ls command
lat_cmd: 2515.8333 microseconds
latency for connect
TCP/IP connection cost to localhost: 157.9634 microseconds
Latency- context switch
"size=128k ovr=318.29
2 234.13
latency DRAM
59.329605
Latency -fcntl
Fcntl lock latency: 9.8366 microseconds
Latency -FS
0k 53014 36518 28196
1k 29831 16649 37393
4k 28116 18324 23718
10k 17671 12002 17740
Latency - Mem rd
"stride=128
0.00049 7.655
0.00098 11.492
0.00195 8.542
0.00293 7.205
0.00391 7.527
0.00586 7.596
0.00781 7.430
0.01172 7.714
0.01562 7.591
0.02344 8.169
0.03125 18.561
0.04688 50.790
0.06250 47.329
0.09375 75.766
0.12500 59.946
0.18750 62.953
0.25000 63.243
0.37500 76.092
0.50000 136.610
0.75000 98.095
1.00000 91.529
Latency -Mmap
x: No such file or directory
Latency - Operations
integer bit: 2.24 nanoseconds
integer add: 2.24 nanoseconds
integer mul: 9.40 nanoseconds
integer div: 155.42 nanoseconds
integer mod: 52.76 nanoseconds
int64 bit: 1.85 nanoseconds
uint64 add: 2.77 nanoseconds
int64 mul: 20.30 nanoseconds
int64 div: 417.96 nanoseconds
int64 mod: 261.66 nanoseconds
float add: 16.71 nanoseconds
float mul: 18.28 nanoseconds
float div: 31.16 nanoseconds
double add: 10.02 nanoseconds
double mul: 16.73 nanoseconds
double div: 75.54 nanoseconds
float bogomflops: 47.64 nanoseconds
double bogomflops: 78.60 nanoseconds
Latency -Page fault
x: No such file or directory
Latency-Pipe
Pipe latency: 35.8538 microseconds
Latency - Process ops
Process fork+exit: 2075.0841 microseconds
Process fork+execve: 2818.9615 microseconds
sh: /tmp/hello: not found
[previous line repeated]
Process fork+/bin/sh -c: 6247.0000 microseconds
Procedure call: 0.0203 microseconds
latency-semaphore
Semaphore latency: 5.7432 microseconds
Latency -TCP select
Select on 200 tcp fd's: 72.4160 microseconds
Latency-Signals
Signal handler installation: 2.3158 microseconds
Signal handler overhead: 6.2273 microseconds
mmap: Bad file descriptor
Latency - system calls
Simple fstat: 1.7319 microseconds
Simple stat: 9.9183 microseconds
Simple open/close: 20.6044 microseconds
Simple write: 1.6044 microseconds
Simple read: 0.8984 microseconds
Simple syscall: 0.6315 microseconds
Latency tcp/udp
TCP latency using localhost: 1.5221 microseconds
lat_udp client: recv failed: Connection refused
latency-sockets
AF_UNIX sock stream latency: 46.5033 microseconds
Latency sleep
usleep 100 microseconds: 227.3042 microseconds
nanosleep 100 microseconds: 214.0462 microseconds
select 100 microseconds: 199.9189 microseconds
itimer 100 microseconds: 170.3161 microseconds
Cache line size
32
lmdd testing
8.1920 MB in 0.0264 secs, 310.7975 MB/sec
960MB OK
960
CPU frequency
mhz: should take approximately 297 seconds
539 MHz, 1.8553 nanosec clock
Test sleep - 3 secs
Parallel memory ops
0.524288 5.12
integer bit parallelism: 2.47
integer add parallelism: 2.29
integer mul parallelism: 3.23
integer div parallelism: 1.83
integer mod parallelism: 1.05
int64 bit parallelism: 1.00
int64 add parallelism: 1.69
int64 mul parallelism: 1.00
int64 div parallelism: 1.14
int64 mod parallelism: 1.00
float add parallelism: 7.94
float mul parallelism: 3.05
float div parallelism: 1.62
double add parallelism: 5.02
double mul parallelism: 2.29
double div parallelism: 2.20
STREAM ops
STREAM copy latency: 19.86 nanoseconds
STREAM copy bandwidth: 805.64 MB/sec
STREAM scale latency: 33.26 nanoseconds
STREAM scale bandwidth: 481.07 MB/sec
STREAM add latency: 52.48 nanoseconds
STREAM add bandwidth: 457.31 MB/sec
STREAM triad latency: 33.53 nanoseconds
STREAM triad bandwidth: 715.67 MB/sec
5. Making sense of LMbench results
LMbench is a set of microbenchmarks. If you want to use LMbench to estimate the performance of a macro-level application, for example a multimedia application, you first have to find out which of the operations measured by these microbenchmarks the application uses most. Profiling can identify them. Profiling is a process in which you analyze a real-world, macro-level workload, such as video streaming, and determine which system-level, micro-level units of work (e.g. context switching) make up a significant portion of that workload. Once profiling has identified the micro workloads that affect the system most, you can run the microbenchmarks that correspond to those key influencers.
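Once profiling has pointed at the operations that matter, the matching figures can be pulled out of a saved copy of the benchmark output. The following is a minimal Python sketch of that step, intended to run on a host PC rather than on the board. It is not part of LMbench: the function names `parse_results` and `key_results` and the keyword list are hypothetical, and it assumes the console output was saved with one result per line. It only recognizes lines of the form "label: value unit" (microseconds, nanoseconds or MB/sec); the column-style results, such as the memory read latency table, are ignored.

```python
import re

# Matches lines such as "Pipe latency: 35.8538 microseconds".
LINE_RE = re.compile(
    r"^(?P<name>[^:]+):\s+(?P<value>\d+(?:\.\d+)?)\s+"
    r"(?P<unit>microseconds|nanoseconds|MB/sec)$"
)


def parse_results(text):
    """Return {label: (value, unit)} for every 'label: value unit' line."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results[match.group("name")] = (
                float(match.group("value")),
                match.group("unit"),
            )
    return results


def key_results(results, keywords):
    """Keep only the results whose label mentions a profiled operation."""
    lowered = [k.lower() for k in keywords]
    return {
        name: value
        for name, value in results.items()
        if any(k in name.lower() for k in lowered)
    }


# A few lines copied from the Result section; error lines are skipped.
SAMPLE = """\
Pipe bandwidth: 165.48 MB/sec
Pipe latency: 35.8538 microseconds
Process fork+exit: 2075.0841 microseconds
Simple syscall: 0.6315 microseconds
lat_udp client: recv failed: Connection refused
"""

if __name__ == "__main__":
    parsed = parse_results(SAMPLE)
    # Hypothetical profile outcome: pipes and system calls dominate.
    for name, (value, unit) in key_results(parsed, ["pipe", "syscall"]).items():
        print(name, value, unit)
```

Matching on keywords is a deliberately simple choice. A real workflow would more likely map each profiled operation to one specific benchmark label.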