Zynq UltraScale+ MPSoC Software Acceleration TRD 2016.1
This wiki page contains information on how to build the various components of the Zynq UltraScale+ MPSoC Software Acceleration reference design (TRD). The page also has information on how to set up the hardware and software platforms and run the design on the ZCU102 kit. The part used on the ZCU102 board is xczu9eg-ffvb1156-1-e-es1.
About the TRD
The Software Acceleration TRD is an embedded signal processing application designed to showcase the features and capabilities of the Zynq UltraScale+ MPSoC ZU9EG device for the embedded domain. The TRD consists of two elements: the Zynq UltraScale+ MPSoC Processing System (PS) and a signal processing application (FFT) implemented in Programmable Logic (PL). The MPSoC allows the user to implement a signal processing algorithm that performs an FFT on samples (coming from the TPG in PL or from SYSMON through an external channel) either as a software program running on the PS or as a hardware accelerator inside the PL. The TRD demonstrates how the user can seamlessly switch between a software and a hardware implementation and evaluate the cost and benefit of each. The TRD also demonstrates the value of offloading computation-intensive tasks onto the PL, thereby freeing CPU resources for user-specific applications.
For detailed information on the complete feature set and the hardware and software architecture of the design, please refer to the TRD user guide here.
Download the TRD
The TRD archive (rdf0376-zcu102-swaccel-trd-2016-1.zip) can be downloaded from here.
TRD Directory structure and package contents
The Software Acceleration TRD package is released with the source code, Tcl scripts to build the hardware design in Xilinx Vivado, SDK projects, and an SD card image that enables the user to run the demonstration and software application. It also includes the binaries necessary to configure and boot the ZCU102 board. Before running the steps on this wiki page, download the TRD package and extract its contents to a directory referred to as 'TRD_HOME', which is the home directory.
The table below describes the content of each directory in detail.
|Directory or file||Description|
|hardware||Contains hardware design files|
|Sources||Contains HDL sources, constraints and local IP repository|
|Vivado/scripts||Contains the scripts to build the hardware design|
|Software||Contains the software source files|
|Petalinux||Contains the Petalinux project's configuration|
|Xsdk||Contains the SDK project sources|
|Qt_gui||Contains GUI sources|
|Ready_to_test||Contains ready to test binaries|
|BOOT.BIN||BIN file containing FSBL, PL bitstream, U-boot and ARM trusted firmware|
|Autostart.sh||Script to launch the demo|
|Bin||This directory contains the Qt GUI application.|
|README.txt||Contains design version history, steps to implement the design, Vivado and Petalinux versions to be used to build the design.|
|THIRD_PARTY_NOTICES.zip||Contains the Copyright text for third party libraries|
|IMPORTANT_NOTICE_CONCERNING_THIRD_PARTY-CONTENT.txt||Contains information about the third party licences|
Prerequisites
- ZCU102 Evaluation Kit with Xilinx Vivado Design Suite, Device locked to xczu9eg-ffvb1156-1-e-es1.
- A Linux development PC with the following tools installed:
- Xilinx Vivado 2016.1
- Xilinx SDK 2016.1
- Petalinux 2016.1
- Distributed version control system Git installed. For information, refer to the Xilinx Git wiki.
- GNU make utility version 3.81 or higher.
Known Issues
The mouse response is observed to be slow when the demonstration/test is running.
Running the demo
This section provides step-by-step instructions for bringing up the ZCU102 board to demonstrate the TRD and for running the different options from the Graphical User Interface (referred to as the GUI).
The binaries required to run the design are in the $TRD_HOME/ready_to_test folder, including the binaries necessary to configure and boot the ZCU102 board.
Things to know before running the demo:
a) The SD-MMC card has to be formatted as FAT32 using an SD-MMC card reader. Copy the entire content of the $TRD_HOME/ready_to_test folder onto the primary partition of the SD-MMC card.
b) Petalinux console login details
User : root
Password : root
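The copy step in item a) can be scripted. The following is a minimal sketch, assuming the FAT32 card is already mounted on the host; the function name `copy_ready_to_test` and the mount path used in the usage line are hypothetical and not part of the TRD package.

```python
import shutil
from pathlib import Path


def copy_ready_to_test(trd_home, sd_mount):
    """Copy everything under <trd_home>/ready_to_test to the SD card root.

    Returns the sorted relative paths of the files now present on the card.
    """
    src = Path(trd_home) / "ready_to_test"
    dst = Path(sd_mount)
    if not src.is_dir():
        raise FileNotFoundError(f"missing folder: {src}")
    # dirs_exist_ok (Python 3.8+) allows copying onto an existing mount point.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())
```

For example, `copy_ready_to_test("/path/to/TRD_HOME", "/media/sdcard")` copies the folder content and returns the list of files on the card, which can be compared against the package contents table.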
Hardware Setup Requirements
Requirements for the TRD demo setup
- The ZCU102 Evaluation kit with the part xczu9eg-ffvb1156-1
- AC power adapter (12 VDC)
- Optional: A USB Type-A to USB Micro-B cable (for UART communication) and Tera Term Pro (or a similar UART terminal program).
- USB-UART drivers from Silicon Labs
- USB Micro-B to female Adaptor with USB hub is needed for connecting a mouse.
- USB mouse
- 4K monitor with Display Port support
- Certified Display Port cable (version 1.2); the TRD was tested with a 6-foot Cable Matters cable (E342987)
- (Optional, required only for testing with external audio input): XA3 SYSMON Headphone Adapter card from Faster Technology
- (Optional, required only for testing with external audio input) An audio source like MP3 player
- (Optional, required only for testing with external audio input) An aux cable with 3.5mm male jack on both ends.
- An SD-MMC flash card formatted as FAT32 and containing the TRD binaries in its primary partition. Copy the binaries from the ready_to_test folder of the TRD zip file. The required binaries include :
Note: It is recommended to use a ZCU102 Rev-C/D board. The TRD binaries have been tested with ViewSonic and Acer monitors. However, the binaries should work with any Display Port-compatible output device, provided it supports 4K resolution in its EDID database.
Steps for setting the board
Connect the cables to the ZCU102 board as shown in the figure below.
1. Connect a 4K monitor to the DP port on the ZCU102 using a DP 1.2 cable.
2. Connect a USB mouse to the Micro-B USB connector (Jumper J96 on the ZCU102 board).
3. Optional: Connect a USB Micro-B cable to the micro USB port (J83) labeled USB UART on the ZCU102 board, and connect the USB Type-A end to an open USB port on the host PC for UART communication.
4. Connect the power supply to the ZCU102 board. Do not switch the power on yet.
5. Optional: Plug the XA3 adapter card into the SYSMON header (J3) on the ZCU102 board. Connect jumpers J5 and J4 on the XA3 card as shown in the figure below.
6. Optional: Connect the 3.5mm auxiliary cable between the audio source and the 3.5mm female connector on the XA3 card.
7. Insert the SD-MMC memory card containing the TRD binaries into the SD receptacle on the ZCU102 board.
8. Make sure the DIP switches (SW6) are set as shown in the figure below, which allows the ZCU102 board to boot from the SD-MMC card.
9. Optional: Open a serial terminal program such as Tera Term and set up a new serial connection as shown in the figure below.
Click "New Connection", select Interface 0, and click OK (as shown in the figure below).
Click Setup > Serial Port and configure the port as shown in the figure below.
The following output appears on the serial terminal.
After the Linux boot completes, the PetaLinux login prompt appears, as shown in the figure below.
Run Qt GUI application
A Linux application with a Qt-based GUI is included on the SD-MMC memory card. This application lets the user exercise the different modes of the demonstration. The user can select samples from the Test Pattern Generator (TPG) or from an external audio source (which requires the XA3 adapter card, aux cable and audio source).
The user can choose to perform the FFT computation in the APU (run as software on the PS) or in the PL (run in the FPGA fabric as a hardware IP core).
The user can also apply various windowing techniques to the input samples before performing the FFT.
Powering on the Qt-based GUI application demo
- Make sure the monitor is set for DP Ultra HD (4K) resolution.
- Turn on power switch (J52)
- The Linux image will load and the frame buffer console is displayed on the monitor.
- The Qt based GUI will load
- When the GUI starts up, the demonstration begins with the FFT computed by software running on the APU, on samples coming from the TPG in PL. In the CPU graph, one CPU is always 100% utilized while the other A53 cores show a low level of activity. The Full Power domain AXI HP port 1 shows a utilization of around 0.75 Mbps, which corresponds to passing the samples from the TPG to PS DDR. The read bandwidth is 0 because the TPG only writes samples to PS DDR.
Running the Qt-based GUI application demo
Exercise different options by pressing the buttons available in the GUI to evaluate the different use cases mentioned below.
Test Start/Pause
The demonstration can be paused at any instant by clicking the Pause button, as shown in the figure below.
Input Source
There are two sources of data samples.
Note: To test the external audio input (assuming that the setup follows the procedure described above), play audio from the MP3 player or phone. The peak voltage of the audio source depends on the manufacturer, and the voltage levels of the samples depend on the volume. If the output voltage of the audio signal goes beyond 1V, the waveform will be clipped. Adjust the volume on the audio source so that the voltage of the samples lies within 1V peak-to-peak.
|Use case||Input source|
|1||Hardware Test Pattern Generator (TPG in PL). This is the default option.|
|2||External audio input (through XA3 SYSMON Headphone Adapter card)|
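The clipping behaviour described in the note for the external audio input can be illustrated with a short sketch. It assumes a unipolar 0V to 1V input window, which is an assumption made here for illustration; the helper names are hypothetical and are not part of the TRD software.

```python
def clip_to_input_range(samples_v, v_min=0.0, v_max=1.0):
    """Clamp each sample (in volts) to the assumed [v_min, v_max] input window."""
    return [min(max(v, v_min), v_max) for v in samples_v]


def fits_without_clipping(samples_v, limit_vpp=1.0):
    """Return True if the peak-to-peak swing stays within limit_vpp volts."""
    return (max(samples_v) - min(samples_v)) <= limit_vpp
```

A signal that fails `fits_without_clipping` is one where the volume on the audio source should be turned down.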
FFT Computation Engine
For the two input sources listed in the table above, the user can select one of the following compute engines for the FFT computation.
|FFT Compute Engine||Description|
|APU (default)||FFT computation is done by software running on APU|
|NEON||FFT computation is done by software running on APU. NEON intrinsic APIs are used for FFT computation to make sure that the instructions are executed on NEON.|
|APU controlled PL Accelerator||FFT computation is done by the FFT core in Programmable Logic (PL)|
|RPU as Co-processor||FFT computation is done by software running on RPU. APU is involved in moving samples from TPG in PL to PS DDR. Samples from PS DDR are copied to OCM by APU software and that information is passed to RPU through OpenAMP channel.|
|RPU controlled PL Accelerator||FFT computation is done by PL FFT IP. RPU controls the AXI DMA transfers to/from PL FFT core from/to PS DDR. APU is involved in moving samples from TPG in PL to PS DDR. Samples from PS DDR are copied to OCM by APU software and that information is passed to RPU through OpenAMP channel. PL FFT core fetches samples from OCM and computes FFT on the samples and writes samples back to OCM.|
|All||Runs FFT on all engines one at a time. This mode is useful for comparing computation times for various engines.|
FFT Length
The FFT length determines the number of samples on which the FFT computation is performed. The user can run the following FFT sizes.
FFT Window
The user can apply one of the following window functions to the input samples before the FFT computation.
|None (Default, No windowing)|
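As an illustration of what applying a window before the FFT means, the sketch below multiplies the input samples element by element with a window. The Hann window is used purely as an example and is an assumption; the function names are hypothetical and this is not the TRD's implementation.

```python
import math


def hann_window(n):
    """Return n coefficients of a symmetric Hann window."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(samples, window=None):
    """Multiply samples by the window coefficients; None means no windowing."""
    if window is None:
        return list(samples)
    if len(window) != len(samples):
        raise ValueError("window and samples must have the same length")
    return [s * w for s, w in zip(samples, window)]
```

Passing `window=None` corresponds to the default "None" option, where the samples reach the FFT unchanged.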
Frequency Zoom
User can select the following Frequency Zoom options
|FFT Zoom option||Description|
|ZOOM (default)||This is the default option. Selecting this option fixes the units on frequency axis in the Frequency domain plot to 512. This enables users to closely observe the values on frequency axis. This is 4X zoom for 4096 and 8X for 8192 point FFT.|
|NONE||None is No Zoom. Selecting this option will plot all points on frequency axis (Number of points equal to half of the FFT size)|
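The zoom factors quoted in the table follow from simple arithmetic: without zoom, half of the FFT size is plotted, and with zoom the axis is fixed to 512 points. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the handling of FFT sizes below 1024 (capping at half the FFT size) is an assumption.

```python
ZOOM_BINS = 512  # fixed frequency-axis width in ZOOM mode, per the table above


def plotted_bins(fft_size, zoom=True):
    """Number of frequency points plotted for a given FFT size."""
    half = fft_size // 2
    # Assumption: for small FFT sizes the plot cannot show more than half the FFT size.
    return min(ZOOM_BINS, half) if zoom else half


def zoom_factor(fft_size):
    """Ratio between the unzoomed and zoomed number of plotted points."""
    return (fft_size // 2) / ZOOM_BINS
```

With these definitions, `zoom_factor(4096)` is 4.0 and `zoom_factor(8192)` is 8.0, matching the 4X and 8X figures in the table.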
FFT Scale
The user can select different scales for the Voltage/Amplitude axis. This option is important when using an external audio source as input, because the voltage of the samples depends on the volume of the audio signal. Select the scale according to the amplitude of the audio samples. Available options are:
Sample Rate
The sampling rate of the SYSMON in PL can be changed at run time. Supported sampling rates are:
|200 kSPS (default)|
Note: The sampling rate option is applicable for SYSMON and is visible on the GUI only when Input source is selected as External Audio input.
Time and Frequency domain plots
The time domain plot shows the samples generated by either the TPG or the external audio source. The number of points in the plot depends on the FFT size.
The frequency domain plot shows the power spectral density (not on a logarithmic scale) as a function of voltage versus frequency bin. The value “Fp” in the extreme right corner of the frequency domain plot indicates the frequency bin with the highest energy. The number of frequency bins plotted is half of the FFT size (half because of symmetry for real-valued samples) when “NONE” is selected in the Frequency Zoom control, and 512 by default (ZOOM enabled).
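To make the relationship between the samples, the one-sided spectrum and the peak bin concrete, here is a minimal pure-Python sketch. It is not the TRD's FFT code: the recursive radix-2 FFT, the function names, and the choice to skip the DC bin when searching for the peak are all assumptions made for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def one_sided_power(samples):
    """Squared magnitude of the first N/2 bins (real input is symmetric)."""
    spectrum = fft(samples)
    return [abs(c) ** 2 for c in spectrum[: len(samples) // 2]]


def peak_bin(power, skip_dc=True):
    """Index of the bin with the highest energy (DC skipped by assumption)."""
    start = 1 if skip_dc else 0
    return max(range(start, len(power)), key=power.__getitem__)


def bin_to_hz(k, fft_size, sample_rate_hz):
    """Centre frequency of bin k for the given FFT size and sample rate."""
    return k * sample_rate_hz / fft_size
```

For a 4096-point FFT at the default 200 kSPS SYSMON sample rate, `bin_to_hz(1, 4096, 200_000)` gives a bin spacing of 48.828125 Hz.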
FFT Computation time plot
The time taken for the FFT computation by each engine is plotted on the “FFT computation plot”. Approximate average computation times for a 4096-point FFT are listed in the table below for reference:
|Computation Engine||Approximate average computation time (µs)|
|APU with NEON as Co-processor||350|
|APU controlled PL||120|
|RPU controlled PL||240*|
* The RPU runs at 500 MHz and the APU at 1.1 GHz. The RPU figure also includes the OpenAMP communication latency, which is approximately 100 µs.
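The figures in the table can be compared with a little arithmetic. The sketch below uses only the approximate values quoted above; the dictionary keys and function names are hypothetical labels chosen for this example.

```python
# Approximate averages for a 4096-point FFT, in microseconds, from the table above.
AVG_TIME_US = {
    "neon": 350,    # APU with NEON as co-processor
    "apu_pl": 120,  # APU controlled PL accelerator
    "rpu_pl": 240,  # RPU controlled PL accelerator (includes OpenAMP latency)
}
OPENAMP_LATENCY_US = 100  # approximate, per the note under the table


def speedup(baseline, candidate, times=AVG_TIME_US):
    """How many times faster `candidate` is than `baseline`."""
    return times[baseline] / times[candidate]


def rpu_pl_time_without_openamp(times=AVG_TIME_US, latency=OPENAMP_LATENCY_US):
    """RPU controlled PL time with the quoted OpenAMP latency subtracted."""
    return times["rpu_pl"] - latency
```

With these numbers, the APU controlled PL accelerator is roughly 2.9 times faster than the NEON software path, and the RPU controlled PL path takes about 140 µs once the OpenAMP latency is excluded.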
CPU Utilization plot
The APU cluster (A53 cores) utilization is plotted in the “CPU Utilization Plot”.
PS-PL Interface Performance plot
The bandwidth utilization of the Full Power domain and Low Power domain high-performance ports is plotted in the “PS-PL performance plot”. Both write and read throughput are plotted.
PL Die temperature
The PL die temperature is read from the SYSMON and displayed on the GUI.
Block Diagram view
The top-level block diagram, with the blocks involved in the data path for each combination of input source and FFT computation engine, is displayed in the bottom right corner of the GUI.
Building the Hardware design using Vivado
This section explains how to generate the FPGA hardware bitstream using the Xilinx Vivado tool and how to export the hardware platform to the Xilinx Software Development Kit (XSDK) for software application development.
Steps for building the FPGA hardware bitstream
On Windows 7, select Start > All Programs > Xilinx Design Tools > Vivado 2016.1 > Vivado 2016.1 Tcl shell
On Linux, enter vivado at the command prompt.
NOTE for Windows users: Because of the Windows file path limit (255 characters), copy the 'hardware' directory from '$TRD_HOME/' directly to a drive before following the next steps for building the hardware bitstream. If the design errors out due to the path length limitation, please follow the steps mentioned in the Answer Record.
From the Vivado welcome screen, run the following commands in the Tcl console:
1. cd $TRD_HOME/hardware/vivado/scripts
2. source ./create_project.tcl
In the Flow Navigator pane on the left-hand side, under Program and Debug, click Generate Bitstream > Yes (shown in the figure below).
After the bitstream is generated successfully, a screen appears as shown in the figure below. The bitstream is generated at $TRD_HOME/hardware/vivado/runs/swaccel_design.runs/impl_1/swaccel_top.bit
The implemented design has to be opened before the hardware design can be exported. Select Open Implemented Design > OK.
After the implemented design is opened, export the hardware design by clicking File > Export > Export Hardware, as shown in the figure below.
Select the Include bitstream option, as shown in the figure below.
The SDK hardware platform is exported to $TRD_HOME/hardware/vivado/runs/swaccel_design.sdk/swaccel_top.hdf
To exit Vivado, click the X button in the top right corner of the Vivado IDE, then click OK.
Building the Software components
Building RPU firmware using XSDK
- Open XSDK with a new workspace (say $WORKSPACE).
- Add the local repository to support floating-point computation in hardware:
  - Select Xilinx Tools > Repositories.
  - Add a new local repository by clicking the New button.
  - Specify the path for the local repository: $TRD_HOME/software/xsdk/local_repo/
  - Click "Apply" and then "OK".
- Create a new FreeRTOS BSP:
  - Select File > New > Board Support Package.
  - As no hardware platform is present, specify the hardware design when prompted.
  - The "New Hardware Project" window appears:
    - Set Project name: sw_acc_hw_platform
    - Set Target hardware specification path to $TRD_HOME/hardware/vivado/runs/swaccel_design.sdk/swaccel_top.hdf
    - The prebuilt hdf file from $TRD_HOME/software/petalinux/hw-description/swaccel_top.hdf can also be used.
    - Click Finish.
  - The "New Board Support Package Project" window appears:
    - Set Target Hardware > CPU: psu_cortexr5_0
    - Select Board Support Package OS: freertos823_xilinx
    - Click Finish.
  - The Board Support Package settings window opens. Select the following:
    - Check Overview > Supported Libraries > openamp
    - Navigate to Overview > drivers > psu_cortexr5_0
    - Set the value field of "extra_compiler_flags" to -DARMR5 -O3 -Os -mfloat-abi=hard
    - Click Finish.
- Import the RPU application:
  - Select File > Import > General > Existing Projects into Workspace.
  - Click Next.
  - Browse for “Select root directory” and set the path to “$TRD_HOME/software/xsdk”.
  - Select the r5FFT project.
  - Select “copy projects into workspace” and select OK.
  - Click "Finish".
  - This automatically builds the r5FFT application.
- Copy the firmware from $WORKSPACE/r5FFT/Release/r5FFT.elf to the bin/firmware directory on the SD card.
Build Linux and Boot images using Petalinux
Setup PetaLinux Working Environment
- Source the PetaLinux settings script from the PetaLinux installation path.
bash> source <path to PetaLinux installation>/settings.sh
Below is an example of the output from sourcing the setup script for the first time:
PetaLinux environment set to '/home/user/petalinux-v2016.1-final'
INFO: Checking free disk space
INFO: Checking installed tools
INFO: Checking installed development libraries
INFO: Checking network and other services
WARNING: No tftp server found - please refer to "PetaLinux SDK Installation Guide" for its impact and solution
The post-install step only occurs once. Subsequent runs of the settings script should be much quicker, and simply output a confirmation message such as that shown below:
PetaLinux environment set to '/opt/petalinux-v2016.1-final'
Note: After this step, the PetaLinux installation path is set in the $PETALINUX environment variable.
Build FSBL, ATF, u-boot, Linux Kernel, Device tree and rootfs images
- Configure the BSP with the hardware definition (hdf file) exported from Vivado and use the provided system config files.
bash> cd $TRD_HOME/software/petalinux
bash> petalinux-config --get-hw-description=$TRD_HOME/hardware/vivado/runs/swaccel_design.sdk --oldconfig
- Build all PetaLinux BSP components including FSBL, ATF, u-boot, kernel, device tree blob and rootfs.
bash> petalinux-build
Copy image.ub into the root directory of the SD card.
Creating BOOT.BIN image
To create the BOOT.BIN image, run the following command:
bash> petalinux-package --boot --fpga subsystems/linux/hw-description/swaccel_top.bit --uboot
This generates BOOT.BIN in the $TRD_HOME/software/petalinux/images/linux/ directory.
Copy BOOT.BIN into the root directory of the SD card.
Building the Qt GUI application
The Qt application can be built using the PetaLinux build environment.
- Verify that the PetaLinux environment is set.
bash> echo $PETALINUX
- Set up the environment for the Qt application build.
bash> cd $TRD_HOME/software/Qt_gui
bash> export SYSROOT=$TRD_HOME/software/petalinux/build/linux/rootfs/stage
bash> source ./qmake_set_env.sh
bash> export PATH=$PETALINUX/tools/yocto-sdk/sysroots/x86_64-petalinux-linux/usr/bin/qt5:$PATH
bash> export QT_CONF_PATH=./qt.conf
- Build the Qt application
bash> qmake sw_accel_qt.pro -r -spec linux-oe-g++
bash> make
- This builds the sw_accel_qt application in $TRD_HOME/software/Qt_gui
- Copy sw_accel_qt into the bin/ directory of the SD card.
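The copy steps spread across the software build sections place four built files on the SD card: BOOT.BIN and image.ub in the root directory, r5FFT.elf in bin/firmware, and sw_accel_qt in bin/. A minimal sketch of a helper that stages them follows; the `SD_LAYOUT` table and the `stage_artifacts` function are hypothetical names for this example, and the caller supplies the paths where each file was actually built.

```python
import shutil
from pathlib import Path

# Destination directory on the SD card (relative to its root) for each built
# file, as described in the build steps on this page.
SD_LAYOUT = {
    "BOOT.BIN": ".",
    "image.ub": ".",
    "r5FFT.elf": "bin/firmware",
    "sw_accel_qt": "bin",
}


def stage_artifacts(artifacts, sd_root):
    """Copy built files to their SD card locations.

    `artifacts` maps a file name from SD_LAYOUT to the path where it was built.
    Returns the sorted destination paths, relative to sd_root.
    """
    sd_root = Path(sd_root)
    placed = []
    for name, built_path in artifacts.items():
        if name not in SD_LAYOUT:
            raise KeyError(f"no SD card location known for {name}")
        dest_dir = sd_root / SD_LAYOUT[name]
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(built_path, dest_dir / name)
        placed.append((dest_dir / name).relative_to(sd_root).as_posix())
    return sorted(placed)
```

Keeping the layout in one table makes it easy to check that every rebuilt component lands where the demo expects it.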
You can now follow the board setup steps above to start the demo.