Zynq UltraScale+ MPSoC Graphics - GPU Profiling using ARM Streamline Performance Analyzer
Document History
Date | Version | Author | Description of Revisions |
11/11/2016 | 1.0 | Rajesh Gugulothu | Initial Release |
21/02/2017 | 2.0 | Rajesh Gugulothu | Updated for the 2016.4 tools release and added design files to support ZCU102 Rev-B/Rev-D and Rev-1.0 boards |
10/07/2017 | 3.0 | Rajesh Gugulothu | Updated for the 2017.1 tools release and added design files to support ZCU102 Rev-D2 with production silicon, Rev-1.0 boards with production silicon, and the Rev 1.0 board. |
14/06/2018 | 4.0 | Surender Polsani | Updated for the 2018.1 tools release and added design files to support ZCU102 Rev-D2 with production silicon, Rev-1.0 boards with production silicon, and the Rev 1.0 board. |
Implementation
Implementation Details | |
Design Type | PS Only |
SW Type | Linux |
CPUs | Quad-core ARM® Cortex™-A53 Application Processing Unit, ARM® Mali™-400 MP2 Graphics Processing Unit |
PS Features | DDR controller, UART, SD/eMMC interface, USB 3.0, DisplayPort |
Boards/Tools | ZCU102 Rev-B/Rev-D, Rev 1.0 with production silicon, Rev-1.0 |
Xilinx Tools Version | SDK 2018.1 |
Other Details | ARM® DS-5 Development Studio Streamline performance analyzer |
Host Type | Windows 64-bit |
Files Provided | |
ARM_Streamline_Performance_Analyzer.zip | Archive file contains the Design_files directory. |
Summary:
- Zynq® UltraScale+™ MPSoC delivers unprecedented levels of heterogeneous multi-processing and combines seven user-programmable processors, including a quad-core ARM® Cortex™-A53 Application Processing Unit (APU), a dual-core 32-bit ARM® Cortex™-R5 Real-Time Processing Unit (RPU), and an ARM® Mali™-400 MP2 Graphics Processing Unit (GPU). It is the industry's first multi-processor SoC, delivering 5x system-level performance per watt and any-to-any connectivity.
- The aim of this tech tip is to provide a step-by-step procedure for analyzing the performance of a graphics application running on the Zynq UltraScale+ MPSoC Mali GPU, using the ARM® DS-5 Streamline performance analyzer.
Description
- ARM® DS-5 Development Studio Streamline performance analyzer enables you to get the best out of your system’s resources and create high performance, energy efficient products. Its innovative user interface brings together system performance metrics, software tracing, statistical profiling and power measurement to present you with a system dashboard where you can quickly identify code hotspots, system bottlenecks and other unintended effects of your code or the system architecture.
- This tech tip shows how a user can analyze the performance of a graphics application with the ARM® DS-5 Development Studio Streamline performance analyzer, which helps trace system bottlenecks and optimize the application. It also provides the steps to compile the complete Linux system for Zynq UltraScale+ MPSoC and demonstrates how to analyze a simple graphics application.
- Highlights of this tech tip (the main topics covered):
- Executing the tech tip with prebuilt images.
- Compiling and building Linux for different silicon versions.
- Compiling and building the gator module and gator daemon.
- Installing the ARM Streamline performance analyzer tool on a Windows machine.
- Profiling an example graphics application.
Requirements:
- Linux or Windows host machine.
- Xilinx ZCU102 (Rev D and above) evaluation kit with power supply.
- Class 10 SD card (4 GB or more).
- Micro USB to standard USB cable.
- 1080p (Full HD) monitor and DisplayPort cable.
- USB 3.0 connector, or a USB 2.0 micro cable to standard USB female adapter, plus a USB hub to connect a USB mouse and USB keyboard (or a USB keyboard with an integrated mouse).
- A serial terminal emulator such as Tera Term or PuTTY for serial console output. Tera Term can be downloaded from http://download.cnet.com/Tera-Term/3000-20432_4-75766675.html; follow the instructions there to install it.
Block Diagram:
- The block diagram below depicts the hardware stack and software flow for profiling a graphics application running on the Zynq® UltraScale+™ MPSoC GPU using the ARM Streamline performance analyzer.
Executing the tech tip with prebuilt images:
- This section explains how to execute this tech tip using the prebuilt images supplied with it. If you want to build all the images from scratch, skip this section.
- First, install the performance analyzer tool by following the section "Installing ARM Streamline performance analyzer tool on Windows machine" below.
- Download the ARM_Streamline_Performance_Analyzer.zip file using the link provided at the top of this tech tip and extract it to a local directory. It extracts into the following directories:
- Rev_1_0_Design_Files - Contains images for the ZCU102 Rev-1.0 board.
- Rev_D_Prod_Design_Files - Contains images for ZCU102 Rev-B/Rev-D and Rev 1.0 boards with production silicon.
- Depending on your board, copy the contents of the corresponding Prebuilt_SD_Images directory onto an SD card (4 GB or larger).
- Set up the board by following the "ZCU102 Board Setup" section below on this page.
- Once the board setup is done, follow the section "Profiling an example graphics application" to execute the demo.
Compiling and building Linux images:
- This section goes through the procedure to compile Linux for the different versions of ZCU102 boards. It covers the following topics:
- Preparing the Linux kernel using PetaLinux 2018.1
- BSP Creation
- Kernel Configuration
- Device Tree Configuration
- Building the kernel and device tree blob
- Preparing the FSBL
- Preparing the BOOT.bin file
- BSP Creation
- Download the PetaLinux installer petalinux-v2018.1-installer.run and the ZCU102 BSP that corresponds to your board.
- Note: Download the BSP based on the board you have. Download the ZCU102 BSP (prod-silicon) for Rev-B/Rev-C/Rev-D and Rev 1.0 boards with production silicon, and the ZCU102 ES2.0 Rev1.0 BSP for the Rev-1.0 board.
- Install PetaLinux by running the downloaded installer:
- $ ./petalinux-v2018.1-installer.run
- After installation is done, set up the PetaLinux environment by running the command below in a bash shell:
- $ source <Petalinux_installation_path>/settings.sh
- Note: For C shell, use the command source <Petalinux_installation_path>/settings.csh instead.
- Verify that the PETALINUX environment variable is set to the above installation path:
- $ echo $PETALINUX
- Create the PetaLinux project with the command below:
- $ petalinux-create -t project -s <path to the downloaded zcu102 bsp>/Xilinx-ZCU102-v2018.1-final.bsp
- Note: In the above command, give the BSP path according to the board revision.
- Change the directory to the created PetaLinux project.
- Configure the kernel with the option below and save the configuration:
- $ petalinux-config -c kernel
- Kernel hacking > Tracers > Kernel Function Tracer
- Configure the rootfs with the packages listed below:
- $ petalinux-config -c rootfs
- Filesystem Packages > libs > libmali-xlnx > libmali-xlnx
- Filesystem Packages > libs > libmali-xlnx > libmali-xlnx-dev
- Petalinux Package Groups > packagegroup-petalinux-x11 > packagegroup-petalinux-x11
- Petalinux Package Groups > packagegroup-petalinux-x11 > packagegroup-petalinux-x11-dev
- After enabling all the packages, save the config file and exit the rootfs configuration settings.
- Once the above changes are done, build the PetaLinux project:
- $ petalinux-build
- After the PetaLinux build is done, the output binaries are created under Xilinx-ZCU102-2018.1/images/linux.
- Create BOOT.BIN from the binaries created above. BOOT.BIN is created under the Xilinx-ZCU102-2018.1/images/linux directory.
- $ petalinux-package --boot --fsbl Xilinx-ZCU102-2018.1/images/linux/zynqmp_fsbl.elf --u-boot Xilinx-ZCU102-2018.1/images/linux/u-boot.elf --fpga Xilinx-ZCU102-2018.1/images/linux/design_1_wrapper.bit
- Copy the BOOT.BIN and image.ub files onto an SD card.
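The copy step above can be sketched as a small shell helper on a Linux host. This is only an illustration: the function name copy_boot_images and the SD card mount point used in the usage line are assumptions, not part of the PetaLinux tools. It checks that both boot files exist before copying anything, so a partial build does not end up on the card.

```shell
# Sketch: copy the PetaLinux boot images to a mounted SD card.
# Arguments (both hypothetical example values, supplied by the user):
#   $1 = directory holding the build output (e.g. .../images/linux)
#   $2 = mount point of the SD card
copy_boot_images() {
    images_dir=$1
    sd_mount=$2
    # Refuse to copy anything unless both boot files are present.
    for f in BOOT.BIN image.ub; do
        if [ ! -f "$images_dir/$f" ]; then
            echo "missing: $images_dir/$f" >&2
            return 1
        fi
    done
    cp "$images_dir/BOOT.BIN" "$images_dir/image.ub" "$sd_mount/" || return 1
    sync   # flush pending writes before the card is removed
}
```

Example usage, assuming the card is mounted at /media/sdcard: copy_boot_images Xilinx-ZCU102-2018.1/images/linux /media/sdcard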
Compiling & building gator module and gator daemon:
- This section explains how to compile and build the gator kernel module and the gator daemon.
- Clone the gator-6.1 source from the git path provided below
- Change to the daemon directory under the gator source and run the commands below to build the gator daemon:
- $ bash
- $ export CROSS_COMPILE=<petalinux_installed_directory>/tools/linux-i386/aarch64-linux-gnu/bin/aarch64-linux-gnu-
- $ cd <Gator_source>/daemon
- $ make -f Makefile_aarch64
- After the build completes, copy the gatord binary created under the <Gator_source>/daemon directory to the SD card.
- Note: Make sure that the cross-compilation path is set in the current working environment.
- Extract the Mali driver source code with the commands below:
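- $ cd <petalinux_project_directory>/build/downloads
- $ tar -xvf DX910-SW-99002-r5p1-01rel0.tgz
- Note: Use the Mali driver archive present in your downloads directory; the driver directory passed to the gator.ko build must be the one extracted from that archive.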
- Build the gator.ko module with the commands below:
- $ cd <Gator_source>/driver
- $ bash
- $ export CROSS_COMPILE=<petalinux_installed_directory>/tools/linux-i386/aarch64-linux-gnu/bin/aarch64-linux-gnu-
- $ GATOR_WITH_MALI_SUPPORT=MALI_4xx CONFIG_GATOR_MALI_4XXMP_PATH=<petalinux_project_directory>/build/downloads/DX910-SW-99002-r7p0-00rel0/driver/src/devicedrv/mali/ make -C <petalinux_project_directory>/build/tmp/work-shared/zcu102-zynqmp/kernel-build-artifacts/ M=`pwd` ARCH=arm64 modules
- Note: In the above command, it is mandatory to use the PetaLinux project directory (Xilinx-ZCU102-2018.1) created in the section above, and to give absolute paths.
- After executing the above command, the gator.ko module is created under the <Gator_source>/driver directory.
- Now copy the gatord and gator.ko files built above onto the SD card (the same card used above for the BOOT.BIN and image.ub images).
- Make sure that the files listed below are present on the SD card, and use this card for executing the demo:
- BOOT.BIN
- image.ub
- gatord
- gator.ko
- tri_cube (copy this application from the Prebuilt_SD_Images directory).
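The file list above can be checked before the card goes into the board. The helper below is a minimal sketch for a Linux host; the name missing_sd_files and the mount point in the usage line are assumptions made for illustration. It prints the names of any of the five required files that are absent and prints nothing when the card is complete.

```shell
# Sketch: report which of the files required for the demo are missing
# from a directory (for example, the mounted SD card).
#   $1 = directory to check (hypothetical example: /media/sdcard)
missing_sd_files() {
    sd_dir=$1
    missing=""
    for f in BOOT.BIN image.ub gatord gator.ko tri_cube; do
        [ -e "$sd_dir/$f" ] || missing="$missing $f"
    done
    # Drop the leading space; the output is empty when nothing is missing.
    echo "${missing# }"
}
```

Example usage, assuming the card is mounted at /media/sdcard: missing_sd_files /media/sdcard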
Installing ARM Streamline performance analyzer tool on Windows machine:
- This section goes through the procedure for installing the Streamline performance analyzer tool on a Windows machine.
- Note: This tech tip assumes a 64-bit Windows host machine.
- Download the ARM DS-5 toolchain for 64-bit Windows from the ARM Mali developer website and install it on your Windows host machine. The ARM DS-5 Streamline performance analyzer tool is included in this installation.
- Note: Install the latest ARM DS-5 version, 5.26.2.
- Follow the link below to set up the tool with a 30-day evaluation license:
https://developer.arm.com/products/software-development-tools/ds-5-development-studio/resources/tutorials/getting-started-with-ds-5-development-studio
ZCU102 Board Setup:
- Before executing the demo, make sure the host system and the board are connected properly. The steps are as follows:
- Connect the Micro USB cable to the ZCU102 board Micro USB port J83, and the other end to an open USB port on the host PC. This cable is used for UART over USB communication.
- Connect one end of an Ethernet cable to the ZCU102 connector J73, and the other end to the Ethernet socket of the host machine.
- Because the target board and the host machine are connected directly in the above step, set the host machine IP address to 192.168.1.110 by following the steps below:
- On the Windows machine, click the Start button and select Control Panel --> Network and Internet --> Network and Sharing Center.
- In the left-hand pane, select the option Change adapter settings.
- Right-click Local Area Connection and select the Properties option.
- A User Account Control dialog box appears; select Yes.
- In the Local Area Connection Properties dialog, select the Internet Protocol Version 4 (TCP/IPv4) option and click Properties, as marked in the figure below.
- A properties dialog opens. Select the Use the following IP address option, and enter 192.168.1.110 as the IP address and 255.255.255.0 as the subnet mask, as follows:
- Insert the SD card into the SD card slot J100.
- Set the SW6 switches as shown. This configures the board to boot from SD:
- Connect 12V power to the ZCU102 6-pin Molex connector.
- Connect the monitor to U50 using the DisplayPort cable.
- The following figure shows the ZCU102 board with connections:
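With the subnet mask 255.255.255.0 chosen above, the host (192.168.1.110) and the target (192.168.1.111, assigned in the profiling section) can reach each other directly only if their first three octets match. The sketch below illustrates that check; the function name same_subnet_24 is an assumption for illustration, and the comparison is valid only for a 255.255.255.0 mask.

```shell
# Sketch: succeed if two dotted-quad IPv4 addresses share the same
# /24 network (subnet mask 255.255.255.0), i.e. the same first three octets.
same_subnet_24() {
    # "${1%.*}" strips the last octet, e.g. 192.168.1.110 -> 192.168.1
    [ "${1%.*}" = "${2%.*}" ]
}
```

Example usage: same_subnet_24 192.168.1.110 192.168.1.111 && echo "host and target are on the same subnet"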
Profiling an example graphics application:
- This section walks through the steps to execute a graphics application and to capture and analyze its performance data using the ARM Streamline performance analyzer.
- Power on the board.
- Set up the Tera Term application with the following configuration:
- Baud rate: 115200
- Stop bits: 1
- Data bits: 8
- Parity: none
- Once Tera Term is connected to the target, the Linux boot log appears on the terminal. After a few seconds, it asks for the Linux username and password. Type "root" for both.
- Set the IP address to 192.168.1.111 on the target side with the following command:
- $ ifconfig eth0 192.168.1.111
- Make sure that the host and target are connected properly by pinging from either side.
- To communicate with the target device, the Streamline performance analyzer tool requires the gator daemon (gatord) and the gator kernel module (gator.ko) to be running on the device. Type the commands below at the Linux command prompt to insert the module and start the gatord daemon:
- $ mount /dev/<device_name> /media
- $ cd /media
- $ insmod gator.ko
- $ ./gatord &
- Now run the example tri_cube application with the commands below:
- $ ln -s /usr/lib/libMali.so.8 /usr/lib/libMali.so.1
- $ export DISPLAY=:0.0
- $ ./tri_cube &
- Open the ARM DS-5 Streamline performance analyzer tool and enter the target IP address in the box marked in the figure below.
Fig: Setting the target IP address in the Streamline tool
- Click the capture and analysis option, as shown in the figure below.
- A capture and analysis dialog opens, as shown in the figure below. Configure the required options and click Save.
Fig: Capture and analysis dialog
- Click the counter configuration option, marked in red in the figure below.
Fig: Selecting the counter configuration option
- Review the options available in the counter configuration dialog and save.
Fig: Counter configuration dialog
- Click the Start capture option, as shown in the figure below.
Fig: Selecting the Start capture option
- After you click Start capture, the tool asks for a location to save the captured data. Browse to a working directory and save the file for future analysis. The GPU performance metrics then start to appear in the Streamline performance analyzer, as shown in the figure below.
Fig: GPU performance metrics information
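If Streamline cannot connect to the target, a first thing to check is whether gator.ko was actually loaded by the insmod step earlier in this section. The sketch below looks for a module name in /proc/modules, where each line starts with the module name followed by a space. The function name module_listed is an assumption for illustration; the optional second argument exists only so the check can be tried against a saved copy of the file.

```shell
# Sketch: succeed if a kernel module name appears in the module list.
#   $1 = module name (e.g. gator)
#   $2 = file to read, defaults to /proc/modules on a Linux target
module_listed() {
    name=$1
    modules_file=${2:-/proc/modules}
    # Each line of /proc/modules begins with "<name> ".
    grep -q "^$name " "$modules_file"
}
```

Example usage on the target: module_listed gator && echo "gator.ko is loaded"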
Appendix A: File description in Design directory
- ARM_Streamline_Performance_Analyzer.zip is extracted as:
  - Rev_1_0_Design_Files
    - Prebuilt_SD_Images
      - BOOT.BIN
      - image.ub
      - gatord
      - gator.ko
      - tri_cube
  - Rev_D_Prod_Design_Files
    - Prebuilt_SD_Images
      - BOOT.BIN
      - image.ub
      - gatord
      - gator.ko
      - tri_cube
© Copyright 2019 - 2022 Xilinx Inc.