Zynq-7000 AP SoC Spectrum Analyzer part 4 - Accelerating Software - Building and Running an FFT Tech Tip


Document History

Description of Revisions
September 15, 2013 | Faster Technology | Initial posting
February 28, 2014 | Faster Technology | Update to 2013.4 release


Virtually all electronic systems today contain some signal processing as part of their fundamental capabilities. The Zynq-7000 AP SoC is ideally suited to handling many of these functions in a single chip solution as will be demonstrated in this Tech Tip.

In the Tech Tip "Zynq Ne10 Library Tech Tip", a library of complex filtering functions was obtained and built. This Tech Tip describes a signal processing application that uses the Ne10 library built in that prior Tech Tip. The application documented in this Tech Tip performs a complex FFT on a sampled input signal, executing on either the ARM processor alone or on the NEON SIMD engine. The application is constructed so it can be used stand-alone from the command line or integrated into a larger program. A subsequent Tech Tip will demonstrate how to integrate this application into a larger graphical user system.

In addition to demonstrating a speed up of 1.25 to 1.85 when using the NEON SIMD engine vs the ARM processor alone, this Tech Tip will show how to use a standard library of functions in an application and how to modify that library for a specific need. All of these are facilitated by the standard implementation of the ARM processor system (PS) in the Zynq-7000 AP SoC, opening up the vast ecosystem of available software to the Xilinx development community. We will also see the power of the debug capabilities in the Xilinx SDK.


Implementation Details
Design Type
PS Only
SW Type
1 CPU - standard ZC702 Frequency
PS Features
ARM processor and NEON SIMD engine
PL Cores
Xilinx Tools Version
Vivado / SDK 2013.4
Other Details
Standard ZC702 setup for console terminal and Ethernet required

Files Provided
FFT Application C source code
Tested starting point workspace for SDK

Block Diagram

Step by Step Instructions

A library of signal processing functions was built in the Tech Tip "Zynq Ne10 Library Tech Tip" and tested in the Tech Tip "Zynq Ne10 Testing Tech Tip". This Tech Tip is built starting with that compiled and tested library.

The application performs a Fast Fourier Transform (FFT) on sampled data from an input waveform. The input data is in a table in the processor memory space and the spectrum output of the FFT is also in a table in processor memory space. A register interface is used for controlling various parameters of the FFT process to facilitate integration of the FFT application with other software. An example of this integration is described in a subsequent Tech Tip "Zynq-7000 AP SoC <name> Tech Tip". In the command line version, the same register values are controlled by the following options:
-v --version Print program version
-h --help Print help message
-s --size Size of FFT
-t --type Type of FFT, real or complex
-i --input Input data type, int or float for 16 bit integers, or 32 bit floats
-o --output Output data type
-r --source Physical address of input data
-d --dest Physical address of output results
-a --arch Processor architecture, 0 = ARM, 1 = NEON, or 2 = CORE
-p --pipeline This is part of a continuous processing pipeline, and not a one time FFT
-g --debug Generate an impulse test pattern at location N-1 of the input table
-l --loop Loop execution of the FFT for N iterations (for timing purposes only)
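
As an illustration of how options like these are commonly handled, the following is a minimal sketch using the standard getopt_long() call. It is not taken from fft-zynq.c: the fft_opts structure, its field names and the parse_fft_opts() function are hypothetical, only a subset of the options is covered, and we assume that -p takes no argument and that -t accepts the strings "real" and "complex".

```c
#include <getopt.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings structure; field names are illustrative only. */
typedef struct {
    int size;       /* -s: FFT size */
    int is_complex; /* -t: 1 = complex, 0 = real */
    int in_float;   /* -i: 1 = 32 bit float, 0 = 16 bit integer */
    int arch;       /* -a: 0 = ARM, 1 = NEON, 2 = CORE */
    int pipeline;   /* -p: 1 = part of a continuous pipeline */
    int impulse;    /* -g: impulse location N, 0 = no test pattern */
    int loops;      /* -l: number of iterations */
} fft_opts;

/* Fill *o from the command line; returns 0 on success, -1 on a bad option. */
int parse_fft_opts(int argc, char **argv, fft_opts *o)
{
    static const struct option longopts[] = {
        {"size",     required_argument, NULL, 's'},
        {"type",     required_argument, NULL, 't'},
        {"input",    required_argument, NULL, 'i'},
        {"arch",     required_argument, NULL, 'a'},
        {"pipeline", no_argument,       NULL, 'p'},
        {"debug",    required_argument, NULL, 'g'},
        {"loop",     required_argument, NULL, 'l'},
        {NULL, 0, NULL, 0}
    };
    int c;

    /* Start from the defaults documented for fft-zynq. */
    o->size = 4096;
    o->is_complex = 1;
    o->in_float = 0;
    o->arch = 0;
    o->pipeline = 0;
    o->impulse = 0;
    o->loops = 1;

    optind = 1; /* restart scanning so the function can be called again */
    while ((c = getopt_long(argc, argv, "s:t:i:a:pg:l:", longopts, NULL)) != -1) {
        switch (c) {
        case 's': o->size = atoi(optarg); break;
        case 't': o->is_complex = (strcmp(optarg, "real") != 0); break;
        case 'i': o->in_float = (strcmp(optarg, "float") == 0); break;
        case 'a': o->arch = atoi(optarg); break;
        case 'p': o->pipeline = 1; break;
        case 'g': o->impulse = atoi(optarg); break;
        case 'l': o->loops = atoi(optarg); break;
        default:  return -1; /* unknown option or missing argument */
        }
    }
    return 0;
}
```

With this sketch, the arguments -g 5 -a 1 would leave the size at 4096 and set the impulse location to 5 and the architecture to 1.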

Input and output buffer sizes are calculated from the FFT size, whether the FFT is real or complex and fixed or floating point.
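
The size calculation can be sketched as follows. This is an assumption about the layout rather than the code in fft-zynq.c: we assume one value per sample for real data, two interleaved values (real, imaginary) per sample for complex data, 2 bytes per 16 bit integer and 4 bytes per 32 bit float. The function name fft_table_bytes() is hypothetical.

```c
#include <stddef.h>

/* Bytes needed for a table of n samples (assumed layout, see text above). */
size_t fft_table_bytes(size_t n, int is_complex, int is_float)
{
    size_t values_per_sample = is_complex ? 2 : 1; /* real + imaginary */
    size_t bytes_per_value = is_float ? 4 : 2;     /* float32 or int16 */
    return n * values_per_sample * bytes_per_value;
}
```

Under these assumptions a 4096 sample complex table occupies 32768 bytes in 32 bit floating point and 16384 bytes in 16 bit integer form.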
The -a option in the application selects between using just the ARM processor for the computations or using the NEON SIMD engine. This enables the user to see the difference in execution time between these two software approaches. In a subsequent Tech Tip a hardware unit will be added in Programmable Logic to show the performance difference between the two software approaches and execution of the FFT in hardware. The Conclusions section at the end of this Tech Tip contains a simple table of execution time differences between the two software only approaches.

For this Tech Tip, the source file includes additional code that is used in subsequent Tech Tips:
- code to enable use in a continuous sampling and processing system (pipeline mode)
- an option to lock the FFT to a specific CPU in the PS (used in a demonstration system)

Modification of the Ne10 Library

The Ne10 Library previously built and tested uses the following process to calculate the Complex FFT:

Input real and imaginary data:

x(n) = xa + j * ya
x(n+N/4) = xb + j * yb
x(n+N/2) = xc + j * yc
x(n+3N/4) = xd + j * yd
where N is the length of the FFT

Output real and imaginary data:
X(4r) = xa'+ j * ya'
X(4r+1) = xb'+ j * yb'
X(4r+2) = xc'+ j * yc'
X(4r+3) = xd'+ j * yd'

Twiddle factors for radix-4 FFT:

Wn = co1 + j * (- si1)
W2n = co2 + j * (- si2)
W3n = co3 + j * (- si3)

The output of the radix-4 CFFT is in digit-reversed order. Interchanging the middle two branches of every butterfly results in bit-reversed output.
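
To make the two orderings concrete, here is a small sketch of the index calculations. It is illustrative only and not code from the Ne10 library; the function names are hypothetical. Digit reversal reverses the base-4 digits of an index, while bit reversal reverses its individual bits.

```c
/* Reverse the base-4 digits of index; digits = log4(N). */
unsigned digit_reverse4(unsigned index, unsigned digits)
{
    unsigned r = 0;
    for (unsigned d = 0; d < digits; d++) {
        r = (r << 2) | (index & 3u); /* move the lowest digit to the top */
        index >>= 2;
    }
    return r;
}

/* Reverse the lowest 'bits' bits of index; bits = log2(N). */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (index & 1u); /* move the lowest bit to the top */
        index >>= 1;
    }
    return r;
}
```

For N = 16, index 6 has base-4 digits (1, 2), so its digit-reversed position is (2, 1) = 9. Swapping the digit values 1 and 2, which is what interchanging the middle two butterfly branches does, turns 9 back into 6, which is also the bit reversal of 6 (binary 0110).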

Butterfly CFFT equations:

xa' = xa + xb + xc + xd
ya' = ya + yb + yc + yd
xc' = (xa+yb-xc-yd)* co1 + (ya-xb-yc+xd)* (si1)
yc' = (ya-xb-yc+xd)* co1 - (xa+yb-xc-yd)* (si1)
xb' = (xa-xb+xc-xd)* co2 + (ya-yb+yc-yd)* (si2)
yb' = (ya-yb+yc-yd)* co2 - (xa-xb+xc-xd)* (si2)
xd' = (xa-yb-xc+yd)* co3 + (ya+xb-yc-xd)* (si3)
yd' = (ya+xb-yc-xd)* co3 - (xa-yb-xc+yd)* (si3)
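
The equations above can be written out directly in C. The following is a plain scalar sketch for a single butterfly, intended only to show how the terms map to code; it is not the Ne10 implementation (which is vectorized for NEON), and the cpx type and radix4_butterfly() name are hypothetical.

```c
/* Hypothetical complex type used only for this sketch. */
typedef struct { float re, im; } cpx;

/*
 * One radix-4 butterfly. in[0..3] hold (xa,ya), (xb,yb), (xc,yc), (xd,yd);
 * out[0..3] receive (xa',ya'), (xb',yb'), (xc',yc'), (xd',yd').
 */
void radix4_butterfly(const cpx in[4], cpx out[4],
                      float co1, float si1,
                      float co2, float si2,
                      float co3, float si3)
{
    float xa = in[0].re, ya = in[0].im;
    float xb = in[1].re, yb = in[1].im;
    float xc = in[2].re, yc = in[2].im;
    float xd = in[3].re, yd = in[3].im;

    /* Sums and differences shared by the real and imaginary outputs. */
    float r1 = xa + yb - xc - yd, i1 = ya - xb - yc + xd; /* used for xc', yc' */
    float r2 = xa - xb + xc - xd, i2 = ya - yb + yc - yd; /* used for xb', yb' */
    float r3 = xa - yb - xc + yd, i3 = ya + xb - yc - xd; /* used for xd', yd' */

    out[0].re = xa + xb + xc + xd;
    out[0].im = ya + yb + yc + yd;
    out[1].re = r2 * co2 + i2 * si2;
    out[1].im = i2 * co2 - r2 * si2;
    out[2].re = r1 * co1 + i1 * si1;
    out[2].im = i1 * co1 - r1 * si1;
    out[3].re = r3 * co3 + i3 * si3;
    out[3].im = i3 * co3 - r3 * si3;
}
```

With all twiddle factors set to co = 1 and si = 0, four equal inputs of 1 + j0 produce xa' = 4 and zero on the other three outputs, while a single input of 1 + j0 in the first position produces 1 + j0 on all four outputs.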

The "twiddle factors" in the Ne10 library are hard coded for the FFT sizes 16, 64, 256 and 1024. To support a 4096-point FFT, code was included in the application to calculate the twiddle factors and bypass the hard coded tables in the original library. This additional routine starts at line 117 of the fft-zynq.c source file. The specifics of the calculations, as well as the algorithm used for the CFFT in the Ne10 library, are beyond the scope of this Tech Tip. A wealth of information is available on the web, such as http://en.wikipedia.org/wiki/Fast_Fourier_transform.
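
For reference, the values such a routine has to produce follow from the standard definition of the twiddle factors. Written in the co/si notation used above, and assuming the usual forward-transform sign convention (this is the textbook definition, not a transcription of the routine in fft-zynq.c):

```latex
\[
W_N^{kn} = co_k + j\,(-si_k), \qquad
co_k = \cos\!\left(\frac{2\pi k n}{N}\right), \quad
si_k = \sin\!\left(\frac{2\pi k n}{N}\right), \qquad k = 1, 2, 3
\]
```

Here n is the butterfly index and N is the length of the FFT, so a 4096-point transform needs these values computed for N = 4096 rather than read from the hard coded tables.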

Building the application

Download the C source file for this Tech Tip - "fft-zynq.c" and save it to a convenient location on your computer system. Note where it is saved. In our case, we saved it to G:\Projects.

This Tech Tip uses the workspace that resulted from building and testing the Ne10 library in the Tech Tips "Zynq Ne10 Library Tech Tip" and "Zynq Ne10 Testing Tech Tip". If that workspace is available, skip to the instructions below to start SDK.

If the workspace is not available, or if there is a question if it was completed properly, the referenced file "Ne10TestBuild.zip" can be used to create a known working starting point for this Tech Tip.

Download the Zip file from the Ne10TestBuild.zip link.

Create an empty directory where you will be implementing this Tech Tip. To be consistent with the balance of these step by step instructions, the directory could be:


However, these steps to import a known workspace will work with any new folder of the user's choosing.

Many users have unusual problems with SDK when using different directory structures and names. If you encounter any odd behaviors with SDK, it is advised to use the suggested directory structure and names.

Start SDK

Start -> All Programs -> Xilinx Design Tools -> Vivado 2013.4 -> SDK -> Xilinx SDK 2013.4

In the Workspace Launcher, browse to and select ZC702fft\zc702-zvik-base-trd-rdf0286\sw or the empty directory that you have created.

Click OK to continue.

If you are presented with a welcome tab, close it by clicking on the X on the tab.

SDK will start with a blank Project Explorer pane.

Select File -> Import or right click on the white space in the Project Explorer pane and select Import.

The Import dialogue box will appear. Expand the General line and select Existing Projects into Workspace

Click Next

Click the Select archive file button. Then click Browse to navigate to the saved workspace file that you want to import and click Open. In our case this is Ne10TestBuild.zip.

Click Finish

SDK will build the workspace automatically. Because SDK is already running with the workspace in place, you can skip the following instructions for starting SDK and continue from the point where the workspace is displayed.

Old Workspace in Place

If you have not already started SDK, do so by:

Start -> All Programs -> Xilinx Design Tools -> Vivado 2013.4 -> SDK -> Xilinx SDK 2013.4

In the Workspace Launcher, browse to and select the existing workspace as ZC702fft\zc702-zvik-base-trd-rdf0286\sw

Click OK to continue

When presented with the Welcome screen, click the X in the Welcome tab to close that screen

The workspace should appear as:

The FFT application is a new project within SDK so we need to create it.

Select File -> New -> Project

A new C project can also be created by right clicking in the white space of the Project Explorer pane and then selecting New -> C Project.

When the New Project dialogue box appears, expand the C/C++ line and select C Project

Click Next

The Project creation dialogue box opens. In the Project Name: box type fft-zynq to match the name of the source file.

Make sure the check box on the default location is checked.

In the Project type: box, select the "Xilinx ARM Linux Executable" type by clicking on it.

Click Finish.

We now see the new project in the Project Explorer.

We can now import the source code for the FFT application into SDK.

Right click on fft-zynq in the Project Explorer column and select Import.

The import window appears.

Be sure File System is highlighted (you may need to expand the General group), and click Next.

In the Import File System window, browse to the location where the C source file fft-zynq.c has been saved and select the file; check the box next to the source file.

Click Finish to add the source code to the project. The options can be left un-checked.

We now need to add the proper include files or paths to them that will be used in the build process.

Right click on fft-zynq in the Project Explorer column of SDK and then select "C/C++ Build Settings".

Expand the C/C++ General line in the left column and select the Paths and Symbols item.

Be sure the Configuration: option is set to [ All Configurations ] unless you have a specific reason to have the debug compiled differently than the release.

With the "Includes" tab selected, click Add (in the right column).

Check both the option boxes for "Add to all languages" and "Is a workspace path" and click the workspace button on the right side of the dialog box.

In the Folder Selection box, expand the Ne10-master item and select common.

Click OK to return to the Add directory path dialog box.

Click OK to add this path.

Repeat the process to add the following paths:
- Ne10-master / inc
- Ne10-master / modules / math
- Ne10-master / test / include

When completed you should have the following include directories set.

We now need to add some library paths for tools to find all of the required components.

In the left column of the Properties for fft-zynq window, expand C/C++ Build options, and select Settings. In the left portion of the right side of the window, under ARM Linux gcc linker, select Libraries.

At the top of the Libraries (-l) area, click on the add icon (looks like a sheet of paper with a plus sign on it).

In the text entry box, type Ne10 and click OK. Then do the same steps for Linux libraries "m" and "rt".

In the Library search path pane, click the add icon (the paper with the plus sign as before) to get a pop up window:

Click Workspace and then select Ne10/Release.

Click OK.

You should now have the Properties for fft-zynq completed as follows:

Click Apply, then click OK.

We can now build the project.

To assure the fastest execution times, we want to use the Release default build settings. To enable these,

Right click on fft-zynq in the Project Explorer pane

Select Build Configurations > Set Active > Release

With the build set to use the Release options, we can now build the project.

In the Project Explorer pane, right click on the fft-zynq project and select Build Project. If Console is selected in the bottom middle pane, the progress will be displayed. The result should be a completed build although there will be some warnings.

In the Project Explorer pane, expand the fft-zynq project label and then expand the Binaries line under it. The fft-zynq.elf file that resulted from the build will be shown.

Expected Results

Testing the Application

With the application built, we can run it on the ZC702 and test that it operates as expected. Because the application is intended to be used in a larger system, testing at this point will be somewhat limited. See the Tech Tip "Zynq FFT Signal Analyzer GUI Tech Tip", which describes integration into a graphics framework for demonstration purposes.

As noted in the Tech Tip "Zynq Ne10 Testing Tech Tip" there are multiple ways to load the file into the ZC702 and execute it. For the balance of this Tech Tip, Remote System Explorer (RSE) will be used to control the ZC702 and execute various tests to demonstrate that the fft-zynq program is operating as expected.

We will access the ZC702 over an Ethernet connection from the PC where SDK is running. The ZC702 boots with a default IP address, so be sure your computer can reach that subnet. Connect your ZC702 to your computer or network with an Ethernet cable.

The ZC702 must be set to boot from the SD-MMC card supplied with it. This contains the base TRD Linux system required to run this application.

Set the boot select switches as shown below, then power on your ZC702.

With the ZC702 running, we will set up a Remote System Explorer connection to the ZC702 board to allow us to download and run this application.

Right click in the Project Explorer window, select New -> Other. In the pop up window, expand Remote System Explorer, highlight Connection, and click Next.

In the next window, select SSH Only, and click Next.

Use the IP address of the ZC702 as the Host name and Connection name. Fill in the description field if you wish, then click Finish.

With the connection established, we can now run the fft-zynq application program.

Right click on fft-zynq in the Project Explorer pane, select Run As and then Run Configurations from the expanded list. The Run Configurations dialog box will appear.

If RSE has been used for running the Ne10 tests as described in previous Tech Tips in this series, an existing configuration under Remote ARM Linux Application for NE10-test will be shown. For testing the FFT application we will set up a new run configuration that manages the fft application program.

Click Remote ARM Linux Application in the left pane, then click on the New launch configuration icon (the piece of paper with the + in the upper right corner). A new Remote ARM Linux Application will be created.

As shown here, the check box for Select configuration using 'C/C++ Application' is checked, indicating that either a Debug build or a Release build will be selected in the next step.

Click on the Search Project button under the blank C/C++ Application: line and select the Release fft-zynq.elf from the Qualifier box at the bottom of the dialog box.

Click OK.

We now need to specify where the file will be loaded onto the remotely connected ZC702.

Click the Browse button adjacent to the Remote Absolute File Path for C/C++ Application: entry box.

If an RSE connection to the ZC702 has been properly established, the connection drop down list will contain an item for that connection. Select that item from the list if it is there. Then expand Root, expand the file system and select tmp.

If you are prompted for login information, use root as both the username and the password.

If prior connections using the same computer and ZC702 have been made, SDK remembers these and does not require the subsequent login process.

Click OK.

Append /fft-zynq.elf to /tmp in the Remote Absolute File Path for C/C++ Application: entry box.

Click Apply and then Run.

The fft-zynq.elf binary executable file will be downloaded to the ZC702 and executed. The console window will show that it has run successfully.

To further exercise the fft-zynq program using the command line arguments while still using RSE, there are at least two ways to proceed: adding command line arguments and using a remote terminal perspective.

Adding Command line arguments:

As just previously done, right click on fft-zynq in the Project Explorer pane, scroll down to Run As and in the expanded list click on Run Configurations...

The Run Configurations dialog box will appear with all of the information for running fft-zynq already in place. Double check that it matches and then click on the Arguments tab at the top of the right pane of the dialog box.

The fft-zynq program includes a simple test pattern generator that is invoked with the -g option. This option inserts a single impulse in the input table. Because the input table contains a real and an imaginary component for each sample, odd N values will set the real portion of the complex input value while even N values will set the imaginary portion. For example, the option -g 1 will produce a single impulse in the real value for sample 0. The FFT of an impulse at sample 0 is a constant, so when the FFT is executed the first 16 output samples all show the same value, confirming that the transform is operating as expected. The time required to process the FFT, excluding overhead for setup, etc., is displayed after the last value of the output table.
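
The mapping from the -g value to a table entry can be sketched as follows. This is an illustration of the behavior described above, not the generator in fft-zynq.c: we assume a float table with interleaved real and imaginary values, that the rest of the table is cleared to zero, and the function name set_impulse() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Clear an interleaved (re, im) table of 'samples' complex values and set
 * location g-1 to 1.0. Odd g lands on a real part, even g on an imaginary
 * part. Returns 0 on success, -1 if g is outside the table.
 */
int set_impulse(float *table, size_t samples, unsigned g)
{
    if (g == 0 || g > 2 * samples)
        return -1;
    memset(table, 0, 2 * samples * sizeof(float));
    table[g - 1] = 1.0f; /* sample (g-1)/2, component (g-1)%2 */
    return 0;
}
```

With this mapping, -g 1 sets the real part of sample 0 and -g 5 sets the real part of sample 2.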

Click inside the Program arguments box and type -g 1 as the first test to run.

Then click Run.

The program will run and the Console window will show the results. To see the full output listing it is convenient to expand the console to occupy the full SDK window. To do this, click the Maximize icon on the upper right corner of the window pane border just above the console display.

Here we see a display of the first 16 samples of the input data table used by the FFT calculation, the first 16 entries in the output table and the time required for calculation of the FFT.

The default is to have the FFT run entirely on the ARM processor. To force execution on the NEON SIMD extension, use the -a 1 option.

Minimize the console display by clicking on the Restore icon on the upper right corner of the window pane border. Then repeat the process of right clicking on fft-zynq in the Project Explorer pane, selecting Run As -> Run Configurations... and then selecting the Arguments tab of the Run Configurations dialog box.

Click in the Program arguments: box and add -a 1 after the existing -g 1 argument, being sure to leave a space between the arguments. Then click Run.

The console now shows the same FFT calculation but executed on the NEON SIMD extension. Note that the execution time has been reduced from 7853 us (your mileage may vary slightly...) to 1751 us. Differences in the reported times for your operation, or between runs with the same command line arguments are due to the inaccuracies in the Linux based execution time routine being used, Linux interruptions that may happen during execution and other factors beyond the scope of this Tech Tip.
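
For readers who want to take similar measurements in their own code, the following is a minimal sketch of a Linux based timing routine built on the POSIX clock_gettime() call. It is an assumption about how such timing can be done, not the routine used in fft-zynq.c; the function names are hypothetical. On older C libraries clock_gettime() requires linking with the "rt" library.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Elapsed time between two timestamps, in microseconds. */
long elapsed_us(const struct timespec *start, const struct timespec *end)
{
    long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
                 + (end->tv_nsec - start->tv_nsec);
    return (long)(ns / 1000);
}

/* Call fn(arg) 'loops' times and return the total elapsed microseconds. */
long time_call_us(void (*fn)(void *), void *arg, int loops)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < loops; i++)
        fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_us(&t0, &t1);
}

/* Ratio of the baseline time to the accelerated time. */
double speedup(long base_us, long accel_us)
{
    return (double)base_us / (double)accel_us;
}
```

Timing many iterations and dividing by the loop count, as the -l option allows, reduces the effect of Linux interruptions on any single run. For the two times quoted above, 7853 / 1751 gives a ratio of roughly 4.5 for that particular run.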

This process of adding command line arguments can be used to further explore the results of different sizes of FFT, etc. Note that this program does not currently support the hardware FFT option (-a 2 command line argument).

Remote Terminal Perspective:

An alternate way to explore running the fft-zynq program with different command line arguments is through a remote terminal perspective. This enables you to enter commands directly into the Linux system running on the ZC702, as opposed to having SDK control the execution as in the previous method.

At the top of the SDK window, click on Window, scroll to Open Perspective and then select Other... from the list.

Click on Remote System Explorer and then click OK.

SDK replaces the Project Explorer pane with the Remote System pane and shows the Remote System details pane.

In the Remote System pane, expand the entry for the ZC702 connection, right click Ssh Terminals and click Launch Terminal. A new terminal will be displayed with a Linux prompt from the ZC702.

For best viewing of the FFT results, maximize the Terminal window and then enter the desired commands.

The terminal window starts in the root directory. To run the fft-zynq program, recall that it was downloaded to the tmp directory. To run it, change to the tmp directory (cd /tmp) and then run it directly using ./fft-zynq.elf with the desired options. For example:

./fft-zynq.elf -g 5 -a 1

The previous runs with -g 1 forced the real component of sample 0 to be 1. An impulse at sample 0 transforms to a constant, so the displayed output table results all have a real component of 1. In this instance with -g 5, the real component of sample 2 is set to 1. Delaying the impulse by two samples leaves the magnitude of every output unchanged but rotates its phase in proportion to the output index, so the first 16 entries in the output table now show a range of different real and imaginary values.

Defaults and other options

The defaults that are used by fft-zynq are the following:
- FFT size is 4096 samples
- FFT type is Complex
- Input type is assumed to be Integer 16 (the -g option uses FLOAT)
- Output type is Floating Point 32
- Source table address, output table address and the control register area are in OCM of the PS
- ARM only is the assumed processing method
- Processing is one time (non-pipelined)

The FFT size can be set to any power of 4 from 16 to 4096: 16, 64, 256, 1024 and 4096.
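
A size check along these lines can be sketched as follows. This is illustrative only and not taken from fft-zynq.c; the function name fft_size_supported() is hypothetical.

```c
/* Returns 1 if n is a power of 4 in the range 16..4096, otherwise 0. */
int fft_size_supported(unsigned n)
{
    if (n < 16 || n > 4096)
        return 0;
    if ((n & (n - 1)) != 0) /* more than one bit set: not a power of 2 */
        return 0;
    /* A power of 4 has its single set bit in an even bit position. */
    return (n & 0x55555555u) != 0;
}
```

This accepts 16, 64, 256, 1024 and 4096 and rejects sizes such as 32 or 2048, which are powers of 2 but not powers of 4.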


Conclusions

Using the -g option and various FFT sizes with execution on the ARM processor alone and then with the NEON SIMD engine provides a measure of the speed up possible using the NEON SIMD engine. The table below shows average execution times for 5 runs and the relative speed up achieved. Note that there is some variability in the execution times reported by Linux so user results will likely vary from these.

Size / execution unit | ARM Processor | NEON SIMD engine | Average Speed up

Execution time for the 4096-point FFT is disproportionately longer than the increase in FFT size would typically indicate. While the number of raw calculations grows roughly in proportion to N log N with each increase in FFT size, the move from 1024 to 4096 introduces another variable: the size of the cache and the impact of cache misses. Based on 4096 samples and 32 bit complex numbers (8 bytes per sample), the input data table alone occupies 32 KB, which is the full size of the L1 data cache of each Cortex-A9 core in the PS. While the output table is smaller, the combination exceeds the L1 data cache and will cause some number of cache misses. This will increase the overall execution time more than the increase in size from 1024 to 4096 alone would indicate.

The ease of integrating a standard library of functions targeting the ARM processor and the SIMD engine can be seen. With multiple methods available to control the ZC702, the ease of development and debug with the Xilinx SDK is also demonstrated. These all contribute to faster time to system completion and time to market for complex, compute intensive applications targeting the Zynq-7000 AP SoC.

Saving the workspace

For ease of completing subsequent Tech Tips that use this completed build, it is wise to save the workspace so it can be restored later as a known starting point. Because only the /sw portion of the base TRD was modified, only that portion needs to be saved. If you choose to do this,

Select File -> Export or right click in the Project Explorer pane white space and select Export

In the Export dialogue box, expand the General line and select Archive file.

Click Next

The Export Archive dialogue will appear.

Click the Browse button, navigate to where you want to save the workspace and then create an appropriate file in which to save it. In our case we are saving this as fftApp.zip.

Be sure the "Save in zip format" box is selected unless you are on a Linux system in which case you might select the tar format.

Click Select All to export all of the items in the workspace.

Then click Finish.

The workspace will be saved in the specified archive file for later use.

© Copyright 2019 - 2022 Xilinx Inc.