Zynq-7000 AP SoC Spectrum Analyzer part 4 - Accelerating Software - Building and Running an FFT Tech Tip 2014.3
Document History
| Date | Version | Author | Description of Revisions |
|---|---|---|---|
| 23 October 2014 | 1 | Faster Technology | Initial posting - updated to 2014.3 |
Description/Summary
Virtually all electronic systems today contain some signal processing as part of their fundamental capabilities. The Zynq-7000 AP SoC is ideally suited to handling many of these functions in a single-chip solution, as this Tech Tip demonstrates.
In the Tech Tip "Zynq-7000 AP SoC Spectrum Analyzer part 2 - Accelerating Software - Building ARM NEON Library Tech Tip 2014.3", the Ne10 library of signal processing functions was obtained and built. This Tech Tip describes a signal processing application that uses the Ne10 library built in that prior Tech Tip. The application performs a complex FFT on a sampled input signal, executing either on the ARM processor alone or with the NEON SIMD engine. The application is constructed so that it can be used standalone from the command line or integrated into a larger program. The subsequent Tech Tip "Zynq-7000 AP SoC Spectrum Analyzer part 7 - Building and Running a Qt Based GUI Tech Tip 2014.3" will demonstrate how to integrate this application into a larger graphical user interface system.
In addition to demonstrating a speedup of 1.25x to 1.85x when using the NEON SIMD engine versus the ARM processor alone, this Tech Tip shows how to use a standard library of functions in an application and how to modify that library for a specific need. Both are made possible by the standard implementation of the ARM processing system (PS) in the Zynq-7000 AP SoC, which opens up the vast ecosystem of available software to the Xilinx development community. We will also see the power of the debug capabilities in the Xilinx SDK.
Implementation
| Implementation Details | |
|---|---|
| Design Type | PS only |
| SW Type | Linux |
| CPUs | 1 CPU at the standard ZC702 frequency |
| PS Features | ARM processor and NEON SIMD engine; standard peripherals used by the PetaLinux operating system |
| PL Cores | None |
| Boards/Tools | ZC702 with the standard peripherals used for Linux TRD operation and control |
| Xilinx Tools Version | Vivado / SDK 2014.3 |
| Other Details | Standard ZC702 setup for console terminal and Ethernet required |
| Files Provided | FFT application C source code (fft-zynq.c); tested starting-point workspace for SDK (Ne10TestBuild2014dt3.zip) |
Block Diagram
Step by Step Instructions
A library of signal processing functions was built in the Tech Tip "Zynq-7000 AP SoC Spectrum Analyzer part 2 - Accelerating Software - Building ARM NEON Library Tech Tip 2014.3" and tested in the Tech Tip "Zynq-7000 AP SoC Spectrum Analyzer part 3 - Accelerating Software - Running ARM Library Tests Tech Tip 2014.3". This Tech Tip starts from that compiled and tested library.
The application performs a Fast Fourier Transform (FFT) on sampled data from an input waveform. The input data is held in a table in processor memory, and the spectrum output of the FFT is written to another table in processor memory. A register interface controls the various parameters of the FFT process, which makes it easier to integrate the FFT application with other software. An example of this integration is described in a subsequent Tech Tip, "Zynq-7000 AP SoC <name> Tech Tip". In the command-line version, the same parameters are controlled by the following options:
-v --version Print program version
-h --help Print help message
-s --size Size of FFT
-t --type Type of FFT, real or complex
-i --input Input data type: int for 16-bit integers or float for 32-bit floats
-o --output Output data type
-r --source Physical address of input data
-d --dest Physical address of output results
-a --arch Processor architecture: 0 = ARM, 1 = NEON, or 2 = CORE
-p --pipeline This is part of a continuous processing pipeline, not a one-time FFT
-g --debug Generate an impulse test pattern at location N-1 of the input table
-l --loop Loop execution of the FFT for N iterations (for timing purposes only)
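The option list above maps naturally onto the standard `getopt_long` interface. The following is a minimal sketch of how such parsing could look, not the code in fft-zynq.c: the `struct fft_params` type, its field names, the `parse_fft_args` function, and the default values are all hypothetical, and it assumes -p and -g take no argument while -l takes the iteration count.

```c
#include <getopt.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameter block; field names are illustrative only. */
struct fft_params {
    unsigned size;            /* -s: FFT length N                       */
    int is_complex;           /* -t: 1 = complex, 0 = real              */
    int input_is_float;       /* -i: 1 = 32-bit float, 0 = 16-bit int   */
    int output_is_float;      /* -o: same encoding as -i (assumed)      */
    unsigned long src_addr;   /* -r: physical address of input table    */
    unsigned long dst_addr;   /* -d: physical address of output table   */
    int arch;                 /* -a: 0 = ARM, 1 = NEON, 2 = CORE        */
    int pipeline;             /* -p: continuous processing mode         */
    int debug_impulse;        /* -g: impulse at input[N-1]              */
    unsigned loops;           /* -l: repeat count for timing            */
    int show_version;         /* -v */
    int show_help;            /* -h */
};

/* Map a keyword pair such as "real"/"complex" to 0/1, or -1 if neither. */
static int pick(const char *arg, const char *zero, const char *one)
{
    if (strcmp(arg, zero) == 0) return 0;
    if (strcmp(arg, one) == 0) return 1;
    return -1;
}

/* Returns 0 on success, -1 on an unknown option or invalid value. */
int parse_fft_args(int argc, char **argv, struct fft_params *p)
{
    static const struct option longopts[] = {
        {"version",  no_argument,       NULL, 'v'},
        {"help",     no_argument,       NULL, 'h'},
        {"size",     required_argument, NULL, 's'},
        {"type",     required_argument, NULL, 't'},
        {"input",    required_argument, NULL, 'i'},
        {"output",   required_argument, NULL, 'o'},
        {"source",   required_argument, NULL, 'r'},
        {"dest",     required_argument, NULL, 'd'},
        {"arch",     required_argument, NULL, 'a'},
        {"pipeline", no_argument,       NULL, 'p'},
        {"debug",    no_argument,       NULL, 'g'},
        {"loop",     required_argument, NULL, 'l'},
        {NULL, 0, NULL, 0}
    };
    int c;

    memset(p, 0, sizeof *p);
    p->size = 1024;            /* assumed defaults */
    p->is_complex = 1;
    p->input_is_float = 1;
    p->output_is_float = 1;
    p->loops = 1;

    optind = 1;                /* restart scanning at argv[1] */
    while ((c = getopt_long(argc, argv, "vhs:t:i:o:r:d:a:pgl:", longopts, NULL)) != -1) {
        switch (c) {
        case 'v': p->show_version = 1; break;
        case 'h': p->show_help = 1; break;
        case 's': p->size = (unsigned)strtoul(optarg, NULL, 0); break;
        case 't': if ((p->is_complex = pick(optarg, "real", "complex")) < 0) return -1; break;
        case 'i': if ((p->input_is_float = pick(optarg, "int", "float")) < 0) return -1; break;
        case 'o': if ((p->output_is_float = pick(optarg, "int", "float")) < 0) return -1; break;
        case 'r': p->src_addr = strtoul(optarg, NULL, 0); break;   /* base 0 accepts 0x... */
        case 'd': p->dst_addr = strtoul(optarg, NULL, 0); break;
        case 'a': p->arch = atoi(optarg); if (p->arch < 0 || p->arch > 2) return -1; break;
        case 'p': p->pipeline = 1; break;
        case 'g': p->debug_impulse = 1; break;
        case 'l': p->loops = (unsigned)strtoul(optarg, NULL, 0); break;
        default:  return -1;   /* '?' for an unknown option */
        }
    }
    return 0;
}
```

Using base 0 in `strtoul` lets the physical addresses for -r and -d be given in hexadecimal (for example `-r 0x10000000`), which is the usual way such addresses are written.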
Input and output buffer sizes are calculated from the FFT size, from whether the FFT is real or complex, and from whether the data is fixed or floating point.
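As a sketch of that sizing rule, assuming complex data is stored as interleaved real/imaginary pairs, fixed point uses 16-bit integers and floating point uses 32-bit floats (per the -i option), a table of N points needs the byte count computed below. The function names `fft_buffer_bytes` and `make_impulse_table` are hypothetical; the second one illustrates the -g debug pattern, a unit impulse at point N-1 of a complex float table. For an unscaled DFT, a single impulse produces magnitude 1 in every bin, so the expected spectrum is flat.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes for one table of n points under the assumed layout:
 * complex = interleaved re/im pairs, real = one value per point;
 * fixed point = int16_t, floating point = float. */
size_t fft_buffer_bytes(unsigned n, int is_complex, int is_float)
{
    size_t per_value;
    size_t values;

    assert(n > 0);
    per_value = is_float ? sizeof(float) : sizeof(int16_t);
    values = (size_t)n * (is_complex ? 2u : 1u);
    return values * per_value;
}

/* -g style test pattern (hypothetical helper): an all-zero complex
 * float table with a unit impulse in the real part of point n-1.
 * Caller frees the returned buffer; NULL on failure or n == 0. */
float *make_impulse_table(unsigned n)
{
    float *buf;

    if (n == 0)
        return NULL;
    buf = calloc((size_t)n * 2u, sizeof *buf);
    if (buf != NULL)
        buf[2u * (n - 1u)] = 1.0f;   /* re(x[n-1]) = 1, im(x[n-1]) = 0 */
    return buf;
}
```

With this layout, a 4096-point complex floating-point FFT needs a 32 KiB input table.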
The -a option in the application selects between using just the ARM processor for the computations or using the NEON SIMD engine. This lets the user compare the execution times of these two software approaches. In a subsequent Tech Tip, a hardware unit will be added in Programmable Logic to show the performance difference between the two software approaches and execution of the FFT in hardware. The Conclusions section at the end of this Tech Tip contains a simple table of execution time differences between the two software-only approaches.
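Comparing the ARM and NEON paths comes down to timing many iterations of the same FFT (the -l option) and dividing. The sketch below shows one common way to do this on Linux with `clock_gettime(CLOCK_MONOTONIC)`, which on older glibc versions lives in the rt library that is added to the link settings later in this Tech Tip. The `fft_fn` callback type and the `time_per_call_us` and `null_fft` names are hypothetical and not taken from fft-zynq.c.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical callback: one FFT invocation on caller-owned data. */
typedef void (*fft_fn)(void *ctx);

/* Average time per call in microseconds over `loops` calls, measured
 * with the monotonic clock so wall-clock adjustments do not skew it. */
double time_per_call_us(fft_fn fn, void *ctx, unsigned loops)
{
    struct timespec t0, t1;
    unsigned i;
    double elapsed_us;

    if (loops == 0)
        return 0.0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < loops; i++)
        fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    elapsed_us = (double)(t1.tv_sec - t0.tv_sec) * 1e6
               + (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
    return elapsed_us / loops;
}

/* Empty workload that only counts calls; timing it gives the loop
 * and call overhead to subtract from real FFT measurements. */
void null_fft(void *ctx)
{
    (*(unsigned *)ctx)++;
}
```

Given per-call times for the ARM-only and NEON builds of the same FFT, the speedup is simply `t_arm / t_neon`.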
The source file for this Tech Tip also contains code that is used in subsequent Tech Tips:
- code to support a continuous sampling and processing system (pipeline mode)
- an option to lock the FFT to a specific CPU in the PS (used in a demonstration system)
- code to use PL hardware to perform the FFT
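Locking work to one CPU is normally done on Linux with CPU affinity. The following is a minimal sketch using `sched_setaffinity`, assuming the FFT runs in the calling thread; the function names `pin_to_cpu` and `first_allowed_cpu` are hypothetical, and the actual mechanism in fft-zynq.c may differ. On the dual-core Cortex-A9 PS of the Zynq-7000, valid CPU numbers are 0 and 1.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on a bad CPU number or syscall failure. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set) == 0 ? 0 : -1;
}

/* Lowest CPU number the calling thread may currently run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    int cpu;

    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Pinning the FFT to one core keeps the other core free for the rest of the system and avoids migration effects in timing measurements.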
Modification of the Ne10 Library
The Ne10 Library previously built and tested uses the following process to calculate the Complex FFT:
Input real and imaginary data:
x(n) = xa + j * ya
x(n+N/4 ) = xb + j * yb
x(n+N/2 ) = xc + j * yc
x(n+3N/4) = xd + j * yd
where N is the length of the FFT
Output real and imaginary data:
X(4r) = xa'+ j * ya'
X(4r+1) = xb'+ j * yb'
X(4r+2) = xc'+ j * yc'
X(4r+3) = xd'+ j * yd'
Twiddle factors for radix-4 FFT:
Wn = co1 + j * (- si1)
W2n = co2 + j * (- si2)
W3n = co3 + j * (- si3)
The output of the radix-4 CFFT is in digit-reversed order. Interchanging the middle two branches of every butterfly produces bit-reversed output instead.
Butterfly CFFT equations:
xa' = xa + xb + xc + xd
ya' = ya + yb + yc + yd
xc' = (xa+yb-xc-yd)* co1 + (ya-xb-yc+xd)* (si1)
yc' = (ya-xb-yc+xd)* co1 - (xa+yb-xc-yd)* (si1)
xb' = (xa-xb+xc-xd)* co2 + (ya-yb+yc-yd)* (si2)
yb' = (ya-yb+yc-yd)* co2 - (xa-xb+xc-xd)* (si2)
xd' = (xa-yb-xc+yd)* co3 + (ya+xb-yc-xd)* (si3)
yd' = (ya+xb-yc-xd)* co3 - (xa-yb-xc+yd)* (si3)
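The butterfly equations above translate directly into C. This sketch is a plain scalar transcription for reading and checking the math, not Ne10's NEON implementation; the `cpx` type and the `radix4_butterfly` name are hypothetical. Twiddles are passed as `co[0..2]` and `si[0..2]` for Wn, W2n and W3n, and the middle two outputs are swapped exactly as in the equations (the branch multiplied by Wn goes to xc', yc').

```c
/* Complex sample as separate real/imaginary floats (layout assumed). */
typedef struct { float re, im; } cpx;

/* One radix-4 butterfly as written above. in[0..3] are x(n), x(n+N/4),
 * x(n+N/2), x(n+3N/4); twiddle k is co[k] + j*(-si[k]). out[1] and out[2]
 * hold the interchanged middle branches (bit-reversed ordering). */
void radix4_butterfly(const cpx in[4], const float co[3], const float si[3], cpx out[4])
{
    const cpx a = in[0], b = in[1], c = in[2], d = in[3];

    /* Branch sums before twiddling */
    float r1 = a.re + b.im - c.re - d.im, i1 = a.im - b.re - c.im + d.re; /* a - jb - c + jd */
    float r2 = a.re - b.re + c.re - d.re, i2 = a.im - b.im + c.im - d.im; /* a - b + c - d   */
    float r3 = a.re - b.im - c.re + d.im, i3 = a.im + b.re - c.im - d.re; /* a + jb - c - jd */

    out[0].re = a.re + b.re + c.re + d.re;   /* xa' */
    out[0].im = a.im + b.im + c.im + d.im;   /* ya' */
    out[2].re = r1 * co[0] + i1 * si[0];     /* xc' */
    out[2].im = i1 * co[0] - r1 * si[0];     /* yc' */
    out[1].re = r2 * co[1] + i2 * si[1];     /* xb' */
    out[1].im = i2 * co[1] - r2 * si[1];     /* yb' */
    out[3].re = r3 * co[2] + i3 * si[2];     /* xd' */
    out[3].im = i3 * co[2] - r3 * si[2];     /* yd' */
}
```

With unit twiddles (co = 1, si = 0) and only xb = 1, the outputs are 1, -1, -j and j, which matches a direct 4-point DFT with the middle two bins swapped.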
The "twiddle factors" in the Ne10 library are hard-coded for FFT sizes of 16, 64, 256 and 1024. To support a 4096-point FFT, code was added to the application that calculates the twiddle factors and bypasses the hard-coded tables in the original library. This routine starts at line 117 of the fft-zynq.c source file. The specifics of the calculations, as well as the algorithm used for the CFFT in the Ne10 library, are beyond the scope of this Tech Tip. A wealth of information is available on the web, for example http://en.wikipedia.org/wiki/Fast_Fourier_transform.
Building the application
Download the C source file for this Tech Tip, fft-zynq.c, and save it to a convenient location on your computer. Note where it is saved; in our case, we saved it to G:\Projects.
This Tech Tip uses the workspace that resulted from building and testing the Ne10 library in the Tech Tips "Zynq-7000 AP SoC Spectrum Analyzer part 2 - Accelerating Software - Building ARM NEON Library Tech Tip 2014.3" and "Zynq-7000 AP SoC Spectrum Analyzer part 3 - Accelerating Software - Running ARM Library Tests Tech Tip 2014.3". If that workspace is available, skip ahead to the Old Workspace in Place instructions below.
If the workspace is not available, or if there is any doubt that it was completed properly, the referenced file Ne10TestBuild2014dt3.zip can be used to create a known working starting point for this Tech Tip.
Download the Zip file from the Ne10TestBuild2014dt3.zip link.
Create an empty directory where you will implement this Tech Tip. To be consistent with the remainder of these step-by-step instructions, the directory could be:
G:\Projects\ZC702_Ne10
However, these steps to import a known workspace will work with any new folder of the user's choosing.
CAUTION:
Many users have reported unusual problems with SDK when using different directory structures and names. If you encounter any odd behavior in SDK, use the suggested directory structure and names.
Start SDK
Start -> All Programs -> Xilinx Design Tools -> Vivado 2014.3 -> Xilinx SDK 2014.3
In the Workspace Launcher, browse to and select G:\Projects\ZC702_Ne10 or the empty directory that you have created.
Click OK to continue.
If you are presented with a welcome tab, close it by clicking on the X on the tab.
SDK starts with a blank Project Explorer pane.
Select File -> Import or right click on the white space in the Project Explorer pane and select Import.
The Import dialog box appears. Expand the General line and select Existing Projects into Workspace.
Click Next
Click the Select archive file button. Then click Browse to navigate to the saved workspace file that you want to import and click Open. In our case this is Ne10TestBuild2014dt3.zip.
Click Finish
SDK will build the workspace automatically. Because SDK is already running with the workspace in place, skip the Old Workspace in Place steps below and continue with creating the FFT application project.
Old Workspace in Place
If you have the workspace from the previous Tech Tip work and have not already started SDK, do so by:
Start -> All Programs -> Xilinx Design Tools -> Vivado 2014.3 -> Xilinx SDK 2014.3
In the Workspace Launcher, browse to and select the existing workspace, for example G:\Projects\ZC702_Ne10
Click OK to continue
When presented with the Welcome screen, click the X in the Welcome tab to close that screen
The workspace should appear as:
The FFT application is a new project within SDK so we need to create it.
Select File -> New -> Project
TIP:
A new C project can also be created by right clicking in the white space of the Project Explorer pane and then selecting New -> C Project.
When the New Project dialog box appears, expand the C/C++ line and select C Project.
Click Next
The project creation dialog box opens. In the Project Name: box, type fft-zynq to match the name of the source file.
Make sure the Use default location check box is checked.
In the Project type: box, select the "Xilinx ARM Linux Executable" type by clicking on it.
Click Finish.
We now see the new project in the Project Explorer.
We can now import the source code for the FFT application into SDK.
Right click on fft-zynq in the Project Explorer column and select Import.
The import window appears.
Be sure File System is highlighted (you may need to expand the General group), and click Next.
In the Import File System window, browse to the folder where fft-zynq.c was saved, then check the box next to the source file.
Click Finish to add the source code to the project. The options can be left unchecked.
This source file supports both software execution of the FFT application and execution in the PL hardware fabric. Use of the hardware FFT is controlled by a #define statement. Expand the fft-zynq project and double-click the fft-zynq.c file to open it in an SDK editor window.
Change the value 1 on line 28 to 0 so that the line reads:
#define USE_FFT_CORE 0
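A compile-time switch like USE_FFT_CORE typically works through the preprocessor: code guarded by `#if USE_FFT_CORE` is compiled only when the value is nonzero, so with 0 the hardware-access path is excluded from the build entirely. The sketch below only illustrates the pattern; the `fft_backend` function is hypothetical, and the `#ifndef` guard exists just so the fragment compiles on its own (fft-zynq.c sets the value directly on line 28).

```c
/* Stand-alone default; in fft-zynq.c the value is set on line 28. */
#ifndef USE_FFT_CORE
#define USE_FFT_CORE 0
#endif

/* Hypothetical helper showing which FFT path the switch compiles in. */
const char *fft_backend(void)
{
#if USE_FFT_CORE
    return "pl-core";   /* FFT performed by hardware in programmable logic */
#else
    return "software";  /* FFT performed on the ARM processor or NEON engine */
#endif
}
```

Because the choice is made by the preprocessor, changing the value requires rebuilding the project.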
Save the modified source file by clicking File -> Save from the top menu line.
We now need to add the include paths that will be used in the build process.
Right click on fft-zynq in the Project Explorer column of SDK and then select "C/C++ Build Settings".
Expand the C/C++ General line in the left column and select the Paths and Symbols item.
Be sure the Configuration: option is set to [ All Configurations ] unless you have a specific reason to have the debug compiled differently than the release.
With the "Includes" tab selected, click Add (in the right column).
Check both the option boxes for "Add to all languages" and "Is a workspace path" and click the workspace button on the right side of the dialog box.
In the Folder Selection box, expand the Ne10-master item and select common.
Click OK to return to the Add directory path dialog box.
Click OK to add this path.
Repeat the process to add the following paths:
- Ne10-master / inc
- Ne10-master / modules / math
- Ne10-master / test / include
When completed, you should have the following include directories set:
We now need to add the libraries and library search paths so that the linker can find all of the required components.
In the left column of the Properties for fft-zynq window, expand C/C++ Build and select Settings. In the Tool Settings tab on the right side of the window, under ARM Linux gcc linker, select Libraries.
At the top of the Libraries (-l) area, click on the add icon (looks like a sheet of paper with a plus sign on it).
In the text entry box, type Ne10 and click OK. Then repeat these steps to add the standard Linux libraries "m" (math) and "rt" (real-time extensions).
In the Library search path pane, click the add icon (the paper with the plus sign as before) to get a pop up window:
Click Workspace and then select Ne10/Release.
Click OK.
You should now have the Properties for fft-zynq completed as follows:
Click Apply, then click OK.
We can now build the project.
To ensure the fastest execution times, use the default Release build settings. To select them:
Right click on fft-zynq in the Project Explorer pane
Select Build Configurations > Set Active > Release
With the build set to use the Release options, we can now build the project.
In the Project Explorer pane, right click on the fft-zynq project and select Build Project. If Console is selected in the bottom middle pane, the progress will be displayed there. The build should complete, although some warnings will be reported.
In the Project Explorer pane, expand the fft-zynq project and then expand the Binaries entry under it. The fft-zynq.elf file produced by the build is shown there. The same binary also appears in the Release folder under the fft-zynq project. This confirms not only that the project built, but that it was built with the Release compiler options in place.
Expected Results