Zynq-7000 AP SoC Spectrum Analyzer part 4 - Accelerating Software - Building and Running an FFT Tech Tip


Table of Contents

Document History

Description/Summary

Implementation

Block Diagram

Step by Step Instructions

Modification of the Ne10 Library

Building the application

Expected Results

Testing the Application

Adding Command line arguments:

Remote Terminal Perspective:

Defaults and other options

Conclusions

Saving the workspace

Document History

Date | Version | Author | Description of Revisions
September 15, 2013 | 1 | Faster Technology | Initial posting
February 28, 2014 | 1.1 | Faster Technology | Update to 2013.4 release





Description/Summary



Virtually all electronic systems today contain some signal processing as part of their fundamental capabilities. The Zynq-7000 AP SoC is ideally suited to handling many of these functions in a single chip solution as will be demonstrated in this Tech Tip.

In the Tech Tip "Zynq Ne10 Library Tech Tip", a library of complex filtering functions was obtained and built. This Tech Tip describes a signal processing application that uses the Ne10 library built in that prior Tech Tip. The application documented in this Tech Tip performs a complex FFT on a sampled input signal, executing on either the ARM processor alone or on the NEON SIMD engine. The application is constructed so it can be used standalone from the command line or integrated into a larger program. A subsequent Tech Tip will demonstrate how to integrate this application into a larger graphical user system.

In addition to demonstrating a speedup of 1.25x to 1.85x when using the NEON SIMD engine versus the ARM processor alone, this Tech Tip shows how to use a standard library of functions in an application and how to modify that library for a specific need. All of this is facilitated by the standard implementation of the ARM processor system (PS) in the Zynq-7000 AP SoC, which opens up the vast ecosystem of available software to the Xilinx development community. We will also see the power of the debug capabilities in the Xilinx SDK.

Implementation

Implementation Details

Design Type: PS Only
SW Type: Linux
CPUs: 1 CPU - standard ZC702 Frequency
PS Features: ARM processor and NEON SIMD engine
PL Cores: None
Boards/Tools: ZC702
Xilinx Tools Version: Vivado / SDK 2013.4
Other Details: Standard ZC702 setup for console terminal and Ethernet required



Files Provided

fft-zynq.c: FFT Application C source code
Ne10TestBuild.zip: Tested starting point workspace for SDK

Block Diagram



Step by Step Instructions



A library of signal processing functions was built in the Tech Tip "Zynq Ne10 Library Tech Tip" and tested in the Tech Tip "Zynq Ne10 Testing Tech Tip". This Tech Tip is built starting with that compiled and tested library.

The application performs a Fast Fourier Transform (FFT) on sampled data from an input waveform. The input data is in a table in the processor memory space and the spectrum output of the FFT is also in a table in processor memory space. A register interface is used for controlling various parameters of the FFT process to facilitate integration of the FFT application with other software. An example of this integration is described in a subsequent Tech Tip, "Zynq-7000 AP SoC <name> Tech Tip". In the command line version, the same register values are controlled by the following options:
-v --version Print program version
-h --help Print help message
-s --size Size of FFT
-t --type Type of FFT, real or complex
-i --input Input data type: int for 16 bit integers or float for 32 bit floats
-o --output Output data type
-r --source Physical address of input data
-d --dest Physical address of output results
-a --arch Processor architecture, 0 = ARM, 1 = NEON, or 2 = CORE
-p --pipeline This is part of a continuous processing pipeline, and not a one time FFT
-g --debug Generate an impulse test pattern at location N-1 of the input table
-l --loop Execute the FFT for N iterations - for timing purposes only
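
As an illustration only, the option list above could be parsed with the standard getopt_long() call along the following lines. This is a minimal sketch, not the code in fft-zynq.c: the structure fft_opts, its field names, the function parse_fft_opts(), and the choice of which options take an argument are all assumptions made for the example. The -v and -h options are omitted for brevity.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings structure; the field names are illustrative only. */
struct fft_opts {
    int size;            /* -s: FFT length */
    int is_complex;      /* -t: 1 = complex, 0 = real */
    int in_float;        /* -i: 1 = 32 bit float, 0 = 16 bit int */
    int out_float;       /* -o: same encoding as in_float */
    unsigned long src;   /* -r: physical address of input data */
    unsigned long dst;   /* -d: physical address of output results */
    int arch;            /* -a: 0 = ARM, 1 = NEON, 2 = CORE */
    int pipeline;        /* -p: flag */
    int debug;           /* -g: flag (assumed to take no argument) */
    long loops;          /* -l: iteration count */
};

/* Overwrite only the fields whose options are present, so the caller
 * supplies the defaults. Returns 0 on success, -1 on an unknown option. */
int parse_fft_opts(int argc, char *argv[], struct fft_opts *o)
{
    static const struct option longopts[] = {
        {"size",     required_argument, NULL, 's'},
        {"type",     required_argument, NULL, 't'},
        {"input",    required_argument, NULL, 'i'},
        {"output",   required_argument, NULL, 'o'},
        {"source",   required_argument, NULL, 'r'},
        {"dest",     required_argument, NULL, 'd'},
        {"arch",     required_argument, NULL, 'a'},
        {"pipeline", no_argument,       NULL, 'p'},
        {"debug",    no_argument,       NULL, 'g'},
        {"loop",     required_argument, NULL, 'l'},
        {NULL, 0, NULL, 0}
    };
    int c;

    assert(o != NULL);
    optind = 1;  /* restart scanning at the first argument */
    while ((c = getopt_long(argc, argv, "s:t:i:o:r:d:a:pgl:",
                            longopts, NULL)) != -1) {
        switch (c) {
        case 's': o->size = atoi(optarg); break;
        case 't': o->is_complex = (strcmp(optarg, "complex") == 0); break;
        case 'i': o->in_float = (strcmp(optarg, "float") == 0); break;
        case 'o': o->out_float = (strcmp(optarg, "float") == 0); break;
        case 'r': o->src = strtoul(optarg, NULL, 0); break;
        case 'd': o->dst = strtoul(optarg, NULL, 0); break;
        case 'a': o->arch = atoi(optarg); break;
        case 'p': o->pipeline = 1; break;
        case 'g': o->debug = 1; break;
        case 'l': o->loops = strtol(optarg, NULL, 0); break;
        default:  return -1;   /* unknown option or missing argument */
        }
    }
    return 0;
}
```

With this sketch, a command line such as "fft-zynq -s 4096 --type complex -i float -a 1" would set size to 4096, select a complex floating point transform, and choose the NEON path. Addresses given to -r and -d may be written in hexadecimal because strtoul() is called with base 0.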

Input and output buffer sizes are calculated from the FFT size, whether the FFT is real or complex and fixed or floating point.
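
One plausible form of this calculation is sketched below. The function name fft_buffer_bytes() and the rule itself (one or two values per point, two or four bytes per value) are assumptions based on the option descriptions; the actual code in fft-zynq.c may size a real-valued output differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for an n-point buffer.
 * is_complex: 1 = interleaved real/imaginary pairs, 0 = real values only
 * is_float:   1 = 32 bit floats, 0 = 16 bit integers */
size_t fft_buffer_bytes(size_t n, int is_complex, int is_float)
{
    size_t bytes_per_value = is_float ? sizeof(float) : sizeof(int16_t);
    size_t values_per_point = is_complex ? 2 : 1;

    assert(n > 0);
    return n * values_per_point * bytes_per_value;
}
```

Under these assumptions a 4096 point complex floating point FFT needs 32768 bytes for each buffer, and a 1024 point real 16 bit input needs 2048 bytes.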
The -a option in the application selects between using just the ARM processor for the computations or using the NEON SIMD engine. This enables the user to see the difference in execution time between these two software approaches. In a subsequent Tech Tip a hardware unit will be added in Programmable Logic to show the performance difference between the two software approaches and execution of the FFT in hardware. The Conclusions section at the end of this Tech Tip contains a simple table of execution time differences between the two software only approaches.
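
A comparison of this kind is typically made by running the transform many times and averaging, which is the purpose of the -l option. The sketch below shows one way to do that with the POSIX clock_gettime() call; the names time_per_call_us() and count_call() are hypothetical, and count_call() is only a stand-in for the real FFT call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime() under strict modes */
#endif
#include <assert.h>
#include <time.h>

/* Run fn(arg) 'loops' times and return the mean wall-clock time per call
 * in microseconds. */
double time_per_call_us(void (*fn)(void *), void *arg, long loops)
{
    struct timespec t0, t1;
    long i;

    assert(fn != NULL && loops > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < loops; i++)
        fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return ((t1.tv_sec - t0.tv_sec) * 1e6 +
            (t1.tv_nsec - t0.tv_nsec) / 1e3) / (double)loops;
}

/* Stand-in workload for the example: counts how often it was called.
 * In a real measurement this would invoke the ARM or NEON FFT routine. */
void count_call(void *arg)
{
    long *counter = arg;
    (*counter)++;
}
```

Timing the ARM and NEON paths with the same loop count and dividing the two results gives the speedup. On older C libraries clock_gettime() lives in librt, so the program may need to be linked with the "rt" library.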

For this Tech Tip, the source file contains additional code that is used in subsequent Tech Tips:
- code to enable using in a continuous sampling and processing system - pipeline mode
- an option to lock the FFT to a specific CPU in the PS (used in a demonstration system)


Modification of the Ne10 Library


The Ne10 Library previously built and tested uses the following process to calculate the Complex FFT:

Input real and imaginary data:

x(n) = xa + j * ya
x(n+N/4) = xb + j * yb
x(n+N/2) = xc + j * yc
x(n+3N/4) = xd + j * yd
where N is length of FFT

Output real and imaginary data:
X(4r) = xa'+ j * ya'
X(4r+1) = xb'+ j * yb'
X(4r+2) = xc'+ j * yc'
X(4r+3) = xd'+ j * yd'

Twiddle factors for radix-4 FFT:

Wn = co1 + j * (- si1)
W2n = co2 + j * (- si2)
W3n = co3 + j * (- si3)

The output of the radix-4 CFFT is in digit-reversed order. Interchanging the middle two branches of every butterfly produces bit-reversed output.

Butterfly CFFT equations:

xa' = xa + xb + xc + xd
ya' = ya + yb + yc + yd
xc' = (xa+yb-xc-yd)* co1 + (ya-xb-yc+xd)* (si1)
yc' = (ya-xb-yc+xd)* co1 - (xa+yb-xc-yd)* (si1)
xb' = (xa-xb+xc-xd)* co2 + (ya-yb+yc-yd)* (si2)
yb' = (ya-yb+yc-yd)* co2 - (xa-xb+xc-xd)* (si2)
xd' = (xa-yb-xc+yd)* co3 + (ya+xb-yc-xd)* (si3)
yd' = (ya+xb-yc-xd)* co3 - (xa-yb-xc+yd)* (si3)
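
The butterfly equations above can be transcribed term by term into C. The sketch below is only such a transcription for a single butterfly, not the Ne10 implementation; the type cpx and the function radix4_butterfly() are names invented for the example.

```c
#include <assert.h>

typedef struct { float re, im; } cpx;   /* hypothetical complex sample type */

/* One radix-4 butterfly.
 * in[0..3]  = a, b, c, d   i.e. x(n), x(n+N/4), x(n+N/2), x(n+3N/4)
 * out[0..3] = a', b', c', d'
 * co[k], si[k] = co1/si1, co2/si2, co3/si3 for k = 0, 1, 2 */
void radix4_butterfly(const cpx in[4], cpx out[4],
                      const float co[3], const float si[3])
{
    float xa = in[0].re, ya = in[0].im;
    float xb = in[1].re, yb = in[1].im;
    float xc = in[2].re, yc = in[2].im;
    float xd = in[3].re, yd = in[3].im;
    float r, i;

    assert(in != out);   /* the sketch is not written for in-place use */

    /* a' needs no twiddle factor */
    out[0].re = xa + xb + xc + xd;
    out[0].im = ya + yb + yc + yd;

    /* c' uses co1, si1 */
    r = xa + yb - xc - yd;
    i = ya - xb - yc + xd;
    out[2].re = r * co[0] + i * si[0];
    out[2].im = i * co[0] - r * si[0];

    /* b' uses co2, si2 */
    r = xa - xb + xc - xd;
    i = ya - yb + yc - yd;
    out[1].re = r * co[1] + i * si[1];
    out[1].im = i * co[1] - r * si[1];

    /* d' uses co3, si3 */
    r = xa - yb - xc + yd;
    i = ya + xb - yc - xd;
    out[3].re = r * co[2] + i * si[2];
    out[3].im = i * co[2] - r * si[2];
}
```

For a 4 point transform all twiddle factors are 1 (co = 1, si = 0). Feeding in a unit impulse at b then gives a' = 1, c' = -j, b' = -1 and d' = +j, so c' holds DFT bin 1 and b' holds DFT bin 2.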

The "twiddle factors" in the Ne10 library are hard coded for FFT sizes of 16, 64, 256 and 1024. To support a 4096 point FFT, code was added to the application that calculates the twiddle factors and bypasses the hard coded tables in the original library. This additional routine starts at line 117 of the fft-zynq.c source file. The specifics of the calculations, as well as the algorithm used for the CFFT in the Ne10 library, are beyond the scope of this Tech Tip. A wealth of information is available on the web, for example at http://en.wikipedia.org/wiki/Fast_Fourier_transform.


Building the application


Download the C source file for this Tech Tip - "fft-zynq.c" and save it to a convenient location on your computer system. Note where it is saved. In our case, we saved it to G:\Projects.

This Tech Tip uses the workspace that resulted from building and testing the Ne10 library in the Tech Tips "Zynq Ne10 Library Tech Tip" and "Zynq Ne10 Testing Tech Tip". If that workspace is available, skip to the instructions below to start SDK.

If the workspace is not available, or if there is any doubt that it was completed properly, the referenced file "Ne10TestBuild.zip" can be used to create a known working starting point for this Tech Tip.

Download the Zip file from the Ne10TestBuild.zip link.

Create an empty directory where you will be implementing this Tech Tip. To be consistent with the balance of these step by step instructions, the directory could be:

G:\ZC702fft\zc702-zvik-base-trd-rdf0286\sw

However, these steps to import a known workspace will work with any new folder of the user's choosing.

CAUTION:
Many users have unusual problems with SDK when using different directory structures and names. If you encounter any odd behaviors with SDK, it is advised to use the suggested directory structure and names.

Start SDK

Start -> All Programs -> Xilinx Design Tools -> Vivado 2013.4 -> SDK -> Xilinx SDK 2013.4

In the Workspace Launcher, browse to and select ZC702fft\zc702-zvik-base-trd-rdf0286\sw or the empty directory that you have created.



Click OK to continue.

If you are presented with a welcome tab, close it by clicking on the X on the tab.



SDK will start with a blank Project Explorer pane

Select File -> Import or right click on the white space in the Project Explorer pane and select Import.

The Import dialogue box will appear. Expand the General line and select Existing Projects into Workspace



Click Next

Click the Select archive file button. Then click Browse to navigate to the saved workspace file that you want to import and click Open. In our case this is Ne10TestBuild.zip.



Click Finish

SDK will build the workspace automatically. Because SDK is now running with the workspace in place, skip the instructions under "Old Workspace in Place" for starting SDK and continue from the point where the workspace is shown in the Project Explorer.

Old Workspace in Place

If you have not already started SDK, do so by:

Start -> All Programs -> Xilinx Design Tools -> Vivado 2013.4 -> SDK -> Xilinx SDK 2013.4

In the Workspace Launcher, browse to and select the existing workspace as ZC702fft\zc702-zvik-base-trd-rdf0286\sw



Click OK to continue

When presented with the Welcome screen, click the X in the Welcome tab to close that screen



The workspace should appear as:



The FFT application is a new project within SDK so we need to create it.

Select File -> New -> Project

TIP:
A new C project can also be created by right clicking in the white space of the Project Explorer pane and then selecting New -> C Project.

When the New Project dialogue box appears, expand the C/C++ line and select C Project



Click Next

The Project creation dialogue box opens. In the Project Name: box type fft-zynq to match the name of the source file.

Make sure the check box on the default location is checked.

In the Project type: box, select the "Xilinx ARM Linux Executable" type by clicking on it.



Click Finish.

We now see the new project in the Project Explorer.



We can now import the source code for the FFT application into SDK.

Right click on fft-zynq in the Project Explorer column and select Import.

The import window appears.



Be sure File System is highlighted (you may need to expand the General group), and click Next.

In the Import File System window, browse to the location where the C source file fft-zynq.c has been saved and select the file; check the box next to the source file.



Click Finish to add the source code to the project. The options can be left unchecked.

We now need to add the include paths that will be used in the build process.

Right click on fft-zynq in the Project Explorer column of SDK and then select "C/C++ Build Settings".

Expand the C/C++ General line in the left column and select the Paths and Symbols item.



Be sure the Configuration: option is set to [ All Configurations ] unless you have a specific reason to have the debug compiled differently than the release.

With the "Includes" tab selected, click Add (in the right column).



Check both the option boxes for "Add to all languages" and "Is a workspace path" and click the workspace button on the right side of the dialog box.

In the Folder Selection box, expand the Ne10-master item and select common.



Click OK to return to the Add directory path dialog box.



Click OK to add this path.

Repeat the process to add the following paths:
- Ne10-master / inc
- Ne10-master / modules / math
- Ne10-master / test / include

When completed you should have the following include directories set.



We now need to add the libraries and library search paths so the tools can find all of the required components.

In the left column of the Properties for fft-zynq window, expand C/C++ Build options, and select Settings. In the left portion of the right side of the window, under ARM Linux gcc linker, select Libraries.

At the top of the Libraries (-l) area, click on the add icon (looks like a sheet of paper with a plus sign on it).



In the text entry box, type Ne10 and click OK. Then do the same steps for Linux libraries "m" and "rt".

In the Library search path pane, click the add icon (the paper with the plus sign as before) to get a pop up window:

Click Workspace and then select Ne10/Release.




Click OK.

You should now have the Properties for fft-zynq completed as follows:



Click Apply, then click OK.

We can now build the project.

To assure the fastest execution times, we want to use the Release default build settings. To enable these,

Right click on fft-zynq in the Project Explorer pane

Select Build Configurations > Set Active > Release

With the build set to use the Release options, we can now build the project.

In the Project Explorer pane, right click on the fft-zynq project and select Build Project. If Console is selected in the bottom middle pane, the progress will be displayed. The result should be a completed build although there will be some warnings.



In the Project Explorer pane, expand the fft-zynq project label and then expand the Binaries line under it. The fft-zynq.elf file that resulted from the build will be shown.

Expected Results


Testing the Application


With the application built, we can run it on the ZC702 and test that it operates as expected. Because the application is intended to be used in a larger system, testing at this point will be somewhat limited. See the Tech Tip "Zynq FFT Signal Analyzer GUI Tech Tip", which describes integration into a graphics framework for demonstration purposes.
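
One simple sanity check that fits this limited testing is the impulse pattern produced by the -g option: the FFT of a single impulse has the same magnitude in every bin, whatever the impulse position or output scaling. The sketch below illustrates the idea. The helper names fill_impulse() and spectrum_is_flat(), the interleaved real/imaginary layout and the impulse amplitude of 1.0 are assumptions for the example, not details taken from fft-zynq.c.

```c
#include <assert.h>
#include <stddef.h>

/* Zero an n-point interleaved complex buffer and place a unit impulse
 * at index n-1, the location described for the -g option. */
void fill_impulse(float *buf, size_t n)
{
    size_t i;

    assert(buf != NULL && n > 0);
    for (i = 0; i < 2 * n; i++)
        buf[i] = 0.0f;
    buf[2 * (n - 1)] = 1.0f;   /* real part of the last sample */
}

/* Return 1 if the squared magnitude of every bin is within rel_tol
 * (relative) of bin 0, else 0. Comparing against bin 0 keeps the
 * check independent of any scaling applied by the FFT. */
int spectrum_is_flat(const float *spec, size_t n, float rel_tol)
{
    float ref = spec[0] * spec[0] + spec[1] * spec[1];
    size_t k;

    if (ref <= 0.0f)
        return 0;
    for (k = 1; k < n; k++) {
        float p = spec[2 * k] * spec[2 * k] + spec[2 * k + 1] * spec[2 * k + 1];
        float diff = (p > ref) ? (p - ref) : (ref - p);
        if (diff > rel_tol * ref)
            return 0;
    }
    return 1;
}
```

For example, a 4 point impulse at index 3 transforms to the bins 1, j, -1, -j, all of magnitude 1, so spectrum_is_flat() accepts that output.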

As noted in the Tech Tip "Zynq Ne10 Testing Tech Tip" there are multiple ways to load the file into the ZC702 and execute it. For the balance of this Tech Tip, Remote System Explorer (RSE) will be used to control the ZC702 and execute various tests to demonstrate that the fft-zynq program is operating as expected.

We will access the ZC702 over an Ethernet connection from the PC where SDK is running. The ZC702 has a default IP address of 192.168.1.10, so be sure your computer can reach that subnet. Connect your ZC702 to your computer or network with an Ethernet cable.

The ZC702 must be set to boot from the SD-MMC card supplied with it. This contains the base TRD Linux system required to run this application.

Set the boot select switches as shown below, then power on your ZC702.



With the ZC702 running, we will set up a Remote System Explorer connection to the ZC702 board to allow us to download and run this application.

Right click in the Project Explorer window, select New -> Other. In the pop up window, expand Remote System Explorer, highlight Connection, and click Next.



In the next window, select SSH Only, and click Next.



Use the IP address of the ZC702 (192.168.1.10) as the Host name and Connection name. Fill in the description field if you wish then click Finish.



With the connection established, we can now run the fft-zynq application program.

Right click on fft-zynq in the Project Explorer pane, select Run As and then Run Configurations from the expanded list. The Run Configurations dialog box will appear.

NOTE:
If RSE has been used for running the Ne10 tests as described in previous Tech Tips in this series, an existing configuration under Remote ARM Linux Application for NE10-test will be shown.

For testing the FFT application we will set up a new run configuration that manages the fft application program. Click Remote ARM Linux Application in the left pane, then click on the New launch configuration icon (the piece of paper with the + in the upper right corner). A new Remote ARM Linux Application will be created.