Zynq-7000 AP SoC Spectrum Analyzer part 2 - Accelerating Software - Building ARM NEON Library Tech Tip

Document History

Table of Contents

Document History

Description/Summary

Implementation

Step by Step Instructions

Obtaining the Library and Setting Up the Workspace

Building the Library

Saving the workspace

Conclusions:

Date | Version | Author | Description of Revisions
30 Aug 2013 | 1.0 | Faster Technology | Initial posting
23 Feb 2014 | 1.1 | Faster Technology | Update to 2013.4
6 Aug 2014 | 1.2 | Faster Technology | Update to recent Ne10 Library

Description/Summary


Many systems that can take advantage of the processing capabilities of the Zynq-7000 AP SoC involve complex calculations used in filtering, video manipulation, and signal processing in general. Demonstrating the performance gained by optimizing such calculations for the NEON SIMD engine in the Zynq Processing System (PS), compared with running them on the base ARM Cortex-A9 processor, requires finding or creating a set of appropriate functions of sufficient complexity. This Tech Tip describes the process of obtaining and building a set of filtering functions targeting the Zynq-7000 ZC702 demonstration platform. In subsequent Tech Tips, the downloaded and built library will be verified and then used to build an application.

An open source project within the ARM community is in place to provide a "library of common useful functions accelerated by NEON that applications developers could just pick up and use". This has evolved into the Ne10 Project and the Ne10 library. Information on the project can be found at http://www.projectne10.org/.


The library contains the following functions:

Math Functions: Vector and Matrix algebra functions

Function | Description
Vector Add | Vector Add for Float data - 2 2D, 3D, 4D vectors
Matrix Add | Matrix Add for Float data - 2X2 to 4X4
Vector Sub | Vector Sub for Float data - 2 2D, 3D, 4D vectors
Vector RSBC | Vector sub from a constant - 2D, 3D, 4D vectors
Matrix Sub | Matrix Sub for Float data - 2X2 to 4X4
Vector Multiply | Vector Multiply by constant or other Vector
Vector Multiply - Accumulator | Vector Multiply / Accumulate for Float data
Matrix Multiply | Matrix Multiply for Float data - 2X2 to 4X4
Matrix Vector Multiply | Multiply Matrix by Vector X2, X3 or X4
Vector Div | Vector Divide by constant or other Vector
Matrix Div | Matrix Divide for Float data - 2X2 to 4X4
Vector Setc | Sets Vector to a constant or Constant Vector
Vector Len | Length of 2D, 3D and 4D vectors
Vector Normalize | Normalizes array of Vectors - 2D, 3D or 4D
Vector Abs | Absolute value of Input Vector(s)
Vector Dot | Performs Dot product of two vectors - 2D, 3D, 4D
Vector Cross | Cross product of two vectors
Matrix Determinant | Determinant for 2X2, 3X3 or 4X4 Matrix
Matrix Invertible | Find Invertible of Matrix for 2X2, 3X3 or 4X4
Matrix Transpose | Find transpose of Matrix for 2X2, 3X3 or 4X4
Matrix Identity | Find Identity Matrix for 2X2, 3X3 or 4X4

Signal Processing

Function | Description
Complex FFT | CFFT/CIFFT for Radix 4 binary lengths from 4 to 32768
FIR Filters | Finite Impulse Response Filter on Float data
FIR Decimator | Optimized FIR and Decimator function
FIR Interpolator | FIR and Interpolator using Polyphase structure
FIR Lattice Filters | FIR Lattice using feedforward structure
FIR Sparse Filters | Sparse FIR for simulating reflections, etc.
IIR Lattice Filters | IIR Filter with feedforward and feedback
Real FFT | Real FFT and Real IFFT for Float data
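
Since later Tech Tips in this series verify the built library and use its FFT functions, a slow but obviously correct reference can be handy for comparing results. The following is a minimal sketch of a direct discrete Fourier transform in plain Python. It is not Ne10 code and does not use the Ne10 API; the function name naive_dft is made up for this illustration.

```python
import cmath


def naive_dft(samples):
    """Return the discrete Fourier transform of a sequence of numbers.

    Direct O(N^2) evaluation of X[k] = sum_n x[n] * exp(-2j*pi*k*n/N).
    Illustration only: an FFT computes the same values far faster.
    """
    n_points = len(samples)
    spectrum = []
    for k in range(n_points):
        acc = 0j
        for n, x in enumerate(samples):
            acc += x * cmath.exp(-2j * cmath.pi * k * n / n_points)
        spectrum.append(acc)
    return spectrum


if __name__ == "__main__":
    # A unit impulse has a flat spectrum: every bin is (approximately) 1.
    for k, value in enumerate(naive_dft([1.0, 0.0, 0.0, 0.0])):
        print(k, value)
```

An optimized FFT should agree with this reference to within floating-point rounding for the same input.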

Image Processing

Function | Description
Hresize | Interpolation of horizontal data
Vresize | Interpolation of vertical data
Img_rotate | Rotate image by 90 degrees
Box Filter |


Physics

Function | Description
AABB | Compute AABB for a Polygon

Sample Functions: Code samples for calling NEON, etc.




Implementation

Implementation Details

Design Type | PS Only
SW Type | OSL Linux
CPUs | 1 CPU - standard ZC702 Frequency
PS Features | ARM Processor and NEON SIMD engine
PL Cores | None
Boards/Tools | ZC702
Xilinx Tools Version | Vivado / SDK 2013.4
Other Details | No other cables or connectivity required



Step by Step Instructions


Obtaining the Library and Setting Up the Workspace

The ProjectNe10 site mentioned previously links to the library source, which can be found at

https://github.com/projectNe10/Ne10

Using a web browser, we can review the library before downloading it.



Inside the modules/dsp folder there are FFT, IIR and FIR filter routines with both standard C and NEON-specific implementations, as well as a test suite in the /modules/dsp/test folder. Based on the filtering functions listed, this looks like a good starting point for our subsequent application development, so we will download it.

Click on the "Download ZIP" button on the right side to download the zip of the full archive as shown below. For Linux users, a tarball is available at https://github.com/projectNe10/Ne10/tarball/master
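
For readers who prefer to script the download, the following is a minimal sketch that fetches the tarball URL given above using only the Python standard library. The function name download_ne10 and the local file name Ne10-master.tar.gz are assumptions made for this sketch; they do not come from the Ne10 project.

```python
import os
import urllib.request

# Tarball URL quoted in the text above.
NE10_TARBALL_URL = "https://github.com/projectNe10/Ne10/tarball/master"


def download_ne10(dest_dir, file_name="Ne10-master.tar.gz", url=NE10_TARBALL_URL):
    """Download the archive at url into dest_dir and return the saved path.

    file_name is a hypothetical local name chosen for this sketch.
    """
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, file_name)
    urllib.request.urlretrieve(url, target)
    return target


if __name__ == "__main__":
    # "downloads" is a placeholder folder name.
    print(download_ne10("downloads"))
```

Clicking the "Download ZIP" button in the browser remains the route described in this Tech Tip; the script is only an alternative.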




To keep this library development project separate, we created a new project folder off the root of our G: drive at G:/ZC702_Ne10:




We can now unzip the downloaded library into our new project folder.
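
The unzip step can also be scripted. This is a minimal sketch using Python's standard zipfile module; the function name unzip_library and the zip file name in the usage example are hypothetical, since the actual downloaded file name depends on the library revision.

```python
import zipfile
from pathlib import Path


def unzip_library(zip_path, project_dir):
    """Extract zip_path into project_dir and return the top-level names created."""
    project = Path(project_dir)
    project.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(project)
        # Collect the first path component of every archive entry.
        top_level = sorted({name.split("/")[0] for name in archive.namelist()})
    return top_level


if __name__ == "__main__":
    # Hypothetical zip name; substitute the file that was actually downloaded.
    print(unzip_library("Ne10-master.zip", "G:/ZC702_Ne10"))
```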



NOTE: The library name now includes a designation that can be used to determine the revision of the library. The zip file of the library has the revision embedded in the file name as shown in the list above. This version is v1.1.2-5.
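
If the revision needs to be recorded automatically, it can be pulled out of the file name with a regular expression. The sketch below assumes the revision appears in the form v1.1.2-5 somewhere in the name; the file names used in the example are hypothetical and the real name may differ.

```python
import re

# Matches "v" + dotted numbers + an optional "-<count>" suffix, e.g. v1.1.2-5.
_REVISION = re.compile(r"v\d+(?:\.\d+)+(?:-\d+)?")


def library_revision(file_name):
    """Return the revision string embedded in file_name, or None if absent."""
    match = _REVISION.search(file_name)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical file name used only for illustration.
    print(library_revision("Ne10-v1.1.2-5.zip"))
```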

Documentation for the library is embedded in the structure of the library using a process supported by the doxygen tools (http://www.stack.nl/~dimitri/doxygen/). A README.txt file at the root of the library has basic instructions on using doxygen to extract the full documentation set from the library. The doxygen tools must be installed to use the embedded documentation.



In a subsequent Tech Tip we will build a signal processing system using the FFT functions, so this is an appropriate library. Because many other uses are possible, we will build the complete library rather than only the FFT functions required later.

Before proceeding, we need to make a slight modification to the library so it will build properly within the SDK environment.

Open the dsp folder in the modules folder to reveal the contents. Note the three source files highlighted in the listing below.



These files have the same base file names as their .c counterparts. The .s versions are hand-coded assembly routines specific to the NEON SIMD engine in the ARM processor system of the Zynq-7000 AP SoC device. In the build environment assumed by the Ne10 project, the C files (.c extension) and assembly files (.s extension) are processed separately and kept separate throughout the build. SDK has no readily available means of enforcing this separation. As a result, depending on the order of processing within the build, the assembly versions may not be handled properly, and the NEON-specific assembly routines and the performance gains they enable would be lost.
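
To check which files are affected without relying on a directory listing, the folder can be scanned for .s files that share a base name with a .c file. This is a minimal sketch; the function name find_name_collisions is made up for this illustration, and it only reports the pairs without changing anything.

```python
from collections import defaultdict
from pathlib import Path


def find_name_collisions(folder):
    """Return {base name: [file names]} for .c and .s files sharing a base name."""
    by_stem = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix in (".c", ".s"):
            # stem drops only the last extension: "x.neon.s" -> "x.neon"
            by_stem[path.stem].append(path.name)
    # Keep only base names that have both a .c and a .s file.
    return {
        stem: names
        for stem, names in by_stem.items()
        if {Path(name).suffix for name in names} == {".c", ".s"}
    }


if __name__ == "__main__":
    # Run from the root of the unzipped library.
    for stem, names in find_name_collisions("modules/dsp").items():
        print(stem, names)
```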

To circumvent this potential issue, we add ".asm" to the names of the .s files to force them to be handled separately. Edit the file names so they appear as shown below.
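
The renaming can be scripted as well. The sketch below rests on an assumption: that ".asm" is inserted just before the .s extension, so that name.s becomes name.asm.s and the base name no longer matches the .c file. The file listing referenced in the text is authoritative; adjust the sketch if the names shown there differ. The function name add_asm_tag is hypothetical.

```python
from pathlib import Path


def add_asm_tag(folder, file_names, dry_run=True):
    """Rename each 'name.s' in file_names to 'name.asm.s' inside folder.

    Assumption for this sketch: ".asm" goes just before the ".s" extension.
    Returns a list of (old name, new name) pairs. With dry_run=True nothing
    is renamed, so the plan can be reviewed first.
    """
    planned = []
    for name in file_names:
        source = Path(folder) / name
        if source.suffix != ".s":
            raise ValueError(f"not a .s file: {name}")
        target = source.with_name(source.stem + ".asm.s")
        planned.append((source.name, target.name))
        if not dry_run:
            source.rename(target)
    return planned


if __name__ == "__main__":
    # "example.neon.s" is a placeholder; pass the three highlighted files.
    print(add_asm_tag("modules/dsp", ["example.neon.s"], dry_run=True))
```

Running with dry_run=True first prints the planned names without touching the files.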



Since we are targeting the ZC702 demonstration platform as our test platform, we need a base Linux system on which to build this library and the test program. The Xilinx support pages have several Targeted Reference Design (TRD) versions available. Because the Xilinx product and support pages are updated regularly, the following screens illustrating the location and download of the TRD may change slightly. They are included as an example of the general process, including the license process that is required for all design files.

http://www.xilinx.com/support/index.html/content/xilinx/en/supportNav/boards_and_kits.html



NOTE: We need to use the 14.5 version of the Targeted Reference Design as this is the last version that supports OSL Linux on the ZC702. Expand the 14.5 line (shown above) and click on Targeted Reference Designs. When presented with the results it should look like the following: