Zynq-7000 AP SoC Spectrum Analyzer part 2 - Accelerating Software - Building ARM NEON Library Tech Tip 2014.3


Table of Contents

Document History

Description/Summary

Implementation

Step by Step Instructions

Obtaining the Library and Setting up the Workspace

Base TRD

Building Ne10 Library

Saving the workspace

Conclusions:

Document History

Date: 23 October 2014
Version: 1.0
Author: Faster Technology
Description of Revisions: Initial posting - updated to 2014.3













Description/Summary


Many systems that can take advantage of the processing capabilities of the Zynq-7000 AP SoC involve complex calculations used in filtering, video manipulation, and signal processing in general. Demonstrating the performance gained by optimizing such calculations for the NEON SIMD engine in the Zynq Processing System (PS), compared with running them on the ARM Cortex-A9 processor alone, requires finding or creating a set of appropriate functions of sufficient complexity. This Tech Tip describes the process of obtaining and building a set of filtering functions targeting the Zynq-7000 ZC702 demonstration platform. In subsequent Tech Tips, the download and build of this library will be verified and the library will then be used to build an application.



An open source project within the ARM community provides a "library of common useful functions accelerated by NEON that applications developers could just pick up and use". This has evolved into the Ne10 Project and the Ne10 library. Information on the project can be found at http://www.projectne10.org/.

The library contains the following functions:

Math Functions (vector and matrix algebra functions)

Vector Add: Vector Add for Float data - 2D, 3D, 4D vectors
Matrix Add: Matrix Add for Float data - 2X2 to 4X4
Vector Sub: Vector Sub for Float data - 2D, 3D, 4D vectors
Vector RSBC: Vector sub from a constant - 2D, 3D, 4D vectors
Matrix Sub: Matrix Sub for Float data - 2X2 to 4X4
Vector Multiply: Vector Multiply by constant or other Vector
Vector Multiply - Accumulator: Vector Multiply / Accumulate for Float data
Matrix Multiply: Matrix Multiply for Float data - 2X2 to 4X4
Matrix Vector Multiply: Multiply Matrix by Vector X2, X3 or X4
Vector Div: Vector Divide by constant or other Vector
Matrix Div: Matrix Divide for Float data - 2X2 to 4X4
Vector Setc: Sets Vector to a constant or Constant Vector
Vector Len: Length of 2D, 3D and 4D vectors
Vector Normalize: Normalizes array of Vectors - 2D, 3D or 4D
Vector Abs: Absolute value of Input Vector(s)
Vector Dot: Performs Dot product of two vectors - 2D, 3D, 4D
Vector Cross: Cross product of two vectors
Matrix Determinant: Determinant for 2X2, 3X3 or 4X4 Matrix
Matrix Invertible: Find inverse of Matrix for 2X2, 3X3 or 4X4
Matrix Transpose: Find transpose of Matrix for 2X2, 3X3 or 4X4
Matrix Identity: Find Identity Matrix for 2X2, 3X3 or 4X4

Signal Processing

Complex FFT: CFFT/CIFFT for Radix 4 binary lengths from 4 to 32768
FIR Filters: Finite Impulse Response Filter on Float data
FIR Decimator: Optimized FIR and Decimator function
FIR Interpolator: FIR and Interpolator using Polyphase structure
FIR Lattice Filters: FIR Lattice using feedforward structure
FIR Sparse Filters: Sparse FIR for simulating reflections, etc.
IIR Lattice Filters: IIR Filter with feedforward and feedback
Real FFT: Real FFT and Real IFFT for Float data

Image Processing

Hresize: Interpolation of horizontal data
Vresize: Interpolation of vertical data
Img_rotate: Rotate image by 90 degrees
Box Filter

Physics

AABB: Compute AABB for a Polygon

Sample Functions

Code samples for calling NEON, etc.



Implementation

Implementation Details

Design Type: PS Only
SW Type: PetaLinux
CPUs: 1 CPU - standard ZC702 frequency
PS Features: ARM processor and NEON SIMD engine
PL Cores: None
Boards/Tools: ZC702 with standard peripherals connected for base TRD demonstration
Xilinx Tools Version: Vivado / SDK 2014.3
Other Details: No other cables or connectivity required




Step by Step Instructions


Obtaining the Library and Setting up the Workspace


The library of functions discussed on the Project Ne10 site mentioned previously is hosted at https://github.com/projectNe10/Ne10

Using a web browser, we review the library before downloading it.




If we look inside the modules/dsp folder, we can see FFT, IIR and FIR filter routines with both standard C and NEON-specific implementations, as well as a test suite in the /modules/dsp/test folder. Based on the filtering functions listed, this looks like a good starting point for our subsequent application development, so we will download it.

Click on the "Download ZIP" button on the right side to download the zip of the full archive as shown below.

CAUTION: Be sure to select the link from the page shown above. If you are on the main page for the Ne10 project, select the link associated with the word "zip" in the line "You can download this project in either zip or tar formats". If the text link to use git for the download is selected instead, problems may arise later in this Tech Tip.

For Linux users, a tarball is available at https://github.com/projectNe10/Ne10/tarball/master



To keep this library development project separate, we created a new project folder off the root of our G: drive at G:/ZC702_Ne10.

We can now unzip the downloaded library into our new project folder.




NOTE: The library name now includes a designation that can be used to determine the revision of the library. The zip file of the library has the revision embedded in the file name as shown in the list above. This version is v1.1.2-5. If you do not have the version with the embedded version number, go back to the download process and be sure that you have chosen the proper link to the zip file.
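The revision check described in the note above can also be scripted. The following is a minimal Python sketch, not part of the Ne10 project or the Xilinx tools. The helper names `extract_library` and `has_revision_tag` are hypothetical, and the idea that a revision designation looks like three dot-separated numbers (as in 1.1.2) is an assumption made for illustration.

```python
import re
import zipfile
from pathlib import Path


def extract_library(zip_path, dest_dir):
    """Extract the archive and return the name of its single top-level folder."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Collect the first path component of every entry in the archive.
        roots = {name.split("/", 1)[0] for name in archive.namelist() if name}
        if len(roots) != 1:
            raise ValueError(f"expected one top-level folder, found {sorted(roots)}")
        archive.extractall(dest_dir)
    return roots.pop()


def has_revision_tag(folder_name):
    """Assumed heuristic: the name contains a pattern such as '1.1.2'."""
    return re.search(r"\d+\.\d+\.\d+", folder_name) is not None
```

Calling `extract_library` with the downloaded zip file and the project folder (G:/ZC702_Ne10 in this Tech Tip) returns the top-level folder name, which can then be passed to `has_revision_tag`. A `False` result suggests that the archive came from a different download link than the one recommended here.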

Documentation for the library is embedded in the library source and can be extracted with the doxygen tools (http://www.stack.nl/~dimitri/doxygen/). A README.txt file at the root of the library has basic instructions on using doxygen to extract the full documentation set. The doxygen tools must be installed to use the embedded documentation.



In a subsequent Tech Tip we will be building a signal processing system using the FFT functions so this is an appropriate library. Many other uses are possible so we will build the complete library as opposed to just the subsequently required FFT functions.

Before proceeding, we need to make a slight modification to the library so it will build properly within the SDK environment.

Open the dsp folder in the modules folder to reveal the contents. Note the three source files highlighted in the listing below.



These files have the same base file name as their .c counterparts. The .s versions are hand-coded assembly routines specific to the NEON SIMD engine in the ARM processor system of the Zynq-7000 AP SoC device. In the build environment assumed by the Ne10 project, the C files (.c extension) and assembly files (.s extension) are processed separately and kept separate throughout the build. SDK has no readily available means of enforcing this separation. As a result, depending on the order of processing within the build, the assembly versions may not be handled properly, and the NEON-specific assembly routines and the performance gains they enable would be lost.

To circumvent this potential issue, we add ".asm" to the names of the .s files to force them to be handled separately. Edit the file names so they appear as shown below.
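To see which files are affected before renaming anything by hand, the name clash can be detected with a short script. This is a minimal Python sketch under two stated assumptions: the build derives each object file name by replacing the final source extension with .o, and ".asm" is inserted just before the final .s extension. The helper names `find_object_collisions` and `renamed` are hypothetical. The exact file names used in this Tech Tip appear only in the screenshot, so treat the naming scheme here as illustrative.

```python
from collections import defaultdict
from pathlib import Path


def find_object_collisions(folder):
    """Group .c and .s sources in `folder` that would map to the same .o name."""
    by_object = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.suffix in (".c", ".s"):
            # Assumption: object name = source name with its last extension -> .o
            by_object[path.with_suffix(".o").name].append(path.name)
    return {obj: names for obj, names in by_object.items() if len(names) > 1}


def renamed(asm_name, tag=".asm"):
    """Return asm_name with `tag` inserted before the final extension."""
    stem, ext = asm_name.rsplit(".", 1)
    return f"{stem}{tag}.{ext}"
```

Running `find_object_collisions` on the modules/dsp folder lists each .c/.s pair that shares a base name. After the .s files have been renamed, running it again should return an empty dictionary, which confirms that the two versions no longer compete for the same object file name under the assumptions above.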



Base TRD


This Tech Tip and subsequent Tech Tips in the series depend on a specific version of the TRD running on the ZC702: the PetaLinux-based 2014.2 version.

The specific contents of the TRD are not required at this time. However, the user MUST have downloaded and verified operation of the ZC702 with the 2014.2 version of the TRD. The balance of the Tech Tips in this series assumes that the ZC702 is running the 2014.2 version from the SD card. Instructions on how to download the TRD, build the proper SD card image and verify proper operation with the ZC702 can be found in the Technical Article Zynq Base TRD 2014.2.
