Zynq-7000 AP SoC Spectrum Analyzer part 2 - Accelerating Software - Building ARM NEON Library Tech Tip 2014.3
Document History
| Date | Version | Author | Description of Revisions |
|---|---|---|---|
| 23 October 2014 | 1.0 | Faster Technology | Initial posting - updated to 2014.3 |
Description/Summary
Many systems that can take advantage of the processing capabilities of the Zynq-7000 AP SoC involve complex calculations used in filtering, video manipulation, and signal processing in general. To demonstrate the performance gained by optimizing such calculations for the NEON SIMD engine in the Processing System (PS) of the Zynq-7000 family, compared with running them on the ARM Cortex-A9 processor alone, we need a set of suitable functions of sufficient complexity. We can either find or create these functions. This Tech Tip describes how to obtain and build a set of filtering functions targeting the Zynq-7000 ZC702 demonstration platform. Subsequent Tech Tips will verify the download and build of this library and then use it to build an application.
An open source project within the ARM community aims to provide a "library of common useful functions accelerated by NEON that applications developers could just pick up and use". This effort has evolved into the Ne10 Project and the Ne10 library. Information on the project can be found at http://www.projectne10.org/.
The library contains the following functions:
| Category | Function | Description |
|---|---|---|
| Math Functions (vector and matrix algebra) | Vector Add | Vector add for float data - 2D, 3D, 4D vectors |
| | Matrix Add | Matrix add for float data - 2X2 to 4X4 |
| | Vector Sub | Vector subtract for float data - 2D, 3D, 4D vectors |
| | Vector RSBC | Vector subtract from a constant - 2D, 3D, 4D vectors |
| | Matrix Sub | Matrix subtract for float data - 2X2 to 4X4 |
| | Vector Multiply | Vector multiply by a constant or another vector |
| | Vector Multiply - Accumulator | Vector multiply/accumulate for float data |
| | Matrix Multiply | Matrix multiply for float data - 2X2 to 4X4 |
| | Matrix Vector Multiply | Multiply a matrix by a vector - X2, X3 or X4 |
| | Vector Div | Vector divide by a constant or another vector |
| | Matrix Div | Matrix divide for float data - 2X2 to 4X4 |
| | Vector Setc | Set a vector to a constant or a constant vector |
| | Vector Len | Length of 2D, 3D and 4D vectors |
| | Vector Normalize | Normalize an array of vectors - 2D, 3D or 4D |
| | Vector Abs | Absolute value of input vector(s) |
| | Vector Dot | Dot product of two vectors - 2D, 3D, 4D |
| | Vector Cross | Cross product of two vectors |
| | Matrix Determinant | Determinant of a 2X2, 3X3 or 4X4 matrix |
| | Matrix Invertible | Inverse of a 2X2, 3X3 or 4X4 matrix |
| | Matrix Transpose | Transpose of a 2X2, 3X3 or 4X4 matrix |
| | Matrix Identity | Identity matrix for 2X2, 3X3 or 4X4 |
| Signal Processing | Complex FFT | CFFT/CIFFT for Radix 4 binary lengths from 4 to 32768 |
| | FIR Filters | Finite Impulse Response filter on float data |
| | FIR Decimator | Optimized FIR and decimator function |
| | FIR Interpolator | FIR and interpolator using a polyphase structure |
| | FIR Lattice Filters | FIR lattice using a feedforward structure |
| | FIR Sparse Filters | Sparse FIR for simulating reflections, etc. |
| | IIR Lattice Filters | IIR filter with feedforward and feedback |
| | Real FFT | Real FFT and real IFFT for float data |
| Image Processing | Hresize | Interpolation of horizontal data |
| | Vresize | Interpolation of vertical data |
| | Img_rotate | Rotate image by 90 degrees |
| | Box Filter | |
| Physics | AABB | Compute AABB for a polygon |
| Sample Functions | | Code samples for calling NEON, etc. |
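The following minimal plain C sketch makes the "Vector Add" entry concrete by showing what an element-wise add over an array of 3D float vectors computes. The type `vec3f_t` and the function `vec3f_add_ref` are hypothetical names chosen for this illustration and are not the Ne10 API. Take the library's own types and function names from its doxygen documentation. A NEON implementation should produce the same results while processing several float lanes per instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3D float vector type for this sketch (not Ne10's own type). */
typedef struct {
    float x, y, z;
} vec3f_t;

/*
 * Plain C reference for a 3D "Vector Add":
 *   dst[i] = src1[i] + src2[i]   for i = 0 .. count-1
 * Each component is added independently.
 */
void vec3f_add_ref(vec3f_t *dst, const vec3f_t *src1,
                   const vec3f_t *src2, size_t count)
{
    assert(dst != NULL && src1 != NULL && src2 != NULL);
    for (size_t i = 0; i < count; i++) {
        dst[i].x = src1[i].x + src2[i].x;
        dst[i].y = src1[i].y + src2[i].y;
        dst[i].z = src1[i].z + src2[i].z;
    }
}
```

For example, adding {1, 2, 3} and {0.5, 0.5, 0.5} gives {1.5, 2.5, 3.5}. A scalar reference like this is also useful for checking the output of an accelerated routine.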
Implementation
| Implementation Details | |
|---|---|
| Design Type | PS Only |
| SW Type | Linux (PetaLinux) |
| CPUs | 1 CPU - standard ZC702 frequency |
| PS Features | ARM processor and NEON SIMD engine |
| PL Cores | None |
| Boards/Tools | ZC702 with standard peripherals connected for the base TRD demonstration |
| Xilinx Tools Version | Vivado / SDK 2014.3 |
| Other Details | No other cables or connectivity required |
Step by Step Instructions
Obtaining the Library and Setting up the Workspace
The library of functions discussed on the ProjectNe10 site mentioned previously can be found at https://github.com/projectNe10/Ne10.
Using a web browser, we first review the library before downloading it.
Inside the modules/dsp folder there are FFT, IIR and FIR filter routines with both standard C and NEON-specific implementations. A test suite is provided in the modules/dsp/test folder. Given these filtering functions, the library is a good starting point for our subsequent application development, so we will download it.
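The "standard C" FIR routines mentioned above compute a direct-form convolution. The minimal sketch below is not Ne10's code. It uses the hypothetical function name `fir_ref` to evaluate y[n] = sum over k of coeffs[k] * x[n - k], treating samples before x[0] as zero. It assumes the input is a single block with no state carried between calls. A streaming filter, such as those in the library, would typically keep a delay-line buffer between blocks.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Direct-form FIR filter, plain C reference (single block, zero initial state):
 *   y[n] = sum_{k=0}^{num_taps-1} coeffs[k] * x[n - k]
 * Terms with n - k < 0 are skipped, i.e. treated as zero input.
 */
void fir_ref(const float *coeffs, size_t num_taps,
             const float *x, float *y, size_t num_samples)
{
    assert(num_taps > 0);
    for (size_t n = 0; n < num_samples; n++) {
        float acc = 0.0f;
        /* Only taps that reach back to x[0] or later contribute. */
        for (size_t k = 0; k < num_taps && k <= n; k++)
            acc += coeffs[k] * x[n - k];
        y[n] = acc;
    }
}
```

For instance, with three coefficients of 1.0 and input {1, 2, 3, 4}, the output is the running three-sample sum {1, 3, 6, 9}.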
Click on the "Download ZIP" button on the right side to download the zip of the full archive as shown below.
CAUTION:
Be sure to select the link from the page shown above. If you are on the main page for the Ne10 project, be sure to select the link associated with the word "zip" in the line:
You can download this project in either zip or tar formats.
If the text link for downloading with git is selected instead, problems may arise later in this Tech Tip.
For Linux users a tarball is available at https://github.com/projectNe10/Ne10/tarball/master
To keep this library development project separate, we created a new project folder off the root of our G: drive at G:/ZC702_Ne10.
We can now unzip the downloaded library into the new project folder.
NOTE: The library name now includes a designation that identifies the revision of the library. The zip file has the revision embedded in its file name, as shown in the list above. This version is v1.1.2-5. If your download does not have the version number embedded in its name, return to the download step and make sure you selected the proper link to the zip file.
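To check a downloaded archive name programmatically, the sketch below looks for a "v<major>.<minor>.<patch>" tag, such as v1.1.2, anywhere in a file name. The function name `has_version_tag` and the exact pattern it accepts are assumptions for illustration. Compare against the actual file name of your download.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns a pointer just past a run of one or more digits at p, or NULL. */
static const char *skip_digits(const char *p)
{
    const char *start = p;
    while (isdigit((unsigned char)*p))
        p++;
    return (p > start) ? p : NULL;
}

/*
 * Hypothetical helper: returns 1 if name contains a tag of the form
 * "v<digits>.<digits>.<digits>" (e.g. "v1.1.2"), otherwise 0.
 */
int has_version_tag(const char *name)
{
    assert(name != NULL);
    for (const char *p = name; *p != '\0'; p++) {
        if (*p != 'v')
            continue;
        const char *q = skip_digits(p + 1);       /* major */
        if (q == NULL || *q != '.')
            continue;
        q = skip_digits(q + 1);                   /* minor */
        if (q == NULL || *q != '.')
            continue;
        if (skip_digits(q + 1) != NULL)           /* patch */
            return 1;
    }
    return 0;
}
```

For example, it returns 1 for a hypothetical name such as Ne10-v1.1.2-5.zip and 0 for a name without a version tag, such as Ne10.zip.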
Documentation for the library is embedded in its source structure in a form supported by the doxygen tools (http://www.stack.nl/~dimitri/doxygen/). A README.txt file at the root of the library gives basic instructions for using doxygen to extract the full documentation set. The doxygen tools must be installed to use the embedded documentation.
In a subsequent Tech Tip we will build a signal processing system using the FFT functions, so this library is a suitable choice. Because many other uses are possible, we will build the complete library rather than only the FFT functions required later.
Before proceeding, we need to make a slight modification to the library so it will build properly within the SDK environment.
Open the dsp folder in the modules folder to reveal the contents. Note the three source files highlighted in the listing below.
These files have the same base file names as their .c counterparts. The .s versions are hand-coded assembly routines specific to the NEON SIMD engine in the ARM processor system of the Zynq-7000 AP SoC. The build environment assumed by the Ne10 project processes the C files (.c extension) and the assembly files (.s extension) separately and keeps them separate throughout the process. SDK offers no readily available means to achieve this separation. As a result, depending on the order of processing within the build, the assembly versions may not be handled properly. The NEON-specific assembly routines, and the performance gains they would enable, would then be lost.
To avoid this potential issue, we add ".asm" to the names of the .s files so that they are handled separately. Edit the file names so they appear as shown below.
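The rename can be done in a file manager or from the command line. As a minimal portable sketch, the C function below appends ".asm" to a given path using the standard `rename()` call, so that, for example, `name.s` becomes `name.s.asm`. The function name `append_asm_suffix` is hypothetical. The sketch assumes the intended result is the original name with ".asm" appended. Check the resulting names against the listing shown for this step before building.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: renames the file at path to path + ".asm"
 * (e.g. "foo.neon.s" -> "foo.neon.s.asm").
 * Returns 0 on success, -1 on failure (the reason is printed via perror).
 */
int append_asm_suffix(const char *path)
{
    assert(path != NULL);
    static const char suffix[] = ".asm";
    size_t len = strlen(path);
    char *new_path = malloc(len + sizeof suffix);  /* sizeof includes the NUL */
    if (new_path == NULL)
        return -1;
    memcpy(new_path, path, len);
    memcpy(new_path + len, suffix, sizeof suffix); /* copies the NUL too */
    int rc = rename(path, new_path);
    if (rc != 0)
        perror(path);
    free(new_path);
    return rc == 0 ? 0 : -1;
}
```

Call it once for each of the three .s files in the modules/dsp folder. It returns 0 on success and -1 on failure.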
Base TRD
This Tech Tip and the subsequent Tech Tips in the series depend on a specific version of the TRD running on the ZC702: the PetaLinux-based 2014.2 version.
The specific contents of the TRD are not required at this time. However, you MUST have downloaded the 2014.2 version of the TRD and verified its operation on the ZC702. The remaining Tech Tips in this series assume that the ZC702 is running the 2014.2 version from the SD card. Instructions for downloading the TRD, building the proper SD card image, and verifying operation with the ZC702 can be found in the Technical Article Zynq Base TRD 2014.2.
© 2025 Advanced Micro Devices, Inc. Privacy Policy