Zynq UltraScale+ MPSoC ZCU106 VCU Multi-Stream ROI TRD using Avnet Quad Sensor 2021.1

This page provides all the information related to the VCU Multi-Stream ROI TRD using Avnet Quad Sensor design for the ZCU106.

1 Overview

The primary goal of this VCU Multi-Stream ROI TRD using Avnet Quad Sensor design is to demonstrate the use of the Deep Learning Processor Unit (DPU) block for extracting the Region of Interest (ROI) from input video frames, and to use this information to perform ROI-based encoding with the Video Codec Unit (VCU) encoder hard block present in Zynq UltraScale+ EV devices. Video is captured from the quad sensor through a MIPI CSI-2 Rx subsystem implemented in the PL. The Avnet Multi-Camera FMC module is used to capture four video streams through a MIPI CSI-2 interface.

The design serves as a platform to accelerate Deep Neural Network inference algorithms using the DPU and to demonstrate the ROI feature of the VCU encoder. It uses a deep Convolutional Neural Network (CNN) named Densebox, running on the DPU, to extract ROI information (faces, in this case). The design also uses the Vitis Video Analytics SDK (VVAS) framework to leverage its rich set of highly optimized, ready-to-use kernels and GStreamer plugins.

The design uses the Vivado IPI flow to build the hardware platform and the Xilinx Yocto PetaLinux flow for the software design. It uses Xilinx IP and software drivers to demonstrate the capabilities of the different components.

The Vitis platform is created from the Vivado/PetaLinux build artifacts, and the Vitis acceleration flow is then used to insert the DPU into the platform to create the final bitstream.

The following figure shows the streaming pipeline use case with the enhanced ROI + face detection model on the ZCU106. For a detailed view of the VVAS block, refer to Section 1.3.3 - GStreamer Pipelines Flow.

Streaming: Face detection with enhanced ROI on ZCU106.

This ZCU106 VCU Multi Stream ROI TRD design supports only the encoding feature of the VCU. For decoding on the ZCU106 Board-2 setup, use the VCU HDMI Single Stream ROI TRD design.

1.1 System Architecture

The following figure shows the block diagram of the ROI design.

1.2 Hardware Architecture

This section gives a detailed description of the blocks used in the hardware design. The functional block diagram of the design is shown in the below figure.

There are five primary sections in the design:

  • MIPI Capture Pipeline:

    • Captures video frame buffers from the capture source (Avnet Quad Sensor) at 1080p30 resolution

    • The AXI Switch distributes the captured video to multiple streams in round-robin fashion

    • Each stream writes its frame buffers into DDR memory using the Frame Buffer Write IP

  • Multi-Scaler Block:

    • Reads the video buffers from DDR memory for the first two sensors only

    • Scales each buffer down to 640x360 (suitable for the DPU)

    • Converts the format from NV12 to BGR

    • Writes the downscaled buffer to DDR memory

  • DPU Block:

    • Reads the downscaled buffers from DDR memory for the first two sensors only

    • Runs the Densebox algorithm to generate the ROI information for each frame buffer

    • Passes the ROI information to the VCU encoder

  • VCU Encoder:

    • Reads the 4 x 1080p30 NV12 buffers from DDR memory

    • Receives the ROI metadata from the DPU IP for the first two sensors only

    • Encodes the video buffers of the first two sensors based on the ROI information

    • Encodes the video buffers of the other two sensors without ROI information

    • Finally writes the encoded streams to DDR memory

  • PS GEM:

    • Reads the encoded stream from DDR memory

    • Streams out the encoded stream via Ethernet
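
The split between ROI and non-ROI streams described above can be summarized with a small model. This is only an illustrative sketch: the stage names are hypothetical labels chosen here, not identifiers from the design, and the linear stage list simplifies the hardware, where the encoder reads the full-resolution buffers from DDR memory while the scaler and DPU work on downscaled copies.

```python
NUM_SENSORS = 4        # four sensors on the Avnet Multi-Camera FMC module
ROI_SENSORS = (1, 2)   # only the first two sensors feed the Multi-Scaler and DPU


def stages_for(sensor: int) -> list:
    """Return the ordered stages that frames from a given sensor pass through."""
    if not 1 <= sensor <= NUM_SENSORS:
        raise ValueError(f"sensor must be in 1..{NUM_SENSORS}")
    # Every stream is captured over MIPI and written to DDR memory first.
    stages = ["mipi_capture", "frame_buffer_write"]
    if sensor in ROI_SENSORS:
        # ROI path: downscale to 640x360 BGR, run Densebox, encode with ROI.
        stages += ["multi_scaler", "dpu", "vcu_encode_with_roi"]
    else:
        # Plain path: encode without ROI metadata.
        stages += ["vcu_encode"]
    stages.append("ps_gem_stream_out")
    return stages


for sensor in range(1, NUM_SENSORS + 1):
    print(f"Sensor-{sensor}: " + " -> ".join(stages_for(sensor)))
```

Running the loop prints one line per sensor, which makes it easy to see that Sensor-3 and Sensor-4 skip the scaler and DPU stages.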

This design supports the following video interfaces:

Sources

  • MIPI capture pipeline implemented in the PL

Sinks

  • Stream-out on network or internet

VCU Codec

  • Video Encoder capability using VCU hard block in PL 

  • H.264/H.265 encoding

  • Encoder parameter configuration using OMX interface

DPU

  • Zynq DPU IP

Streaming Interfaces

  • 1G Ethernet PS GEM

Video Format

  • NV12

Supported Resolution

  • 4 x 1080p30
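
As a rough back-of-the-envelope check, the raw data rate of the supported configuration can be computed from the figures above. The sketch assumes 8-bit NV12 (12 bits per pixel: a full-resolution luma plane plus a half-size interleaved chroma plane) and ignores blanking and protocol overhead; the function name is chosen here for illustration.

```python
WIDTH, HEIGHT, FPS, STREAMS = 1920, 1080, 30, 4


def raw_bitrate_bps(width: int, height: int, fps: int, streams: int = 1) -> int:
    """Raw NV12 payload rate in bits per second (no blanking, no overhead)."""
    bytes_per_frame = width * height * 3 // 2   # NV12: 1.5 bytes per pixel
    return bytes_per_frame * fps * streams * 8


total = raw_bitrate_bps(WIDTH, HEIGHT, FPS, STREAMS)
print(f"{total / 1e9:.2f} Gbit/s")   # prints "2.99 Gbit/s"
```

Under these assumptions, four uncompressed 1080p30 streams come to roughly three times the capacity of the 1G Ethernet link, which is consistent with the design encoding every stream in the VCU before streaming out.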

1.3 VCU ROI Software

1.3.1 Vitis Video Analytics SDK (VVAS)

VVAS is being developed to provide an easy-to-use, scalable framework with which users can build their solutions on Xilinx FPGAs. VVAS provides infrastructure covering a wide variety of applications in the Embedded, Vision, Datacenter, Machine Learning, Automotive, and many other domains.

VVAS provides a set of generic framework plugins that abstract the complexities of writing a GStreamer plugin. These framework plugins interact with the kernel libraries through a simple VVAS kernel interface. Using this interface, users can easily integrate and test their kernels in the GStreamer framework.

VVAS also provides a rich set of highly optimized, ready-to-use kernels and GStreamer plugins, such as video encoder, video decoder, multiscaler, ML, and bounding box cropping, so that users can create their applications in a very short time.

VVAS will also provide the infrastructure needed to bridge the gap between Edge and Cloud solutions.

In the VCU Multi Stream ROI TRD using Avnet Quad Sensor design, the following VVAS plugins are used:

  • ivas_xfilter: Works with hard-kernel, soft-kernel, and software (user-space) acceleration software library types. It can operate in passthrough, in-place, or transform mode. In this design it is used in in-place mode so that the acceleration software library can alter the input buffer.

  • ivas_xmetaaffixer: Scales the incoming metadata for the different resolutions: metadata received on the master sink pad is scaled to match the resolution of each slave pad.

  • ivas_xroigen: Generates the ROI metadata that the GStreamer OMX encoder plug-ins expect in order to encode raw frames with the desired quality parameter (QP) values or levels for the specified ROIs.
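
The metadata scaling performed by ivas_xmetaaffixer can be pictured as a simple coordinate transform. The sketch below is not the plugin's implementation; it only illustrates the idea, assuming a face box detected on the 640x360 DPU input must be mapped onto the 1920x1080 capture frame. The `Box` type and `scale_box` function are hypothetical names introduced for this example.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def scale_box(box: Box, src_size: Tuple[int, int], dst_size: Tuple[int, int]) -> Box:
    """Map a box from the src_size (w, h) coordinate space to dst_size (w, h)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return Box(round(box.x * sx), round(box.y * sy),
               round(box.width * sx), round(box.height * sy))


face_360p = Box(x=100, y=50, width=64, height=80)
face_1080p = scale_box(face_360p, (640, 360), (1920, 1080))
print(face_1080p)   # Box(x=300, y=150, width=192, height=240)
```

Because 1920x1080 is exactly three times 640x360 in both dimensions, every coordinate is simply multiplied by three in this configuration.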

Refer to VVAS document for more detail on VVAS.

VVAS Top-level Block diagram

1.3.2 Deep Learning Processor Unit (DPU)

DPU is a programmable engine optimized for deep neural networks. It is a group of parameterizable IP cores pre-implemented on the hardware with no place and route required. The DPU is released with the Vitis AI specialized instruction set, allowing efficient implementation of many deep learning networks.

Refer to the DPU IP product guide (PG338) and UG1354 for more details on the DPU.

The following figure shows the DPU Top-Level Block Diagram.

DPU Top-level Block Diagram

PE - Processing Engine, DPU - Deep Learning Processor Unit, APU - Application Processing Unit

The DPU IP can be implemented in the programmable logic (PL) of the selected Zynq® UltraScale+™ MPSoC device with direct connections to the processing system (PS). The DPU requires instructions to implement a neural network and accessible memory locations for input images as well as temporary and output data. A program running on the Application Processing Unit (APU) is also required to service interrupts and coordinate data transfers.

The following figure shows the sequence of operations performed on the DPU device.

The following sequence of steps is performed to access the DPU device and run face detection:

  1. Initialize the DPU device

  2. Instantiate a DPU task from the DPU kernel and allocate the corresponding DPU memory buffer

  3. Set the input image for the created DPU task

  4. Run the DPU task to find the faces in the input image

  5. Uninitialize the DPU device
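
The ordering of these steps can be sketched with a mock device. The classes below are hypothetical stand-ins that only record calls; they are not the Vitis AI runtime API and perform no inference. The point is the lifecycle: the device is released even if the task fails, and the input must be set before the task runs.

```python
class FakeDpuTask:
    """Hypothetical stand-in for a DPU task; it only records calls."""

    def __init__(self, log):
        self.log = log
        self.image = None

    def set_input(self, image):
        self.image = image
        self.log.append("set_input")

    def run(self):
        if self.image is None:
            raise RuntimeError("input image must be set before running the task")
        self.log.append("run")
        return []   # a real task would return the detected face boxes


class FakeDpuDevice:
    """Hypothetical stand-in for the DPU device, usable as a context manager."""

    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append("init")          # step 1: initialize the device
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append("uninit")        # step 5: always release the device
        return False

    def create_task(self):
        self.log.append("create_task")   # step 2: instantiate a task
        return FakeDpuTask(self.log)


def detect_faces(device, image):
    with device:
        task = device.create_task()
        task.set_input(image)            # step 3: set the input image
        return task.run()                # step 4: run face detection


device = FakeDpuDevice()
detect_faces(device, image=bytes(640 * 360 * 3))   # one blank 640x360 BGR frame
print(device.log)
```

The recorded log lists the calls in the same order as the five steps above.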

1.3.3 GStreamer Pipelines Flow

The GStreamer plugin demonstrates the DPU capabilities together with the Xilinx VCU encoder’s ROI (Region of Interest) feature. The plugin detects the ROI (i.e. face coordinates) in input frames using the DPU IP and passes the detected ROI information to the Xilinx VCU encoder. The following figure shows the data flow of the GStreamer pipeline for the stream-out use case.

Block Diagram of Stream-out Pipeline

fd = v4l2 frame data, fd' = DPU compatible frame data

As shown in the above figure, the stream-out GStreamer pipeline performs the following operations:

  1. The sensors capture the video and pass it to the FMC module, which in turn passes it to the MIPI CSI-2 Rx interface on the ZCU106 board.

  2. The MIPI CSI-2 Rx interface captures the data in NV12 format and passes it to the tee element, which splits the input stream between the metaaffixer and preprocessor elements.

  3. The preprocessor (v4l2convert GStreamer plugin) scales the input frame down to 640x360 and converts the data to BGR format, as required by the DPU input.

  4. The 360p BGR frame is provided to the DPU IP (via the xfilter plugin) as input to find the ROI (i.e. face coordinates).

  5. The extracted ROI information is passed to the VVAS metaaffixer plug-in along with the original capture stream (via tee); the metaaffixer attaches the ROI metadata to the original stream.

  6. The ROI generator generates ROI SEI events in the stream based on the ROI metadata and passes them to the VCU encoder, which uses the received ROI information to encode ROI regions at higher quality than non-ROI regions.

  7. The encoded data is streamed out using RTP.
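
To make step 6 more concrete, the sketch below marks which blocks of a frame overlap an ROI and assigns them a lower QP offset, which in block-based encoders generally means higher quality. It is only a conceptual illustration: the 32-pixel block size, the offset values, and the function name are assumptions made here, and the code does not reflect how ivas_xroigen or the VCU encoder represent ROIs internally.

```python
def roi_qp_offsets(frame_w, frame_h, rois, block=32, roi_offset=-5, bg_offset=0):
    """Return a 2D grid of QP offsets, one entry per block of the frame.

    rois is a list of (x, y, width, height) boxes in frame coordinates.
    Blocks touched by any ROI get roi_offset; all others get bg_offset.
    """
    cols = -(-frame_w // block)   # ceiling division
    rows = -(-frame_h // block)
    grid = [[bg_offset] * cols for _ in range(rows)]
    for x, y, w, h in rois:
        first_row = max(y // block, 0)
        last_row = min((y + h - 1) // block, rows - 1)
        first_col = max(x // block, 0)
        last_col = min((x + w - 1) // block, cols - 1)
        for r in range(first_row, last_row + 1):
            for c in range(first_col, last_col + 1):
                grid[r][c] = roi_offset
    return grid


# One face box on a 1080p frame (the values are illustrative).
grid = roi_qp_offsets(1920, 1080, [(300, 150, 192, 240)])
roi_blocks = sum(value < 0 for row in grid for value in row)
print(len(grid), len(grid[0]), roi_blocks)   # prints "34 60 63"
```

In this example only 63 of the 2040 blocks receive the quality boost, which is the general motivation for ROI encoding: bits are concentrated on the regions that matter.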

The following figure shows the data flow for the GStreamer pipeline of stream-in use case.

Block Diagram of Stream-in Pipeline

As shown in the above figure, the stream-in GStreamer pipeline performs the following operations:

  1. The encoded data is streamed in using RTP

  2. The Xilinx VCU decoder decodes the data

  3. The decoded data is displayed on the HDMI-Tx display

1.4 Software Tools and System Requirements

Hardware

Required:

  • Two ZCU106 evaluation boards (rev 1.0) with power cables

  • Monitor with HDMI input supporting 3840x2160 resolution or 1920x1080 resolution (e.g. LG 27UD88, Samsung LU28ES90DS/XL)

  • HDMI 2.0 certified cable

  • Class-10 SD card

  • Avnet Multi-Camera FMC module

  • Ethernet cable

Optional:

  • USB pen drive formatted with the FAT32 file system and hub

  • SATA drive formatted with the FAT32 file system, external power supply, and data cable

Software Tools

Required:

Download, Installation, and Licensing of Vivado Design Suite 2021.1

The Vivado Design Suite User Guide explains how to download and install the Vivado® Design Suite tools, which include the Vivado Integrated Design Environment (IDE), High-Level Synthesis tool, and System Generator for DSP. This guide also provides information about licensing and administering evaluation and full copies of Xilinx design tools and intellectual property (IP) products. The Vivado Design Suite can be downloaded from here.

LogiCORE IP Licensing

The following IP cores require a license to build the design.

  • Video Processing Subsystem (VPSS) - Included with Vivado - PG231

  • MIPI CSI Controller Subsystems (mipi_csi2_rx_subsystem) - Purchase license (Hardware evaluation available) - PG232

To obtain the LogiCORE IP license, please visit the respective IP product page and get the license.

The table below provides the performance information:

Resolution     FPS Achieved
4 x 1080p30    30

1.5 Board Setup

This section describes the ZCU106 board setup for running the ROI design.

  1. Connect the Micro USB cable into the ZCU106 Board Micro USB port J83, and the other end into an open USB port on the host PC. This cable is used for UART over USB communication.

  2. Insert the SD card with the images copied into the SD card slot J100. See Section 1.6.1, Preparing the SD card, for how to prepare the SD card for a specific design.

  3. Set the SW6 switches as shown in the figure below. This configures the board to boot from SD.

  4. Connect 12V power to the ZCU106 6-pin Molex connector.

  5. For a USB storage device, connect the USB hub along with the mouse. (Optional)

  6. For a SATA storage device, connect the SATA data cable to the SATA 3.0 port. (Optional)

  7. For MIPI CSI-2, insert the Avnet Multi-Camera FMC module into the FMC0 connector and set VADJ to 1.2V.

  8. Set up a terminal session between a PC COM port and the serial port on the evaluation board (see Determine which COM to use to access the USB serial port on the ZCU106 board for more details).

  9. Copy the VCU Multi Stream ROI TRD images onto the SD card and insert the SD card into the board.

  10. The images below show how to connect the interfaces on the ZCU106 board.

1.6 Run Flow

The VCU Multi Stream ROI TRD package is released with the source code, Vivado project, Petalinux BSP, and SD card image that enables the user to run the demonstration. It also includes the binaries necessary to configure and boot the ZCU106 board. Prior to running the steps mentioned in this wiki page, download the VCU Multi Stream ROI TRD package and extract its contents to a directory referred to as $TRD_HOME which is the home directory.

Use the link below to download the VCU Multi Stream ROI TRD package.

  • Zynq UltraScale+ MPSoC VCU Multi Stream ROI TRD 2021.1 Download zip

TRD package contents are placed in the following directory structure.

rdf0617-zcu106-vcu-multi-stream-roi-2021-1/
├── apu
│   └── vcu_petalinux_bsp
│       └── xilinx-vcu-multi-stream-roi-zcu106-v2021.1-final.bsp
├── dpu
│   ├── 0001-Added-ZCU106-configuration-to-support-DPU-in-ZCU106.patch
│   ├── dpu_conf.vh
│   └── vitis_platform
│       └── zcu106_dpu
├── image
│   ├── bootfiles
│   │   ├── bl31.elf
│   │   ├── linux.bif
│   │   ├── pmufw.elf
│   │   ├── system.bit
│   │   ├── system.dtb
│   │   ├── u-boot.elf
│   │   └── zynqmp_fsbl.elf
│   ├── license_zcu106_multistream_roi_trd_dpu_xclbin.txt
│   ├── README.txt
│   ├── sd_card
│   │   ├── boot
│   │   └── root
│   └── sd_card.img
├── pl
│   ├── constrs
│   │   ├── quad_mipi_rx_ROI.xdc
│   │   └── quad_sensor_async.xdc
│   ├── designs
│   │   └── zcu106_Quad_Sensor_ROI
│   ├── prebuild
│   │   └── zcu106_Quad_Sensor_ROI_wrapper.xsa
│   ├── README.md
│   └── srcs
│       └── hdl
├── README.txt
└── zcu106_vcu_multistream_roi_trd_sources_and_licenses.tar.gz

17 directories, 19 files

The below snippet shows the directory structure of various binary files placed in the $TRD_HOME/image/sd_card/boot directory.

image
└── sd_card
    └── boot
        ├── autostart.sh
        ├── bd.hwh
        ├── BOOT.BIN
        ├── boot.scr
        ├── dpu.xclbin
        ├── Image
        ├── quad_sensor_isp_tuning.sh
        ├── quad_sensor_media_graph_setting.sh
        ├── setup.sh
        ├── system.dtb
        ├── vcu
        │   └── configure_qos.sh
        ├── vitis
        │   └── densebox_640_360-zcu102_zcu104_kv260-r1.4.0.tar.gz
        └── vvas
            └── json
                └── kernel_ML.json
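
Before booting, it can be useful to confirm that the boot partition contains every file listed above. The helper below is a minimal sketch, not part of the TRD package: the function name and the mount point in the usage note are hypothetical, and the file list is taken directly from the directory listing above.

```python
from pathlib import Path

# Files expected under the boot directory, relative to its root.
EXPECTED_BOOT_FILES = [
    "autostart.sh",
    "bd.hwh",
    "BOOT.BIN",
    "boot.scr",
    "dpu.xclbin",
    "Image",
    "quad_sensor_isp_tuning.sh",
    "quad_sensor_media_graph_setting.sh",
    "setup.sh",
    "system.dtb",
    "vcu/configure_qos.sh",
    "vitis/densebox_640_360-zcu102_zcu104_kv260-r1.4.0.tar.gz",
    "vvas/json/kernel_ML.json",
]


def missing_boot_files(boot_dir):
    """Return the expected files that are not present under boot_dir."""
    root = Path(boot_dir)
    return [name for name in EXPECTED_BOOT_FILES if not (root / name).is_file()]
```

For example, `missing_boot_files("/media/user/boot")` (an assumed mount point) returns an empty list when the partition is complete, and otherwise the relative paths that still need to be copied.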

1.6.1 Preparing the SD card

There are three ways to prepare the SD card for booting. Each method is detailed below.

Using ready to test image

Using Pre-built images

  • To create an SD card with two partitions, Boot (FAT32, bootable) and Root (EXT4), refer to this link.

  • Copy the boot content from rdf0617-zcu106-vcu-multi-stream-roi-2021-1/image/sd_card/boot to the Boot partition of the SD card

  • Extract rootfs.ext4 from rdf0617-zcu106-vcu-multi-stream-roi-2021-1/image/sd_card/root to the Root partition of the SD card

  • Boot the board with the flashed SD card

Use the Output of the Build Flow

  • To create an SD card with two partitions, Boot (FAT32, bootable) and Root (EXT4), refer to this link.

  • For the build flow, refer to these steps. Copy the generated DPU build images (bd.hwh, BOOT.BIN, boot.scr, dpu.xclbin, Image, system.dtb) into the Boot partition of the SD card and extract the generated rootfs.ext4 into the Root partition of the SD card

  • Copy the boot content vcu, vitis, vvas, autostart.sh, and setup.sh from the rdf0617-zcu106-vcu-multi-stream-roi-2021-1/image/sd_card/boot/ directory to the Boot partition of the SD card

  • Boot the board with the flashed SD card

1.6.2 GStreamer Pipelines using mediasrcbin plugin

This section covers the GStreamer pipelines using the mediasrcbin plugin for serial and streaming ROI use cases. mediasrcbin is a Xilinx-specific plugin, a bin element on top of v4l2src. It parses and configures the media graph of a media device automatically.

Stream-out (Server):

                            → v4l2convert → ivas_xfilter (DPU) →
Capture (Sensor-1) → tee -|                                      |- ivas_xmetaaffixer → ivas_xroigen → Encode → Stream-out
                            →→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→

                            → v4l2convert → ivas_xfilter (DPU) →
Capture (Sensor-2) → tee -|                                      |- ivas_xmetaaffixer → ivas_xroigen → Encode → Stream-out
                            →→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→

Capture (Sensor-3) → Encode → Stream-out

Capture (Sensor-4) → Encode → Stream-out

  • Set IP address for server:

    ifconfig eth0 192.168.25.90
  • Run the following gst-launch-1.0 command for stream-out pipeline

    • Stream-out Pipeline

Here 192.168.25.89 is the host/client IP address and 5004, 5008, 5012, and 5016 are the port numbers.
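
The four port numbers are spaced four apart, so the destination of each stream can be derived from a base port. The sketch below assumes that the ports are assigned to Sensor-1 through Sensor-4 in the order listed, which the text does not state explicitly; the function name is chosen here for illustration.

```python
CLIENT_IP = "192.168.25.89"   # host/client address from the text
BASE_PORT = 5004              # first port from the text
PORT_STEP = 4                 # spacing between 5004, 5008, 5012, 5016


def stream_endpoints(num_sensors: int = 4) -> dict:
    """Map each sensor to the (ip, port) its encoded stream is sent to."""
    return {
        f"Sensor-{i + 1}": (CLIENT_IP, BASE_PORT + i * PORT_STEP)
        for i in range(num_sensors)
    }


for name, (ip, port) in stream_endpoints().items():
    print(f"{name} -> {ip}:{port}")
```

A table like this is handy on the client side, where the stream-in pipeline must listen on the port that matches the stream it wants to display.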

Stream-in (Client): Stream-in → Decode → Display

  • Set IP address for the client:

  • Run the following gst-launch-1.0 command for stream-in pipeline where 5004 is port number

    • Stream-in Pipeline

1.6.3 GStreamer Pipelines using v4l2src plugin

This section covers the GStreamer pipelines using v4l2src plugin for serial and streaming ROI use-cases.

  • Make sure the MIPI CSI-2 Rx media pipeline is configured for 1080p resolution and that the source and sink have the same color format. Run the script below to set the resolution and format of the MIPI CSI-2 Rx media pipeline nodes, where "media0" is the media node for the MIPI CSI-2 Rx input source.

Stream-out (Server):

                            → v4l2convert → ivas_xfilter (DPU) →
Capture (Sensor-1) → tee -|                                      |- ivas_xmetaaffixer → ivas_xroigen → Encode → Stream-out
                            →→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→

                            → v4l2convert → ivas_xfilter (DPU) →
Capture (Sensor-2) → tee -|                                      |- ivas_xmetaaffixer → ivas_xroigen → Encode → Stream-out
                            →→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→

Capture (Sensor-3) → Encode → Stream-out

Capture (Sensor-4) → Encode → Stream-out

  • Set IP address for server:

  • Run the following gst-launch-1.0 command for stream-out pipeline

    • Stream-out Pipeline

Here 192.168.25.89 is the host/client IP address and 5004, 5008, 5012, and 5016 are the port numbers.

Stream-in (Client): Stream-in → Decode → Display

  • Set IP address for the client:

  • Run the following gst-launch-1.0 command for stream-in pipeline where 5004 is port number

    • Stream-in Pipeline

1.7 Build Flow

Use the link below to download the VCU Multi Stream ROI TRD package.

  • Zynq UltraScale+ MPSoC VCU Multi Stream ROI TRD 2021.1 Download zip

Unzip the released package.

The following tutorials assume that the $TRD_HOME environment variable is set as given below.

1.7.1 Hardware Build Flow

This section explains the steps to build the hardware platform and generate XSA using the Vivado tool.

Refer to the Vivado Design Suite User Guide: Using the Vivado IDE, UG893, for setting up the Vivado environment.

Refer to the Vivado Design Suite User Guide: Release Notes, Installation, and Licensing (UG973) for installation.

Make sure that the necessary IP licenses are in place.

On Linux:

  • Open a Linux terminal

  • Change directory to $TRD_HOME/pl folder

  • Source Vivado settings.sh

  • Run the following command to create the Vivado IPI project, invoke the GUI, and generate the XSA required for the platform

  • The project.tcl script does the following:

    • Creates the project in the ../pl/build/zcu106_Quad_Sensor_ROI directory

    • Creates the IPI block design with platform interfaces

    • Runs synthesis and implementation

    • Builds the bitstream with no accelerators

    • Exports the hardware to XSA (zcu106_Quad_Sensor_ROI_wrapper.xsa)

  • zcu106_Quad_Sensor_ROI_wrapper.xsa is stored at location $TRD_HOME/pl/build/zcu106_Quad_Sensor_ROI/zcu106_Quad_Sensor_ROI.xsa/

  • This XSA is used by Petalinux for platform creation and also by the Vitis Tool for DPU Kernel Integration.

After executing the script, the Vivado IPI block design comes up as shown in the below figure.

 

 

The Platform Setup tab has the settings and AXI ports, as shown in the image below.

1.7.1.1 Platform Interfaces

The screenshots below show the platform interfaces that have been made available to the Vitis tool for linking the acceleration IP dynamically.

In the case of this reference design, the DPU kernel is inserted.

 

After the DPU kernel is integrated dynamically with the platform using the Vitis flow, the connections are as shown below:

  • The DPU data ports are connected to the HP0 port (S_AXI_HP0_FPD) of the PS

  • The DPU instruction port is connected to the S_AXI_HPC1 port of the PS

  • The DPU S_AXI_Control port is connected to the M_AXI_HPM0_LPD port of the PS through interconnect_hpm0_lpd

  • The DPU interrupt is connected to the AXI interrupt controller dynamically

1.7.2 PetaLinux Build Flow

This tutorial shows how to build the Linux image and boot image using the PetaLinux build tool.

PetaLinux Installation: Refer to the PetaLinux Tools Documentation (UG1144) for installation.

Kernel patches documentation: Refer to this article for the kernel patches required for the ZCU106 VCU Multi-Stream ROI TRD using Avnet Quad Sensor BSP.

  • Source the PetaLinux settings.sh

  • Create the PetaLinux project

  • Configure the PetaLinux project

    • For example:

    • using the prebuilt XSA

    • using the XSA