Zynq UltraScale+ MPSoC ZCU106 VCU HDMI ROI TRD 2020.2
This page provides information about the VCU HDMI ROI TRD design for the ZCU106.
1 Overview
The primary goal of this VCU ROI design is to demonstrate how the DPU (Deep Learning Processor Unit) block extracts the ROI (Region of Interest) from input video frames, and how this information is used to perform ROI-based encoding with the VCU (Video Codec Unit) encoder hard block present in Zynq UltraScale+ EV devices.
The design serves as a platform for accelerating deep neural network inference algorithms on the DPU and for demonstrating the ROI feature of the VCU encoder. It uses a deep convolutional neural network (CNN) named Densebox, running on the DPU, to extract ROI information (faces, in this case).
The design uses the Vivado IPI flow to build the hardware platform and the Xilinx Yocto PetaLinux flow for the software design. It uses Xilinx IP and software drivers to demonstrate the capabilities of the different components.
The Vitis platform is created from the Vivado/PetaLinux build artifacts. The Vitis acceleration flow is then used to insert the DPU into the platform and create the final bitstream.
The following figure shows one of the use cases (Serial pipeline) with face detection with enhanced ROI on ZCU106.
Serial: Face detection with enhanced ROI on ZCU106.
The following figure shows one of the use cases (streaming pipeline) with face detection with enhanced ROI on ZCU106.
Streaming: Face detection with enhanced ROI on ZCU106.
1.1 System Architecture
The following figure shows the block diagram of the ROI design.
1.2 Hardware Architecture
This section gives a detailed description of the blocks used in the hardware design. The functional block diagram of the design is shown in the figure below.
There are seven primary sections in the design.
HDMI Capture Pipeline:
Captures video frame buffers from the capture source in 4K resolution, NV12 format
Writes the buffers into DDR Memory with Frame Buffer Write IP
Multi-scaler Block:
Reads the Video Buffers from DDR Memory
Scales the buffer down to 640x360 (the size suitable for the DPU)
Converts the format from NV12 to BGR
Writes the Down-scaled buffer to DDR Memory
DPU Block:
Reads the downscaled buffers from DDR Memory
Runs the Densebox algorithm to generate the ROI information for each frame buffer
Passes the ROI information to VCU Encoder
VCU Encoder:
Reads the 4K NV12 Buffer from DDR Memory
Receives the ROI metadata from DPU IP
Encodes the video buffers based on the ROI Information
Finally writes the encoded stream to DDR Memory
PS GEM:
Reads the encoded stream from DDR Memory
Streams out the encoded stream via Ethernet
VCU Decoder:
Decodes the received encoded frame and writes to memory
HDMI-Tx:
Displays the decoded frames on HDMI Display
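The buffer sizes that move through DDR memory in the pipeline above can be estimated with a short calculation. The following is a minimal sketch, assuming 8-bit samples and no stride or alignment padding (the actual IP may pad each line); the function names are illustrative and not part of the design.

```python
def nv12_frame_bytes(width, height):
    """Size of one 8-bit NV12 frame: a full-resolution Y plane followed by
    one interleaved UV plane subsampled 2x in both directions (4:2:0)."""
    luma = width * height
    chroma = (width // 2) * (height // 2) * 2  # one U and one V byte per 2x2 block
    return luma + chroma


def bgr_frame_bytes(width, height):
    """Size of one packed 8-bit BGR frame (3 bytes per pixel)."""
    return width * height * 3


# Capture/encoder side: 4K NV12 buffer (assumes no line padding).
capture_bytes = nv12_frame_bytes(3840, 2160)

# DPU side: down-scaled 640x360 BGR buffer.
dpu_input_bytes = bgr_frame_bytes(640, 360)

# Raw capture bandwidth at 30 frames per second, in bytes per second.
capture_bytes_per_second = capture_bytes * 30

print(capture_bytes, dpu_input_bytes, capture_bytes_per_second)
```

Under these assumptions, the DPU input buffer is roughly 18 times smaller than the captured 4K frame, which is why the multi-scaler sits between the capture pipeline and the DPU.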
This design supports the following video interfaces:
Sources | |
---|---|
Sinks | |
VCU Codec | |
DPU | |
Streaming Interfaces | 1G Ethernet PS GEM |
Video Format | NV12 |
Supported Resolution | 4Kp30 |
1.3 VCU ROI Software
1.3.1 GStreamer Pipeline
The GStreamer plugin demonstrates the DPU capabilities together with the ROI (Region of Interest) feature of the Xilinx VCU encoder. The plugin detects the ROI (i.e. face co-ordinates) in the input frames using the DPU IP and passes the detected ROI information to the Xilinx VCU encoder. The following figure shows the data flow of the GStreamer pipeline for the stream-out use case.
Block Diagram of Stream-out Pipeline
fd = v4l2 frame data, fd' = DPU compatible frame data
As shown in the above figure, the stream-out GStreamer pipeline performs the following operations:
v4l2src captures the data from HDMI-Rx in NV12 format and passes it to the xlnxroivideo1detect GStreamer plugin
The xlnxroivideo1detect GStreamer plugin scales the frame down to 640x360 resolution and converts the data to BGR format
The 640x360 BGR frame is provided to the DPU IP as input to find the ROI (i.e. face co-ordinates)
The extracted ROI information is passed to the VCU encoder
Using the received ROI information, the encoder encodes the ROI regions at higher quality than the non-ROI regions
The encoded data is streamed out using the RTP protocol
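The stream-out steps above can be sketched as a gst-launch-1.0 style pipeline description. This is an illustrative sketch, not the TRD's actual command line: v4l2src and xlnxroivideo1detect come from this page, while the encoder, payloader and sink elements (omxh264enc, rtph264pay, udpsink), the device node, and the host and port values are assumptions chosen for the example. The helper only builds the description string and does not require GStreamer to be installed.

```python
# Encoder properties taken from the performance note on this page; only the
# ROI-related subset is used here to keep the sketch short.
ENCODER_PROPS = {
    "qp-mode": "roi",
    "control-rate": "constant",
    "target-bitrate": "1500",
}


def build_stream_out_pipeline(device, host, port, width=3840, height=2160, fps=30):
    """Return a gst-launch-1.0 style description for the stream-out path.

    Elements other than v4l2src and xlnxroivideo1detect are assumptions.
    """
    caps = f"video/x-raw,format=NV12,width={width},height={height},framerate={fps}/1"
    enc_props = " ".join(f"{key}={value}" for key, value in ENCODER_PROPS.items())
    stages = [
        f"v4l2src device={device}",         # capture NV12 frames from HDMI-Rx
        caps,                                # caps filter fixing format and size
        "xlnxroivideo1detect",               # scale, convert, run DPU, attach ROI
        f"omxh264enc {enc_props}",           # assumed AVC encoder element
        "rtph264pay",                        # assumed RTP payloader
        f"udpsink host={host} port={port}",  # assumed network sink
    ]
    return " ! ".join(stages)


# Hypothetical device node, client address and port.
pipeline = build_stream_out_pipeline("/dev/video0", "192.168.0.2", 5004)
print(pipeline)
```

The resulting string could be passed to gst-launch-1.0 on a board where these elements are available; for HEVC, the corresponding encoder and payloader elements would be swapped in.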
The following figure shows the data flow for the GStreamer pipeline of stream-in use cases.
Block Diagram of Stream-in Pipeline
fd = Gst-Omx Frame data
As shown in the above figure, the stream-in GStreamer pipeline performs the following operations:
Stream-in the encoded data using RTP protocol
The Xilinx VCU decoder will decode the data
Display the decoded data on HDMI-Tx display
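The receiving side can be sketched in the same way. The element chain below (udpsrc, rtpjitterbuffer, rtph264depay, h264parse, omxh264dec, kmssink) is an assumption based on standard GStreamer and gst-omx elements, not the exact pipeline shipped with the TRD; the port and jitter-buffer latency are hypothetical values. The helper only assembles the description string.

```python
def build_stream_in_pipeline(port, jitter_ms=50):
    """Return a gst-launch-1.0 style description for the stream-in path.

    All element names are assumptions; the page only states that the stream
    is received over RTP, decoded by the VCU and displayed on HDMI-Tx.
    """
    rtp_caps = (
        "application/x-rtp,media=video,clock-rate=90000,"
        "payload=96,encoding-name=H264"
    )
    stages = [
        f'udpsrc port={port} caps="{rtp_caps}"',  # receive RTP packets
        f"rtpjitterbuffer latency={jitter_ms}",   # absorb network jitter
        "rtph264depay",                           # strip the RTP payload header
        "h264parse",                              # frame the elementary stream
        "omxh264dec",                             # assumed VCU decoder element
        "kmssink",                                # assumed display sink
    ]
    return " ! ".join(stages)


receiver = build_stream_in_pipeline(5004)
print(receiver)
```

On a real board the display sink typically needs extra properties to select the HDMI-Tx output; those are left out here because they depend on the platform configuration.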
The following figure shows the data flow of the xlnxroivideo1detect GStreamer plugin.
As shown in the above figure, the xlnxroivideo1detect GStreamer plugin performs the following operations:
Initialize the DPU
Receive the data frame from HDMI-Rx through the v4l2src plugin
Create the DPU task
Scale the input frame to 640x360 resolution using Xilinx Scaler IP
Convert the input frame data format from NV12 to BGR using the Xilinx Color Space Converter (CSC) soft IP
Prepare the OpenCV image using BGR data
Pass the intermediate OpenCV image to the DPU
Run the DPU task
Extract the ROI (face) co-ordinates from the DPU output
Map the detected face co-ordinates to the original input frame resolution
Fill the ROI metadata buffer using extracted ROI (face) co-ordinates
Pass the ROI metadata buffer and input NV12 frame data buffer to the Xilinx VCU encoder
De-initialize the DPU task
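The step that maps detected face co-ordinates back to the original frame resolution amounts to scaling each rectangle by the ratio between the source size and the 640x360 DPU input size. The following is a minimal sketch of that arithmetic; the `Roi` type and `map_roi_to_source` function are hypothetical names for illustration, and the plugin's actual metadata layout is not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Roi:
    """Axis-aligned rectangle in pixels (hypothetical helper type)."""
    x: int
    y: int
    width: int
    height: int


def map_roi_to_source(roi, dpu_size=(640, 360), src_size=(3840, 2160)):
    """Scale a rectangle detected on the DPU input up to the source frame."""
    sx = src_size[0] / dpu_size[0]
    sy = src_size[1] / dpu_size[1]
    # Scale both corners, then clamp them to the source frame bounds.
    x0 = max(0, min(int(roi.x * sx), src_size[0]))
    y0 = max(0, min(int(roi.y * sy), src_size[1]))
    x1 = max(0, min(int((roi.x + roi.width) * sx), src_size[0]))
    y1 = max(0, min(int((roi.y + roi.height) * sy), src_size[1]))
    return Roi(x0, y0, x1 - x0, y1 - y0)


# A face detected at (100, 50) with size 64x80 on the 640x360 frame.
face = Roi(x=100, y=50, width=64, height=80)
mapped_4k = map_roi_to_source(face)
mapped_1080p = map_roi_to_source(face, src_size=(1920, 1080))
print(mapped_4k)     # Roi(x=600, y=300, width=384, height=480)
print(mapped_1080p)  # Roi(x=300, y=150, width=192, height=240)
```

For a 4K source the scale factor is 6 in both directions, and for 1080p it is 3, so the aspect ratio of the detected rectangle is preserved.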
1.3.2 DPU (Deep Learning Processor Unit)
DPU is a programmable engine optimized for deep neural networks. It is a group of parameterizable IP cores pre-implemented on the hardware with no place and route required. The DPU is released with the Vitis AI specialized instruction set, allowing efficient implementation of many deep learning networks.
For more details on the DPU, refer to the DPU IP guide PG338 and to UG1354.
The following figure shows the DPU Top-Level Block Diagram.
DPU Top-level Block Diagram
The DPU IP can be implemented in the programmable logic (PL) of the selected Zynq® UltraScale+™ MPSoC device with direct connections to the processing system (PS). The DPU requires instructions to implement a neural network and accessible memory locations for input images as well as temporary and output data. A program running on the application processing unit (APU) is also required to service interrupts and coordinate data transfers.
The following figure shows the sequence of operations performed on the DPU device.
The following sequence of steps is performed to access the DPU device and run face detection on it:
Initialize the DPU device
Instantiate a DPU task from the DPU kernel and allocate the corresponding DPU memory buffer
Set the input image for the created DPU task
Run the DPU task to find the faces in the input image
Uninitialize the DPU device
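The ordering of these steps matters: the task must be released and the device closed even if inference fails. The sketch below illustrates that lifecycle with a context manager. `FakeDpu` is a hypothetical stand-in that only records the call order; it is not the DPU runtime API, and the kernel name "densebox" is used purely as a label.

```python
from contextlib import contextmanager


class FakeDpu:
    """Hypothetical stand-in for a DPU runtime; it only logs the call order."""

    def __init__(self):
        self.log = []

    def open(self):
        self.log.append("open")

    def create_task(self, kernel_name):
        self.log.append(f"create_task:{kernel_name}")
        return {"kernel": kernel_name, "input": None}

    def set_input(self, task, image):
        task["input"] = image
        self.log.append("set_input")

    def run(self, task):
        self.log.append("run")
        return []  # a real runtime would produce face detections here

    def destroy_task(self, task):
        self.log.append("destroy_task")

    def close(self):
        self.log.append("close")


@contextmanager
def dpu_task(dpu, kernel_name):
    """Open the device, create a task, and always clean up in reverse order."""
    dpu.open()
    try:
        task = dpu.create_task(kernel_name)
        try:
            yield task
        finally:
            dpu.destroy_task(task)
    finally:
        dpu.close()


dpu = FakeDpu()
with dpu_task(dpu, "densebox") as task:
    dpu.set_input(task, image=bytes(640 * 360 * 3))  # blank 640x360 BGR frame
    faces = dpu.run(task)

print(dpu.log)
```

In a streaming plugin the open and close steps would normally happen once per pipeline rather than once per frame; the per-frame work is limited to setting the input and running the task.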
1.4 Software Tools and System Requirements
Hardware
Required:
ZCU106 evaluation board (rev C/D/E/F/1.0) with power cable
Monitor with HDMI input supporting 3840x2160 resolution or 1920x1080 resolution
HDMI 2.0 certified cable
Class-10 SD card
Ethernet cable
Optional:
USB pen drive formatted with the FAT32 file system and hub
SATA drive formatted with the FAT32 file system, external power supply, and data cable
Software Tools
Required:
Linux host machine for all tool flow tutorials (see UG1144 for detailed OS requirements)
PetaLinux Tools version 2020.2 (see UG1144 for installation instructions)
Git, a distributed version control system
Serial terminal emulator, e.g. Tera Term
Download, Installation, and Licensing of Vivado Design Suite 2020.2
The Vivado Design Suite User Guide explains how to download and install the Vivado® Design Suite tools, which include the Vivado Integrated Design Environment (IDE), High-Level Synthesis tool, and System Generator for DSP. This guide also provides information about licensing and administering evaluation and full copies of Xilinx design tools and intellectual property (IP) products. The Vivado Design Suite can be downloaded from here.
LogiCORE IP Licensing
The following IP cores require a license to build the design.
Video Mixer - Included with Vivado - PG243
Video PHY Controller - Included with Vivado - PG203
HDMI-Rx/Tx Subsystem - Purchase license (Hardware evaluation available) - PG235 & PG236
Video Processing Subsystem (VPSS) - Included with Vivado - PG231
To obtain the LogiCORE IP license, please visit the respective IP product page and get the license.
AR# 44029 - Licensing - LogiCORE IP Core licensing questions
Compatibility
The reference design has been tested successfully with the following user-supplied components.
HDMI Monitor:
Make/Model | Resolutions |
---|---|
LG 27UD88 | 3840 x 2160 @ 30Hz |
Samsung LU28ES90DS/XL | 3840 x 2160 @ 30Hz |
Cable:
HDMI 2.0 compatible cable
The following table provides the performance information:
Resolution | FPS Achieved |
---|---|
4Kp30 | 27 - 30 |
1080p30 | 30 |
The above FPS values were measured with the following encoder parameters for AVC and HEVC: gop-mode=basic gop-length=60 b-frames=0 target-bitrate=1500 num-slices=8 control-rate=constant prefetch-buffer=true low-bandwidth=false qp-mode=roi
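To make the encoder parameters above easier to reason about, the following sketch parses them into a dictionary and derives two figures for a 30 fps stream: the GOP duration and the average bit budget per frame. It assumes that target-bitrate is expressed in kilobits per second, which should be checked against the encoder documentation; the helper name is illustrative.

```python
ENCODER_SETTINGS = (
    "gop-mode=basic gop-length=60 b-frames=0 target-bitrate=1500 "
    "num-slices=8 control-rate=constant prefetch-buffer=true "
    "low-bandwidth=false qp-mode=roi"
)


def parse_settings(text):
    """Split 'key=value key=value ...' into a dictionary of strings."""
    return dict(item.split("=", 1) for item in text.split())


settings = parse_settings(ENCODER_SETTINGS)
fps = 30

# Frames per GOP divided by frame rate gives the GOP duration in seconds.
gop_seconds = int(settings["gop-length"]) / fps

# Assumption: target-bitrate is in kbit/s, so this is the average kbit per frame.
kbits_per_frame = int(settings["target-bitrate"]) / fps

print(gop_seconds, kbits_per_frame)  # 2.0 50.0
```

With such a small average budget per frame, ROI mode determines where the available bits are spent, which is why qp-mode=roi is part of the measured configuration.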
1.5 Board Setup
The following section describes the ZCU106 board setup for running the ROI design.