The primary goal of the VCU ROI design is to demonstrate the use of the Xilinx Deep Learning Processor Unit (DPU) block to extract Region of Interest (ROI) data from input video frames. This information is then used to perform ROI-based encoding with the Video Codec Unit (VCU) encoder hard block present in Zynq UltraScale+ EV devices.
To run the VCU ROI TRD demo you'll need to connect an HDMI input source to the bottom HDMI connector on the board, and an HDMI Monitor to the top HDMI connector. The 2020.2 VCU HDMI ROI TRD wiki page has more information about the VCU ROI TRD and setting up the board.
The following figure outlines the “Serial Pipeline” usage model with face detection based on ROI detection on ZCU106. This use case is the foundation of this section.
Running the Demo
There are two ways to run the VCU ROI TRD Demo:
Using discrete commands
Using the xlnx-vcu-roi-trd snap from the Xilinx Snap Store on Snapcraft.io. (coming soon!)
Using Discrete Commands
This demonstration assumes that both the HDMI source and HDMI monitor run at 1080p30 (1920x1080 @ 30fps).
This demo can also run at up to 4Kp30 on the ZCU106.
This demo only supports 1080p30 on the ZCU104.
Before getting started, be sure to do the following:
Set the HDMI output resolution
Launch the gstreamer pipeline
Depending on whether or not you have a USB camera plugged into your system, /dev/media0 in the following steps may need to change to /dev/media1. Use the command media-ctl -p -d /dev/media[X] to determine which device to use for HDMI Rx.
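The device check described above can be scripted. The following is a minimal sketch, not part of the TRD itself: it runs media-ctl -p -d on every /dev/media* node and prints the nodes whose topology mentions HDMI. The helper names are hypothetical, and the match on the string "hdmi" is an assumption about how the HDMI Rx entity is named on your system; inspect the media-ctl output by hand if nothing is printed.

```shell
#!/bin/sh
# Succeeds if the media-ctl topology text on stdin mentions an HDMI entity.
# Assumption: the HDMI Rx entity name contains "hdmi" (case-insensitive).
mentions_hdmi() {
    grep -qi 'hdmi'
}

# Probe every /dev/media* node and print those whose topology mentions HDMI.
find_hdmi_media_devices() {
    for dev in /dev/media*; do
        # Skip the unexpanded glob when no media devices exist.
        [ -e "$dev" ] || continue
        if media-ctl -p -d "$dev" 2>/dev/null | mentions_hdmi; then
            echo "$dev"
        fi
    done
}

find_hdmi_media_devices
```

Use the printed device node in place of /dev/media0 in the steps that follow.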
Setting the HDMI Output Resolution
Set up the video pipeline with a resolution and refresh rate that your monitor supports. To find out which resolutions your monitor supports, use the modetest command with the syntax below. Be sure to use the device a00c0000.v_mix, as this corresponds to the video mixer IP.
sudo modetest -cD a00c0000.v_mix | head -n 32
Sample output from this command appears below. Note the resolution and refresh rate listed under Connector #0. The output of your monitor’s connector section will likely be different.
From the command line, issue the following commands to set the HDMI Monitor to 1080p30 in the AR24 color space. The AR24 color space is a good default to assume. Other common alternatives are BG24 and NV12.
The -s flag actively sets the mode with the syntax -s <connector_id>[,<connector_id>][@<crtc_id>]:[#<mode index>]<mode>[-<vrefresh>][@<format>]. In our example, 40: is the connector ID, 1920x1080-30 is the resolution and refresh rate, and @AR24 is the color space mode. For more details, consult the output of modetest --help.
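To make the -s syntax concrete, the sketch below assembles the mode argument from its parts and prints the resulting modetest command instead of running it. The helper name build_mode_arg is hypothetical, and the values are the ones named in the example above (connector 40, 1920x1080 at 30 Hz, AR24); your connector ID will likely differ, so take it from your own modetest output.

```shell
#!/bin/sh
# Build the argument for `modetest -s` from its parts:
#   <connector_id>:<width>x<height>-<vrefresh>@<format>
build_mode_arg() {
    connector_id=$1; width=$2; height=$3; vrefresh=$4; format=$5
    printf '%s:%sx%s-%s@%s\n' "$connector_id" "$width" "$height" "$vrefresh" "$format"
}

# Example values from the text above; replace 40 with your own connector ID.
mode_arg=$(build_mode_arg 40 1920 1080 30 AR24)

# Print the command for review rather than executing it.
echo "sudo modetest -D a00c0000.v_mix -s $mode_arg"
```

Printing the command first makes it easy to check the connector ID and format before changing the display mode.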
If you are having trouble finding a valid resolution and color space, re-run the modetest command without piping the output to the head command in order to get the full output, including a list of supported color spaces.
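When searching the full output, two small filters can help. This is a sketch under stated assumptions: the helper names are hypothetical, has_mode simply looks for the mode name as a whole word anywhere in the text, and has_format assumes that supported formats appear on lines containing "formats:". Verify both against the actual output on your board.

```shell
#!/bin/sh
# Succeeds if the modetest output on stdin lists the given mode name,
# for example 1920x1080.
has_mode() {
    grep -q -F -w -- "$1"
}

# Succeeds if a "formats:" line on stdin lists the given format code,
# for example AR24. Assumption: formats are printed on "formats:" lines.
has_format() {
    grep 'formats:' | grep -q -F -w -- "$1"
}

# Usage on the board (shown as comments so nothing runs unintentionally):
#   sudo modetest -D a00c0000.v_mix > /tmp/modetest.txt
#   has_mode 1920x1080 < /tmp/modetest.txt && echo "1920x1080 is listed"
#   has_format AR24 < /tmp/modetest.txt && echo "AR24 is listed"
```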
Once you have found a valid resolution and color space, you should see the default video test pattern displayed on the screen after issuing the modetest command.
Next, initialize the display and have it wait for further input:
mediasrcbin - The following gstreamer command line uses the mediasrcbin plugin. mediasrcbin is a Xilinx-specific plugin, built on top of v4l2src, that automatically parses and configures the media graph of a media device.
If the pipeline fails to start, you may see an error like the following:
GET_DMA_FD: Cannot allocate memory
[2021-06-09 19:14:17.541401720] [module_dec.cpp:467] [AllocateDMA] No more memory
[2021-06-09 19:14:17.541861042] [omx_component_dec.cpp:261] [AllocateBuffer] OMX_ErrorInsufficientResources
This is usually caused by running the VCU on the ZCU104 in 4K mode (the gstreamer command line has width=3840,height=2160). This mode is not supported on the ZCU104.
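A simple guard can catch this before the pipeline is launched. The sketch below is illustrative only: the function name and the board labels zcu104 and zcu106 are hypothetical, 4K is assumed to mean 3840x2160, and the limits are taken from the statements on this page (1080p30 only on the ZCU104, up to 4Kp30 on the ZCU106). Treating those figures as upper bounds is also an assumption.

```shell
#!/bin/sh
# Succeeds if the requested frame size fits the limit this page gives for
# the board. Returns 2 for an unknown board name.
resolution_supported() {
    board=$1; width=$2; height=$3
    case "$board" in
        zcu104) max_w=1920; max_h=1080 ;;   # 1080p30 only
        zcu106) max_w=3840; max_h=2160 ;;   # up to 4Kp30 (assumed 3840x2160)
        *) return 2 ;;
    esac
    [ "$width" -le "$max_w" ] && [ "$height" -le "$max_h" ]
}

# Example: a 4K request on the ZCU104 is rejected before gstreamer is started.
resolution_supported zcu104 3840 2160 || echo "3840x2160 is not supported on zcu104"
```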
Using the xlnx-vcu-roi-trd Snap
Once the gstreamer command or snap starts successfully, the output of the HDMI input source is shown on the HDMI monitor. Any faces detected in the input stream are marked by a red bounding box. If you look closely, you will see that the area inside the bounding box is clearer and has fewer compression artifacts than the rest of the image.
monitor output with bounding boxes
close-up of monitor showing higher resolution inside region-of-interest