Zynq UltraScale+MPSoC VCU TRD 2019.2 - Xilinx Low Latency PL DDR XV20 HDMI Video Capture and Display
1 Overview
This module captures video from an HDMI Rx subsystem implemented in the PL and can display it through the HDMI Tx subsystem, also implemented in the PL. Using the Sync IP, the module can also stream out and stream in live captured video frames over an Ethernet interface at ultra-low latency. Multi-stream operation is supported for the XV20 pixel format. In this design, PL_DDR is used for decoding and PS_DDR for encoding, so that there is enough DDR bandwidth for high-bandwidth VCU applications that require simultaneous encoder and decoder operation and transcoding at 4k @ 60 FPS.
The VCU encoder and decoder operate in slice mode. An input frame is divided horizontally into multiple slices (8 or 16). The encoder generates a slice_done interrupt at the end of each slice, so the generated NAL unit data can be passed to a downstream element immediately, without waiting for the frame_done interrupt. The VCU decoder likewise starts processing as soon as one slice of data is ready in its circular buffer, instead of waiting for the complete frame. The Sync IP tracks AXI transactions so that producer and consumer are synchronized at the granularity of AXI transactions rather than whole video buffers. The Sync IP is responsible for synchronizing buffers between the capture DMA and the VCU encoder, as both work on the same buffer.
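As a rough illustration of why slice mode reduces latency, the sketch below estimates how many video rows and how much time one slice covers. It is a back-of-the-envelope model only: the function name `slice_timing` is hypothetical, blanking intervals are ignored, and a real encoder rounds slices to whole macroblock/CTU rows.

```python
def slice_timing(height, fps, num_slices):
    """Approximate per-slice figures for a frame split into horizontal slices.

    Simplified model: ignores blanking and macroblock/CTU row rounding.
    """
    frame_period_ms = 1000.0 / fps
    return {
        "rows_per_slice": height / num_slices,
        "frame_period_ms": frame_period_ms,
        "slice_period_ms": frame_period_ms / num_slices,
    }


for n in (8, 16):
    t = slice_timing(height=2160, fps=60, num_slices=n)
    print(f"{n} slices: {t['rows_per_slice']:.0f} rows, "
          f"{t['slice_period_ms']:.2f} ms per slice")
```

Under these assumptions, a downstream element at 4kp60 with 8 slices can start work after roughly one eighth of a frame period instead of a full one.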
The capture element (FB write DMA) writes video buffers in raster-scan order. The Sync IP monitors the buffer level while the capture element is writing into DRAM and allows the encoder to read input buffer data only if the requested data has already been written by the DMA; otherwise it blocks the encoder until the DMA completes its writes. On the decoder side, the VCU decoder writes decoded video buffer data into DRAM in block-raster scan order, and the display reads the data in raster-scan order. To avoid display under-run, software ensures a phase difference of "~frame_period/2" so that the decoder stays ahead of the display.
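The gating rule described above can be pictured with a toy software model. This is only a conceptual sketch: the real Sync IP tracks AXI transactions in hardware, and the names `encoder_may_read` and `display_phase_offset_ms` are hypothetical helpers introduced here for illustration.

```python
def encoder_may_read(read_end_offset, bytes_written):
    """Return True when the requested range is already in DRAM.

    read_end_offset: offset (in bytes) just past the last byte the
                     encoder wants to read from the shared buffer.
    bytes_written:   number of bytes the capture DMA has written so far,
                     in raster-scan order from the start of the buffer.
    """
    return read_end_offset <= bytes_written


def display_phase_offset_ms(fps):
    """Half a frame period, the approximate decoder-to-display lead."""
    return (1000.0 / fps) / 2


# The encoder is held back until the capture DMA has passed the request.
print(encoder_may_read(read_end_offset=4096, bytes_written=2048))
print(encoder_may_read(read_end_offset=4096, bytes_written=8192))
print(f"{display_phase_offset_ms(60):.2f} ms lead at 60 FPS")
```

At 60 FPS the "~frame_period/2" phase difference works out to a little over 8 ms.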
This design supports the following video interfaces:
Sources:
- HDMI-Rx capture pipeline implemented in the PL.
- Stream-In from network or internet.
Sinks:
- HDMI-Tx display pipeline implemented in the PL.
VCU Codec:
- Video Encode/Decode capability using VCU hard block in PL.
- AVC/HEVC encoding
- Encoder/decoder parameter configuration.
Video format:
- XV20
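To give a feel for the DDR traffic that the XV20 format generates, the sketch below estimates the size of one frame and the bandwidth of one read or write pass. It assumes XV20 is a 10-bit 4:2:2 semi-planar layout with three samples packed into each 32-bit word (the layout of the GStreamer NV16_10LE32 format this page maps XV20 to), and it ignores any stride alignment the hardware may add. The function names are hypothetical.

```python
import math


def xv20_frame_bytes(width, height):
    """Approximate bytes in one XV20 frame (no stride padding assumed)."""
    # Three 10-bit samples share one 32-bit (4-byte) word.
    line_bytes = math.ceil(width / 3) * 4
    # 4:2:2 semi-planar: a luma plane plus an interleaved CbCr plane
    # with the same line size and the same number of lines.
    return 2 * line_bytes * height


def stream_bandwidth_mb_s(width, height, fps):
    """Bandwidth of a single pass over every frame, in MB/s (10^6 bytes)."""
    return xv20_frame_bytes(width, height) * fps / 1e6


print(xv20_frame_bytes(3840, 2160), "bytes per 4k frame")
print(f"{stream_bandwidth_mb_s(3840, 2160, 60):.1f} MB/s per pass at 4kp60")
```

Each stage that writes or reads a full frame (capture, encoder input, decoder output, display) costs roughly one such pass, which is one way to see why the design splits encode and decode traffic across PS_DDR and PL_DDR.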
Supported Resolution:
The table below lists the resolutions supported in this design, which can be run from the command-line application only.
Resolution | Command Line: Single Stream | Command Line: Multi-stream |
---|---|---|
4kp60 | √ | NA |
4kp30 | √ | √ (Max 2) |
1080p60 | √ | √ (Max 2) |
√ - Supported
NA – Not applicable
x – Not supported
The table below lists the features supported in this design.
Pipeline | Input source | Format | Output Type | Resolution | VCU codec |
---|---|---|---|---|---|
Capture--> Encode--> Decode--> Display | HDMI-Rx | XV20 | HDMI-Tx | 4kp60/4kp30/1080p60 | HEVC/AVC |
Stream-Out pipeline | HDMI-Rx | XV20 | Stream-Out | 4kp60/4kp30/1080p60 | HEVC/AVC |
Stream-in pipeline | Stream-In | XV20 | HDMI-Tx | 4kp60/4kp30/1080p60 | HEVC/AVC |
The figure below shows the hardware block diagram of the Xilinx Low Latency PL DDR XV20 HDMI design.
The figure below shows the software block diagram of the Xilinx Low Latency PL DDR XV20 HDMI design.
1.1 Board Setup
Refer to the link below for board setup.
1.2 Run Flow
The TRD package is released with the source code, Vivado project, PetaLinux BSP, and SD card image that enable the user to run the demonstration. It also includes the binaries necessary to configure and boot the ZCU106 board. Before running the steps on this wiki page, download the TRD package and extract its contents to a directory referred to as 'TRD_HOME', which is the home directory.
Refer to the link below to download all TRD contents.
TRD package contents are placed in the following directory structure. The user needs to copy all the files from $TRD_HOME/images/vcu_llp2_hdmi_xv20/ to a FAT32-formatted SD card.
rdf0428-zcu106-vcu-trd-2019-2
├── apu
│   └── vcu_petalinux_bsp
├── images
│   ├── vcu_10g
│   ├── vcu_audio
│   ├── vcu_hdmi_multistream_xv20
│   ├── vcu_hdmi_rx
│   ├── vcu_hdmi_tx
│   ├── vcu_llp2_hdmi_nv12
│   ├── vcu_llp2_hdmi_nv16
│   ├── vcu_llp2_hdmi_xv20
│   ├── vcu_llp2_sdi_xv20
│   ├── vcu_multistream_nv12
│   ├── vcu_pcie
│   ├── vcu_sdirx
│   ├── vcu_sditx
│   └── vcu_sdi_xv20
├── pcie_host_package
│   ├── COPYING
│   ├── etc
│   ├── include
│   ├── libxdma
│   ├── LICENSE
│   ├── README.md
│   ├── tools
│   └── xdma
├── pl
│   ├── constrs
│   ├── designs
│   ├── prebuild
│   ├── README.md
│   └── srcs
└── README.txt
TRD package contents specific to the VCU Xilinx Low Latency PL DDR XV20 HDMI design are placed in the following directory structure.
rdf0428-zcu106-vcu-trd-2019-2
├── apu
│   └── vcu_petalinux_bsp
│       └── xilinx-vcu-zcu106-v2019.2-final.bsp
├── images
│   ├── vcu_llp2_hdmi_xv20
│   │   ├── autostart.sh
│   │   ├── bin
│   │   ├── BOOT.BIN
│   │   ├── config
│   │   ├── image.ub
│   │   ├── system.dtb
│   │   └── vcu
├── pcie_host_package
├── pl
│   ├── constrs
│   ├── designs
│   │   ├── zcu106_llp2_xv20
│   ├── prebuild
│   │   ├── zcu106_llp2_xv20
│   ├── README.md
│   └── srcs
│       ├── hdl
│       └── ip
└── README.txt
Configuration files (input.cfg) for the various resolutions are placed in the following directory structure under /media/card.
config/
├── 1-4kp60
│   ├── Display
│   └── Stream-out
├── 2-1080p60
│   ├── Display
│   └── Stream-out
├── 2-4kp30
│   ├── Display
│   └── Stream-out
└── input.cfg
1.2.1 GStreamer Application (vcu_gst_app)
The vcu_gst_app is a multi-threaded Linux command-line application. It requires an input configuration file (input.cfg) in plain text.
Run the modetest command below to set the CRTC configuration for 4kp60:
% modetest -D a00c0000.v_mix -s 35:3840x2160-60@BG24
Run the modetest command below to set the CRTC configuration for 4kp30:
% modetest -D a00c0000.v_mix -s 35:3840x2160-30@BG24
Execution of the application is shown below:
% vcu_gst_app <path to *.cfg file>
Example:
4kp60 XV20 HEVC_25Mbps ultra low-latency(LLP2) display pipeline execution.
% vcu_gst_app /media/card/config/1-4kp60/Display/Single_4kp60_HEVC_25Mbps.cfg
4kp60 XV20 HEVC_25Mbps ultra low-latency(LLP2) stream-out pipeline execution.
% vcu_gst_app /media/card/config/1-4kp60/Stream-out/Single_4kp60_HEVC_25Mbps.cfg
NOTE: Make sure HDMI-Rx is configured for 4kp60 mode.
4kp60 XV20 HEVC ultra low-latency(LLP2) stream-in pipeline execution.
% gst-launch-1.0 udpsrc port=5004 buffer-size=60000000 caps="application/x-rtp, media=video, clock-rate=90000, payload=96, encoding-name=H265" ! rtpjitterbuffer latency=5 ! rtph265depay ! h265parse ! video/x-h265, alignment=nal ! omxh265dec low-latency=1 ! video/x-raw\(memory:XLNXLL\) ! queue max-size-bytes=0 ! fpsdisplaysink name=fpssink text-overlay=false video-sink="kmssink bus-id=a00c0000.v_mix plane-id=30" sync=true -v
NOTE: Low latency(LLP1/LLP2) stream-in pipeline is not supported in vcu_gst_app.
To measure the latency of the pipeline, run the command below. The latency trace is large, so it is written to a file.
% GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency" GST_DEBUG_FILE=/run/latency.log vcu_gst_app /media/card/config/input.cfg
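Once the log has been copied off the board, it can be summarized with a short script. The sketch below assumes each latency record in the tracer log contains a `latency, ... time=(guint64)<nanoseconds>` field, which is how the GStreamer latency tracer typically reports pipeline latency; the exact line layout may differ between GStreamer versions, so treat the regular expression as an assumption. The helper names and sample lines are illustrative only.

```python
import re
import statistics

# Match pipeline "latency" records but not "element-latency" ones that
# newer GStreamer versions can also emit (assumed log layout).
_TIME_RE = re.compile(r"(?<![\w-])latency,.*?\btime=\(guint64\)(\d+)")


def latencies_ms(lines):
    """Extract per-buffer latency values, converted from ns to ms."""
    values = []
    for line in lines:
        match = _TIME_RE.search(line)
        if match:
            values.append(int(match.group(1)) / 1e6)
    return values


def summarize(values):
    """Return count, min, mean and max of the latency samples in ms."""
    return {
        "count": len(values),
        "min_ms": min(values),
        "mean_ms": statistics.mean(values),
        "max_ms": max(values),
    }


# Illustrative sample lines, not real measurements.
sample = [
    "0:00:05.1 TRACE GST_TRACER :0:: latency, src=(string)v4l2src0_src, "
    "sink=(string)fpssink_sink, time=(guint64)12000000, ts=(guint64)5100000000;",
    "0:00:05.2 TRACE GST_TRACER :0:: latency, src=(string)v4l2src0_src, "
    "sink=(string)fpssink_sink, time=(guint64)14000000, ts=(guint64)5200000000;",
]
print(summarize(latencies_ms(sample)))
```

For a real run, the lines would come from the log file instead, for example `latencies_ms(open("latency.log"))`.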
Refer to the link below for detailed run flow steps.
1.3 Build Flow
Refer to the link below for detailed build flow steps.
2 Other Information
2.1 Known Issues
- For PetaLinux-related known issues, refer to AR# 72950: PetaLinux 2019.2 - Product Update Release Notes and Known Issues.
- For VCU-related known issues, refer to AR# 66763: LogiCORE H.264/H.265 Video Codec Unit (VCU) - Release Notes and Known Issues and Xilinx Zynq UltraScale+ MPSoC Video Codec Unit.
- The design has a negative slack (WNS) of around -50 ps. However, this does not affect the functionality of the design, including over long runs, and will be fixed in the next release.
2.2 Limitations
- For PetaLinux-related limitations, refer to AR# 72950: PetaLinux 2019.2 - Product Update Release Notes and Known Issues.
- For VCU-related limitations, refer to AR# 66763: LogiCORE H.264/H.265 Video Codec Unit (VCU) - Release Notes and Known Issues, Xilinx Zynq UltraScale+ MPSoC Video Codec Unit and the PG252 link.
2.3 Optimum VCU Encoder parameters for use-cases
Video streaming:
- The video streaming use-case requires a very stable bitrate graph across all pictures.
- Avoid periodic large Intra pictures during the encoding session.
- Low-latency rate control (hardware RC) is the preferred control-rate for video streaming; it tries to keep frame sizes equal across all pictures.
- Avoid periodic Intra frames; use low-delay-p (IPPPPP…) instead.
- VBR is not a preferred mode of streaming.
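To make the "equal frame sizes" goal concrete, the sketch below computes the average byte budget per frame and per slice for a given target bitrate. It assumes the bitrate is expressed in Kbps with 1 Kbps = 1000 bits per second, and that the budget is spread evenly, which a real rate controller only approximates. The function names are hypothetical.

```python
def avg_frame_bytes(target_kbps, fps):
    """Average bytes available per frame at the target bitrate."""
    return target_kbps * 1000 / fps / 8


def avg_slice_bytes(target_kbps, fps, num_slices):
    """Average bytes available per slice if frames are split evenly."""
    return avg_frame_bytes(target_kbps, fps) / num_slices


frame = avg_frame_bytes(target_kbps=25000, fps=60)
per_slice = avg_slice_bytes(target_kbps=25000, fps=60, num_slices=8)
print(f"about {frame / 1000:.1f} kB per frame, {per_slice / 1000:.1f} kB per slice")
```

A large periodic Intra picture can need several times this average, which is why it disturbs an otherwise flat bitrate graph.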
Performance: AVC Encoder settings:
- It is preferred to use 8 slices only for better AVC encoder performance.
- AVC standard does not support Tile mode processing which results in the processing of MB rows sequentially for entropy coding.
Quality: Low bitrate AVC encoding:
- Enable profile=high and use qp-mode=auto for low-bitrate encoding use-cases.
- The high profile enables 8x8 transform which results in better video quality at low bitrates.
3 Appendix A - Input Configuration File (input.cfg)
The example configuration files are stored in the /media/card/config/ folder.
Common Configuration:
It is the starting point of common configuration.
Num of Input:
1, 2
Output:
Select the video interface.
Options: HDMI
Out Type:
Options: display and stream
Display Rate:
Pipeline frame rate.
Options: 30 FPS or 60 FPS for each stream.
Exit:
It indicates to the application that the configuration is over.
Input Configuration:
It is the starting point of the input configuration.
Input Num:
Starting Nth input configuration.
Options: 1, 2
Input Type:
Input source type.
Options: HDMI
Raw:
Indicates whether the pipeline is processed or pass-through.
Options: False
Note: The Raw use-case is not supported for either LLP2 or non-LLP2 use-cases, as the mixer is not connected to PS DDR.
Width:
The width of the live source.
Options: 3840, 1920
Height:
The height of the live source.
Options: 2160, 1080
Format:
The format of input data.
Options: XV20
Enable LLP2:
To enable LLP2 configuration.
Options: True
Note: Set Enable LLP2 to False for the non-LLP2 use-case.
Exit:
It indicates to the application that the configuration is over.
Encoder Configuration:
It is the starting point of encoder configuration.
Encoder Num:
Starting Nth encoder configuration.
Options: 1, 2
Encoder Name:
Name of the encoder.
Options: AVC, HEVC
Profile:
Name of the profile.
Options: high for AVC and main for HEVC.
Rate Control:
Rate control options.
Options: low_latency.
Filler Data:
Filler Data NAL units for CBR rate control.
Options: False
QP:
QP control mode used by the VCU encoder.
Options: Uniform, Auto
L2 Cache:
Enable or Disable L2Cache buffer in encoding process.
Options: True, False
Latency Mode:
Encoder latency mode.
Options: sub_frame
Low Bandwidth:
If enabled, decrease the vertical search range used for P-frame motion estimation to reduce the bandwidth.
Options: True, False
Gop Mode:
Group of Pictures mode.
Options: Basic, low_delay_p, low_delay_b
Bitrate:
Target bitrate in Kbps
Options: 1-25000
B Frames:
Number of B-frames between two consecutive P-frames
Options: 0
Slice:
The number of slices produced for each frame. Each slice contains one or more complete macroblock/CTU row(s). Slices are distributed over the frame as regularly as possible. If slice-size is defined as well, more slices may be produced to fit the slice-size requirement.
Options:
4-22 4kp resolution with HEVC codec
4-32 4kp resolution with AVC codec
4-32 1080p resolution with HEVC codec
4-32 1080p resolution with AVC codec
Note: The recommended value is 8 for the LLP2 use-case.
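The slice ranges listed above can be checked before launching a pipeline. The following sketch simply encodes those ranges in a lookup table; the names `SLICE_LIMITS` and `slices_in_range` are hypothetical and are not part of vcu_gst_app.

```python
# (codec, frame height) -> (min slices, max slices), from the list above.
SLICE_LIMITS = {
    ("HEVC", 2160): (4, 22),
    ("AVC", 2160): (4, 32),
    ("HEVC", 1080): (4, 32),
    ("AVC", 1080): (4, 32),
}


def slices_in_range(codec, height, slices):
    """Return True if the slice count is within the documented range."""
    low, high = SLICE_LIMITS[(codec.upper(), height)]
    return low <= slices <= high


print(slices_in_range("HEVC", 2160, 8))   # recommended LLP2 value
print(slices_in_range("HEVC", 2160, 23))  # above the 4kp HEVC limit
```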
GoP Length:
The distance between two consecutive I frames
Options: 1-1000
Preset:
Options: Custom
Exit:
It indicates to the application that the configuration is over.
Streaming Configuration:
It is the starting point of streaming configuration.
Streaming Num:
Starting Nth Streaming configuration.
Options: 1, 2
Host IP:
The host to send the packets to
Options: 192.168.25.89 or Windows PC IP
Port:
The port to send the packets to
Options: 5004, 5008, 5012 and 5016
Exit:
It indicates to the application that the configuration is over.
Trace Configuration:
It is the starting point of trace configuration.
FPS Info:
To display fps info on the console.
Options: True, False
APM Info:
To display APM counter number on the console.
Options: True, False
Pipeline Info:
To display pipeline info on console.
Options: True, False
Exit:
It indicates to the application that the configuration is over.
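For scripting around these files, a configuration can be read into sections with a few lines of Python. The sketch below assumes the layout used by the example input.cfg on this page: each section opens with a "<Name> Configuration : START" line, contains "Key : Value" lines, and closes with "Exit". It is not the parser used by vcu_gst_app, and `parse_input_cfg` is a hypothetical name.

```python
def parse_input_cfg(text):
    """Split an input.cfg-style text into a list of sections.

    Each section is a dict with the section name and its parameters.
    Values are kept as strings exactly as written in the file.
    """
    sections = []
    current = None
    for raw in text.splitlines():
        key, _, value = (part.strip() for part in raw.partition(":"))
        if not key:
            continue  # blank line
        if key.lower() == "exit":
            current = None  # section closed
        elif value.upper() == "START":
            current = {"section": key, "params": {}}
            sections.append(current)
        elif current is not None:
            current["params"][key] = value
    return sections


example = """
Common Configuration : START
Num Of Input : 1
Output : HDMI
Exit
Input Configuration : START
Input Num : 1
Width : 1920
Height : 1080
Format : XV20
Exit
"""

for section in parse_input_cfg(example):
    print(section["section"], section["params"])
```

Lines outside any START/Exit pair are ignored in this sketch; a stricter tool might report them instead.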
4 Appendix B
- The HDMI source can be locked to any resolution. Run the command below for all media nodes to print the media device topology, where "mediaX" represents the different media nodes. In the topology log, look for the "v_hdmi_rx_ss" string to identify the HDMI input source media node.
$ xmedia-ctl -p -d /dev/mediaX
- To check the link status, resolution and video node of the HDMI input source, run the xmedia-ctl command below, where "mediaX" indicates the media node for the HDMI input source.
$ xmedia-ctl -p -d /dev/mediaX
When the HDMI source is connected at 4kp60 resolution, it shows:
root@zcu106_vcu_trd:~# xmedia-ctl -p -d /dev/media1     -----> media node for HDMI input source
Media controller API version 4.19.0

Media device information
------------------------
driver          xilinx-video
model           Xilinx Video Composite Device
serial
bus info
hw revision     0x0
driver version  4.19.0

Device topology
- entity 1: vcap_hdmi output 0 (1 pad, 1 link)
            type Node subtype V4L flags 0
            device node name /dev/video0     -----> Video node for HDMI Rx source
        pad0: Sink
                <- "a0040000.v_proc_ss":1 [ENABLED]

- entity 5: a0040000.v_proc_ss (2 pads, 2 links)
            type V4L2 subdev subtype Unknown flags 0
            device node name /dev/v4l-subdev1
        pad0: Sink
                [fmt:VYYUYY10_4X20/1280x720 field:none colorspace:srgb]
                <- "a0000000.v_hdmi_rx_ss":0 [ENABLED]
        pad1: Source
                [fmt:VYYUYY10_4X20/1920x1080 field:none colorspace:srgb]
                -> "vcap_hdmi output 0":0 [ENABLED]

- entity 8: a0000000.v_hdmi_rx_ss (1 pad, 1 link)
            type V4L2 subdev subtype Unknown flags 0
            device node name /dev/v4l-subdev2
        pad0: Source
                [fmt:RBG888_1X24/3840x2160 field:none colorspace:srgb]
                [dv.caps:BT.656/1120 min:0x0@25000000 max:4096x2160@297000000 stds:CEA-861,DMT,CVT,GTF caps:progressive,reduced-blanking,custom ]
                [dv.detect:BT.656/1120 3840x2160p60 (4400x2250) stds:CEA-861 flags:CE-video]     -----> Resolution and Frame-rate of HDMI Rx source
                -> "a0040000.v_proc_ss":0 [ENABLED]
NOTE: Check the resolution and frame rate reported by "dv.detect" under the "v_hdmi_rx_ss" node.
When the HDMI source is not connected, it shows:
root@zcu106_vcu_trd:~# xmedia-ctl -p -d /dev/media1     -----> media node for HDMI input source
Media controller API version 4.19.0

Media device information
------------------------
driver          xilinx-video
model           Xilinx Video Composite Device
serial
bus info
hw revision     0x0
driver version  4.19.0

Device topology
- entity 1: vcap_hdmi output 0 (1 pad, 1 link)
            type Node subtype V4L flags 0
            device node name /dev/video0     -----> Video node for HDMI Rx source
        pad0: Sink
                <- "a0040000.v_proc_ss":1 [ENABLED]

- entity 5: a0040000.v_proc_ss (2 pads, 2 links)
            type V4L2 subdev subtype Unknown flags 0
            device node name /dev/v4l-subdev1
        pad0: Sink
                [fmt:VYYUYY10_4X20/1280x720 field:none colorspace:srgb]
                <- "a0000000.v_hdmi_rx_ss":0 [ENABLED]
        pad1: Source
                [fmt:VYYUYY10_4X20/1920x1080 field:none colorspace:srgb]
                -> "vcap_hdmi output 0":0 [ENABLED]

- entity 8: a0000000.v_hdmi_rx_ss (1 pad, 1 link)
            type V4L2 subdev subtype Unknown flags 0
            device node name /dev/v4l-subdev2
        pad0: Source
                [fmt:RBG888_1X24/3840x2160 field:none colorspace:srgb]
                [dv.caps:BT.656/1120 min:0x0@25000000 max:4096x2160@297000000 stds:CEA-861,DMT,CVT,GTF caps:progressive,reduced-blanking,custom ]
                [dv.query:no-link]     -----> HDMI Rx Link Status
                -> "a0040000.v_proc_ss":0 [ENABLED]
NOTE: Here "dv.query:no-link" under the "v_hdmi_rx_ss" node shows that the HDMI-Rx source is not connected or not active (try waking up the device by pressing a key on the remote).
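The two cases above can be told apart automatically by scanning the captured topology text. The sketch below looks only for the "dv.query:no-link" marker and the "dv.detect" line shown in the logs above; it assumes the detected mode is printed as `<width>x<height>p<rate>` or `<width>x<height>i<rate>`, as in the sample output. The function name `hdmi_rx_status` is hypothetical.

```python
import re

# Matches e.g. "[dv.detect:BT.656/1120 3840x2160p60 (4400x2250) ...]"
_DETECT_RE = re.compile(
    r"\[dv\.detect:\S+\s+(\d+)x(\d+)([pi])(\d+(?:\.\d+)?)"
)


def hdmi_rx_status(topology):
    """Summarize HDMI-Rx link state from `xmedia-ctl -p` output text."""
    if "dv.query:no-link" in topology:
        return {"link": False}
    match = _DETECT_RE.search(topology)
    if match is None:
        return {"link": None}  # neither marker found: state unknown
    width, height, scan, rate = match.groups()
    return {
        "link": True,
        "width": int(width),
        "height": int(height),
        "interlaced": scan == "i",
        "fps": float(rate),
    }


connected = "[dv.detect:BT.656/1120 3840x2160p60 (4400x2250) stds:CEA-861 flags:CE-video]"
disconnected = "[dv.query:no-link]"
print(hdmi_rx_status(connected))
print(hdmi_rx_status(disconnected))
```

On the board, the text would be the captured output of `xmedia-ctl -p -d /dev/media1`, for example saved to a file and read back on the host.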
Notes for gst-launch-1.0 commands:
- The video node for the HDMI Rx source can be checked using the xmedia-ctl command. Run the xmedia-ctl command below to check the video node for the HDMI Rx source, where "media1" indicates the media node for the HDMI input source.
$ xmedia-ctl -p -d /dev/media1
- Make sure the HDMI-Rx media pipeline is configured for 4kp60 resolution and that source and sink have the same colour format for connected nodes. For the XV20 format, run the xmedia-ctl commands below to set the resolution and format of the HDMI scaler node, where "media1" indicates the media node for the HDMI input source.
When the HDMI input source is an NVIDIA SHIELD:
$ xmedia-ctl -d /dev/media1 -V "\"a0040000.v_proc_ss\":0 [fmt:RBG888_1X24/3840x2160 field:none]"
$ xmedia-ctl -d /dev/media1 -V "\"a0040000.v_proc_ss\":1 [fmt:UYVY10_1X20/3840x2160 field:none]"
NOTE: Make sure NVIDIA SHIELD is configured for 4kp resolution and RBG888_1X24 format.
- Follow the steps below to switch the HDMI-Rx resolution from 1080p60 to 4kp60.
- Check current HDMI Input Source Resolution (1080p60) by following the above-mentioned steps.
- Run vcu_gst_app for current HDMI resolution (1080p60) by executing the following command.
$ vcu_gst_app /media/card/config/input.cfg
The following configuration needs to be set in input.cfg for the non-LLP2 HDMI-1080p60 use-case.
Common Configuration : START
Num Of Input : 1
Output : HDMI
Out Type : Display
Frame Rate : 60
Exit

Input Configuration : START
Input Num : 1
Input Type : hdmi_1
Raw : FALSE
Width : 1920
Height : 1080
Format : XV20
Enable LLP2 : FALSE
Exit

Encoder Configuration : START
Encoder Num : 1
Encoder Name : HEVC
Profile : Main
Rate Control : Low_Latency
Filler Data : False
QP : Auto
L2 Cache : TRUE
Latency Mode : Sub_Frame
Low Bandwidth : FALSE
Gop Mode : Basic
Bitrate : 25000
B Frames : 0
Slice : 8
GoP Length : 60
Preset : Custom
Exit
- Change the resolution of the HDMI input source from 1080p60 to 4kp60 by following the steps below.
- Set the HDMI source resolution to 4kp60 (Homepage → Settings → Display & Sound → Resolution → change to 4kp60).
- Save the configuration for the change to take effect.
- Verify the desired HDMI Input Source Resolution (4kp60) by following the above-mentioned steps.
- If an HDMI Tx link-up issue is observed after Linux boots, use the following command to get the blue screen on HDMI-Tx for 4kp60.
% modetest -D a00c0000.v_mix -s 35:3840x2160-60@BG24
- The table below lists the parameters of the pixel format.
Pixel Format | GStreamer Format | Media Bus Format | GStreamer HEVC Profile | GStreamer AVC Profile | Kmssink Plane-id |
---|---|---|---|---|---|
XV20 | NV16_10LE32 | UYVY10_1X20 | main-422-10 | high-4:2:2 | 30 and 31 |
- Run the following gst-launch-1.0 command to display XV20 video on HDMI-Tx using the ultra low-latency (LLP2) GStreamer pipeline (capture → encode → decode → display), where "video0" indicates the video node for the input source.
$ gst-launch-1.0 v4l2src io-mode=4 device=/dev/video0 ! video/x-raw\(memory:XLNXLL\), width=3840, height=2160, format=NV16_10LE32, framerate=60/1 ! omxh265enc qp-mode=auto gop-mode=basic gop-length=60 b-frames=0 target-bitrate=25000 num-slices=8 control-rate=low-latency prefetch-buffer=TRUE low-bandwidth=false filler-data=0 cpb-size=1000 initial-delay=500 ! video/x-h265, alignment=nal ! queue max-size-buffers=0 ! omxh265dec low-latency=1 ! video/x-raw\(memory:XLNXLL\) ! queue max-size-bytes=0 ! fpsdisplaysink name=fpssink text-overlay=false video-sink="kmssink bus-id=a00c0000.v_mix plane-id=30 max-lateness=5000000 show-preroll-frame=false sync=true" sync=true
- Run the following gst-launch-1.0 command to stream out XV20 video using the ultra low-latency (LLP2) GStreamer pipeline, where "video0" indicates the video node for the input source.
$ gst-launch-1.0 v4l2src io-mode=4 device=/dev/video0 ! video/x-raw\(memory:XLNXLL\), width=3840, height=2160, format=NV16_10LE32, framerate=60/1 ! omxh265enc qp-mode=auto gop-mode=low-delay-p gop-length=60 periodicity-idr=60 b-frames=0 target-bitrate=25000 num-slices=8 control-rate=low-latency prefetch-buffer=TRUE low-bandwidth=false filler-data=0 cpb-size=1000 initial-delay=500 ! video/x-h265, alignment=nal ! queue max-size-buffers=0 ! rtph265pay ! udpsink host=192.168.25.89 port=5004 buffer-size=60000000 max-bitrate=120000000 max-lateness=-1 qos-dscp=60 async=false
NOTE: Here 192.168.25.89 is the host/client IP address and 5004 is the port number.
- Run the following gst-launch-1.0 command to display XV20 stream-in video on HDMI-Tx using the ultra low-latency (LLP2) GStreamer pipeline, where 5004 is the port number.
$ gst-launch-1.0 udpsrc port=5004 buffer-size=60000000 caps="application/x-rtp, media=video, clock-rate=90000, payload=96, encoding-name=H265" ! rtpjitterbuffer latency=5 ! rtph265depay ! h265parse ! video/x-h265, alignment=nal ! omxh265dec low-latency=1 ! video/x-raw\(memory:XLNXLL\) ! queue max-size-bytes=0 ! fpsdisplaysink name=fpssink text-overlay=false video-sink="kmssink bus-id=a00c0000.v_mix plane-id=30" sync=true
NOTE: Low latency(LLP1/LLP2) stream-in pipeline is not supported in vcu_gst_app.
© Copyright 2019 - 2022 Xilinx Inc.