This page describes Design Module 7 of the VCU TRD: the Xilinx low-latency (LLP2) PS DDR NV12 HDMI Audio Video Capture and Display design.

1 Overview
- 1.1 Board Setup
- 1.2 Run Flow
  - 1.2.1 GStreamer Application (vcu_gst_app)
- 1.3 Build Flow
2 Other Information
- 2.1 Known Issues
- 2.2 Limitations
- 2.3 Optimum VCU Encoder parameters for use-cases
3 Appendix A - Input Configuration File (input.cfg)
4 Appendix B - HDMI-Rx/Tx Link-up and GStreamer Commands
5 References

1 Overview

This module enables capture of video and audio data from an HDMI-Rx subsystem implemented in the PL. The video and audio data can be displayed through the HDMI-Tx subsystem implemented in the PL. The module can stream out and stream in live captured video and audio through an Ethernet interface at ultra-low latencies using Sync IP. It supports up to four NV12 video streams, using an AXI broadcaster on the capture side and a mixer on the display side, and single-stream audio.

The VCU encoder and decoder operate in slice mode. An input frame is divided horizontally into multiple slices (8 or 16). The encoder generates a slice_done interrupt at the end of every slice, so the generated NAL unit data can be passed to a downstream element immediately without waiting for the frame_done interrupt. The VCU decoder likewise starts processing as soon as one slice of data is ready in its circular buffer instead of waiting for the complete frame. The Sync IP performs AXI transaction-level tracking, so that the producer and consumer are synchronized at the granularity of AXI transactions rather than at the granularity of whole video buffers. Sync IP is responsible for synchronizing buffers between the capture DMA and the VCU encoder, because both work on the same buffer.
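As a rough illustration of why slice mode reduces latency, the sketch below estimates the time between slice boundaries for the two slice counts mentioned above. It assumes equal-size slices that arrive at a uniform rate over one frame period and ignores blanking intervals; the function name is hypothetical and is not part of the TRD software.

```python
def slice_interval_ms(fps: float, slices_per_frame: int) -> float:
    """Approximate time between slice boundaries in milliseconds.

    Assumption: slices are equal in size and the frame is captured at a
    uniform rate over one frame period (blanking is ignored).
    """
    frame_period_ms = 1000.0 / fps
    return frame_period_ms / slices_per_frame


if __name__ == "__main__":
    for slices in (8, 16):
        interval = slice_interval_ms(60, slices)
        print(f"60 fps, {slices} slices per frame: ~{interval:.2f} ms per slice")
```

Under these assumptions a downstream element can start work after a fraction of the frame period instead of waiting for the whole frame.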

The capture element (FB write DMA) writes video buffers in raster-scan order. Sync IP monitors the buffer level while the capture element is writing into DRAM. It allows the encoder to read input buffer data if the requested data has already been written by the DMA; otherwise it blocks the encoder until the DMA completes its writes. On the decoder side, the VCU decoder writes decoded video buffer data into DRAM in block-raster scan order and the display reads the data in raster-scan order. To avoid display under-run problems, software ensures a phase difference of "~frame_period/2", so that the decoder stays ahead of the display.
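The gating rule described above can be pictured with a small toy model. This is a simplified sketch, not the Sync IP implementation: the real IP tracks AXI transactions, whereas this model tracks whole video lines, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrameBufferTracker:
    """Toy producer/consumer tracker for one shared frame buffer."""

    total_lines: int
    lines_written: int = 0  # advanced by the producer (capture DMA)

    def producer_write(self, n_lines: int) -> None:
        """Record that the producer has written n_lines more lines."""
        self.lines_written = min(self.total_lines, self.lines_written + n_lines)

    def consumer_may_read(self, first_line: int, n_lines: int) -> bool:
        """Return True only if every requested line is already written."""
        return first_line + n_lines <= self.lines_written


def display_phase_offset_ms(fps: float) -> float:
    """Half a frame period: the decoder-to-display phase difference."""
    return (1000.0 / fps) / 2.0


if __name__ == "__main__":
    tracker = FrameBufferTracker(total_lines=2160)
    tracker.producer_write(128)
    print(tracker.consumer_may_read(0, 64))    # request lies inside written data
    print(tracker.consumer_may_read(128, 64))  # request would overtake the producer
    print(f"phase offset at 60 fps: ~{display_phase_offset_ms(60):.2f} ms")
```

In this model a read that would overtake the producer is refused, which corresponds to the encoder being blocked until the DMA catches up.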

This design supports the following video interfaces:

Sources:

HDMI-Rx capture pipeline implemented in the PL.
Stream-In from network or internet.

Sinks:

HDMI-Tx display pipeline implemented in the PL.

VCU Codec:

Video Encode/Decode capability using VCU hard block in PL
- AVC/HEVC encoding
- Encoder/decoder parameter configuration.

Video format:

NV12

Supported Resolutions:

The table below provides the supported resolution for this design.

Resolution	Single Stream (Command Line)	Multi-stream (Command Line)
4kp60	√	NA
4kp30	√	√ (Max 2)
1080p60	√	√ (Max 4 for encoder) (Max 2 for decoder)

√ - Supported
NA – Not applicable
x – Not supported

When using low-latency mode (LLP1/LLP2), the encoder and decoder are limited by the number of internal cores. The encoder supports a maximum of four streams and the decoder a maximum of two streams.
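The limits in the resolution table can be captured in a small lookup, which may be handy when scripting test runs. This is a minimal sketch that encodes only the values stated in the table above; the dictionary and function names are hypothetical and are not part of vcu_gst_app.

```python
# Maximum simultaneous streams per resolution as (encoder, decoder),
# taken from the supported-resolutions table.
STREAM_LIMITS = {
    "4kp60": (1, 1),
    "4kp30": (2, 2),
    "1080p60": (4, 2),
}


def is_supported(resolution: str, encoder_streams: int, decoder_streams: int) -> bool:
    """Check a stream count against the documented low-latency limits."""
    if resolution not in STREAM_LIMITS:
        return False
    max_enc, max_dec = STREAM_LIMITS[resolution]
    return 0 <= encoder_streams <= max_enc and 0 <= decoder_streams <= max_dec


if __name__ == "__main__":
    print(is_supported("1080p60", 4, 2))
    print(is_supported("1080p60", 4, 4))
    print(is_supported("4kp60", 2, 0))
```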

The below table gives information about the features supported in this design.

Pipeline	Video Input source	Audio Input source	Video Format	Video Output Type	Audio Output Type	Resolution	VCU codec
Serial pipeline `(Capture -> Encode -> Decode -> Display)`	HDMI-Rx	HDMI-Rx	NV12	HDMI-Tx	HDMI-Tx	4kp60/4kp30/1080p60	HEVC/AVC
Stream-Out pipeline `(Capture -> Encode -> Stream-out)`	HDMI-Rx	HDMI-Rx	NV12	Stream-Out	Stream-Out	4kp60/4kp30/1080p60	HEVC/AVC
Stream-in pipeline `(Stream-in -> Decode -> Display)`	Stream-In	Stream-In	NV12	HDMI-Tx	HDMI-Tx	4kp60/4kp30/1080p60	HEVC/AVC

The below figure shows the Xilinx Low Latency PS DDR NV12 HDMI Audio Video Capture and Display design hardware block diagram.

The below figure shows the Xilinx Low Latency PS DDR NV12 HDMI Audio Video Capture and Display design software block diagram.

1.1 Board Setup

Refer to the below link for Board Setup

Zynq UltraScale+ MPSoC VCU TRD 2022.1 Board Setup

1.2 Run Flow

The TRD package is released with the source code, Vivado project, PetaLinux BSP, and SD card image that enables the user to run the demonstration. It also includes the binaries necessary to configure and boot the ZCU106 board. Prior to running the steps mentioned in this wiki page, download the TRD package and extract its contents to a directory referred to as TRD_HOME - which is the home directory.

Refer to Section 4.1 : Download the TRD of the Zynq UltraScale+ MPSoC VCU TRD 2022.1 wiki page to download all TRD contents.

TRD package contents are placed in the following directory structure. The user needs to copy all the files from the $TRD_HOME/images/vcu_llp2_hdmi_nv12/ to the FAT32 formatted SD card directory.

rdf0428-zcu106-vcu-trd-2022-1/
├── apu
│   └── vcu_petalinux_bsp
├── images
│   ├── vcu_audio
│   ├── vcu_llp2_hdmi_nv12
│   ├── vcu_llp2_hlg_sdi
│   ├── vcu_llp2_plddr_hdmi
│   ├── vcu_multistream_nv12
│   ├── vcu_plddrv1_hdr10_hdmi
│   ├── vcu_plddrv2_hdr10_hdmi
│   └── vcu_yuv444
├── pl
│   ├── constrs
│   ├── designs
│   ├── prebuild
│   ├── README.md
│   └── srcs
├── README.txt
└── zcu106_vcu_trd_sources_and_licenses.tar.gz

16 directories, 3 files

TRD package contents specific to Xilinx Low Latency PS DDR NV12 HDMI Audio Video Capture and Display design are placed in the following directory structure.

rdf0428-zcu106-vcu-trd-2022-1
├── apu
│   └── vcu_petalinux_bsp
│       └── xilinx-vcu-zcu106-v2022.1-final.bsp
├── images
│   ├── vcu_llp2_hdmi_nv12
│   │   ├── autostart.sh
│   │   ├── BOOT.BIN
│   │   ├── bootfiles/
│   │   ├── boot.scr
│   │   ├── config/
│   │   ├── Image
│   │   ├── rootfs.cpio.gz.u-boot
│   │   ├── system.dtb
│   │   └── vcu/
├── pl
│   ├── constrs/
│   ├── designs
│   │   ├── zcu106_llp2_audio_nv12/
│   ├── prebuild
│   │   ├── zcu106_llp2_audio_nv12/
│   ├── README.md
│   └── srcs
├── README.txt
└── zcu106_vcu_trd_sources_and_licenses.tar.gz

Configuration files (input.cfg) for various resolutions are placed in the following directory structure in /media/card.

The single-stream configurations (1-1080p60, 1-4kp30 and 1-4kp60) support both audio and video.
Because LLP2 stream-in is not supported by vcu_gst_app, sample shell scripts containing the relevant GStreamer commands are provided for all Stream-in use-cases. Users can modify the scripts as needed, or directly use the GStreamer pipelines provided in this wiki page.
For the 4x1080p60 display use-case, sample shell scripts containing the relevant GStreamer commands are provided for all Display use-cases. Users can modify the scripts as needed, or directly use the GStreamer pipelines provided in this wiki page.

config/
├── 1-1080p60
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 1-4kp30
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 1-4kp60
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 2-1080p60
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 2-4kp30
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 4-1080p60
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
└── input.cfg

24 directories, 1 file
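The layout above is regular enough to generate paths programmatically. The sketch below builds the directory for a given stream set and use-case; it assumes the SD card is mounted at /media/card as stated above, and it deliberately stops at the directory level because the file names inside each directory are not listed on this page. The helper name is hypothetical.

```python
from pathlib import PurePosixPath

CONFIG_ROOT = PurePosixPath("/media/card/config")
STREAM_SETS = ("1-1080p60", "1-4kp30", "1-4kp60", "2-1080p60", "2-4kp30", "4-1080p60")
USE_CASES = ("Display", "Stream-in", "Stream-out")


def config_dir(stream_set: str, use_case: str) -> PurePosixPath:
    """Return the config directory for a stream set and use-case."""
    if stream_set not in STREAM_SETS:
        raise ValueError(f"unknown stream set: {stream_set}")
    if use_case not in USE_CASES:
        raise ValueError(f"unknown use-case: {use_case}")
    return CONFIG_ROOT / stream_set / use_case


if __name__ == "__main__":
    for stream_set in STREAM_SETS:
        for use_case in USE_CASES:
            print(config_dir(stream_set, use_case))
```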

1.2.1 GStreamer Application (vcu_gst_app)

The vcu_gst_app is a command-line multi-threaded Linux application. The command-line application requires an input configuration file (input.cfg) to be provided in plain text.

Run below modetest command to set CRTC configurations for 4kp60:

Run below modetest command to set CRTC configurations for 4kp30:

Execution of the application is shown below:

Example:

Make sure HDMI-Rx is configured for 4kp60 mode before running the example pipelines below.
Low-latency (LLP1/LLP2) video and audio+video stream-in pipelines are not supported in vcu_gst_app.
The vcu_gst_app uses RTP+RTCP streaming and the opus encoder for LLP1/LLP2 audio+video stream-out use-cases.
All single-stream serial/streaming pipelines have audio configuration ON by default. To execute only the display pipeline, change the Audio Enable property to FALSE in the configuration file.

4kp60 NV12 HEVC_25Mbps ultra low-latency(LLP2) audio+video serial pipeline execution.

4kp60 NV12 HEVC_25Mbps ultra low-latency(LLP2) stream-out pipeline execution.

4kp60 NV12 HEVC ultra-low-latency(LLP2) video stream-in pipeline execution.

OR

4kp60 NV12 HEVC ultra-low-latency (LLP2) audio+video stream-in pipeline execution, where 192.168.25.90 is the server's IP address.

OR

For LLP1/LLP2 multi-stream HEVC serial and stream-out use-cases (2-4kp30, 2-1080p60, 4-1080p60), set the ENC_EXTRA_OP_BUFFERS=10 variable before the vcu_gst_app command. The sample pipeline is given below:

To measure the latency of the pipeline, run the below command. The latency log is large, so redirect it to a file.

Refer to the below link for detailed run flow steps

Zynq UltraScale+ MPSoC VCU TRD 2022.1 - Run Flow

1.3 Build Flow

Refer to the below link for detailed build flow steps

Zynq UltraScale+ MPSoC VCU TRD 2022.1 - Build Flow

2 Other Information

2.1 Known Issues

For PetaLinux related known issues please refer to: PetaLinux 2022.1 - Product Update Release Notes and Known Issues
For VCU related known issues please refer to AR# 76600: LogiCORE H.264/H.265 Video Codec Unit (VCU) - Release Notes and Known Issues and Xilinx Zynq UltraScale+ MPSoC Video Codec Unit.
To reduce performance issues with LLP2 4x serial pipelines, please refer to IRQ Balancing in PG252.

2.2 Limitations

For PetaLinux related limitations please refer to: PetaLinux 2022.1 - Product Update Release Notes and Known Issues
For VCU related limitations please refer to Answer Record 76600: LogiCORE H.264/H.265 Video Codec Unit (VCU) - Release Notes and Known Issues, Xilinx Zynq UltraScale+ MPSoC Video Codec Unit and PG252.

2.3 Optimum VCU Encoder parameters for use-cases

Video streaming:

The video streaming use-case requires a very stable bitrate graph for all pictures.
It is good to avoid periodic large Intra pictures during the encoding session.
Low-latency rate control (hardware RC) is the preferred control-rate for video streaming; it tries to keep frame sizes equal across all pictures.
Instead of periodic Intra frames, use low-delay-p (IPPPPP…).
VBR is not a preferred mode for streaming.

Performance: AVC Encoder settings:

It is preferred to use 8 slices only for better AVC encoder performance.
AVC standard does not support Tile mode processing which results in the processing of MB rows sequentially for entropy coding.

Quality: Low bitrate AVC encoding:

Enable profile=high and use qp-mode=auto for low-bitrate encoding use-cases.
The high profile enables 8x8 transform which results in better video quality at low bitrates.

3 Appendix A - Input Configuration File (input.cfg)

The example configuration files are stored in the /media/card/config/ folder.

Configuration Type	Configuration Name	Description	Available Options	NOTE
Common	Common Configuration	It is the starting point of common configuration
	Num of Input	Number of input	1, 2, 3, 4
	Output	Select the video interface.	HDMI
	Out Type	Type of output	display, stream
	Display Rate	Pipeline frame rate	30, 60
	Exit	It indicates to the application that the configuration is over
Input	Input Configuration	It is the starting point of the input configuration
	Input Num	Starting N^th input configuration	1, 2, 3, 4
	Input Type	Input source type	HDMI
	Raw	Indicates whether the pipeline is processed or pass-through	FALSE	Raw use-case is not supported for LLP2 use-cases. It is supported for non-LLP2 use-cases.
	Width	The width of the live source	3840,1920
	Height	The height of the live source	2160, 1080
	Format	The format of input data	NV12
	Enable LLP2	To enable LLP2 configuration.	TRUE, FALSE	Set Enable LLP2 to FALSE for non-LLP2 use-cases.
	Exit	It indicates to the application that the configuration is over
Encoder	Encoder Configuration	It is the starting point of encoder configuration
	Encoder Num	Starting N^th encoder configuration	1,2,3,4
	Encoder Name	Name of the encoder	AVC, HEVC
	Profile	Name of the profile	high for AVC, main for HEVC.
	Rate Control	Rate control options	Low_Latency
	Filler Data	Filler Data NAL units for CBR rate control	False
	QP	QP control mode used by the VCU encoder	Uniform, Auto
	L2 Cache	Enable or Disable L2Cache buffer in encoding process.	True, False
	Latency Mode	Encoder latency mode.	sub_frame
	Low Bandwidth	If enabled, decrease the vertical search range used for P-frame motion estimation to reduce the bandwidth.	True, False
	Gop Mode	Group of Pictures mode.	Basic, low_delay_p, low_delay_b
	Bitrate	Target bitrate in Kbps	1-25000
	B Frames	Number of B-frames between two consecutive P-frames	0
	Slice	The number of slices produced for each frame. Each slice contains one or more complete macroblock/CTU row(s). Slices are distributed over the frame as regularly as possible. If slice-size is defined as well, more slices may be produced to fit the slice-size requirement.	4-22 (4K resolution with HEVC codec); 4-32 (4K resolution with AVC codec); 4-32 (1080p resolution with HEVC codec); 4-32 (1080p resolution with AVC codec)	The recommended slice count for the LLP2 use-case is 8.
	GoP Length	The distance between two consecutive I frames	1-1000
	GDR Mode	It specifies which Gradual Decoder Refresh(GDR) scheme should be used when gop-mode = low_delay_p	Horizontal/Vertical/Disabled	GDR mode is currently supported with LLP1/LLP2 low-delay-p use-cases only
	Entropy Mode	It specifies the entropy mode for H.264 (AVC) encoding process	CAVLC/CABAC/Default
	Max Picture Size	Curtails instantaneous peaks in the bitstream. It works with CBR/VBR rate control only. When it is enabled, the max-picture-size value is calculated and set with an AllowedPeakMargin of 10%, i.e. `max-picture-size = (TargetBitrate / FrameRate) * 1.1`	TRUE/FALSE
	Preset		Custom
	Exit	It indicates to the application that the configuration is over
Streaming	Streaming Configuration	It is the starting point of streaming configuration.
	Streaming Num	Starting N^th Streaming configuration	1, 2, 3, 4
	Host IP	The host to send the packets to	192.168.25.89 or Windows PC IP
	Port:	The port to send the packets to. For LLP1/LLP2 audio+video RTP stream-out pipelines, vcu_gst_app assigns the RTP/RTCP audio and video ports relative to Port (example values for Port = 5004): video RTP data: Port (5004); Tx video RTCP packets: Port+1 (5005); Rx video RTCP packets: Port+2 (5006); audio RTP data: Port+4 (5008); Tx audio RTCP packets: Port+5 (5009); Rx audio RTCP packets: Port+6 (5010).	5004, 5008, 5012, 5016
	Exit	It indicates to the application that the configuration is over.
Audio Configuration	Audio Configuration	It is the starting point of the audio configuration.
	Audio Enable	Enable or Disable audio in pipeline.	True, False
	Audio Format	The format of the audio	S24_32LE (for serial use-cases) S16LE (for stream-out use-cases)
	Sampling Rate	To set the audio sampling rate.	48000
	Num Of Channel	The number of audio channels.	2
	Source	It should be HDMI, as currently only HDMI audio capture is supported.
	Renderer	It should be HDMI, as currently only HDMI audio renderer is supported.
	Exit	It indicates to the application that the configuration is over.
Trace	Trace Configuration	It is the starting point of trace configuration.
	FPS Info	To display fps info on the console.	True, False
	APM Info	To display APM counter number on the console.	True, False
	Pipeline Info	To display pipeline info on console.	True, False
	Exit	It indicates to the application that the configuration is over.

4 Appendix B - HDMI-Rx/Tx Link-up and GStreamer Commands

This section covers configuration of HDMI-Rx using media-ctl utility and HDMI-Tx using modetest utility, along with demonstrating HDMI-Rx/Tx link-up issues and steps to switch HDMI-Rx resolution. It also contains sample GStreamer Low-Latency NV12 and Xilinx’s Ultra Low-Latency NV12 Audio+Video pipelines for Display, Stream-In and Stream-Out use-cases.

The HDMI source can be locked to any resolution. Run the below command for all media nodes to print the media device topology, where mediaX represents the different media nodes. In the topology log, look for the v_hdmi_rx_ss string to identify the HDMI input source media node.

To check the link status, resolution and video node of the HDMI input source, run below media-ctl command where mediaX indicates media node for the HDMI input source.

When HDMI source is connected to 4kp60 resolution, it shows:

When the HDMI source is not connected, it shows:

Notes for gst-launch-1.0 commands:

Video node for HDMI-Rx source can be checked using media-ctl command. Run below media-ctl command to check video node for HDMI-Rx source where media0 indicates media node for HDMI input source.

Make sure HDMI-Rx media pipeline is configured for 4kp60 resolution and source/sink have the same color format for connected nodes. For NV12 format, run below media-ctl commands to set resolution and format of HDMI scaler node where media0 indicates media node for HDMI input source.
When HDMI Input Source is NVIDIA SHIELD

Follow the below steps to switch the HDMI-Rx resolution from 1080p60 to 4kp60.
- Check current HDMI input source resolution (1080p60) by following the steps mentioned earlier to check HDMI resolution using media-ctl command
- Run vcu_gst_app for current HDMI resolution (1080p60) by executing the following command.

The configurations below need to be set in input.cfg for the LLP1 HDMI-1080p60 use-case.

Change Resolution of HDMI Input Source from 1080p60 to 4kp60 by following below steps.
- Set the HDMI source resolution to 4kp60 (Homepage → Settings → Display & Sound → Resolution → change to 4kp60).
- Save the configuration for the change to take effect.
Verify the desired HDMI Input Source Resolution (4kp60) by following the above-mentioned steps.
If HDMI-Tx link-up issue is observed after Linux booting, use the following command to get the blue screen on HDMI-Tx for 4kp60.

The table below lists the parameters of the pixel format.

Pixel Format	GStreamer Format	Media Bus Format	GStreamer HEVC Profile	GStreamer AVC Profile	Kmssink Plane-id
NV12	NV12	VYYUYY8_1X24	main	high	34 and 35

Run the following gst-launch-1.0 command to display NV12 video on HDMI-Tx using low-latency(LLP1) GStreamer pipeline.

Run the following gst-launch-1.0 command to display NV12 video on HDMI-Tx using Xilinx's ultra low-latency (LLP2) GStreamer pipeline.

Run the following gst-launch-1.0 command to capture and play Xilinx's ultra low-latency(LLP2) HDMI video and raw HDMI audio using the GStreamer pipeline.

Run the following gst-launch-1.0 command to stream out NV12 video using the low-latency (LLP1) GStreamer pipeline, where 192.168.25.89 is the host/client IP address and 5004 is the port number.

Run the following gst-launch-1.0 command to display stream-in NV12 video on HDMI-Tx using the low-latency (LLP1) GStreamer pipeline, where 5004 is the port number.

Run the following gst-launch-1.0 command to stream out NV12 video using Xilinx's ultra low-latency (LLP2) GStreamer pipeline, where 192.168.25.89 is the host/client IP address and 5004 is the port number.

Run the following gst-launch-1.0 command to display NV12 stream-in video on HDMI-Tx using Xilinx's ultra low-latency (LLP2) GStreamer pipeline, where 5004 is the port number.

Run the following gst-launch-1.0 command to capture, encode and stream out Xilinx's ultra low-latency (LLP2) HDMI NV12 video and opus-encoded HDMI audio using the GStreamer pipeline. It enables transmission and reception of both RTP and RTCP packets. RTP packets are sent on ports 5004 (video) and 5008 (audio). Sender RTCP packets are sent on ports 5005 (video) and 5009 (audio). Receiver RTCP packets are received on ports 5006 (video) and 5010 (audio). Here, 192.168.25.89 is the client's IP address.

Run the following gst-launch-1.0 command to stream in, decode and play Xilinx's ultra low-latency (LLP2) HDMI NV12 video and opus-encoded HDMI audio using the GStreamer pipeline. It enables transmission and reception of both RTP and RTCP packets. RTP packets are received on ports 5004 (video) and 5008 (audio). Sender RTCP packets are received on ports 5005 (video) and 5009 (audio). Receiver RTCP packets are sent on ports 5006 (video) and 5010 (audio). Here, 192.168.25.90 is the server's IP address.
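The RTP/RTCP port numbers used by the two pipelines above follow a fixed offset pattern from a base port (Port, Port+1, Port+2 for video and Port+4, Port+5, Port+6 for audio, as listed in Appendix A). The sketch below only reproduces that arithmetic; the function and key names are hypothetical and do not correspond to any vcu_gst_app or GStreamer API.

```python
def rtp_rtcp_ports(base_port: int) -> dict:
    """Derive the audio/video RTP and RTCP ports from one base port."""
    return {
        "video_rtp": base_port,              # video RTP data
        "video_rtcp_sender": base_port + 1,  # RTCP sent by the stream-out side
        "video_rtcp_receiver": base_port + 2,  # RTCP sent by the stream-in side
        "audio_rtp": base_port + 4,          # audio RTP data
        "audio_rtcp_sender": base_port + 5,
        "audio_rtcp_receiver": base_port + 6,
    }


if __name__ == "__main__":
    for name, port in rtp_rtcp_ports(5004).items():
        print(f"{name}: {port}")
```

Because one stream occupies offsets 0 to 6, the base ports listed in Appendix A (5004, 5008, 5012, 5016) are spaced for video-only streams; an audio+video stream at base 5004 also uses 5008 to 5010.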

Run the following gst-launch-1.0 command to capture, encode and stream-out Xilinx's ultra low-latency(LLP2) HDMI NV12 video and opus encoded HDMI audio using the GStreamer pipeline. It enables transmission of only RTP packets. RTP packets are sent on ports 5004(for video) and 5005(for audio).

Run the following gst-launch-1.0 command to stream-in, decode and play Xilinx's ultra low-latency(LLP2) HDMI NV12 video and opus encoded HDMI audio using the GStreamer pipeline. It enables reception of only RTP packets. RTP packets are received on ports 5004(for video) and 5005(for audio).

5 References

For details on all LogiCORE IPs used in this design module, refer to the LogiCORE IP product guides.
