Zynq UltraScale+ MPSoC VCU TRD 2021.1 - Xilinx Low Latency PS DDR NV12 HDMI Audio Video Capture and Display

This page provides all the information related to Design Module 7, the VCU TRD Xilinx low-latency (LLP2) PS DDR NV12 HDMI design.

1 Overview

This module enables capture of video and audio data from an HDMI-Rx subsystem implemented in the PL. The video and audio data can be displayed through the HDMI-Tx subsystem implemented in the PL. The module can stream out and stream in live captured video and audio through an Ethernet interface at ultra-low latencies using the Sync IP. For the NV12 pixel format, it supports up to four video streams, using an AXI broadcaster on the capture side and a mixer on the display side. It also supports single-stream audio.

The VCU encoder and decoder operate in slice mode. An input frame is divided horizontally into multiple slices (8 or 16). The encoder generates a slice_done interrupt at the end of every slice, so the generated NAL unit data can be passed to a downstream element immediately, without waiting for the frame_done interrupt. The VCU decoder likewise starts processing as soon as one slice of data is ready in its circular buffer instead of waiting for the complete frame. The Sync IP tracks AXI transactions so that the producer and consumer are synchronized at the granularity of AXI transactions rather than whole video buffers. The Sync IP is responsible for synchronizing buffers between the capture DMA and the VCU encoder, because both work on the same buffer.
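To give a feel for why slice mode reduces latency, the following sketch estimates the average interval between slice_done events. It assumes slices are spread evenly over the frame period and that the encoder keeps pace with capture; the function name `slice_interval_ms` is hypothetical and real timings depend on content and hardware.

```python
def slice_interval_ms(frame_rate_hz: float, slices_per_frame: int) -> float:
    """Average time between slice completions, assuming evenly spaced slices."""
    if frame_rate_hz <= 0 or slices_per_frame <= 0:
        raise ValueError("frame rate and slice count must be positive")
    frame_period_ms = 1000.0 / frame_rate_hz
    return frame_period_ms / slices_per_frame


# Compare the two slice counts mentioned in the text at 60 fps.
for slices in (8, 16):
    print(f"60 fps, {slices} slices: {slice_interval_ms(60, slices):.2f} ms per slice")
```

Under these assumptions a downstream element can start working on data roughly every 2 ms (8 slices) or 1 ms (16 slices) instead of once per 16.67 ms frame.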

The capture element (FB write DMA) writes video buffers in raster-scan order. The Sync IP monitors the buffer level while the capture element is writing into DRAM. It allows the encoder to read input buffer data if the requested data has already been written by the DMA; otherwise it blocks the encoder until the DMA completes its writes. On the decoder side, the VCU decoder writes decoded video buffer data into DRAM in block-raster scan order, and the display reads the data in raster-scan order. To avoid display under-run problems, software ensures a phase difference of "~frame_period/2" so that the decoder stays ahead of the display.
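The gating rule described above can be pictured with a small software analogy. This is only a toy model: the real Sync IP tracks AXI transactions in hardware, and the class `SyncChannel`, its fields, and `display_phase_offset_ms` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class SyncChannel:
    """Toy model of one producer/consumer pair sharing a frame buffer."""
    buffer_size: int        # bytes in the shared frame buffer
    bytes_written: int = 0  # how far the capture DMA has written (raster order)

    def producer_write(self, nbytes: int) -> None:
        """Advance the producer's write position, clamped to the buffer size."""
        self.bytes_written = min(self.buffer_size, self.bytes_written + nbytes)

    def consumer_may_read(self, offset: int, length: int) -> bool:
        """True if the requested range is already written; otherwise the read would stall."""
        return offset + length <= self.bytes_written


def display_phase_offset_ms(frame_rate_hz: float) -> float:
    """Approximate decoder-to-display phase difference of frame_period / 2."""
    return (1000.0 / frame_rate_hz) / 2.0


ch = SyncChannel(buffer_size=4096)
ch.producer_write(1024)
print(ch.consumer_may_read(0, 512))           # range already written
print(ch.consumer_may_read(768, 512))         # range extends past written data
print(round(display_phase_offset_ms(60), 2))  # about half of a 16.67 ms frame
```

At 60 fps the half-frame phase difference works out to roughly 8.33 ms.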

This design supports the following video interfaces:

Sources:

  • HDMI-Rx capture pipeline implemented in the PL.

  • Stream-In from network or internet.

Sinks:

  • HDMI-Tx display pipeline implemented in the PL.

VCU Codec:

  • Video Encode/Decode capability using VCU hard block in PL

    • AVC/HEVC encoding

    • Encoder/decoder parameter configuration.

Video format:

  • NV12

Supported Resolution:

The table below lists the resolutions supported by the command-line application in this design.

Resolution | Command Line: Single Stream | Command Line: Multi-stream
4kp60 | √ | NA
4kp30 | √ | √ (Max 2)
1080p60 | √ | √ (Max 4 for encoder) (Max 2 for decoder)

√ - Supported
NA - Not applicable
x - Not supported

When using low-latency mode (LLP1/LLP2), the encoder and decoder are limited by the number of internal cores: the encoder supports a maximum of four streams and the decoder a maximum of two streams.
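The stream limits stated above can be captured in a small lookup, for example to sanity-check a requested configuration before launching a pipeline. This is a sketch only: the names `MAX_STREAMS` and `is_supported` are hypothetical, and the per-resolution values are transcribed from the resolution table and the LLP1/LLP2 note, with 4kp60 treated as single-stream only.

```python
# Maximum stream counts per resolution, transcribed from the text above.
# Assumption: "Max 2" for 4kp30 applies to both encoder and decoder.
MAX_STREAMS = {
    "4kp60": {"encoder": 1, "decoder": 1},
    "4kp30": {"encoder": 2, "decoder": 2},
    "1080p60": {"encoder": 4, "decoder": 2},
}


def is_supported(resolution: str, num_streams: int, role: str) -> bool:
    """Return True if num_streams is within the limit for role ('encoder' or 'decoder')."""
    limits = MAX_STREAMS.get(resolution)
    if limits is None or role not in limits:
        return False
    return 1 <= num_streams <= limits[role]


print(is_supported("1080p60", 4, "encoder"))  # within the encoder limit
print(is_supported("1080p60", 4, "decoder"))  # exceeds the decoder limit
print(is_supported("4kp60", 2, "encoder"))    # 4kp60 is single-stream only
```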

The table below lists the features supported in this design.

Pipeline | Video Input Source | Audio Input Source | Video Format | Video Output Type | Audio Output Type | Resolution | VCU Codec
Serial pipeline | HDMI-Rx | HDMI-Rx | NV12 | HDMI-Tx | HDMI-Tx | 4kp60/4kp30/1080p60 | HEVC/AVC
Stream-Out pipeline | HDMI-Rx | HDMI-Rx | NV12 | Stream-Out | Stream-Out | 4kp60/4kp30/1080p60 | HEVC/AVC
Stream-In pipeline | Stream-In | Stream-In | NV12 | HDMI-Tx | HDMI-Tx | 4kp60/4kp30/1080p60 | HEVC/AVC

The below figure shows the Xilinx Low Latency PS DDR NV12 HDMI design hardware block diagram.

The below figure shows the Xilinx Low Latency PS DDR NV12 HDMI design software block diagram.

1.1 Board Setup

Refer to the link below for board setup.

1.2 Run Flow

The TRD package is released with the source code, Vivado project, PetaLinux BSP, and SD card image that enable the user to run the demonstration. It also includes the binaries necessary to configure and boot the ZCU106 board. Before running the steps described on this wiki page, download the TRD package and extract its contents to a directory referred to as TRD_HOME, which is the home directory.

TRD package contents are placed in the following directory structure. Copy all the files from $TRD_HOME/images/vcu_llp2_hdmi_nv12/ to the FAT32-formatted SD card.

rdf0428-zcu106-vcu-trd-2021-1/
├── apu
│   └── vcu_petalinux_bsp
├── images
│   ├── vcu_10g
│   ├── vcu_audio
│   ├── vcu_llp2_hdmi_nv12
│   ├── vcu_llp2_hdmi_nv16
│   ├── vcu_llp2_hdmi_xv20
│   ├── vcu_llp2_sdi_xv20
│   ├── vcu_multistream_nv12
│   ├── vcu_pcie
│   ├── vcu_plddrv1_hdr10_hdmi
│   ├── vcu_plddrv2_hdr10_hdmi
│   └── vcu_sdi_xv20
├── pcie_host_package
│   ├── COPYING
│   ├── include
│   ├── LICENSE
│   ├── readme.txt
│   ├── RELEASE
│   ├── tests
│   ├── tools
│   └── xdma
├── pl
│   ├── constrs
│   ├── designs
│   ├── prebuild
│   ├── README.md
│   └── srcs
├── README.txt
└── zcu106_vcu_trd_sources_and_licenses.tar.gz

24 directories, 7 files

TRD package contents specific to Xilinx Low Latency PS DDR NV12 HDMI design are placed in the following directory structure.

rdf0428-zcu106-vcu-trd-2021-1
├── apu
│   └── vcu_petalinux_bsp
│       └── xilinx-vcu-zcu106-v2021.1-final.bsp
├── images
│   └── vcu_llp2_hdmi_nv12
│       ├── autostart.sh
│       ├── BOOT.BIN
│       ├── boot.scr
│       ├── config
│       ├── Image
│       ├── rootfs.cpio.gz.u-boot
│       ├── system.dtb
│       └── vcu
├── pcie_host_package
├── pl
│   ├── constrs
│   ├── designs
│   │   └── zcu106_llp2_audio_nv12
│   ├── prebuild
│   │   └── zcu106_llp2_audio_nv12
│   ├── README.md
│   └── srcs
│       ├── hdl
│       └── ip
├── README.txt
└── zcu106_vcu_trd_sources_and_licenses.tar.gz

Configuration files (input.cfg) for the various resolutions are placed in the following directory structure in /media/card.

  • The single-stream configurations (1-1080p60, 1-4kp30 and 1-4kp60) support both audio and video.

  • Because LLP2 stream-in is not supported by vcu_gst_app, sample shell scripts containing the relevant GStreamer commands are provided for all Stream-in use-cases. Users can modify the scripts as needed or directly use the GStreamer pipelines provided on this wiki page.

config/
├── 1-1080p60
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 1-4kp30
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 1-4kp60
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 2-1080p60
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 2-4kp30
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 4-1080p60
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
└── input.cfg

24 directories, 1 file
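The directory names in this tree follow a <streams>-<resolution>/<use-case> pattern under /media/card/config. As a hedged illustration, a helper like the one below could build and validate such a path; the function name `config_dir` is hypothetical, and the names of the files inside each use-case directory are not listed in this page, so the sketch stops at the directory level.

```python
from pathlib import PurePosixPath

CONFIG_ROOT = PurePosixPath("/media/card/config")

# Directory names taken from the tree above.
STREAM_SETS = {"1-1080p60", "1-4kp30", "1-4kp60", "2-1080p60", "2-4kp30", "4-1080p60"}
USE_CASES = {"Display", "Stream-in", "Stream-out"}


def config_dir(num_streams: int, resolution: str, use_case: str) -> PurePosixPath:
    """Return the config directory for a stream count, resolution and use-case."""
    name = f"{num_streams}-{resolution}"
    if name not in STREAM_SETS:
        raise ValueError(f"no configuration directory named {name!r}")
    if use_case not in USE_CASES:
        raise ValueError(f"unknown use-case {use_case!r}")
    return CONFIG_ROOT / name / use_case


print(config_dir(2, "4kp30", "Display"))
```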

1.2.1 GStreamer Application (vcu_gst_app)

The vcu_gst_app is a command-line multi-threaded Linux application. The command-line application requires an input configuration file (input.cfg) to be provided in plain text.

Run below modetest command to set CRTC configurations for 4kp60:

Run below modetest command to set CRTC configurations for 4kp30:

Execution of the application is shown below:

Example:

  • Make sure HDMI-Rx is configured for 4kp60 mode before running the example pipelines below.

  • Low latency(LLP1/LLP2) video and audio+video stream-in pipelines are not supported in vcu_gst_app.

  • The vcu_gst_app uses RTP+RTCP streaming and opus encoder for LLP1/LLP2 audio+video stream-out use-cases.

  • All single-stream serial/streaming pipelines have audio configuration ON by default. To execute only display pipeline, change the Audio Enable property to FALSE in the configuration file.

4kp60 NV12 HEVC_25Mbps ultra low-latency(LLP2) audio+video serial pipeline execution.

4kp60 NV12 HEVC_25Mbps ultra low-latency(LLP2) stream-out pipeline execution.

4kp60 NV12 HEVC ultra-low-latency(LLP2) video stream-in pipeline execution.

OR

4kp60 NV12 HEVC ultra-low-latency(LLP2) audio+video stream-in pipeline execution. where 192.168.25.90 is the server’s IP address.

OR

For LLP1/LLP2 multi-stream HEVC serial and stream-out use-cases (2-4kp30, 2-1080p60, 4-1080p60), set the ENC_EXTRA_OP_BUFFERS=10 variable before the vcu_gst_app command. The sample pipeline is given below:
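As an illustration only, and not the TRD's own sample command, the variable can also be set when launching the application from a script. The sketch below assumes that ENC_EXTRA_OP_BUFFERS is read from the process environment and that vcu_gst_app takes the configuration file path as its single argument; both are assumptions, as are the helper names `build_launch` and `launch` and the example path.

```python
import os
import shutil
import subprocess


def build_launch(config_path: str, extra_op_buffers: int = 10):
    """Build the command and environment for a multi-stream LLP run (assumed CLI form)."""
    env = dict(os.environ)
    # Assumption: the application picks this value up from its environment.
    env["ENC_EXTRA_OP_BUFFERS"] = str(extra_op_buffers)
    return ["vcu_gst_app", config_path], env


def launch(config_path: str):
    """Run vcu_gst_app if it is on PATH; otherwise report and return None."""
    cmd, env = build_launch(config_path)
    if shutil.which(cmd[0]) is None:
        print("vcu_gst_app not found; run this on the target board")
        return None
    return subprocess.run(cmd, env=env, check=False).returncode


if __name__ == "__main__":
    # Hypothetical path: substitute the configuration file for your use-case.
    launch("/media/card/config/input.cfg")
```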

To measure the latency of the pipeline, run the below command. The latency data is huge, so dump it to a file.

Refer to the link below for detailed run flow steps.

1.3 Build Flow

Refer to the link below for detailed build flow steps.


2 Other Information

2.1 Known Issues

2.2 Limitations

2.3 Optimum VCU Encoder parameters for use-cases

Video streaming:

  • The video streaming use-case requires a very stable bitrate graph for all pictures.

  • It is good to avoid periodic large Intra pictures during the encoding session.

  • Low-latency rate control (hardware RC) is the preferred control-rate for video streaming because it tries to keep the frame sizes of all pictures roughly equal.

  • Avoid periodic Intra frames; use low-delay-p (IPPPPP…) instead.

  • VBR is not a preferred mode for streaming.

Performance: AVC Encoder settings:

  • Use only 8 slices for better AVC encoder performance.

  • The AVC standard does not support tile-mode processing, so MB rows are processed sequentially for entropy coding.

Quality: Low bitrate AVC encoding:

  • Enable profile=high and use qp-mode=auto for low-bitrate encoding use-cases.

  • The high profile enables 8x8 transform which results in better video quality at low bitrates.


3 Appendix A - Input Configuration File (input.cfg)

The example configuration files are stored in the /media/card/config/ folder.

Configuration Type | Configuration Name | Description | Available Options | NOTE
Common | Common Configuration | It is the starting point of common configuration | |
Common | Num of Input | Number of input | 1, 2, 3, 4 |
Common | Output | Select the video interface. | HDMI |
Common | Out Type | Type of output | display, stream |
Common | Display Rate | Pipeline frame rate | 30, 60 |
Common | Exit | It indicates to the application that the configuration is over | |
Input | Input Configuration | It is the starting point of the input configuration | |
Input | Input Num | Starting Nth input configuration | 1, 2, 3, 4 |
Input | Input Type | Input source type | HDMI |
Input | Raw | To tell the pipeline is processed or pass-through | FALSE | Raw use-case is not supported for LLP2 use-case. It is supported for non-LLP2 use-case.
Input | Width | The width of the live source | 3840, 1920 |
Input | Height | The height of the live source | 2160, 1080 |
Input | Format | The format of input data | NV12 |
Input | Enable LLP2 | To enable LLP2 configuration. | TRUE, FALSE | Set Enable LLP2 equals to False for non-LLP2 use-case.
Input | Exit | It indicates to the application that the configuration is over | |
Encoder | Encoder Configuration | It is the starting point of encoder configuration | |
Encoder | Encoder Num | Starting Nth encoder configuration | 1, 2, 3, 4 |
Encoder | Encoder Name | Name of the encoder | AVC, HEVC |
Encoder | Profile | Name of the profile | high for AVC, main for HEVC. |
Encoder | Rate Control | Rate control options | Low_Latency |
Encoder | Filler Data | Filler Data NAL units for CBR rate control | False |
Encoder | QP | QP control mode used by the VCU encoder | Uniform, Auto |
Encoder | L2 Cache | Enable or Disable L2Cache buffer in encoding process. | True, False |
Encoder | Latency Mode | Encoder latency mode. | sub_frame |
Encoder | Low Bandwidth | If enabled, decrease the vertical search range used for P-frame motion estimation to reduce the bandwidth. | True, False |
Encoder | Gop Mode | Group of Pictures mode. | Basic, low_delay_p, low_delay_b |
Encoder | Bitrate | Target bitrate in Kbps | 1-25000 |
Encoder | B Frames | Number of B-frames between two consecutive P-frames | 0 |
Encoder | Slice | The number of slices produced for each frame. Each slice contains one or more complete macroblock/CTU row(s). Slices are distributed over the frame as regularly as possible. If slice-size is defined as well more slices may be produced to fit the slice-size requirement. | 4-22 (4K resolution with HEVC codec); 4-32 (4K resolution with AVC codec); 4-32 (1080p resolution with HEVC codec); 4-32 (1080p resolution with AVC codec) | The recommended slice for LLP2 use-case is 8.
Encoder | GoP Length | The distance between two consecutive I frames | 1-1000 |
Encoder | GDR Mode | It specifies which Gradual Decoder Refresh(GDR) scheme should be used when gop-mode = low_delay_p | Horizontal/Vertical/Disabled | GDR mode is currently supported with LLP1/LLP2 low-delay-p use-cases only
Encoder | Entropy Mode | It specifies the entropy mode for H.264 (AVC) encoding process | CAVLC/CABAC/Default |
Encoder | Max Picture Size | It is used to curtail instantaneous peak in the bit-stream using this parameter. It works in CBR/VBR rate-control only. When it is enabled, max-picture-size value is calculated and set with 10% of AllowedPeakMargin. i.e. max-picture-size = (TargetBitrate / FrameRate) * 1.1 | TRUE/FALSE |
Encoder | Preset | | Custom |
Encoder | Exit | It indicates to the application that the configuration is over | |
Streaming | Streaming Configuration | It is the starting point of streaming configuration. | |
Streaming | Streaming Num | Starting Nth Streaming configuration | 1, 2, 3, 4 |
Streaming | Host IP | The host to send the packets to | 192.168.25.89 or Windows PC IP |
Streaming | Port | The port to send the packets to. In case of LLP1/LLP2 audio+video rtp stream-out pipelines, rtp/rtcp audio and video port numbers are assigned in vcu_gst_app in the pattern listed after this table. | 5004, 5008, 5012, 5016 |
Streaming | Exit | It indicates to the application that the configuration is over. | |
Audio Configuration | Audio Configuration | It is the starting point of the audio configuration. | |
Audio Configuration | Audio Enable | Enable or Disable audio in pipeline. | True, False |
Audio Configuration | Audio Format | The format of the audio | S24_32LE (for serial use-cases), S16LE (for stream-out use-cases) |
Audio Configuration | Sampling Rate | To set the audio sampling rate. | 48000 |
Audio Configuration | Num Of Channel | The number of audio channels. | 2 |
Audio Configuration | Source | It should be HDMI, as currently only HDMI audio capture is supported. | |
Audio Configuration | Renderer | It should be HDMI, as currently only HDMI audio renderer is supported. | |
Audio Configuration | Exit | It indicates to the application that the configuration is over. | |
Trace | Trace Configuration | It is the starting point of trace configuration. | |
Trace | FPS Info | To display fps info on the console. | True, False |
Trace | APM Info | To display APM counter number on the console. | True, False |
Trace | Pipeline Info | To display pipeline info on console. | True, False |
Trace | Exit | It indicates to the application that the configuration is over. | |

RTP/RTCP port assignment in vcu_gst_app for LLP1/LLP2 audio+video rtp stream-out pipelines:

RTP/RTCP Port Type | Port Value | e.g. Port Values, if Port = 5004
Actual video data rtp port | Port | 5004
Tx Video rtcp packets port | Port+1 | 5005
Rx Video rtcp packets port | Port+2 | 5006
Actual Audio data rtp port | Port+4 | 5008
Tx Audio rtcp packets port | Port+5 | 5009
Rx Audio rtcp packets port | Port+6 | 5010

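The Max Picture Size entry in the encoder configuration above gives the formula max-picture-size = (TargetBitrate / FrameRate) * 1.1. The sketch below works through that arithmetic. It assumes the bitrate is in Kbps, as the Bitrate entry states, so the result is in kbits per picture; the function and parameter names are hypothetical.

```python
def max_picture_size_kbits(target_bitrate_kbps: float,
                           frame_rate: float,
                           allowed_peak_margin: float = 0.10) -> float:
    """Average picture size plus the allowed peak margin (10% by default)."""
    if target_bitrate_kbps <= 0 or frame_rate <= 0:
        raise ValueError("bitrate and frame rate must be positive")
    return (target_bitrate_kbps / frame_rate) * (1.0 + allowed_peak_margin)


# 25000 Kbps (the top of the documented Bitrate range) at 60 fps.
print(round(max_picture_size_kbits(25000, 60), 1))
```

For 25000 Kbps at 60 fps, the average picture is about 416.7 kbits, and the 10% margin raises the cap to about 458.3 kbits.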

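The RTP/RTCP port pattern described for the Port option above can be expressed as a simple offset mapping. This is a minimal sketch: the function name `rtp_rtcp_ports` and the dictionary keys are hypothetical, while the offsets come directly from the port pattern in this appendix.

```python
def rtp_rtcp_ports(base_port: int) -> dict:
    """Derive the RTP/RTCP ports from the configured Port value."""
    return {
        "video_rtp": base_port,          # actual video data
        "video_rtcp_tx": base_port + 1,  # Tx video RTCP packets
        "video_rtcp_rx": base_port + 2,  # Rx video RTCP packets
        "audio_rtp": base_port + 4,      # actual audio data
        "audio_rtcp_tx": base_port + 5,  # Tx audio RTCP packets
        "audio_rtcp_rx": base_port + 6,  # Rx audio RTCP packets
    }


for name, port in rtp_rtcp_ports(5004).items():
    print(f"{name}: {port}")
```

Note that the audio offsets reach into the range of the next documented Port value (5008). Since only the single-stream configurations carry audio in this design, this should not cause a clash, but it is worth keeping in mind when choosing custom port values.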
4 Appendix B - HDMI-Rx/Tx Link-up and GStreamer Commands

This section covers configuration of HDMI-Rx using media-ctl utility and HDMI-Tx using modetest utility, along with demonstrating HDMI-Rx/Tx link-up issues and steps to switch HDMI-Rx resolution. It also contains sample GStreamer Low-Latency NV12 and Xilinx’s Ultra Low-Latency NV12 Audio+Video pipelines for Display, Strea