Zynq UltraScale+ MPSoC VCU TRD 2020.2 - Xilinx Low Latency PL DDR XV20 SDI Video Capture and Display

This page provides detailed information related to Design Module 10 - Xilinx Low Latency (LLP2) PL DDR XV20 SDI Video Capture and Display.


1 Overview

This module captures video from an SDI-Rx subsystem implemented in the PL and can display it through the SDI-Tx subsystem, also implemented in the PL. The module can also stream live captured video frames out and in over an Ethernet interface at ultra-low latency using the Sync IP. This module supports multi-stream for the XV20 pixel format. In this design, PL_DDR is used for decoding and PS_DDR for encoding, so that there is enough DDR bandwidth for high-bandwidth VCU applications that require simultaneous encoder and decoder operation, including transcoding at 4Kp60.

The VCU encoder and decoder operate in slice mode. An input frame is divided horizontally into multiple slices (8 or 16). The encoder generates a slice_done interrupt at the end of every slice, so the generated NAL unit data can be passed to a downstream element immediately, without waiting for the frame_done interrupt. The VCU decoder likewise starts processing data as soon as one slice of data is ready in its circular buffer instead of waiting for the complete frame. The Sync IP tracks AXI transactions so that the producer and consumer are synchronized at the granularity of AXI transactions rather than whole video buffers. The Sync IP is responsible for synchronizing buffers between the capture DMA and the VCU encoder, as both work on the same buffer.
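To get a feel for why slice-level hand-off matters, the arithmetic below estimates how often a slice would complete. It is only a rough sketch: it assumes slices are equal in size and evenly spaced across the frame period, and the helper name slice_interval_us is made up for this illustration.

```shell
#!/bin/sh
# Rough estimate, assuming equally sized slices spread evenly over one frame period.
# slice_interval_us is a hypothetical helper, not part of the TRD.
slice_interval_us() {
    fps=$1
    slices=$2
    # Frame period in microseconds, divided by the slice count (integer math).
    echo $(( 1000000 / fps / slices ))
}

echo "60 fps, 8 slices:  $(slice_interval_us 60 8) us per slice"
echo "60 fps, 16 slices: $(slice_interval_us 60 16) us per slice"
```

Under these assumptions, with 8 slices at 60 fps a downstream element could start work roughly every 2 ms instead of waiting about 16.7 ms for frame_done.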

The capture element (FB write DMA) writes video buffers in raster-scan order. The Sync IP monitors the buffer level while the capture element is writing into DRAM. It allows the encoder to read input buffer data only if the requested data has already been written by the DMA; otherwise, it blocks the encoder until the DMA completes its writes. On the decoder side, the VCU decoder writes decoded video buffer data into DRAM in block-raster scan order and the display reads the data in raster-scan order. To avoid display under-run problems, the software maintains a phase difference of about frame_period/2, so that the decoder stays ahead of the display.
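The gating rule and the half-frame phase offset described above can be modelled in a few lines. This is a conceptual sketch only: the real Sync IP tracks AXI transactions in hardware, and every name below is hypothetical. It also assumes the DMA fills the buffer linearly, which matches the raster-scan write order.

```shell
#!/bin/sh
# Conceptual model only; the real Sync IP does this in hardware.
# All function and variable names are hypothetical.

# Succeeds (exit status 0) when the bytes the encoder asks for have already
# been written by the capture DMA, assuming a linear raster-scan fill.
encoder_may_read() {
    written_bytes=$1   # bytes the capture DMA has written so far
    req_offset=$2      # start offset of the encoder read
    req_len=$3         # length of the encoder read
    [ $(( req_offset + req_len )) -le "$written_bytes" ]
}

# Half a frame period in microseconds: the approximate lead the decoder
# keeps over the display.
display_phase_offset_us() {
    echo $(( 1000000 / $1 / 2 ))
}

if encoder_may_read 4096 0 4096; then echo "read allowed"; fi
if ! encoder_may_read 4096 2048 4096; then echo "read blocked until the DMA catches up"; fi
echo "phase offset at 60 fps: $(display_phase_offset_us 60) us"
```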

This design supports the following video interfaces:

Sources:

  • SDI-Rx capture pipeline implemented in the PL.

  • Stream-In from network or internet.

Sinks:

  • SDI-Tx display pipeline implemented in the PL.

VCU Codec:

  • Video Encode/Decode capability using VCU hard block in PL.

    • AVC/HEVC encoding

    • Encoder/decoder parameter configuration.

Video format:

  • XV20

Supported Resolution:

The table below lists the resolutions supported by the command-line application in this design.

Resolution

Command Line

Single Stream

Multi-stream

4Kp60

X

4Kp30

X

1080p60

X

√ - Supported
X – Not supported

The table below lists the features supported in this design.

| Pipeline | Input source | Format | Output Type | Resolution | VCU Codec |
|---|---|---|---|---|---|
| Serial pipeline | SDI-Rx | XV20 | SDI-Tx | 4Kp60/4Kp30/1080p60 | HEVC/AVC |
| Stream-Out pipeline | SDI-Rx | XV20 | Stream-Out | 4Kp60/4Kp30/1080p60 | HEVC/AVC |
| Stream-In pipeline | Stream-In | XV20 | SDI-Tx | 4Kp60/4Kp30/1080p60 | HEVC/AVC |


The below figure shows the Xilinx Low-latency PL DDR SDI design hardware block diagram.

The below figure shows the Xilinx Low-latency PL DDR SDI design software block diagram.

1.1 Board Setup

Refer to the link below for board setup.

1.2 Run Flow

The TRD package is released with the source code, Vivado project, PetaLinux BSP, and an SD card image that enables the user to run the demonstration. It also includes the binaries necessary to configure and boot the ZCU106 board. Before following the steps on this wiki page, download the TRD package and extract its contents to a directory referred to as TRD_HOME, which is the home directory.

TRD package contents are placed in the following directory structure. Copy all the files from $TRD_HOME/images/vcu_llp2_sdi_xv20 to a FAT32-formatted SD card.

rdf0428-zcu106-vcu-trd-2020-2
├── apu
│   └── vcu_petalinux_bsp
├── images
│   ├── vcu_10g
│   ├── vcu_audio
│   ├── vcu_hdr10_hdmi
│   ├── vcu_llp2_hdmi_nv12
│   ├── vcu_llp2_hdmi_nv16
│   ├── vcu_llp2_hdmi_xv20
│   ├── vcu_llp2_sdi_xv20
│   ├── vcu_multistream_nv12
│   ├── vcu_pcie
│   ├── vcu_quad_sensor
│   └── vcu_sdi_xv20
├── pcie_host_package
│   ├── COPYING
│   ├── include
│   ├── LICENSE
│   ├── readme.txt
│   ├── RELEASE
│   ├── tests
│   ├── tools
│   └── xdma
├── pl
│   ├── constrs
│   ├── designs
│   ├── prebuild
│   ├── README.md
│   └── srcs
└── README.txt
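The copy step described above could be scripted roughly as follows. This is a minimal sketch: the helper name copy_trd_images and the /mnt/sdcard mount point are assumptions for illustration, so substitute the location where your FAT32 card is actually mounted.

```shell
#!/bin/sh
# Hypothetical helper: copies the contents of one design's image directory
# to a mounted SD card. The mount point is an assumption.
copy_trd_images() {
    src_dir=$1    # e.g. "$TRD_HOME/images/vcu_llp2_sdi_xv20"
    card_dir=$2   # e.g. /mnt/sdcard (assumed mount point)
    [ -d "$src_dir" ] || { echo "missing source: $src_dir" >&2; return 1; }
    [ -d "$card_dir" ] || { echo "missing target: $card_dir" >&2; return 1; }
    # "/." copies the directory contents, including sub-directories such as config/.
    cp -r "$src_dir"/. "$card_dir"/ && sync
}

# Example (not executed here):
#   copy_trd_images "$TRD_HOME/images/vcu_llp2_sdi_xv20" /mnt/sdcard
```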

TRD package contents specific to VCU Xilinx Low Latency PL DDR XV20 SDI design are placed in the following directory structure.

rdf0428-zcu106-vcu-trd-2020-2
├── apu
│   └── vcu_petalinux_bsp
│       └── xilinx-vcu-zcu106-v2020.2-final.bsp
├── images
│   ├── vcu_llp2_sdi_xv20
│   │   ├── autostart.sh
│   │   ├── BOOT.BIN
│   │   ├── boot.scr
│   │   ├── config
│   │   ├── image.ub
│   │   ├── system.dtb
│   │   └── vcu
├── pl
│   ├── constrs
│   ├── designs
│   │   ├── zcu106_picxo_llp2_sdi
│   ├── prebuild
│   │   ├── zcu106_picxo_llp2_sdi
│   └── srcs
│       ├── hdl
│       └── ip
└── README.txt

Configuration files (input.cfg) for various resolutions are placed in the following directory structure in /media/card.

Because LLP2 stream-in is not supported by vcu_gst_app, sample shell scripts containing the relevant GStreamer commands are provided for all Stream-in use cases. Users can modify the scripts as needed, or directly use the GStreamer pipelines provided on this wiki page.

config
├── 1080p60
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 4kp30
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
├── 4kp60
│   ├── Display
│   ├── Stream-in
│   └── Stream-out
└── input.cfg

12 directories, 1 file

1.2.1 GStreamer Application (vcu_gst_app)

The vcu_gst_app is a command-line, multi-threaded Linux application. It requires an input configuration file (input.cfg) in plain-text format.

Execution of the application is shown below:

$ vcu_gst_app <path to *.cfg file>

Example:

4kp60 XV20 HEVC_25Mbps Display Pipeline execution.

$ vcu_gst_app /media/card/config/4kp60/Display/Single_4kp60_HEVC_25Mbps.cfg

4kp60 XV20 HEVC_25Mbps low-delay-p Stream-out Pipeline execution.

$ vcu_gst_app /media/card/config/4kp60/Stream-out/Single_4kp60_HEVC_25Mbps.cfg

4kp60 XV20 HEVC_HIGH Stream-in Pipeline execution.

$ sh /media/card/config/4kp60/Stream-in/Single_4kp60_HEVC_25Mbps.sh

Make sure SDI-Rx is configured for 4Kp60 mode.

To measure the latency of the pipeline, run the command below. The latency log is large, so write it to a file.

$ GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency" GST_DEBUG_FILE=/run/latency.log vcu_gst_app /media/card/config/input.cfg
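Once the log has been written, it can be condensed into a few numbers. The sketch below is one possible way to do that, not part of the TRD. It assumes each pipeline latency sample is a log line containing " latency, " with a "time=(guint64)<nanoseconds>" field; inspect a few lines of your own log first, since the layout can differ between GStreamer versions. The helper name summarize_latency is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarises pipeline latency samples from a
# GStreamer latency-tracer log.
# Assumption: samples look like "... latency, ..., time=(guint64)<ns>, ...".
summarize_latency() {
    awk '
        / latency, / && match($0, /time=\(guint64\)[0-9]+/) {
            # Skip the 14-character "time=(guint64)" prefix to get the nanosecond value.
            ns = substr($0, RSTART + 14, RLENGTH - 14) + 0
            sum += ns; n++
            if (ns > max) max = ns
        }
        END {
            if (n) printf "%d samples, mean %.3f ms, max %.3f ms\n", n, sum / n / 1e6, max / 1e6
            else   print "no latency samples found"
        }
    ' "$1"
}

# Example (not executed here):
#   summarize_latency /run/latency.log
```

The leading space in the " latency, " pattern is deliberate: it skips per-element "element-latency" records, if present, so only end-to-end samples are averaged.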

Refer to the link below for the detailed run flow steps.

1.3 Build Flow

Refer to the link below for the detailed build flow steps.


2 Other Information

2.1 Known Issues

2.2 Limitations

2.3 Optimum VCU Encoder parameters for use-cases

Video streaming:

  • The video streaming use-case requires a very stable bitrate across all pictures.

  • Avoid periodic large Intra pictures during the encoding session.

  • Low-latency rate control (hardware RC) is the preferred control-rate for video streaming; it tries to keep frame sizes equal across all pictures.

  • Instead of periodic Intra frames, use low-delay-p (IPPPPP…).

  • VBR is not a preferred mode for streaming.

Performance: AVC Encoder settings:

  • Use 8 or more slices for better AVC encoder performance.

  • The AVC standard does not support tile-mode processing, so MB rows are processed sequentially for entropy coding.

Quality: Low bitrate AVC encoding:

  • Enable profile=high and use qp-mode=auto for low-bitrate encoding use-cases.

  • The high profile enables the 8x8 transform, which results in better video quality at low bitrates.
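The recommendations above can be collected into encoder fragments for gst-launch-1.0. The helper below is a hypothetical sketch that only combines settings named on this page. It assumes omxh264enc is the AVC counterpart of the omxh265enc element used in Appendix B and that the profile is selected through the output caps. Treat the result as a starting point to verify on the board.

```shell
#!/bin/sh
# Hypothetical helper: prints an encoder fragment for a gst-launch-1.0 pipeline.
# Assumptions: omxh264enc is the AVC encoder element, and profile=high is
# requested through the caps that follow the encoder.
encoder_fragment() {
    case $1 in
        hevc-streaming)
            # Stable bitrate, no periodic Intra frames.
            echo 'omxh265enc control-rate=low-latency gop-mode=low-delay-p num-slices=8' ;;
        avc-low-bitrate)
            # High profile plus automatic QP for quality at low bitrate; 8 slices for performance.
            echo 'omxh264enc qp-mode=auto num-slices=8 ! video/x-h264, profile=high' ;;
        *)
            echo "unknown use-case: $1" >&2
            return 1 ;;
    esac
}

encoder_fragment hevc-streaming
encoder_fragment avc-low-bitrate
```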


3 Appendix A - Input Configuration File (input.cfg)

The example configuration files are stored in the /media/card/config/ folder.

| Configuration Type | Configuration Name | Description | Available Options |
|---|---|---|---|
| Common | Common Configuration | It is the starting point of the common configuration. | |
| | Num of Input | Number of inputs | 1 |
| | Output | Select the video interface. | SDI |
| | Out Type | Type of output | display, stream |
| | Display Rate | Pipeline frame rate | 30, 60 |
| | Exit | It indicates to the application that the configuration is over. | |
| Input | Input Configuration | It is the starting point of the input configuration. | |
| | Input Num | Starting Nth input configuration | 1 |
| | Input Type | Input source type | SDI, Stream |
| | Width | The width of the live source | 3840, 1920 |
| | Height | The height of the live source | 2160, 1080 |
| | Format | The format of input data | XV20 |
| | Enable LLP2 | To enable the LLP2 use-case | True |
| | Exit | It indicates to the application that the configuration is over. | |
| Encoder | Encoder Configuration | It is the starting point of the encoder configuration. | |
| | Encoder Num | Starting Nth encoder configuration | 1 |
| | Encoder Name | Name of the encoder | AVC, HEVC |
| | Profile | Name of the profile | high for AVC and main for HEVC |
| | Rate Control | Rate control options | low-latency |
| | Filler Data | Filler Data NAL units for CBR rate control | False |
| | QP | QP control mode used by the VCU encoder | Uniform, Auto |
| | L2 Cache | Enable or disable the L2Cache buffer in the encoding process. | True, False |
| | Latency Mode | Encoder latency mode | sub_frame |
| | Low Bandwidth | If enabled, decreases the vertical search range used for P-frame motion estimation to reduce the bandwidth. | True, False |
| | Gop Mode | Group of Pictures mode | Basic, low_delay_p, low_delay_b |
| | Bitrate | Target bitrate in Kbps | 1-25000 |
| | B Frames | Number of B-frames between two consecutive P-frames | 0 |
| | Slice | The number of slices produced for each frame. Each slice contains one or more complete macroblock/CTU row(s). Slices are distributed over the frame as regularly as possible. If slice-size is defined as well, more slices may be produced to fit the slice-size requirement. The recommended slice value is 8 for the LLP2 use-case. | 4 to 22: 4Kp resolution with HEVC codec; 4 to 32: 4Kp resolution with AVC codec; 4 to 32: 1080p resolution with HEVC codec; 4 to 32: 1080p resolution with AVC codec |
| | GoP Length | The distance between two consecutive I frames | 1-1000 |
| | GDR Mode | It specifies which Gradual Decoder Refresh (GDR) scheme should be used when gop-mode = low_delay_p. GDR mode is currently supported with LLP1/LLP2 low-delay-p use-cases only. | Horizontal/Vertical/Disabled |
| | Entropy Mode | It specifies the entropy mode for the H.264 (AVC) encoding process. | CAVLC/CABAC/Default |
| | Max Picture Size | It is used to curtail instantaneous peaks in the bit-stream. It works in CBR/VBR rate-control only. When it is enabled, the max-picture-size value is calculated and set with an AllowedPeakMargin of 10%, i.e. max-picture-size = (TargetBitrate / FrameRate) * 1.1 | TRUE/FALSE |
| | Preset | Encoder configuration preset | Custom |
| | Exit | It indicates to the application that the configuration is over. | |
| Streaming | Streaming Configuration | It is the starting point of the streaming configuration. | |
| | Streaming Num | Starting Nth streaming configuration | 1 |
| | Host IP | The host to send the packets to | 192.168.25.89 or Windows PC IP |
| | Port | The port to send the packets to | 5004, 5008, 5012, 5016 |
| | Exit | It indicates to the application that the configuration is over. | |
| Trace | Trace Configuration | It is the starting point of the trace configuration. | |
| | FPS Info | To display fps info on the console | True, False |
| | APM Info | To display the APM counter number on the console | True, False |
| | Pipeline Info | To display pipeline info on the console | True, False |
| | Exit | It indicates to the application that the configuration is over. | |
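As a worked example of the Max Picture Size formula given in the table above, the sketch below evaluates (TargetBitrate / FrameRate) * 1.1 with integer arithmetic. The helper name is hypothetical, and the result is assumed to be in the same kilobit units as the Bitrate setting.

```shell
#!/bin/sh
# max-picture-size = (TargetBitrate / FrameRate) * 1.1, as stated in the table.
# Hypothetical helper; integer math, so the result is rounded down.
max_picture_size() {
    target_bitrate_kbps=$1
    frame_rate=$2
    # Multiply before dividing to keep precision: x * 1.1 == x * 11 / 10.
    echo $(( target_bitrate_kbps * 11 / (frame_rate * 10) ))
}

echo "25000 Kbps at 60 fps: $(max_picture_size 25000 60) Kbits per picture"
echo "25000 Kbps at 30 fps: $(max_picture_size 25000 30) Kbits per picture"
```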

 


4 Appendix B - SDI-Rx/Tx Link-up and GStreamer Commands

This section covers the configuration of SDI-Rx using the media-ctl utility and SDI-Tx using the modetest utility, along with SDI-Rx/Tx link-up checks and the steps to switch resolution. It also contains sample GStreamer SDI XV20 pipelines, both low-latency (LLP1) and Xilinx's ultra low-latency (LLP2), for the Display, Stream-In, and Stream-Out use-cases.

Run the command below to check the SDI link status, resolution, video node, and output format of the SDI input source. Run it for every media node to print the media device topology, where mediaX represents the different media nodes. In the topology log, look for the v_smpte_uhdsdi_rx_ss string to identify the SDI input source media node. The media-ctl utility generated as part of the PetaLinux BSP supports all the VCU-supported formats, such as NV12, NV16, XV15, and XV20.

$ media-ctl -p -d /dev/mediaX

Check the resolution and frame rate reported by dv.detect under the v_smpte_uhdsdi_rx_ss node.

  • When the SDI source is connected at 4Kp60 resolution, it shows:

root@zcu106_vcu_picxo_llp2_sdi:/media/card# media-ctl -p -d /dev/media0 -----> media node for SDI input source
Media controller API version 5.4.0

Media device information
------------------------
driver          xilinx-video
model           Xilinx Video Composite Device
serial
bus info
hw revision     0x0
driver version  5.4.0

Device topology
- entity 1: vcap_sdirxsdi_rx_input_v_smpte_ (1 pad, 1 link)
            type Node subtype V4L flags 0
            device node name /dev/video0 -----> Video node for SDI input source
        pad0: Sink
                <- "a0030000.v_smpte_uhdsdi_rx_ss":0 [ENABLED]

- entity 5: a0030000.v_smpte_uhdsdi_rx_ss (1 pad, 1 link)
            type V4L2 subdev subtype Unknown flags 0
            device node name /dev/v4l-subdev0
        pad0: Source
                [fmt:UYVY10_1X20/3840x2160@1000/60000 field:none colorspace:rec709 xfer:709 ycbcr:bt2020 quantization:lim-range]
                [dv.detect:BT.656/1120 3840x2160p60 (4400x2250) stds:CEA-861 flags:can-reduce-fps,CE-video,has-cea861-vic] ---> SDI-Rx link up
                -> "vcap_sdirxsdi_rx_input_v_smpte_":0 [ENABLED]

  • When the SDI source is not connected, it shows:

root@zcu106_vcu_picxo_llp2_sdi:/media/card# media-ctl -p -d /dev/media0 -----> media node for SDI input source
Media controller API version 5.4.0

Media device information
------------------------
driver          xilinx-video
model           Xilinx Video Composite Device
serial
bus info
hw revision     0x0
driver version  5.4.0

Device topology
- entity 1: vcap_sdirxsdi_rx_input_v_smpte_ (1 pad, 1 link)
            type Node subtype V4L flags 0
            device node name /dev/video0
        pad0: Sink
                <- "a0030000.v_smpte_uhdsdi_rx_ss":0 [ENABLED]

- entity 5: a0030000.v_smpte_uhdsdi_rx_ss (1 pad, 1 link)
            type V4L2 subdev subtype Unknown flags 0
            device node name /dev/v4l-subdev0
        pad0: Source
                [dv.query:no-lock]
                -> "vcap_sdirxsdi_rx_input_v_smpte_":0 [ENABLED] ----> link is not detected

Here, dv.query:no-lock under the v_smpte_uhdsdi_rx_ss node shows that the SDI-Rx source is not connected or is not active (try waking up the device by pressing a key on its remote).
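The two cases above can be told apart automatically by looking for the dv.query:no-lock and dv.detect strings. The following sketch is a hypothetical helper, not part of the TRD; it reads media-ctl -p output on standard input and assumes the dv.detect line has the layout shown in the 4Kp60 log above.

```shell
#!/bin/sh
# Hypothetical helper: reads "media-ctl -p" output on stdin and reports
# the SDI-Rx link state.
sdi_link_state() {
    topology=$(cat)
    case $topology in
        *dv.query:no-lock*)
            echo "no-lock" ;;
        *dv.detect:*)
            # Print the detected mode, e.g. 3840x2160p60, from the dv.detect line.
            printf '%s\n' "$topology" |
                sed -n 's/.*dv\.detect:[^ ]* \([0-9]*x[0-9]*[pi][0-9.]*\).*/\1/p' |
                head -n 1 ;;
        *)
            echo "unknown" ;;
    esac
}

# Example on the board (not executed here): scan every media node.
#   for node in /dev/media*; do
#       media-ctl -p -d "$node" | grep -q v_smpte_uhdsdi_rx_ss &&
#           echo "$node: $(media-ctl -p -d "$node" | sdi_link_state)"
#   done
```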

Modetest commands:

  • Modetest command for 4Kp60 Display

$ modetest -M xlnx -s 35:3840x2160-60@XV20 -w 35:sdi_mode:5 -w 35:sdi_data_stream:8 -w 35:is_frac:0

  • Modetest command for 4Kp30 Display

$ modetest -M xlnx -s 35:3840x2160-30@XV20 -w 35:sdi_mode:4 -w 35:sdi_data_stream:8 -w 35:is_frac:0

  • Modetest command for 1080p60 Display

$ modetest -M xlnx -s 35:1920x1080-60@XV20 -w 35:sdi_mode:2 -w 35:sdi_data_stream:2 -w 35:is_frac:0

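The three modetest commands differ only in the mode string, sdi_mode, and sdi_data_stream values, so they can be produced by one small helper. This is a sketch: the helper name is hypothetical, and the connector id 35 and property values are copied from the commands above rather than derived independently.

```shell
#!/bin/sh
# Hypothetical helper: prints the modetest command for one of the three
# modes listed above. Connector id 35 and the sdi_mode / sdi_data_stream
# values are copied from those commands.
sdi_tx_modetest_cmd() {
    case $1 in
        4kp60)   mode=3840x2160-60; sdi_mode=5; streams=8 ;;
        4kp30)   mode=3840x2160-30; sdi_mode=4; streams=8 ;;
        1080p60) mode=1920x1080-60; sdi_mode=2; streams=2 ;;
        *) echo "unsupported resolution: $1" >&2; return 1 ;;
    esac
    echo "modetest -M xlnx -s 35:${mode}@XV20 -w 35:sdi_mode:${sdi_mode} -w 35:sdi_data_stream:${streams} -w 35:is_frac:0"
}

sdi_tx_modetest_cmd 4kp60
```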
  • Follow the below steps to switch the SDI-Rx resolution from 1080p60 to 4Kp60.

    • Check current SDI Input Source Resolution (1080p60) by following the above-mentioned steps.

    • Run vcu_gst_app for current SDI resolution (1080p60) by executing the following command.

$ vcu_gst_app /media/card/config/input.cfg

The following configuration needs to be set in input.cfg for SDI 1080p60.

Common Configuration    : START
Num Of Input            : 1
Output                  : SDI
Out Type                : Display
Frame Rate              : 60
Exit

Input Configuration     : START
Input Num               : 1
Input Type              : SDI
Raw                     : TRUE
Width                   : 1920
Height                  : 1080
Format                  : XV20
Enable LLP2             : FALSE
Exit

  • Change the resolution of the SDI input source from 1080p60 to 4Kp60 by following the steps below.

    • Set the SDI source resolution to 4Kp60 (Homepage → Settings → Display & Sound → Resolution → change to 4Kp60).

    • Save the configuration for the change to take effect.

    • Verify the desired SDI input source resolution (4Kp60) by following the above-mentioned steps.

  • The table below lists the parameters of the pixel format.

After booting, you must run the modetest command for the resolution you want to validate.

| Pixel Format | GStreamer Format | Media Bus Format | GStreamer HEVC Profile | GStreamer AVC Profile | Kmssink Plane-id |
|---|---|---|---|---|---|
| XV20 | NV16_10LE32 | UYVY10_1X20 | main-422-10 | high-4:2:2 | 32 |

In each of the gst-launch pipelines below, video0 is the video node for the input source.

  • Run the following gst-launch-1.0 command to display XV20 video on SDI-Tx using the low-latency (LLP1) GStreamer serial pipeline (capture → encode → decode → display).

$ gst-launch-1.0 v4l2src io-mode=4 device=/dev/video0 ! video/x-raw, width=3840, height=2160, format=NV16_10LE32, framerate=60/1 ! omxh265enc num-slices=8 control-rate=low-latency gop-mode=low-delay-p target-bitrate=25000 cpb-size=500 gdr-mode=horizontal initial-delay=250 periodicity-idr=240 filler-data=0 prefetch-buffer=true ! video/x-h265, alignment=nal ! queue max-size-buffers=0 ! omxh265dec low-latency=1 ! video/x-raw ! queue max-size-bytes=0 ! fpsdisplaysink name=fpssink text-overlay=false video-sink="kmssink driver-name=xlnx max-lateness=5000000 show-preroll-frame=false sync=true" sync=true -v

  • Run the following gst-launch-1.0 command to display XV20 video on SDI-Tx using Xilinx's ultra low-latency (LLP2) GStreamer serial pipeline.

$ gst-launch-1.0 v4l2src io-mode=4 device=/dev/video0 ! video/x-raw\(memory:XLNXLL\), width=3840, height=2160, format=NV16_10LE32, framerate=60/1 ! omxh265enc num-slices=8 control-rate=low-latency gop-mode=low-delay-p target-bitrate=25000 cpb-size=500 gdr-mode=horizontal initial-delay=250 periodicity-idr=240 filler-data=0 prefetch-buffer=true ! video/x-h265, alignment=nal ! queue max-size-buffers=0 ! omxh265dec low-latency=1 ! video/x-raw\(memory:XLNXLL\) ! queue max-size-bytes=0 ! fpsdisplaysink name=fpssink text-overlay=false video-sink="kmssink driver-name=xlnx max-lateness=5000000 show-preroll-frame=false sync=true" sync=true -v

  • Run the following gst-launch-1.0 command to stream out XV20 video using the low-latency (LLP1) GStreamer pipeline, where 192.168.25.89 is the host/client IP address and 5004 is the port number.

$ gst-launch-1.0 v4l2src io-mode=4 device=/dev/video0 ! video/x-raw, width=3840, height=2160, format=NV16_10LE32, framerate=60/1 ! omxh265enc num-slices=8 periodicity-idr=240 cpb-size=500 gdr-mode=horizontal initial-delay=250 control-rate=low-latency prefetch-buffer=true target-bitrate=25000 gop-mode=low-delay-p ! video/x-h265, alignment=nal ! rtph265pay ! udpsink host=192.168.25.89 port=5004 buffer-size=60000000 max-bitrate=120000000 max-lateness=-1 qos-dscp=60 async=false

  • Run the following gst-launch-1.0 command to display XV20 stream-in video on SDI-Tx using the low-latency (LLP1) GStreamer pipeline, where 5004 is the port number.

$ gst-launch-1.0 udpsrc port=5004 buffer-size=60000000 caps="application/x-rtp, media=video, clock-rate=90000, payload=96, encoding-name=H265" ! rtpjitterbuffer latency=7 ! rtph265depay ! h265parse ! video/x-h265, alignment=nal ! omxh265dec low-latency=1 ! video/x-raw ! queue max-size-bytes=0 ! fpsdisplaysink name=fpssink text-overlay=false video-sink="kmssink driver-name=xlnx" sync=true -v

  • Run the following gst-launch-1.0 command to stream out XV20 video using Xilinx's ultra low-latency (LLP2) GStreamer pipeline, where 192.168.25.89 is the host/client IP address and 5004 is the port number.

$ gst-launch-1.0 v4l2src io-mode=4 device=/dev/video0 ! video/x-raw\(memory:XLNXLL\), width=3840, height=2160, format=NV16_10LE32, framerate=60/1 ! omxh265enc num-slices=8 periodicity-idr=240 cpb-size=500 gdr-mode=horizontal initial-delay=250 control-rate=low-latency prefetch-buffer=true target-bitrate=25000 gop-mode=low-delay-p ! video/x-h265, alignment=nal ! rtph265pay ! udpsink host=192.168.25.89 port=5004 buffer-size=60000000 max-bitrate=120000000 max-lateness=-1 qos-dscp=60 async=false

  • Run the following gst-launch-1.0 command to display XV20 stream-in video on SDI-Tx using Xilinx's ultra low-latency (LLP2) GStreamer pipeline, where 5004 is the port number.

$ gst-launch-1.0 udpsrc port=5004 buffer-size=60000000 caps="application/x-rtp, media=video, clock-rate=90000, payload=96, encoding-name=H265" ! rtpjitterbuffer latency=7 ! rtph265depay ! h265parse ! video/x-h265, alignment=nal ! omxh265dec low-latency=1 ! video/x-raw\(memory:XLNXLL\) ! queue max-size-bytes=0 ! fpsdisplaysink name=fpssink text-overlay=false video-sink="kmssink driver-name=xlnx" sync=true -v
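Comparing the LLP1 and LLP2 commands above, the visible difference is the memory:XLNXLL caps feature on the raw-video caps. The sketch below makes that explicit with a hypothetical helper that builds the capture-side caps string for either mode; the helper name and its arguments are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: builds the raw-video caps string used after v4l2src.
# Per the pipelines above, LLP2 adds the memory:XLNXLL caps feature and
# LLP1 omits it.
raw_caps() {
    llp=$1; width=$2; height=$3; fps=$4
    case $llp in
        llp2) feature='(memory:XLNXLL)' ;;
        llp1) feature='' ;;
        *) echo "unknown mode: $llp" >&2; return 1 ;;
    esac
    echo "video/x-raw${feature}, width=${width}, height=${height}, format=NV16_10LE32, framerate=${fps}/1"
}

raw_caps llp1 3840 2160 60
raw_caps llp2 3840 2160 60
```

When the result is placed on a shell command line, quote it so the parentheses are not interpreted by the shell; the pipelines above achieve the same thing with backslash escapes.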