This page provides all the information related to Design Module 7 - VCU TRD Xilinx low latency (LLP2) PS DDR NV12 HDMI design.

...

...

2

...

V4l2 Capture Control Software Encoder Application

2.

...

For PetaLinux-related known issues, refer to: PetaLinux 2020.2 - Product Update Release Notes and Known Issues

...

1 Overview

The primary goal of the v4l2 capture control software encoder application is to demonstrate Xilinx’s Ultra Low-Latency feature using the VCU ctrlsw APIs. This application (v4l2_capture_ctrlsw_enc) is an enhanced version of the standard VCU ctrlsw application (ctrlsw_encoder). The standard ctrlsw_encoder application supports only file-based encoding, while this application captures data from the HDMI source and streams it out using GStreamer libraries.

The v4l2 capture ctrlsw encoder application has the following features:

  • Stream out encoded data captured from HDMI source using RTP streaming.

  • Record encoded data captured from HDMI source to a file.

  • Supports various encoding options, which are set through a configuration file passed as input to the application, in the same way that a configuration file is used for ctrlsw_encoder.

  • Supports various latency modes, selected on the command line: Xilinx’s Ultra Low Latency (LLP2) mode via --xlnx-slicelat and Low Latency (LLP1) mode via --slicelat.

The following figure shows one of the use cases (Xilinx Low-Latency streaming):

...

As shown in the above figure, the application performs the following operations in Xilinx’s Ultra Low Latency (LLP2) mode:

  1. The application enables SyncIP and programs address ranges according to the input video format and resolution.

  2. The application calls VIDIOC_DQBUF and sends an empty input buffer to the encoder using the early dequeue mechanism.

  3. The encoder receives this empty buffer and starts generating read requests.

  4. A start DMA command is issued to the v4l2 capture driver, and capture starts filling the buffer.

  5. SyncIP blocks the encoder until framebuffer-write has finished writing the data corresponding to the read request made by the encoder.

  6. Once the encoder is unblocked, it starts encoding data and generating output slices corresponding to the unblocked input read requests.

  7. Encoded data is fed to the GStreamer AppSrc and passed through the RTP payloader to the UDP sink, which streams it out.

  8. Similarly, for consecutive buffers, v4l2 programs SyncIP and submits the buffer to the encoder using VIDIOC_DQBUF, and SyncIP blocks the encoder until v4l2 has written sufficient data. In this way, SyncIP maintains synchronization between the producer (v4l2) and the consumer (encoder).
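The gating described in steps 5 and 6 can be pictured with a toy model. This is only a sketch under stated assumptions and does not reproduce the SyncIP hardware logic: the 2160-line frame, the 8 slices, the 270-line capture step, the slice-level granularity, and the function name simulate_gating are all illustrative choices that do not come from the design.

```shell
#!/bin/sh
# Toy model of producer/consumer gating; it does NOT reproduce the SyncIP hardware.
# Assumed values: frame height, slice count and capture step are illustrative only.

# simulate_gating FRAME_LINES SLICES STEP
# Prints one line each time the consumer (encoder) may start its next slice.
simulate_gating() (
    frame_lines=$1
    slices=$2
    step=$3
    lines_per_slice=$((frame_lines / slices))
    written=0
    next_slice=1
    while [ "$written" -lt "$frame_lines" ]; do
        # Producer (capture) writes another chunk of lines into the buffer.
        written=$((written + step))
        if [ "$written" -gt "$frame_lines" ]; then
            written=$frame_lines
        fi
        # Consumer stays blocked until every line of its next slice is written.
        while [ "$next_slice" -le "$slices" ] &&
              [ "$written" -ge $((next_slice * lines_per_slice)) ]; do
            echo "written=$written encoder unblocked for slice $next_slice"
            next_slice=$((next_slice + 1))
        done
    done
)

simulate_gating 2160 8 270
```

In this model the consumer never reads ahead of the producer, which is the property the list above attributes to SyncIP; the real block works on the encoder's read requests rather than on whole slices.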

In case of --slicelat (LLP1), there is no SyncIP in the input path to the encoder. The application gets the input frame filled by v4l2 using VIDIOC_DQBUF and passes it to the encoder. The encoder then reads the input frame and generates output slices according to the number of slices set in the configuration file, which are then streamed out as described in the above section.

The below figure shows the v4l2 capture control software encoder application software block diagram:

...

2.2 Downloads

2.3 Build Flow

This tutorial shows how to build the AR package of the above v4l2 control software encoder application and generate the Linux and boot images using the PetaLinux build tool. It assumes that the $TRD_HOME environment variable is set as given below.

It is recommended to follow the build steps in sequence.

Code Block
$ export TRD_HOME=</path/to/downloaded/zipfile>/rdf0428-zcu106-vcu-trd-2020-2
  • Source the PetaLinux tool-chain using the command below.

Code Block
$ source <path/to/petalinux-installer>/tool/petalinux-v2020.2-final/settings.sh
$ echo $PETALINUX

After PetaLinux installation, the $PETALINUX environment variable should be set.
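Both variables can be checked in one step before continuing. The helper below is a minimal sketch; the function name check_env and its messages are hypothetical and are not part of the TRD or of PetaLinux.

```shell
#!/bin/sh
# check_env NAME...
# Returns non-zero and names each variable that is unset or empty.
# (check_env is a hypothetical helper, not a PetaLinux command.)
check_env() (
    missing=0
    for name in "$@"; do
        # Read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name" >&2
            missing=1
        fi
    done
    exit "$missing"
)

if check_env TRD_HOME PETALINUX; then
    echo "environment looks ready"
else
    echo "set the variables listed above before continuing"
fi
```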

  • Create a PetaLinux project.

Code Block
$ cd $TRD_HOME/apu/vcu_petalinux_bsp
$ petalinux-create -t project -s xilinx-vcu-zcu106-v2020.2-final.bsp
  • Configure the PetaLinux project.

Code Block
$ cd xilinx-vcu-zcu106-v2020.2-final
$ petalinux-config --get-hw-description=$TRD_HOME/pl/prebuild/zcu106_llp2_audio_nv12/
  • If the Vivado project is modified, configure the BSP with the modified .xsa file and build, e.g.:

Code Block
$ petalinux-config --get-hw-description=$TRD_HOME/pl/build/zcu106_llp2_audio_nv12/
  • Extract the downloaded AR package in the PetaLinux project directory (xilinx-vcu-zcu106-v2020.2-final/) using the command below.

Code Block
$ unzip -o <path-to-AR-package>/vcu_trd_v4l2_ctrlsw_enc_AR76096.zip

By default, GStreamer support is enabled in the v4l2 control software application. To disable GStreamer support, change the ENABLE_GST = "1" flag to ENABLE_GST = "0" in project-spec/meta-user/recipes-apps/v4l2-ctrlsw-enc/v4l2-ctrlsw-enc.bb. GStreamer support is required to run stream-out use-cases with the v4l2 control software application; when it is disabled, the application is forced to run the record use-case.
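The flag can also be switched from the command line instead of editing the recipe by hand. The following is a minimal sketch: the helper name set_enable_gst is hypothetical, and it assumes the recipe spells the line exactly as ENABLE_GST = "1" or ENABLE_GST = "0". The demonstration runs on a scratch file so that it does not touch a real project.

```shell
#!/bin/sh
# set_enable_gst RECIPE VALUE
# Rewrites the ENABLE_GST = "<0|1>" line of RECIPE to VALUE (0 or 1).
# (set_enable_gst is a hypothetical helper; the exact spelling of the
# line in the recipe is an assumption.)
set_enable_gst() (
    recipe=$1
    value=$2
    tmp="$recipe.tmp"
    sed "s/ENABLE_GST = \"[01]\"/ENABLE_GST = \"$value\"/" "$recipe" > "$tmp" &&
        mv "$tmp" "$recipe"
)

# Demonstration on a scratch file.
demo=$(mktemp)
printf 'ENABLE_GST = "1"\n' > "$demo"
set_enable_gst "$demo" 0
cat "$demo"
rm -f "$demo"
```

From the PetaLinux project directory, the equivalent call would be set_enable_gst project-spec/meta-user/recipes-apps/v4l2-ctrlsw-enc/v4l2-ctrlsw-enc.bb 0, followed by a rebuild.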

  • Create a soft link from the LLP2 PS DDR NV12 design dtsi file to system-user.dtsi using the commands below.

Code Block
$ cd project-spec/meta-user/recipes-bsp/device-tree/files/
$ ln -sf vcu_llp2_psddr_hdmi.dtsi system-user.dtsi
$ cd ../../../../../
  • Build the PetaLinux project

Code Block
$ petalinux-build
  • Build the SDK components to use as a sysroot for application development (Optional).

Code Block
$ petalinux-build --sdk
$ petalinux-package --sysroot
  • Create a boot image (BOOT.BIN) including FSBL, ATF, bitstream, and u-boot.

Code Block
$ cd images/linux
$ petalinux-package --boot --fsbl zynqmp_fsbl.elf --u-boot u-boot.elf --pmufw pmufw.elf --fpga system.bit
  • Copy the generated boot image and Linux image to the SD card directory.

Code Block
$ cp BOOT.BIN image.ub boot.scr $TRD_HOME/images/vcu_llp2_hdmi_nv12/
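Before copying, it can help to confirm that all three files named in the cp command were actually produced. This is a small sketch; the function name missing_boot_files is hypothetical, and the file list is taken from the cp command above.

```shell
#!/bin/sh
# missing_boot_files DIR
# Prints the name of each expected file that is absent from DIR.
# (missing_boot_files is a hypothetical helper.)
missing_boot_files() (
    dir=$1
    for f in BOOT.BIN image.ub boot.scr; do
        [ -f "$dir/$f" ] || echo "$f"
    done
)

# Example: check the current directory (images/linux in the steps above).
missing=$(missing_boot_files .)
if [ -z "$missing" ]; then
    echo "all boot files present"
else
    echo "missing: $missing"
fi
```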

2.4 Run Flow

The following section assumes that the user has already completed board setup, prepared the SD card with the built images, and booted the board.

The v4l2_capture_ctrlsw_enc is a command-line Linux application. It requires an input configuration file, provided as plain text, with the necessary encoder configuration options.

Run the command below to list all available configuration options:

Code Block
$ v4l2_capture_ctrlsw_enc --help-cfg
OR
$ v4l2_capture_ctrlsw_enc --help-cfg-json

Using the above configuration options, create a configuration file for your use-case. You can also use the sample configuration files from /usr/local/etc/v4l2-capture-ctrlsw-enc/sample_cfg/ and tweak parameters as required.

Below are some example pipelines using v4l2_capture_ctrlsw_enc.

  • The stream-out pipeline runs only if the application is compiled with ENABLE_GST = "1".

  • The v4l2_capture_ctrlsw_enc uses the hard-coded port value 5004 for the stream-out use-case and supports only a single-stream use-case.

  • Run the following v4l2_capture_ctrlsw_enc command to run the Low-Latency (LLP1) stream-out use-case, where 192.168.25.89 is the host/client IP address.

Code Block
$ v4l2_capture_ctrlsw_enc -cfg /usr/local/etc/v4l2-capture-ctrlsw-enc/sample_cfg/4kp60_HEVC.cfg --slicelat --hostip 192.168.25.89
  • Run the following gst-launch-1.0 command on the client to display the stream-in NV12 video on HDMI-Tx using the Low-Latency (LLP1) GStreamer pipeline.

Code Block
$ gst-launch-1.0 udpsrc port=5004 buffer-size=60000000 caps="application/x-rtp, media=video, clock-rate=90000, payload=96, encoding-name=H265" ! rtpjitterbuffer latency=7 ! rtph265depay ! h265parse ! video/x-h265, alignment=nal ! omxh265dec low-latency=1 ! video/x-raw ! queue max-size-bytes=0 ! fpsdisplaysink name=fpssink text-overlay=false 'video-sink=kmssink bus-id=a0070000.v_mix hold-extra-sample=1 show-preroll-frame=false sync=true ' sync=true -v 
  • Run the following v4l2_capture_ctrlsw_enc command to run Xilinx’s Ultra Low-Latency (LLP2) stream-out use-case, where 192.168.25.89 is the host/client IP address.

Code Block
$ v4l2_capture_ctrlsw_enc -cfg /usr/local/etc/v4l2-capture-ctrlsw-enc/sample_cfg/4kp60_HEVC.cfg --xlnx-slicelat --hostip 192.168.25.89
  • Run the following gst-launch-1.0 command on the client to display the stream-in NV12 video on HDMI-Tx using Xilinx’s Ultra Low-Latency (LLP2) GStreamer pipeline.

Code Block
$ gst-launch-1.0 udpsrc port=5004 buffer-size=60000000 caps="application/x-rtp, media=video, clock-rate=90000, payload=96, encoding-name=H265" ! rtpjitterbuffer latency=7 ! rtph265depay ! h265parse ! video/x-h265, alignment=nal ! omxh265dec low-latency=1 ! video/x-raw\(memory:XLNXLL\) ! queue max-size-bytes=0 ! fpsdisplaysink name=fpssink text-overlay=false 'video-sink=kmssink bus-id=a0070000.v_mix hold-extra-sample=1 show-preroll-frame=false sync=true ' sync=true -v
  • Run the following v4l2_capture_ctrlsw_enc command to run Xilinx’s Ultra Low-Latency (LLP2) record use-case.

Code Block
$ v4l2_capture_ctrlsw_enc -cfg /usr/local/etc/v4l2-capture-ctrlsw-enc/sample_cfg/4kp60_HEVC.cfg --xlnx-slicelat --record
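The three encoder invocations above differ only in the latency flag and in the choice between --hostip and --record. The sketch below composes them from those parts and prints each command line for review instead of running it. The helper name enc_cmd and its argument order are hypothetical; only the flags shown in the commands above are used.

```shell
#!/bin/sh
# enc_cmd MODE CFG TARGET
#   MODE   : llp1 (--slicelat) or llp2 (--xlnx-slicelat)
#   TARGET : "record", or the client IP address for stream-out
# Prints the command line instead of running it.
# (enc_cmd is a hypothetical helper, not part of the application.)
enc_cmd() (
    mode=$1
    cfg=$2
    target=$3
    case "$mode" in
        llp1) latency="--slicelat" ;;
        llp2) latency="--xlnx-slicelat" ;;
        *) echo "unknown mode: $mode" >&2; exit 1 ;;
    esac
    if [ "$target" = "record" ]; then
        sink="--record"
    else
        sink="--hostip $target"
    fi
    echo "v4l2_capture_ctrlsw_enc -cfg $cfg $latency $sink"
)

CFG=/usr/local/etc/v4l2-capture-ctrlsw-enc/sample_cfg/4kp60_HEVC.cfg
enc_cmd llp1 "$CFG" 192.168.25.89
enc_cmd llp2 "$CFG" 192.168.25.89
enc_cmd llp2 "$CFG" record
```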

3 Other Information

3.1 Known Issues

...

3.2 Limitations

...

3.3 Optimum VCU Encoder parameters for use-cases

Video streaming:

  • The video streaming use-case requires a very stable bitrate graph for all pictures.

  • Avoid periodic large Intra pictures during the encoding session.

  • Low-latency rate control (hardware RC) is the preferred control-rate for video streaming; it tries to maintain equal frame sizes for all pictures.

  • Avoid periodic Intra frames; use low-delay-p (IPPPPP…) instead.

  • VBR is not a preferred mode for streaming.

...

  • Enable profile=high and use qp-mode=auto for low-bitrate encoding use-cases.

  • The high profile enables 8x8 transform which results in better video quality at low bitrates.

...

4 Appendix A - Input Configuration File (input.cfg)

The example configuration files are stored in the /media/card/config/ folder.

Configuration Type | Configuration Name | Description | Available Options | Note
Common | Common Configuration | It is the starting point of common configuration | |
Common | Num of Input | Number of input | 1, 2, 3, 4 |
Common | Output | Select the video interface. | HDMI |
Common | Out Type | Type of output | display, stream |
Common | Display Rate | Pipeline frame rate | 30, 60 |
Common | Exit | It indicates to the application that the configuration is over | |

Configuration Type | Configuration Name | Description | Available Options | Note
Input | Input Configuration | It is the starting point of the input configuration | |
Input | Input Num | Starting Nth input configuration | 1, 2, 3, 4 |
Input | Input Type | Input source type | HDMI |
Input | Raw | To tell the pipeline is processed or pass-through | FALSE | Raw use-case is not supported for LLP2 use-case. It is supported for non-LLP2 use-case.
Input | Width | The width of the live source | 3840, 1920 |
Input | Height | The height of the live source | 2160, 1080 |
Input | Format | The format of input data | NV12 |
Input | Enable LLP2 | To enable LLP2 configuration. | TRUE, FALSE | Set Enable LLP2 equals to False for non-LLP2 use-case.
Input | Exit | It indicates to the application that the configuration is over | |

Configuration Type | Configuration Name | Description | Available Options | Note
Encoder | Encoder Configuration | It is the starting point of encoder configuration | |
Encoder | Encoder Num | Starting Nth encoder configuration | 1, 2, 3, 4 |
Encoder | Encoder Name | Name of the encoder | AVC, HEVC |
Encoder | Profile | Name of the profile | high for AVC, main for HEVC. |
Encoder | Rate Control | Rate control options | low_latency |
Encoder | Filler Data | Filler Data NAL units for CBR rate control | False |
Encoder | QP | QP control mode used by the VCU encoder | Uniform, Auto |
Encoder | L2 Cache | Enable or Disable L2Cache buffer in encoding process. | True, False |
Encoder | Latency Mode | Encoder latency mode. | sub_frame |
Encoder | Low Bandwidth | If enabled, decrease the vertical search range used for P-frame motion estimation to reduce the bandwidth. | True, False |
Encoder | Gop Mode | Group of Pictures mode. | Basic, low_delay_p, low_delay_b |
Encoder | Bitrate | Target bitrate in Kbps | 1-25000 |
Encoder | B Frames | Number of B-frames between two consecutive P-frames | 0 |
Encoder | Slice | The number of slices produced for each frame. Each slice contains one or more complete macroblock/CTU row(s). Slices are distributed over the frame as regularly as possible. If slice-size is defined as well more slices may be produced to fit the slice-size requirement. | 4-22 4K resolution with HEVC codec; 4-32 4K resolution with AVC codec; 4-32 1080p resolution with HEVC codec; 4-32 1080p resolution with AVC codec | The recommended slice for LLP2 use-case is 8.
Encoder | GoP Length | The distance between two consecutive I frames | 1-1000 |
Encoder | GDR Mode | It specifies which Gradual Decoder Refresh(GDR) scheme should be used when gop-mode = low_delay_p | Horizontal/Vertical/Disabled | GDR mode is currently supported with LLP1/LLP2 low-delay-p use-cases only
Encoder | Entropy Mode | It specifies the entropy mode for H.264 (AVC) encoding process | CAVLC/CABAC/Default |
Encoder | Max Picture Size | It is used to curtail instantaneous peak in the bit-stream using this parameter. It works in CBR/VBR rate-control only. When it is enabled, max-picture-size value is calculated and set with 10% of AllowedPeakMargin. i.e. max-picture-size = (TargetBitrate / FrameRate) * 1.1 | TRUE/FALSE |
Encoder | Preset | | Custom |
Encoder | Exit | It indicates to the application that the configuration is over | |
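As a worked example of the Max Picture Size formula given in the Encoder configuration, max-picture-size = (TargetBitrate / FrameRate) * 1.1, the sketch below evaluates it for two frame rates. The helper name max_picture_size is hypothetical, and the result is assumed to be in kilobits per picture because the Bitrate option is given in Kbps; the source does not state the unit explicitly.

```shell
#!/bin/sh
# max_picture_size BITRATE_KBPS FRAME_RATE
# Evaluates (TargetBitrate / FrameRate) * 1.1 and prints one decimal place.
# (max_picture_size is a hypothetical helper; the output unit is assumed
# to follow the Kbps unit of the Bitrate option.)
max_picture_size() {
    awk -v bitrate="$1" -v fps="$2" 'BEGIN { printf "%.1f\n", bitrate / fps * 1.1 }'
}

max_picture_size 25000 60   # prints 458.3
max_picture_size 25000 30   # prints 916.7
```

At the same target bitrate, halving the frame rate doubles the per-picture budget, which is why the 30 fps result is twice the 60 fps one.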

Configuration Type | Configuration Name | Description | Available Options | Note
Streaming | Streaming Configuration | It is the starting point of streaming configuration. | |
Streaming | Streaming Num | Starting Nth Streaming configuration | 1, 2, 3, 4 |
Streaming | Host IP | The host to send the packets to | 192.168.25.89 or Windows PC IP |
Streaming | Port | The port to send the packets to. In case of LLP1/LLP2 audio+video rtp stream-out pipelines, rtp/rtcp audio and video port numbers are assigned in vcu_gst_app in the pattern listed under this table. | 5004, 5008, 5012, 5016 |
Streaming | Exit | It indicates to the application that the configuration is over. | |

Port assignment pattern in vcu_gst_app:

RTP/RTCP Port Type | Port Value | e.g. Port Values, if Port = 5004
Actual video data rtp port | Port | 5004
Tx Video rtcp packets port | Port+1 | 5005
Rx Video rtcp packets port | Port+2 | 5006
Actual Audio data rtp port | Port+4 | 5008
Tx Audio rtcp packets port | Port+5 | 5009
Rx Audio rtcp packets port | Port+6 | 5010
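The RTP/RTCP port pattern given for the Port option (Port, Port+1, Port+2 for video and Port+4, Port+5, Port+6 for audio) can be reproduced for any base value with a few lines of shell arithmetic. This is a sketch for checking firewall or receiver settings; the function name rtp_ports and the label names it prints are hypothetical and are not taken from vcu_gst_app.

```shell
#!/bin/sh
# rtp_ports BASE_PORT
# Prints the six port numbers derived from the configured Port value.
# (rtp_ports and its labels are hypothetical; the offsets follow the
# Port/Port+1/Port+2/Port+4/Port+5/Port+6 pattern described for vcu_gst_app.)
rtp_ports() (
    base=$1
    echo "video_rtp=$base"
    echo "video_rtcp_tx=$((base + 1))"
    echo "video_rtcp_rx=$((base + 2))"
    echo "audio_rtp=$((base + 4))"
    echo "audio_rtcp_tx=$((base + 5))"
    echo "audio_rtcp_rx=$((base + 6))"
)

rtp_ports 5004
```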

Configuration Type | Configuration Name | Description | Available Options | Note
Audio Configuration | Audio Configuration | It is the starting point of the audio configuration. | |
Audio Configuration | Audio Enable | Enable or Disable audio in pipeline. | True, False |
Audio Configuration | Audio Format | The format of the audio | S24_32LE (for serial use-cases), S16LE (for stream-out use-cases) |
Audio Configuration | Sampling Rate | To set the audio sampling rate. | 48000 |
Audio Configuration | Num Of Channel | The number of audio channels. | 2 |
Audio Configuration | Source | It should be HDMI, as currently only HDMI audio capture is supported. | |
Audio Configuration | Renderer | It should be HDMI, as currently only HDMI audio renderer is supported. | |
Audio Configuration | Exit | It indicates to the application that the configuration is over. | |

Configuration Type | Configuration Name | Description | Available Options | Note
Trace | Trace Configuration | It is the starting point of trace configuration. | |
Trace | FPS Info | To display fps info on the console. | True, False |
Trace | APM Info | To display APM counter number on the console. | True, False |
Trace | Pipeline Info | To display pipeline info on console. | True, False |
Trace | Exit | It indicates to the application that the configuration is over. | |

...

...

5 Appendix B - HDMI-Rx/Tx Link-up and GStreamer Commands

This section covers configuration of HDMI-Rx using media-ctl utility and HDMI-Tx using modetest utility, along with demonstrating HDMI-Rx/Tx link-up issues and steps to switch HDMI-Rx resolution. It also contains sample GStreamer Low-Latency NV12 and Xilinx’s Ultra Low-Latency NV12 Audio+Video pipelines for Display, Stream-In and Stream-Out use-cases.

...

Pixel Format | GStreamer Format | Media Bus Format | GStreamer HEVC Profile | GStreamer AVC Profile | Kmssink Plane-id
NV12 | NV12 | VYYUYY8_1X24 | main | high | 34 and 35

  • Video0 in each gst-launch pipeline indicates the video node for the input source.

  • Make sure HDMI-Rx is configured to 4kp60 mode while running the example pipelines below.

  • LLP1/LLP2 video / audio+video stream-in pipelines are not supported using vcu_gst_app.

  • For LLP1/LLP2 Multi-stream HEVC serial and stream-out use-cases (2-4kp30, 2-1080p60, 4-1080p60), set the ENC_EXTRA_OP_BUFFERS=10 variable before the gst-launch-1.0 command.

  • For LLP1/LLP2 Multi-stream serial and stream-in use-cases (2-4kp30, 2-1080p60, 4-1080p60), use the internal-entropy-buffers=3 property in the decoder.

  • The latency-time and buffer-time values in the Audio+Video serial and streaming pipelines are very aggressive compared to the upstream GStreamer defaults, in order to optimize latency for the LLP2 Audio+Video pipeline. You can tune and experiment with these parameters if you see audio/video sync issues, audio/video packet drops, or audio distortion in your use-case.
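Placing ENC_EXTRA_OP_BUFFERS=10 before a command sets the variable for that one process only. The sketch below demonstrates the mechanism with a stand-in command that simply echoes the value; on the board, the stand-in would be gst-launch-1.0 followed by the pipeline. The wrapper name with_extra_op_buffers is hypothetical.

```shell
#!/bin/sh
# with_extra_op_buffers COMMAND [ARGS...]
# Runs COMMAND with ENC_EXTRA_OP_BUFFERS=10 set for that process only.
# (with_extra_op_buffers is a hypothetical wrapper.)
with_extra_op_buffers() {
    env ENC_EXTRA_OP_BUFFERS=10 "$@"
}

# Stand-in command that reports what it sees in its environment.
with_extra_op_buffers sh -c 'echo "ENC_EXTRA_OP_BUFFERS=$ENC_EXTRA_OP_BUFFERS"'
```

Because the variable is scoped to the launched process, other pipelines started from the same shell are not affected.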

...