Zynq UltraScale+ MPSoC VCU TRD 2022.1 - Multi Stream Audio Video Capture and Display
This page provides all the information related to Design Module 3 - VCU TRD Multi stream Audio-Video Capture and Display design.
Table of Contents
1 Overview
The primary goal of this design is to demonstrate the capabilities of the VCU hard block present in Zynq UltraScale+ EV devices together with a soft audio codec. The TRD will serve as a platform to tune the performance parameters of the VCU and arrive at optimal configurations for the encoder and decoder blocks with audio-video synchronization.
This design supports the following video interfaces:
Sources:
HDMI-Rx capture pipeline implemented in the PL
MIPI CSI-2 Rx capture pipeline implemented in the PL
File source (SD card, USB storage, SATA hard disk)
Stream-In from network or internet
Sinks:
DP-Tx display pipeline in the PS
HDMI-Tx display pipeline implemented in the PL
VCU Codec:
Video Encode/Decode capability using VCU hard block in PL
AVC/HEVC encoding
Encoder/decoder parameter configuration
Streaming Interfaces:
1G Ethernet PS GEM
Video format:
NV12
Audio Configuration (a capture sketch follows this list):
Codec: Opus
Format: S24_32LE
Channels: 2
Sampling rate: 48 kHz
Source: HDMI-Rx / I2S-Rx
Render: HDMI-Tx / I2S-Tx / DP
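As a reference, a minimal GStreamer capture sketch matching this configuration; the ALSA device name hw:1,0 is a placeholder, so list the real devices with arecord -l on the target:

# Capture 2-channel 48 kHz S24_32LE audio and encode it with Opus.
gst-launch-1.0 alsasrc device=hw:1,0 ! \
  audio/x-raw,format=S24_32LE,rate=48000,channels=2 ! \
  audioconvert ! audioresample ! opusenc ! fakesink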
Audio Deliverables:
Pipeline | Video Input Source | Audio Input Source | Video Output Type | Audio Output Type | ALSA Drivers | Resolution | Audio Codec Type | Audio Configuration | Video Codec Type | Deliverables |
Record use-case | HDMI-Rx | HDMI-Rx | File-Sink | File-Sink | HDMI-Rx ALSA drivers | 4K / 1080p | Opus | 2 channel @ 48 kHz | HEVC / AVC | HDMI-Rx audio encoded with the soft codec and video encoded with the VCU, stored in a container format. |
Playback use-case | File Source / Stream-In | File Source / Stream-In | DP | HDMI-Tx | HDMI-Tx ALSA drivers | 4K / 1080p | Opus | 2 channel @ 48 kHz | HEVC / AVC | Playback of a local file or stream-in, with video decoded by the VCU and audio by the GStreamer soft codec. |
Pass-through (RAW) use-case | HDMI-Rx | HDMI-Rx | DP | HDMI-Tx | HDMI-Rx/Tx ALSA drivers | 4K / 1080p | NA | 2 channel @ 48 kHz | HEVC / AVC | HDMI-Rx audio/video passed to HDMI-Tx without the VCU or audio codec. |
Display (Serial) use-case | HDMI-Rx | HDMI-Rx | DP | HDMI-Tx | HDMI-Rx/Tx ALSA drivers | 4K / 1080p | NA | 2 channel @ 48 kHz | HEVC / AVC | HDMI-Rx raw audio, with video passed through the VCU encoder and decoder to achieve AV sync. |
Supports 1-4Kp60 single stream with either HDMI-Rx or I2S-Rx as the input audio source and HDMI-Rx / MIPI Rx as the input video source, and with HDMI-Tx / I2S-Tx as the output audio sink and HDMI-Tx / DP as the output video sink
Supports 1-4Kp30 single stream with either HDMI-Rx or I2S-Rx as the input audio source and HDMI-Rx / MIPI Rx as the input video source, and with HDMI-Tx / I2S-Tx as the output audio sink and HDMI-Tx / DP as the output video sink
Supports 1-1080p60 single stream with either HDMI-Rx or I2S-Rx as the input audio source and HDMI-Rx / MIPI Rx as the input video source, and with HDMI-Tx / I2S-Tx as the output audio sink and HDMI-Tx / DP as the output video sink
Supports 2-4Kp30 multi-stream with HDMI-Rx and I2S-Rx as the input audio sources and HDMI-Rx and MIPI Rx as the input video sources, and with HDMI-Tx and I2S-Tx as the output audio sinks and HDMI-Tx as the output video sink
Supports 2-1080p60 multi-stream with HDMI-Rx and I2S-Rx as the input audio sources and HDMI-Rx and MIPI Rx as the input video sources, and with HDMI-Tx and I2S-Tx as the output audio sinks and HDMI-Tx as the output video sink
Other features:
This design supports a single-channel Stream-Based SCD (Scene Change Detection) IP for the HDMI input source only. SCD must be enabled for the HDMI input source through configuration.
Supported Resolution:
The table below provides the supported resolutions of this design.
Resolution | GUI (Single Stream) | Command Line (Single Stream) | Command Line (Multi-stream) |
4Kp60 | X | √ | NA |
4Kp30 | √ | √ | √ (Max 2) |
1080p60 | √ | √ | √ (Max 2) |
√ – Supported
X – Not supported
NA – Not applicable
The following sections describe HDMI / MIPI video capture and HDMI display with audio from HDMI / I2S sources. This VCU TRD design supports HDMI-Rx audio/video with HDMI-Tx audio/video, and MIPI-Rx video + I2S-Rx audio with HDMI-Tx video + I2S-Tx audio.
For the overview, software tools, system requirements, and design files, follow the link below:
The figure below shows the hardware block diagram of the HDMI / MIPI video capture, HDMI / I2S audio capture, and HDMI display with audio design.
The figure below shows the corresponding software block diagram.
1.1 Board Setup
Refer to the below link for Board Setup
I2S audio signals from the MPSoC PL fabric are connected to the PMOD0 GPIO header (J55, right-angle female connector)
The PMOD I2S2 add-on card connects to the J55 connector; its Master/Slave select jumper (JP1) should be placed in the Slave (SLV) position
1.2 Run Flow
The TRD package is released with the source code, Vivado project, PetaLinux BSP, and an SD card image that enables the user to run the demonstration. It also includes the binaries necessary to configure and boot the ZCU106 board. Prior to running the steps mentioned in this wiki page, download the TRD package and extract its contents to a directory referred to as TRD_HOME, the home directory of the TRD.
Refer to Section 4.1: Download the TRD of the Zynq UltraScale+ MPSoC VCU TRD 2022.1 wiki page to download all TRD contents.
TRD package contents are placed in the following directory structure. Copy all files from $TRD_HOME/images/vcu_audio/ to a FAT32-formatted SD card.
rdf0428-zcu106-vcu-trd-2022-1/
├── apu
│ └── vcu_petalinux_bsp
├── images
│ ├── vcu_audio
│ ├── vcu_llp2_hdmi_nv12
│ ├── vcu_llp2_hlg_sdi
│ ├── vcu_llp2_plddr_hdmi
│ ├── vcu_multistream_nv12
│ ├── vcu_plddrv1_hdr10_hdmi
│ ├── vcu_plddrv2_hdr10_hdmi
│ └── vcu_yuv444
├── pl
│ ├── constrs
│ ├── designs
│ ├── prebuild
│ ├── README.md
│ └── srcs
├── README.txt
└── zcu106_vcu_trd_sources_and_licenses.tar.gz
16 directories, 3 files
TRD package content specific to the Multi-stream Audio-Video design is placed in the following directory structure.
rdf0428-zcu106-vcu-trd-2022-1
├── apu
│ └── vcu_petalinux_bsp
│ └── xilinx-vcu-zcu106-v2022.1-final.bsp
├── images
│ ├── vcu_audio
│ │ ├── autostart.sh
│ │ ├── BOOT.BIN
│ │ ├── bootfiles/
│ │ ├── boot.scr
│ │ ├── config/
│ │ ├── Image
│ │ ├── rootfs.cpio.gz.u-boot
│ │ ├── system.dtb
│ │ └── vcu/
├── pl
│ ├── constrs/
│ ├── designs
│ │ ├── zcu106_audio/
│ ├── prebuild
│ │ ├── zcu106_audio/
│ ├── README.md
│ └── srcs
│ ├── hdl/
│ └── ip/
├── README.txt
└── zcu106_vcu_trd_sources_and_licenses.tar.gz
Configuration files (input.cfg) for the various resolutions are placed in the following directory structure under /media/card.
config
├── 1-4kp60
│ ├── Display
│ ├── Record
│ ├── Stream-in
│ └── Stream-out
├── 2-1080p60
│ ├── Display
│ ├── Record
│ ├── Stream-in
│ └── Stream-out
├── 2-4kp30
│ ├── Display
│ ├── Record
│ ├── Stream-in
│ └── Stream-out
└── input.cfg
1.2.1 GStreamer Application (vcu_gst_app)
The vcu_gst_app is a multi-threaded, command-line Linux application. It requires an input configuration file (input.cfg) provided in plain text.
4Kp60 modetest for HDMI: If an HDMI-Tx link-up issue is observed after Linux boots, use the following command for 60 fps use-cases:
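A minimal sketch, assuming the Xilinx DRM driver name "xlnx"; the connector and CRTC IDs below are placeholders, so list the real ones first with modetest -M xlnx:

# Force the HDMI-Tx connector to 4Kp60 (replace 43/41 with the reported IDs).
modetest -M xlnx -s 43@41:3840x2160-60 &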
4Kp30 modetest for HDMI: If an HDMI-Tx link-up issue is observed after Linux boots, use the following command for 30 fps use-cases:
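The same sketch for 30 fps use-cases (IDs again placeholders):

# Force the HDMI-Tx connector to 4Kp30.
modetest -M xlnx -s 43@41:3840x2160-30 &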
Execution of the application is shown below:
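In general form (a sketch; the detailed run-flow page documents the exact invocation):

# Pass the path of the configuration file to run.
vcu_gst_app <path to input.cfg>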
Example:
The HDMI-Rx should be configured to 4Kp60 mode to run the example pipelines below.
4Kp60 HEVC_HIGH Display Pipeline execution
4Kp60 HEVC_HIGH Record Pipeline execution
4Kp60 HEVC_HIGH Stream-out Pipeline execution
4Kp60 HEVC_HIGH Stream-in Pipeline execution
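Hedged invocations for the four pipelines above; the actual .cfg file names shipped under each directory are assumptions, so list them first (e.g. ls /media/card/config/1-4kp60/Display):

vcu_gst_app /media/card/config/1-4kp60/Display/<config>.cfg     # Display
vcu_gst_app /media/card/config/1-4kp60/Record/<config>.cfg      # Record
vcu_gst_app /media/card/config/1-4kp60/Stream-out/<config>.cfg  # Stream-out
vcu_gst_app /media/card/config/1-4kp60/Stream-in/<config>.cfg   # Stream-in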
Latency Measurement: To measure the latency of the pipeline, run the command below. The latency output is very large, so dump it to a file.
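A hedged sketch using GStreamer's latency tracer (assuming the tracer module is present in the root file system; the exact command in the detailed run flow may differ):

# Trace per-element latency and redirect the (large) output to a file.
GST_TRACERS="latency" GST_DEBUG="GST_TRACER:7" \
  vcu_gst_app /media/card/config/input.cfg > /run/latency.log 2>&1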
Refer to the below link for detailed run flow steps
1.3 Build Flow
Refer to the below link for the Build Flow
2 Other Information
2.1 Known Issues
Block noise is observed with the AVC_MEDIUM and AVC_LOW settings in 4Kp60 pipelines
The Digilent PMOD card does not support passive sources such as microphones; only active sources should be connected. In this design, the source is a laptop connected to the PMOD card through an AUX cable.
For PetaLinux related known issues please refer to: PetaLinux 2022.1 - Product Update Release Notes and Known Issues
For VCU related known issues please refer to AR# 76600: LogiCORE H.264/H.265 Video Codec Unit (VCU) - Release Notes and Known Issues and Xilinx Zynq UltraScale+ MPSoC Video Codec Unit.
2.2 Limitations
For playback on DP, the video input resolution must match the DP's native resolution. This constraint exists to support the GUI: if a video source with a non-native resolution were allowed (by setting a full-screen overlay), the graphics layer would disappear, and the user would have to kill and relaunch the GUI app to recover. To avoid this condition, the TRD only supports a video input resolution equal to the DP's native resolution.
For PetaLinux related limitations please refer to: PetaLinux 2022.1 - Product Update Release Notes and Known Issues
For VCU related limitations please refer to AR# 76600: LogiCORE H.264/H.265 Video Codec Unit (VCU) - Release Notes and Known Issues, Xilinx Zynq UltraScale+ MPSoC Video Codec Unit and PG252.
2.3 Optimum VCU Encoder parameters for use-cases:
Video streaming:
The video streaming use-case requires a very stable bitrate graph for all pictures
It is good to avoid periodic large Intra pictures during the encoding session
Low-latency rate control (hardware RC) is the preferred control-rate for video streaming; it tries to maintain equal frame sizes for all pictures.
It is good to avoid periodic Intra frames and instead use low-delay-p (IPPPPP…)
VBR is not a preferred mode for streaming (a pipeline sketch reflecting these settings follows)
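The sketch below reflects these notes using the Xilinx GStreamer OMX encoder; the element and property names are assumptions to verify with gst-inspect-1.0 omxh265enc on the target (target-bitrate is in Kb/s):

# Hardware rate control plus a low-delay-p GOP for streaming.
gst-launch-1.0 videotestsrc ! \
  video/x-raw,format=NV12,width=1920,height=1080,framerate=60/1 ! \
  omxh265enc control-rate=low-latency gop-mode=low-delay-p target-bitrate=10000 ! \
  h265parse ! rtph265pay ! udpsink host=192.168.0.100 port=5004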
Performance: AVC Encoder settings:
It is preferred to use 8 or more slices for better AVC encoder performance
The AVC standard does not support Tile mode processing, which means MB rows are processed sequentially for entropy coding
Quality: Low bitrate AVC encoding:
Enable profile=high and qp-mode=auto for low-bitrate encoding use-cases. The high profile enables the 8x8 transform, which results in better video quality at low bitrates (see the sketch below).
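A hedged sketch combining the performance and quality notes above; property names assume the Xilinx GStreamer OMX plugins, and the high profile is requested through the downstream caps:

# 8 slices for encoder performance, auto QP and high profile for quality.
gst-launch-1.0 videotestsrc num-buffers=300 ! \
  video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
  omxh264enc num-slices=8 qp-mode=auto target-bitrate=1500 ! \
  video/x-h264,profile=high ! h264parse ! filesink location=/run/out.avc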
2.4 Audio-Video Synchronization
Clocks and synchronization in GStreamer
When playing complex media, each sound and video sample must be played in a specific order at a specific time. For this purpose, GStreamer provides a synchronization mechanism.
GStreamer provides support for the following use cases:
Non-live sources with access faster than the playback rate. This is the case where one is reading media from a file and playing it back in a synchronized fashion. In this case, multiple streams need to be synchronized, such as audio, video and subtitles.
Capture and synchronized muxing/mixing of media from multiple live sources. This is a typical use case where you record audio and video from a microphone/camera and mux it into a file for storage (a capture/mux sketch follows this list).
Streaming from (slow) network streams with buffering. This is the typical web streaming case where you access content from a streaming server using HTTP.
Capture from live source and playback with configurable latency. This is used, for example, when capturing from a camera, applying an effect and displaying the result. It is also used when streaming low latency content over a network with UDP.
Simultaneous live capture and playback from prerecorded content. This is used in audio recording cases where you play a previously recorded audio and record new samples, the purpose is to have the new audio perfectly in sync with the previously recorded data.
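As a rough illustration of the capture-and-mux use case above, the sketch below records live audio and video, synchronized, into a Matroska file; device names and element choices are placeholders to adjust for the target board:

# -e sends EOS on Ctrl-C so the muxer can finalize the file.
gst-launch-1.0 -e \
  v4l2src device=/dev/video0 ! video/x-raw,format=NV12,width=1920,height=1080 ! \
  omxh265enc ! h265parse ! queue ! mux. \
  alsasrc device=hw:1,0 ! audioconvert ! opusenc ! queue ! mux. \
  matroskamux name=mux ! filesink location=/run/capture.mkv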
GStreamer uses a GstClock object, buffer timestamps and a SEGMENT event to synchronize streams in a pipeline as we will see in the next sections.
See the GStreamer documentation for more information:
Clock running-time
In a typical computer, there are many sources that can be used as a time source, for example, the system time, sound cards, CPU performance counters, etc. For this reason, GStreamer has many GstClock implementations available. Note that clock time does not have to start from 0 or any other known value. Some clocks start counting from a particular start date, others from the last reboot, etc.
A GstClock returns the absolute-time according to that clock with gst_clock_get_time (). The absolute-time (or clock time) of a clock is monotonically increasing.
A running-time is the difference between a previous snapshot of the absolute-time called the base-time, and any other absolute-time.
running-time = absolute-time - base-time
A GStreamer GstPipeline object maintains a GstClock object and a base-time when it goes to the PLAYING state. The pipeline gives a handle to the selected GstClock to each element in the pipeline, along with the selected base-time. The pipeline selects the base-time in such a way that the running-time reflects the total time spent in the PLAYING state. As a result, when the pipeline is PAUSED, the running-time stands still.
Because all objects in the pipeline have the same clock and base-time, they can thus all calculate the running-time according to the pipeline clock.
Buffer running-time
To calculate a buffer running-time, we need a buffer timestamp and the SEGMENT event that preceded the buffer. First we can convert the SEGMENT event into a GstSegment object and then we can use the gst_segment_to_running_time () function to perform the calculation of the buffer running-time.
Synchronization is now a matter of making sure that a buffer with a certain running-time is played when the clock reaches the same running-time. Usually, this task is performed by sink elements. These elements also have to take into account the configured pipeline's latency and add it to the buffer running-time before synchronizing to the pipeline clock.
Non-live sources timestamp buffers with a running-time starting from 0. After a flushing seek, they will produce buffers again from a running-time of 0.
Live sources need to timestamp buffers with a running-time matching the pipeline running-time when the first byte of the buffer was captured.
Buffer stream-time
The buffer stream-time, also known as the position in the stream, is a value between 0 and the total duration of the media and it's calculated from the buffer timestamps and the preceding SEGMENT event.
The stream-time is used for the following:
Report the current position in the stream with the POSITION query
The position used in the seek events and queries
The position used to synchronize controlled values
The stream-time is never used to synchronize streams, this is only done with the running-time.
Time overview
Here is an overview of the various timelines used in GStreamer.
The image below represents the different times in the pipeline when playing a 100ms sample and repeating the part between 50ms and 100ms.
You can see how the running-time of a buffer always increments monotonically along with the clock-time. Buffers are played when their running-time is equal to the clock-time - base-time. The stream-time represents the position in the stream and jumps backwards when repeating.
Clock providers
A clock provider is an element in the pipeline that can provide a GstClock object. The clock object needs to report an absolute-time that is monotonically increasing when the element is in the PLAYING state. It is allowed to pause the clock while the element is PAUSED.
Clock providers exist because they play back media at some rate, and this rate is not necessarily the same as the system clock rate. For example, a sound card might play back at 44.1 kHz, but that does not mean that after exactly 1 second according to the system clock, the sound card has played back 44100 samples. This is only true by approximation. In fact, the audio device has an internal clock based on the number of samples played that we can expose.
If an element with an internal clock needs to synchronize, it needs to estimate when a time according to the pipeline clock will take place according to the internal clock. To estimate this, it needs to slave its clock to the pipeline clock.
If the pipeline clock is exactly the internal clock of an element, the element can skip the slaving step and directly use the pipeline clock to schedule playback. This can be both faster and more accurate. Therefore, generally, elements with an internal clock like audio input or output devices will be a clock provider for the pipeline.
When the pipeline goes to the PLAYING state, it will go over all elements in the pipeline, from sink to source, and ask each element if it can provide a clock.