Xilinx Zynq UltraScale+ MPSoC Video Codec Unit
Table of Contents
- 1 Overview
- 2 Supported Features
- 3 API information and Example pipelines
- 4 Source code repos
- 5 CMA Size setting instructions for Yocto and Petalinux users:
- 6 Device Tree Binding
- 7 Known Issues
- 8 2022.2 Release
- 8.1 New Feature Support:
- 8.2 Bug Fixes:
- 8.3 Known Issues:
- 9 2022.1 Release
- 9.1 New Feature Support:
- 9.2 Bug Fixes:
- 9.3 Known Issues:
- 10 2021.2 Release
- 10.1 New Feature Support:
- 10.2 Bug Fixes:
- 10.3 Known Issues:
- 11 2021.1 Release
- 11.1 New Feature Support:
- 11.2 Bug Fixes:
- 11.3 Known Issues:
- 12 2020.2 Release
- 13 2020.1 Release
- 13.1 New Feature Support:
- 14 2019.2 Release:
- 14.1 New Feature Support
- 14.2 Bug Fixes
- 15 2019.1 Release:
- 16 2018.3 Release:
- 17 2018.2 Release:
Overview
The Xilinx Zynq UltraScale+ MPSoC Video Codec Unit (VCU) provides multi-standard video encoding and decoding, covering the High Efficiency Video Coding (HEVC, i.e. H.265) and Advanced Video Coding (AVC, i.e. H.264) standards. The VCU software stack consists of a custom kernel module and a custom user-space library known as Control Software (CtrlSW). The OpenMAX IL (OMX) layer is integrated on top of CtrlSW, and the Gstreamer framework is used to integrate the OMX-IL component with other multimedia elements.
OpenMAX™ is a cross-platform API that provides comprehensive streaming media codec and application portability by enabling accelerated multimedia components. Gstreamer is a cross-platform, open source multimedia framework that provides the infrastructure to integrate multiple multimedia components and create pipelines.
Users can develop their applications at any of the three levels, i.e. CtrlSW, OMX-IL, or Gstreamer.
Figure: VCU Software Stack
Supported Features
Multi-standard encoding/decoding support, including:
Advanced Video Coding (AVC) H.264
High Efficiency Video Coding (HEVC) H.265
HEVC Main profiles, levels up to 5.1, High Tier, 4kp60 Encoding/Decoding
AVC BP/MP/HP, levels up to 5.2, 4kp60 Encoding/Decoding
Supports multi-stream encoding/decoding of up to eight 1080p30 streams
420/422 chroma-format support
8/10 bit encoding/decoding
Slice level Encoding/Decoding
Flexible Rate Control: CBR, VBR, Constant QP, Low-latency RC (optimized for streaming applications)
On-the-fly change of multiple encoding parameters
Target bitrate
GOP length
Number of B frames
Insertion of IDR picture
Region of Interest based Encoding (ROI)
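The 4kp60 and eight-stream 1080p30 figures in the list above can be related with a little arithmetic. The sketch below is only a sanity check, not a statement about how the VCU schedules work: it assumes that "4K" means 3840x2160 and that aggregate pixel rate is a reasonable proxy for codec load, neither of which is stated in this page. The `pixel_rate` helper is a hypothetical name introduced for illustration.

```python
# Assumption: "4K" is taken as 3840x2160 (UHD); the page does not give the exact frame size.
# Assumption: aggregate pixel rate is used as a rough proxy for codec load.

def pixel_rate(width: int, height: int, fps: int, streams: int = 1) -> int:
    """Return the total number of pixels per second across all streams."""
    return width * height * fps * streams

single_4kp60 = pixel_rate(3840, 2160, 60)
eight_1080p30 = pixel_rate(1920, 1080, 30, streams=8)
four_1080p60 = pixel_rate(1920, 1080, 60, streams=4)

print(single_4kp60)   # pixels per second for one 4kp60 stream
print(eight_1080p30)  # pixels per second for eight 1080p30 streams
print(single_4kp60 == eight_1080p30 == four_1080p60)
```

Under these assumptions the three configurations carry the same pixel rate, which is consistent with the list quoting 4kp60 and eight 1080p30 streams as the upper limits.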
API information and Example pipelines
Refer to the VCU product guide for more details on example pipelines, API information, the encoder/decoder application flow chart, encoder parameters, etc.
H.264/H.265 Video Codec Unit v1.2
Source code repos
VCU:
Gstreamer:
CMA Size setting instructions for Yocto and Petalinux users:
Using kernel menuconfig option
Run kernel menuconfig using the commands below:
PetaLinux: "petalinux-config -c kernel"
Yocto: "bitbake -f virtual/kernel -c menuconfig"
Go to Device Drivers → Generic Driver Options → DMA Contiguous Memory Allocator
Set the default contiguous memory size to the value that suits your use case
Save the configuration, exit menuconfig, and build the images
Using bootargs at the U-Boot prompt
During boot-up, stop at the U-Boot prompt by pressing any key, then set a custom CMA size using the command below:
"setenv bootargs $bootargs cma=1500m"
boot
Using a custom uEnv.txt for Yocto
The Yocto build generates a uEnv.txt file at work/build/tmp/deploy/images/zcu106-zynqmp, which can be edited to set a custom CMA size.
Modify the uEnv.txt bootargs option to set the required CMA size as below:
"bootargs=earlycon clk_ignore_unused root=/dev/mmcblk0p2 rw rootwait cma=1500m"
Copy the uEnv.txt containing the CMA bootargs setting to the boot partition (i.e. the first partition) of the SD card and boot the board.
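The uEnv.txt edit described above amounts to adding or replacing a single `cma=` token on the bootargs line. The following is a minimal sketch of that edit, plus a way to confirm the result after boot by reading the `CmaTotal` field that Linux exposes in /proc/meminfo when CMA is enabled. Both helper names (`set_cma`, `cma_total_kb`) are hypothetical, and the sample meminfo text is made up for the demonstration.

```python
import re

def set_cma(bootargs_line: str, size: str) -> str:
    """Return the bootargs line with cma=<size>, replacing any existing cma= token."""
    key, sep, args = bootargs_line.partition("=")
    tokens = [t for t in args.split() if not t.startswith("cma=")]
    tokens.append(f"cma={size}")
    return f"{key}{sep}{' '.join(tokens)}"

def cma_total_kb(meminfo_text: str):
    """Extract CmaTotal (in kB) from /proc/meminfo text, or None if the field is absent."""
    match = re.search(r"^CmaTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None

# Example bootargs line taken from the instructions above, without a cma= token.
line = "bootargs=earlycon clk_ignore_unused root=/dev/mmcblk0p2 rw rootwait"
print(set_cma(line, "1500m"))

# Illustrative meminfo excerpt (values are invented for this sketch).
sample = "MemTotal:  4000000 kB\nCmaTotal:  1536000 kB\nCmaFree:   1500000 kB\n"
print(cma_total_kb(sample))
```

On a running board, the text passed to `cma_total_kb` would come from reading /proc/meminfo; the function is kept free of file I/O here so that it can be tried on any machine.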
Device Tree Binding
If the core is configured in the HW design, the device tree node is generated automatically by the Device Tree BSP.
The steps to generate the device tree are documented here:
http://www.wiki.xilinx.com/Build+Device+Tree+Blob
The DT properties, along with a sample binding, are described in Documentation/devicetree/bindings/clock/xlnx,vcu.txt
Known Issues
AR66763 - LogiCORE H.264/H.265 Video Codec Unit (VCU) - Release Notes and Known Issues for the Vivado 2017.3 tool and later versions
2022.2 Release
New Feature Support:
Dynamic Insertion of IDR frame in Low latency
Bug Fixes:
Fixed race condition when destroying the channel
Fixed wrong computation of iMaxSlices on the decoder side, which caused several channel variables to be computed incorrectly and the decoder to crash
Handled SDI input cable disconnect and reconnect scenarios based on video locking/unlocking event
Fixed NULL pointer dereference in ALSA framework while setting DAI data for xilinx DP audio based use cases
Known Issues:
S.No | Issue Description | Workaround | comments/AR link |
|---|---|---|---|
1 | Audio loss/distortion observed in audio-only and audio+video LLP2 serial pipelines during long runs | No workaround available | NA |
2 | Memory leak observed with decoder app | No workaround available | NA |
3 | Fps drops observed with below usecase: | No workaround available. | NA |
4 | The VCU decoder hangs and cannot recover by itself if the input stream exceeds its capability a few times | No workaround available. | NA |
5 | Broken bitstream observed with h264 parser on decoding specific streams | No workaround available. | NA |
2022.1 Release
New Feature Support:
DMA fd support for VCU Encoder output.
YUV444 support for encoder and decoder using Xilinx custom solution.
Bug Fixes:
Fix for memory leak issue that was observed with gstreamer based low-latency VCU pipelines.
Release reset for DP before accessing DP registers. Accessing the DP registers without releasing the reset causes the system to hang.
Known Issues:
S.No | Issue Description | Workaround | comments/AR link |
|---|---|---|---|
1 | Audio loss/distortion observed in audio-only and audio+video LLP2 serial pipelines during long runs | No workaround available | NA |
2 | SIGINT is not working properly while killing low-latency stream-out pipelines | Sending two consecutive Ctrl+C signals kills the pipeline | NA |
3 | Fps drops observed with below usecase: | No workaround available. | NA |
4 | The ctrlsw encoder application does not close properly after running repeatedly in a loop (1000+ iterations) | Sending another SIGINT_KILL or Ctrl+C closes the application | NA |
2021.2 Release
New Feature Support:
4 byte start code for all slices.
Bug Fixes:
Fixed: 4K HEVC Encoder with skip-frames enabled produces invalid bitstream syntax
Fixed: Scheduler changes for skip-frame issue
Fixed: MCU trace error reported when receiving RTSP stream
Fixed: Memory leak in gst-plugins-bad
Fixed: Decoder latency is high when receiving RTSP multicast stream
Fixed: DPDMA example shows black screen with 1080p display monitors
Fixed: Handling case where mixer is not able to scale to application requested dimensions
Fixed: Allow media pipeline enable with single dma start
Known Issues:
S.No | Issue Description | Workaround | comments/AR link |
|---|---|---|---|
1 | SIGINT is not working properly while killing low-latency stream-out pipelines | Sending two consecutive Ctrl+C signals kills the pipeline | NA |
2 | Fps drops observed with LLP2 2-4kp30 XV20 AVC serial use-case. | No workaround available. | NA |
3 | The ctrlsw encoder application does not close properly after running repeatedly in a loop (1000+ iterations) | Sending another SIGINT_KILL or Ctrl+C closes the application | NA |
4 | Audio loss/distortion observed in audio-only and audio+video LLP2 serial pipelines during long runs | No workaround available | NA |
5 | Interlaced file playback shows distortion on the Display in the long run | No workaround available | NA |
6 | PS_DP is not sending correct channel_status message for audio | No workaround available | NA |
2021.1 Release
New Feature Support:
Gstreamer version upgraded to 1.16.3
GRAY8/GRAY10 format support is added for VCU encoder/decoder at Gstreamer
Dynamic IDR insertion support is added for Pyramidal GOP
Added external CRTC (ex: PL video-mixer) support to PS-DP subsystem.
Uniform slice_type parameter support is added for VCU encoder
Vertical alignment/crop on input buffers is supported for VCU encoder (used in 486i to 480i conversion).
Bug Fixes:
Fixed V4l2 mem2mem driver reload/load issue.
Fixed gstreamer kmssink to display full screen mode for 4k wider monitors.
Fixed dma driver to capture and encode resolutions which are not aligned to 32, ex: 1400x1050.
Fixed overwriting SEI messages on BP and PT SEI messages.
Fixed kmssink to display planar 420 I420 video using PS_DP.
LLP2:
Fixed issues related to 720p resolution capture + encode use-case
Fixed Display Ripple effects and start/stop live sources on the fly.
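The DMA driver fix in the list above mentions resolutions that are not aligned to 32, with 1400x1050 as the example. The short sketch below only illustrates the arithmetic of that alignment check. It does not describe how the driver pads or crops buffers, and the source does not say whether the constraint applies to width, height, or stride; the helper names are hypothetical.

```python
def is_aligned(value: int, alignment: int = 32) -> bool:
    """True if value is an exact multiple of alignment."""
    return value % alignment == 0

def align_up(value: int, alignment: int = 32) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

width, height = 1400, 1050  # example resolution from the bug-fix list

print(is_aligned(width), is_aligned(height))  # neither dimension is a multiple of 32
print(align_up(width), align_up(height))      # next multiples of 32
```

For comparison, 1920 is a multiple of 32, so a 1920-wide source would pass the same check without any rounding.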
Known Issues:
S.No | Issue Description | Workaround | comments/AR link |
|---|---|---|---|
1 | SIGINT is not working properly while killing low-latency stream-out pipelines | Sending two consecutive Ctrl+C signals kills the pipeline | NA |
2 | VCU encoder MCU trace error is reported when receiving erroneous RTSP stream | No workaround available | Need to handle error concealment for such errors |
3 | Interlaced file playback shows distortion on the Display in the long run | No workaround available | NA |
4 | 4x1080p60 decode (PL_DDR) → display use-case shows occasional frame drops when B-frames are present in input video | Use b-frames=0 | NA |
2020.2 Release
New Feature Support:
HDR10 is supported for capture, VCU encode/decode and display at gstreamer level
Added max-consecutive-skip parameter to VCU encoder
Allows users to specify a maximum number of consecutive skipped frames
Interlaced video support added for SCD MM
NTSC 4:2:0 interlace support is added to VCU encoder/decoder
Interlaced video/audio support is validated
Added example for passing DMA buffers between the VCU and appsrc/appsink
Bug Fixes:
Fixed XAVC compliance errors
Fixed incorrect POC on skipped interlaced frames
Fixed LLP2 encoder latency issues
Fixed frame drop issue with AVC, HIGH Profile 4kp60 with num-slices=16
Fixed coverity check errors in V4L2 and DRM
Fixed gstreamer parser bugs which caused crashes with third party video files
Fixed issue with improper handling of prefix NALs
Known issues:
S.No | Issue description | Work around | Comments/AR link |
|---|---|---|---|
1 | 4x1080p60 decode (PL_DDR) → display use-case shows occasional frame drops when B-frames are present in input video | Use b-frames=0 | NA |
2 | LLP2: 1280x720 XV20 (422 10 bit) doesn't work | Fix available, AR will be published | NA |
3 | Memory leak when using v4l2src | Fix available, AR will be published | NA |
4 | LLP2: Switching live source resolutions on the fly, or removing and re-inserting the live source cable while a VCU encoder LLP2 pipeline is running, causes the next pipeline launch to hang. | Remove and re-insert the VCU kernel modules when it happens | NA |
5 | LLP2: Ripple effect on display | Fix available, AR will be published | NA |
2020.1 Release
New Feature Support:
Gstreamer version upgraded to 1.16.1
Added support for HDR10 metadata insertion and extraction at control software level
Enabled buffer metadata to indicate encoder frame-skip frame to application at control software level
Users can determine whether or not a frame is skipped by calling "AL_Buffer_GetMetaData()" and extracting the bSkipped flag
Added custom ROI delta-qp (roi-by-value) support to encoder.
LLP2 Video + Audio pipeline support is verified.
Bug Fixes:
Fixed gstreamer parser bugs which caused crashes with third party video files
Fixed picture timing SEI parsing
Fixed crash caused by incomplete AUs
Fixed interlaced transport stream parsing
Fixed gradual horizontal/vertical line sweep when using GDR mode
Provided custom gstreamer application (zynqmp_relative_qp_insertion) to improve visual quality by modifying delta-qp, alpha, and beta offsets
Fixed coverity check errors in control software
Fixed target bitrate in interlaced mode
XAVC SEI message buffer size is increased to avoid video hang/corruption for AVC encoder.
Known issues:
S.No | Issue description | Work around | Comments/AR link |
|---|---|---|---|
1 | 4x1080p60 decode (PL_DDR) → display use-case shows occasional frame drops when B-frames are present in input video | No workaround | NA |
2 | LLP2: Higher encoder latency is observed in the 4x 1080p60 use-case. An extra 3 to 4 msec is observed, which may lead to occasional frame drops or lower fps. | Use extra processing-deadline for the gst-pipeline sink; it may reduce frame drops to some extent | NA |
3 | Frame drops observed in serial/streaming use-case for AVC, HIGH Profile 4kp60 with num-slices=16. | Use num-slices=8 for slice encoding | NA |
4 | Switching live source resolutions on the fly, or removing and re-inserting the live source cable while a VCU encoder LLP2 pipeline is running, causes the next pipeline launch to hang. | Remove and re-insert the VCU kernel modules when it happens | NA |
2019.2 Release:
New Feature Support
ZDMA copy is supported for VCU Encoder output buffer reconstruction, which helps reduce CPU load for encode use-cases.
Added IntraMB forcing at block level (Encoder) through External QP table and ROI Map
Added 15 Bframes support in Pyramidal GOP structure
Dynamic resolution change support without port reconfiguration is added at the gstreamer level; the previous release supported it only at the vcu control-sw level.
Added sample gstreamer test application to showcase the 32x stream video transcoding use-case.
Added support for generating separate codec-config data for VCU encoder, useful in android framework
Added max-picture-sizes control based on Frame type <I, P, B>
Added support for the XAVC profiles below:
AL_PROFILE_XAVC_HIGH10_INTRA_CBG
AL_PROFILE_XAVC_HIGH10_INTRA_VBR
AL_PROFILE_XAVC_HIGH_422_INTRA_CBG
AL_PROFILE_XAVC_HIGH_422_INTRA_VBR
AL_PROFILE_XAVC_LONG_GOP_MAIN_MP4
AL_PROFILE_XAVC_LONG_GOP_HIGH_MP4
AL_PROFILE_XAVC_LONG_GOP_HIGH_MXF
AL_PROFILE_XAVC_LONG_GOP_HIGH_422_MXF
Added support for an external rate control plugin at MCU level
Enables users to develop their own rate control and plug it into the vcu firmware.
Added support for loading an external QP map at gstreamer level
Added support for transferring decoder pts/dts data using one-to-one buffer mapping instead of a FIFO at application level.
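To illustrate why a one-to-one buffer mapping can be preferable to a FIFO for carrying pts/dts, the sketch below compares the two schemes when one input frame never reaches the output. This is a conceptual model only, not the control software's implementation: the function names, the use of integer buffer ids, and the dropped-frame scenario are all assumptions made for the example.

```python
from collections import deque

def fifo_timestamps(input_pts, decoded_ids):
    """Pair each decoded frame with the next PTS popped from a FIFO."""
    queue = deque(input_pts)
    return {frame_id: queue.popleft() for frame_id in decoded_ids}

def mapped_timestamps(input_pts, decoded_ids):
    """Pair each decoded frame with the PTS stored under its own buffer id."""
    table = dict(enumerate(input_pts))  # hypothetical buffer id -> PTS table
    return {frame_id: table[frame_id] for frame_id in decoded_ids}

pts = [0, 33, 66, 100]  # PTS values attached to input buffers 0..3
decoded = [0, 2, 3]     # assumed scenario: buffer 1 produced no output frame

print(fifo_timestamps(pts, decoded))    # later frames inherit earlier timestamps
print(mapped_timestamps(pts, decoded))  # each frame keeps its own timestamp
```

In this model the FIFO hands frame 2 the timestamp that belonged to the missing frame, and every later frame is shifted as well, whereas the per-buffer table is unaffected by the gap.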