...

The following figure shows the data flow for the GStreamer pipeline of the stream-in use cases.

...

Block Diagram of Stream-in Pipeline

...

  1. The DPU is initialized and the DPU kernel is loaded using the libn2cube APIs - int dpuOpen(), DPUKernel *dpuLoadKernel(const char *networkName)

  2. The DPU GStreamer plugin receives the data frame from HDMI-Rx through a v4l2src plugin

  3. Create the DPU task - int dpuCreateTask(DPUKernel *kernel, int mode)

  4. Scale the input frame to 640x480 resolution using the Xilinx Scaler IP

  5. Convert the input frame data format from NV12 to BGR using the Xilinx Color Space Converter (CSC) soft IP

  6. Prepare the OpenCV image using the BGR data

  7. Pass the intermediate OpenCV image to the DPU - int dpuSetInputImage2(DPUTask *task, const char *nodeName, const cv::Mat &image, int idx=0)

  8. Run the DPU task - int dpuRunTask(DPUTask *task)

  9. Extract the ROI (face) coordinates from the DPU output

  10. Map the detected face coordinates to the original input frame resolution

  11. Fill the ROI metadata buffer using the extracted ROI (face) coordinates

  12. Pass the ROI metadata buffer and the input NV12 frame data buffer to the Xilinx VCU encoder

  13. Destroy the DPU task and kernel - int dpuDestroyTask(DPUTask *task), int dpuDestroyKernel(DPUKernel *kernel)

  14. Close the DPU - int dpuClose()
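
Steps 9 and 10 above, mapping ROI coordinates detected on the scaled 640x480 DPU input back to the original frame, amount to a simple rescaling. The following is a minimal sketch of that mapping; the function name and the 3840x2160 source resolution are illustrative assumptions, not part of the libn2cube API:

```python
def map_roi_to_source(roi, dpu_w=640, dpu_h=480, src_w=3840, src_h=2160):
    """Scale a face ROI (x, y, w, h) detected on the 640x480 DPU input
    back to the original input frame resolution."""
    sx = src_w / dpu_w  # horizontal scale factor (6.0 for 3840/640)
    sy = src_h / dpu_h  # vertical scale factor (4.5 for 2160/480)
    x, y, w, h = roi
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))

# A face at (100, 50) with size 64x48 on the 640x480 frame maps to
# (600, 225) with size 384x216 on a 3840x2160 frame.
print(map_roi_to_source((100, 50, 64, 48)))
```

The resulting coordinates are what step 11 writes into the ROI metadata buffer consumed by the VCU encoder.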

...

The DPU is a programmable engine dedicated to convolutional neural networks. The unit contains a register configure module, a data controller module, and a convolution computing module. There is a specialized instruction set for the DPU, which enables the DPU to work efficiently for many convolutional neural networks. The convolutional neural networks deployed on the DPU include VGG, ResNet, GoogLeNet, YOLO, SSD, MobileNet, FPN, etc.

...

The libn2cube, libdputils, and libhineon libraries from DNNDK are used to run the face detection use case on the DPU device. The GStreamer plugin uses the libn2cube library APIs to load the DPU kernel code and data into the DPU dedicated memory, and to create and run the DPU tasks.

...

The Deep Neural Network Development Kit (DNNDK) is a full-stack deep learning SDK for the Deep-learning Processor Unit (DPU). It provides a unified solution for deep neural network inference applications through pruning, quantization, compilation, optimization, and run-time support.

The below figure shows the data flow, starting from training a Machine Learning (ML) model to performing inference using the DPU.

...

Below is the sequence of processes executed to run inference on the DPU.

  1. The Machine Learning (ML) model is trained using the Caffe or TensorFlow ML framework on the input training data set

  2. Compression is performed on the trained model to achieve high throughput by reducing the memory bandwidth requirement

    1. Deep Compression Tool (DECENT) provided by DNNDK is used to perform the compression process

    2. The trained model is analyzed and pruning is performed to remove ineffective or marginally effective nodes from the model

    3. Quantization is performed to reduce the computing complexity without losing prediction accuracy by converting 32-bit floating-point weights and activation values to 8-bit integers

  3. The Deep Neural Network Compiler (DNNC) is used to perform model compilation, which maps the model to DPU instructions

    1. The front-end parser is responsible for parsing the Caffe/TensorFlow model and generating an intermediate representation (IR) of the input model

    2. The optimizer handles optimizations based on the IR

    3. The code generator maps the optimized IR to DPU instructions

    4. The Deep Neural Network Assembler (DNNAS) is responsible for assembling DPU instructions into ELF binary code

  4. The DPU loader handles the transfer of DPU kernels from the hybrid ELF executable into memory and dynamically relocates the memory of DPU code

  5. The libn2cube library provides APIs to load the DPU kernel code and data into the DPU dedicated memory, create and run the DPU tasks

  6. DPU drivers are provided by the DNNDK to interact with DPU hardware

  7. DSight is the DNNDK performance profiling tool. It is a visual performance analysis tool for neural network model profiling

  8. DExplorer tool provides DPU information like DPU running mode configuration, DPU status checking, DPU architecture version, DNNDK version, working frequency, DPU core numbers, etc.
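
The quantization performed in step 2c above can be illustrated with a minimal sketch of symmetric linear quantization to the int8 range. This is a simplified illustration of the idea only, not the actual DECENT algorithm; per-layer calibration and activation quantization are omitted:

```python
def quantize_int8(weights):
    """Symmetric linear quantization of float weights to the int8 range,
    mapping the largest absolute weight to 127 (illustrative sketch)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
print(q)                                  # quantized int8 values
print([round(v * scale, 4) for v in q])   # dequantized approximation
```

Storing 8-bit integers plus one scale factor instead of 32-bit floats is what reduces the memory bandwidth requirement mentioned in step 2.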

...

            Note: The hardware configuration step opens a pop-up window with settings; choose Save & Exit

...

The vcu_gst_app and supporting libraries are built by the "vcu-gst-app" recipe inside the PetaLinux project. Refer to the "project-spec/meta-user/recipes-apps/vcu-gst-app" directory inside the PetaLinux project for the vcu-gst-app recipe. The source of vcu_apm_lib, vcu_video_lib, vcu_gst_lib, and vcu_gst_app is provided as a zip file inside the "project-spec/meta-user/recipes-apps/vcu-gst-app/files/" directory. vcu_gst_app is built as part of the PetaLinux project and the executable is placed in the /usr/bin location of the rootfs. Users can update the zip file if any source code modifications are needed and run the following command to build the vcu-gst-app recipe.

Code Block
% petalinux-build -c vcu-gst-app

Note:

  • Modify the value of the Enable Roi parameter to FALSE in the input.cfg file to run the pipeline without ROI.

  • The pipeline is able to achieve 23 FPS with 4kp30 and 30 FPS with 1080p30.

  • 60 FPS pipelines are supported, but FPS drops are observed.

  • Run-time enablement/disablement of ROI is not supported.

...

L2 Cache:
Enable or disable the L2 cache buffer in the encoding process.
Options: True, False
Latency Mode:
Encoder latency mode.
Options: Normal, sub_frame
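
Putting the parameters described above together with the Enable Roi parameter, a fragment of input.cfg might look like the following. This is a hypothetical illustration only; the exact keys, values, and syntax are defined by the input.cfg file shipped at /media/card/config/, which should be treated as the reference:

```
Enable Roi    : TRUE        # set to FALSE to run the pipeline without ROI
L2 Cache      : True        # enable the L2 cache buffer in the encoder
Latency Mode  : Normal      # or sub_frame
```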

...

Code Block
% xmedia-ctl -p -d /dev/media0

Make sure the HDMI-Rx media pipeline is configured for 4kp30 resolution and the source and sink have the same color format. Run the below xmedia-ctl commands to set the resolution and format of the HDMI scaler node.

When HDMI Input Source is NVIDIA SHIELD

...

  • Run vcu_gst_app for current HDMI resolution (1080p30) by executing the following command.

    Code Block
    $ vcu_gst_app /media/card/config/input.cfg
  • Change Resolution of HDMI Input Source from 1080p30 to 4kp30 by following the below steps.

    • Set the HDMI source resolution to 4kp30 (Homepage → settings → display & sound → Advanced settings → HDMI settings → HDMI display modes → change to 4kp30)

    • Save the configuration for the change to take effect

  • Verify the desired HDMI Input Source Resolution (4kp30) by following the above-mentioned steps

...

Run the following gst-launch-1.0 command to display a pass-through pipeline.

Code Block
$ gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! video/x-raw, width=3840, height=2160, format=NV12, framerate=30/1 ! queue ! kmssink bus-id="a0060000.v_mix"

Run the following gst-launch-1.0 command to run the ROI use case in the RAW pipeline.

...