Video Framebuffer Read

Table of Contents

Overview
Location
Supported IP Features
Unsupported IP Features
Known Issues
Kernel Configuration
Device Tree Configuration
Interfacing with the Video Framebuffer Driver from DMA Clients
DMA Interleaved Template Requirements
struct dma_interleaved_template:
struct data_chunk:
Driver Operation
Test Approach

Overview

The Video Framebuffer Read IP core is designed for video applications that require frame buffers, and provides high-bandwidth access between the AXI4-Stream video interface and the AXI4 interface.

Location

The driver is currently located in a special branch of the standard Xilinx Linux kernel: https://github.com/Xilinx/linux-xlnx/tree/2017.3_video_ea

Supported IP Features

The following IP features are supported by the driver and have been verified within the context of existing reference designs:
1. Streaming Video Format Support: RGB, YUV 4:2:2, YUV 4:4:4, YUV 4:2:0
2. Memory Video Format Support: RGB8, BGRX8, RGBX8, YUYV8, YUVX8, RGBX10, YUVX10, Y_UV8, Y_UV8_420, UYVY8, YUV8, Y_UV10, Y_UV10_420, Y8, Y10
3. Programmable memory video format
4. Support for 8-bit or 10-bit per color component on stream or memory interface
5. Resolutions up to 3840x2160

Unsupported IP Features

The following IP features either have no driver support or have not yet been verified in any existing technical reference design:
1. Resolutions above 3840x2160, up to 8192x4320

Known Issues

When a client initiates DMA operations, the hardware is placed in "autorestart" mode. After the last buffer has been returned to the client as "completed", if the client neither supplies a new buffer location to read from nor halts the driver, the driver continues to use the last buffer location programmed. In effect, the driver "spins" on the last programmed location, reading the same buffer every frame.

Kernel Configuration

The driver must be enabled in the kernel by selecting the option CONFIG_XILINX_FRMBUF.



Device Tree Configuration

Comprehensive documentation regarding device tree configuration can be found in <linux_root>/Documentation/devicetree/bindings/dma/xilinx/xilinx_frmbuf.txt

Below is a device tree example for a Framebuffer Read instance configured with 32-bit DMA address pointers and support for the RGB8 and RGBX8 memory formats:
v_frmbuf_rd_0: v_frmbuf_rd@80000000 {
        #dma-cells = <1>;
        compatible = "xlnx,axi-frmbuf-rd-v2";
        interrupt-parent = <&gic>;
        interrupts = <0 92 4>;
        reset-gpios = <&gpio 80 1>;
        reg = <0x0 0x80000000 0x0 0x10000>;
        xlnx,dma-addr-width = <32>;
        xlnx,vid-formats = "bgr888","xbgr8888";
};

Interfacing with the Video Framebuffer Driver from DMA Clients

The Linux driver for the Framebuffer Read IP implements the Linux DMA Engine interface semantics for a single-channel DMA controller. Because the IP is video-format aware, it has capabilities that the DMA Engine interface does not fully cover. For this reason, the Video Framebuffer driver also exports an API (see <linux_root>/include/linux/dma/xilinx_frmbuf.h) that DMA clients must use, in addition to the Linux DMA Engine interface, to program the IP correctly.

The general steps for preparing the DMA to read from a specific memory buffer are:
1. Using the Video Framebuffer API, configure the DMA device with the expected memory format to read
2. Prepare an interleaved template describing the buffer location (note: see section DMA Interleaved Template Requirements below for more details)
3. Pass the interleaved template to the DMA device using the Linux DMA Engine interface
4. Add a callback to the DMA descriptor returned in step 3, then submit the descriptor to the DMA device via the DMA Engine interface
5. Start the DMA read operation
6. Terminate the DMA read operation when the client deems frame processing complete
/* Abstract DRM Client Code Example */
/* to_frmbuf_dma_chan(), xvmixer_drm_fb_get_gem_obj(), drm_framebuff,
 * dma_complete() and buf stand in for client-specific code */

struct dma_chan *frmbuf_dma = to_frmbuf_dma_chan(xdev);
struct dma_interleaved_template *dma_tmplt;
struct dma_async_tx_descriptor *desc;
dma_addr_t addr = xvmixer_drm_fb_get_gem_obj(drm_framebuff, 0);
u32 flags = DMA_PREP_INTERRUPT | DMA_CTRL_ACK;

/* sgl[] is a flexible array member; allocate room for one data_chunk */
dma_tmplt = kzalloc(sizeof(*dma_tmplt) + sizeof(struct data_chunk), GFP_KERNEL);
if (!dma_tmplt)
        return -ENOMEM;

/* Step 1 - Configure the DMA channel to read out packed RGB */
xilinx_xdma_drm_config(frmbuf_dma, DRM_FORMAT_BGR888);

/* Step 2 - Describe the buffer attributes for a 1080p frame */
dma_tmplt->dir = DMA_MEM_TO_DEV;
dma_tmplt->src_sgl = true;
dma_tmplt->dst_sgl = false;
dma_tmplt->src_start = addr;
dma_tmplt->frame_size = 1; /* single plane pixel format */
dma_tmplt->numf = 1080; /* 1920x1080 frame */

dma_tmplt->sgl[0].size = 5760; /* 3 bytes/pixel x 1920 pixels */
dma_tmplt->sgl[0].icg = 0;

/* Step 3 - Pass the buffer description to the DMA channel */
desc = dmaengine_prep_interleaved_dma(frmbuf_dma, dma_tmplt, flags);
kfree(dma_tmplt); /* the template is only read during the prep call */
if (!desc)
        return -EINVAL;

/* Step 4 - Attach a completion callback, then submit the descriptor */
desc->callback = dma_complete;
desc->callback_param = buf;
dmaengine_submit(desc);

/* Step 5 - Start the memory-to-device (read) operation */
dma_async_issue_pending(frmbuf_dma);

/* Step 6 - Halt DMA when the required frame processing has completed */
dmaengine_terminate_all(frmbuf_dma);
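The row arithmetic in step 2 applies to any packed format: size is the number of image bytes in one line, icg is the padding that brings a line up to the buffer's stride, and numf is the frame height. The following user-space C sketch computes those fields from the frame geometry. struct packed_geom and packed_template_fields() are hypothetical names that only mirror the template members; they are not part of the driver or the DMA Engine API, and the caller is assumed to know the bytes per pixel of the chosen memory format (3 for RGB8, as in the example above).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space mirror of the fields a packed-format client sets */
struct packed_geom {
	size_t numf;       /* lines per frame, i.e. the frame height */
	size_t frame_size; /* 1 for packed (single-plane) formats */
	size_t size;       /* image bytes per line, i.e. sgl[0].size */
	size_t icg;        /* padding bytes per line, i.e. sgl[0].icg */
};

/*
 * Derive the template fields for a packed frame from its geometry.
 * stride is the distance in bytes from the start of one line to the next.
 * Returns 0 on success, or -1 if stride is too small to hold one line.
 */
static int packed_template_fields(size_t width, size_t height,
				  size_t bytes_per_pixel, size_t stride,
				  struct packed_geom *g)
{
	size_t line_bytes;

	assert(g != NULL && bytes_per_pixel > 0);
	line_bytes = width * bytes_per_pixel;
	if (stride < line_bytes)
		return -1;

	g->numf = height;
	g->frame_size = 1;
	g->size = line_bytes;
	g->icg = stride - line_bytes; /* unused bytes at the end of each line */
	return 0;
}
```

Called as packed_template_fields(1920, 1080, 3, 5760, &g), it reproduces the values from the example (numf = 1080, size = 5760, icg = 0). With lines padded to a 6144-byte stride, size stays at 5760 and icg becomes 384.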

DMA Interleaved Template Requirements

The Video Framebuffer IP supports two DMA address pointers for semi-planar formats: one for luma and one for chroma. As such, data for the two planes need not be strictly contiguous, which permits alignment of plane data within a larger buffer. However, all frame data (luma and chroma) must be contained within a single contiguous frame buffer, and the luma plane data must come before the chroma data. Note that this is currently a limitation of the driver, not of the IP. When preparing a struct dma_interleaved_template instance to describe a semi-planar format, set the following members:

linux/dmaengine.h:

struct dma_interleaved_template:

src_start = <physical address from which to start reading frame data (any offsets should be added to this value)>
dir = DMA_MEM_TO_DEV
src_sgl = true
dst_sgl = false
numf = <height of frame in pixels; height of luma frame for semi-planar formats>
frame_size = <1 or 2 depending on whether this is describing a packed or semi-planar format>
sgl = <see struct data_chunk below>

struct data_chunk:

sgl[0].size = <number of bytes devoted to image data for a row>
sgl[0].icg = <number of non-data bytes within a row of image data; padding>
sgl[0].src_icg = <the offset in bytes from the end of the luma plane data to the start of the chroma plane data; only needed for semi-planar formats>

Below is a code example for semi-planar YUV 4:2:2 (i.e., NV16):
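/* luma_addr and chroma_addr stand in for the DMA addresses of the two planes */
struct dma_interleaved_template *dma_tmplt;
size_t frame_height, stride;

/* sgl[] is a flexible array member; allocate room for one data_chunk */
dma_tmplt = kzalloc(sizeof(*dma_tmplt) + sizeof(struct data_chunk), GFP_KERNEL);
if (!dma_tmplt)
        return -ENOMEM;

/* Step 1 - Configure the DMA channel to read out semi-planar YUV 4:2:2 */
xilinx_xdma_drm_config(frmbuf_dma, DRM_FORMAT_NV16);

/* Step 2 - Describe the buffer attributes for a 1080p frame */
dma_tmplt->dir = DMA_MEM_TO_DEV;
dma_tmplt->src_sgl = true;
dma_tmplt->dst_sgl = false;
dma_tmplt->src_start = luma_addr;
dma_tmplt->frame_size = 2; /* two plane pixel format */
dma_tmplt->numf = 1080; /* height of luma frame */

dma_tmplt->sgl[0].size = 1920; /* 1 byte/pixel x 1920 pixels for Y plane */
dma_tmplt->sgl[0].icg = 0;

frame_height = dma_tmplt->numf;
stride = dma_tmplt->sgl[0].size + dma_tmplt->sgl[0].icg;

/* Gap between the end of the luma plane and the start of the chroma plane */
dma_tmplt->sgl[0].src_icg = chroma_addr - luma_addr - (frame_height * stride);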
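The src_icg computation above can be paired with a check for the layout constraints stated at the start of this section: the luma plane comes first, and the chroma plane must not start inside it. This is a user-space sketch; semiplanar_src_icg() is a hypothetical helper, and DMA addresses are modeled as uint64_t rather than dma_addr_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute sgl[0].src_icg for a semi-planar frame: the gap in bytes between
 * the end of the luma plane and the start of the chroma plane.
 * numf is the luma height in lines; stride is size + icg for one line.
 * Returns -1 if the chroma plane does not follow the complete luma plane.
 */
static int64_t semiplanar_src_icg(uint64_t luma_addr, uint64_t chroma_addr,
				  uint64_t numf, uint64_t stride)
{
	uint64_t luma_end;

	assert(stride > 0);
	luma_end = luma_addr + numf * stride;
	if (chroma_addr < luma_end)
		return -1; /* chroma before or overlapping luma: not allowed */
	return (int64_t)(chroma_addr - luma_end);
}
```

For the NV16 frame above (numf = 1080, stride = 1920), the luma plane occupies 1080 x 1920 = 2073600 bytes, so a chroma plane starting at luma_addr + 2073600 gives src_icg = 0, while aligning it 4096 bytes later gives src_icg = 4096.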

Driver Operation

The Framebuffer driver manages buffer descriptors in software, keeping each one in one of four possible states, which it passes through in the following order:
1. pending
2. staged
3. active
4. done

When a DMA client calls dmaengine_submit(), the buffer descriptor is placed in the driver’s “pending” queue. The client may queue multiple buffers this way before proceeding to the next step (see step 4 of Interfacing with the Video Framebuffer Driver from DMA Clients). When dma_async_issue_pending() is called (step 5), the driver begins processing the buffers on the “pending” list.

A buffer is plucked from the pending list and marked “staged”, and the driver programs the IP registers with the values from the “staged” buffer descriptor. During normal processing (i.e., for all frames except the first), these values do not take effect until the frame currently being processed completes. As such, there is a one-frame delay between programming the registers and the IP actually reading that buffer from memory; hence the term “staged” for this part of the buffer lifecycle.

When the current frame completes, the staged descriptor is reclassified as “active”, and a new descriptor is plucked from the pending list, marked “staged”, and programmed into the IP registers as described above. The “active” buffer is the one whose data is currently being read from memory; other than holding it in this state, the driver takes no action on it. When the active frame completes, the descriptor is moved to the “done” list. A tasklet, scheduled at the end of the frame interrupt handler, processes the descriptors on the done list by removing them from the list and invoking any callback the client has attached to each descriptor.

This completes the lifecycle of a buffer descriptor. Because a descriptor can be in any of four states, it is best to allocate at least four buffers to maintain consistent frame processing. Fewer buffers leave gaps in the driver’s buffer pipeline, so the frame data in a given buffer may be read more than once (how often depends on how few buffers are queued and how many gaps result).
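The lifecycle can be modeled in a few lines of user-space C. This is a simplified sketch of the behavior described above, not the driver's code: struct frmbuf_model, model_submit(), model_issue_pending() and model_frame_done() are hypothetical names, buffers are plain integer IDs, and the tasklet's callback step is reduced to appending the ID to a done array. When the pending queue is empty at a frame-done event, nothing new is staged, which corresponds to the hardware continuing to read the last programmed buffer (see Known Issues).

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_BUFS 16
#define MODEL_NONE (-1)

/* Hypothetical model of the driver's descriptor bookkeeping for one channel */
struct frmbuf_model {
	int pending[MODEL_MAX_BUFS]; /* FIFO of submitted buffer IDs */
	size_t head, count;
	int staged;                  /* programmed, takes effect next frame */
	int active;                  /* being read from memory this frame */
	int done[MODEL_MAX_BUFS];    /* completed; the tasklet would run callbacks */
	size_t ndone;
};

static void model_init(struct frmbuf_model *m)
{
	m->head = m->count = m->ndone = 0;
	m->staged = m->active = MODEL_NONE;
}

/* dmaengine_submit(): the descriptor goes to the tail of the pending queue */
static void model_submit(struct frmbuf_model *m, int id)
{
	assert(m->count < MODEL_MAX_BUFS);
	m->pending[(m->head + m->count) % MODEL_MAX_BUFS] = id;
	m->count++;
}

/* Pluck the oldest pending buffer, or MODEL_NONE if the queue is empty */
static int model_pluck(struct frmbuf_model *m)
{
	int id;

	if (m->count == 0)
		return MODEL_NONE;
	id = m->pending[m->head];
	m->head = (m->head + 1) % MODEL_MAX_BUFS;
	m->count--;
	return id;
}

/* dma_async_issue_pending(): stage the first pending buffer */
static void model_issue_pending(struct frmbuf_model *m)
{
	if (m->staged == MODEL_NONE)
		m->staged = model_pluck(m);
}

/* Frame-done interrupt: active -> done, staged -> active, pending -> staged */
static void model_frame_done(struct frmbuf_model *m)
{
	if (m->active != MODEL_NONE) {
		assert(m->ndone < MODEL_MAX_BUFS);
		m->done[m->ndone++] = m->active;
	}
	m->active = m->staged;
	m->staged = model_pluck(m); /* MODEL_NONE: nothing new is programmed */
}
```

Submitting buffers 0 to 3, calling model_issue_pending() and then model_frame_done() four times leaves buffer 3 active, buffers 0 to 2 on the done list, and nothing staged. The first-frame exception described above is not modeled.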


Test Approach

Testing the Framebuffer Read driver is best done with the driver incorporated into a larger design targeting display output. Refer to the test procedure for the Video Mixer.

In particular, run test #6 (change output resolution).

Additionally, run modetest with the -v argument to change the output resolution; -v results in page flipping on the primary plane:
root@mixer_proj:~# modetest -M xilinx_drm_mixer -s 37:640x480@BG24 -v
setting mode 640x480-75Hz@BG24 on connectors 37, crtc 35
select timed out or error (ret 0)
freq: 7.20Hz
freq: 15.00Hz
freq: 15.00Hz
freq: 15.00Hz
freq: 15.00Hz
freq: 15.00Hz
 
The reported frequency should be only a fraction of the current refresh rate (approximately 1/4 or lower; the log above shows 15 Hz for a 75 Hz mode). This is because modetest creates only a single framebuffer, while the Video Framebuffer driver requires four (4) buffers for optimal operation.