Linux Multichannel DMA from User Space

This article provides instructions for creating a first user space application targeting the Multichannel DMA IP.

Introduction

FPGAs have been providing DMA engines in the Programmable Logic for many years. To add to the existing DMA offering, Xilinx added a Multichannel DMA IP into the Xilinx IP Catalog to provide additional channels. Additional DMA channels provide a hardware segregation of data that ultimately provides more efficient management of data streams. This leads to an overall savings of Programmable Logic, while maintaining data movement throughput.

This prototype is the multichannel update to the design from the Linux DMA From User Space 2.0 article. Many of the major concepts carry over from the previous design, but it is now enhanced to support multiple channels with the Xilinx MCDMA IP. The same ZCU102 board is used with no additional hardware.

Background Knowledge

The AXI Multichannel Direct Memory Access IP provides AXI Stream RX (Slave) and TX (Master) interfaces, including the TDEST sideband signal. The TDEST signal indicates the destination channel in hardware. Each descriptor generated in the kernel uses the TDEST field in the descriptor to distinguish the channel.

The xilinx_dma Linux driver is able to interface to the AXI MCDMA IP. This driver provides support for the Linux DMAEngine framework. This article also makes use of the proxy driver introduced in the Linux DMA From User Space article, which allows a user space application to interface as closely as possible to the hardware. The proxy driver has been updated to include multichannel support.

Why Multichannel?

As FPGAs increase in size, the number of data sources and destinations tends to grow as well. While it is possible to add a new DMA for each data source and destination, it is often sufficient (from a bandwidth perspective) to have only one DMA and to channelize the data. This saves resources in the FPGA while still making it possible to identify the data from software and hardware. The resource savings improve as data widths increase, which is also a trend.

Another benefit from additional channels is the ability to prioritize the data coming to the DMA. The prioritization of channels is highlighted in the AXI MCDMA PG288.

System Requirements and Assumptions

The prototype differs slightly from the AXI DMA designs in the following ways:

  • Removing the loopback connection between TX and RX.

  • The RX path has multiple data generators intended to provide a simple, identifiable data pattern to the user based on the chosen channel. If the DMA channel is not set up to receive data, the RX path will drop the data. The data generators feed into an AXI4-Stream Switch IP before heading directly into the MCDMA IP.

  • The TX path provides a terminated path that can be viewed with an ILA, if desired. The tready signal is tied high. Filtering on TDEST would show the data stream on each channel.

  • The RX interface of the MCDMA never throttles. This means when a channel provides data and the channel is not enabled, the data will be dropped. This is done intentionally by the IP so a channel will not throttle the rest of the channels. However, care must be taken if all RX data needs to be captured.

This article also doesn’t walk a user through the entire flow; it relies on PetaLinux experience for adding an application and/or module into a running Linux system. Modifying the device tree is also required.

Hardware

The Vivado design is built entirely within IPI with IPs from the Xilinx IP Catalog. Below is a screenshot to show the connections:


Data Generators

The 4 datagen blocks are meant to mimic stream sources. They are continuously streaming data and are throttled back by the switch. Below is an image of the stream circuits showing how they generate streaming data.

The binary counters create the identifiable data (incremented or decremented data), while the subset converter creates the tlast pulse to the stream.
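The behavior of a data generator can be modeled in software. Below is a minimal C sketch of the counter-plus-tlast pattern described above; the struct and function names, as well as the beat and packet sizes, are illustrative and not taken from the design:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One AXI4-Stream beat as produced by a data generator: the binary
 * counter supplies tdata, and the subset converter asserts tlast on
 * the final beat of each packet. */
struct stream_beat {
    uint32_t tdata;
    bool     tlast;
};

/* Fill 'beats' with 'len' beats of incrementing data starting at
 * 'start', asserting tlast on the last beat of the packet. */
void generate_packet(struct stream_beat *beats, uint32_t start, int len)
{
    for (int i = 0; i < len; i++) {
        beats[i].tdata = start + i;      /* incrementing counter value */
        beats[i].tlast = (i == len - 1); /* packet boundary for the switch */
    }
}
```

A decrementing generator would simply count down instead; either way, the data value identifies the source, and tlast marks the packet boundary that the switch arbitrates on.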

All of the data generators are routed to an AXI4-Stream Switch IP to manage the routing of the streams. Below are the options selected. It is important for the switch to arbitrate on TLAST, since the MCDMA does not support interleaving of data streams.

Another consideration when using the AXI4-Stream Switch is the “Arbitrate on number LOW TVALID cycles” option. In this prototype, the data is constantly streaming in, so TVALID will not go low. However, if a custom hardware design anticipates TVALID going low for many cycles, the data may need to be buffered prior to entering the switch to prevent arbitration in the middle of a packet.


Interrupt Lines

Each channel from the MCDMA has one interrupt line and they are routed to the pl_ps_irq lines within the processing system.

MCDMA Options

Below is a screenshot of the MCDMA options:

The advanced tab was left as defaults.

System ILA

The system ILA IP was placed in the design to be able to observe the hardware as transfers are made. The Vivado Hardware Manager must be used to interface with the ILA IP. Setting triggers on the data stream enables a user to see what the data looks like in hardware while the software is running.

Source Files

The Vivado IPI project was built with Vivado catalog IPs and can be recreated with the attached Tcl script in Vivado 2021.1:

Software

The same git repo is used in this wiki article as in the Linux DMA from User Space 2.0 article. The default files in the repo target a 1-channel design; however, they are intended to be modified to allow for more channels.

Device Tree

In a PetaLinux project, the system-user.dtsi is modified as shown below. These modifications expect the user to use 3 TX and 3 RX channels. The hardware implemented above is capable of supporting 4 channels; however, only 3 are used for demonstration purposes.

/include/ "system-conf.dtsi" / { dma_proxy { compatible ="xlnx,dma_proxy"; dmas = <&axi_mcdma_0 0 &axi_mcdma_0 16 &axi_mcdma_0 1 &axi_mcdma_0 17 &axi_mcdma_0 2 &axi_mcdma_0 19> ; dma-names = "dma_proxy_tx_0", "dma_proxy_rx_0", "dma_proxy_tx_1", "dma_proxy_rx_1", "dma_proxy_tx_2", "dma_proxy_rx_2"; } ; }; &axi_mcdma_0 { #dma-cells = <1>; };

The #dma-cells update to the mcdma node is needed due to a tool bug with the device tree generator in 2021.1. This should be fixed in a later release.
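The channel indices in the dmas property follow the xilinx_dma driver’s MCDMA numbering, where TX (MM2S) channels occupy IDs 0–15 and RX (S2MM) channels occupy IDs 16–31. A small C helper makes that mapping explicit; the function names here are illustrative, not part of any driver API:

```c
#include <assert.h>

/* MCDMA channel IDs as used in the device tree dmas property:
 * TX (MM2S) channels are numbered 0-15, RX (S2MM) channels 16-31. */
#define MCDMA_RX_ID_BASE 16

int mcdma_tx_id(int channel) { return channel; }
int mcdma_rx_id(int channel) { return MCDMA_RX_ID_BASE + channel; }
```

So channel pair 2 above uses IDs 2 (TX) and 18 (RX), matching the third tx/rx entry pair in the dmas property.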

Proxy Driver

The proxy driver can be used as-is from the repo here:

https://github.com/Xilinx-Wiki-Projects/software-prototypes.git

Proxy Application

The proxy example application is located in the same git repo. However, it must be modified to utilize the additional channels. In the example, we’ll keep the application consistent with the number of channels described in the device tree. Below we update the names of the DMA Channels to match the names in the device tree.

#define TX_CHANNEL_COUNT 3
#define RX_CHANNEL_COUNT 3

const char *tx_channel_names[] = { "dma_proxy_tx_0", "dma_proxy_tx_1", "dma_proxy_tx_2" };
const char *rx_channel_names[] = { "dma_proxy_rx_0", "dma_proxy_rx_1", "dma_proxy_rx_2" };
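Each of these names corresponds to a character device node that the proxy driver creates and the application opens (and then mmaps for its channel buffers). A condensed sketch of building those device paths, with the helper function name being illustrative rather than code from the repo:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TX_CHANNEL_COUNT 3
#define RX_CHANNEL_COUNT 3

static const char *tx_channel_names[] = { "dma_proxy_tx_0", "dma_proxy_tx_1", "dma_proxy_tx_2" };
static const char *rx_channel_names[] = { "dma_proxy_rx_0", "dma_proxy_rx_1", "dma_proxy_rx_2" };

/* Build the /dev node path for a proxy channel name, e.g.
 * "dma_proxy_rx_1" -> "/dev/dma_proxy_rx_1". The application
 * opens each of these paths to access the corresponding channel. */
int build_channel_path(char *out, size_t out_len, const char *name)
{
    int n = snprintf(out, out_len, "/dev/%s", name);
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

The key point is that the names in these arrays must match the dma-names entries in the device tree exactly, since the proxy driver uses them to create its device nodes.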

Running the Application

Inserting the Kernel Module

Before running the application, make sure the dma-proxy module is loaded in the kernel. Check with lsmod; if it’s not there, insert the module with either insmod or modprobe. Below is the output:

root@xilinx-zcu102-2021_1:~# modprobe dma-proxy
[ 372.520904] dma_proxy module initialized
[ 372.524844] Device Tree Channel Count: 6
[ 372.528858] Creating channel dma_proxy_tx_0
[ 372.535548] Allocating memory, virtual address: 0000000011FE5000 physical address: 000000006D200000
[ 372.544604] Creating channel dma_proxy_rx_0
[ 372.550621] Allocating memory, virtual address: 00000000123EE000 physical address: 000000006D700000
[ 372.559678] Creating channel dma_proxy_tx_1
[ 372.565959] Allocating memory, virtual address: 000000001417D000 physical address: 000000006DC00000
[ 372.575004] Creating channel dma_proxy_rx_1
[ 372.581010] Allocating memory, virtual address: 0000000014586000 physical address: 000000006E100000
[ 372.590054] Creating channel dma_proxy_tx_2
[ 372.596014] Allocating memory, virtual address: 000000001498F000 physical address: 000000006E600000
[ 372.605059] Creating channel dma_proxy_rx_2
[ 372.610851] Allocating memory, virtual address: 0000000014D98000 physical address: 000000006EB00000

Running the Application

After verifying the proxy driver is in the kernel, run the application. Below is the expected output. Be sure not to enable the verify feature when using this example design. The verify option will not work with this hardware implementation because the TX channel is not looped back to the RX channel.

Please refer to the Linux DMA From User Space article for a description of the arguments.

root@xilinx-zcu102-2021_1:~# dma-proxy-test 10000 128
DMA proxy test
Verify = 0
Time: 125905 microseconds
Transfer size: 1280000 KB
Throughput: 10410 MB / sec
DMA proxy test complete
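The reported throughput follows from the transfer size and elapsed time: with the transfer size counted in KB of 1024 bytes and MB taken as 10^6 bytes, one byte per microsecond equals one MB per second. A sketch of that arithmetic (the helper name is illustrative, not code from the test application):

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in MB/s: total bytes divided by elapsed microseconds,
 * since 1 byte/us == 1 MB/s when MB is taken as 10^6 bytes. */
uint64_t throughput_mb_per_sec(uint64_t transfer_kb, uint64_t time_us)
{
    return (transfer_kb * 1024) / time_us;
}
```

Plugging in the numbers above, 1280000 KB over 125905 microseconds gives the reported 10410 MB/sec.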

After running the application, the two completed tasks were:

  • The 3 RX channels received data into user space from the MCDMA.

  • The 3 TX channels transmitted data from user space to the MCDMA TX AXI stream ports.

Conclusion

The dma-proxy driver and the dma-proxy-test application have been enhanced to support interfacing with multiple channels of the MCDMA IP. Supporting multiple channels provides a simple method in software to distinguish the various streaming sources identified by the TDEST signal on the AXI4-Stream interface.

© Copyright 2019 - 2022 Xilinx Inc.