Versal and Versal AI Edge Series Gen 2 SMMU Usage Guidance


Introduction

The SMMU does for I/O devices what an MMU does for processors: it translates addresses issued by system I/O masters, such as DMA controllers. With the SMMU enabled, these masters use virtual addresses rather than physical addresses. The SMMU is transparent to Linux device drivers, because the DMA framework knows how to handle it. The SMMU is equivalent to the IOMMU used in other system architectures.

This page is not intended to be a tutorial about the SMMU. For a more detailed understanding of Versal Gen 1, refer to other documents such as the Versal Technical Reference Manual. For a more complete understanding of SMMU operation, refer to ARM documents such as the ARM System Memory Management Unit Architecture Specification and the ARM Cortex-A Series Programmers Guide.

The primary focus of this page is to explain how to enable and use the SMMU on Versal devices, from both a hardware and a software perspective.

Enabling SMMU

By default, the SMMU is disabled. This section describes the SMMU-specific steps needed to enable it.

Hardware Requirements

For FPD, changes are not required in the hardware design. Only the LPD masters need the changes outlined below at the hardware design stage.

Routing LPD I/O Master's Traffic Through SMMU (Changes in Hardware Design)

An LPD master's traffic by default takes the LPD path, which skips SMMU.  In order to manage a specific LPD I/O master device through SMMU, its traffic must be routed through SMMU.  The steps to enable that option are shown below:

Design Changes

  • To route peripheral traffic through the SMMU, the routing bit must be set to FPD. This can be done in two ways.

  • Generate the design using the Vivado Integrated Design Environment (IDE) (for details, see [1])

    • Create Vivado Init file

      $ vi /home/<username>/.Xilinx/Vivado/Vivado_init.tcl

      Add the contents below to enable the isolation-related features:

      set_param bd.isolation true
      set_param gui.test isolation_gui

    • Modify Block Design
      Select The "Isolation Editor" Option.

      image-20260204-082452.png

       

      Enable Coherent & Virtualization under Edit flag

      image-20260204-082645.png

       

      In this example it is enabled only for GEM (the same can be done for other peripherals)

      image-20260204-083057.png
    • Generate the device images with the usual steps. To confirm, read back the registers listed below and check that the FPD routing bit is set.

  •  Another way to enable the SMMU for an LPD DMA master is to add CDO commands that program the relevant registers so that the LPD master’s traffic is routed through FPD, and to ensure that this FPD‑routing setting is enabled for that specific DMA master.

    Versal Gen 1

    Sr. No | IP   | AXI Write Protection        | AXI Read Protection         | Enable routing to FPD
    1      | GEM0 | Reg: 0xFF0A0000, Value: 0x2 | Reg: 0xFF0A0004, Value: 0x2 | Reg: 0xFF080328, Value: 0x1
    2      | GEM1 | Reg: 0xFF0A0010, Value: 0x2 | Reg: 0xFF0A0014, Value: 0x2 | Reg: 0xFF080348, Value: 0x1
    3      | SD0  | Reg: 0xF1070000, Value: 0x2 | Reg: 0xF1070004, Value: 0x2 | Reg: 0xF1060464, Value: 0x1
    4      | SD1  | Reg: 0xF1070010, Value: 0x2 | Reg: 0xF1070014, Value: 0x2 | Reg: 0xF10604E4, Value: 0x1
    5      | QSPI | Reg: 0xF1070020, Value: 0x2 | N/A                         | Reg: 0xF106050C, Value: 0x1
    6      | OSPI | Reg: 0xF1070030, Value: 0x2 | N/A                         | Reg: 0xF1060534, Value: 0x1
    7      | USB  | Reg: 0xFF0A0020, Value: 0x1 | N/A                         | Reg: 0xFF080428, Value: 0x1
    8      | ADMA | N/A                         | N/A                         | Reg: 0xFE600014, Value: 0x1

    Versal™ AI Edge Series Gen 2 (with the default EDF, the writes below are already taken care of)

    reg:0xF1D61008 val:0x1
    reg:0xF1090030 val:0x1
    reg:0xF1D21008 val:0x1
    reg:0xF1A20324 val:0xF
    reg:0xF1A40004 val:0x2
    reg:0xF1A40000 val:0x2
    reg:0xEB410280 val:0x1
    reg:0xEB410284 val:0x1
    reg:0xEB410288 val:0x1
    reg:0xEB41028C val:0x1
    reg:0xEB410290 val:0x1
    reg:0xEB410294 val:0x1
    reg:0xEB410298 val:0x1
    reg:0xEB41029C val:0x1
    reg:0xF1060A00 val:0x1
    reg:0xF1060A08 val:0xF
    reg:0xF10700D0 val:0x2
    reg:0xF10700D4 val:0x2
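
The read-back check mentioned in the design steps could be scripted along the following lines. This is a minimal sketch, not a supported tool: the addresses and values are copied from the "Enable routing to FPD" column of the Versal Gen 1 table, while the function name `find_unrouted` and the injected `read32` callable are hypothetical. On a target, `read32` would have to be backed by some register-read mechanism of your choice; the sketch also assumes that "routed" means all bits of the listed value read back as set.

```python
from typing import Callable, Dict, List, Tuple

# "Enable routing to FPD" column of the Versal Gen 1 table: IP -> (address, value).
FPD_ROUTING_GEN1: Dict[str, Tuple[int, int]] = {
    "GEM0": (0xFF080328, 0x1),
    "GEM1": (0xFF080348, 0x1),
    "SD0": (0xF1060464, 0x1),
    "SD1": (0xF10604E4, 0x1),
    "QSPI": (0xF106050C, 0x1),
    "OSPI": (0xF1060534, 0x1),
    "USB": (0xFF080428, 0x1),
    "ADMA": (0xFE600014, 0x1),
}


def find_unrouted(
    read32: Callable[[int], int],
    table: Dict[str, Tuple[int, int]] = FPD_ROUTING_GEN1,
) -> List[str]:
    """Return the IPs whose routing register does not have the expected bits set.

    read32 is a caller-supplied (hypothetical) function that returns the
    32-bit value of the register at the given physical address.
    """
    unrouted = []
    for ip, (addr, expected) in table.items():
        # Assumption: the master is routed to FPD when every expected bit reads as 1.
        if (read32(addr) & expected) != expected:
            unrouted.append(ip)
    return unrouted


if __name__ == "__main__":
    # Stand-in register file for a dry run: only GEM0 and ADMA are routed.
    fake_regs = {0xFF080328: 0x1, 0xFE600014: 0x1}
    print(find_unrouted(lambda addr: fake_regs.get(addr, 0)))
```

Passing the reader in as a parameter keeps the comparison logic testable on a host machine, independent of how the registers are actually accessed on the board.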

 

Software Changes

  • Kernel Config :
    CONFIG_ARM_CCI_PMU, CONFIG_ARM_SMMU, CONFIG_ARM_SMMU_V3, etc…

  • Device tree node:

    • Make sure the SMMU node has status = "okay" ("ok" is also accepted). Each DMA master node needs the iommus and dma-coherent properties.

      Versal Gen 1

      iommu@fd800000 {
          compatible = "arm,mmu-500";
          status = "okay";
          reg = <0x00 0xfd800000 0x00 0x40000>;
          stream-match-mask = <0x7c00>;
          #iommu-cells = <0x01>;
          #global-interrupts = <0x01>;
          interrupts = <0x00 0x6b 0x04 0x00 0x6b 0x04 0x00 0x6b 0x04
                        0x00 0x6b 0x04 0x00 0x6b 0x04 0x00 0x6b 0x04
                        0x00 0x6b 0x04 0x00 0x6b 0x04 0x00 0x6b 0x04
                        0x00 0x6b 0x04 0x00 0x6b 0x04 0x00 0x6b 0x04
                        0x00 0x6b 0x04 0x00 0x6b 0x04 0x00 0x6b 0x04
                        0x00 0x6b 0x04 0x00 0x6b 0x04 0x00 0x6b 0x04
                        0x00 0x6b 0x04 0x00 0x6b 0x04 0x00 0x6b 0x04
                        0x00 0x6b 0x04 0x00 0x6b 0x04 0x00 0x6b 0x04
                        0x00 0x6b 0x04 0x00 0x6b 0x04 0x00 0x6b 0x04
                        0x00 0x6b 0x04 0x00 0x6b 0x04 0x00 0x6b 0x04
                        0x00 0x6b 0x04 0x00 0x6b 0x04 0x00 0x6b 0x04>;
          interrupt-parent = <0x4d>;
          xlnx,ip-name = "psv_fpd_smmutcu";
          phandle = <0x5a>;
      };

      dma-controller@ffa80000 {
          phandle = <0x7f>;
          compatible = "xlnx,zynqmp-dma-1.0";
          status = "okay";
          reg = <0x00 0xffa80000 0x00 0x1000>;
          interrupts = <0x00 0x3c 0x04>;
          interrupt-parent = <0x59>;
          clock-names = "clk_main\0clk_apb";
          xlnx,dma-type = <0x01>;
          #dma-cells = <0x01>;
          xlnx,bus-width = <0x40>;
          clocks = <0x8f 0x51 0x8f 0x52>;
          power-domains = <0x8d 0x18224035>;
          xlnx,is-cache-coherent = <0x01>;
          xlnx,ip-name = "psv_adma";
          xlnx,dma-mode = <0x01>;
          dma-coherent;
          iommus = <0x5a 0x210>; // Stream ID
      };

      Versal Gen 2

      iommu@ec000000 {
          phandle = <0x14a>;
          compatible = "arm,smmu-v3";
          status = "ok";
          reg = <0x00 0xec000000 0x00 0x40000>;
          #iommu-cells = <0x01>;
          interrupt-names = "combined";
          interrupt-parent = <0x1b>;
          interrupts = <0x00 0xa9 0x04>;
          dma-coherent;
      };

      dma-controller@ebd00000 {
          phandle = <0x49>;
          compatible = "amd,versal2-dma-1.0";
          status = "okay";
          reg = <0x00 0xebd00000 0x00 0x1000>;
          interrupt-parent = <0x1b>;
          interrupts = <0x00 0x48 0x04>;
          clock-names = "clk_main\0clk_apb";
          #dma-cells = <0x01>;
          xlnx,bus-width = <0x40>;
          clocks = <0xb6 0x51 0xb6 0x52>;
          power-domains = <0xbc 0x18224035>;
          xlnx,is-cache-coherent = <0x00>;
          xlnx,ip-name = "adma";
          xlnx,dma-mode = <0x01>;
          xlnx,zdma-clk-freq-hz = <0x8f0cba9>;
          iommus = <0x14a 0x210>; // Stream ID
          dma-coherent;
      };
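
      The requirement that each DMA master carries both iommus and dma-coherent can be checked mechanically before booting. The following is a rough sketch only: `missing_smmu_props` is a hypothetical helper, it expects the text between the braces of a single node with one property per line or per statement, and it does a naive split on ";" rather than real device tree parsing.

```python
import re
from typing import List

# Properties this page says every SMMU-managed DMA master must have.
REQUIRED_PROPS = ("iommus", "dma-coherent")


def missing_smmu_props(node_body: str) -> List[str]:
    """Return the required property names that do not appear in a DTS node body."""
    # Strip // and /* */ comments so commented-out properties are not counted.
    text = re.sub(r"//[^\n]*|/\*.*?\*/", "", node_body, flags=re.S)
    present = set()
    for statement in text.split(";"):
        # The property name is everything before "=" (or the whole statement
        # for boolean properties such as dma-coherent).
        name = statement.strip().split("=")[0].strip()
        if name:
            present.add(name)
    return [prop for prop in REQUIRED_PROPS if prop not in present]
```

      For example, a node body that contains iommus but no dma-coherent yields ["dma-coherent"], which points at the property that still needs to be added.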

       

      SMMU driver boot logs

      Versal Gen 1

      xilinx-vck190-20252:~$ dmesg | grep -e smmu -e iommu
      [ 1.074217] iommu: Default domain type: Translated
      [ 1.079050] iommu: DMA domain TLB invalidation policy: strict mode
      [ 2.097672] arm-smmu fd800000.iommu: probing hardware configuration...
      [ 2.104276] arm-smmu fd800000.iommu: SMMUv2 with:
      [ 2.109021] arm-smmu fd800000.iommu: stage 1 translation
      [ 2.114467] arm-smmu fd800000.iommu: stage 2 translation
      [ 2.119911] arm-smmu fd800000.iommu: nested translation
      [ 2.125273] arm-smmu fd800000.iommu: stream matching with 64 register groups
      [ 2.132473] arm-smmu fd800000.iommu: 32 context banks (0 stage-2 only)
      [ 2.139153] arm-smmu fd800000.iommu: Supported page sizes: 0x61311000
      [ 2.145746] arm-smmu fd800000.iommu: Stage-1: 48-bit VA -> 48-bit IPA
      [ 2.152335] arm-smmu fd800000.iommu: Stage-2: 48-bit IPA -> 48-bit PA
      [ 2.159054] arm-smmu fd800000.iommu: preserved 0 boot mappings
      [ 2.165044] arm-smmu fd800000.iommu: Failed to disable prefetcher for errata workarounds, check SACR.CACHE_LOCK
      [ 2.868813] xilinx-zynqmp-dma ffa80000.dma-controller: Adding to iommu group 0
      [ 2.876458] xilinx-zynqmp-dma ffa90000.dma-controller: Adding to iommu group 1
      [ 2.884000] xilinx-zynqmp-dma ffaa0000.dma-controller: Adding to iommu group 2
      [ 2.891533] xilinx-zynqmp-dma ffab0000.dma-controller: Adding to iommu group 3
      [ 2.899109] xilinx-zynqmp-dma ffac0000.dma-controller: Adding to iommu group 4
      [ 2.906639] xilinx-zynqmp-dma ffad0000.dma-controller: Adding to iommu group 5
      [ 2.914174] xilinx-zynqmp-dma ffae0000.dma-controller: Adding to iommu group 6
      [ 2.921824] xilinx-zynqmp-dma ffaf0000.dma-controller: Adding to iommu group 7
      [ 2.929551] zynqmp-qspi f1030000.spi: Adding to iommu group 8
      [ 2.977064] macb ff0c0000.ethernet: Adding to iommu group 9
      [ 6.238537] macb ff0d0000.ethernet: Adding to iommu group 10
      [ 6.389379] dwc3 fe200000.usb: Adding to iommu group 11
      [ 6.537812] sdhci-arasan f1050000.mmc: Adding to iommu group 12

      Versal Gen 2

      [ 1.512323] iommu: Default domain type: Translated
      [ 1.517163] iommu: DMA domain TLB invalidation policy: strict mode
      [ 1.903610] arm-smmu-v3 ec000000.iommu: 65518 shared contexts
      [ 1.903613] arm-smmu-v3 ec000000.iommu: ias 48-bit, oas 48-bit (features 0x001e1f8f)
      [ 1.911885] arm-smmu-v3 ec000000.iommu: allocated 65536 entries for cmdq
      [ 1.919013] arm-smmu-v3 ec000000.iommu: allocated 32768 entries for evtq
      [ 2.412534] xilinx-zynqmp-dma ebd00000.dma-controller: Adding to iommu group 0
      [ 2.419956] xilinx-zynqmp-dma ebd10000.dma-controller: Adding to iommu group 1
      [ 2.427301] xilinx-zynqmp-dma ebd20000.dma-controller: Adding to iommu group 2
      [ 2.434634] xilinx-zynqmp-dma ebd30000.dma-controller: Adding to iommu group 3
      [ 2.441971] xilinx-zynqmp-dma ebd40000.dma-controller: Adding to iommu group 4
      [ 2.449310] xilinx-zynqmp-dma ebd50000.dma-controller: Adding to iommu group 5
      [ 2.456639] xilinx-zynqmp-dma ebd60000.dma-controller: Adding to iommu group 6
      [ 2.463970] xilinx-zynqmp-dma ebd70000.dma-controller: Adding to iommu group 7
      [ 2.500562] macb f19e0000.ethernet: Adding to iommu group 8
      [ 2.519131] macb f19f0000.ethernet: Adding to iommu group 9
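
      A quick way to confirm that every expected master was attached is to extract the "Adding to iommu group" lines from the boot log. The sketch below assumes the dmesg output has been saved as text with one message per line; `iommu_groups` is a hypothetical helper name, and the regular expression is written against the log format shown on this page.

```python
import re
from typing import Dict

# Matches e.g. "[ 2.977064] macb ff0c0000.ethernet: Adding to iommu group 9".
GROUP_RE = re.compile(r"\]\s+(\S+) (\S+): Adding to iommu group (\d+)")


def iommu_groups(dmesg_text: str) -> Dict[str, int]:
    """Map each device name found in the log to its IOMMU group number."""
    groups = {}
    for line in dmesg_text.splitlines():
        match = GROUP_RE.search(line)
        if match:
            _driver, device, group = match.groups()
            groups[device] = int(group)
    return groups
```

      A master that was expected behind the SMMU but is absent from the returned dictionary is a hint that its iommus property, or its hardware routing, is missing.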

       

SMMU Debug

Below are available mechanisms to observe SMMU behavior during debug, including tracing IOVA (and related PA mappings) and other IOMMU trace events.
These traces also expose runtime device attach/detach to IOMMU domains, which is useful for diagnosing spurious or misbehaving devices. In addition, dma, kmem, swiotlb, and iomap trace events can be leveraged to further root-cause issues during deeper debug.
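
Enabling several of these trace event groups at once can be scripted. The following is a minimal sketch under stated assumptions: tracefs is mounted at /sys/kernel/tracing, the script runs as root, and each group exposes an `enable` file under `events/<group>/`. The helper name `enable_event_groups` is hypothetical, and which groups exist depends on the kernel version and configuration, so absent groups are skipped rather than treated as errors.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed tracefs mount point.
TRACEFS = Path("/sys/kernel/tracing")


def enable_event_groups(groups: Iterable[str], tracefs: Path = TRACEFS) -> List[str]:
    """Write 1 to events/<group>/enable for each group; return the groups enabled."""
    enabled = []
    for group in groups:
        switch = tracefs / "events" / group / "enable"
        # A group may be missing on a given kernel; skip it instead of failing.
        if switch.is_file():
            switch.write_text("1")
            enabled.append(group)
    return enabled


if __name__ == "__main__":
    print(enable_event_groups(["iommu", "dma", "kmem", "swiotlb", "iomap"]))
```

Comparing the returned list with the requested one shows which event groups the running kernel does not provide.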

Kernel Tracing

For each map()/unmap() request, the trace shows the iova (I/O virtual address) range, the associated paddr (physical address) for map events, and the size.

echo 1 > /sys/kernel/tracing/events/iommu/enable
xilinx-vck190-20252:/home/petalinux# cat /sys/kernel/tracing/trace | head -n30
tcf-agent-607 [001] b.... 356.508089: map: IOMMU: iova=0x00000fffff900000 - 0x00000fffff901000 paddr=0x00000008058ef000 size=4096
<idle>-0 [000] ..s.. 356.508115: unmap: IOMMU: iova=0x00000fffff900000 - 0x00000fffff901000 size=4096 unmapped_size=4096
tcf-agent-607 [001] b.... 356.508140: map: IOMMU: iova=0x00000fffff8ff000 - 0x00000fffff900000 paddr=0x00000008058ef000 size=4096
<idle>-0 [000] ..s.. 356.508153: unmap: IOMMU: iova=0x00000fffff8ff000 - 0x00000fffff900000 size=4096 unmapped_size=4096
<idle>-0 [000] ..s.. 356.508828: unmap: IOMMU: iova=0x00000fffffef1000 - 0x00000fffffef2000 size=4096 unmapped_size=4096
<idle>-0 [000] ..s.. 356.508836: map: IOMMU: iova=0x00000fffffef1000 - 0x00000fffffef2000 paddr=0x000000081d6f6000 size=4096
tcf-agent-607 [001] b.... 356.508896: map: IOMMU: iova=0x00000fffff8fe000 - 0x00000fffff8ff000 paddr=0x00000008058ee000 size=4096
<idle>-0 [000] ..s.. 356.508900: unmap: IOMMU: iova=0x00000fffffef0000 - 0x00000fffffef1000 size=4096 unmapped_size=4096
<idle>-0 [000] ..s.. 356.508907: map: IOMMU: iova=0x00000fffffef0000 - 0x00000fffffef1000 paddr=0x000000081d6f6000 size=4096
<idle>-0 [000] ..s.. 356.508915: unmap: IOMMU: iova=0x00000fffff8fe000 - 0x00000fffff8ff000 size=4096 unmapped_size=4096
tcf-agent-607 [001] b.... 356.508941: map: IOMMU: iova=0x00000fffff8fd000 - 0x00000fffff8fe000 paddr=0x00000008058ee000 size=4096
<idle>-0 [000] ..s.. 356.508951: unmap: IOMMU: iova=0x00000fffff8fd000 - 0x00000fffff8fe000 size=4096 unmapped_size=4096
<idle>-0 [000] ..s.. 359.741221: unmap: IOMMU: iova=0x00000fffffeee000 - 0x00000fffffef0000 size=8192 unmapped_size=8192
<idle>-0 [000] ..s.. 359.741229: map: IOMMU: iova=0x00000fffffeee000 - 0x00000fffffef0000 paddr=0x000000081d6f5000 size=8192
<idle>-0 [000] ..s.. 359.741268: unmap: IOMMU: iova=0x00000fffffeed000 - 0x00000fffffeee000 size=4096 unmapped_size=4096
<idle>-0 [000] ..s.. 359.741272: map: IOMMU: iova=0x00000fffffeed000 - 0x00000fffffeee000 paddr=0x000000081d6f5000 size=4096
<idle>-0 [000] b.s.. 361.567364: map: IOMMU: iova=0x00000fffff8fd000 - 0x00000fffff8fe000 paddr=0x0000000801266000 size=4096
<idle>-0 [000] ..s.. 361.567392: unmap: IOMMU: iova=0x00000fffff8fd000 - 0x00000fffff8fe000 size=4096 unmapped_size=4096
<idle>-0 [000] ..s.. 361.567481: unmap: IOMMU: iova=0x00000fffffeea000 - 0x00000fffffeec000 size=8192 unmapped_size=8192
<idle>-0 [000] ..s.. 361.567486: map: IOMMU: iova=0x00000fffffeea000 - 0x00000fffffeec000 paddr=0x000000081d6f4000 size=8192
<idle>-0 [000] ..s.. 361.593894: unmap: IOMMU: iova=0x00000fffffee9000 - 0x00000fffffeea000 size=4096 unmapped_size=4096
<idle>-0 [000] ..s.. 361.593898: map: IOMMU: iova=0x00000fffffee9000 - 0x00000fffffeea000 paddr=0x000000081d6f4000 size=4096
<idle>-0 [000] b.s.. 361.593907: map: IOMMU: iova=0x00000fffff8fd000 - 0x00000fffff8fe000 paddr=0x0000000801266000 size=4096
<idle>-0 [000] ..s.. 361.593918: unmap: IOMMU: iova=0x00000fffff8fd000 - 0x00000fffff8fe000 size=4096 unmapped_size=4096
<idle>-0 [000] ..s.. 365.420542: unmap: IOMMU: iova=0x00000fffffee6000 - 0x00000fffffee8000 size=8192 unmapped_size=8192
<idle>-0 [000] ..s.. 365.420553: map: IOMMU: iova=0x00000fffffee6000 - 0x00000fffffee8000 paddr=0x000000081d6f3000 size=8192
tcf-agent-607 [001] b.... 365.420634: map: IOMMU: iova=0x00000fffff8fc000 - 0x00000fffff8fd000 paddr=0x00000008058ee000 size=4096
<idle>-0 [000] ..s.. 365.420648: unmap: IOMMU: iova=0x00000fffff8fc000 - 0x00000fffff8fd000 size=4096 unmapped_size=4096
<idle>-0 [000] ..s.. 365.420737: unmap: IOMMU: iova=0x00000fffffee5000 - 0x00000fffffee6000 size=4096 unmapped_size=4096
<idle>-0 [000] ..s.. 365.420741: map: IOMMU: iova=0x00000fffffee5000 - 0x00000fffffee6000 paddr=0x000000081d6f3000 size=4096
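
When a trace grows long, it can help to replay the map and unmap events and list the IOVA ranges that are still mapped at the end, for example to look up which physical page backs a faulting address. This is a sketch only: `live_mappings` is a hypothetical helper, it assumes one event per line in the format shown above, and it keys mappings by their IOVA start address.

```python
import re
from typing import Dict, Tuple

# Matches the "map:" and "unmap:" IOMMU trace events; paddr is present only on map.
EVENT_RE = re.compile(
    r"\b(?P<op>unmap|map): IOMMU: "
    r"iova=(?P<start>0x[0-9a-fA-F]+) - (?P<end>0x[0-9a-fA-F]+)"
    r"(?: paddr=(?P<paddr>0x[0-9a-fA-F]+))?"
)


def live_mappings(trace_text: str) -> Dict[int, Tuple[int, int]]:
    """Replay map/unmap events and return {iova_start: (paddr, size)} still mapped."""
    live = {}
    for line in trace_text.splitlines():
        match = EVENT_RE.search(line)
        if not match:
            continue
        start = int(match["start"], 16)
        if match["op"] == "map":
            size = int(match["end"], 16) - start
            live[start] = (int(match["paddr"], 16), size)
        else:
            # An unmap of a range that was never seen mapped is ignored.
            live.pop(start, None)
    return live
```

Events are replayed in the order they appear in the file, so a trace captured from several CPUs should be read with that in mind.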

Event queue

Reported events are printed on the kernel console when they occur, or can be made visible by enabling dynamic debug.
Below is a typical example of a translation fault.
Reference: Arm® System Memory Management Unit Architecture Specification SMMU architecture version 3

image-20260317-045036.png
The logs below are for Versal Gen 2 (this output needs a patch that is going upstream soon).

[ 13.548223] arm-smmu-v3 ec000000.iommu: event 0x10 received:   <- event 0x10 refers to a translation fault
[ 13.553878] arm-smmu-v3 ec000000.iommu: 0x0000023400000010   <- StreamID 0x234
[ 13.559441] arm-smmu-v3 ec000000.iommu: 0x0000020800000000
[ 13.565004] arm-smmu-v3 ec000000.iommu: 0x000000000000dea8   <- faulting address that caused the translation fault
[ 13.570565] arm-smmu-v3 ec000000.iommu: 0x0000000000000000
[ 20.657310] arm-smmu-v3 ec000000.iommu: event 0x10 received:   <- the same fault occurs again
[ 20.662964] arm-smmu-v3 ec000000.iommu: 0x0000023400000010
[ 20.668527] arm-smmu-v3 ec000000.iommu: 0x0000020800000000
[ 20.674089] arm-smmu-v3 ec000000.iommu: 0x000000000000dea8
[ 20.679652] arm-smmu-v3 ec000000.iommu: 0x0000000000000000
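
The annotated fields can be pulled out of the four 64-bit words programmatically. The sketch below decodes only what the log annotations point at: the event ID in bits [7:0] and the StreamID in bits [63:32] of the first word, and the input address in the third word, following the SMMUv3 event record layout. The helper name `decode_event` is hypothetical, and the name table covers only the event shown here.

```python
from typing import Dict, Sequence

# Only the event seen in the log above; other event IDs are printed as hex.
EVENT_NAMES = {0x10: "F_TRANSLATION"}


def decode_event(record: Sequence[int]) -> Dict[str, object]:
    """Decode event ID, StreamID and input address from a 4 x 64-bit event record."""
    if len(record) != 4:
        raise ValueError("an SMMUv3 event record is four 64-bit words")
    event_id = record[0] & 0xFF                 # word 0, bits [7:0]
    stream_id = (record[0] >> 32) & 0xFFFFFFFF  # word 0, bits [63:32]
    return {
        "event": EVENT_NAMES.get(event_id, hex(event_id)),
        "stream_id": stream_id,
        "input_addr": record[2],                # word 2: faulting input address
    }
```

Feeding it the four values from the log gives StreamID 0x234 and input address 0xdea8, matching the annotations; the StreamID can then be compared with the second cell of each master's iommus property to identify the device.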


Performance

Interface | Target          | Remark1                                    | Remark2
ADMA      | VCK190 vs VEK385 | ~40% drop on VEK385 with SMMU              | Data inconsistent for VCK190. VEK385 better than VCK190.
OSPI      | VEK385          | No variation                               | Numbers unchanged with or without SMMU.
QSPI      | VCK190          | Drastic performance drop when SMMU enabled |
USB       | VCK190 & VEK385 | ~4% decrease                               | Slight decrease in performance with SMMU enabled.
Ethernet  | VCK190          | No variation                               | No variation compared to VCK190 baseline.
UFS       | VEK385          | ~0–7% variation                            | Slight variation in numbers.

A detailed study is available at [3].

 

References

[1] https://amd.atlassian.net/wiki/spaces/XPS/pages/903865051

[2] Versal SMMU and CCI configuration

[3] https://amd.atlassian.net/wiki/spaces/XPS/pages/1393332590
[4] Arm® System Memory Management Unit Architecture Specification SMMU architecture version 3