The principles apply to the Linux kernel regardless of the distribution. Testing for the prototypes for this page was performed with PetaLinux 2017.2 and the 4.9 kernel on ARM64 (Zynq UltraScale+ MPSOC). Previous testing with the same principles has also been performed on ARM (Zynq 7K) successfully.
1.1.1 Memory Attributes
In Linux, the MMU of the CPU is set up with memory attributes that determine how memory is accessed (cached, non-cached, device memory, etc.).
1.1.2 Sparse Memory
The DDR memory for MPSOC is not contiguous. With 4 GB on the ZCU102 board it consists of 2 memory ranges: 0 - 0x8000_0000 and 0x8_0000_0000 - 0x8_8000_0000.
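As a minimal sketch of what the sparse layout means for software, the helper below checks whether a physical address range falls entirely inside one of the two DDR ranges quoted above. The names `mem_range`, `zcu102_ddr` and `in_ddr` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous range of physical memory: [base, base + size). */
struct mem_range {
    uint64_t base;
    uint64_t size;
};

/* The two DDR ranges quoted above for 4 GB on the ZCU102. */
static const struct mem_range zcu102_ddr[] = {
    { 0x000000000ULL, 0x80000000ULL }, /* 0 - 0x8000_0000 */
    { 0x800000000ULL, 0x80000000ULL }, /* 0x8_0000_0000 - 0x8_8000_0000 */
};

/* Return true if [addr, addr + len) lies entirely inside one DDR range. */
static bool in_ddr(uint64_t addr, uint64_t len)
{
    for (size_t i = 0; i < sizeof(zcu102_ddr) / sizeof(zcu102_ddr[0]); i++) {
        const struct mem_range *r = &zcu102_ddr[i];

        if (addr >= r->base && len <= r->size &&
            addr - r->base <= r->size - len)
            return true;
    }
    return false;
}
```

An address in the gap between the two ranges, such as a PL peripheral, is reported as not being DDR.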
1.2 Methods to Access Hardware in Linux
The focus of this page is on user space access of the hardware through user space drivers. Kernel drivers should also be considered when the required skills are available.
1.2.1 The /dev/mem Device Driver
The /dev/mem device driver included in the kernel by default (for Xilinx kernel configurations) provides a method to access hardware from user space. This driver allows memory mapped hardware to be mapped into user space using the mmap() function call. There are many examples of using this driver in the open source community, but there are some nuances that are not obvious and not documented that well.
1.2.1.1 Cache Control
The open() of the /dev/mem device accepts optional flags. Use O_SYNC to make the accesses to the hardware non-cached. Without this flag the accesses to the hardware are cached, which can be chaotic and difficult to debug.
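The flag selection can be sketched as follows. The helper names `devmem_open_flags` and `devmem_open` are hypothetical; only open() and O_SYNC come from the text above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build the open() flags. O_SYNC requests the non-cached behavior
 * described above; without it the accesses may be cached. */
static int devmem_open_flags(int uncached)
{
    return O_RDWR | (uncached ? O_SYNC : 0);
}

/* Open a /dev/mem style device. Returns the file descriptor, or -1 on
 * failure (for example when the caller lacks the required privileges). */
static int devmem_open(const char *path, int uncached)
{
    int fd = open(path, devmem_open_flags(uncached));

    if (fd < 0)
        perror(path);
    return fd;
}
```

A typical call would be `devmem_open("/dev/mem", 1)`, with the returned descriptor then passed to mmap().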
1.2.1.2 Accessing Registers
The most typical use of /dev/mem is to access device registers, which are in an address range unknown to the Linux kernel (not in the memory node of the device tree). These accesses are performed as device memory or strongly ordered to ensure there are no side effects.
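A register window can be mapped roughly as sketched below. This is an illustration under assumptions, not the original code from this page: the `reg_window` type and its helpers are hypothetical, and the file descriptor is expected to come from an open() of /dev/mem. The one detail worth showing is that mmap() offsets must be page aligned, so a register block that does not start on a page boundary is mapped from the enclosing page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* A mapped window onto a physical address range. */
struct reg_window {
    void *map_base;          /* page-aligned address returned by mmap() */
    size_t map_len;          /* length passed to mmap(), kept for munmap() */
    volatile uint32_t *regs; /* pointer to the first requested register */
};

/* Map len bytes starting at physical address phys. The mapping starts at
 * the page boundary at or below phys. Returns 0 on success, -1 on failure. */
static int reg_window_map(struct reg_window *w, int fd, off_t phys, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t page_base = phys & ~((off_t)page - 1);
    size_t in_page = (size_t)(phys - page_base);

    w->map_len = in_page + len;
    w->map_base = mmap(NULL, w->map_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd, page_base);
    if (w->map_base == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    w->regs = (volatile uint32_t *)((char *)w->map_base + in_page);
    return 0;
}

/* Registers are accessed as aligned 32-bit words through a volatile
 * pointer so the compiler neither merges nor drops the accesses. */
static void reg_write(const struct reg_window *w, size_t byte_off, uint32_t val)
{
    assert(byte_off % 4 == 0);
    w->regs[byte_off / 4] = val;
}

static uint32_t reg_read(const struct reg_window *w, size_t byte_off)
{
    assert(byte_off % 4 == 0);
    return w->regs[byte_off / 4];
}

static void reg_window_unmap(struct reg_window *w)
{
    munmap(w->map_base, w->map_len);
}
```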
1.2.1.3 Accessing Memory
Another use of the /dev/mem device driver is to access a memory. In this case it is desirable to access the memory in a more efficient manner, such as normal memory that is non-cached. The Linux system should be set up such that the memory is part of the kernel's memory (as set up in the memory node of the device tree), is reserved such that the kernel does not use it, and is mapped into the kernel memory space by not using the "no-map" property in the device tree. The Linux reserved memory framework describes how to reserve memory in the device tree.
1.2.1.4 Other Important Details
- The /dev/mem driver requires root privileges which may not be desired in all systems.
- The O_SYNC flag in the open() of the /dev/mem driver would not be required in a cache coherent system as described at Cache Coherency.
- Normal memory can be accessed unaligned without issues, while unaligned accesses to device or strongly ordered memory cause exceptions.
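Because of the alignment restriction in the last point, library routines such as memcpy() are risky on a device mapping, since they may issue accesses of any width or alignment. One possible workaround, sketched here with a hypothetical helper name, is to copy with explicit aligned 32-bit stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy nbytes from an ordinary buffer into a device-mapped region using
 * only naturally aligned 32-bit stores. dst must be 4-byte aligned and
 * nbytes a multiple of 4; src is normal memory and may have any alignment. */
static void copy_to_device32(volatile uint32_t *dst, const void *src,
                             size_t nbytes)
{
    const unsigned char *s = src;

    assert(((uintptr_t)dst & 3u) == 0);
    assert((nbytes & 3u) == 0);

    for (size_t i = 0; i < nbytes / 4; i++) {
        uint32_t word;

        memcpy(&word, s + 4 * i, sizeof(word)); /* unaligned read of normal memory */
        dst[i] = word;                          /* one aligned 32-bit store */
    }
}
```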
1.2.2 UIO Device Driver
The UIO device driver, uio_pdrv_genirq, in the Linux kernel is another method to access hardware from user space. This driver works well with the device tree and allows memory mapped hardware to be mapped into user space using the mmap() function call. This method is preferred over /dev/mem for accessing registers. Using the UIO device driver causes the memory attributes for the address range to be device / strongly ordered, which is good for registers but performs poorly for a memory. There are other methods for using UIO which are more complex and not covered here.
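Two UIO conventions are worth a short sketch: the region to map is selected through the mmap() offset (region N uses an offset of N pages), and the size of each region is published in sysfs as a hexadecimal string. The helper names below are hypothetical, and the device number in paths such as /sys/class/uio/uio0/maps/map0/size is an assumption that depends on probe order.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* UIO selects which region to map through the mmap() offset:
 * region N is requested with an offset of N pages. */
static off_t uio_map_offset(unsigned map_index)
{
    return (off_t)map_index * sysconf(_SC_PAGESIZE);
}

/* Read a numeric sysfs attribute such as
 * /sys/class/uio/uio0/maps/map0/size, which is reported in hexadecimal.
 * Returns 0 on success, -1 on failure. */
static int read_sysfs_u64(const char *path, unsigned long long *value)
{
    char buf[64];
    char *end;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    *value = strtoull(buf, &end, 0); /* base 0 accepts the "0x" prefix */
    return end == buf ? -1 : 0;
}
```

With these, the registers of the first region would be mapped by opening the UIO device node with O_RDWR and calling mmap() with the size read from sysfs and `uio_map_offset(0)` as the offset.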
2 Linux Kernel Code Details
The following code snippet from the kernel helps understand how physical memory is mapped when using the /dev/mem driver.
2.1 Paths of phys_mem_access_prot()
Each path through the function is explored and described. The user can easily instrument this function in the kernel to verify the intended operation of /dev/mem.
2.1.1 No Valid Page Frame Number
In this case there is no valid page frame number, meaning the memory has not been set up in the page tables by the kernel, and the memory attributes are set to device or strongly ordered (non-cached). I believe it is really device memory, but getting a clear answer for Linux is not easy. This path is taken when accessing memory addresses outside the kernel memory (not RAM). This is the traditional case of using /dev/mem for device registers, which need to be device or strongly ordered to prevent unwanted side effects. The low performance of these accesses is consistent with device or strongly ordered being used.
2.1.2 A Valid Page Frame Number and O_SYNC Is Specified (file->f_flags & O_SYNC)
In this case the memory attributes are changed to write combined, which is uncached buffered memory. This is typically what you want for RAM such as a frame buffer or DMA memory.
2.1.3 A Valid Page Frame Number and O_SYNC Is Not Specified
In this case the memory attributes are left unaltered, so the memory keeps whatever attributes it already had, which is likely cached. This can cause strange and unpredictable behavior unless cached memory is acceptable.
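The three paths described above can be summarized as a small user space model. This is not the kernel code itself; the enum and function names are hypothetical, and the model only restates the decision as this page describes it.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Memory attribute that a /dev/mem mapping ends up with. */
enum devmem_attr {
    ATTR_DEVICE_NONCACHED, /* device or strongly ordered */
    ATTR_WRITE_COMBINED,   /* uncached, buffered */
    ATTR_UNCHANGED,        /* whatever the mapping already had, likely cached */
};

/* pfn_is_valid: the address is RAM known to the kernel (memory node).
 * open_flags:   the flags that were passed to open() on /dev/mem. */
static enum devmem_attr devmem_mapping_attr(bool pfn_is_valid, int open_flags)
{
    if (!pfn_is_valid)
        return ATTR_DEVICE_NONCACHED; /* no valid page frame number */
    if (open_flags & O_SYNC)
        return ATTR_WRITE_COMBINED;   /* valid pfn, O_SYNC specified */
    return ATTR_UNCHANGED;            /* valid pfn, O_SYNC not specified */
}
```

Note that O_SYNC makes no difference in the first path: addresses outside kernel memory are mapped as device or strongly ordered either way.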
3 Accessing BRAM Using /dev/mem
3.1 Linux Device Tree
By default the device tree generation process generates a node in the PL device tree (pl.dtsi) for the AXI BRAM Controller. Since there is no driver for the BRAM, this should not be an issue. If that changes in the future, the node may need to be disabled through its status property.
The device tree is altered to add the BRAM memory range to the memory node and to add a reserved memory node so that the kernel does not use the memory, but does map the memory into the kernel memory. Note that the "no-map" property should not be used in the reserved node. The following device tree snippet (to be added to system-user.dtsi in PetaLinux) illustrates the changes for adding a 32K BRAM at address 0xA000_0000.
3.2 U-boot Memory
U-boot is configured by the device tree to some extent and by the u-boot configuration. The default u-boot configuration for the Xilinx build has 2 memory banks supported (for MPSOC). Both memory banks are required to support the sparse 4 GB of DDR for the ZCU102 board. Another bank is added to the u-boot configuration to support the existing 4 GB of DDR and a new BRAM. This step is required because u-boot must see the full amount of memory that is desired in Linux. U-boot alters the amount of memory in the memory node of a loaded device tree before passing it on to Linux such that Linux only recognizes the amount of memory that u-boot is configured for.
The platform-top.h file in the <project>/project-spec/meta-user/recipes-bsp/u-boot/files directory of the PetaLinux project is altered by adding the following line.
#define CONFIG_NR_DRAM_BANKS 3
Before u-boot has been altered, the bd command in u-boot reflects two memory banks as shown below.
After u-boot has been altered, the board information should show 3 DRAM banks with the correct addresses and sizes.
3.3 User Space Application
A user space application is used to access the BRAM using the /dev/mem device driver. The application is generally no different from any other /dev/mem application. The following code snippet is only a prototype to illustrate the principles and is not intended to be a robust, properly coded application.
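A minimal sketch of such an application is shown here. It is a reconstruction under assumptions rather than the original prototype: the function name `bram_test` and the test pattern are hypothetical, while the base address and size are the 32K BRAM at 0xA000_0000 used in the device tree section. The base address constant assumes a 64-bit off_t, as on ARM64.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define BRAM_BASE 0xA0000000UL /* BRAM address from the device tree section */
#define BRAM_SIZE 0x8000UL     /* 32K */

/* Write a counting pattern over the region and read it back.
 * Returns the number of mismatching words, or -1 on setup failure. */
static long bram_test(const char *dev, off_t base, size_t size)
{
    long errors = 0;
    volatile uint32_t *mem;
    int fd = open(dev, O_RDWR | O_SYNC); /* O_SYNC: request non-cached access */

    if (fd < 0) {
        perror("open");
        return -1;
    }
    mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd); /* the mapping remains valid after close() */
    if (mem == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    for (size_t i = 0; i < size / 4; i++)
        mem[i] = 0xA5A50000u + (uint32_t)i;
    for (size_t i = 0; i < size / 4; i++)
        if (mem[i] != 0xA5A50000u + (uint32_t)i)
            errors++;

    munmap((void *)mem, size);
    return errors;
}
```

On the target this would be called as `bram_test("/dev/mem", BRAM_BASE, BRAM_SIZE)` with root privileges, and a return value of 0 indicates that every word read back correctly.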
Before running the application, use devmem to verify the memory contents.
After the application runs, use devmem again to verify the write was successful.
3.4 Kernel Page Tables Debug
It is typically challenging in Linux to verify that memory attributes are set up as expected. Kernel memory is easier to verify than user space memory. The ARM64 kernel allows the page tables to be dumped from user space, which helps to some extent. These methods are still being explored.
3.4.1 Configuring the Kernel
The kernel is not configured to allow the page tables to be dumped by default. The following screen shot shows how to enable the dumping of the page tables.
3.4.2 Dumping the Page Tables
In this example of adding a 32K BRAM it is easy to see the page table entry as 32K is not a common memory size.
Dump the page tables when the BRAM is not added to the Linux system to get a baseline.
Dump the page tables after adding the BRAM into the system as kernel memory and compare against the baseline to verify the memory is mapped as normal memory rather than device / strongly ordered. The dump shows the memory as cached, because that is how the kernel maps it. The user space mapping changes the attributes to non-cached, and that change is not visible with this method.
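Since the 32K entry is the one of interest, the dump can be filtered by range size instead of being read by eye. The sketch below assumes that each entry line of the dump begins with "<start>-<end>" in hexadecimal, and that the dump is read from a debugfs file such as /sys/kernel/debug/kernel_page_tables; both are assumptions to check against the running kernel. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse one line of the page table dump, assumed to begin with
 * "<start>-<end>" in hexadecimal, and report whether the range covers
 * exactly `size` bytes. Lines in any other format never match. */
static bool ptdump_line_has_size(const char *line, unsigned long long size)
{
    unsigned long long start, end;

    if (sscanf(line, "%llx-%llx", &start, &end) != 2)
        return false;
    return end > start && end - start == size;
}

/* Print every entry of the dump at `path` that is exactly `size` bytes
 * long. Returns the number of matching lines, or -1 if the file cannot
 * be opened. */
static int ptdump_find_size(const char *path, unsigned long long size)
{
    char line[256];
    int matches = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (ptdump_line_has_size(line, size)) {
            fputs(line, stdout);
            matches++;
        }
    }
    fclose(f);
    return matches;
}
```

Running `ptdump_find_size()` with a size of 0x8000 before and after adding the BRAM should show the new entry only in the second run, and the attributes on the printed line can then be inspected.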
4 Vivado Prototype System
The Vivado system is very simple with nothing but an AXI BRAM in the PL as illustrated below.