Linux EDAC Driver
Overview
This document provides details about the Synopsys DDR ECC controller driver used in the Zynq and ZynqMP SoCs.
HW/IP Features
- The Zynq and ZynqMP DDRC controllers support single-bit error correction and double-bit error detection.
- The ZynqMP DDRC controller has interrupt support and error injection support.
- The Zynq DDRC controller reports single-bit and double-bit errors by polling.
- The ZynqMP DDRC controller reports single-bit and double-bit errors by interrupt.
Missing features, Known Issues and Limitations
- None
Kernel Configurations
The following kernel configuration option should be enabled to compile the Synopsys EDAC driver:
    CONFIG_EDAC_SYNOPSYS=y
    CONFIG_EDAC_SYNOPSYS:

    Support for error detection and correction on the Synopsys DDR
    memory controller.

    Symbol: EDAC_SYNOPSYS [=m]
    Type  : tristate
    Prompt: Synopsys DDR Memory Controller
      Location:
        -> Device Drivers
          -> EDAC (Error Detection And Correction) reporting (EDAC [=y])
            -> Main Memory EDAC (Error Detection And Correction) reporting (EDAC_MM_EDAC [=y])
      Defined at drivers/edac/Kconfig:386
      Depends on: EDAC [=y] && EDAC_MM_EDAC [=y] && (ARM [=y] || ARM64)
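One way to confirm how the option is set is to look it up in the kernel configuration file. The following is a minimal sketch, not part of the driver or its tooling: the helper name edac_config_state is hypothetical, and it assumes a plain-text configuration file such as the .config in the kernel build tree.

```shell
#!/bin/sh
# Hypothetical helper: report how CONFIG_EDAC_SYNOPSYS is set in a kernel
# configuration file. Prints "y", "m", or "not set".
edac_config_state() {
    config_file="$1"
    # Match only the exact symbol, for example "CONFIG_EDAC_SYNOPSYS=y".
    value=$(sed -n 's/^CONFIG_EDAC_SYNOPSYS=\([ym]\)$/\1/p' "$config_file")
    if [ -n "$value" ]; then
        echo "$value"
    else
        echo "not set"
    fi
}
```

For example, running edac_config_state .config in the kernel build directory prints y for a built-in driver and m for a module.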
Device tree Node Settings
Refer to the device tree bindings document: Documentation/devicetree/bindings/memory-controllers/synopsys.txt
For the ZynqMP SoC device tree bindings, refer to the devicetree bindings document.
For Zynq SoC
    memory-controller@f8006000 {
        compatible = "xlnx,zynq-ddrc-a05";
        reg = <0xf8006000 0x1000>;
    };
For ZynqMP SoC
    memory-controller@fd070000 {
        compatible = "xlnx,zynqmp-ddrc-2.40a";
        reg = <0x0 0xfd070000 0x0 0x30000>;
        interrupt-parent = <&gic>;
        interrupts = <0 112 4>;
    };
Mainline Status
This driver is in mainline.
Fixes for Coverity warnings (~3 lines) are not yet in mainline.
Test Procedure
Zynq
To test the EDAC driver manually on the Zynq platform, make the following changes to the FSBL and U-Boot sources:
FSBL changes:
When ECC is enabled on Zynq, the usable RAM size is halved, so the DDR region is 512MB after ECC is enabled. In the "ps7_ddr_ecc_init" function in the ps7_init.c file, reduce the length of the DDR region that is initialized to 500MB (hex: 0x1F400000) instead of 512MB.
    int ps7_ddr_ecc_init(void)
    {
        unsigned long LengthBytes = 0x1F400000; /* was PS7_DDR_LENGTH */
        unsigned long SourceAddr = 0;
        unsigned long DestAddr = PS7_XPAR_PS7_DDR_0_S_AXI_BASEADDR;
        unsigned long Length = 0;
        ...
        ...
    }
U-Boot changes:
Change the size value of the memory node in the U-Boot dts to 500MB (hex: 0x1F400000):
    memory {
        device_type = "memory";
        reg = <0x0 0x1F400000>;
    };
Apply the above changes to FSBL and U-Boot and build the images.
Use the rebuilt FSBL and U-Boot images to boot Linux.
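The 500MB length used in the FSBL and U-Boot changes can be checked with shell arithmetic. This is only an illustrative sketch; the helper name mb_to_hex is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in megabytes to the hexadecimal byte
# count used for the FSBL length and the U-Boot memory node size.
mb_to_hex() {
    printf '0x%X\n' $(( $1 * 1024 * 1024 ))
}

mb_to_hex 500   # prints 0x1F400000
mb_to_hex 512   # prints 0x20000000
```

The 12MB between 0x1F400000 and 0x20000000 is the part of the DDR region that the FSBL leaves uninitialized, which is why a read above 500MB is used to provoke ECC errors in the test.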
After booting Linux:
Check whether the driver has probed:
    zynq> dmesg | grep edac
    EDAC MC0: Giving out device to module 1 controller synps_ddr_controller: DEV synps_edac (POLLED)
    zynq>
Perform a read operation on a memory address above 500MB (0x1F400000):
    zynq> devmem 0x1F500000
    Unhandled fault: imprecise external abort (0x1406) at 0xb6eb4700
    pgd = (ptrval)
    [b6eb4700] *pgd=3ec98831
    Bus error
    zynq> EDAC MC0: 1 CE Bit Position: 77 Data: 0x00000000 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
    EDAC MC0: 4 UE DDR ECC error type :UE Row 32064 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
    zynq>
    zynq>
To see the complete information for the EDAC device:
    zynq> cat /sys/devices/system/edac/mc/mc0/
    ce_count         csrow0/       mc_name  rank0/          seconds_since_reset  subsystem/  ue_noinfo_count
    ce_noinfo_count  max_location  power/   reset_counters  size_mb              ue_count    uevent
For each CE or UE error, ce_count or ue_count is incremented.
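The two counters can be read together with a small helper. This is a sketch rather than part of the driver: the function name edac_counts is hypothetical, and the directory defaults to the mc0 path shown in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print the CE and UE counters of one EDAC memory
# controller. The optional argument overrides the default mc0 sysfs directory.
edac_counts() {
    mc_dir="${1:-/sys/devices/system/edac/mc/mc0}"
    printf 'ce_count=%s ue_count=%s\n' \
        "$(cat "$mc_dir/ce_count")" \
        "$(cat "$mc_dir/ue_count")"
}
```

Calling edac_counts before and after the devmem read makes it easy to see which counter moved.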
ZynqMP
Injecting ECC Errors for ZynqMP DDRC Controller
The following sysfs entries support injecting ECC errors:
    -> /sys/devices/system/edac/mc/mc0/inject_data_poison (to enable CE/UE)
    -> /sys/devices/system/edac/mc/mc0/inject_data_error (to specify address)
Enable the CE/UE errors:
    -> echo "CE" > /sys/devices/system/edac/mc/mc0/inject_data_poison
       The above command enables correctable error injection.
    -> echo "UE" > /sys/devices/system/edac/mc/mc0/inject_data_poison
       The above command enables uncorrectable error injection.
Select the address at which to inject ECC errors:
    -> echo 0x7EE0EEE0 > /sys/devices/system/edac/mc/mc0/inject_data_error
       The above command configures the data poison registers to inject errors at the specified address.
As per the ZynqMP DDRC controller specification, whenever a write operation is detected on the specified address, the controller injects errors at that location and reports them when a read operation is performed.
So write some data to the specified address:
    -> devmem 0x7EE0EEE0 32 0x1234
       With the above command, the controller corrupts the data at that address.
Then read the data back from that address:
    -> devmem 0x7EE0EEE0
       EDAC MC0: 1 UE DDR ECC error type :UE Row 12544 Bank 0 Col 0 BankGroup Number 2 Block Number 64 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
       Unhandled fault: synchronous external abort (0x92000210) at 0x0000007f8d666200
       Bus error
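The injection steps above can be collected into one sequence. The following is a minimal sketch under stated assumptions: the function name edac_inject_cmds is hypothetical, it only prints the commands from this section so they can be reviewed first, and the test value 0x1234 is the one used above.

```shell
#!/bin/sh
# Hypothetical helper: print the command sequence that injects one ECC error.
#   $1  error type, CE or UE
#   $2  physical address to poison
#   $3  optional EDAC sysfs directory (defaults to mc0)
edac_inject_cmds() {
    err_type="$1"
    addr="$2"
    mc_dir="${3:-/sys/devices/system/edac/mc/mc0}"
    case "$err_type" in
        CE|UE) ;;
        *) echo "error type must be CE or UE" >&2; return 1 ;;
    esac
    # Step 1: choose correctable or uncorrectable injection.
    echo "echo \"$err_type\" > $mc_dir/inject_data_poison"
    # Step 2: program the address to poison.
    echo "echo $addr > $mc_dir/inject_data_error"
    # Step 3: a write to the address triggers the corruption.
    echo "devmem $addr 32 0x1234"
    # Step 4: a read from the address makes the controller report the error.
    echo "devmem $addr"
}
```

On the target, the printed sequence can be piped to the shell, for example edac_inject_cmds CE 0x7EE0EEE0 | sh. Note that a UE injection ends in a bus error on the final read, as the output above shows.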
Expected Output
Zynq
    zynq> devmem 0x1F500000
    Unhandled fault: imprecise external abort (0x1406) at 0xb6eb4700
    pgd = (ptrval)
    [b6eb4700] *pgd=3ec98831
    Bus error
    zynq> EDAC MC0: 1 CE Bit Position: 77 Data: 0x00000000 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
    EDAC MC0: 4 UE DDR ECC error type :UE Row 32064 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
    zynq> cat /sys/devices/system/edac/mc/mc0/ce_count
    1
    zynq> cat /sys/devices/system/edac/mc/mc0/ue_count
    6
    zynq>
ZynqMP
    root@Xilinx-ZCU102-2017_3:~# dmesg | grep EDAC
    [ 1.688239] EDAC MC: Ver: 3.0.0
    [ 1.691419] EDAC DEBUG: edac_mc_sysfs_init: device mc created
    [ 3.594032] EDAC DEBUG: edac_mc_alloc: allocating 2168 bytes for mci data (1 ranks, 1 csrows/channels)
    [ 3.594073] EDAC MC0: 5 CE DDR ECC error type :CE Row 0 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
    [ 3.594078] EDAC DEBUG: edac_mc_add_mc_with_groups:
    [ 3.594082] EDAC DEBUG: edac_create_sysfs_mci_device: creating bus mc0
    [ 3.594117] EDAC DEBUG: edac_create_sysfs_mci_device: creating device mc0
    [ 3.594180] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm0, located at csrow 0 channel 0
    [ 3.594230] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device rank0
    [ 3.594234] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow0
    [ 3.594324] EDAC MC0: Giving out device to module 1 controller synps_ddr_controller: DEV synps_edac (INTERRUPT)
    [ 3.646086] EDAC MC0: 10 UE DDR ECC error type :UE Row 0 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
    root@Xilinx-ZCU102-2017_3:~#
    root@Xilinx-ZCU102-2017_3:~#
    root@xilinx-zcu102-2017_3:~# echo "CE" > /sys/devices/system/edac/mc/mc0/inject_data_poison
    root@xilinx-zcu102-2017_3:~# echo 0x7EE0EEE0 > /sys/devices/system/edac/mc/mc0/inject_data_error
    root@xilinx-zcu102-2017_3:~# devmem 0x7EE0EEE0 32 0x1234
    root@xilinx-zcu102-2017_3:~# devmem 0x7EE0EEE0
    [ 38.109379] EDAC MC0: 1 CE DDR ECC error type :CE Row 16240 Bank 1 Col 0 BankGroup Number 3 Block Number 748 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
    0x00001234
    root@xilinx-zcu102-2017_3:~# echo "CE" > /sys/devices/system/edac/mc/mc0/inject_data_poison
    root@xilinx-zcu102-2017_3:~# echo 0x7EE0DDD0 > /sys/devices/system/edac/mc/mc0/inject_data_error
    [ 909.353767] random: crng init done
    root@xilinx-zcu102-2017_3:~# devmem 0x7EE0DDD0 32 0x1234
    root@xilinx-zcu102-2017_3:~# devmem 0x7EE0DDD0
    [ 917.861298] EDAC MC0: 1 CE DDR ECC error type :CE Row 16240 Bank 1 Col 0 BankGroup Number 3 Block Number 472 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
    0x00001234
    root@xilinx-zcu102-2017_3:~# echo "UE" > /sys/devices/system/edac/mc/mc0/inject_data_poison
    root@xilinx-zcu102-2017_3:~# echo 0x6EE0DDD0 > /sys/devices/system/edac/mc/mc0/inject_data_error
    root@xilinx-zcu102-2017_3:~# devmem 0x6EE0DDD0 32 0x1234
    root@xilinx-zcu102-2017_3:~# devmem 0x6EE0DDD0
    [ 1492.761355] Unhandled fault: synchronous external abort (0x92000210) at 0x0000007fb52e9dd0
    [ 1492.761367] EDAC MC0: 1 UE DDR ECC error type :UE Row 14192 Bank 1 Col 0 BankGroup Number 3 Block Number 472 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
    Bus error
    root@xilinx-zcu102-2017_3:~#
Change log
2024.1
- Summary
- None
- Commits
- None
2023.2
- Summary
- None
- Commits
- None
2023.1
- Summary
- None
- Commits
- None
2022.2
- Summary
- None
- Commits
- None
2022.1
- Summary
- None
- Commits
- None
2021.2
- Summary
- None
- Commits
- None
2021.1
- Summary
- Fix the issue in reporting of the error count.
- Commits
2020.2
- Summary
- Fix the wrong value assignment for edac_mode.
- Commits
2020.1
- Summary
- None
- Commits
- None
© Copyright 2019 - 2022 Xilinx Inc.