CONFIG_EDAC_SYNOPSYS:

    Support for error detection and correction on the Synopsys DDR
    memory controller.

    Symbol: EDAC_SYNOPSYS [=m]
    Type  : tristate
    Prompt: Synopsys DDR Memory Controller
      Location:
        -> Device Drivers
          -> EDAC (Error Detection And Correction) reporting (EDAC [=y])
            -> Main Memory EDAC (Error Detection And Correction) reporting (EDAC_MM_EDAC [=y])
      Defined at drivers/edac/Kconfig:386
      Depends on: EDAC [=y] && EDAC_MM_EDAC [=y] && (ARM [=y] || ARM64)
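The dependency line above can be checked against a kernel .config before building. The following is a minimal sketch, not part of the driver or the kernel tooling: the function names `parse_config` and `missing_symbols` are hypothetical, and the symbol list is taken only from the help text above (other kernel versions may use different symbols).

```python
import re

# Symbols taken from the Kconfig help text above (assumption: this kernel version).
REQUIRED = ("CONFIG_EDAC", "CONFIG_EDAC_MM_EDAC", "CONFIG_EDAC_SYNOPSYS")
ARCH_ANY = ("CONFIG_ARM", "CONFIG_ARM64")  # at least one of these must be set


def parse_config(text):
    """Return {symbol: value} for every 'CONFIG_X=value' line in a .config."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$", line.strip())
        if match:
            values[match.group(1)] = match.group(2)
    return values


def missing_symbols(text):
    """List the symbols that still need to be enabled (=y or =m)."""
    values = parse_config(text)
    enabled = {name for name, value in values.items() if value in ("y", "m")}
    missing = [name for name in REQUIRED if name not in enabled]
    if not any(name in enabled for name in ARCH_ANY):
        missing.append("CONFIG_ARM or CONFIG_ARM64")
    return missing
```

Calling `missing_symbols(open(".config").read())` in a kernel build directory returns an empty list when every dependency is satisfied.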
    memory-controller@f8006000 {
        compatible = "xlnx,zynq-ddrc-a05";
        reg = <0xf8006000 0x1000>;
    };
    memory-controller@fd070000 {
        compatible = "xlnx,zynqmp-ddrc-2.40a";
        reg = <0x0 0xfd070000 0x0 0x30000>;
        interrupt-parent = <&gic>;
        interrupts = <0 112 4>;
    };
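The two nodes encode their register windows differently: the Zynq node uses two cells and the ZynqMP node uses four. The sketch below shows how such `reg` cells combine into a base address and size. The helper `decode_reg` is hypothetical, and the cell widths (1/1 for Zynq, 2/2 for ZynqMP) are an assumption inferred from the number of cells in the nodes above, since the parent bus nodes are not shown here.

```python
def decode_reg(cells, address_cells, size_cells):
    """Combine 32-bit device tree cells into a list of (base, size) pairs."""
    stride = address_cells + size_cells
    if stride == 0 or len(cells) % stride != 0:
        raise ValueError("cell count does not match #address-cells/#size-cells")

    def join(parts):
        # Cells are big-endian: the first cell holds the most significant bits.
        value = 0
        for part in parts:
            value = (value << 32) | (part & 0xFFFFFFFF)
        return value

    regions = []
    for start in range(0, len(cells), stride):
        entry = cells[start:start + stride]
        regions.append((join(entry[:address_cells]), join(entry[address_cells:])))
    return regions


# Assumed cell widths: 1/1 on Zynq, 2/2 on ZynqMP.
zynq = decode_reg([0xF8006000, 0x1000], address_cells=1, size_cells=1)
zynqmp = decode_reg([0x0, 0xFD070000, 0x0, 0x30000], address_cells=2, size_cells=2)

for name, regions in (("zynq", zynq), ("zynqmp", zynqmp)):
    for base, size in regions:
        print(f"{name}: base={base:#x} size={size:#x}")
```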
This driver is in mainline.
Fixes for Coverity warnings (~3 lines) are not yet in mainline.
To test the EDAC driver manually on the Zynq platform, make the following changes to the FSBL and U-Boot sources:
FSBL changes:
If ECC is enabled on Zynq, the RAM size is reduced to half, so with ECC enabled the DDR region is 512MB. In the "ps7_ddr_ecc_init" function in the ps7_init.c file, reduce the length of the DDR region that is initialized from 512MB to 500MB (hex: 0x1F400000):

    int ps7_ddr_ecc_init(void)
    {
        unsigned long LengthBytes = 0x1F400000; //PS7_DDR_LENGTH;
        unsigned long SourceAddr = 0;
        unsigned long DestAddr = PS7_XPAR_PS7_DDR_0_S_AXI_BASEADDR;
        unsigned long Length = 0;
        ...
        ...
    }
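The arithmetic behind the 0x1F400000 value can be worked through as follows. This is only a sketch: it assumes the DDR region starts at address 0 and uses the 512MB and 500MB figures from the text, and the helper name `uninitialized_window` is hypothetical.

```python
MIB = 1024 * 1024

ECC_DDR_SIZE = 512 * MIB       # DDR region with ECC enabled (from the text)
INITIALIZED_SIZE = 500 * MIB   # value assigned to LengthBytes in ps7_ddr_ecc_init


def uninitialized_window(total=ECC_DDR_SIZE, initialized=INITIALIZED_SIZE):
    """Return (first_address, last_address, size) of the region left untouched.

    Assumes the DDR region starts at address 0.
    """
    if initialized > total:
        raise ValueError("initialized length exceeds the DDR region")
    return initialized, total - 1, total - initialized


start, end, size = uninitialized_window()
print(f"LengthBytes = {INITIALIZED_SIZE:#010x}")
print(f"untouched   = {start:#010x}..{end:#010x} ({size // MIB} MiB)")
```

Any address inside the untouched window is a candidate for the read test performed after booting Linux.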
U-Boot changes:
Change the size value in the memory node of the U-Boot dts to 500MB (hex: 0x1F400000):

    memory {
        device_type = "memory";
        reg = <0x0 0x1F400000>;
    };

Rebuild the FSBL and U-Boot images with the above changes.
Use the rebuilt FSBL and U-Boot images when booting Linux.
Now after booting Linux:
Check whether the driver has probed:

    zynq> dmesg | grep edac
    EDAC MC0: Giving out device to module 1 controller synps_ddr_controller: DEV synps_edac (POLLED)
    zynq>

Read from any memory address above 500MB (0x1F400000):

    zynq> devmem 0x1F500000
    Unhandled fault: imprecise external abort (0x1406) at 0xb6eb4700
    pgd = (ptrval)
    [b6eb4700] *pgd=3ec98831
    Bus error
    zynq> EDAC MC0: 1 CE Bit Position: 77 Data: 0x00000000 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
    EDAC MC0: 4 UE DDR ECC error type :UE Row 32064 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
    zynq>
    zynq>

To see the complete information for the EDAC device:

    zynq> cat /sys/devices/system/edac/mc/mc0/
    ce_count             csrow0/              mc_name              rank0/               seconds_since_reset  subsystem/           ue_noinfo_count
    ce_noinfo_count      max_location         power/               reset_counters       size_mb              ue_count             uevent

ce_count is incremented for each CE error, and ue_count is incremented for each UE error.
Injecting ECC Errors for ZynqMP DDRC Controller

The following sysfs entries support injecting ECC errors:

    -> /sys/devices/system/edac/mc/mc0/inject_data_poison (to enable CE/UE)
    -> /sys/devices/system/edac/mc/mc0/inject_data_error (to specify address)

Enable the CE/UE errors. The first command below enables correctable error injection and the second enables uncorrectable error injection:

    -> echo "CE" > /sys/devices/system/edac/mc/mc0/inject_data_poison
    -> echo "UE" > /sys/devices/system/edac/mc/mc0/inject_data_poison

Select the address at which to inject ECC errors. This command configures the data poison registers to inject errors at the specified address:

    -> echo 0x7EE0EEE0 > /sys/devices/system/edac/mc/mc0/inject_data_error

As per the ZynqMP DDRC controller spec, whenever a write operation is detected on the specified address, the controller injects errors at that location, and it reports the errors when a read operation is performed. So write some data to the specified address; with this command, the controller corrupts the data at that address:

    -> devmem 0x7EE0EEE0 32 0x1234

Then try reading the data from that address:

    -> devmem 0x7EE0EEE0
    EDAC MC0: 1 UE DDR ECC error type :UE Row 12544 Bank 0 Col 0 BankGroup Number 2 Block Number 64 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
    Unhandled fault: synchronous external abort (0x92000210) at 0x0000007f8d666200
    Bus error
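The injection sequence above (poison type first, then address, then a write followed by a read) can be scripted. The following is a rough sketch, assuming Python 3 is available on the target and the script runs as root. The function name `arm_injection` is hypothetical; it only writes the two sysfs entries named above and returns the devmem commands to run afterwards, rather than invoking devmem itself.

```python
import os

DEFAULT_MC = "/sys/devices/system/edac/mc/mc0"


def arm_injection(kind, address, mc_dir=DEFAULT_MC):
    """Select CE or UE poisoning and program the target address.

    Returns the devmem commands (write, then read) that trigger and report
    the error, in the order described in the text.
    """
    if kind not in ("CE", "UE"):
        raise ValueError("kind must be 'CE' or 'UE'")
    addr_text = f"0x{address:08X}"

    # Same order as the manual steps: poison type first, then the address.
    with open(os.path.join(mc_dir, "inject_data_poison"), "w") as handle:
        handle.write(kind + "\n")
    with open(os.path.join(mc_dir, "inject_data_error"), "w") as handle:
        handle.write(addr_text + "\n")

    return [f"devmem {addr_text} 32 0x1234", f"devmem {addr_text}"]
```

For example, `arm_injection("CE", 0x7EE0EEE0)` arms a correctable error at the address used in the text. The `mc_dir` parameter exists so the helper can be pointed at a scratch directory for a dry run.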
    zynq> devmem 0x1F500000
    Unhandled fault: imprecise external abort (0x1406) at 0xb6eb4700
    pgd = (ptrval)
    [b6eb4700] *pgd=3ec98831
    Bus error
    zynq> EDAC MC0: 1 CE Bit Position: 77 Data: 0x00000000 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
    EDAC MC0: 4 UE DDR ECC error type :UE Row 32064 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
    zynq> cat /sys/devices/system/edac/mc/mc0/ce_count
    1
    zynq> cat /sys/devices/system/edac/mc/mc0/ue_count
    6
    zynq>
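The ce_count and ue_count files shown above can also be sampled before and after a test read to confirm that the counters moved. This is a minimal sketch, assuming Python 3 on the target; `read_counters` and `counter_delta` are hypothetical helper names, and only the two sysfs files from the log above are read.

```python
import os

DEFAULT_MC = "/sys/devices/system/edac/mc/mc0"


def read_counters(mc_dir=DEFAULT_MC):
    """Read ce_count and ue_count from an EDAC memory-controller directory."""
    counters = {}
    for name in ("ce_count", "ue_count"):
        with open(os.path.join(mc_dir, name)) as handle:
            counters[name] = int(handle.read().strip())
    return counters


def counter_delta(before, after):
    """Return how much each counter grew between two samples."""
    return {name: after[name] - before[name] for name in before}
```

A typical use is to call `read_counters()` once, run the devmem read, call it again, and pass both samples to `counter_delta`.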
    root@Xilinx-ZCU102-2017_3:~# dmesg | grep EDAC
    [ 1.688239] EDAC MC: Ver: 3.0.0
    [ 1.691419] EDAC DEBUG: edac_mc_sysfs_init: device mc created
    [ 3.594032] EDAC DEBUG: edac_mc_alloc: allocating 2168 bytes for mci data (1 ranks, 1 csrows/channels)
    [ 3.594073] EDAC MC0: 5 CE DDR ECC error type :CE Row 0 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
    [ 3.594078] EDAC DEBUG: edac_mc_add_mc_with_groups:
    [ 3.594082] EDAC DEBUG: edac_create_sysfs_mci_device: creating bus mc0
    [ 3.594117] EDAC DEBUG: edac_create_sysfs_mci_device: creating device mc0
    [ 3.594180] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm0, located at csrow 0 channel 0
    [ 3.594230] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device rank0
    [ 3.594234] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow0
    [ 3.594324] EDAC MC0: Giving out device to module 1 controller synps_ddr_controller: DEV synps_edac (INTERRUPT)
    [ 3.646086] EDAC MC0: 10 UE DDR ECC error type :UE Row 0 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
    root@Xilinx-ZCU102-2017_3:~#
    root@Xilinx-ZCU102-2017_3:~#
    root@xilinx-zcu102-2017_3:~# echo "CE" > /sys/devices/system/edac/mc/mc0/inject_data_poison
    root@xilinx-zcu102-2017_3:~# echo 0x7EE0EEE0 > /sys/devices/system/edac/mc/mc0/inject_data_error
    root@xilinx-zcu102-2017_3:~# devmem 0x7EE0EEE0 32 0x1234
    root@xilinx-zcu102-2017_3:~# devmem 0x7EE0EEE0
    [ 38.109379] EDAC MC0: 1 CE DDR ECC error type :CE Row 16240 Bank 1 Col 0 BankGroup Number 3 Block Number 748 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
    0x00001234
    root@xilinx-zcu102-2017_3:~# echo "CE" > /sys/devices/system/edac/mc/mc0/inject_data_poison
    root@xilinx-zcu102-2017_3:~# echo 0x7EE0DDD0 > /sys/devices/system/edac/mc/mc0/inject_data_error
    [ 909.353767] random: crng init done
    root@xilinx-zcu102-2017_3:~# devmem 0x7EE0DDD0 32 0x1234
    root@xilinx-zcu102-2017_3:~# devmem 0x7EE0DDD0
    [ 917.861298] EDAC MC0: 1 CE DDR ECC error type :CE Row 16240 Bank 1 Col 0 BankGroup Number 3 Block Number 472 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
    0x00001234
    root@xilinx-zcu102-2017_3:~# echo "UE" > /sys/devices/system/edac/mc/mc0/inject_data_poison
    root@xilinx-zcu102-2017_3:~# echo 0x6EE0DDD0 > /sys/devices/system/edac/mc/mc0/inject_data_error
    root@xilinx-zcu102-2017_3:~# devmem 0x6EE0DDD0 32 0x1234
    root@xilinx-zcu102-2017_3:~# devmem 0x6EE0DDD0
    [ 1492.761355] Unhandled fault: synchronous external abort (0x92000210) at 0x0000007fb52e9dd0
    [ 1492.761367] EDAC MC0: 1 UE DDR ECC error type :UE Row 14192 Bank 1 Col 0 BankGroup Number 3 Block Number 472 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
    Bus error
    root@xilinx-zcu102-2017_3:~#
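When comparing several injection runs, it can help to pull the event count, the error type and the Row/Bank/Col fields out of the EDAC messages shown above. The sketch below does this with regular expressions. The patterns are derived only from the log lines on this page, so other kernel versions may format the messages differently, and `parse_events` is a hypothetical helper name.

```python
import re

# Matches lines such as "EDAC MC0: 1 CE <details> on mc#0csrow#0channel#0 (...)".
EVENT = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) (?P<detail>.*?) on mc#"
)
# Numeric location fields that appear in the detail part of the message.
FIELD = re.compile(r"(Row|Bank|Col|BankGroup Number|Block Number) (\d+)")


def parse_events(log_text):
    """Return one dict per CE/UE report found in dmesg or console output."""
    events = []
    for match in EVENT.finditer(log_text):
        fields = {name: int(value) for name, value in FIELD.findall(match.group("detail"))}
        events.append({
            "mc": int(match.group("mc")),
            "count": int(match.group("count")),
            "kind": match.group("kind"),
            "fields": fields,
        })
    return events
```

Lines that are not error reports, such as the "Giving out device" probe message, are skipped because they have no count followed by CE or UE. Zynq CE reports of the form "Bit Position: 77 Data: 0x00000000" are matched but yield an empty `fields` dict.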