This page describes the Linux EDAC support for the Synopsys DDR ECC controller.
Overview
This document provides details about the Synopsys DDR ECC controller driver used in the Zynq and ZynqMP SOCs.
HW/IP Features
The Zynq and ZynqMP DDRC controllers support single-bit error correction and double-bit error detection.
The ZynqMP DDRC controller has interrupt support and error injection support.
The Zynq DDRC controller reports single-bit and double-bit errors by polling.
The ZynqMP DDRC controller reports single-bit and double-bit errors by interrupt.
Missing features, Known Issues and Limitations
Kernel Configurations
The following kernel configuration option should be enabled to compile the Synopsys EDAC driver:
CONFIG_EDAC_SYNOPSYS=y
Code Block |
---|
|
CONFIG_EDAC_SYNOPSYS: │
│ │
│ Support for error detection and correction on the Synopsys DDR │
│ memory controller. │
│ │
│ Symbol: EDAC_SYNOPSYS [=m] │
│ Type : tristate │
│ Prompt: Synopsys DDR Memory Controller │
│ Location: │
│ -> Device Drivers │
│ -> EDAC (Error Detection And Correction) reporting (EDAC [=y]) │
│ -> Main Memory EDAC (Error Detection And Correction) reporting (EDAC_MM_EDAC [=y]) │
│ Defined at drivers/edac/Kconfig:386 │
│ Depends on: EDAC [=y] && EDAC_MM_EDAC [=y] && (ARM [=y] || ARM64) |
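One way to confirm the option on a built kernel is to look for it in the kernel configuration. The following is a minimal sketch, not part of the driver documentation: the function name `edac_synopsys_enabled` is hypothetical, and it assumes a plain-text configuration file such as the `.config` in the kernel build tree.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel config file enables
# CONFIG_EDAC_SYNOPSYS either built-in (=y) or as a module (=m).
edac_synopsys_enabled() {
    config_file="$1"
    grep -Eq '^CONFIG_EDAC_SYNOPSYS=(y|m)$' "$config_file"
}

# Example usage (the path is an assumption; adjust it to your build tree):
#   edac_synopsys_enabled .config && echo "Synopsys EDAC driver enabled"
```

Both `=y` and `=m` are accepted because the symbol is a tristate, as the help text above shows.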
Device tree Node Settings
Refer to the device tree bindings doc:
Documentation/devicetree/bindings/memory-controllers/synopsys.txt
For the ZynqMP SOC device tree bindings, refer to the devicetree bindings doc.
For Zynq SOC
Code Block |
---|
|
memory-controller@f8006000 {
compatible = "xlnx,zynq-ddrc-a05";
reg = <0xf8006000 0x1000>;
}; |
For ZynqMP SOC
Code Block |
---|
|
memory-controller@fd070000 {
compatible = "xlnx,zynqmp-ddrc-2.40a";
reg = <0x0 0xfd070000 0x0 0x30000>;
interrupt-parent = <&gic>;
interrupts = <0 112 4>;
}; |
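On a running target, the presence of one of these nodes can be checked by searching the exported device tree for the `compatible` strings shown above. This is a sketch under assumptions: the helper name `find_ddrc_nodes` is hypothetical, and the device tree is assumed to be exported as a directory hierarchy (typically `/proc/device-tree`).

```shell
#!/bin/sh
# Hypothetical helper: list device tree nodes under a root directory whose
# "compatible" property contains one of the two DDRC strings shown above.
# -H makes find follow the root if it is a symlink (as /proc/device-tree
# usually is); -a makes grep treat the NUL-terminated property as text.
find_ddrc_nodes() {
    dt_root="$1"
    find -H "$dt_root" -type f -name compatible \
        -exec grep -l -a -E 'xlnx,zynq-ddrc-a05|xlnx,zynqmp-ddrc-2\.40a' {} + |
    while IFS= read -r prop; do
        # Print the node directory that owns the matching property file.
        dirname "$prop"
    done
}

# Example usage on a target (assumed path):
#   find_ddrc_nodes /proc/device-tree
```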
Test Procedure
Steps to reserve the test memory location for error injection:
Code Block |
---|
reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;

    reserved: buffer@0 {
        reusable;
        reg = <0x0 0x7EE0EEE0 0x0 0x00100000>;
    };
};

reserved-driver@0 {
    compatible = "xlnx,reserved-memory";
    memory-region = <&reserved>;
}; |
For Zynq
Code Block |
---|
|
Check whether the driver has been probed
zynq> dmesg | grep edac
EDAC MC-1: Giving out device to 'xilinxps_edac' 'zynq_ddr_controller': DEV f8006000.ps7-ddrc
zynq>
Read a memory address and then write to it; the EDAC driver then displays the ECC error information
zynq> devmem 0x1F400000
0xEA000049
zynq> devmem 0x1F400000 0x5D600000
Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6f83000
Bus error
zynq> EDAC MC-1: 2 CE DDR ECC error type :CE Row 0 Bank 0 Col 512 on mc#4294967295csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
EDAC MC-1: 3 UE DDR ECC error type :UE Row 0 Bank 0 Col 512 on mc#4294967295csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
zynq>
To see the complete information for the EDAC device:
zynq> ls /sys/devices/system/edac/mc/mc-1/
ce_count power/ subsystem/
ce_noinfo_count rank0/ ue_count
csrow0/ reset_counters ue_noinfo_count
max_location seconds_since_reset uevent
mc_name size_mb
For each CE or UE error, ce_count and ue_count will be incremented.
|
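The two counters can be read together with a small helper. This is a minimal sketch: the function name `edac_counts` is hypothetical, and the controller directory is passed as an argument because its name differs between platforms (`mc-1` in the Zynq log above, `mc0` on ZynqMP).

```shell
#!/bin/sh
# Hypothetical helper: print the CE and UE counters of one EDAC memory
# controller directory, e.g. /sys/devices/system/edac/mc/mc0.
edac_counts() {
    mc_dir="$1"
    # Fail if either counter file cannot be read.
    ce=$(cat "$mc_dir/ce_count") || return 1
    ue=$(cat "$mc_dir/ue_count") || return 1
    echo "CE=$ce UE=$ue"
}

# Example usage on a target (assumed path):
#   edac_counts /sys/devices/system/edac/mc/mc0
```

Calling it before and after a test access makes it easy to see which counter was incremented.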
For ZynqMP
Code Block |
---|
|
Injecting ECC Errors for ZynqMP DDRC Controller
The following sysfs entries support injecting ECC errors
-> /sys/devices/system/edac/mc/mc0/inject_data_poison (to enable CE/UE)
-> /sys/devices/system/edac/mc/mc0/inject_data_error (to specify address)
Enable the CE/UE errors
-> echo "CE" > /sys/devices/system/edac/mc/mc0/inject_data_poison
The above command enables correctable error injection
-> echo "UE" > /sys/devices/system/edac/mc/mc0/inject_data_poison
The above command enables uncorrectable error injection
|
Code Block |
---|
|
Select the address to inject ECC Errors
-> echo 0x7EE0EEE0 > /sys/devices/system/edac/mc/mc0/inject_data_error
The above command configures the data poison registers to inject errors at the specified address.
As per the ZynqMP DDRC controller spec, whenever a write operation is detected on the specified address, the controller injects errors at that location
and reports the errors back when a read operation is performed.
So write some data to the specified address
-> devmem 0x7EE0EEE0 32 0x1234
With the above command, the controller corrupts the data at that address.
Try reading the data from that address
-> devmem 0x7EE0EEE0
EDAC MC0: 1 UE DDR ECC error type :UE Row 12544 Bank 0 Col 0 BankGroup Number 2 Block Number 64 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
Unhandled fault: synchronous external abort (0x92000210) at 0x0000007f8d666200
Bus error |
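The two sysfs writes above can be wrapped in one helper that also validates the error type. This is a sketch under assumptions: the function name `arm_ecc_injection` is hypothetical, the controller directory is assumed to be `/sys/devices/system/edac/mc/mc0` as in the steps above, and the address must lie inside the reserved test region.

```shell
#!/bin/sh
# Hypothetical helper wrapping the two sysfs writes described above.
#   $1: EDAC controller directory (e.g. /sys/devices/system/edac/mc/mc0)
#   $2: error type, "CE" or "UE"
#   $3: physical address inside the reserved test region
arm_ecc_injection() {
    mc_dir="$1"
    err_type="$2"
    addr="$3"
    case "$err_type" in
        CE|UE) ;;
        *) echo "error type must be CE or UE" >&2; return 1 ;;
    esac
    # Same order as the manual steps: select the error type, then the address.
    echo "$err_type" > "$mc_dir/inject_data_poison" || return 1
    echo "$addr" > "$mc_dir/inject_data_error" || return 1
}

# Example usage on a target, followed by the write and read that trigger
# and report the error:
#   arm_ecc_injection /sys/devices/system/edac/mc/mc0 CE 0x7EE0EEE0
#   devmem 0x7EE0EEE0 32 0x1234
#   devmem 0x7EE0EEE0
```

The helper only arms the injection; the `devmem` write and read are left to the caller because a UE read ends in a bus error, as the log above shows.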
Expected Output
For Zynq
Code Block |
---|
|
zynq> devmem 0x1F400000
Unhandled fault: imprecise external abort (0x1406) at 0x000cb884
Bus error
zynq> EDAC MC-1: 2 CE DDR ECC error type :CE Row 0 Bank 0 Col 512 on mc#4294967295csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
EDAC MC-1: 3 UE DDR ECC error type :UE Row 0 Bank 0 Col 512 on mc#4294967295csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
zynq> cat /sys/devices/system/edac/mc/mc-1/ce_count
11
zynq> cat /sys/devices/system/edac/mc/mc-1/ue_count
8
zynq>
|
For ZynqMP
Code Block |
---|
|
root@Xilinx-ZCU102-2016_1:~# dmesg | grep EDAC
[ 1.688239] EDAC MC: Ver: 3.0.0
[ 1.691419] EDAC DEBUG: edac_mc_sysfs_init: device mc created
[ 3.594032] EDAC DEBUG: edac_mc_alloc: allocating 2168 bytes for mci data (1 ranks, 1 csrows/channels)
[ 3.594073] EDAC MC0: 5 CE DDR ECC error type :CE Row 0 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
[ 3.594078] EDAC DEBUG: edac_mc_add_mc_with_groups:
[ 3.594082] EDAC DEBUG: edac_create_sysfs_mci_device: creating bus mc0
[ 3.594117] EDAC DEBUG: edac_create_sysfs_mci_device: creating device mc0
[ 3.594180] EDAC DEBUG: edac_create_sysfs_mci_device: creating dimm0, located at csrow 0 channel 0
[ 3.594230] EDAC DEBUG: edac_create_dimm_object: creating rank/dimm device rank0
[ 3.594234] EDAC DEBUG: edac_create_csrow_object: creating (virtual) csrow node csrow0
[ 3.594324] EDAC MC0: Giving out device to module 1 controller synps_ddr_controller: DEV synps_edac (INTERRUPT)
[ 3.646086] EDAC MC0: 10 UE DDR ECC error type :UE Row 0 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
root@Xilinx-ZCU102-2016_1:~#
root@Xilinx-ZCU102-2016_1:~#
root@Xilinx-ZCU102-2016_1:~# devmem 0x61000000
[ 28.911942] Unhandled fault: synchronous external abort (0x92000210) at 0x0000007f9038e000
[ 28.911955] EDAC MC0: 1 CE DDR ECC error type :CE Row 12416 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
[ 28.911964] EDAC MC0: 14 UE DDR ECC error type :UE Row 12416 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
Bus error
root@Xilinx-ZCU102-2016_1:~#
root@Xilinx-ZCU102-2016_1:~#
root@Xilinx-ZCU102-2016_1:~# cat /sys/devices/system/edac/mc/mc0/ce_count
6
root@Xilinx-ZCU102-2016_1:~# cat /sys/devices/system/edac/mc/mc0/ue_count
24
root@Xilinx-ZCU102-2016_1:~#
root@Xilinx-ZCU102-2016_1:~# devmem 0x61000000
root@Xilinx-ZCU102-2016_1:~# devmem 0x61000000 72000000
[ 56.159845] Unhandled fault: synchronous external abort (0x92000210) at 0x0000007f8205c000
[ 56.159858] EDAC MC0: 2 CE DDR ECC error type :CE Row 14592 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1 syndrome:0x0)
[ 56.159867] EDAC MC0: 13 UE DDR ECC error type :UE Row 14592 Bank 0 Col 0 on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:1)
Bus error
root@Xilinx-ZCU102-2016_1:~#
root@Xilinx-ZCU102-2016_1:~# cat /sys/devices/system/edac/mc/mc0/ce_count
8
root@Xilinx-ZCU102-2016_1:~# cat /sys/devices/system/edac/mc/mc0/ue_count
37 |
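When comparing a test run against the expected output, it can help to reduce the kernel log to the error type and DDR location. The following is a minimal sketch: the function name `parse_edac_errors` is hypothetical, and the pattern assumes the exact "DDR ECC error type" message format shown in the logs above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print one line per
# "DDR ECC error" report in the form "<type> row=<n> bank=<n> col=<n>".
# Lines that do not match the message format are dropped.
parse_edac_errors() {
    sed -n 's/.*DDR ECC error type :\([CU]E\) Row \([0-9]*\) Bank \([0-9]*\) Col \([0-9]*\).*/\1 row=\2 bank=\3 col=\4/p'
}

# Example usage on a target:
#   dmesg | parse_edac_errors
```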
Change log
2016.3
2016.4
2017.1
2017.2
2017.3
- Summary
- Do not use symbolic permissions
- Commits
2017.4
2022.1
Mainline status
Mainlined