This article is a guide to tuning the default configuration to shape HP port traffic and help meet your system throughput requirements.
...
Performance through the HP ports depends heavily on the traffic patterns generated by the PL masters and on the non-deterministic traffic driven by software running on the processors. Both sets of masters compete for DDR access. Because software traffic is non-deterministic, it is difficult to model the system accurately for optimal DDR efficiency. However, you may be able to iteratively tune the default configuration to help meet your system performance requirements. This is not an exhaustive list of the available controls; it covers the most effective knobs for shaping HP traffic. The data path from the HP ports in the PL to the DDR memory controller is shown below.
...
The PS-PL interface slices AXI bursts longer than 16 beats into shorter bursts; however, this conversion may negatively impact performance when the interface is heavily loaded.
Registers
Offset | Bits | Description |
---|---|---|
0x0 | [2] | FABRIC_QOS_EN: 0-Enable static QoS, 1-Enable dynamic QoS (PL AXI sideband pass-through) |
0x4 | [3:0] | CAPABILITY: Read/write command issuing capability (0-15); Number of commands minus one |
0x8 | [3:0] | VALUE: Static QoS value (0-15) |
0x14 | [2] | FABRIC_QOS_EN: 0-Enable static QoS, 1-Enable dynamic QoS (PL AXI sideband pass-through) |
0x18 | [3:0] | CAPABILITY: Read/write command issuing capability (0-15); Number of commands minus one |
0x1c | [3:0] | VALUE: Static QoS value (0-15) |
Info |
---|
These registers are dynamic and may be changed at run-time. |
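As a rough illustration of the field encodings in the table above, the following sketch computes the values for the FABRIC_QOS_EN, CAPABILITY and VALUE fields. The function names are hypothetical, no register base address is assumed, and the result would still have to be written to the hardware by whatever register access method your platform provides.

```python
# Minimal sketch of the field encodings listed in the table above.
# Function names are illustrative only; they are not part of any vendor API.

FABRIC_QOS_EN_BIT = 2  # bit [2]: 0 = static QoS, 1 = dynamic QoS


def fabric_qos_en_value(dynamic: bool) -> int:
    """Return the register value with FABRIC_QOS_EN set or cleared."""
    return (1 << FABRIC_QOS_EN_BIT) if dynamic else 0


def issue_capability_field(num_commands: int) -> int:
    """CAPABILITY [3:0] holds the number of commands minus one."""
    if not 1 <= num_commands <= 16:
        raise ValueError("issuing capability must be 1..16 commands")
    return num_commands - 1


def static_qos_field(qos: int) -> int:
    """VALUE [3:0] holds the static QoS value (0-15)."""
    if not 0 <= qos <= 15:
        raise ValueError("QoS value must be 0..15")
    return qos


if __name__ == "__main__":
    print(hex(fabric_qos_en_value(dynamic=True)))  # dynamic QoS from the PL
    print(hex(issue_capability_field(8)))          # allow 8 commands
    print(hex(static_qos_field(7)))                # static QoS of 7
```

Note that the CAPABILITY field is off by one relative to the number of commands, which is easy to get wrong when writing the value by hand.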
...
Peak rate, burstiness and average rate
Peak rate only
Burstiness and average rate
HP0 Registers | HP1 Registers | HP2 Registers | HP3 Registers | Description | ||||
---|---|---|---|---|---|---|---|---|
0x47118 | afifm3M_intfpd_aw_p | 0x4A118 | afifm4M_intfpd_aw_p | 0x4B118 | afifm5M_intfpd_aw_p | 0x4C118 | AW channel peak rate | |
0x4711C | afifm3M_intfpd_aw_b | 0x4A11C | afifm4M_intfpd_aw_b | 0x4B11C | afifm5M_intfpd_aw_b | 0x4C11C | AW channel burstiness allowance | |
0x47120 | afifm3M_intfpd_aw_r | 0x4A120 | afifm4M_intfpd_aw_r | 0x4B120 | afifm5M_intfpd_aw_r | 0x4C120 | AW channel average rate | |
0x47124 | afifm3M_intfpd_ar_p | 0x4A124 | afifm4M_intfpd_ar_p | 0x4B124 | afifm5M_intfpd_ar_p | 0x4C124 | AR channel peak rate | |
0x47128 | afifm3M_intfpd_ar_b | 0x4A128 | afifm4M_intfpd_ar_b | 0x4B128 | afifm5M_intfpd_ar_b | 0x4C128 | AR channel burstiness allowance | |
0x4712C | afifm3M_intfpd_ar_r | 0x4A12C | afifm4M_intfpd_ar_r | 0x4B12C | afifm5M_intfpd_ar_r | 0x4C12C | AR channel average rate | |
0x4710C | afifm3M_intfpd_qos_cntl | 0x4A10C | afifm4M_intfpd_qos_cntl | 0x4B10C | afifm5M_intfpd_qos_cntl | 0x4C10C | Enable rate regulation |
Register | Field | Bits | Description |
---|---|---|---|
*_intfpd_aw_p | aw_p | [31:24] | Channel peak rate (8b fraction of transfers per cycle) |
*_intfpd_aw_b | aw_b | [15:0] | Channel burstiness (integer transfers) |
*_intfpd_aw_r | aw_r | [31:20] | Channel average rate (12b fraction of transfers per cycle) |
*_intfpd_ar_p | ar_p | [31:24] | Channel peak rate (8b fraction of transfers per cycle) |
*_intfpd_ar_b | ar_b | [15:0] | Channel burstiness (integer transfers) |
*_intfpd_ar_r | ar_r | [31:20] | Channel average rate (12b fraction of transfers per cycle) |
*_intfpd_qos_cntl | en_awar_rate en_ar_rate en_aw_rate | [2] [1] [0] | Enable combined rate regulation Enable AR rate regulation Enable AW rate regulation |
Info |
---|
These registers are dynamic and may be changed at run-time. |
...
Rate_peak = floor (256 / (100 * BL / %BW_peak)) = floor (256 / (100 * 16 / 15)) = 2 (0x2)
Burstiness = 4
...
...
HP0
...
HP1
...
HP2
...
HP3
...
Description
...
(0x4)
Due to quantization effects the actual rate will be lower than the target rate.
Rate (avg) = 25 * 100 * 16 / 4096 = 9.77% of MAX => 8528 * 0.09766 = 832.8 MBps
Rate (peak) = 2 * 100 * 16 / 256 = 12.5% of MAX => 8528 * 0.125 = 1066 MBps
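The quantization arithmetic above can be reproduced with a short script. This is a sketch under stated assumptions: the function names are hypothetical, the 15% peak target and burst length of 16 come from the example above, and the 10% average target is assumed here because it yields the value 25 used in the average rate calculation. The field positions ([31:24] for peak rate, [31:20] for average rate) follow the register field table.

```python
# Sketch of the QoS-400 rate regulator arithmetic used in this article.
# Names are illustrative; targets other than 15% peak are assumptions.

MAX_MBPS = 8528   # maximum bandwidth figure used in this article
BURST_LEN = 16    # AXI burst length (BL)


def rate_field(bw_percent: int, burst_len: int, frac_bits: int) -> int:
    """floor(2^frac_bits / (100 * BL / %BW)), computed with integers."""
    return ((1 << frac_bits) * bw_percent) // (100 * burst_len)


def achieved_percent(field: int, burst_len: int, frac_bits: int) -> float:
    """Bandwidth (as % of MAX) actually obtained from a quantized field."""
    return field * 100 * burst_len / (1 << frac_bits)


peak = rate_field(15, BURST_LEN, 8)    # 8-bit fraction, bits [31:24]
avg = rate_field(10, BURST_LEN, 12)    # 12-bit fraction, bits [31:20]

peak_reg = peak << 24                  # value for *_intfpd_aw_p / ar_p
avg_reg = avg << 20                    # value for *_intfpd_aw_r / ar_r

if __name__ == "__main__":
    for name, field, bits in (("peak", peak, 8), ("avg", avg, 12)):
        pct = achieved_percent(field, BURST_LEN, bits)
        print(f"{name}: field={field} -> {pct:.2f}% = {MAX_MBPS * pct / 100:.1f} MBps")
```

Because the field is rounded down, the achieved rate is never above the target; if the shortfall matters, consider targeting a slightly higher percentage and checking the achieved value again.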
...
Outstanding transaction regulation
In the QoS-400 you may specify the maximum number of outstanding transactions allowed with fractional transaction resolution. The actual number of transactions will modulate between the lower and upper bound.
HP0 | HP1 | HP2 | HP3 | Description | ||||
---|---|---|---|---|---|---|---|---|
0x47110 | afifm3M_intfpd_max_ot | 0x4A110 | afifm4M_intfpd_max_ot | 0x4B110 | afifm5M_intfpd_max_ot | 0x4C110 | Max number of outstanding transactions | |
0x47114 | afifm3M_intfpd_max_comb_ot | 0x4A114 | afifm4M_intfpd_max_comb_ot | 0x4B114 | afifm5M_intfpd_max_comb_ot | 0x4C114 | Max number of combined outstanding transactions | |
0x4710C | afifm3M_intfpd_qos_cntl | 0x4A10C | afifm4M_intfpd_qos_cntl | 0x4B10C | afifm5M_intfpd_qos_cntl | 0x4C10C | Enable outstanding transaction regulation |
Register | Field | Bits | Description |
---|---|---|---|
*_intfpd_max_ot | ar_max_oti ar_max_otf aw_max_oti aw_max_otf | [29:24] [23:16] [13:8] [7:0] | Integer part of max outstanding AR addresses (6b) Fraction part of max outstanding AR addresses (8b) Integer part of max outstanding AW addresses (6b) Fraction part of max outstanding AW addresses (8b) |
*_intfpd_max_comb_ot | awar_max_oti awar_max_otf | [14:8] [7:0] | Integer part of max combined outstanding AW/AR addresses (7b) Fraction part of max combined outstanding AW/AR addresses (8b) |
*_intfpd_qos_cntl | en_awar_ot en_ar_ot en_aw_ot | [7] [6] [5] | Enable combined regulation of outstanding transactions Enable regulation of outstanding read transactions Enable regulation of outstanding write transactions |
Info |
---|
These registers are dynamic and may be changed at run-time. |
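To make the integer and fraction fields more concrete, here is a minimal sketch that packs a fractional outstanding transaction limit into the max_ot layout from the field table above. The helper names are hypothetical, and the sketch assumes the 8-bit fraction is expressed in units of 1/256 of a transaction.

```python
# Sketch: pack fractional outstanding-transaction limits into max_ot.
# Field positions come from the table above; helper names are illustrative.

FRAC_BITS = 8  # assumed to mean units of 1/256 of a transaction


def to_fixed(limit: float, int_bits: int) -> tuple:
    """Split a limit into (integer part, 8-bit fraction part)."""
    scaled = int(round(limit * (1 << FRAC_BITS)))
    if not 0 <= scaled < (1 << (int_bits + FRAC_BITS)):
        raise ValueError("limit does not fit in the register field")
    return scaled >> FRAC_BITS, scaled & ((1 << FRAC_BITS) - 1)


def max_ot_value(ar_limit: float, aw_limit: float) -> int:
    """ar_max_oti [29:24], ar_max_otf [23:16], aw_max_oti [13:8], aw_max_otf [7:0]."""
    ar_i, ar_f = to_fixed(ar_limit, 6)
    aw_i, aw_f = to_fixed(aw_limit, 6)
    return (ar_i << 24) | (ar_f << 16) | (aw_i << 8) | aw_f


# Enable bits in the QoS control register, per the table above.
EN_AW_OT = 1 << 5    # regulate outstanding writes
EN_AR_OT = 1 << 6    # regulate outstanding reads
EN_AWAR_OT = 1 << 7  # regulate combined outstanding transactions

if __name__ == "__main__":
    # Example: at most 2.5 outstanding reads and 4 outstanding writes.
    print(hex(max_ot_value(2.5, 4.0)))
    print(hex(EN_AR_OT | EN_AW_OT))
```

With a limit of 2.5, the regulator would be expected to alternate between allowing 2 and 3 outstanding reads, which matches the lower and upper bound behavior described above.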
...
Info |
---|
QoS throttle control is disabled by default. |
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
0x0 | PORT5_TYPE PORT4_TYPE PORT3_TYPE | [15:14] [13:12] [11:10] | Port 5 Type: 0x0-BE, 0x1-LL, 0x2-Video Port 4 Type: 0x0-BE, 0x1-LL, 0x2-Video Port 3 Type: 0x0-BE, 0x1-LL, 0x2-Video | |
0x4 | PORT5_WR_CTRL PORT5_HPR_CTRL PORT5_LPR_CTRL PORT4_WR_CTRL PORT4_HPR_CTRL PORT4_LPR_CTRL PORT3_WR_CTRL PORT3_HPR_CTRL PORT3_LPR_CTRL | [21] [20] [19] [18] [17] [16] [15] [14] [13] | QoS throttle Control on Write channel QoS throttle Control on Read HPR channel QoS throttle Control on Read LPR channel QoS throttle Control on Write channel QoS throttle Control on Read HPR channel QoS throttle Control on Read LPR channel QoS throttle Control on Write channel QoS throttle Control on Read HPR channel QoS throttle Control on Read LPR channel | |
0x8 | VALUE | [6:0] | Read HPR CAM Threshold Level | |
0xC | VALUE | [6:0] | Read LPR CAM Threshold Level | |
0x10 | VALUE | [6:0] | Write CAM Threshold Level |
Info |
---|
These registers are dynamic and may be changed at run-time. |
...
Urgent transactions are enabled by default in the FSBL, which sets the [rd|wr]_port_urgent_en bits in the PCFG[R|W]_n registers. Urgent transactions may be issued either by expiring aging counters in the DDRC or by software writing to the DDRC_URGENT register in the DDR QoS Controller.
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
0x510 | ARURGENT_5 AWURGENT_5 ARURGENT_4 AWURGENT_4 ARURGENT_3 AWURGENT_3 | [13] [12] [11] [10] [9] [8] | Sideband signal to indicate a DDRC Port 5 read queue urgent transaction Sideband signal to indicate a DDRC Port 5 write queue urgent transaction Sideband signal to indicate a DDRC Port 4 read queue urgent transaction Sideband signal to indicate a DDRC Port 4 write queue urgent transaction Sideband signal to indicate a DDRC Port 3 read queue urgent transaction Sideband signal to indicate a DDRC Port 3 write queue urgent transaction |
Info |
---|
These registers are dynamic and may be changed at run-time. |
...
The Port Control registers allow software to enable and disable the DDRC ports. This can be helpful for debugging or dynamically managing ports with software.
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
0x490 | port_en | [0] | Enables port 0 | |
0x540 | port_en | [0] | Enables port 1 | |
0x5F0 | port_en | [0] | Enables port 2 | |
0x6A0 | port_en | [0] | Enables port 3 | |
0x750 | port_en | [0] | Enables port 4 | |
0x800 | port_en | [0] | Enables port 5 |
Info |
---|
These registers are dynamic and may be changed at run-time. |
...
The QoS values on each port are mapped into traffic classes. Variable priority reads and writes (VPR/VPW) have a timer associated with each command. When the timer reaches zero, the command is considered expired and gets elevated to the highest priority whether the command is in an XPI or a CAM. Video traffic class is mapped to VPR/VPW.
Traffic Class | Read QoS Value (default) | Read Priority Mapping | Write QoS Value (default) | Write Priority Mapping |
---|---|---|---|---|
Best Effort (BE) | 0-3 | LPR | 0-7 | NPW |
Video (V) | 4-11 | VPR | 8-15 | VPW |
Low Latency (LL) | 12-15 | HPR | N/A | N/A |
The traffic class mappings are register configurable; however, you should not need to modify them. See the ZU+ Register Reference for details if you do need to change these mappings.
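The default mapping in the table above can be expressed as a small lookup, which may be handy when deciding which QoS value to assign to a PL master. This is only a sketch of the default ranges listed in the table; the function names are hypothetical, and a modified mapping would need different boundaries.

```python
# Sketch of the default QoS-to-traffic-class mapping from the table above.
# Returns (traffic class, priority mapping). Names are illustrative only.


def read_traffic_class(qos: int) -> tuple:
    """Default read mapping: 0-3 BE/LPR, 4-11 V/VPR, 12-15 LL/HPR."""
    if not 0 <= qos <= 15:
        raise ValueError("QoS value must be 0..15")
    if qos <= 3:
        return ("BE", "LPR")
    if qos <= 11:
        return ("V", "VPR")
    return ("LL", "HPR")


def write_traffic_class(qos: int) -> tuple:
    """Default write mapping: 0-7 BE/NPW, 8-15 V/VPW (no LL class)."""
    if not 0 <= qos <= 15:
        raise ValueError("QoS value must be 0..15")
    return ("BE", "NPW") if qos <= 7 else ("V", "VPW")


if __name__ == "__main__":
    for qos in (0, 4, 8, 12):
        print(qos, read_traffic_class(qos), write_traffic_class(qos))
```

One consequence worth noting is that the same QoS value can land in different classes for reads and writes; for example, a value of 4 is video class for reads but best effort for writes under the defaults.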
Register | Offset | Description |
---|---|---|
0x6A4 | Map read traffic classes to regions and define separation levels for port 3 | |
0x6AC | Map write traffic classes to regions and define separation levels for port 3 | |
0x754 | Map read traffic classes to regions and define separation levels for port 4 | |
0x75C | Map write traffic classes to regions and define separation levels for port 4 | |
0x804 | Map read traffic classes to regions and define separation levels for port 5 | |
0x80C | Map write traffic classes to regions and define separation levels for port 5 |
Info |
---|
These registers are quasi dynamic group 3 and can only be modified when the DDRC is empty. |
...
Variable priority timeouts can be modified to trade-off latency for throughput.
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
0x6A8 | rqos_map_timeout | [10:0] | Timeout value for read transactions on port 3 | |
0x6B0 | wqos_map_timeout | [10:0] | Timeout value for write transactions on port 3 | |
0x758 | rqos_map_timeout | [10:0] | Timeout value for read transactions on port 4 | |
0x760 | wqos_map_timeout | [10:0] | Timeout value for write transactions on port 4 | |
0x808 | rqos_map_timeout | [10:0] | Timeout value for read transactions on port 5 | |
0x810 | wqos_map_timeout | [10:0] | Timeout value for write transactions on port 5 |
Info |
---|
These registers are static and must be set while the DDRC is reset. |
...
Port aging is disabled by default.
Registers | Offset | Field | Bits | Description |
---|---|---|---|---|
0x614 | rd_port_aging_en rd_port_priority | [12] [9:0] | Enable aging function on read port 3 Initial load value of read aging counters | |
0x618 | wr_port_aging_en wr_port_priority | [12] [9:0] | Enable aging function on write port 3 Initial load value of write aging counters | |
0x6C4 | rd_port_aging_en rd_port_priority | [12] [9:0] | Enable aging function on read port 4 Initial load value of read aging counters | |
0x6C8 | wr_port_aging_en wr_port_priority | [12] [9:0] | Enable aging function on write port 4 Initial load value of write aging counters | |
0x774 | rd_port_aging_en rd_port_priority | [12] [9:0] | Enable aging function on read port 5 Initial load value of read aging counters | |
0x778 | wr_port_aging_en wr_port_priority | [12] [9:0] | Enable aging function on write port 5 Initial load value of write aging counters |
Info |
---|
These registers are static and must be set while the DDRC is reset. |
...
The read CAM has 64 command entries, which are split into HPR and LPR/VPR sections. This ratio can be changed in the SCHED register if either of these queues is becoming saturated. The write CAM is fixed at 64 command entries.
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
0x250 | lpr_num_entries | [13:8] | Number of entries in the low priority transaction store -1 |
There are no HPR masters in the FSBL default QoS settings. So if you are using the default FSBL settings, you may want to allocate all of the read CAM for LPR/VPR.
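As an illustration of the read CAM split, the sketch below updates the lpr_num_entries field of a SCHED register value. It assumes, based on the field description above, that the field holds the number of LPR entries minus one and that the remaining entries of the 64-entry read CAM go to HPR. The function name is hypothetical, and the current register value would have to be read from hardware first.

```python
# Sketch: read-modify-write of SCHED.lpr_num_entries (bits [13:8]).
# Assumes field = LPR entries - 1 and a 64-entry read CAM, as described above.

READ_CAM_DEPTH = 64
LPR_NUM_ENTRIES_SHIFT = 8
LPR_NUM_ENTRIES_MASK = 0x3F << LPR_NUM_ENTRIES_SHIFT


def set_lpr_entries(sched_value: int, lpr_entries: int) -> int:
    """Return sched_value with lpr_num_entries set for lpr_entries entries."""
    if not 1 <= lpr_entries <= READ_CAM_DEPTH:
        raise ValueError("LPR entries must be 1..64")
    field = lpr_entries - 1
    cleared = sched_value & ~LPR_NUM_ENTRIES_MASK & 0xFFFFFFFF
    return cleared | (field << LPR_NUM_ENTRIES_SHIFT)


if __name__ == "__main__":
    # Give the whole read CAM to LPR/VPR when there are no HPR masters.
    print(hex(set_lpr_entries(0x0, READ_CAM_DEPTH)))
    # Even split between LPR/VPR and HPR.
    print(hex(set_lpr_entries(0x0, 32)))
```

Only the six field bits are modified, so the other SCHED settings in the value passed in are preserved.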
PERFHPR1, PERFLPR1 and PERFWR1 registers allow you to specify the maximum starve cycles per queue. This value specifies the number of clocks before the queue goes critical. This timeout may be tweaked depending on your traffic patterns and latency requirement. A larger value may increase video or bulk throughput by reducing queue switching, whereas a smaller value may reduce latency by switching to another queue more quickly at the expense of DDR efficiency.
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
0x25C | hpr_xact_run_length hpr_max_starve | [31:24] [15:0] | Number of transactions that get serviced once the HPR queue goes critical or number available (smaller of) Number of clocks the HPR queue can be starved before it goes critical. 0x0 disables starved functionality (not recommended) | |
0x264 | lpr_xact_run_length lpr_max_starve | [31:24] [15:0] | Number of transactions that get serviced once the LPR queue goes critical or number available (smaller of) Number of clocks the LPR queue can be starved before it goes critical. 0x0 disables starved functionality (not recommended) | |
0x26C | w_xact_run_length w_max_starve | [31:24] [15:0] | Number of transactions that get serviced once the WR queue goes critical or number available (smaller of) Number of clocks the WR queue can be starved before it goes critical. 0x0 disables starved functionality (not recommended) |
PERFVPR1 and PERFVPW1 registers allow you to specify a timeout range. This will group commands that are temporally located with an expired VPR/VPW command making them all expired in an attempt to improve DDR utilization.
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
0x274 | vpr_timeout_range | [10:0] | range of the timeout value that is used for grouping the expired VPR commands in the CAM | |
0x278 | vpw_timeout_range | [10:0] | range of the timeout value that is used for grouping the expired VPW commands in the CAM |
Info |
---|
These registers are static and must be set while the DDRC is reset. |
...
By default, BG0 is mapped to A6, so consecutive bursts ping-pong between bank groups to reduce access latency. (This may vary for earlier Vivado versions.)
DDR4_64 | A31 | A30 | A29 | A28 | A27 | A26 | A25 | A24 | A23 | A22 | A21 | A20 | A19 | A18 | A17 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HIF | A28 | A27 | A26 | A25 | A24 | A23 | A22 | A21 | A20 | A19 | A18 | A17 | A16 | A15 | A14 |
PHY | R14 | R13 | R12 | R11 | R10 | R9 | R8 | R7 | R6 | R5 | R4 | R3 | R2 | R1 | R0 |
DDR4_64 | A16 | A15 | A14 | A13 | A12 | A11 | A10 | A9 | A8 | A7 | A6 | A5 | A4 | A3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HIF | A13 | A12 | A11 | A10 | A9 | A8 | A7 | A6 | A5 | A4 | A3 | A2 | A1 | A0 |
PHY | B1 | B0 | BG1 | C9 | C8 | C7 | C6 | C5 | C4 | C3 | BG0 | C2 | C1 | C0 |
Note |
---|
If you are using dynamic DDR configuration with a DIMM, the mapping may differ based upon FSBL configuration. If you want to disable the dynamic DDR configuration set CONFIG.PSU_DYNAMIC_DDR_CONFIG_EN = 0. |
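The ping-pong effect of the default mapping can be checked with a few lines of code. This sketch assumes the default DDR4 64-bit mapping shown above, in which A6 drives BG0 and A14 drives BG1; the function name is hypothetical, and a different FSBL or DIMM configuration would need different bit positions.

```python
# Sketch: derive the bank group from a system address, assuming the
# default DDR4 64-bit mapping above (A6 -> BG0, A14 -> BG1).

BG0_ADDR_BIT = 6
BG1_ADDR_BIT = 14


def bank_group(addr: int) -> int:
    """Return the 2-bit bank group {BG1, BG0} selected by addr."""
    bg0 = (addr >> BG0_ADDR_BIT) & 1
    bg1 = (addr >> BG1_ADDR_BIT) & 1
    return (bg1 << 1) | bg0


if __name__ == "__main__":
    # Walk four consecutive 64-byte blocks and show the bank group of each.
    for addr in range(0x0, 0x100, 0x40):
        print(hex(addr), "-> bank group", bank_group(addr))
```

Since A6 toggles every 64 bytes, sequential accesses alternate between two bank groups, whereas a master that strides by 128 bytes would keep hitting the same bank group and might not see the same benefit.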
Register | Address Mapping |
---|---|
Rank[0] | |
Bank[2:0] | |
Column[5:2] | |
Column[9:6] | |
Column[11:10] | |
Row[2:0], Row[11] | |
Row[15:12] | |
Row[17:16] | |
Bank Group[1:0] | |
Row[5:2] | |
Row[9:6] | |
Row[10] |
Info |
---|
These registers are static and must be set while the DDRC is reset. |
...
These registers are for debugging and polling the status of the DDRC ports. DBGCAM monitors the CAM levels. PSTAT monitors the XPI outstanding commands.
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
0x308 | dbg_w_q_depth dbg_lpr_q_depth dbg_hpr_q_depth | [22:16] [14:8] [6:0] | Write queue depth Low priority read queue depth High priority read queue depth | |
0x3FC | wr_port_busy_5 wr_port_busy_4 wr_port_busy_3 wr_port_busy_2 wr_port_busy_1 wr_port_busy_0 rd_port_busy_5 rd_port_busy_4 rd_port_busy_3 rd_port_busy_2 rd_port_busy_1 rd_port_busy_0 | [21] [20] [19] [18] [17] [16] [5] [4] [3] [2] [1] [0] | Indicates if there are outstanding writes on port 5 Indicates if there are outstanding writes on port 4 Indicates if there are outstanding writes on port 3 Indicates if there are outstanding writes on port 2 Indicates if there are outstanding writes on port 1 Indicates if there are outstanding writes on port 0 Indicates if there are outstanding reads on port 5 Indicates if there are outstanding reads on port 4 Indicates if there are outstanding reads on port 3 Indicates if there are outstanding reads on port 2 Indicates if there are outstanding reads on port 1 Indicates if there are outstanding reads on port 0 |
Info |
---|
These registers are read-only and may be read at run-time. |
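When polling these registers from software, it can help to decode the raw values into named fields. The following sketch decodes the queue depth and port busy fields using the bit positions from the table above. The function names are hypothetical, and the raw values are assumed to have been read from the hardware by some other means.

```python
# Sketch: decode raw DBGCAM and PSTAT values using the bit positions in
# the table above. Reading the registers themselves is out of scope here.

NUM_PORTS = 6


def decode_dbgcam(value: int) -> dict:
    """Extract the 7-bit CAM queue depths."""
    return {
        "w_q_depth": (value >> 16) & 0x7F,    # bits [22:16]
        "lpr_q_depth": (value >> 8) & 0x7F,   # bits [14:8]
        "hpr_q_depth": value & 0x7F,          # bits [6:0]
    }


def decode_pstat(value: int) -> dict:
    """Extract per-port busy flags; list index is the port number."""
    return {
        "rd_busy": [bool((value >> p) & 1) for p in range(NUM_PORTS)],
        "wr_busy": [bool((value >> (16 + p)) & 1) for p in range(NUM_PORTS)],
    }


if __name__ == "__main__":
    print(decode_dbgcam(0x00120305))
    print(decode_pstat((1 << 19) | (1 << 3)))
```

Sampling the queue depths repeatedly while the system is under load gives a rough picture of which CAM is closest to saturation.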
...
APU Module base address: 0xFD5C0000
Registers
Register | Offset | Fields | Bits | Description |
---|---|---|---|---|
0x60 | AWQOS ARQOS | [19:16] [3:0] | ACE outgoing AWQOS value (0-15) ACE outgoing ARQOS value (0-15) |
Info |
---|
These registers are dynamic and may be changed at run-time. |
...