...
Introduction
Performance through the HP ports is highly dependent on the traffic patterns generated by the PL masters, as well as on the non-deterministic traffic patterns driven by software running on the processors. Both sets of masters compete for DDR access. The non-deterministic nature of software running on the processors makes the traffic difficult to model and predict, and makes high DDR efficiency difficult to achieve. However, you may be able to tune the default configurations of the PS-PL interface, interconnect switches, DDR memory controller and APU QoS to help meet your system performance requirements. This is not an exhaustive list of the available controls; it covers the most effective knobs for shaping HP traffic.
This article does not address:
HPC, HPM, ACE, ACP or LPD ports
CCI/QVN enablement
Impact of SMMU enablement
HP to DDR Data Path
<Add some description here>
...
Traffic Shaping
<Add some description here>
PS-PL Interface
The PS-PL interface consists of an AXI FIFO interface (AFI) per port that bridges the PS and PL domains. The main controls here are the QoS and the issuing capability of each port. The QoS specifies the priority of a transaction, which is also used to map the channel into traffic classes in the DDR memory controller. The QoS may be static or dynamic depending on your system needs. The issuing capability defines how many outstanding HP transactions may be in flight at a given time.
...
Info |
---|
These registers are dynamic. |
...
Example #1: AFIFM QoS: HP0-3 R/W set to Best Effort (0x0) (default)
...
Example #2: AFIFM QoS: HP0 R/W set to Video (0x7), HP1-3 R/W set to Best Effort (0x0)
...
FPD Interconnect Switches
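The interconnect switches are based on the ARM NIC-400 network interconnect with the QoS-400 advanced quality of service IP, which provides two traffic regulation mechanisms: transaction rate regulation and outstanding transaction regulation.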
...
Info |
---|
These registers are dynamic. |
Example #3: Transaction rate regulation
Regulate the HP0 port to an average rate of 10% of the interconnect maximum data rate, but allow up to 4 catch-up transactions capped at a peak of 15% of the maximum data rate.
BL = 16
MAX = 533 MTps (8528 MBps)
TPS_avg = 0.1 * 533M = 53.3 MTps (852.8 MBps)
TPS_max = 0.15 * 533M = 79.95 MTps (1279.2 MBps)
Rate_avg = floor (256 / (100 * BL / %BW_avg)) = floor (256 / (100 * 16 / 10)) = 1 (0x1)
Rate_peak = floor (4096 / (100 * BL / %BW_peak)) = floor (4096 / (100 * 16 / 15)) = 38 (0x26)
Burstiness = 4
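A minimal Python sketch that evaluates the Example #3 formulas, handy for trying other bandwidth targets. The 256 and 4096 scale factors are copied from the formulas above; confirm in the QoS-400 documentation which regulator field uses which resolution before programming anything. `qos400_rate` is a hypothetical helper name.

```python
# Scale factors copied from Example #3 (confirm the field widths in the QoS-400 docs).
AVG_SCALE = 256
PEAK_SCALE = 4096


def qos400_rate(scale: int, burst_len: int, pct_bw: int) -> int:
    """Evaluate floor(scale / (100 * BL / %BW)).

    The expression is rewritten as floor(scale * %BW / (100 * BL)) so that
    integer division gives the exact floor with no floating-point error.
    """
    if not 0 < pct_bw <= 100:
        raise ValueError("pct_bw must be in (0, 100]")
    return (scale * pct_bw) // (100 * burst_len)


if __name__ == "__main__":
    bl = 16
    rate_avg = qos400_rate(AVG_SCALE, bl, 10)    # 10 % average bandwidth
    rate_peak = qos400_rate(PEAK_SCALE, bl, 15)  # 15 % peak bandwidth
    print(f"Rate_avg  = {rate_avg} ({rate_avg:#x})")
    print(f"Rate_peak = {rate_peak} ({rate_peak:#x})")
```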
...
Outstanding transaction regulation
In the QoS-400 you can specify the maximum number of outstanding transactions with fractional resolution for finer control. The actual number of outstanding transactions then modulates between the lower and upper integer bounds.
Register (HP0) | Offset | Register (HP1) | Offset | Register (HP2) | Offset | Register (HP3) | Offset | Description |
---|---|---|---|---|---|---|---|---|
afifm2M_intfpd_max_ot | 0x47110 | afifm3M_intfpd_max_ot | 0x4A110 | afifm4M_intfpd_max_ot | 0x4B110 | afifm5M_intfpd_max_ot | 0x4C110 | Max number of outstanding transactions |
afifm2M_intfpd_max_comb_ot | 0x47114 | afifm3M_intfpd_max_comb_ot | 0x4A114 | afifm4M_intfpd_max_comb_ot | 0x4B114 | afifm5M_intfpd_max_comb_ot | 0x4C114 | Max number of combined outstanding transactions |
afifm2M_intfpd_qos_cntl | 0x4710C | afifm3M_intfpd_qos_cntl | 0x4A10C | afifm4M_intfpd_qos_cntl | 0x4B10C | afifm5M_intfpd_qos_cntl | 0x4C10C | Enable outstanding transaction regulation |
...
Info |
---|
These registers are dynamic. |
Example #4: Outstanding transaction regulation
Regulate the HP0 port to 2.5 outstanding transactions.
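A 2.5 limit has to be split into integer and fractional parts before it can be written. The sketch below assumes a max_ot layout with a 6-bit integer and an 8-bit fraction (units of 1/256) for each channel, read in bits [29:16] and write in bits [13:0], and applies 2.5 to both channels as a hypothetical choice. That layout is an assumption, not something this page states, so verify it against the afifm2M_intfpd_max_ot description in the register reference. Enabling regulation through afifm2M_intfpd_qos_cntl is not shown because its bit positions are not given here. The helper names are hypothetical.

```python
import math

# ASSUMED max_ot layout (verify against the register reference before use):
#   [29:24] read integer part      [23:16] read fraction (1/256 units)
#   [13:8]  write integer part     [7:0]   write fraction (1/256 units)
INT_MAX = 0x3F  # 6-bit integer fields


def split_ot(limit: float) -> tuple[int, int]:
    """Split an outstanding-transaction limit into (integer, 1/256 fraction)."""
    integer = math.floor(limit)
    frac = round((limit - integer) * 256)
    if frac == 256:  # e.g. 2.999 rounds up into the integer part
        integer, frac = integer + 1, 0
    if not 0 <= integer <= INT_MAX:
        raise ValueError("integer part must fit in 6 bits")
    return integer, frac


def max_ot_value(read_limit: float, write_limit: float) -> int:
    """Pack read and write limits into the assumed max_ot register layout."""
    r_int, r_frac = split_ot(read_limit)
    w_int, w_frac = split_ot(write_limit)
    return (r_int << 24) | (r_frac << 16) | (w_int << 8) | w_frac


if __name__ == "__main__":
    value = max_ot_value(2.5, 2.5)  # 2.5 outstanding reads and writes
    print(f"max_ot = {value:#010x}")
```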
...
The DDR QoS controller can throttle low latency and best effort traffic based on the Content Addressable Memory (CAM) levels to ensure that video traffic does not get blocked. It also gives software the ability to trigger urgent AXI transactions on a per-port basis to dynamically elevate lower priority traffic so that it is not blocked by higher priority traffic.
DDR QoS Control Module base address: 0xFD090000
QoS Throttle Control
The QoS DDR controller is designed to ensure that there is always space available in the CAMs for video traffic. By monitoring the individual CAM levels, the QoS DDR controller can throttle low latency and best effort traffic in favor of video traffic. The DBGCAM register can be used to monitor CAM levels to see if any are saturating.
Info |
---|
QoS throttle control is disabled by default |
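Register | Offset | Field | Bits | Description |
---|---|---|---|---|
PORT_TYPE | 0x0 | PORT5_TYPE | [15:14] | Port 5 type: 0x0-BE, 0x1-LL, 0x2-Video |
PORT_TYPE | 0x0 | PORT4_TYPE | [13:12] | Port 4 type: 0x0-BE, 0x1-LL, 0x2-Video |
PORT_TYPE | 0x0 | PORT3_TYPE | [11:10] | Port 3 type: 0x0-BE, 0x1-LL, 0x2-Video |
QOS_CTRL | 0x4 | PORT5_WR_CTRL | [21] | QoS throttle control on port 5 write channel |
QOS_CTRL | 0x4 | PORT5_HPR_CTRL | [20] | QoS throttle control on port 5 read HPR channel |
QOS_CTRL | 0x4 | PORT5_LPR_CTRL | [19] | QoS throttle control on port 5 read LPR channel |
QOS_CTRL | 0x4 | PORT4_WR_CTRL | [18] | QoS throttle control on port 4 write channel |
QOS_CTRL | 0x4 | PORT4_HPR_CTRL | [17] | QoS throttle control on port 4 read HPR channel |
QOS_CTRL | 0x4 | PORT4_LPR_CTRL | [16] | QoS throttle control on port 4 read LPR channel |
QOS_CTRL | 0x4 | PORT3_WR_CTRL | [15] | QoS throttle control on port 3 write channel |
QOS_CTRL | 0x4 | PORT3_HPR_CTRL | [14] | QoS throttle control on port 3 read HPR channel |
QOS_CTRL | 0x4 | PORT3_LPR_CTRL | [13] | QoS throttle control on port 3 read LPR channel |
RD_HPR_THRSLD | 0x8 | VALUE | [6:0] | Read HPR CAM threshold level |
RD_LPR_THRSLD | 0xC | VALUE | [6:0] | Read LPR CAM threshold level |
WR_THRSLD | 0x10 | VALUE | [6:0] | Write CAM threshold level |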
Info |
---|
These registers are dynamic. |
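As a sketch of how the throttle control fields combine, the Python below builds PORT_TYPE and QOS_CTRL values from the offsets and bit positions in the QoS throttle control table. The scenario (port 3 as video, ports 4 and 5 best effort with throttling enabled on all three channels) and the helper names are hypothetical. This page does not describe how the threshold levels trigger throttling, so the threshold helper only range-checks the 7-bit field.

```python
DDR_QOS_BASE = 0xFD090000  # DDR QoS Control Module base address

# Register offsets from the QoS throttle control table
PORT_TYPE_OFF = 0x0
QOS_CTRL_OFF = 0x4
RD_HPR_THRSLD_OFF = 0x8
RD_LPR_THRSLD_OFF = 0xC
WR_THRSLD_OFF = 0x10

TYPE_CODES = {"BE": 0x0, "LL": 0x1, "VIDEO": 0x2}
PORT_TYPE_SHIFT = {5: 14, 4: 12, 3: 10}  # 2-bit fields [15:14], [13:12], [11:10]
# Bit positions of (WR_CTRL, HPR_CTRL, LPR_CTRL) for each DDRC port
QOS_CTRL_BITS = {5: (21, 20, 19), 4: (18, 17, 16), 3: (15, 14, 13)}


def reg_addr(offset: int) -> int:
    """Absolute address of a DDR QoS controller register."""
    return DDR_QOS_BASE + offset


def port_type_value(types: dict[int, str]) -> int:
    """Build PORT_TYPE from {ddrc_port: "BE" | "LL" | "VIDEO"}."""
    value = 0
    for port, kind in types.items():
        value |= TYPE_CODES[kind] << PORT_TYPE_SHIFT[port]
    return value


def qos_ctrl_value(enables: dict[int, tuple[bool, bool, bool]]) -> int:
    """Build QOS_CTRL from {ddrc_port: (write, read_hpr, read_lpr)} enables."""
    value = 0
    for port, flags in enables.items():
        for bit, enabled in zip(QOS_CTRL_BITS[port], flags):
            if enabled:
                value |= 1 << bit
    return value


def threshold_value(level: int) -> int:
    """Range-check a CAM threshold for the 7-bit VALUE field [6:0]."""
    if not 0 <= level <= 0x7F:
        raise ValueError("CAM threshold must fit in 7 bits")
    return level


if __name__ == "__main__":
    # Hypothetical scenario: port 3 is video, ports 4 and 5 are throttled BE ports
    writes = {
        reg_addr(PORT_TYPE_OFF): port_type_value({3: "VIDEO", 4: "BE", 5: "BE"}),
        reg_addr(QOS_CTRL_OFF): qos_ctrl_value({4: (True,) * 3, 5: (True,) * 3}),
    }
    for addr, value in writes.items():
        print(f"{addr:#010x} <- {value:#010x}")
```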
Urgent Transactions
Urgent transactions are enabled by default in FSBL which sets in the [rd|wr]_port_urgent_en bits in the PCFG[R|W]_n registers. Urgent transactions may be issued either through expiring aging counters in the DDRC or through software by writing to the DDRC_URGENT register in the DDR QoS Controller.
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
DDRC_URGENT | 0x510 | ARURGENT_5 | [13] | Sideband signal to indicate a DDRC port 5 read queue urgent transaction |
DDRC_URGENT | 0x510 | AWURGENT_5 | [12] | Sideband signal to indicate a DDRC port 5 write queue urgent transaction |
DDRC_URGENT | 0x510 | ARURGENT_4 | [11] | Sideband signal to indicate a DDRC port 4 read queue urgent transaction |
DDRC_URGENT | 0x510 | AWURGENT_4 | [10] | Sideband signal to indicate a DDRC port 4 write queue urgent transaction |
DDRC_URGENT | 0x510 | ARURGENT_3 | [9] | Sideband signal to indicate a DDRC port 3 read queue urgent transaction |
DDRC_URGENT | 0x510 | AWURGENT_3 | [8] | Sideband signal to indicate a DDRC port 3 write queue urgent transaction |
Info |
---|
These registers are dynamic. |
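A small sketch of the software urgent path: it computes the DDRC_URGENT value that asserts the read and/or write urgent bits for one DDRC port, using the bit positions from the table and the DDR QoS Control Module base address. The helper names are hypothetical and the example assumes the register currently reads 0. This page does not say whether software must clear the bits afterwards, so `clear_urgent` is only a bit-clear convenience.

```python
DDR_QOS_BASE = 0xFD090000  # DDR QoS Control Module base address
DDRC_URGENT_OFF = 0x510

# (ARURGENT, AWURGENT) bit positions per DDRC port, from the DDRC_URGENT table
URGENT_BITS = {5: (13, 12), 4: (11, 10), 3: (9, 8)}


def urgent_mask(port: int, read: bool = True, write: bool = True) -> int:
    """Bits to touch for the selected read/write urgent signals of one port."""
    ar_bit, aw_bit = URGENT_BITS[port]
    mask = 0
    if read:
        mask |= 1 << ar_bit
    if write:
        mask |= 1 << aw_bit
    return mask


def set_urgent(current: int, port: int, read: bool = True, write: bool = True) -> int:
    """Read-modify-write: assert urgent for one port, keep the other bits."""
    return current | urgent_mask(port, read, write)


def clear_urgent(current: int, port: int, read: bool = True, write: bool = True) -> int:
    """Read-modify-write: clear urgent for one port, keep the other bits."""
    return current & ~urgent_mask(port, read, write) & 0xFFFFFFFF


if __name__ == "__main__":
    addr = DDR_QOS_BASE + DDRC_URGENT_OFF
    value = set_urgent(0x0, port=3)  # assumes the register currently reads 0
    print(f"write {value:#x} to {addr:#x}")
```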
DDR Memory Controller
The DDR memory controller is based on the Synopsys uMCTL2 DDR Memory Controller IP. The four HP ports from the PL funnel down to three AXI Port Interfaces (XPI) through the interconnect switch network. Since this is a multi-port memory controller, arbitration occurs based on the priority and direction of each request in an effort to optimize the DDR bandwidth. The XPI are serviced by the Port Arbiter (PA) and forwarded to the DDR Controller (DDRC) for scheduling. Inside the DDRC the commands from the PA are queued in either the read or write Content Addressable Memory (CAM). Once a command is selected from either CAM, it is forwarded to the PHY and out to the DDR memory.
DDRC Module base address: 0xFD070000
PA Arbitration
Read/write arbitration
Reads
Stay on reads as long as there is a timed-out read port or an expired-VPR with available credit
Switch to writes if there is a timed-out write port or expired-VPW with available credit
Switch to writes when there is no read credit left and there is a pending write with available credit
Reads are prioritized over writes when everything else is equal
Writes
Stay on writes as long as there is a timed-out write port or expired-VPW with available credit
Switch to reads if there is a timed-out read port or expired-VPR with available credit
Switch to reads if there is an HPR read port with available credit
Switch to reads when there is no write credit left and there is a pending read with available credit
2-priority level arbitration based on port aging and expired-VPR/VPW commands
2-priority level arbitration for read requests based on DDRC read priorities (HPR/LPR-VPR)
16-priority level arbitration based on external AXI QoS inputs
Round-robin arbitration when everything else is equal
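The read/write rules above can be expressed as a small decision function. This is a simplified, hypothetical model for reasoning about traffic, not the uMCTL2 implementation: it reads "with available credit" as applying to the whole condition, stays in the current direction when none of the listed conditions apply, and ignores the priority, QoS and round-robin stages that choose among ports. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PaView:
    """Hypothetical snapshot of the conditions named in the PA rules."""
    rd_timed_out: bool = False   # a read port has timed out
    wr_timed_out: bool = False   # a write port has timed out
    expired_vpr: bool = False    # an expired-VPR command is waiting
    expired_vpw: bool = False    # an expired-VPW command is waiting
    hpr_port: bool = False       # an HPR read port is requesting
    rd_pending: bool = False     # any read is pending
    wr_pending: bool = False     # any write is pending
    rd_credit: bool = True       # read credit available
    wr_credit: bool = True       # write credit available


def next_direction(current: str, v: PaView) -> str:
    """Return "read" or "write" for the next PA decision (simplified model)."""
    urgent_rd = (v.rd_timed_out or v.expired_vpr) and v.rd_credit
    urgent_wr = (v.wr_timed_out or v.expired_vpw) and v.wr_credit
    if current == "read":
        if urgent_rd:
            return "read"
        if urgent_wr:
            return "write"
        if not v.rd_credit and v.wr_pending and v.wr_credit:
            return "write"
        return "read"  # reads are prioritized when everything else is equal
    if current == "write":
        if urgent_wr:
            return "write"
        if urgent_rd or (v.hpr_port and v.rd_credit):
            return "read"
        if not v.wr_credit and v.rd_pending and v.rd_credit:
            return "read"
        return "write"
    raise ValueError("current must be 'read' or 'write'")


if __name__ == "__main__":
    print(next_direction("read", PaView(wr_timed_out=True)))
    print(next_direction("write", PaView(hpr_port=True)))
```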
...
Traffic Class | Read QoS Value (default) | Read Priority Mapping | Write QoS Value (default) | Write Priority Mapping |
---|---|---|---|---|
Best Effort (BE) | 0-3 | LPR | 0-7 | NPW |
Video (V) | 4-11 | VPR | 8-15 | VPW |
Low Latency (LL) | 12-15 | HPR | N/A | N/A |
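The default mapping in the table can be checked with a short Python sketch. The function names are hypothetical and the ranges are copied from the table, so they only hold while the default PCFGQOS0_n/PCFGWQOS0_n settings are in place.

```python
def _check(qos: int) -> None:
    if not 0 <= qos <= 15:
        raise ValueError("AXI QoS is a 4-bit value (0-15)")


def read_priority(qos: int) -> str:
    """Default read mapping: 0-3 LPR (BE), 4-11 VPR (Video), 12-15 HPR (LL)."""
    _check(qos)
    if qos <= 3:
        return "LPR"
    if qos <= 11:
        return "VPR"
    return "HPR"


def write_priority(qos: int) -> str:
    """Default write mapping: 0-7 NPW (BE), 8-15 VPW (Video)."""
    _check(qos)
    return "NPW" if qos <= 7 else "VPW"


if __name__ == "__main__":
    # QoS values used in the AFIFM examples earlier on this page
    for qos in (0x0, 0x7):
        print(f"QoS {qos:#x}: read -> {read_priority(qos)}, write -> {write_priority(qos)}")
```

With the default mapping, the QoS value of 0x7 used for HP0 in Example #2 puts reads in the VPR class but writes in the NPW class; a write QoS of 0x8 or higher is needed to reach VPW.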
The traffic class mappings are register configurable, but you should not normally need to modify them. If you do, see the Zynq UltraScale+ Devices Register Reference (UG1087) for details.
Register | Offset | Description |
---|---|---|
PCFGQOS0_3 | 0x6A4 | Port 3 read QoS configuration register 0: maps read traffic classes to regions and defines separation levels |
PCFGWQOS0_3 | 0x6AC | Port 3 write QoS configuration register 0: maps write traffic classes to regions and defines separation levels |
PCFGQOS0_4 | 0x754 | Port 4 read QoS configuration register 0: maps read traffic classes to regions and defines separation levels |
PCFGWQOS0_4 | 0x75C | Port 4 write QoS configuration register 0: maps write traffic classes to regions and defines separation levels |
PCFGQOS0_5 | 0x804 | Port 5 read QoS configuration register 0: maps read traffic classes to regions and defines separation levels |
PCFGWQOS0_5 | 0x80C | Port 5 write QoS configuration register 0: maps write traffic classes to regions and defines separation levels |
Info |
---|
These registers are quasi dynamic group 3 and can only be modified when the DDRC is empty. |
These registers are not configurable in the PCW. If you need to remap the traffic classes, you must patch the psu_init.c.
Variable Priority Timeouts
Variable priority timeouts can be modified to trade-off latency for throughput.
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
PCFGQOS1_3 | 0x6A8 | rqos_map_timeout | [10:0] | Timeout value for read transactions on port 3 |
PCFGWQOS1_3 | 0x6B0 | wqos_map_timeout | [10:0] | Timeout value for write transactions on port 3 |
PCFGQOS1_4 | 0x758 | rqos_map_timeout | [10:0] | Timeout value for read transactions on port 4 |
PCFGWQOS1_4 | 0x760 | wqos_map_timeout | [10:0] | Timeout value for write transactions on port 4 |
PCFGQOS1_5 | 0x808 | rqos_map_timeout | [10:0] | Timeout value for read transactions on port 5 |
PCFGWQOS1_5 | 0x810 | wqos_map_timeout | [10:0] | Timeout value for write transactions on port 5 |
Info |
---|
These registers are static and must be set while the DDRC is reset. |
These registers are not configurable in the PCW; however, psu_init configures them with a default value. If you want to modify the timeout values, you must patch the psu_init.c.
Port Aging
Port aging provides a mechanism to elevate an XPI port's priority when it has an outstanding request, but has not been serviced after a set time. When the aging timer counts down to zero, the port is elevated to the highest priority, which is equivalent to an expired variable priority command.
Port aging is disabled by default
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
PCFGR_3 | 0x614 | rd_port_aging_en | [12] | Enable aging function on read port 3 |
PCFGR_3 | 0x614 | rd_port_priority | [9:0] | Initial load value of read aging counters on port 3 |
PCFGW_3 | 0x618 | wr_port_aging_en | [12] | Enable aging function on write port 3 |
PCFGW_3 | 0x618 | wr_port_priority | [9:0] | Initial load value of write aging counters on port 3 |
PCFGR_4 | 0x6C4 | rd_port_aging_en | [12] | Enable aging function on read port 4 |
PCFGR_4 | 0x6C4 | rd_port_priority | [9:0] | Initial load value of read aging counters on port 4 |
PCFGW_4 | 0x6C8 | wr_port_aging_en | [12] | Enable aging function on write port 4 |
PCFGW_4 | 0x6C8 | wr_port_priority | [9:0] | Initial load value of write aging counters on port 4 |
PCFGR_5 | 0x774 | rd_port_aging_en | [12] | Enable aging function on read port 5 |
PCFGR_5 | 0x774 | rd_port_priority | [9:0] | Initial load value of read aging counters on port 5 |
PCFGW_5 | 0x778 | wr_port_aging_en | [12] | Enable aging function on write port 5 |
PCFGW_5 | 0x778 | wr_port_priority | [9:0] | Initial load value of write aging counters on port 5 |
Info |
---|
These registers are static and must be set while the DDRC is reset. |
These registers are not configurable in the PCW. If you want to implement port aging, you must patch the psu_init.c.
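Because these registers must be set while the DDRC is in reset, the practical output is a value to patch into psu_init.c. The sketch below computes a mask/value pair that touches only the aging enable (bit 12) and the 10-bit priority field, so other fields in PCFGR_n/PCFGW_n, such as the urgent enable set by FSBL, are left alone. The priority of 0xFF and the helper names are hypothetical; psu_init.c typically programs registers with masked writes, but check how your generated file does it.

```python
DDRC_BASE = 0xFD070000  # DDRC Module base address

# Offsets from the port aging table: port -> (PCFGR_n, PCFGW_n)
PCFG_OFFSETS = {3: (0x614, 0x618), 4: (0x6C4, 0x6C8), 5: (0x774, 0x778)}
AGING_EN = 1 << 12     # [rd|wr]_port_aging_en
PRIORITY_MASK = 0x3FF  # [rd|wr]_port_priority, bits [9:0]


def aging_mask_value(priority: int, enable: bool = True) -> tuple[int, int]:
    """Return (mask, value) covering only the aging enable and priority fields."""
    if not 0 <= priority <= PRIORITY_MASK:
        raise ValueError("aging priority must fit in 10 bits")
    mask = AGING_EN | PRIORITY_MASK
    value = (AGING_EN if enable else 0) | priority
    return mask, value


def apply(current: int, mask: int, value: int) -> int:
    """Masked write: bits outside the mask are preserved."""
    return (current & ~mask) | (value & mask)


if __name__ == "__main__":
    mask, value = aging_mask_value(priority=0x0FF)  # hypothetical initial count
    rd_off, wr_off = PCFG_OFFSETS[3]
    print(f"PCFGR_3 @ {DDRC_BASE + rd_off:#x}: mask {mask:#x}, value {value:#x}")
    print(f"PCFGW_3 @ {DDRC_BASE + wr_off:#x}: mask {mask:#x}, value {value:#x}")
```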
Content Addressable Memory (CAM)
The read CAM has 64 command entries, which are split into HPR and LPR/VPR sections. This ratio can be changed in the SCHED register if either of these queues is getting saturated. The write CAM is fixed at 64 command entries.
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
SCHED | 0x250 | lpr_num_entries | [13:8] | Number of entries in the low priority transaction store minus one. Number of entries in high priority transaction store = 64 - (lpr_num_entries + 1) |
Info |
---|
These registers are static and must be set while the DDRC is reset. |
These registers are not configurable in the PCW. If you want to reallocate the HPR and LPR transaction stores in the read CAM, you must patch the psu_init.c.
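The minus-one encoding of lpr_num_entries is easy to get wrong. A minimal sketch, with hypothetical helper names and a hypothetical 48/16 LPR/HPR split, that sets the field in a SCHED value and recovers the HPR entry count with the formula from the table.

```python
READ_CAM_ENTRIES = 64
LPR_SHIFT, LPR_MASK = 8, 0x3F  # lpr_num_entries, bits [13:8]


def sched_with_lpr(current: int, lpr_entries: int) -> int:
    """Return SCHED with lpr_num_entries set for lpr_entries LPR/VPR slots."""
    if not 1 <= lpr_entries <= READ_CAM_ENTRIES:
        raise ValueError("LPR entries must be between 1 and 64")
    field = lpr_entries - 1  # the register holds the entry count minus one
    return (current & ~(LPR_MASK << LPR_SHIFT)) | (field << LPR_SHIFT)


def hpr_entries(sched: int) -> int:
    """HPR entries = 64 - (lpr_num_entries + 1), as described in the table."""
    lpr_num_entries = (sched >> LPR_SHIFT) & LPR_MASK
    return READ_CAM_ENTRIES - (lpr_num_entries + 1)


if __name__ == "__main__":
    sched = sched_with_lpr(0x0, lpr_entries=48)  # hypothetical 48 LPR / 16 HPR split
    print(f"SCHED = {sched:#x}, HPR entries = {hpr_entries(sched)}")
```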
Address Mapper
The address mapper allows you to map the rank, bank group (DDR4 only), bank, column and row address lines to optimize your memory accesses based on your traffic patterns. The twelve ADDRMAP registers map individual address lines to the HIF address, which is the address generated by the XPI.
The HIF address is a word address, while the DDR address is a byte address. On this 64-bit interface, DDR address bits A2-A0 select a byte within the word, so HIF A0 corresponds to DDR A3.
The ZCU102 has a 4 GB DDR4 x64 DIMM with a burst length of 8. The table below shows the default address mapping generated by Vivado.
By default BG0 is mapped to A6, so consecutive bursts ping-pong between bank groups to reduce access latency. (This may vary for earlier Vivado versions.)
DDR4_64 | A31 | A30 | A29 | A28 | A27 | A26 | A25 | A24 | A23 | A22 | A21 | A20 | A19 | A18 | A17 | A16 | A15 | A14 | A13 | A12 | A11 | A10 | A9 | A8 | A7 | A6 | A5 | A4 | A3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HIF | A28 | A27 | A26 | A25 | A24 | A23 | A22 | A21 | A20 | A19 | A18 | A17 | A16 | A15 | A14 | A13 | A12 | A11 | A10 | A9 | A8 | A7 | A6 | A5 | A4 | A3 | A2 | A1 | A0 |
PHY | R14 | R13 | R12 | R11 | R10 | R9 | R8 | R7 | R6 | R5 | R4 | R3 | R2 | R1 | R0 | B1 | B0 | BG1 | C9 | C8 | C7 | C6 | C5 | C4 | C3 | BG0 | C2 | C1 | C0 |
Note |
---|
If you are using dynamic DDR configuration with a DIMM, the mapping may differ based upon FSBL. To disable the dynamic DDR configuration, set CONFIG.PSU_DYNAMIC_DDR_CONFIG_EN = 0. |
Register | Address Mapping |
---|---|
ADDRMAP0 | Rank |
ADDRMAP1 | Bank |
ADDRMAP{2:4} | Column |
ADDRMAP{5:7} | Row |
ADDRMAP8 | Bank Group |
ADDRMAP{9:11} | Row |
...
These registers are not fully configurable in the PCW. If you want to remap the address lines, you must patch the psu_init.c.
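To see how the default mapping spreads traffic, the sketch below decodes a DDR byte address into row, bank, bank group and column using the ZCU102 table above. It assumes the default Vivado mapping shown there (a dynamic DDR configuration may differ) and ignores rank, which the table does not map. Stepping through consecutive 64-byte bursts (BL8 on a 64-bit bus) shows the bank group alternating on A6. The names are illustrative.

```python
# Default ZCU102 mapping from the table above, listed for DDR byte-address bits
# A31 down to A3. A2-A0 are byte lanes within the 64-bit word.
PHY_BITS = (
    [f"R{i}" for i in range(14, -1, -1)]  # R14..R0 on A31..A17
    + ["B1", "B0", "BG1"]                 # A16..A14
    + [f"C{i}" for i in range(9, 2, -1)]  # C9..C3 on A13..A7
    + ["BG0", "C2", "C1", "C0"]           # A6..A3
)
DDR_BIT = dict(zip(PHY_BITS, range(31, 2, -1)))
FIELDS = {"row": ("R", 15), "bank": ("B", 2), "bank_group": ("BG", 2), "column": ("C", 10)}


def decode(addr: int) -> dict[str, int]:
    """Split a DDR byte address into row/bank/bank group/column (default map)."""
    out = {}
    for name, (prefix, width) in FIELDS.items():
        value = 0
        for i in range(width):
            value |= ((addr >> DDR_BIT[f"{prefix}{i}"]) & 1) << i
        out[name] = value
    return out


if __name__ == "__main__":
    # Four consecutive 64-byte bursts (BL8 x 8 bytes)
    for addr in range(0, 256, 64):
        print(f"{addr:#06x}: {decode(addr)}")
```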
Debug/Status Registers
These registers are for debugging and polling the status of the DDRC ports. DBGCAM monitors the CAM levels, while PSTAT monitors the XPI outstanding commands.
Register | Offset | Field | Bits | Description |
---|---|---|---|---|
DBGCAM | 0x308 | dbg_w_q_depth | [22:16] | Write queue depth |
DBGCAM | 0x308 | dbg_lpr_q_depth | [14:8] | Low priority read queue depth |
DBGCAM | 0x308 | dbg_hpr_q_depth | [6:0] | High priority read queue depth |
PSTAT | 0x3FC | wr_port_busy_5 | [21] | Indicates if there are outstanding writes on port 5 |
PSTAT | 0x3FC | wr_port_busy_4 | [20] | Indicates if there are outstanding writes on port 4 |
PSTAT | 0x3FC | wr_port_busy_3 | [19] | Indicates if there are outstanding writes on port 3 |
PSTAT | 0x3FC | wr_port_busy_2 | [18] | Indicates if there are outstanding writes on port 2 |
PSTAT | 0x3FC | wr_port_busy_1 | [17] | Indicates if there are outstanding writes on port 1 |
PSTAT | 0x3FC | wr_port_busy_0 | [16] | Indicates if there are outstanding writes on port 0 |
PSTAT | 0x3FC | rd_port_busy_5 | [5] | Indicates if there are outstanding reads on port 5 |
PSTAT | 0x3FC | rd_port_busy_4 | [4] | Indicates if there are outstanding reads on port 4 |
PSTAT | 0x3FC | rd_port_busy_3 | [3] | Indicates if there are outstanding reads on port 3 |
PSTAT | 0x3FC | rd_port_busy_2 | [2] | Indicates if there are outstanding reads on port 2 |
PSTAT | 0x3FC | rd_port_busy_1 | [1] | Indicates if there are outstanding reads on port 1 |
PSTAT | 0x3FC | rd_port_busy_0 | [0] | Indicates if there are outstanding reads on port 0 |
Info |
---|
These registers are read-only |
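When polling these registers, for example from a bare-metal monitor on the RPU, the raw values need unpacking. A minimal sketch using the bit positions from the table; the sample readings and function names are hypothetical. A write queue depth near the 64-entry write CAM, or read depths near the HPR/LPR split set in SCHED, points to a saturating queue.

```python
def decode_dbgcam(value: int) -> dict[str, int]:
    """Unpack the DBGCAM (offset 0x308) queue depths."""
    return {
        "wr_q_depth": (value >> 16) & 0x7F,  # dbg_w_q_depth   [22:16]
        "lpr_q_depth": (value >> 8) & 0x7F,  # dbg_lpr_q_depth [14:8]
        "hpr_q_depth": value & 0x7F,         # dbg_hpr_q_depth [6:0]
    }


def busy_ports(pstat: int) -> tuple[list[int], list[int]]:
    """Return (ports with outstanding reads, ports with outstanding writes)."""
    reads = [p for p in range(6) if (pstat >> p) & 1]          # rd_port_busy_n at bit n
    writes = [p for p in range(6) if (pstat >> (16 + p)) & 1]  # wr_port_busy_n at bit 16+n
    return reads, writes


if __name__ == "__main__":
    # Hypothetical raw readings
    print(decode_dbgcam(0x00280010))
    print(busy_ports(0x00380038))
```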
...
You may need to adjust the APU QoS if the HP ports are significantly impacting APU DDR accesses. Very high HP bandwidth may cause APU software to freeze or run very slowly. A common approach is to set the APU QoS to 0xE, which leaves the highest priority (0xF) free for dynamic priority escalation.
Info |
---|
The APU read and write QoS priority is set to the lowest priority (0x0) by FSBL |
APU Module base address: 0xFD5C0000
...
Test Bench
Vitis 2020.1
Petalinux/Yocto 2020.1
ZCU102
...
Hardware
PL
The IPI block design for the test bench has a traffic generator (TG) connected to each HP port. Each TG is configured with a 128-bit data bus at 250 MHz and operates as a greedy master, flooding the FPD interconnect and DDR with equal read and write traffic. The theoretical data rate of each TG is 4 GBps per channel. The only TG flow control is the back pressure from the AXI bus.
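The oversubscription in this test bench follows directly from the TG parameters. A quick Python check, using the 17 GBps maximum DDR bandwidth quoted in the ZCU102 traffic shaping notes on this page; the constant names are illustrative.

```python
TG_COUNT = 4                  # one TG per HP port
CHANNELS_PER_TG = 2           # equal read and write traffic
BUS_BYTES = 128 // 8          # 128-bit data bus
TG_CLOCK_HZ = 250_000_000     # 250 MHz
DDR_MAX_BPS = 17_000_000_000  # max DDR bandwidth quoted on this page (17 GBps)

per_channel = BUS_BYTES * TG_CLOCK_HZ  # bytes per second, one direction
total_demand = TG_COUNT * CHANNELS_PER_TG * per_channel

print(f"per channel : {per_channel / 1e9:.1f} GBps")
print(f"all TGs     : {total_demand / 1e9:.1f} GBps")
print(f"oversubscription vs DDR max: {total_demand / DDR_MAX_BPS:.2f}x")
```

With every TG running greedy, the aggregate demand is roughly twice what the DDR can deliver, which is why the traffic shaping controls on this page have a visible effect.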
...
...
Software
Linux
Traffic Shaping on ZCU102
Start off with 4 greedy masters and try to shape the traffic to meet an arbitrary requirement.
Calculate max DDR bandwidth of 17 GBps and efficiency with defaults.
AFI
Latency sensitive masters (APU) (Highest default priority, 14)
Real-time master (HP0:HDMI) (Video Priority, 11)
Isochronous (HP1:Video) (Video Priority, 8)
Greedy master (HP2:HDD) (BE, 3)
Greedy master (HP3:DMA) (BE, 0)
QoS-400
Limit video ports
Limit greedy master
Urgent
Set one of greedy masters as urgent in software
Suggest setting port aging
Address Mapper
...
Linux runs on the APU. The buffer memory for the TGs is reserved from Linux through the device tree node below.
```dts
/ {
	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/* 128 MB (0x08000000) at 0x70000000 reserved for the TG buffers */
		reserved: buffer@70000000 {
			no-map;
			reg = <0x0 0x70000000 0x0 0x08000000>;
		};
	};
};
```
APM Monitor
The APM monitor software running on the RPU echoes the data measured by each hard APM to a terminal. Running the monitor on the RPU from OCM allows us to take snapshots of the DDR traffic while a high-level OS like Linux runs on the APU. Since the monitor has no dependency on DDR memory, it does not get blocked from executing no matter how heavy the HP traffic is. Another advantage of this approach is that it does not require JTAG to read the APM registers, since JTAG access can be affected by very high traffic.
Conclusions
<TODO: Add conclusions>
Related Links
Zynq UltraScale+ Device Technical Reference Manual (UG1085)
ARM® CoreLink™ NIC-400 Network Interconnect
ARM® CoreLink™ QoS-400 Network Interconnect Advanced Quality of Service
Quality of Service (QoS) in ARM Systems: An Overview, Ashley Stevens, July 2014
...