Traffic Shaping of HP Ports on Zynq UltraScale+

This article is a guide to tuning the default configuration to shape HP port traffic and help meet your system throughput requirements.

Introduction

Performance through the HP ports depends heavily on the traffic patterns generated by the PL masters as well as the non-deterministic traffic patterns driven by software running on the processors. Both sets of masters compete for DDR access. The non-deterministic nature of software running on the processors makes it difficult to model accurately for optimal DDR efficiency. However, you may be able to iteratively tune the default configuration to help meet your system performance requirements. This article does not list every available control; it covers the most effective knobs for shaping HP traffic. The data path from the HP ports in the PL to the DDR memory controller is shown below.

This article does not address:

  1. HPC, HPM, ACE, ACP or LPD ports

  2. CCI/QVN enablement

  3. SMMU enablement

PS-PL Interface

The PS-PL interface comprises one AXI FIFO interface (AFI) per port to bridge the PS and PL domains. The main controls here are the QoS and the issuing capability per port. The QoS specifies the priority of the transaction and is also used to map the channel into traffic classes in the DDR memory controller. The QoS may be static or dynamic depending on your system requirements. The issuing capability defines how many outstanding HP transactions may be in flight at a given time.

To maximize performance, the PL devices should be configured with an AXI bus width of 128 bits and an AXI burst length (BL) of 16.

The PS-PL interface will slice AXI bursts longer than 16 beats; however, this conversion may negatively impact performance when heavily loaded.

AFIFM Port    Base Address
HP0           0xFD380000
HP1           0xFD390000
HP2           0xFD3A0000
HP3           0xFD3B0000

Registers

Register    Offset    Bits     Description
RDCTRL      0x0       [2]      FABRIC_QOS_EN: 0-Enable static QoS, 1-Enable dynamic QoS (PL AXI sideband pass-through)
RDISSUE     0x4       [3:0]    CAPABILITY: Read/write command issuing capability (0-15); Number of commands minus one
RDQoS       0x8       [3:0]    VALUE: Static QoS value (0-15)
WRCTRL      0x14      [2]      FABRIC_QOS_EN: 0-Enable static QoS, 1-Enable dynamic QoS (PL AXI sideband pass-through)
WRISSUE     0x18      [3:0]    CAPABILITY: Read/write command issuing capability (0-15); Number of commands minus one
WRQoS       0x1C      [3:0]    VALUE: Static QoS value (0-15)

These registers are dynamic and may be changed at run-time.

Example #1: AFIFM QoS: HP0-3 R/W set to Best Effort (0x0) (default)

Example #2: AFIFM QoS: HP0 R/W set to Video (0x7), HP1-3 R/W set to Best Effort (0x0)
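Examples #1 and #2 each reduce to a handful of register writes. The following is a minimal Python sketch, assuming the AFIFM base addresses and the RDQoS/WRQoS offsets listed above. The helper name afifm_static_qos_writes is hypothetical, and the code only computes address/value pairs; it does not touch hardware. The static value is only used when FABRIC_QOS_EN is 0 in RDCTRL and WRCTRL.

```python
# Base addresses of the AFIFM blocks for HP0-HP3 (from the table above).
AFIFM_BASE = {0: 0xFD380000, 1: 0xFD390000, 2: 0xFD3A0000, 3: 0xFD3B0000}

RDQOS_OFFSET = 0x08  # RDQoS, VALUE in bits [3:0]
WRQOS_OFFSET = 0x1C  # WRQoS, VALUE in bits [3:0]


def afifm_static_qos_writes(port, read_qos, write_qos):
    """Return (address, value) pairs that set the static QoS of one HP port."""
    if port not in AFIFM_BASE:
        raise ValueError("port must be 0-3")
    for qos in (read_qos, write_qos):
        if not 0 <= qos <= 15:
            raise ValueError("QoS must be 0-15")
    base = AFIFM_BASE[port]
    return [(base + RDQOS_OFFSET, read_qos), (base + WRQOS_OFFSET, write_qos)]


# Example #2: HP0 read/write QoS = 0x7, HP1-3 read/write QoS = 0x0.
writes = afifm_static_qos_writes(0, 0x7, 0x7)
for hp in (1, 2, 3):
    writes += afifm_static_qos_writes(hp, 0x0, 0x0)

for addr, value in writes:
    print(f"0x{addr:08X} <- 0x{value:X}")
```

The resulting list can be fed to whatever register access method your system uses.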

FPD Interconnect Switches

The interconnect switches consist of the Arm NIC-400 with QoS-400 IP, which provides two traffic regulation mechanisms:

  1. Transaction rate regulation

  2. Outstanding transaction regulation

Registers

The FPD General Programmer's View (GPV) Module contains the QoS-400 registers for regulating the AXI traffic through the NIC-400 switches. The table below maps the “top” NIC-400 switches. The “bottom” NIC-400 switches also have their own register set for regulating traffic as a secondary level of control. Regulating the “bottom” switches is not investigated here.

FPD GPV module base address: 0xFD700000

Transaction Rate Regulation

When regulating the traffic with the QoS-400, you must specify one of these sets.

  1. Peak rate, burstiness and average rate

  2. Peak rate only

  3. Burstiness and average rate

Register names take the prefix afifm2M (HP0), afifm3M (HP1), afifm4M (HP2) or afifm5M (HP3), for example afifm2M_intfpd_aw_p.

Register             HP0 Offset    HP1 Offset    HP2 Offset    HP3 Offset    Description
*_intfpd_aw_p        0x47118       0x4A118       0x4B118       0x4C118       AW channel peak rate
*_intfpd_aw_b        0x4711C       0x4A11C       0x4B11C       0x4C11C       AW channel burstiness allowance
*_intfpd_aw_r        0x47120       0x4A120       0x4B120       0x4C120       AW channel average rate
*_intfpd_ar_p        0x47124       0x4A124       0x4B124       0x4C124       AR channel peak rate
*_intfpd_ar_b        0x47128       0x4A128       0x4B128       0x4C128       AR channel burstiness allowance
*_intfpd_ar_r        0x4712C       0x4A12C       0x4B12C       0x4C12C       AR channel average rate
*_intfpd_qos_cntl    0x4710C       0x4A10C       0x4B10C       0x4C10C       Enable rate regulation

Register             Field           Bits       Description
*_intfpd_aw_p        aw_p            [31:24]    Channel peak rate (8b fraction of transfers per cycle)
*_intfpd_aw_b        aw_b            [15:0]     Channel burstiness (integer transfers)
*_intfpd_aw_r        aw_r            [31:20]    Channel average rate (12b fraction of transfers per cycle)
*_intfpd_ar_p        ar_p            [31:24]    Channel peak rate (8b fraction of transfers per cycle)
*_intfpd_ar_b        ar_b            [15:0]     Channel burstiness (integer transfers)
*_intfpd_ar_r        ar_r            [31:20]    Channel average rate (12b fraction of transfers per cycle)
*_intfpd_qos_cntl    en_awar_rate    [2]        Enable combined rate regulation
*_intfpd_qos_cntl    en_ar_rate      [1]        Enable AR rate regulation
*_intfpd_qos_cntl    en_aw_rate      [0]        Enable AW rate regulation

Example #3: Transaction rate regulation

Regulate HP0 port to an average rate of 10% of the interconnect max data rate, but allow up to 4 catch-up transactions capped at 15% of the max data rate.

BL = 16

MAX = 533 MTps (8528 MBps)

Tps_avg = 0.1 * 533M = 53.3 MTps (852.8 MBps)

Tps_max = 0.15 * 533M = 79.95 MTps (1279.2 MBps)

Rate_avg = floor (4096 / (100 * BL / %BW_avg)) = floor (4096 / (100 * 16 / 10)) = 25 (0x19)

Rate_peak = floor (256 / (100 * BL / %BW_peak)) = floor (256 / (100 * 16 / 15)) = 2 (0x2)

Burstiness = 4 (0x4)

Due to quantization effects the actual rate will be lower than the target rate.

Rate (avg) = 25 * 100 * 16 / 4096 = 9.766% of MAX => 8528 * 0.09766 = 832.8 MBps

Rate (peak) = 2 * 100 * 16 / 256 = 12.5% of MAX => 8528 * 0.125 = 1066 MBps
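The arithmetic in Example #3 can be reproduced with a short Python sketch. It assumes the field positions given in the register table above (12-bit average rate in bits [31:20], 8-bit peak rate in bits [31:24], burstiness in bits [15:0]) and integer percentages. The function name qos400_rate_settings and the dictionary keys are hypothetical.

```python
def qos400_rate_settings(burst_len, avg_pct, peak_pct, burstiness):
    """Compute QoS-400 rate fields and register words for one AXI channel."""
    # Average rate is a 12-bit fraction and peak rate an 8-bit fraction of
    # transfers per cycle; both are rounded down, as in the worked example.
    avg = int((4096 * avg_pct) // (100 * burst_len))
    peak = int((256 * peak_pct) // (100 * burst_len))
    if not (0 <= avg < 4096 and 0 <= peak < 256 and 0 <= burstiness < 65536):
        raise ValueError("value does not fit in its register field")
    return {
        "avg_field": avg,
        "peak_field": peak,
        "r_reg": avg << 20,       # *_r register, field in bits [31:20]
        "p_reg": peak << 24,      # *_p register, field in bits [31:24]
        "b_reg": burstiness,      # *_b register, field in bits [15:0]
        # Rates actually achieved after quantization, in percent of MAX.
        "avg_pct_actual": avg * 100 * burst_len / 4096,
        "peak_pct_actual": peak * 100 * burst_len / 256,
    }


settings = qos400_rate_settings(burst_len=16, avg_pct=10, peak_pct=15, burstiness=4)
for name, value in settings.items():
    print(name, value)
```

Printing the achieved percentages alongside the targets makes the quantization loss visible before anything is written to the device.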

Outstanding Transaction Regulation

In the QoS-400 you may specify the maximum number of outstanding transactions allowed with fractional transaction resolution. The actual number of transactions will modulate between the lower and upper bound.

Register names take the prefix afifm2M (HP0), afifm3M (HP1), afifm4M (HP2) or afifm5M (HP3), for example afifm2M_intfpd_max_ot.

Register                HP0 Offset    HP1 Offset    HP2 Offset    HP3 Offset    Description
*_intfpd_max_ot         0x47110       0x4A110       0x4B110       0x4C110       Max number of outstanding transactions
*_intfpd_max_comb_ot    0x47114       0x4A114       0x4B114       0x4C114       Max number of combined outstanding transactions
*_intfpd_qos_cntl       0x4710C       0x4A10C       0x4B10C       0x4C10C       Enable outstanding transaction regulation

Register                Field           Bits       Description
*_intfpd_max_ot         ar_max_oti      [29:24]    Integer part of max outstanding AR addresses (6b)
*_intfpd_max_ot         ar_max_otf      [23:16]    Fraction part of max outstanding AR addresses (8b)
*_intfpd_max_ot         aw_max_oti      [13:8]     Integer part of max outstanding AW addresses (6b)
*_intfpd_max_ot         aw_max_otf      [7:0]      Fraction part of max outstanding AW addresses (8b)
*_intfpd_max_comb_ot    awar_max_oti    [14:8]     Integer part of max combined outstanding AW/AR addresses (7b)
*_intfpd_max_comb_ot    awar_max_otf    [7:0]      Fraction part of max combined outstanding AW/AR addresses (8b)
*_intfpd_qos_cntl       en_awar_ot      [7]        Enable combined regulation of outstanding transactions
*_intfpd_qos_cntl       en_ar_ot        [6]        Enable regulation of outstanding read transactions
*_intfpd_qos_cntl       en_aw_ot        [5]        Enable regulation of outstanding write transactions

Example #4: Outstanding transaction regulation

Regulate HP0 port to 2.5 outstanding transactions.

oti = 2 (0x2)

otf = 256 * 0.5 = 128 (0x80)
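As a sketch of how the integer and fraction parts combine into a register word, the Python below packs a fractional limit into the max_ot layout from the field table above (AR integer in bits [29:24], AR fraction in [23:16], AW integer in [13:8], AW fraction in [7:0]). The function names are hypothetical, and applying the same 2.5 limit to both reads and writes is an assumption made for illustration.

```python
def encode_outstanding(limit):
    """Split a fractional limit into (integer part, 8-bit fraction part)."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    integer = int(limit)
    fraction = int((limit - integer) * 256)
    return integer, fraction


def max_ot_word(ar_limit, aw_limit):
    """Pack read (AR) and write (AW) limits into one max_ot register word."""
    ar_i, ar_f = encode_outstanding(ar_limit)
    aw_i, aw_f = encode_outstanding(aw_limit)
    if ar_i > 0x3F or aw_i > 0x3F:
        raise ValueError("integer part must fit in 6 bits")
    return (ar_i << 24) | (ar_f << 16) | (aw_i << 8) | aw_f


# Bits [6] and [5] of qos_cntl enable read and write regulation respectively.
QOS_CNTL_OT_ENABLE = (1 << 6) | (1 << 5)

print(encode_outstanding(2.5))
print(hex(max_ot_word(2.5, 2.5)))
print(hex(QOS_CNTL_OT_ENABLE))
```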

DDR QoS Controller

The DDR QoS controller can throttle low latency and best effort traffic based on the Content Addressable Memory (CAM) levels to prioritize video traffic. It also provides software the ability to trigger urgent AXI transactions on a per port basis to dynamically elevate lower priority traffic.

DDR QoS Control Module base address: 0xFD090000

QoS Throttle Control

The DDR QoS controller is designed to ensure that there is always space available in the CAMs for video traffic. By monitoring the individual CAM levels, the DDR QoS controller can throttle low latency and best effort traffic in favor of video traffic. The DBGCAM register can be used to monitor CAM levels to see if any are saturating.

Register         Offset    Field             Bits       Description
PORT_TYPE        0x0       PORT5_TYPE        [15:14]    Port 5 Type: 0x0-BE, 0x1-LL, 0x2-Video
PORT_TYPE        0x0       PORT4_TYPE        [13:12]    Port 4 Type: 0x0-BE, 0x1-LL, 0x2-Video
PORT_TYPE        0x0       PORT3_TYPE        [11:10]    Port 3 Type: 0x0-BE, 0x1-LL, 0x2-Video
QOS_CTRL         0x4       PORT5_WR_CTRL     [21]       QoS throttle Control on Write channel
QOS_CTRL         0x4       PORT5_HPR_CTRL    [20]       QoS throttle Control on Read HPR channel
QOS_CTRL         0x4       PORT5_LPR_CTRL    [19]       QoS throttle Control on Read LPR channel
QOS_CTRL         0x4       PORT4_WR_CTRL     [18]       QoS throttle Control on Write channel
QOS_CTRL         0x4       PORT4_HPR_CTRL    [17]       QoS throttle Control on Read HPR channel
QOS_CTRL         0x4       PORT4_LPR_CTRL    [16]       QoS throttle Control on Read LPR channel
QOS_CTRL         0x4       PORT3_WR_CTRL     [15]       QoS throttle Control on Write channel
QOS_CTRL         0x4       PORT3_HPR_CTRL    [14]       QoS throttle Control on Read HPR channel
QOS_CTRL         0x4       PORT3_LPR_CTRL    [13]       QoS throttle Control on Read LPR channel
RD_HPR_THRSLD    0x8       VALUE             [6:0]      Read HPR CAM Threshold Level
RD_LPR_THRSLD    0xC       VALUE             [6:0]      Read LPR CAM Threshold Level
WR_THRSLD        0x10      VALUE             [6:0]      Write CAM Threshold Level

Urgent Transactions

Urgent transactions are enabled by default in the FSBL, which sets the [rd|wr]_port_urgent_en bits in the PCFG[R|W]_n registers. Urgent transactions may be issued either by expiring aging counters in the DDRC or by software writing to the DDRC_URGENT register in the DDR QoS Controller.

Register       Offset    Field         Bits    Description
DDRC_URGENT    0x510     ARURGENT_5    [13]    Sideband signal to indicate a DDRC Port 5 read queue urgent transaction
DDRC_URGENT    0x510     AWURGENT_5    [12]    Sideband signal to indicate a DDRC Port 5 write queue urgent transaction
DDRC_URGENT    0x510     ARURGENT_4    [11]    Sideband signal to indicate a DDRC Port 4 read queue urgent transaction
DDRC_URGENT    0x510     AWURGENT_4    [10]    Sideband signal to indicate a DDRC Port 4 write queue urgent transaction
DDRC_URGENT    0x510     ARURGENT_3    [9]     Sideband signal to indicate a DDRC Port 3 read queue urgent transaction
DDRC_URGENT    0x510     AWURGENT_3    [8]     Sideband signal to indicate a DDRC Port 3 write queue urgent transaction
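The DDRC_URGENT bit positions follow a regular pattern, which the Python sketch below captures. It assumes only the base address, offset and bit assignments listed above; the function name ddrc_urgent_mask is hypothetical, and the code computes a mask without accessing hardware.

```python
DDR_QOS_CTRL_BASE = 0xFD090000
DDRC_URGENT_ADDR = DDR_QOS_CTRL_BASE + 0x510


def ddrc_urgent_mask(port, read):
    """Return the DDRC_URGENT bit mask for DDRC port 3, 4 or 5."""
    if port not in (3, 4, 5):
        raise ValueError("only DDRC ports 3-5 are listed in the table")
    # Each port owns two adjacent bits starting at bit 8:
    # the lower one is AWURGENT (write), the upper one ARURGENT (read).
    bit = 8 + 2 * (port - 3) + (1 if read else 0)
    return 1 << bit


print(hex(DDRC_URGENT_ADDR), hex(ddrc_urgent_mask(4, read=True)))
```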

DDR Memory Controller

The DDR memory controller is based on the Synopsys uMCTL2 DDR Memory Controller IP. The four HP ports from the PL funnel down to three AXI Port Interfaces (XPI) through the interconnect switch network (NIC/QoS). Since this is a multi-port memory controller, arbitration occurs based on the priority and direction of each request in an effort to optimize the DDR bandwidth. The commands in the XPI are serviced by the Port Arbiter (PA) based on a set of arbitration rules. The selected command is then forwarded to the DDR Controller (DDRC) for scheduling. Inside the DDRC the commands from the PA are queued in either the read or write Content Addressable Memory (CAM). Once a command is selected from either CAM, it is forwarded to the PHY and issued to the DDR memory.

DDRC Module base address: 0xFD070000

Port Control

The Port Control registers allow software to enable and disable the DDRC ports. This can be helpful for debugging or dynamically managing ports with software.

Register    Offset    Field      Bits    Description
PCTRL_0     0x490     port_en    [0]     Enables port 0
PCTRL_1     0x540     port_en    [0]     Enables port 1
PCTRL_2     0x5F0     port_en    [0]     Enables port 2
PCTRL_3     0x6A0     port_en    [0]     Enables port 3
PCTRL_4     0x750     port_en    [0]     Enables port 4
PCTRL_5     0x800     port_en    [0]     Enables port 5
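The PCTRL_n offsets listed above are spaced 0xB0 apart, so the address for any port can be computed rather than looked up. The following Python sketch relies on that observation and on the DDRC base address given earlier; the helper names are hypothetical, and the code only derives addresses and values.

```python
DDRC_BASE = 0xFD070000
PCTRL0_OFFSET = 0x490
PORT_STRIDE = 0xB0  # spacing between per-port register blocks in the table


def pctrl_address(port):
    """Return the absolute address of PCTRL_n for DDRC port n (0-5)."""
    if not 0 <= port <= 5:
        raise ValueError("port must be 0-5")
    return DDRC_BASE + PCTRL0_OFFSET + PORT_STRIDE * port


def set_port_enable(pctrl_value, enable):
    """Return a PCTRL value with port_en (bit 0) set or cleared."""
    if enable:
        return pctrl_value | 0x1
    return pctrl_value & 0xFFFFFFFE


for n in range(6):
    print(f"PCTRL_{n}: 0x{pctrl_address(n):08X}")
```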

Traffic Classes

The QoS values on each port are mapped into traffic classes. Variable priority reads and writes (VPR/VPW) have a timer associated with each command. When the timer reaches zero, the command is considered expired and gets elevated to the highest priority whether the command is in an XPI or a CAM. Video traffic class is mapped to VPR/VPW.

Traffic Class       Read QoS Value (default)    Read Priority Mapping    Write QoS Value (default)    Write Priority Mapping
Best Effort (BE)    0-3                         LPR                      0-7                          NPW
Video (V)           4-11                        VPR                      8-15                         VPW
Low Latency (LL)    12-15                       HPR                      N/A                          N/A
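The default mapping in the table above can be expressed as a small lookup, which is handy when checking which class a given AXI QoS value lands in. This Python sketch encodes only the default ranges shown; the function name traffic_class is hypothetical, and a system that reprograms the PCFGQOS0/PCFGWQOS0 registers would need different boundaries.

```python
def traffic_class(qos, direction):
    """Map an AXI QoS value to (traffic class, priority) using the defaults."""
    if not 0 <= qos <= 15:
        raise ValueError("QoS must be 0-15")
    if direction == "read":
        if qos <= 3:
            return ("Best Effort", "LPR")
        if qos <= 11:
            return ("Video", "VPR")
        return ("Low Latency", "HPR")
    if direction == "write":
        if qos <= 7:
            return ("Best Effort", "NPW")
        return ("Video", "VPW")
    raise ValueError("direction must be 'read' or 'write'")


for q in (0, 4, 8, 12):
    print(q, traffic_class(q, "read"), traffic_class(q, "write"))
```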

Register       Offset    Description
PCFGQOS0_3     0x6A4     Map read traffic classes to regions and define separation levels for port 3
PCFGWQOS0_3    0x6AC     Map write traffic classes to regions and define separation levels for port 3
PCFGQOS0_4     0x754     Map read traffic classes to regions and define separation levels for port 4
PCFGWQOS0_4    0x75C     Map write traffic classes to regions and define separation levels for port 4
PCFGQOS0_5     0x804     Map read traffic classes to regions and define separation levels for port 5
PCFGWQOS0_5    0x80C     Map write traffic classes to regions and define separation levels for port 5

Variable Priority Timeouts

Variable priority timeouts can be modified to trade off latency for throughput.

Register       Offset    Field               Bits      Description
PCFGQOS1_3     0x6A8     rqos_map_timeout    [10:0]    Timeout value for read transactions on port 3
PCFGWQOS1_3    0x6B0     wqos_map_timeout    [10:0]    Timeout value for write transactions on port 3
PCFGQOS1_4     0x758     rqos_map_timeout    [10:0]    Timeout value for read transactions on port 4
PCFGWQOS1_4    0x760     wqos_map_timeout    [10:0]    Timeout value for write transactions on port 4
PCFGQOS1_5     0x808     rqos_map_timeout    [10:0]    Timeout value for read transactions on port 5
PCFGWQOS1_5    0x810     wqos_map_timeout    [10:0]    Timeout value for write transactions on port 5

Port Aging

Port aging provides a mechanism to elevate an XPI port's priority when it has an outstanding request, but has not been serviced after a set time. When the aging timer counts down to zero, the port is elevated to the highest priority, which is equivalent to an expired variable priority command.

Register    Offset    Field               Bits     Description
PCFGR_3     0x614     rd_port_aging_en    [12]     Enable aging function on read port 3
PCFGR_3     0x614     rd_port_priority    [9:0]    Initial load value of read aging counters
PCFGW_3     0x618     wr_port_aging_en    [12]     Enable aging function on write port 3
PCFGW_3     0x618     wr_port_priority    [9:0]    Initial load value of write aging counters
PCFGR_4     0x6C4     rd_port_aging_en    [12]     Enable aging function on read port 4
PCFGR_4     0x6C4     rd_port_priority    [9:0]    Initial load value of read aging counters
PCFGW_4     0x6C8     wr_port_aging_en    [12]     Enable aging function on write port 4
PCFGW_4     0x6C8     wr_port_priority    [9:0]    Initial load value of write aging counters
PCFGR_5     0x774     rd_port_aging_en    [12]     Enable aging function on read port 5
PCFGR_5     0x774     rd_port_priority    [9:0]    Initial load value of read aging counters
PCFGW_5     0x778     wr_port_aging_en    [12]     Enable aging function on write port 5
PCFGW_5     0x778     wr_port_priority    [9:0]    Initial load value of write aging counters
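Because the PCFGR_n and PCFGW_n registers carry other fields besides the two listed here, an update is best done as a read-modify-write. The Python sketch below shows only the bit manipulation, assuming the aging enable in bit [12] and the initial counter value in bits [9:0] as listed above; the function name set_port_aging is hypothetical.

```python
AGING_EN_BIT = 1 << 12      # rd_port_aging_en / wr_port_aging_en
PRIORITY_MASK = 0x3FF       # rd_port_priority / wr_port_priority, bits [9:0]


def set_port_aging(pcfg_value, enable, initial_count):
    """Return a PCFGR/PCFGW value with the aging fields replaced."""
    if not 0 <= initial_count <= PRIORITY_MASK:
        raise ValueError("initial_count must fit in 10 bits")
    # Clear only the two aging fields and keep every other bit as read.
    value = pcfg_value & ~(AGING_EN_BIT | PRIORITY_MASK) & 0xFFFFFFFF
    if enable:
        value |= AGING_EN_BIT
    return value | initial_count


print(hex(set_port_aging(0x0000600F, enable=True, initial_count=0x1F)))
```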

PA Arbitration

  • Read/write arbitration

    • Reads

      • Stay on reads as long as there is a timed-out read port or an expired VPR with available credit

      • Switch to writes if there is a timed-out write port or expired VPW with available credit

      • Switch to writes when there is no read credit left and there is a pending write with available credit

      • Reads are prioritized over writes when everything else is equal

    • Writes

      • Stay on the writes as long as there is a timed-out write port or expired VPW with available credit

      • Switch to reads if there is a timed-out read port or expired VPR with available credit

      • Switch to reads if there is an HPR read port with available credit

      • Switch to reads when there is no write credit left and there is a pending read with available credit

  • 2-priority level arbitration based on port aging and expired VPR/VPW commands

  • 2-priority level arbitration for read requests based on DDRC read priorities

  • 16-priority level arbitration based on external AXI QoS inputs

  • Round-robin arbitration when everything else is equal

Content Addressable Memory (CAM)

The read CAM has 64 command entries, which are split into HPR and LPR/VPR sections. This ratio can be changed in the SCHED register if either of these queues is saturating. The write CAM is fixed at 64 command entries.

Register    Offset    Field              Bits      Description
SCHED       0x250     lpr_num_entries    [13:8]    Number of entries in the low priority transaction store, minus one
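The "minus one" encoding of lpr_num_entries is easy to get wrong, so here is a minimal Python sketch of the field update. It assumes the 64-entry read CAM described above and that whatever is not given to the LPR/VPR section is left for the HPR section; the function name set_lpr_entries is hypothetical.

```python
READ_CAM_ENTRIES = 64
LPR_FIELD_SHIFT = 8
LPR_FIELD_MASK = 0x3F << LPR_FIELD_SHIFT  # lpr_num_entries, bits [13:8]


def set_lpr_entries(sched_value, lpr_entries):
    """Return (new SCHED value, entries left for HPR) for an LPR entry count."""
    if not 1 <= lpr_entries <= READ_CAM_ENTRIES:
        raise ValueError("lpr_entries must be 1-64")
    field = lpr_entries - 1  # the register holds the entry count minus one
    new_value = (sched_value & ~LPR_FIELD_MASK & 0xFFFFFFFF) | (field << LPR_FIELD_SHIFT)
    return new_value, READ_CAM_ENTRIES - lpr_entries


value, hpr_entries = set_lpr_entries(0x0, 32)
print(hex(value), hpr_entries)
```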

The PERFHPR1, PERFLPR1 and PERFWR1 registers allow you to specify the maximum starve cycles per queue. This value specifies the number of clocks before the queue goes critical. This timeout may be tweaked depending on your traffic patterns and latency requirements. A larger value may increase video or bulk throughput by reducing queue switching, whereas a smaller value may reduce latency by switching to another queue more quickly at the expense of DDR efficiency.

Register    Offset    Field                  Bits       Description
PERFHPR1    0x25C     hpr_xact_run_length    [31:24]    Number of transactions that get serviced once the HPR queue goes critical or number available (smaller of)
PERFHPR1    0x25C     hpr_max_starve         [15:0]     Number of clocks the HPR queue can be starved before it goes critical. 0x0 disables starved functionality (not recommended)
PERFLPR1    0x264     lpr_xact_run_length    [31:24]    Number of transactions that get serviced once the LPR queue goes critical or number available (smaller of)
PERFLPR1    0x264     lpr_max_starve         [15:0]     Number of clocks the LPR queue can be starved before it goes critical. 0x0 disables starved functionality (not recommended)
PERFWR1     0x26C     w_xact_run_length      [31:24]    Number of transactions that get serviced once the WR queue goes critical or number available (smaller of)
PERFWR1     0x26C     w_max_starve           [15:0]     Number of clocks the WR queue can be starved before it goes critical. 0x0 disables starved functionality (not recommended)

The PERFVPR1 and PERFVPW1 registers allow you to specify a timeout range. Commands whose timers are within this range of an expired VPR/VPW command are grouped with it and also treated as expired, in an attempt to improve DDR utilization.

Register    Offset    Field                Bits      Description
PERFVPR1    0x274     vpr_timeout_range    [10:0]    Range of the timeout value that is used for grouping the expired VPR commands in the CAM
PERFVPW1    0x278     vpw_timeout_range    [10:0]    Range of the timeout value that is used for grouping the expired VPW commands in the CAM

Address Mapper

The address mapper allows you to map the rank, bank group (DDR4 only), bank, column and row address lines to optimize your memory accesses based on your traffic patterns. The ADDRMAP{0:11} registers map individual address lines to the HIF address, which is the word address generated by the XPI.

The ZCU102 has a 4 GB DDR4 x64 DIMM with a burst length of 8. The table below shows the default address mapping generated by Vivado.

DDR4_64    A31    A30    A29    A28    A27    A26    A25    A24    A23    A22    A21    A20    A19    A18    A17
HIF        A28    A27    A26    A25    A24    A23    A22    A21    A20    A19    A18    A17    A16    A15    A14
PHY        R14    R13    R12    R11    R10    R9     R8     R7     R6     R5     R4     R3     R2     R1     R0

DDR4_64    A16    A15    A14    A13    A12    A11    A10    A9     A8     A7     A6     A5     A4     A3
HIF        A13    A12    A11    A10    A9     A8     A7     A6     A5     A4     A3     A2     A1     A0
PHY        B1     B0     BG1    C9     C8     C7     C6     C5     C4     C3     BG0    C2     C1     C0

Register     Address Mapping
ADDRMAP0     Rank[0]
ADDRMAP1     Bank[2:0]
ADDRMAP2     Column[5:2]
ADDRMAP3     Column[9:6]
ADDRMAP4     Column[11:10]
ADDRMAP5     Row[2:0], Row[11]
ADDRMAP6     Row[15:12]
ADDRMAP7     Row[17:16]
ADDRMAP8     Bank Group[1:0]
ADDRMAP9     Row[5:2]
ADDRMAP10    Row[9:6]
ADDRMAP11    Row[10]
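To see how a PL master's access pattern spreads across rows, banks and bank groups, the default ZCU102 mapping shown above can be applied to a byte address in software. The Python sketch below encodes only that default DDR4_64-to-PHY table (address bits A3 to A31); the names decode_ddr_address and the bit lists are hypothetical, and a design with a different ADDRMAP configuration needs different lists.

```python
# Source address bit for each PHY bit, least significant PHY bit first,
# taken from the default ZCU102 mapping table.
COLUMN_BITS = [3, 4, 5, 7, 8, 9, 10, 11, 12, 13]   # C0..C9
BANK_GROUP_BITS = [6, 14]                          # BG0, BG1
BANK_BITS = [15, 16]                               # B0, B1
ROW_BITS = list(range(17, 32))                     # R0..R14


def _gather(addr, bits):
    """Collect the listed address bits into a packed integer."""
    return sum(((addr >> src) & 1) << dst for dst, src in enumerate(bits))


def decode_ddr_address(addr):
    """Decode a 32-bit byte address into row, bank, bank group and column."""
    return {
        "row": _gather(addr, ROW_BITS),
        "bank": _gather(addr, BANK_BITS),
        "bank_group": _gather(addr, BANK_GROUP_BITS),
        "column": _gather(addr, COLUMN_BITS),
    }


# Consecutive 64-byte blocks alternate between bank groups 0 and 1.
for a in (0x70000000, 0x70000040, 0x70000080):
    print(hex(a), decode_ddr_address(a))
```

Running the decoder over a planned buffer layout shows, for example, whether two masters end up hitting different rows of the same bank.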

Debug/Status Registers

These registers are for debugging and polling the status of the DDRC ports. DBGCAM monitors the CAM levels. PSTAT monitors the XPI outstanding commands.

Register    Offset    Field              Bits       Description
DBGCAM      0x308     dbg_w_q_depth      [22:16]    Write queue depth
DBGCAM      0x308     dbg_lpr_q_depth    [14:8]     Low priority read queue depth
DBGCAM      0x308     dbg_hpr_q_depth    [6:0]      High priority read queue depth
PSTAT       0x3FC     wr_port_busy_5     [21]       Indicates if there are outstanding writes on port 5
PSTAT       0x3FC     wr_port_busy_4     [20]       Indicates if there are outstanding writes on port 4
PSTAT       0x3FC     wr_port_busy_3     [19]       Indicates if there are outstanding writes on port 3
PSTAT       0x3FC     wr_port_busy_2     [18]       Indicates if there are outstanding writes on port 2
PSTAT       0x3FC     wr_port_busy_1     [17]       Indicates if there are outstanding writes on port 1
PSTAT       0x3FC     wr_port_busy_0     [16]       Indicates if there are outstanding writes on port 0
PSTAT       0x3FC     rd_port_busy_5     [5]        Indicates if there are outstanding reads on port 5
PSTAT       0x3FC     rd_port_busy_4     [4]        Indicates if there are outstanding reads on port 4
PSTAT       0x3FC     rd_port_busy_3     [3]        Indicates if there are outstanding reads on port 3
PSTAT       0x3FC     rd_port_busy_2     [2]        Indicates if there are outstanding reads on port 2
PSTAT       0x3FC     rd_port_busy_1     [1]        Indicates if there are outstanding reads on port 1
PSTAT       0x3FC     rd_port_busy_0     [0]        Indicates if there are outstanding reads on port 0
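When polling these registers during tuning, decoding the raw words into named values makes saturation easier to spot. The Python sketch below assumes only the DBGCAM and PSTAT field positions listed above; the function names are hypothetical, and the input values in the example are made up for illustration rather than read from hardware.

```python
def decode_dbgcam(value):
    """Split a DBGCAM word into write, LPR and HPR queue depths."""
    return {
        "write": (value >> 16) & 0x7F,  # dbg_w_q_depth, bits [22:16]
        "lpr": (value >> 8) & 0x7F,     # dbg_lpr_q_depth, bits [14:8]
        "hpr": value & 0x7F,            # dbg_hpr_q_depth, bits [6:0]
    }


def busy_ports(pstat):
    """Return (read ports, write ports) that have outstanding commands."""
    reads = [p for p in range(6) if (pstat >> p) & 1]
    writes = [p for p in range(6) if (pstat >> (16 + p)) & 1]
    return reads, writes


# Illustrative register values only.
print(decode_dbgcam(0x00400A03))
print(busy_ports(0x00280008))
```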

Writing Quasi Dynamic Group 3 Registers

Please see the ZU+ TRM UG1085 (2), Ch. 17 DDR Memory Controller, “Group 3: Registers that can be written when controller is empty” for the pseudo code.

APU

You may need to adjust the APU QoS if the HP ports are significantly impacting the APU DDR accesses. Very high HP bandwidth may result in APU software that freezes or runs very slowly.

APU Module base address: 0xFD5C0000

Registers

Register    Offset    Field    Bits       Description
ACE_CTRL    0x60      AWQOS    [19:16]    ACE outgoing AWQOS value (0-15)
ACE_CTRL    0x60      ARQOS    [3:0]      ACE outgoing ARQOS value (0-15)

Test Bench

  • Vitis 2020.1

  • Petalinux/Yocto 2020.1

  • ZCU102

Firmware

PL

The IPI block design for the test bench has a traffic generator (TG) connected to each HP port. Each TG is configured with a 128-bit data bus at 250MHz and operates as a greedy master flooding the FPD interconnect and DDR with equal read and write traffic. The theoretical data rate of each TG is 4 GBps per channel. The only TG flow control is the AXI back pressure.

Software

Linux

Linux is running on the APU. The buffer memory (0x7000_0000) for the TGs is reserved from Linux through the device tree node below. This prevents Linux from virtually mapping this segment into system memory.

/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        reserved: buffer@0 {
            no-map;
            reg = <0x0 0x70000000 0x0 0x08000000>;
        };
    };
};

APM Monitor

Running an APM monitor on the RPU from OCM allows us to measure the DDR traffic while running a high-level OS like Linux on the APU. Since the monitor has no dependency on the DDR memory, it will not be blocked from executing no matter how heavy the HP traffic. This approach also avoids reading the APM registers over JTAG, which can be affected by very high HP traffic and requires physical access to the JTAG connector on the board.

Related Links

  1. Zynq UltraScale+ Devices Register Reference (UG1087)

  2. Zynq UltraScale+ Device Technical Reference Manual (UG1085)

  3. ARM® CoreLink™ NIC-400 Network Interconnect

  4. ARM® CoreLink™ QoS-400 Network Interconnect Advanced Quality of Service

  5. Quality of Service (QoS) in ARM Systems: An Overview, Ashley Stevens, July 2014

© Copyright 2019 - 2022 Xilinx Inc.