Introduction

Performance through the HP ports depends heavily on the traffic patterns generated by the PL masters and on the non-deterministic traffic driven by software running on the processors. Both sets of masters compete for DDR access. The non-deterministic nature of software running on the processors makes the traffic difficult to model and predict, and makes high DDR efficiency difficult to achieve. However, you may be able to tune the default configurations in the PS-PL interface, interconnect switches, DDR memory controller and APU QoS to help meet your system performance requirements. This is not an exhaustive list of the available controls, but these are the most effective knobs for shaping the HP traffic.

This article does NOT address:

  1. HPC, HPM, ACE, ACP or LPD ports

  2. CCI/QVN enablement

  3. Impact of SMMU enablement

HP to DDR Data Path

<Add some description here>

...

Traffic Shaping

<Add some description here>

PS-PL Interface

The PS-PL interface comprises one AXI FIFO interface (AFI) per port to bridge the PS and PL domains. The main controls here are the QoS and the issuing capability per port. The QoS specifies the priority of the transaction and is also used to map the channel into traffic classes in the DDR memory controller. The QoS may be static or dynamic depending on your system needs. The issuing capability defines how many outstanding transactions an HP port may have in flight at a given time.

...

Info

These registers are dynamic.

...

Example #1: AFIFM QoS: HP0-3 R/W set to Best Effort (0x0) (default)

...

Example #2: AFIFM QoS: HP0 R/W set to Video (0x7), HP1-3 R/W set to Best Effort (0x0)

...

FPD Interconnect Switches

The interconnect switches are comprised of the NIC-400 with QoS-400 ARM IP which provides two traffic regulation mechanisms.

...

Info

These registers are dynamic.

Example #3: Transaction rate regulation

Regulate the HP0 port to an average rate of 10% of the interconnect max data rate, but allow up to 4 catch-up transactions capped at 15% of the max data rate.

BL = 16

MAX = 533 MTps (8528 MBps)

TPS_avg = 0.1 * 533M = 53.3 MTps (852.8 MBps)

TPS_max = 0.15 * 533M = 79.95 MTps (1279.2 MBps)

Rate_avg = floor (256 / (100 * BL / %BW_avg)) = floor (256 / (100 * 16 / 10)) = 1 (0x1)

Rate_peak = floor (4096 / (100 * BL / %BW_peak)) = floor (4096 / (100 * 16 / 15)) = 38 (0x26)

Burstiness = 4
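The arithmetic in Example #3 can be reproduced with a few lines of Python. This is only a sketch of the formulas as they appear in the example: the function name `qos_rate` and its `scale` parameter are hypothetical, and the scale factors (256 for the average rate, 4096 for the peak rate) are taken from the example itself rather than from the QoS-400 documentation.

```python
import math


def qos_rate(scale: int, burst_len: int, bw_percent: float) -> int:
    """Return floor(scale / (100 * BL / %BW)), the form used in Example #3."""
    if burst_len <= 0 or bw_percent <= 0:
        raise ValueError("burst length and bandwidth percentage must be positive")
    return math.floor(scale / (100 * burst_len / bw_percent))


BL = 16                              # burst length used in the example
rate_avg = qos_rate(256, BL, 10)     # 10% average bandwidth (scale assumed from the example)
rate_peak = qos_rate(4096, BL, 15)   # 15% peak bandwidth (scale assumed from the example)

print(f"Rate_avg  = {rate_avg} ({hex(rate_avg)})")
print(f"Rate_peak = {rate_peak} ({hex(rate_peak)})")
```

Changing the percentages or the burst length shows how coarse the resulting register values become at low bandwidth targets.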

...

Outstanding transaction regulation

In the QoS-400 you may specify the maximum number of outstanding transactions allowed, with fractional transaction resolution for finer control. The actual number of outstanding transactions will modulate between the lower and upper bound.

| HP0 | HP1 | HP2 | HP3 | Description |
|---|---|---|---|---|
| afifm2M_intfpd_max_ot (0x47110) | afifm3M_intfpd_max_ot (0x4A110) | afifm4M_intfpd_max_ot (0x4B110) | afifm5M_intfpd_max_ot (0x4C110) | Max number of outstanding transactions |
| afifm2M_intfpd_max_comb_ot (0x47114) | afifm3M_intfpd_max_comb_ot (0x4A114) | afifm4M_intfpd_max_comb_ot (0x4B114) | afifm5M_intfpd_max_comb_ot (0x4C114) | Max number of combined outstanding transactions |
| afifm2M_intfpd_qos_cntl (0x4710C) | afifm3M_intfpd_qos_cntl (0x4A10C) | afifm4M_intfpd_qos_cntl (0x4B10C) | afifm5M_intfpd_qos_cntl (0x4C10C) | Enable outstanding transaction regulation |

...

Info

These registers are dynamic.

Example

Regulate the HP0 port to 2.5 outstanding transactions.

...

The DDR QoS controller can throttle low latency and best effort traffic based on the Content Addressable Memory (CAM) levels to prioritize video traffic. It also gives software the ability to trigger urgent AXI transactions on a per-port basis to dynamically elevate lower priority traffic.

DDR QoS Control Module base address: 0xFD090000

QoS Throttle Control

The QoS DDR controller is designed to ensure that there is always space available in the CAMs for video traffic. By monitoring the individual CAM levels, the QoS DDR controller can throttle low latency and best effort traffic in favor of video traffic. The DBGCAM register can be used to monitor CAM levels to see if any are saturating.

Info

QoS throttle control is disabled by default

...

| Register | Offset | Field | Bits | Description |
|---|---|---|---|---|
| PORT_TYPE | 0x0 | PORT5_TYPE | [15:14] | Port 5 Type: 0x0-BE, 0x1-LL, 0x2-Video |
| | | PORT4_TYPE | [13:12] | Port 4 Type: 0x0-BE, 0x1-LL, 0x2-Video |
| | | PORT3_TYPE | [11:10] | Port 3 Type: 0x0-BE, 0x1-LL, 0x2-Video |
| QOS_CTRL | 0x4 | PORT5_WR_CTRL | [21] | QoS throttle control on Port 5 write channel |
| | | PORT5_HPR_CTRL | [20] | QoS throttle control on Port 5 read HPR channel |
| | | PORT5_LPR_CTRL | [19] | QoS throttle control on Port 5 read LPR channel |
| | | PORT4_WR_CTRL | [18] | QoS throttle control on Port 4 write channel |
| | | PORT4_HPR_CTRL | [17] | QoS throttle control on Port 4 read HPR channel |
| | | PORT4_LPR_CTRL | [16] | QoS throttle control on Port 4 read LPR channel |
| | | PORT3_WR_CTRL | [15] | QoS throttle control on Port 3 write channel |
| | | PORT3_HPR_CTRL | [14] | QoS throttle control on Port 3 read HPR channel |
| | | PORT3_LPR_CTRL | [13] | QoS throttle control on Port 3 read LPR channel |
| RD_HPR_THRSLD | 0x8 | VALUE | [6:0] | Read HPR CAM threshold level |
| RD_LPR_THRSLD | 0xC | VALUE | [6:0] | Read LPR CAM threshold level |
| WR_THRSLD | 0x10 | VALUE | [6:0] | Write CAM threshold level |

Info

These registers are dynamic.

Urgent Transactions

Urgent transactions are enabled by default in FSBL, which sets the [rd|wr]_port_urgent_en bits in the PCFG[R|W]_n registers. Urgent transactions may be issued either through expiring aging counters in the DDRC or through software by writing to the DDRC_URGENT register in the DDR QoS Controller.

| Register | Offset | Field | Bits | Description |
|---|---|---|---|---|
| DDRC_URGENT | 0x510 | ARURGENT_5 | [13] | Sideband signal to indicate a DDRC Port 5 read queue urgent transaction |
| | | AWURGENT_5 | [12] | Sideband signal to indicate a DDRC Port 5 write queue urgent transaction |
| | | ARURGENT_4 | [11] | Sideband signal to indicate a DDRC Port 4 read queue urgent transaction |
| | | AWURGENT_4 | [10] | Sideband signal to indicate a DDRC Port 4 write queue urgent transaction |
| | | ARURGENT_3 | [9] | Sideband signal to indicate a DDRC Port 3 read queue urgent transaction |
| | | AWURGENT_3 | [8] | Sideband signal to indicate a DDRC Port 3 write queue urgent transaction |

Info

These registers are dynamic.
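As an illustration of the DDRC_URGENT bit layout, the sketch below computes the value software would write to raise an urgent request on one port. The helper name `urgent_mask` is hypothetical; the bit positions, register offset and base address are the ones listed in this section. How the write reaches the hardware (for example through devmem or a driver) is outside the scope of the sketch.

```python
DDR_QOS_CTRL_BASE = 0xFD090000   # DDR QoS Control Module base address
DDRC_URGENT_OFFSET = 0x510       # DDRC_URGENT register offset


def urgent_mask(port: int, read: bool = False, write: bool = False) -> int:
    """Return the DDRC_URGENT bit mask for DDRC port 3, 4 or 5."""
    if port not in (3, 4, 5):
        raise ValueError("only DDRC ports 3-5 are covered by this sketch")
    aw_bit = 8 + 2 * (port - 3)   # AWURGENT_3 = bit 8, _4 = bit 10, _5 = bit 12
    ar_bit = aw_bit + 1           # ARURGENT_3 = bit 9, _4 = bit 11, _5 = bit 13
    mask = 0
    if read:
        mask |= 1 << ar_bit
    if write:
        mask |= 1 << aw_bit
    return mask


addr = DDR_QOS_CTRL_BASE + DDRC_URGENT_OFFSET
value = urgent_mask(4, read=True, write=True)
print(f"write {value:#x} to {addr:#x}")
```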

DDR Memory Controller

The DDR memory controller is based on the Synopsys uMCTL2 DDR Memory Controller IP. The four HP ports from the PL funnel down to three AXI Port Interfaces (XPI) through the interconnect switch network. Since this is a multi-port memory controller, arbitration occurs based on the priority and direction of each request in an effort to optimize the DDR bandwidth. The XPI are serviced by the Port Arbiter (PA) and forwarded to the DDR Controller (DDRC) for scheduling. Inside the DDRC the commands from the PA are queued in either the read or write Content Addressable Memory (CAM). Once a command is selected from either CAM, it is forwarded to the PHY and out to the DDR memory.

DDRC Module base address: 0xFD070000

PA Arbitration

  • Read/write arbitration

    • Reads

      • Stay on reads as long as there is a timed-out read port or an expired VPR with available credit

      • Switch to writes if there is a timed-out write port or expired-VPW with available credit

      • Switch to writes when there is no read credit left and there is a pending write with available credit

      • Reads are prioritized over writes when everything else is equal

    • Writes

      • Stay on writes as long as there is a timed-out write port or expired-VPW with available credit

      • Switch to reads if there is a timed-out read port or expired-VPR with available credit

      • Switch to reads if there is an HPR read port with available credit

      • Switch to reads when there is no write credit left and there is a pending read with available credit

  • 2-priority level arbitration based on port aging and expired-VPR/VPW commands

  • 2-priority level arbitration for read requests based on DDRC read priorities (HPR/LPR-VPR)

  • 16-priority level arbitration based on external AXI QoS inputs

  • Round-robin arbitration when everything else is equal

...

| Traffic Class | Read QoS Value (default) | Read Priority Mapping | Write QoS Value (default) | Write Priority Mapping |
|---|---|---|---|---|
| Best Effort (BE) | 0-3 | LPR | 0-7 | NPW |
| Video (V) | 4-11 | VPR | 8-15 | VPW |
| Low Latency (LL) | 12-15 | HPR | N/A | N/A |

The traffic class mappings are register configurable, however you should not need to modify these. Please see ZU+ Register Reference for details if you need to modify these mappings.
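The default mapping can be expressed as a small lookup. This sketch assumes the default QoS ranges listed in the traffic class table; the function names `read_class` and `write_class` are hypothetical, and the result no longer applies if the mappings have been changed in the PCFGQOS0_n / PCFGWQOS0_n registers.

```python
def _check_qos(qos: int) -> None:
    if not 0 <= qos <= 15:
        raise ValueError("AXI QoS is a 4-bit value (0-15)")


def read_class(qos: int) -> tuple:
    """Return (traffic class, read priority) for a read QoS value, default mapping."""
    _check_qos(qos)
    if qos <= 3:
        return ("Best Effort", "LPR")
    if qos <= 11:
        return ("Video", "VPR")
    return ("Low Latency", "HPR")


def write_class(qos: int) -> tuple:
    """Return (traffic class, write priority) for a write QoS value, default mapping."""
    _check_qos(qos)
    if qos <= 7:
        return ("Best Effort", "NPW")
    return ("Video", "VPW")


for q in (0x0, 0x7, 0xE):
    print(hex(q), read_class(q), write_class(q))
```

Note that the read and write ranges differ, so a single QoS value can land in different classes on the read and write channels.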

| Register | Offset | Description |
|---|---|---|
| PCFGQOS0_3 | 0x6A4 | Map read traffic classes to regions and define separation levels for port 3 |
| PCFGWQOS0_3 | 0x6AC | Map write traffic classes to regions and define separation levels for port 3 |
| PCFGQOS0_4 | 0x754 | Map read traffic classes to regions and define separation levels for port 4 |
| PCFGWQOS0_4 | 0x75C | Map write traffic classes to regions and define separation levels for port 4 |
| PCFGQOS0_5 | 0x804 | Map read traffic classes to regions and define separation levels for port 5 |
| PCFGWQOS0_5 | 0x80C | Map write traffic classes to regions and define separation levels for port 5 |

Info

These registers are quasi dynamic group 3 and can only be modified when the DDRC is empty.

These registers are not configurable in the PCW. If you need to remap the traffic classes, you must patch the psu_init.c.

Variable Priority Timeouts

Variable priority timeouts can be modified to trade-off latency for throughput.

| Register | Offset | Field | Bits | Description |
|---|---|---|---|---|
| PCFGQOS1_3 | 0x6A8 | rqos_map_timeout | [10:0] | Timeout value for read transactions on port 3 |
| PCFGWQOS1_3 | 0x6B0 | wqos_map_timeout | [10:0] | Timeout value for write transactions on port 3 |
| PCFGQOS1_4 | 0x758 | rqos_map_timeout | [10:0] | Timeout value for read transactions on port 4 |
| PCFGWQOS1_4 | 0x760 | wqos_map_timeout | [10:0] | Timeout value for write transactions on port 4 |
| PCFGQOS1_5 | 0x808 | rqos_map_timeout | [10:0] | Timeout value for read transactions on port 5 |
| PCFGWQOS1_5 | 0x810 | wqos_map_timeout | [10:0] | Timeout value for write transactions on port 5 |

Info

These registers are static and must be set while the DDRC is reset.

These registers are not configurable in the PCW; however, psu_init configures them with default values. If you want to modify the timeout values, you must patch the psu_init.c.

Port Aging

Port aging provides a mechanism to elevate an XPI port's priority when it has an outstanding request, but has not been serviced after a set time. When the aging timer counts down to zero, the port is elevated to the highest priority, which is equivalent to an expired variable priority command.

Port aging is disabled by default

| Register | Offset | Field | Bits | Description |
|---|---|---|---|---|
| PCFGR_3 | 0x614 | rd_port_aging_en | [12] | Enable aging function on read port 3 |
| | | rd_port_priority | [9:0] | Initial load value of read aging counters |
| PCFGW_3 | 0x618 | wr_port_aging_en | [12] | Enable aging function on write port 3 |
| | | wr_port_priority | [9:0] | Initial load value of write aging counters |
| PCFGR_4 | 0x6C4 | rd_port_aging_en | [12] | Enable aging function on read port 4 |
| | | rd_port_priority | [9:0] | Initial load value of read aging counters |
| PCFGW_4 | 0x6C8 | wr_port_aging_en | [12] | Enable aging function on write port 4 |
| | | wr_port_priority | [9:0] | Initial load value of write aging counters |
| PCFGR_5 | 0x774 | rd_port_aging_en | [12] | Enable aging function on read port 5 |
| | | rd_port_priority | [9:0] | Initial load value of read aging counters |
| PCFGW_5 | 0x778 | wr_port_aging_en | [12] | Enable aging function on write port 5 |
| | | wr_port_priority | [9:0] | Initial load value of write aging counters |

Info

These registers are static and must be set while the DDRC is reset.

These registers are not configurable in the PCW. If you want to implement port aging, you must patch the psu_init.c.
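When patching psu_init.c, it can help to compute the new PCFGR_n / PCFGW_n value beforehand. The sketch below is a read-modify-write helper that touches only the aging enable bit [12] and the priority field [9:0] listed for these registers and leaves all other bits unchanged. The function name `set_port_aging` and the example register value are hypothetical.

```python
AGING_EN_BIT = 12      # rd_port_aging_en / wr_port_aging_en
PRIORITY_MASK = 0x3FF  # rd_port_priority / wr_port_priority, bits [9:0]


def set_port_aging(current: int, enable: bool, priority: int) -> int:
    """Return a PCFGR_n/PCFGW_n value with the aging fields updated."""
    if not 0 <= priority <= PRIORITY_MASK:
        raise ValueError("priority is a 10-bit field")
    # Clear only the two aging-related fields, keep everything else.
    cleared = current & ~((1 << AGING_EN_BIT) | PRIORITY_MASK)
    return cleared | (int(enable) << AGING_EN_BIT) | priority


# Hypothetical current register value, used only to show that other bits survive.
new_value = set_port_aging(0x0000200F, enable=True, priority=0x1F)
print(hex(new_value))
```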

Content Addressable Memory (CAM)

The read CAM has 64 command entries, which are split into HPR and LPR/VPR sections. This ratio can be changed in the SCHED register if either of these queues is getting saturated. The write CAM is fixed at 64 command entries.

| Register | Offset | Field | Bits | Description |
|---|---|---|---|---|
| SCHED | 0x250 | lpr_num_entries | [13:8] | Number of entries in the low priority transaction store minus one. Number of entries in high priority transaction store = 64 - (lpr_num_entries + 1) |

Info

These registers are static and must be set while the DDRC is reset.

These registers are not configurable in the PCW. If you want to reallocate the HPR and LPR transaction stores in the read CAM, you must patch the psu_init.c.
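The split follows directly from the lpr_num_entries field description. As a minimal sketch (the function name `read_cam_split` and the example SCHED values are hypothetical), the two store sizes can be derived from a SCHED register value like this:

```python
from typing import Tuple

READ_CAM_ENTRIES = 64


def read_cam_split(sched: int) -> Tuple[int, int]:
    """Return (LPR entries, HPR entries) for a given SCHED register value."""
    lpr_num_entries = (sched >> 8) & 0x3F   # SCHED[13:8]
    lpr = lpr_num_entries + 1               # the field stores "entries minus one"
    hpr = READ_CAM_ENTRIES - lpr
    return lpr, hpr


# Hypothetical SCHED values, chosen only to exercise the arithmetic.
print(read_cam_split(0x1F00))   # even split
print(read_cam_split(0x2F00))   # more room for LPR/VPR traffic
```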

Address Mapper

The address mapper allows you to map the rank, bank group (DDR4-only), bank, column and row address lines to optimize your memory accesses based on your traffic patterns. The twelve ADDRMAP registers map individual address lines to the HIF address, which is the address generated by the XPI.

The HIF address is a word address. The DDR address is a byte address.

The ZCU102 has a 4 GB DDR4 x64 DIMM with a burst length of 8. The table below shows the default address mapping generated by Vivado.

By default BG0 is mapped to A6 so consecutive bursts will ping-pong between bank groups to reduce access latency. (This may vary for earlier Vivado versions)

| DDR4_64 (byte address) | HIF (word address) | PHY |
|---|---|---|
| A31:A17 | A28:A14 | R14:R0 |
| A16:A15 | A13:A12 | B1:B0 |
| A14 | A11 | BG1 |
| A13:A7 | A10:A4 | C9:C3 |
| A6 | A3 | BG0 |
| A5:A3 | A2:A0 | C2:C0 |
| A2:A0 | (none) | (none) |

Note

If you are using dynamic DDR configuration with a DIMM, the mapping may differ based upon FSBL. If you want to disable the dynamic DDR configuration set CONFIG.PSU_DYNAMIC_DDR_CONFIG_EN = 0.

| Register | Address Mapping |
|---|---|
| ADDRMAP0 | Rank |
| ADDRMAP1 | Bank |
| ADDRMAP{2:4} | Column |
| ADDRMAP{5:7} | Row |
| ADDRMAP8 | Bank Group |
| ADDRMAP{9:11} | Row |

...

These registers are not fully configurable in the PCW. If you want to remap the address lines, you must patch the psu_init.c.
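To see how a given buffer address lands in the DRAM, the default ZCU102 mapping described above can be sketched as a decoder. This assumes the default Vivado mapping shown in the table (HIF A2:A0 and A10:A4 to column, A3 and A11 to bank group, A13:A12 to bank, A28:A14 to row) and a 64-bit word, so the HIF address is the byte address shifted right by 3. The function name `decode_default_zcu102` is hypothetical, and the result is not valid for a custom ADDRMAP configuration.

```python
def decode_default_zcu102(byte_addr: int) -> dict:
    """Split a DDR byte address into row, bank, bank group and column fields."""
    if not 0 <= byte_addr < (1 << 32):
        raise ValueError("address outside the 4 GB range")
    hif = byte_addr >> 3                                    # drop the 3 byte-offset bits
    column = (hif & 0x7) | (((hif >> 4) & 0x7F) << 3)       # C2:C0 and C9:C3
    bank_group = ((hif >> 3) & 0x1) | (((hif >> 11) & 0x1) << 1)  # BG0 and BG1
    bank = (hif >> 12) & 0x3                                # B1:B0
    row = (hif >> 14) & 0x7FFF                              # R14:R0
    return {"row": row, "bank": bank, "bank_group": bank_group, "column": column}


# Consecutive 64-byte bursts (BL8 x 8 bytes) alternate between bank groups.
for addr in (0x00, 0x40, 0x80, 0xC0):
    print(hex(addr), decode_default_zcu102(addr))
```

Stepping through addresses in 0x40 increments shows BG0 toggling on every burst, which is the ping-pong behavior described above.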

Debug/Status Registers

These registers are for debugging and polling the status of the DDRC ports. DBGCAM monitors the CAM levels, while PSTAT monitors the XPI outstanding commands.

| Register | Offset | Field | Bits | Description |
|---|---|---|---|---|
| DBGCAM | 0x308 | dbg_w_q_depth | [22:16] | Write queue depth |
| | | dbg_lpr_q_depth | [14:8] | Low priority read queue depth |
| | | dbg_hpr_q_depth | [6:0] | High priority read queue depth |
| PSTAT | 0x3FC | wr_port_busy_5 | [21] | Indicates if there are outstanding writes on port 5 |
| | | wr_port_busy_4 | [20] | Indicates if there are outstanding writes on port 4 |
| | | wr_port_busy_3 | [19] | Indicates if there are outstanding writes on port 3 |
| | | wr_port_busy_2 | [18] | Indicates if there are outstanding writes on port 2 |
| | | wr_port_busy_1 | [17] | Indicates if there are outstanding writes on port 1 |
| | | wr_port_busy_0 | [16] | Indicates if there are outstanding writes on port 0 |
| | | rd_port_busy_5 | [5] | Indicates if there are outstanding reads on port 5 |
| | | rd_port_busy_4 | [4] | Indicates if there are outstanding reads on port 4 |
| | | rd_port_busy_3 | [3] | Indicates if there are outstanding reads on port 3 |
| | | rd_port_busy_2 | [2] | Indicates if there are outstanding reads on port 2 |
| | | rd_port_busy_1 | [1] | Indicates if there are outstanding reads on port 1 |
| | | rd_port_busy_0 | [0] | Indicates if there are outstanding reads on port 0 |

Info

These registers are read-only
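Raw DBGCAM and PSTAT values are easier to read once split into fields. The sketch below decodes them using the bit positions listed for these registers; the function names and the sample values are hypothetical, and reading the registers themselves (for example at 0xFD070000 + offset) is left to whatever access method your system uses.

```python
def decode_dbgcam(value: int) -> dict:
    """Split a DBGCAM value into its three CAM depth fields."""
    return {
        "write_q_depth": (value >> 16) & 0x7F,  # dbg_w_q_depth [22:16]
        "lpr_q_depth": (value >> 8) & 0x7F,     # dbg_lpr_q_depth [14:8]
        "hpr_q_depth": value & 0x7F,            # dbg_hpr_q_depth [6:0]
    }


def busy_ports(pstat: int) -> dict:
    """List the DDRC ports (0-5) with outstanding writes and reads."""
    return {
        "write": [p for p in range(6) if (pstat >> (16 + p)) & 1],  # wr_port_busy_n
        "read": [p for p in range(6) if (pstat >> p) & 1],          # rd_port_busy_n
    }


# Sample values chosen only to exercise the decoders.
print(decode_dbgcam(0x00201005))
print(busy_ports(0x00280008))
```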

...

You may need to adjust the APU QoS if the HP ports are significantly impacting the APU DDR accesses. Very high HP bandwidth may result in APU software that freezes or runs very slowly. It is common to set the APU QoS to 0xE, which preserves the highest priority for dynamic priority escalation.

Info

The APU read and write QoS priority is set to the lowest priority (0x0) by FSBL.

APU Module base address: 0xFD5C0000

...

Test Bench

  • Vitis 2020.1

  • Petalinux/Yocto 2020.1

  • ZCU102

...

Hardware

PL

The IPI block design for the test bench has a traffic generator (TG) connected to each HP port. Each TG is configured with a 128-bit data bus at 250 MHz and operates as a greedy master flooding the FPD interconnect and DDR with equal read and write traffic. The theoretical data rate of each TG is 4 GBps per channel. The only TG flow control is the back pressure from the AXI bus.

...

...

Software

Linux

Traffic Shaping on ZCU102

Start off with 4 greedy masters and try to shape the traffic to meet an arbitrary requirement.

Calculate max DDR bandwidth of 17 GBps and efficiency with defaults.

AFI

  1. Latency sensitive masters (APU) (Highest default priority, 14)

  2. Real-time master (HP0:HDMI) (Video Priority, 11)

  3. Isochronous (HP1:Video) (Video Priority, 8)

  4. Greedy master (HP2:HDD) (BE, 3)

  5. Greedy master (HP3:DMA) (BE 0)

QoS-400

  1. Limit video ports

  2. Limit greedy master

Urgent

  1. Set one of greedy masters as urgent in software

  2. Suggest setting port aging

Address Mapper

...

Linux is running on the APU. The buffer memory for the TGs is reserved from Linux through the device tree node below.

Code Block
/ {
        reserved-memory {
                #address-cells = <2>;
                #size-cells = <2>;
                ranges;

                reserved: buffer@0 {
                        no-map;
                        reg = <0x0 0x70000000 0x0 0x08000000>;
                };
        };
};

APM Monitor

APM monitor software running on the RPU echoes the data measured from each hard APM to a terminal. Running the monitor on the RPU from OCM allows us to take snapshots of the DDR traffic while a high-level OS like Linux runs on the APU. Since the monitor has no dependency on the DDR memory, it will not be blocked from executing no matter how heavy the HP traffic is. Another advantage of this approach is that it does not require JTAG to read the APM registers, and JTAG access can itself be affected by very high traffic.

Conclusions

<TODO: Add conclusions>

Related Links

  1. Zynq UltraScale+ Devices Register Reference

  2. Zynq UltraScale+ Device Technical Reference Manual (UG1085)

  3. ARM® CoreLink™ NIC-400 Network Interconnect

  4. ARM® CoreLink™ QoS-400 Network Interconnect Advanced Quality of Service

  5. Quality of Service (QoS) in ARM Systems: An Overview, Ashley Stevens, July 2014

...