Atomic Operations With Exclusive Access

This page describes atomic operations using load/store exclusive instructions of a CPU.


This page is intended for software engineers designing an embedded software system with a Xilinx SoC or FPGA. It was written by an embedded software engineer, so hardware details may be described at a higher level. As multicore CPU systems such as Zynq 7K and MPSoC become prevalent, sharing memory across CPUs becomes necessary. But shared memory introduces new challenges in managing updates to that memory.

This topic is not unique to Xilinx, but Xilinx system specifics justify a more detailed description than is found in general documentation and the Xilinx product documentation. The topic appears fairly straightforward at a quick glance, but a complete solution involves many details, which can make it challenging. This page only provides an introduction and does not intend to provide a complete solution. The hope is to pique the reader's curiosity to dig into the details when required and to help identify key challenges.

All Xilinx embedded systems rely on the AXI protocol, in which CPU instructions read and write system memory using a master interface and a memory controller provides a slave interface. A master may be a device other than a CPU, such as a DMA engine. The terms master and slave are used to keep the discussion general.

Exclusive Accesses

Shared memory that is read and written by multiple masters at the same time can cause unpredictable results. Exclusive accesses provide a method to allow memory to be accessed exclusively by a master such as a CPU.

AXI Protocol

The following text is taken from the Arm AXI4 specification and the reader is encouraged to refer to the specification for more details beyond the following introduction.

The exclusive access mechanism can provide semaphore-type operations without requiring the bus to remain dedicated to a particular master for the duration of the operation. This means the semaphore-type operations do not impact either the bus access latency or the maximum achievable bandwidth.

The AxLOCK signals select exclusive access, and the RRESP and BRESP signals indicate the success or failure of the exclusive access read or write respectively. The slave requires additional logic to support exclusive access. The AXI protocol provides a mechanism to indicate when a master attempts an exclusive access to a slave that does not support it.
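As a small illustration of how the response signals report the outcome, the sketch below decodes a 2-bit RRESP/BRESP value. The encodings follow the AXI4 specification; the enum and the helper name `axi_exclusive_succeeded` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AXI4 response encodings carried on RRESP and BRESP. */
enum axi_resp {
    AXI_RESP_OKAY   = 0x0, /* normal success, or exclusive access not successful */
    AXI_RESP_EXOKAY = 0x1, /* exclusive access succeeded */
    AXI_RESP_SLVERR = 0x2, /* slave error */
    AXI_RESP_DECERR = 0x3  /* decode error */
};

/* Hypothetical helper: did an exclusive read or write succeed? */
static bool axi_exclusive_succeeded(uint8_t resp)
{
    assert(resp <= AXI_RESP_DECERR);   /* the response field is 2 bits wide */
    return resp == AXI_RESP_EXOKAY;
}
```

An OKAY response to an exclusive access is how a master learns that the exclusive store failed or that the slave does not support exclusive access at all.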

CPU Instructions

Most CPUs provide instructions for exclusive access. The CPUs used by Xilinx, including the ARM Cortex A9/A53/A72/R5 and MicroBlaze, provide these instructions. The ARM CPUs provide the LDREX/STREX instructions (LDXR/STXR when executing in the 64-bit AArch64 state) and MicroBlaze provides the LWX/SWX instructions.

ARM Instruction Details

The following summary is taken from the ARM documentation. The reader is encouraged to refer to the ARM document ARM Synchronization Primitives for more details.

The LDREX and STREX instructions split the operation of atomically updating memory into two separate steps. Together, they provide atomic updates in conjunction with exclusive monitors that track exclusive memory accesses. Load-Exclusive and Store-Exclusive must only access memory regions marked as Normal.

The LDREX instruction loads a word from memory, initializing the state of the exclusive monitor(s) to track the synchronization operation. For example, LDREX R1, [R0] performs a Load-Exclusive from the address in R0, places the value into R1 and updates the exclusive monitor(s). The instructions are used in pairs: a load from a memory address that attempts to mark the memory for exclusive access, followed by a store to the same memory.

The STREX instruction performs a conditional store of a word to memory. If the exclusive monitor(s) permit the store, the operation updates the memory location and returns the value 0 in the destination register, indicating that the operation succeeded.

If the exclusive monitor(s) do not permit the store, the operation does not update the memory location and returns the value 1 in the destination register. This makes it possible to implement conditional execution paths based on the success or failure of the memory operation.

For example, STREX R2, R1, [R0] performs a Store-Exclusive operation to the address in R0, conditionally storing the value from R1 and indicating success or failure in R2.

Memory Controller

Memory controllers act as a slave in the AXI protocol. For exclusive access to work, the memory controller must support the locking mechanism of the AXI protocol. Not all Xilinx memory controllers support exclusive accesses; support is described in the product documentation for each controller.

The support for exclusive access (locking) in the memory controller is generally referred to as a monitor. The monitor provides a state machine to track the address and lock state of the exclusive access. The memory controller may provide more than one monitor as a monitor is required for each exclusive access that can be done simultaneously. A monitor for each slave port of the memory controller is typical.
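The state machine idea can be sketched in software. The following is a deliberately simplified, hypothetical model of a single monitor, not a description of any Xilinx controller; the type and function names are assumptions made for illustration. The store-exclusive return value mirrors the STREX convention (0 for success, 1 for failure).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of one exclusive monitor. */
struct excl_monitor {
    bool      armed;   /* true between a load-exclusive and the matching store */
    uintptr_t addr;    /* address being tracked */
    int       master;  /* ID of the master that armed the monitor */
};

/* Load-exclusive: read the word and arm the monitor for this master/address. */
static uint32_t model_load_exclusive(struct excl_monitor *m, int master, uint32_t *p)
{
    assert(m != 0 && p != 0);
    m->armed  = true;
    m->addr   = (uintptr_t)p;
    m->master = master;
    return *p;
}

/* Store-exclusive: returns 0 if the store was performed, 1 if it was refused. */
static int model_store_exclusive(struct excl_monitor *m, int master,
                                 uint32_t *p, uint32_t value)
{
    if (!m->armed || m->master != master || m->addr != (uintptr_t)p)
        return 1;               /* monitor does not permit the store */
    *p = value;
    m->armed = false;           /* the exclusive sequence is complete */
    return 0;
}

/* Ordinary store from any master: clears the monitor if it hits the address. */
static void model_store(struct excl_monitor *m, uint32_t *p, uint32_t value)
{
    *p = value;
    if (m->armed && m->addr == (uintptr_t)p)
        m->armed = false;
}
```

In this model a second master writing the tracked word between the load-exclusive and the store-exclusive causes the store-exclusive to return 1, which is the situation software must detect and retry.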

Some memory controllers, such as the DDR controller in Zynq 7K, have the exclusive monitors located in each port (interface) of the controller and each is independent such that locking cannot be accomplished with masters that access the DDR from two different ports. For example, a MicroBlaze in the PL and the ARM Cortex A9 cannot use a DDR memory address for exclusive access. The DDR memory controller of MPSoC works differently in that the exclusive monitors are global in the controller rather than specific to a port so there are no such limitations as with Zynq 7K.

The following table provides a summary of some of the memory controllers and the reader is encouraged to verify the details in the Xilinx product documentation as there are many details for using the exclusive access beyond this summary.


Memory Controller | Supports Exclusive Access | Number of Monitors
Zynq 7K DDR | (not available) | (not available)
Zynq 7K OCM | (not available) | (not available)

AXI Infrastructure

In this discussion, the AXI Infrastructure is any AXI IP between the master and the slave. The infrastructure must also support exclusive access, and not all of it does. Examples of infrastructure include the AXI Interconnect and Smart Connect IP blocks. AXI Interconnect supports locking. Smart Connect is expected to support locking in the 2020.1 release. The reader is encouraged to verify the details in Xilinx product documentation.

Atomic Operations

Atomic operations are operations that are indivisible, such as a read-modify-write of a memory address. At first glance the reader may wonder why this is needed when running an operating system such as Linux, or even a lighter weight RTOS such as FreeRTOS, since the OS provides mechanisms for mutual exclusion such as the mutex and semaphore. OS mechanisms can be built on top of atomic operations, but they tend to be heavier and also involve the scheduler, so a caller may be blocked. Spinlocks are an example of a primitive built on atomic operations that may only be available in kernel space.
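To make the contrast with OS primitives concrete, here is a minimal sketch of a spinlock built directly on a C11 atomic flag. The `demo_spinlock` type and function names are hypothetical; unlike a mutex, the lock function never blocks in a scheduler, it simply busy-waits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock built on a C11 atomic flag. */
typedef struct { atomic_flag locked; } demo_spinlock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns true if the caller obtained the lock. */
static bool demo_spin_trylock(demo_spinlock *l)
{
    assert(l != 0);
    /* test_and_set returns the previous value: false means we took the lock */
    return !atomic_flag_test_and_set(&l->locked);
}

/* Busy-waits until the lock is obtained; never yields to a scheduler. */
static void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l)) {
        /* spin */
    }
}

static void demo_spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear(&l->locked);
}
```

A production spinlock would normally add back-off or a wait-for-event hint inside the loop; those details are omitted here.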

Is Exclusive Access Enough?

Most modern CPUs have become more complex, and the weakly ordered memory model is part of that complexity. Because of the weakly ordered memory model, barrier instructions are required to ensure the order in which memory accesses complete. Barriers are an additional component that is required, together with exclusive locking, to build a complete atomic solution. Barriers are a complex topic in themselves, about which complete papers are written. For the details of barriers the reader should refer to other sources such as the ARM documentation.

Example Application

Atomic operations can be useful across CPUs in a bare metal system or a lightweight RTOS like FreeRTOS, for example when running on Zynq 7K with the Cortex A9s in AMP mode. Each CPU can use a test and set function to determine if work has been completed in shared memory.
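One way this could look is sketched below: a table of work items, each with an atomic flag that the first CPU to arrive sets. The `work_item` layout, `NUM_ITEMS`, and `claim_next` are hypothetical names for illustration; in an AMP system the table would have to live at a shared address that both CPUs agree on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_ITEMS 3   /* hypothetical size of the shared work table */

struct work_item {
    atomic_flag claimed;   /* set by whichever CPU takes the item first */
    int         payload;   /* the work to be done */
};

/* Returns the index of an item this caller now owns, or -1 if none remain. */
static int claim_next(struct work_item *items, size_t count)
{
    assert(items != 0);
    for (size_t i = 0; i < count; i++) {
        /* false return means the flag was clear and is now ours */
        if (!atomic_flag_test_and_set(&items[i].claimed))
            return (int)i;
    }
    return -1;
}
```

Because the test and the set happen as one indivisible step, two CPUs calling `claim_next` at the same time can never both be handed the same index.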

Do It Yourself (DIY) Assembly Language

It is fairly easy to find code that implements an atomic operation in assembly language. However, assembly language may be CPU specific, code found this way is of unknown quality, and time spent on it is time not spent on an application that sells product.

A test and set function is a typical example as shown below (prototyped in Zynq 7K).

/*
 * int TestAndSet(int *pointer, int OldValue, int NewValue)
 *
 * Returns :
 *   1 if *pointer has been modified to NewValue
 *   0 if *pointer has not been modified
 *
 * R0 : Pointer
 * R1 : Old Value
 * R2 : New Value
 */
        .section .text
        .globl TestAndSet
TestAndSet:
        stmfd   sp!, {lr}         /* save the LR to allow a return to the caller */
        ldrex   r3, [r0]          /* load R3 with pointer contents marking exclusive */
        cmp     r3, r1            /* *Pointer == Old Value? */
        bne     TestAndSet_Err    /* not equal */
        strex   lr, r2, [r0]      /* try to store if equal */
        teq     lr, #0            /* store successful? */
        bne     TestAndSet_Err    /* if not, error out */
        mov     r0, #1            /* return TRUE if swap occurred */
        ldmia   sp!, {pc}
TestAndSet_Err:
        clrex
        mov     r0, #0            /* cmp and swap failed; return FALSE */
        ldmia   sp!, {pc}

Note that this example does not use any barriers, which may be needed because atomic operations involve more than just exclusive accesses, as discussed later in the page.

C Programming Language

The following diagram illustrates how the C programming language has evolved over the years.

C11 Atomics

Most modern compilers, such as GCC and Clang, support the C11 standard. The Xilinx GCC compiler provided in the Xilinx tools supports C11 and has done so for quite some time. GCC had fairly complete support for C11 in version 4.9. Why does the C11 standard really matter?

The C11 standard incorporated some features in the language that are useful and make C a better language. Atomic data types and functions were added so that atomic operations no longer need to be written in assembly language.
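As a rough illustration, the assembly language TestAndSet function from earlier in the page could be expressed with C11 atomics along these lines. This is a sketch rather than a drop-in replacement: the parameter type is assumed to be `atomic_int *` instead of `int *`, and the default sequentially consistent ordering adds barriers that the assembly version did not have.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Returns 1 if *pointer was OldValue and has been changed to NewValue,
 * 0 if *pointer was left unmodified.
 */
static int TestAndSet(atomic_int *pointer, int OldValue, int NewValue)
{
    assert(pointer != 0);
    /* On failure, compare_exchange writes the observed value into 'expected'. */
    int expected = OldValue;
    return atomic_compare_exchange_strong(pointer, &expected, NewValue) ? 1 : 0;
}
```

The compiler selects the exclusive access instructions for the target CPU, so the same source can be built for different processors.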

Atomic Flags

Atomic flags can be used to mimic the test and set functionality of the DIY assembly language example earlier in the page. The following function prototypes, as seen in stdatomic.h, illustrate a small part of the Atomics API that is provided to support atomic flags.

_Bool atomic_flag_test_and_set (volatile atomic_flag *);

void atomic_flag_clear (volatile atomic_flag *);

Test And Set Example

The following example illustrates using C11 atomic flags. This example was prototyped on Zynq 7K with the Cortex A9s in AMP mode. The example allows both CPUs to use the UART mutually exclusively. Note the use of the *_explicit version of the functions which specifies the memory ordering and will be discussed in more detail later in the page.


#include <stdio.h>
#include "platform.h"
#include "xil_printf.h"
#include <stdatomic.h>

atomic_flag *uart_busy_ptr = (atomic_flag *)0x300000;

int main()
{
    _Bool already_set;

    init_platform();

    atomic_flag_clear(uart_busy_ptr);   /* Only done on 1st CPU to boot */

    while (1) {
        already_set = atomic_flag_test_and_set_explicit(uart_busy_ptr, memory_order_seq_cst);
        if (!already_set) {
            print("Hello World CPU 0\n\r");
            atomic_flag_clear_explicit(uart_busy_ptr, memory_order_seq_cst);
        }
    }

    cleanup_platform();
    return 0;
}

Atomics Memory Ordering

The atomic functions use a memory ordering which determines the barriers that are added at specific points in the functions. All functions default to a specific memory order and the *_explicit functions allow the memory order to be specified. Memory ordering, as with barriers, is a complex topic for which the reader should refer to other documents, such as A Tutorial Introduction to the ARM and POWER Relaxed Memory Models.

One method of implementation is to use the strictest memory ordering, which is likely to behave correctly but sacrifices performance. This is illustrated in the test and set example above, which uses the *_explicit functions with the memory_order_seq_cst memory order.

The following snippet from stdatomic.h illustrates the memory ordering which can be used based on the exact application of the atomic.

typedef enum
  {
    memory_order_relaxed = __ATOMIC_RELAXED,
    memory_order_consume = __ATOMIC_CONSUME,
    memory_order_acquire = __ATOMIC_ACQUIRE,
    memory_order_release = __ATOMIC_RELEASE,
    memory_order_acq_rel = __ATOMIC_ACQ_REL,
    memory_order_seq_cst = __ATOMIC_SEQ_CST
  } memory_order;
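As an example of choosing a weaker ordering than memory_order_seq_cst, the sketch below passes a value between two parties using a release store paired with an acquire load. The `mailbox` structure and function names are hypothetical; whether this pairing is sufficient depends on the application, so treat it as an illustration of the API rather than a recommendation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared mailbox: plain payload guarded by an atomic flag. */
struct mailbox {
    int         payload;   /* ordinary, non-atomic data */
    atomic_bool ready;     /* publication flag */
};

/* Producer: write the payload, then publish it with release ordering. */
static void mailbox_publish(struct mailbox *mb, int value)
{
    assert(mb != 0);
    mb->payload = value;
    atomic_store_explicit(&mb->ready, true, memory_order_release);
}

/* Consumer: returns true and fills *out only once the flag has been seen. */
static bool mailbox_try_read(struct mailbox *mb, int *out)
{
    assert(mb != 0 && out != 0);
    if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        return false;
    *out = mb->payload;    /* ordered after the acquire load of the flag */
    return true;
}
```

The release store ensures the payload write is visible before the flag, and the acquire load ensures the payload read happens after the flag is observed.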


The CPU caches may also affect the atomic solution and the details need to be understood by the user.

Prototyping with Zynq 7K using the Cortex A9s in AMP mode showed the following results. A memory location in DDR was shared between the A9s for atomic access. For atomic access in DDR to work, the memory used for exclusive access must be either non-cached, or cached with the share bit set to allow coherency between the CPUs. The default static MMU table in the Xilinx standalone BSP for bare metal defines DDR to be cacheable and shareable, so the test and set application works.

A Peek Under The Hood

Atomics by nature require more instructions, so users should expect atomic variables to be slower than normal variables. The following code disassembly illustrates the code generated to declare and initialize an atomic variable and then increment the atomic variable. Notice that by default the code includes memory barriers (the dmb instructions) which will affect the performance.
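For reference, C source of the kind being described might look like the sketch below; the variable and function names are hypothetical. When built with GCC for an ARMv7-A target, the default version is typically compiled to an ldrex/strex retry loop surrounded by dmb instructions, while the relaxed version typically omits the barriers. The exact output depends on the compiler version and options.

```c
#include <assert.h>
#include <stdatomic.h>

/* Declare and initialize an atomic variable. */
static atomic_int counter = 0;

/* Add with the default (sequentially consistent) ordering; returns new value. */
static int counter_add(int step)
{
    assert(step > 0);   /* this sketch only counts forward */
    return atomic_fetch_add(&counter, step) + step;
}

/* Same update with relaxed ordering: still atomic, but imposes no ordering. */
static int counter_add_relaxed(int step)
{
    assert(step > 0);
    return atomic_fetch_add_explicit(&counter, step, memory_order_relaxed) + step;
}
```

Comparing the disassembly of the two functions is a quick way to see the cost of the default ordering on a given target.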

Other Challenges To Consider

Lock Clearing

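The exclusive access instructions, such as LDREX and STREX, are designed to be used in pairs, ideally with minimal instructions between them. But what happens if the CPU is interrupted and does not return to the STREX immediately? There are several ways the lock is cleared. For example, the ARM CLREX instruction explicitly clears the local exclusive monitor (as used in the error path of the assembly example earlier in the page), and a STREX that finds the monitor cleared simply fails, so software can retry the whole sequence.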
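From C, the usual way to cope with a lost reservation is a retry loop. The sketch below uses atomic_compare_exchange_weak, which the C11 standard allows to fail spuriously precisely so that it can map onto load/store exclusive pairs; the function name `atomic_store_max` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically raise *target to at least 'candidate'; returns the resulting value. */
static int atomic_store_max(atomic_int *target, int candidate)
{
    assert(target != 0);
    int seen = atomic_load(target);
    /*
     * compare_exchange_weak may fail even when *target == seen, for example
     * if the exclusive monitor was cleared between the load and the store.
     * On any failure 'seen' is refreshed with the current value and the
     * update is attempted again.
     */
    while (seen < candidate &&
           !atomic_compare_exchange_weak(target, &seen, candidate)) {
        /* retry with the refreshed value */
    }
    return seen < candidate ? candidate : seen;
}
```

The loop ends either when the store succeeds or when another party has already written a value that makes the update unnecessary.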

Exclusive Access From The PL

Zynq 7K incorporates the exclusive access monitors in the DDR controller on each port of the controller. This likely indicates that a MicroBlaze in the PL and a Cortex A9 in the PS cannot use a shared DDR memory location for exclusive locking. This conclusion is only based on the documentation and has not been prototyped.


The Xilinx Mutex or Mailbox IP might be an alternative when atomics cannot be used with the required memory.

© Copyright 2019 - 2022 Xilinx Inc.