X Server Screen Rotation on Arm Mali-400

This page describes the implementation of X server screen rotation on the Zynq UltraScale+ MPSoC platform.

Introduction

Why do we need screen rotation? Handheld devices obviously need to change display orientation dynamically based on accelerometer data, but stationary devices can benefit from this feature even more. Think of a signage display mounted vertically, an important application designed for a portrait layout, or regular users who prefer to install their monitors sideways.

X Server RandR Extension

RandR is the X "Resize, Rotate and Reflect" extension. Among other things, this X Server extension adds support for rotation and reflection of the X Server root window. These transformations are applied to the fully composed root window pixmap, not to individual application windows. The extension implements screen transformations with additional per-output shadow buffers injected into the display pipeline between the X Server compositor output and the CRTC, which allows the root pixmap to be transformed to match the display orientation requested by the user.

xorg-render.gif

Enabling Screen Rotation in the ARMSOC Driver

RandR expects a device-specific X Server video output driver to handle shadow buffer management. The generic ARMSOC driver that we use in our Linux distributions does not support RandR. To add such support, three CRTC callbacks must be defined:

static const xf86CrtcFuncsRec drmmode_crtc_funcs = {
    ...
    .shadow_allocate = drmmode_shadow_allocate,
    .shadow_create = drmmode_shadow_create,
    .shadow_destroy = drmmode_shadow_destroy,
};

  • shadow_allocate - this callback performs the required shadow DMA buffer allocation:

bo = armsoc_bo_new_with_dim(pARMSOC->dev, width, height, pScrn->bitsPerPixel, pScrn->bitsPerPixel, ARMSOC_BO_SCANOUT);

We use ARMSOC buffer allocation utilities for this purpose. Here we also register a new framebuffer to the CRTC:

ret = armsoc_bo_add_fb(bo);

This method is a wrapper around drmModeAddFB() that adds a new framebuffer to the CRTC fb list. We also save a shadow buffer reference in the driver’s internal structure for future use:

struct ARMSOCRec *pARMSOC = ARMSOCPTR(pScrn);
...
pARMSOC->shadow = bo;

  • shadow_create - this callback creates a pixmap that corresponds to the previously allocated shadow buffer:

...
PixmapPtr pixmap;

pixmap = (*pScreen->CreatePixmap)(pScreen, 0, 0, armsoc_bo_depth(bo), 0);
...
(*pScreen->ModifyPixmapHeader)(pixmap,
                               armsoc_bo_width(bo), armsoc_bo_height(bo),
                               armsoc_bo_depth(bo), armsoc_bo_bpp(bo),
                               armsoc_bo_pitch(bo), armsoc_bo_map(bo));
return pixmap;

Here we adjust the pixmap header to fit the supported pixel format and attach the input buffer object to the created pixmap as the private data pointer.

  • shadow_destroy - this callback frees the previously allocated shadow buffer:

armsoc_bo_unreference(bo);

At the X video driver level, we are responsible only for buffer resource management. RandR handles the rest: synchronization, rendering pipeline adjustment, buffer lifetime management, etc.
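The order in which the three callbacks are used can be illustrated with a small self-contained mock. This is only a sketch: struct mock_bo, struct mock_pixmap, the mock_shadow_* functions and the live_buffers counter are hypothetical stand-ins invented for this example, not the real ARMSOC or X Server types, and the real callbacks also receive the CRTC as their first argument.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct armsoc_bo (not the real type). */
struct mock_bo {
    int width, height;
    int refcnt;
};

/* Hypothetical stand-in for a pixmap backed by a buffer object. */
struct mock_pixmap {
    struct mock_bo *bo;
};

/* Number of live buffers, used to show that nothing leaks. */
static int live_buffers = 0;

/* Step 1: allocate the backing store for the rotated output. */
static void *mock_shadow_allocate(int width, int height)
{
    struct mock_bo *bo = calloc(1, sizeof(*bo));

    if (!bo)
        return NULL;
    bo->width = width;
    bo->height = height;
    bo->refcnt = 1;
    live_buffers++;
    return bo;
}

/* Step 2: wrap the previously allocated buffer in a pixmap. */
static struct mock_pixmap *mock_shadow_create(void *data)
{
    struct mock_pixmap *pix;

    if (!data)
        return NULL;
    pix = calloc(1, sizeof(*pix));
    if (pix)
        pix->bo = data;
    return pix;
}

/* Step 3: drop the pixmap and release the buffer reference. */
static void mock_shadow_destroy(struct mock_pixmap *pix, void *data)
{
    struct mock_bo *bo = data;

    free(pix);
    if (bo && --bo->refcnt == 0) {
        free(bo);
        live_buffers--;
    }
}
```

A caller would invoke mock_shadow_allocate(1080, 1920), pass the result to mock_shadow_create(), and later hand both the pixmap and the buffer to mock_shadow_destroy(), after which live_buffers is back to zero.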

GPU Accelerated Composition

RandR effectively adds an extra rendering step to the X Server composition pipeline. By default, this composition is performed by the pixman library, which applies the required transformation to every pixel (a 3x3 matrix multiplied by a 3D vector, followed by normalization). Even with NEON vector instructions, this is extremely slow. One way to optimize this step is to use GPU rendering. X Server already has an extension that performs such a task: glamor. Unfortunately, after a few experiments it became clear that glamor would not help in our case. We ran into a number of roadblocks in the Mali-400 software stack, such as missing support for required EGL/GLES extensions and nonconformant behavior.
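To make the per-pixel cost concrete, the transformation described above can be sketched in plain C as follows. This is an illustration only, not pixman code: transform_point() and make_inverted() are hypothetical names, and the pixel-space convention used for the 180-degree ("inverted") matrix is an assumption. A software compositor has to do the equivalent of this for each of the roughly 2 million pixels of a 1920x1080 frame.

```c
/*
 * Apply a 3x3 homogeneous transform to the pixel (x, y).
 * Returns 0 on success, -1 if the point maps to infinity (w == 0).
 */
static int transform_point(float m[3][3], float x, float y,
                           float *out_x, float *out_y)
{
    /* Matrix multiplied by the column vector (x, y, 1). */
    float tx = m[0][0] * x + m[0][1] * y + m[0][2];
    float ty = m[1][0] * x + m[1][1] * y + m[1][2];
    float tw = m[2][0] * x + m[2][1] * y + m[2][2];

    if (tw == 0.0f)
        return -1;

    /* Normalization: divide by the homogeneous coordinate. */
    *out_x = tx / tw;
    *out_y = ty / tw;
    return 0;
}

/*
 * Example matrix (assumed convention): 180-degree rotation of a
 * width x height surface, mapping pixel (x, y) to
 * (width - 1 - x, height - 1 - y).
 */
static void make_inverted(float m[3][3], int width, int height)
{
    m[0][0] = -1.0f; m[0][1] =  0.0f; m[0][2] = (float)(width - 1);
    m[1][0] =  0.0f; m[1][1] = -1.0f; m[1][2] = (float)(height - 1);
    m[2][0] =  0.0f; m[2][1] =  0.0f; m[2][2] = 1.0f;
}
```

With a matrix built by make_inverted(m, 1920, 1080), the top-left pixel (0, 0) maps to the bottom-right pixel (1919, 1079) and vice versa.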

Inspired by glamor, a similar helper library named repulsion was introduced within the ARMSOC driver. This utility handles simple buffer-to-buffer copies with a transform, which significantly improves overall rendering performance. Exact performance measurements are presented below. This optimization is optional (enabled by default) and can be switched on or off in the X config file:

Section "Device"
    Identifier "ZynqMP"
    Driver "armsoc"
    ...
    Option "AccelerateComposition" "true"
EndSection

Accelerated composition is very reliable and has a non-accelerated composition fallback, so it is recommended to have it enabled. The only situation where users might want to disable it is when they are experiencing GPU resource contention (texture memory shortage, etc.).

Repulsion Library

This library is implemented as a pair of header and source files that are integrated into ARMSOC and compiled and linked together with the rest of the ARMSOC driver code.

Interface

Repulsion has a very simple interface of just three functions:

  • armsoc_repulsion_init: this function creates and initializes EGL and GLES contexts and also pre-allocates a set of objects: an EGL pixel buffer backed surface, a GLES shader program, a GLES target texture, a GLES vertex buffer object and a GLES index buffer object. It returns an opaque pointer to struct ARMSOCRepulsion if the initialization succeeds, or NULL otherwise.

struct ARMSOCRepulsion *armsoc_repulsion_init(void);
  • armsoc_repulsion_release: this function deinitializes the provided ARMSOCRepulsion instance and frees all resources.

void armsoc_repulsion_release(struct ARMSOCRepulsion *repulsion);
  • armsoc_repulsion_composite: this function renders the content of the src buffer object into the dest buffer, applying the provided xform_matrix transformation. The function returns true on success or false in case of failure.

bool armsoc_repulsion_composite(struct ARMSOCRepulsion *repulsion, struct armsoc_bo *src, struct armsoc_bo *dest, float xform_matrix[3][3]);

Integration

Repulsion is initialized in the ARMSOC driver's screen initialization callback, ARMSOCScreenInit():

pARMSOC->repulsion = armsoc_repulsion_init();

Within the same callback, we also install a custom picture compositor hook, ARMSOCComposite(). After some checks, it calls the repulsion compositor and falls back to the cached compositor (pARMSOC->composite_proc) if the repulsion composition fails:

static Bool ARMSOCCanAccelerateComposition(CARD8 op, PicturePtr src,
                                           PicturePtr mask, PicturePtr dest,
                                           CARD16 width, CARD16 height)
{
    ...
}

static void ARMSOCComposite(CARD8 op, PicturePtr src, PicturePtr mask,
                            PicturePtr dest, INT16 x_src, INT16 y_src,
                            INT16 x_mask, INT16 y_mask, INT16 x_dest,
                            INT16 y_dest, CARD16 width, CARD16 height)
{
    ...
    Bool can_accelerate = ARMSOCCanAccelerateComposition(op, src, mask, dest,
                                                         width, height);
    ...
    if (can_accelerate &&
        armsoc_repulsion_composite(pARMSOC->repulsion, src_bo, dest_bo,
                                   xform_matrix)) {
        /* Accelerated composition succeeded, nothing more to do */
    } else {
        /* Fallback to saved compositor if accelerated composition fails */
        pARMSOC->composite_proc(op, src, mask, dest, x_src, y_src,
                                x_mask, y_mask, x_dest, y_dest,
                                width, height);
    }
    ...
}

...

/* Returns TRUE on success, so the return type is Bool */
static Bool ARMSOCScreenInit(SCREEN_INIT_ARGS_DECL)
{
    ...
    PictureScreenPtr ps;
    ...
    pARMSOC->repulsion = armsoc_repulsion_init();
    ps = GetPictureScreen(pScreen);
    /* Save the original compositor, then install the hook */
    pARMSOC->composite_proc = ps->Composite;
    ps->Composite = ARMSOCComposite;
    ...
    return TRUE;
}
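The save-and-wrap pattern used above can be shown in isolation with a minimal mock. Everything in this sketch is hypothetical and simplified for illustration (composite_fn, struct mock_screen, the counters and the accel_available flag are not part of ARMSOC or the X Server); it only demonstrates that the saved compositor runs when, and only when, the accelerated path reports failure.

```c
#include <stdbool.h>

/* Simplified stand-in for the real composite callback type. */
typedef void (*composite_fn)(int op);

/* Simplified stand-in for the picture screen holding the hook. */
struct mock_screen {
    composite_fn composite;
};

static int accel_calls;       /* successful accelerated compositions */
static int software_calls;    /* compositions done by the saved path */
static bool accel_available;  /* toggles simulated GPU path failure  */
static composite_fn saved_composite;

/* Stand-in for the default (software) compositor. */
static void software_composite(int op)
{
    (void)op;
    software_calls++;
}

/* Stand-in for the accelerated path: returns false on failure. */
static bool accel_composite(int op)
{
    (void)op;
    if (!accel_available)
        return false;
    accel_calls++;
    return true;
}

/* The hook: try the accelerated path first, then fall back. */
static void hooked_composite(int op)
{
    if (accel_composite(op))
        return;
    saved_composite(op);
}

/* Save the current compositor and install the hook in its place. */
static void install_hook(struct mock_screen *screen)
{
    saved_composite = screen->composite;
    screen->composite = hooked_composite;
}
```

Because the original function pointer is saved before it is replaced, the hook can always defer to it, which is what makes the accelerated path safe to enable by default.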

We de-initialize repulsion in the ARMSOCCloseScreen() screen close callback.

Diagnostics and Debugging

Repulsion uses the X Server logging facilities and reports all errors and warnings to the standard X Server log file. The following repulsion-specific info, warning and error log entries can occur during X Server execution (NB: most of the error entries report the EGL or GLES error code together with a human-readable error description):

  • ARMSOC: Repulsion initialized - repulsion compositor successfully initialized.

  • ARMSOC: ERROR: Out of memory - out of memory condition.

  • ARMSOC: ERROR: Failed to initialize EGL: 0x%04x (%s) - EGL subsystem failed to initialize.

  • ARMSOC: ERROR: Failed to initialize GLES: 0x%04x (%s) - GLES subsystem failed to initialize.

EGL specific error logs:

  • ARMSOC: ERROR: Failed to create dest EGL image: 0x%04x (%s) - failure to create destination EGL image.

  • ARMSOC: ERROR: Failed to create src EGL image: 0x%04x (%s) - failure to create source EGL image.

GLES specific error logs:

  • ARMSOC: ERROR: Failed to complete framebuffer - failure to attach and configure all buffers required for GLES rendering.

  • ARMSOC: ERROR: Failed to create texture: 0x%04x (%s) - failure to create GLES texture.

  • ARMSOC: ERROR: Failed to create index buffer: 0x%04x (%s) - failure to create GLES index buffer.

  • ARMSOC: ERROR: Failed to create vertex buffer: 0x%04x (%s) - failure to create GLES vertex buffer.

  • ARMSOC: ERROR: Failed to compile shader: 0x%04x (%s) - failure to compile or link GLES shader program.

GLES asynchronous warning/error logs:

  • ARMSOC: ERROR: GLES2: %s - async GLES error detected.

  • ARMSOC: WARNING: GLES2: %s - async GLES warning condition detected.
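When reading these entries, it can help to map the hexadecimal code back to its symbolic name. The sketch below is a standalone helper, not part of the driver: egl_error_name(), gles_error_name() and format_egl_error() are hypothetical names, and the numeric values are those defined by the EGL and OpenGL ES specifications, repeated here so that the example builds without the Khronos headers. The exact description text the driver prints is not known, so the symbolic name is used as a stand-in.

```c
#include <stdio.h>

/* EGL error codes (values as defined in the EGL specification). */
static const char *egl_error_name(unsigned int code)
{
    switch (code) {
    case 0x3000: return "EGL_SUCCESS";
    case 0x3001: return "EGL_NOT_INITIALIZED";
    case 0x3002: return "EGL_BAD_ACCESS";
    case 0x3003: return "EGL_BAD_ALLOC";
    case 0x3004: return "EGL_BAD_ATTRIBUTE";
    case 0x3005: return "EGL_BAD_CONFIG";
    case 0x3006: return "EGL_BAD_CONTEXT";
    case 0x3007: return "EGL_BAD_CURRENT_SURFACE";
    case 0x3008: return "EGL_BAD_DISPLAY";
    case 0x3009: return "EGL_BAD_MATCH";
    case 0x300A: return "EGL_BAD_NATIVE_PIXMAP";
    case 0x300B: return "EGL_BAD_NATIVE_WINDOW";
    case 0x300C: return "EGL_BAD_PARAMETER";
    case 0x300D: return "EGL_BAD_SURFACE";
    case 0x300E: return "EGL_CONTEXT_LOST";
    default:     return "unknown";
    }
}

/* OpenGL ES 2.0 error codes (values as defined in the GLES spec). */
static const char *gles_error_name(unsigned int code)
{
    switch (code) {
    case 0x0000: return "GL_NO_ERROR";
    case 0x0500: return "GL_INVALID_ENUM";
    case 0x0501: return "GL_INVALID_VALUE";
    case 0x0502: return "GL_INVALID_OPERATION";
    case 0x0505: return "GL_OUT_OF_MEMORY";
    case 0x0506: return "GL_INVALID_FRAMEBUFFER_OPERATION";
    default:     return "unknown";
    }
}

/* Format a code the same way the log entries do: "0x%04x (%s)". */
static int format_egl_error(char *buf, size_t len, unsigned int code)
{
    return snprintf(buf, len, "0x%04x (%s)", code, egl_error_name(code));
}
```

For example, a log entry carrying the code 0x3003 indicates that an EGL allocation failed, which points to the GPU resource contention case mentioned earlier.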

How To Use Screen Rotation

There are two ways to enable screen rotation:

  • Dynamically via the xrandr command line utility:

user@host$ sudo xrandr -d :0 --output DP-1 --rotate right
  • On X Server startup via the configuration file (/etc/X11/xorg.conf):

Section "Monitor"
    Identifier "DefaultMonitor"
    Option "Rotate" "left"
EndSection

Note: rotation and reflection options provided via xrandr override those in the X server config file.

Performance

Performance measurements were conducted on a ZCU106 board for different display resolutions and color depths using glmark2 running in fullscreen mode. PetaLinux 2023.2 was used as the host system with default X Server, Mali SW stack and driver configurations. The only changes to the default PetaLinux build are the screen rotation patches (attached below). Estimated FPS shows composition performance without application overhead (glmark2 creates extra pressure on GPU resources); the values correspond to 1000 divided by the composition time in milliseconds.

| Display Resolution | Depth/BPP | Rotation | Acceleration | Composition Time, ms | Estimated FPS | glmark2 FPS |
|--------------------|-----------|----------|--------------|----------------------|---------------|-------------|
| 4K (4096x2160)     | 24/32     | None     | None         |                      | N/A           | 73          |
| 4K (4096x2160)     | 24/32     | Inverted | None         | 1060.01              | 0.94          | 1           |
| 4K (4096x2160)     | 24/32     | Left     | None         | 1120.96              | 0.89          | 0           |
| 4K (4096x2160)     | 24/32     | Inverted | GLES         | 89.69                | 11.15         | 9           |
| 4K (4096x2160)     | 24/32     | Left     | GLES         | 30.80                | 32.47         | 21          |
| 4K (4096x2160)     | 16/16     | None     | None         |                      | N/A           | 87          |
| 4K (4096x2160)     | 16/16     | Inverted | None         | 1008.01              | 0.99          | 1           |
| 4K (4096x2160)     | 16/16     | Left     | None         | 1080.19              | 0.93          | 1           |
| 4K (4096x2160)     | 16/16     | Inverted | GLES         | 31.82                | 31.43         | 22          |
| 4K (4096x2160)     | 16/16     | Left     | GLES         | 22.97                | 43.54         | 25          |
| FHD (1920x1080)    | 24/32     | None     | None         |                      | N/A           | 203         |
| FHD (1920x1080)    | 24/32     | Inverted | None         | 239.79               | 4.17          | 4           |
| FHD (1920x1080)    | 24/32     | Left     | None         | 250.02               | 4.00          | 4           |
| FHD (1920x1080)    | 24/32     | Inverted | GLES         | 7.24                 | 138.12        | 80          |
| FHD (1920x1080)    | 24/32     | Left     | GLES         | 7.57                 | 132.10        | 72          |
| FHD (1920x1080)    | 16/16     | None     | None         |                      | N/A           | 179         |
| FHD (1920x1080)    | 16/16     | Inverted | None         | 235.60               | 4.24          | 4           |
| FHD (1920x1080)    | 16/16     | Left     | None         | 237.64               | 4.21          | 4           |
| FHD (1920x1080)    | 16/16     | Inverted | GLES         | 5.39                 | 185.53        | 89          |
| FHD (1920x1080)    | 16/16     | Left     | GLES         | 5.55                 | 180.18        | 93          |

Implementation

The exact implementation of the screen rotation feature follows below. These patches apply against the https://github.com/Xilinx/meta-xilinx/ repo.
