Kodi and SuperTuxKart on Panfrost

1 Apr 2019

Back in October, Panfrost ran some simple benchmarks, like glmark. Five months later, Panfrost has grown from running benchmarks to real-world apps, like Kodi, and 3D games like SuperTuxKart and Neverball.

Since the previous post, there have been major improvements across every part of the stack, culminating in this milestone. On the kernel side, my co-contributors Tomeu Vizoso and Rob Herring have created a modern kernel driver suitable for mainline. Panfrost now uses this upstream-friendly driver rather than relying on a modified legacy kernel as in the past; the new module is currently under review for mainline inclusion. You can read more about this progress on Tomeu’s blog.

Outside the kernel, the changes have been no less significant. Early development was confined to our own project repositories, as the code was not yet ready for general users. In early February, thanks in part to the progress on the kernel side, we flew out of our nest: Panfrost was merged into upstream Mesa, the central repository for free software graphics drivers. Development now occurs in-tree in Mesa.

We have continued decoding new aspects of the hardware and implementing support in the driver. A few miscellaneous additions include cube maps, gl_PointSize and gl_PointCoord, linear depth rendering, performance counters, and new shader instructions.

One area of particular improvement has been our understanding of the hardware formats (like “4-element vector of 32-bit floats” or “single 16-bit unsigned normalized integer”). In Panfrost’s early days, we knew magic numbers to distinguish a few of the most common formats, but the underlying meanings of the voodoo patterns were elusive. Moreover, the format bits for textures and attributes were not unified, which further limited the range of supported formats. However, Connor Abbott and I have since identified the underlying meaning of the format codes for textures, attributes, and framebuffers. This new understanding allows the magic numbers to be replaced by a streamlined format selection routine, mapping Gallium’s formats to the hardware’s and supporting the full spectrum of formats required for a conformant driver. Panfrost now passes the texture format tests for OpenGL ES 2.0.
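To make the idea of a “streamlined format selection routine” concrete, here is a minimal sketch of describing a format by its components and packing that description into a code, instead of looking up a magic number per format. The `format_desc` struct, the `encode_format` function and especially the bit layout are hypothetical, invented for illustration; they are not the real Midgard encoding or Panfrost’s code.

```c
#include <stdint.h>

/* Hypothetical generic description of a format, standing in for the
 * metadata a driver can query from Gallium for each format. */
enum channel_type { TYPE_UNORM, TYPE_SNORM, TYPE_UINT, TYPE_SINT, TYPE_FLOAT };

struct format_desc {
    unsigned nr_channels;    /* 1 to 4 components */
    unsigned channel_bits;   /* 8, 16 or 32 bits per component */
    enum channel_type type;  /* how each component is interpreted */
};

/* Packs a description into a format code using a HYPOTHETICAL layout
 * (not the real Midgard encoding):
 *   bits 0-1: component count minus one
 *   bits 2-3: log2(bits per component / 8)
 *   bits 4-6: component type
 * One routine serves textures, attributes and framebuffers alike,
 * mirroring the unified format codes described above.
 * Returns 0 on success, -1 if the format cannot be encoded. */
static int encode_format(const struct format_desc *d, uint32_t *out)
{
    uint32_t size_code;

    if (d->nr_channels < 1 || d->nr_channels > 4)
        return -1;

    switch (d->channel_bits) {
    case 8:  size_code = 0; break;
    case 16: size_code = 1; break;
    case 32: size_code = 2; break;
    default: return -1;
    }

    *out = (uint32_t)(d->nr_channels - 1)
         | (size_code << 2)
         | ((uint32_t)d->type << 4);
    return 0;
}
```

Under this made-up layout, a “4-element vector of 32-bit floats” (`{4, 32, TYPE_FLOAT}`) encodes to 75 and a “single 16-bit unsigned normalized integer” (`{1, 16, TYPE_UNORM}`) to 4; the point is that new formats fall out of the description rather than needing another magic number.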

From a performance standpoint, various optimizations have been added. In particular, we discovered a fast path, likely relating to the “tiler” in the hardware. When this fast path is used, performance on geometry-heavy scenes skyrockets. In one extreme demo (shading the Stanford bunny), performance more than tripled, and these gains trickle down to real-world games.

Features aside, one of the key issues with an early driver is brittleness and instability. Accordingly, to improve robustness, I now test with the drawElements Quality Program (dEQP), which includes comprehensive correctness tests. Although we’re still a while away from conformance, I now systematically step through the identified issues and resolve the bugs, translating to fixes across every aspect of the driver.

One real-world beneficiary of these fixes is the Kodi media center, which today works well with Panfrost, giving a fluid interface on Midgard devices. For standalone installations of Kodi, there are already experimental images featuring Kodi and Panfrost. To further improve fluidity, Kodi and Panfrost can even interoperate with hardware-accelerated video decoding, provided the kernel drivers cooperate.

For users more inclined to gaming, some 3D games are beginning to show signs of life with Panfrost. For instance, the classic (OpenGL ES 2.0) backend of the ever-popular kart racing game SuperTuxKart now renders with Panfrost, with only minor glitches. Performance is playable on simple tracks, though many opportunities for optimization remain. To bring up this racing game, I added support for complex control flow in the compiler. Traditionally, control flow is discouraged in graphics because of the architecture of desktop GPUs, which group threads into “warps” that share an instruction stream, so threads taking different branches are costly. However, Midgard does not use this warp-based design, avoiding the usual performance penalty for branching. The implementation required new bookkeeping in the compiler, as well as an investigation into long jumps, since the game’s “uber-shader” is large enough to need them. In total, this compiler improvement, paired with assorted bug fixes, allows SuperTuxKart to run.
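Branch instructions can only encode a limited offset, so a large shader forces some jumps into a longer encoding. A common way to choose encodings is iterative “branch relaxation”: start with every branch short, lengthen any whose target is out of range, and repeat, because lengthening one branch shifts the addresses that other branches depend on. The sketch below assumes hypothetical instruction sizes, a hypothetical 7-bit signed offset field, and offsets measured from the instruction after the branch; none of these are Midgard’s real numbers, and this is not necessarily how the Panfrost compiler handles long jumps.

```c
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL encoding parameters, in instruction words. */
#define SHORT_BRANCH_WORDS 1
#define LONG_BRANCH_WORDS  2
#define SHORT_OFFSET_BITS  7   /* signed offset field of a short branch */

struct instr {
    bool is_branch;
    size_t target;   /* branches only: index of target (may equal n, the end) */
    bool is_long;    /* branches only: chosen encoding */
    unsigned words;  /* non-branches only: size in words */
};

static unsigned instr_size(const struct instr *in)
{
    if (!in->is_branch)
        return in->words;
    return in->is_long ? LONG_BRANCH_WORDS : SHORT_BRANCH_WORDS;
}

static bool fits_short(long offset)
{
    long min = -(1L << (SHORT_OFFSET_BITS - 1));
    long max = (1L << (SHORT_OFFSET_BITS - 1)) - 1;
    return offset >= min && offset <= max;
}

/* Chooses short or long encodings for every branch in prog[0..n-1].
 * addr must have room for n + 1 entries; on return addr[i] is the word
 * address of instruction i. Branches only ever grow, so the loop ends. */
static void relax_branches(struct instr *prog, size_t n, long *addr)
{
    bool changed = true;

    for (size_t i = 0; i < n; i++)
        if (prog[i].is_branch)
            prog[i].is_long = false;

    while (changed) {
        changed = false;

        /* Lay out the program with the current encodings. */
        addr[0] = 0;
        for (size_t i = 0; i < n; i++)
            addr[i + 1] = addr[i] + (long)instr_size(&prog[i]);

        /* Promote any short branch whose target is out of range. */
        for (size_t i = 0; i < n; i++) {
            if (!prog[i].is_branch || prog[i].is_long)
                continue;
            long offset = addr[prog[i].target] - addr[i + 1];
            if (!fits_short(offset)) {
                prog[i].is_long = true;
                changed = true;
            }
        }
    }
}
```

With these assumptions, a branch over a single instruction stays short, while one skipping 100 words becomes long.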

Likewise, Neverball is playable (and fun!) with Panfrost, although there are rendering anomalies relating to the currently unimplemented legacy feature “point sprites”. In contrast to Kodi and SuperTuxKart, which make liberal use of custom shaders, Neverball is implemented purely with fixed-function desktop OpenGL. This poses an interesting challenge, as Midgard is designed specifically for embedded graphics; the proprietary driver (the “blob”) does not support this desktop superset. But that’s no reason we can’t!

Like most modern free software OpenGL drivers, Panfrost is built atop the modular “Gallium” architecture. This architecture abstracts away interface details, like desktop versus embedded OpenGL, normalizing differences so that drivers can focus on the hardware itself. This abstraction means that by implementing Panfrost as an embedded driver atop Gallium, we get a partial desktop OpenGL implementation “for free”.

Of course, there is functionality in the desktop superset that does not exist in the embedded profile. While Gallium tries to paper over these differences, the driver must still implement features like point sprites and alpha testing to expose the corresponding desktop functionality. So, bringing up desktop OpenGL applications like Neverball has led me to implement some of this additional functionality. Alpha testing now works by translating it to a conditional discard instruction in the fragment shader. Similarly, translating point sprites to their modern equivalent, gl_PointCoord, is planned.
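As a rough model of both translations, the sketch below evaluates on the CPU what the lowered fragment shader would compute. The enum and function names are hypothetical, and this is not Panfrost’s compiler code. `alpha_test_discards` captures the idea that the fixed-function alpha test becomes a final “discard unless alpha passes the comparison” in the shader; `point_coord` evaluates gl_PointCoord using the formula from the OpenGL ES 2.0 specification, which a point sprite would use in place of its texture coordinate.

```c
#include <stdbool.h>

/* Comparison functions mirroring those glAlphaFunc accepts
 * (GL_NEVER, GL_LESS, ...); these enum names are hypothetical. */
enum compare_func {
    FUNC_NEVER, FUNC_LESS, FUNC_EQUAL, FUNC_LEQUAL,
    FUNC_GREATER, FUNC_NOTEQUAL, FUNC_GEQUAL, FUNC_ALWAYS
};

static bool compare(enum compare_func f, float a, float b)
{
    switch (f) {
    case FUNC_NEVER:    return false;
    case FUNC_LESS:     return a < b;
    case FUNC_EQUAL:    return a == b;
    case FUNC_LEQUAL:   return a <= b;
    case FUNC_GREATER:  return a > b;
    case FUNC_NOTEQUAL: return a != b;
    case FUNC_GEQUAL:   return a >= b;
    case FUNC_ALWAYS:   return true;
    }
    return true;
}

/* Alpha test lowered into the fragment shader: conceptually the shader
 * gains a final "if (!(alpha FUNC ref)) discard;". Returns true when the
 * fragment would be discarded. */
static bool alpha_test_discards(enum compare_func f, float alpha, float ref)
{
    return !compare(f, alpha, ref);
}

struct vec2 { float s, t; };

/* gl_PointCoord as defined by the OpenGL ES 2.0 specification:
 *   s = 1/2 + (xf + 1/2 - xw) / size
 *   t = 1/2 - (yf + 1/2 - yw) / size
 * where (xf, yf) is the fragment's integer window position and (xw, yw)
 * is the point's centre in window coordinates. */
static struct vec2 point_coord(int xf, int yf, float xw, float yw, float size)
{
    struct vec2 pc;
    pc.s = 0.5f + ((float)xf + 0.5f - xw) / size;
    pc.t = 0.5f - ((float)yf + 0.5f - yw) / size;
    return pc;
}
```

For example, with `GL_GREATER` and a reference of 0.5, a fragment with alpha 0.25 is discarded; and for a 4-pixel point centred at (10, 10), the fragment at (8, 11) gets gl_PointCoord (0.125, 0.125).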

Interestingly, the hardware does support some functionality only available through the full desktop profile. It is unknown how many “hidden features” of this type are supported; as the blob does not appear to use them, we discovered these features purely by accident. For instance, in addition to the familiar set of “points, lines, and triangles”, Midgard can natively render quadrilaterals and polygons. The existence of this feature was suggested by the corresponding performance counters, and the driver-side mechanics were determined by manually brute-forcing the primitive selection bits. Now that these bonus features are understood, quads from desktop applications can be drawn without first translating them to indexed triangles in software. Similarly, it appears that, in addition to the embedded standard of boolean occlusion queries, setting a chicken bit enables the hardware’s hidden support for precise occlusion counters, a useful desktop feature.
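For comparison, here is roughly the software work that native quad support lets the driver skip: a minimal sketch that rewrites an index list for independent quads into one for independent triangles. It is a generic illustration with a hypothetical function name, not Panfrost’s or Mesa’s actual index translation code.

```c
#include <stddef.h>
#include <stdint.h>

/* Expands an index list for independent quads (GL_QUADS) into one for
 * independent triangles. Each quad (v0, v1, v2, v3) becomes (v0, v1, v3)
 * and (v1, v2, v3): winding is preserved, and v3 stays the last vertex
 * of both triangles, matching GL's default last-vertex provoking
 * convention for flat shading. Trailing vertices that do not form a
 * complete quad are ignored, as in GL. `out` must have room for
 * (count / 4) * 6 indices. Returns the number of indices written. */
static size_t quads_to_triangles(const uint32_t *quads, size_t count,
                                 uint32_t *out)
{
    size_t n = 0;

    for (size_t q = 0; q + 4 <= count; q += 4) {
        const uint32_t v0 = quads[q], v1 = quads[q + 1],
                       v2 = quads[q + 2], v3 = quads[q + 3];

        out[n++] = v0; out[n++] = v1; out[n++] = v3;
        out[n++] = v1; out[n++] = v2; out[n++] = v3;
    }
    return n;
}
```

Nine indices describing two complete quads plus one leftover vertex produce twelve triangle indices; besides the extra CPU pass, the expanded index buffer is 50% larger than the original, which is what drawing quads natively avoids.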

Going forward, although the implementation of OpenGL ES 2.0 is approaching feature-completeness, we will continue to polish the driver, guided by dEQP. Orthogonal to conformance, further optimization to improve performance and lower memory usage is on the roadmap.

It’s incredible to reflect back and realise that just one year ago, work had not even begun on writing a real OpenGL driver. Yet here we are today with an increasingly usable, exclusively free software, hardware-accelerated desktop on Mali Midgard graphics.

Frost on.
