# A Moving Mesa Midgard Cube

22 Apr 2018

In the last Panfrost status update, a transitory “half-way” driver was presented, with the purpose of easing the transition from a standalone library abstracting the hardware to a full-fledged OpenGL ES driver using the Mesa and Gallium3D infrastructure.

Since then, I’ve completed the transition, creating such a driver, but retaining support for out-of-tree testing.

Almost everything that was exposed with the custom half-way interface is now available through Gallium3D. Attributes, varyings, and uniforms all work. A bit of rasterisation state is supported. Multiframe programs work, as do programs with multiple non-indexed, direct draws per frame.

The result? The GLES test-cube demo from freedreno runs using the Mali T760 GPU present in my RK3288 laptop, going through the Mesa/Gallium3D stack. Of course, there’s no need to rely on the vendor’s proprietary compilers for shaders – the demo is using shaders from the free, NIR-based Midgard compiler.

Look ma, no blobs!

In the past three weeks since the previous update, all aspects of the project have seen fervent progress, culminating in the above demo. The change list for the core Gallium driver is lengthy but largely routine: abstracting features about the hardware which were already understood and integrating it with Gallium, resolving bugs which are discovered in the process, and repeating until the next GLES test passes normally. Enthusiastic readers can read the code of the driver core on GitLab.

Although numerous bugs were solved in this process, one in particular is worthy of mention: the “tile flicker bug”, notorious to lurkers of our Freenode IRC channel, #panfrost. Present since the first render, this bug resulted in non-deterministic rendering glitches, where particular tiles would display the background colour in lieu of the render itself. The non-deterministic nature had long suggested it was either the result of improper memory management or a race condition, but the precise cause was unknown. Finally, the cause was narrowed down to a race condition between the vertex/tiler jobs responsible for draws, and the fragment job responsible for screen painting. With this cause in mind, a simple fix squashed the bug, hopefully for good; renders are now deterministic and correct. Huge thanks to Rob Clark for letting me use him as a sounding board to solve this.

In terms of decoding the command stream, some miscellaneous GL state has been determined, like some details about tiler memory management, texture descriptors, and shader linkage (attribute and varying metadata). By far, however, the most significant discovery was the operation of blending on Midgard. It’s… well, unique. If I had known how nuanced the encoding was – and how much code it takes to generate from Gallium blend state – I would have postponed decoding like originally planned.

In any event, blending is now understood. Under Midgard, there are two paths in the hardware for blending: the fixed-function fast path, and the programmable slow path, using “blend shaders”. This distinction has been discussed sparsely in Mali documentation, but the conditions for the fast path were not known until now. Without further ado, the fixed-function blending hardware works when:

• The blend equation is either ADD, SUBTRACT, or REVERSE_SUBTRACT (but not MIN or MAX)
• The “dominant” blend function is either the source/destination colour/alpha, or the special case of a constant ONE or ZERO (but not a constant colour or anything fancier), or the additive complement thereof.
• The non-dominant blend function is either identical to the dominant blend function, or one of the constant special cases.

If these conditions are not met, a blend shader is used instead, incurring a presently unknown performance hit.

By dominant and non-dominant modes, I’m essentially referring to the more complex and less complex blend functions respectively, comparing between the functions for the source and the destination. The exact details of the encoding are a little hairy beyond the scope of this post but are included in the corresponding Panfrost headers and the corresponding code in the driver.

In any event, this separation between fixed-function and programmable blending is now more or less understood. Additionally, blend shaders themselves are now intelligible with Connor Abbott’s Midgard disassembler; blend shaders are just normal Midgard shaders, with an identical ISA to vertex and fragment shaders, and will eventually be generated with the existing NIR compiler. With luck, we should be able to reuse code from the NIR compiler for the vc4, an embedded GPU lacking fixed-function hardware for any blending whatsoever. Additionally, blend shaders open up some interesting possibilities; we may be able to enable developers to write blend shaders themselves in GLSL through a vendored GL extension. More practically, blend shaders should enable implementation of all blend modes, as this is ES 3.2 class hardware, as well as presumably logic operations.

Command-stream work aside, the Midgard compiler also saw some miscellaneous improvements. In particular, the mystery surrounding varyings in vertex shaders has finally been cracked. Recall that gl_Position stores are accomplished by writing the screen-space coordinate to the special register r27, and then including a st_vary instruction with the mysterious input register r1 to the appropriate address. At the time, I had (erroneously) assumed that the r27 store was responsible for the write, and the subsequent instruction was a peculiar errata workaround.

New findings shows it is quite the opposite: it is the store instruction that does the store, but it uses the value of r27, not r1 for its input. What does the r1 signify, then? It turns out that two different registers can be used for varying writes, r26 and r27. The register in the store instruction selects between these registers: a value of zero uses r26 whereas a value of one uses r27. Why, then, are there two varying source registers? Midgard is a VLIW architecture, in this case meaning that it can execute two store instructions simultaneously for improved performance. To achieve this parallelism, it needs two source registers, to be able to write two different values to the two varyings.

This new understanding clarifies some previous peculiar disassemblies, as the purpose of writes to r26 are now understood. This discovery would have been easier had r26 not also represented a reference to an embedded constant!

More importantly, it enables us to implement varying stores in the vertex shader, allowing for smoothed demos, like the shading on test-cube, to work. As a bonus, it cleans up the code relating to gl_Position writes, as we now know they can use the same compiler code path as writes to normal varyings.

Besides varyings, the Midgard compiler also saw various improvements, notably including a basic register allocator, crucial for compiling even slightly nontrivial shaders, such as that of the cube.

Beyond Midgard, my personal focus, Bifrost has continued to see sustained progress. Connor Abbott has continued decoding the new shader ISA, uncovering and adding disassembler support for a few miscellaneous new instructions and in particular branching. Branching under Bifrost is somewhat involved – the relevant disassembler commit added over two hundred lines of code – with semantics differing noticeably from Midgard. He has also begun work porting the panwrap infrastructure for capturing, decoding, and replaying command streams from Midgard to Bifrost, to pave the way for a full port of the driver to Bifrost down the line.

While Connor continues work on his disassembler, Lyude Paul has been working on a Bifrost assembler compatible with the disassembler’s output, a milestone necessary to demonstrate understanding of the instruction set and a useful prerequisite to writing a Bifrost compiler.

Going forward, I plan on cleaning up technical debt accumulated in the driver to improve maintainability, flexibility, and perhaps performance. Additionally, it is perhaps finally time to address the elephant in the command stream room: textures. Prior to this post, there were two major bugs in the driver: the missing tile bug and the texture reading bug. Seeing as the former was finally solved with a bit of persistence, there’s hope for the latter as well.

May the pans frost on.