22 Apr 2018
In the last Panfrost status update, a transitory “half-way” driver was presented, with the purpose of easing the transition from a standalone library abstracting the hardware to a full-fledged OpenGL ES driver using the Mesa and Gallium3D infrastructure.
Since then, I’ve completed the transition, creating such a driver, but retaining support for out-of-tree testing.
Almost everything that was exposed with the custom half-way interface is now available through Gallium3D. Attributes, varyings, and uniforms all work. A bit of rasterisation state is supported. Multiframe programs work, as do programs with multiple non-indexed, direct draws per frame.
The result? The GLES test-cube
demo from
freedreno
runs using the Mali T760 GPU present in my RK3288 laptop, going through
the Mesa/Gallium3D stack. Of course, there’s no need to rely on the
vendor’s proprietary compilers for shaders – the demo is using shaders
from the free, NIR-based Midgard
compiler.
Look ma, no blobs!
In the past three weeks since the previous update, all aspects of the project have seen fervent progress, culminating in the above demo. The change list for the core Gallium driver is lengthy but largely routine: abstracting features about the hardware which were already understood and integrating it with Gallium, resolving bugs which are discovered in the process, and repeating until the next GLES test passes normally. Enthusiastic readers can read the code of the driver core on GitLab.
Although numerous bugs were solved in this process, one in particular
is worthy of mention: the “tile flicker bug”, notorious to lurkers of
our Freenode IRC channel,
#panfrost
. Present since the first render, this bug
resulted in non-deterministic rendering glitches, where particular tiles
would display the background colour in lieu of the render itself. The
non-deterministic nature had long suggested it was either the result of
improper memory management or a race condition, but the precise cause
was unknown. Finally, the cause was narrowed down to a race condition
between the vertex/tiler jobs responsible for draws, and the fragment
job responsible for screen painting. With this cause in mind, a simple
fix squashed the bug, hopefully for good; renders are now deterministic
and correct. Huge thanks to Rob Clark for letting
me use him as a sounding board to solve this.
In terms of decoding the command stream, some miscellaneous GL state has been determined, like some details about tiler memory management, texture descriptors, and shader linkage (attribute and varying metadata). By far, however, the most significant discovery was the operation of blending on Midgard. It’s… well, unique. If I had known how nuanced the encoding was – and how much code it takes to generate from Gallium blend state – I would have postponed decoding like originally planned.
In any event, blending is now understood. Under Midgard, there are two paths in the hardware for blending: the fixed-function fast path, and the programmable slow path, using “blend shaders”. This distinction has been discussed sparsely in Mali documentation, but the conditions for the fast path were not known until now. Without further ado, the fixed-function blending hardware works when:
ADD
,
SUBTRACT
, or REVERSE_SUBTRACT
(but not
MIN
or MAX
)ONE
or
ZERO
(but not a constant colour or anything fancier), or
the additive complement thereof.If these conditions are not met, a blend shader is used instead, incurring a presently unknown performance hit.
By dominant and non-dominant modes, I’m essentially referring to the more complex and less complex blend functions respectively, comparing between the functions for the source and the destination. The exact details of the encoding are a little hairy beyond the scope of this post but are included in the corresponding Panfrost headers and the corresponding code in the driver.
In any event, this separation between fixed-function and programmable
blending is now more or less understood. Additionally, blend shaders
themselves are now intelligible with Connor Abbott’s Midgard
disassembler; blend shaders are just normal Midgard shaders, with an
identical ISA to vertex and fragment shaders, and will eventually be
generated with the existing NIR compiler. With luck, we should be able
to reuse code from the NIR compiler for the vc4
,
an embedded GPU lacking fixed-function hardware for any blending
whatsoever. Additionally, blend shaders open up some interesting
possibilities; we may be able to enable developers to write blend
shaders themselves in GLSL through a vendored GL extension. More
practically, blend shaders should enable implementation of all blend
modes, as this is ES 3.2 class hardware, as well as presumably logic
operations.
Command-stream work aside, the Midgard compiler also saw some
miscellaneous improvements. In particular, the mystery surrounding
varyings in vertex shaders has finally been cracked. Recall
that gl_Position
stores are accomplished by writing the
screen-space coordinate to the special register r27
, and
then including a st_vary
instruction with the mysterious
input register r1
to the appropriate address. At the time,
I had (erroneously) assumed that the r27
store was
responsible for the write, and the subsequent instruction was a peculiar
errata workaround.
New findings shows it is quite the opposite: it is the store
instruction that does the store, but it uses the value of
r27
, not r1
for its input. What does
the r1
signify, then? It turns out that two different
registers can be used for varying writes, r26
and
r27
. The register in the store instruction selects between
these registers: a value of zero uses r26
whereas a value
of one uses r27
. Why, then, are there two varying source
registers? Midgard is a VLIW architecture, in this case meaning that it
can execute two store instructions simultaneously for improved
performance. To achieve this parallelism, it needs two source registers,
to be able to write two different values to the two varyings.
This new understanding clarifies some previous peculiar
disassemblies, as the purpose of writes to r26
are now
understood. This discovery would have been easier had r26
not also represented a reference to an embedded constant!
More importantly, it enables us to implement varying stores in the
vertex shader, allowing for smoothed demos, like the shading on
test-cube
, to work. As a bonus, it cleans up the code
relating to gl_Position
writes, as we now know they can use
the same compiler code path as writes to normal varyings.
Besides varyings, the Midgard compiler also saw various improvements, notably including a basic register allocator, crucial for compiling even slightly nontrivial shaders, such as that of the cube.
Beyond Midgard, my personal focus, Bifrost has continued to see
sustained progress. Connor Abbott has continued decoding the new shader
ISA, uncovering and adding disassembler support for a few miscellaneous
new instructions and in particular branching. Branching under Bifrost is
somewhat involved – the relevant disassembler commit added over two
hundred lines of code – with semantics differing noticeably from
Midgard. He has also begun work porting the panwrap
infrastructure for capturing, decoding, and replaying command streams
from Midgard to Bifrost, to pave the way for a full port of the driver
to Bifrost down the line.
While Connor continues work on his disassembler, Lyude Paul has been working on a Bifrost assembler compatible with the disassembler’s output, a milestone necessary to demonstrate understanding of the instruction set and a useful prerequisite to writing a Bifrost compiler.
Going forward, I plan on cleaning up technical debt accumulated in the driver to improve maintainability, flexibility, and perhaps performance. Additionally, it is perhaps finally time to address the elephant in the command stream room: textures. Prior to this post, there were two major bugs in the driver: the missing tile bug and the texture reading bug. Seeing as the former was finally solved with a bit of persistence, there’s hope for the latter as well.
May the pans frost on.