The Word of the Week Is… Shaders!

5 Mar 2018

About two weeks ago, I published a screenshot of a smoothed triangle rendered with a free software driver on a Mali T760, albeit with binary shaders.

But… binary shaders? C’mon, I shouldn’t stoop that low! What good is a free software driver if we’re dependent on a proprietary mystery blob to compile our shaders, arguably the most important capability of a modern graphics driver?

There was little excuse: even then, the shader instruction set was partially understood, thanks to the work of Connor Abbott back in 2013. At the time, Connor decoded the majority of the arithmetic (ALU) and load-store instructions, and he wrote a disassembler based on his findings. It is hard to overstate the magnitude of Connor’s contributions here; decoding a modern instruction set like Midgard is a major feat, comparable in difficulty to decoding the GPU’s command stream itself. Still, his work produced detailed documentation and a disassembler intended strictly for prototyping, never for real-world use.
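To give a flavour of what a disassembler does, here is a minimal C sketch of its first step: pulling bit fields out of an instruction word and dispatching on a tag that says which unit (ALU, load/store, texture) should decode the rest of the bundle. The 4-bit tag position and its numeric values are assumptions for illustration, not a statement of the real Midgard encoding, and `bits`, `classify`, and `disassemble_tag` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical encoding, for illustration only: assume every bundle
 * starts with a 4-bit tag in bits 0..3 naming the unit that executes it.
 * The numeric tag values below are placeholders. */
enum unit { UNIT_UNKNOWN, UNIT_ALU, UNIT_LOAD_STORE, UNIT_TEXTURE };

/* Extract `count` bits of `word`, starting at bit `lo`. */
static uint32_t bits(uint32_t word, unsigned lo, unsigned count)
{
    assert(count < 32); /* keeps the shift below well-defined */
    return (word >> lo) & ((1u << count) - 1u);
}

/* Decide which unit-specific decoder should handle this bundle. */
static enum unit classify(uint32_t word)
{
    switch (bits(word, 0, 4)) {
    case 0x3: return UNIT_TEXTURE;
    case 0x5: return UNIT_LOAD_STORE;
    case 0x8: return UNIT_ALU;
    default:  return UNIT_UNKNOWN;
    }
}

/* Print a one-line summary, as a disassembler might before handing
 * the remaining words to the decoder for that unit. */
static void disassemble_tag(uint32_t word)
{
    static const char *names[] = { "unknown", "alu", "load/store", "texture" };
    printf("%08x: %s bundle\n", (unsigned) word, names[classify(word)]);
}
```

A real disassembler would then decode every remaining field of the bundle according to the unit it belongs to; this sketch stops at the dispatch.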

Naturally enough, I did the unthinkable and linked directly against the disassembler’s internal library from the command stream tracer. After cleaning up the disassembler code a bit and massaging its output into normal assembly rather than a collection of notes-to-self, the relevant source code for our smoothed triangle changed from:

FILE *f_shader_12 = fopen("shader_12.bin", "rb");
fread(shader_12, 1, 4096, f_shader_12);
fclose(f_shader_12);

(where shader_12.bin is a nontrivial blob extracted from the command stream, containing the compiled shaders as well as some other unused code) to a much more readable:

const char shader_src_2[] = R"(
    ld_vary_16 r0.xy, 0.xyxx, 0xA01E9E

    vmul.fmov r0, r24.xxxx, hr0
    fb.write 0x1808
    
    vmul.fmov r0, r24.xxxx, r0
    fb.write 0x1FF8
)";

pandev_shader_assemble(shader_12 + 288, shader_src_2);

There are still some mystery hex constants there, but the major parts are understood, at least for fragment shaders. Vertex shaders are a little more complicated, but this disassembly will make them much easier to understand as well.

In any event, having this disassembly embedded into the command stream isn’t any good without an assembler…

…so, of course, I then wrote a Midgard assembler. It’s about five hundred lines of Python, plus Pythonised versions of the architecture definitions from the disassembler. This assembler isn’t particularly pretty or performant, but as long as it works, that’s okay; the real driver will use an emitter written directly in C that bypasses the assembly phase.
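The difference between an assembler and a direct emitter is that the emitter packs fields into instruction words straight from in-memory structures, with no text being printed or parsed along the way. Here is a minimal sketch of that idea, assuming a made-up 32-bit layout (real Midgard words are larger and laid out differently); `struct toy_instr`, `field`, `emit`, and `decode` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical instruction layout, for illustration only:
 *   bits  0..3   unit tag
 *   bits  4..11  opcode
 *   bits 12..16  destination register
 *   bits 17..21  source register A
 *   bits 22..26  source register B */
struct toy_instr {
    uint8_t tag, op, dst, src_a, src_b;
};

/* Place `value` into a `width`-bit field at bit `lo`, checking it fits. */
static uint32_t field(uint32_t value, unsigned lo, unsigned width)
{
    assert(width < 32 && value < (1u << width));
    return value << lo;
}

/* Emit the binary encoding directly from the structure. */
static uint32_t emit(const struct toy_instr *in)
{
    return field(in->tag, 0, 4) | field(in->op, 4, 8) |
           field(in->dst, 12, 5) | field(in->src_a, 17, 5) |
           field(in->src_b, 22, 5);
}

/* Inverse of emit(): what a disassembler would do with the same layout. */
static struct toy_instr decode(uint32_t word)
{
    struct toy_instr out = {
        .tag   = word & 0xF,
        .op    = (word >> 4) & 0xFF,
        .dst   = (word >> 12) & 0x1F,
        .src_a = (word >> 17) & 0x1F,
        .src_b = (word >> 22) & 0x1F,
    };
    return out;
}
```

Because `decode` is the exact inverse of `emit`, encoding, decoding, and re-encoding an instruction yields the same word, mirroring the disassemble/reassemble round trip of the Python tools.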

Indeed, this assembler, although still incomplete in some areas, works very well for the simple shaders we’re currently experimenting with. In fact, a compiled binary can be disassembled and then reassembled with our tools, yielding bit-identical output.
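A round-trip test like this boils down to comparing the original binary with the reassembled one. Below is a minimal C sketch of that comparison step, assuming both binaries are already loaded into memory (the disassembly and reassembly themselves happen in the Python tools and are not shown); `first_mismatch` and `report_round_trip` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Return -1 if the buffers are bit-identical; otherwise the offset of
 * the first differing byte, or the length of the shorter buffer when
 * one is a strict prefix of the other. */
static long first_mismatch(const unsigned char *orig, size_t orig_len,
                           const unsigned char *rt, size_t rt_len)
{
    assert(orig != NULL && rt != NULL);
    size_t n = orig_len < rt_len ? orig_len : rt_len;
    for (size_t i = 0; i < n; i++)
        if (orig[i] != rt[i])
            return (long) i;
    return orig_len == rt_len ? -1 : (long) n;
}

/* Print a verdict for one shader; returns 1 on a bit-identical match. */
static int report_round_trip(const char *name,
                             const unsigned char *orig, size_t orig_len,
                             const unsigned char *rt, size_t rt_len)
{
    long at = first_mismatch(orig, orig_len, rt, rt_len);
    if (at < 0) {
        printf("%s: bit-identical (%zu bytes)\n", name, orig_len);
        return 1;
    }
    printf("%s: first difference at byte %ld\n", name, at);
    return 0;
}
```

A plain `memcmp` would answer yes or no; returning the offset of the first differing byte is more useful while decoding, since it points at the field the tools got wrong.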

This means we can be even more reckless and call out to this prototype assembler from within the command stream replay. Look Ma, no blobs!

There is no magic. Although Midgard assembly is a bit cumbersome, I have been able to write some simple fragment shaders in assembly by hand, using only the free toolchain. Woohoo!


Sadly, while Connor’s 2013-era notes were impressive, they were lacking in a few notable areas; in particular, he had not made any progress decoding texture words. Similarly, the elusive fbwrite field was never filled in. Not an issue: Connor and I have since decoded much of the texture pipeline, fbwrite, and branching. Many texture instructions can now be disassembled without any unknown words! And, of course, these simpler texture instructions can be reassembled bit-identically.
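One way to track progress like "no unknown words" is to record which bits of an instruction have been assigned a meaning and have the disassembler flag any set bit outside that mask. Here is a minimal sketch with a hypothetical field list and hypothetical names (`struct bitfield`, `known_mask`, `unknown_bits`, `print_unknowns`); it does not reflect the actual texture word layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One decoded field: `width` bits starting at bit `lo`. */
struct bitfield {
    unsigned lo, width;
    const char *name;
};

/* Union of all decoded fields; asserts that no two fields overlap. */
static uint32_t known_mask(const struct bitfield *f, size_t n)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n; i++) {
        assert(f[i].width > 0 && f[i].width < 32 && f[i].lo + f[i].width <= 32);
        uint32_t m = ((1u << f[i].width) - 1u) << f[i].lo;
        assert((mask & m) == 0); /* decoded fields must not overlap */
        mask |= m;
    }
    return mask;
}

/* Bits of `word` that are set but not covered by any decoded field. */
static uint32_t unknown_bits(uint32_t word, uint32_t mask)
{
    return word & ~mask;
}

/* Flag leftovers in the disassembly; returns 1 if any were found. */
static int print_unknowns(uint32_t word, uint32_t mask)
{
    uint32_t unk = unknown_bits(word, mask);
    if (unk)
        printf("/* unknown bits: 0x%08x */\n", (unsigned) unk);
    return unk != 0;
}
```

When every instruction in a shader prints no leftovers, it can be disassembled "without unknown words" in the sense used above.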


But we’ve been busy with more than shaders. Although the above represents quite a bit of code, it didn’t take the entirety of the two weeks, of course. The command stream saw plenty of work, too, even if it isn’t quite as glamorous as shaders. I decoded indexed draws, which now appear to work flawlessly. More interestingly, I began investigating texture and sampler descriptors. The general structure and a handful of fields are known, although I have not yet successfully replayed any textures, nor have I looked into texture swizzling. Additionally, I identified a number of minor fields relating to glFrontFace, glLineWidth, attribute and uniform counts, framebuffer dimensions, depth/stencil enables, face culling, and vertex counts. All told, I estimate I’ve written about 1k lines of code since the last update, which is pretty crazy.
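A detail a driver commonly needs for indexed draws is the range of vertex indices the index buffer references, for example to know how many vertices the vertex stage has to process. Whether the Mali descriptors want exactly this information is not stated here; the sketch below only shows the CPU-side scan of a 16-bit index buffer (as used with glDrawElements and GL_UNSIGNED_SHORT), ignores primitive restart, and uses hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest and largest vertex index referenced by a draw. */
struct index_range {
    uint32_t min, max;
};

/* Scan a non-empty 16-bit index buffer for the range it touches. */
static struct index_range scan_indices_u16(const uint16_t *indices, size_t count)
{
    assert(indices != NULL && count > 0);
    struct index_range r = { indices[0], indices[0] };
    for (size_t i = 1; i < count; i++) {
        if (indices[i] < r.min)
            r.min = indices[i];
        if (indices[i] > r.max)
            r.max = indices[i];
    }
    return r;
}

/* Vertices in [min, max]: an upper bound on how many need shading. */
static uint32_t vertex_span(struct index_range r)
{
    return r.max - r.min + 1;
}
```

For the indices 4, 5, 6, 6, 5, 7 this gives a range of 4 to 7, so four vertices would need to be shaded.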

So, what’s next in the pipeline?

Textures, of course! I’d also like to clean up the command stream replays, particularly relating to memory allocation, to ensure there are no large gaps in our understanding of the hardware.
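For the memory allocation cleanup, one common pattern (an assumption here, not necessarily what the replays will end up using) is a bump allocator over a single region that is both CPU-mapped and GPU-visible, so each descriptor gets a CPU pointer for writing and a GPU address that other descriptors can point at. A minimal sketch with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A bump allocator over one preallocated region. Hypothetical: the
 * region would be GPU-visible memory mapped by the replay. */
struct arena {
    uint8_t *base;   /* CPU pointer to the start of the region */
    uint64_t gpu_va; /* GPU address of the same region */
    size_t size, used;
};

/* Reserve `len` bytes aligned to `align` (a power of two). Returns the
 * offset into the arena, or (size_t)-1 if the arena is exhausted. */
static size_t arena_alloc(struct arena *a, size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t off = (a->used + align - 1) & ~(align - 1);
    if (off > a->size || len > a->size - off)
        return (size_t) -1;
    a->used = off + len;
    return off;
}

/* GPU address of an allocation, for writing into other descriptors. */
static uint64_t arena_gpu(const struct arena *a, size_t off)
{
    return a->gpu_va + off;
}
```

Keeping every allocation in one arena makes it easy to see which parts of memory the replay actually uses, rather than relying on hard-coded addresses.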

After that, well, it’ll be time to dust off the NIR compiler I began at the end of January… and start moving code into Mesa!

The future is looking bright for the Panfrost driver.
