In the last update of the free software Panfrost driver, I unveiled the Midgard shader compiler. In the two weeks since then, I’ve shifted my attention from shaders back to the command stream, the fixed-function part of the pipeline. A shader compiler is only useful if there’s a way to run the shaders, after all!

The basic parts of the command stream have been known since the early days of the project, but in the past weeks, I methodically went through the OpenGL ES 2.0 specification searching for new features, writing test code to iterate the permutations, discovering how the feature is encoded in the command stream, and writing a decoder for it. This tedious process is at the heart of any free graphics driver project, but with patience, it is effective.
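
To make the "iterate the permutations, then decode" loop concrete, here is a minimal sketch of the comparison step. It is written in Python for brevity and is not the project's actual tooling: the function names, the capture names, and the byte values are all made up for illustration. The idea is to capture one dump with a piece of GL state left at its default, capture another with only that state changed, and see which bits moved.

```python
def changed_bits(baseline, variant):
    """Return (byte_offset, bit_index) pairs where two equal-length dumps differ."""
    if len(baseline) != len(variant):
        raise ValueError("dumps must have the same length")
    diffs = []
    for offset, (a, b) in enumerate(zip(baseline, variant)):
        delta = a ^ b  # a set bit in delta marks a bit that changed
        for bit in range(8):
            if delta & (1 << bit):
                diffs.append((offset, bit))
    return diffs


def locate_feature(baseline, variants):
    """Map each named capture to the bits it flips relative to the baseline."""
    return {name: changed_bits(baseline, dump) for name, dump in variants.items()}


# Hypothetical captures: the bytes are invented, not real command stream data.
baseline = bytes([0x00, 0x10, 0x00, 0x00])
captures = {"depth_test_on": bytes([0x00, 0x14, 0x00, 0x00])}
print(locate_feature(baseline, captures))  # {'depth_test_on': [(1, 2)]}
```

A bit that flips in exactly one capture is a good candidate for the field encoding that feature; bits that flip everywhere are more likely addresses or counters.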

Thus, since the previous post, I have decoded the fields corresponding to: framebuffer clear flags, fragment discard hinting, viewports, blend shaders, blending colour masks, antialiasing (MSAA), face culling, depth factor/units, the stencil test, the depth test, depth ranges, dithering, texture channel swizzling, texture compare functions, texture wrap modes, alpha coverage, and attribute/varying types.

That was a doozy!

This marks an important milestone: excepting textures, framebuffer objects, and fancy blend modes, the command stream needed for OpenGL ES 2.0 is almost entirely understood. For context on why those features are presently missing, we have not yet been able to replay a sample with textures or framebuffer objects, presumably due to a bug in the replay infrastructure. Until we can do this, no major work can occur for them. Figuring this bit out is high priority, but work on this area is mixed in with work on other parts of the project, to avoid causing a stall (and a lame blog post in two weeks with nothing to report back). As for fancy blend modes, our hardware has a peculiar design involving programmable blending as well as a fixed-function subset of the usual pipeline. Accordingly, I’m deferring work on this obscure feature until the rest of the driver is mature.

On the bright side, we do understand more than enough to begin work on a real driver. Thus, I cordially present the one and only Half-Way Driver! Trademark pending. Name coined by yours truly about five minutes ago.

The premise for this driver is simple: to verify that our understanding of the hardware is sound, we need to write a driver that is higher level than the simple decoded replays. And of course, we want to write a real driver, within Mesa and using Gallium3D infrastructure; after all, the end-goal of the project is to enable graphics applications to use the hardware with free software. It’s pretty hard to drive the hardware without a driver – I should know.

On the other hand, it is preferable to develop this driver independently of Mesa and Gallium3D, to retain control of the flow of the codebase, to speed up development, and to simplify debugging. Mesa and Gallium3D are large codebases; while this is necessary for production use, their sheer size becomes a burden on early driver development. As an added incentive to avoid building within their infrastructure, Mesa recompiles are somewhat slow with hardware like mine: as stated, I use my, ahem, low-power RK3288 laptop for development. Besides, while I’m still discovering new aspects of the hardware in each development session, I could do without the looming, ever-present risk of upstream merge conflicts.

The solution – the creatively named Half-Way Driver – is a driver that is half-way between the opposite development strategies of a replay-driven, independent toy driver versus a mature in-tree Mesa driver. In particular, the idea is to abstract a working replay into command stream constructors that follow Gallium3D conventions, including the permissively licensed Gallium3D headers themselves. This approach combines the benefits of each side: development is fast and easy, build times are short, and once the codebase is mature, it will be simple to move into Mesa itself and gain, almost for free, support for OpenGL, along with a number of other compatible state trackers. As an intermediate easing step, we may hook into this out-of-tree driver from softpipe, the reference software rasteriser in Gallium3D, progressively replacing software functionality with hardware-accelerated routines where possible.
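
As a rough illustration of what a "command stream constructor" with a Gallium3D-shaped signature could look like, here is a small sketch. It is in Python for brevity rather than the C a real driver would use, and everything in it is an assumption: `HalfWayContext` is a hypothetical name, the `CLEAR_*` constants are stand-ins for Gallium3D's `PIPE_CLEAR_*` bits, and the packed record layout is invented for the example and is not the hardware format.

```python
import struct
from dataclasses import dataclass, field

# Stand-ins for Gallium3D's PIPE_CLEAR_* buffer bits; values are illustrative.
CLEAR_DEPTH = 1 << 0
CLEAR_STENCIL = 1 << 1
CLEAR_COLOR = 1 << 2


@dataclass
class HalfWayContext:
    """Hypothetical context that accumulates command stream bytes."""
    stream: bytearray = field(default_factory=bytearray)

    def clear(self, buffers, color=(0.0, 0.0, 0.0, 0.0), depth=1.0, stencil=0):
        """Gallium-style clear: turn high-level arguments into a packed record.

        The layout (u32 flags, 4 x f32 colour, f32 depth, u32 stencil,
        little-endian) is invented for this sketch.
        """
        if len(color) != 4:
            raise ValueError("color must have four components")
        record = struct.pack("<I4ffI", buffers, *color, depth, stencil)
        self.stream += record
        return len(record)  # number of bytes appended


ctx = HalfWayContext()
ctx.clear(CLEAR_COLOR | CLEAR_DEPTH, color=(0.5, 0.25, 0.0, 1.0))
```

The caller only ever sees the high-level arguments, so code written against this interface should not need to change when the constructor later moves behind the real Gallium3D entry point.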

In any event, this new driver is progressing nicely. At the moment, only clearing uses the native Gallium3D interface; the list of Galliumified functions will expand shortly. On the other hand, with a somewhat lower level interface, corresponding closely to the command stream, the driver supports the basic structures needed for rendering 3D geometry and running shaders. After some debugging, taking advantage of the differential tracing infrastructure originally built up to analyse the blob, the driver is able to support multiple draws over multiple frames, allowing for some cute GPU-accelerated animations!
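
For a flavour of what differential tracing buys during debugging, here is a minimal sketch, assuming both the reference replay and the new driver can dump their command streams as decoded text, one field per line. The field names below are hypothetical, and the standard-library `difflib` stands in for whatever comparison the real infrastructure performs.

```python
import difflib


def diff_traces(reference, candidate):
    """Return unified-diff lines between two decoded traces (lists of lines)."""
    return list(difflib.unified_diff(
        reference, candidate,
        fromfile="reference", tofile="candidate",
        lineterm="", n=1,  # one line of context keeps long traces readable
    ))


# Hypothetical decoded traces; the field names are made up for the example.
reference = ["clear_flags = 0x5", "depth_test = 1", "cull = back"]
candidate = ["clear_flags = 0x5", "depth_test = 0", "cull = back"]

for line in diff_traces(reference, candidate):
    print(line)
```

When the two traces are identical the function returns an empty list, so a non-empty result points straight at the fields the driver is emitting differently from the known-good replay.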

Granted, by virtue of our capture-replay-decode workflow, the driver is not able to render anything that a previous replay could not, greatly limiting my screenshot opportunities. C’est la vie, je suppose. But hey, trust that seeing multiple triangles with different rendering states drawn in the same frame is quite exciting when you’ve been mashing your head against your keyboard for hours comparing command stream traces that are thousands of lines long.

In total, this work-in-progress brings us much closer to having a real Gallium3D driver, at which point the really fun demos start. (I’m looking at you, es2gears!)

On the shader side, progress continues to be steady. In the course of investigating blending on Midgard, including the truly bizarre “blend shaders” required for nontrivial blend modes, I uncovered a number of new opcodes relating to integers. In particular, the disassembler is now aware of the bitwise operations, which are used in this blend shader. For the compiler, I introduced a few new workarounds, presumably due to hardware errata, whose necessity was uncovered by improvements in the command stream.

For Bifrost shaders, Connor has continued his work decoding the instruction set. Notably, his recent changes enable complete disassembly of simple vertex shaders. In particular, he discovered a space-saving trick involving a nuanced mechanism for encoding certain registers, which disambiguated his previous disassembled shaders. It’s also worth noting, although he realised this earlier on, that Bifrost vertex shaders bear great similarities to Midgard vertex shaders, a fact uncovered a few weeks ago – good news for when a Bifrost compiler is written! Among other smaller changes, he also introduced support for half-floats (fp16) and half-ints (int16), which implies a new set of instruction opcodes. He has also gathered initial traces of the Bifrost command stream, with the intent of gauging the difficulty of porting the current Midgard driver to Bifrost as well, allowing us to test shaders on the elegant new Gxx chips. In total, understanding of Bifrost progresses well; while Midgard is certainly leading the driver effort, the gap is closing.

In the near future, we’ll be Galliumising the driver. Stay tuned for scenes from our next episode!

This page is licensed under the CC BY-SA 4.0. Spread free culture!

