Ooo, Shiny Textured Cube!

17 May 2018

A textured cube with a reflection

The libre Midgard driver, Panfrost, has reached a number of new milestones, culminating in the above screenshot, demonstrating:

Warning: the following is ruthlessly technical and contains My Little Pony references. Proceed at your own risk.

Textures are by far the most significant addition. Although work decoding their command stream and shader instructions had commenced months ago, we hadn’t managed to replay a texture until May 3, let alone implement support in the driver. The lack of functional textures was the only remaining showstopper. We had poured in long hours debugging it, narrowing down the problem to the command stream, but nothing budged. No permutation of the texture descriptor or the sampler descriptor changed the situation. Yet, everyone was sure that once we figured it out, it would turn out to be something silly in hindsight.

It was.

OpenGL’s textures in the command stream are controlled by the texture and sampler descriptors, corresponding to Vulkan’s textures and samplers respectively. They were the obvious place to look for bugs.

They were not the culprit.

Where did the blame lie, then?

The shader descriptor.

Midgard’s shader descriptor, a block in the command stream which configures a shader, has a number of fields: the address of the compiled shader binary, the number of registers used, the number of attributes/varyings/uniforms, and so forth. I thought that was it. A shader descriptor from a replay with textures looked like this (reformatted for clarity):

struct mali_shader_meta shader_meta = {
    .shader = (shader_memory + 1920) | 5,
    // XXX shader zero tripped
    .attribute_count = 0,
    .varying_count = 1,
    .uniform_registers = (0 << 20) | 0x20e00,
};

That is, the shader code is at shader_memory + 1920; as a fragment shader, it uses no attributes but it does receive a single varying; it does not use any uniforms. All accounted for, right?

What’s that comment about, “XXX shader zero tripped”, then?

There are frequently fields in the command stream that we observe to always be zero, for various reasons. Sometimes they are there for padding and alignment. Sometimes they correspond to a feature that none of our tests had used yet. In any event, it is distracting for a command stream log to be filled with lines like:

.zero0 = 0,
.zero1 = 0,
.zero2 = 0,
.zero3 = 0,

In an effort to keep everything tidy, fields that were observed to always be zero are not printed. Instead, the tracer just makes sure that the unprinted fields (which the C compiler initialises to zero when they are omitted) are, in fact, equal to zero. If they are not, a warning is printed, stating that a “zero is tripped”, as if the field were a trap. When the reader of the log sees this line, they know that the replay is incomplete, as they are missing a value somewhere; a field was wrongly marked as “always zero”. It was a perfect system.

At least, it would have been a perfect system, if I had noticed the warning.
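The zero-tripping check can be sketched roughly as follows. This is a minimal illustration, not the tracer's actual code: the struct layout, the example_ names, and the choice to print the offending value are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical descriptor; the fields are illustrative, not Panfrost's real layout. */
struct example_descriptor {
    uint64_t shader;
    uint32_t zero1;            /* observed to be zero in every trace so far */
    uint16_t attribute_count;
    uint16_t varying_count;
};

/* Checks the "always zero" fields and returns how many were tripped.
 * Printing the value itself (an assumption, not what the original
 * tracer did) makes the anomaly harder to skim past. */
static int example_check_zeroes(const struct example_descriptor *d)
{
    int tripped = 0;

    if (d->zero1) {
        printf("XXX zero1 tripped: %u\n", (unsigned) d->zero1);
        tripped++;
    }

    return tripped;
}
```

A non-zero return value would be a natural point to stop and stare at the log before trusting the replay.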

I was hyper-focused on the new texture and sampler descriptors, on the memory allocations for the texture memory itself, on the shader binaries – I was so hyper-focused on textures that I only skimmed the rest of the log for anomalies.

If I had – when I finally did, on that fateful Thursday – I would have realised that the zero was tripped. I would have committed a change like:

-               if (t->zero1)
-                       panwrap_msg("XXX shader zero tripped\n");
+               //if (t->zero1)
+               //      panwrap_msg("XXX shader zero tripped\n");
 
+               panwrap_prop("zero1 = %" PRId16, t->zero1);
        panwrap_prop("attribute_count = %" PRId16, t->attribute_count);
        panwrap_prop("varying_count = %" PRId16, t->varying_count);

I would have then discovered that “zero1” was mysteriously equal to 65537 for my sample with a texture. And I would have noticed that suddenly, texture replay worked!

Everything fell into place from then. Notice that 65537 in decimal is equal to 0x10001 in hex. With some spacing included for clarity, that’s 0x 0001 0001. Alternatively, instead of a single 32-bit word, it can be interpreted as two 16-bit integers: two ones in succession. What two things do we have one of in the command stream?

Textures and samplers!
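The reinterpretation is easy to check by hand. Here is a small sketch with hypothetical helper names; which half holds which count depends on byte order, and I am assuming a little-endian layout, where the first 16-bit field sits in the low half:

```c
#include <stdint.h>

/* Hypothetical helpers for splitting a 32-bit word into two 16-bit halves. */

/* Low 16 bits: the first u16 field on a little-endian machine (assumed). */
static uint16_t example_low_half(uint32_t word)
{
    return (uint16_t) (word & 0xFFFF);
}

/* High 16 bits: the second u16 field on a little-endian machine (assumed). */
static uint16_t example_high_half(uint32_t word)
{
    return (uint16_t) (word >> 16);
}
```

For the value 65537, both helpers return 1, so the ordering does not matter for this particular sample.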

Easy enough to handle in the command stream:

    mali_ptr shader;
-       u32 zero1;
+
+       u16 texture_count; 
+       u16 sampler_count;
 
    /* Counted as number of address slots (i.e. half-precision vec4's) */
    u16 attribute_count;

After that, it was just a matter of moving code from the replay into the real driver, writing functions to translate Gallium commands into Midgard structures, implementing a routine in the compiler to translate NIR instructions to Midgard instructions, and a lot of debugging. A week later, all the core code for textures was in place… almost.

The other big problem posed by textures is their internal format. In some graphics systems, textures are linear, the most intuitive format; that is, a pixel is accessed in the texture by texture[y*stride + x]. However, for reasons of cache locality, this format is a disaster for a GPU; instead, textures are stored “tiled” or “swizzled”. This article offers a good overview of tiled texture layouts.
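To make the difference concrete, here is a sketch comparing linear addressing with a generic tiled scheme. The tiled function is a textbook illustration only: the 4x4 tile size and the row-major ordering within and between tiles are assumptions, and this is not the layout Midgard actually uses.

```c
#include <stddef.h>

#define EXAMPLE_TILE 4  /* assumed tile edge in pixels, for illustration */

/* Linear layout: rows are stored one after another. */
static size_t example_linear_index(size_t x, size_t y, size_t stride)
{
    return y * stride + x;
}

/* Generic tiled layout: each EXAMPLE_TILE x EXAMPLE_TILE block of pixels
 * is stored contiguously, and the blocks themselves are in row-major order.
 * Assumes width is a multiple of EXAMPLE_TILE. */
static size_t example_tiled_index(size_t x, size_t y, size_t width)
{
    size_t tiles_per_row = width / EXAMPLE_TILE;
    size_t tile = (y / EXAMPLE_TILE) * tiles_per_row + (x / EXAMPLE_TILE);
    size_t within = (y % EXAMPLE_TILE) * EXAMPLE_TILE + (x % EXAMPLE_TILE);

    return tile * (EXAMPLE_TILE * EXAMPLE_TILE) + within;
}
```

In an 8-pixel-wide texture, the pixel directly below the origin is 8 elements away in the linear layout but only 4 away in this tiled one, which is the locality benefit in miniature.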

Texture tiling is great and powerful for hardware. It is less great and powerful for driver writers. Decoding the swizzling algorithm would have been a mammoth task, orthogonal to the command stream and shader work for textures. 3D drivers are complex – textures have three major components that are each orthogonal to each other.

It would have been hopeless… if libv had not already decoded the layout when writing limare! The heavy lifting was done, all released under the MIT license. In an afternoon’s work, I extracted the relevant code from limare, cleaned it up a bit, and made it about 20% faster (Abstract rounding). The resulting algorithm is still somewhat opaque, but it works! In a single thread on my armv7 RK3288 laptop, about 355 RGBA32 1080p textures can be swizzled in 10 seconds flat.
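For a sense of scale, that figure can be turned into a rough throughput. The sketch below assumes that 1080p means 1920x1080 and that RGBA32 means 4 bytes per pixel; the helper names are hypothetical.

```c
#include <stdint.h>

/* Size of one texture in bytes, given its dimensions and bytes per pixel. */
static uint64_t example_bytes_per_texture(uint64_t width, uint64_t height,
                                          uint64_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Average bytes processed per second over the whole run. */
static uint64_t example_bytes_per_second(uint64_t textures,
                                         uint64_t bytes_each,
                                         uint64_t seconds)
{
    return textures * bytes_each / seconds;
}
```

Under those assumptions, each texture is 8,294,400 bytes, and 355 of them in 10 seconds works out to roughly 294 MB/s on a single thread.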

I then integrated the swizzling code with the Gallium driver, et voilà, vraimente– non, non, ce fois, c’est vrai – je ne mens pas! – euh, bon, je doivais finir d’autres tâches avant pouvoir démontrer test-cube-textured, mais…. voilà! (Sorry for speaking Prancy.)

Textures?

Textures!


On the Bifrost side, Lyude Paul has continued her work writing an assembler. The parser, a surprisingly complex task given the nuances of the ISA, is now working reliably. Code emission is in nascent stages, and her assembler is now making progress on instruction encoding. The first instructions have almost been emitted. May many more instructions follow.

However, an assembler for Bifrost is no good without a free driver to use it with; accordingly, Connor Abbott has continued his work investigating the Bifrost command stream. It continues to demonstrate considerable similarities to Midgard; luckily, much of the driver code will be shareable between the architectures. Like the assembler, this work is still in early stages, implemented in a personal branch, but early results look promising.

And a little birdy told me that there might be T880 support in the pipes.
