Buffer textures on the Apple GPU

19 Dec 2022

This weekend’s project: implementing buffer textures on the Apple GPU, including support for RGB32 textures. That’s needed for GL 4.0, and the hardware doesn’t support it. But why should that stop us from having a little fun? (-:

In this episode: generating ubershaders to implement full GL 4.0 class buffer textures with zero shader variants.

We know we can use the hardware for non-RGB32 textures, and we know how to implement RGB32 loads ourselves, so we just need to determine the format and branch accordingly. Roughly:

index = min(index, texture_size - 1);

if (format == RGB32) {
    /* Fetch three consecutive 32-bit words from the raw buffer */
    u32 *raw = (u32 *) buffer;
    uvec3 texel = uvec3(raw[index * 3 + 0],
                        raw[index * 3 + 1],
                        raw[index * 3 + 2]);
    return uvec4(texel, 1);
} else {
    /* Remap the 1D index onto the wrapped 2D texture */
    uint x = index % 1024;
    uint y = index / 1024;
    return texelFetch(tex, ivec2(x, y), 0);
}

This compiles to about 17 instructions in the usual case. The good news is that, as long as the texture itself is uniform, the compiler can hoist all of the “decode texture descriptors in the shader” instructions into a “preamble” shader, so performance shouldn’t be /that/ bad. Specifically, the preamble shader calculates the following expressions and stuffs them into uniform registers for us:

- the format (RGB32 or not)
- the base address of the underlying buffer
- the texture size used to clamp the index

These are calculated by reading the AGX texture descriptor and decoding it the way the hardware would, so a single descriptor format and a single shader handle every buffer format and size. As a result, the code works basically as-is not only for OpenGL but also for Vulkan, including with bindless textures. The driver just needs to build the appropriate 2D texture descriptor, and the compiler’s texture lowering takes care of the rest. Pretty crafty, eh?
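
To make that concrete, here’s a rough sketch of the preamble’s output, using the same names as the snippet above. The descriptor field names and the load_descriptor/decode_size helpers are made up for illustration; the real AGX descriptor layout is different.

/* Hypothetical preamble: runs once up front, not per-thread */
desc = load_descriptor(texture_handle);

uniform format       = desc->format;           /* RGB32 or not */
uniform buffer       = (u32 *) desc->address;  /* raw buffer base */
uniform texture_size = decode_size(desc);      /* for the clamp */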

There’s one really awful hack in here: we need somewhere to stash the texture size, because the size we use for the 2D texture is rounded up (to a whole number of 1024-texel rows), so the true element count needed for the bounds clamp is otherwise lost. But we don’t want to pass that information via a side channel, because that would require special lowerings in both the GL and VK drivers, and would make bindless tricky.

So… we reuse some fields of the hardware texture descriptor that are otherwise unused…
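
As a sketch of the idea, with invented names (spare_lo/spare_hi stand in for whichever descriptor bits the hardware actually ignores, and DIV_ROUND_UP is integer division rounding up):

/* Hypothetical driver-side setup for a buffer of size_el texels */
desc->width  = 1024;
desc->height = DIV_ROUND_UP(size_el, 1024);  /* rounded-up 2D size */

/* Stash the true element count where the hardware never looks, so
   the shader preamble can recover it for the bounds clamp */
desc->spare_lo = size_el & 0xffff;
desc->spare_hi = size_el >> 16;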

It probably violates the hardware’s specification, but luckily, I don’t have the spec!

What I can’t know can’t hurt me :~)
