dEQP-GLES3 doesn't test XFB with an index buffer.

Goals for this yak shave:
- Pave the way for conformant Valhall
- Improve Bifrost performance
- Delete a pile of code from the GLSL compiler
- Fix a pile of piglits
- Get transform feedback on AGX for free

To do that, I need:
- A nir pass that turns store_output into store_global + ALU according to transform feedback
- Specialize a compute shader that does just that
- Run that compute shader for each vertex...

The third piece is nontrivial because of index buffers and tessellation requirements (all of this is currently broken in Panfrost).

I guess for the full desktop GL we want a 2 pass approach:
- Pass 1 tessellates quads/polygons, unrolls strips/fans, and produces a flat index buffer of just POINTS, LINES, or TRIANGLES.
- Pass 2 runs for each element of the flat index buffer (* instance count), and runs the XFB-specialized vertex shader with vertex ID = flat index and instance ID per workgroup.y

Pass 2 is "easy", and is basically what Panfrost already does. Pass 1 is complicated due to primitive restart. (...Is pass 1 possible to vectorize at all if primitive restart is used? Man, OpenGL sucks hard.)

15:35 I guess pass 1 is a serial bottleneck if primitive restart is used. That's known at draw time, I suppose.
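
For reference, a rough sketch of the strip/fan unrolling part of pass 1, done serially, in Python for brevity rather than anything resembling driver code. The function names are made up for illustration. The winding fix-up for odd strip triangles (swapping the first two vertices) is an assumption on my part; the vertex order transform feedback actually requires should be checked against the spec. Quads/polygons and line primitives are left out.

```python
def unroll_triangle_strip(indices, restart_index=None):
    """Unroll a triangle strip into a flat TRIANGLES index list.

    restart_index is the index value that ends the current strip,
    or None if primitive restart is disabled (hypothetical interface).
    """
    out = []
    strip = []  # indices seen since the last restart
    for idx in indices:
        if restart_index is not None and idx == restart_index:
            strip = []  # a restart resets both position and parity
            continue
        strip.append(idx)
        n = len(strip)
        if n >= 3:
            a, b, c = strip[-3], strip[-2], strip[-1]
            if n % 2 == 1:
                out += [a, b, c]  # even-numbered triangle within the strip
            else:
                out += [b, a, c]  # odd-numbered: swap to keep winding (assumed order)
    return out


def unroll_triangle_fan(indices, restart_index=None):
    """Unroll a triangle fan into a flat TRIANGLES index list."""
    out = []
    fan = []  # indices seen since the last restart
    for idx in indices:
        if restart_index is not None and idx == restart_index:
            fan = []
            continue
        fan.append(idx)
        if len(fan) >= 3:
            out += [fan[0], fan[-2], fan[-1]]  # hub, previous, current
    return out


if __name__ == "__main__":
    print(unroll_triangle_strip([0, 1, 2, 3, 0xFFFF, 4, 5, 6], restart_index=0xFFFF))
    print(unroll_triangle_fan([0, 1, 2, 3]))
```

Note how both the parity of a strip triangle and its position in the output depend on where the previous restart index was, which is what makes the loop awkward to split across threads. Without restart, triangle i only depends on i, so each output triangle could be computed independently.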