Dual texturing notes:

Both textures must be 2D, sampled at the exact same coordinate, indexed 0...3, sampled as floating-point, and use the default LOD mode for the shader stage (computed in frag shaders).

texture2D(uTexture0, coord) + texture2D(uTexture1, coord) meets all the criteria. Basically anything fancier does not.

Bizarrely, the texture operation descriptor encodes the second staging register write. This is highly unusual for Bifrost. Model it in the IR as two staging register destinations, leave the register in the IR descriptor as zero, and fix it up in bi_pack. The IR needs a bit of surgery for multiple staging registers like that, but I typed out that patch a week ago for something else. Need to clean up that code, though.

Each texturing op in a dual texture TEXC is a strict subset of {TEXS_2D.f32, TEXS_2D.f16}. So add an optimization pass to fuse pairs of them into a single TEXC.

Realistically, fusing across a basic block boundary will hurt more shaders than it helps (increasing register pressure, doing redundant work, etc). So this can be a purely local pass.

Dual texture ops are related by their coordinates. So use an appropriate data structure (e.g. a hash table indexed by the coordinates' bi_index) to accelerate fusing dual tex ops. With these reductions, the optimization pass remains Θ(n).

Adding a skip flag to an instruction that doesn't have it is incorrect, but removing a skip flag from one that has it is correct (just slow). So set the fused TEXC's skip to the logical AND of the unfused TEXS flags. This requires the optimization pass to run after the helper invocation analysis pass.
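A rough sketch of the local fusing pass described above, to make the hash table and skip-flag handling concrete. Everything here is a hypothetical stand-in, not the real Bifrost IR: the sketch_* types and functions are made up for illustration, the coordinate is reduced to a single integer index instead of a bi_index, and the f32/f16, floating-point and LOD-mode checks are assumed to be implied by the opcode. It also assumes SSA form, so hoisting the second texture op up to the first one's position is safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified IR; not the real bi_instr. */
enum sketch_op { SKETCH_TEXS_2D, SKETCH_TEXC_DUAL, SKETCH_OTHER, SKETCH_NOP };

struct sketch_instr {
    enum sketch_op op;
    uint32_t coord;    /* SSA index of the coordinate source */
    uint8_t tex[2];    /* texture indices; tex[1] only used once fused */
    uint32_t dest[2];  /* staging destinations; dest[1] only used once fused */
    bool skip;         /* skip flag from helper invocation analysis */
};

#define SKETCH_TABLE_SIZE 64 /* power of two; handles blocks up to 32 instrs */

struct sketch_slot {
    bool used;
    uint32_t coord;
    int pending; /* index of an unfused TEXS_2D with this coord, or -1 */
};

/* Open addressing with linear probing. Keys are never deleted, so probe
 * chains stay intact; the table is at most half full, so this terminates. */
static struct sketch_slot *
sketch_lookup(struct sketch_slot *table, uint32_t coord)
{
    unsigned i = (coord * 2654435761u) & (SKETCH_TABLE_SIZE - 1);

    while (table[i].used && table[i].coord != coord)
        i = (i + 1) & (SKETCH_TABLE_SIZE - 1);

    return &table[i];
}

/* Fuse pairs of TEXS_2D sharing a coordinate within one basic block.
 * Returns the number of fused pairs. One expected O(1) lookup per instr. */
static unsigned
sketch_fuse_block(struct sketch_instr *I, unsigned n)
{
    struct sketch_slot table[SKETCH_TABLE_SIZE] = {0};
    unsigned fused = 0;

    assert(n <= SKETCH_TABLE_SIZE / 2);

    for (unsigned i = 0; i < n; ++i) {
        /* Only 2D texture ops with texture index 0...3 qualify */
        if (I[i].op != SKETCH_TEXS_2D || I[i].tex[0] > 3)
            continue;

        struct sketch_slot *s = sketch_lookup(table, I[i].coord);

        if (!s->used || s->pending < 0) {
            /* Nothing to pair with yet; remember this op */
            s->used = true;
            s->coord = I[i].coord;
            s->pending = (int)i;
            continue;
        }

        /* Fuse into the earlier op, which gains a second destination */
        struct sketch_instr *first = &I[s->pending];
        first->op = SKETCH_TEXC_DUAL;
        first->tex[1] = I[i].tex[0];
        first->dest[1] = I[i].dest[0];

        /* Dropping skip is safe (just slow); adding it is not. So AND. */
        first->skip = first->skip && I[i].skip;

        I[i].op = SKETCH_NOP;
        s->pending = -1;
        fused++;
    }

    return fused;
}
```

Calling sketch_fuse_block on a block with two TEXS_2D reading the same coordinate, one with skip set and one without, should leave a single TEXC with skip cleared and a NOP where the second op was. A real pass would remove the NOP and size the table from the block rather than using a fixed capacity.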