Dual texturing notes:

Both textures must be 2D, sampled at exactly the same coordinate, use texture
indices 0 through 3, be sampled as floating-point, and use the default LOD mode
for the shader stage (computed LOD in fragment shaders).
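A minimal sketch of that eligibility check in C, assuming a hypothetical,
simplified toy_tex record in place of the real compiler IR (every toy_* name
here is made up for illustration). It reads "indexed 0...3" as the texture
index only, and the coordinate is a plain integer standing in for bi_index.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified texture op; not the real IR. */
enum toy_dim { TOY_DIM_1D, TOY_DIM_2D, TOY_DIM_3D, TOY_DIM_CUBE };
enum toy_type { TOY_F16, TOY_F32, TOY_I32, TOY_U32 };
enum toy_lod { TOY_LOD_DEFAULT, TOY_LOD_EXPLICIT, TOY_LOD_BIAS };

struct toy_tex {
   enum toy_dim dim;
   enum toy_type type;      /* type the texture is sampled as */
   enum toy_lod lod_mode;
   unsigned texture_index;
   unsigned coord;          /* stand-in for the coordinate's bi_index */
};

static bool
toy_type_is_float(enum toy_type t)
{
   return t == TOY_F16 || t == TOY_F32;
}

/* Can this op be one half of a dual texture op on its own? */
static bool
toy_tex_dual_candidate(const struct toy_tex *t)
{
   return t->dim == TOY_DIM_2D &&
          toy_type_is_float(t->type) &&
          t->lod_mode == TOY_LOD_DEFAULT &&
          t->texture_index <= 3;
}

/* Can these two ops be fused? Both must qualify and share a coordinate. */
static bool
toy_tex_can_dual(const struct toy_tex *a, const struct toy_tex *b)
{
   assert(a != b && "an op cannot be fused with itself");
   return toy_tex_dual_candidate(a) && toy_tex_dual_candidate(b) &&
          a->coord == b->coord;
}
```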

texture2D(uTexture0, coord) + texture2D(uTexture1, coord) meets all the
criteria; basically anything fancier does not.

Bizarrely, the texture operation descriptor encodes the register for the second
staging write. This is highly unusual for Bifrost. Model this in the IR as two
staging register destinations, leave the register field in the IR's descriptor
as zero, and fix it up in bi_pack. The IR needs a bit of surgery to support
multiple staging destinations like that, but I typed out that patch a week ago
for something else. Need to clean up that code, though.
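A rough sketch of the "leave it zero in the IR, fix it up at pack time" idea,
in C. The toy_dual_tex struct, toy_pack_descriptor, and especially the
descriptor bit position and width (TOY_DESC_REG2_SHIFT, a 6-bit field) are
hypothetical placeholders, not the real Bifrost encoding or the real bi_pack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dual texture op with two staging destinations. */
struct toy_dual_tex {
   unsigned dest[2];     /* staging register for each texture's result */
   uint32_t descriptor;  /* register field is left as zero in the IR */
};

/* Hypothetical field layout, for illustration only. */
#define TOY_DESC_REG2_SHIFT 24
#define TOY_DESC_REG2_MASK  (0x3Fu << TOY_DESC_REG2_SHIFT)

/* Pack-time fixup: once registers are allocated, OR the second staging
 * register into the descriptor. The IR copy is never modified. */
static uint32_t
toy_pack_descriptor(const struct toy_dual_tex *I)
{
   assert((I->descriptor & TOY_DESC_REG2_MASK) == 0 &&
          "IR should leave the register field zero");
   assert(I->dest[1] <= 0x3Fu && "register does not fit the field");
   return I->descriptor | ((uint32_t)I->dest[1] << TOY_DESC_REG2_SHIFT);
}
```

Keeping the field zero in the IR means every earlier pass can treat the
descriptor as an opaque constant; only packing needs to know about the quirk.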

Every texturing op that can appear in a dual texture TEXC is also expressible as
TEXS_2D.f32 or TEXS_2D.f16 (the dual-texturable ops are a strict subset of
those). So add an optimization pass that fuses pairs of them. Realistically,
fusing across a basic block boundary would hurt more shaders than it helps
(increased register pressure, redundant work, etc.), so this can be a purely
local pass.
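A minimal sketch of the fusion step itself, assuming hypothetical toy_texs (one
TEXS_2D) and toy_texc_dual (the fused TEXC) records. It only copies each half's
operands into the dual op and marks the originals dead. Whether the two halves
may mix .f16 and .f32 is not settled by these notes, so the sketch simply
records each half's type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single texture op, standing in for TEXS_2D.f32/.f16. */
struct toy_texs {
   unsigned dest;
   unsigned coord;          /* stand-in for the coordinate's bi_index */
   unsigned texture_index;  /* 0..3 for dual texturing */
   bool f16;                /* TEXS_2D.f16 if true, else TEXS_2D.f32 */
   bool deleted;
};

/* Hypothetical fused dual texture op (TEXC). */
struct toy_texc_dual {
   unsigned dest[2];
   unsigned coord;
   unsigned texture_index[2];
   bool f16[2];
};

/* Build the dual op from two eligible ops and retire both originals. */
static struct toy_texc_dual
toy_fuse(struct toy_texs *a, struct toy_texs *b)
{
   assert(a->coord == b->coord && "dual texturing needs a shared coordinate");
   assert(a->texture_index <= 3 && b->texture_index <= 3);

   struct toy_texc_dual d = {
      .dest = { a->dest, b->dest },
      .coord = a->coord,
      .texture_index = { a->texture_index, b->texture_index },
      .f16 = { a->f16, b->f16 },
   };

   a->deleted = b->deleted = true;  /* the fused op replaces both */
   return d;
}
```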

Dual texture ops are related by their coordinates, so use an appropriate data
structure (e.g. a hash table keyed by the coordinate's bi_index) to find fusion
partners quickly. With that, the optimization pass stays linear: expected Θ(n)
in the number of instructions.
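One way the hash table could look, as a sketch in C: a small open-addressing
table keyed by the coordinate, rebuilt for every block so nothing is ever fused
across a block boundary. The toy_op array stands in for a block's instructions,
the coordinate is a plain integer standing in for bi_index, and the fixed table
size (which caps the candidates per block) is an assumption made to keep the
example short. Each instruction does one expected-O(1) lookup, so the walk is
linear in the size of the block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one instruction in a basic block. */
struct toy_op {
   uint32_t coord;  /* stand-in for the coordinate's bi_index */
   bool candidate;  /* already passed the dual texturing criteria */
   int partner;     /* output: op it was paired with, or -1 */
};

/* Must exceed the number of candidates in any one block. */
#define TOY_TABLE_BITS 6
#define TOY_TABLE_SIZE (1u << TOY_TABLE_BITS)

struct toy_slot {
   bool used;
   uint32_t coord;
   int pending;  /* unpaired op with this coordinate, or -1 */
};

static unsigned
toy_hash(uint32_t coord)
{
   /* Fibonacci hashing: keep the top bits of a multiplicative hash. */
   return (uint32_t)(coord * 2654435761u) >> (32 - TOY_TABLE_BITS);
}

/* Pair up candidates within one block that share a coordinate.
 * Returns the number of pairs formed. */
static unsigned
toy_pair_block(struct toy_op *ops, size_t count)
{
   struct toy_slot table[TOY_TABLE_SIZE];
   memset(table, 0, sizeof(table));
   unsigned pairs = 0, candidates = 0;

   for (size_t i = 0; i < count; ++i) {
      ops[i].partner = -1;
      if (!ops[i].candidate)
         continue;

      ++candidates;
      assert(candidates < TOY_TABLE_SIZE && "block has too many candidates");

      /* Linear probing: stop at this coordinate's slot or an empty one. */
      unsigned h = toy_hash(ops[i].coord);
      while (table[h].used && table[h].coord != ops[i].coord)
         h = (h + 1) & (TOY_TABLE_SIZE - 1);

      struct toy_slot *s = &table[h];
      if (s->used && s->pending >= 0) {
         /* An earlier unpaired op shares the coordinate: pair them. */
         ops[i].partner = s->pending;
         ops[s->pending].partner = (int)i;
         s->pending = -1;
         ++pairs;
      } else {
         /* No unpaired op on this coordinate yet: remember this one. */
         s->used = true;
         s->coord = ops[i].coord;
         s->pending = (int)i;
      }
   }
   return pairs;
}
```

A third op on an already-paired coordinate becomes the new pending entry, so it
can still pair with a later op that shares the same coordinate.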

Adding a skip flag to an instruction that doesn't have it is incorrect, but
removing a skip flag from one that has it is correct (just slower). So set the
fused TEXC's skip flag to the logical AND of the unfused TEXS instructions'
flags. This requires the optimization pass to run after the helper invocation
analysis pass.
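The skip rule as a small C sketch. toy_shader and its helper_analysis_done flag
are hypothetical bookkeeping, used here only to make the pass-ordering
requirement explicit as an assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-shader state; only the ordering flag matters here. */
struct toy_shader {
   bool helper_analysis_done;
};

/* Skip flags only exist once helper invocation analysis has run. Dropping
 * a skip is safe but slower; adding one is not, so AND the two inputs. */
static bool
toy_fused_skip(const struct toy_shader *s, bool skip_a, bool skip_b)
{
   assert(s->helper_analysis_done && "run after helper invocation analysis");
   return skip_a && skip_b;
}
```

The fused op only skips when both originals did, so it never gains a skip flag
that either unfused TEXS lacked.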