Compiler
---------

RA:

    [x] Register preloading for better RA
    [x] Enable 2x occupancy for <32 regs (v7+)
    [x] Message preload LD_VAR_IMM and VARTEX (v7+)

Scheduler:

    [x] Legalize uniform loads/constants
    [x] Wire dependency slot flags from RSD in
    [x] Add packing support for all the formats
    [x] Generate appropriate DAG
    [x] Construct clauses, selecting tuples per design paper
    [x] Emit *IADD/*ISUB
    [x] Optimize delays somehow
    [ ] Optimize out staging register barriers via round robin(?)
    [ ] Optimize scoreboard indices

Opt passes:

    [x] Legalize swizzles (in order to enable fp16 with vectorizer)
    [x] Optimize UBO access to FAU
    [x] CSE to clean up texture lowering

Modifiers:

    [x] Fuse abs/neg into floating point ops
    [x] Fuse clamp into floating point ops
    [ ] Fuse type converts into widen/extend
    [-] Fuse not into bitwise/shifts ops (sources and dest, via De Morgan's)
    [-] Fuse shifts together with bitwise ops
    [x] Fuse b2i into comparisons via result type

Others:

    [x] TEXS for cube maps
    [x] Wire in nir_opt_large_constants (constants to either UBO loads or PC-relative LOADs)
    [x] Wire in nir_lower_scratch (arrays to TLS)
    [x] Omit ATEST in blit shaders
    [x] Native fsin/fcos
    [x] Fuse LD_VAR_IMM with TEXS -> VARTEX
    [x] Dual texturing
    [x] Data flow analysis for helper invocations (skip and tdd)
    [ ] Data flow analysis for varyings (sample and update)

Data structures
----------------

    [x] Proper early-z/fpk handling
    [x] 2*src*dst handling for fixed-function blend
    [x] Report barriers correctly
    [x] Overdraw flags (v7)
    [ ] Allow primitive reorder (v7)
    [x] Pre-frame shaders for faster blits
    [ ] Empty tile handling (v7)