Metaprogramming NES emulator with tosh

25 Jan 2017

Over the summer, I wrote an NES emulator in Scratch. Well, really, it was only from Scratch; the code itself was written in tosh by Tim Radvan. tosh is a text-based front-end to Scratch, which lets me manage large projects with vim, git, and – wait for it – metaprogramming! Metaprogramming, in its essence, means generating tosh code from other bits of tosh code to construct large programs with minimum effort. In ScratchNES, I used metaprogramming to generate the CPU emulator for the NES. It’s not magic!

At the heart of the NES is the Ricoh 2A03, featuring 6502 missing decimial mode and some support code. The 6502 in turn was a popular 8-bit CPU governed by a simple instruction set, with a variety of instructions and a handful of addressing modes. The Virtual 6502 project maintains a nice reference for the instruction set which you might like to check out. In any event, there are a handful of ways of interpreting this instruction set. The first (and perhaps the most intuitive) way is to notice the patterns in opcode encoding, and intelligently decode instructions like the original PLA would. While this is a nice idea, it’s rather complex and slow in Scratch. Sorry, tried that last time!

Another possible technique, a technique which is rather popular for emulators in general, would be to manually decode opcodes, starting from a massive, handwritten if-else chain. Sorry, unrolling that much code by hand is so 2016. This leaves us the masochistic third choice – generate code!

From here, the path is rather clear. We write out stubs for each addressing modes and each instruction, in addition to stub-generators for increased indirection and insanity. From here, we just need to write a program to stitch the code together. But how do we at least tell the code-generator which opcodes correspond to which addressing modes and instructions? Even though it would probably take ten minutes, why waste another bad idea?

Remember that tabular instruction set reference I linked above? It follows a rather regular structure. We can write a simple parser for that file, emitting the instruction table for the metaprogrammer. The final question is how to map efficiently from opcodes to tosh code at runtime.j Normally, jump tables are preferred here, but Scratch, unlike its sister Snap!, unfortunately lacks first-class functions. The naive solution would be an if-else chain. For four opcodes, this might look something like:

if x = 0 then
    op ABC
else if x = 1 then
    op DEF
else if x = 2 then
    op GHI
else if x = 3 then
    op JKL
end

Unfortunately, this solution is linear to the number of opcodes. That means, it would take 256 comparisons to execute one BRK instruction! We can do better use a binary search. This way, it will only take 8 comparisons:

if x > 1 then
    if x > 2 then
        op JKL
    else
        op GHI
    end
else
    if x > 0 then
        op DEF
    else
        op ABC
    end
end

Admittedly, the code isn’t the prettiest, but that’s alright, since we can using yet another bit of meta-programming to generate that for us too! With this final insight and some messy glue code, we have a mostly-performant, mostly-functional CPU emulator!

That being said, I did promise an NES emulator, not a 6502 emulator or an NES CPU emulator or a Ricoh 2A03 emulator. There are a number of peripherals: the PPU for graphics, the APU for sound, joystick logic, and memory mappers. I won’t discuss the details of these implementations here since they are fairly standard as far as NES emulators go. If this is interesting to you, see the NESDev wiki. As long as the CPU emulator maintains a cycle count internally – which is relatively easy to add, since cycles are listed in the reference – these modules are separate. With a lot of sweat later, you’ll converge on a working emulator!

Where to now? It turns out that for all of the fun you can have emulating a 6502, the CPU isn’t the bottleneck in the emulator. The real issue is the PPU (graphics), since Scratch is slow. Thanks to pen quirks, even phosphorus, an optimising Scratch transpiler, renders slowly. Nevertheless, this does demonstrate metaprogramming with Scratch is possible, if not particularly useful! Going forward, a natural optimization is static recompilation, where a particular game is compiled directly as opposed to a generic emulator. And since our CPU core is encapsulated so nicely, separating instructions and addressing modes from the general logic, much of the infrastructure is already there as mentioned by Graham Toal. There’s still quite a few difficulties posed, especially with a fast PPU – but if you’re using static recompilation, perhaps CHR-ROM could turn into costumes and sprites into clones? Either way, it would certainly be a fun project for the industrious reader!

To a friend of mine – you know who you are – tosh! Tosh tosh, tosh tosh tosh, tosh :-)

Back to home