Retroassembler to C

Yesterday I drove from California's Bay Area to the Central Coast, a four-hour drive, to pick up some vintage HP equipment so that I could save on shipping and having another damn crate in my garage that I'd have to cut up and stick in the garbage a piece at a time. Anyway, during those eight hours, while listening to the [Retro Computing Roundtable] podcast, I had a stray thought.

Probably not a powerful enough stray thought to do this, though:

Honestly, I need another project like I need a hole in the head.

Honestly, I need another project like I need a hole in the head.

The thought went like this. Emulators exist for retro processors such as the [6502] and [Z80]. Actually, emulators exist for whole systems: [MAME] (Multi Arcade Machine Emulator / Multi Emulator Super System), for one. [Dosbox] runs [on the Internet Archive], so you can play old MS-DOS games in your browser.

Anyway, these emulators faithfully reproduce the processor down to the instruction level. Now, when I read an assembly listing, sometimes I translate the instructions to C in order to help me understand what's going on. What if you could do that for the whole program and then compile the resulting C code?

Before continuing, anyone interested should check out Jamulator, a project which takes 6502 binaries, generates assembly language from it, and uses that as the input language to LLVM, which then compiles it for a native processor. It's brilliant.

There's a whole host of problems with this idea, though. For one emulators are plenty fast enough. Also, there's the whole flow analysis thing that [IDA] does so well but still needs human input for. And, LLVM is pretty tough to develop for (I've tried). But it's getting easier.

Consider the task of adding two 16-bit numbers on an 8-bit processor. On a 6502, it would go something like this:

LDA SRC1      ; A <- low byte of SRC1
ADD SRC2      ; A <- A + low byte of SRC2
STA DEST      ; low byte of DEST <- A
LDA SRC1+1    ; A <- high byte of SRC1
ADC SRC2+1    ; A <- A + high byte of SRC2 + carry
STA DEST+1    ; high byte of DEST <- A

Now, a naive translation to C would result in something like this:

#include <stdint.h>

void add16(uint8_t* src1, uint8_t* src2, uint8_t* dest)
{
    uint8_t a = *src1;
    a += *src2;
    uint8_t c = a < *src1;
    *dest = a;
    a = *(src1+1);
    a += *(src2+1) + c;
    *(dest+1) = a;
}

Using the Compiler Explorer, I was able to compile this into X86 assembly using clang (at optimization level 3):

add16: # @add16
  mov al, byte ptr [rsi]
  add al, byte ptr [rdi]
  mov byte ptr [rdx], al
  mov al, byte ptr [rsi + 1]
  adc al, byte ptr [rdi + 1]
  mov byte ptr [rdx + 1], al
  ret

That's as close as we're going to get to optimized code. Clang recognized the carry trick.

Note that clang didn't go the extra step of recognizing this as a 16-bit add. For one, this would be an unaligned add. For another, compilers generally excel in optimizing code from the top down, not from the bottom up like we're trying to do. Nevertheless, clang did a great job here.

Not quite with gcc.

add16:
  movzx eax, BYTE PTR [rdi]
  add al, BYTE PTR [rsi]
  setc cl
  mov BYTE PTR [rdx], al
  movzx eax, BYTE PTR [rsi+1]
  add al, BYTE PTR [rdi+1]
  add eax, ecx
  mov BYTE PTR [rdx+1], al
  ret

Here gcc did recognize the carry (hence the SETC instruction), but failed to take advantage of that in the later add, instead having to add twice. gcc did not recognize that we only used one bit in the uint8_t c, and so was forced to add the entire carry uint8_t.

I even tried hinting that c was a single bit:

#include <stdint.h>

typedef struct flags
{
    unsigned c:1;
} flags;

void add16(uint8_t* src1, uint8_t* src2, uint8_t* dest)
{
    flags f;

    uint8_t a = *src1;
    a += *src2;
    f.c = a < *src1;
    *dest = a;
    a = *(src1+1);
    a += *(src2+1) + f.c;
    *(dest+1) = a;
}

Nope, same output. clang still generated the correct output. Note that clang 4.0.1 did not fare very well, and it was only by clang 5.0.0 that the output became nicely optimized.

Anyway, that's all I wanted to say. I don't intend this to go any further, but it sure would make an interesting project for someone else!