Retroassembler to C

Yesterday I drove from California's Bay Area to the Central Coast, a four-hour drive each way, to pick up some vintage HP equipment. That saved me the shipping cost, and it spared me another damn crate in my garage that I'd have to cut up and stick in the garbage a piece at a time. Anyway, during those eight hours, while listening to the [Retro Computing Roundtable] podcast, I had a stray thought.

Probably not a powerful enough stray thought to do this, though:

Honestly, I need another project like I need a hole in the head.

The thought went like this. Emulators exist for retro processors such as the [6502] and [Z80]. Actually, emulators exist for whole systems: [MAME] (Multiple Arcade Machine Emulator, which has since absorbed MESS, the Multi Emulator Super System), for one. [DOSBox] runs [on the Internet Archive], so you can play old MS-DOS games in your browser.

Anyway, these emulators faithfully reproduce the processor down to the instruction level. Now, when I read an assembly listing, sometimes I translate the instructions to C in order to help me understand what's going on. What if you could do that for the whole program and then compile the resulting C code?
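To make the idea concrete, here is a minimal sketch of what an instruction-by-instruction translation might look like for a tiny 6502 loop. The `cpu` struct, the `clear_block` name, and the loop itself are made up for illustration and are not taken from any existing tool. Only the zero flag is modeled; a real translator would also have to track N, C, V and the rest.

```c
#include <stdint.h>

/* Hypothetical machine state, invented for this sketch. */
typedef struct cpu {
    uint8_t a, x;         /* accumulator and X index register */
    uint8_t z;            /* zero flag (the only flag modeled here) */
    uint8_t mem[65536];   /* flat 64 KiB address space */
} cpu;

/*
 * Translation of:
 *         LDA #$00
 *         LDX #$10
 * loop:   STA $0200,X
 *         DEX
 *         BNE loop
 * Each 6502 instruction becomes one or two C statements, and the
 * branch becomes a goto on the modeled flag.
 */
void clear_block(cpu *s)
{
    s->a = 0x00;                               /* LDA #$00 */
    s->z = (s->a == 0);
    s->x = 0x10;                               /* LDX #$10 */
    s->z = (s->x == 0);
loop:
    s->mem[(uint16_t)(0x0200 + s->x)] = s->a;  /* STA $0200,X */
    s->x--;                                    /* DEX */
    s->z = (s->x == 0);
    if (!s->z) goto loop;                      /* BNE loop */
}
```

The hope would be that a compiler sees through the flag bookkeeping and turns this into an ordinary counted loop, but that is exactly the part that would need to be checked.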

Before continuing, anyone interested should check out Jamulator, a project which takes 6502 binaries, generates assembly language from them, and uses that as the input language to LLVM, which then compiles it for a native processor. It's brilliant.

There's a whole host of problems with this idea, though. For one, emulators are plenty fast enough. Also, there's the whole flow analysis thing that [IDA] does so well but still needs human input for. And LLVM is pretty tough to develop for (I've tried), although it's getting easier.

Consider the task of adding two 16-bit numbers on an 8-bit processor. On a 6502, it would go something like this:

CLC           ; clear carry before the first add
LDA SRC1      ; A <- low byte of SRC1
ADC SRC2      ; A <- A + low byte of SRC2 + carry (0)
STA DEST      ; low byte of DEST <- A
LDA SRC1+1    ; A <- high byte of SRC1
ADC SRC2+1    ; A <- A + high byte of SRC2 + carry
STA DEST+1    ; high byte of DEST <- A

Now, a naive translation to C would result in something like this:

#include <stdint.h>

void add16(uint8_t* src1, uint8_t* src2, uint8_t* dest)
{
    uint8_t a = *src1;
    a += *src2;
    uint8_t c = a < *src1;
    *dest = a;
    a = *(src1+1);
    a += *(src2+1) + c;
    *(dest+1) = a;
}

Using the Compiler Explorer, I was able to compile this into x86-64 assembly using clang (at optimization level 3):

add16: # @add16
  mov al, byte ptr [rsi]
  add al, byte ptr [rdi]
  mov byte ptr [rdx], al
  mov al, byte ptr [rsi + 1]
  adc al, byte ptr [rdi + 1]
  mov byte ptr [rdx + 1], al
  ret

That's as close as we're going to get to optimized code. Clang recognized the carry trick.

Note that clang didn't go the extra step of recognizing this as a 16-bit add. For one, this would be an unaligned add. For another, compilers generally excel in optimizing code from the top down, not from the bottom up like we're trying to do. Nevertheless, clang did a great job here.
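For comparison, this is roughly what "recognizing it as a 16-bit add" would mean if written out by hand. It is only a sketch: the name `add16_wide` is made up, it assumes the same little-endian byte layout as the 6502 code, and I have not checked what either compiler emits for it.

```c
#include <stdint.h>

/* Same contract as the byte-wise add16: operands and result are
 * little-endian 16-bit values at possibly unaligned addresses. */
void add16_wide(const uint8_t *src1, const uint8_t *src2, uint8_t *dest)
{
    /* Assemble each operand from its two bytes, low byte first. */
    uint16_t x = (uint16_t)(src1[0] | (src1[1] << 8));
    uint16_t y = (uint16_t)(src2[0] | (src2[1] << 8));

    /* One 16-bit add; any carry out of bit 15 is discarded,
     * just as the 6502 sequence leaves it unused in the C flag. */
    uint16_t sum = (uint16_t)(x + y);

    dest[0] = (uint8_t)(sum & 0xFF);   /* low byte */
    dest[1] = (uint8_t)(sum >> 8);     /* high byte */
}
```

Assembling the bytes by shifting avoids any unaligned 16-bit load in the C source, so the alignment question is left entirely to the compiler.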

gcc did not do quite as well:

add16:
  movzx eax, BYTE PTR [rdi]
  add al, BYTE PTR [rsi]
  setc cl
  mov BYTE PTR [rdx], al
  movzx eax, BYTE PTR [rsi+1]
  add al, BYTE PTR [rdi+1]
  add eax, ecx
  mov BYTE PTR [rdx+1], al
  ret

Here gcc did recognize the carry (hence the SETC instruction), but failed to take advantage of that in the later add, instead having to add twice. gcc did not recognize that we only used one bit in the uint8_t c, and so was forced to add the entire carry uint8_t.

I even tried hinting that c was a single bit:

#include <stdint.h>

typedef struct flags
{
    unsigned c:1;
} flags;

void add16(uint8_t* src1, uint8_t* src2, uint8_t* dest)
{
    flags f;

    uint8_t a = *src1;
    a += *src2;
    f.c = a < *src1;
    *dest = a;
    a = *(src1+1);
    a += *(src2+1) + f.c;
    *(dest+1) = a;
}

Nope, same output from gcc. clang still generated the optimal output. Note that clang 4.0.1 did not fare very well, and it was only by clang 5.0.0 that the output became nicely optimized.
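Another way to express the carry, common in emulator code, is to do each byte add in a wider type and take bit 8 of the result as the carry. This is a sketch of that variant under the hypothetical name `add16_bit8`; I have not run it through Compiler Explorer, so I make no claim about whether either compiler turns it into an ADC.

```c
#include <stdint.h>

void add16_bit8(const uint8_t *src1, const uint8_t *src2, uint8_t *dest)
{
    /* Low bytes: the sum fits in 9 bits, so bit 8 is the carry out. */
    unsigned sum = (unsigned)src1[0] + src2[0];
    unsigned c = sum >> 8;                /* 0 or 1 */
    dest[0] = (uint8_t)sum;               /* keep the low 8 bits */

    /* High bytes plus carry in; the carry out is discarded. */
    sum = (unsigned)src1[1] + src2[1] + c;
    dest[1] = (uint8_t)sum;
}
```

Here the carry is derived from the sum itself rather than from a comparison, which is a different pattern for the optimizer to match.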

Anyway, that's all I wanted to say. I don't intend this to go any further, but it sure would make an interesting project for someone else!

Reverse-engineering the 1977 Unisonic 21 calculator/game (part 5.1)

Here's the next part of the segment driver circuit.

segment_sch_driver_left2.png

The output on the left goes to the latch-type output circuit from the previous post. On the right we have three inputs to this circuit. IN_A and X2 are common to all segments. IN_21C is an input specific to the segment at pin 21.

We can already guess that C3 is likely a bootstrap capacitor, so it plays no logical role. It is also likely that C2 is a boosting capacitor for Q1. That means that X2 is the boosting signal.

Since C2 and C3 are boosting capacitors, we can ignore their effect in the steady state and concentrate on how the thing works logically.

X2 gates IN_21C at Q6. So the gate of Q5 is low only when Q6 is on and IN_21C is low.

Now, when Q5 is on, IN_A gets passed through to the gate of Q4. So when IN_A is low, that turns Q4 on, pulling the output low.

As we saw in part 5, pulling this output low will cause the output pin to latch high. The output can only go low when it is reset by that CE/CG combination.

So we've deduced that the output pin gets latched high only when IN_A goes low, IN_21C goes low, and X2 goes low.
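The deduction above can be written as a purely logical sketch. It ignores every analog effect (thresholds, boosting, timing), and the function name and the convention that `true` means "this signal is at its low level" are my own choices for illustration.

```c
#include <stdbool.h>

/* Each argument is true when that signal is at its low level.
 * Returns true when Q4 pulls the latch input low, which is the
 * condition that latches the output pin high. */
bool segment_set_request(bool in_a_low, bool in_21c_low, bool x2_low)
{
    /* Q6 passes IN_21C only while X2 is low, so the gate of Q5
     * goes low only when both are low. */
    bool q5_on = x2_low && in_21c_low;

    /* With Q5 on, IN_A reaches the gate of Q4; a low IN_A turns Q4 on. */
    bool q4_on = q5_on && in_a_low;

    return q4_on;
}
```

In other words, in this model the stage is a three-input AND of the "is low" conditions.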

Now, let's try simulating this. For this simulation, I am going to use the [MIC94030], because it is pretty much the only four-terminal PMOS left in existence, and I plan (hopefully) to use it for a dis-integrated version of the Unisonic 21.

segment_sch_driver_left2_sim.png

It's a little busy, but I want to call your attention to a few things. First, I removed all the capacitors, and added pull-up resistors at all the gates.

Second, even though there is no existing SPICE model for this PMOS, I derived three parameters from the datasheet (Vt0 from the nominal threshold, and kp and lambda from the saturation graphs). So I have no idea if the simulation is accurate.
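For reference, Vt0, kp and lambda are the parameters of the simplest SPICE MOSFET model (level 1, Shichman-Hodges), which I am assuming is the kind of model meant here. The sketch below shows how they enter the drain current equation. It works in magnitudes so the same code reads naturally for a PMOS, it folds W/L into `kp`, and it simply returns zero for a non-positive source-drain voltage. The struct and function names are made up, and no real MIC94030 values are implied.

```c
/* Level-1 parameters; kp here already includes the W/L ratio
 * (a simplification made for this sketch). */
typedef struct mos1 {
    double vto;     /* threshold voltage magnitude, volts */
    double kp;      /* transconductance parameter, A/V^2 */
    double lambda;  /* channel-length modulation, 1/V */
} mos1;

/* Drain current magnitude for source-gate voltage vsg and
 * source-drain voltage vsd (both as positive magnitudes). */
double mos1_id(const mos1 *m, double vsg, double vsd)
{
    double vov = vsg - m->vto;              /* overdrive voltage */
    if (vov <= 0.0 || vsd <= 0.0)
        return 0.0;                         /* cutoff (or not modeled) */

    double clm = 1.0 + m->lambda * vsd;     /* channel-length modulation */
    if (vsd < vov)                          /* triode region */
        return m->kp * (vov * vsd - 0.5 * vsd * vsd) * clm;
    return 0.5 * m->kp * vov * vov * clm;   /* saturation region */
}
```

Reading the threshold off the datasheet gives `vto`, and the saturation curves constrain `kp` (their spacing) and `lambda` (their slope), which matches the three quantities derived above.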

I've labeled successive gates in the circuit as nodes g1, g2, g3 and g4. Note that g1 goes low whenever X2 and IN_21C are low, and that g2, g3, and g4 go low when all three inputs are low, as predicted. The output goes high when all three inputs are low.

What is important is the voltage at each gate when it is low. Going from g1 to g4, we appear to be losing a volt after every transistor. This is expected: without being boosted, you lose a threshold. This was explored in [the previous post about bootstraps].

It seems this loss of four volts is not enough to ruin the output of the last transistor which, after all, would have to have a gate voltage below 11v to turn on. Still, it would be nice to fix the issue. We can do that by adding bootstrap capacitors.

Think of bootstrap capacitors as adding feedback from the output back to the gate. If the output goes a little low, that just makes the gate go lower.

Here's one bootstrap capacitor. I set its capacitance so that you can just see the sag in the output.

segment_sch_driver_left2_sim_bootstrap1.png

We can see now that thanks to the bootstrap capacitor, g3 is able to reach a lower voltage than before. This gets translated to a lower voltage at g4. The capacitor, however, is not large enough to maintain its voltage over the period of the pulse.

Here I've increased the bootstrap capacitance from 1n to 10n:

segment_sch_driver_left2_sim_bootstrap10n.png

Interestingly, adding a capacitor across the transistor M4 screws things up:

segment_sch_driver_left2_sim_bootstrap10n_screwup.png

It seems that this capacitor is causing trouble. But adding one across M2 works fine, reducing the low voltage at g4 as expected:

segment_sch_driver_left2_sim_bootstrap10n_noscrewup.png

I suspect bootstrap capacitors only work if one terminal of the transistor is grounded.

Anyway, this is another apparently successful simulation, meaning that it can probably be made physical.

Reverse-engineering the 1977 Unisonic 21 calculator/game (part 5)

Previously, I found some clock signals (X1, Y1, X2, Y2).

clock2_simplified_pulses.jpg

These signals get spread throughout the chip to control the timing of various bits and pieces. One piece they go to is labeled in the main patent as "7-Segment Decode Logic".

block-diagram.jpg

The decode block has three inputs (aside from the assumed clock inputs): accumulator, C flip/flop, zero-suppression logic. The accumulator has four bits, and the C flip/flop is described as being able to be set, reset, tested, or loaded from the carry output of the adder.

The decode logic is described in this way: "The 10 data or segment output signals are generated at the output of a holding flip-flop in the segment decode logic shown in the lower right-hand portion of Fig. 3. The holding flip-flop for segment signals can be loaded through 7 segment display logic driven by the accumulator, or by dedicated bits when special symbols are required. The holding flip-flop for segment signal SEGP is loaded from the C flip-flop."

"SEGP" is likely the dot in each 7-segment display. The description also talks about 10 signals, rather than 7 (or 8 including the dot). This is because in the description of the overall block diagram, there are "up to 10 data outputs with options for discrete or seven-segment decoding logic, designated SEG0 through SEG9".

So, I started by working my way backwards from a known segment pin on the chip. 

segment_driver_left.jpg
segment_sch_driver_left.png

Please note that in the actual chip, VSS is connected to ground and VDD is connected to -14.6v. Because this confuses me, I consider the lower voltage as ground, and the higher voltage as VDD. Essentially I raised all voltages by 14.6v and relabeled the power connections.

As an aside, I noted that the PPS-4 datasheet states that the PPS-4 uses negative logic: a logic 1 is the most negative voltage, while a logic 0 is the most positive voltage.

First, I labeled the pin "SEG_21" because it is pin 21. I don't know what order the segments are in, so I just labeled them according to their pin. There are four signals common to all segments: CE, CG, IN_A, and IN_B. I don't know what they do yet, so the names are a bit random. There is also one signal coming from what appears to be the actual decode logic. I labeled this signal IN_21C.

Driving the pin is a large open-drain FET inverter.

Q3's gate is permanently connected to ground, which means it is on all the time. It seems that whatever C1 is, it is connected to the gate of the output FET. The signals CE and CG appear to manipulate the state of C1. For example, with CG and CE both low, that would charge up capacitor C1 to VDD. If CE were high, though, it would discharge C1.

Let's simulate that:

segment_sim_driver_left.jpg

The result is not what I expected. The first time CG goes low, we do expect the capacitor to discharge when CE is high. Then, CG goes high, which turns M1 off. At that point, I wouldn't expect CE to have any effect. But it does, because of the unfortunate substrate connection of M1. The body diode of M1 immediately conducts when CE goes low.

Luckily, LTspice has a four-terminal PMOS. Let's use that instead.

segment_sim_driver_left_4pmos.jpg

This makes much more sense. Now when CG is low, the capacitor charges or discharges based on CE. When CG is high, the capacitor keeps its state. And the state of C1 is what comes out of M3, which would be the segment pin.

I made C1 relatively large on purpose so I could see it charge and discharge. Making it smaller just makes it charge and discharge faster.

Altogether, it looks like this section of the circuit is some kind of memory. But there is a problem. Well, two problems. The only four-terminal PMOS I know of is the MIC94030/31/50/51. So if that goes out of production, this circuit becomes nonviable.

But the main problem is the gate of M3. It can be pulled down by other FETs. What happens then?

Well, certainly M3 will be forced on, so the output state at the pin will be high. But I'm more concerned with what happens to capacitor C1. If the gate of M3 is pulled low, capacitor C1 must charge up. Then, if the gate of M3 is released, capacitor C1 will maintain the output state.

segment_sim_driver_left_4pmos_pulse.jpg

While the gate of M3 is pulled low, you don't want to turn M1 on, because then if CE is high, you would short through M1 and M2. Hopefully that doesn't happen.

So in the end, we're left with some kind of latching output. We can set the output high by connecting the gate of M3 temporarily to ground. We can clear this state by setting CE high and then pulsing CG low. We can also set the state by setting CE low and then pulsing CG low.
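As a behavioral summary, here is a minimal sketch of that latch as a one-bit state machine. It models only the logic described above, not the charge on C1, and the type and function names are hypothetical. The case where the gate of M3 is pulled low while CG is low and CE is high (the potential short through M1 and M2) is not modeled; the sketch just lets the pull-down win.

```c
#include <stdbool.h>

/* One segment output: true means the pin is latched high. */
typedef struct seg_latch {
    bool out_high;
} seg_latch;

/* Advance the latch by one step.
 *   ce_high         : level of CE
 *   cg_low          : true while CG is pulsed low
 *   gate_pulled_low : true while another FET pulls the gate of M3 low */
void seg_latch_step(seg_latch *l, bool ce_high, bool cg_low,
                    bool gate_pulled_low)
{
    if (gate_pulled_low) {
        l->out_high = true;         /* M3 forced on, C1 charges */
    } else if (cg_low) {
        l->out_high = !ce_high;     /* CE low sets, CE high clears */
    }
    /* Otherwise C1 holds its charge and the output keeps its state. */
}
```

A typical sequence would be: pull the gate low to set the output, release it and see the state hold, then clear it with CE high and a low pulse on CG.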