Documenting the ToupCam wireless protocol

[ToupTek] makes some nice microscope cameras. For my integrated circuit imaging microscope, I wanted to move away from my current Canon DSLR, which has already survived a few tens of thousands of mechanical shutter actuations, so I'd like a cheaper backup.

Canon Rebel T3i mounted on my Olympus trinocular BHMJ metallurgical microscope.

I used a generic [NDPL-2 (2x) adapter]. It was quite loose and didn't even come with an O-ring, so I just hot-glued the thing. Even then it was a little wobbly so I put some paper shims in there with the glue. Surprisingly, the images come out almost, but not quite, perpendicular to the light path.

AmScope has [a wireless camera] which seems to perform well. A little research showed that AmScope basically sells rebranded cameras from [ToupTek]. Even AmScope's camera software and microscope camera adapters are just rebranded.

AmScope's HD205-WU camera. It is a rebranded [ToupTek XCAM1080PHB]. You also need the adapter, [AmScope RU050], or [ToupTek FMA050].

Now, the camera doesn't have a convenient logic level trigger input like the DSLR does, so I need some way of triggering the camera. The camera, which I'll start calling ToupCam, has a WiFi interface, a USB interface, an HDMI interface, and an SD card slot.

ToupCam mounted on an AmScope inspection microscope, showing the HDMI output. Yes, that's a mouse cursor in the top part of the "E".

The USB interface is, in fact, just a USB port. You can either plug in a mouse or the included WiFi USB dongle. With the mouse, you can move the cursor around and pull up the native interface on the HDMI monitor. With the WiFi dongle attached, the camera presents itself as an Access Point.

If you connect your computer's WiFi to that access point, you can then run ToupTek's ToupView software (which AmScope rebrands as AmScope), and it will find the camera. You can then control the camera from the software.

There's just one problem. If your computer is connected to the camera's WiFi AP, you can't connect to the Internet. Maybe if you had two WiFi adapters you could do it, but then you'd have to be sure to route Internet traffic to your Internet-connected adapter.

I had no Internet access when I took this screenshot.

So the camera sends live video, and can also send a "snap" (i.e. higher-resolution image). You can also send it commands changing various controls such as exposure.

Now, ideally, I'd like to connect to this camera using the BeagleBone Black that I already use to control the microscope stage and the camera trigger, and save the images to an SD card. However, the WiFi protocol is undocumented.

Luckily, Wireshark can show me the WiFi packets.

wireshark-toupcam-connect.JPG

192.168.7.1 is the camera, and 192.168.7.2 is the address the camera gave to my computer. We can see that video is sent from the camera to the computer via RTP. The camera appears to behave like an ordinary IP camera, accessible through an RTSP URL.

Of course, I don't care about video, unless I want to display it via the BBB, which could be convenient for focusing. What I really want to do is (a) get an image and (b) adjust controls.

So, I hit the Snap button and see what happens...

wireshark-toupcam-snap.JPG

Apparently snapping an image involves connecting to port 555 and sending this packet that starts with "GE".

The response appears to be a GE reply, with a jpg file in there somewhere.

The red portion is the request, the blue portion is the response. I've highlighted the beginning of the jpg file.

Also note that there's a suspicious-looking "GEPJ" ("JPEG" backwards) in there. The 0x001E1E29 just prior to the JPG file appears to be the size of the enclosed JPG file, which goes from offset 0x6D to 0x1E1E95.

There's also another length before that, 0x001E1E3A, so it appears there's yet another wrapper, and a 0x001E1E4A before even that, so yet another layer of wrappers.

We don't really care about the details of these wrappers, just that they exist and can be used to get to the actual JPG file content.
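A minimal sketch of digging the JPG out of a snap response, under two assumptions I'm making from eyeballing the capture: the four bytes immediately before the JPG data are its big-endian length, and the JPG starts with the usual FF D8 FF start-of-image marker:

```c
#include <stddef.h>
#include <stdint.h>

/* Scan a snap response for the JPG payload. We don't decode the GE
 * wrappers; we just look for the JPG start-of-image marker (FF D8 FF)
 * and read the 4-byte big-endian length that appears to precede it.
 * Returns a pointer to the JPG data, or NULL if not found or truncated. */
const uint8_t *find_jpg(const uint8_t *buf, size_t n, size_t *jpg_len)
{
    for (size_t i = 4; i + 3 <= n; i++) {
        if (buf[i] == 0xFF && buf[i + 1] == 0xD8 && buf[i + 2] == 0xFF) {
            size_t len = ((size_t)buf[i - 4] << 24) | ((size_t)buf[i - 3] << 16)
                       | ((size_t)buf[i - 2] << 8)  |  (size_t)buf[i - 1];
            if (len > n - i)        /* length field runs past the buffer */
                return NULL;
            *jpg_len = len;
            return buf + i;
        }
    }
    return NULL;
}
```

Searching for the marker instead of parsing the wrappers means this keeps working even if the outer layers turn out to mean something else entirely.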

Now, in terms of the controls, here's what happened when I fiddled around with the exposure setting:

wireshark-toupcam-exposure.jpg

From what I can tell, ToupCam is using the [Connectionless Network Protocol] (CLNP) on port 12000. The only thing I recognized in the data field is 0xB4 (180), which is the exposure value I had set.

The rest of the CLNP packets come from the ToupCam itself, and appear to indicate the current settings of everything. What these settings are, I don't know. But in the data above, it appears that 03 is like a register number, 04 never changes so could be a payload length, and then 000000B4 is the data.

Here's the data for the CLNP packet coming back from the ToupCam reporting the exposure value: 00 00 00 00 02 03 04 00 00 00 B4. That 03 is in the position that changes for the CLNP packets from ToupCam, along with the last two bytes, so is probably a register number. For example, for register 01, the value is E290 (58000), a suspiciously round number. Color temperature in 1/10 K? Register 0A contains 2BC (700), another round number. There's a gamma setting of 7.00 that might be this register.
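If that guess is right, a control packet is four zero bytes, an 0x02, a register number, a payload length of 4, and a big-endian value. Building one would look something like this; note the entire framing here is conjecture from the captures above, not anything documented:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Conjectured layout of a ToupCam control packet, reconstructed from
 * the capture: 00 00 00 00 02 <reg> 04 <value, big-endian>.
 * Writing register 0x03 with value 0xB4 should reproduce the exposure
 * packet seen above. Returns the packet length (always 11 here). */
size_t build_control_packet(uint8_t out[11], uint8_t reg, uint32_t value)
{
    memset(out, 0, 4);      /* four zero bytes, purpose unknown */
    out[4] = 0x02;          /* constant in every control packet I saw */
    out[5] = reg;           /* register number, e.g. 0x03 for exposure */
    out[6] = 0x04;          /* payload length? always 4 in the capture */
    out[7] = (uint8_t)(value >> 24);    /* value, big-endian */
    out[8] = (uint8_t)(value >> 16);
    out[9] = (uint8_t)(value >> 8);
    out[10] = (uint8_t)value;
    return 11;
}
```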

So the next step is to write a simple program which attempts to connect to the camera, pull a still image, and modify some values.

Building LMARV-1: a tangible RISC-V processor, part 1

A new project, hooray!

Background

RISC-V is not a processor in the sense of an ARM processor or x86. It is, in fact, an open specification for an Instruction Set Architecture (ISA). This means that the instruction set is standardized by the [RISC-V Foundation], and then anyone can implement [that instruction set] free of any licenses, royalties, or legal requirements.

RISC-V is the result of much study of existing ISAs. The result is regular, which makes an instruction decoder easy to create. It's also modular. Only one part of the ISA is required to be implemented: the 32-bit integer base, called 32I, in which all instructions and registers are 32 bits. Everything else is optional. For example, the M extension includes multiplication and division instructions. A is for atomic operations. F and D are for single- and double-precision floating point. The G extension means "General", and means that you implement all of I, M, A, F, and D.

In general, if your implementation doesn't implement a particular G extension, you will have to provide libraries that do the same thing in software. Otherwise a compiler wouldn't be able to compile certain standard expressions.
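As a concrete example: a processor without the M extension needs the toolchain to supply a multiply routine (in libgcc the 32-bit one is called __mulsi3). A shift-and-add sketch of the idea, using only operations the I base provides:

```c
#include <stdint.h>

/* 32-bit multiply built only from shifts, adds, and branches: the kind
 * of helper routine a toolchain must supply when the hardware has no
 * M extension. Result wraps modulo 2^32, like the mul instruction. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1)          /* low bit set: add the shifted multiplicand */
            result += a;
        a <<= 1;
        b >>= 1;
    }
    return result;
}
```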

There's also a 64-bit base, 64I, and a 128-bit base, 128I. There is a 32E base, which is for smaller (e.g. embedded) processors and halves the number of required integer registers. Quad-precision floating point is in the Q extension. There's even an extension, C, for compressed instructions, which allows 16-bit and variable-length instructions.

Some other interesting extensions, which have not yet been frozen (i.e. fully developed and agreed upon), are V (vector instructions), L (decimal floating point, as in calculators), and B (bit manipulation).

Speaking of extensions, RISC-V is extensible. If you have some piece of specialized hardware that you want custom instructions for, there is a whole range of opcodes reserved for that. Of course, your compiler would have to support those custom opcodes, or you could just wrap the assembly language.

And speaking of instructions, it should be pretty clear that the ISA is a reduced instruction set. There are fewer than 50 instructions in the G extension! All the instructions are very simple. There aren't even any condition codes or flags such as carry, zero, or overflow (these can be handled by other instructions).

RISC-V specifies four levels of privilege. The highest level is the machine level, which is the only required level. At this level all instructions have access to all of memory and all peripherals. The next level down is the hypervisor level, which is for things like virtual machines. Then there is the supervisor level below that, for operating systems and kernels. Finally, the lowest privilege level is the user level, which is for applications.

The plan

Most, if not all, projects implementing a RISC-V processor use an FPGA. However, I want to build a RISC-V processor that you can see and touch the insides of, so that you can learn how a RISC-V processor actually works by observing it. I plan to use MSI and LSI chips: things like buffers, flip-flops, and so on, with as few programmable chips as possible.

This first part is going to be about building the registers for the processor, which I call the LMARV-1 (Learn Me A RISC-V, level 1). It is level 1 because I only plan on implementing the 32I extension. Later levels add more features.

An instruction

There are several [formats of instructions], but the most interesting one specifies two source registers and a destination register:

xxxxxxx2222211111xxxdddddxxxxxxx

Here, d stands for a destination register, 1 for one source register, and 2 for a second source register. For example, adding two registers and storing the result in a third would use this instruction format.

Another format only specifies a destination and one source register. And a third format only has a destination register, for example for storing an immediate value. But the interesting thing is that the destination and source registers are all in the same position in the instruction. The instruction set is therefore regular. 
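Because the fields never move, decoding them is just fixed shifts and masks. A quick sketch (field positions are from the RV32I base formats; the test word below encodes add x3, x1, x2):

```c
#include <stdint.h>

/* Register fields sit at fixed bit positions in every RV32I format:
 * opcode in bits 6:0, rd in 11:7, rs1 in 19:15, rs2 in 24:20. */
typedef struct {
    uint32_t opcode, rd, rs1, rs2;
} fields;

fields decode(uint32_t insn)
{
    fields f;
    f.opcode = insn & 0x7F;          /* bits 6:0   */
    f.rd     = (insn >> 7) & 0x1F;   /* bits 11:7  */
    f.rs1    = (insn >> 15) & 0x1F;  /* bits 19:15 */
    f.rs2    = (insn >> 20) & 0x1F;  /* bits 24:20 */
    return f;
}
```

This regularity is exactly why the field extraction in hardware is just wires: no decoding logic is needed to find the register numbers.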

Registers

There are 32 registers called x0-x31, in addition to a program counter register, pc. In the 32I extension, these are all 32-bit registers. Interestingly, x0 is a fixed value, zero. Writing to it does nothing, and reading from it always yields zero. Compilers often set aside one register for a fixed zero value, and this just formalizes that.

The RISC-V spec also declares which registers are for what use. This is called the ABI spec, or Application Binary Interface. Each register has an alias name in the ABI. For example the name for x0 is "zero".

I am not too concerned with the ABI, since I'm not writing a compiler. I'm building the hardware that the compiler will write programs for.
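In software, the x0 behavior is a one-line special case; in hardware, it's simply a register that never gets built. A sketch of the semantics (my own model, nothing from the spec):

```c
#include <stdint.h>

/* Model of the RV32I register file: 32 registers, with x0 hardwired
 * to zero. Writes to x0 are silently dropped. */
typedef struct {
    uint32_t x[32];
} regfile;

void reg_write(regfile *rf, unsigned rd, uint32_t val)
{
    if (rd != 0)            /* x0 ignores writes */
        rf->x[rd] = val;
}

uint32_t reg_read(const regfile *rf, unsigned rs)
{
    return rs == 0 ? 0 : rf->x[rs];
}
```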

So, here's my idea of what a single register should look like:

For this setup, I'm using a 74LVT16374, which is a 16-bit D flip-flop. LV means it's a low-voltage (3.3 volt) part, and T is the technology used, called ABT, or [BiCMOS]. This is fast like TTL but low-power like CMOS.

IMG_20180208_193016.jpg

As a destination register, we can clock data from a destination bus into this register. We can also have two source buses, and use the 74LVTH162541 16-bit tristate buffer. This controls which source bus the register outputs on: one, the other, both, or none.

There are LEDs which can show the state of the register, which is important for a visible processor.

Now of course, these are all 16-bit parts. There are no equivalent 32-bit parts. So we would have to multiply the number of chips per register by two, making six. And then there are 31 registers (aside from the zero register), so 186 chips.

Here's an alternate setup:

IMG_20180208_193023.jpg

Here, I store the same value in three places: two sources and a display (the unlabeled box at the bottom, which is also a '16374). This has two advantages: cost and drive capability. The '16374 can drive 32mA, while the '162541 can only drive 12mA.

And here it is!

IMG_20180206_163640.jpg

The LEDs on it are specifically 3mm flangeless LEDs. The reason they are flangeless is that the flange on the LED adds to the 3mm width, which would require the board to be about 10% larger.

The two card edges go into two PCIe x4 slots, for a total of 196 signals. Some of these signals are used for power and ground between bits on the bus, for signal integrity.

You can see a [video version] of this post on my YouTube channel.

And, all of [my schematics] (KiCAD) are on GitHub and are [Open Source Hardware].

 

Retroassembler to C

Yesterday I drove from California's Bay Area to the Central Coast, a four-hour drive, to pick up some vintage HP equipment so that I could save on shipping and avoid having another damn crate in my garage that I'd have to cut up and stick in the garbage a piece at a time. Anyway, during those eight hours, while listening to the [Retro Computing Roundtable] podcast, I had a stray thought.

Probably not a powerful enough stray thought to do this, though:

Honestly, I need another project like I need a hole in the head.

The thought went like this. Emulators exist for retro processors such as the [6502] and [Z80]. Actually, emulators exist for whole systems: [MAME] (Multiple Arcade Machine Emulator, which has since absorbed MESS, the Multi Emulator Super System), for one. [Dosbox] runs [on the Internet Archive], so you can play old MS-DOS games in your browser.

Anyway, these emulators faithfully reproduce the processor down to the instruction level. Now, when I read an assembly listing, sometimes I translate the instructions to C in order to help me understand what's going on. What if you could do that for the whole program and then compile the resulting C code?

Before continuing, anyone interested should check out Jamulator, a project which takes 6502 binaries, generates assembly language from them, and uses that as the input language to LLVM, which then compiles it for a native processor. It's brilliant.

There's a whole host of problems with this idea, though. For one, emulators are plenty fast enough. Also, there's the whole flow analysis thing that [IDA] does so well but still needs human input for. And LLVM is pretty tough to develop for (I've tried), though it's getting easier.

Consider the task of adding two 16-bit numbers on an 8-bit processor. On a 6502, it would go something like this:

CLC           ; clear carry before the first add
LDA SRC1      ; A <- low byte of SRC1
ADC SRC2      ; A <- A + low byte of SRC2
STA DEST      ; low byte of DEST <- A
LDA SRC1+1    ; A <- high byte of SRC1
ADC SRC2+1    ; A <- A + high byte of SRC2 + carry
STA DEST+1    ; high byte of DEST <- A

Now, a naive translation to C would result in something like this:

#include <stdint.h>

void add16(uint8_t* src1, uint8_t* src2, uint8_t* dest)
{
    uint8_t a = *src1;
    a += *src2;
    uint8_t c = a < *src1;
    *dest = a;
    a = *(src1+1);
    a += *(src2+1) + c;
    *(dest+1) = a;
}

Using the Compiler Explorer, I was able to compile this into x86 assembly using clang (at optimization level 3):

add16: # @add16
  mov al, byte ptr [rsi]
  add al, byte ptr [rdi]
  mov byte ptr [rdx], al
  mov al, byte ptr [rsi + 1]
  adc al, byte ptr [rdi + 1]
  mov byte ptr [rdx + 1], al
  ret

That's as close as we're going to get to optimized code. Clang recognized the carry trick.

Note that clang didn't go the extra step of recognizing this as a 16-bit add. For one, this would be an unaligned add. For another, compilers generally excel in optimizing code from the top down, not from the bottom up like we're trying to do. Nevertheless, clang did a great job here.

Not quite with gcc.

add16:
  movzx eax, BYTE PTR [rdi]
  add al, BYTE PTR [rsi]
  setc cl
  mov BYTE PTR [rdx], al
  movzx eax, BYTE PTR [rsi+1]
  add al, BYTE PTR [rdi+1]
  add eax, ecx
  mov BYTE PTR [rdx+1], al
  ret

Here gcc did recognize the carry (hence the setc instruction), but failed to take advantage of it in the later add, instead adding twice: once for the high bytes and once for the carry. gcc did not recognize that we only used one bit of the uint8_t c, and so was forced to add the whole byte.

I even tried hinting that c was a single bit:

#include <stdint.h>

typedef struct flags
{
    unsigned c:1;
} flags;

void add16(uint8_t* src1, uint8_t* src2, uint8_t* dest)
{
    flags f;

    uint8_t a = *src1;
    a += *src2;
    f.c = a < *src1;
    *dest = a;
    a = *(src1+1);
    a += *(src2+1) + f.c;
    *(dest+1) = a;
}

Nope: same code from gcc, while clang still generated the optimized version. Note that clang 4.0.1 did not fare very well either; it was only by clang 5.0.0 that the output became nicely optimized.
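One more variant I could have tried: stating the carry directly with the __builtin_add_overflow intrinsic, which both gcc and clang provide. There's no guarantee either compiler turns this into an adc, but it expresses the intent without the a < *src1 comparison trick:

```c
#include <stdbool.h>
#include <stdint.h>

/* Same 16-bit add as before, but with the carry produced by the
 * add-with-overflow intrinsic instead of a comparison. The overflow
 * flag of the 8-bit add is exactly the carry into the high byte. */
void add16_builtin(const uint8_t *src1, const uint8_t *src2, uint8_t *dest)
{
    uint8_t sum;
    bool c = __builtin_add_overflow(src1[0], src2[0], &sum);
    dest[0] = sum;
    dest[1] = (uint8_t)(src1[1] + src2[1] + c);
}
```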

Anyway, that's all I wanted to say. I don't intend this to go any further, but it sure would make an interesting project for someone else!