How I didn't crack the Voynich Manuscript

I’ve been interested on and off in the Voynich Manuscript, mostly because it appeals to my appreciation of old occult books. Looking at the book, it’s clear there is a lot of structure in its writing. It’s clearly not, say, solely a work of art such as the Codex Seraphinianus. The history of the Voynich and the cracking attempts against it can be found all over the place. I particularly enjoyed Nick Pelling’s book, Curse of the Voynich. It might be fun, I thought, to take a look at it myself.

The first step was assigning letters to each glyph. There is a standard format called EVA that has been used to transcribe the Voynich, but I found it too cumbersome. Although EVA is expressive enough to be converted into any other notation, doing that conversion would mean relying on an already-completed transcription, and I really wanted to start with a fresh eye.

Warning! Yes, I am aware of other researchers’ theories. The idea is that I’ll use my own theory and see where it takes me. Maybe I’ll make a bad decision. In any case, I reserve the right to modify my theories!

Using the images of the Voynich as my starting point, I looked through the first few pages and settled on a basic alphabet of glyphs, closely modeled on EVA:

Voynich glyphs

Again, this is just my initial take: 25 glyphs. I’m not taking into account other, rarer glyphs at this point. The major differences between my notation and EVA are that my ‘g’ is EVA ‘m’, my ‘v’ is EVA ‘n’, my ‘q’ is EVA ‘qo’, and my ‘x’ is EVA ‘ch’. I also specifically include an ‘m’ and an ‘n’ symbol for EVA ‘iin’ and ‘in’, respectively.

The one ‘q’ I found with an accent above it, I decided to render as ‘Q’; likewise, the ‘x’ with an accent above is rendered as ‘X’. The tabled gallows characters ‘f’, ‘k’, ‘p’, and ‘t’ are represented as capital letters. None of this implies that the capital and lowercase forms are the same character; it’s just a mapping of glyphs to ASCII.
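To make the correspondence with EVA concrete, here is a small Octave sketch. It is purely illustrative and my own; it handles only the differences listed above and is not something I actually used, since I transcribed directly from the images:

    % Rewrite an EVA string into my notation, handling only the differences
    % listed above; everything else passes through unchanged. Multi-glyph EVA
    % sequences are rewritten to placeholder characters first so that the
    % later single-letter substitutions don't clobber them.
    function s = eva_to_mine (s)
      M = char (1);  N = char (2);   % temporary placeholders
      s = strrep (s, "iin", M);      % EVA 'iin' -> my 'm'
      s = strrep (s, "in",  N);      % EVA 'in'  -> my 'n'
      s = strrep (s, "qo",  "q");    % EVA 'qo'  -> my 'q'
      s = strrep (s, "ch",  "x");    % EVA 'ch'  -> my 'x'
      s = strrep (s, "m",   "g");    % EVA 'm'   -> my 'g'
      s = strrep (s, "n",   "v");    % EVA 'n'   -> my 'v'
      s = strrep (s, M, "m");
      s = strrep (s, N, "n");
    endfunction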

Next, I transcribed some pages using this mapping. I decided to use ‘^’ for a character indicating the beginning of a “paragraph”, including the beginning of any obviously disconnected text areas. Similarly, ‘$’ marks the end of a “paragraph”. A period is a space between “words”. Occasionally I had to make a judgement call as to whether there was a space or not. Furthermore, lines always end in either a period or, if the line is the last in a paragraph, an end-paragraph symbol.

Finally, any character that I could not figure out or that doesn’t fit in the mapping is replaced by an asterisk.

Thus, the first paragraph on the first page is transcribed as follows:

F1r

F1r t

Again, there are many caveats: I had to make judgement calls on some glyphs and spacing. I’m also aware that Nick Pelling has a theory that how far the swoop on the v goes is significant, and clearly I’m ignoring that.

In any case, after transcribing a few pages, I became aware of certain patterns, such as -am nearly always appearing at the end of a word, and three-letter words being common. I did a character frequency analysis, and found that ‘o’ was the most common character, with ‘y’, ‘a’, and ‘x’ being about 50% of the frequency of ‘o’. Way at the bottom were 'F', 'f', and 'Q'.
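For what it’s worth, a character frequency count of this kind takes only a few lines of Octave. This is a rough sketch of the sort of thing I mean, not my actual script; the file name and the handling of the structural markers are assumptions:

    % Count character frequencies in the transcription. The file name is
    % hypothetical; '^', '$', '.' and '*' are structural markers, not glyphs,
    % so only letters are kept.
    txt = fileread ("transcription.txt");
    glyphs = txt (isletter (txt));
    [chars, ~, idx] = unique (glyphs');
    counts = accumarray (idx, 1);
    [counts, order] = sort (counts, "descend");
    for i = 1:numel (chars)
      printf ("%s  %d\n", chars(order(i)), counts(i));
    end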

Then I did another frequency analysis, this time of trigrams. Most common were ‘xol’, ‘dam’, and ‘xor’. Then I asked, how often do the various trigrams appear at the beginning of a word and at the end of a word? The top five beginning trigrams were ‘xol’, ‘dam’, ‘xor’, ‘Xol’, and ‘xod’, while the top five ending trigrams were ‘xol’, ‘xor’, ‘dam’, ‘Xol’, and ‘ody'.
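The trigram counts, and the word-initial and word-final counts, can be done in the same spirit. Again, this is only a sketch under my own assumptions, reusing the txt variable from the snippet above:

    % Rough sketch of trigram counting, plus word-initial and word-final
    % trigram counts. Reuses txt from the previous snippet; '.' separates
    % words and '^'/'$' mark paragraph boundaries.
    words = strsplit (txt, {".", "^", "$", "\n"});
    words = words (!cellfun (@isempty, words));
    all_tri = {};  first_tri = {};  last_tri = {};
    for i = 1:numel (words)
      w = words{i};
      if numel (w) >= 3
        for j = 1:numel (w) - 2
          all_tri{end+1} = w(j:j+2);      % every trigram inside the word
        end
        first_tri{end+1} = w(1:3);        % word-initial trigram
        last_tri{end+1}  = w(end-2:end);  % word-final trigram
      end
    end
    [tri, ~, idx] = unique (all_tri);
    counts = accumarray (idx(:), 1);
    [counts, order] = sort (counts, "descend");
    tri(order(1:5))                       % five most common trigrams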

Are these basic lexemes? I began to suspect that lexemes could encode letters or syllables.

I turned to page f70v2, which is the Pisces page. There were 30 labels, each next to a lady. The labels were things like ‘otolal’, ‘otalar’, ‘otalag’, ‘dolarag’… Most of the words looked like they were composed of three short lexemes, mostly two letters each: ‘ot’, ‘ol’, ‘al’, ‘ar’, ‘ag’, ‘dol’, and so on.

So I decided to look for more lexemes by picking the most common trigram, ‘xol’, and looking for words containing it. These were: ‘xol’, ‘otxol’, ‘o*xol’, ‘ypxol’, ‘dxol’, ‘xolam’, ‘xolo’, ‘xololy’, ‘xolTog’, ‘opxol’, ‘btxol’, ‘xoldy’, ‘xolols’. Stripping out ‘xol’ would give the lexemes ‘ot-’, ‘yp-’, ‘d-’, ‘-am’, ‘-o’, ‘-oly’, ‘-Tog’, ‘op-’, ‘bt-’, ‘-dy’, and ‘-ols’.
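As an aside, this kind of stripping is easy to automate. A sketch, again my own, building on the words list from the trigram snippet above:

    % Find every word containing a given trigram and strip it out, leaving
    % the residual lexemes. Builds on `words` from the earlier snippet.
    target = "xol";
    hits = words (!cellfun (@isempty, strfind (words, target)));
    residues = cellfun (@(w) strrep (w, target, "-"), hits, "UniformOutput", false);
    unique (residues)   % e.g. 'ot-', '-am', '-dy', ...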

Similarly, for ‘xor’, we get the lexemes ‘d-’, ‘xeop-’, ‘k-’, ‘ot-’, ‘bp-’, ‘ok-’, ‘t-’, ‘-am’, and ‘Xk-’, at least in the first few pages.

So certain lexemes seemed to come up often, namely ‘ot-’, ‘ok-’, ‘op-’, ‘d-’, ‘-am’, and ‘-dy’.

One possibility, which has no doubt already been discounted decades ago, is that each lexeme corresponds to a letter. So something common like ‘xol’ could be ‘e’. Because there are so many lexemes, a letter could clearly be encoded by more than one lexeme. Or maybe each lexeme encodes a syllable, so ‘xol’ could be ‘us’. I tend to doubt the polyalphabetic hypothesis, since then I would expect the statistics to be more uniform.

Maybe the labels in Pisces encode numbers. If that’s the case, then by Benford’s Law, ‘ot-‘ would probably encode the numeral 1, since that appears in the first position 16 out of 30 times, and ‘ok-‘ could be 2, appearing 8 times.

Anyway, that’s as far as I’ve gotten. 

Machine Learning: Sparse RBMs

In the previous article on Restricted Boltzmann Machines, I did a variety of experiments on a simple data set. The results for a single layer were not very meaningful, and a second layer did not seem to add anything interesting.

In this article, I'll add sparsity to the RBM algorithm. The idea is that without somehow restricting the number of output neurons that fire, any random representation will work to recover the inputs, even if that representation has no organizational power. That is, the representation learned will likely not be conducive to learning higher-level representations. Sparsity adds the constraint that we want only a fraction of the output neurons to fire.

The way to do this is to drive the bias of an output neuron more negative if it fires too often over the training set, and to increase the bias if it doesn't fire enough. Octave code here. The specific function I added is lateral_inhibition.
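Roughly, the adjustment looks something like the following sketch. To be clear, this is my own illustration and not the actual lateral_inhibition function; the variable names and the exact form of the update are assumptions.

    % Sketch of a sparsity-driven bias adjustment for the RBM's output layer.
    % output_probs: n_examples x n_outputs matrix of output activation
    %   probabilities over the training set (or a batch).
    % c: column vector of output-neuron biases.
    % sparsity_target: desired mean activation, e.g. 0.05.
    % sparsity_param: strength of the adjustment.
    function c = adjust_output_bias (c, output_probs, sparsity_target, sparsity_param)
      mean_activation = mean (output_probs, 1)';   % firing rate of each output neuron
      % Neurons firing above the target get a more negative bias;
      % neurons firing below it get a more positive bias.
      c = c - sparsity_param * (mean_activation - sparsity_target);
    endfunction

The sparsity parameter discussed below corresponds to the strength of this adjustment.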

I used the same data set based on horizontal and vertical lines, 5000 patterns. I settled on 67 output neurons and 500 epochs, with momentum changing halfway through. I decided that a target fraction of 0.05 would be interesting, meaning that on average I would want 67 x 0.05 = 3.35 output neurons activated over all 5000 patterns. To compensate for the tendency of the biases to become very negative under the sparsity constraint, I set the penalty on weight magnitudes to zero, so that the weights can grow strong enough to overcome the effect of the bias.

The change in bias is controlled by a sparsity parameter. I ran the experiment with various sparsity parameters from 0 (no sparsity) to 100, and here are the costs and activations:

LibreOfficeScreenSnapz003

LibreOfficeScreenSnapz004

The magnitude of the sparsity parameter doesn't seem to have much effect. Although you can't tell from the graph, there is in fact a small downward trend in the average number of active outputs. Right around 10, the average reaches the desired 3.35, where it stays up to a parameter of about 80, and then it starts dropping again. So 10 seems like a good setting for this parameter.

Here are the patterns that each neuron responds to, with differing sparsity parameters:


Sparsity 0:

OctaveScreenSnapz008

Sparsity 1:

OctaveScreenSnapz009

Sparsity 5:

OctaveScreenSnapz010

Sparsity 10:

OctaveScreenSnapz011

Sparsity 50:

OctaveScreenSnapz012

Sparsity 100:

OctaveScreenSnapz013

Sparsity 200:

OctaveScreenSnapz014

With no sparsity, we get the expected near-random plaid patterns, and nearly all neurons have something to say about any given pattern. With even a little sparsity, however, the patterns do clean themselves up, although not by much, and by sparsity 200, the network learns nothing at all.

One possible reason the patterns really don't look that sparse is that we asked for the average activation over the entire data set to be 5%. But how much of the data set actually contains a non-empty image? In fact, about 49% of the data set is empty.
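Checking for, and removing, the empty patterns is straightforward. Here's a sketch, assuming the data set is stored as a patterns-by-pixels matrix called data (the name is my assumption):

    % Keep only the patterns that contain at least one lit pixel.
    nonempty = any (data, 2);
    printf ("%.0f%% of the patterns are empty\n", 100 * mean (!nonempty));
    data = data(nonempty, :);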

What if we require that no data instance be empty? This time a target fraction of 0.05 ends up with a relatively terrible log J of -1.8, compared to the previous result of about -2.4. However, increasing the fraction to 0.07 gives us a log J of -2.6, which is better than before. That improvement is expected, since more active neurons will be able to represent the patterns more closely. And yet we get better-looking representations anyway:

Sparsity 10:

OctaveScreenSnapz015

Sparsity 20:

OctaveScreenSnapz016

Sparsity 50:

OctaveScreenSnapz017

Sparsity 100 has very poor results.

The visualization is a bit misleading: although there are pixels that are other than full white, that doesn't mean the neuron will be activated with high probability when those pixels are on. The maximum weight turns out to be 11.3, with the minimum being -4.7. The visualization routine clips the values of the weights to [-1,+1], meaning that anything -1 or lower is black, while anything +1 or higher is white. However, by using visualize(max(W+c', -0.5)), we can take into account some of the threshold represented by the (reverse) bias from output to input. We clip at -0.5 so that we can at least see the outlines of each neuron.
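For concreteness, here is a rough stand-in for what that visualization does. It is not the actual visualize routine; the matrix dimensions and the square-patch reshaping are assumptions:

    % Rough stand-in for the visualization described above (not the actual
    % visualize routine). Assumes W is the n_outputs x n_inputs weight matrix,
    % c is the input-bias column vector, and each output neuron's weights
    % form a square image patch of side img_side.
    function show_filters (W, c, img_side)
      F = min (max (W + c', -0.5), 1);   % fold in the reverse bias; clip to [-0.5, +1]
      n = rows (F);
      side = ceil (sqrt (n));
      for i = 1:n
        subplot (side, side, i);
        imagesc (reshape (F(i, :), img_side, img_side), [-1 1]);
        colormap (gray);
        axis ("off");
      end
    endfunction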

So here is another run with sparsity 50:

OctaveScreenSnapz018

We can see that, in fact, each neuron does respond to a different line, and that just about 18 lines are represented, as expected.

Reverse engineered part

After fiddling around with the part from the previous article, I think I might have a reverse-engineered technical diagram. I still don't know enough about early-20th-century mechanical design techniques to know whether this is what they would have done, but it should be enough to at least remanufacture the part.

I also realized that I haven't actually described the part! There are two registers on the typical Monroe calculator: an upper register, which indicates the operation count (useful for multiplication and division), and a lower register, which indicates the total. There's a crank which, when turned one way, zeroes out the upper register, and when turned the other way, zeroes out the lower register. The part that I reverse-engineered is shown in Figure 5 of the original 1920 US patent 1,396,612 by Nelson White, "Zero setting mechanism". In the patent, the part (item 32) is described as follows:


The shaft 60 is normally locked or held against rotation by a rigid arm 32, pivoted upon the shaft 84, and at its free end engaging a peripheral notch 33, of a plate or disk 34, secured to the gear 12...


So the next step might be to make an OpenSCAD file for the part and put it on Thingiverse so that anyone can recreate it. It probably can't be 3D-printed at this point, since it really needs to be a metal part. Even Shapeways, which can 3D print metal parts from stainless steel combined with bronze, can only achieve 1 mm detail, and this part has much finer details than that.

Full-sized files in various formats: AI | PDF | SVG | PNG

UPDATE: See the thing on Thingiverse.

Carriage pawl reveng