How I didn't crack the Voynich Manuscript

I’ve been interested on and off in the Voynich Manuscript, mostly because it appeals to my appreciation of old occult books. Looking at the book, it’s clear there is a lot of structure in its writing. It’s clearly not, say, solely a work of art such as the Codex Seraphinianus. The history of the Voynich and the cracking attempts against it can be found all over the place. I particularly enjoyed Nick Pelling’s book, Curse of the Voynich. It might be fun, I thought, to take a look at it myself.

First step was assigning letters to each glyph. There is a standard format called EVA that has been used to transcribe the Voynich, but I found it too cumbersome. Although EVA can be transformed to any other format due to its expressiveness, that transformation would also rely on an already-completed transcription. I really wanted to start with a fresh eye.

Warning! Yes, I am aware of other researchers’ theories. The idea is that I’ll use my own theory and see where it takes me. Maybe I’ll make a bad decision. In any case, I reserve the right to modify my theories!

Using the images of the Voynich as my starting point, I looked through the first few pages and settled on a basic alphabet of glyphs, closely modeled on EVA:

Voynich glyphs

Again, this is just my initial take: 25 glyphs. I’m not taking into account other, more rare glyphs at this point. Major differences between my notation and EVA are that my ‘g’ is EVA ‘m’, my ‘v’ is EVA ’n’, my ‘q’ is EVA ‘qo’, and my ‘x’ is EVA ‘ch'. I also specifically include an ‘m' and an ’n' symbol for EVA ‘iin’ and ‘in’, respectively.

The one ‘q’ I found with an accent above it, I just decided to render as ‘Q’. Likewise, the ‘x’ with an accent above I render as ‘X’. The tabled gallows characters ‘f’, ‘k’, ‘p’, and ’t’ are represented as capital letters. This doesn’t mean that they are the same character. This is just a mapping of glyphs to ASCII.

Next, I transcribed some pages using this mapping. I decided to use ‘^’ for a character indicating the beginning of a “paragraph”, including the beginning of any obviously disconnected text areas. Similarly, ‘$’ marks the end of a “paragraph”. A period is a space between “words”. Occasionally I had to make a judgement call as to whether there was a space or not. Furthermore, lines always end in either a period or, if the line is the last in a paragraph, an end-paragraph symbol.

Finally, any character that I could not figure out or that doesn’t fit in the mapping is replaced by an asterisk.

Thus, the first paragraph on the first page is transcribed as follows:


F1r t

Again, there are many caveats: I had to make judgement calls on some glyphs and spacing. I’m also aware that Nick Pelling has a theory that how far the swoop on the v goes is significant, and clearly I’m ignoring that.

In any case, after transcribing a few pages, I became aware of certain patterns, such as -am nearly always appearing at the end of a word, and three-letter words being common. I did a character frequency analysis, and found that ‘o’ was the most common character, with ‘y’, ‘a’, and ‘x’ being about 50% of the frequency of ‘o’. Way at the bottom were 'F', 'f', and 'Q'.

Then I did another frequency analysis, this time of trigrams. Most common were ‘xol’, ‘dam’, and ‘xor’. Then I asked, how often do the various trigrams appear at the beginning of a word and at the end of a word? The top five beginning trigrams were ‘xol’, ‘dam’, ‘xor’, ‘Xol’, and ‘xod’, while the top five ending trigrams were ‘xol’, ‘xor’, ‘dam’, ‘Xol’, and ‘ody'.

Are these basic lexemes? I began to suspect that lexemes could encode letters or syllables.

I turned to page f70v2, which is the Pisces page. There were 30 labels, each next to a lady. The labels were things like ‘otolal’, ‘otalar’, ‘otalag’, ‘dolarag’… Most looked like the words were composed of three two-letter lexemes: ‘ot’, ‘ol’, ‘al’, ‘ar’, ‘dol’, ‘ag’, and so on.

So I decided to look for more lexemes by picking the most common trigram, ‘xol’, and looking for words containing it. These were: ‘xol’, ‘otxol’, ‘o*xol’, ‘ypxol’, ‘dxol’, ‘xolam’, ‘xolo’, ‘xololy’, ‘xolTog’, ‘opxol’, ‘btxol’, ‘xoldy’, ‘xolols’. That would make as the lexemes ‘ot-’, ‘yp-‘, ‘d-‘, ‘-am’, ‘-o’, ‘-oly’, ‘-Tog’, ‘op-‘, ‘bt-‘, ‘-dy’, and ‘-ols’.

Similarly, for ‘xor’, we get lexemes ‘d-‘, ‘xeop-‘, ‘k-‘, ‘ot-‘, ‘bp-‘, ‘ok-‘, ’t-‘, ‘-am’, and ‘Xk-‘. At least in the first few pages.

So certain lexemes seemed to come up often, namely ‘ot-‘, ‘ok-‘, ‘op-‘, ‘d-‘, ‘-am’, ‘-dy’.

One possibility that no doubt has already been discounted decades ago is that each lexeme corresponds to a letter. So something common like ‘xol’ could be ‘e’. Because there are so many lexemes, clearly a letter could be encoded by more than one lexeme. Or maybe each lexeme encodes a syllable, so ‘xol’ could be ‘us’. I tend to doubt the polyalphabetic hypothesis, since then I would expect to see perhaps more uniform statistics.

Maybe the labels in Pisces encode numbers. If that’s the case, then by Benford’s Law, ‘ot-‘ would probably encode the numeral 1, since that appears in the first position 16 out of 30 times, and ‘ok-‘ could be 2, appearing 8 times.

Anyway, that’s as far as I’ve gotten.