1997/09/18, posted by Rene Zandbergen

The small castle of f86r6 is probably just a fantasy castle, but the components are very much of the style of Piemonte and surroundings, in N.Italy. There are more buildings drawn on this page, but also a volcano which has nothing to do with N.Italy. Now Southern Italy has at least three famous ones....


1999/9/9, posted by Rene Zandbergen

Have you seen pictures of Fenis castle in the Aosta valley, N.Italy? If I recall correctly, this has at least one round tower. The nearby one of St.Pierre even has a church tower which looks like the one in the Voynich MS.

If anything, this castle confirms the N. Italian provenance of the Voynich MS. It does not at all have to represent a real existing castle. These were sometimes used to symbolise 'Earth'.
Date: Fri, 12 Jul 1996 14:29:22 +0100
From: Michael Roe

Besides the sunflower, the VMS contains many images that also occur in other alchemical works:

* On folio 79v the dew (or divine radiance) falls down onto the recumbent female figure. (My guess is that this is also an allegory for the chemical process of fractional distillation.)

* f68r1 and f68r show the Sun over the Moon and then the Moon over the Sun (Solar dominant over Lunar and Lunar dominant over Solar)

* f66r is the first "alchemical" folio after the simple "herbal" material at the beginning of the MS. This folio shows a recumbent female figure.
By itself, this could mean almost anything. But the following folios have alchemical images, so it is tempting to also interpret this as an alchemical image, the female/lunar principle about to be transformed by the process which is described in the following folios.

* A four-fold division of the cosmos is a recurring theme in the VMS, and also in alchemical texts. See for example f67v2 and f85-86r2.

* Plants with faces occur in alchemical MSS (e.g. the Rosarium Philosophorum referred to by Rene), and also in the VMS.

* f82v shows what appears to be a rainbow. (It's a pity that I haven't got a colour reproduction of this; it would be useful to check the colours!)

Then there's the dragon (f25v), the serpents (f49r) ...

There are probably many others that I've left out.

The image that should be there (if the alchemical interpretation is correct), but which seems to be missing is the union of the male and female principle.
So maybe it all means something completely different.

Date: Wed, 18 Dec 1996 16:46:33 -0500 (EST)
From: Karl Kluge

My previously announced silly idea: in the centre circle of the there are some objects (reportedly

*My* previously announced idea is that this (mega-foldout on ff.85-86) represents the old alchemic notion of how the four elements earth, air, fire, and water are created from the qualities wet, dry, hot, and cold. The circle in the upper right with the T-O map and little castle would be earth. The structure is then something like this:

fire O -- O -- O earth
     |\   |   /|
     | \  |  / |
     |  \ | /  |
 hot O -- O -- O cold
     |  / | \  |
     | /  |  \ |
     |/   |   \|
 air O -- O -- O water

1997/4/10, posted by Karl Kluge


Label-initial/final and running-text line-initial/final statistics differ, using a label corpus from f68r1, f70v2, f72r2, f88r, and f100r:

4O is line initial 17.9% in the mss, < 1% in the labels
AM is line final 11.5% in the mss, 3.6% in the labels
OF is line initial 2.1% in the mss, 22.2% in the labels
OP is line initial 3.5% in the mss, 33.9% in the labels
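Figures like these can be reproduced, in principle, by a simple count. This is a minimal Python sketch, not Karl Kluge's actual procedure: it assumes each percentage means "the share of lines (or labels) whose first word begins with, or whose last word ends with, the given Currier group", and the sample lines are invented for illustration, not taken from any transcription.

```python
def positional_rate(lines, affix, position="initial"):
    """Percentage of non-empty lines whose first word starts with `affix`
    (position="initial") or whose last word ends with it (position="final")."""
    rows = [line.split() for line in lines]
    rows = [words for words in rows if words]  # skip blank lines
    if position == "initial":
        hits = sum(words[0].startswith(affix) for words in rows)
    elif position == "final":
        hits = sum(words[-1].endswith(affix) for words in rows)
    else:
        raise ValueError("position must be 'initial' or 'final'")
    return 100.0 * hits / len(rows)

# Invented Currier-style lines, for illustration only
text_lines = ["4OFAE 8AM", "SOR AM", "4ODC89 OR"]
print(positional_rate(text_lines, "4O"))           # 2 of 3 lines
print(positional_rate(text_lines, "AM", "final"))  # 2 of 3 lines
```

Running the same function once over the running text and once over a label list (one label per line) gives the two columns being compared.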

1997/4/28, posted by Denis Mardle

I have also produced some interesting counts on the positions of labels within words. OE89, for instance (a label on f99v), always occurs at the end of a word or, more often, as a word on its own: of its 16 occurrences, 10 are at the end of a line and one is at the end of paragraph 2 on f99v, whereas the text (excluding plant labels) ends the page with ZOE89 on f82v, with SOE89 on f89v1, and with OEFCCOE89 on f99v.

Compare the Entropies of Known Repetitive and Non-repetitive Texts

1997/3/23, posted by Dennis Stallings

One of the most striking characteristics of the VMs is the text's repetitiousness. From time to time, some have suggested that it is simply a very repetitious text.

I decided to take samples of known repetitious texts (food recipes, religious texts, catalogs) and compare their second-order entropies with those of texts that should be less repetitious (prose fiction, essays).

I looked at the following measures:

h0: zero-order entropy (log2 of the number of different characters)
h1: first-order entropy
h2: second-order entropy
h1 - h2: difference between first- and second order entropies
% rel h2: (h1 - h2) as a percentage of h1, that is

% rel h2 = (h1 - h2) / h1 * 100
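For readers who want to compute these measures themselves, here is a minimal Python sketch. It is not MONKEY; it assumes that h2 is the conditional entropy of a character given the one before it (entropy of character pairs minus entropy of single characters), which we take to match MONKEY's definition, and it treats every character, spaces included, as a symbol.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def entropy_profile(text):
    """Return (h0, h1, h2, rel_h2) for a text, counting every character (spaces included)."""
    singles = Counter(text)
    pairs = Counter(text[i:i + 2] for i in range(len(text) - 1))
    # First character of each pair, so that the two terms of h2 line up
    firsts = Counter(text[:-1])
    h0 = math.log2(len(singles))
    h1 = entropy(singles)
    # Conditional entropy H(next char | previous char) = H(pair) - H(first char)
    h2 = entropy(pairs) - entropy(firsts)
    rel_h2 = (h1 - h2) / h1 * 100
    return h0, h1, h2, rel_h2

h0, h1, h2, rel_h2 = entropy_profile("the sunflower is a marvellous plant")
print(f"h0={h0:.2f} h1={h1:.2f} h2={h2:.2f} % rel h2={rel_h2:.1f}")
```

A perfectly predictable text such as "abababab" gives h2 = 0 and % rel h2 = 100; the more repetitious a text, the higher its % rel h2.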

*Discussion of Results*

Because of MONKEY's size limitation, some large texts were run in several portions: 1 Kings (both KJV and Vulgate) and F. Bacon's Essays. The ranges of rel % h2 were:

23.7 - 22.7 = 1.0, King James 1 Kings
17.7 - 17.3 = 0.4, Latin Vulgate 1 Kings
20.6 - 20.4 = 0.2, Francis Bacon Essays.

These figures give some idea of how reproducible MONKEY results are on a text sample of 32,000 characters.

In the Jacobean English texts, the difference in percentage points of rel % h2 between the highest and lowest values was 24.5 - 20.5 = 4.0. The same difference for modern English texts was 23.8 - 20.1 = 3.7. (The presumably repetitious cajun.txt was slightly less repetitious (20.1) than the short story crane.txt (20.3).) The numbers for Jacobean and modern English do not seem significantly different. The rel % h2 values of the Latin Vulgate Bible texts were not significantly different from that of the presumably less repetitious Boethius text boecons.lat. The difference between non-repetitious English texts and Latin texts is about 20.4 - 17.5 = 2.9.

The really significant differences are between Voynich and natural-language texts. Even taking the most repetitious English text (KJV Joshua, 24.5) and the least repetitious Voynich text (Herbal-A in Currier, 34.7) gives a difference of 10.2, much greater than the range of repetitiousness among the various English texts (3.7 or 4.0). Equally significant is the difference between the various Voynich transcription alphabets. The range from Frogguy to Currier is:

Voynich A: 43.9 - 34.7 = 9.2
Voynich B: 42.1 - 34.9 = 7.2

Once again, this is greater than the range of repetitiousness seen in English texts. EVA is more repetitious than Frogguy, which I would not have expected. Both Frogguy and EVA use combinations of characters to represent single Currier letters. The difference between Latin and English may be due to the same sort of thing: Latin represents far fewer phonemes by multiple characters than English does.

1997/11/7, posted by Rene Zandbergen

It has been pointed out in the recent past that the value of h2 by itself is a very incomplete 'characteristic' of the language in question, unlike, say, the values of mean and standard deviation for a normal distribution, which essentially tell you all you need. Now if one were to compute (sum(x^^3)/N)^^(1/3) for a normal distribution, one would not obtain a value that 'completely describes' it. However, if one finds a value which does not fit together with mean and sigma, one knows that 'something is wrong' and one should not discard this information, even if it's incomplete.

So, whereas I agree that entropy is not a very good measure of the properties of a particular text.
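Rene's third-moment analogy can be checked numerically. For a normal distribution E[x^3] = mu^3 + 3*mu*sigma^2, so mean and sigma predict what (sum(x^3)/N)^(1/3) should be. The Python sketch below compares that prediction with the observed value; the sample size, seed, and the shifted exponential used as a non-normal example are arbitrary choices for illustration.

```python
import math
import random
import statistics

def signed_cbrt(v):
    """Real cube root that keeps the sign of v."""
    return math.copysign(abs(v) ** (1 / 3), v)

def third_moment_gap(xs):
    """Observed (sum(x^3)/N)^(1/3) minus the value predicted from mean and sigma,
    assuming the data are normal (E[x^3] = mu^3 + 3*mu*sigma^2)."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    observed = signed_cbrt(sum(x ** 3 for x in xs) / len(xs))
    predicted = signed_cbrt(mu ** 3 + 3 * mu * sigma ** 2)
    return observed - predicted

rng = random.Random(1)
normal = [rng.gauss(2.0, 1.0) for _ in range(200_000)]
# Shifted exponential: also mean 2 and sigma 1, but skewed
skewed = [1.0 + rng.expovariate(1.0) for _ in range(200_000)]

print(f"normal: gap = {third_moment_gap(normal):+.3f}")  # close to zero
print(f"skewed: gap = {third_moment_gap(skewed):+.3f}")  # clearly positive
```

The cube-root statistic alone says little about either sample, but a value that does not fit the mean and sigma is exactly the "something is wrong" signal.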
Jim Reeds' Hypothesis

"I think:
(1) the VMS was written in Europe by a literate European, and
(2) if it has a plain text, it is in a widely used European language such as Latin or Italian. Why by a "literate European"? Because the author clearly knows the ordinary Latin alphabet, a distorted and elaborated version of which forms the VMS character set. If he usually only wrote in Arabic or Hebrew, say, his letters would not look the way they do. I suppose
(3) the author must have had some contact with cryptography, which in 1470 (to make up a date) meant he had some contact with some potentate's secretary.

(4) the book was not written by a non-European,
(5) was not written in a non-European language, and
(6) on the grounds of anachronism, was not written in a deliberately invented artificial language (but I don't mean to rule out a kind of spontaneously generated glossolalic sort of writing, or 'outsider art' writing, etc.)."

[Relative frequencies of initial letters of lines and of paragraphs, for Language A and Language B]


> Likewise the strange thing that ends line 6 of folio 24v,
> which I would write in advanced Frogguy:

> s
>c-lj a 2 A-2

I would say: a correction. The writer forgot the s and inserted it later.
1997/2/1, posted by Dennis Stallings

If you analyze Japanese written in Latin characters (romaji), you get a low entropy. This is because of the severe phonotactic constraints of Japanese. It's close to true that a Japanese syllable may begin with zero or one consonant, have one vowel, and end with -n or nothing.

However, Japanese can also be written in hiragana or katakana (syllabic characters), precisely because of these severe phonotactic constraints. You have about 26 Latin characters, plus perhaps the long vowels, giving 25-30 characters for romaji. You have 50-60 characters in each kana set (although, as I recall, the kana don't indicate vowel length).

With romaji I'm sure even the normalized second-order entropy would be low. With kana I'm sure it would be higher. How much higher depends on word frequencies in Japanese and any rules Japanese might have for combining syllables.

1999/7/23, posted by Dennis Stallings

"Understanding the Second-Order Entropies of Voynich Text", May 11, 1998: http://www2.micro-net.com/~ixohoxi/voy/mbpaper.htm

... I struggled with the entropy concept. My reasoning went: Voynichese ought to "mean the same thing" in Frogguy as in Currier, despite the fact that there's a big difference in their entropy profiles. Likewise, Japanese and Hawaiian "say the same thing" whether they are written in phonemic (romaji in Japanese) or syllabic notation (hiragana or katakana in Japanese).

I've finally realized that my reasoning was false. You *can* transmit more information per character by using a larger character set, as with 71 characters for Japanese kana versus 22 characters for romaji. The question is: how well is a character set of a given size being used?

I still think that the comparison of h1 and h2, which I used in my paper, is useful. I also think that one could define the size of the character set by taking the characters that each constitute less than 0.5% or 1.0% of the text and subtracting the number of such characters from the total number of characters. From that number, one could calculate an h0 that would be meaningful.
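Both points can be made concrete. The maximum information a character can carry is log2 of the alphabet size, so 71 kana allow about 6.15 bits per character against about 4.46 for 22 romaji letters. The second function is a minimal Python sketch of the proposed "meaningful h0", assuming a character is dropped when it makes up less than the chosen share (0.5% or 1.0%) of the text; the function names are illustrative.

```python
import math
from collections import Counter

def max_bits_per_char(alphabet_size):
    """Upper bound on information per character: log2 of the alphabet size."""
    return math.log2(alphabet_size)

def effective_h0(text, threshold=0.005):
    """log2 of the number of characters that each make up at least `threshold` of the text."""
    counts = Counter(text)
    total = sum(counts.values())
    common = [ch for ch, n in counts.items() if n / total >= threshold]
    return math.log2(len(common))

print(f"kana:   {max_bits_per_char(71):.2f} bits")  # 6.15
print(f"romaji: {max_bits_per_char(22):.2f} bits")  # 4.46

sample = "a" * 500 + "b" * 499 + "c"  # 'c' is only 0.1% of the text
print(effective_h0(sample, threshold=0.005))  # 1.0: 'c' is dropped, leaving 2 characters
```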

1999/7/23, posted by Gabriel Landini

There are further problems in Japanese. One thing is "reading" Japanese, and another is "listening" to Japanese. For example, the word "shu" (2 characters in hiragana/katakana) corresponds to many different kanji, all with different meanings. So while reading gives you the exact character, the phonetic alphabets do not (and of course the spoken language doesn't either). This is the "context-sensitive" aspect of Japanese: how on earth do you know which of the tens of "shu" is meant? Answer: from what comes before and after it. Of course, this does not concern us, because we do not know what Voynichese should sound like.

1999/8/16, posted by Karl Kluge

My understanding is that cryptographers don't use entropy because it doesn't have clean distributional results, unlike the various standard statistics such as the Index of Coincidence. Modulo that, while you may not be able to use entropy to determine what the letters, words, or language are, that doesn't mean that, given a specific cryptographic hypothesis regarding the alphabet and cipher system, entropy can't serve as a test of that hypothesis.
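For comparison with entropy, here is the textbook Index of Coincidence in a minimal Python sketch: the probability that two characters drawn at random (without replacement) from the text are identical. Ignoring whitespace is an assumption of this sketch; some authors also multiply the result by the alphabet size to normalize it.

```python
from collections import Counter

def index_of_coincidence(text):
    """Sum of n_i*(n_i - 1) over characters, divided by N*(N - 1); whitespace is ignored."""
    counts = Counter(ch for ch in text if not ch.isspace())
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two characters")
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

print(index_of_coincidence("aa bb"))  # 4 / 12 = 0.333...
```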
1996/12/10, posted by Bob Richmond

The language appears to have a small number of phonemes. The languages of the Malayo-Polynesian family, as many observers have suggested, are the most likely possibility. The many languages of the Philippines make that area, I think, a very likely candidate, since the Spanish began extensive colonization there around 1565, with many Roman Catholic priests in isolated outposts.

Imagine then a young priest posted to the Philippines in the late 16th century. He reduced a local language to writing, as was becoming a widespread practice then - he would have known, for instance, about literary Nahuatl (since the Philippines were a province of Mexico!). Isolated, though, he went native, succumbed to the pleasures of the flesh, and kept some sort of record, using his invented alphabet and the local language. Perhaps he simply recorded his amorous doings with his wife - such records can become very repetitious.


1997/10/31, posted by Rene Zandbergen
When Malay was mentioned in two contexts, I did not realise just how many features Malay written in an Arabic script would have in common with Voynichese: the prefixes and suffixes, the short words, the full-word repetitions, the absence of repeated characters.

1997/10/31, posted by Dennis Stallings
Jacques discussed the Jawi (Arabic) script used for Malay in the Voylist archives. Jawi does represent vowels, but in a complex manner.

However, I think you would see the same thing with Malay even in Latin orthography. And I'm pretty sure Malay would be a low-entropy language. It's in the Malayo-Polynesian group, and visually it looks low-entropy.
Leo Levitov published his purported solution of the Voynich Manuscript in *Solution Of The Voynich Manuscript: A Liturgical Manual For The Endura Rite Of The Cathari Heresy, The Cult Of Isis* (Aegean Park Press, 1987). Levitov claims that Catharism was actually a survival of the Greco-Roman-Egyptian cult of Isis and that the Voynich Manuscript is a liturgical manual of this cult. He further claims that the Voynich nymphs in the tubs are undergoing a Cathar sacrament called *Endura* - group suicide by opening veins in warm water.

1996/12/29, posted by CLARY Olivier

Niel says the association between Catharism and suicide has been propagated by Catholic sources and novelists. The main origin of this claim is that groups of Perfects preferred to throw themselves into the fire singing psalms rather than commit the smallest act against the precepts of the consolamentum, such as swearing an oath or eating meat, and this could be viewed as suicide. Also, Inquisition registers do mention endura being ordered for some people, mainly women, by the deacon of their community, in very late Catharism (14th century), long after the Cathar churches had disappeared.
Voynich mini-FAQ

December 8, 1996 by Dennis Stallings

In 1912, Wilfrid M. Voynich, a book collector, bought from the Jesuit College at the Villa Mondragone, Frascati, Italy (near Rome), a medieval manuscript (235 pages) written in an unknown script and in what appears to be an unknown language or a cipher. Despite the efforts of many well-known cryptologists and scholars, the book remains unread. Since 1969 it has been at Yale University, in the Beinecke Rare Book Library, with catalogue number MS 408.

It is known (from a letter of J. M. Marci in 1665/6) that the manuscript was bought by Emperor Rudolph II of Bohemia (1552-1612) for 600 ducats (an exorbitant sum in those days). The manuscript somehow passed to Jacobus de Tepenecz, the director of Rudolph's botanical gardens (his signature is present on folio 1r), and it is speculated that this must have happened after 1608, when Jacobus Horcicki received his title "de Tepenecz". Thus 1608 is the earliest definite date for the Manuscript.

The Voynich Manuscript, as it has come to be known, contains many drawings of plants, but the plants have not been identified, nor have the drawings been identified with known fanciful or distorted drawings of plants from the Middle Ages. There are what look like astrological drawings. There are curious drawings of little nude women bathing in baths with convoluted plumbing; nothing else like these drawings is known. The persons and costumes look generally European. The script seems to have been developed from early Arabic numerals and medieval Latin abbreviations, but composed of these elements in a unique manner; no other examples of the script or any like it are known. Nothing else about the Manuscript is even this definite; it is a completely unique artifact.

Computer analysis of the Voynich Manuscript has only deepened the mystery. One finding has been that there are two "languages" or "dialects" of Voynichese, which are called Voynich A and Voynich B. The repetitiousness of the text is obvious to casual inspection. Entropy is a numerical measure of the randomness of text. The lower the entropy, the less random and the more repetitious it is. The entropy of samples of Voynich text is lower than that of most human languages; only some Polynesian languages are as low.
EKT Hypothesis

1996/8/6, posted by Dennis Stallings

My hypothesis is that the concealment system for the VMs is a word game, like Pig Latin. I have devised a homophonic word game that would be less detectable than Pig Latin and would account for the presence of Voynich A and B, the low variety of digraphs (the low second-order entropy of the text), and the (relative) absence of long repeated phrases.

*King Tut*

The system that interests me the most is called King Tut. One makes the following substitutions:

A - a      I - i      R - rur
B - bub    J - jug    S - sus
C - cut    K - kam    T - tut
D - dud    L - lul    U - u
E - e      M - mum    V - vuv
F - fuf    N - num    W - wuv
G - gug    O - o      Y - yec
H - hush   P - pup    Z - zuz

"The sunflower is a marvellous plant with powerful virtues that must needs be concealed from the ignorant and uninitiated."


"Tuthushe susunumfuflulowuverur isus a mumarurvuvelullulousus puplulanumtut wuvituthush pupowuverurfufulul vuvirurtutuesus tuthushatut mumusustut numeedudsus bube cutonumcutealuledud fufruromum tuthushe igugnumoruranumtut anumdud unuminumitutiatutedud."
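The substitution table can be applied mechanically. This minimal Python sketch reproduces the example above. Since the table gives no entries for Q and X, the sketch passes those letters (and punctuation) through unchanged, which is an assumption; it capitalizes a substitution when the original letter is uppercase, as in "Tuthushe". It implements plain King Tut only, not the EKT extensions.

```python
# Letter substitutions from the King Tut table (no entries for Q and X)
KING_TUT = {
    "a": "a", "b": "bub", "c": "cut", "d": "dud", "e": "e", "f": "fuf",
    "g": "gug", "h": "hush", "i": "i", "j": "jug", "k": "kam", "l": "lul",
    "m": "mum", "n": "num", "o": "o", "p": "pup", "r": "rur", "s": "sus",
    "t": "tut", "u": "u", "v": "vuv", "w": "wuv", "y": "yec", "z": "zuz",
}

def king_tut(text):
    """Encode text with the King Tut substitutions, letter by letter."""
    out = []
    for ch in text:
        sub = KING_TUT.get(ch.lower())
        if sub is None:
            out.append(ch)  # spaces, punctuation, Q and X pass through unchanged
        elif ch.isupper():
            out.append(sub.capitalize())
        else:
            out.append(sub)
    return "".join(out)

print(king_tut("The sunflower is a marvellous plant"))
```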

*Extended King Tut (EKT)*

With modifications, the King Tut system can account for other properties of the Voynich text. I shall call this modified system Extended King Tut (EKT).
1996/8/6, posted by Dennis Stallings

Here's a good crackpot idea: Old Gaelic! It was written in a Latin alphabet with 5 vowels and 13 consonants. Just about what we're looking for! An accent mark was placed over vowels to indicate a long vowel. A dot was placed over a consonant to indicate that it was "aspirated". "Aspirated" does not mean the modern linguistic term but rather that the consonant is changed, often that a stop is changed to the corresponding fricative (bh = v, ch = voiceless velar fricative). In modern Gaelic an "h" is placed after the consonant rather than a dot placed over it. Suppose that one Voynich character is the "h"? Then we get a lot of consonant phonemes with maybe half the characters!

I wrote the preceding paragraph with a very broad grin on my face, for I know very little about Gaelic. However, it illustrates my idea. Suppose that one Voynich character were used, like "h" in Gaelic, mostly to modify the preceding character to represent a different phoneme. Suppose that this one character were used widely in that role, so that 8-10 different characters were modified to a different phoneme. Would our tests show that this one character was a vowel? If so, that would explain a lot of things!
1996/7/26, posted by Rene Zandbergen

I've been talking about plant drawings several times now without quoting the folios they appear on. This should make up for that:

f2v: my 'Nymphoides Peltata' (German: Sumpfrose; English: not the banana plant mentioned on the Web but a close relative. Not a water lily either *).
f8r: Hedera (Ivy).
f11v: the 'Pineapple' mentioned at the Birmingham meeting.
f39r and f95r2 are the same plant (flowers) IMHO. I might even be (mis)led to believe these are crocuses (crocei?? whatever).
f96v and the bottom plant on f99r are obviously the same plant, and f17v seems similar too. Jim: I gather your wife has seen more drawings of Dracontea/Serpentaria than most people. Could this be what these drawings are? How about one of the leaves on f42r?

1996/12/6, posted by Dennis Stallings

f21v - Petersen says salvia. That's common here and it doesn't resemble this.

f22v - Looks like milkweed.

f34v - Looks like lotus flowers.

f40v - Looks like calendula, roots like radishes. Petersen says Jerusalem artichoke (helianthus tuberosus). That is a sunflower that grows here. We have pictures, and it doesn't look like that. The center is not raised, and the petals are not like the sunflower. The roots should look more knotted, less bulb-like.

f50r - Looks like lotus pod or protea.

f50v - Also protea.

f85/86 - Newbold's ovum. We say it's the Colosseum in Rome, with sewer pipes out the southeast side. ;-)

f89v1 - Left center drawing, root looks like mermaid, person.

f90v - Root looks like cat body, plant head!

f93r - Brumbaugh's "sunflower". Petersen also says coxcomb, and we think it looks more like coxcomb, Celosia argentea 'Cristata'.

f100r - 3rd row, 1st on left, leaf on soapberry tree. Brumbaugh's "pepper" could be a pepper; there are many varieties of pepper.

f100v - 1st row, 2nd from left, looks like human lung!

f102r - (Jim's p. 229) - Drawing on bottom right looks like elephant ear. The one next to it on the left looks like white radish.

f102r - (Jim's p. 233) - The drawing on 3rd row, 4th from left, Petersen's #231, looks like parsnip, pod like okra.
1996/7/9, posted by Robert Firth

A fascinating short article in the April 1995 issue of 'Discover' (a popular science magazine) tells how a group of researchers into human DNA decided to see whether the codon sequences followed Zipf's Law.

Somewhat to their surprise, they did, even in the so-called "junk" DNA (the 95%+ of human DNA that doesn't seem to do anything).

Their conclusion: "we don't know what it says, but it's language". Sound familiar?

1996/7/9, posted by Gabriel Landini

I've been to a talk by one of the authors of that research group (S. Havlin) presenting the data in Marseille at the Fractal 95 Conference, and I was not very impressed. The problem is that DNA does not have "words", so they defined a "word" as an n-base subsequence. This of course has nothing to do with Zipf's law but with the relative probabilities of the bases. Yes, DNA is different in coding and non-coding parts, but this has been known for ages, so this "new finding" is not that new. The same results can be achieved with the n-base entropy, which has a much more solid basis than "Zipf's law".

Also, you can get Zipf distributions with completely random sequences. So a "language" in junk DNA is a very far-fetched hypothesis. I am not saying that it is wrong, only that the evidence for a "language" is very weak. The junk DNA is there for some reason, and there are more interesting hypotheses about why we have accumulated DNA that is of no use.

The reason for Zipf's law in random sequences is different from the reason in texts (well, as far as I understood W. Li's paper). Zipf's law in texts may be important when you know that you have a "real" text and want to compare "distances" between Zipf plots. (I think a reference for that is a short paper in Physical Review Letters E, by S. Havlin. If anyone is interested I can look up the reference.)
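The claim that random sequences also yield Zipf-like rank-frequency curves is easy to try. The sketch below is in the spirit of the "random typing" argument associated with W. Li, but its details (four letters plus a space, all equally likely, 200,000 characters, fixed seed) are arbitrary choices for illustration. Every word of a given length is equally likely, so frequency falls in steps as rank, and with it word length, increases.

```python
import random
from collections import Counter

def random_text(n_chars, letters="abcd", seed=0):
    """'Monkey typing': each character is a letter or a space, all equally likely."""
    rng = random.Random(seed)
    symbols = letters + " "
    return "".join(rng.choice(symbols) for _ in range(n_chars))

def rank_frequency(text):
    """Words sorted from most to least frequent, as (word, count) pairs."""
    return Counter(text.split()).most_common()

ranked = rank_frequency(random_text(200_000))
for rank in (1, 2, 5, 10, 50, 100):
    word, freq = ranked[rank - 1]
    print(f"rank {rank:>3}: {word!r:8} {freq}")
```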
1996/7/1, posted by Jacques Guy

If I am to believe the comparative historical work of Larry Trask on Basque (and I believe it) Ancient Basque would have been a candidate for voynichese.

1996/7/1, posted by Dennis J. Stallings

This is interesting. A long time ago you suggested that Voynichese might be a pre-Indo-European survival in Europe, like Basque, Etruscan, or Pictish. Such a language might have survived into the Middle Ages in a small, isolated pocket. This pocket could have harbored a subculture unnoticed by history that produced the VMs. That's a logical scenario. Basque is not that small a pocket but is isolated.

If the underlying language is Basque or maybe Etruscan, we have a chance of decrypting the VMs. Of course, the underlying language could be completely extinct, leaving no trace in modern times.

1996/7/12, posted by Rene Zandbergen

Although I totally believe that supposedly lost cultures have been able to continue to exist, the scenario where one or a few individuals continued a style known to them 'while the rest of the world had a renaissance' is a more plausible explanation for the Voynich Ms. (IMHO, of course). Mind you, the Basque/Aquitanian/Iberian theory should definitely be pursued. (Jacques' Salir Salirbosita Salibos has such a nice Voynichese ring to it :-)). Robert Firth also once pointed to Spain as one of the good candidate countries of origin, and the Arab connection is another strong point.

1996/7/12, posted by Dennis Stallings

Another of my pet hypotheses is that the VMs originated in Eastern Europe. That is an area that is less known to Western European scholarship and where the subculture that produced the VMs might have more easily passed unnoticed. Consider too where the VMs first appeared in history: in Rudolf II's Prague. I like D'Imperio's idea that the original components of the Voynich alphabet are early Arabic numerals and medieval Latin abbreviations. If you accept that, that points to areas using the Latin alphabet: Poland, Czech/Slovakia, Hungary. If you accept Cyrillic or Arabic as possible bases, then Russia and Ukraine are right next door for Cyrillic, and I believe that the Turks were in Hungary at the time for Arabic.

1997/11/12, posted by Jorge Stolfi

(0) The VMs is written in cypher. I will leave this hypothesis to the crypto experts to explore.

(1) The Voynich "words" are syllables; the two classes of letters defined above are basically the vowels and consonants. There are about 10-12 significant prefixes and about 20 significant suffixes, which offhand seems right for many languages, including English (12 vowel sounds, a couple dozen vowel clusters).

The number of consonants seems a bit too high: around 20 "simple" consonants, plus a long tail of consonant pairs.

(2) As (1), but for a tonal language like Chinese or Vietnamese. The difference is mainly that some of the letters (soft ones, presumably) would have to indicate the tones. This alternative has the merit that, in Chinese, the syllables are indeed the natural unit of text. On the other hand, the "V" syllables may be hard to explain (unless some of my "soft" letters are actually consonants).

(3) Voynichese is an agglutinative language like Turkish: the "hard" letters are the stem of the word, and the soft letters are modifying affixes.

(4) Voynichese is a Semitic language like Arabic or Hebrew; the prefix, midfix, and suffix correspond to the three basic consonants, and attached vowels.
