Line and paragraph as structural unit (Noise or data ?)

Date: Thu, 22 May 2003 08:30:33 +1000
From: Jacques Guy <jguy@alphalink.com.au>
Subject: VMs: Line and paragraph as structural unit (Noise or data ?)

>Which brings us back to the phenomenon of the line and paragraph as a
>structural unit. It was Currier who noticed this, but neither he nor
>anyone since has explained why this should be so.

I cannot explain why this should be so, but I can explain how
this could be so (I actually might have some years ago. The
disheartening thing about the VMs is that, whatever new ideas
I have come up with lately, I have found already in the archives
long ago, often suggested by myself: I had just forgotten).

The paragraph as a structural unit. It is reasonable to think
that a paragraph contains at least one whole sentence, and
therefore ends in a whole sentence. Many languages make use of
sentence-final particles, others put their verbs at the end.
In the first group Chinese, especially Classical Chinese;
Japanese, Korean. In the second group Japanese, Korean, Burmese,
Hindi, Malayalam, and many more I do not know about.

As for paragraph beginnings, I guess that the structural feature alluded
to is the regular presence of a gallows. The simplest explanation
provided here was that it is ornamental, as the first letter of
chapters in medieval manuscripts is usually ornamental.

The line as a structural unit. This again is in the archives.
We may imagine that the scribe is "thinking aloud" (perhaps
even saying aloud) what he is writing. Arriving at the end of
a line (a physical line, on paper), there is a tendency to
pause (try it). If the language used has external sandhi, this
pause prevents the end of the word at the end of the line from
merging with the next word. The result is that the letters at
the ends of lines (and the beginnings of lines) have a different
frequency distribution than elsewhere in the lines.
Many languages have external sandhi to varying degrees of
complexity. Sanskrit is one, at the complex end of the
spectrum. Korean is another, a bit less complex. Many Chinese
dialects have it, about as complex as Korean. French has
it too, Modern French not much, but French of the 1800's and
more so French of the 16th century, in the form of "liaisons".
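
The claim about line ends can be checked mechanically. The following is a minimal Python sketch, not anything taken from the archives: it separates line-final characters from all other characters and counts each group. The function names and the three sample lines are invented for illustration; a real test would need a full EVA transcription.

```python
from collections import Counter

def positional_profile(lines):
    """Count line-final characters separately from all other characters."""
    final, other = Counter(), Counter()
    for line in lines:
        chars = [c for c in line if not c.isspace()]
        if not chars:
            continue  # skip blank lines
        final[chars[-1]] += 1
        other.update(chars[:-1])
    return final, other

def relative(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}

# Invented sample lines in an EVA-like spelling (an assumption, not real data).
sample = ["daiin chol dam", "okeey dar am", "qokal shedy"]
final, other = positional_profile(sample)
print(relative(final))
print(relative(other))
```

If the two printed distributions differ strongly on a real transcription, that is the line-as-unit effect; it would not by itself show that sandhi is the cause.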


From: "Philip Neal" <philipneal_vms@hotmail.com>
Subject: Re: VMs: Line and paragraph as structural unit (Noise or data ?)
Date: Sat, 24 May 2003 18:07:05 +0000

We have indeed discussed this before, to my mind inconclusively.
Whenever any one feature of Voynichese is mentioned, somebody is sure
to point to a type of natural language which displays a similar feature. The
problem is that the several explanations given never
amount to a consistent overall story. Thus:

Voynichese has low entropy.
"It must have a restricted set of phonemes like Hawaiian."

Many words have the same initial sequence and a different ending.
"Voynichese must have inflectional morphology like Russian."

Many words have the same final sequence and a different initial.
"Voynichese must have initial mutations like Welsh."

Many words have the same initial and final sequence but are different
"Voynichese must have vocalic ablaut like Arabic."

The EVA vowels <o> <e> <ee> <a> <y> commonly occur in that order in a word.
"Voynichese must have vowel harmony like Finnish."

Certain characters are restricted to the final position in a line.
"Voynichese must have external sandhi like Sanskrit."

The same word frequently occurs twice and thrice in succession.
"Voynichese must have repeated plurals like Malay."

Certain words seldom occur initially or finally in a line.
"Voynichese must have a strict word order like Japanese."

Short words are more common towards the end of the line.
"Voynichese must have sentence final particles like Chinese."

Certain combinations of characters are rare.
"Voynichese must have positional restrictions on phonemic contrast like

It has always rather surprised me that Jacques Guy thinks that
Voynichese is a natural language, but he knows far more languages
and far more about linguistics than I do. Can he give even tentative
answers to questions like these:

Can Voynichese be identified as an SVO, SOV or VSO language?

Is Voynichese isolating, inflectional or agglutinative? Can a single
Voynichese word represent a sequence of two morphemes? Can a sequence
of two Voynichese words represent a single morpheme?

Is it possible to identify syntactic categories? E.g. do the one-word
star labels behave like members of the same substitution class when
they occur in continuous text?

How many phonemes does the Voynichese language possess?

Do the EVA characters <a> <e> <o> <y> in fact represent vowels? If so,
are they the only Voynichese vowels? Does Voynichese display vowel

Do characters with a similar appearance represent phonemes with a
common feature? E.g. would you expect the gallows characters to be
various kinds of labial, various kinds of fricative or something like

Is the Voynichese orthography systematically defective in the manner
of the Semitic scripts?

Does the Voynichese script involve the systematic use of allographs
like Latin capital letters?

I have never seen a plausible story of Voynichese as a natural language
and until I do, I prefer to think that the MS represents an encipherment of
a well known language.

From: Jacques Guy <jguy@alphalink.com.au>
Subject: Re: VMs: Line and paragraph as structural unit (Noise or data ?)

>It has always rather surprised me that Jacques Guy thinks that
>Voynichese is a natural language

You wouldn't believe what some natural languages do, and I know
only about a pitifully small sample.

>Can Voynichese be identified as an SVO, SOV or VSO language?

I do not know how to answer this question. Rather: I cannot
see any evidence for or against. There is a fourth possibility:
Voynichese is a free word-order language, like Latin, like
Kupapunyu (Australia), like Laghu (Solomon Islands), like Finnish
(as I was told by a Finnish correspondent).

>Is Voynichese isolating, inflectional or agglutinative?

Some Chinese dialects, which are isolating, have extensive
external sandhi. Sanskrit, which is inflectional, has internal
and external sandhi (the term, at any rate, is a Sanskrit word.
It was coined when philologists discovered Sanskrit). Korean,
which is agglutinative, also has extensive internal and external
sandhi. I suspect that Voynichese is also affected by sandhi,
external sandhi at least (I won't go into the reasons here,
they're in the archives). The "Chinese theory", which I
originally did not believe in, but which I could not reject,
in the light of Jorge Stolfi's statistical evidence, makes me
think that Voynichese is perhaps an isolating language.

> Can a single
>Voynichese word represent a sequence of two morphemes? Can a sequence
>of two Voynichese words represent a single morpheme?

I hold that the spaces between words are artefacts of the
shapes of the letters. Therefore, we do not know where word
breaks are--except, of course, at the beginning and end of
paragraphs. Labels? Perhaps the labels are not words, but
.. er... "numbers", as we would label pictures A, B, C, D,
and so on. The Chinese use a set of special characters for this
purpose, each with its own pronunciation, so each one syllable.
The Voynich labels could be such a system.

>Is it possible to identify syntactic categories? E.g. do the one-word
>star labels behave like members of the same substitution class when
>they occur in continuous text?

Labels. See above.

>How many phonemes does the Voynichese language possess?

Honestly, I have no idea. Consider: in German "sch" is a
single phoneme, likewise "dd" and "ngh" in Welsh, and "ch" and
"c'h" in Breton.

>Do the EVA characters <a> <e> <o> <y> in fact represent vowels?

I am pretty sure that they do, for two reasons:

1. The application of Sukhotin's vowel algorithm suggests so.
2. Their shapes.
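
For readers who have not met it, Sukhotin's algorithm can be sketched in a few lines of Python. This follows the commonly described procedure and is not code from the archives: count how often each pair of different letters stands side by side, repeatedly promote the letter with the largest remaining adjacency sum to "vowel", and discount its neighbours. The function name and the toy word list are assumptions for illustration.

```python
from collections import defaultdict

def sukhotin_vowels(words):
    """Return the letters that Sukhotin's procedure classifies as vowels."""
    adj = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:  # the diagonal of the adjacency matrix stays zero
                adj[a][b] += 1
                adj[b][a] += 1
    sums = {c: sum(adj[c].values()) for c in letters}
    consonants = set(letters)  # every letter starts out as a consonant
    vowels = []
    while consonants:
        # sorted() only makes tie-breaking deterministic
        best = max(sorted(consonants), key=lambda c: sums[c])
        if sums[best] <= 0:
            break
        vowels.append(best)
        consonants.remove(best)
        for c in consonants:
            sums[c] -= 2 * adj[c][best]
    return vowels

# Toy input (an assumption): three English words, not Voynichese.
print(sukhotin_vowels(["banana", "panama", "canada"]))  # ['a']
```

The procedure only splits the alphabet into two classes that tend to alternate, and on small samples it can label the classes the wrong way round, so its output is a suggestion rather than proof.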

I also believe that <i> = <e> and that <ee> = <a>.

If you rummage through the archives, you will find very old
posts from me where I opine that <y> is an unstressed,
undifferentiated vowel, a schwa in other words.

And rummaging through the archives again, you'll find what
I wrote about the likely origin of this alphabet: the
Beneventan script. EVA <ee>, which I think is the vowel
"a", is the spit and image of "a" in Beneventan. Frogguy
<ct> (can't remember EVA), which Sukhotin's algorithm
identifies as a consonant, is the spit and image of
Beneventan "t".

>If so,
>are they the only Voynichese vowels?

I have argued that <ol> was a single vowel, "ou", just like
in Modern Greek and in Armenian (I did not know any Armenian
back then, so I did not mention Armenian).

>Does Voynichese display vowel

I cannot tell at all.

>Do characters with a similar appearance represent phonemes with a
>common feature? E.g. would you expect the gallows characters to be
>various kinds of labial, various kinds of fricative or something like


>Is the Voynichese orthography systematically defective in the manner
>of the Semitic scripts?

You mean Arabic, Hebrew, Phoenician, don't you? Certainly not. An obvious
reason: its entropy would be much higher, and the distances between
successive vowels as given by Sukhotin's algorithm would show a much
wider spread.
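
Both quantities mentioned here can be measured directly. The sketch below, in Python, is only an illustration under stated assumptions: `entropy` is plain Shannon entropy in bits, `conditional_entropy` is the entropy of a character given the one before it, and `vowel_gaps` lists the distances between successive vowels in a word. The set of vowels must be supplied by the caller (for instance from Sukhotin's algorithm); all names and sample strings are invented.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values())

def conditional_entropy(text):
    """H(next char | current char) = H(pairs) - H(first chars of pairs)."""
    pairs = list(zip(text, text[1:]))
    return entropy(pairs) - entropy(a for a, _ in pairs)

def vowel_gaps(word, vowels):
    """Distances between successive vowel positions within one word."""
    positions = [i for i, c in enumerate(word) if c in vowels]
    return [b - a for a, b in zip(positions, positions[1:])]

print(entropy("aabb"))                # 1.0
print(conditional_entropy("abababab"))
print(vowel_gaps("banana", "a"))      # [2, 2]
```

A script that omitted most vowels would be expected to show a wider spread of gaps than `[2, 2]`-style regular alternation, which is the comparison suggested above.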

>Does the Voynichese script involve the systematic use of allographs
>like Latin capital letters?

I think so. In fact, I am sure. Didn't I write above that EVA <i> = <e> ?
I stand by that.

VMs: New / old theory

Subject: VMs: New / old theory
From: "Petr Kazil"
Date: Sat, 19 Oct 2002 11:14:11 +0200

Remember the article about the VMS in the Dutch teenage-science magazine
"Kijk"? (May 2002)
They've posted a letter from a reader who has his own theory. I don't think
it's a valid theory, but it's funny anyway. Rough translation follows:

My compliments for the article on the VMS. I was busy with it for some time,
but if you look at the pages differently you see symbols that look like our
current alphabet. They have a characteristic thick beginning followed by a
thin curve and then a thick ending. Like if you're doing calligraphy! The
letter combination e-s-o-g is found remarkably often in the text, sometimes
preceded or followed by a decorative symbol. When we had typing lessons in
our school we had to practice by repeating the same character sequences over
and over again. Could this be a writing exercise-book, used by several
different persons? And freely decorated by some teenagers with
over-active hormones? Or will I now get the 2002 Nobel prize for Nonsense? -
Gerard Vrakking, Raalte

Paradigms Regained

Subject: VMs: Re: Paradigms Regained
From: Rene Zandbergen
Date: Tue, 15 Oct 2002 03:06:28 -0700 (PDT)

Gabriel and Dennis wrote:

D:> Which brings us back to the theme I like to harp
D:> on. Could the Voynichese 'words' be syllables of
D:> a European language like French, Italian,
D:> German, Croatian, etc.?

G:> To my taste, there are too many different words,
G:> unless some of the words represent more than one
G:> syllable.

If the VMs words are syllables from a polysyllabic
language like the ones suggested by Dennis, then
we are faced with two problems. I tend to agree
with Gabriel that we should see fewer different
words in the MS.
Monosyllabic languages have invented tones
just to avoid the 'shortage' of words that would
otherwise arise.
The second problem is: the labels in the MS are
like all other words in the MS and it wouldn't
make much sense that these are only syllables,
i.e. parts of words, not whole words.

So if the VMs words are syllables, they should
belong to a mono-syllabic language.

Also that is not without problems. There are quite
a number of VMs words which seem really too long to
be mono-syllabic, but then again, mono-syllabic
languages can (and do) have loan words which are

Code dictionaries...?

From: Nick Pelling
Date: Wed, 09 Oct 2002 13:04:19 +0100

From my research, I believe that the VMS was written in Northern Italy (influenced by both Milanese and Florentine cultures) around 1460, and *without* the aid of complex cryptography.

Given this, here's my current hypothesis about the structure of its code/cipher. I predict:
(1) It's essentially a "dressed up" Florentine number code
(2) The numbers are expressed in Roman numerals
(3) Those numerals are hidden using a mixture of steganography and stenography
(4) Gallows characters are based loosely on the idea of the Cistercian number cipher
(5) Non-dictionary words are typically anagrammed
(6) words express simple quantities
(7) Any extra letters required are simply thrown into the mix, perhaps in a verbose way

It may well be that the dictionary itself is simply encoded (perhaps in some anagrammatic or every-other-letter form) in the final section at the back. This would seem to be the simplest explanation.

Plainly, number codes can't be decoded using cipher cryptology: nor can they be decoded if you can't even read the numbers. :-) I believe that this is the reason why this hasn't been cracked.

However: while the idea of a "dressed up" number code has often been proposed on-list, are there any Italian number code dictionaries from about 1400-1500 still in existence that we could compare it against?

I'd be interested to see if they share any structural elements... for example, common words having low index values (for quick writing, similar to Morse code), etc. Or if there was a perceived upper limit to the size of those dictionaries - 50 words? 100 words? 200 words?
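
To make the idea of a number code with low index values for common words concrete, here is a small Python sketch. It is an assumption-laden toy, not a reconstruction of any historical Florentine dictionary: `build_codebook` ranks words by frequency, and `to_roman` writes the index in modern subtractive Roman numerals (a 15th-century scribe might equally have written IIII for 4). All names and the sample sentence are invented.

```python
from collections import Counter

def to_roman(n):
    """Modern subtractive Roman numeral for a positive integer."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

def build_codebook(words):
    """Give the most frequent words the lowest index (ties broken by spelling)."""
    counts = Counter(words)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: index for index, word in enumerate(ranked, start=1)}

def encode(words, codebook):
    """Replace each word by the Roman numeral of its codebook index."""
    return [to_roman(codebook[w]) for w in words]

# Invented plaintext (an assumption); a real test needs a period text.
sample = "take the root and the leaf and the seed".split()
codebook = build_codebook(sample)
print(encode(["the", "root", "and"], codebook))  # ['I', 'IV', 'II']
```

With such a scheme the commonest words come out shortest, which is the structural property one would look for in a surviving dictionary.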

Michiton oladabas

From: Jorge Stolfi
Date: Sun, 15 Sep 2002 14:27:00 -0300 (EST)

Indeed I doubt it myself... For one thing, there are those drawings on
the upper left corner, with "standard VMS" style and subject.

Also the Voynichese glyphs in look just like those
in the main text, written with the same hand and same confidence.
On the other hand the Roman letters look rather clumsy.

Rene, I believe it was you who reported that the ink of f116v looks
similar to that of the main body. This detail may be significant
because there are hints that the VMS ink is not the standard
iron-gall formula.

It is also hard to imagine why the two words , and only
those, were left "undecoded" among the rest. If the decipherer was
trying out an incomplete correspondence table, we should see a more
uniform mixture of Roman and Voynichese letters, shouldn't we?

Finally, the large "M" at the end of line 2 looks similar to the "M"s
in the zodiacal diagrams ("May" and "March"), and there is a general
resemblance between other letters too. So it seems quite possible that
they were done by the same person. Now the spellings of the month
names are quite peculiar, and almost surely they were added before the
VMS got to Rome.

Thus I would rather believe that both the month names and f116v were
written by the same person, who could read and write Voynichese
fluently, but had very limited command of Roman script and of the
(apparently European) language in which he had to write those notes.

Number encoding as central to the code...?

From: "GC"
Date: Fri, 12 Jul 2002 23:49:21 -0500

Nick wrote:
> ATM, one of the things I'm trying to determine is: what
> would the simplest
> possible solution to the Voynich look like? (As Occam's
> Razor would point
> to that being the most likely.)
> Currently, my best candidate is: a number code (where
> the numbers are
> steganographically hidden) plus a stripped-down
> supporting alphabet. That's
> where I'm working my way up from. :-)
> Justification: if (like me) you suspect that both EVA
> and EVA
> code for Roman "III", why on earth would such a tiny
> core cipherbet include
> *two* different ways of coding numbers... unless one
> was for a number code
> and the other for actual numbers?
> Comments?

Nick, I would like to comment on the two different ways of
encoding, without drawing any conclusions as to the underlying
meaning ;-)

A very abbreviated version of my theory of glyph construction is
posted, and as I re-transcribe the manuscript I'm building the
detailed version with imaged examples of my theory. Since my
theory involves groups of four glyphs, this (fortunately or
unfortunately, depending on your pet theory) falls right in with
the theory of numeric construction. A roman numeral 4 could be
written with IV or IIII, depending on your taste and the time
period, but every time you hit a multiple of 5, another numeral is
used besides 'I'.
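
The arithmetic behind this remark can be shown with a short Python sketch. It assumes the purely additive style of Roman numerals (IIII rather than IV) and is meant only as an illustration of why groups of one to four repeated strokes arise; the function name is invented.

```python
def additive_roman(n):
    """Write n with additive Roman numerals only (4 -> IIII, 9 -> VIIII)."""
    symbols = [(1000, "M"), (500, "D"), (100, "C"), (50, "L"),
               (10, "X"), (5, "V"), (1, "I")]
    out = []
    for value, symbol in symbols:
        count, n = divmod(n, value)  # how many of this symbol, and what is left
        out.append(symbol * count)
    return "".join(out)

for n in (3, 4, 5, 9, 14):
    print(n, additive_roman(n))

# The number of 'I' strokes is always n modulo 5, so it never exceeds four.
print(all(additive_roman(n).count("I") == n % 5 for n in range(1, 101)))
```

In this notation the unit stroke appears in runs of at most four, and every multiple of 5 brings in another symbol, which matches the pattern described above.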

(In this and future posts, I'll be using the convention of 'x' for
any other designated transcription than EVA, and Gabriel's
notation for EVA.) In the VMS, it is my opinion that the majority
of the character sets are built around two strokes, the 'c' stroke
and the EVA stroke. In the 'c' set, we have c, cc, ccc, and
cccc. In the set we have , , , and . One
other convention is in force, and that is the "tail" at the end of
words. The is in my estimation an 'm' with a tail at the
end of a word, as I would write it in English, and in the middle
of a word I would not add a tail to this glyph. This makes the
equivalent to an in the middle of a word. This glyph
has four distinct "tails".

The 'c', 'cc', 'ccc', and 'cccc' glyphs also have tails at ends of
words many times, and I have identified three tails in my current
transcription. I'm positive that by the time I reach 25% of the
manuscript, I'll encounter a page that relies heavily on a fourth
tail for this glyph as well. Meanwhile, the glyph has four
distinct forms, and interestingly enough, the few times this
glyph-set stands as a lone character, it most often has a "tail"
in the form of the end turned into an 'o' or a '9'. The same
applies to the "gallows/" combinations.

It occurs to me that these four units can form the basis of
several types of symbolic numbering systems, since their true
meaning is reliant on the less conspicuous "multiple of 5"
character. There is even the possibility that the two forms of
"notation" refer to numbers taken from two different pages of a
book, homophonic substitution incorporating more than one document
or page. The possibilities in this arena are endless.

We two are storing our pizza money in different jars, obviously,
but I do see the attraction of your approach. I'm putting my
pepperoni money in the "position sensitive homophonic
substitution" jar, but we're obviously seeing the same patterns
from different angles and calculating the same numbers. Kudos!

VMS numbering systems hypotheses

Subject: VMs: Re: VMS numbering systems hypotheses...
From: "Philip Neal"

The table of numerals from 1 to 100 is a plausible suggestion. The
ordering across the table is intuitive: the downwards ordering less
so (the arguments for identifying one row as 20, another as 80 are
not tremendously strong). Why should 30 and 40 be the very rare
combinations pe- and fe- while 50 is the very common She- ?

The table explains many of the frequent combinations which make
the Voynich words so repetitious, but not all of them. The transcription
of 78r simply ignores 'ol' and 'dy' and this is not satisfactory.

I am not attracted by the idea of dain, daiin etc as ounces - these
words are extremely common and I defy you to produce a text in which
'ounce' or 'oz' has a similar frequency, however broad the meaning of
the word used to be.

You suggest that 'qo' might mark out numerals used in a code book, and
this would explain why the combination is so rare in the star labels
and the marginalia. Presumably, then, the other numerals are supposed
to be used like letters in a homophonic substitution or similar scheme.
These are the lines I am working on myself. I have found that there is
no shortage of ways of converting Voynich text into numerals, but that
any given scheme turns out to be impossible to convert into a plausible
language (you always find yourself with seven consecutive consonants or
a common word like 'and' repeated three times). It is at this stage of
testing a hypothesis that the lack of long repeated sequences and the
internal structure of the line become important problems.
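
The two symptoms named here, long consonant runs and a common word repeated several times, are easy to test for automatically. The following Python sketch is one hypothetical way to screen candidate decipherments; the function names, the vowel set and the thresholds are all assumptions chosen for illustration, not part of any scheme discussed on the list.

```python
from itertools import groupby

def longest_consonant_run(text, vowels="aeiouy"):
    """Length of the longest run of consecutive consonant letters."""
    best = run = 0
    for ch in text.lower():
        if ch.isalpha() and ch not in vowels:
            run += 1
            best = max(best, run)
        else:
            run = 0  # a vowel or a non-letter ends the run
    return best

def longest_word_repeat(words):
    """Largest number of times the same word occurs in immediate succession."""
    return max((len(list(group)) for _, group in groupby(words)), default=0)

def looks_implausible(text, max_consonants=6, max_repeat=2):
    """Flag a candidate plaintext that shows either symptom."""
    words = text.lower().split()
    return (any(longest_consonant_run(w) > max_consonants for w in words)
            or longest_word_repeat(words) > max_repeat)

print(longest_consonant_run("strengths"))            # 5
print(longest_word_repeat("and and and the".split()))  # 3
print(looks_implausible("and and and the"))          # True
```

A filter like this can only reject candidates; passing it says nothing about whether a conversion scheme is right.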

All in all, this strikes me as good work. It is not the solution but it
is the kind of thing I would expect a solution to look like, and it may
be an important halfway stage. I will certainly give some further study
to it.

VMS numbering systems hypotheses

From: Dana Scott
Date: Sun, 09 Jun 2002 08:11:44 -0700

How would the 898989 sequence in the middle/right of line six in f14v and the
89890898 sequence in the seventh line be interpreted? I see a possible match to
the triplicates seen in the botanical drawing. And what are all those dots in
the first gallow? May match to the drawing as well?

Re: Red stars, yellow stars

From: Luis Vélez
Date: Fri, 10 May 2002 09:26:31 -0400

I remembered Nick saying in a private email exchange:
> I'd really love a proper pigment/vellum/pollen/binding analysis to be done
> on the VMS - I think it's incredible that none has been done to date. The
> one document with the least hard evidence! At the very least, the sparkly
> (flecked) blue pigment would have its own story to tell, I'm sure. :-)

So I turned to Professor DeLaney from Truman (see her work 'From the
Apothecary's Shelf to the Painter's Palette: Pigments in Renaissance
Florence'), and this is what she said about the blue pigment on f67r:

> As I'm sure you can understand, I'm hesitant to say anything definitive
> without seeing the pigment and page in the original. I also should add
> that I'm not a mss. specialist. Having said that, to my eye it could well
> be a pigment made from the mineral azurite, which would have been mined in
> Germany, as well as other places, during the early modern period. However,
> without seeing the page in the original, I cannot say for certain.
> You might look at the excellent series put out by the National Gallery and
> Yale University Press, entitled "Artists' Pigments: a handbook of their
> history and characteristics." It's now in 3 volumes, and I believe volume
> 1 has an extensive entry on azurite. It should list some characteristics
> you might use when looking at the page itself, as well as countries of
> origin, etc. Azurite certainly would have been widely available during the
> early modern period (however, that also means that finding the origin of
> the pigment would not necessarily help you to determine where it was
> produced).

It is very hard, and has to be smashed and then ground patiently with
mortar and pestle until it slowly and dustily turns to powder.

Then, azurite it is?

I checked "The identification of blue pigments in early Sienese
paintings by color infrared photography" by Cathleen Honiger...
and some additional material - in the end, this is what I gathered, in a

*Recipes for making artificial blue pigments are found in literature dating
from the 3rd century AD that managed to survive five hundred years of Dark
Ages to reemerge between the VIII-IX centuries in two Latin manuscripts
containing recipes for the preparation of blue pigments from both copper and

* Azurite was an important pigment in Europe from the 15th to 17th
centuries, but then vanished when Hungary, the primary source of the natural
pigment, was conquered by the Turks.

About the other candidates for blue pigments:

*The oldest synthetic pigment is known as Egyptian blue frit and was
produced by firing in a kiln a mixture of one part lime (calcium oxide) with
one part copper oxide and four parts quartz (silica). The resulting hue was
widely used in Egyptian wall paintings.

* The highest valued color pigment of the Middle Ages was ultramarine, an
intense blue pigment made from lapis lazuli collected in northeastern
Afghanistan. Being the most expensive, it was typically chosen for
portraits of the Virgin Mary, which explains the custom of showing her
always clad in blue. Ultramarine and Azurite can be hard to distinguish
without microscopy, based only on a prior knowledge of how each color should

* The iron blues are the first of the artificial pigments with a known
history and an established date of first preparation. The color was made by
the Berlin colormaker Diesbach in or around 1704. Moreover, the material is
so complex in composition and method of manufacture that there is
practically no possibility that it was synthesized independently in other
times or places. Although alchemists found the majority of colors in
minerals like malachite (green), azurite (blue), orpiment (yellow) and
realgar (orange), they extracted others from plants and even insects. One of
the Middle Ages' most distinctive pigments, kermes, from which the word
carmine derives, was extracted from a wingless insect, Kermes vermilio,
that lives on scarlet oaks around the Mediterranean.

There was also Cerulean Blue, cobalt blue and Indigo, but these would seem
unlikely candidates at first glance.

writing systems and cultural identities were mixed in Central Europe

Petr Kazil wrote:
Among the many hypotheses this one might be testable. But this would require a big detour through secondary literature. There might exist some books on vanished European (sub-)cultures. Two were discussed extensively - the Cathars and the Bogomils. Another example is the Coptic language. However I can't think of an isolated cultural center that would produce such a mature artifact - Spanish Islamists, Greek Byzantine Monks? The problem becomes even worse if you follow the hypothesis that the VMs is not an "elite" artifact but a "popular" artifact - then it must have been a large subculture. Still I would be very interested if someone produced a list of vanished European subcultures and their dates.

From: "Rafal T. Prinke"
Date: Sat, 18 May 2002 10:20:36 +0200

But note that on the whole it does not seem to be an original creation - rather a compilation (perhaps with some modifications) of general knowledge. I have just read an article (in Polish) about a newly discovered 17th c. manuscript of an alchemical treatise which was written partly in Latin, partly in Polish, and partly in Armeno-Kiptchak language using the Armenian alphabet. The author was a Pole of Armenian descent living in Lvov, which had a sizable Armenian community. It has no resemblance to the VMS - but shows how languages, writing systems and cultural identities were mixed in Central Europe.
