Line and paragraph as structural unit (Noise or data ?)

Date: Thu, 22 May 2003 08:30:33 +1000
From: Jacques Guy <jguy@alphalink.com.au>
Subject: VMs: Line and paragraph as structural unit (Noise or data ?)

>Which brings us back to
>the phenomenon of
>the line and paragraph as a structural unit. It was Currier who noticed
>this, but neither he nor
>anyone since has explained why this should be so.

I cannot explain why this should be so, but I can explain how
this could be so (I actually might have some years ago. The
disheartening thing about the VMs is that, whatever new ideas
I have come up with lately, I have found already in the archives
long ago, often suggested by myself: I had just forgotten).

The paragraph as a structural unit. It is reasonable to think
that a paragraph contains at least one whole sentence, and
therefore ends in a whole sentence. Many languages make use of
sentence-final particles, others put their verbs at the end.
In the first group Chinese, especially Classical Chinese;
Japanese, Korean. In the second group Japanese, Korean, Burmese,
Hindi, Malayalam, and many more I do not know about.

As for paragraph beginnings, I guess that the structural feature alluded
to is the regular presence of a gallows. The simplest explanation
provided here was that it is ornamental, as the first letter of
chapters in medieval manuscripts is usually ornamental.

The line as a structural unit. This again is in the archives.
We may imagine that the scribe is "thinking aloud" (perhaps
even saying aloud) what he is writing. Arriving at the end of
a line (a physical line, on paper), there is a tendency to
pause (try it). If the language used has external sandhi, this
pause prevents the end of the word at the end of the line from
merging with the next word. The result is that the letters at
the ends of lines (and the beginnings of lines) have a different
frequency distribution than elsewhere in the lines.
Many languages have external sandhi to varying degrees of
complexity. Sanskrit is one, at the complex end of the
spectrum. Korean is another, a bit less complex. Many Chinese
dialects have it, about as complex as Korean. French has
it too, Modern French not much, but French of the 1800's and
more so French of the 16th century, in the form of "liaisons".
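
The positional effect described above can be checked mechanically on any
transliteration. What follows is only a sketch: the function name
position_counts and the three sample lines are made up for illustration
(they are not real Voynich text), and spaces are simply discarded, since
whether they mark word breaks is itself in dispute.

```python
from collections import Counter


def position_counts(lines):
    """Count letters at line-initial, line-final and interior positions."""
    first, last, interior = Counter(), Counter(), Counter()
    for line in lines:
        letters = line.replace(" ", "")  # ignore spaces entirely
        if len(letters) < 2:
            continue  # too short to have distinct first and last letters
        first[letters[0]] += 1
        last[letters[-1]] += 1
        interior.update(letters[1:-1])
    return first, last, interior


def relative(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {ch: n / total for ch, n in counter.items()} if total else {}


# Made-up sample lines in an EVA-like spelling (hypothetical data).
sample = [
    "daiin chol shol dam",
    "qokeedy chor daiin otam",
    "shol chol qokal dam",
]

first, last, interior = position_counts(sample)
print("line-final:", relative(last))
print("interior:  ", relative(interior))
```

On a real transliteration one would compare the line-final and interior
distributions with a proper statistical test rather than by eye.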


From: "Philip Neal" <philipneal_vms@hotmail.com>
Subject: Re: VMs: Line and paragraph as structural unit (Noise or data ?)
Date: Sat, 24 May 2003 18:07:05 +0000

We have indeed discussed this before, to my mind inconclusively.
Whenever any one feature of Voynichese is mentioned, somebody is sure
to point to a type of natural language which displays a similar feature. The
problem is that the several explanations given never
amount to a consistent overall story. Thus:

Voynichese has low entropy.
"It must have a restricted set of phonemes like Hawaiian."

Many words have the same initial sequence and a different ending.
"Voynichese must have inflectional morphology like Russian."

Many words have the same final sequence and a different initial.
"Voynichese must have initial mutations like Welsh."

Many words have the same initial and final sequence but are different
"Voynichese must have vocalic ablaut like Arabic."

The EVA vowels <o> <e> <ee> <a> <y> commonly occur in that order in a word.
"Voynichese must have vowel harmony like Finnish."

Certain characters are restricted to the final position in a line.
"Voynichese must have external sandhi like Sanskrit."

The same word frequently occurs twice and thrice in succession.
"Voynichese must have repeated plurals like Malay."

Certain words seldom occur initially or finally in a line.
"Voynichese must have a strict word order like Japanese."

Short words are more common towards the end of the line.
"Voynichese must have sentence final particles like Chinese."

Certain combinations of characters are rare.
"Voynichese must have positional restrictions on phonemic contrast like

It has always rather surprised me that Jacques Guy thinks that
Voynichese is a natural language, but he knows far more languages
and far more about linguistics than I do. Can he give even tentative
answers to questions like these:

Can Voynichese be identified as an SVO, SOV or VSO language?

Is Voynichese isolating, inflectional or agglutinative? Can a single
Voynichese word represent a sequence of two morphemes? Can a sequence
of two Voynichese words represent a single morpheme?

Is it possible to identify syntactic categories? E.g. do the one-word
star labels behave like members of the same substitution class when
they occur in continuous text?

How many phonemes does the Voynichese language possess?

Do the EVA characters <a> <e> <o> <y> in fact represent vowels? If so,
are they the only Voynichese vowels? Does Voynichese display vowel

Do characters with a similar appearance represent phonemes with a
common feature? E.g. would you expect the gallows characters to be
various kinds of labial, various kinds of fricative or something like

Is the Voynichese orthography systematically defective in the manner
of the Semitic scripts?

Does the Voynichese script involve the systematic use of allographs
like Latin capital letters?

I have never seen a plausible story of Voynichese as a natural language
and until I do, I prefer to think that the MS represents an encipherment of
a well known language.

From: Jacques Guy <jguy@alphalink.com.au>
Subject: Re: VMs: Line and paragraph as structural unit (Noise or data ?)

>It has always rather surprised me that Jacques Guy thinks that
>Voynichese is a natural language

You wouldn't believe what some natural languages do, and I know
only about a pitifully small sample.

>Can Voynichese be identified as an SVO, SOV or VSO language?

I do not know how to answer this question. Rather: I cannot
see any evidence for or against. There is a fourth possibility:
Voynichese is a free word-order language, like Latin, like
Kupapunyu (Australia), like Laghu (Solomon Islands), like Finnish
(as I was told by a Finnish correspondent).

>Is Voynichese isolating, inflectional or agglutinative?

Some Chinese dialects, which are isolating, have extensive
external sandhi. Sanskrit, which is inflectional, has internal
and external sandhi (the term, at any rate, is a Sanskrit word.
It was coined when philologists discovered Sanskrit). Korean,
which is agglutinative, also has extensive internal and external
sandhi. I suspect that Voynichese is also affected by sandhi,
external sandhi at least (I won't go into the reasons here,
they're in the archives). The "Chinese theory", which I
originally did not believe in, but which I could not reject,
in the light of Jorge Stolfi's statistical evidence, makes me
think that Voynichese is perhaps an isolating language.

> Can a single
>Voynichese word represent a sequence of two morphemes? Can a sequence
>of two Voynichese words represent a single morpheme?

I hold that the spaces between words are artefacts of the
shapes of the letters. Therefore, we do not know where word
breaks are--except, of course, at the beginning and end of
paragraphs. Labels? Perhaps the labels are not words, but
.. er... "numbers", as we would label pictures A, B, C, D,
and so on. The Chinese use a set of special characters for this
purpose, each with its own pronunciation, so each one syllable.
The Voynich labels could be such a system.

>Is it possible to identify syntactic categories? E.g. do the one-word
>star labels behave like members of the same substitution class when
>they occur in continuous text?

Labels. See above.

>How many phonemes does the Voynichese language possess?

Honestly, I have no idea. Consider: in German "sch" is a
single phoneme, likewise "dd" and "ngh" in Welsh, and "ch" and
"c'h" in Breton.

>Do the EVA characters <a> <e> <o> <y> in fact represent vowels?

I am pretty sure that they do, for two reasons:

1. The application of Sukhotin's vowel algorithm suggests so.
2. Their shapes.
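
For readers unfamiliar with it, Sukhotin's algorithm classifies symbols
as vowels or consonants purely from which symbols stand next to each
other, on the assumption that vowels and consonants tend to alternate.
A minimal sketch of the usual formulation follows. The function name and
the toy English word list are illustrative assumptions, not the data or
the program behind the result cited above.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the set of symbols Sukhotin's algorithm labels as vowels."""
    # Symmetric counts of how often two *different* symbols are adjacent.
    adj = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:  # the diagonal of the matrix stays zero
                adj[a][b] += 1
                adj[b][a] += 1

    # Row sums; every symbol starts out as a presumed consonant.
    sums = {s: sum(adj[s].values()) for s in symbols}
    vowels = set()
    while True:
        candidates = {s: v for s, v in sums.items() if s not in vowels}
        if not candidates:
            break
        # Highest row sum wins; sorting makes ties deterministic.
        best = max(sorted(candidates), key=candidates.get)
        if candidates[best] <= 0:
            break
        vowels.add(best)
        # Each remaining consonant loses twice its adjacency to the new vowel.
        for s in symbols - vowels:
            sums[s] -= 2 * adj[s][best]
    return vowels


print(sukhotin_vowels(["banana", "canada", "panama"]))
```

The algorithm needs no knowledge of the language, which is why it can be
applied to an unread script at all; it can also misclassify symbols when
the alternation assumption fails.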

I also believe that <i> = <e> and that <ee> = <a>

If you rummage through the archives, you will find very old
posts from me where I opine that <y> is an unstressed,
undifferentiated vowel, a schwa in other words.

And rummaging through the archives again, you'll find what
I wrote about the likely origin of this alphabet: the
Beneventan script. EVA <ee>, which I think is the vowel
"a", is the spit and image of "a" in Beneventan. Frogguy
<ct> (can't remember EVA), which Sukhotin's algorithm
identifies as a consonant, is the spit and image of
Beneventan "t".

>If so,
>are they the only Voynichese vowels?

I have argued that <ol> was a single vowel, "ou", just like
in Modern Greek and in Armenian (I did not know any Armenian
back then, so I did not mention Armenian).

>Does Voynichese display vowel

I cannot tell at all.

>Do characters with a similar appearance represent phonemes with a
>common feature? E.g. would you expect the gallows characters to be
>various kinds of labial, various kinds of fricative or something like


>Is the Voynichese orthography systematically defective in the manner
>of the Semitic scripts?

You mean Arabic, Hebrew, Phoenician, don't you? Certainly not. An obvious
reason: its entropy would be much higher, and the distances between
successive vowels as given by Sukhotin's algorithm would show a much
wider spread.
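
The entropy figures argued over in this thread are usually estimated
from character counts. A rough sketch of two such estimates is given
here: single-character entropy, and the entropy of a character given
the one before it. The function names are illustrative, and the short
strings stand in for a real transliteration.

```python
import math
from collections import Counter


def char_entropy(text):
    """Shannon entropy of single characters, in bits per character."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(text):
    """Entropy of a character given its predecessor, in bits."""
    pairs = Counter(zip(text, text[1:]))  # counts of adjacent pairs
    firsts = Counter(text[:-1])           # counts of the leading character
    n = len(text) - 1
    return -sum(c / n * math.log2(c / firsts[a]) for (a, _), c in pairs.items())


print(char_entropy("aabb"))
print(conditional_entropy("aabb"))
```

A perfectly predictable alternation such as "abababab" gives a
conditional entropy of zero; dropping the vowels from a text would
normally push the conditional figure up, which is the point made above.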

>Does the Voynichese script involve the systematic use of allographs
>like Latin capital letters?

I think so. In fact, I am sure. Didn't I write above that EVA <i> = <e> ?
I stand by that.

1997/09/19, posted by Jorge Stolfi

I have asked around for the month names in Occitan (a group of languages from Southern France that includes Provençal, Gascon, etc.). I got the following responses so far:

Gascon(*)       Std. Occit.(*)  Toulon(**)
--------------  --------------  -------------
genièr          genièr          genier
heurèr          febrièr         febrier
març            març            març
abriu           abril           abriéu
mai             mai             mai
junh            junh            junh
julhet          julhet          julh[1]
agost           agost           august[2]
setémer         setembre        setembre
octòbre         octòbre         octóbre
navémer         novembre        novembre
decémer         decembre        decembre

1996/4/12, posted by Guy Thibault

On the foldout with the 3 circles (the rightmost having the 'pleiades' in it), look for the same label... You will find the same label (taurus?) near 5 o'clock on the leftmost circle and near 3 o'clock (more inside) in the middle one. It might be taurus, since it is repeated in the rightmost circle and is even linked to the pleiades by a line...

With a star finder, assuming a northern hemisphere, near 45° latitude, it would seem that the leftmost circle is the winter night sky (or summer morning sky) viewed facing south. The middle circle would be either summer night sky or fall morning sky...

The problem is that these labels are also repeated in the circles with the nymphs... If the figures are indeed months, then the author must be calling months by their associated astrological sign...

One problem with this is that there are more than 12 different labels, whatever they are! I tried regrouping all the different labels and re-drawing the circles with simple letters instead, to see more clearly where a label is repeated... This might bring more ideas...
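
The regrouping described in the last paragraph (replacing each distinct
label with a simple letter so that repeats stand out) could be done
along these lines. This is only a sketch: the function names are
invented, and the label strings are placeholders, not readings from the
manuscript.

```python
from string import ascii_uppercase


def relabel(circles):
    """Replace each distinct label with a letter, in order of first use."""
    codes = {}
    relabelled = []
    for circle in circles:
        row = []
        for label in circle:
            if label not in codes:
                # Assumes at most 26 distinct labels.
                codes[label] = ascii_uppercase[len(codes)]
            row.append(codes[label])
        relabelled.append(row)
    return relabelled, codes


def repeated_codes(relabelled):
    """List the letters that occur in more than one circle."""
    seen_in = {}
    for index, circle in enumerate(relabelled):
        for code in circle:
            seen_in.setdefault(code, set()).add(index)
    return sorted(code for code, where in seen_in.items() if len(where) > 1)


# Placeholder labels for three circles (hypothetical data).
circles = [
    ["label_1", "label_2", "label_3"],
    ["label_4", "label_2"],
    ["label_2", "label_5", "label_1"],
]

relabelled, codes = relabel(circles)
print(relabelled)
print(repeated_codes(relabelled))
```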

1996/4/15, replied by Rene Zandbergen

All these labels in the zodiac.... most of them start with OP- or OF- (but not all, which is very interesting in itself), and after that apparently any combination of A, O, E, 9, J (and maybe one or two more that I forgot). To me they look like numbers. In my little spare time I'm trying to figure out if they could be dates. The OP- or OF- could be the year part (14xx or 15xx) or the end of the month name or many other things.

The alternate labels could be 'special' dates (Christmas, St. Valentine's day). Whenever such an occurrence (OP- or OF-) appears in the body text, and there is some ambiguity, it may be preceded by '4' to indicate it is not a number/date but a word?

1993/09/17, posted by Robert Firth


Again, back to the Zodiac. The number of nymphs evidently matters - if you turn from folio to folio, you can see the scribe trying different patterns to fit them in (10+19, 8+16+6, 10+20...), and often having to scrunch them up, or spread them out, near the end of a ring. (By the way, that suggests these folios were not a fair copy, but at least in layout were originals.)

But, what calendar has one month of 29 days (Pisces) and then nine months of thirty days? None at all. Maybe Pisces is a mistake, and all the months have 30 days. But that's the Pharaonic Egyptian calendar, and why would anyone revive it? And why have it beginning with Pisces - the Egyptian year began in our July, with the heliacal rising of Sirius.

So, maybe this is astrological rather than calendric. But, again, why begin the calendar with Pisces and not Aries? That I find very, very odd - because, you see, it's right: the Vernal Equinox does fall in Pisces - about 2 days in at present, and in 1350 about 11 days in. Why, alone of all works of Western astrology, is the Voynich Zodiac true to the stars? Who in that age even remembered the precession of the equinoxes, discovered by Hipparkhos of Nicaea but, as far as I know, forgotten until Tycho de Brahe?
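
The two figures quoted above (about 2 days in at present, about 11 days
in for 1350) can be checked against each other. The sketch below assumes
a precession rate of roughly 50.3 arcseconds per year, takes "at
present" to be 1993 (the date of the post), and converts degrees to days
using the Sun's mean motion along the ecliptic. The constant and
function names are illustrative.

```python
PRECESSION_ARCSEC_PER_YEAR = 50.3   # approximate rate of precession
DAYS_PER_TROPICAL_YEAR = 365.2422   # approximate


def precession_shift_days(year_from, year_to):
    """Days of solar motion equal to the precession between two years."""
    degrees = (year_to - year_from) * PRECESSION_ARCSEC_PER_YEAR / 3600.0
    sun_degrees_per_day = 360.0 / DAYS_PER_TROPICAL_YEAR
    return degrees / sun_degrees_per_day


shift = precession_shift_days(1350, 1993)
print(round(shift, 1))
```

The result is a shift of about nine days, which agrees with the
difference between the two quoted figures (11 - 2 = 9). It says nothing
about where the boundary of Pisces itself should be drawn.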

(No, wait a minute, there's a Middle English rime about the Great Year - Graves quotes it in The White Goddess - so maybe this knowledge just went underground.)
And why are Aries and Taurus divided into light and dark halves, each with 15 nymphs, stars, or days?


Jacques counters this emphatically, i.e. not French but perhaps a Slavic language, based on the "Octember".

1997/09/18, posted by Jacques Guy
We have been through all this several times around. Plus, there is that "Octember" for October, and, a long time ago, someone here said that it was typically Slavic.

1997/09/18, posted by John Grove
For the months I can make out marc, abril, may (the light one has a circumflex accent above the y), octebre, and novebr.. I suppose you could compare them to Russian (a little): Marta, Avril, Maj (a character that does in fact have a line over it in its Cyrillic equivalent), OctYaBr' & NoYabr' (the Ya is a single character that looks like a backwards R in the Cyrillic alphabet). However, I think that one could place the spellings into a large number of the European languages... Mars, Avril, Mai (French)... Marzo, Abril, Mayo (Spanish)... Marco (with a cedilla), Abril, Maio (Portuguese), etc.
Voynich calendar

1992/02/20, posted by Robert Firth


(1) Two of the zodiac emblemata are unusual. As he points out,
Scorpio isn't a scorpion; my guess is a salamander. And
Gemini seems to be a male/female pair, which is wrong: in
classical mythology, they're Castor and Pollux.

(2) The names of the houses are identical in greek and latin,
except that libra ("scales") is zygos ("yoke"). The decans
(three per sign) are, as he suspects, coptic.

(3) If (a big if) the calendar is 12 months of 30 days each, then
it's egyptian, fer sure. That was the calendar Sosigenes took
to Julius Caesar; it was a roman idea to make the months
uneven. However, the starting point is then wrong: the egyptian
religious year began at the winter solstice, with the five
intercalary days; the administrative year began, of course,
with the nile flood.

(4) The semitic month names are from the new babylonian calendar.

Babylonian    Hebrew

Nisanu        Nisan
Ayaru         Iyyar
Simanu        Sivan
Dumuzu        Tammuz
Abu           Av
Tashritu      Tishri
Arakhsamna    Heshvan
Kislimu       Kislev
Tebetu        Tevet
Shabatu       Shevat
Adaru         Adar

Nisanu 1 was the new year. However, I feel any correspondence is unlikely, since this was a lunar calendar, with 7 intercalary months added every 19 years in the pattern of the Metonic cycle.
A better guess, I think, would be the lunar/solar calendar
described in the Book of Enoch, which I'll look up and post.

(5) Finally, note that the month names were added by a later hand, and almost certainly by using the western astrological mapping of sign into month. That mapping has been wrong for over two
thousand years, because of the precession of the equinoxes.
It's the obvious guess, but we shouldn't assume it's right.
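
The Metonic pattern mentioned in point (4), 7 intercalary months in
every 19 years, works because 235 lunar months come out very close to 19
solar years. A quick check of that arithmetic, using approximate modern
mean values for the synodic month and the tropical year (the constant
names are illustrative):

```python
SYNODIC_MONTH_DAYS = 29.530589   # mean lunar month, approximate
TROPICAL_YEAR_DAYS = 365.2422    # mean solar year, approximate

# 19 years of 12 lunar months, plus 7 intercalary months.
months_in_cycle = 19 * 12 + 7

lunar_days = months_in_cycle * SYNODIC_MONTH_DAYS
solar_days = 19 * TROPICAL_YEAR_DAYS
difference_hours = (lunar_days - solar_days) * 24

print(months_in_cycle)
print(round(lunar_days, 2), round(solar_days, 2))
print(round(difference_hours, 1))
```

With these values the two totals differ by only about two hours over the
whole 19-year cycle.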


replied by Jacques Guy.

Is it really a calendar? If so, not ours: the months are all
30 days long. The year starts some time in March, reasonable
enough. What is the language of months? Got me. I posted a
query about it to Linguist@tamvm1. The hand in which the
month names are written is compatible with the 15th-century
garb of the crossbowman. Why aren't there any obvious month
names in Voynichese? The crossbowman covers part of an
inscription. Does that mean that the Voynich is 15th-century
at the latest? I'd say so. Why are April and May split into
two sub-months of 15 days each? Each "day-figure" has a
caption. Names of patron saints? For each "month" Petersen
lists the zodiac sign in Latin, Greek, and Hebrew, and gives
a name to each decan, which seems to me to be Coptic or
Ancient Egyptian (written in Greek letters).


Karl Kluge says: "if this is some sort of idealized calendar with uniform 30 day months...". Not necessarily idealized. The Mayas had a civil calendar, the "haab", of 18 uniform months of 20 days plus 5 extra days, distinct from their religious calendar, the "tzolkin", of 260 days. I vaguely remember that the Egyptians had a calendar of 12 months of 30 days --- vaguely, because all my references are at home. The Romans... what did they have? Anyway, the Voynich "calendar" does not seem to be lunar. If it were, we would have 29 and 30 days alternating pretty regularly. The interesting part is the month names in that European language. "Octember" ought to be a dead giveaway. "Yony" is reminiscent of Modern German "Juni", and "jollet" of French "juillet". "Abril", on the other hand, is Spanish!

1992/1/7, posted by Robert Firth


Zodiac: D'Imperio gives the count of figures, and every month has 30 (except poor pisces, with 29). That strongly suggests these figures have a calendrical basis, so if they are astrological they aren't about ephemerides or horoscope casting; they're more likely to be "Rudolph's Lucky Days for 1607". Incidentally, there are a lot of similarities between the labels on the piscean nymphs and the labels on the drawings of f82v. Otherwise, I might conjecture that the names are star names, and that most of them begin with "oqp-" because most star names begin with "al-".

Astronomical: here also the two examples I have seem calendric. The diagram on f68r is a circle divided into eight segments, four with single stars and four with groups of stars. The text starts in the middle of the NW segment, which has the largest star. I think this stands for "spring", the eight segments are for the quarter and cross-quarter days, and the diagram is oriented so you rotate it counter-clockwise to read the text. Incidentally, if the circumscribed text starts at the vernal equinox, then the top is approximately May Day and the bottom approximately All Saints Day; the connotations with "witchcraft" are clear.
