posted by ぶらたん at 21:32| Comment(0) | Other


On 30 Oct 2001, at 11:37, Zachary Owen wrote:

> I seem to recall someone stating that there were no numbers
> found in the VMS, I'm not too sure about letter frequencies, so I was
> wondering how this conclusion was reached.

2001/10/31, posted by Gabriel Landini

I had a look at the word patterns in EVA and could not find any obvious EVA characters behaving exclusively as Roman numerals.

This assumed:
1. the numbers from I to III should each appear at least once
2. the transcription was correct
3. the eva alphabet is the correct coding of the vms
4. the numbers were written as single words
5. the numbers were coded with a simple substitution (not
considering equivalent characters).
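Assumptions 1, 4, and 5 together suggest a concrete test one could run over a transcription: under a simple substitution, the numerals I, II, III would surface as words consisting of one repeated character. A minimal sketch (the word list here is a made-up stand-in, not the real EVA transcription):

```python
from collections import Counter

# Made-up stand-in for a word list taken from an EVA transcription.
words = ["daiin", "ol", "r", "rr", "chedy", "r", "qokeey", "ll", "shedy"]

# Words that are runs of a single repeated character (candidates for I, II, III, ...).
runs = Counter(w for w in words if len(set(w)) == 1)

# A character behaves like a Roman "I" only if x, xx and xxx all occur as words.
candidates = [c for c in {w[0] for w in runs} if all(c * k in runs for k in (1, 2, 3))]
print(candidates)
```

On this toy list no character qualifies, which is the same negative result described above.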

I can repost the text; otherwise it may be found in the mail archive (around 1998?).

Also note that some EVA characters are very close to arabic numerals: r~2, l=mediaeval 4, d=8. This prompted Dennis Mardle to suggest that <olld> in folio f66r.23 could mean 1448.

2001/11/2, posted by Dennis Stallings
I don't remember where it is, but in the VMs there's a diagram of a circle with 5 degree increments marked out. Each 90 degree quadrant has the 5 degree increments marked with the same sequence of VMs characters. These seem like good candidates for VMs numbers. Surely someone remembers where this diagram is.
posted by ぶらたん at 21:04| Comment(0) | Other

medieval French would be a good candidate

2001/4/26, posted by Dennis Stallings

Jorge once said, "Many of us believe that Voynichese is a monosyllabic language in a complex script". To that I add: it may be such a representation of a common European language broken into syllables, i.e. the words are actually syllables of a common European language.

I think medieval French would be a good candidate for this. Consider:

1) In spoken medieval and modern French, individual words are not stressed separately; the stress on each syllable is about the same.

2) French poetry is not the weak-STRONG, weak-STRONG, etc. iambic pentameter of English, nor the LONG-short-short, LONG-short-short, etc. dactylic hexameter of ancient Greek (and ancient Latin under Greek influence); no, a French verse is a fixed number of *syllables*! The alexandrine, the rough equivalent of heroic couplets in English, is a rhymed couplet of two twelve-syllable lines.

3) Louis XIV's Royal Cipher was never broken in his lifetime, and records of it were lost afterward. When the late-19th-century cryptanalyst Étienne Bazeries finally broke the cipher, which was expressed in groups of three numerals, he found that French *syllables* were enciphered, not single letters.

4) I believe that by the time of the VMs' origin (ca. 1480), French had become the language of communication of Europe's upper classes. In 1298, Marco Polo dictated the story of his travels - in French.

Of course, it could be a dialect of Italian. After all, the Renaissance was going on there at the time. René noted that Vat. 1291, showing the nymphs Voynich-style, was in northern Italy at about that time. Toresella suggests Venice because of the prevalence of the "alchemical herbals" around there. Certainly Venice was a crossroads of many cultures then.

Jorge has done admirable work on the structure of all Voynich words. But let's not forget that Tiltman came up with a paradigm that explains 55-60% of Voynich "words". Even better, Robert Firth came up with a paradigm that explains 75-80% of Voynich "words":


So. Choose three French texts of ca. 1480 (Rabelais and Montaigne are later, ~ 1550, so perhaps Marco Polo?), manually break them down into syllables, and get counts of the syllables. Then compare the top 280 syllables to the 280 Voynichese "words" that fit the Firth paradigm.

Yes, Voynichese may be homophonic, offering several alternatives for a given number of syllables; thus the top 280 Voynichese words may represent the top 100 French syllables. Yes, 8000 other Voynichese words represent the remaining 20%. But, instead of an empty volume, what I suggest might give us a piece of Swiss cheese whose holes we could fill later.
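The counting step of this proposal can be prototyped mechanically. The sketch below uses a deliberately crude syllabifier (splitting at vowel groups; a real study would need proper French syllabification rules) over a made-up sample line:

```python
import re
from collections import Counter

VOWELS = "aeiouyàâéèêëîïôöùûü"

def crude_syllables(word):
    """Crude split: each syllable is a consonant onset plus a vowel group;
    trailing consonants attach to the last syllable. Not real French
    syllabification -- just enough for rough counts."""
    syls = re.findall(f"[^{VOWELS}]*[{VOWELS}]+", word)
    if not syls:
        return [word]
    syls[-1] += word[len("".join(syls)):]   # attach the coda
    return syls

sample = "le chevalier parle de la dame et de la rose"
counts = Counter(s for w in sample.split() for s in crude_syllables(w))
print(counts.most_common(3))
```

Running this over three whole ca.-1480 texts instead of one line would give the top-280 syllable table the proposal calls for.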
posted by ぶらたん at 17:43| Comment(0) | Written language


2001/3/5, posted by Gabriel Landini

I did; most repetitions involve the recipes section:
f1v # Same plant as f102r1[3,2] ? (Stolfi)
f18v # Same plant as f102r2[3,1] ? (FSG, Stolfi)
f19r # Same plant as f102v1[2,2] ? (Stolfi)
f23r # Same plant as f102r2[3,1] (FSG, GL)
f32v # Same plant as f102r2[1,2] ? (FSG, Stolfi)
f37v # Same plant as f102r1[3,1] ? (Stolfi)
f39r # Same plant as f95r2 ? (Stolfi)
f47v # Same plant as f102r2[1,1] ? (Petersen)
f48r # Same plant as f89v2[3,4] ? (Stolfi)
f47v # Same plant as f89v[1,4] ? (Stolfi)
f90v1 # Same plant as f100r[1,3] ? (Stolfi)
f96v # Same plant as f99r[4,1] ? (GL)
posted by ぶらたん at 16:13| Comment(0) | Plants


2001/2/22, posted by Adam McLean

It seems to me that many of the skilled cryptographers on this group have puzzled and worked over the Voynich now for many years and yet seem no nearer to cracking the code.

It also seems unlikely to me that someone in the 16th century could devise a code that could defeat 21st century methods.

But how else can we proceed?

I know I must sound like an old bore, always coming back to the same theme, but it seems to me that we have not yet exhausted an approach based on seeing the context of the manuscript - and relating it to other similar material. There may not be a Rosetta stone for the Voynich, but there may be some manuscripts out there that might help us see the context of the Voynich. Recently Dana Scott seems to have spent many hours surfing the net looking for images and parallels in manuscripts. A valiant effort; however, I suspect only 0.01% or less of medieval manuscript material has been scanned and placed on web sites. We really need some primary research done in libraries and special collections of such material, or to tap the knowledge of someone who has studied such material in depth.
posted by ぶらたん at 14:33| Comment(0) | Other


Antoine Casanova's research

2000/10/16, posted by Adam McLean

I have just reread Antoine Casanova's posting on 6th March 2000, based on his thesis, which reveals a structure within the individual 'tokens' in the Voynich language. These he shows as a series of rules, and from these he concludes that the language of the Voynich is not a natural language but has the characteristic signature of an artificial language.

2000/10/17, posted by Jorge Stolfi

In my view, the most significant feature of Antoine's substitution patterns is that the first letters of a Voynichese word seem to have more "inflectional freedom", while the final letters are relatively invariant. These patterns are precisely opposite to what we would expect to see in Indo-European languages (at least Romance and Germanic), where grammatical inflection usually modifies letters near the end of the word.
Presumably this is what Antoine has in mind when he says that Voynichese words are "built from synthetic rules which exclude ... natural language". Anyway, I think that this conclusion is unwarranted. After all, there are non-IE natural languages, which I do not dare to mention by name 8-), that do seem to have `substitution patterns' similar to those of Voynichese.
Thus I don't accept Antoine's conclusion that Voynichese must be an artificial language, or at best a code based on "progressive modification [similar to] the discs of Alberti". It cannot be just some IE language with a funny alphabet, sure; but we already knew that.
I find it interesting also that his analysis yields a very anomalous pattern for n = 8, namely P_8 = ( 6 8 1 2 3 4 7 5 ). While that pattern may be just a noise artifact, it may also be telling us that the rare 8-letter words are mostly the result of joining a 2-letter word to a 6-letter one.
I am not sure what to make of Antoine's rules for generating P_n from P_{n+1}. For one thing, they seem to be a bit too complicated given the limited amount of data that they have to explain. Moreover, the counts s_2, ..., s_{n-2} seem to be fairly similar, and the differences seem to be mostly statistical noise; therefore, their relative ranks do not seem to be very significant. Indeed, applying Antoine's method to Currier's transcription we get P_6 = ( 1 4 2 6 5 3 ), whereas from Friedman's we get P_6 = ( 1 5 2 4 6 3 ). Moreover, the latter would change to P_6 = ( 1 5 3 4 6 2 ) if we omitted just two words from the input text.
But the main limitation I see in Antoine's method is that he considers the absolute position of each letter in the word to be a significant parameter for statistical analysis. I.e., he assumes implicitly that an n-letter word contains exactly n "inflectional" slots, each of them containing exactly one letter. This view seems too simplistic when one considers the patterns of inflection of natural languages, where each morphological "slot" can usually be filled by strings of different lengths, including zero. To uncover the inflection rules of English, for example, one would have to compare words of different lengths, because the key substitution patterns are

dog / dogs / dog's / dogs'
dance / dances / danced / dancing / dancer / dancers / ...
strong / stronger / strongest / strongly

and so on.

Another problem of Antoine's method is that the most important structural features of words in natural languages are usually based on *relative* letter positions, and may not be visible at all in an analysis based on absolute positions. For example, in Spanish there is a particularly strong alternation of vowels and consonants, so that if words were aligned by syllables one would surely find that the "even" letter slots have very different substitution properties than the "odd" slots. But since Spanish words may begin with either vowel or consonant, and may contain occasional VV and CC clusters, the 3rd and 4th letters in a 6-letter word should be about as likely to be VC as CV; and, therefore, will probably have very similar substitution statistics.

Indeed, aligning words letter-by-letter is a bit like classifying fractional numeric data like 3.15 and -0027 into classes by the number of characters, and then analyzing the statistics of the ith character within each class, without regards for leading zeros, omitted signs, or the position of the decimal point. While some statistical features of the data may still have some visible manifestation after such mangling, we cannot expect to get reliable and understandable results unless we learn to align the data by the decimal point before doing the analysis.
posted by ぶらたん at 21:07| Comment(0) | Other

Meaningful vs. meaningless

2000/1/23, posted by Rene Zandbergen

In order to have real, strong evidence that the VMs contains meaningful text, we need to know how one can create a 'meaningless' text that still exhibits the same properties as meaningful text. More to the point: we need to find a mechanism that could have been applied 400-500 years ago.

Jacques already pointed out that we don't actually know how to define meaningful and meaningless. This may well prove to be a serious problem. When trying to generate meaningless texts which the LSC would classify as meaningful, or vice versa, we're likely to end up in the no-man's land bordering the two.... Take a meaningful text and start removing words (every 10th, every 2nd, at random...). When does the text stop being meaningful? How does the LSC curve behave?

2000/1/24, posted by Jorge Stolfi

Consider that an ideal text compression algorithm should take "typical" texts and turn them into random-looking strings of bits. Of course this transformation preserves meaning (as long as one has the decompression algorithm!); but, for maximum compression, the program should equalize the bit probabilities and remove any correlations. Modern compressors like PKZIP go a long way in that direction. The compressed text, being shorter than the original, will actually have more meaning per unit length; but it will look like perfect gibberish to LSC-like tests.
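The claim that compressors equalize symbol statistics is easy to check, with zlib standing in for PKZIP's DEFLATE-style compression:

```python
import math
import zlib
from collections import Counter

def bits_per_byte(data):
    """Shannon entropy of the byte histogram, in bits per byte (max 8)."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

text = ("the quick brown fox jumps over the lazy dog " * 200).encode()
packed = zlib.compress(text, 9)

print(len(packed) < len(text))                      # shorter: more meaning per unit length
print(bits_per_byte(packed) > bits_per_byte(text))  # flatter, more random-looking histogram
```

The compressed stream is both much shorter and much closer to a uniform byte distribution, so an LSC-like test run on it would see gibberish.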

Or, consider a meaningful plaintext XORed with the binary expansion of pi. The result will have uniform bit probabilities, and no visible correlations; but it will still carry the original meaning, which can be easily recovered. It would take a very sophisticated algorithm (one that knows that pi is a "special" number) to notice that the text is not an entirely random string of bits.
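A sketch of the same idea, with a deterministic SHA-256 keystream standing in for the binary expansion of pi (computing pi's actual bits is beside the point here):

```python
import hashlib

def keystream(length, seed=b"pi stand-in"):
    """Deterministic pseudo-random bytes; a stand-in for pi's binary expansion."""
    out = bytearray()
    block = 0
    while len(out) < length:
        out += hashlib.sha256(seed + block.to_bytes(8, "big")).digest()
        block += 1
    return bytes(out[:length])

msg = b"a meaningful plaintext"
ct = bytes(a ^ b for a, b in zip(msg, keystream(len(msg))))

print(ct != msg)  # the ciphertext looks like random bytes...
print(bytes(a ^ b for a, b in zip(ct, keystream(len(ct)))) == msg)  # ...but the meaning is recoverable
```

Unless a test somehow knows the keystream is "special", the XORed text is indistinguishable from random bits, exactly as argued.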

So the LSC and possible variants are not tests of `meaning' but rather of `naturalness.' They work because natural language uses its medium rather inefficiently, but in a rather peculiar way: it uses symbols with unequal frequencies (a feature that mechanical monkeys can imitate), but changes those frequencies over long distances (something which simple monkeys won't do).
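A "mechanical monkey" of the kind described, one that imitates single-letter frequencies but nothing long-range, is only a few lines:

```python
import random
from collections import Counter

def letter_monkey(sample, n, seed=0):
    """First-order monkey: emit n characters drawn with the same
    single-letter frequencies as the sample text."""
    rng = random.Random(seed)
    letters, weights = zip(*Counter(sample).items())
    return "".join(rng.choices(letters, weights=weights, k=n))

gibberish = letter_monkey("the quick brown fox jumps over the lazy dog", 60)
print(len(gibberish))
```

Its output reproduces the unequal frequencies, but because every character is drawn independently, those frequencies stay flat over long distances, which is what the LSC exploits.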

However, with slightly smarter monkeys one *can* generate meaningless texts that fool the LSC; and the same applies to any "meaning detector" that looks only at the message. Conversely, one can always encode a meaningful text so as to make it look "random" to the LSC. In short, a naturally produced (and natural-looking) text can be quite meaningless, while a meaningful text may be (and look) quite unnatural.
posted by ぶらたん at 10:54| Comment(0) | Other


LSC (Letter Serial Correlation)

2000/1/23, posted by Mark Perakh

The LSC test revealed in the VMS features identical to those of the meaningful texts we explored. On the other hand, if we assume that each Voynichese symbol is a letter, then the letter frequency distribution in the VMS is much more non-uniform than in any of the 12 languages we tested. Furthermore, in one of my papers you can see the LSC results obtained for gibberish which I created by hitting (supposedly randomly) the keys on a keyboard. It has some features of a meaningful text, but also some subtle differences from meaningful texts. You probably noticed that my conclusion was that, if we rely on LSC data, the VMS can be either meaningful or the result of a very sophisticated effort to imitate a meaningful text, in which even the relative frequencies of vowels and consonants have been skilfully faked. I can hardly imagine such an extraordinarily talented and diligent forger, so I am inclined to guess the VMS is a meaningful text, but some doubts remain. Moreover, if the VMS symbols are not individual letters, all the LSC results hang in the air.

2000/1/15, posted by Gabriel Landini

I think that the LSC depends heavily on the construction of words, but I also think that word construction (because of Zipf's law) depends heavily on a sub-set of the word pool.

Long-range correlations in codes were discussed for DNA a couple of years ago in very prestigious journals like Nature and Science, but to date I do not think that anybody has a convincing theory or explanation of the meaning and validity of the results.

Really, what is the relation (in any terms) between a piece of text and another many characters away? What is the large-scale structure of a text? That would mean that there are events at small scales and also at larger scales. I can imagine that up to the sentence level or so there may be patterns or correlations (what we call grammar?), but beyond that, I am not sure. Think of a dictionary: there may not be any structure beyond one sentence or definition (still, Roget's Thesaurus conforms to Zipf's law for the more frequent words). Consequently I see no reason why there should be any large-scale structures in texts. (I may be very wrong.)

2000/1/16, posted by Mark Perakh

My comments related only to the question whether or not we can expect the LSC to distinguish between meaningful and monkey texts. I believe the behavior of monkey texts from the standpoint of the LSC is expected to be quite similar to that of permuted texts; therefore the LSC is expected to work for monkeys as well as for permutations. I do not think the LSC will distinguish between permuted and monkey texts. This is based, of course, on the assumption that the texts are long enough that the actual frequencies of letter occurrences are quite close to their probabilities.

2000/1/17, posted by Rene Zandbergen

I agree with Gabriel that using a 3rd order word monkey would be even more interesting in terms of checking the capabilities of the LSC method in detecting meaningful text. On the other hand, getting meaningful word entropy statistics is even more difficult than getting 3rd order character entropy values, so the text from a 3rd order word monkey will repeat the source text from which the statistics have been drawn much more closely than should be the case. As before, a 1st order word monkey will be equivalent to a random permutation of words, and if it is true (in a statistically significant manner) that the LSC test distinguishes between one and the other, we do have another useful piece of evidence w.r.t. the Voynich MS text.

2000/1/20, posted by Mark Perakh

I believe we have to distinguish between four situations, to wit:

1) Texts generated by permutations of the above elements (as was the case in our study). In this case there is a limited stock of the above elements, hence there is a negative correlation between the elements' distributions in chunks, and therefore it is a case without replacement (hypergeometric distribution). Our formula for Se was derived for that situation.

2) Monkey texts generated by using the probabilities of elements (letters, digraphs, etc.) and also assuming that the stock of those elements is the same as that available for the original meaningful text. In this case we again have negative correlation and it is a no-replacement case (hypergeometric), so our formula is to be used without modification.

3) Text generated as in item 2) but assuming the stock of letters is much, much larger (say 100,000 times larger) than that available in the original text, while preserving the ratios of element occurrences as in the original text. This is a case with replacement (approximately, but with increasing accuracy as the size of the stock increases). In this case our formula has to be modified (as indicated in paper 1) using the multinomial variance. Quantitatively the difference is only in the L/(L-1) coefficient, which at L>>1 is negligible.

4) Text generated assuming the stock of elements is infinitely large. In this case the distribution of elements is uniform, i.e. the probabilities of all elements become equal to each other (each equals 1/z, where z is the number of all possible elements (letters, or digrams, etc.) in the original text). In this case the formula for Se simplifies (I derived it in paper 1 for that case as an approximation to roughly estimate Se for n>1).

Quantitatively, cases 1 through 3 are very close, but case 4 produces quantities measurably (though not very much) differing from cases 1 through 3 (see examples in paper 1).
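The with- vs. without-replacement contrast in cases 1-3 is the standard binomial vs. hypergeometric variance; the exact Se formulas are in the papers, but the size of the finite-population correction is easy to see (a generic illustration of the correction, not Perakh's formula):

```python
def var_with_replacement(n, p):
    """Binomial variance of a letter's count in a chunk of n draws."""
    return n * p * (1 - p)

def var_without_replacement(N, K, n):
    """Hypergeometric variance: n draws from a stock of N letters,
    K of which are the letter in question."""
    p = K / N
    return n * p * (1 - p) * (N - n) / (N - 1)

# A chunk of n=100 letters from a text of N=40000 containing K=3000 'e's:
ratio = var_without_replacement(40000, 3000, 100) / var_with_replacement(100, 3000 / 40000)
print(round(ratio, 4))  # the finite-population correction (N-n)/(N-1), close to 1
```

For chunk sizes much smaller than the text length the two variances agree to a fraction of a percent, which is why cases 1 through 3 come out numerically so close.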

2000/1/21, posted by Jorge Stolfi

Why should the LSC work?

In a very broad sense, the LSC and the nth-order character/word entropies are trying to measure the same thing, namely the correlation between letters that are a fixed distance apart.

People have observed before that correlation between samples n steps apart tends to be higher for "meaningful" signals than for "random" ones, even for large n. The phenomenon has been observed in music, images, DNA sequences, etc. This knowledge has been useful for, among other things, designing good compression and approximation methods for such signals. Some of the buzzwords one meets in that context are "fractal", "1/f noise", "wavelet", "multiscale energy", etc. (I believe that Gabriel has written papers on fractals in the context of medical imaging. And a student of mine just finished her thesis on reassembling pottery fragments by matching their outlines, which turn out to be "fractal" too.)

As I try to show below, one can understand the LSC as decomposing the text into various frequency bands, and measuring the `power' contained in each band. If we do that to a random signal, we will find that each component frequency has roughly constant expected power; i.e. the power spectrum is flat, like that of ideal white light (hence the nickname `white noise'). On the other hand, a `meaningful' signal (like music or speech) will be `lumpier' than a random one, at all scales; so its power spectrum will show an excess of power at lower frequencies. It is claimed that, in such signals, the power tends to be inversely proportional to the frequency; hence the moniker `1/f noise'. If we lump the spectrum components into frequency bands, we will find that the total power contained in the band of frequencies between f and 2f will be proportional to f for a random signal, but roughly constant for a `meaningful' signal whose spectrum indeed follows the 1/f profile.

Is the LSC better than nth-order entropy?

In theory, the nth-order entropies are more powerful indicators of structure. Roughly speaking, *any* regular structure in the text will show up in some nth-order entropy; whereas I suspect that one can construct signals that have strong structure (hence low entropy) but the same LSC as a purely random text.

However, the formula for nth-order entropy requires one to estimate z**n probabilities, where z is the size of the alphabet. To do that reliably, one needs a corpus whose length is many times z**n. So the entropies are not very meaningful for n beyond 3 or so.

The nth-order LSC seems to be numerically more stable, because it maps blocks of n consecutive letters into a single `super-letter' which is actually a vector of z integers; and compares these super-letters as vectors (with difference-squared metric) rather than as symbols (with simple 0-1 metric). I haven't done the math --- perhaps you have --- but it seems that computing the nth-order LSC to a fixed accuracy requires a corpus whose length L is proportional to z*n (or perhaps z*n**2?) instead of z**n.

Moreover, one kind of structure that the LSC *can* detect is any medium- and long-range variation in word usage frequency along the text. (In fact, the LSC seems to have been designed specifically for that purpose.) As observed above, such variations are present in most natural languages, but absent in random texts, even those generated by kth-order monkeys. Specifically, if we take the output of a kth-order `letter monkey' and break it into chunks whose length n >> k, we will find that the number of times a given letter occurs in each chunk is fairly constant (except for sampling error) among all chunks. For kth-order `word monkeys' we should have the same result as long as n >> k*w, where w is the average word length. On the other hand, a natural-language text will show variations in letter frequencies, which are due to changes of topic and hence vocabulary, that extend over whole paragraphs or chapters.

Thus, although the LSC may not be powerful enough to detect the underlying structure in non-trivial ciphers, it seems well suited to distinguishing natural language from monkey-style random text.
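My reading of the LSC sum (the Perakh-McKay papers give the exact normalization): chop the text into chunks of n letters and total the squared differences of per-letter counts between adjacent chunks. A text whose letter frequencies drift scores high; a well-mixed text scores low.

```python
from collections import Counter

def lsc_sum(text, n):
    """Letter serial correlation sum at chunk length n (unnormalized):
    squared per-letter count differences between adjacent chunks."""
    chunks = [text[i:i + n] for i in range(0, len(text) - n + 1, n)]
    total = 0
    for a, b in zip(chunks, chunks[1:]):
        ca, cb = Counter(a), Counter(b)
        total += sum((ca[ch] - cb[ch]) ** 2 for ch in set(ca) | set(cb))
    return total

# Extreme cases: maximal frequency drift vs. perfect mixing.
print(lsc_sum("aaaabbbb", 4))  # adjacent chunks differ completely
print(lsc_sum("abababab", 4))  # adjacent chunks are identical
```

The two toy inputs have identical letter histograms; only the arrangement differs, which is exactly the kind of structure the monkey tests above cannot imitate.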

In conclusion, my understanding of the Perakh-McKay papers is that computing the LSC is an indirect way of computing the power spectrum of the text. The reason why the LSC distinguishes meaningful texts from monkey gibberish is that the former have variations in letter frequencies at all scales, and hence a 1/f-like power spectrum; whereas the latter have uniform letter frequencies, at least over scales of a dozen letters, and therefore have a flat power spectrum.

Looking at the LSC in the context of multiscale analysis suggests many possible improvements, such as using scales in geometric progression, and kernels which are smoother, orthogonal, and unitary. Even if these changes do not make the LSC more sensitive, they should make the results easier to evaluate.

In retrospect, it is not surprising that the LSC can distinguish the original Genesis from a line-permuted version: the spectra should be fairly similar at high frequencies (with periods shorter than one line), but at low frequencies the second text should have an essentially flat spectrum, like that of a random signal. The same can be said about monkey-generated texts.

On the other hand, I don't expect the LSC to be more effective than simple letter/digraph frequency analysis when it comes to identifying the language of a text. The most significant influence on the LSC is the letter frequency histogram --- which is sensitive to topic (e.g. "-ed" is common when talking about the past) and to spelling rules (e.g. whether one writes "ue" or "ü"). The shape of the LSC (or Fourier) spectrum at high frequencies (small n) must be determined mainly by these factors. The shape of the spectrum at lower frequencies (higher n) should be determined chiefly by topic and style.
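The flat-spectrum claim for random signals can be checked without any Fourier machinery: for white noise, the mean squared sum of a chunk of n samples grows like n (each doubling of scale doubles the "band power"), whereas a 1/f-like signal would show roughly constant band power. A sketch for the white-noise half:

```python
import random
import statistics

rng = random.Random(1)
signal = [rng.choice((-1, 1)) for _ in range(1 << 14)]  # white noise

def chunk_power(sig, n):
    """Mean squared chunk sum: a crude measure of power at scale n."""
    sums = [sum(sig[i:i + n]) for i in range(0, len(sig), n)]
    return statistics.fmean(s * s for s in sums)

for n in (8, 16, 32, 64):
    print(n, round(chunk_power(signal, n), 1))  # grows roughly in proportion to n
```

For independent samples the expected chunk power equals n exactly; a meaningful signal would flatten this growth, which is the 1/f excess at low frequencies described above.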

2000/1/22, posted by Jorge Stolfi

For one thing, while the LSC can unmask ordinary monkeys, it too can be fooled with relative ease, once one realizes how it works. One needs only to build a `multiscale monkey' that varies the frequencies of the letters along the text, in a fractal-like manner.
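One crude way to build such a "multiscale monkey" (my sketch, not the thread's: a random walk on the letter weights, which yields long-range frequency drift, though not a true fractal process):

```python
import random

def multiscale_monkey(alphabet, n, seed=0, drift=0.1):
    """Monkey whose letter weights wander as a random walk, so letter
    frequencies vary over long stretches of the output."""
    rng = random.Random(seed)
    weights = [1.0] * len(alphabet)
    out = []
    for _ in range(n):
        # nudge every weight, keeping each strictly positive
        weights = [max(0.05, w + rng.uniform(-drift, drift)) for w in weights]
        out.append(rng.choices(alphabet, weights=weights)[0])
    return "".join(out)

babble = multiscale_monkey("abcdefgh", 2000)
print(len(babble))
```

Unlike the first-order monkey, this one's letter frequencies measured over distant chunks differ systematically, so an LSC-style test would see "meaningful-looking" long-range variation.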

Of course, it is hard to imagine a medieval forger being aware of fractal processes. However, he could have used such a process without knowing it. For instance, he may have copied an Arabic book, using some fancy mapping of Arabic letters to the Voynichese alphabet. The mapping would not have to be invertible, or consistently applied: as long as the forger maintained some connection between the original text and the transcript, the long-range frequency variations of the former would show up in the latter as well.

Moreover, I suspect that any nonsense text that is generated `by hand' (i.e. without the help of dice or other mechanical devices) will show long-range variations in letter frequencies at least as strong as those seen in meaningful texts.

Thus Mark's results do not immediately rule out random but non-mechanical babble or glossolalia. However, it is conceivable that such texts will show *too much* long-range variation, instead of too little. We really need some samples...
posted by ぶらたん at 20:26| Comment(0) | Nature of the text



1999/1/18, posted by Jorge Stolfi

> [Takeshi:] Isn't it difficult that we assume plants and human
> have common properties?
> e.g.
> zod f70v2.S2.13 ACKV =otaldy=
> pha f101v2.R1.2 AHV =otaldy=
> By the way, I thought one person represent one day in the zodiac
> calendars. But it is not true, right? (I mean, there are 30
> women in each zodiac calendars. But some women have the same
> label.)

Well, have you seen my "Chinese theory" page? Perhaps the correct
reading of <otaldy> (as the VMS author intended it) is <chang>, but
one of those <otaldy>s is <chàng> and the other is <cháng>...

> What do their labels mean in the zodiac calendars? What do you
> think kind of property they have? Their name? their birthday?
> where they live? They have a same kind of star? who and who are
> relatives by blood and marriage? etc.

I have no satisfactory theory for what the "zodiac" diagrams and the
nymphs are supposed to be. If they indeed represent the zodiac signs,
why do they all have 30 "stars"? Why are Aries and Taurus split in

Even the zodiac symbols at the center are a bit suspect; it is
possible (although, I admit, unlikely) that the central circles were
originally empty, and the signs were added later, by someone who just
guessed they were related to the zodiac. Or perhaps the guess was made
by the VMS author himself, as he copied the diagrams from some other

If the nymphs are real or imaginary individuals (not just decoration),
then the labels are likely to be their names; in which case it is not
that strange to see repetitions.

> occurrence count
> ---------------------
> okaly H 4 (A 5)
> okoly 2
> otal dar 2
> okam 2
> okaldy 2
> okeoly 2
> okalar 2
> oteolar 2
> okeey ary 2
> otaly 2
> okal 2
> otaraldy 2

I hadn't noticed that there were so many repetitions in the Zodiac.
Very strange! Why is no label repeated three times? Is there any
pattern to these repetitions (such as position of labels in
diagram, etc?)

> Is it possible to think that <okal> or <otal> itself have a
> meanings and +<y> or +<dy>?

I wish I knew the answer....

If the language is Chinese, this is somewhat unlikely (although the
<y> or <dy> could be tone marks, and I believe that in Chinese
there are some rules that say that tone X changes to tone Y when
it comes before a word with tone Z.)

On the other hand, if the language is Chinese then those resemblances
are not surprising, and they do not mean anything: "ching" and "chi"
are not related...
posted by ぶらたん at 23:28| Comment(0) | Other


1999/1/15, posted by Jorge Stolfi


Besides <otoldy> we can look at the similar words <otaldy>, <opaldy>,
<ytaldy>, <ytoldy>, etc., which could be alternative spellings of the
same word.

I count 9 occurrences of those words as labels, and 17 as words in the text.

Here are those occurrences, extracted from the concordance I posted
recently. I have split them into labels and text, then sorted by
section and page. (I have kept only the "majority" version ("A")
of each occurrence. There were only a few dissenting votes, usually
by the FSG/SSG transcriptions.)


sec location trans occurrence
--- ------------ ------ ----------------------------------------------------
cos f67r1.S.1 ACHV =otaldy=
zod f70v2.S2.13 ACKV =otaldy=
bio f82v.L3.14 AHV =otoldy=
pha f88r.m.1 AHLV =otaldy=
pha f89r1.t.4 AHKLV =otoldy=
pha f89r2.L2.0 AHUV =otoldy=
pha f99r.L1.12 AHU =otoldy=
pha f99v.L1.1 AHUV =otoldy=
pha f101v2.R1.2 AHV =otaldy=


sec location trans occurrence
--- ------------ ------ ----------------------------------------------------
bio f78v.P.26 ACFHV shedy sol fchedy otaldy/lol *ar shr r ol
bio f79r.P.28 AFHV dai*n yteey chyteey otoldy lchey/lcheey qochey
bio f79v.P.1 AFHV olk*ry qotolol otaldy otedol or olorol/
hea f22r.P.11 ACFH yckhody qokchy oky otoldy yty dol or-dachy
hea f28r.P.7 AFH shockhy shocthy otoldy-dshor dol dar/oschotshl
hea f2r.P.6 ACFH daind-dkol sor-ytoldy-dchol dchy cthy/
hea f44v.P.1 ACFH shol tol qotshol otoldy/yolkol cheol qokchain
hea f52r.P.2 AH dar yty/oty shor ytoldy qoky koldal oteees
hea f52r.P.3 ACFH tchody qotam oky-ytoldy/lshopchy qoky qotchy
hea f53v.P.13 ACH -*dam/ycthodaiin otoldy=
hea f9v.P.9 ACFHU tor chyty dary-ytoldy/oty kchol chol
heb f43v.P.1 ACFGU r araiin otedy opoldy/shedy octhy otedy
heb f48r.P.1 AFH ykeeody olaiin opaldy/daiin yteeol choody
heb f48v.P.9 ACFGH loldy lol-otchdy otoldy ytam otedy/tol
heb f95v1.P.4 AFH qokal oty shekshey otaldy okshey ytshedy
pha f89r2.P3.8 AFHLU che* oldy sheodal ytoldy/daiin cheok o keol
str f58r.P.30 AFH chody cheol okolchy otaldy/odshchol taiin

Note that almost all occurrences of <otoldy> and friends *as labels*
are in the pharma section, and almost all occurrences *as text* are in
the herbal section. Of these, 8 are in herbal-A (all <otoldy> or
<ytoldy>), 4 in herbal-B (one each of <otoldy>, <opoldy> <otaldy>,
<opaldy>; so these may be bogus).
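Gathering such a variant family mechanically is straightforward; the character-class pattern below is just my encoding of the variants listed above, run over a made-up word list rather than the real transcription:

```python
import re
from collections import Counter

# First letter o/y, second t/p, third a/o, then -ldy (the variants above).
family = re.compile(r"^[oy][tp][ao]ldy$")

words = ["otaldy", "otoldy", "opaldy", "ytoldy", "ytaldy", "okaldy", "daiin"]
hits = Counter(w for w in words if family.match(w))
print(sorted(hits))
```

Widening the second class to [tpk] would sweep in the <okoldy> family discussed below as well.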

Note also that <ytoldy> tends to occur right after gaps in the text
due to intruding plants (marked by "-" above). I take this fact as
evidence that <y> is (always? often? sometimes?) a calligraphic
variant of <o>, used at end-of-word and sometimes at

There is one occurrence of <ytoldy> in the *text* of pharma page f89r2.
Coincidentally that is the only page with *two* occurrences of
<otoldy> as a label.

Moreover, there is some evidence that <k> and <t>, while distinct, were
interchangeable to some extent. Indeed the distribution of <okoldy>
and its variants is somewhat similar to that of <otoldy>:


sec location trans occurrence
--- ------------ ------ ----------------------------------------------------
cos f68r1.S.14 AHUV =okoldy=
zod f72r2.S2.5 AHV =okaldy=
zod f72v3.S1.18 AHUV =okaldy=
bio f82r.L2.5 AUV =okaldy=
bio f82v.L3.14 U =okoldy=
pha f88r.b.3 AHKV =ofaldo=
pha f89r2.L2.0 L =okoldy=


sec location trans occurrence
--- ------------ ------ ----------------------------------------------------
hea f18r.P.8 AFH qokchor ckhol olody okaldy-dary/chol chcthal
hea f36r.P.7 ACH -dan/qotol cthol okol dy okchy-ytorory-sold/
hea f3v.P.4 ACH **s eey kcheol okal do r chear een/y**ear
hea f54v.P.9 ACH qockhey qodal ytam okal dy/kol c*kaiin chckhy
heb f33r.P.2 ACFH ytchedy qokar cheky okaldy qokaldy otor oldar
heb f40r.P.4 ACFH okaiin okar oky okoldy ol/lokar qokar
heb f43r.P.2 ACFGHU chety dar aiir okaldy daral otchdy daiin
heb f43v.P.2 ACFHU ches***y okeody oky okaldy kchdy okar/tody
cos f57v.R1.1 AHU daram qokar okal okal d o l shkeal dydchs
bio f75r.P.27 ACFHV qoty pshar shedy okaldy-dar otar otedy
bio f75r.P.44 ACFHV okedy qokedy otedy okoldy otar otam olaiin
bio f82v.P.12 ACFHV qokchey qokain okal dy lchedam/orain shedy
pha f88r.P3.11 AHL lkeey cthol poldy s-okoldy/qokol chol qokol
pha f89v1.P1.12 AHL kaiin ykchol qockhy okalda otal dal chodar
pha f99r.P2.7 AFH cheody qokol okoly okoldy qokoly qokal okchol
pha f99v.P1.2 AFH qokchol qokeol okoldy-q*kholdy t*ly daiin/
str f105v.P.3 AFHT okair qotol dol okoldy qokedy opched oteedy
str f113v.P.48 AGH qokeeedy lkaiin okal dy/yshey teeo oteedy
str f58v.P1.10 AHU qokal* qokaiin okal okaldy ory/tchol shol
str f58v.P2.27 AHU okey okal o*aly okaldy okeor sheey=

Note that there are three <okoldy>s in the *text* of the pharma pages,
all of them on pages where <otoldy> occurs as a label (f88r, f99r,
f99v) --- one more bit of evidence for a close relationship between
<k> and <t>. (Grammatical inflection, perhaps?)
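The family of variants tabulated above (optional <q> prefix, <o>/<y> head, a gallows letter, <o>/<a>, then <ldy>) can be pulled out of a transcription line with a simple regular expression. A sketch; the pattern is my reading of the variants listed above, not an established EVA grammar:

```python
import re

# Hypothetical pattern for the <otoldy> family discussed above:
# optional <q> prefix, <o>/<y> head, a gallows (t/k/p/f), <o>/<a>, "ldy".
VARIANT = re.compile(r"\bq?[oy][tkpf][oa]ldy\b")

def find_variants(line):
    """Return every <otoldy>-like token in an EVA text line."""
    return [m.group(0) for m in VARIANT.finditer(line)]

line = "shockhy shocthy otoldy dshor okaldy qokaldy opaldy"
print(find_variants(line))   # ['otoldy', 'okaldy', 'qokaldy', 'opaldy']
```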

> Is it evidence that the manuscript itself is meaningless?

On the contrary, I think that the highly skewed distributions of
<otoldy> and <okoldy> confirm (once more) that the VMS is *not*
random text.

> I don't think these four plants are the same. Why do different
> plants have same name? ... After someone succeeds in identifying
> what language or code is in the Voynich MS, can we explain these
> repetitions? I don't think so...

The labels may indicate properties of the plants, not their names. The
properties could be the plant's usage (e.g. "poison", "tonic",
"emetic", "diuretic", "too strong", "doesn't work"), its
smell/flavor/color/size, which parts of the plant are used, how it is
prepared ("infusion", "poultice", etc.), the season for picking, the
place or date of the finding, the country where it grows, the dealer
who sells the plant, the sympathetic star, the name of the daemon
summoned by consuming the plant, etc.

Someone suggested that the labels may be meaningless tags, used
just as we would use (a), (b), (c) or (1), (2), (3), etc.

Or perhaps (ahem!...) those <okoldy>s were distinct but
similar-sounding words in an unfamiliar language, and the author was
unable to hear the difference.

In any case, I am almost convinced that the drawings in the pharma
section are "field notes", where the author recorded plants as he
"found" them; and the herbal and bio pages are later elaborations on
those notes. The pattern of <otoldy> occurrences above seems at least
compatible with this theory.

The main evidence for this theory is the fact that some pharma
drawings are repeated in the herbal pages --- enlarged and done with
more care, sometimes with fancy flowers, but in the same pose and with
the same details (i.e. definitely a copy of the same drawing, not just
an independent drawing of the same plant.)

Note that the pharma plants may have been "found" in a pharmacist's
shop, in a library, in the teachings of a master/guru/shaman/explorer,
in conversations with natives, etc. However, since many pharma plants
are unlabeled, or bear repetitive labels, I think it is slightly more
likely that they were found in the wild by the author, and he did not
know their names (except for a few plants, eg. the maidenhair fern.)

Since we are on the subject: I think that the "containers" in the
pharma pages were added after the whole section was complete, as an
afterthought. Note that they are all squeezed in the margin (except in
one instance where the container lies between two plants).

The containers could represent plant categories, of course; but
perhaps they are (also?) "thumb marks" for quick page finding...

In any case, the plants in the pharma section seem to have been sorted
by some criterion; this is inferred not only from the presence of the
containers, but also from looking at the drawings themselves. So they
probably aren't the primary field sketches, but clean copies made
sometime later at the "office".

However the relative realism of the pharma drawings says to me that
they were made by someone who had seen the plants --- which cannot be
said of the herbal drawings. In fact I would bet that the herbal
drawings were done by assistants or hired illustrators.
posted by ぶらたん at 22:59| Comment(0) | その他


1998/11/25, posted by Jorge Stolfi

--- okal (VERY COMMON) -------------------------------------------

A very common word in the VMS. It could mean "Sun", or perhaps "Moon". (Or "water"; the planet Mercury is literally "Water-Star" in Chinese... 8-)

--- opchol dy (RARE) ---------------------------------------------

The words "opchol dy" or "opcholdy" do not seem to occur elsewhere, but there are half a dozen near misses:

"otshol dy" occurs in Herbal-A text (f7v).

"qokchol dy" ditto (f18r).

"okchaldy" ditto (f23v).

"opchaldy" ditto (f45r).

"okchol do" ditto (f52r).

"ofsholdy" is a Zodiac star label (Cancer, f72r3).

"ypcholdy" is a Pharma plant label (f102v1).

"yteeoldy" mentioned in Pharma text (f101r1).

--- ytoaiin (RARE) -----------------------------------------------

My concordance finds no exact recurrences, but does find half a dozen near misses:

"ykoaiin" in the text under the same diagram (f67r2, line 1), and in an early herbal-A page (f3v, line 1).

"qokoaiin" in the text under the same diagram (f67r2, line 3), and part of a label in the Cosmo diagram overleaf (f67v2).

"otoaiin" in text around a "Sun face" on a nearby page (f68r2), and in an early herbal-A page (f1v, line 5).

"okoaiin" in a Pharma text (f89v1, line 10).

"opyaiin" in a herbal-A page (f23r, line 1).

--- dolchsody (VERY RARE) ------------------------------------------------

Occurs just once more (split and without the "s" plume), on page f66r, line 19:

...daiin daiin dal DOL CHEODY dairaly dairal...

--- okain am (VERY RARE?) ------------------------------------------------

Occurs once more (with the "q" prefix), on f111v, line 9:

...okeey qokeey qokey qOKAIN AM- soiin shed qoksheo...

But "am", like most words ending with "m", is almost surely an abbreviation (note its occurrence at end-of-line). The word "okain" alone is extremely common.

--- yfain (VERY RARE? VERY COMMON?) -------------------------------------

The word "yfain" itself does not seem to occur elsewhere. On the other hand some `equivalent' words like "okain", "ykain", "otain", "ytain" etc. are exceedingly common.

--- ofar oeoldan (VERY RARE) --------------------------------------------

The word "ofar" is very common, but "oeoldan" does not seem to occur elsewhere, not even in disguise ("ysoldon", "araldan", etc.).

(BTW, the whole phrase "ofar oeoldan" did not make it into the index because of a bug in my code, in the handling of comma-spaces. One more thing to fix for the next release...)

--- doaro (VERY RARE) ---------------------------------------------------

The concordance shows no other occurrences of this word or its `equivalents' ("dyary", "doary", "daosy", etc.).
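The `near miss' hunting done throughout this post can be automated as a one-edit (Levenshtein distance 1) search over the vocabulary. A minimal sketch; the sample vocabulary is illustrative, not taken from a transcription:

```python
def edit1(a, b):
    """True if a and b differ by exactly one substitution,
    insertion, or deletion (Levenshtein distance 1)."""
    if a == b:
        return False
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):                  # one substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):                   # make a the shorter word
        a, b = b, a
    i = 0                                 # skip the common prefix...
    while i < len(a) and a[i] == b[i]:
        i += 1
    return a[i:] == b[i + 1:]             # ...then drop one letter of b

def near_misses(target, vocabulary):
    """All vocabulary words one edit away from `target`."""
    return sorted(w for w in set(vocabulary) if edit1(target, w))

vocab = "otshol okchol opchal opchol ofshol opcholdy".split()
print(near_misses("opchol", vocab))   # ['okchol', 'opchal']
```

Single-substitution pairs like "dyary"/"doary" fall out as a special case of distance 1.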
posted by ぶらたん at 20:57| Comment(0) | その他


Could the difference between A and B be due to different subject matter?

1998/11/25, posted by Rene Zandbergen

> Another thought. Could the difference between A and B be due to
> different subject matter?

It could be, but then we know that the text has nothing to do with the illustrations. Herbal-A and Herbal-B are the most different of all 'dialects' in my scatterplots (based on digraphs). Pharma and Astro/Cosmo are very similar, despite their probably different subject-matter.

But more importantly, the same scatterplots show that the difference between A and B is of the type of a continuous change. As if the writer's style (spelling, cypher characteristics) gradually changed with time. From these plots, I can think of three possibilities.

1) One author, who started writing in B-style and gradually developed A-style. This could mean that the Herbal-A section is a cleaned-up copy of earlier Herbal-B-type scribbles, but this task was not completed. It would also mean that the zodiac section was written backwards.

2) One author, who started writing in A-style and gradually 'degraded' into B-style. This would mean that the Herbal-B pages in the first half of the Ms have been misplaced during the binding by an illiterate (like us :-) ). Note that the Herbal-B handwriting is the only part which is visibly different from the rest.

3) Two authors ('A' and 'B'). They started with a common style, 'A' doing pharma and 'B' doing astro-cosmo. 'A' then did herbal-A and 'B' then did the stars and bio sections. 'B' also did some herbal pages, perhaps when 'A' was no longer able or willing to continue. The nice aspect of this theory is that 'A' did all the plant drawings, and 'B' did all the drawings involving stars and nymphs. The odd part is that the sections on which they started are all on foldout pages, and only later did they move to normal pages in normal-sized quires.

(3) is the more fascinating option, which explains most of the observed features, but (2) is the simpler explanation, which is also worth something.
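The digraph-based comparison Rene describes can be sketched in a few lines: compute relative within-word digraph frequencies per section, then plot one section's frequencies against another's. A sketch with toy stand-in strings; the real input would be the EVA transcription of each section:

```python
from collections import Counter

def digraph_freqs(text):
    """Relative within-word digraph frequencies of a space-separated text."""
    pairs = Counter()
    for w in text.split():
        pairs.update(w[i:i + 2] for i in range(len(w) - 1))
    total = sum(pairs.values())
    return {dg: n / total for dg, n in pairs.items()}

# Toy stand-ins for two 'dialect' samples; real input would be the
# full transcription of, say, the Herbal-A and Herbal-B pages.
herbal_a = "daiin chol chor shol daiin chol"
herbal_b = "qokeedy shedy qokain shedy chedy"

fa, fb = digraph_freqs(herbal_a), digraph_freqs(herbal_b)
print(sorted(set(fa) & set(fb)))   # digraphs shared by both samples
```

Plotting fa[dg] against fb[dg] for each shared digraph gives one point per digraph; two samples of the same 'dialect' should hug the diagonal.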
posted by ぶらたん at 23:04| Comment(0) | テキストの性質


George of Trebizond

1998/7/17, posted by Rene Zandbergen

Let me further ruin my credentials with a silly theory based on a lot of circumstantial evidence. It's almost too good to be true. According to this theory, the writer of the Voynich Ms is one George of Trebizond. Here are a few reasons why, mostly from the very learned book: George of Trebizond, a biography and a study of his rhetoric and logic, John Monfasani, Leiden, Brill, 1976.

Trebizond lived from 1395 to 1472-3. He was originally from Crete but came to N.Italy where he lived for some years (Venice, Padova), and then moved to Rome. He translated many classical Greek texts to Latin (notably Ptolemy's Almagest, which took him less than a year). George was a humanist, though at odds with most of his fellow humanists.

For many years he was secretary to the Pope, which presumably means that he would have been familiar with the diplomatic codes, of the type described by Tranchedino.

George was quite mad, especially when he got older. Here's a short quote from the introduction by Monfasani (whose thesis is that he may not have been all that mad after all):

"[Trebizond] has always been something of an enigma. [...] [he] was a Greek who preferred to write in Latin; a translator of Plato who launched the most vituperative attack on Plato in the Renaissance, and perhaps of all time; a famous humanist and authority on rhetoric who passionately defended medieval scholasticism; and, most paradoxically of all, a very pious Christian and familiar of popes who risked his life to tell Mehmed II, the Moslem conqueror of Constantinople, that he (Mehmed) was the King of Kings and Destined Ruler of the whole world."

Later he says that a psychoanalytical study of Trebizond would seem to be in order but he hasn't the necessary knowledge. Like Roger Bacon, Trebizond got himself thrown into gaol by the clergy, but he managed this feat three times. The second of these was because of his 'libido' and there are letters about a scandal involving the sexagenarian Trebizond and a Venetian girl. The third occasion was because of his suspected treason to Mehmed II.

Trebizond had a firm belief in astrology. He writes that his fated life was controlled by his birth in the sign of Pisces.

Still, Trebizond was an expert scribe, and one would suspect that he could draw better than what we see in the VMs. This is the main reason why he probably has nothing to do with it. Also, it is not clear why he should have done it, nor what is contained in it.

Finally, though, the mechanism by which the VMs would have ended up in Prague is clear. Trebizond wrote a highly contested commentary on the Almagest, which was only published after his death (by his son Andreas). There are letters between Tycho Brahe and Taddeus Hajek (one of Rudolph's trustees and host to Dee) saying that they needed to acquire a copy of Trebizond's commentary, lest such an important product of the human mind be lost to posterity. The VMs would have come with it.

This is just the tip of the iceberg. Trebizond was involved in many strange actions and activities, too numerous to describe here.
posted by ぶらたん at 21:14| Comment(0) | 制作者


John Stojko's translation (Letters to God's Eye)

1998/4/20, posted by John Stojko

f2v in Voynich

The Voynich manuscript is written in one language, Ukrainian, according to my decipherment. The alphabet consists of consonants only and is written from left to right. It is written not by one person but by many. Below, the original writing is given in the English alphabet and then translated into English.

I will divide the original writing into six sections separated by empty spaces. The Voynich cipher is printed in capital letters and the added vowels are in lowercase.

1st phrase, 5 letters - T W W ZH J = Te Wy WyZHyJe
2nd phrase, 7 letters - P N W SCH P W R = PoNoWe SCHo Po WiRy.
3rd phrase, 11 letters - W S T ZH J W K B Z J = WiSTe BoZHyJ, Wy u oKo BoZiu (or BoZJu)
4th phrase, 3 letters - P W R = PoWiRe.
5th phrase, 4 letters - K B Z R = oKo oBZyRa
6th phrase, 5 letters - D N ST S = oDNoST Siu. (I use "iu" to soften

1. Te wy wyzhyje I ponowe scho po wiry. Wiste bozhyj, wy u Oko Boziu powire. Oko obzyra odnost siu.
2. Tupezh ty peshe. Odnowa odnowyly I siu wist powi. Ale wy omane de wistysh. Powi Ori Oko bozhyje.
3. Wy shto pyshe po wiri? Ladno se powily, opowi Kosa i powi Oko Bozia. Popustu se Oko bozhyje.
4. Odnowu powolaw panowi Ori I opowi Oko bozhyje.
5. Te powiru, dnes Oko bozhe zhyje. Popyta se. De odne nashe Oko wire, powilo Oko bozhyje.
6. Oko wiru opowilo po wiry. Powily ta unowyly I pyshu. Puste se Oko bozhyje. Wy shto powire po baju.
7. Oko bozhyje opowist panusi. Se wy shto omynash? Powi ty unowo de po Medu u poru pan bozhyj.
8. Powi to win duma po wiry? Ponowyly, powily I khwalyw Kosu.

(English translation)
1. That you outlive and renew what is according to the believe. The god's is informing, you will believe in Oko Boziu (baby God). Oko is looking for this unity.
2. You are writing stupidly. Again we renewed and I am telling this news. But you will deceive where you are informing. The God's Oko will inform Ora.
3. What are you writing in believe? You made beautiful statement, will explain Kosa and Oko Bozia will disclose. Oko of god, this is nonsense (empty saying).
4. Again I am calling to Mr. Ora. Oko of god will explain.
5. That I believe, today Oko of God is alive. The question is this. Where is one of ours Oko's believe Oko of God told
6. Oko explained religion faithfully. We told than renewed and now I am writing. Oko of God, this is nonsense. Are you believing what you been told?
7. Oko of God will explain to Miss. What are you avoiding? Tell me again, where after Median on time will be Mr. God's?
8. Tell me that he is thinking what he believes. You renewed, announced and praised Kosa.

1998/4/24, posted by Jorge Stolfi

If I got it right, this is John Stojko's proposed decryption,
from basic EVA to vowel-less old Ukrainian:

--- eva2ukr.sed ------------------------------------------
# Basic EVA to Ukrainian, per John Stojko and Karl Kluge
# Strings of "e"s are ambiguous, make a guess:


Let's try this mapping on the first paragraph of page f2v:

1 kooiin cheo pchor otaiin odain chor dair shty
2 kcho kchy sho shol qotsho loeees qoty chor daiin
3 otchy chor lshy chol chody shodain chcthy daiin
4 sho cholo cheor chodaiin

We get


Satiate ye no mete pain eating. To soul ye tied love uniting: do like a wise or

saint. Son, are we to wait the acts, we too? That bum --- if cats are uniting, delay!

Tease Nero on tieing the war -- not thee, not, dear awaited love. In no soiree delay!

I wait on thee, to in mating united lay.

1999/9/8, posted by Mark Perakh

Come on, guys. I was sure Stoyko's claim was a joke. Is Stoyko serious? Even though it is not very relevant, I was born in the Ukraine, and Ukrainian is one of my two mother tongues (the other is Russian). The Khazar empire ceased to exist around AD 950. The Ukrainian language separated from Old Slavic, which was also the precursor of Russian and Belorussian, only a couple of centuries later. Most of the territory which is now Eastern Ukraine was part of the Khazar empire, but from the middle of the 10th century it became the domain of Scandinavian Rus, with its capital in Kiev, built on the spot where a Khazar rural hamlet had existed earlier. There is no chance whatsoever that the VMs was written in Ukrainian and contains a correspondence between some fictitious ruler of Ukraine and a Khazar king. It is historical nonsense. Sorry for being blunt.
posted by ぶらたん at 11:54| Comment(0) | 解読者



1998/3/1, posted by Karl Kluge

Here is the full frequency data for the transformed text using the mapping from Tiltman structures to individual characters given above:

Vowels identified by Sukhotin's algorithm: I J K E A L B G 4
Line Word
Letter Global Initial Final Initial Final
0 5.84927 0.52506 0.89153 1.54647 21.70076
1 7.56907 0.90692 2.00594 2.49814 15.81762
2 1.87020 0.23866 0.14859 1.14498 7.03303
3 2.68105 0.71599 0.37147 1.44238 7.83527
4 0.06866 0.04773 0.07429 0.05948 0.25404
5 0.26484 0.33413 0.00000 0.17844 0.45461
6 0.00000 0.00000 0.00000 0.00000 0.00000
7 0.00000 0.00000 0.00000 0.00000 0.00000
8 0.00000 0.00000 0.00000 0.00000 0.00000
9 0.00000 0.00000 0.00000 0.00000 0.00000
A 3.48210 5.25060 1.26300 4.90706 1.60449
B 2.89031 4.72554 1.04012 4.32714 1.55101
C 0.39235 0.14320 0.00000 0.75836 0.13371
D 0.12424 0.04773 0.07429 0.14870 0.05348
E 4.78339 10.54893 0.89153 14.98885 1.36382
F 0.05231 0.00000 0.00000 0.16357 0.01337
G 1.77538 5.34606 0.14859 4.72862 0.82899
H 0.15694 0.14320 0.00000 0.46097 0.04011
I 16.53752 16.84964 3.26895 20.66914 11.49886
J 7.42194 10.73986 0.52006 8.28253 3.65022
K 11.84241 19.18854 13.15007 14.92937 15.48335
L 3.28592 11.31265 6.53789 3.58364 1.29696
M 4.51855 3.15036 7.20654 2.24535 1.93876
N 9.12539 6.49165 11.73848 7.83643 2.68753
O 0.24849 0.00000 2.52600 0.04461 0.05348
P 2.39333 0.28640 6.61218 0.65428 0.45461
Q 5.25421 1.24105 20.20802 1.56134 1.39056
R 0.14059 0.00000 0.74294 0.08922 0.02674
S 3.48210 1.24105 9.21248 1.47212 1.23011
T 0.40543 0.19093 1.26300 0.16357 0.20056
U 0.12424 0.04773 0.14859 0.07435 0.09360
V 0.00000 0.00000 0.00000 0.00000 0.00000
W 3.21726 0.28640 9.88113 1.01115 1.31034
X 0.02616 0.00000 0.00000 0.01487 0.00000
Y 0.01635 0.00000 0.07429 0.01487 0.00000
Z 0.00000 0.00000 0.00000 0.00000 0.00000

Entropy 4.00770 3.47030 3.59539 3.65581 3.46832
- ------------------------------------------------------
Digraphs whose max frequency global, line initial, etc. > 2.500000%:
line word
global initial final initial final wf/wi
0E 1.4085 0.0000 0.0000 0.0000 0.0000 6.2592
0I 0.7781 0.0513 0.0000 0.0000 0.0321 3.3754
0N 0.6196 0.0000 0.0899 0.0000 0.0160 2.6217
1E 0.6880 0.0000 0.0000 0.0000 0.0000 2.9985
1I 0.9726 0.1540 0.0000 0.2967 0.5451 2.7691
1K 1.1131 0.1027 0.1799 0.3894 1.6194 2.5397
E0 0.7745 1.0267 0.1799 3.1337 3.2227 0.0164
E2 0.6268 1.3347 0.0000 2.5032 2.6615 0.0000
I0 2.4063 2.4641 0.4496 3.8569 9.6841 0.0164
I1 3.8940 3.0287 0.7194 5.8780 9.2512 0.2622
II 1.0410 0.8727 0.1799 0.3894 0.4489 2.9330
IK 2.3558 2.1561 1.3489 2.9112 5.0024 2.4742
J0 1.6282 1.1807 0.0899 2.1509 6.5416 0.0328
J1 2.1181 2.1047 0.5396 2.6516 5.0505 0.0983
KI 1.4805 5.1848 0.2698 0.9271 0.9139 2.8347
KJ 0.6952 2.8747 0.0000 0.4636 0.2565 1.2125
KP 0.6664 2.0534 2.7878 0.8159 0.2245 0.0655
KQ 2.4675 4.4661 12.8597 4.3019 0.8498 0.1147
KS 1.0807 1.3860 4.2266 2.0026 0.4008 0.2458
KW 0.9474 0.5133 3.9568 2.0026 0.5451 0.0655
LN 0.4431 2.8747 0.3597 0.3894 0.0802 0.0164
NK 1.7543 0.6160 2.9676 1.0013 2.2607 0.4752

h2 3.57014 3.23198 2.96356 3.27572 2.79567 3.47274

Any suggestions on how to proceed with testing of this hypothesis regarding the nature of the encoding (and, more to the point, finding the correct mappings of Voynich character combinations to plaintext characters, if this is the type of cipher we're dealing with)?
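The vowel list at the top of the table came from Sukhotin's algorithm, which rests on the observation that vowels tend to neighbour many *different* letters. A compact sketch of the standard formulation (toy input; the real run would use the transformed text above):

```python
from collections import defaultdict

def sukhotin_vowels(words):
    """Sukhotin's vowel-identification algorithm: the letter with the
    largest adjusted adjacency sum is repeatedly declared a vowel,
    until no candidate's sum is positive."""
    adj = defaultdict(int)                # symmetric adjacency counts
    letters = set()
    for w in words:
        letters.update(w)
        for a, b in zip(w, w[1:]):
            if a != b:                    # diagonal is zeroed
                adj[a, b] += 1
                adj[b, a] += 1
    total = {c: sum(adj[c, d] for d in letters) for c in letters}
    vowels = []
    while total:
        v = max(total, key=total.get)
        if total[v] <= 0:                 # no vowel-like letter left
            break
        vowels.append(v)
        del total[v]
        for c in total:                   # discount links to known vowels
            total[c] -= 2 * adj[c, v]
    return sorted(vowels)

print(sukhotin_vowels(["banana", "cabana", "nab"]))   # ['a']
```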
posted by ぶらたん at 23:09| Comment(0) | テキストの性質

C89 ratio

1998/2/9, posted by Denis Mardle

My remark implied that Herbal B1 and B2 were the same language/hand, despite the pretty fit to the quires from Karl's work (which could still be valid), since I was only looking at the Currier O89 to C89 ratio. This ratio is very good at sorting out sets. For instance, Herbal B1 has 23.7% O89 (41 to 132) and Herbal B2 has 20.0% (67 to 269). These figures fit into the range for the Stars B sets f104r, f105r, f106r, f107r (see my later "quires ..." post), which are in the 16-26% range. The f103r and f108r sets have only 4.7%, closer to the Bio-B figure of only 0.5%, but I suspect significantly different. Herbal A is very different again, with the ratio at 98.5% (270 to 4). My conclusion is that neither Herbal B1 nor B2 can go with Bio-B, and that the O89 to C89 ratio test does not split them. I will accept another statistic showing a significant B1-to-B2 difference, but I need to see the figures. The O89 to C89 ratio (at 98.5%) will not split Herbal A sets.
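Mardle asks for a statistic that would show a significant B1-vs-B2 difference. A standard candidate is Pearson's chi-square on the 2x2 table of the quoted counts; a sketch in pure Python (my addition, not Mardle's own method):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]
    (no continuity correction); 1 degree of freedom."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# Counts quoted above: Herbal B1 has 41 O89 vs 132 C89,
# Herbal B2 has 67 O89 vs 269 C89.
chi2 = chi_square_2x2(41, 132, 67, 269)
print(round(chi2, 2))   # 0.97 -- well below 3.84, the 5% critical value
```

With chi-square around 0.97 against a 5% critical value of 3.84 at one degree of freedom, the O89/C89 counts give no evidence that B1 and B2 differ, matching Mardle's conclusion that this ratio does not split them.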
posted by ぶらたん at 22:10| Comment(0) | テキストの性質



1997/11/18, posted by Jorge Stolfi

Although Voynichese is fairly repetitive, there are remarkably few repetitions in this list. Here are all I could find (in EVA encoding):

dokor <f100v.T.1;C> {pharma}
dokor <f101v1.R1.1;C> {pharma}

otarar <f100v.T.2;C> {pharma}
otarar <f101v1.R1.3;C> {pharma}

odor <f100v.M.4;C> {pharma}
odor <f101v1.R2.4;C> {pharma}

olkor <f100v.M.2;C> {pharma}
olkor <f101v1.R2.2;C> {pharma}

okal <f67r2.L;Z> {astro, moon & circles; "sun"?}
okal <f72r2.S.12;K> {zodiac, gemini, stars}{was <f72r2.12A;K>}
okal <f72r2.S.14;K> {zodiac, gemini, stars}{was <f72r2.14A;K>}

okaly <f70v2.S1.3;C> {zodiac, pisces, btw outer stars; K:okala}
okaly <f72r2.S.13;K> {zodiac, gemini, stars}{was <f72r2.13A;K>}
okaly <f72r2.S.21;K> {zodiac, gemini, stars}{was <f72r2.21A;K>}

okary <f89v2.m.1;L> {pharma}
okary <f70v2.S1.17;C> {zodiac, pisces, btw outer stars; K:ykary}

okeey.ary <f72r1.S.4;K> {zodiac, taurus dk, stars}{was <f72r1.04A;K>}
okeey.ary <f72r2.S.15;K> {zodiac, gemini, stars}{was <f72r2.15A;K>}

okshdchas <f89r2.b.1;K> {pharma; L:okan-yorain}{was <f89v.45A;K>}
okshdchas <f89r2.m2.6;L> {pharma}

otaiin.otain <f72r1.S.13;K> {zodiac, taurus dk, stars}{was <f72r1.13A;K>}
otaiin <f71v.S1.6;K> {zodiac, taurus lt, btw outer stars}{was <f71v.11A;K>}

otal.ypsharal <f70v1.S.15;K> {zodiac, aries dk, btw inner stars}{was <f70v1.15A;K>}
otal <f72r2.S.1;K> {zodiac, gemini, stars}{was <f72r2.01A;K>}

otalaiin <f70v1.S.2;K> {zodiac, aries dk, btw outer stars}{was <f70v1.02A;K>}
otalaiin <f71v.S2.3;K> {zodiac, taurus lt, btw inner stars}{was <f71v.03A;K>}

otaldy <f67r1.S.1;C> {astro, moon & circles, circle sectors}
otaldy <f88r.m.1;L> {pharma, plant; K:otaly}
otaldy <f70v2.S1.12;K> {zodiac, pisces, btw outer stars}{was <f70v2.26A;K>}

otaly <f88r.m.1;K> {pharma, plant; L:otaldy}{was <f88r.05A;K>}
otaly <f88r.t.6;L> {pharma, plant}
otaly <f70v1.S.10;K> {zodiac, aries dk, btw inner stars}{was <f70v1.10A;K>}
otaly <f70v2.S1.10;K> {zodiac, pisces, btw outer stars}{was <f70v2.24A;K>}

otear.araydy <f70v1.S.5;K> {zodiac, aries dk, btw outer stars}{was <f70v1.05A;K>}
otear <f100r.t.3;K> {pharma}{was <f100r.17A;K>}

otoram <f88r.b.2;L> {pharma, plant; K:otor.am}
otoram <f88v.m.1;K> {pharma; L:otorad}{was <f88v.17A;K>}

The rarity of repetitions is encouraging. Note that some of them seem to be transcription errors. Also, there seem to be relatively few repetitions spanning different sections.

The first four entries of this list are intriguing: { dokor, otarar, odor, olkor } appear on page f100v and then again on page f101v1, in slightly different order. Could they be transcription errors?

The thrice-repeated label "okal" was identified by Rene as the name of the Sun in Voynichese.
posted by ぶらたん at 21:21| Comment(0) | 植物



1997/10/31, posted by Rene Zandbergen

> Couldn't the differences in vocabulary and style between the
> astronomical and herbal sections be due to the differing
> subject matter?

Very unlikely. Just remember that Herbal-A and Herbal-B have presumably the same subject-matter, yet are very different in style. Or put the other way around: (ob-smiley omitted)

Theorem 1: if the differences are due to subject-matter, then the pictures do not belong with the text.

And this leads to another important theorem:

Theorem 2: if the differences are due to subject-matter, then the person who collated the pages, bound the Ms and numbered it *could not read* the VMs.

Proof: He went by the pictures.

I would favour the 'different dialect' explanation. Note that subject-matter-related differences may still be identified, but I expect them to be of a smaller scale than the A-vs-B differences.
posted by ぶらたん at 22:12| Comment(0) | テキストの性質



In addition, experts at the McCrone Research Institute in Chicago determined that the ink was not added in a later period. The text was likely written in Northern Italy.
- EarthTimes, 03 Dec 2009

1997/10/30, posted by Dennis Stallings
If the VMs originated in northern Italy, then Latin, an Italian dialect, or a French dialect would be good candidates for the underlying language. And all of these are Romance languages. All that should help with statistical analysis.

Incidentally, even at that time French was the lingua franca of the European upper classes; Marco Polo wrote his story in French.

posted by ぶらたん at 22:31| Comment(0) | 書かれた言語


1997/09/19, posted by Jorge Stolfi

I have asked around for the month names in Occitan (a group of languages from Southern France that includes Provençal, Gascon, etc.). I got the following responses so far:

Gascon(*) Std. Occit.(*) Toulon(**)
-------------- -------------- -------------
genièr genièr genier
heurèr febrièr febrier
març març març
abriu abril abriéu
mai mai mai
junh junh junh
julhet julhet julh[1]
agost agost august[2]
setémer setembre setembre
octòbre octòbre octóbre
navémer novembre novembre
decémer decembre decembre
posted by ぶらたん at 19:09| Comment(0) | カレンダー