2010/07/27

Readable text in the Voynich

9 March 1954: Questions for Prof. E. Panofsky as a result of Conference with him and Prof. John von Neumann at Princeton on 9 March 1954.

Q7: What plain text have you found in the VMS? A: folio 70 ff in zodiac signs, f 66, and on f 116 v. On f. 66, R. Salomon says "der mus del" which is the same as "der Mussteil" which means the stock of household goods which cannot be withheld from a man's widow on his death. On f 116v, "so nim geismi[l]ch o", meaning, "... take goat's milk, or..."

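Besides these, there is a passage on another folio that has been read as something about "Maria" (I will post about that separately at some point), but all of these are parts not written in the Voynich alphabet, so not a single part of the Voynich text proper has been deciphered.
posted by ぶらたん at 23:08| Comment(0) | Other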

The Friedman collection

1994/4/8, posted by Jim Reeds

1. General correspondence with Voynichologists, including long series of letter exchanges with Mrs Voynich, Manley, Miss Nill, Newbold, Erwin Panofsky, Father Petersen, John Tiltman, etc. I have only read a fraction of these letters (maybe 100 in all?) but they remind me very much of the sort of letters seen in this electronic forum.

2. A complete bound set of photostats of the VMS, printed, I think, from negatives made by Voynich in the early 1920's. These are labeled with "page numbers" which are the same as the page numbers found in rand.org:/pub/voynich/voynich.orig.

3. A mass of work sheets from the First Study Group, including a transcription of about half-a-dozen pages of the VMS. Included in this is a number of FSG alphabet sheets, some with annotations about Roman letter equivalents, frequency counts, etc. There do not seem to be any papers about the organization, membership, procedures, etc, of the FSG. Many of the work sheets are signed, so it should be possible to make an approximate list of FSG members:

Robert A. Caldwell
G. E. McCracken of Drake Univ, Des Moines, Iowa
Thomas A. Miller
William M. Seaman
Fried Mark Rhoades of Arvada, Colorado
Frances M. Puckett

and a few others I didn't write down.

4. A mass of loose photostats of VMS pages, which must cover most of the VMS, some duplicates of pages in item 2 above, some copies of plates in books, etc. These might well have been used by the First and/or the Second Study Groups.

5. A printout listing of almost all of the VMS (pages 1 through 230), item 1609 in the collection, which I described last autumn, which Jacques and I are both entering into our computers and which we wish to publish. This is the fruit of the FSG.

6. A printout listing of a somewhat differing transcription of pages 77-113, representing a possibly earlier draft of 5 above.

7. A thin file about the Second Study Group's plans to use RCA's computers in about 1962. Flow charts, RCA memos, data entry procedures. A distribution list has Prescott Currier's name; possibly the other names are the other members of the Second SG.

8. About 32 or so computer printout sheets of a transcription into a Currier-like alphabet, of pages 120-175, presumably the fruit of the SSG.

9. A monster thick book of almost 700 pages of computer printout, a sort of KWIC index of item 8.

10. Papers associated with E. S. Friedman's Washington Post article. A letter somewhere in the collection has WFF telling his correspondent that ESF would write the newspaper story and WFF would write the longer, more scholarly paper for Isis or Speculum. He never did.

11. Folders containing notes and exhibits for talks about the VMS.

12. All of Petersen's Voynichalia, notably: his hand copy of the VMS; his drawings of the plants; his 3 by 5 card index of all Voynich words; his notebook index of all Voynich words; his notebooks about his study of herbals, medieval science, etc.

13. An interesting item not filed with the Voynich stuff was a file full of drafts, in varying states of revision, of a paper about Trithemius and his reception, by Charles Jastrow Mendelsohn, Friedman's U. Penn friend, who died in 1939. I think Friedman tried to finish the paper, but I do not know if it was published. Isis, I suppose, or (conceivably) Speculum is what CJM and WFF were aiming for, but I don't know if the paper was actually published. It looked good, relied a lot on Heidel's book (**Trithemius Vindicated** is a loosely translated short version of Heidel's book's title). I didn't have time to read the paper all the way through, but will some other time.
posted by ぶらたん at 22:39| Comment(0) | Other

THE DISTRIBUTION OF LETTERS <c> AND <o> IN THE VOYNICH MANUSCRIPT: EVIDENCE FOR A REAL LANGUAGE?

1994/3/24, posted by Jacques Guy

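It is odd that the frequencies differ between Language A and Language B, isn't it?
Personally, I sometimes suspect it is simply the result of the whole thing being made up.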

A is characterized by the extreme frequency of letter , which occurs more than eight times as often as in B. B on the other hand is characterized by the very high frequency of letter which occurs three times as often as in A. We observe 2393 occurrences of in A, and 3053 of in B. Corpus B being 1.5 times the size of Corpus A, we would expect about 3590 occurrences of if there was an exact, systematic correspondence between and in those environments. The correspondence is not exact, only very strong. But note that letter occurs with the same relative frequency as . It consists of two linked and is considered by Currier and most to be a single letter. It is probably so but the linkage of its two strokes is loose and it may, at least sometimes, be in fact two consecutive occurrences of . In that case the correspondence might yet be exact.
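The "about 3590" figure is plain proportional scaling of the Language A count by the relative corpus size. A minimal sketch using only the numbers quoted in the paragraph (the function and variable names are illustrative, not from the original post):

```python
def expected_count(count_in_a: int, size_ratio: float) -> float:
    """Scale a count observed in corpus A up to the size of corpus B."""
    return count_in_a * size_ratio


observed_a = 2393   # occurrences quoted for Language A
observed_b = 3053   # occurrences quoted for Language B
size_ratio = 1.5    # corpus B is about 1.5 times the size of corpus A

expected_b = expected_count(observed_a, size_ratio)
coverage = observed_b / expected_b   # share of the expected count actually seen

print(f"expected in B: {expected_b:.1f}")
print(f"observed / expected: {coverage:.2f}")
```

The observed count covers roughly 85% of the scaled expectation, which is the shortfall the post then tries to explain with the linked double letter.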

Sukhotin's algorithm identifies letters and as vowels. They resemble our Roman o and e, and Greek omicron and epsilon, and we may hypothesize that they have been used by the Voynich authors with similar phonetic values, perhaps substituted one for the other in a simple cipher. The substitution of e for o is common in natural languages. Thus Standard English and Scots English (lord, laird), Japanese and Japanese teenagers' slang (uso yo, use ye "that's a lie!")

We may, then, have in the Voynich Manuscript either the same language in two different ciphers (very probably simple substitution), or two dialects of the same language. Since the frequency counts above do not show any such strong correspondences for other letter pairs I would incline towards the second hypothesis. William Friedman came to think that the manuscript was not a cipher proper, but a text in an artificial language such as were elaborated by George Dalgarno, or Bishop John Wilkins. It is hardly conceivable that such a language could develop a dialect, as none is known to have been used extensively in writing, let alone spoken. The Voynich Manuscript is perhaps, then, written in a natural language.
posted by ぶらたん at 22:14| Comment(0) | Properties of the text

2010/07/23

Calendar: addendum

1993/09/17, posted by Robert Firth

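It is indeed 30 days, but there is also a 29, the figures are split into odd groupings, and there are signs that the alphabetic writing in the middle was added later, so it looks as though this needs careful examination.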

Again, back to the Zodiac. The number of nymphs evidently matters - if you turn from folio to folio, you can see the scribe trying different patterns to fit them in (10+19, 8+16+6, 10+20...), and often having to scrunch them up, or spread them out, near the end of a ring. (By the way, that suggests these folios were not a fair copy, but at least in layout were originals.)

But, what calendar has one month of 29 days (Pisces) and then nine months of thirty days? None at all. Maybe Pisces is a mistake, and all the months have 30 days. But that's the Pharaonic Egyptian calendar, and why would anyone revive it? And why have it beginning with Pisces - the Egyptian year began in our July, with the heliacal rising of Sirius.

So, maybe this is astrological rather than calendric. But, again, why begin the calendar with Pisces and not Aries? That I find very, very odd - because, you see, it's right: the Vernal Equinox does fall in Pisces - about 2 days in at present, and in 1350 about 11 days in. Why, alone of all works of Western astrology, is the Voynich Zodiac true to the stars? Who in that age even remembered the precession of the equinoxes, discovered by Hipparkhos of Nicaea but, as far as I know, forgotten until Tycho de Brahe?
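The two figures in that paragraph (about 11 days in 1350, about 2 days "at present") imply a drift of roughly nine days. A quick consistency check, assuming the standard precession rate of about 50.3 arcseconds per year and taking "at present" to be 1993, the year of the post (both are assumptions, not stated in the post):

```python
ARCSEC_PER_YEAR = 50.3             # assumed general precession rate
SUN_DEG_PER_DAY = 360 / 365.2422   # mean motion of the Sun along the ecliptic


def equinox_shift_days(year_from: int, year_to: int) -> float:
    """Days by which the equinox slides against the stars between two years."""
    degrees = (year_to - year_from) * ARCSEC_PER_YEAR / 3600
    return degrees / SUN_DEG_PER_DAY


shift = equinox_shift_days(1350, 1993)
print(f"shift 1350 -> 1993: {shift:.1f} days")
```

This only checks that the two quoted positions are mutually consistent; it says nothing about where the constellation boundary itself should be drawn.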

(No, wait a minute, there's a Middle English rime about the Great Year - Graves quotes it in The White Goddess - so maybe this knowledge just went underground.)
And why are Aries and Taurus divided into light and dark halves, each with 15 nymphs, stars, or days?

And the month names are in French.

Jacques counters this emphatically: not French but perhaps a Slavonic language, based on "Octember".

1997/09/18, posted by Jacques Guy
We have been through all this several times around. Plus, there is that "Octember" for October, and, a long time ago, someone here said that it was typically Slavic.

1997/09/18, posted by John Grove
For the months I can make out marc, abril, may (the light one has a circumflex accent above the y), octebre, and novebr.. I suppose you could compare them to Russian (a little) Marta, Avril, Maj (a character that does in fact have a line over it in its Cyrillic equivalent), OctYaBr' & NoYabr' (The Ya is a single character that looks like a backwards R in the Cyrillic Alphabet). However, I think that one could place the spellings into a large number of the European Languages... Mars, Avril, Mai (French)..... Marzo, Abril, Mayo (Spanish)..... Marco (with a Cedilla), Abril, Maio (Portuguese) etc....
posted by ぶらたん at 23:41| Comment(0) | Calendar

2010/07/22

Naked women in the bathtubs = victims of the vampire Elizabeth?

Just a joke, of course.

1992/09/28, posted by Jacques Guy.

I was reading "Crooks, Crime, and Corruption" Octopus Books, London: Hungarian countess Elizabeth Bathory bathed in the blood of her victims because she believed it preserved her beauty. In the black depths of her castle dungeons at Csejthe, the countess stored well-fed girls ready to have their veins cut open and filtered into pipes that ran into a blood bath.

The bathing nymphs are the countess's victims. One of them (the one I reproduced on some tiles of the mahjongg solitaire game), is the countess herself, bathing in blood. The herbal and pharmaceutical jars are eternal youth and beauty recipes. The astrological diagrams have to do with the favourable planetary aspects when herbs are to be collected, and victims bled. The plumbing are the pipes and the filtering system (just *look* at those plumbing folios, and tell me now if you are not feeling that you are slowly losing your grip on your sanity and starting to believe in my theory?)

It's a good job I don't have my Voynich stuff here: just checking up on the date when the Voynich MS was acquired by Rudolph II might deflate my wonderful theory. It is fortunate that I do not have a colour reproduction of the folio where Erzsebet Bathory is shown bathing in blood, the blood might not be red (but would a countess stoop to bathing in anything but *blue* blood?). According to my Larousse encyclopedia, she was discovered in 1610. According to "Crooks, Crime, and Corruption":

Tried for 610 murders, the countess was condemned to be walled up for life in a room from which all light and sound were excluded. In 1614, she expired after three years of this living death.
posted by ぶらたん at 22:55| Comment(0) | Other

Prescott Currier

Quoting from Prescott Currier's letter.

His job included breaking Japanese military codes, after all.

It occurs to me that some members of your group might like to know a little something about me. So herewith very briefly: AB in Romance Languages; taught myself Japanese and spent several pre-war years and most of the war (ie WWII) working on Japanese cryptosystems. The Navy sent me to Russian Language School in 1945 and I spent the ensuing years in the intelligence business with two tours on the US Embassy in London (1946-48 and 1955-58). Retired in 1963 and took an advanced degree in Comparative Philology at the Univ. of London (School of Slavonic Studies and University College). First became interested in VMS in the late thirties through Billy Friedman and associated myself with the group of IBM officials who, at Friedman's urging, undertook to transcribe and punch up, sort and print some of the Biol. B material (the wives did the transcribing using my alphabet). It wasn't until after I retired from Government service in 1969 however, that I spent serious time working on the VMS - mostly pencil and cross-section paper in an endless variety of frequency counts with time out to ponder and to attempt to interpret the masses of statistics I produced. And the only thing I have to show for it is the little paper that I wrote in the early 70's and presented at the 76 symposium. I stopped working on the MS at that time simply because I couldn't think of anything else to do. I first met John Tiltman in England in 1941. We became close personal friends, as did our families, and maintained our friendship until his death a few years ago.

At this point I am reminded that John Tiltman turned over to the then Director of Kew Gardens much of the Herbal section of the VMS and asked him to identify the illustrations. In reply, as I remember it, he did provide names for a few of the plants and tentative identification of a few others but the great majority he classified as 'compositae' (concocted drawings using elements from several plants).

Now consider, if you will, the situation that must have confronted the scribes who produced the Herbal Section. Two men (initially one) sitting at a table on which sat a stack of vellum sheets filled with phony pictures of non-existent herbs. Assuming that they knew that the illustrations were deliberate fabrications did they make up stories about the nonexistent uses and benefits of non-existent herbs? or, Did they fill the space around the illustrations with a prearranged text having nothing to do with the illustrations? or, Did they fill the space around the illustrations with a well-constructed script having to do with nothing in particular (ie. deliberate, but recoverable nonsense)? My own rather feeble attempt at a comparative analysis of the two texts (A & B) led nowhere. But I am sure that a well-programmed, high powered computer attack might well produce some interesting results. I am planning to write the Beinecke and order a photocopy of the Manuscript some time soon. By the way, my first communication with the Beinecke took place in 1970 when I wrote offering my services when and if they were planning any serious research. They replied that they had no plans now or in the future for any research, so "Thank you but No."

I note with some satisfaction that some of my conclusions have attracted the favorable attention of a couple of the members of your group. I hope that this continues since I still feel that everything I said in the 'little paper' is still valid. I also note that my transcription alphabet is perhaps not suitable for computer use. I trust that final choice will not be overly complicated for all other uses. Which reminds me, I can see no reason why 7/J should not be merged - into J preferably. Even if it turns out that they are in fact two different characters no confusion should result.
posted by ぶらたん at 22:24| Comment(0) | Other

Labels

I once went through every place where the labels also turn up in the running text myself, but it gave me no leads.

1992/08/13, posted by Karl.Kluge
In my continued quest to look for label matches in an attempt to decide if the text actually means anything, I transcribed the star labels from folio 68r1 and checked for matches in the labels I had previously transcribed and in the D'Imperio transcription. Spaces were allowed after any character, 6/8 confusion was allowed. Here are the results. Make of them what you will.

1992/10/20
I'm still trying to figure out what, if anything, the label corpus (f68r1, f70v2, f72r2, f88r, f100r) stats can tell us. There are some obvious noticeable differences between the labels and the mss as a whole:

4O is line initial 17.9% in the mss, < 1% in the labels
AM is line final 11.5% in the mss, 3.6% in the labels
OF is line initial 2.1% in the mss, 22.2% in the labels
OP is line initial 3.5% in the mss, 33.9% in the labels

The average number of label occurrences per page is noticeably higher in the biological section (127 in 20 pages) than in the herbal (232 in 111 pages). This may be the first real evidence of apparent differences in subject matter being reflected in properties of the text.
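The per-page comparison can be made explicit from the two counts quoted above. A small sketch (the dictionary layout and names are illustrative):

```python
# (label occurrences, pages) as quoted in the post
sections = {
    "biological": (127, 20),
    "herbal": (232, 111),
}

# Average label occurrences per page in each section
rates = {name: count / pages for name, (count, pages) in sections.items()}
ratio = rates["biological"] / rates["herbal"]

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} label occurrences per page")
print(f"biological / herbal: {ratio:.1f}x")
```

That works out to about 6.4 per page against about 2.1 per page, a factor of roughly three.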
posted by ぶらたん at 22:11| Comment(0) | Other

Entropy of English text

Entropy of English text
http://www7a.biglobe.ne.jp/~yobology/text/entropy_of_english/entropy_of_english.htm

I wish someone would give me a lecture on all this.
It is hard going without the background knowledge.

Zipf's law
http://www.cut-the-knot.org/do_you_know/zipfLaw.shtml

http://www.nslij-genetics.org/wli/zipf/

1995/12/7, posted by Andras Kornai
obeying Zipf's law proves anything about the VMS. A classic paper by Mandelbrot shows that text typed by monkeys who use the spacebar just as randomly as the other keys will obey Zipf's law:
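The random-typing argument is easy to try out. The sketch below is only an illustration of the idea, not Mandelbrot's derivation: it "types" uniformly at random on a four-letter keyboard plus a space bar (the alphabet, text length and seed are arbitrary assumptions) and fits a straight line to log frequency against log rank.

```python
import math
import random
from collections import Counter


def monkey_text(n_chars, alphabet="abcd", seed=0):
    """Random keystrokes where the space bar is as likely as any letter."""
    rng = random.Random(seed)
    keys = alphabet + " "
    return "".join(rng.choice(keys) for _ in range(n_chars))


def rank_frequency(text):
    """(word, count) pairs sorted from most to least frequent."""
    return Counter(text.split()).most_common()


def loglog_slope(ranked):
    """Least-squares slope of log(count) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for _, count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


ranked = rank_frequency(monkey_text(200_000))
slope = loglog_slope(ranked)
print(f"{len(ranked)} distinct 'words', log-log slope {slope:.2f}")
```

A steadily falling line on the log-log plot is what a Zipf-like distribution looks like, and here it comes out of text with no meaning at all.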

"On the theory of word frequencies and on related Markovian models of discourse" by Benoit Mandelbrot


1996/4/1, posted by Adams Douglas

True, but the digraph matrix will be quite different for random text vs.
intelligible text. As anyone who has ever tried to write a program to
generate pronounceable but random passwords knows, purely random text
doesn't cut it. You have to adhere to at least a small set of deterministic
rules to get the output to "read" reasonably like English. You have to
change the rules to make it sound like other languages.

1996/8/7, posted by Jacques Guy

As for the low entropy. It is the second-order entropy that is low. The second-order entropy is a measure of how unpredictable the *next* letter is from the previous one. A low entropy means "oh, yes, I was expecting to see an ... following this ..., and there it is!". The lower the second-order entropy, the more often your expectation will turn out right.
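One common way to put a number on this is the conditional entropy of a letter given the one before it. The sketch below assumes that this is what "second-order entropy" means here, which matches the description but is not spelled out in the post; the function names are illustrative.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy(text):
    """H(next letter | previous letter) = H(letter pairs) - H(first letters)."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(pairs) - entropy(firsts)


predictable = "abababab"   # the next letter is always known
looser = "aabbabba"        # after each letter, either letter may follow
print(conditional_entropy(predictable), conditional_entropy(looser))
```

The perfectly alternating string scores zero bits: once you have seen a letter, the next one is never a surprise.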


1996/8/7, posted by Dennis Stallings

I understand this. However, this brings up something I've been wondering about. Here and elsewhere, you've given many examples of languages that have low second-order entropies due to restrictive phonotactics and low phonemic inventories (Polynesian languages, Japanese, ancient Basque). However, looking at the various examples of pronounceable Voynich, it seems to me that Voynich has somewhat more consonant contacts and syllables ending on consonants than that. And, of course, the phonemic inventory is uncertain because of the 2+ characters ?= 1 phoneme issue. From my not-very-extensive examination, I wonder if the low second-order entropy of the VMs text isn't due instead to the large number of *longer* strings, say 3-7 characters, that are repeated, and not necessarily to restricted phono- or graphotactics?
posted by ぶらたん at 17:01| Comment(0) | Other

Hand A and B

1992/02/23, posted by jbaez

I wonder whether scribes A and B really exist.
And do scribes A and B correspond exactly to Languages A and B?

Hand A writes rather large letters, loosely spaced, and is given to fancy flourishes in his gallow letters. Hand B on the other hand :-) writes in a rather cramped way, with smaller letters, and is not so much given to flourishes. The final folios, 103-116, which are the only ones completely devoted to text (unless one counts the pictures of stars, which seem more like paragraph headers than illustrations), are in hand B. (Note: this is what it looks like to me, I haven't checked the "official" record, since I want to learn the difference myself). Hand A seems much more eager to leave lots of blank space on the page. All this would seem to portray A as a free-wheeling, expansive fellow and B as a perfectionistic, constipated sort, BUT it seems to be the case that ALL THE NYMPHS OCCUR ON B's part of the text. One especially dramatic instance of this is on the last page, with its mysterious "key" -- I guess this is folio 117v -- which occurs right after the folios completely devoted to text. It was probably written by B, since the writing looks like B's and it occurs after a bunch of B. AND, it has one last little nymph on it!
posted by ぶらたん at 13:04| Comment(0) | Properties of the text

2010/07/21

Voynich calendar

1992/02/20, posted by Robert Firth

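The names on the twelve zodiac signs are clearly additions by a later hand.
As for the calendar, with 30 days to the month, twelve months would drift out of step with the year, so it remains unclear. The post here says it is Egyptian, but I have not checked whether that is really so.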

(1) Two of the zodiac emblemata are unusual. As he points out,
Scorpio isn't a scorpion; my guess is a salamander. And
Gemini seems to be a male/female pair, which is wrong: in
classical mythology, they're Castor and Pollux.

(2) The names of the houses are identical in greek and latin,
except that libra ("scales") is zygos ("yoke"). The decans
(three per sign) are, as he suspects, coptic.

(3) If (a big if) the calendar is 12 months of 30 days each, then
it's egyptian, fer sure. That was the calendar Sosigenes took
to Julius Caesar; it was a roman idea to make the months
uneven. However, the starting point is then wrong: the egyptian
religious year began at the winter solstice, with the five
intercalary days; the administrative year began, of course,
with the nile flood.

(4) The semitic month names are from the new babylonian calendar.

Babylonian      Hebrew

Nisanu          Nisan
Ayaru           Iyyar
Simanu          Sivan
Dumuzu          Tammuz
Abu             Av
Ululu           Elul
Tashritu        Tishri
Arakhsamna      Heshvan
Kislimu         Kislev
Tebetu          Tevet
Shabatu         Shevat
Adaru           Adar

Nisanu 1 was the new year. However, I feel any correspondence is unlikely, since this was a lunar calendar, with 7 intercalary months added every 19 years in the pattern of the Metonic cycle.
A better guess, I think, would be the lunar/solar calendar
described in the Book of Enoch, which I'll look up and post.
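The "7 intercalary months every 19 years" pattern works because 235 lunar months come out almost exactly equal to 19 solar years. A worked check, using the usual mean values for the synodic month and the tropical year (these constants are assumptions supplied here, not given in the post):

```python
SYNODIC_MONTH = 29.530589   # mean days from new moon to new moon
TROPICAL_YEAR = 365.2422    # mean days from equinox to equinox

lunar_months = 19 * 12 + 7              # 12 ordinary months a year plus 7 extra
lunar_days = lunar_months * SYNODIC_MONTH
solar_days = 19 * TROPICAL_YEAR

print(f"{lunar_months} lunar months = {lunar_days:.2f} days")
print(f"19 solar years  = {solar_days:.2f} days")
print(f"difference      = {(lunar_days - solar_days) * 24:.1f} hours")
```

The mismatch over the whole 19-year cycle is only a couple of hours, which is why such a calendar stays tied to the seasons while still following the moon.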

(5) Finally, note that the month names were added by a later hand, and almost certainly by using the western astrological mapping of sign into month. That mapping has been wrong for over two
thousand years, because of the precession of the equinoxes.
It's the obvious guess, but we shouldn't assume it's right.

=================

replied by Jacques Guy.

Is it really a calendar? If so, not ours: the months are all
30 days long. The year starts some time in March, reasonable
enough. What is the language of months? Got me. I posted a
query about it to Linguist@tamvm1. The hand in which the
month names are written is compatible with the 15th-century
garb of the crossbowman. Why aren't there any obvious month
names in Voynichese? The crossbowman covers part of an
inscription. Does that mean that the Voynich is 15th-century
at the latest? I'd say so. Why are April and May split into
two sub-months of 15 days each? Each "day-figure" has a
caption. Names of patron saints? For each "month" Petersen
lists the zodiac sign in Latin, Greek, and Hebrew, and gives
a name to each decan, which seems to me to be Coptic or
Ancient Egyptian (written in Greek letters).

=================

Karl Kluge says: "if this is some sort of idealized calendar with uniform 30 day months...". Not necessarily idealized. The Mayas had a year of 12 months of 30 days, plus 5 or 6 "leap days". That was their civil calendar, the "haab", distinct from their religious calendar, the "tzolkin" of 18 months of 20 days. I vaguely remember that the Egyptians had a calendar of 12 months of 30 days --- vaguely, because all my references are at home. The Romans... what did they have? Anyway, the Voynich "calendar" does not seem to be lunar. If it were, we would have 29 and 30 days alternating pretty regularly. The interesting part is the month names in that European language. "Octember" ought to be a dead giveaway. "Yony" is reminiscent of Modern German "Juni", and "jollet" of French "juillet". "Abril", on the other hand, is Spanish!
posted by ぶらたん at 23:55| Comment(0) | Calendar

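Opinions on Currier's statistical results

1992/01/28, posted by Robert Firth

Naturally, ordinary natural languages also show correlations between neighbouring words, so this is not a property peculiar to the Voynich; if anything, I think it counts as evidence that the Voynich is not random nonsense.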

1. Letter correlations.

Currier, and now we, have found correlations between the final "letter" of a "word" (if, that is, they are letters and words) and the initial letter of the next word. Granted. However, I have some problems with the interpretation. Currier claims he knows of no language with this feature; I think he's very wrong.

First, note that in many languages (welsh for instance) the phoneme (sound) at the start of a word is modified by the previous word. Some systems of writing reflect this change (modern welsh I believe does), and some do not. Secondly, note that in some languages there are grammatical rules that lead to such correlations. In english the chain of causation runs from right to left (a possibility Currier overlooks): "a" changes to "an" before a vowel, and possessives in "-y" change to "-ine". Likewise, both french and italian elide heavily, and some writing systems reflect this. Finally, I struggled through enough Dante at one time to know that in italian poetry of the time, endings and beginnings were highly correlated, not because of orthography or grammar, but because of euphony.

So, yes, the statistical patterns exist, and they are real, but I have two problems. (a) are they unusual? - would we not find the same with known european languages, and (b) can we reason back from the effect to the cause, given that there might be many or multiple causes, at very different levels of language.

I am similarly skeptical of preferred initial letters. After all, in latin "qu" is never final, and "x" is never initial, and I'm sure parallels can be found in many languages. So this isn't unusual. Preferred paragraph-initial letters, frankly, impress me even less. In Euclid's "Elements", for example, paragraphs usually begin with "Axiom", "Theorem", "Corollary", "Lemma", ... in other words a very small set. The peculiar letter pattern is a consequence of the peculiar word pattern, and that is a consequence of the style of the author. Nothing can be deduced about the language from this effect.
posted by ぶらたん at 00:08| Comment(0) | Properties of the text

2010/07/20

EFFECTS OF THE ENDINGS OF ONE "WORD" ON THE BEGINNING OF THE NEXT "WORD"

You remember I mentioned that some "word"-finals have an
obvious and statistically-significant effect on the initial
symbol of a following "word." This is almost exclusively to
be found in "Language" B, and especially in "Biological B"
material.

--------------------End of Quote----------------------------

One answer to the hypothesis that the last letter of a word influences the first letter of the next word.

1992/01/28, posted by Jacques Guy

Let me give you an example. Imagine that I were to write a space after every t when I writ e in English, and merge t he rest oft heremain ingwo rdsrat her ran domlyli keI've just beendo ing
right now. Since "th" is a very frequent digraph, you would
observe a strong correlation between words ending with "t" and words beginning with "h". In fact, I would not have to write a space after each and every single "t": a strong tendency to do so
would be enough to bring out the type of pattern observed by
Currier.
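The thought experiment can be simulated. The sketch below strips the real spaces from a short English sample, always breaks after "t", and breaks elsewhere at random; the sample sentence, the break probabilities and the seed are all arbitrary assumptions chosen for illustration.

```python
import random

SAMPLE = (
    "the other thing that they thought about then was that the weather "
    "there had nothing to do with the theory that the author of the "
    "thesis had written"
)


def respace(text, p_after_t=1.0, p_elsewhere=0.25, seed=0):
    """Drop the real spaces, then break after 't' and at random elsewhere."""
    rng = random.Random(seed)
    out = []
    for ch in (c for c in text.lower() if c.isalpha()):
        out.append(ch)
        if rng.random() < (p_after_t if ch == "t" else p_elsewhere):
            out.append(" ")
    return "".join(out).split()


def h_initial_share(words, after_t):
    """Share of 'words' starting with 'h' among those following a t-final
    word (after_t=True) or following any other word (after_t=False)."""
    followers = [b for a, b in zip(words, words[1:]) if (a[-1] == "t") == after_t]
    return sum(w[0] == "h" for w in followers) / len(followers) if followers else 0.0


words = respace(SAMPLE)
after_t = h_initial_share(words, True)
elsewhere = h_initial_share(words, False)
print(" ".join(words[:12]))
print(f"h-initial after t-final: {after_t:.2f}, after other finals: {elsewhere:.2f}")
```

The strong dependence between a "word" ending in t and the next one starting with h comes entirely from where the breaks were put, not from any grammar.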
posted by ぶらたん at 16:16| Comment(0) | Properties of the text

Cluster analysis

Cluster analysis
http://aoki2.si.gunma-u.ac.jp/lecture/stats-by-excel/vba/html/clustan.html

Cluster analysis
http://www1.tcue.ac.jp/home1/ymiyatagbt/cluster01.pdf

What variables is everyone using for the clustering?
Would the per-page frequency of letters, or of certain words, do?
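As one possible answer to that question, here is a sketch that uses relative letter frequencies per page as the variables and Euclidean distance between pages, which is the input any clustering routine needs. The page names and page strings are made-up toy data, not real transcription, and the choice of distance is an assumption.

```python
import math
from collections import Counter


def frequency_vector(text, alphabet):
    """Relative frequency of each symbol of `alphabet` in one page."""
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values()) or 1
    return [counts[ch] / total for ch in alphabet]


def distance(u, v):
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy pages; real input would be one transcribed page per entry.
pages = {"p1": "oooocooo 8ao", "p2": "ooocoooo o8a", "p3": "cccoccc9 c89"}
alphabet = sorted(set("".join(pages.values())) - {" "})
vectors = {name: frequency_vector(text, alphabet) for name, text in pages.items()}

# For each page, the page with the most similar letter frequencies
nearest = {
    a: min((b for b in vectors if b != a), key=lambda b: distance(vectors[a], vectors[b]))
    for a in vectors
}
print(nearest)
```

Word frequencies could be swapped in by counting tokens instead of characters; which works better is exactly the open question.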
posted by ぶらたん at 14:29| Comment(0) | Other

Sukhotin's Algorithm

Sukhotin's algorithm to identify vowels

There is an obscure algorithm devised by one Boris Viktorovich Sukhotin, a Russian… linguist? mathematician? cryptographer? let's say researcher, in the early '70s, usually called Sukhotin's Vowel Identification Algorithm, whose initial aim was to quickly attack an alphabetic cyphertext encoded with an unknown substitution cypher. Jacques Guy, a linguist and programmer well-known for his interest in the Voynich manuscript, translated the algorithm from Russian and simplified it but his publishing in Cryptologia didn't help it being more widely known. Besides, his explanation is not very illuminating, so I'll show the algorithm and then try to explain it from first principles.

http://alaska-kamtchatka.blogspot.com/2010/07/sukhotins-algorithm.html

Sukhotin's algorithm.txt

2001/1/19, posted by Jorge Stolfi

The basis of the algorithm is the observation that in most languages Vs and Cs have a tendency to alternate: Vs are mostly surrounded by C's and vice-versa. So the algorithm tries to find a partition of the alphabet in two classes X and Y that maximizes the number of XY and YX pairs, and minimizes the number of XX and YY pairs.
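A minimal sketch of the greedy procedure commonly described under Sukhotin's name (not necessarily identical to Guy's published version): count how often each pair of different letters sits side by side, repeatedly promote the letter with the largest remaining adjacency sum to "vowel", and discount its neighbours. The word list is a toy example, and ties are broken alphabetically as an arbitrary choice.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the letters classed as vowels, in the order they were picked."""
    adj = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:                 # doubled letters are ignored
                adj[a][b] += 1
                adj[b][a] += 1

    sums = {ch: sum(adj[ch].values()) for ch in letters}
    consonants = set(letters)          # everything starts out as a consonant
    vowels = []
    while consonants:
        best = max(sorted(consonants), key=lambda ch: sums[ch])
        if sums[best] <= 0:
            break                      # nobody left that mostly touches consonants
        vowels.append(best)
        consonants.remove(best)
        for ch in consonants:          # contacts with a vowel now count against ch
            sums[ch] -= 2 * adj[ch][best]
    return vowels


sample = ["banana", "mama", "papa", "data", "bikini", "mimi", "titi"]
print(sukhotin_vowels(sample))
```

On this toy list the procedure picks "a" and then "i". On text where letters do not alternate neatly, the greedy choice can settle on an arbitrary split, which fits the caution expressed in the reply that follows.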

> Did anybody apply that method to VMS, and if yes, were the
> symbols in VMS reliably shown to be either vowels or consonants?

I have tried to do roughly the same thing, by hand and by ad-hoc algorithms. I did find some structure in the alphabet, which is now part of the crust-mantle-core paradigm.

Basically, one can distinguish several classes of letters (gallows, benches, dealers, etc.) which have similar digraph statistics; but there doesn't seem to be any simple mapping of those classes to a plausible `vowels and consonants' bipartition. Moreover, although those statistical classes seem clear-cut, and are fairly compatible with the morphological classification of the symbols, if I slightly change the similarity measure, I get a very different set of classes --- also clear-cut and compatible.

But two days ago I noticed another weird thing about the digraph frequencies, which sort of explains that ambiguity (and why Sukhotin's algorithm couldn't possibly work). Stay tuned...
posted by ぶらたん at 00:19| Comment(0) | Other

2010/07/19

letter frequencies

1991/01/26, posted by Jacques Guy

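The point is that qy, in and iin might each be a single letter.
There is a lot of repetition as well, and if those really were single letters the information content would shrink even further.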

Quite true. The trouble is: we really do *not* know what the *true* letters of the Voynich alphabet are. For instance, <4> is almost always followed by <o> (about 90% of the time). Perhaps <4o> is a single Voynich letter. We just don't know. Another example: everybody (including me) believes that Currier's <N> and <M> (my <iv> and <iiv>) are single letters. Well, it seems obvious that they are, but is it true? In a few minutes I shall send an article entitled "pronounceable Voynich". Have a look at it. To make the language pronounceable I hit upon this idea: let Currier's <D> be "u", but consider his <N> as made up of <I> and <D>, and read it "iu", and his <M> as made up of <II> and <D> and read it "nu" (<II> in the Voynich letters, does look a lot like German cursive "n" after all). Of course, I just wanted to make it pronounceable and did not believe one moment at the time that that could be anything more than a convenient way of recording things. But it turned out to work so well, and to wipe out so many of the problems I had had in trying to make the Voynich pronounceable that I am starting to wonder: Currier's <D> does look like a "v", "v" and "u" are not distinguished in medieval manuscripts, so...? And then I have found in Bischoff's treatise of Latin paleography an example of Beneventan "n" that looks surprisingly like two Voynich <I>'s. Could it be? Could it be?
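A figure like "followed by <o> about 90% of the time" comes from counting what follows each symbol. A minimal sketch on a made-up string (the sample is a hypothetical stand-in, not transcription data, and the function names are illustrative):

```python
from collections import Counter


def follower_counts(text, symbol):
    """Count the characters that immediately follow `symbol`."""
    return Counter(b for a, b in zip(text, text[1:]) if a == symbol)


def follower_share(text, symbol, follower):
    """Fraction of occurrences of `symbol` that are followed by `follower`."""
    counts = follower_counts(text, symbol)
    total = sum(counts.values())
    return counts[follower] / total if total else 0.0


sample = "4oda 4oqa 4ada 4oxo"   # hypothetical toy text
print(follower_counts(sample, "4"))
print(follower_share(sample, "4", "o"))
```

A share close to 1 is the kind of evidence that makes a pair look like a single letter, though as the post says it cannot settle the matter on its own.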
posted by ぶらたん at 23:32| Comment(0) | Properties of the text

9=A?

1991/01/24, posted by Jacques Guy

In EVA terms, this is about dy and da.
Noting it here because I want to look into it later.

Currier has remarked: "Final 89 is very high in language B,
almost non-existent in Language A".

That had me worried: "89" being extremely frequent, it would
mean that we have two very different "dialects" in Languages
A and B.

I split the Voynich file into VOYNICH.A and VOYNICH.B, and
did a frequency count, disregarding spaces but not
end-of-lines:

                   A       B
89   Observed:   446    2844
     Expected:   144     387
     Ratio:     3.10    7.35

Indeed. But I also noticed that the discrepancies were
reversed for 8a:

                   A       B
8a   Observed:   993     768
     Expected:   103     223
     Ratio:     9.64    3.44

I remembered that, in my article in Cryptologia, I had
hypothesized that <9> could well be a word-final variant of
<a>, <o>, <c>, or <cc>. And that, in one of my postings to
this group, I wrote that Currier's finding that the ending
of one word strongly affects the beginning of the next
suggested to me that spaces between words had only an
aesthetic function. What if <9> were but a variant of <a>?

Let us see:

                      A       B
8[a9]   Observed:  1439    3612
        Expected:   247     610
        Ratio:     5.83    5.92
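The three tables can be re-derived from one another: the merged row is just the sum of the two separate rows, and each ratio is observed divided by expected. A sketch using the figures quoted above (how the "expected" values were obtained is not stated in the post, so they are taken as given):

```python
# (observed, expected) per language, copied from the tables in the post
counts = {
    "89": {"A": (446, 144), "B": (2844, 387)},
    "8a": {"A": (993, 103), "B": (768, 223)},
}


def ratio(observed, expected):
    return observed / expected


# Treat <9> and <a> as one letter: add the two digraphs together
merged = {
    lang: (
        counts["89"][lang][0] + counts["8a"][lang][0],
        counts["89"][lang][1] + counts["8a"][lang][1],
    )
    for lang in ("A", "B")
}

for digraph, by_lang in counts.items():
    print(digraph, {lang: round(ratio(*oe), 2) for lang, oe in by_lang.items()})
print("8[a9]", {lang: round(ratio(*oe), 2) for lang, oe in merged.items()})
```

Separately the ratios differ by a factor of two to three between A and B; merged, they come out nearly equal, which is the point of the argument.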

That confirmed my suspicion. But was there any additional
evidence? Yes. Looking at my frequency tables, I found that
<9> and <a> occurred in nearly perfect mutually exclusive
distribution, conditioned by the following letter.

Here are the statistics for Language A (my transcription
again):

        a     o     i     v     c     x     z     2
a       -     2  1358    60    12   245   338     7
9       8   222     4     1   713    11     6    80

        4     8     9     g     q     l     =     #
a       1     7     5     -     3     5     4     -
9     230   425    72     -   302   353   375    63


And for Language B:

        a     o     i     v     c     x     z     2
a       1     8  1506    16    22   739   728     7
9      36   790     8     2   757   332   125   140

        4     8     9     g     q     l     =     #
a       1     8     6     -     5    10     2     -
9    1431   440   101     -   253   315   547    26

<a> occurs before <i>, <v>, <x> and <z>, <9> before other
letters, and at the end of lines (=) and paragraphs (#).
But note that the constraint is considerably relaxed in
Language B before <x> and somewhat before <z>.

This pretty well convinces me that <a> and <9> are two
variants of the same letter conditioned by the shape of the
following letter.
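
One way to check the near-complementary distribution is to compute, for each following symbol, the share of <a> among all <a>/<9> tokens. A minimal Python sketch using the Language A counts from the table above; the "-" cells are entered as 0, and the 0.5 majority threshold is an illustrative choice, not something stated in the post:

```python
# Counts of <a> and <9> before each following symbol in Language A, from the
# tables above; "-" entries are entered as 0. "=" is end of line, "#" end of paragraph.
following = ["a", "o", "i", "v", "c", "x", "z", "2", "4", "8", "9", "g", "q", "l", "=", "#"]
a_counts = [0, 2, 1358, 60, 12, 245, 338, 7, 1, 7, 5, 0, 3, 5, 4, 0]
nine_counts = [8, 222, 4, 1, 713, 11, 6, 80, 230, 425, 72, 0, 302, 353, 375, 63]

def a_share(a, nine):
    """Fraction of <a> among all <a>/<9> tokens in one context (None if no data)."""
    total = a + nine
    return a / total if total else None

shares = {sym: a_share(a, n) for sym, a, n in zip(following, a_counts, nine_counts)}

# Contexts where <a> is the majority variant; everywhere else <9> dominates.
a_contexts = [sym for sym, s in shares.items() if s is not None and s > 0.5]

for sym, s in shares.items():
    print(sym, "no data" if s is None else f"{s:.3f}")
print("<a> preferred before:", a_contexts)
```

Swapping in the Language B counts shows the weaker preference before <x> and <z> as lower shares for those two contexts.
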

There is another conditioning factor: <9> occurs
line-initially. Here are the statistics:

               a      9
Language A     2    149
Language B     8    134

Again, we observe a slight relaxation of this "rule" in
Language B. This makes me think that Author B wrote less
confidently than Author A.
posted by ぶらたん at 23:06 | Comment(0) | Nature of the text

From Currier's 1976 paper

Currier observed that letter frequencies varied according
to their position in the line, being quite different
line-finally from word-finally, line-initially from
word-initially. He gives an instance of this phenomenon,
based on frequency counts from Herbal A (roughly 6500 words
in 1000 lines):

"word"-initial     total frequency
symbols            "word"-initially     line-initially

<cqpt>                   118                   3
<coqpt>                  212                  26
<c'qpt>                   24                   0
<c'oqpt>                  45                  10


There is indeed something quite strange going on there. With
an average of 6.5 words or so per line, we should expect to
see about 15% of those word-initials occurring at the
beginning of a line. Thus:

"word"-initial     total frequency
symbols            "word"-initially     line-initially

<cqpt>                   118                  18
<coqpt>                  212                  33
<c'qpt>                   24                   4
<c'oqpt>                  45                   7

The discrepancies between expected and observed frequencies
do not worry me much, except in the case of <cqpt>: 3
occurrences observed when 18 are expected is enough to catch
my attention too. Consider also that we should expect those
four common word-initials, totalling 399 cases, to occur 61
times or so line-initially. We find them there only 39
times.
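
The expected figures follow from dividing each word-initial total by the average number of words per line. A minimal Python sketch of that arithmetic, using only the counts quoted above; the variable names are illustrative, and rounding to the nearest integer is an assumption that happens to reproduce the figures in the second table:

```python
# Word-initial totals and observed line-initial counts from Currier's Herbal A table.
word_initial = {"cqpt": 118, "coqpt": 212, "c'qpt": 24, "c'oqpt": 45}
line_initial_observed = {"cqpt": 3, "coqpt": 26, "c'qpt": 0, "c'oqpt": 10}

WORDS_PER_LINE = 6.5  # roughly 6500 words in 1000 lines

# If position in the line did not matter, 1 word in 6.5 would start a line.
expected = {k: round(n / WORDS_PER_LINE) for k, n in word_initial.items()}

total_words = sum(word_initial.values())
total_expected = round(total_words / WORDS_PER_LINE)
total_observed = sum(line_initial_observed.values())

for k in word_initial:
    print(f"<{k}> expected {expected[k]}, observed {line_initial_observed[k]}")
print(f"total: {total_words} words, expected {total_expected}, observed {total_observed}")
```
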

Currier also found that "some 'word'-finals have an obvious
and statistically-significant effect on the initial symbol
of a following 'word.'" Thus:


                        "word" beginning with:

is preceded by          <4o>     <x>       <ct>
"word" ending in:                or <2>    or <c't>

<x> series                13        7         91
<2> series                10        2         68
<v> series                23        0        275
<9> series               592      184        168


Currier comments:

"Words" ending in the <9> sort of symbol, which is very
frequent, are followed about four times as often by
"words" beginning with <4o>. That is a fact, and it
holds true throughout the entire twenty pages of
"Biological B." It's something that has to be considered
by anyone who does any work on the manuscript. These
phenomena are *consistent*, *statistically significant*,
and hold true throughout those areas of text where they
are found. I can think of no linguistic explanation for
this sort of phenomenon, not if we are dealing with
words or phrases, or the syntax of a language where
suffixes are present.
posted by ぶらたん at 22:10 | Comment(0) | Nature of the text

2010/07/18

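Automatic writing and glossolalia (speaking in tongues)

Even if the Voynich manuscript is not a hoax, it may still be meaningless while following some kind of rules. For that case, the following possibilities have also been suggested.

Automatic writing is not well known in Japan, but glossolalia by spirit mediums may be relatively familiar.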

1992/1/15, posted by RJB
Automatic writing: basically, writing without attending to the act of writing; may be done in any state from distraction to deep trance. May take the form of barely intelligible scribbles; may develop into well-formed and fluent text, with "personalities" communicating different points of view, etc. Most famous literary case: the raw material of Yeats' _Vision_, which began as automatic writing by Yeats' wife shortly after they were married. (Much has been published, with some facsimile pages, by George Mills Harper, if anyone's curious.)
This sort of thing could conceivably lead to development of a script and "language" -- not unknown in the records of 19th-century mediumship. Famous case of a medium who contacted a "Martian" civilization. Was the book by Flournoy? I forget.

Glossolalia: speaking in tongues. I think Felicitas Goodman has done a study of the linguistic traits of glossolalic productions, shown that they are not random, but follow certain constructive rules (none of which are semantic!). Or was it Erika Bourguignon? I forget. Maybe both. To me, speaking in tongues sounds like the affective component of language without the semantic or syntactic.
posted by ぶらたん at 23:44 | Comment(0) | Other

Identifying Currier's Language A/B

posted by Jim Gillogly, 1992/1/10

Currier's rules are:

a) Final 89 (dy in EVA) is very high in Language B; almost non-existent in Language A.
b) SOE and SOR (chol and chor in EVA) are very high in A, often repeated; low in B.
c) The symbol groups SAN and SAM (chain and chaiin in EVA) rarely occur in B; medium frequency in A.
d) Initial SOP (chot in EVA) high in A, rare in B.
e) Initial Q (cth in EVA) very high in A, very low in B.
f) Unattached finals scattered throughout Language B.

I didn't know how to quantify f), so I ignored it for now. For the others, I calculated an ad hoc A Language score as follows:
a) Score 1 if final 89 < 10% of total words
b) Score 1 if SOE and SOR together > 5% of total words
Score 1 more if either SOE or SOR is repeated on the page
c) Score 1 if SAN and SAM together > 2% of total words
d) Score 1 if initial SOP > 2% of total words
e) Score 1 if initial Q > 2% of total words
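
The scoring rules above can be sketched in Python as follows. This is not the program used in the post: the function name `language_a_score` is hypothetical, and several details the post leaves open are filled in by assumption (SOE, SOR, SAN and SAM are matched as whole words, "repeated" is read as the same word twice in a row, and rule f is skipped). The two sample pages are made up for illustration.

```python
def language_a_score(words):
    """Ad hoc Language A score (0-6) for one page of Currier-transcribed words.

    Assumptions not spelled out in the post: SOE, SOR, SAN and SAM are matched
    as whole words, "repeated" means the same word twice in a row, and rule f)
    is ignored.
    """
    n = len(words)
    if n == 0:
        return 0
    final_89 = sum(w.endswith("89") for w in words)
    soe = words.count("SOE")
    sor = words.count("SOR")
    san_sam = words.count("SAN") + words.count("SAM")
    init_sop = sum(w.startswith("SOP") for w in words)
    init_q = sum(w.startswith("Q") for w in words)
    repeated = any(a == b and a in ("SOE", "SOR") for a, b in zip(words, words[1:]))

    score = 0
    score += final_89 / n < 0.10          # a) few words ending in 89
    score += (soe + sor) / n > 0.05       # b) SOE and SOR common
    score += repeated                     # b) SOE or SOR repeated
    score += san_sam / n > 0.02           # c) SAN and SAM present
    score += init_sop / n > 0.02          # d) initial SOP
    score += init_q / n > 0.02            # e) initial Q
    return int(score)

# Made-up pages for illustration only (not manuscript text).
page_a = ["SOE", "SOE", "SOR", "SAN", "SOPAM", "QOE", "8AM", "OE", "ZOE", "SAM"]
page_b = ["SC89", "4OFC89", "ZC89", "OE", "4OFAM", "SC89", "OR", "8AM", "ZC89", "4OFC89"]
print(language_a_score(page_a), language_a_score(page_b))
```
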

Here's the output for f1r (page 001):

Page 001
nwords = 215.
Test a) word-final 89 is 3 of 215
Test b) SOE is 10 of 215.
Test b) SOR is 2 of 215.
Test b) SOE repeated: 1.
Test b) SOR repeated: 0.
Test c) SAM or SAN separately: 0.
Test d) Initial SOP 2.
Test e) Initial Q 16.
Overall score for page 001: 5.

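Currier's claim is that Languages A and B show clear statistical differences, and that these differences come from two different scribes, A and B.
Scoring each page and running it through a program makes it clear whether the page is A or B, but were there really two scribes? If that were proven, though, it would be proof that the text is nonsense.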
posted by ぶらたん at 17:45 | Comment(0) | Nature of the text

What "random" means

1992/1/10, posted by John Baez

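"Random" need not mean only characters repeated with no pattern at all. If elements are arranged at random but according to rules, might the result be a text like the Voynich, one that looks statistically meaningful? Answering that is 無頼's ultimate goal.
Could it be that words were simply built as "beginning-middle-end" combinations in certain proportions and then strung together? Against that idea, there is a variety of evidence that the Voynich text is not random, which I will introduce in later posts.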
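
As a rough illustration of the "beginning-middle-end" idea, the following Python sketch draws each slot independently with fixed weights and strings the resulting words together. Everything in it is hypothetical: the slot inventories, the weights and the function names are invented for the example and are not derived from the manuscript.

```python
import random

# Hypothetical slot inventories with relative weights (not derived from the manuscript).
PREFIXES = {"qo": 3, "o": 2, "ch": 2, "": 3}
MIDDLES = {"ke": 3, "te": 2, "l": 2, "d": 3}
SUFFIXES = {"dy": 4, "y": 3, "aiin": 2, "ol": 1}

def pick(rng, slot):
    """Draw one element of a slot according to its weights."""
    items = list(slot)
    return rng.choices(items, weights=[slot[i] for i in items], k=1)[0]

def random_word(rng):
    """Build one word as beginning + middle + end, each slot drawn independently."""
    return pick(rng, PREFIXES) + pick(rng, MIDDLES) + pick(rng, SUFFIXES)

def random_text(n_words, seed=0):
    """Generate a reproducible list of words from a seeded generator."""
    rng = random.Random(seed)
    return [random_word(rng) for _ in range(n_words)]

text = random_text(20)
print(" ".join(text))
```

A text generated this way has stable word-internal statistics by construction, but because successive words are independent it would not reproduce effects that cross word or line boundaries.
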

First, there's not just one kind of random text. Let me illustrate with decimal numbers. A long number in which each digit appears 1/10th of the time, with the probability of it occurring unaffected by what any OTHER digit happens to be, is one kind of random number, which indeed has maximal entropy. We say that the digits are mutually independent random variables (what ONE happens to be has nothing to do with what ANOTHER is) and identically distributed. But there are other kinds of random numbers: for example, it could be that "2" occurs 15% of the time and "1" occurs 5% of the time, the rest all appear 10% of the time, with the successive digits being mutually independent. (Sorry, the real jargon is "stochastically independent".) Here too they are stochastically independent identically distributed random variables but with lower entropy. They would not be identically distributed, however, if every sixth digit was more likely to be a "7", and they would not be stochastically independent if 6's tended to be followed by 4's.
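
The entropy comparison in this paragraph can be checked numerically. A minimal Python sketch, assuming Shannon entropy in bits per digit for independent draws; the function name `entropy_bits` is illustrative:

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits per digit, of independent draws from `probs`."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Every digit 0-9 equally likely.
uniform = [0.1] * 10

# The skewed example: "1" at 5%, "2" at 15%, all other digits at 10%.
skewed = [0.10, 0.05, 0.15] + [0.10] * 7

h_uniform = entropy_bits(uniform)
h_skewed = entropy_bits(skewed)
print(f"uniform: {h_uniform:.4f} bits, skewed: {h_skewed:.4f} bits")
```

The uniform case reaches the maximum of log2(10) bits per digit, and the skewed case comes out slightly lower, even though both sequences are independent and identically distributed.
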

So in brief there are loads of kinds of texts which would be called random but have different statistical characters, and there are big fat probability theory books on this stuff, a bit of which I know. It's impossible for humans to make lists of stochastically independent identically distributed characters without resorting to spinners or other random number generators.
posted by ぶらたん at 17:08 | Comment(0) | Other