December 5, 2010

Traces of correction

It is often said that the Voynich manuscript shows no traces of corrected mistakes,
but there are a few odd spots, you know.

> Likewise the strange thing that ends line 6 of folio 24v,
> which I would write in advanced Frogguy:

> s
>c-lj a 2 A-2

I would say: a correction. The writer forgot the s and inserted it later.
posted by ぶらたん at 18:04 | Comment(0) | Other

The entropy of Japanese

1997/2/1, posted by Dennis Stallings

If you analyze Japanese written in Latin characters (romaji), you get a low entropy. This is because of the severe phonotactic constraints of Japanese. It's close to true that a Japanese syllable may begin with zero or one consonant, have one vowel, and end with -n or nothing.

However, Japanese can also be written in hiragana or katakana (syllabic characters), due to the very fact of the severe phonotactic constraints. You have about 26 Latin characters, plus perhaps the long vowels, giving 25-30 characters for romaji. You have 50-60 characters in each kana set (although as I recall the kana don't indicate vowel length).

With romaji I'm sure even the normalized second-order entropy would be low. With kana I'm sure it would be higher. How much higher depends on word frequencies in Japanese and any rules Japanese might have for combining syllables.
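The romaji versus kana comparison can be tried out directly. Below is a minimal Python sketch, assuming that "second-order entropy" means the conditional entropy of a character given the one before it (digraph entropy minus the entropy of the leading characters). The function names `h1` and `h2` and the sample sentence are illustrative choices, not taken from the post, and a sample this short only shows the mechanics; a real comparison needs a large corpus.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def h1(text):
    """First-order entropy: bits per single character."""
    return entropy(Counter(text))

def h2(text):
    """Conditional second-order entropy: H(pair) minus H(first character of pair)."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(pairs) - entropy(firsts)

# The same (hypothetical) sentence in romaji and in hiragana, spaces removed.
romaji = "watashiwanihongoobenkyoushiteimasu"
kana = "わたしはにほんごをべんきょうしています"

for label, text in (("romaji", romaji), ("kana", kana)):
    print(f"{label}: {len(set(text))} symbols, h1 = {h1(text):.2f}, h2 = {h2(text):.2f}")
```

Dividing each value by log2 of the number of distinct symbols would give one possible "normalized" figure, which is again an assumption about what the post means by that word.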

1999/7/23, posted by Dennis Stallings

"Understanding the Second-Order Entropies of Voynich Text", May 11, 1998: http://www2.micro-net.com/~ixohoxi/voy/mbpaper.htm

... I struggled with the entropy concept. My reasoning went: Voynichese ought to "mean the same thing" in Frogguy as in Currier, despite the fact that there's a big difference in their entropy profiles. Likewise, Japanese and Hawaiian "say the same thing" whether they are written in phonemic (romaji in Japanese) or syllabic notation (hiragana or katakana in Japanese).

I've finally realized that my reasoning was false. You *can* transmit more information per character by using a larger character set, as with 71 characters for Japanese kana versus 22 characters for romaji. The question is: how well is a character set of a given size being used?
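As a rough illustration of the point about character set size: the most a single character can carry is log2 of the number of available characters, usually written h0. The sketch below simply plugs in the two set sizes quoted in the post; the function name `h0` is chosen here for convenience.

```python
import math

def h0(alphabet_size):
    """Zeroth-order entropy: bits per character if every character were equally likely."""
    return math.log2(alphabet_size)

KANA_SIZE = 71    # figure quoted in the post
ROMAJI_SIZE = 22  # figure quoted in the post

print(f"kana   h0 = {h0(KANA_SIZE):.2f} bits/char")
print(f"romaji h0 = {h0(ROMAJI_SIZE):.2f} bits/char")
```

This gives roughly 6.15 bits per kana character against roughly 4.46 bits per romaji character as upper bounds. How close real text comes to those bounds is exactly the "how well is the character set being used" question.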

I still think that the comparison of h1 and h2, which I used in my paper, is useful. I also think that one could define the size of the character set by taking the characters that each constitute less than 0.5% or 1.0% of the text and subtracting the number of such characters from the total number of characters. From that number, one could calculate an h0 that would be meaningful.
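The proposal for a "meaningful" h0 can be sketched as follows. This assumes the rare characters are those whose share of the text falls below the 0.5% or 1.0% threshold, which is one reading of the post, and the names `effective_alphabet_size` and `effective_h0` are made up for this example.

```python
import math
from collections import Counter

def effective_alphabet_size(text, threshold=0.01):
    """Number of distinct characters left after dropping those rarer than `threshold`."""
    counts = Counter(text)
    total = sum(counts.values())
    rare = sum(1 for n in counts.values() if n / total < threshold)
    return len(counts) - rare

def effective_h0(text, threshold=0.01):
    """log2 of the effective alphabet size (0.0 if nothing survives the cut)."""
    size = effective_alphabet_size(text, threshold)
    return math.log2(size) if size > 0 else 0.0

# Toy text: two common characters and one that makes up only 0.5% of the text.
sample = "a" * 120 + "b" * 79 + "z"
print(effective_alphabet_size(sample, 0.01), effective_h0(sample, 0.01))
```

With a 1.0% cutoff the single "z" is discarded and the toy text counts as a two-character alphabet; with a lower cutoff it would be kept.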

1999/7/23, posted by Gabriel Landini

There are further problems in Japanese. One thing is "reading" Japanese, and the other is "listening" to Japanese. For example the word "shu" (2 characters in hiragana/katakana) can be written with many different kanji, all with different meanings. So while reading gives you the exact character, the phonetic alphabets do not (and of course the spoken language doesn't either). This is the "context sensitive" aspect of Japanese: how on earth do you know which of the tens of "shu" you are referring to? Answer: because of what comes before and after it. Of course, this does not concern us, because we do not know what Voynichese should sound like.

1999/8/16, posted by Karl Kluge

My understanding is that cryptographers don't use entropy because it doesn't have clean distributional results, unlike the various standard statistics such as Index of Coincidence. Modulo that, while you may not be able to use entropy to determine what the letters, words, or language are, that doesn't mean that, given a specific cryptographic hypothesis regarding the alphabet and cipher system, entropy can't serve as a test of such a hypothesis.
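For reference, the Index of Coincidence is the probability that two characters picked at random from a text, without replacement, turn out to be the same. A minimal Python sketch of the standard formula, with a function name chosen here for illustration:

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two distinct positions in `text` hold the same character."""
    n = len(text)
    if n < 2:
        return 0.0  # undefined for fewer than two characters; 0.0 is a convention used here
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

print(index_of_coincidence("aabb"))  # 4 matching ordered pairs out of 12
```

A text that repeats a few characters heavily scores high, and a text that spreads evenly over many characters scores low, so the statistic moves in the opposite direction to entropy.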
posted by ぶらたん at 15:15 | Comment(0) | Entropy

Malayo-Polynesian languages, Filipino

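So, following on from English and French, I (無頼) am now studying just a little Filipino (Tagalog). If the Voynich manuscript is not a cipher, then things like the repetition of sounds do feel similar to me.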

It would be fun if it turned out to be a medieval missionary's attempt
to write down the local language in an alphabet.

By the way, as for other languages, I have studied Spanish and Classical Greek for one year each. I've forgotten most of it, though.

1996/12/10, posted by Bob Richmond

The language appears to have a small number of phonemes. The languages of the Malayo-Polynesian family, as many observers have suggested, are the most likely possibility. The many languages of the Philippines make that area I think a very likely candidate, since the Spanish began extensive colonization there around 1565, with many Roman Catholic priests in isolated outposts.

Imagine then a young priest posted to the Philippines in the late 16th century. He reduced a local language to writing, as was becoming a widespread practice then - he would have known, for instance, about literary Nahuatl (since the Philippines were a province of Mexico!). Isolated, though, he went native, succumbed to the pleasures of the flesh, and kept some sort of record, using his invented alphabet and the local language. Perhaps he simply recorded his amorous doings with his wife - such records can become very repetitious.

Then again, come to think of it, there would hardly be any need to go to the trouble of inventing a new alphabet just to write down an unknown language.

1997/10/31, posted by Rene Zandbergen
When Malay was mentioned in two contexts, I did not realise just how many features Malay written in an Arabic script would have in common with Voynichese. The prefixes and suffixes, the short words, the full-word repetitions, the absence of repeated characters.
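Two of the features listed here, full-word repetitions and repeated characters, are easy to count in any transcription. The sketch below assumes the text has already been split into words and that "repeated characters" means the same character twice in a row inside a word; the function names and the sample tokens are made up for illustration.

```python
def word_repetitions(words):
    """Count places where a word is immediately followed by the same word."""
    return sum(1 for a, b in zip(words, words[1:]) if a == b)

def doubled_characters(words):
    """Count places inside words where a character is immediately repeated."""
    return sum(1 for w in words for a, b in zip(w, w[1:]) if a == b)

# Hypothetical token list, standing in for a transcribed line of text.
tokens = "orang orang itu makan".split()
print(word_repetitions(tokens), doubled_characters(tokens))
```

Running the same two counts over a Voynich transcription and over Malay text of similar length would be one simple way to put numbers on the resemblance described above.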

1997/10/31, posted by Dennis Stallings
Jacques discussed the Jawi (Arabic) script used for Malay in the Voylist archives. Jawi does represent vowels, but in a complex manner.

However, I think you would see the same thing with Malay even in Latin orthography. And I'm pretty sure Malay would be a low-entropy language. It's in the Malayo-Polynesian group, and visually it looks low-entropy.
posted by ぶらたん at 13:56 | Comment(0) | The language it is written in