## 15361 Re: [ANE-2] Re: Archaeological decipherment

• Apr 1, 2014
On Mon, 31 Mar 2014, Miguel Valério wrote:

> I suspect that no single figure works for any undeciphered script. It is a
> matter of quality, not quantity. As John Chadwick ("The Decipherment of
> Linear B", p. 26) said: "How much is needed depends upon the nature of the
> problem to be solved, the character of the material, and so forth."

The amount of text required depends on the redundancy characteristics of
the language and the number of characters in the script. Redundancy is a
concept that grew out of Claude Shannon's work on information theory and
refers to the amount of repetitiveness in a language. Another way of
putting this is that redundancy is a measure of how much of a message can
be lost and still be understandable. Redundancy is what makes it possible
to understand someone who speaks with a foreign accent or to understand a
written message when some of the letters, or even words, are obliterated.
In information/communication theory, anything that interferes with or
distorts the message is "noise". Redundancy helps the system overcome
noise.
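As a rough illustration of redundancy as a number, here is a minimal Python sketch. It is only a first-order estimate: it looks at single-letter frequencies and nothing else, so it understates the redundancy that comes from letter combinations, word shapes, and so on. The function name and the sample sentence are my own choices for illustration.

```python
import math
from collections import Counter

def first_order_redundancy(text: str, alphabet_size: int = 26) -> float:
    """Return 1 - H/H_max, where H is the single-letter entropy of the text.

    0.0 means every letter is equally likely (no redundancy at this level);
    1.0 means the text repeats a single letter.
    """
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        raise ValueError("text contains no letters A-Z")
    total = len(letters)
    counts = Counter(letters)
    # Shannon entropy in bits per letter, from observed frequencies only.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)

sample = ("Redundancy is what makes it possible to understand someone "
          "who speaks with a foreign accent")
print(round(first_order_redundancy(sample), 2))
```

A short sample gives a noisy estimate; a longer text in the same language gives a steadier one.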

A known language written in an unknown script is essentially a simple
substitution cipher, the easiest kind of cryptogram to decipher. For a
given message, all simple substitution ciphers of that message are
equivalent, meaning that the cipher system is irrelevant. It doesn't
matter if it is letters, numbers, lines and dots, little stick men, or
ancient symbols. The cipher alphabet is simply "noise" that distorts the
message and if you know the redundancy features of the language and have a
sufficient amount of text, it can be deciphered.
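The equivalence of cipher alphabets can be checked directly. The sketch below (the helper names are hypothetical, chosen only for this example) enciphers the same message once with shuffled letters and once with numbers, then reduces each result to its repetition pattern, where every symbol is replaced by the order in which it first appeared. The pattern comes out the same whatever the symbols look like.

```python
import random
import string

def random_key(symbols, seed):
    """Map A-Z onto a shuffled copy of the given 26 cipher symbols."""
    rng = random.Random(seed)
    shuffled = list(symbols)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_uppercase, shuffled))

def encipher(message, key):
    """Replace each letter by its cipher symbol; leave everything else alone."""
    return [key.get(ch, ch) for ch in message.upper()]

def pattern(symbols):
    """Replace each symbol by the index at which it was first seen."""
    first_seen = {}
    return [first_seen.setdefault(s, len(first_seen)) for s in symbols]

message = "a kind word"
letter_key = random_key(string.ascii_uppercase, seed=1)
number_key = random_key(range(1, 27), seed=2)  # numbers instead of letters

as_letters = encipher(message, letter_key)
as_numbers = encipher(message, number_key)

# Different symbols, identical repetition structure.
print(pattern(as_letters) == pattern(as_numbers))
```

Since the pattern is all that survives the substitution, it is also all that the decipherer needs to work from.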

According to Shannon, for a simple substitution cipher the amount of text
needed for a unique solution is log(N!)/R where N is the number of
characters in the script and R is a digital index of the redundancy of the
language. Again according to Shannon, the redundancy index for English is
about 0.7 (in decimal digits per letter), which means that for English the
amount of text needed to provide a unique solution to a simple substitution
cipher is the logarithm to the base 10 of 26! (about 26.6) divided by 0.7.
Worked out exactly that comes to about 38; Shannon's own rounded figure is
about 30. Either way, any simple substitution cipher of an English text
should yield a unique solution once it is more than a few dozen characters
long. To test this, here is a
simple cryptogram that contains about twice as many characters as needed
for a unique solution: RFD PIY G UFY ZDHYJIH VWYJ G QWTS VFHS GTS G PDT
YJGT VWYJ G QWTS VFHS GUFTI. -GU KGBFTI. Anyone familiar with English
should be able to decipher this in a few minutes.
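The arithmetic can be checked with a few lines of Python. This is only a sketch of the formula as stated above, log(N!)/R with the logarithm taken to base 10; the function name is mine. Evaluated literally for N = 26 and R = 0.7 it gives about 38, a little above the round figure of 30, and the cryptogram above is comfortably longer than either.

```python
import math

def unicity_distance(alphabet_size: int, redundancy_per_char: float) -> float:
    """Evaluate log10(N!) / R for a simple substitution cipher."""
    # lgamma(N + 1) is ln(N!); dividing by ln(10) converts to base 10.
    log10_factorial = math.lgamma(alphabet_size + 1) / math.log(10)
    return log10_factorial / redundancy_per_char

CRYPTOGRAM = ("RFD PIY G UFY ZDHYJIH VWYJ G QWTS VFHS GTS G PDT "
              "YJGT VWYJ G QWTS VFHS GUFTI. -GU KGBFTI.")

needed = unicity_distance(26, 0.7)
available = sum(ch.isalpha() for ch in CRYPTOGRAM)

print(round(needed))   # about 38
print(available)       # 67 letters in the cryptogram
```

The same function can be tried with a larger N for a syllabary, though the right value of R for an ancient language is something that would have to be estimated separately.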

Redundancy features include letter frequency, letter combination (digraph)
frequency, positional letter frequency, word frequency, permissible word
shapes, and so on. Computational algorithms for decipherment use these
features by comparing frequencies in the language with frequencies in the
script and then looking for permissible words. Whether the permissible
words make sense or not is still something that the human mind does better
than the computer.
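To make these features concrete, here is a small Python sketch applied to the cryptogram above. It collects the kind of evidence a solver starts from (letter counts, one-letter words, repeated words) and then filters a word list by word shape. The four-word lexicon is a toy list invented for the example, not a real dictionary, and the helper names are mine.

```python
from collections import Counter

CRYPTOGRAM = ("RFD PIY G UFY ZDHYJIH VWYJ G QWTS VFHS GTS G PDT "
              "YJGT VWYJ G QWTS VFHS GUFTI. -GU KGBFTI.")

def words(text):
    """Split into words, treating anything that is not a letter as a gap."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text)
    return cleaned.split()

def word_shape(word):
    """Repetition pattern of a word, e.g. 'THAT' -> (0, 1, 2, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word)

def candidates(cipher_word, lexicon):
    """Words from the lexicon whose shape matches the cipher word."""
    shape = word_shape(cipher_word)
    return [w for w in lexicon if word_shape(w) == shape]

letter_counts = Counter(ch for ch in CRYPTOGRAM if ch.isalpha())
one_letter_words = {w for w in words(CRYPTOGRAM) if len(w) == 1}
repeated_words = [w for w, n in Counter(words(CRYPTOGRAM)).items() if n > 1]

print(letter_counts.most_common(3))   # most frequent cipher letters
print(one_letter_words)               # in English, only A or I fits here
print(repeated_words)                 # likely common short words

toy_lexicon = ["ANOTHER", "BROTHER", "FURTHER", "WITHOUT"]  # hypothetical
print(candidates("ZDHYJIH", toy_lexicon))
```

The shape test leaves more than one permissible word for the seven-letter group, which is where the judgment about what makes sense comes in.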

For Shannon's work on communication theory and secrecy systems in
communication, see

http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

and

http://pages.cs.wisc.edu/~rist/642-fall-2012/shannon-secrecy.pdf

Bob Whiting
whiting@...