15361 Re: [ANE-2] Re: Archaeological decipherment

  • Robert M Whiting
    Apr 1, 2014
      On Mon, 31 Mar 2014, Miguel Valério wrote:

      > I suspect that no single figure works for any undeciphered script. It is a
      > matter of quality, not quantity. As John Chadwick ("The Decipherment of
      > Linear B", p. 26) said: "How much is needed depends upon the nature of the
      > problem to be solved, the character of the material, and so forth."

      The amount of text required depends on the redundancy characteristics of
      the language and the number of characters in the script. Redundancy is a
      concept that grew out of Claude Shannon's work on information theory and
      refers to the amount of repetitiveness in a language. Put another way,
      redundancy is a measure of how much of a message can be lost while the
      message remains understandable. Redundancy is what makes it possible
      to understand someone who speaks with a foreign accent or to understand a
      written message when some of the letters, or even words, are obliterated.
      In information/communication theory, anything that interferes with or
      distorts the message is "noise". Redundancy helps the system overcome
      noise.
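
      As a rough illustration of the idea, redundancy can be estimated from
      single-letter frequencies alone by comparing the entropy of the observed
      letter distribution with the maximum possible for a 26-letter alphabet.
      This is only a minimal sketch: the function name and the sample sentence
      are made up for the example, and because it ignores digraphs, words, and
      longer context it will come out well below the figures usually given for
      a whole language.

```python
import math
from collections import Counter

def first_order_redundancy(text, alphabet_size=26):
    """Estimate redundancy from single-letter frequencies only.

    Returns 1 - H/H_max, where H is the entropy (bits per letter) of the
    observed letter distribution and H_max = log2(alphabet_size).
    Hypothetical helper for illustration; it ignores digraph and word
    structure, so it underestimates the redundancy of a real language.
    """
    letters = [c for c in text.upper() if c.isalpha()]
    if not letters:
        raise ValueError("text contains no letters")
    total = len(letters)
    counts = Counter(letters)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)

# Made-up sample sentence; any longer English text can be substituted.
sample = "redundancy is what makes it possible to understand a damaged message"
print(round(first_order_redundancy(sample), 2))
```

      A text that uses every letter equally often scores 0 (nothing can be
      predicted), and a text made of one repeated letter scores 1 (everything
      can be predicted).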

      A known language written in an unknown script is essentially a simple
      substitution cipher, the easiest kind of cryptogram to decipher. For a
      given message, all simple substitution ciphers of that message are
      equivalent, meaning that the cipher system is irrelevant. It doesn't
      matter if it is letters, numbers, lines and dots, little stick men, or
      ancient symbols. The cipher alphabet is simply "noise" that distorts the
      message and if you know the redundancy features of the language and have a
      sufficient amount of text, it can be deciphered.
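
      The equivalence can be checked mechanically: relabeling the symbols
      changes what the text looks like but not its pattern of repetitions. In
      the sketch below (the helper names, the sample message, and both cipher
      alphabets are invented for illustration), the same message is enciphered
      once with letters and once with numbers, and both ciphertexts reduce to
      the same pattern of first occurrences.

```python
import string

def encipher(message, key):
    """Apply a simple substitution: map each letter through `key`, keep spaces."""
    return [ch if ch == " " else key[ch] for ch in message]

def pattern(symbols):
    """Relabel symbols 0, 1, 2, ... in order of first appearance."""
    first_seen = {}
    return [s if s == " " else first_seen.setdefault(s, len(first_seen))
            for s in symbols]

letters = string.ascii_uppercase
# Two arbitrary cipher alphabets (both are one-to-one mappings):
letter_key = dict(zip(letters, letters[3:] + letters[:3]))          # letters -> letters
number_key = {ch: i * 7 % 26 + 1 for i, ch in enumerate(letters)}   # letters -> numbers

message = "THE CAT SAT ON THE MAT"
as_letters = encipher(message, letter_key)
as_numbers = encipher(message, number_key)

print(as_letters == as_numbers)                    # False: the symbols differ
print(pattern(as_letters) == pattern(as_numbers))  # True: the structure is identical
```

      Everything a solver works from (frequencies, repeats, word shapes) is a
      property of that shared pattern, which is why the choice of symbols does
      not matter.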

      According to Shannon, for a simple substitution cipher the amount of text
      needed for a unique solution is log10(N!)/R, where N is the number of
      characters in the script and R is the redundancy of the language measured
      in decimal digits per character. Again according to Shannon, R for English
      is about 0.7, which means that for English the amount of text needed to
      provide a unique solution to a simple substitution cipher is the logarithm
      to the base 10 of 26! divided by 0.7, or approximately 30. (Evaluated
      exactly, log10(26!) is about 26.6 and the quotient is nearer 38; 30 is the
      commonly quoted round figure.) That means that any simple substitution
      cipher of an English text should yield a unique solution if it is more
      than about 30 to 40 characters long. To test this, here is a
      simple cryptogram that contains about twice as many characters as needed
      for a unique solution: RFD PIY G UFY ZDHYJIH VWYJ G QWTS VFHS GTS G PDT
      YJGT VWYJ G QWTS VFHS GUFTI. -GU KGBFTI. Anyone familiar with English
      should be able to decipher this in a few minutes.
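
      The formula is easy to evaluate directly. The sketch below assumes base-10
      logarithms and R given in decimal digits per character; the function name
      is made up for the example. Evaluated this way, the quotient for English
      comes out nearer 38 than 30, since log10(26!) is about 26.6. The test
      cryptogram above, at roughly 60 letters, clears either figure.

```python
import math

def unicity_distance(num_symbols, redundancy_digits_per_char):
    """Estimate log10(N!) / R for a simple substitution cipher.

    `redundancy_digits_per_char` is R in decimal digits per character
    (about 0.7 for English, the figure quoted in the text).
    """
    # lgamma(N + 1) equals ln(N!), so dividing by ln(10) gives log10(N!)
    log10_factorial = math.lgamma(num_symbols + 1) / math.log(10)
    return log10_factorial / redundancy_digits_per_char

print(round(unicity_distance(26, 0.7)))  # 26-letter alphabet, R = 0.7 -> 38
```

      Calling the function with a larger N shows how quickly the requirement
      grows with the size of the sign inventory, though R for an undeciphered
      script is not known in advance and would have to be assumed.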

      Redundancy features include letter frequency, letter combination (digraph)
      frequency, positional letter frequency, word frequency, permissible word
      shapes, and so on. Computational algorithms for decipherment use these
      features by comparing frequencies in the language with frequencies in the
      script and then looking for permissible words. Whether the permissible
      words make sense or not is still something that the human mind does better
      than the computer.
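
      As a minimal sketch of the counting step only (not a full decipherment
      algorithm, and it does not give away the solution), the following tallies
      some of these features for the cryptogram above: letter frequency,
      digraph frequency within words, word-initial letters, repeated words, and
      word shapes. The variable and function names are invented for the
      example; a real solver would go on to compare these counts with reference
      tables for the candidate language.

```python
from collections import Counter

cryptogram = ("RFD PIY G UFY ZDHYJIH VWYJ G QWTS VFHS GTS G PDT "
              "YJGT VWYJ G QWTS VFHS GUFTI. -GU KGBFTI.")

# Strip punctuation and split on the word dividers (spaces).
words = ["".join(c for c in w if c.isalpha()) for w in cryptogram.split()]
words = [w for w in words if w]
letters = "".join(words)

letter_counts = Counter(letters)                      # single-letter frequency
digraph_counts = Counter(w[i:i + 2]                   # adjacent pairs inside words
                         for w in words for i in range(len(w) - 1))
initial_counts = Counter(w[0] for w in words)         # positional: word-initial
word_counts = Counter(words)                          # repeated whole words

def shape(word):
    """Word shape: letters relabeled A, B, C, ... by first appearance."""
    first_seen = {}
    return "".join(chr(ord("A") + first_seen.setdefault(c, len(first_seen)))
                   for c in word)

print(len(letters), "letters in", len(words), "words")
print("most common letters:", letter_counts.most_common(3))
print("most common digraphs:", digraph_counts.most_common(3))
print("repeated words:", [w for w, n in word_counts.items() if n > 1])
print("shape of longest word:", shape(max(words, key=len)))
```

      A one-letter word that recurs, or a seven-letter word whose third and
      last letters match, narrows the candidates in English considerably before
      any guessing starts.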

      For Shannon's work on communication theory and secrecy systems in
      communication, see

      http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

      and

      http://pages.cs.wisc.edu/~rist/642-fall-2012/shannon-secrecy.pdf

      Bob Whiting
      whiting@...