Sure, that's part of it, but Shannon's notion (which he termed the "unicity distance") is only a starting point. Few (no?) ancient scripts can be treated as simple substitution ciphers, with a one-to-one mapping between phonemes and letters. More typically the mapping is many-to-many: witness the at least three readings of the Sumerian "an" (i.e., star) symbol, and the fact that a given syllable in Sumerian can be written in many different ways. Then, of course, you don't actually know what the symbols represent: do they represent phonemes? Just some of the phonemes? Syllables? Morphemes? Elements of meaning? Some combination of these? (The last is the normal situation for ancient writing systems.)

The more of these issues you pile on, the more text you need. So Shannon's unicity is a mathematical characterization of an ideal case, one that is more common in codebreaking than in decipherment.
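
To make the ideal case concrete, here is a rough sketch of Shannon's estimate U = H(K) / D, where H(K) is the entropy of the key and D is the redundancy of the language per symbol. The function names are mine, and the figure of about 1.5 bits of information per English letter is a commonly quoted estimate that I am assuming here, not a measurement. The 100-sign case is purely hypothetical and still assumes a one-to-one mapping, so it says nothing about a many-to-many script, where the requirement would presumably be larger still.

```python
import math


def substitution_key_entropy(alphabet_size):
    """Entropy in bits, log2(n!), of a uniformly chosen one-to-one substitution key."""
    return math.lgamma(alphabet_size + 1) / math.log(2)


def unicity_distance(key_entropy_bits, redundancy_bits_per_symbol):
    """Shannon's estimate U = H(K) / D of the text length needed for a unique solution."""
    if redundancy_bits_per_symbol <= 0:
        raise ValueError("redundancy must be positive")
    return key_entropy_bits / redundancy_bits_per_symbol


# Ideal case: simple substitution over a 26-letter alphabet, English plaintext.
# Assumption: English carries roughly 1.5 bits per letter, so the redundancy
# is about log2(26) - 1.5 bits per letter.
english_redundancy = math.log2(26) - 1.5
ideal = unicity_distance(substitution_key_entropy(26), english_redundancy)

# Hypothetical harder case: a 100-sign inventory, still one-to-one, with the
# same per-sign redundancy (an assumption made only for comparison).
larger = unicity_distance(substitution_key_entropy(100), english_redundancy)

print(f"26 signs:  about {ideal:.0f} symbols of text")
print(f"100 signs: about {larger:.0f} symbols of text")
```

Under these assumptions the 26-letter case comes out at a few dozen letters, and simply enlarging the sign inventory pushes the requirement up several-fold before any of the other complications are taken into account.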

For these and other reasons (e.g., you don't actually *know* the language), computational algorithms to solve the decipherment problem tend to use quite a few more tricks than just this.

Richard Sproat

Google, Inc