Computer analysis, online translator, intelligent guesses crack ancient German code
- 27 October, 2011 03:29
The encrypted initiation rite of an ancient German secret society preoccupied with ophthalmology has fallen to modern language analysis tools including an online automatic translator.
The 18th Century document known as the Copiale Cipher turned out to be just that - a substitution cipher - although a complex one.
The book, named for a plaintext word found in the otherwise enciphered work, was discovered in East Germany after the Cold War. Analysis of the 75,000-character document was carried out by a team of U.S. and Swedish information scientists and linguists led by Kevin Knight of the University of Southern California's Information Sciences Institute.
Using a machine-readable transcript of the first 16 pages of the 105-page handwritten book, they performed statistical analysis of the frequency with which each of the 90 different characters appears and of the characters that precede and follow each character.
They also analyzed the frequency of recurrent character pairs and groupings of three.
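The counting described above can be sketched in a few lines. This is a minimal illustration, not the researchers' code: the function name ngram_counts and the toy transcript are assumptions made for the example, and the real transcript uses about 90 distinct symbols rather than four.

```python
from collections import Counter

def ngram_counts(symbols, n):
    """Count every run of n consecutive symbols in the transcript."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))

# Hypothetical toy transcript standing in for the machine-readable pages.
transcript = list("abcabdabcab")

unigrams = ngram_counts(transcript, 1)  # single-character frequencies
bigrams = ngram_counts(transcript, 2)   # recurrent character pairs
trigrams = ngram_counts(transcript, 3)  # groupings of three

print(bigrams.most_common(3))
```

The same bigram table also answers which characters precede and follow a given character, since each pair records one symbol and its right-hand neighbour.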
The characters consisted of Roman letters, some Greek letters and, for the rest, unique symbols. Some of the Roman letters had variants with a dot over them or umlauts (two dots) or circumflexes (^).
By clustering characters that were preceded and followed by similar groups of characters, the researchers found that the Roman letters fell into a single large cluster. On a hunch, they assumed that because the Roman letters behaved as one group, they carried the meaning of the text and all the other characters were there for show - to mislead attempts to decipher it.
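The article does not say which clustering method the team used. As a rough sketch of the underlying idea, each symbol can be described by the symbols that precede and follow it, and two symbols compared by the cosine similarity of those descriptions. The names context_vectors and cosine and the toy symbol stream are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(symbols):
    """Map each symbol to counts of its left ("L") and right ("R") neighbours."""
    vectors = defaultdict(Counter)
    for prev, cur, nxt in zip(symbols, symbols[1:], symbols[2:]):
        vectors[cur][("L", prev)] += 1
        vectors[cur][("R", nxt)] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical stream: "a" and "b" occur in identical contexts, "c" does not.
symbols = list("1a21b21a23c43c41b2")
vectors = context_vectors(symbols)

print(cosine(vectors["a"], vectors["b"]))  # high: same neighbours
print(cosine(vectors["a"], vectors["c"]))  # low: no shared neighbours
```

Symbols with high pairwise similarity would end up in the same cluster under most standard clustering schemes; this sketch stops at the similarity measure.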
They attacked the transcript with automatic computer attacks that sought to make sense of the jumble in 80 different languages, with German, English and Latin being given preference. It didn't pan out.
So they revised their hunch to consider a homophonic substitution, in which a single letter of the plaintext message can be replaced by more than one character. So, for example, the letter "e" might be replaced by any of the characters "s", "2" or "@".
This helps hide the frequency with which plaintext characters appear, making it more difficult to decipher them. So imagine a letter that accounts for 12% of the characters in a message. If it is represented in the cipher by either of two replacement characters, seeking a single ciphertext character that accounts for 12% of the total characters won't reveal the plaintext letter.
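A small sketch shows the flattening effect. The key below is hypothetical: it gives "e" the three replacement characters from the example above and every other letter just one, and it rotates through the replacements in order (a real scribe would more likely choose among them freely).

```python
from collections import Counter
from itertools import cycle

# Hypothetical key: one plaintext letter may map to several cipher symbols.
KEY = {"e": ["s", "2", "@"], "t": ["x"], "n": ["q"]}

def encipher(plaintext, key):
    """Replace each letter by the next of its replacement symbols, in rotation."""
    wheels = {letter: cycle(subs) for letter, subs in key.items()}
    return "".join(next(wheels[ch]) for ch in plaintext)

def decipher(ciphertext, key):
    """Invert the key: every cipher symbol maps back to exactly one letter."""
    inverse = {sub: letter for letter, subs in key.items() for sub in subs}
    return "".join(inverse[ch] for ch in ciphertext)

plain = "teneteente"
cipher = encipher(plain, KEY)

print(Counter(plain).most_common())   # "e" clearly dominates the plaintext
print(Counter(cipher).most_common())  # no cipher symbol stands out as much
```

In the toy message "e" makes up half the plaintext, yet none of its three replacement symbols is the most frequent cipher symbol, which is exactly what defeats a simple frequency count.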
An automatic computer attack that assumed a homophonic cipher failed to decrypt the text, but it did indicate numerically that German might be the underlying language. Given that the text was found in Germany and that it ends with the plaintext "Philipp 1866", using the German spelling with two ps, they focused on German as the most likely language.
With that assumption, they drew up a frequency table of letter pairs. In German, the most common pair - or digraph - is CH. They substituted C and H for the two symbols of the most frequently appearing pair in the ciphertext. Then they looked at trigraphs that start with CH and end with T, the letter that most commonly follows CH. They substituted T for the ciphertext character that most commonly followed the two symbols they had replaced with C and H.
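That first step can be sketched as follows. The function name seed_key_from_ch and the toy ciphertext are assumptions for illustration; with a homophonic cipher this would recover only one of the several symbols standing for each letter, so it is a starting point rather than a full key.

```python
from collections import Counter

def seed_key_from_ch(cipher):
    """Guess symbols for C, H and T from the most frequent pair and its follower."""
    bigrams = Counter(zip(cipher, cipher[1:]))
    (first, second), _ = bigrams.most_common(1)[0]  # assumed to stand for C, H
    followers = Counter(
        cipher[i + 2]
        for i in range(len(cipher) - 2)
        if cipher[i] == first and cipher[i + 1] == second
    )
    third, _ = followers.most_common(1)[0]          # assumed to stand for T
    return {first: "C", second: "H", third: "T"}

# Hypothetical ciphertext in which "#%" is the most common pair.
cipher = "#%&a#%&b#%z"
key = seed_key_from_ch(cipher)

# Show what is known so far; unassigned symbols appear as "?".
partial = "".join(key.get(ch, "?") for ch in cipher)
print(partial)
```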
Proceeding like this, they came up with some plaintext that included the German words "ceremonie" and "der" separated by a character that had yet to be assigned a substitution value. The separating character was a Roman letter. From this and other instances, they concluded that the Roman letters weren't there to carry the message; they were null characters that separated words and also made the text harder to decipher.
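Treating the Roman letters as word breaks is easy to illustrate. In this sketch the digits stand in for the book's non-Roman symbols and the key is entirely hypothetical; only the words "ceremonie" and "der" come from the account above.

```python
# Assumption for the sketch: plain Roman letters are nulls that mark word breaks.
NULLS = set("abcdefghijklmnopqrstuvwxyz")

# Hypothetical partial key from stand-in cipher symbols to plaintext letters.
KEY = {"1": "d", "2": "e", "3": "r", "4": "c", "5": "m", "6": "o", "7": "n", "8": "i"}

def decode(cipher, key, nulls):
    """Turn nulls into spaces, known symbols into letters, unknown ones into "?"."""
    text = "".join(" " if s in nulls else key.get(s, "?") for s in cipher)
    return " ".join(text.split())  # collapse runs of spaces

print(decode("423256782k123", KEY, NULLS))
```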
They ran the text as they had deciphered it so far through a German-to-English translator at www.freetranslation.com to see how much of the text represented actual German words. They found many words that weren't directly translatable, but that came close to being actual German words.
For example, they found "abschnitl", which is not a German word, but realized that "abschnitt" is. They concluded that the final "l", which was represented by a colon (:), was actually meant to double the preceding character.
Similarly, they found several non-word groupings that included a character that looks like a cross, one of them being gesell+aft. Replacing the + with three letters - sch - yielded the German word gesellschaft. Replacing + with sch in other instances also yielded German words.
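Both special symbols can be handled in one pass over partly deciphered text. This is a sketch under the assumption that the colon and the cross are still present as raw symbols in otherwise deciphered words; the function name expand and the input spelling gesell+aft are illustrative.

```python
def expand(symbols):
    """Apply the two special rules: ":" doubles the previous letter, "+" becomes "sch"."""
    out = []
    for s in symbols:
        if s == ":" and out:
            out.append(out[-1][-1])  # repeat the last plaintext letter
        elif s == "+":
            out.append("sch")        # one symbol standing for three letters
        else:
            out.append(s)
    return "".join(out)

print(expand("abschnit:"))
print(expand("gesell+aft"))
```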
They finished off by correcting their translation to fit proper German usage, which fleshed out their table of substitutions.
Still, the functions of eight characters remained undiscovered. All of them were written larger than the other characters, and the researchers concluded they represent logograms standing for the names of individuals and organizations.
The text includes initiation rites for new members as well as frequent references to eye surgery. Below is the start of the text as it has been translated so far, with 2, @, #, 9 and * representing the still undeciphered logograms:
of the 2 e @
Secret teachings for apprentices.
If the safety of the # is guaranteed, and the # is
opened by the chief 9, by putting on his hat, the
candidate is fetched from another room by the
younger doorman and by the hand is led in and to the
table of the chief 9, who asks him:
First, if he desires to become 2.
Secondly, if he submits to the rules of the @ and
without rebelliousness suffer through the time of
Thirdly, be silent about the * of the @ and
furthermore be willing to offer himself to volunteer
in the most committed way.
The candidate answers yes.