How to Decipher

EXTRACTION

He’s using a polymorphic engine to mutate the code. Whenever I try to gain access, it changes. It’s like solving a Rubik’s cube that’s fighting back.

—Q [the nerd], in Skyfall (2012)

The Bond villain in Skyfall is a cyberterrorist (an unrecognizable Javier Bardem) who uses computers to destroy the British Secret Intelligence Service, find his way to M (Judi Dench), and take her out. (Spoiler: it doesn’t end well.) Some of the technology described in the film truly exists, such as the polymorphic engine in the quote above, which transforms a program into a subsequent version that still operates with the same algorithmic functionality: for example, 3 + 1 and 6 – 2 give us the same result, but they get there by different routes. So far, so good. Problem is, in the film, through the usual “Bondian” sorcery, they use polymorphic code to somehow build a map of the entire London Underground.

As interesting as they are, these things have little to do with deciphering ancient writing systems. Cryptography (in the sense of “encryption”) is concerned with encoding messages by converting clear information into an unintelligible form. The code is intentionally encrypted, to mask communication. Its purpose is to keep a secret. Ancient scripts, on the other hand, have no such aim to conceal (with the exception, perhaps, of the Voynich Manuscript): if we still can’t figure out how to read them, there’s no one to blame but the randomness and whim of the past, the gaps in history.

The objective, however, is nearly the same in both cases. And it’s called “extraction.” Extraction does two things: it decodes the message and it establishes the plausibility of interpreting it. Extraction is a means of both arriving at the code and checking its validity. The decipherer, therefore, must reconstruct the relationship between sign and sound, then verify these correspondences. And then, if they’re lucky, identify the language.

But extraction can’t do the job on its own. Every case must be calibrated using another factor, which helps us reconstruct the script’s context: this factor is known as “situation,” and it comprises the participants (those who invent, use, and read the script in question), the relationships between them, their environment, the how, where, and why a script is used. The stage, in other words, upon which a script performs.

Deciphering the “situation” is no easy task. We must enter the brains of the ancients, with all their conventions and decisions. We must pretend to be actors on the same historical stage, able to follow the same cues, choreography, movements, and intentions of the original actors. We must train ourselves to understand, to mimic, and to bring their thoughts back to life. Extremely difficult, but not impossible.

But before we make it to the dress rehearsal, that moment when extraction and situation come together, there’s a bit of road to travel.

FIVE EASY PIECES

The five easy pieces in the film Five Easy Pieces, starring (an irresistible) Jack Nicholson, are anything but easy: a fantasy and a prelude by Chopin, a suite by Bach, and a concerto and a fantasy by Mozart. The protagonist, Robert Eroica Dupea, is an ex–piano prodigy who’s abandoned his aspirations and set off on another, sadder path. The film is not about music; it’s about how hard it is to accept your own life. So much simpler to give in to frustration, resignation, and rage, to throw in the towel. Our five pieces of decipherment are even easier than the piano pieces in the film, but they teach us a similar lesson.

Decipherment, too, can be divided into five pieces, each representing a step in a well-defined analytical chain, kind of like IKEA furniture. If a step is missing in the “assembly” guide, or if the pieces don’t fit together, no decipherment is possible. We’ll use the undeciphered Script X as our fake specimen, a lab rat to test our analysis, which is technical and complex. Though not so different, in the end, from putting together a cheap Swedish bookshelf.

STEP 1. INVENTORY OF SIGNS. First we examine the inscriptions in Script X, then we gather all the signs and build an inventory, a repertoire. Let’s call it an alphabet, though it could just as well be a syllabary (probable), with a series—perhaps extensive, perhaps not—of logograms (highly probable). Once we’ve built our repertoire, we’ll know from the number alone whether it’s one or the other, or yet another still. If there are more than fifty signs, Script X is without doubt a syllabary. The syllabary with the fewest signs of all is the Canadian Aboriginal script Cree (45), followed by the classical Cypriot syllabary (55). Once the number of signs gets up into the hundreds, we know we’re dealing with a complex syllabary, likely accompanied by a barbaric throng of logograms.
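The counting rule of step 1 is simple enough to sketch in a few lines of Python. This is a minimal illustration, not a research tool: it assumes the inscriptions have already been transcribed into hypothetical sign IDs (X01, X02, and so on) grouped into words, and the cutoff of 100 standing in for “into the hundreds” is our own assumption. Below fifty signs the count alone cannot settle the question, since Cree, a syllabary, has only 45.

```python
from collections import Counter

def build_inventory(inscriptions):
    """Count every sign occurrence across a corpus.

    Each inscription is a list of words; each word is a tuple of sign IDs.
    """
    counts = Counter()
    for inscription in inscriptions:
        for word in inscription:
            counts.update(word)
    return counts

def classify_by_size(n_signs, hundreds_cutoff=100):
    """Rule of thumb from the text; the 'hundreds' cutoff is an assumption."""
    if n_signs >= hundreds_cutoff:
        return "complex syllabary, likely with logograms"
    if n_signs > 50:
        return "syllabary"
    # Fewer signs: could be an alphabet, or a small syllabary such as Cree.
    return "ambiguous: alphabet or small syllabary"

# Hypothetical mini-corpus of Script X (sign IDs are invented placeholders).
corpus = [
    [("X01", "X02"), ("X03", "X01", "X04")],
    [("X02", "X05")],
]
inventory = build_inventory(corpus)
print(len(inventory), classify_by_size(len(inventory)))
# prints: 5 ambiguous: alphabet or small syllabary
```

A real corpus would of course give a far larger count; the point is only that the number of distinct signs, not their frequencies, drives the first classification.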

I know what you’re thinking: this first step is way too easy. A C-major scale on the piano, something any four-year-old could master in a flash. Unfortunately, that’s not the case. Off the top of my head I can think of three undeciphered scripts that are still stuck on this first step: Easter Island’s Rongorongo, Cretan Hieroglyphic, and Cypro-Minoan. The nature of the problem varies in each case. In Rongorongo, many signs are extremely similar: Do they indicate different sounds or are they merely graphic variations? It’s the age-old problem of allographs, signs that vary only minimally in how they’re written. If I’m writing the letter R, I can alter its shape a little and it still records the same sound, /r/. Our eyes are well trained, but if you fall out of the habit it can become difficult to spot the difference. With Cretan hieroglyphs, there’s another problem: icons. At what point does an icon cease to be “art” and become “sign”? “Almost immediately” is the short answer, though there’s no consensus.
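Allographs are why the count in step 1 is less trivial than it looks: the same corpus can yield very different inventory sizes depending on which variants we decide to merge. The sketch below makes that dependence explicit. The glyph IDs and the allograph table are hypothetical; deciding what goes into such a table is precisely the judgment described above, and no code can make it for us.

```python
def inventory(words, allographs=None):
    """Distinct signs in a list of words, optionally merging allographs.

    `allographs` maps a variant glyph ID to the canonical sign it is judged
    to represent; glyphs missing from the table are kept as they are.
    """
    allographs = allographs or {}
    return {allographs.get(glyph, glyph) for word in words for glyph in word}

# Hypothetical transcription in which R and K each turn up in variant shapes.
words = [("R", "A"), ("R_var1", "K"), ("R_var2", "K_var1", "A")]
ALLOGRAPHS = {"R_var1": "R", "R_var2": "R", "K_var1": "K"}  # invented table

print(len(inventory(words)))              # 6 signs if every shape counts
print(len(inventory(words, ALLOGRAPHS)))  # 3 signs once variants are merged
```

Halving an inventory this way can move a script from one category of step 1 to another, which is why the allograph question has to be settled before anything else.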

STEP 2. POSITIONAL FREQUENCY OF SIGNS. The second step seems difficult, though it might be easier than the first. Once we’ve established the inventory, we must determine how the signs are distributed within the sequences (“words”). To do so, we must first determine whether the words are separated from one another. For example, in the Old Persian inscriptions of Persepolis that led to the decipherment of cuneiform, the words were clearly divided by a single wedge. Same for Cypro-Minoan and Linear A, where a vertical line was used to indicate separation. But this isn’t always the case: archaic Greek inscriptions, along with those in Latin, use a continuous script.

There’s also one important mini-step when dealing with positional frequency: in an open syllabic system (the most frequent typology, with a consonant + vowel pattern, abbreviated as CV), if a sign appears almost exclusively in the initial position, it is very likely to be a pure vowel. When we break the word A-VE-NUE into syllables, the A always stands alone at the front. If it’s a CV pattern, there’s no other way to write it.
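Step 2 and its mini-step can be sketched as a positional count. The example assumes a transcription in which “|” marks the word divider and “-” separates signs, and it uses an invented sample in the spirit of A-VE-NUE. The thresholds for calling a sign a likely vowel (at least 80 percent of its occurrences word-initial, at least three occurrences) are arbitrary assumptions; on a real corpus they would need tuning, and a human eye.

```python
from collections import Counter, defaultdict

def split_words(inscription, divider="|"):
    """Split a divided inscription into words of signs (assumed conventions)."""
    return [tuple(w.split("-")) for w in inscription.split(divider) if w]

def positional_counts(words):
    """For each sign, count occurrences in initial, medial and final position."""
    counts = defaultdict(Counter)
    for word in words:
        for i, sign in enumerate(word):
            if i == 0:
                position = "initial"  # one-sign words count as initial
            elif i == len(word) - 1:
                position = "final"
            else:
                position = "medial"
            counts[sign][position] += 1
    return counts

def vowel_candidates(counts, min_share=0.8, min_total=3):
    """Signs that appear mostly word-initially: in a CV syllabary, likely vowels."""
    candidates = []
    for sign, c in counts.items():
        total = sum(c.values())
        if total >= min_total and c["initial"] / total >= min_share:
            candidates.append(sign)
    return sorted(candidates)

sample = "a-ve-nu|a-mo-re|a-ta-ri|ka-ta-ri|mo-re"  # invented sample
words = split_words(sample)
counts = positional_counts(words)
print(vowel_candidates(counts))  # ['a']
```

Note that ka is also word-initial here but is ignored because it occurs only once: frequency and position have to be read together.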

STEP 3. GRAMMATICAL PATTERNS. The third step is the one so masterfully put into practice by Alice Kober in her study of Linear B. Alice analyzed the words and broke them down. She sought out the root of each word. She studied how their suffixes or endings behaved. She looked for repetitions, testing for consistent patterns. She disassembled and reassembled, mapping out the language’s internal structure. She performed a surgical dissection.
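A crude mechanical analogue of this dissection is to group words that share an opening run of signs and list the endings that follow. This is only a sketch: it assumes a fixed stem length chosen by hand, whereas Kober worked out stems and endings case by case. The example uses the Linear B forms ko-no-so, ko-no-si-ja, and ko-no-si-jo, plus a-mi-ni-so as a word with no partner.

```python
from collections import defaultdict

def group_by_stem(words, stem_len):
    """Group words sharing their first `stem_len` signs; list their endings.

    Only stems attested with at least two different endings are returned,
    since a single form says nothing about inflection. The fixed stem length
    is a simplifying assumption.
    """
    groups = defaultdict(set)
    for word in words:
        signs = tuple(word.split("-"))
        if len(signs) > stem_len:
            groups[signs[:stem_len]].add(signs[stem_len:])
    return {stem: sorted(ends) for stem, ends in groups.items() if len(ends) > 1}

words = ["ko-no-so", "ko-no-si-ja", "ko-no-si-jo", "a-mi-ni-so"]
for stem, endings in group_by_stem(words, 2).items():
    print("-".join(stem), ["-".join(e) for e in endings])
# prints: ko-no ['si-ja', 'si-jo', 'so']
```

The alternation between so and si-ja / si-jo is the kind of regular variation in word endings that Kober tabulated by hand.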

Our INSCRIBE group is busy applying a similar analysis to Cypro-Minoan. And I’ll say it with full transparency: we’re ravenously copying Alice’s logical approach, step by step. We’re having less luck, since our repetition patterns are less frequent and less clear. We have fewer data (in this case, quantity would most certainly up our quality), but despite this lack we’ve managed to pinpoint a substantial number of proper names in the inscriptions. In other words, we know the Cypriots Tom, Dick, and Harry, and we have an idea about what they did with their lives.

STEP 4. TYPOLOGICAL CONCATENATIONS (“NETWORK ANALYSIS”). The fourth step revolves entirely around the archaeological context. If several inscriptions in Script X are found in a certain context (for example, a sanctuary, meaning they’re of a religious or votive nature) and record sequences that are also found in other contexts, or are found on objects of a different typology (for example, objects related to administration), we can trace a logical connection between the two and, if we’re lucky, determine the nature of the texts. Are there names of people? Of places? Are there repeated logograms? Are there numerals used to specify quantities?

In this step, archaeology is wedded to epigraphy and the study of inscriptions. This allows us to view these texts in the macro-context of their usage, to understand what purpose they might have once served. It’s like playing arpeggios on the violin. Harmonies arise naturally from the concatenation of sounds.
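The network idea of step 4 can be sketched as a small graph: inscriptions are nodes, and two inscriptions are linked when they share a sign sequence. Links that cross find contexts are the interesting ones. The records below (IDs, contexts, object types, sequences) are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

def shared_sequence_links(records):
    """Link every pair of inscriptions that share at least one sign sequence.

    `records` maps an inscription ID to (find context, object type, sequences).
    Returns {(id_a, id_b): set of shared sequences}.
    """
    holders = defaultdict(set)
    for rec_id, (_context, _object_type, sequences) in records.items():
        for seq in sequences:
            holders[seq].add(rec_id)
    links = defaultdict(set)
    for seq, ids in holders.items():
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].add(seq)
    return dict(links)

def cross_context(records, links):
    """Keep only links between inscriptions found in different contexts."""
    return {pair: seqs for pair, seqs in links.items()
            if records[pair[0]][0] != records[pair[1]][0]}

# Invented records for Script X.
records = {
    "T1": ("sanctuary", "votive vessel", {"X1-X2-X3", "X7-X8"}),
    "T2": ("storeroom", "tablet", {"X1-X2-X3", "X4-X5"}),
    "T3": ("storeroom", "tablet", {"X4-X5"}),
}
links = shared_sequence_links(records)
print(cross_context(records, links))  # {('T1', 'T2'): {'X1-X2-X3'}}
```

A sequence shared by a votive vessel and an administrative tablet is where the questions of step 4 begin: a personal name, a place name, something else?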

STEP 5. COMMON FACTORS WITH OTHER RELATED SCRIPTS. Our final step, speaking broadly at least. This one can’t always be put to good use, since, as we’ve seen, not all scripts fall into a tight-knit group; a few stray dogs wander from the pack, and finding a place for them among the ranks is nearly impossible, unless you force it. If a script is isolated, all we can do is confirm that fact, and study it as such.

But the others, those that travel as a family, all decked out with similar signs, can be studied as a pack. First of all, we can determine whether there are derivations, adaptations, differences, and similarities in how the signs appear. For example, our Script X could be derived from Script Y, but with a number of additional signs. Which, on a purely theoretical level, might mean that X represents a different language from Y. Though that would remain to be proved.
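A comparison like this starts with plain set arithmetic on two inventories. A minimal sketch, assuming each sign has already been matched across the two scripts on paleographic grounds, so that the same (hypothetical) ID means “the same sign”:

```python
def compare_inventories(x_signs, y_signs):
    """Compare two sign inventories given as sets of matched sign IDs."""
    shared = x_signs & y_signs
    return {
        "shared": shared,
        "only_x": x_signs - y_signs,   # candidate additional signs in X
        "only_y": y_signs - x_signs,   # signs of Y with no match in X
        "share_of_x": len(shared) / len(x_signs),
    }

# Hypothetical inventories: X is suspected to derive from Y.
script_x = {"s1", "s2", "s3", "s4", "s5"}
script_y = {"s1", "s2", "s3", "s6"}
result = compare_inventories(script_x, script_y)
print(sorted(result["only_x"]), result["share_of_x"])  # ['s4', 's5'] 0.6
```

The extra signs in X are only a hint that it may record a different language; the arithmetic cannot prove it.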

Linear B derives from Linear A, and they have nearly 75 percent of their signs in common. Applying phonetic values from Linear B to the identical signs in Linear A proves only one thing to us clearly (and I mean strikingly clearly, no matter what anyone else tells you): the language behind Linear A is not Ancient Greek. To be able to read 75 percent of Linear A and still not figure out which language it records is a fate worse than Tantalus’s. At least we can come away with some kind of reading, however approximate, however incomplete.
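Borrowing values from a deciphered relative, as with Linear B values applied to Linear A, amounts to a lookup that leaves unmatched signs as gaps. The sketch uses hypothetical sign IDs and a made-up value table; its point is that a high readable share says nothing by itself about which language is being written.

```python
# Hypothetical value table borrowed from a deciphered relative.
# Sign IDs and values are placeholders, not real catalogue entries.
KNOWN_VALUES = {"S01": "da", "S02": "ro", "S03": "ka", "S04": "na"}

def transliterate(word, values=KNOWN_VALUES, unknown="?"):
    """Read a word with borrowed values; unmatched signs become '?'."""
    return "-".join(values.get(sign, unknown) for sign in word)

def readable_share(words, values=KNOWN_VALUES):
    """Fraction of sign tokens that receive a borrowed value."""
    tokens = [sign for word in words for sign in word]
    return sum(sign in values for sign in tokens) / len(tokens)

words = [("S01", "S99", "S04"), ("S03", "S02")]
print([transliterate(w) for w in words])  # ['da-?-na', 'ka-ro']
print(readable_share(words))              # 0.8
```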

With Cypro-Minoan, thanks to one of INSCRIBE’s researchers, we’re now able to trace its lineage directly back to Linear A, sign by sign (something that had not been established before). Today we can say with a certain level of confidence that Cypro-Minoan and Linear B are stepbrothers. Our reconstruction of this lineage has even helped us to read Cypro-Minoan, to an extent. It’s helped us move closer, in other words, to the final step, the height of decipherment: assigning sounds to signs.

AND NOW FOR THE SIXTH

I warned you, our five pieces are by no means easy. And you haven’t even seen the sixth, the most difficult—not so much to understand, but to execute successfully. Which, after all, makes it more my problem than yours. In piano player terms, this sixth step is Beethoven’s Op. 101, Chopin’s Grande polonaise brillante, Liszt’s Transcendental Études. You get the gist. And this nod to music isn’t just some pretentious comparison I’ve trotted out to make myself look more intellectual. The sixth step is indeed music, since it allows us to hear the language behind Script X. For our sixth piece, we attempt to apply phonetic values.

I mentioned Michael Ventris a few pages back. Alice Kober had made it through the first five steps, but she never conquered the last. Unfortunately, she stopped right at the most glorious moment, without ever seeing the fruit of her splendid labor. Though who knows what suspicions or inklings she may have had, tucked carefully away beneath her objective lab coat. Perhaps her intuition was prodding her all along:* “Come on, Alice, it has to be Greek.” We’ll never know.

What we do know is that Ventris picked up Alice’s baton and made a dash for the finish line. With Kober’s syllable grids, he could see which signs shared the same consonants and which the same vowels. I don’t mean to imply that from there the decipherment was a walk in the park, quite the opposite: Ventris constructed a series of experimental grids, with hypotheses and tests for potential phonetic values. And I should remind you that up to this point Ventris was still convinced that the language behind Linear B was Etruscan—meaning that he (even he!) had broken the third commandment.
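The logic of such a grid can be sketched with a union-find structure: evidence that two signs share a consonant puts them in the same row, and evidence that they share a vowel puts them in the same column. This is a simplification under stated assumptions: the pairs are taken as given (in practice they came out of Kober’s triplets and Ventris’s own analysis), the sign IDs S10 to S13 are hypothetical, and nothing here assigns actual sounds, which was Ventris’s experimental step.

```python
class DisjointSet:
    """Minimal union-find used to collect signs into classes."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def groups(self):
        classes = {}
        for x in list(self.parent):
            classes.setdefault(self.find(x), set()).add(x)
        # Sort so that row and column numbers are reproducible.
        return sorted(classes.values(), key=sorted)

def build_grid(same_consonant, same_vowel):
    """Place each sign at (row, column): rows = shared consonant,
    columns = shared vowel. Pairs are hypothetical evidence."""
    rows, cols = DisjointSet(), DisjointSet()
    for a, b in same_consonant:
        rows.union(a, b)
    for a, b in same_vowel:
        cols.union(a, b)
    row_of = {s: i for i, group in enumerate(rows.groups()) for s in group}
    col_of = {s: j for j, group in enumerate(cols.groups()) for s in group}
    return {s: (row_of.get(s), col_of.get(s))
            for s in sorted(row_of.keys() | col_of.keys())}

grid = build_grid(
    same_consonant=[("S10", "S11"), ("S12", "S13")],
    same_vowel=[("S11", "S12"), ("S10", "S13")],
)
print(grid)  # {'S10': (0, 0), 'S11': (0, 1), 'S12': (1, 1), 'S13': (1, 0)}
```

Once a single cell receives a tentative value, the grid propagates it: if S10 were so, S11 would share its consonant and S13 its vowel.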

But the scientific method never lies. Organizing the vowels wasn’t hard: we’ve already seen that, in an open syllabic system (which Linear B was already known to be), a vowel in the initial position stands alone. Finding a spot for the a was relatively easy: it’s the most frequent. Ventris gave it a shot. And fortune kissed him smack on the forehead, since his CV hypotheses included the syllables na, ni, so, etc. (all still in a hypothetical phase). Ventris studied the repetitions on the tablets from the palace of Knossos and found a few of these syllables grouped together: a sequence with a-?-ni-?. Amnisos was the port of Knossos, attested under that name in classical Greek texts as well. Perhaps the sequence was a-mi-ni-so? And this sequence was often found together with the word ?-no-so.
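The move from a-?-ni-? to a-mi-ni-so is, in effect, pattern matching against a list of plausible names. A minimal sketch, assuming a hand-made candidate list of Cretan place names in CV spelling and “?” standing for exactly one unread sign:

```python
# Hand-made candidate list (illustrative): Amnisos, Knossos, Phaistos,
# Tylissos in CV syllabic spelling.
CANDIDATES = ["a-mi-ni-so", "ko-no-so", "pa-i-to", "tu-ri-so"]

def matches(partial, candidates=CANDIDATES):
    """Candidates consistent with a partial reading such as 'a-?-ni-?'.

    '?' stands for exactly one unread sign; every other sign must agree.
    """
    want = partial.split("-")
    hits = []
    for candidate in candidates:
        have = candidate.split("-")
        if len(have) == len(want) and all(w in ("?", h) for w, h in zip(want, have)):
            hits.append(candidate)
    return hits

print(matches("a-?-ni-?"))  # ['a-mi-ni-so']
print(matches("?-no-so"))   # ['ko-no-so']
print(matches("?-?-so"))    # ['ko-no-so', 'tu-ri-so']
```

The last call shows why one match is never proof: with too many gaps, several names fit, and only further cross-checks narrow the field.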

Wait up. Moment of suspense.

Do you see it now, too?

Ko-no-so. Knossos. Bingo.

I like to imagine the expression on Ventris’s face when he recognized the names of two Cretan cities, and then realized that ko-no-so also appeared as a feminine adjective, ko-no-si-ja, and a masculine adjective, ko-no-si-jo. The language was inflected—very Indo-European. The language was Greek. Five hundred years before the Greeks’ arrival, but Greek nonetheless.

Ventris was able to extract the sounds of Greek almost without need of confirmation from other languages or scripts.* He had an ample number of texts (nearly three thousand) and, more important, he had Alice’s preparatory work, with her innumerable inflection diagrams. Kober’s hat-trick “triplets,” and her elegant assist, set Ventris up to score an epic goal (there’s never a bad time for a soccer metaphor).

Though we’re not in such a bad position ourselves, with Cypro-Minoan. To prop ourselves up, we’re leaning on one particular category that’s (relatively) simple to identify: proper nouns. Just like Michael Ventris. Just like Thomas Young, who identified the names in the Rosetta Stone’s cartouches even before Champollion.* Just like Georg Grotefend with the Old Persian cuneiform script.* And so far we’ve managed to identify a good many. I’ve stretched our story out with this detour into the sensational decipherment of Linear B, but only to show just how indispensable statistical analysis and the constant testing of new hypotheses can be when you don’t have the aid of related scripts. We have Linear A as a parallel, and can also turn to the later example of the classical Cypriot syllabary. Cypro-Minoan is stuck in the middle, sandwiched between these two reasonably legible scripts. Our goal is to get a better sense of what the whole burger tastes like.

We’re still in the kitchen, in the prep phase. But we have all the ingredients. In fact, it’s only in recent years that we’ve made such giant leaps in our understanding of Cypro-Minoan. We’re now equipped with a definitive inventory of signs, two thirds of which we can already read, and an outline of the script’s internal structure (its grammar, that is). We still haven’t figured out which language is behind it, and so we’re not yet able to connect the dots. But our hope of living the “decipherer’s dream,” of taking a ride on that blessed bicycle, is no longer so far-fetched.

EX MACHINA

BEN KINGSLEY: The world isn’t run by weapons anymore, or energy, or money. It’s run by little ones and zeroes, little bits of data. It’s all just electrons.

ROBERT REDFORD: I don’t care. [He walks away]

Sneakers (1992)

The world is run on codes and cyphers, John. From the million-pound security system at the bank, to the PIN machine you took exception to. Cryptography inhabits our every waking moment. But it’s all computer generated, electronic codes, electronic cyphering methods. This is different. This is an ancient device. Modern code-breaking methods won’t unravel it.

—Benedict Cumberbatch, in Sherlock, “The Blind Banker” (2010)

In the central courtyard at the CIA’s headquarters in Langley, there’s an S-shaped sculpture that’s much more than a work of art. On its surface, an artist (in collaboration with a cryptography expert) engraved four encrypted messages. Three of these have been deciphered—the fourth remains a mystery. The sculpture is called Kryptos, and the challenge of deciphering its messages has drawn a wide range of curious decoders, including the National Security Agency and legions of computer whizzes. And yet, while many decipherers have indeed relied on their computers, others have cracked the codes with plain old pen and paper.

Does that make the latter better than the former? Do they get brownie points for having done it on paper? The answer is no. Don’t be deceived by the storied image of the brilliant decipherer, the Renaissance man and expert linguist who does everything by hand. These days, the study of writing, and of decipherment, is vastly different.

It has become a cooperative field, with no more room for prophets. The mantra today is synergy. Not only of group action but of thought: epigraphists, archaeologists, anthropologists, geomatics engineers, historians, cognitivists, semiologists, and computer scientists. And linguists. Perhaps linguists above all. But that hardly matters. What matters is their united effort.

It may well be that my characterization here is more of a manifesto than an objective description of what the field looks like today. Academics, especially in the humanities, are often trapped in their own intellectual bubbles. But a field like the study of writing lends itself perfectly to a more global approach, to open-minded scholarship that sees beyond disciplinary borders.

For anyone working in the field of decipherment, it’s true, there’s no replacing the “traditional” method, blending paleography, archaeology, and linguistics. However, it would be very shortsighted of us to write off other potentially useful approaches a priori. I present to you the bugbear of all traditional philologists. I present to you deep learning.

We’ve already looked at how machine-based methods were used in the study of the Indus Valley Script (IVS) and the Voynich Manuscript, to uncertain effect. The culprit here, I’d guess, was a lazy reliance on computers and computers only, with no regard for our mantra above, synergy: as if technology could close the case on its own. Without the eye of the humanist, you don’t stand a chance at decipherment.

Putting the pieces together, completing the puzzle of an undeciphered script, demands a combination of two powerful forces: the force of logic, and the force of creativity and flexibility.

And when it comes to this analytical plasticity, computers are far outmatched by man. They do, however, offer some advantages. With INSCRIBE we’ve set ourselves the goal of reconstructing the entire development of these writing systems over time, from Cretan Hieroglyphic to Linear A, Linear B, Cypro-Minoan, and the classical Cypriot syllabary. These scripts resemble one another, as we’ve seen, but there are also differences, both on a paleographic level (that is, in how the signs are written) and on a linguistic level. Two of these systems, Linear B and classical Cypriot, have been deciphered. They are clear and legible, and they record a Greek dialect.

Our reconstruction is multifaceted. It’s archaeological, meaning it examines the context, reconstructs how the script was used, explaining it on a macro level. But it’s also paleographic, concerned with the shape of the signs, their development, their differences. And it’s also linguistic, seeking to understand which sounds, if any, are being recorded, applying the methods of decipherment to scripts that remain mysterious. And it’s also anthropological: we want to know why these scripts came about in the first place. No one before has attempted to view the Aegean scripts from such a broad perspective. So specialized are we researchers, so fixed on the details, and so complex are the scripts that we’re working on, that adopting such an all-encompassing view is a challenge. Which is precisely why we need a diverse team, driven by synergy, all working together to assemble the pieces.

I began to talk about computers and then I abruptly pushed them aside. I did this because it’s important to understand that the traditional method comes before all else. With deep-learning strategies, however, we are able to do something that was unthinkable until a few years ago: we can now check the work we choose to do by hand.

In the last five to ten years, deep-learning algorithms have proved extremely effective at detecting similar patterns in different entities or realms. For example, a computer can be programmed to recognize the category “dog” from different dog breeds. It can learn facial recognition and optical character recognition. It can verify signatures. The possibilities are endless, but the crucial function here is disambiguation, understanding if and when like goes with like.

When it comes to the Aegean scripts, deep learning can help us in two fundamental ways:

  • It can act as co-pilot in our complete reconstruction of the Aegean family of scripts.
  • It can verify whether we’re properly grouping like with like via the traditional method. Remember that age-old problem of allographs, when reconstructing an inventory of signs (step 1 in our decipherment method)? Deep learning helps keep things straight.
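The verification role in the second point can be illustrated without any neural network at all, provided we assume the network has already done its job: turned each glyph image into a feature vector (an embedding). The sketch below compares such vectors by cosine similarity and flags pairs where the model and the human grouping disagree. The vectors, glyph names, labels, and the 0.9 threshold are all invented for illustration; real embeddings would come from a trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def flag_disagreements(embeddings, labels, threshold=0.9):
    """Return glyph pairs where model similarity and human labels disagree.

    'merge?' = the model finds two glyphs alike but they were filed as
    different signs; 'split?' = filed as one sign but the model finds them unlike.
    """
    flags = []
    glyphs = sorted(embeddings)
    for i, a in enumerate(glyphs):
        for b in glyphs[i + 1:]:
            similar = cosine(embeddings[a], embeddings[b]) >= threshold
            same_sign = labels[a] == labels[b]
            if similar != same_sign:
                flags.append((a, b, "split?" if same_sign else "merge?"))
    return flags

# Hypothetical embeddings (a trained network would produce these).
embeddings = {
    "g1": [1.0, 0.0, 0.0],
    "g2": [0.98, 0.05, 0.0],
    "g3": [0.0, 1.0, 0.0],
    "g4": [0.05, 0.99, 0.0],
    "g5": [0.0, 0.0, 1.0],
}
labels = {"g1": "A", "g2": "A", "g3": "B", "g4": "C", "g5": "A"}  # human grouping
for flag in flag_disagreements(embeddings, labels):
    print(flag)
# ('g1', 'g5', 'split?')
# ('g2', 'g5', 'split?')
# ('g3', 'g4', 'merge?')
```

The flags are questions for the epigraphist, not verdicts: the model only points at the pairs worth a second look.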

Very well. Now I’ll add a third aim, as long as we’re being ambitious. We want to find out if there are any patterns of morphological variation. A mouthful, I know, but what I mean is we need to re-create the internal grammatical structure of all the Aegean scripts, which will in turn help us to understand their linguistic affinities.

There’s one little problem, of course: the dimensions, the big data, as you already know by now. Our data is small. The Aegean scripts appear in very few texts. In all, we have around ten thousand signs—and when I say “in all,” I mean every single sign in every single inscription. Still, the neural-network experts in INSCRIBE have seen worse. We’re working on it. We’ll get there.

Our eye, though human and fallible, will never be replaced by computers. The computer is no deus ex machina, but it can be a co-pilot: in the cockpit there’s a pilot, and there’s a co-pilot. And then there’s the crew. We already passed along the Jetway, a few pages back. We buckled our seat belts. Now it’s time to take off.
