Using AI to uncover the mystery of an ancient manuscript

Modern scientific methods help decipher language and meaning of ancient manuscript.

Computing scientists at the University of Alberta are using artificial intelligence to decipher an ancient manuscript.

The mysterious text in the 15th-century Voynich manuscript has puzzled historians and cryptographers since the book dealer Wilfrid Voynich acquired it in 1912. Recently, U of A computing science professor Greg Kondrak, an expert in natural language processing, and graduate student Bradley Hauer used artificial intelligence to tackle the ambiguities of human language, taking the Voynich manuscript as a case study.

Their first step was to identify the language of the text, which is enciphered on hundreds of delicate vellum pages accompanied by illustrations.

Kondrak and Hauer used the "Universal Declaration of Human Rights" in 400 different languages as reference samples to identify the language systematically. They initially hypothesized that the Voynich manuscript was written in Arabic, but after they ran their algorithms, the most likely language turned out to be Hebrew.
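Identifying the language of a text written in an unknown script is harder than ordinary language identification, because the reader does not know which symbol stands for which letter. One well-known trick, shown here only as an illustration and not necessarily the method Kondrak and Hauer used, is to compare *sorted* letter-frequency profiles: sorting the frequencies throws away the identity of each symbol, so the profile is unchanged by any one-to-one letter substitution. The function names and the `samples` dictionary (language name mapped to a reference text, such as a translation of the Universal Declaration of Human Rights) are hypothetical.

```python
import math
from collections import Counter


def frequency_profile(text, size=26):
    """Relative letter frequencies, sorted from most to least common.

    Because the list is sorted by frequency rather than keyed by letter,
    it is identical for a text and any simple substitution of that text.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = sorted(Counter(letters).values(), reverse=True)
    total = sum(counts) or 1
    profile = [c / total for c in counts]
    # Pad or truncate to a fixed length so profiles of different
    # alphabets can be compared position by position.
    return (profile + [0.0] * size)[:size]


def profile_distance(p, q):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_languages(ciphertext, samples):
    """Return candidate language names, closest profile first.

    `samples` is a hypothetical dict mapping language name -> sample text.
    """
    target = frequency_profile(ciphertext)
    scores = {
        lang: profile_distance(target, frequency_profile(text))
        for lang, text in samples.items()
    }
    return sorted(scores, key=scores.get)
```

With real data, `samples` would hold one reference text per candidate language, and the first entry of `rank_languages(voynich_text, samples)` would be the best guess. Frequency profiles alone are a weak signal on short texts, so in practice such a score would be one piece of evidence among several.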

"That was surprising," said Kondrak. "And just saying 'this is Hebrew' is the first step. The next step is how do we decipher it."

Kondrak and Hauer hypothesized that the manuscript was written in alphagrams, a form of anagram in which the letters of each word are rearranged into alphabetical order. Working from that assumption, they designed an algorithm to decipher text scrambled in this way.
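Because an alphagram is just a word's letters in sorted order, every dictionary word maps to exactly one alphagram, and decoding amounts to looking up which words share a given alphagram. The sketch below assumes the unknown script has already been mapped to ordinary letters and picks the most frequent candidate word; `lexicon` and `word_counts` are hypothetical inputs, and a real system would more likely score whole sentences with a language model rather than choosing each word independently. This is an illustration of the idea, not the authors' implementation.

```python
from collections import defaultdict


def alphagram(word):
    """The word's letters rearranged into alphabetical order."""
    return "".join(sorted(word))


def build_index(lexicon):
    """Map each alphagram to every dictionary word that produces it."""
    index = defaultdict(list)
    for word in lexicon:
        index[alphagram(word)].append(word)
    return index


def decode_alphagrams(tokens, index, word_counts):
    """Replace each scrambled token with its most frequent dictionary match.

    Tokens with no dictionary match decode to None.
    """
    decoded = []
    for token in tokens:
        # .get does not insert missing keys into the defaultdict.
        candidates = index.get(alphagram(token), [])
        if candidates:
            decoded.append(max(candidates, key=lambda w: word_counts.get(w, 0)))
        else:
            decoded.append(None)
    return decoded
```

For example, with an English lexicon containing "listen", "silent" and "enlist", the scrambled token "eilnst" matches all three, and the frequency counts decide between them. The ambiguity this exposes is the same one the researchers faced: a single alphagram can correspond to several real words.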

"It turned out that over 80 per cent of the words were in a Hebrew dictionary, but we didn't know if they made sense together," said Kondrak.
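A figure like "over 80 per cent of the words were in a Hebrew dictionary" is a coverage measure: the share of decoded word tokens found in a reference word list. The minimal sketch below, with a hypothetical function name and inputs, shows that calculation; as Kondrak notes, high coverage says nothing about whether the words form meaningful sentences.

```python
def dictionary_coverage(tokens, lexicon):
    """Fraction of tokens (counting repeats) that appear in the lexicon."""
    if not tokens:
        return 0.0
    vocabulary = set(lexicon)
    hits = sum(1 for token in tokens if token in vocabulary)
    return hits / len(tokens)
```

Counting tokens rather than distinct word types means that frequent short words weigh heavily in the result, which is one reason a coverage figure should be read alongside other evidence.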

Unable to find Hebrew scholars to validate their findings, the scientists ran the decoded opening sentence through Google Translate.

"It came up with a sentence that is grammatical, and you can interpret it," said Kondrak. "'She made recommendations to the priest, man of the house and me and people.' It's a kind of strange sentence to start a manuscript but it definitely makes sense."

Without input from historians of ancient Hebrew, Kondrak explained, the full meaning of the Voynich manuscript will remain a mystery. He said he looks forward to applying the algorithms he and Hauer developed to other ancient manuscripts.

A language aficionado, Kondrak is renowned for his work in natural language processing, a branch of artificial intelligence concerned with helping computers understand human language.

"We use human language to communicate with other humans, but computers don't understand this language, because it's designed for people. There are so many ambiguous meanings that we don't even realize," said Kondrak. "Natural language processing helps computers make sense of human language. Not only do we want to talk to computers in our language because it's easier and more convenient, but also there is a lot of information that exists in the form of written word. Take the internet, for example."

"Decoding Anagrammed Texts Written in an Unknown Language and Script" appeared in Volume 4 of the Transactions of the Association for Computational Linguistics.