In collaboration with Google's DeepMind, researchers at the Faculty of Classics at the University of Oxford compared Pythia's ability to restore ancient text with that of historians who specialize in the ancient world.
In the experimental evaluation, Pythia outperformed the historians in restoring ancient Greek text. The historians had a character error rate of 57.3%, compared with Pythia's 30.1%. Moreover, the deep learning model completed the restoration in a matter of seconds, while the historians took two hours.
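For readers unfamiliar with the metric, character error rate is commonly computed as the edit distance between the restored text and the original, divided by the length of the original. The sketch below follows that common definition; it is an assumption that the study's figures were computed exactly this way, and the function names are illustrative only.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions between two strings."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, 1):
        curr = [i]
        for j, hyp_ch in enumerate(hypothesis, 1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong character out of four gives an error rate of 25%.
rate = character_error_rate("αγαν", "αγον")
```

Under this definition, a lower rate means the restoration is closer to the original text.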
Pythia is a deep learning model developed by Yannis Assael and his colleagues at DeepMind to help historians decipher eroded text in Ancient Greek inscriptions.
It recovers missing characters from a damaged ancient text by predicting character sequences and providing a confidence level for each prediction. It offers fully automated assistance with the once painstaking restoration task.
According to their report, the researchers converted the most extensive digitally available corpus of ancient Greek inscriptions, that of the Packard Humanities Institute (PHI Greek Inscriptions), into a machine-actionable text called PHI-ML containing 3.2 million words.
All core characters, along with numbers, accents, punctuation marks, and spaces, are standardized into the Ancient Greek alphabet. Additionally, the researchers introduced two special characters: a hyphen ('-') to represent a missing character and a question mark ('?') to mark a character to be predicted.
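The two special characters can be illustrated with a short sketch. The helper below is hypothetical and is not taken from the PHI-ML pipeline; it only shows how a damaged line might be written out using '-' for characters that are missing and '?' for characters the model is asked to predict.

```python
MISSING = "-"   # a character lost to damage
PREDICT = "?"   # a character the model is asked to restore


def mark_gaps(text: str, missing: set[int], to_predict: set[int]) -> str:
    """Return `text` with the given positions replaced by the two markers.

    `missing` and `to_predict` hold zero-based character positions;
    a position listed in both is treated as a prediction target.
    """
    chars = list(text)
    for i in missing:
        chars[i] = MISSING
    for i in to_predict:
        chars[i] = PREDICT
    return "".join(chars)


# The maxim "μηδεν αγαν" with two characters to restore and two left as gaps.
marked = mark_gaps("μηδεν αγαν", missing={8, 9}, to_predict={1, 2})
# marked == "μ??εν αγ--"
```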
Pythia's neural networks are designed to consider possible predictions at both the character and word level. It also takes contextual information into account when producing its top 20 predictions.
Contextual information comes in different forms, and it strongly influences the success of a restoration. The layout and shape of the inscription, grammatical and linguistic considerations, textual parallels, and historical context are some examples of contextual information.
In its architecture, Pythia uses the length of the surrounding text as contextual information, and that length acts as augmented context for its predictions. The researchers found a positive correlation between context length and prediction accuracy: Pythia has difficulty restoring texts with only 20 Ancient Greek characters, and its performance peaks at 500 characters.
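To give a sense of how a ranked list of restorations with confidence levels can be produced, here is a minimal beam search sketch. It assumes, purely for illustration, that each missing position has its own independent probability distribution over characters; the actual model conditions each character on the surrounding text and on its earlier predictions, and the probabilities below are invented.

```python
import heapq
import math


def top_k_sequences(position_probs: list[dict[str, float]], k: int = 20):
    """Return up to k most probable character sequences with their probabilities.

    `position_probs` holds one {character: probability} mapping per gap.
    Only the k best partial sequences are kept after each position.
    """
    beams = [("", 0.0)]  # (sequence so far, log-probability)
    for dist in position_probs:
        candidates = [
            (prefix + ch, logp + math.log(p))
            for prefix, logp in beams
            for ch, p in dist.items()
            if p > 0
        ]
        beams = heapq.nlargest(k, candidates, key=lambda c: c[1])
    return [(seq, math.exp(logp)) for seq, logp in beams]


# Invented probabilities for a two-character gap.
probs = [{"α": 0.7, "ε": 0.3}, {"ν": 0.6, "ς": 0.4}]
suggestions = top_k_sequences(probs, k=3)
# The best suggestion is "αν", with a confidence of about 0.42.
```

A historian would then review the ranked suggestions rather than a single answer.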
"It's all about how we can help the experts," says Assael in an interview.
Although Pythia is more accurate than the historians at predicting missing characters in Ancient Greek inscriptions, it is most valuable when used as a collaborative tool. Pythia can propose its top 20 restorations of a damaged text, and a historian can then pick the most plausible one using their subject knowledge and expertise.
The use of Pythia removes the burden of deciphering inscriptions from historians and epigraphists. It widens their scope of study and lets them focus on the more substantial study of the deciphered text. The researchers at DeepMind and the University of Oxford are enthusiastic about the possible impact of Pythia, stating that "the reward is huge because it tells us about almost every aspect of the religion, social and economic life of the ancient world."
Pythia's approach can be applied to any discipline dealing with ancient texts, including philology, papyrology, and codicology, and to any language, ancient or modern.
Pythia, a Python notebook, and PHI-ML's processing pipeline are available as open source on GitHub.