Game of Thrones: Why Language Technology Cannot Handle It (Yet)

Researchers from Vrije Universiteit Amsterdam and the Humanities Cluster of the Dutch Royal Academy evaluated four state-of-the-art tools for recognizing names in text, to assess and improve their performance on popular fiction. For one novel, they found ways to raise the tools' accuracy in recognizing names from 7 percent to 90 percent.

Natural language processing (NLP) tools are commonly used in everyday applications such as Siri and Google, but how effective these technologies are is not yet fully understood. Researchers from Vrije Universiteit Amsterdam and the Dutch Royal Academy's Humanities Cluster performed a thorough evaluation of four different name recognition tools on 40 popular novels, including A Game of Thrones. They published their analysis in PeerJ Computer Science. The study highlights the types of names and texts that are particularly challenging for these tools to identify, as well as solutions for mitigating the problem.

The researchers also extracted social networks from the novels to explore differences in story structure. Their insights can help make the technology more robust to differences between genres, and more useful to journalists who want to analyze large datasets such as the Panama Papers.
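The article does not say exactly how the networks were built. One common approach, assumed here, links two characters whenever they are mentioned in the same chunk of text and weights the link by how often that happens. In the minimal sketch below, the function name `cooccurrence_network`, the choice of paragraphs as chunks and the hand-made input are all illustrative assumptions; the input already contains recognized names, which sidesteps the hard part the study is about.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(chunks):
    """Count how often each pair of characters appears in the same chunk.

    `chunks` is a list of sets of character names, one set per chunk of text
    (here assumed to be a paragraph).
    """
    edges = Counter()
    for names in chunks:
        # sorted() gives every unordered pair one canonical (a, b) order
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical input: names already recognized in three paragraphs.
paragraphs = [
    {"Daenerys", "Jorah"},
    {"Daenerys", "Jorah", "Drogo"},
    {"Tyrion", "Jon"},
]
network = cooccurrence_network(paragraphs)
for (a, b), weight in sorted(network.items()):
    print(f"{a} -- {b}: {weight}")
```

Because every missed name becomes a missing node or edge, weak name recognition on a novel directly distorts a network extracted this way.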

Machine learning is the basis for many NLP tools. A computer program is trained to identify patterns in text based on examples it has been fed beforehand. To learn to recognize names, such a tool is supplied with many newspaper articles in which humans have meticulously marked the names.
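The article does not specify how the human markings are stored. A widespread convention for name recognition training data is BIO tagging: each token gets B- (beginning of a name), I- (inside a name) or O (outside any name). The minimal sketch below converts marked spans into such labels; the function `spans_to_bio`, the sentence and the label names `PER` and `LOC` are hypothetical examples, not taken from the study.

```python
def spans_to_bio(tokens, spans):
    """Convert human-marked name spans into one BIO label per token.

    `spans` holds (start, end, label) token offsets with `end` exclusive.
    """
    labels = ["O"] * len(tokens)
    for start, end, label in spans:
        labels[start] = f"B-{label}"          # first token of the name
        for i in range(start + 1, end):
            labels[i] = f"I-{label}"          # remaining tokens of the name
    return labels


# Hypothetical newspaper-style sentence: one person and one place marked.
tokens = ["Anna", "de", "Vries", "visited", "Amsterdam", "yesterday", "."]
spans = [(0, 3, "PER"), (4, 5, "LOC")]
for token, label in zip(tokens, spans_to_bio(tokens, spans)):
    print(token, label)
```

A tagger trained on thousands of such labeled sentences learns which token patterns tend to carry B- and I- labels.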

The program then "learns" what a name looks like from its context or from the shape of the word; in English, for example, names generally begin with a capital letter. The problem with applying a system trained on newspapers to novels is that novelists have much more freedom in their narrative than journalists, who need to stick to the facts. Fiction authors are free to make up names such as Tywin or R'hllor, or to take descriptive character names straight from the dictionary, such as Grey Worm. These names do not behave like "normal" names, and as a result NLP systems have difficulty recognizing them in a text.
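To make "the shape of the word" and "context" concrete, here is a minimal sketch of the kind of token features a statistical name tagger might weigh. The helpers `word_shape`, `token_features` and `looks_like_name` and the tiny `DICTIONARY` set are hypothetical; in particular, `looks_like_name` is a hand-written toy rule standing in for what a trained model might pick up from newspaper text, not how any of the evaluated tools actually decide.

```python
def word_shape(token: str) -> str:
    """Collapse a token into a shape string: X = uppercase, x = lowercase, d = digit."""
    shape = []
    for ch in token:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch  # keep punctuation, e.g. the apostrophe in R'hllor
        if not shape or shape[-1] != c:  # collapse runs: "Tywin" -> "Xx"
            shape.append(c)
    return "".join(shape)


# Hypothetical stand-in for an English dictionary word list.
DICTIONARY = {"grey", "worm", "later", "met", "mother", "of", "dragons"}


def token_features(tokens, i):
    """Shape and context features for the token at position i."""
    tok = tokens[i]
    return {
        "shape": word_shape(tok),
        "capitalized": tok[:1].isupper(),
        "sentence_initial": i == 0,
        "in_dictionary": tok.lower() in DICTIONARY,
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
    }


def looks_like_name(features):
    """Toy rule, not a trained model: capitalized, mid-sentence, not a dictionary word."""
    return (features["capitalized"]
            and not features["sentence_initial"]
            and not features["in_dictionary"])


tokens = ["Later", "Tywin", "met", "Grey", "Worm"]
flagged = [t for i, t in enumerate(tokens) if looks_like_name(token_features(tokens, i))]
print(flagged)  # ['Tywin']: both words of "Grey Worm" are dictionary words, so it is missed
```

The invented name Tywin passes because it is capitalized and unfamiliar, while Grey Worm fails precisely because its parts are ordinary words.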

Niels Dekker (Trifork B.V.), Tobias Kuhn (Vrije Universiteit Amsterdam) and Marieke van Erp (KNAW Humanities Cluster) used their experiments to highlight how flexible language is and how writers contextualize names in stories. Daenerys Targaryen, for example, can be referred to as Daenerys or simply as she, but she is also known as Dany, Daenerys Stormborn, Mother of Dragons, Khaleesi, the Unburnt and Mhysa.

The social network created for A Game of Thrones illustrates this: her friends call her Dany, while her full name, Daenerys, is used only by her enemies, in her absence.
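Merging the many names for one character into a single node, so that Dany and Khaleesi are not counted as two different people, is a step any such analysis needs. The minimal sketch below assumes a hand-built alias table (the aliases come from the article; the table, the function `resolve_mentions` and the sample mentions are hypothetical) and keeps per-alias counts so one can still see which name was used. Pronouns such as "she" need full coreference resolution and are not handled here.

```python
from collections import Counter, defaultdict

CANONICAL = "Daenerys Targaryen"
# Hypothetical alias table built by hand from the names listed in the article.
ALIASES = {alias: CANONICAL for alias in [
    "Daenerys Targaryen", "Daenerys", "Dany", "Daenerys Stormborn",
    "Mother of Dragons", "Khaleesi", "the Unburnt", "Mhysa",
]}


def resolve_mentions(mentions, aliases):
    """Map each recognized mention to a canonical character and count each alias."""
    counts = defaultdict(Counter)
    for mention in mentions:
        character = aliases.get(mention, mention)  # unknown names map to themselves
        counts[character][mention] += 1
    return counts


# Hypothetical sequence of name mentions found in a chapter.
mentions = ["Dany", "Khaleesi", "Daenerys", "Dany", "Tyrion"]
resolved = resolve_mentions(mentions, ALIASES)
for character, alias_counts in resolved.items():
    print(character, dict(alias_counts))
```

Keeping the alias counts, rather than only the merged total, is what would let an analysis notice patterns such as one name being used by friends and another by enemies.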

The research described in this publication shows that the performance of NLP tools needs more attention, and that there is still work to do before computers fully understand text.