An ancient Sumerian illustration of a deity carrying a "bag." Jumpstory.

This Computer Algorithm Can Help Decipher Long-Lost Languages

This computer algorithm could prove to be a history-changing tool.


The more researchers use the technology available, the more progress we make in understanding our place in the universe. Technology is helping us explore the stars, the origin of our planet, and humankind as a species, and it is also being used to peer into the past, though not in a literal sense.

For example, archaeologists have used LiDAR to see places that are “invisible” to the unaided eye. By scanning portions of land from the air using LiDAR, researchers can identify structures that have been reclaimed by nature and lie hidden in dense, sometimes inaccessible, parts of the planet.

But technology can do much more: a new algorithm has been shown to automatically decipher lost languages without needing prior knowledge of their relationship to other languages. Although few people are aware of it, researchers have identified many lost or dead languages over the years.

Many have remained a complete enigma, while others have been partially translated. Technology such as computer algorithms and artificial intelligence could significantly increase scientists’ chances of deciphering these languages, revealing new, never-before-seen historical data.

The ultimate goal of the team of researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) is for the system to decipher lost languages that have eluded linguists for decades, using just a few thousand words.

The project is led by MIT Professor Regina Barzilay, who has explained that the system is based on several principles drawn from historical linguistics. Chief among them is that languages generally tend to evolve in certain predictable ways.

The researchers explain that while a given language rarely adds or removes an entire sound, certain sound substitutions are likely to occur.

As an example, they say that a word with a “p” in the parent language may change to a “b” in the descendant language, but a change to a “k” is less likely because of the significant gap in pronunciation.
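To make that idea concrete, here is a minimal sketch of a weighted edit distance in which swapping “p” for “b” is cheap, swapping “p” for “k” is expensive, and adding or dropping a whole sound costs the most. The cost values and the function names are assumptions chosen purely for illustration; they do not come from the MIT system.

```python
# Illustrative costs only (assumed values, not taken from the MIT work).
SUBSTITUTION_COST = {
    frozenset("pb"): 0.2,  # small pronunciation gap
    frozenset("pk"): 0.8,  # large pronunciation gap
}
DEFAULT_SUBSTITUTION = 1.0
INDEL_COST = 1.5  # adding or removing an entire sound is treated as rare


def sub_cost(a: str, b: str) -> float:
    """Cost of replacing sound `a` with sound `b`."""
    if a == b:
        return 0.0
    return SUBSTITUTION_COST.get(frozenset((a, b)), DEFAULT_SUBSTITUTION)


def weighted_edit_distance(source: str, target: str) -> float:
    """Classic dynamic-programming edit distance with per-sound costs."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * INDEL_COST
    for j in range(1, cols):
        dist[0][j] = j * INDEL_COST
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + INDEL_COST,  # drop a sound
                dist[i][j - 1] + INDEL_COST,  # add a sound
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]


# Made-up words: the p -> b change scores as "closer" than p -> k.
print(weighted_edit_distance("pata", "bata"))
print(weighted_edit_distance("pata", "kata"))
```

With costs like these, a candidate pairing that only needs likely substitutions ranks ahead of one that needs unlikely substitutions or lost sounds.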

All of this is being taught to computers, which can then use it to search for patterns that humans cannot find on their own.

2,700-year-old pottery with Hebrew inscriptions.

By incorporating these and other linguistic constraints, Barzilay and MIT Ph.D. student Jiaming Luo developed a decipherment algorithm that can handle the vast space of possible transformations and the scarcity of a guiding signal in the input.

The algorithm learns to embed the language’s sounds in a multidimensional space where the differences in pronunciation are reflected in the distance between the corresponding vectors.
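The geometric intuition can be sketched in a few lines. In the real system the vectors are learned; in this toy version they are written by hand, with one coordinate standing for where in the mouth a sound is made and a second, smaller one for voicing. The table `SOUND_VECTORS` and both function names are hypothetical.

```python
import math

# Hand-written 2-D vectors (an assumption for illustration; the real
# system learns its embeddings). First value: place of articulation,
# from lips (0.0) to back of the mouth (2.0). Second value: voicing.
SOUND_VECTORS = {
    "p": (0.0, 0.0),
    "b": (0.0, 0.5),
    "t": (1.0, 0.0),
    "d": (1.0, 0.5),
    "k": (2.0, 0.0),
    "g": (2.0, 0.5),
}


def sound_distance(a: str, b: str) -> float:
    """Euclidean distance between the vectors of two sounds."""
    return math.dist(SOUND_VECTORS[a], SOUND_VECTORS[b])


def nearest_sounds(sound: str) -> list[str]:
    """All other sounds, ordered from most to least similar."""
    others = [s for s in SOUND_VECTORS if s != sound]
    return sorted(others, key=lambda s: sound_distance(sound, s))


print(sound_distance("p", "b"))  # small gap
print(sound_distance("p", "k"))  # larger gap
print(nearest_sounds("p"))
```

Because “p” and “b” sit close together in this space while “p” and “k” sit far apart, distance alone tells the program which sound changes are plausible.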

This design allows them to capture relevant patterns of language change and express them as computational constraints.

The resulting model can segment words in an ancient language and assign them to their counterparts in a related language.
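As a rough picture of what segmenting and matching means, the sketch below splits an unbroken string of symbols into chunks and pairs each chunk with the most similar word from a known vocabulary, choosing the split with the lowest total edit distance. It uses plain, unweighted edit distance and invented toy strings; the function names and the `max_len` parameter are assumptions, and the actual model is considerably more sophisticated.

```python
def edit_distance(a: str, b: str) -> int:
    """Unweighted Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete
                            curr[j - 1] + 1,            # insert
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def segment_and_match(text: str, vocabulary: list[str], max_len: int = 6):
    """Split `text` into chunks and pair each with its closest known word.

    best[i] holds (total cost, pairs) for the cheapest split of text[:i].
    """
    best = [(0, [])] + [None] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            chunk = text[start:end]
            match = min(vocabulary, key=lambda w: edit_distance(chunk, w))
            cost = best[start][0] + edit_distance(chunk, match)
            if best[end] is None or cost < best[end][0]:
                best[end] = (cost, best[start][1] + [(chunk, match)])
    return best[-1][1]


# Toy input: an unsegmented "inscription" and a two-word known vocabulary.
print(segment_and_match("baterkasa", ["pater", "casa"]))
```

The dynamic-programming table keeps the work manageable: each prefix of the text is solved once and reused, rather than trying every possible split from scratch.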

The project builds on a paper Barzilay and Luo wrote last year that deciphered Ugaritic (a Semitic language) and Linear B (the script used to write Mycenaean Greek), the latter of which had previously taken human scholars decades to decode.

In the case of Linear B, it took several decades to discover the correct known descendant. For Iberian, scholars still cannot agree on the related language: some argue for Basque (Euskera), while others reject this hypothesis and maintain that Iberian is unrelated to any known language.

With the new system, the algorithm itself infers the relationship between languages, making the process easier for experts.

The proposed algorithm can evaluate how close two languages are and how many similarities or dissimilarities exist.

Tests on known languages have shown that the algorithm can even accurately identify language families.
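A crude way to picture such a closeness measure is to take every word of the unknown language, find its best match in a candidate language, and average the results. The sketch below does this with the string-similarity ratio from Python's standard `difflib` module. This is a stand-in chosen for simplicity, not the metric used by the researchers, and the word lists are invented.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


def closeness(lost_words: list[str], candidate_words: list[str]) -> float:
    """Average, over the lost words, of the best match in the candidate list."""
    best_matches = [
        max(similarity(lost, known) for known in candidate_words)
        for lost in lost_words
    ]
    return sum(best_matches) / len(best_matches)


# Invented toy vocabularies, for illustration only.
lost = ["bater", "kasa"]
candidate_a = ["pater", "casa"]
candidate_b = ["zug", "lind"]

print(closeness(lost, candidate_a))  # higher: many shared sounds
print(closeness(lost, candidate_b))  # lower: nothing in common
```

A score like this only ranks candidates against one another. As the Iberian results show, the top-ranked candidate can still be too distant to count as a relative.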

The team applied their algorithm to Iberian, considering Basque as well as less likely candidates from the Romance, Germanic, Turkic, and Uralic families. Although Basque and Latin were closer to Iberian than other languages, they were still too different to be considered related, the researchers have explained.

The new algorithm will help us better understand our history and solve many of the enigmas that remain unanswered today.

Studying languages, especially lost languages and those that remain undeciphered, is essential if we are to better understand the history of our civilization and species. This is because the progress and evolution of our species would be impossible without language.


Sources and references: MIT News.


Written by Ivan Petricevic

