This Computer Algorithm Can Help Decipher Long Lost Languages

This computer algorithm could prove to be a history-changing tool.

The more researchers use the technology available to them, the more progress we make in understanding our place in the universe. Technology is helping us explore the stars, the origin of our planet, and humankind as a species, and it is also being used to peer into the past, though not in a literal sense.

For example, archaeologists have used LiDAR to see places that are “invisible” to the unaided eye. By scanning portions of land from the air, researchers can identify structures that have been reclaimed by nature and lie hidden in dense, sometimes inaccessible, parts of the planet.

But technology can do much more: a new algorithm has been shown to automatically decipher lost languages without the need for prior knowledge of their relationship to other languages.

Although not many people are aware of it, researchers have identified many lost or dead languages over the years.

Many of them have remained a complete enigma, while others have been partially translated. Technology such as computer algorithms and artificial intelligence could significantly increase scientists’ chances of deciphering these languages, revealing new, never-before-seen historical data.

The ultimate goal of the team of researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) is for the system to decipher lost languages that have eluded linguists for decades, using just a few thousand words.

This project is led by MIT Professor Regina Barzilay, who has explained that the system is based on several principles drawn from historical linguistics. One of them is that languages generally tend to evolve in certain predictable ways.

The researchers explain that while a given language rarely adds or removes an entire sound, certain sound substitutions are likely to occur.

As an example, they say that a word with a “p” in the parent language may change to a “b” in the descendant language, but a change to a “k” is less likely because of the significant gap in pronunciation.

All of this is being taught to computers, which can then use it to search for things that humans are simply unable to find.

2,700-year-old pottery with Hebrew inscriptions.

By incorporating these and other linguistic constraints, Barzilay and MIT Ph.D. student Jiaming Luo developed a decipherment algorithm that can handle the vast space of possible transformations and the scarcity of guiding signal in the input.

The algorithm learns to embed the language’s sounds in a multidimensional space where the differences in pronunciation are reflected in the distance between the corresponding vectors.

This design allows them to capture relevant patterns of language change and express them as computational constraints.
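The idea of sounds as points in space can be illustrated with a minimal sketch. The two-dimensional vectors below are hand-picked assumptions chosen for illustration; the MIT model learns its own embeddings from data, and nothing here reproduces them.

```python
import math

# Hypothetical, hand-picked 2-D "embeddings": (place of articulation, voicing).
# These values are illustrative assumptions, not learned by any model.
SOUND_VECTORS = {
    "p": (0.0, 0.0),  # lips, voiceless
    "b": (0.0, 1.0),  # lips, voiced
    "k": (2.0, 0.0),  # back of the mouth, voiceless
}

def sound_distance(a: str, b: str) -> float:
    """Euclidean distance between two sounds in the toy embedding space."""
    return math.dist(SOUND_VECTORS[a], SOUND_VECTORS[b])

# A smaller distance stands for a smaller pronunciation gap,
# and therefore a more plausible sound change.
print("p -> b:", sound_distance("p", "b"))
print("p -> k:", sound_distance("p", "k"))
```

In this toy space, “p” sits closer to “b” than to “k”, mirroring the researchers' example of which substitution is more likely.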

The resulting model can segment words in an ancient language and assign them to their counterparts in a related language.
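As a rough sketch of what matching a word to its counterpart can look like, the example below uses a weighted edit distance in which likely sound substitutions cost less than unlikely ones. This is a stand-in technique, not the researchers' model: the cost table, the function names, and the toy words are all invented for illustration, and the sketch assumes the words are already segmented, which the real system does not require.

```python
# Assumed costs: pairs of similar sounds are cheap to swap (illustrative values).
CHEAP_SUBSTITUTIONS = {
    frozenset("pb"): 0.3,
    frozenset("td"): 0.3,
    frozenset("kg"): 0.3,
}

def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing sound a with sound b; identical sounds cost nothing."""
    if a == b:
        return 0.0
    return CHEAP_SUBSTITUTIONS.get(frozenset((a, b)), 1.0)

def weighted_edit_distance(source: str, target: str, indel: float = 1.0) -> float:
    """Edit distance using substitution_cost, computed row by row."""
    prev = [j * indel for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * indel]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + indel,                        # delete s
                curr[j - 1] + indel,                    # insert t
                prev[j - 1] + substitution_cost(s, t),  # substitute s with t
            ))
        prev = curr
    return prev[-1]

def closest_counterpart(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary word with the smallest weighted edit distance."""
    return min(vocabulary, key=lambda cand: weighted_edit_distance(word, cand))

# Invented toy words, not real data from any language.
print(closest_counterpart("pata", ["bada", "kasa", "mola"]))
```

Because “p” to “b” and “t” to “d” are cheap in the assumed cost table, the invented word “pata” is matched to “bada” rather than to the other candidates.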

The project builds on a paper Barzilay and Luo wrote last year that deciphered the lost languages Ugaritic (a Semitic language) and Linear B (the writing system used to write Mycenaean Greek), the latter of which had taken decades to be decoded.

In the case of Linear B, it took several decades to discover the correct known descendant. For Iberian, scholars still cannot agree on the related language: some argue for Basque (Euskera), while others reject this hypothesis and maintain that Iberian is not related to any known language.

With the new language-deciphering system, the algorithm can infer the relationship between various languages, making the process much easier for experts.

The proposed algorithm can evaluate how close two languages are and how many similarities or dissimilarities exist.
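One simple way to picture a closeness score between two languages is sketched below. It is an assumption-laden stand-in, not the proposed algorithm: it compares spellings with Python's standard `difflib.SequenceMatcher`, and the function names and word lists are invented for illustration.

```python
from difflib import SequenceMatcher

def best_match_score(word: str, vocabulary: list[str]) -> float:
    """Highest string-similarity ratio (0 to 1) between word and any candidate."""
    return max(SequenceMatcher(None, word, cand).ratio() for cand in vocabulary)

def closeness(lost_words: list[str], candidate_vocabulary: list[str]) -> float:
    """Average best-match similarity of the lost-language words."""
    scores = [best_match_score(w, candidate_vocabulary) for w in lost_words]
    return sum(scores) / len(scores)

# Invented toy vocabularies, not real linguistic data.
lost = ["pata", "mira", "tolu"]
candidate_a = ["bata", "mira", "dolu"]
candidate_b = ["xyzq", "wwww", "hhhh"]

print("candidate A:", closeness(lost, candidate_a))
print("candidate B:", closeness(lost, candidate_b))
```

A higher score suggests a closer candidate, but a score like this only ranks candidates against each other. Deciding whether the best candidate is actually related still calls for a threshold or expert judgment.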

Tests on known languages have shown that the algorithm can even accurately identify language families.

The team applied their algorithm to Iberian, considering Basque as well as less likely candidates from the Romance, Germanic, Turkic and Uralic families. Although Basque and Latin were closer to Iberian than other languages, they were still too different to be considered related, the researchers have explained.

The new algorithm could help us better understand our history and solve many of the enigmas that have remained unanswered to this day.

Studying languages, especially lost languages and those that remain undeciphered, is essential if we are to better understand the history of our civilization and species. This is because the progress and evolution of our species would have been impossible without language.

Check out this article posted on Curiosmos, which looks at five ancient languages that, despite being millennia old, continue to be used to this day.



Sources and references: MIT News.
