How does Google Translate recognize the language almost instantly?

What processes and technologies are they using?
I assume iterating over the words in all languages and returning the closest match would take way, way longer.

Thank you!

2 Answers

Anonymous

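They are using natural language processing and tokenization: the text is split into tokens (words or smaller pieces), and a trained statistical model estimates which language those tokens most likely belong to.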

This is pretty standard practice nowadays, and the results are far superior to dictionary lookups. It’s still not perfect, but it’s far better than it was just a few years ago.
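
The answer does not say which model Google actually uses. As a rough illustration of the tokenize-then-score idea, here is a minimal Python sketch of character n-gram language identification, a classic technique swapped in here for illustration and not necessarily what Google Translate does. The sample sentences and the functions `char_ngrams`, `score`, and `identify` are all hypothetical; a real system learns its statistics from huge amounts of text, not one sentence per language.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Tokenize text into overlapping character n-grams (padded with spaces)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


# Hypothetical toy training samples; a real model would use far more text.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then it is gone",
    "de": "der schnelle braune fuchs springt über den faulen hund und ist weg",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et il est parti",
}
# One n-gram profile per language, built once ahead of time.
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def score(text: str, profile: Counter) -> float:
    """Fraction of the text's n-grams that also occur in a language profile."""
    grams = char_ngrams(text)
    total = sum(grams.values())
    hits = sum(count for gram, count in grams.items() if gram in profile)
    return hits / total if total else 0.0


def identify(text: str) -> str:
    """Return the language whose profile covers the most of the text's n-grams."""
    return max(PROFILES, key=lambda lang: score(text, PROFILES[lang]))
```

For example, `identify("the dog is lazy")` picks `"en"`, because most of its three-letter chunks (such as "the" and "laz") also occur in the English sample. Each n-gram is just a hash lookup, so scoring stays fast no matter how many languages are compared.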

Anonymous

> I assume iterating over the words in all languages and returning the closest match would take way, way longer.

Why would you assume that? Google has an enormous number of computers, and your request can use many of them at the same time. I could find an estimate of 2.5 million servers for Google back in 2016. They do not publish the numbers, but it is certainly in the millions.

Run the translation process in parallel on different cores or different computers. That way, testing one language takes the same time as testing all the languages the system supports; it just uses more computers. This is what is called an “embarrassingly parallel problem”: it is embarrassingly easy to run in parallel on multiple computers because the program that tries to translate from French is completely independent of the program that tries to translate from German, so you can just let one core handle each language.

You need a program at the end that selects the most likely language, but that is trivial: each translation attempt can report a percentage for how well the text matches its language, and you pick the highest.
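
A minimal Python sketch of that embarrassingly parallel setup, assuming each language's check can be reduced to a function that returns a match percentage. The vocabularies and `match_percentage` are hypothetical stand-ins for a real per-language translation attempt, and threads stand in for the separate cores or machines described above (in CPython, threads mainly show the structure; CPU-heavy jobs would go to separate processes or computers).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tiny vocabularies standing in for each language's full model.
VOCAB = {
    "en": {"the", "is", "and", "dog", "cat", "house", "red"},
    "de": {"der", "die", "das", "ist", "und", "hund", "katze", "haus", "rot"},
    "fr": {"le", "la", "est", "et", "chien", "chat", "maison", "rouge"},
}


def match_percentage(text: str, lang: str) -> tuple[str, float]:
    """One independent 'try this language' job: percent of words recognized."""
    words = text.lower().split()
    if not words:
        return lang, 0.0
    known = sum(word in VOCAB[lang] for word in words)
    return lang, 100.0 * known / len(words)


def detect_parallel(text: str) -> str:
    # Each language is checked independently, so the jobs can run side by side.
    with ThreadPoolExecutor(max_workers=len(VOCAB)) as pool:
        results = list(pool.map(lambda lang: match_percentage(text, lang), VOCAB))
    # The trivial final step: keep the language with the highest percentage.
    best_lang, _ = max(results, key=lambda pair: pair[1])
    return best_lang
```

Here `detect_parallel("das haus ist rot")` returns `"de"`, since all four words are in the German vocabulary and none are in the others. Adding a language only adds one more independent job; the final selection stays a single `max`.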

So it is not an unreasonable way to do it. You could also add dictionary lookups of the words to quite quickly narrow down which languages it could be.
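
The dictionary-lookup shortcut can be sketched in Python too. This assumes small hypothetical word lists per language (`DICTIONARIES`) and builds an index from each word to the languages that contain it, so every word costs one hash lookup. The result is a shortlist of candidate languages, best first, to which a slower and more accurate check could then be limited.

```python
from collections import Counter

# Hypothetical mini-dictionaries; real ones would hold many thousands of words.
DICTIONARIES = {
    "en": {"the", "and", "is", "house", "water"},
    "de": {"der", "und", "ist", "haus", "wasser"},
    "nl": {"de", "en", "is", "huis", "water"},
    "fr": {"le", "et", "est", "maison", "eau"},
}

# Invert the dictionaries once: word -> set of languages that contain it.
WORD_INDEX: dict[str, set[str]] = {}
for lang, words in DICTIONARIES.items():
    for word in words:
        WORD_INDEX.setdefault(word, set()).add(lang)


def candidate_languages(text: str) -> list[str]:
    """Return languages sharing at least one word with the text, most hits first."""
    hits: Counter = Counter()
    for word in text.lower().split():
        for lang in WORD_INDEX.get(word, ()):
            hits[lang] += 1
    return [lang for lang, _ in hits.most_common()]
```

For `"the water is cold"` this returns `["en", "nl"]`: "water" and "is" exist in both English and Dutch, "the" only in English, and "cold" is in none of the lists.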

If it is the Google Translate feature used in the browser, the HTML code can contain language information, such as the attribute lang="de" for German, so the page source itself might tell you the language of the page.
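
Reading such a declared language from the page source is straightforward with Python's standard-library `html.parser`. This sketch only checks the `lang` attribute on the root `<html>` element (an assumption; the attribute can also appear on other elements), and the helper names are hypothetical. The declaration can be missing or wrong, which is why the source only might tell you the language.

```python
from html.parser import HTMLParser
from typing import Optional


class LangAttributeFinder(HTMLParser):
    """Records the lang attribute of the first <html> start tag, if present."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            for name, value in attrs:
                if name == "lang" and value:
                    self.lang = value


def declared_language(html: str) -> Optional[str]:
    """Return the page's declared language code, or None if it has none."""
    finder = LangAttributeFinder()
    finder.feed(html)
    finder.close()
    return finder.lang
```

A page starting with `<html lang="de">` yields `"de"`; a page without the attribute yields `None`, and then a detector like the ones described above still has to look at the text itself.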