> Is it just down to the fact that certain things cannot be translated directly from one language to another?
Usually, yes. One word in one language can have a dozen or more meanings, and each of those meanings might require its own word in the other language. If the software can’t understand which meaning is being used, it might choose the wrong word in its translation.
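Here’s a toy sketch of the problem (the mini dictionary is made up purely for illustration): a naive word-for-word lookup has no way to pick the right sense.

```python
# A toy word-for-word "translator" (made-up mini dictionary, just for illustration).
# The English word "bank" needs different Spanish words depending on its sense.
naive_dictionary = {
    "bank": "banco",   # financial institution -- but a river bank is "orilla"!
    "the": "el",
    "river": "río",
}

def naive_translate(sentence):
    # Looks each word up in isolation, ignoring context entirely.
    return " ".join(naive_dictionary.get(word, word) for word in sentence.split())

print(naive_translate("the river bank"))  # -> "el río banco": wrong sense, wrong word order
```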
There are very serious grammar considerations, too. For instance, many languages use just one present-tense verb form for both repeated/ongoing actions and a one-time action happening now, but in English we express these differently (“He goes to the theater on Saturdays” versus “He is going to the theater right now”). So in a case like that, the software might not know which English version is required based on the input in the other language.
Or like, in English you say “I would do this,” and that could mean that you *did* do that in the past, or that you *would* (conditionally) do it if your situation were different (“I would buy a used car any time my old one broke down” as opposed to “I would buy a Ferrari if I won the lottery”). Well, in another language, you might just use the simple past tense to express the first idea, but you might need additional words and participles to express the second idea, and the sentence becomes more complicated.
Then there’s also different word order, or the fact that you can omit words in some languages that you can’t in others.
Languages are complex as hell man.
I think it’s important to recognize that language is not exactly a 1-to-1 process. Even though we may have words that translate directly, this often leads to cryptic and grammatically poor translations, because how you would say something in English may be radically different than, say, in Hindi.
And this gets at the crux of it: no one really wants a simple word-for-word translation, we want meaning (or semantic) translation. And that’s where it gets hard.
Most machine translation nowadays is done with machine learning. Specifically, it is done with Transformer networks. Data scientists will give this network A TON of text in both languages. For example, Google released a really popular model a few years ago called BERT, which was trained on the entirety of Wikipedia and any unpublished books they could get their hands on. The point of doing this is so that *the network begins learning patterns in language*. This is why we sometimes get weirdness in translation: because the model actually has no understanding of the rules of any language, *it simply learns the patterns present in the text of those languages*.
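If you want to poke at one of these models yourself, here’s a minimal sketch using the open-source Hugging Face `transformers` library. (The model below is just one publicly available example; it’s not what Google Translate actually runs, which is proprietary.)

```python
# Minimal sketch: running a pretrained Transformer translation model.
# Requires: pip install transformers torch
from transformers import pipeline

# Helsinki-NLP publishes small open translation models; this is one English->German example.
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")

result = translator("He goes to the theater on Saturdays.")
print(result[0]["translation_text"])
# Whatever comes out is simply the pattern that best matches the training data;
# the model has no explicit grammar rules for either language.
```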
I remember seeing a post somewhere on reddit a while ago where someone was using Grammarly and it recommended that the author shorten the phrase “he had not left” to “he left’nt”. Although this isn’t an example of translation, it illustrates my point: Grammarly had no knowledge of how English grammar works; instead it was simply aware of the pattern of contractions and tried to apply it.
The same thing happens in translation. The machine doesn’t know the actual rules of either language, it’s simply pattern matching between the two languages. And when those patterns don’t apply, the machine can make perplexing, sometimes humorous mistakes like “left’nt”.
In addition to issues based in words/vocabulary, there is also the problem of meaning and context. Current translation software simply isn’t at the point where it can reliably tell when a person is being sarcastic, implying an allegory, or referencing something that was implied in a previous sentence. These are all obvious to native speakers, and missing them can make translations awkward or just plain wrong.
Even though some linguists believe that language is a very structural system like math (they call this system “glossematics” if you want to read about it), that isn’t exactly the case. Yes, sentences have structures and words have meanings, but language is incredibly affected by culture. Even though translating literal sentences into literal sentences is completely possible, these literal translations might not convey the meaning you want. This is all about culture. Let’s take a look at some examples:
In English you would say “It’s raining cats and dogs” to express the idea that it is raining heavily, while Turks would say “Bardaktan boşanırcasına yağmur yağıyor”, which translated literally is “It’s raining like it’s pouring out of a glass”. Even though you can get the idea, you wouldn’t translate it like this. You would use the proper translation, which is “It’s raining cats and dogs”.
This is called “localization”. You take a script, understand the meaning, and recreate it in a way that your target audience will internalize. If you’re watching a Turkish TV series and a character says “İnşallah”, that will be translated as “I hope”, because you have no idea how or when İnşallah is used. You wouldn’t internalize it.
Our understanding of language depends on culture, context, and awareness of the situation. For example, try explaining the meaning of “He is an animal.” This phrase has a very different meaning depending on whether the speaker is a zookeeper describing an ape, a woman describing her boyfriend, or a police officer describing a suspect in a violent crime.
For another example, try explaining to someone who isn’t a native English speaker the difference between “butt dial” and “booty call”.
Machine learning is great at looking at billions of examples of translations and figuring out that one word or phrase is frequently translated to another language a certain way, but computers don’t (yet) understand the context or the culture behind that translation.
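As a toy illustration of that frequency idea (real systems learn probabilities over whole sentences, not raw word counts like this):

```python
from collections import Counter

# Made-up parallel corpus data: how the English word "bank" was translated
# across a pile of aligned English/Spanish sentence pairs.
observed_translations = ["banco", "banco", "banco", "orilla", "banco"]

counts = Counter(observed_translations)
print(counts.most_common(1))  # [('banco', 4)] -- the statistically safest pick

# Always picking the most frequent option is right most of the time,
# and confidently wrong whenever the context actually calls for "orilla".
```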
You can have a virtual dictionary with every word’s translation. But language is never just words. It’s also grammar. Not only is grammar responsible for making sets of words make sense, it can also change the meaning of some words. Let’s not even start on words that exist only in one language.
The easiest way to teach grammar to computers is to use Artificial Intelligence. AI here is an algorithm that you feed data and then make it guess. For example, you provide tons of photos of cats and dogs and you tell the AI what is in each picture. Then you make the AI guess what’s in a picture without telling it. If the AI is properly trained, there is a high chance it will guess correctly.
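Here’s roughly what that looks like in code, as a toy sketch with scikit-learn (the numeric “features” are made up and stand in for real photos):

```python
# Toy version of the cats-vs-dogs idea. Requires: pip install scikit-learn
from sklearn.linear_model import LogisticRegression

# Pretend each animal is summarized by two made-up features:
# [weight in kg, ear length in cm]. Real systems learn features from pixels.
X_train = [[4.0, 6.5], [5.1, 7.0], [3.8, 6.0],       # cats
           [25.0, 10.0], [30.2, 12.5], [22.1, 9.0]]  # dogs
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]

model = LogisticRegression()
model.fit(X_train, y_train)         # "training": show the algorithm labeled examples

print(model.predict([[4.5, 6.8]]))  # ['cat'] -- its guess on an unseen example
```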
With concepts as complex as language, it’s impossible for an AI to have 100% accuracy.