[ELI5] Why does autocorrect insist that the first letter of a misspelled word is more important than the rest of it?

For example, if I spell “umportant”, it’s easy for us to recognise that it’s supposed to be “important”, but autocorrect insists that it’s something like “umbrella”, or I guess more logically “unimportant”, even though “important” is only 1 correction away.

These are real examples from my phone (Samsung Galaxy):

Wuick gets the suggestions Wicked, Which, Wucky, Whickham, Whicker, Wick, Wickets, Wicket, and Wickham. None of which are “Quick”, what I intended to write.

Nrown gets the suggestions Now, Nr own, Noon, Nowhere, Nr owner, Nr owns, and Nr owners. None of which are “Brown”.

Dence gets the suggestions Dance, December, Denied, Dancers, Decent, Dense, Dench, and Deuce. None of which are “Fence”.

It’s bothered me for years that it never ever picks up on a misspelt first letter.

Edit: I tried “umportant”, and it actually comes up with 0 suggestions. Not umbrella, not unimportant, not even “important”. But “inportant” and “ikportant” and even “iqportant” are all recognised as “important”.

14 Answers

Anonymous 0 Comments

You’ve taught it incorrectly over time, or you’re not using the default dictionary/autocorrect features. All of your examples give me the correct word as an option. I’m also using a Samsung.

Anonymous 0 Comments

The point of autocorrect is to help you spell words you don’t know how to spell. Programmers generally assume you know the first letter, and that even if you made a typo in that letter, you know how to fix it.

So they programmed autocorrect to assume the first letter is correct.

If they didn’t, then when you didn’t know how to spell a word, you’d have to deal with loads of suggestions for words that start with the wrong letter.

Say you type “hense”. You want it to suggest “hence”, not “dense”, “tense”, “sense”, or “mense”, and you’d probably be a little irritated at having to scroll through those suggestions.
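
To make that concrete, here is a minimal Python sketch of a "trust the first letter" rule. It is only an illustration, not how any real keyboard does it: the word list and the `keep_first_letter` flag are made up for the example, and the similarity matching comes from the standard library's `difflib.get_close_matches`.

```python
from difflib import get_close_matches

# Tiny made-up dictionary, just for illustration.
WORDS = ["hence", "dense", "tense", "sense", "fence", "hens"]

def suggest(typed, words=WORDS, keep_first_letter=True):
    """Return close matches, optionally only those sharing the first letter."""
    if keep_first_letter:
        pool = [w for w in words if w[0] == typed[0]]
    else:
        pool = list(words)
    # cutoff is a similarity threshold between 0 and 1.
    return get_close_matches(typed, pool, n=5, cutoff=0.6)

print(suggest("hense"))                           # only words starting with "h"
print(suggest("hense", keep_first_letter=False))  # "dense", "tense", ... appear too
```

With the flag on, a typo in the first letter (like “dence” for “fence”) can never be recovered, which is the trade-off the question is about.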

Anonymous 0 Comments

The most common way to implement autocomplete is with a data structure called a trie, combined with an algorithm that estimates how likely it is that a word is a misspelling of another by counting the character edits it takes to get from word A to word B. The more letters you have right from the start, the more accurate it is. There is a name for the algorithm, but it is escaping me.

Because of the nature of a trie, if you get earlier letters wrong, the space it must search to find your intended target is MUCH larger, and the result can therefore be less accurate. Additionally, as you type more, the trie can be updated to include more common combinations of letters and therefore predict your target more accurately.
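
A rough Python sketch of that idea: a trie of dictionary words, walked while carrying one row of an edit-distance table per node, so branches that are already too far from the typed word get pruned. The names (`TrieNode`, `build_trie`, `search`) and the word list are invented for this example, and real keyboards almost certainly do something more elaborate.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # letter -> TrieNode
        self.word = None     # set when a dictionary word ends at this node

def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root

def search(root, typed, max_dist):
    """Return (word, distance) pairs within max_dist edits of typed."""
    first_row = list(range(len(typed) + 1))
    results = []
    for ch, child in root.children.items():
        _walk(child, ch, typed, first_row, results, max_dist)
    return sorted(results, key=lambda r: (r[1], r[0]))

def _walk(node, ch, typed, prev_row, results, max_dist):
    # Build the next edit-distance row for the prefix ending in ch.
    row = [prev_row[0] + 1]
    for i in range(1, len(typed) + 1):
        insert_cost = row[i - 1] + 1
        delete_cost = prev_row[i] + 1
        replace_cost = prev_row[i - 1] + (typed[i - 1] != ch)
        row.append(min(insert_cost, delete_cost, replace_cost))
    if node.word is not None and row[-1] <= max_dist:
        results.append((node.word, row[-1]))
    # Prune: if every entry exceeds the budget, no descendant can match.
    if min(row) <= max_dist:
        for nxt, child in node.children.items():
            _walk(child, nxt, typed, row, results, max_dist)

root = build_trie(["quick", "wick", "wicked", "which", "brown", "important"])
print(search(root, "wuick", 1))  # → [('quick', 1), ('wick', 1)]
```

Note that to find “quick” the search has to descend into the “q” branch, not just the “w” one. A lookup that only follows the typed prefix never leaves the “w” branch, which is one way a wrong first letter ends up being so costly.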

Anonymous 0 Comments

I just started coding. This is my small understanding.

As we type, autocorrect checks if the words are present in a library of words. If it’s not present, then it’s misspelled and needs to be edited. It can be edited by inserting, deleting, switching, and replacing letters until it forms words that are present in the library. Suggestions can be based on how frequently words are used globally. In the case of smartphones, the words we frequently use and select for autocorrect are counted locally.
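
Here is a small Python sketch of that process, under the assumption that the "library" is just a dictionary of word counts. The counts below are invented for the example; a real keyboard would use far larger global and per-user tables.

```python
import string

# Made-up usage counts standing in for global + local frequency data.
WORD_COUNTS = {"the": 500, "important": 120, "unimportant": 15,
               "quick": 80, "wick": 5, "brown": 60}

def edits1(word, alphabet=string.ascii_lowercase):
    """All strings one insert, delete, switch, or replace away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    switches = [left + right[1] + right[0] + right[2:]
                for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + switches + replaces + inserts)

def suggest(word, counts=WORD_COUNTS):
    """Known words one edit away, most frequently used first."""
    if word in counts:
        return [word]  # already a known word, nothing to fix
    known = [w for w in edits1(word) if w in counts]
    return sorted(known, key=lambda w: -counts[w])

print(suggest("umportant"))  # → ['important']
print(suggest("wuick"))      # → ['quick', 'wick']
print(suggest("teh"))        # → ['the']
```

Nothing in this version treats the first letter specially, so “umportant” and “wuick” are fixed as easily as a typo anywhere else in the word.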

I’ve also felt that autocorrect used to be better for certain things. However, I think AI has changed the needs for autocorrect. It’s no longer just about checking a dictionary for suggestions, or finding typos by comparing what keys are in proximity of each other and often get swapped. Now, with natural language processing, there are other factors such as: what words are commonly used together, what word and syntax patterns a person tends to use.

I think in order to gather better data and save on processing power, autocorrect has less focus on proximity typos and now more on word and sentence comprehension.

Anonymous 0 Comments

Autocorrect isn’t just correcting misspelled words.

It also brings several other factors into play, which essentially come down to one thing: it also predicts what you’re trying to say.

In other words, autocorrect is also informed by autocomplete.

The problem is that autocomplete is pretty much unreliable. If you’ve ever done an autocomplete meme challenge, you know what I’m talking about.

Part of that is training data, which is basically user fault. We sometimes misspell words and we fail to correct them, so the system sees that as a correction and factors that into the whole thing.

Anonymous 0 Comments

There are different metrics you can use for measuring ‘distance’ between different strings, for example https://en.m.wikipedia.org/wiki/Levenshtein_distance vs. https://en.m.wikipedia.org/wiki/Damerau–Levenshtein_distance. Autocorrect systems often order their suggestions using something like this, and different ones might weight differently. Like another commenter mentioned, they might be more likely to assume that you at least got the first letter right in a long word, and so be less likely to suggest corrections that change that.
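
As a rough illustration of the difference between the two metrics, here is a Python sketch of plain Levenshtein distance with an optional adjacent-swap rule (the "optimal string alignment" flavour of Damerau–Levenshtein). The `first_letter_cost` parameter is my own invention to show how a system could weight a changed first letter more heavily; I don't know what weights any real keyboard uses.

```python
def edit_distance(a, b, transpositions=False, first_letter_cost=1.0):
    """Edit distance between a and b.

    transpositions=True also allows swapping two adjacent letters for cost 1.
    first_letter_cost is the (hypothetical) price of replacing the first letter;
    values above 2 change nothing here, since delete + insert costs 2.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = float(i)
    for j in range(1, cols):
        d[0][j] = float(j)
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                sub = 0.0
            elif i == 1 and j == 1:
                sub = first_letter_cost
            else:
                sub = 1.0
            d[i][j] = min(d[i - 1][j] + 1,        # delete from a
                          d[i][j - 1] + 1,        # insert into a
                          d[i - 1][j - 1] + sub)  # replace (or match)
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]

print(edit_distance("teh", "the"))                               # → 2.0
print(edit_distance("teh", "the", transpositions=True))          # → 1.0
print(edit_distance("wuick", "quick"))                           # → 1.0
print(edit_distance("wuick", "quick", first_letter_cost=2.0))    # → 2.0
print(edit_distance("wuick", "wick", first_letter_cost=2.0))     # → 1.0
```

With equal weights “quick” and “wick” tie at distance 1 from “wuick”. Once the first letter costs more to change, “wick” wins, which looks a lot like the suggestions in the question.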

There are also other very different models you might use, like trying to help people guessing at phonetic spelling. They might have a database of frequently phonetically misspelled words and look for potential matches to those as well. On a cellphone specifically they might also look for things related to your keyboard layout — if people tend to miss letters by offsetting from one side more than the other that might also change the weighting they give.
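
The keyboard-layout idea could look something like this in Python. Everything here is an assumption made for illustration: the row offsets, the "1.5 key widths counts as a neighbour" threshold, the 0.5 discount, and the restriction to same-length, lowercase words.

```python
import math

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]  # rough horizontal stagger of a QWERTY layout

# Approximate (x, y) position of each key, in key widths.
KEY_POS = {
    key: (col + ROW_OFFSETS[row], float(row))
    for row, keys in enumerate(ROWS)
    for col, key in enumerate(keys)
}

def key_cost(typed, intended):
    """0 for the same key, 0.5 for a neighbouring key, 1 otherwise."""
    if typed == intended:
        return 0.0
    (x1, y1), (x2, y2) = KEY_POS[typed], KEY_POS[intended]
    return 0.5 if math.hypot(x1 - x2, y1 - y2) < 1.5 else 1.0

def typo_cost(typed, candidate):
    """Sum of per-letter costs; only same-length words are compared here."""
    if len(typed) != len(candidate):
        return math.inf
    return sum(key_cost(t, c) for t, c in zip(typed, candidate))

def rank(typed, candidates):
    return sorted(candidates, key=lambda c: (typo_cost(typed, c), c))

print(rank("dence", ["dance", "dense", "fence"]))  # → ['fence', 'dance', 'dense']
```

In the question's examples the wrong letter is always right next to the intended one (w/q, n/b, d/f, u/i), so a proximity weighting like this would favour the intended words.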

A lot of more modern systems also take context into account to some degree, perhaps using some kind of bigram or trigram model to look at the preceding one or two words and using that to guess how likely the different autocomplete suggestions are to be correct. I can tell it’s at least sort of doing this on my phone; ‘the quick nrown’ suggests “brown” but ‘turn that nrown’ doesn’t make any autocomplete suggestion as I type, perhaps because its model suggests none of the good substitutions are a likely fit.
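
A toy Python version of that kind of context check, using word-pair counts from a tiny made-up corpus. Real systems are trained on vastly more text and are likely more sophisticated; `train_bigrams` and `rank_by_context` are names I made up for the sketch.

```python
from collections import Counter

# Tiny invented corpus standing in for real training text.
CORPUS = ("the quick brown fox jumps over the lazy dog "
          "the brown dog sat down now turn that down")

def train_bigrams(text):
    """Count how often each (previous word, next word) pair occurs."""
    words = text.split()
    return Counter(zip(words, words[1:]))

def rank_by_context(prev_word, candidates, bigrams):
    """Keep candidates seen after prev_word, most frequent first."""
    scored = [(bigrams[(prev_word, c)], c) for c in candidates]
    return [c for n, c in sorted(scored, key=lambda s: (-s[0], s[1])) if n > 0]

bigrams = train_bigrams(CORPUS)
candidates = ["brown", "crown", "grown"]  # plausible fixes for "nrown"
print(rank_by_context("quick", candidates, bigrams))  # → ['brown']
print(rank_by_context("that", candidates, bigrams))   # → []
```

The empty result for “that” mirrors what I see on my phone: when none of the candidate fixes has been seen after the previous word, the model has no reason to suggest any of them.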

Usually the ones on smartphones also have some kind of adaptive model that tries to learn how *you* misspell things in particular, and adds in words that it doesn’t know about but that you tell it are to be accepted.

I have an iPhone, but I get very different results than you with those misspellings in isolation. “umportant” suggests important (and nothing else), “wuick” suggests quick (-> Buick -> wick), “nrown” suggests brown (-> crown -> grown), “dence” suggests fence. However, they give very different suggestions in the middle of a sentence (‘dence’ is usually either “dance” or “Denise”).

Anonymous 0 Comments

Probably because you only ever notice autocorrect being wrong, and don’t pay any attention when it corrects it... correctly. Samsung probably has loads of data that shows people are more likely to make a spelling error towards the end of the word than at the beginning. Maybe that doesn’t apply to your own typing patterns, but it works for most people.