Why do some word processors convert “I’m” to some jargon-filled thing, e.g. “I’m currently out of the office”?



Because different word processors may use different encodings by default. When a file is converted and the two programs don’t speak the same language, the program fills in each character it can’t read with something else.

This sounds like a problem with character encoding. Basically, each letter or symbol you use is stored as a certain numeric code (its code point), which is written to the file as one or more bytes. You need to know the encoding scheme the document was saved with to make sense of those bytes, though. If the program uses the wrong encoding, it will read them as different symbols than the author intended.
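To make that concrete, here is a minimal Python sketch that reads one stored byte under three different encodings. The byte value 0xE9 and the three codecs are arbitrary choices for illustration, not anything taken from the question.

```python
# One stored byte, read three ways. 0xE9 is just an example value.
raw = bytes([0xE9])

as_latin1 = raw.decode("latin-1")  # ISO 8859-1
as_cp1252 = raw.decode("cp1252")   # Windows Western European
as_cp437 = raw.decode("cp437")     # original IBM PC code page

for name, text in [("latin-1", as_latin1), ("cp1252", as_cp1252), ("cp437", as_cp437)]:
    # Show the character each encoding thinks this byte represents.
    print(f"{name:8} -> {text!r} (U+{ord(text):04X})")
```

The two Western European encodings agree on this byte, while the older DOS code page reads it as a completely different symbol. The file did not change; only the guess about its encoding did.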

The basic Latin letters are in standard positions in most encoding schemes, but other symbols such as the apostrophe (especially a curly apostrophe) can cause issues. This is mostly an old-fashioned problem because modern programs all understand the UTF-8 character encoding standard, and that’s what you should normally save your files with.

I'm is actually fine and will never mess up. I’m will break, however. If you can’t see the difference, that is normal. The first has an apostrophe ('), which is ASCII, and the second has a closing single quote (’), which is not. Not every program is careful about how characters are encoded, especially when files on disk are involved, so they can sometimes get messed up.
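If you want to see the difference between the two marks rather than squint at them, a short Python sketch like this one works. The helper name `fits_in_ascii` is made up for the example, and the characters are written as escapes so the snippet itself cannot be garbled.

```python
ascii_apostrophe = "\u0027"  # the plain ' key on the keyboard
curly_apostrophe = "\u2019"  # right single quotation mark (the curly one)

def fits_in_ascii(ch: str) -> bool:
    """Return True if the character can be written as plain ASCII."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

for ch in (ascii_apostrophe, curly_apostrophe):
    # Code point, ASCII check, and the bytes UTF-8 uses to store it.
    print(f"U+{ord(ch):04X}", fits_in_ascii(ch), ch.encode("utf-8"))
```

The plain apostrophe is a single byte in every ASCII-compatible encoding, which is why it survives. The curly one needs three bytes in UTF-8, and those are the bytes that get misread.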

UTF-8 is a common way to encode characters beyond ASCII, since it represents ASCII as ASCII. However, if you interpret it as extended ASCII with a certain code page, the three 8-bit numbers (bytes) that the computer uses to write the character turn into three arbitrary characters.
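Here is a small Python sketch of exactly that misreading, assuming the "certain code page" is Windows-1252 (a common culprit, but an assumption on my part). It writes "I’m" as UTF-8 and then reads the bytes back with the wrong encoding.

```python
original = "I\u2019m"                   # I + curly apostrophe + m

utf8_bytes = original.encode("utf-8")   # how the text is stored on disk
misread = utf8_bytes.decode("cp1252")   # wrong guess: Windows-1252

print([hex(b) for b in utf8_bytes])     # the apostrophe takes three bytes
print(misread)                          # each of those bytes becomes its own character
```

The "I" and "m" come through untouched because they are ASCII. The three bytes of the apostrophe (0xE2, 0x80, 0x99) are each shown as a separate Windows-1252 character, which gives the familiar "â€™" junk.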

For extra fun, use a system that repeats the incorrect conversion every time the file is saved. A single non-ASCII character can then grow with every save.
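A rough Python simulation of that growth, assuming each "save" writes UTF-8 and each "open" wrongly reads Windows-1252. The function name `bad_save` and the choice of three rounds are just for illustration.

```python
def bad_save(text: str) -> str:
    """Simulate one broken save/open cycle: write UTF-8, read back as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")

text = "\u2019"          # start with a single curly apostrophe
lengths = [len(text)]

for _ in range(3):
    text = bad_save(text)
    lengths.append(len(text))

print(lengths)           # character count after each cycle
```

Every cycle turns each non-ASCII character into two or three new non-ASCII characters, so the junk keeps multiplying instead of settling down.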

Because Microsoft (and some others) thought it would be really nice to silently and secretly replace the standard ' or " characters with fancier proprietary ’ or “” curly quotations (or other characters) which only work properly in the same character set and the same font (which at the time was often only on the same OS running the same software). And/or somewhere along the way, someone opened and re-saved the document in a standard format, so those nonstandard proprietary characters got scrambled.

More recently, standards have been expanded (UTF-8) to include a lot more characters, including those old proprietary characters, so as long as everyone is using the same standard, and all the software is expecting the text in that standard, things should look fine. But the software often incorrectly assumes that things are or aren’t from the older standards, or it just doesn’t even do the new standards right, so it tries to convert and the text gets garbled.
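When the garbling came from one wrong guess like this, it can sometimes be undone by reversing the guess. The Python sketch below assumes the text was UTF-8 misread as Windows-1252; `repair` is a made-up helper name, and it gives up and returns the input unchanged if that assumption does not fit.

```python
def repair(garbled: str) -> str:
    """Try to undo 'UTF-8 read as Windows-1252'. Return the input unchanged on failure."""
    try:
        # Turn the junk characters back into the original bytes, then decode them properly.
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled

print(repair("I\u00e2\u20ac\u2122m"))  # the garbled form of I + curly apostrophe + m
print(repair("I'm"))                   # plain ASCII passes through untouched
```

This only works if no bytes were lost along the way. Some byte values, such as 0x9D, have no Windows-1252 character at all, so text that passed through them may not be recoverable.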

Never use word processors (or other major office software) for any important text. Stick to standard text editors that don’t make assumptions or try to fancy it up for you.