Why do computers sometimes change special characters like “&” into “&amp;” or “&nbsp;”?


“&amp;” is the HTML entity for an ampersand, “&”. Seeing it in the text is a mistake: the entity was displayed literally instead of being decoded into the character it stands for.

You will see this, for example, in URLs, where the & character has a special meaning for the computer. In this context, & is used as a separator between two character strings.

But then how do you have & as part of the string? Inside HTML, you replace it with &amp; (amp for ampersand, the name of the & character). In the URL itself, a literal & is percent-encoded as %26.

The same goes for &nbsp;, which means non-breaking space.
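As a rough illustration of the two layers involved, here is a minimal Python sketch using only the standard library. The address `https://example.com/search` and the parameter names are made up for the example. It shows an & inside a value being percent-encoded for the URL, and the & that separates parameters being written as an entity once the URL is placed inside HTML.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical search parameters; the first value contains a literal "&".
params = {"q": "salt & pepper", "page": "2"}

# URL layer: the "&" inside the value becomes %26, while the "&"
# between the two parameters stays a plain separator.
query = urlencode(params)
print(query)  # q=salt+%26+pepper&page=2

# HTML layer: when the URL is written into a page, the separator "&"
# is itself written as the entity &amp;.
url = "https://example.com/search?" + query  # made-up address
link = '<a href="' + escape(url) + '">results</a>'
print(link)
```

The browser turns the `&amp;` in the link back into a plain & before it follows the URL, so the server still sees two separate parameters.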

This phenomenon comes from HTML, the format used on the World Wide Web. HTML reserves a set of characters to express its syntax, for example to indicate the start of code words (tags). An example of a code word is <em>, which starts *emphasized text*. Other unusual characters are unsafe to use or may hold a special meaning in file systems or other computer languages on the back end of the web server.

To transmit these characters, they are substituted by replacement strings called entities. Every web browser has a list of them and replaces each one with the corresponding character before displaying the web page. The mathematical symbols < and > (“less than” and “greater than”) are encoded as &lt; and &gt;. &amp; stands for “ampersand”, and &nbsp; is a non-breaking space, which prevents text from wrapping to a new line at that point.
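The replacement step a browser performs can be imitated with Python's standard `html.unescape` function. This is only a sketch of the idea, not how any particular browser is implemented, and the sample string is invented.

```python
from html import unescape

# A made-up piece of page source containing four entities.
source = "&lt;em&gt; fish &amp; chips&nbsp;today"

# Replace every entity with the character it stands for.
text = unescape(source)
print(text)

# &nbsp; becomes U+00A0, the non-breaking space, not an ordinary space.
print("\u00a0" in text)  # True
```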

When you see these special strings, a mistake has happened in the code of the website. It may have converted a character twice: & -> &amp; -> &amp;amp;. When the browser receives the string “&amp;amp;”, it decodes the complete entity in the first 5 characters back into “&” and leaves the trailing “amp;” untouched, so “&amp;” is displayed. Other possibilities are an ampersand encoded with a numeric character code (such as &#38;), or an omitted semicolon.
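The double conversion is easy to reproduce. In this sketch, Python's `html.escape` plays the role of the website's code and `html.unescape` plays the role of the browser; the sample text is arbitrary.

```python
from html import escape, unescape

raw = "Tom & Jerry"

escaped_once = escape(raw)            # correct: escaped one time
escaped_twice = escape(escaped_once)  # the bug: escaped a second time

print(escaped_once)   # Tom &amp; Jerry
print(escaped_twice)  # Tom &amp;amp; Jerry

# The "browser" decodes exactly once in both cases.
print(unescape(escaped_once))   # Tom & Jerry
print(unescape(escaped_twice))  # Tom &amp; Jerry  (what the reader sees)

# The numeric character code for an ampersand decodes the same way.
print(unescape("&#38;"))  # &
```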

This happens because of a bug in the software being used. The content has been “double escaped”.

In the code that is used to generate webpages (HTML), the & character is used in a special way that tells the browser to render specific characters. `&nbsp;`, for example, is a code that tells the browser to render a space character that cannot be used as a break point for word-wrapping; several of them in a row are also not collapsed into one. Because the & is used for these special codes, if you want to write an actual & character, you need to use a special code for it, which is `&amp;`.

When you see these codes in the text rather than what they are supposed to represent, that is a result of the software processing the text twice. For example, the first time it processes the text it converts & into `&amp;`. The second time, it converts `&amp;` into `&amp;amp;`. Your browser then renders the `&amp;` as &, followed by the leftover `amp;`.
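One common way to avoid this kind of bug, sketched here under assumptions rather than taken from any specific website, is to keep the text in its raw form and escape it exactly once, at the moment it is written into the page. The function name `render_comment` and the sample text are hypothetical.

```python
from html import escape

def render_comment(raw_text: str) -> str:
    """Hypothetical helper: the only place where escaping happens."""
    return "<p>" + escape(raw_text) + "</p>"

stored = "R&D <team>"  # kept unescaped in storage

# Escaped once, at output time: the browser will show the original text.
page = render_comment(stored)
print(page)  # <p>R&amp;D &lt;team&gt;</p>

# The bug: the text was already escaped before it reached the helper.
buggy = render_comment(escape(stored))
print(buggy)  # <p>R&amp;amp;D &amp;lt;team&amp;gt;</p>
```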

In HTML (the language used to make a website), & has a special meaning. It’s used to signal that you’re about to encode an unusual character; the code is called an HTML entity, such as `&permil;`, which turns into ‰. But what if you want to just show an & without it getting interpreted as having a special meaning? In that case, you need to encode the & itself, like this: `&amp;` (for ampersand). When you’re writing text which is going to be converted to HTML, the special characters need to be converted into HTML entities in order to be displayed correctly. Unfortunately, sometimes this process happens twice in a row by mistake, and the ampersand that is there as proper HTML gets converted again. So you end up with & turning into `&amp;`, which turns into `&amp;amp;`, resulting in what you see.
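A small Python sketch of both directions, using the standard `html` module with an arbitrary sample string: decoding `&permil;`, and converting text for use in HTML. Note that this particular function, `html.escape`, only converts the characters that have a meaning in HTML syntax (&, < and >, plus quotes by default) and leaves a character like ‰ as it is.

```python
from html import escape, unescape

# Decoding: the named entity turns into the per-mille sign.
print(unescape("&permil;"))  # ‰

# Encoding: only the syntax characters are replaced by entities.
escaped = escape("5 ‰ < 1 % & rising")
print(escaped)  # 5 ‰ &lt; 1 % &amp; rising
```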