ASCII and Unicode are so-called text encodings. Each is basically a list of characters in a particular order that digital systems agree on, so that they can store and exchange text. ASCII’s list is short (128 characters), while Unicode’s is massive.
Digital systems, like computers, are pretty good at storing and transmitting bits; it’s their native language. Going from bits to bytes, and from bytes to numbers, is pretty straightforward. Letters aren’t numbers, though, so why not just assign a number to each letter? As long as you remember that a particular piece of data was originally text, you can always turn the numbers back into the letters you put in. Storing letters as numbers like this is called a representation.
What decides which letter gets which number is the text encoding. ASCII and Unicode assign so-called code points (indices) to letters, so programs that understand these encodings know which letters to draw when they need to put text on the screen.
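In Python you can look these code points up directly: the built-in `ord()` returns a character’s Unicode code point and `chr()` goes the other way. The helper `describe` below is hypothetical, added only to format the output. Code points below 128 are the ones Unicode shares with ASCII.

```python
def describe(ch: str) -> str:
    """Hypothetical helper: show a character's code point in decimal and U+ notation."""
    cp = ord(ch)
    kind = "ASCII" if cp < 128 else "beyond ASCII"
    return f"{ch!r}: code point {cp} (U+{cp:04X}), {kind}"

for ch in ["A", "a", "é", "€", "😀"]:
    print(describe(ch))

print(chr(65))  # A  (chr() turns a code point back into its character)
```

For example, “A” is code point 65 in both ASCII and Unicode, while “€” is 8364 (U+20AC), which only Unicode has a slot for.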
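UTF-8 is a layer on top of Unicode. Unicode is a very long list (code points go up to U+10FFFF, over a million slots), so storing every code point as a fixed-size number would waste a lot of space. UTF-8 instead turns each code point into a sequence of 1 to 4 bytes, using fewer bytes for smaller numbers. The first 128 code points match ASCII and are stored as a single byte each, which makes UTF-8 backwards compatible with ASCII and generally storage-efficient, especially for languages written in the Latin alphabet.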
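Here is a minimal teaching sketch of how UTF-8 packs a code point into bytes. The function `utf8_encode` is hypothetical and written by hand for illustration; in real code you would just call Python’s built-in `str.encode("utf-8")`, which the loop uses to check the sketch. Small code points get one ASCII-identical byte; larger ones are split across a marker-prefixed lead byte plus continuation bytes of the form `10xxxxxx`.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical hand-written UTF-8 encoder for a single code point."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp < 0x80:     # up to 7 bits: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:    # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in ["A", "é", "€", "😀"]:
    mine = utf8_encode(ord(ch))
    assert mine == ch.encode("utf-8")  # agrees with Python's built-in encoder
    print(ch, mine.hex(" "), f"{len(mine)} byte(s)")

# Storage comparison: UTF-8 vs. a fixed 4-bytes-per-code-point encoding (UTF-32).
text = "Hello"
print(len(text.encode("utf-8")), len(text.encode("utf-32-le")))  # 5 20
```

The loop prints “A” as the single byte `41`, “é” as `c3 a9`, “€” as `e2 82 ac`, and “😀” as `f0 9f 98 80`, which is where the size savings for Latin-alphabet text come from.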