Every so often, if I open a non-text based document in either Microsoft Word or Notepad, it will open a massive file with an endless wall of completely garbled, gibberish text, most of the characters being either rectangle boxes or characters that can’t normally be typed. What does each of these characters represent? What happens if I insert or delete these characters?
Usually files would refuse to open with an incompatible format. How do these text-processing programs somehow manage to open virtually any file?
All digital computer data these days is made up of ones and zeros. How your computer reads that data depends on what format it is expecting.
For instance, the way computers typically store the character ‘a’ is with the binary 01100001. When reading a .txt file, the computer reads 8 bits at a time, then consults the [ASCII table](http://www.gcsecs.com/uploads/2/6/5/0/26505918/ascii-table_orig.png) to translate what that group of ones and zeros means. Every time it sees 01100001, it turns that into ‘a’ by the time it reaches your screen.
There are other ways to read 01100001 though. If the computer is expecting an 8-bit number to be there instead of a character, it will decode 01100001 to mean the number 97, because that’s what 01100001 is when converted from base 2 to base 10.
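To make the "same bits, two meanings" idea concrete, here is a small Python sketch. The variable names are just for illustration; it only assumes plain ASCII as the text encoding.

```python
# One byte, two readings: the bits never change, only the interpretation does.
bits = "01100001"

value = int(bits, 2)           # read the bits as a base-2 number
raw = bytes([value])           # the same 8 bits stored as a single byte
as_text = raw.decode("ascii")  # read that byte as an ASCII character

print(value)    # 97
print(as_text)  # a
```

Nothing about the byte itself says "I am a letter" or "I am a number". The program reading it decides.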
When you open a non-text file in Notepad, you’re feeding that non-text information into a binary-to-text decoder. It’s reading information that was never meant to represent text and telling you what those ones and zeros would be if converted to text.
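It may come across a 16-bit integer 1000000011111111 meant to express the number 33023, but it’s expecting characters represented by groups of 8 bits, so it sections off 10000000 (128) and then the next 8 bits, 11111111 (255), and tries to show each one as a character. Neither value is in the 7-bit ASCII table, so what you see depends on the encoding the editor assumes: in Windows-1252, for example, they show up as ‘€’ and ‘ÿ’, and where no character is defined you get a box or a ‘?’ placeholder.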
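Here is a rough Python sketch of that misreading. Two things in it are assumptions, not something the file itself tells you: the integer is stored high byte first (big-endian, matching the order the bits are written above), and the editor decodes bytes as Windows-1252 (`cp1252`). A different byte order or encoding would produce different gibberish.

```python
import struct

number = 0b1000000011111111      # the 16-bit integer 33023
raw = struct.pack(">H", number)  # assumed big-endian storage: bytes 0x80, 0xFF

# What a text editor does: treat every byte as one character.
as_cp1252 = raw.decode("cp1252")                  # assumed editor encoding
as_ascii = raw.decode("ascii", errors="replace")  # strict ASCII has no such characters

print(list(raw))   # the two byte values, 128 and 255
print(as_cp1252)   # two unrelated characters instead of a number
print(as_ascii)    # replacement characters, often drawn as boxes or '?'
```

The two bytes only mean 33023 to a program that knows to read them together as one number. A text editor reads them one at a time, so the original meaning is lost on screen even though the bits are untouched.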