Every so often, if I open a non-text based document in either Microsoft Word or Notepad, it will open a massive file with an endless wall of completely garbled, gibberish text, most of the characters being either rectangle boxes or characters that can’t normally be typed. What does each of these characters represent? What happens if I insert or delete these characters?
Usually files refuse to open in a program that doesn’t support their format. How do these text-processing programs somehow manage to open virtually any file?
Nobody understands how to do ELI5 anymore.
# ELI5
So we all accept that everything in a computer is just 1s and 0s, right?
Long ago, people got together and decided, “Hey everyone, let’s all treat 1011 as the letter A.”
And so everyone agreed. For text, treat 1011 as A, and 1100 as B, and so on.
But then someone came along and wanted to use 1s and 0s to convey sound. “A” and “B” don’t mean anything for sound. Instead, they want 1011 to mean “a sound at 440 Hz at volume 6” (for example).
And then someone else came along and wanted to use 1s and 0s to convey images. They want 1101 to mean “Set this pixel to navy blue intensity 8”.
So you’ve got a file. It’s a bunch of 1s and 0s.
Should we interpret those 1s and 0s as text (1011 means A)?
Should we interpret those 1s and 0s as sound?
Should we interpret those 1s and 0s as images?
The extension (ex: txt, doc, xls) gives you a *hint* about what’s inside, but it’s no guarantee.
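To make that concrete, here is a small Python sketch that reads the same four bytes three different ways. The byte values and the three "formats" (Latin-1 text, 16-bit audio samples, one RGBA pixel) are picked purely for illustration; real file formats are more involved.

```python
import struct

# Four arbitrary bytes, chosen only for this illustration.
data = bytes([72, 105, 33, 0])

# Read as text: Latin-1 maps every byte value to exactly one character.
as_text = data.decode("latin-1")

# Read as sound: two signed 16-bit little-endian samples.
as_samples = struct.unpack("<2h", data)

# Read as an image: one pixel with red, green, blue, and alpha values.
as_pixel = tuple(data)

print(repr(as_text))   # 'Hi!\x00'
print(as_samples)      # (26952, 33)
print(as_pixel)        # (72, 105, 33, 0)
```

Same 1s and 0s, three completely different meanings. Nothing in the bytes themselves says which reading is the "right" one.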
So you open Notepad, and it will only interpret things as text. That’s its job, that’s what it knows how to do. And the extension (ex: jpg) is kind of meaningless, Notepad is happy to try to interpret the file as text.
And it finds “010000010110”, and it says, “Hmm. [It says here](https://unicode-table.com/en/#0416) that, if treated as text, that string of 1s and 0s should be: Ж.”
That’s where you get those weird characters. It’s trying to interpret those 1s and 0s as text. There are some strings of 1s and 0s that don’t have ANY corresponding letter, so it does its best, and sometimes those show up as a square or rectangle.
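Here is a rough Python sketch of the same idea. The first part looks up the bit string from the example as a Unicode code point. The second part decodes the first few bytes of a typical JPEG file as if they were UTF-8 text. The choice of UTF-8 and of the replacement-character behavior is an assumption for illustration; a real text editor may guess a different encoding and draw unknown characters differently.

```python
# The bit string from the example, read as a number, is code point U+0416.
zhe = chr(0b010000010110)
print(zhe)  # Ж

# The first bytes of a typical JPEG file, which were never meant to be text.
jpeg_start = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# Decode them as UTF-8 anyway. Bytes that do not form a valid character
# are swapped for U+FFFD, the "replacement character", which many fonts
# draw as a box or a question mark in a diamond.
garbled = jpeg_start.decode("utf-8", errors="replace")
print(repr(garbled))
```

The readable "JFIF" at the end survives because those four bytes happen to be ordinary letters under almost any text encoding. Everything around them turns into junk.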
What happens if you insert/delete/change those characters?
Well, say you type a “K” where the “Ж” was.
You’ve just changed the 1s and 0s.
And if the file was meant to be interpreted as images, now you’ve told the thing that interprets those 1s and 0s to put a yellow pixel instead of a blue one.
It could be as harmless as that. Or it could tell your IV pump to send a lethal dose of medicine. It all depends on what’s interpreting those 1s and 0s, and how it’s doing the interpretation.
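A tiny Python sketch of that edit, assuming a made-up "image" that is just one pixel stored as three bytes (red, green, blue) and a text editor that uses Latin-1, so each character is exactly one byte:

```python
# One pure-blue pixel: red=0, green=0, blue=255 (hypothetical raw format).
original = bytes([0, 0, 255])

# What a Latin-1 text editor would show: two control characters and "ÿ".
as_text = original.decode("latin-1")

# The user types "K" over the last character and saves the file.
edited_text = as_text[:2] + "K"
edited = edited_text.encode("latin-1")

print(tuple(original))  # (0, 0, 255)
print(tuple(edited))    # (0, 0, 75)
```

The letter "K" is byte value 75, so the image program now reads a much darker blue. The editor had no idea it was touching a color.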
> Usually files would refuse to open with an incompatible format
Sure. You open the file as a jpg and the jpg interpreter says, “Whoa! I just found 001001 and I have no idea what that means! I better stop and alert the user that this file is incomprehensible!”
But Notepad doesn’t do that because just about every combination of 1s and 0s can be some kind of character.
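The difference between a picky reader and an easygoing one can be sketched in Python like this. The function names are hypothetical, the JPEG check only looks at the two-byte start marker (FF D8) that JPEG files begin with, and the Latin-1 decoding stands in for whatever encoding a real text editor would pick.

```python
def open_as_jpeg(data: bytes) -> bytes:
    """Picky reader: refuse anything that does not start like a JPEG."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG: missing start-of-image marker")
    return data


def open_as_text(data: bytes) -> str:
    """Easygoing reader: Latin-1 has a character for every possible byte."""
    return data.decode("latin-1")


every_byte = bytes(range(256))

# The text reader accepts all 256 possible byte values without complaint.
text = open_as_text(every_byte)
print(len(text))  # 256

# The JPEG reader gives up on the very first check.
try:
    open_as_jpeg(every_byte)
    jpeg_rejected = False
except ValueError as err:
    jpeg_rejected = True
    print(err)
```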