eli5: Lots of websites will have a file hash you can use to verify file integrity; computationally speaking how is this created? Do you really need to inspect every character or just make sure the first and last few are correct? For well known programs should you verify the hash with a third party?
If your goal is to verify file integrity, then you **must** inspect every character. That’s the entire point. You have to check the whole file to make sure it hasn’t been altered; otherwise, someone could have altered (accidentally or otherwise) one of the bytes you didn’t check.
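To see why checking only the first and last few bytes isn’t enough, here is a small sketch. The names `spot_check_matches` and `full_hash_matches` are hypothetical, and SHA-256 (via Python’s standard `hashlib`) is just one common choice, not something the answer specifies. It flips a single bit in the middle of some bytes and compares a “first and last 4 bytes” shortcut against hashing everything.

```python
import hashlib

def spot_check_matches(original: bytes, received: bytes, k: int = 4) -> bool:
    """Hypothetical shortcut: compare only the first and last k bytes."""
    return original[:k] == received[:k] and original[-k:] == received[-k:]

def full_hash_matches(original: bytes, received: bytes) -> bool:
    """Hash every byte of both inputs and compare the digests."""
    return hashlib.sha256(original).digest() == hashlib.sha256(received).digest()

original = b"installer payload: version 1.2.3, build 42"

# Flip one bit of a byte in the middle (simulating corruption or tampering).
tampered = bytearray(original)
tampered[len(tampered) // 2] ^= 0x01
tampered = bytes(tampered)

print(spot_check_matches(original, tampered))  # True: the change was missed
print(full_hash_matches(original, tampered))   # False: the change was caught
```

The shortcut only works if nobody ever changes the bytes in the middle, which is exactly the case you can’t rule out.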
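Hash functions are specifically designed to do this. The algorithm reads through every byte of the file in order, mixing each one into an internal state, and at the end outputs a short, fixed-length value (the hash) that depends on all of them.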
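As a rough sketch of how a published file hash is typically computed, the snippet below assumes SHA-256 (a common choice, though the answer doesn’t name a specific algorithm) and uses Python’s standard `hashlib`. The helper name `sha256_of_file` is hypothetical. The file is read in chunks, so even a huge download never has to fit in memory, but every byte still passes through the hash.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)  # every byte of the file is fed into the hash state
    return h.hexdigest()

# Usage: write some bytes to a temporary file, then hash it in small chunks.
data = b"hello world\n" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

digest = sha256_of_file(path, chunk_size=1024)
print(digest)  # 64 hex characters, no matter how large the file is
os.remove(path)
```

Chunking only changes how the bytes are fed in, not the result: hashing the file in 1 KiB pieces gives the same digest as hashing all of it at once.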
A “good” hash function has something called the “avalanche property”: if you change a *single bit* of the input, then the output is *completely different*. This, along with other mathematical properties, means that it’s very difficult to create two files with the same hash, and damn-near-impossible to create two *similar* files with the same hash. A small alteration, like a transmission error or a subtle bug, will produce a totally different hash.
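Here’s a quick way to see the avalanche property for yourself, again assuming SHA-256 via Python’s `hashlib`. The helper `differing_bits` is a hypothetical name. Changing one bit of the input (“T” becomes “U”) typically flips roughly half of the 256 output bits. The exact count depends on the input, so the sketch prints it rather than promising a number.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

message = b"The quick brown fox jumps over the lazy dog"
flipped = bytearray(message)
flipped[0] ^= 0x01  # change one bit: 'T' (0x54) becomes 'U' (0x55)
flipped = bytes(flipped)

d1 = hashlib.sha256(message).digest()
d2 = hashlib.sha256(flipped).digest()

print(d1.hex())
print(d2.hex())
print(differing_bits(message, flipped), "input bit differs")
print(differing_bits(d1, d2), "of", len(d1) * 8, "output bits differ")
```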
If you receive the hash through the same channel as the file, that protects against transmission errors but not against malicious attackers: anyone who can swap the file can swap the hash too. To protect against attackers, you have to receive or verify the hash through a separate channel. Historically, back when storage and bandwidth were more expensive, people often didn’t host their own downloads. They would link to a copy hosted on a different server (often a university!) and publish the hash on their own page. The hash let you confirm that the other server hadn’t tampered with the file. Since the link and the file lived on different servers, there was actually a point to this.
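The check itself is just a comparison, sketched below under the same SHA-256 assumption. `verify_download` is a hypothetical helper, and the “published” value is simulated in the code. The code can only tell you whether the file matches the hash you give it. Whether that hash is trustworthy depends entirely on where you got it, which is why the separate channel matters.

```python
import hashlib

def verify_download(data: bytes, published_hex: str) -> bool:
    """Compare the SHA-256 of downloaded bytes against a hex digest obtained separately."""
    # Published hashes are often shown in upper case or with stray whitespace.
    expected = published_hex.strip().lower()
    actual = hashlib.sha256(data).hexdigest()
    return actual == expected

# Simulated example: the project's own page publishes the hash,
# while the file itself comes from a mirror on a different server.
file_bytes = b"contents of the downloaded installer"
published = "  " + hashlib.sha256(file_bytes).hexdigest().upper() + "\n"

print(verify_download(file_bytes, published))          # True: file matches
print(verify_download(file_bytes + b"x", published))   # False: file was altered
```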