eli5: Lots of websites will have a file hash you can use to verify file integrity; computationally speaking how is this created? Do you really need to inspect every character or just make sure the first and last few are correct? For well known programs should you verify the hash with a third party?
**How is a hash created?**
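A hash is created using “one-way functions”: mathematical functions that are easy to perform in one direction, but believed to be extremely difficult to perform in the other direction. (People sometimes call these “trapdoor functions,” but strictly speaking a trapdoor function is a one-way function with a secret shortcut for reversing it, and hash functions don’t have one.)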
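For example, if I give you two random prime numbers, 7,949 and 8,161, you can easily multiply them together and find that their product is 64,871,789. However, if I just give you the number 64,871,789 and tell you to find the two prime numbers that multiply together to make it, you will have an extremely difficult time figuring it out by hand. A computer can still crack a number this small almost instantly, but when the primes are hundreds of digits long, no known method can find them in a reasonable amount of time.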
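To make the "easy one way, hard the other way" idea concrete, here is a small Python sketch using the same two primes. The function name `factor_semiprime` is made up for this illustration, and trial division is deliberately the most naive search possible; it is not how real hash functions work.

```python
def factor_semiprime(n: int) -> tuple[int, int]:
    """Find two factors of n by trial division (a deliberately naive search)."""
    if n % 2 == 0:
        return 2, n // 2
    candidate = 3
    # Only odd candidates up to the square root of n need to be tried.
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate, n // candidate
        candidate += 2
    raise ValueError("n has no smaller factors, so it is prime")

p, q = 7949, 8161

# Forward direction: a single multiplication.
product = p * q

# Reverse direction: thousands of trial divisions before a factor turns up.
factors = factor_semiprime(product)

print(product)
print(factors)
```

The forward direction stays cheap no matter how large the primes get, while this kind of search gets slower and slower as the numbers grow.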
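A hash function works on the same principle: easy to run forward, impractical to run backward. It does a lot of binary math on the 1s and 0s of the file. Widely used hash functions such as SHA-256 scramble the bits with shifts, rotations, and additions rather than by multiplying primes, so the prime example is only an analogy. This means that for every file, you can produce a hash, but it’s extremely difficult to take a hash and produce a useful file that matches it.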
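If you want to see one in action, here is a minimal sketch using Python's built-in `hashlib` module. SHA-256 is assumed here as a common choice (the website you download from may use a different algorithm), and `sha256_hex` is just a helper name for this example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

short_hash = sha256_hex(b"hi")
long_hash = sha256_hex(b"x" * 10_000_000)  # roughly 10 MB of input

print(short_hash)
print(long_hash)
print(len(short_hash), len(long_hash))  # both are 64
```

Going the other way, from one of those printed strings back to the input, is the part nobody knows how to do efficiently.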
**Do you really need to inspect every character?**
You *must* use the entire file to do this because, by design, hash functions are extremely sensitive to small changes, and the entire file is used in the computation. The output is always the same length no matter how big the input is, so you can take a file that’s three gigabytes large and end up with a hash that’s just 64 characters long (that’s the length of a SHA-256 hash, a common choice). If you omit or change any part of the file, even a single bit, then the hash changes to something completely different.
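Here is a rough sketch of how a tool might hash a whole file without loading it all into memory, followed by a demo of what happens when a single bit changes. The helper name `sha256_file`, the chunk size, and the temporary demo file are all assumptions made for this example.

```python
import hashlib
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file by feeding it to SHA-256 one chunk at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Every byte of the file passes through update(), just not all at once.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")

    original = b"A" * 100_000
    with open(path, "wb") as f:
        f.write(original)
    hash_before = sha256_file(path)

    tampered = bytearray(original)
    tampered[50_000] ^= 0x01  # flip a single bit in the middle of the file
    with open(path, "wb") as f:
        f.write(tampered)
    hash_after = sha256_file(path)

print(hash_before)
print(hash_after)
# Count how many of the 64 hex characters differ between the two hashes.
print(sum(a != b for a, b in zip(hash_before, hash_after)))
```

Reading in chunks is what makes a three-gigabyte file practical to hash: memory use stays small, but nothing is skipped.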
You *could* do what you said and only check part of the file, but this is dangerous. If you only check part of the file, then an attacker could change the part that you’re not checking and you would have no idea.
For example, if I give you a number like 111222333444, and your algorithm only checks the first and last two digits — “11” and “44” — then I could change the number to something like “110000000044” and your algorithm wouldn’t be able to tell the difference.
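The same example as a quick Python sketch. `weak_check` is a made-up, intentionally flawed function that only looks at the first and last two characters; the full SHA-256 hash is shown next to it for comparison.

```python
import hashlib

def weak_check(value: str, expected: str, n: int = 2) -> bool:
    """Flawed on purpose: compares only the first and last n characters."""
    return value[:n] == expected[:n] and value[-n:] == expected[-n:]

original = "111222333444"
tampered = "110000000044"

# The partial check is fooled because the ends were left alone.
fooled = weak_check(tampered, original)

# Hashing everything catches the change in the middle.
hashes_match = (
    hashlib.sha256(tampered.encode()).hexdigest()
    == hashlib.sha256(original.encode()).hexdigest()
)

print(fooled)        # True
print(hashes_match)  # False
```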
**For well known programs should you verify the hash with a third party?**
The trouble with using the hash displayed on the same website you downloaded the file from is that, if a hacker can replace the file with something malicious, then they can probably also replace the hash to match their file. A hash is only useful if you can trust the source that gave it to you. Getting the hash from an independent source helps, because then the attacker would have to compromise both places.
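A small sketch of that problem, assuming SHA-256 and using made-up byte strings in place of real downloads. `matches_published_hash` is a hypothetical helper, not part of any real tool.

```python
import hashlib

def matches_published_hash(data: bytes, published_hex: str) -> bool:
    """Check downloaded bytes against a hash string copied from somewhere."""
    actual = hashlib.sha256(data).hexdigest()
    # Hex strings are sometimes published in uppercase or with stray spaces.
    return actual == published_hex.strip().lower()

genuine = b"genuine installer bytes"
malicious = b"malicious installer bytes"
genuine_hash = hashlib.sha256(genuine).hexdigest()

# Case 1: the attacker swaps the file but not the hash. The check catches it.
swapped_file_only = matches_published_hash(malicious, genuine_hash)

# Case 2: the attacker swaps the file AND the hash. The check passes anyway.
attacker_hash = hashlib.sha256(malicious).hexdigest()
swapped_both = matches_published_hash(malicious, attacker_hash)

print(swapped_file_only)  # False
print(swapped_both)       # True
```

The second case is why a matching hash only tells you that your file is the same one the hash's publisher had, not that the file is safe.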
Luckily, most well-known apps are “signed.” I won’t go in-depth on how code-signing works because that’s a whole other discussion. Just know that when an app is signed, your computer can check that it really came from the publisher named in the signature and that it hasn’t been modified since it was signed.
When you run an app on your computer, the signature is checked before the app runs. So, if you have a well-known app, then it’ll run without problems. It might even pop up a box asking if you want to run it. When that box says something like “Published by [whatever corporation],” then you know the signature is good.
If your computer doesn’t recognize the app — either because it’s not signed or because the signature is bad — it will pop up a warning of some kind that either refuses to run the app or tells you that it’s from an unknown publisher.
Thanks to code signing, when you’re running a well-known app, you can trust your computer to verify it for you, and there’s no need to verify the hash manually.
**Bonus question: what if it’s not a well-known app?**
If it’s some random file, then remember, the hash is only as good as the source that gave it to you. Just because the hash matches doesn’t mean that the file is safe. It only means that the person who gave you the hash has the same file that you have. You need to make sure that you trust the person who created the file *and* the person who gave you the hash.