No. File integrity is primarily checked using hashes. A hash function is a process that takes data of any length and very efficiently produces a fixed-length string of numbers known as the hash. The math behind it is chosen so that, for all intents and purposes, the hash is unique to the given input* and any slight change in the input results in a completely different hash.
So the server just needs to know what the hash is. It can then run the hashing program against your files and see if the hash produced matches what it has on file.
^((* – given the variable-length input and fixed-length output, hashes cannot be truly unique, but the inputs that produce identical outputs are unrelated to each other, and for the purposes of this discussion the odds of two functioning programs having the same hash are negligible for a well-designed hashing algorithm))
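To make that concrete, here's a minimal Python sketch of that check. The file name and the expected hash value are made-up placeholders, and SHA-256 is just one common choice of hashing algorithm:

```python
# Minimal sketch: recompute the file's hash and compare it to the value
# the server publishes. File name and expected hash are placeholders.
import hashlib

def sha256_of_file(path: str) -> str:
    """Hash the file in chunks so even huge files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"  # placeholder
if sha256_of_file("game_patch.bin") == expected:
    print("Hashes match: the file is intact")
else:
    print("Hash mismatch: the file is corrupt or has been altered")
```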
Algorithms can compute hash keys that are effectively unique to each file. Those are way smaller than the actual file and are used to compare the file on the server with your local file.
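Just to show the scale (a quick sketch using Python's hashlib; the inputs are arbitrary), the hash stays the same tiny size no matter how big the input is:

```python
import hashlib

small = b"hello"
large = b"x" * 10_000_000  # ~10 MB of arbitrary data

# Both digests are 64 hex characters, regardless of input size.
print(hashlib.sha256(small).hexdigest())
print(hashlib.sha256(large).hexdigest())
```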
It's like describing your car: the color, brand and number plate are enough for others to identify it, without you having to list the number of seats and wheels, engine details, manufacture date, etc. Sorry, that's the best example I could come up with, I'm sure there are better ones.
While hashing is definitely a thing, there are simpler options. For instance, you can check the size of the file. A file could be corrupted and still have the right size, but if the size is wrong then you know for sure the file is wrong. The next level of complexity is called a [checksum](https://en.wikipedia.org/wiki/Checksum), which is a lot like measuring the length but in more dimensions, and, depending on how your checksum works, you can use it to find out where the problem is, if there is one.
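Here's a toy sketch of that idea (the file names, the CRC-32 checksum, and the 4 KB block size are arbitrary choices): check the cheap thing first, then checksum the file block by block so a mismatch also tells you roughly where the damage is.

```python
import os
import zlib

def block_checksums(path: str, block_size: int = 4096) -> list[int]:
    """CRC-32 of each fixed-size block; a mismatch points at the damaged region."""
    sums = []
    with open(path, "rb") as f:
        while block := f.read(block_size):
            sums.append(zlib.crc32(block))
    return sums

# Cheapest check first: if the sizes differ, the file is definitely wrong.
if os.path.getsize("local_copy.bin") != os.path.getsize("server_copy.bin"):
    print("Size mismatch: the file is definitely wrong")
else:
    local = block_checksums("local_copy.bin")
    server = block_checksums("server_copy.bin")
    for i, (a, b) in enumerate(zip(local, server)):
        if a != b:
            print(f"Block {i} differs (somewhere in bytes {i * 4096}..{i * 4096 + 4095})")
```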
Enough people have explained hash sums, but it's important to understand that a server might not know about corrupted files at all. When transferring a file to or from the server, networking protocols like TCP and the transfer tools make sure the data isn't corrupted in transit. But that's it. If the file is already corrupt at the source, or becomes corrupt after being stored (through bit rot or storage defects), the server will never know or care unless some external software or use case actively checks integrity by comparing the actual hash sum against the expected value. That check isn't something a server does inherently. Generally, the server will just keep hosting the corrupt file.
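An external check of that kind might look something like this sketch, where hashes were recorded in a manifest while the files were known to be good (the manifest format and file paths are made up for illustration):

```python
# Sketch: re-hash stored files and compare against hashes recorded earlier,
# to catch silent corruption (bit rot) that the server itself never notices.
import hashlib
import json

def sha256_of_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# manifest.json maps file paths to the hashes they had when known-good.
with open("manifest.json") as f:
    manifest = json.load(f)

for path, expected in manifest.items():
    status = "OK" if sha256_of_file(path) == expected else "CORRUPT"
    print(f"{path}: {status}")
```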