Hash does not result in a unique number. The goal of a hash is to demonstrate that the original file has not changed, not to identify or recreate that unique file.
Consider these different ways of adding up to 10.
1 + 4 + 2 + 3
2 + 2 + 1 + 3 + 2
5 + 2 + 1 + 2
3 + 6 + 1
Okay, all of those strings add up to 10. If you change any number in a string, that string no longer adds up to 10. The purpose of the hash is to alert you to any kind of corruption or replacement to the file you think you are getting. You are provided with a hash and a file. If they still match when you add up the numbers again, you can be confident the file has not been altered.
But, if all you have is a hash, you have no way of know which of the ways of adding up to 10 the file represents.
Expand this to the scale actually used in practice, you’re not wondering “which way of adding up to 10 did the original file use”. It’s “which way of adding up to some number on the order of trillions (actually, even much bigger than that) produced this result? It can’t be reversed engineered. There are billions of possible “right” answers.
Hashes are one way. All you are doing is checking that you’re still getting the same answer so you know nothing was changed. You’re not determining what the contents of the source file are.
One of the methods used in hashing is modulo arithmetic, which means taking the remainder of a division. As an example, most people would say 5 divided by 2 is 2.5, but with modulo, you would say 2 goes into 5 twice with 1 remainder. However, there are multiple ways to get 1 remainder. You could also have 5 modulo 4, or 7 modulo 2.
If you consider modulo as a little ingredient and your hash is a recipe made of 100 ingredients you can see how it’s easy to turn plaintext into a hash but hard to turn a hash into plaintext.
Even insecure hashes like MD5 can’t be reversed in the traditional sense; the best you can hope for is a collision, which means that two inputs produce the same output (which isn’t supposed to happen; in practice this would mean you could log into an account with two different passwords.)
There is a good analogy for this:
It’s easy to mix 2 colors together – just squeeze them on a piece of paper and stir.
But if I gave you a piece of paper with a blob of color and ask you to separate it into two colors which made it, you wouldn’t be able to do that. (Technically you could, with atomic tweezers and an electronic microscope … but that’s the point, doing the opposite action is waaay harder)
Many commenters are invoking the idea of “multiple inputs could produce the same output,” and while that is true of hash functions (their whole purpose is to produce a fixed-size digest from a much larger message), it’s not the real reason cryptographic hash functions are difficult to reverse. After all, you can have a one-to-one cryptographic hash function that’s very difficult to reverse: take for example SHA-256 but restricted to an fixed input size of 256-bit messages. Assuming no collisions, it’s a one-to-one correspondence, with every input mapping uniquely to one output, and yet you still could not for the life of you reverse it.
The reason is because the hash functions are *designed* that way, intentionally. They are designed to maximize properties like the [avalanche effect](https://en.wikipedia.org/wiki/Avalanche_effect), minimizing linearity between input and output, using techniques like [diffusion and confusion](https://en.wikipedia.org/wiki/Confusion_and_diffusion). The avalanche effect means that flipping a single bit in the input should result in flipping roughly half the bits in the output. This makes it very difficult to find a preimage given an image besides using exhaustive search (bruteforce).
Also keep in mind we strongly *believe* SHA-256 can’t be reversed easily, meaning the best strategy our best minds have discovered so far is little better than just bruteforcing it, and with a search space of 2^(64)-1 bits (not 64 bits, but *2^64-1* bits, i.e. 18 quintillion bits), which is the maximum message size SHA-2 works on, there is no hope of bruteforce if outputs truly are effectively randomly distributed with no patterns or predictable structure in relation to the input.
But that’s just a belief. There’s not a mathemtical proof that SHA-256 doesn’t have structural weaknesses that could be attacked. Fundamentally, SHA-256 and other cryptographic hash functions are what we believe to be [one-way functions](https://en.wikipedia.org/wiki/One-way_function), functions that are easy to compute (taking polynomial time), but difficult (meaning super-polynomial time, e.g., exponential) to reverse. Whether one-way functions exist is an open question. If one-way functions exist, then P ≠ NP. Contrapositively, if P = NP, then one-way functions cannot exist, meaning no cryptographic hash function is fundamentally hard to reverse.
I divide a number by 5, and I get that the remainder is 2. What was the number I divided? Well, there are infinitely many possibilities because of the way division and remainders work. Possible answers include 2, 7, 12, 17, 22, 27…etc. It is an impossible question to answer with a single “correct” answer. This is an example of a one-way operation where you can get a single answer going one way, but cannot get a single answer going back the other way.
It’s also worth keeping in mind that some operations in a computer are slow compared to other operations. Addition and subtraction are very fast. Multiplication can be fast (if you multiply by a power of 2) or relatively slow. Division is extremely slow. So, if I tell you that I multiplied two prime numbers together to get a product P, and I ask you to figure out which two numbers I started with, that is going to take a long time because I have to divide P by 2, then divide P by 3, then divide P by 5…. and all the prime numbers up to squareroot(P) to try and find the prime factors. That’s a lot of divisions and remember that divisions are very slow. It’s not that the computer can’t do it, it’s that *the computer can’t do it in a reasonable amount of time*.
So the reason why SHA256 and many other hash algorithms can’t be reversed is because of use of one-way functions and factorization routines that have too many divisions.
BUT this isn’t fullproof. If I say that I have a password “password123” and I know that the hash of that password is XYZ, I can just look through my database of password hashes and find everybody whose password hash is XYZ and then I know that their password must be “password123”. People compile large tables of these kinds of results of hashes of common passwords, and then you can just go down the list comparing each until you find a match and then you know what the password is. This is why websites want you to use a “strong password” because that means it’s less likely to appear in one of these tables.
I’m seeing a lot of answers her that are over-complicating things.
Hashes are typically a fixed size, and much smaller than the data used to generate it. For example, let’s suppose you have a 1-bit hash, just outputs 1 or zero. Let’s say you put in “Hello” and get out a ‘1’. How so you reverse this hash to get back the input? Short answer is you don’t.
Longer answer is that when you hash, you typically lose information, and you can’t get it back from just the hash, and for a given hash, the are generally multiple inputs that can give you the same hash.
A good hash will make hash collisions unlikely for the space input data it’s intended for, and will be irreversible.
If you want a second example, let’s say I add two numbers together and take the remainder of the sum divided by 10. Let’s say the input is 15 and 24. This gives 15+24 = 39, and the remainder after dividing by 10 is 9. Now, given only the number 9 as the output, could you tell me what the inputs were? You can’t because there are infinite ways to add two numbers that end with 9. This is a lot closer to what actual hash functions look like.
There are other operations that are very hard to reverse. Like a sum of many numbers into one. If I add N numbers (with N > 1) and the result is 18429491, how can you say which numbers were used to derive that result? They can be whatever…
Also, there’s the [Pigeonhole Principle](https://en.wikipedia.org/wiki/Pigeonhole_principle)…
In sha256, there are only 256 bits of information, but you can hash any number of bits into those 256 bits. So there must be multiple sources that produce the same hash (even though it’s pretty much impossible to find a collision).
Latest Answers