There are plenty of mathematical operations that are easy to do one way but very hard to do the other way.
For example, you could multiply two prime numbers like 4952019383323 and 16123689073 fairly easily. However, if I gave you the result and asked you “which two primes multiply to form this number” it would be much harder.
Hash functions make use of these kinds of operations.
That said, since hash functions are deterministic people have created mappings between inputs and outputs to common hash functions (called rainbow tables). There are ways of dealing with issues like this (i.e., salting).
The thing about how these hash algorithms work is they reduce and mix the data in such a way that you just don’t have enough data to reverse them to find the original.
Lets demonstrate with an extremely simple hash algorithm. We can make A=1, B=2, C=3 and so on. I’ll choose a word, add the numbers made from the letters together, and then take just the last 2 digits.
The hash of the word is 82.
What is the word? It could be anything! In fact there are lots of possible words that would make the same hash. It’s impossible to even say how long the word is.
SHA256 is 256 bits long, but it can hash files that are megabytes or gigabytes in size. So there must be lots of theoretically possible files that would have the same hash, it’s just very unlikely to hit them because there are so many combinations in the hash. But if you were trying to work backwards you’d have no idea which of the possible files it was.
There is one very obvious way to answer your question.
You can take the sha256 of any size of data. That means you can take the sha256 of a huge video file that’s many gigabytes. The result is a hash that is only a few hundred bits.
It should be obvious that you can’t take a few hundred bits of data and somehow “reverse” it into a file that’s many gigabytes in size. This is because any general-purpose hash algorithm that takes arbitrary input *must* lose information along the way.
Many mathematical operations lose information. For example, the remainder function (called “modulus” by us nerds). If I asked you, “I divided a number by 7 and the remainder is 3, what was the original number?” Could you answer? You might get lucky, but there are infinite possible numbers I might have used. Even knowing the answer and knowing exactly what mathematical operation I performed, you can’t reverse it and get the original number.
No. Its surprisingly easily to come up with mathematical steps that can’t be reversed, even if you know the step rules exactly! All you need is for *at least one step to lose information*. An example makes it easier to see.
Let’s use these rules to hash a sentence:
1. Take the first letter of each word in the sentence you’re hashing.
2. Assign A=1, B=2 etc.
3. Add the numbers.
**Sentence:** The quick brown fox jumps over the lazy dog.
**Hash:**
1. TQBFJOTLD
2. 20 17 2 6 10 15 20 12 4
3. 106
So the hash of that sentence is 106. Note that you’ll get 106 every time you apply those rules to that sentence, it’s not random. In fact, any sentence you pick will only generate one and only one hash, every time. But you still can’t reverse it back from 106 to the sentence, even if I tell you the complete set of rules!
You don’t even know how many words there are whose first letters add up to 106, so the search space is huge. And you could use those same hash rules to hash a whole book or program or whatever and then it would be impossible to even brute force with a supercomputer.
Reversing sha256 is like being given a plate of hash browns and trying to reconstruct the bag of potatoes. Even if the cook is perfectly precise every time, the potatoes are in such a mangled state (and some of the potato parts don’t even end up on your plate) that it’s effectively impossible to reverse the steps.
An analogy or two.
Your original data is like the raw ingredients for a cake.
You use a recipe (algorithm) and an oven (computer) to process the ingredients and bake a cake.
It’s not possible to un-bake a cake and get back the ingredients.
We ensure that the recipe (hashing algo) works in such a way that if the ingredients (data) are even the slightest bit different, the resulting cake is very different.
Thus if we find two identical cakes, we know they had to have identical ingredients and use the same recipe. I.e. Two output hashes had to have the same input data and algo.
With SHA256 the original message is expanded from 512 bits to 2048 bits. These expanded bits depend on the original message. That is, changing 1 bit in the original message changes many bits in the expanded message.
The 256-bit internal state of the function is initialized to its starting value (its a constant, the same every time). The 2048-bit message is then processed in 64 rounds, each round updates the internal state. The internal state then becomes the output.
Knowing this, suppose you were given a hash and wanted to work backwards. So you start by guessing message bits and working backwards through each round, calculating the internal state in reverse. Eventually you will get to the start of the function to find your internal state does not match the initial value. So you modify your internal state to match the initial value, but you need to offset those changes by modifying the message bits, but changing those message bits changes lots of message bits all over the place, and you need to offset those changes by modifying the state. It’s a mess.
You end up with a HUGE satisfiability problem which nobody knows how to efficiently solve.
Modulo arithmetic. You now how the hours on a clock can’t go past “23”, because they loop back to “0”? Same thing.
Let me multiply two numbers together, and I’ll show you the “remainder” or “modulus” (let’s say after dividing by 100, to make it really easy and so I’ll give you the last two digits). You have to guess the two original numbers that I thought of and multiplied together.
What were the original numbers:
81
Sure, it *COULD* be 9 x 9. But it could also be anything else that ends in 81, like 37 x 13 (481).
Now that’s just one very simple calculation but in fact it has basically an infinite number of possible solutions.
Modulo arithmetic (and Galois fields and other mathematical constructs that use the same kind of maths) means that one way is really easy, the other way is really hard. Combine it with prime numbers, factorisation and other tricks and – again – one way is really easy and the other way is impossibly difficult to get right.
And hence you have a lovely one-way function, like a hash (e.g. SHA256), or certain techniques used in encryption.
You’re baking a cake, and you can’t get the eggs back from it when you’re done.
Even simpler explaination: Sha256 gives you a number that’s only 256 (binary) digits long, and can be made from a file a million times as long.
If you took a book and only wrote down the first letter of each page onto a sticky note, you’d never be able to recover the book. But you could easily check whether or not the letters on the note correspond to a book.
Latest Answers