sha256 uses many binary operations where you lose information, specifically “and” and addition (which is basically just xor + and) operations. If the result from an “and” operation is 0 the only thing you know is that one of the operands must be 0 but you don’t know which one or if both are 0. Same for “or” but if the result is 1. This might not sound overly difficult to reverse but this is the main essence of what happens. The algorithm works something like this:
1. The binary values of the original message is shuffled, padded and modified then cut into groups of 32 bits. Let’s call them message bits.
2. Then 8 groups of 32 specific bits are set up, call them result bits. You take the first group of message bits and do one iteration of irreversible binary operations with the result bits. The result of these operations will now be the new result bits.
3. Take the second message group and perform the same operation on the new result bits. Do this for every group 64 times and you will have your final result bits which is the hashed result.
The result can not be predicted. Any slight variation will completely change the result. And yes there will always be at least 64 message groups of bits, even for short messages. The reason is because there will be at most 16 bit groups that are unique from the original message. The other 48 are generated from the other message groups, with more irreversible operands like “and” but also bit shifting. This also means that the 64th message groups will depend on a set of groups that will depends on a set of groups that will depend on a set of group and so on until that message group is actually part of the original message. A numerical nightmare to reverse.
If the original message is shorter than 16*8 bits it will get padded with 0s. If the message is very long then you will perform step 2 and 3 several times, which basically means that you modify the result bits 64*n times.
Just mathematically describing the solution of one bit is not possible, let alone solving it with any existing computer.
Hash functions are lossy conversions. The result does not have enough data to reconstruct the input.
Imagine a “hash” of your name that is just your initials. Can someone reconstruct your name from just the initials? They could try to guess, but there are a large number of names that “collide” on the same initials so you can’t tell which one is the right one.
Let me give you analogy using addition as an example. 4+6=10. That is, the inputs were 4 and 6 and output was 10. But so is the case for 5+5 and 9+1 and so on.
The sha256 you see is similar to that 10. Once you have 10, you can never tell what two numbers added up to give you 10.
Hence all hashing algorithms are one way functions.
When we say hash is reversed, all it means an input has been found that when hashed results into the same hash you got with original input.
In above example, if 10 was obtained using 4+6, a hash is broken with all those combinations that sums up to 10. Hence, hashing algorithms are designed in such a way that probability of getting these hash collisions are very low.
SHA is a hashing function. It has multiple inverse. But your question still hold, why can’t we easily find one inverse?
And it’s hard because it’s like a puzzle that you build from the solution. For example: take a password, give a unique number to each symbol, then you compute the sum of all those numbers and the product of all those numbers. Your hash is this sum and product.
Now inversing this means you need to find a list of numbers of which the sum and products are a match. It’s suddenly harder. There are many solutions, but they are tricky to find, you need to solve the puzzle, while building the puzzle from the solution was easy. And if somebody give you a valid solution, it’s easy to check it’s valid.
SHA256 has 256 puzzles to solve all at the same time (sum and product is only 2), and they are carefully chosen so that it’s hard to correlate each of them to each others.
Latest Answers