eli5: why do public keys encrypt some messages the same?


I heard that along with a public key, you need an initialization vector so that messages are not encrypted identically in different instances. Since all a public key does is multiply prime numbers together, why are some keys the same as others? There are an infinite number of prime numbers (I think) so there should be no problem.


The idea is that if you encrypt the *same* message over and over again with the same key – and users don’t change keys all that often, though it does happen – it would be possible for an adversary to notice that you’re sending the same encrypted data over and over again. Like, if after sending encrypted message A you then leave your house and go to the grocery store, they can deduce that encrypted message A is related even though they can’t actually read it.

And yes, since public keys are just a mathematical formula, applying the same formula to the same input would always produce the same output. That’s a weakness, and this is how we deal with it. The initialization vector is just a hint of randomness added to the message to prevent this. Instead of a series of encrypted messages decrypting to read:

* `I’m going to the grocery store`
* `I’m going to the grocery store`
* `I’m going to the grocery store`

It would become:

* `g74j I’m going to the grocery store`
* `894k I’m going to the grocery store`
* `0012 I’m going to the grocery store`

And so on and so forth.

This is not specific to public/private keys. Symmetric encryption like AES absolutely has this same problem, and any half-decent encryption scheme has to deal with the fact that encryption is done in blocks – AES is 128 bits at a time, for example – and so you need to protect blocks that are part of the same bigger message from the same problem. It’s common for each block to use a constantly changing Initialization Vector that’s related to the previous block in some way, with the very first block being where the IV is most critical to select, hence the name *Initial*ization Vector. Properly, I should not call it the IV for any other block but the first.

[Wikipedia has an example](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation) with a picture that was encrypted as raw pixels, but with no initialization vector used on any block. You can still make out the picture of the penguin that was encrypted, showing that the encryption has failed to properly protect the hidden information.
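Here is a rough sketch of why the penguin stays visible and how chaining fixes it. It does not use AES: `toy_encrypt_block` is a made-up 4-round Feistel construction standing in for a real block cipher, and the mode functions assume the data length is a multiple of 16 bytes (no padding). Encrypting every block independently is what Wikipedia calls ECB, and the chained variant is CBC.

```python
import hashlib
import secrets

BLOCK = 16  # bytes, same as the 128-bit AES block size

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function for the toy Feistel cipher (stand-in, not a real cipher).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def split_blocks(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    # Every block is encrypted on its own: equal inputs give equal outputs.
    return b"".join(toy_encrypt_block(key, b) for b in split_blocks(data))

def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Each block is mixed with the previous ciphertext block (the IV for the first).
    out, prev = [], iv
    for block in split_blocks(data):
        prev = toy_encrypt_block(key, _xor(block, prev))
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = [], iv
    for block in split_blocks(data):
        out.append(_xor(toy_decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)

key = secrets.token_bytes(16)
iv = secrets.token_bytes(BLOCK)
data = b"A" * BLOCK * 3  # three identical blocks, like a patch of same-colored pixels
print(len(set(split_blocks(ecb_encrypt(key, data)))))      # 1: the repetition leaks
print(len(set(split_blocks(cbc_encrypt(key, iv, data)))))  # 3, barring an astronomically unlikely collision
```

Only the first block needs a value chosen from outside; every later block gets its "IV" for free from the ciphertext before it.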

In a *properly implemented* public key system with a reasonable key length (such as 1024-bit RSA, though 2048 bits or more is what’s recommended today) the probability of the same public key coming up more than once is very, very small (~1/2^200). Duplicate public keys are going to be exceedingly rare.

Unfortunately, about a decade ago it was found that many SSL certificate keys shared one of their two prime factors with some other certificate’s key. This creates a weakness that can be exploited: if two public keys share a prime factor, computing the greatest common divisor of the two moduli reveals that prime, and after that both keys can be factored with very little work.

I’ve never seen an explanation of why this happened, but it’s likely due to poor random number generation. RNGs cannot be purely algorithmic if you want truly random outputs, and there are lots of ways to screw it up.

But it’s possible that some other programming error (or errors) was responsible. Most cryptographic weaknesses are found to be due to flawed implementations, not flaws in the algorithm itself.

Let’s say I’m sending you the message “password”, encrypted with your public key. Say the result is something extremely simple: “blah”. Now, the attacker may be able to deduce that this message is “password” based on how often the same encrypted phrase shows up, or because your messages follow some sort of standard format. So they know the following details:

* public key
* encrypted message
* decrypted message (or some information about it)

instead of only the following details, which is all they would have if an initialization vector were used:

* public key
* encrypted message

With a known result, they may be able to attack the private key much quicker, depending on the cryptographic function being used.

Your public key is not likely to be the same as another person’s. It’s possible, if you both generated the same private key from some bad implementation, but as soon as you notice that, you need to change keys.

That said, every time you encrypt the same message with your key, you will get the same answer. It’s math, and getting the same answer every time is the way it needs to work. So getting the same message isn’t the same thing as having the same key. You really ought to add a little IV to your message, to guard against that, as /u/DeHackEd explains in detail.

Small side note – public key cryptography does not typically use initialization vectors. It relies on other things (random padding in the case of RSA, or a whole fresh random key per message in the case of elliptic curves) to get the property that IVs provide in symmetric cryptography, which is to make sure the same message gets different ciphertexts when you encrypt it multiple times. An IV is specific to cryptographic algorithms that have many stages (e.g. in AES, you split the message into smaller blocks, and then encrypt each one), so if you randomize the first stage the remaining ones get correspondingly scrambled.