Why is it possible to “exfiltrate training data” from LLM models when models are just smaller hashes of large data given that a model is usually smaller than the data it was trained on?




Anonymous

I don't know where you got the idea that LLMs are hashes. They are closer to lossy compression algorithms.

They are trained by taking a large amount of data and analyzing it for patterns, which are then stored. When a new prompt comes in, they use those patterns to extrapolate from the prompt. So if a prompt is too similar to something the model was trained on, applying the patterns will reconstruct the original (or a close approximation).
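As a rough sketch of what that looks like, here is a toy next-word counter (not an actual neural network; the corpus, prompt, and numbers are made up for illustration). It only learns "which word tends to follow which two-word context", yet prompting it with a prefix it saw during training makes it regurgitate the rest of the training text:

```python
from collections import defaultdict, Counter

# Toy next-word predictor: count which word follows each two-word context.
# Real LLMs learn far more general patterns with neural networks, but the
# memorization effect described above shows up even in this tiny model.

def train(text, n=2):
    words = text.split()
    model = defaultdict(Counter)
    for i in range(len(words) - n):
        context = tuple(words[i:i + n])
        model[context][words[i + n]] += 1
    return model

def generate(model, prompt, steps=12, n=2):
    words = prompt.split()
    for _ in range(steps):
        context = tuple(words[-n:])
        if context not in model:
            break  # no learned pattern to extrapolate from
        # Greedy decoding: always pick the most likely next word.
        words.append(model[context].most_common(1)[0][0])
    return " ".join(words)

# Hypothetical "training data" containing a unique string.
corpus = "the secret code for the vault is 1234 and it must never be shared"
model = train(corpus)

# A prompt that closely matches the training text pulls the rest back out.
print(generate(model, "the secret code"))
# -> the secret code for the vault is 1234 and it must never be shared
```

The same basic idea applies to real models: text that shows up verbatim in the training data (especially text that shows up many times) can get encoded so precisely in the learned patterns that the right prompt reconstructs it, even though the model as a whole is much smaller than its training set.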