How does differential privacy work?


How does it protect data in ML?



Differential privacy works by injecting noise into an anonymized dataset, so that it becomes impossible (or at least sufficiently difficult) to accurately de-anonymize the data, at the cost of some accuracy.
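
The answer below uses a location example, but for a rough sense of what "injecting noise" usually means formally, here is a minimal Python sketch of the Laplace mechanism, a standard way differential privacy adds calibrated noise to a numeric result (the specific epsilon and count values here are just illustrative, not from the answer):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return true_value plus Laplace noise scaled to sensitivity/epsilon.

    Smaller epsilon means more noise and stronger privacy; sensitivity is how
    much one person's data could change the true answer.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Illustrative: publish a count of 42 people, where any one person changes the
# count by at most 1 (sensitivity=1), with a privacy budget of epsilon=0.5.
noisy_count = laplace_mechanism(42, sensitivity=1, epsilon=0.5)
print(noisy_count)
```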

For example, if someone had accurate location history from your phone, even if they didn’t know it was your phone, they could use home, work, and travel behavior to identify you.

However, if each of those points were shifted in a random direction, by a random distance, and by a random amount of time (e.g. to a point within a mile of each location and within an hour of the actual timestamp), all anyone could tell is that the phone belongs to someone who lives within a mile of one adjusted point, works within a mile of another, and commutes somewhere roughly between them, which makes it effectively impossible to identify you.
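
As a concrete sketch of that shift (the one-mile and one-hour bounds come from the example above; the miles-to-degrees conversion is approximate, and real systems calibrate the noise more carefully than this):

```python
import math
import random

def perturb_point(lat, lon, timestamp, max_miles=1.0, max_hours=1.0):
    """Shift a location by a random distance in a random direction (up to
    max_miles) and shift its timestamp by up to +/- max_hours.

    Hypothetical illustration only; timestamp is assumed to be in seconds.
    """
    # Pick a random direction and distance within the allowed radius.
    angle = random.uniform(0, 2 * math.pi)
    dist_miles = random.uniform(0, max_miles)

    # Rough conversion: ~69 miles per degree of latitude; a degree of
    # longitude is shorter the farther you are from the equator.
    dlat = (dist_miles * math.sin(angle)) / 69.0
    dlon = (dist_miles * math.cos(angle)) / (69.0 * math.cos(math.radians(lat)))

    # Shift the timestamp by a random offset within +/- max_hours.
    dt = random.uniform(-max_hours, max_hours) * 3600

    return lat + dlat, lon + dlon, timestamp + dt
```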

Of course, there are weaknesses: if someone can collect many location samples from the same phone (all of which will cluster within a mile of your home, work, and commute path), or if you are an extreme outlier (e.g. you both live and work in the middle of nowhere), the noise alone may not be enough. In those cases, additional noise can be added, or the risky samples can be discarded entirely.

And while this makes your specific data inaccurate, applying the same noise to the travel behavior of an entire city still gives you something reasonably close to reality, while making it impossible to accurately identify any single individual.
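
To see why the city-level picture survives, here is a toy sketch (synthetic numbers, purely illustrative): each person's value gets noise, but across many people the noise averages out, so the aggregate stays close to the truth.

```python
import random

# Pretend commute distances (in miles) for a "city" of 100,000 people.
true_commutes = [random.uniform(1, 20) for _ in range(100_000)]

# Each individual value is shifted by up to +/- 1 mile of noise.
noisy_commutes = [d + random.uniform(-1, 1) for d in true_commutes]

print(sum(true_commutes) / len(true_commutes))    # true average, ~10.5
print(sum(noisy_commutes) / len(noisy_commutes))  # very close to the true
                                                  # average: the noise cancels
                                                  # out over many people
```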