K-anonymity.

I have read a bunch about this, but I am not a data scientist, mathematician, or programmer. I am doing a report on de-identification and this is one of the methods I am researching, but I cannot for the life of me wrap my head around it.

2 Answers

Anonymous 0 Comments

Say you have a database of people: age, gender, city of birth, postal code, occupation. If there’s a row in that database where a person has a combination of values across those columns that nobody else has, anyone who knows all those values can identify that person with 100% certainty. Now say there are 2 women who share an age, city of birth, postal code and occupation: knowing all those columns, you’d have a 50% chance of picking out the right one. You can calculate this probability for every row from how many rows share the same combination of values: with n matching rows, the chance is 1 in n.
Usually you’d set a limit for the probability of identifying a person, say 25%. That is the same as saying every combination of values must be shared by at least 4 people, and that number is the “k” in k-anonymity. Any cell that lets someone narrow down a person’s identity with more than 25% probability, you mask. So back to the 2 women: you can mask their city of birth, for example, so that only the other 3 columns (age, postal code and occupation) are left to identify them. If other women share those 3 values, the probability of finding either of them drops below 50%, because it’s no longer only 2 people who match.
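
If it helps to see the counting done by a machine, here is a minimal sketch in Python. The four rows, the column names and the helper functions (`mask`, `reid_probabilities`, `k_of`) are all made up for illustration, and real de-identification tools do considerably more than this (for example, generalising values into ranges instead of blanking them out).

```python
from collections import Counter

# Hypothetical toy table; every value here is invented for illustration.
COLUMNS = ("age", "gender", "city_of_birth", "postal_code", "occupation")
ROWS = [
    (34, "F", "Leeds", "LS1", "nurse"),
    (34, "F", "Leeds", "LS1", "nurse"),
    (34, "F", "York", "LS1", "nurse"),
    (34, "F", "Hull", "LS1", "nurse"),
]


def mask(rows, column):
    """Return a copy of the rows with one column replaced by '*'."""
    i = COLUMNS.index(column)
    return [row[:i] + ("*",) + row[i + 1:] for row in rows]


def reid_probabilities(rows):
    """For each row, 1 / (number of rows with exactly the same values)."""
    counts = Counter(rows)
    return [1 / counts[row] for row in rows]


def k_of(rows):
    """The k in k-anonymity: the size of the smallest group of identical rows."""
    return min(Counter(rows).values())


print(reid_probabilities(ROWS), k_of(ROWS))
masked = mask(ROWS, "city_of_birth")
print(reid_probabilities(masked), k_of(masked))
```

Before masking, the two Leeds women are each a 50% guess and the other two women are unique, so the table is only 1-anonymous. After masking city of birth, all four rows look the same, each woman is a 25% guess, and the table is 4-anonymous.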
