Say you have a database of people: age, gender, city of birth, postal code, occupation. If there’s a row in that database where a person has unique values in all those columns, knowing all the columns would 100% be able identify the person with all that info. Now say there are 2 women who share an age, city of birth, postal code and occupation, you’d have a 50% chance of identifying them knowing all those columns. You can calculate these probability based on the commonalities of all the columns.
Usually you’d set a value for the probability of identifying a person, say 25%. Now any cell that can be used to narrow a persons identity with more than 25% probability, you mask. So back to the 2 women, you can mask their city of birth for example so that you’d only have the other 3 columns to use to try and identify them – this would decrease the probability of finding them to be lower than 50% since there are others who share those 3 columns – it’s no longer only 2 people with those 3 columns in common.
Latest Answers