The idea is that many of our society’s social norms are based on the reasonable expectation of privacy. But the reasonable expectation of privacy is increasingly a thing of the past. Three types of data I’ve been thinking about are:
- Obviously identifying data: Data like cellphone GPS traces and public social media posts are clearly identifiable and reduce privacy.
- Data that can be inferred from public data: We can also now infer a lot about people from the data that is public. For example, a couple of years ago I challenged the students in my advanced data science class to predict the Gail score, one of the most widely used measures of breast cancer risk, using only the information available from a person’s public Facebook profile. While not all of the information was available, a good fraction of it was. This is the kind of inference you might not expect posting pictures of your family, your birthday celebrations, and family life events to enable. I was reminded of this when hearing about this paper that claims to be able to re-identify up to 99.98% of Americans using only 15 pieces of demographic information.
- Data other people share about us: The stories around the capture of the Golden State Killer using genealogy data make it clear that even when you personally don’t share your data, someone else may be sharing it for you. The same can be said of photos of you that were tagged on Facebook even if you aren’t on the platform.
I don’t think these types of data are going to magically disappear. So, like a lot of other people, I’ve been wondering how we should adapt, individually and as a society, to a world where privacy is no longer an expectation.