The HIPAA Privacy Rule offers two ways to say that data has been de-identified: Safe Harbor and expert determination. This post is about the former. I help companies with the latter.
Safe Harbor provision
The Safe Harbor provision lists 18 categories of data that would cause a data set to not be considered de-identified unless […]
Hongsheng Dai, Murray Pollock (University of Warwick), and Gareth Roberts (University of Warwick) just arXived a paper we discussed together while I was at Warwick. Here, fusion means bringing together the different parts of the target distribution f(x) ∝ f¹(x)f²(x)… once each part has been simulated separately. In the same spirit as in Scott et al. (2016) […]
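For intuition about why combining the parts is non-trivial, here is the textbook case where it has a closed form. If each part fᶜ is a Gaussian density N(μ_c, Σ_c), the standard product-of-Gaussians identity says the product is again Gaussian. In that case the precision-weighted averaging of sub-sample draws used in consensus Monte Carlo is exact; outside the Gaussian case it is only an approximation. This is general background, not the method of the paper.

$$
f(x) \propto \prod_{c=1}^{C} f^{c}(x), \qquad f^{c} = \mathcal{N}(\mu_c, \Sigma_c)
\;\Longrightarrow\;
f = \mathcal{N}(\mu, \Sigma), \quad
\Sigma^{-1} = \sum_{c=1}^{C} \Sigma_c^{-1}, \quad
\mu = \Sigma \sum_{c=1}^{C} \Sigma_c^{-1} \mu_c .
$$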
Imagine this conversation. “Could you tell me your social security number?” “Absolutely not! That’s private.” “OK, how about just the last four digits?” “Oh, OK. That’s fine.” When I was in college, professors would post grades by the last four digits of student social security numbers. Now that seems incredibly naive, but no one objected […]
FAS posted an article yesterday explaining how blurring military installations out of satellite photos draws attention to them, showing exactly where they are and how big they are. The Russian mapping service Yandex Maps blurred out sensitive locations in Israel and Turkey. As the article says, this is an example of the Streisand effect, […]
As mentioned in the previous post, Latanya Sweeney estimated that 87% of Americans can be identified by the combination of zip code, sex, and birth date. We’ll do a quick-and-dirty estimate and a simulation to show that this result is plausible. There’s no point being too realistic with a simulation because the actual data that […]
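A minimal sketch of both a quick-and-dirty estimate and a simulation, assuming round numbers that are illustrative rather than Sweeney's: about 330 million people spread evenly over about 42,000 zip codes, birth dates uniform over 80 years of 365 days, and the two sexes equally likely. The analytic estimate is the chance that none of the other people in your zip code share your (sex, birth date) pair; the simulation checks it by handing out random pairs and counting the ones nobody else got. The constant and function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical round-number assumptions, not Sweeney's data.
US_POPULATION = 330_000_000
NUM_ZIP_CODES = 42_000
YEARS_OF_BIRTH_DATES = 80
DAYS_PER_YEAR = 365  # leap days ignored for simplicity

people_per_zip = US_POPULATION // NUM_ZIP_CODES       # about 7,857 people
num_keys = 2 * DAYS_PER_YEAR * YEARS_OF_BIRTH_DATES    # (sex, birth date) pairs: 58,400


def analytic_unique_fraction(n: int, k: int) -> float:
    """Probability that none of the other n - 1 people in a zip code share
    a given person's key, assuming keys are uniform over k values."""
    return (1 - 1 / k) ** (n - 1)


def simulated_unique_fraction(n: int, k: int, trials: int = 10, seed: int = 0) -> float:
    """Give n people uniformly random keys and return the average fraction
    (over several trials) whose key appears exactly once."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        counts = Counter(rng.randrange(k) for _ in range(n))
        unique = sum(1 for c in counts.values() if c == 1)
        total += unique / n
    return total / trials


if __name__ == "__main__":
    print(f"analytic:  {analytic_unique_fraction(people_per_zip, num_keys):.3f}")
    print(f"simulated: {simulated_unique_fraction(people_per_zip, num_keys):.3f}")
```

Under these assumptions both numbers come out near 0.87, in line with the 87% figure. Real zip code populations and birth date distributions are far from uniform, so this only shows that the result is plausible.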
In 1997 Latanya Sweeney dramatically demonstrated that supposedly anonymized data was not anonymous. The state of Massachusetts had released data on 135,000 state employees and their families with obvious identifiers removed. However, the data contained zip code, birth date, and sex for each individual. Sweeney was able to cross reference this data with publicly available […]
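A minimal sketch of the kind of cross-referencing described above, using invented toy records and hypothetical field names: index a named list by (zip code, birth date, sex), then look up each record in the released data. A released record is re-identified only when its combination matches exactly one named person.

```python
from collections import defaultdict

# Invented toy records for illustration only; field names are hypothetical.
released = [  # names removed, quasi-identifiers kept
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "A"},
    {"zip": "02139", "birth_date": "1962-05-17", "sex": "F", "diagnosis": "B"},
]
public = [  # a named list containing the same three fields
    {"name": "Person X", "zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Person Y", "zip": "02139", "birth_date": "1962-05-17", "sex": "F"},
    {"name": "Person Z", "zip": "02139", "birth_date": "1962-05-17", "sex": "F"},
]


def link(released, public, keys=("zip", "birth_date", "sex")):
    """Return (name, diagnosis) pairs for released records whose
    quasi-identifier combination matches exactly one named person."""
    index = defaultdict(list)
    for row in public:
        index[tuple(row[k] for k in keys)].append(row["name"])
    matches = []
    for row in released:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # unique combination: record is re-identified
            matches.append((names[0], row["diagnosis"]))
    return matches


print(link(released, public))
```

Here `link(released, public)` returns `[("Person X", "A")]`; the second released record matches two named people, so it stays ambiguous.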
The image below is a static screen shot of an interactive visualization of the world’s biggest data breaches. The site lets you filter the data by industry and type of breach. See the site for credits and the raw data.
Erlingsson et al. give a poetic description of privacy-preserving analysis in their RAPPOR paper. They say that the goal is to “… allow the forest of client data to be studied, without permitting the possibility of looking at individual trees.” Úlfar Erlingsson, Vasyl Pihur, and […]
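RAPPOR builds on randomized response. The sketch below shows only the classic one-bit version, not RAPPOR's full mechanism, to illustrate studying the forest without looking at trees. Each client reports a fair coin flip with probability f and its true bit otherwise, so no single report can be trusted; the aggregator then inverts E[report] = (1 − f)π + f/2 to estimate the true rate π. The function names and parameters are hypothetical.

```python
import random


def randomized_response(truth: bool, f: float, rng: random.Random) -> bool:
    """With probability f, report a fair coin flip; otherwise report the true bit."""
    if rng.random() < f:
        return rng.random() < 0.5
    return truth


def estimate_rate(reports: list[bool], f: float) -> float:
    """Invert E[report] = (1 - f) * pi + f / 2 to estimate the true rate pi."""
    mean = sum(reports) / len(reports)
    return (mean - f / 2) / (1 - f)


if __name__ == "__main__":
    rng = random.Random(42)
    true_rate, f, n = 0.3, 0.5, 100_000  # hypothetical parameters
    truths = [rng.random() < true_rate for _ in range(n)]
    reports = [randomized_response(t, f, rng) for t in truths]
    print(f"estimated rate: {estimate_rate(reports, f):.3f}")
```

With f = 0.5, a client whose true bit is 1 reports 1 with probability 0.75, and one whose bit is 0 reports 1 with probability 0.25, so any individual answer is deniable while the population rate is still recoverable from many reports.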