Birth date, sex, and five-digit zip code are enough information to uniquely identify a large majority of Americans. See more on this here. So if you want to deidentify a data set, the HIPAA Safe Harbor provision says you should chop off the last two digits of a zip code. And even though three-digit zip […]
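To make the effect of chopping off the last two digits concrete, here is a minimal sketch. The records are made-up toy data, and the helper names `truncate_zip` and `count_unique` are hypothetical, chosen only for illustration. It counts how many records are unique on (birth date, sex, zip) before and after truncating the zip code to three digits.

```python
from collections import Counter

# Hypothetical toy records: (birth date, sex, five-digit zip code).
records = [
    ("1980-07-04", "F", "77005"),
    ("1980-07-04", "F", "77006"),
    ("1975-01-15", "M", "77005"),
    ("1975-01-15", "M", "77005"),
]

def truncate_zip(zip_code: str) -> str:
    """Keep only the first three digits of a zip code."""
    return zip_code[:3]

def count_unique(rows) -> int:
    """Count rows whose combination of values appears exactly once."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1)

truncated = [(dob, sex, truncate_zip(z)) for dob, sex, z in records]

unique_before = count_unique(records)
unique_after = count_unique(truncated)
print(unique_before, unique_after)
```

In this toy example, two records are unique with full zip codes and none are unique after truncation. Real data sets will behave differently; the sketch shows only the mechanics of the comparison.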
Category: Privacy
Hashing names does not protect privacy
Secure hash functions are practically impossible to reverse, but only if the input is unrestricted. If you generate 256 random bits and apply a secure 256-bit hash algorithm, an attacker wanting to recover your input can’t do much better than brute force, hashing 256-bit strings hoping to find one that matches your hash value. Even […]
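When the input is restricted to a small set, such as a list of names, brute force becomes practical. The following is a minimal sketch of that idea, not the method from the post: it assumes SHA-256 as the hash and uses a tiny hypothetical list of candidate names. The function names are illustrative only.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def reverse_by_dictionary(target_hash: str, candidates):
    """Hash each candidate and return the one matching target_hash, if any."""
    for name in candidates:
        if sha256_hex(name) == target_hash:
            return name
    return None

# Hypothetical candidate list; a real attacker could use a much larger one.
candidate_names = ["Alice Smith", "Bob Jones", "Carol Lee"]

# A "de-identified" value that is really just a hashed name.
published_hash = sha256_hex("Bob Jones")

recovered = reverse_by_dictionary(published_hash, candidate_names)
print(recovered)
```

The hash function is never inverted here. The attacker only hashes every plausible input and compares, which is cheap when the set of plausible inputs is small.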
What does CCPA say about de-identified data?
The California Consumer Privacy Act, or CCPA, takes effect January 1, 2020, less than six months from now. What does the act say about using deidentified data? First of all, I am not a lawyer; I work for lawyers, advising them on matters where law touches statistics. This post is not legal advice, but my […]
Protecting privacy while keeping detailed date information
A common attempt to protect privacy is to truncate dates to just the year. For example, the Safe Harbor provision of the HIPAA Privacy Rule says to remove “all elements of dates (except year) for dates that are directly related to an individual …” This restriction exists because dates of service can be used to […]
Internet privacy as seen from 1975
Science fiction authors set stories in the future, but they don’t necessarily try to predict the future, and so it’s a little odd to talk about what they “got right.” Getting something right implies they were making a prediction rather than imagining a setting for a story. However, sometimes SF authors do indeed try to predict […]
Comparing Truncation to Differential Privacy
Traditional methods of data de-identification obscure data values. For example, you might truncate a date to just the year. Differential privacy obscures query values by injecting enough noise to keep from revealing information on an individual. Let’s compare two approaches for de-identifying a person’s age: truncation and differential privacy. Truncation First consider truncating birth date […]
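The two approaches can be sketched side by side. This is an illustration under stated assumptions rather than the post's own calculation: it assumes the noise is Laplace noise with scale sensitivity/epsilon (the standard Laplace mechanism), and the birth date, age, sensitivity, and epsilon values are hypothetical.

```python
import random
from datetime import date

def truncate_to_year(birth_date: date) -> int:
    """Truncation: keep only the year of a birth date."""
    return birth_date.year

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_value(true_value: float, sensitivity: float, epsilon: float,
                rng: random.Random) -> float:
    """Assumed Laplace mechanism: add noise with scale sensitivity/epsilon."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)       # seeded only to make the sketch repeatable
birth = date(1980, 7, 4)     # hypothetical birth date

print(truncate_to_year(birth))
print(noisy_value(39.0, sensitivity=1.0, epsilon=0.5, rng=rng))
```

Truncation leaves an age uncertain by up to a year but never by more. Laplace noise is usually small but has no hard bound, and a smaller epsilon means a larger typical error.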
State privacy laws to watch
A Massachusetts court ruled this week that obtaining real-time cell phone location data requires a warrant. Utah has passed a law that goes into effect next month that goes further. Police in Utah will need a warrant to obtain location data or to search someone’s electronic files. (Surely electronic files are the contemporary equivalent of […]
Safe Harbor and the calendar rollover problem
Data privacy is subtle and difficult to regulate. The lawmakers who wrote the HIPAA privacy regulations took a stab at what would protect privacy when they crafted the “Safe Harbor” list. The list is neither necessary nor sufficient, depending on context, but it’s a start. Extreme values of any measurement are more likely to lead […]
Data privacy Twitter account
My newest Twitter account is Data Privacy (@data_tip). There I post tweets about ways to protect your privacy, statistical disclosure limitation, etc. I had a clever idea for the icon, or so I thought. I started with the default Twitter icon, a sort of stylized anonymous person, and colored it with the same blue and […]
Covered entities: TMPRA extends HIPAA
The US HIPAA law only protects the privacy of health data held by “covered entities,” which essentially means health care providers and insurance companies. If you give your heart monitoring data or DNA to your doctor, it comes under HIPAA. If you give it to Fitbit or 23andMe, it does not. Government entities are not […]
Inferring religion from fitness data
Fitness monitors reveal more information than most people realize. For example, it may be possible to infer someone’s religious beliefs from their heart rate data. If you have location data, it’s trivial to tell whether someone is attending religious services. But you could make a reasonable guess from respiration data alone. Muslim prayers occur at […]
US Census Bureau embraces differential privacy
The US Census Bureau is convinced that traditional methods of statistical disclosure limitation have not done enough to protect privacy. These methods may have been adequate in the past, but it no longer makes sense to implicitly assume that those who would like to violate privacy have limited resources or limited motivation. The Bureau has […]
Congress and the Equifax data breach
Dialog from a congressional hearing February 26, 2019. Representative Katie Porter: My question for you is whether you would be willing to share today your social security, your birth date, and your address at this public hearing. Equifax CEO Mark Begor: I would be a bit uncomfortable doing that, Congresswoman. If you’d so oblige me, […]
Twitter account for data privacy
I’ve started a new Twitter account for data privacy and related topics.
Twitter gave me the handle @data_tip even though that’s not what I typed in, and what I typed in is not being used. Apparently they don’t let you pick your handle…
Supercookies
Supercookies, also known as evercookies or zombie cookies, are like browser cookies in that they can be used to track you, but are much harder to remove. What is a supercookie? The way I first heard supercookies described was as a cookie that you can appear to delete, but as soon as you do, software […]
Normal approximation to Laplace distribution?
I heard the phrase “normal approximation to the Laplace distribution” recently and did a double take. The normal distribution does not approximate the Laplace! Normal and Laplace distributions A normal distribution has the familiar bell curve shape. A Laplace distribution, also known as a double exponential distribution, is pointed in the middle, like a pole […]
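The difference in shape is easy to check numerically. The sketch below compares the two densities at the center and in the tail. Matching the variances of the two distributions is an assumption made here so the comparison is fair; it is not necessarily the comparison the post makes.

```python
import math

def normal_pdf(x: float, sigma: float = 1.0) -> float:
    """Density of a normal distribution with mean 0 and std dev sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def laplace_pdf(x: float, b: float = 1.0) -> float:
    """Density of a Laplace distribution with mean 0 and scale b."""
    return math.exp(-abs(x) / b) / (2 * b)

sigma = 1.0
# A Laplace distribution has variance 2*b**2, so this b matches variances.
b = sigma / math.sqrt(2)

for x in (0.0, 1.0, 4.0):
    print(x, normal_pdf(x, sigma), laplace_pdf(x, b))
```

With equal variances, the Laplace density is higher at 0 (the sharp peak) and also higher far out in the tails, while the normal density is higher in between.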
Probabilistic Identifiers in CCPA
The CCPA, the California Consumer Privacy Act, was passed last year and goes into effect at the beginning of next year. And just as the GDPR impacts businesses outside Europe, the CCPA will impact businesses outside California. The law specifically mentions probabilistic identifiers. “Probabilistic identifier” means the identification of a consumer or a device to a […]
Font Fingerprinting
Web sites may not be able to identify you, but they can probably identify your web browser. Your browser sends a lot of information back to web servers, and the combination of settings for a particular browser are usually unique. To get an idea what information we’re talking about, you could take a look at […]
Unstructured data is an oxymoron
Strictly speaking, “unstructured data” is a contradiction in terms. Data must have structure to be comprehensible. By “unstructured data” people usually mean data with a non-tabular structure. Tabular data is data that comes in tables. Each row corresponds to a subject, and each column corresponds to a kind of measurement. This is the easiest data to […]
Why are dates of service on HIPAA’s Safe Harbor list?
The HIPAA Privacy Rule offers two ways to say that data has been de-identified: Safe Harbor and expert determination. This post is about the former. I help companies with the latter. Safe Harbor provision The Safe Harbor provision lists 18 categories of data that would cause a data set to not be considered de-identified unless […]