Blog Archives

Quantifying privacy loss in a statistical database

September 20, 2017

In the previous post we looked at a simple randomization procedure to obscure individual responses to yes/no questions in a way that retains the statistical usefulness of the data. In this post we’ll generalize that procedure, quantify the privacy loss, and discuss the utility/privacy trade-off.

More general randomized response

Suppose we have a binary response […]

Randomized response, privacy, and Bayes theorem

September 19, 2017

Suppose you want to gather data on an incriminating question. For example, maybe a statistics professor would like to know how many students cheated on a test. Being a statistician, the professor has a clever way to find out what he wants to know while giving each student deniability.

Randomized response

Each student is asked […]

Quantifying the information content of personal data

September 12, 2017

It can be surprisingly easy to identify someone from data that is not directly identifying. One commonly cited result is that the combination of birth date, zip code, and sex is enough to identify most people. This post will look at how to quantify the amount of information contained in such data. If the answer to […]

Negative correlation introduced by success

September 10, 2017

Suppose you measure people on two independent attributes, X and Y, and take those for whom X+Y is above some threshold. Then even though X and Y are uncorrelated in the full population, they will be negatively correlated in your sample. This article gives the following example. Suppose beauty and acting ability were uncorrelated. Knowing how […]
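
The selection effect described above can be checked with a small simulation. This is only a sketch under assumed settings that are not from the post: X and Y are drawn as independent standard normals, the threshold is 1.0, and the names `pearson` and `simulate` are hypothetical helpers.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=100_000, threshold=1.0, seed=0):
    """Return (correlation in full population, correlation in selected sample).

    Assumption: X and Y are independent standard normal draws, and a person
    is selected when X + Y exceeds the threshold.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    # Keep only the pairs whose sum clears the threshold.
    selected = [(x, y) for x, y in zip(xs, ys) if x + y > threshold]
    sx, sy = zip(*selected)
    return pearson(xs, ys), pearson(sx, sy)


full_corr, selected_corr = simulate()
print(f"full population: {full_corr:+.3f}")
print(f"selected sample: {selected_corr:+.3f}")
```

Under these assumptions the full-population correlation should come out near zero and the selected-sample correlation clearly negative. Raising the threshold makes the selection stricter.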

Bayesian methods at Bletchley Park

July 25, 2017

From Nick Patterson’s interview on Talking Machines:

GCHQ in the ’70s, we thought of ourselves as completely Bayesian statisticians. All our data analysis was completely Bayesian, and that was a direct inheritance from Alan Turing. I’m not sure this has ever really been published, but Turing, almost as a sideline during his cryptoanalytic work, reinvented […]

Testing the PCG random number generator

July 7, 2017

M. E. O’Neill’s PCG family of random number generators looks very promising. It appears to have excellent statistical and cryptographic properties. And it takes remarkably little code to implement. (PCG stands for Permuted Congruential Generator.) The journal article announcing PCG gives the results of testing it with the TestU01 test suite. I wanted to try it out […]

Effective sample size for MCMC

June 27, 2017

In applications we’d like to draw independent random samples from complicated probability distributions, often the posterior distribution on parameters in a Bayesian analysis. Most of the time this is impractical. MCMC (Markov Chain Monte Carlo) gives us a way around this impasse. It lets us draw samples from practically any probability distribution. But there’s a […]

Why do linear prediction confidence regions flare out?

June 26, 2017

Suppose you’re tracking some object based on its initial position x0 and initial velocity v0. The initial position and initial velocity are estimated from normal distributions with standard deviations σx and σv. (To keep things simple, let’s assume our object is moving in only one dimension and that the distributions around initial position and velocity […]
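
As a rough illustration of the setup above, the sketch below assumes the predicted position at time t is x0 + v0·t and that the errors in x0 and v0 are independent. Both are assumptions made here, since the excerpt is cut off before stating them. Under those assumptions the variances add, so the standard deviation of the prediction is sqrt(σx² + t²σv²). The function name `position_sd` is hypothetical.

```python
import math


def position_sd(t, sigma_x, sigma_v):
    """Standard deviation of the predicted position x0 + v0 * t.

    Assumes the errors in x0 and v0 are independent, so their
    variances add: Var = sigma_x**2 + (t * sigma_v)**2.
    """
    return math.sqrt(sigma_x**2 + (t * sigma_v) ** 2)


# The uncertainty band widens as t grows, which is the flaring shape.
for t in (0, 1, 2, 4, 8):
    print(t, round(position_sd(t, sigma_x=1.0, sigma_v=0.5), 3))
```

For small t the position error dominates and the band is nearly flat. For large t the velocity term dominates and the width grows roughly linearly in t.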

Extreme beta distributions

June 20, 2017

A beta probability distribution has two parameters, a and b. You can think of these as the number of successes and failures out of a+b trials. The PDF of a beta distribution is approximately normal if a and b are approximately equal and a + b is large. If a and b are close, they don’t have to be very large for the beta […]
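
One way to see the normal approximation numerically is to compare the beta PDF with a normal PDF on a grid. The sketch below is an illustration only. It assumes the comparison normal has the same mean and variance as the beta distribution, which is a choice made here rather than something taken from the post, and `max_relative_gap` is a hypothetical helper.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density at 0 < x < 1, computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def max_relative_gap(a, b, points=999):
    """Largest |beta - normal| on a grid in (0, 1), relative to the normal peak.

    Assumption: the normal is matched to the beta's mean and variance.
    """
    mu = a / (a + b)
    sigma = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    peak = normal_pdf(mu, mu, sigma)
    xs = [(i + 1) / (points + 1) for i in range(points)]
    gap = max(abs(beta_pdf(x, a, b) - normal_pdf(x, mu, sigma)) for x in xs)
    return gap / peak


for a, b in [(5, 5), (50, 50), (2, 8)]:
    print(a, b, round(max_relative_gap(a, b), 4))
```

With equal parameters the gap should shrink as a + b grows. Trying unequal parameters such as (2, 8) gives a sense of how much the approximation degrades when a and b are far apart.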

Quantile-quantile plots and powers of 3/2

April 2, 2017

This post serves two purposes. It will empirically explore a question in number theory raised in the previous post. And if you’re not familiar with quantile-quantile (q-q) plots, it will serve as an introduction to such plots. The previous post said that for almost all x > […]
