Blog Archives

Making sense of a probability problem in the WSJ

January 1, 2018

Someone wrote to me the other day asking if I could explain a probability example from the Wall Street Journal. (“Proving Investment Success Takes Time,” Spencer Jakab, November 25, 2017.) Victor Haghani … and two colleagues told several hundred acquaintances who worked in finance that they would flip two coins, one that was normal and […]

How can a statistician help a lawyer?

December 9, 2017

I’ll be presenting at a webinar on Wednesday, December 13 at 1:00 PM Eastern. The title of the presentation is “Seven questions a statistician can answer for an attorney.” I will discuss, among other things, when common sense applies and when correct analysis can be counter-intuitive. There will be ample time at the end of […]

Handedness, introversion, height, blood type, and PII

November 16, 2017

I’ve had data privacy on my mind a lot lately because I’ve been doing some consulting projects in that arena. When I saw a tweet from Tim Hopper a little while ago, my first thought was “How many bits of PII is that?”. [1] π Things Only Left Handed Introverts Over 6′ 5″ with O+ […]

Pareto distribution and Benford’s law

November 16, 2017

The Pareto probability distribution has density f(x) = a / x^(a+1) for x ≥ 1 where a > 0 is a shape parameter. The Pareto distribution and the Pareto principle (i.e. “80-20” rule) are named after the same person, the Italian economist Vilfredo Pareto. Samples from a Pareto distribution obey Benford’s law in the limit as the parameter a goes to […]

Random number generation posts

November 15, 2017

Random number generation is typically a two step process: first generate a uniformly distributed value, then transform that value to have the desired distribution. The former is the hard part, but also the part more likely to have been done for you in a library. The latter is relatively easy in principle, though some distributions […]
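
As a rough illustration of the two steps, here is a minimal sketch using the inverse CDF method for an exponential distribution. The choice of distribution, the function name `exponential_sample`, the rate, and the seed are all assumptions made for the example; the posts themselves may use other methods.

```python
import math
import random

def exponential_sample(rate, rng):
    """Draw one exponential(rate) sample in two steps (hypothetical helper)."""
    # Step 1: generate a uniformly distributed value in [0, 1).
    u = rng.random()
    # Step 2: transform it by inverting the exponential CDF,
    # F(x) = 1 - exp(-rate * x). Using 1 - u keeps the log argument positive.
    return -math.log(1.0 - u) / rate

rng = random.Random(42)  # seeded only so the run is repeatable
samples = [exponential_sample(2.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # should be close to 1 / rate = 0.5
```

Inversion is only convenient when the CDF has a closed-form inverse, which is one reason some distributions need other techniques.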

Quantifying information gain in beta-binomial Bayesian model

November 13, 2017

The beta-binomial model is the “hello world” example of Bayesian statistics. I would call it a toy model, except it is actually useful. It’s not nearly as complicated as most models used in application, but it illustrates the basics of Bayesian inference. Because it’s a conjugate model, the calculations work out trivially. For more on […]
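
To show how trivial the conjugate calculation is, here is a minimal sketch of the update: a beta(a, b) prior combined with s successes and f failures gives a beta(a + s, b + f) posterior. The function names, the uniform prior, and the data below are made up for illustration, and the sketch does not attempt the information gain calculation the post goes on to discuss.

```python
def update_beta(a, b, successes, failures):
    """Posterior parameters for a beta(a, b) prior with binomial data."""
    # Conjugacy: the posterior is again a beta distribution,
    # with the observed counts added to the prior parameters.
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a beta(a, b) distribution."""
    return a / (a + b)

prior = (1.0, 1.0)  # uniform prior, an assumption for this example
posterior = update_beta(*prior, successes=7, failures=3)  # hypothetical data

print(posterior)              # (8.0, 4.0)
print(beta_mean(*prior))      # 0.5
print(beta_mean(*posterior))  # 8/12, about 0.667
```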

Wheels about to be reinvented

November 8, 2017

As companies get into data analysis for the first time, many of them are going to start by making the same mistakes that were common a century ago, then gradually recapitulate the development of modern statistics.  

Database anonymization for testing

November 3, 2017

How do you create a database for testing that is like your production database? It depends on in what way you want the test database to be “like” the production one. Replacing sensitive data Companies often use an old version of their production database for testing. But what if the production database has sensitive information […]

Quantifying privacy loss in a statistical database

September 20, 2017

In the previous post we looked at a simple randomization procedure to obscure individual responses to yes/no questions in a way that retains the statistical usefulness of the data. In this post we’ll generalize that procedure, quantify the privacy loss, and discuss the utility/privacy trade-off. More general randomized response Suppose we have a binary response […]

Randomized response, privacy, and Bayes theorem

September 19, 2017

Suppose you want to gather data on an incriminating question. For example, maybe a statistics professor would like to know how many students cheated on a test. Being a statistician, the professor has a clever way to find out what he wants to know while giving each student deniability. Randomized response Each student is asked […]
