# Posts Tagged ‘ Probability and Statistics ’

## Continuum between anecdote and data

March 4, 2016
By

The difference between anecdotal evidence and data is overstated. People often have in mind this dividing line where observations on one side are worthless and observations on the other side are trustworthy. But there’s no such dividing line. Observations are data, but some observations are more valuable than others, and there’s a continuum of value. I believe […]

## The empty middle: why no one is average

February 20, 2016
By

In 1945, a Cleveland newspaper held a contest to find the woman whose measurements were closest to average. This average was based on a study of 15,000 women by Dr. Robert Dickinson and embodied in a statue called Norma by Abram Belskie. Out of 3,864 contestants, no one was average on all nine factors, and fewer than 40 […]

## Improving on Chebyshev’s inequality

February 12, 2016
By

Chebyshev’s inequality says that the probability of a random variable being more than k standard deviations away from its mean is less than 1/k2. In symbols, This inequality is very general, but also very weak. It assumes very little about the random variable X but it also gives a loose bound. If we assume slightly more, […]

## Connection between hypergeometric distribution and series

February 8, 2016
By

What’s the connection between the hypergeometric distributions, hypergeometric functions, and hypergeometric series? The hypergeometric distribution is a probability distribution with parameters N, M, and n. Suppose you have an urn containing N balls, M red and the rest, N – M blue and you select n balls at a time. The hypergeometric distribution gives the probability of selecting k red balls. The probability generating function […]

## Reproducible randomized controlled trials

February 1, 2016
By

“Reproducible” and “randomized” don’t seem to go together. If something was unpredictable the first time, shouldn’t it be unpredictable if you start over and run it again? As is often the case, we want incompatible things. But the combination of reproducible and random can be reconciled. Why would we want a randomized controlled trial (RCT) to […]

## Random number generator seed mistakes

January 29, 2016
By

Long run or broken software? I got a call one time to take a look at randomization software that wasn’t randomizing. My first thought was that the software was working as designed, and that the users were just seeing a long run. Long sequences of the same assignment are more likely than you think. You […]

## MCMC burn-in

January 25, 2016
By

In Markov Chain Monte Carlo (MCMC), it’s common to throw out the first few states of a Markov chain, maybe the first 100 or the first 1000. People say they do this so the chain has had a chance to “burn in.” But this explanation by itself doesn’t make sense. It may be good to […]

## Introduction to MCMC

January 23, 2016
By

Markov Chain Monte Carlo (MCMC) is a technique for getting your work done when Monte Carlo won’t work. The problem is finding the expected value of f(X) where X is some random variable. If you can draw independent samples xi from X, the solution is simple: When it’s possible to draw these independent samples, the sum above is well […]

## Big p, Little n

January 7, 2016
By

Statisticians use n to denote the number of subjects in a data set and p to denote nearly everything else. You’re supposed to know from context what each p means. In the phrase “big n, little p” the symbol p means the number of measurements per subject. Traditional data sets are “big n, little p” […]

## The longer it has taken, the longer it will take

December 21, 2015
By

Suppose project completion time follows a Pareto (power law) distribution with parameter α. That is, for t > 1, the probability that completion time is bigger than t is t-α. (We start out time at t = 1 because that makes the calculations a little simpler.) Now suppose we know that a project has lasted […]