Blog Archives

Cepstrum, quefrency, and pitch

May 18, 2016

John Tukey coined many terms that have passed into common use, such as bit (a shortening of binary digit) and software. Other terms he coined are well known within their niche: boxplot, ANOVA, rootogram, etc. Some of his terms, such as jackknife and vacuum cleaner, were not new words per se but common words he […]

Continuum between anecdote and data

March 4, 2016

The difference between anecdotal evidence and data is overstated. People often have in mind this dividing line where observations on one side are worthless and observations on the other side are trustworthy. But there’s no such dividing line. Observations are data, but some observations are more valuable than others, and there’s a continuum of value. I believe […]

The empty middle: why no one is average

February 20, 2016

In 1945, a Cleveland newspaper held a contest to find the woman whose measurements were closest to average. This average was based on a study of 15,000 women by Dr. Robert Dickinson and embodied in a statue called Norma by Abram Belskie. Out of 3,864 contestants, no one was average on all nine factors, and fewer than 40 […]
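A rough back-of-the-envelope sketch of why the middle empties out. It rests on two loudly hypothetical assumptions that are not from the post: each of the nine measurements is a standard normal variable independent of the others, and "average" means within 0.3 standard deviations of the mean on that factor. Real body measurements are correlated, so treat the result only as an illustration of how fast the joint probability shrinks as factors are added.

```python
from math import erf, sqrt

def prob_within(c):
    """P(|Z| <= c) for a standard normal Z."""
    return erf(c / sqrt(2))

n_factors = 9        # the nine measurements in the contest
band = 0.3           # hypothetical: "average" = within 0.3 SD of the mean
contestants = 3864   # number of contestants

p_one = prob_within(band)           # chance of being "average" on one factor
p_all = p_one ** n_factors          # assumes independence across the nine factors
expected_average = contestants * p_all

print(f"one factor: {p_one:.3f}, all nine: {p_all:.2e}, "
      f"expected count among contestants: {expected_average:.4f}")
```

Under these assumptions, roughly a quarter of people are "average" on any single factor, yet the expected number of contestants who are average on all nine is far below one.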

Bayesian and nonlinear

February 13, 2016

Someone said years ago that you’ll know Bayesian statistics has become mainstream when people no longer put “Bayesian” in the titles of their papers. That day has come. While the Bayesian approach is still the preferred approach of a minority of statisticians, it’s no longer a novelty. If you want people to find your paper interesting, the substance […]

Improving on Chebyshev’s inequality

February 12, 2016

Chebyshev’s inequality says that the probability of a random variable being k or more standard deviations away from its mean is at most 1/k². In symbols, $P(|X - \mu| \geq k\sigma) \leq 1/k^2$. This inequality is very general, but also very weak: it assumes very little about the random variable X, and in return it gives only a loose bound. If we assume slightly more, […]
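To see how loose the bound can be, here is a small sketch comparing 1/k² with the exact two-sided tail of a normal distribution. The choice of the normal distribution is only an illustration of ours; it is not the sharper bound the full post goes on to derive. At k = 2, for example, Chebyshev allows probability 0.25 while the normal tail is only about 0.046.

```python
from math import erfc, sqrt

def chebyshev_bound(k):
    """Upper bound on P(|X - mu| >= k*sigma), valid for any distribution with finite variance."""
    return 1.0 / k**2

def normal_tail(k):
    """Exact P(|Z| >= k) for a standard normal Z."""
    return erfc(k / sqrt(2))

for k in (1.5, 2, 3):
    print(f"k={k}: Chebyshev bound {chebyshev_bound(k):.4f}, normal tail {normal_tail(k):.4f}")
```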

Trends and Opportunities in Data Analysis

February 11, 2016

Andy Warhol said “In the future, everyone will be world-famous for 15 minutes.” Here’s my 15 seconds of fame, a soundbite from the IBM Insight conference last year. My comments start at 1:30. In a nutshell, I predict that data analyt...

Connection between hypergeometric distribution and series

February 8, 2016

What’s the connection between the hypergeometric distribution, hypergeometric functions, and hypergeometric series? The hypergeometric distribution is a probability distribution with parameters N, M, and n. Suppose you have an urn containing N balls, M of them red and the remaining N − M blue, and you draw n balls at once, without replacement. The hypergeometric distribution gives the probability of selecting k red balls. The probability generating function […]
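A minimal sketch of one such connection, assuming N − M ≥ n so that every term below is well defined. The ratio of consecutive probabilities is P(k+1)/P(k) = (n−k)(M−k) / ((k+1)(N−M−n+k+1)), which has exactly the form of the term ratio of a Gauss hypergeometric series. It follows that the probability generating function E[z^K] equals C(N−M, n)/C(N, n) · ₂F₁(−n, −M; N−M−n+1; z), a series that terminates because −n is a nonpositive integer. The code checks this numerically; the helper names are ours, and this is not necessarily the route the full post takes.

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(K = k): k red balls when drawing n of N balls (M red) without replacement."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def pgf_direct(z, N, M, n):
    """Probability generating function E[z^K], summed term by term."""
    return sum(hypergeom_pmf(k, N, M, n) * z**k for k in range(n + 1))

def hyp2f1_terminating(a, b, c, z, n_terms):
    """Sum the Gauss series 2F1(a, b; c; z) from its term ratio.
    Assumes a is a nonpositive integer, so terms beyond n_terms = -a vanish."""
    term, total = 1.0, 1.0
    for k in range(n_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def pgf_via_2f1(z, N, M, n):
    """Same PGF written as a scaled hypergeometric function (assumes N - M >= n)."""
    scale = comb(N - M, n) / comb(N, n)  # this is P(K = 0)
    return scale * hyp2f1_terminating(-n, -M, N - M - n + 1, z, n)

print(pgf_direct(0.3, 20, 7, 5), pgf_via_2f1(0.3, 20, 7, 5))
```

Both functions should print the same value up to floating-point rounding.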

Reproducible randomized controlled trials

February 1, 2016

“Reproducible” and “randomized” don’t seem to go together. If something was unpredictable the first time, shouldn’t it be unpredictable if you start over and run it again? As is often the case, we want incompatible things. But the combination of reproducible and random can be reconciled. Why would we want a randomized controlled trial (RCT) to […]
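One common way to make randomization reproducible is to draw assignments from a dedicated pseudorandom generator with a recorded seed: the sequence is unpredictable to anyone without the seed, yet rerunning with the same seed gives identical assignments. The sketch below assumes simple 1:1 coin-flip randomization, and the seed value and function name are hypothetical; real trials often use blocked or adaptive schemes instead.

```python
import random

def assign_treatments(n_patients, arms=("A", "B"), seed=2016):
    """Randomize each patient to an arm using a private, seeded generator."""
    rng = random.Random(seed)  # separate generator: not disturbed by other code using `random`
    return [rng.choice(arms) for _ in range(n_patients)]

first_run = assign_treatments(10)
rerun = assign_treatments(10)
print(first_run)
print(first_run == rerun)  # same seed, same assignments
```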

Random number generator seed mistakes

January 29, 2016

Long run or broken software? I got a call one time to take a look at randomization software that wasn’t randomizing. My first thought was that the software was working as designed, and that the users were just seeing a long run. Long sequences of the same assignment are more likely than you think. You […]
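To put a number on "more likely than you think," here is a small dynamic-programming sketch for the probability that n independent, fair 1:1 assignments contain a run of at least r identical assignments in a row. The fair, independent coin-flip model is an assumption for illustration. It tracks the length of the current run and removes probability mass once a run reaches length r.

```python
def prob_run_at_least(n, r):
    """Probability that n fair, independent two-arm assignments contain
    a run of at least r identical assignments in a row (n >= 1, r >= 1)."""
    if r <= 1:
        return 1.0
    # state[j]: probability the current run has length j + 1 and no run of length r has occurred
    state = [0.0] * (r - 1)
    state[0] = 1.0  # after the first assignment the run length is 1
    for _ in range(n - 1):
        new = [0.0] * (r - 1)
        for j, prob in enumerate(state):
            new[0] += prob / 2          # next assignment differs: run resets to length 1
            if j + 1 < r - 1:
                new[j + 1] += prob / 2  # next assignment matches: run grows
            # otherwise the run reaches length r and leaves the "no long run" states
        state = new
    return 1.0 - sum(state)

print(prob_run_at_least(3, 2))    # small case that can be checked by hand
print(prob_run_at_least(100, 7))
```

The small case can be checked by enumeration: six of the eight equally likely three-patient sequences contain two identical assignments in a row, so the first line prints 0.75.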

MCMC burn-in

January 25, 2016

In Markov Chain Monte Carlo (MCMC), it’s common to throw out the first few states of a Markov chain, maybe the first 100 or the first 1000. People say they do this so the chain has had a chance to “burn in.” But this explanation by itself doesn’t make sense. It may be good to […]
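A minimal random-walk Metropolis sketch shows what discarding the first states does mechanically. It assumes a standard normal target and a deliberately poor starting point x0 = 50, both hypothetical: the early states are still drifting toward the bulk of the distribution, and slicing them off removes that transient. The step size, chain length, and burn-in of 1,000 are illustrative choices, not recommendations, and this sketch does not settle the question the full post raises about when burn-in is justified.

```python
import math
import random

def metropolis_chain(log_density, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis: propose x + N(0, step^2), accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    x = x0
    chain = [x]
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        chain.append(x)
    return chain

def log_std_normal(x):
    """Log density of N(0, 1), up to an additive constant."""
    return -0.5 * x * x

chain = metropolis_chain(log_std_normal, x0=50.0, n_steps=21_000)
burn_in = 1_000
kept = chain[burn_in:]  # discard the first states, which still reflect the starting point

print(f"mean of first 100 states: {sum(chain[:100]) / 100:.2f}")
print(f"mean after burn-in: {sum(kept) / len(kept):.3f}")
```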