I will be giving a talk “Bayesian statistics as a way to integrate intuition and data” at KeenCon, September 11, 2014 in San Francisco. Update: Use promo code KeenCon-JohnCook to get 75% off registration.

Suppose in a company of N employees, m are chosen randomly for drug screening [1]. In two independent screenings, what is the probability that someone will be picked both times? It may be unlikely that any given individual will be picked twice, while being very likely that someone will be picked twice. Imagine m employees […]

The normal distribution can approximate many other distributions, though the details such as quantitative error estimates and what factors improve or degrade the approximation are harder to find. Here are some notes on normal approximations to several ...

When you sort data and look at which sample falls in a particular position, that’s called order statistics. For example, you might want to know the smallest, largest, or middle value. Order statistics are robust in a sense. The median of a sample, for example, is a very robust measure of central tendency. If Bill […]

Suppose you’re drawing random samples uniformly from some interval. How likely are you to see a new value outside the range of values you’ve already seen? The problem is more interesting when the interval is unknown. You may be trying to estimate the end points of the interval by taking the max and min of […]

Cancer research is sometimes criticized for being timid. Drug companies run enormous trials looking for small improvements. Critics say they should run smaller trials and more of them. Which side is correct depends on what’s out there waiting to be discovered, which of course we don’t know. We can only guess. Timid research is rational […]

There’s a theorem in statistics that says You could read this aloud as “the mean of the mean is the mean.” More explicitly, it says that the expected value of the average of some number of samples from some distribution is equal to the expected value of the distribution itself. The shorter reading is confusing […]

Russ Roberts had this to say about the proposal to replacing the calculus requirement with statistics for students. Statistics is in many ways much more useful for most students than calculus. The problem is, to teach it well is extraordinarily difficult. It’s very easy to teach a horrible statistics class where you spit back the […]

David Hogg calls conventional statistical notation a “nomenclatural abomination”: The terminology used throughout this document enormously overloads the symbol p(). That is, we are using, in each line of this discussion, the function p() to mean something different; its meaning is set by the letters used in its arguments. That is a nomenclatural abomination. I […]

John Ioannidis stirred up a healthy debate when he published Why Most Published Research Findings Are False. Unfortunately, most of the discussion has been over whether the word “most” is correct, i.e. whether the proportion of false results is more or less than 50 percent. At least there is more awareness that some published results […]