Statistics in a world where nothing is random

December 17, 2012

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Rama Ganesan writes:

I think I am having an existential crisis.

I used to work with animals (rats, mice, gerbils, etc.). Then I started to work in marketing research where we did have some kind of random sampling procedure. So up until a few years ago, I was sort of okay.

Now I am teaching marketing research, and I feel like there is no real random sampling anymore. I take pains to get students to understand what random means, and then the whole lot of inferential statistics. Then almost anything they do – the sample is not random. They think I am contradicting myself. They use convenience samples at every turn – for their school work, and the enormous amount of online surveying that gets done. Do you have any suggestions for me?

Other than say, something like this.

My reply:

Statistics does not require randomness. The three essential elements of statistics are measurement, comparison, and variation. Randomness is one way to supply variation, and it’s one way to model variation, but it’s not necessary. Nor is it necessary to have “true” randomness (of the dice-throwing or urn-sampling variety) in order to have a useful probability model.

For example, consider our work in Red State Blue State, looking at patterns of voting given income and religious attendance by state. Here we did have random sampling—we were working with survey data—but even if we’d had no sampling at all, if we’d had a census of opinions of all voters, we’d still have statistics problems. So I don’t think random sampling is necessary for statistics.

To answer your question about nonrepresentative samples, there I think it’s best to adjust for known and modeled differences between sample and population. Here the idea of random sampling is a useful start and a useful comparison point.
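One simple version of this kind of adjustment is poststratification: estimate the outcome within each group, then reweight the group estimates by their known population shares. The following is only a minimal sketch; the survey responses, the group names, and the population shares are all made up for illustration, and the function name `poststratify` is hypothetical.

```python
# Hypothetical convenience sample: (age group, 1 if respondent said "yes" else 0).
# Younger respondents are overrepresented relative to the assumed population.
sample = [
    ("under_30", 1), ("under_30", 1), ("under_30", 0), ("under_30", 1),
    ("under_30", 1), ("under_30", 0),
    ("30_plus", 0), ("30_plus", 1), ("30_plus", 0), ("30_plus", 0),
]

# Assumed known population shares (for example, from a census). Invented numbers.
population_share = {"under_30": 0.3, "30_plus": 0.7}


def poststratify(sample, population_share):
    """Weight each group's sample mean by that group's population share.

    Raises KeyError if a population group has no respondents in the sample.
    """
    totals, counts = {}, {}
    for group, y in sample:
        totals[group] = totals.get(group, 0) + y
        counts[group] = counts.get(group, 0) + 1
    return sum(
        share * totals[group] / counts[group]
        for group, share in population_share.items()
    )


raw_mean = sum(y for _, y in sample) / len(sample)
adjusted_mean = poststratify(sample, population_share)

print(f"raw sample mean:      {raw_mean:.3f}")
print(f"poststratified mean:  {adjusted_mean:.3f}")
```

The raw mean treats the sample as if it were the population; the adjusted mean corrects only for the one difference we measured (age group here). Differences that are not measured or modeled remain unadjusted.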

Ganesan writes back:

Yes but all we seem to teach students is significance testing where randomness is assumed.

How far can I get away with saying that t-tests, ANOVAs are ‘robust’ to violations of this assumption??

My reply:

One approach is to forget the t tests, F tests, etc., and instead frame problems as quantitative comparisons, predictions, and causal inferences (which are a form of prediction of potential outcomes). You get the confidence intervals, standard errors, etc. from a random sampling model that you recognize is an approximation. This all loops back to Phil’s recent discussion.
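As a rough illustration of framing a problem as a quantitative comparison, here is a minimal sketch that reports a difference between two groups with a standard error and an approximate interval, rather than a test result. The data and the function name `compare_means` are hypothetical, and the standard error comes from a simple random sampling model that is treated as an approximation, not as a description of how the data were collected.

```python
import math
import statistics


def compare_means(a, b):
    """Return (difference in means, standard error, rough 95% interval).

    The standard error assumes independent simple random samples,
    which is an approximation for most real data.
    """
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return diff, se, (diff - 2 * se, diff + 2 * se)


# Hypothetical ratings from two groups of survey respondents.
group_a = [4, 5, 6, 5]
group_b = [3, 4, 2, 3]

diff, se, interval = compare_means(group_a, group_b)
print(f"estimated difference: {diff:.2f} (s.e. {se:.2f})")
print(f"approximate 95% interval: ({interval[0]:.2f}, {interval[1]:.2f})")
```

The output is an estimate with its uncertainty, which keeps attention on the size of the comparison instead of on whether a threshold was crossed.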

