# Yes, you can do statistical inference from nonrandom samples. Which is a good thing, considering that nonrandom samples are pretty much all we’ve got.

December 13, 2017

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Luiz Caseiro writes:

1. P-values and Confidence Intervals are used to draw inferences about a population from a sample. Is that right?

2. As far as I have found, standard statistical software usually computes confidence intervals (CI) and p-values assuming that we have a simple random sample. Is that right?

3. If we have another kind of representative sample, different from a simple random sample (i.e. a complex sample), we should take into account our sample design before calculating CI and p-values. Is that right?

4. If we do not have a representative sample, as is often the case in political science (especially when the sample is a convenience sample, made up of the countries for which data are available), would it not be irrelevant and even misleading to report CI and p-values?

This question comes up from time to time (for example, in 2009, 2011, 2014, and 2014), so I’m well prepared to reply to this one.

In response to Caseiro: Yes, the starting point in statistical theory is the assumption of simple random sampling, but there are methods for dealing with stratified samples, cluster samples, etc. There are textbooks on this and statistical packages that do it. If you have a convenience sample, it’s still a good idea to report standard errors etc.; you just need to make assumptions.
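As a rough illustration of what those design-aware methods do, here is a minimal Python sketch of the textbook stratified estimator of a mean and its standard error, next to the naive calculation that treats the same data as a simple random sample. The function names and the toy data are invented for this example, the population shares of the strata are assumed to be known, and the finite-population correction is ignored.

```python
import math
from statistics import mean, variance


def srs_mean_se(values):
    """Mean and standard error, treating the data as a simple random sample."""
    n = len(values)
    return mean(values), math.sqrt(variance(values) / n)


def stratified_mean_se(strata):
    """Mean and standard error for a stratified sample.

    `strata` is a list of (population_share, sample_values) pairs.
    Shares are assumed known and to sum to 1; each stratum needs at
    least two observations so its sample variance is defined.
    """
    estimate = sum(share * mean(ys) for share, ys in strata)
    var = sum(share ** 2 * variance(ys) / len(ys) for share, ys in strata)
    return estimate, math.sqrt(var)


# Toy data (hypothetical): two equally sized strata with very different levels.
strata = [(0.5, [1, 2, 3]), (0.5, [10, 11, 12])]
pooled = [y for _, ys in strata for y in ys]

print("naive SRS: ", srs_mean_se(pooled))
print("stratified:", stratified_mean_se(strata))
```

In this toy case the two point estimates agree, but the naive standard error is much larger, because it counts the between-stratum difference as sampling noise even though the design fixed how many units came from each stratum.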

Caseiro follows up:

1. If I have a convenience sample, is the assumption that I need to make when reporting standard errors, CI, etc. that my convenience sample is not very different from a random sample? This sounds like a very strong assumption.

2. Would it not be more accurate to just say that I cannot reach external validity from my sample?

3. If I do not claim external validity, then the standard errors become unnecessary?

My reply: It all depends on what questions you want to answer. If you simply want to describe the data you have, go for it. But usually we gather data to understand something about unobservables or to make predictions about new situations. In that case, you’ll have to make some assumptions. If your data are weak, your assumptions need to be correspondingly stronger.

To put it another way: Sure, it’s fine to say that you “cannot reach external validity” from your sample alone. But in the meantime you still need to make decisions. We don’t throw away the entire polling industry just cos their response rates are below 10%; we work on doing better. Our samples are never perfect but we can make them closer to the population.
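One common way of making a sample "closer to the population" is poststratification: estimate the outcome within each demographic cell, then weight the cells by their known population shares. The sketch below is only an illustration under stated assumptions. The function name, cell labels, and data are hypothetical, the population shares are assumed to come from an outside source such as a census, and the key (strong) assumption is that respondents within a cell resemble the people in that cell who did not respond.

```python
from collections import defaultdict


def poststratified_mean(responses, population_shares):
    """Reweight cell means to known population shares.

    `responses` is a list of (cell, outcome) pairs from the sample;
    `population_shares` maps each cell to its share of the population
    (assumed known and summing to 1).
    """
    by_cell = defaultdict(list)
    for cell, outcome in responses:
        by_cell[cell].append(outcome)

    missing = set(population_shares) - set(by_cell)
    if missing:
        # With no respondents in a cell, this simple estimator is undefined;
        # a model would be needed to fill the gap.
        raise ValueError(f"no respondents in cells: {sorted(missing)}")

    return sum(
        share * sum(by_cell[cell]) / len(by_cell[cell])
        for cell, share in population_shares.items()
    )


# Hypothetical convenience sample that over-represents the "old" cell.
responses = [("young", 1), ("old", 0), ("old", 0), ("old", 1)]
shares = {"young": 0.5, "old": 0.5}

raw = sum(y for _, y in responses) / len(responses)
print("raw sample mean:     ", raw)
print("poststratified mean: ", poststratified_mean(responses, shares))
```

The adjustment is only as good as the within-cell assumption, which is the sense in which weaker data call for stronger assumptions.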

Remember the Chestertonian principle that extreme skepticism is a form of credulity.
