An interesting point came up in a comment thread the other day and you might have missed it, so I’ll repeat it here.
Dan Goldstein wrote to me:
Many times I’ve heard you say people should improve the quality of their measurements. Have you considered that people may be quite close to the best quality of measurement they can achieve?
Have you thought about the degree of measurement improvement that might actually be achievable?
And what that would mean for the quality of statistical inferences?
Competent psychophysicists are getting measurements that are close to the best they can reasonably achieve. Equipment that costs ten times more might only reduce error by one thousandth. It’s the variation between people that gets ya.
There are subfields where measurement is taken seriously. You mention psychophysics; other examples include psychometrics and old-fashioned physics and chemistry. In those fields, I agree that there can be diminishing returns from improved measurement.
What I was talking about are the many, many fields of social research where measurement is sloppy and noisy. I think the source of much of this is a statistical ideology that measurement doesn’t really matter.
The reasoning, I think, goes like this:
1. Measurement has bias and variance.
2. If you’re doing a randomized experiment, you don’t need to worry about bias because it cancels out in the two groups.
3. Variance matters because if your variance is higher, your standard errors will be higher and so you’ll be less likely to achieve statistical significance.
4. If your findings are statistically significant, then retroactively you can say that your standard error was not too high, hence measurement variance did not materially affect your results.
5. Another concern is that you were not measuring quite what you thought you were measuring. But that’s ok because you’ve still discovered something. If you claimed that Y is predicted from X, but what you actually measured was not X but Z, then you just change the interpretation of your finding: you’ve now discovered that Y is predicted from Z, and you still have a finding.
Put the above 5 steps together and you can conclude that as long as you achieve statistical significance from a randomized experiment, you don’t have to worry about measurement. And, indeed, I’ve seen lots and lots of papers in top journals, written by respected researchers, that don’t seem to take measurement at all seriously (again, with exceptions, especially in fields such as psychometrics that are particularly focused on measurement).
I’ve never seen steps 1-5 listed explicitly in the above form, but it’s my impression that this is the implicit reasoning that allows many, many researchers to go about their work without concern about measurement error. Their reasoning is, I think, that if measurement error were a problem, it would show up in the form of big standard errors. So when standard errors are big and results are not statistically significant, then they might start to worry about measurement error. But not before.
I think the apparent syllogism of steps 1-5 above is wrong. As Eric Loken and I have discussed, when you have noisy data, a statistically significant finding doesn’t tell you so much. The fact that a result is statistically significant does not imply that your measurement error was so low that your statistically significant finding can be trusted.
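Here is a minimal simulation sketch of that point, in Python. It runs the same two-group randomized experiment many times, once with no measurement error and once with a lot, and then looks only at the runs that reached statistical significance. All the specifics are illustrative assumptions, not taken from any real study: a true effect of 0.3, a between-person sd of 1, a measurement-error sd of 0 or 2, 100 people per group, a simple z-style cutoff of 1.96, and the function name simulate_experiments.

```python
import random


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance of a list of numbers."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def simulate_experiments(true_effect, person_sd, measurement_sd,
                         n_per_group=100, n_sims=1000, seed=0):
    """Simulate repeated randomized experiments (hypothetical setup).

    Each observed outcome is a person's true value plus independent
    measurement error, so its sd is sqrt(person_sd^2 + measurement_sd^2).
    """
    rng = random.Random(seed)
    total_sd = (person_sd ** 2 + measurement_sd ** 2) ** 0.5
    significant = []
    for _ in range(n_sims):
        control = [rng.gauss(0.0, total_sd) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, total_sd) for _ in range(n_per_group)]
        m_c, v_c = mean_and_variance(control)
        m_t, v_t = mean_and_variance(treated)
        estimate = m_t - m_c
        se = ((v_c + v_t) / n_per_group) ** 0.5
        # Keep only the estimates that pass the usual significance filter.
        if abs(estimate) > 1.96 * se:
            significant.append(estimate)
    if not significant:
        return {"share_significant": 0.0, "exaggeration": None, "wrong_sign": None}
    k = len(significant)
    return {
        # Fraction of simulated experiments reaching significance.
        "share_significant": k / n_sims,
        # Average size of a significant estimate relative to the true effect.
        "exaggeration": sum(abs(e) for e in significant) / k / true_effect,
        # Fraction of significant estimates pointing in the wrong direction.
        "wrong_sign": sum(1 for e in significant if e * true_effect < 0) / k,
    }


clean = simulate_experiments(true_effect=0.3, person_sd=1.0, measurement_sd=0.0)
noisy = simulate_experiments(true_effect=0.3, person_sd=1.0, measurement_sd=2.0)
print("no measurement error:   ", clean)
print("large measurement error:", noisy)
```

Under these assumptions the noisy design reaches significance less often, as step 3 says it should. But the significant results it does produce have to be large just to clear the bar, so on average they overstate the true effect by more than the significant results from the clean design do. Achieving significance does not retroactively certify the measurement, which is where step 4 goes wrong.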
If all of social and behavioral science were like psychometrics and psychophysics, I’d still have a lot to talk about, but I don’t think I’d need to talk so much about measurement error.
tl;dr: Measurement is always important and should always be emphasized, but in some fields there is already a broad recognition of the importance of measurement, and researchers in those fields don’t need me to harangue them about the importance of measurement. But even they often don’t mind that I talk about measurement so much, because they recognize researchers in other subfields are not always aware of the importance of measurement, with the unawareness arising perhaps from a misunderstanding of statistical significance and evidence.
Ummm, I guess I just violated the “tl;dr” principle by writing a tl;dr summary that itself was a long paragraph. That’s academic writing for ya! Whatever.