Continuum between anecdote and data

March 4, 2016

(This article was originally published at Statistics – John D. Cook, and syndicated at StatsBlogs.)

The difference between anecdotal evidence and data is overstated. People often have in mind this dividing line where observations on one side are worthless and observations on the other side are trustworthy. But there’s no such dividing line. Observations are data, but some observations are more valuable than others, and there’s a continuum of value.

Rib eye steak

I believe rib eye steaks are better for you than rat poison. My basis for that belief is anecdotal evidence. People who have eaten rib eye steaks have fared better than people who have eaten rat poison. I don’t have exact numbers on that, but I’m pretty sure it’s true. I have more confidence in that than in any clinical trial conclusion.

Hearsay evidence about food isn’t very valuable, per observation, but since millions of people have eaten steak for thousands of years, the cumulative weight of evidence is pretty good that steak is harmless if not good for you. The number of people who have eaten rat poison is much smaller, but given the large effect size, there’s ample reason to suspect that eating rat poison is a bad idea.

Now suppose you want to get more specific and determine whether rib eye steaks are good for you in particular. (I wouldn’t suggest trying rat poison.) Suppose you’ve noticed that you feel better after eating a steak. Is that an anecdote or data? What if you look back through your diary and noticed that every mention of eating steak lately has been followed by some remark about feeling better than usual. Is that data? What if you decide to flip a coin each day for the next month and eat steak if the coin comes up heads and tofu otherwise. Each of these steps is an improvement, but there’s no magical line you cross between anecdote and data.

Suppose you’re destructively testing the strength of concrete samples. There are better and worse ways to conduct such experiments, but each sample gives you valuable data. If you test 10 samples and they all withstand two tons of force per square inch, you have good reason to believe the concrete the samples were taken from can withstand such force. But if you test a drug on 10 patients, you can’t have the same confidence that the drug is effective. Human subjects are more complicated than concrete samples. Concrete samples aren’t subject to placebo effects. Also, cause and effect are more clear for concrete. If you apply a load and the sample breaks, you can assume the load caused the failure. If you treat a human for a disease and they recover, you can’t be as sure that the treatment caused the recovery. That doesn’t mean medical observations aren’t data.

Carefully collected observations in one area may be less statistically valuable than anecdotal observations in another. Observations are never ideal. There’s always some degree of bias, effects that can’t be controlled, etc. There’s no quantum leap between useless anecdotes and perfectly informative data. Some data are easy to draw inference from, but data that’s harder to understand doesn’t fail to be data.

Click to learn more about Bayesian statistics consulting

Please comment on the article here: Statistics – John D. Cook

Tags: , ,