This post was provoked by a recent blog at the New Yorker by Gary Marcus and Ernest Davis. The topic of their blog was a criticism of Nate Silver's book, The Signal and the Noise. In the course of their critique, Marcus and Davis go through a case of the standard textbook example, involving diagnosis of breast cancer. They assume, for purposes of the example, that there is a well-known prior probability of breast cancer. They say,
A Bayesian approach is particularly useful when predicting outcome probabilities in cases where one has strong prior knowledge of a situation. ... But the Bayesian approach is much less helpful when there is no consensus about what the prior probabilities should be.

I think that the second sentence above is exactly wrong. The Bayesian approach is also helpful, perhaps uniquely helpful, when there is uncertainty in the prior. All we have to do is express the uncertainty in a mathematical model, and then let Bayesian inference tell us how to re-allocate credibility given the data. The present blog post is an explicit illustration of one way to do this.
Please note that this post is not defending or criticizing Nate Silver's book. This post is about what Marcus and Davis say about Bayesian methods, illustrated specifically by the case of disease diagnosis. This post is not about anything Silver does or doesn't say about Bayesian or non-Bayesian methods. My goal is to clarify what Bayesian methods can do, and specifically one way for expressing uncertainty in the case of disease diagnosis.
First, the standard textbook example, as provided by Marcus and Davis in their blog:
Suppose, for instance (borrowing an old example that Silver revives), that a woman in her forties goes for a mammogram and receives bad news: a “positive” mammogram. However, since not every positive result is real, what is the probability that she actually has breast cancer? To calculate this, we need to know four numbers. The fraction of women in their forties who have breast cancer is 0.014 [bold added], which is about one in seventy. The fraction who do not have breast cancer is therefore 1 - 0.014 = 0.986. These fractions are known as the prior probabilities. The probability that a woman who has breast cancer will get a positive result on a mammogram is 0.75. The probability that a woman who does not have breast cancer will get a false positive on a mammogram is 0.1. These are known as the conditional probabilities. Applying Bayes’s theorem, we can conclude that, among women who get a positive result, the fraction who actually have breast cancer is (0.014 x 0.75) / ((0.014 x 0.75) + (0.986 x 0.1)) = 0.1, approximately. That is, once we have seen the test result, the chance is about ninety per cent that it is a false positive.

In the case above, the prior probability (emphasized by bold font) is assumed to be exactly 0.014. This probability presumably came from some gigantic previous survey of women, using some extremely accurate assay of the disease, so that the prior probability is conceived as having no uncertainty.
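The arithmetic in the quoted example can be checked in a few lines. This is just a sketch of Bayes' rule with the fixed, certain prior; all the numbers are taken directly from the quote:

```python
# Textbook Bayes' rule with a fixed, certain prior (numbers from the quote).
prior = 0.014       # P(cancer): prior probability of breast cancer
sens = 0.75         # P(positive test | cancer)
false_pos = 0.10    # P(positive test | no cancer)

p_cancer_given_pos = (prior * sens) / (prior * sens + (1 - prior) * false_pos)
print(round(p_cancer_given_pos, 3))  # -> 0.096, i.e. about 0.1 as stated
```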
But what if there is uncertainty about the prior probability of the disease? What if, instead of a gigantic previous survey, there was only a small survey, or no previous survey at all? Bayesian methods handle this situation well. I've got to introduce a little mathematical notation here, but the ideas are straightforward. Let's denote the true probability of the disease in the population by the symbol θ (Greek letter theta). The standard example above stated that θ=0.014. But in Bayesian methods, we can put a distribution of relative credibility across the entire range of θ values from 0 to 1. Instead of assuming we believe only in the exact value θ=0.014, we say there is a spectrum of possibilities, and the prior knowledge expresses how certain we are in the various possible values of θ. The distribution across the range of θ values could be very broad --- that's how we express uncertainty. Or, the distribution could be sharply peaked over θ=0.014 --- that's how we express certainty.
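One concrete way to express such a spectrum is a Beta distribution over θ, whose concentration encodes how certain we are. A small sketch, where the specific Beta parameters are my illustrative choices (not from the post): both priors below have their mean near 0.014, but very different spreads.

```python
import math

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution over theta."""
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

# Sharply peaked prior, as if from a huge survey (~141 cases in ~10,000 women):
m1, s1 = beta_mean_sd(141, 9861)
# Broad prior with the same mean but far less underlying data:
m2, s2 = beta_mean_sd(1.4, 98.6)
print(m1, s1)  # mean ~0.0141, sd ~0.0012: near certainty that theta = 0.014
print(m2, s2)  # mean 0.014, sd ~0.0117: same best guess, much more uncertainty
```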
Here is the standard example in which there is high certainty that θ=0.014, with the positive test result denoted mathematically as y=1.
We can use Bayesian inference seamlessly when the prior is less certain. Suppose, for example, that the previous survey involved only 20 women. Then the prior is a broader distribution over possible underlying probabilities of the disease, and the result looks like this:
We can use Bayesian inference seamlessly even when the prior is hugely uncertain. Suppose that the previous survey involved only 2 women, which abstractly means merely that we know the disease can happen, but that's all we know. Then the prior distribution looks like this:
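The three cases can be computed with a simple grid approximation over θ. This is a sketch: the Beta priors below are my as-if stand-ins for the three surveys (counting each prior as a+b previous observations), and the test's hit and false-alarm rates are those from the quote.

```python
import math

SENS, FPR = 0.75, 0.10   # P(test+ | disease), P(test+ | no disease), from the quote

def p_disease_given_positive(a, b, n_grid=10_000):
    """P(disease | positive test) under a Beta(a, b) prior on the disease
    prevalence theta, by a grid approximation that averages over the
    uncertainty in theta rather than assuming one fixed value."""
    thetas = [i / n_grid for i in range(1, n_grid)]
    # Log of the (unnormalized) Beta density, shifted by its max to avoid underflow:
    logw = [(a - 1) * math.log(t) + (b - 1) * math.log(1 - t) for t in thetas]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    num = sum(wi * SENS * t for wi, t in zip(w, thetas))                    # ~ P(disease & test+)
    den = sum(wi * (SENS * t + FPR * (1 - t)) for wi, t in zip(w, thetas))  # ~ P(test+)
    return num / den

# Huge previous survey -> prior sharply peaked near theta = 0.014:
print(p_disease_given_positive(141, 9861))  # ~0.097, matching the textbook answer
# As-if survey of 20 women, say 1 with the disease (an illustrative choice):
print(p_disease_given_positive(1, 19))      # ~0.28: a much weaker conclusion
# As-if survey of 2 women, one with and one without the disease (uniform prior):
print(p_disease_given_positive(1, 1))       # ~0.88
```

The certain-prior case reproduces the textbook answer, while broader priors give very different, appropriately weaker posterior probabilities of disease: the machinery is the same throughout.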
In this post I've tried to show that Bayesian inference is seamlessly useful regardless of how much uncertainty there is in prior knowledge. The simplistic framing of the standard textbook example should not be construed as the only way to do Bayesian analysis. Indeed, the whole point of Bayesian inference is to express uncertainty and then re-allocate credibility when given new knowledge. If there is lack of consensus in prior knowledge, then the prior should express the lack of consensus, for example with a broad prior distribution as illustrated in the examples above. If different camps have different strong priors, then Bayesian analysis can tell each camp how they should re-allocate their idiosyncratic beliefs. With enough data, the idiosyncratic posterior distributions will converge, despite starting with different priors.
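The convergence claim can also be checked directly. For a Beta prior on the prevalence and survey data, the Bayesian update is the conjugate Beta-binomial rule; in this sketch (the camps' priors and the survey numbers are hypothetical), two camps with opposite strong priors nearly agree after one large survey:

```python
def update_beta(a, b, z, n):
    """Conjugate update: a Beta(a, b) prior on prevalence, after observing
    z cases of disease among n women, becomes Beta(a + z, b + n - z)."""
    return a + z, b + n - z

# Two camps with opposite strong prior beliefs about the prevalence theta:
camp1 = (2, 20)   # believes the disease is rare   (prior mean ~0.09)
camp2 = (20, 2)   # believes the disease is common (prior mean ~0.91)

# A large hypothetical survey: 140 cases among 10,000 women.
post1 = update_beta(*camp1, z=140, n=10_000)
post2 = update_beta(*camp2, z=140, n=10_000)
mean1 = post1[0] / sum(post1)   # ~0.0142
mean2 = post2[0] / sum(post2)   # ~0.0160
print(mean1, mean2)  # the posteriors nearly agree despite very different priors
```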
By the way, I am not aware of the sort of analysis I've provided above appearing elsewhere in the literature. But it's straightforward, and I imagine it must be "out there" somewhere. If any of you can point me to it, I'd appreciate it.
Please comment on the article here: Doing Bayesian Data Analysis