(This article was originally published at Error Statistics Philosophy » Statistics, and syndicated at StatsBlogs.)

*Jean A. Miller, PhD*

*Department of Philosophy
Virginia Tech*

*MIX & MATCH MESS: A NOTE ON A MISLEADING DISCUSSION OF MAYO’S BIRNBAUM PAPER*

Mayo in her “rejected” post (12/27/12) briefly points out how Mark Chang, in his book *Paradoxes in Scientific Inference* (2012, pp. 137-139), took pieces from the two distinct variations she gives of Birnbaum’s argument, either of which shows the unsoundness of Birnbaum’s purported proof, and illegitimately combined them. He then mistakenly maintains that it is Mayo’s conclusions that are “faulty” rather than Birnbaum’s argument. In this note, I just want to fill in some of the missing pieces of what is going on here, so that others will not be misled. I put together some screen shots so you can read exactly what he wrote on pp. 137-139. (See also Mayo’s note to Chang on Xi’an’s blog here.)

First, let us be clear about what Mayo is doing in her Birnbaum paper. She is applying what philosophers call the principle of charity to Birnbaum’s arguments, which is to interpret an argument generously so as to put it in its best light, to make it as strong and convincing as possible. In plain English, she is giving Birnbaum’s argument its best shot. To follow what she is doing requires that we be clear on what a deductive argument is and what is needed to defeat one. A deductive argument is first and foremost characterized by validity, which simply means that the form of the argument (how the premises and conclusion link together) is such that if all the premises are true, then the conclusion cannot be false; it must also be true. But validity is not enough, for it only states that if the premises are true, then the conclusion cannot be false, and what is really desired is a sound argument: a valid one in which the premises are true. Thus there are two ways to attack a deductive argument: show that the premises cannot all be true, or show that the argument (form) is invalid. Success in either case means the truth of the conclusion is no longer guaranteed.
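The distinction between validity and soundness can be made concrete with a brute-force truth-table check. The following is only a minimal sketch: the helper names `implies` and `is_valid` are illustrative and do not come from Mayo’s or Chang’s texts. It treats an argument form as valid when no assignment of truth values makes every premise true and the conclusion false.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b'."""
    return (not a) or b


def is_valid(premises, conclusion, n_vars):
    """True if no truth assignment makes all premises true and the conclusion false."""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# Modus ponens: "if P then Q", "P", therefore "Q" (a valid form).
modus_ponens_valid = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: p],
    lambda p, q: q,
    2,
)

# Affirming the consequent: "if P then Q", "Q", therefore "P" (an invalid form).
affirming_consequent_valid = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: q],
    lambda p, q: p,
    2,
)
```

The check speaks only to form. Soundness additionally requires that the premises actually be true, which no truth table can settle.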

**Mayo’s Two Variations of Birnbaum’s argument for the Strong Likelihood Principle (SLP):**

Mayo’s first outline of Birnbaum’s argument for the SLP shows that, if his argument is rendered formally valid, then his premises cannot both be true at the same time; hence his argument is unsound. (If the premises are not true then the validity of form does not guarantee that the conclusion is true.) Mayo’s second interpretation allows both of the two premises to be true, but his conclusion still does not follow. Here the two premises are formulated as true conditional, or “if then” statements, but the conclusion may be false. This shows that his argument is invalid and thus again unsound. The problem with Chang’s response to Mayo’s interpretations (e.g., Mayo 2010; or recent updated paper here) is that he conflates these two separate variations into an interpretation that is not Mayo’s.

**Mayo’s First Interpretation:**

Mayo’s first variation shows that Birnbaum’s proof is *unsound* because “*both premises cannot be true at the same time* [because] the crucial term shifts its meaning in the two premises.” (Mayo rejected posts Dec 26, emphasis added.) What is the crucial term? The crucial term is the experiment “E-B” that Mayo shows has different meanings in Premise 1 and Premise 2. In Premise 1, “E-B” is a third unconditional model of the experiment, using the convex combination of the two possible sampling distributions of the two experiments (Birnbaum’s test statistic); whereas in Premise 2 it is modeled only on the sampling distribution of the experiment (Ej) actually performed.[i] In effect, Birnbaum has changed the experiment that was run, and so for any SLP violation, the distributions of the single experiments and the combined experiment (E-B) will differ (see Mayo paper, FN 15). So the two premises cannot both be true, since the two models, which are purportedly the same experiment (this is the slip in the crucial term Mayo is pointing out), give different results (lead to different sampling distributions). This then leads to different frequentist measures of evidence, while Birnbaum’s conclusion is that they give the same measure of evidence. Note, she is not claiming that Birnbaum’s argument is invalid here (she is generously putting it in valid form), only that both premises cannot be true at the same time, and hence his argument is unsound.

**Mayo’s Second (Generous) Interpretation:**

In her second rendition of Birnbaum’s argument, Mayo shows that Birnbaum’s *argument is unsound because it is invalid*. She shows that both premises can be formulated as “if then” (conditional) statements. When construed as conditional claims, both premises can be true, but the SLP cannot be validly inferred, since contradictory antecedents would need to hold. As Mayo points out, the formal invalidity is proved by any SLP violation since, in that case, the premises are true and the conclusion is false. The problem is that T-B is an unconditional statistic averaged over the sampling distributions of the convex combination of E’ and E”, while the other member of the SLP pair is the normal frequentist statistic based on the sampling distribution of E’ or E” but not both. But you cannot have your cake and eat it too; that is, you cannot both condition and not condition without contradiction. (However, these different experiments will (and ought to) give rise to different inferences about the evidence of the experiment.) As Mayo shows, “whenever we deal with an SLP violation pair, the two “if then” premises when true yield a false conclusion” (p. 22 of this paper). The two examples she gives of this (binomial and negative binomial; stopping rules) provide concrete instantiations of Birnbaum’s argument that simultaneously provide counterexamples to it.[ii]
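To see what an SLP violation pair looks like numerically, here is a minimal sketch of the binomial versus negative binomial case. The specific numbers (9 heads and 3 tails, null value 0.5, one-sided test) are a familiar textbook illustration assumed here for concreteness; they are not taken from Mayo’s paper, and the function names are hypothetical. The two experiments yield proportional likelihoods, yet their sampling distributions, and hence their p-values, differ.

```python
from math import comb

THETA_0 = 0.5        # assumed null value of the probability of heads
HEADS, TAILS = 9, 3  # assumed outcome, shared by both experiments


def binomial_pvalue(heads, n, theta0):
    """P(at least `heads` heads in n tosses), with n fixed in advance."""
    return sum(comb(n, k) * theta0**k * (1 - theta0)**(n - k)
               for k in range(heads, n + 1))


def negative_binomial_pvalue(heads, tails, theta0):
    """P(at least `heads` heads before the `tails`-th tail), tossing until that tail.

    Equivalent event: at most tails - 1 tails among the first heads + tails - 1 tosses.
    """
    m = heads + tails - 1
    return sum(comb(m, j) * (1 - theta0)**j * theta0**(m - j)
               for j in range(tails))


def binomial_likelihood(theta):
    return comb(HEADS + TAILS, HEADS) * theta**HEADS * (1 - theta)**TAILS


def negative_binomial_likelihood(theta):
    # The final toss must be the last tail, so only the first 11 tosses vary.
    return comb(HEADS + TAILS - 1, HEADS) * theta**HEADS * (1 - theta)**TAILS


p_binomial = binomial_pvalue(HEADS, HEADS + TAILS, THETA_0)
p_negative_binomial = negative_binomial_pvalue(HEADS, TAILS, THETA_0)

# The likelihood ratio is the same constant for every theta (proportional likelihoods).
likelihood_ratios = [binomial_likelihood(t) / negative_binomial_likelihood(t)
                     for t in (0.2, 0.5, 0.8)]
```

Under these assumed numbers the p-values are 299/4096 (about 0.073) for the binomial experiment and 67/2048 (about 0.033) for the negative binomial one, while the likelihood ratio is 4 at every value of theta. The SLP would have the two outcomes count as evidentially equivalent, whereas the frequentist measures differ.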

**Mixing Apples & Oranges: Where Does Chang Go Wrong?**

Chang takes premise 1 from Mayo’s first variation and combines it with premise 2 and her conclusion from her second variation of Birnbaum’s argument, and then claims the resulting interpretation is faulty, that premise 1 makes no sense. Well, of course it doesn’t, as it is not the premise she used for making that conclusion.

Remember, Mayo’s first interpretation shows that Birnbaum’s two premises cannot both be true at the same time under the same experiment (it was by changing the experiment that Birnbaum’s proof succeeded); while her second argument showed that even if we allowed for both his premises to be true, done by reformulating them into conditional statements,[iii] then Birnbaum’s proof still failed as it was invalid. The two true premises lead to a false conclusion whenever we have SLP violations. The antecedents of the two now “true” premises led to a contradiction, but this has nothing to do with sufficiency as Chang discusses (pg. 128):

Mayo states: “The problem is that, even by allowing all the premises to be true, the conclusion could follow only if it is assumed that you both should and should not use the conditional formulation. The antecedent of premise (1) is the denial of the antecedent of premise (2).”

However, Mayo’s disproof is faulty because her presumption about the antecedent of premise (1) in Birnbaum’s argument is odd. A sufficient statistic is sufficient for a FAMILY of distributions with different values of the parameter; such a family of distributions often consists of the distributions under Ho and Ha . …(Chang 138)

But the conditional statement Mayo is talking about is not the statement about sufficiency of the Birnbaum statistic T-B in premise 1 of argument 1; it is instead about the *relationship* between the two antecedents in the “true” conditional statements in argument 2, which are:

PR 1 antecedent: Infr_{E-B}(x’*) is computed unconditionally, averaging over the sampling distribution of T-B

PR 2 antecedent: Infr_{E-B}(E^{j}, x^{j}*) is computed conditionally, using the sampling distribution of E^{j}

And these two antecedents (even though the conditional statements of which each *is a part* are both true) contradict one another: the inference which Birnbaum wants to make cannot be both unconditional and conditional. Mayo’s point in drawing our attention to the contradiction between the two antecedents matters because Birnbaum needs the two consequents of both “if then” statements to be detached, and a consequent can be detached only when its antecedent holds.
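The logical shape of this point can be sketched with a small truth-table check. This is a schematic rendering assumed for illustration, not Mayo’s own formalization: `u` stands for “the inference is computed unconditionally” (so `not u` stands for “computed conditionally”), while `q` and `r` stand for the consequents of premises 1 and 2. The helper names are hypothetical.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b'."""
    return (not a) or b


def counterexamples(premises, conclusion, n_vars):
    """Assignments that make every premise true and the conclusion false."""
    return [v for v in product([True, False], repeat=n_vars)
            if all(p(*v) for p in premises) and not conclusion(*v)]


def satisfiable(premises, n_vars):
    """True if some assignment makes all premises true together."""
    return any(all(p(*v) for p in premises)
               for v in product([True, False], repeat=n_vars))


# The two "if then" premises and the conclusion that needs both consequents.
conditionals = [
    lambda u, q, r: implies(u, q),      # premise 1: if unconditional, then q
    lambda u, q, r: implies(not u, r),  # premise 2: if conditional, then r
]
both_consequents = lambda u, q, r: q and r

# Conditionals alone: both can be true while the conclusion is false (invalid).
witnesses = counterexamples(conditionals, both_consequents, 3)

# Adding both antecedents so the consequents can be detached: no counterexample
# remains, but only because the premises can no longer all be true together.
with_antecedents = conditionals + [lambda u, q, r: u, lambda u, q, r: not u]
jointly_true_possible = satisfiable(with_antecedents, 3)
remaining = counterexamples(with_antecedents, both_consequents, 3)
```

On this schematic reading, the first half mirrors the second variation (true premises, invalid inference) and the second half mirrors the first variation (valid form, premises that cannot all be true).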

So Chang is mistaken; the problem is not that “Frequentists don’t believe the Likelihood Principle because they don’t believe conditionality…” (p. 139). Instead, for frequentists, as Mayo puts it: “The principles of evidence SP and WCP hold within a given statistical model of experiment E.” The problem (rejections) occurs when the SP and WCP are “[a]pplied simultaneously to two opposed models of an experiment, [then] conflicting results are just what should be expected” (p. 30). *In sum, the problem is not with frequentists accepting SP or WCP but with applying them simultaneously across different models of the experiment, just as Chang’s problem is applying premises across different renditions!*

**References:**

Chang, M. (2012), Paradoxes in Scientific Inference, Chapman and Hall/CRC. Kindle Edition. (Relevant pages can be found here.)

Mayo, D. G. (2012). A recent update of the Birnbaum paper: “On the Birnbaum Argument for the Strong Likelihood Principle.”

Mayo, D. G. (2010). “An Error in the Argument from Conditionality and Sufficiency to the Likelihood Principle” in *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science* (D. Mayo and A. Spanos, eds.), Cambridge: Cambridge University Press: 305-14.

[i] *In Premise 1*, E-B is modeled as reporting the value of sufficient statistic T-B in the ENLARGED experiment. Birnbaum is using the convex combination of the two sampling distributions (E’ and E”).

*In Premise 2*, Birnbaum is replacing the above with the inference from Ej, Xj* modeled in terms of the sampling distribution of Ej ALONE.

The inferences related to each measure using sampling distributions (as sampling theorists do) cannot both be true simultaneously. That is, the distribution of T-B (Premise 1) does not equal the distribution of Ej (Premise 2), so both cannot be true. Thus his argument is UNSOUND.

[ii] This shift in crucial terms can be illustrated intuitively using the US tax code, as Mayo has also pointed out (12/25/11 post). Married couples have two choices for filing their taxes. They can file “married jointly,” where their incomes are combined. In this case, one can calculate the tax responsibility for each spouse: there is a single amount. However, they can also file “married separately,” where each is responsible for paying their portion of the income. Assume the two would not owe the same if they were filing married separately. So the tax responsibility will be different for each person *in each case depending on the method used* to file, and so statements about a person’s tax liability will not be simultaneously true under each scenario. Again, just like Birnbaum’s experiment E-B, even though the statement “Married Person P has tax liability X” uses the same words, the different methods used for computing owed taxes make for different referents (e.g., tax column/experiment run) for the computation.

Both premises can be true (see end note iii below) and the conclusion false. That is:

If Person P’ files married jointly, his/her liability is the same as his/her spouse P”.

If Person P’ files married separately, his/her liability is $A, and if P” files married separately, his/her liability is $B.

Therefore $A = $B.

Birnbaum’s argument in effect infers that $A = $B; whereas we have started with them being different. So the argument is invalid. (See also Mayo’s 12/25/11 post for the “logical set-up.”)
