(This article was originally published at Error Statistics Philosophy » Statistics, and syndicated at StatsBlogs.)

Our current topic, the strong likelihood principle (SLP), was recently mentioned by blogger Christian Robert (nice diagram). So, since it’s Saturday night, and given the new law just passed in the state of Washington*, I’m going to reblog a post from Jan. 8, 2012, along with a new UPDATE (following a **video** we include as an experiment). The new material will be in red (slight differences in notation are explicated within links).

**(A)** “It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin[i], or else to embrace the strong likelihood principle which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained. This is a false dilemma … The ‘dilemma’ argument is therefore an illusion”. (Cox and Mayo 2010, p. 298)

The “illusion” stems from the sleight of hand I have been explaining in the Birnbaum argument—it starts with Birnbaumization.

**(B)** A reader wrote in that he awaits approval of my argument by either Sir David Cox or Christian Robert; I cannot vouch for Robert, unless he has revised his first impression in his October 6, 2011 blog (as I hope he has). For in that blog post Robert says

“If Mayo’s frequentist stance leads her to take the sampling distribution into account at all times, this is fine within her framework. But I do not see how this argument contributes to invalidate Birnbaum’s proof.”

[See UPDATE BELOW]

I am taking sampling distributions into account because Birnbaum’s “proof” is supposed to be relevant for a sampling theorist! If it is not relevant for a sampling theorist (my error statistician) then there is no “breakthrough” and there is no special interest in the result (given that Bayesians already have the LP, as do the likelihoodists).[ii] It is only because principles that are already part of the sampling theorist’s steady diet are alleged to entail the LP (in Birnbaum’s argument) that Savage declared that, once made aware of Birnbaum’s result, he doubted people would stop at the LP appetizer, but would instead go all the way to consuming the full Bayesian omelet! (For the Savage reference, see my new **PAPER** or the “Breaking through the Breakthrough” posts of **Dec. 6 & Dec. 7, 2011**.)

Robert’s remark is just [an example] that reveals a deep misunderstanding of sampling theory. (Although I prefer error statistics, I will use sampling theory for this post.) Even if Robert has corrected himself, as I very much hope he has, other readers may be under the same illusion. I had paused to clarify this point in my October 20, 2011 post.

**(C)** Likelihood Principle Violations

My Oct. 20 post was devoted to arguing that it is impossible to understand the whole issue without understanding how it is that frequentist sampling theory violates the LP. That it does so is not a point of controversy, so far as I know:

As Lindley (1971) stresses:

“.. sampling distributions, significance levels, power, all depend on something more [than the likelihood function]–something that is irrelevant in Bayesian inference–namely the sample space” (Lindley p. 436).

He means, once the data are known the sample space is irrelevant for appraisal. (The LP already assumes the statistical model underlying the likelihood is given or not in question.) Or, more recently, take Kadane 2011:

“Significance testing violates the Likelihood Principle, which states that, having observed the data, inference must rely only on what happened, and not on what might have happened but did not. The Bayesian methods explored in this book obey this principle” (Kadane, 439).

“Like their testing cousins, confidence intervals and sets violate the likelihood principle” (ibid. 441).

So it’s hard to see how Robert can really mean to say that sampling distribution considerations are irrelevant, when they are the heart and centerpiece of the Birnbaum argument. Far from being irrelevant, Birnbaum’s result is all about sampling distributions (even if addressed by someone who is not herself a sampling theorist!)
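To make the idea of an LP violation concrete, here is a minimal sketch of a standard textbook illustration that is not part of the original post: 3 successes in 12 Bernoulli trials, obtained either with the number of trials fixed in advance (binomial) or by sampling until the third success (negative binomial). The function names, the one-sided test of θ = 0.5 against θ < 0.5, and the numbers are all assumptions chosen for illustration.

```python
from math import comb

def binomial_likelihood(theta, n=12, x=3):
    # Probability of x successes in a fixed number n of trials.
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def negbinomial_likelihood(theta, r=3, n=12):
    # Probability that the r-th success arrives exactly on trial n.
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

def binomial_p_value(n=12, x=3, theta0=0.5):
    # P(X <= x) under theta0: x or fewer successes in n trials.
    return sum(comb(n, k) * theta0**k * (1 - theta0)**(n - k)
               for k in range(x + 1))

def negbinomial_p_value(r=3, n=12, theta0=0.5):
    # P(N >= n) under theta0: at most r-1 successes in the first n-1 trials.
    m = n - 1
    return sum(comb(m, k) * theta0**k * (1 - theta0)**(m - k)
               for k in range(r))

# The likelihood ratio is the same constant for every theta,
# so the two likelihood functions are proportional.
ratios = [binomial_likelihood(t) / negbinomial_likelihood(t)
          for t in (0.1, 0.3, 0.5, 0.7)]

p_binomial = binomial_p_value()
p_negbinomial = negbinomial_p_value()

print("likelihood ratios:", ratios)
print("binomial p-value:", p_binomial)
print("negative binomial p-value:", p_negbinomial)
```

Under these assumptions the likelihoods are proportional (ratio 4 at every θ), yet the two p-values come out at roughly 0.073 and 0.033, so the pair of outcomes satisfies the conditions of the LP while the sampling-theory assessments differ.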

**(D)** Now to consider what Robert says in his Oct. 2011 post, with my remarks following:

**Robert**: “The core of Birnbaum’s proof is relatively simple: given two experiments *E’* and *E”* about the same parameter *θ* with different sampling distributions *f¹* and *f²*, such that there exists a pair of outcomes *(y’, y”)* from those experiments with proportional likelihoods, one considers the mixture experiment where *E’* and *E”* are each chosen with probability ½.

Then it is possible to build a sufficient statistic *T* that is equal to the data *(j,z)*, except when *j=2* and *z=y”*, in which case *T(j,z)=(1,y’)*.”

**Mayo:** Put more informally, if y’ and y” form any LP violation pair (i.e., the two would yield different inferences/assessments of the evidence due to the difference in sampling distributions), then it is possible to “build” a statistic T for interpreting them such that y” (from E”) is always reported as y’ from E’.[iii] I called this Birnbaum’s statistic T-BB.[iv] It is possible, in short, to Birnbaumize the result (E’, y’) whenever there is an experiment E”, not performed, that could have resulted in y”, with a proportional likelihood (with the same parameter under investigation and the model assumptions granted).

**Robert**: “This statistic [T-BB] is sufficient”.

**Mayo**: Yes, T-BB is sufficient for an experiment that will report its inference based on the rules of Birnbaumization: The sampling distribution of T-BB is to be the convex combination of the sampling distributions of E’ and E” whenever confronted with an outcome that has an LP violation pair (for more details see posts from Dec. 6, 7, and references within).[v] Cox rightly questions even this first step, but I’m prepared to play along since the “proof” breaks down anyway.[vi]

It should be emphasized that in carrying out this Birnbaumization, one is not free from considering the accompanying sampling distribution (corresponding to the statistic T-BB just “built”): the Birnbaumization move *depends* on having a single sampling distribution (otherwise sufficiency would not apply)[vii].
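The two ingredients just described can be sketched in a few lines. This is an illustration under stated assumptions, not code from any of the papers discussed: the labels `Y_PRIME` and `Y_DOUBLE_PRIME`, the function names, and the two p-values are hypothetical, and the p-value is used only as one example of an inference that depends on the sampling distribution.

```python
from fractions import Fraction

# Hypothetical labels for the two outcomes of an LP violation pair.
Y_PRIME = "y'"
Y_DOUBLE_PRIME = 'y"'

def t_bb(j, z):
    """Birnbaum's statistic as Robert describes it: the identity on (j, z),
    except that (2, y'') is reported as (1, y')."""
    if (j, z) == (2, Y_DOUBLE_PRIME):
        return (1, Y_PRIME)
    return (j, z)

def mixture_p_value(p_prime, p_double_prime, weight=Fraction(1, 2)):
    """P-value computed from the convex combination of the two sampling
    distributions, with E' chosen with probability `weight`."""
    return weight * p_prime + (1 - weight) * p_double_prime

# Illustrative p-values for y' in E' and y'' in E'' (assumed numbers).
p_prime = Fraction(299, 4096)
p_double_prime = Fraction(67, 2048)
p_bb = mixture_p_value(p_prime, p_double_prime)

print(t_bb(2, Y_DOUBLE_PRIME))  # reported as the E' outcome
print(float(p_prime), float(p_double_prime), float(p_bb))
```

With these assumed inputs the Birnbaumized p-value lies strictly between the two experiment-specific p-values and equals neither, which is the sense in which the T-BB report carries its own single sampling distribution.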

While Robert switches our Infr_{E}(z) notation (Cox and Mayo 2010) to Birnbaum’s Ev(E, z), I will go ahead and leave it as Ev. Infr_{E} was deliberately designed to be clearer, easier to read, and less likely to hide the very equivocation that is overlooked in this example.

Robert observes:

Whether j = 1 or j = 2, Ev(E-BB, (j, z)) = Ev(E-BB, T(j,z))

This corresponds to my premise (1):

(1) Infr_{E-BB}(E’, y’) = Infr_{E-BB}(E”, y”)

In the relevant case, y’ and y” are LP violation pairs, since only those pose the threat to obeying the LP. So we can focus just on those in this note. In Mayo 2010 I used the * to indicate an outcome is part of an LP violation pair.

**(E)** Next Robert gives premise (2), though he switches the order: this corresponds to two applications of weak conditionality (WCP) [combining my 2a and 2b]:

(2) Whether j = 1 or j = 2, Ev(E-BB, (j, z)) = Ev(E^{j}, z)
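For orientation, the argument under discussion can be laid out schematically in the Infr notation. This is a sketch assembled from premise (1) and the two applications of the WCP mentioned above; the precise form given to (2a) and (2b) is an assumption here, not a quotation.

```latex
\begin{align*}
\text{(1)}\quad   & \mathrm{Infr}_{E\text{-}BB}(E', y') = \mathrm{Infr}_{E\text{-}BB}(E'', y'') \\
\text{(2a)}\quad  & \mathrm{Infr}_{E\text{-}BB}(E', y') = \mathrm{Infr}_{E'}(y') \\
\text{(2b)}\quad  & \mathrm{Infr}_{E\text{-}BB}(E'', y'') = \mathrm{Infr}_{E''}(y'') \\
\text{(SLP)}\quad & \mathrm{Infr}_{E'}(y') = \mathrm{Infr}_{E''}(y'')
\end{align*}
```

The conclusion (SLP) follows only if the left-hand sides of (2a) and (2b) are the very same quantities that appear in (1). The dispute that follows is over whether that can hold once (1) is read with the convex-combination sampling distribution of E-BB.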

The key issue concerns a quote from me (with Robert’s substitutions of Ev for Infr). Note, by the way, that Robert is alluding to my chapter in Mayo 2010, not the short version that I posted on this blog, Dec. 6, 7.

**Robert**: “Now, Mayo argues this is wrong because [it asserts that]:

‘[the mixed experiment E-BB] is appropriately identified with an inference from outcome y^{j} based on the sampling distribution of E^{j}, which is clearly false’” (p. 310).

*(continuing Robert’s quote of me):*

“‘The sampling distribution to arrive at Ev(E-BB, (j, y^{j})) would be the convex combination averaged over the two ways that y^{j} could have occurred. This differs from the sampling distributions of both Ev(E’, y’) and Ev(E”, y”)’. This sounds to me like a direct rejection of the conditionality principle, so I do not understand the point.” (Robert, Oct. 6, 2011 post, p. 310)

**Mayo**: I am not at all rejecting the WCP. The passage Robert quotes merely states the obvious; namely, the assertion: the inference computed using the sampling distribution of E-BB is identical to the inference using the sampling distribution of E’ by itself (or E” by itself)—is false! If we are playing Birnbaumization, then the appropriate sampling distribution is the convex combination. (In the section from which Robert is quoting, a reader will note, I have put Birnbaum’s argument in valid form.)

But wait a minute, just a few lines later it turns out Robert does *not* deny my claim! He repeats it as obviously true, …..but suddenly it has become irrelevant.

**Robert**: “Indeed, and rather obviously, the sampling distribution of the evidence *Ev(E^{\*}, z^{\*})* will differ depending on the experiment. But this is not what is stated by the likelihood principle, which is that the inference itself should be the same for *y’* and *y”*, not the [sampling?] distribution of this inference” (Robert, p. 310).

**Mayo**: Huh? This makes no sense. There is no inference apart from the sampling distribution for a sampling theorist. One cannot assume there is somehow an inference apart from the sampling distribution. Sampling theory has simply not been understood. Robert’s own rendition of the argument [my Premise 1] depends on a merged sampling distribution, thanks to Birnbaumization; it certainly does not ignore sampling distributions. So I’m afraid I don’t know what Robert is talking about here. (This same point arose in the discussion by Aris Spanos when Robert’s post first appeared.)

Robert [seems to?] go on to deny there are any LP counterexamples, because they all turn on pointing up the difference in sampling distributions! All I can do at this point is go back to where I began: listen to Birnbaum, Kadane, Lindley, Savage and everyone else who has discussed the (uncontroversial) fact that error statistics violates the LP! No one would be claiming sampling theory was incoherent were it not that it is prepared to reach different inferences from y’, y” despite their having proportional likelihoods (i.e., despite the conditions for the LP being met), and it does so solely because of a difference in sampling distributions.[viii] [ix]

Kadane, J. (2011), *Principles of Uncertainty*, CRC Press.

Mayo: 10/20/2011 Post: blogging-likelihood-principle-2

* The title is a distant analogue to that song “Don’t Bogart that chalk my friend, pass it on to me”.

**UPDATE**: DECEMBER 8, 2012: Christian Robert writes, on his Nov. 30 blog, that “[a]fter reading again Birnbaum’s proof, while sitting down in a quiet room….I do not see any reason to doubt it.” His confusion, he says, “was caused by mixing *sufficiency* in the sense of Birnbaum’s mixed experiment with *sufficiency* in the sense” of his Bayesian model selection method. The point, I take it, is that Birnbaum’s proof doesn’t go through for this method and thus it needn’t obey the SLP.

**But it doesn’t go through for sufficiency in the sense of sampling theory either!** (at least not together with the additional premise needed to detach the SLP.) In fact, I argue, it would only hold for a sense of sufficiency that assumes “SLP pairs” are evidentially equivalent for informative inference (for definitions see several previous discussions). That is just to make its appeal in a “proof” for the SLP entirely circular, as it is.

I haven’t yet seen the book to which Robert is alluding (*Paradoxes in Scientific Inference*)—tried to Kindle it, didn’t work–, but I’ve no reason to doubt his claim that the author has really mixed things up[i]. In Robert’s Nov. 23, 2012 post (reviewing *Paradoxes*) he hints that he came up with “another interpretation of Mayo’s argument that could prove her right!”. He directs the reader to the later, Nov. 30, post which, if I’m understanding it, appears to go back to his initial belief in Birnbaum(?)

“The chapter on statistical controversies actually focus on the opposition between frequentist, likelihood, and Bayesian paradigms. The author seems to have studied Mayo and Spanos’ *Error and Inference* to great lengths. (As I did, as I did!) He spends around twenty pages in Chapter 3 on this opposition and on the conditionality, sufficiency, and likelihood principles that were reunited by Birnbaum and recently deconstructed by Mayo. In my opinion, Chang makes a mess of describing the issues at stake in this debate and leaves the reader more bemused at the end than at the beginning of the chapter. For instance, the conditionality principle is confused with the p-value being computed conditional on the null (hypothesis) model (p.110).”

Chang, M. (2012), *Paradoxes in Scientific Inference*, Chapman and Hall.

**y** in a sampling theory experiment E by means of the abbreviation Infr_{E}(**y**), we assume, for simplicity, that packed into E would be the probability model, parameters, and the sampling distribution corresponding to the inference in question. We prefer it because it underscores the need to consider the associated methodology and context. Birnbaum construes Ev(E, **x**) as “the evidence about the parameter arising from experiment E and result **x**” and allows it to range over the inference, conclusion, or report, including p-values, confidence intervals and levels, posteriors. So our notation accomplishes the same, but with (hopefully) less chance of equivocations.
