Continuing with posts in recognition of R.A. Fisher’s birthday, I post one from a few years ago on a topic that had previously not been discussed on this blog: Fisher’s fiducial probability.
[Neyman and Pearson] “began an influential collaboration initially designed primarily, it would seem to clarify Fisher’s writing. This led to their theory of testing hypotheses and to Neyman’s development of confidence intervals, aiming to clarify Fisher’s idea of fiducial intervals” (D.R. Cox, 2006, p. 195).
The entire episode of fiducial probability is fraught with minefields. Many say it was Fisher’s biggest blunder; others suggest it still hasn’t been understood. The majority of discussions omit the side trip to the Fiducial Forest altogether, finding the surrounding brambles too thorny to penetrate. Besides, a fascinating narrative about the Fisher-Neyman-Pearson divide has managed to bloom and grow while steering clear of fiducial probability–never mind that it remained a centerpiece of Fisher’s statistical philosophy. I now think that this is a mistake. It was thought, following Lehmann (1993) and others, that we could take the fiducial out of Fisher and still understand the core of the Neyman-Pearson vs Fisher (or Neyman vs Fisher) disagreements. We can’t. Quite aside from the intrinsic interest in correcting the “he said/he said” of these statisticians, the issue is intimately bound up with the current (flawed) consensus view of frequentist error statistics.
So what’s fiducial inference? I follow Cox (2006), adapting for the case of the lower limit:
We take the simplest example,…the normal mean when the variance is known, but the considerations are fairly general. The lower limit, [with Z the standard Normal variate, and M the sample mean]:
M0 – zc σ/√n
derived from the probability statement
Pr(μ > M – zc σ/√n ) = 1 – c
is a particular instance of a hypothetical long run of statements a proportion 1 – c of which will be true, assuming the model is sound. We can, at least in principle, make such a statement for each c and thereby generate a collection of statements, sometimes called a confidence distribution. (Cox 2006, p. 66).
For Fisher it was a fiducial distribution. Once M0 is observed, M0 – zc σ/√n is what Fisher calls the fiducial c per cent limit for μ. Making such statements for different c’s yields his fiducial distribution.
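To see how the limits for different c’s fit together into a single distribution, here is a minimal sketch in Python. It assumes the Normal case with known σ described above; the function names and the numbers (M0 = 10, σ = 2, n = 25) are hypothetical, chosen only for illustration. Under these assumptions the collection of limits traces out a Normal distribution centered at M0 with standard deviation σ/√n.

```python
from math import sqrt
from statistics import NormalDist


def fiducial_limit(m0, sigma, n, c):
    """Return M0 - z_c * sigma / sqrt(n), where z_c is the upper c point of N(0, 1)."""
    z_c = NormalDist().inv_cdf(1 - c)
    return m0 - z_c * sigma / sqrt(n)


def fiducial_level(mu_value, m0, sigma, n):
    """Return the level c whose limit equals mu_value.

    In this Normal, known-sigma case this is the CDF of N(M0, sigma^2 / n).
    """
    return NormalDist(m0, sigma / sqrt(n)).cdf(mu_value)


# Hypothetical observed mean, known sigma, and sample size (illustration only).
m0, sigma, n = 10.0, 2.0, 25

for c in (0.05, 0.25, 0.50, 0.75, 0.95):
    limit = fiducial_limit(m0, sigma, n, c)
    # Mapping the limit back through fiducial_level recovers c.
    print(f"c = {c:.2f}  limit = {limit:.3f}  level = {fiducial_level(limit, m0, sigma, n):.2f}")
```

The code only does the arithmetic of generating one limit per c. How the resulting distribution should be interpreted is a separate question.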
In Fisher’s earliest paper on fiducial inference in 1930, he sets 1 – c as 95 per cent. Start from the significance test of μ (e.g., μ < μ0 vs. μ > μ0) with significance level .05. He defines the 95 per cent value of the sample mean M, M.95, such that in 95% of samples M < M.95. In the Normal testing case, M.95 = μ0 + 1.65σ/√n. Notice M.95 is the cut-off for rejection in a .05 one-sided test T+ (of μ < μ0 vs. μ > μ0).
We have a relationship between the statistic [M] and the parameter μ such that M.95 is the 95 per cent value corresponding to a given μ. This relationship implies the perfectly objective fact that in 5 per cent of samples M > M.95. (Fisher 1930, p. 533; I use μ for his θ, M in place of T).
That is, Pr(M < μ + 1.65σ/√n) = .95.
The event M > M.95 occurs just in case μ0 < M − 1.65σ/√n.[i]
For a particular observed M0 , M0 − 1.65σ/√n is the fiducial 5 per cent value of μ.
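The equivalence of the two events, and the 5 per cent frequency, can be checked numerically. The following is a rough simulation sketch, not anything from Fisher’s paper. It assumes the Normal model with known σ, draws the sample mean directly from its sampling distribution, and uses hypothetical values (μ = 3, σ = 2, n = 16) that the simulation knows but an inquirer would not.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate(mu, sigma, n, trials, seed=0):
    """Return (share of samples with M > M.95, count of disagreements between the two events)."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)
    z = NormalDist().inv_cdf(0.95)  # about 1.645; the text rounds to 1.65
    m_95 = mu + z * se              # 95 per cent value of M for this mu
    exceed = 0
    disagreements = 0
    for _ in range(trials):
        m = rng.gauss(mu, se)                # one sample mean
        event_about_m = m > m_95             # M exceeds its 95 per cent value
        event_about_mu = mu < m - z * se     # mu lies below the fiducial 5 per cent value
        exceed += event_about_m
        disagreements += event_about_m != event_about_mu
    return exceed / trials, disagreements


# Hypothetical true mean, sigma, and sample size (illustration only).
share, disagreements = simulate(mu=3.0, sigma=2.0, n=16, trials=100_000)
print(f"share of samples with M > M.95: {share:.4f}")
print(f"samples where the two events disagree: {disagreements}")
```

The share should come out close to .05. Note that the frequency is computed over repeated draws of M with μ held fixed; each individual limit M − 1.65σ/√n either lies above μ or it does not.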
We may know as soon as M is calculated what is the fiducial 5 per cent value of μ, and that the true value of μ will be less than this value in just 5 per cent of trials. This then is a definite probability statement about the unknown parameter μ which is true irrespective of any assumption as to its a priori distribution. (Fisher 1930, p. 533; emphasis is mine).
This seductively suggests that μ < μ.05 gets the probability .05! But we know we cannot say that Pr(μ < μ.05) = .05.[ii]
However, Fisher’s claim that we obtain “a definite probability statement about the unknown parameter μ” can be interpreted in another way. There’s a kosher probabilistic statement about the pivot Z; it’s just not a probabilistic assignment to a parameter. Instead, a particular substitution is, to paraphrase Cox, “a particular instance of a hypothetical long run of statements 95% of which will be true.” After all, Fisher was abundantly clear that the fiducial bound should not be regarded as an inverse inference to a posterior probability. We could only obtain an inverse inference, Fisher explains, by considering μ to have been selected from a superpopulation of μ’s with known distribution. But then the inverse inference (posterior probability) would be a deductive inference and not properly inductive. Here, Fisher is quite clear, the move is inductive.
People are mistaken, Fisher says, when they try to find priors so that they would match the fiducial probability:
In reality the statements with which we are concerned differ materially in logical content from inverse probability statements, and it is to distinguish them from these that we speak of the distribution derived as a fiducial frequency distribution, and of the working limits, at any required level of significance, ….as the fiducial limits at this level. (Fisher 1936, p. 253).
So, what is being assigned the fiducial probability? It is, Fisher tells us, the “aggregate of all such statements…” Or, to put it another way, it’s the method of reaching claims to which the probability attaches. Because M and S (using Student’s T pivot) or M alone (where σ is assumed known) are sufficient statistics “we may infer, without any use of probabilities a priori, a frequency distribution for μ which shall correspond with the aggregate of all such statements … to the effect that the probability that μ is less than M – 1.65σ/√n is .05.” (Fisher 1936, p. 253)[iii]
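One way to picture the “aggregate of all such statements” is a simulation in which every statement concerns a different population. The sketch below is only an illustration under the Normal, known-σ model. The ranges from which μ, σ and n are drawn are hypothetical bookkeeping for the simulation; they are not a prior used in forming any of the statements.

```python
import random
from math import sqrt
from statistics import NormalDist


def aggregate_rate(trials, seed=1):
    """Share of statements in which the true mu falls below its fiducial 5 per cent value."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.95)  # about 1.645
    below = 0
    for _ in range(trials):
        # Hypothetical: a fresh population and sample size on every trial.
        mu = rng.uniform(-50.0, 50.0)
        sigma = rng.uniform(0.5, 5.0)
        n = rng.randint(5, 100)
        se = sigma / sqrt(n)
        m = rng.gauss(mu, se)        # observed sample mean for this trial
        fiducial_5 = m - z * se      # fiducial 5 per cent value of mu
        below += mu < fiducial_5
    return below / trials


print(f"share with mu below the fiducial 5 per cent value: {aggregate_rate(100_000):.4f}")
```

The share should sit near .05 however the populations are generated, since the calculation of each limit never uses the mechanism that produced μ. The .05 describes the run of statements as a whole.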
Suppose you’re Neyman and Pearson aiming to clarify and justify Fisher’s methods.
“I see what’s going on,” we can imagine Neyman declaring. There’s a method for outputting statements that take the general form
μ > M – zc σ/√n
Some would be in error, others not. The method outputs statements with a probability of 1 – c of being correct. The outputs are instances of a general form of statement, and the probability alludes to the relative frequency with which they would be correct, as given by the chosen significance or fiducial level c. Voila! “We may look at the purpose of tests from another viewpoint,” as Neyman and Pearson (1933) put it. Probability qualifies (and controls) the performance of a method.
There is leeway here for different interpretations and justifications of that probability, from actual to hypothetical performance, and from behavioristic to more evidential–I’m keen to develop the latter. But my main point here is that in struggling to extricate Fisher’s fiducial limits, without slipping into fallacy, they are led to the N-P performance construal. Is there an efficient way to test hypotheses based on probabilities? ask Neyman and Pearson in the opening of the 1933 paper.
Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong (Neyman and Pearson 1933, pp. 141-2/290-1).
At the time, Neyman thought his development of confidence intervals (in 1930) was essentially the same as Fisher’s fiducial intervals. Fisher’s talk of assigning fiducial probability to a parameter, Neyman thought at first, was merely the result of accidental slips of language, altogether expected in explaining a new concept. There was evidence that Fisher accepted Neyman’s reading. When Neyman gave a paper in 1934 discussing confidence intervals, seeking to generalize fiducial limits, but making it clear that the term “confidence coefficient” is not synonymous with the term probability, Fisher didn’t object. In fact he bestowed high praise, saying Neyman “had every reason to be proud of the line of argument he had developed for its perfect clarity. The generalization was a wide and very handsome one,” the only problem being that there wasn’t a single unique confidence interval, as Fisher had wanted (for fiducial intervals).[iv] Slight hints of the two in a mutual admiration society are heard, with Fisher demurring that “Dr Neyman did him too much honor” in crediting him for the revolutionary insight of Student’s T pivot. Neyman responds that of course in calling it Student’s T he is crediting Student, but “this does not prevent me from recognizing and appreciating the work of Professor Fisher concerning the same distribution” (Fisher comments on Neyman 1934, p. 137). For more on Neyman and Pearson being on Fisher’s side in these early years, see Spanos’s post.
So how does this relate to the current consensus view of Neyman-Pearson vs Fisher? Stay tuned.[v] In the meantime, share your views.
The next installment is here.
[i] (μ < M – zc σ/√n) iff M > M(1 – c), that is, iff M > μ + zc σ/√n
[ii] In terms of the pivot Z, the inequality Z > zc is equivalent to the inequality
μ < M – zc σ/√n
“so that this last inequality must be satisfied with the same probability as the first.” But the fiducial value replaces M with M0 and then Fisher’s assertion
Pr(μ > M0 – zc σ/√n) = 1 – c
no longer holds. (Fallacy of probabilistic instantiation.) In this connection, see my previous post on confidence intervals in polling.
[iii] If we take a number of samples of size n from the same or from different populations, and for each calculate the fiducial 5 per cent value for μ, then in 5 per cent of cases the true value of μ will be less than the value we have found. There is no contradiction in the fact that this may differ from a posterior probability. “The fiducial probability is more general and, I think, more useful in practice, for in practice our samples will all give different values, and therefore both different fiducial distributions and different inverse probability distributions. Whereas, however, the fiducial values are expected to be different in every case, and our probability statements are relative to such variability, the inverse probability statement is absolute in form and really means something different for each different sample, unless the observed statistic actually happens to be exactly the same.” (Fisher 1930, p. 535)
[iv] Fisher restricts fiducial distributions to special cases where the statistics exhaust the information. He recognizes “The political principle that ‘Anything can be proved with statistics’ if you don’t make use of all the information. This is essential for fiducial inference” (1936, p. 255). There are other restrictions to the approach as he developed it; many have extended it. There are a number of contemporary movements to revive fiducial and confidence distributions. For references, see the discussants on my likelihood principle paper.
[v] For background, search Fisher on this blog. Some of the material here is from my forthcoming book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP).
Cox, D. R. (2006), Principles of Statistical Inference. Cambridge.
Fisher, R.A. (1930), “Inverse Probability,” Mathematical Proceedings of the Cambridge Philosophical Society, 26(4): 528-535.
Fisher, R.A. (1936), “Uncertain Inference,” Proceedings of the American Academy of Arts and Sciences 71: 248-258.
Lehmann, E. (1993), “The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?” Journal of the American Statistical Association 88 (424): 1242–1249.
Neyman, J. (1934), “On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection,” Early Statistical Papers of J. Neyman: 98-141. [Originally published (1934) in The Journal of the Royal Statistical Society 97(4): 558-625.]
This material is now part of Section 5.8 in Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (Mayo 2018, CUP).