Comments on Wasserman’s “what is Bayesian/frequentist inference?”

November 20, 2012

(This article was originally published at Error Statistics Philosophy » Statistics, and syndicated at StatsBlogs.)

What I like best about Wasserman’s blogpost (Normal Deviate) is his clear denial that merely using conditional probability makes the method Bayesian (even if one chooses to call the conditional probability theorem Bayes’s theorem, and even if one is using ‘Bayes’s’ nets). Else any use of probability theory is Bayesian, which trivializes the whole issue. Thus, the fact that conditional probability is used in an application with possibly good results is not evidence of (yet another) Bayesian success story [i].

But I do have serious concerns that in his understandable desire (1) to be even-handed (hammers and screwdrivers are for different purposes, both perfectly kosher tools), as well as (2) to give a succinct sum-up of methods, Wasserman may encourage misrepresenting positions. Speaking only for “frequentist” sampling theorists [ii], I would urge moving away from the recommended quick sum-up of “the goal” of frequentist inference: “Construct procedures with frequency guarantees”. If by this Wasserman means that the direct aim is to have tools with “good long run properties”, tools that rarely err in some long-run series of applications, then I think it is misleading. In the context of scientific inference or learning, such a long-run goal, while necessary, is not at all sufficient; moreover, I claim that satisfying this goal is actually just a byproduct of deeper inferential goals (controlling and evaluating how severely given methods are capable of revealing/avoiding erroneous statistical interpretations of data in the case at hand). (So I deny that it is even the main goal to which frequentist methods direct themselves.) Even arch behaviorist Neyman used power post-data to ascertain how well corroborated various hypotheses were, never mind long-run repeated applications (see one of my Neyman’s Nursery posts).

It is true that frequentist methods should have good error probabilities, computed with an appropriate sampling distribution (hence, they are often called “sampling theory”). Already this is different from merely saying that their key aim is “frequency guarantees”. Listening to Fisher, to Neyman and Pearson, to Cox and others, one hears of very different goals. (Again, I don’t mean merely that there are other things they care about, but rather that long-run error goals are a byproduct of satisfying other, more central goals regarding the inference at hand!) One will hear from Fisher that there are problems of “distribution” and of “estimation”, and that a central goal is “reduction” so as to enable data to be understood and used by the human mind. For Fisher, statistical method aims to capture the “embryology” of human knowledge (Mayo and Cox (2010), “Frequentist Statistics as a Theory of Inductive Inference”); that is, it pursues the goal of discovering new things. He denied that we start out with the set of hypotheses or models to be reached (much less an exhaustive one). From Neyman and Pearson, one hears of the aims of quantifying, controlling and appraising reliability, precision, accuracy, sensitivity, and power to detect discrepancies and learn piecemeal. One learns that a central goal is to capture uncertainty, using probability, yes, but attached to statistical methods, not statistical hypotheses. A central goal is to model, distinguish and learn from canonical types of variability, and aspects of phenomena that may be probed by means of a cluster of deliberately idealized or “picturesque” (Neyman) models of chance regularity. One hears from David Cox about using frequentist methods for the goal of determining consistency/inconsistency of given data with a deliberately abstract model, so as to get statistical falsification at specified levels, which is essentially the only kind of falsification possible in actual inquiry (with any but the most trivial kinds of hypotheses).

It is a mistake to try to delimit the goals of frequentist sampling statistics so as to fit them into a Twitter entry [iii]. Moreover, to take a vague “low long-run error goal” (which is open to different interpretations) as primary encourages the idea that “unification” is at hand when it might not be. Let Bayesians have their one updating rule, as some purport. If there is one thing that Fisher, Neyman, Pearson and all the other “frequentist” founders fought, it was the very idea that there is a single “rational” or “best” account or rule that is to be obeyed: they offered a hodge-podge of techniques which are to be used in a piecemeal fashion to answer a given question so that the answers can be communicated and criticized by others. (This is so even for a given method, e.g., Cox’s taxonomy of different null hypotheses.) They insist that having incomplete knowledge and background beliefs about the world does not mean that the object of study is or ought to be our beliefs. Frequentist sampling methods do embody some fundamental principles such as: if a procedure had very little capability of finding a flaw in a claim H, then finding no flaw is poor grounds for inferring H. Please see my discussion here (e.g., Severity versus Rubbing Off). I have been raising these points (and hopefully much, much more clearly) for a while on this blog and elsewhere [iv], and it is to be hoped that people interested in “what is Bayesian/frequentist” will take note.
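
As a rough numerical illustration of the principle stated above (a procedure with little capability of finding a flaw gives poor grounds for inferring H when it finds none), the sketch below is an editorial addition and is not taken from the original post. It assumes a one-sided z-test of H0: mu <= 0 with known sigma, and it uses ordinary pre-data power as a simpler stand-in, not the severity assessment discussed on the blog. The function name `power_one_sided_z` and all numbers are hypothetical choices for illustration.

```python
from statistics import NormalDist


def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Probability that a one-sided z-test of H0: mu <= 0 rejects
    when the true mean is `delta` (sigma known, sample size n)."""
    z = NormalDist()                      # standard Normal
    cutoff = z.inv_cdf(1 - alpha)         # rejection threshold for the z statistic
    shift = delta * n ** 0.5 / sigma      # mean of the z statistic under mu = delta
    return 1 - z.cdf(cutoff - shift)


# Hypothetical discrepancy of interest: mu = 0.2, with sigma = 1.
low = power_one_sided_z(delta=0.2, sigma=1.0, n=10)    # small study
high = power_one_sided_z(delta=0.2, sigma=1.0, n=400)  # large study

print(f"n=10:  chance of detecting mu=0.2 is {low:.3f}")
print(f"n=400: chance of detecting mu=0.2 is {high:.3f}")
```

Under these assumptions, a non-rejection from the small study says little about whether a discrepancy of 0.2 is absent, because the test would probably have missed it anyway; the same non-rejection from the large study is far more informative.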

Of course it is possible that Wasserman and I are using terminology in a different manner; I return to this. Regardless, I am very pleased that Wasserman has so courageously decided to wade into the frequentist/Bayesian issue from a contemporary perspective: an Honorable Mention goes to him (11/19/12).

[i] Some have dubbed this “Bayesian boosterism”.

[ii] The sampling distribution is used to assess/obtain error probabilities of methods. Thus, a more general term for these methods might be “methods that use error probabilities” or “error probability statistics”. I abbreviate the last to “error statistics”. It is not the use of frequentist probability that is essential; it is the use of frequentist sampling distributions in order to evaluate the capabilities of methods to severely probe discrepancies, flaws, and falsehoods in deliberately idealized hypotheses.

[iii] Some people have actually denied there are any frequentist statistical methods, because they have been told that frequentist statistical theory just means evaluating long-run performance of methods. I agree that one can (and often should) explore the frequentist properties of non-frequentist methods, but that’s not all there is to frequentist methodology. Moreover, one often finds the most predominant (“conventional” Bayesian) methods get off the ground by echoing the numbers reached by frequentists. See “matching numbers across philosophies”.

[iv] Mayo publications may be found here.
