Tour Guide Mementos (Excursion 1 Tour II of How to Get Beyond the Statistics Wars)

Stat Museum

Excursion 1 Tour II: Error Probing Tools vs. Logics of Evidence 

Blurb. Core battles revolve around the relevance of a method’s error probabilities. What’s distinctive about the severe testing account is that it uses error probabilities evidentially: to assess how severely a claim has passed a test. Error control is necessary but not sufficient for severity. Logics of induction focus on the relationships between given data and hypotheses, so outcomes other than the one observed drop out. This is captured in the Likelihood Principle (LP). Tour II takes us to the crux of central wars in relation to the Law of Likelihood (LL) and Bayesian probabilism. (1.4) Hypotheses deliberately designed to accord with the data can result in minimal severity. The Likelihoodist wishes to oust them via degrees of belief captured in prior probabilities. To the severe tester, such gambits directly alter the evidence by leading to inseverity. (1.5) Stopping rules: If a tester tries and tries again until significance is reached (optional stopping), significance will be attained erroneously with high probability. According to the LP, the stopping rule doesn’t alter evidence. The irrelevance of optional stopping is an asset for holders of the LP; it’s the opposite for a severe tester. The warring sides talk past each other.

1.4 The Law of Likelihood and Error Statistics: Key Items

Ian Hacking (1965) – the Law of Likelihood.

Law of Likelihood (LL): Data x are better evidence for hypothesis H1 than for H0 if x is more probable under H1 than under H0.

Likelihoods are defined and several examples are given.

Likelihoods of hypotheses should not be confused with their probabilities.
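To make the LL concrete, here is a minimal sketch in Python. The binomial setup, the data (8 successes in 10 trials), and the function names are illustrative assumptions of mine, not an example taken from the book. It computes the likelihood ratio of two point hypotheses and then shows why likelihoods are not probabilities of hypotheses: summed over a set of rival hypotheses, they need not add up to 1.

```python
from math import comb

def binomial_likelihood(theta, successes, trials):
    """Probability of the observed count if each trial succeeds with probability theta."""
    return comb(trials, successes) * theta**successes * (1 - theta) ** (trials - successes)

def likelihood_ratio(theta1, theta0, successes, trials):
    """Lik(theta1; x) / Lik(theta0; x). Values above 1 favor theta1 according to the LL."""
    return (binomial_likelihood(theta1, successes, trials)
            / binomial_likelihood(theta0, successes, trials))

# Hypothetical data: 8 successes in 10 trials.
x, n = 8, 10

# H1: theta = 0.8 versus H0: theta = 0.5
lr = likelihood_ratio(0.8, 0.5, x, n)
print(f"LR (H1 over H0) = {lr:.2f}")  # about 6.87

# Likelihoods of rival hypotheses, for the one observed x, are not a
# probability distribution over hypotheses: they need not sum to 1.
grid = [i / 10 for i in range(11)]
total = sum(binomial_likelihood(theta, x, n) for theta in grid)
print(f"Sum of likelihoods over the grid = {total:.3f}")
```

The ratio is the only thing the LL asks for; nothing in the calculation refers to outcomes other than the observed x.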

The Law of Likelihood (LL) is seen to fail the minimal severity requirement – at least if it is taken as an account of inference.

Gellerized hypotheses: maximally fitting, but minimally severely tested, hypotheses.

We observe one outcome, but we can consider that for any outcome, unless it makes H0 maximally likely, we can find an H1 that is more likely.

A severity assessment is one level removed: you give me the rule, and I consider its latitude for erroneous outputs.
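A rough sketch of that "one level removed" assessment, under assumptions of my own: a short run of coin tosses, H0 saying every toss is fair, and a rule that builds H1 after the fact so that heads is certain exactly where heads was seen. The function names are hypothetical. The code enumerates every possible outcome and asks how often the rule would report data favoring H1 when H0 is true.

```python
from itertools import product

def likelihood_fair(outcomes):
    """Likelihood of a heads/tails sequence under H0: P(heads) = 0.5 on every toss."""
    return 0.5 ** len(outcomes)

def likelihood_tailored(outcomes, observed):
    """Likelihood under an H1 constructed from `observed`: heads is certain on
    exactly the tosses where `observed` shows heads, and impossible elsewhere."""
    return 1.0 if list(outcomes) == list(observed) else 0.0

def prob_rule_favors_rival_under_h0(n):
    """Probability, computed under H0, that the rule 'tailor H1 to the data'
    yields a likelihood ratio above 1 in favor of H1."""
    total = 0.0
    for seq in product("HT", repeat=n):
        lr = likelihood_tailored(seq, seq) / likelihood_fair(seq)
        if lr > 1:
            total += likelihood_fair(seq)  # weight by how probable seq is under H0
    return total

n = 4
print(prob_rule_favors_rival_under_h0(n))  # 1.0
```

Whatever is observed, the tailored H1 comes out 2^n times more likely than H0, so the rule has maximal latitude for erroneous outputs. That is the sense in which the hypothesis fits maximally yet is minimally severely tested.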

Sampling distribution.

Richard Royall distinguishes three questions, concerning belief, action, and evidence:

  1. What do I believe, now that I have this observation?
  2. What should I do, now that I have this observation?
  3. How should I interpret this observation as evidence regarding [H0] versus [H1]?

Exhibit (i): Law of Likelihood Compared to a Significance Test.

Why the LL Rejects Composite Hypotheses

Royall holds that all attempts to say whether x is good evidence for H, or even if x is better evidence for H than is y, are futile. Similarly,

“What does the [LL] say when one hypothesis attaches the same probability to two different observations? It says absolutely nothing . . . [it] applies when two different hypotheses attach probabilities to the same observation” (Royall 2004, p. 148).

The severe tester distinguishes the evidential warrant for one and the same hypothesis H in two cases: one where it was constructed post hoc, cherry picked, and so on, a second where it was predesignated.

Souvenir B: Likelihood versus Error Statistical

To the Likelihoodist, points in favor of the LL are:

  • The LR offers “a precise and objective numerical measure of the strength of statistical evidence” for one hypothesis over another; it is a frequentist account and does not use prior probabilities (Royall 2004, p. 123).
  • The LR is fundamentally related to Bayesian inference: the LR is the factor by which the ratio of posterior probabilities is changed by the data.
  • A Likelihoodist account does not consider outcomes other than the one observed, unlike P-values, and Type I and II errors. (Irrelevance of the sample space.)
  • Fishing for maximally fitting hypotheses and other gambits that alter error probabilities do not affect the assessment of evidence; they may be blocked by moving to the “belief” category.
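The second point in the list above, on the LR and Bayesian inference, is just Bayes’ theorem written in odds form. The notation here (Pr, H0, H1, x) is mine, chosen to match the LL statement earlier:

```latex
\[
\frac{\Pr(H_1 \mid x)}{\Pr(H_0 \mid x)}
  \;=\;
\frac{\Pr(x \mid H_1)}{\Pr(x \mid H_0)}
  \times
\frac{\Pr(H_1)}{\Pr(H_0)}
\]
```

The middle term is the LR: posterior odds equal the LR times the prior odds, so the data enter only through the LR.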

To the error statistician, problems with the LL include:

  • LRs do not convey the same evidential appraisal in different contexts.
  • The LL denies it makes sense to speak of how well or poorly tested a single hypothesis is on evidence, essential for model checking; it is inapplicable to composite hypothesis tests.
  • A Likelihoodist account does not consider outcomes other than the one observed, unlike P-values, and Type I and II errors. (Irrelevance of the sample space.)
  • Fishing for maximally fitting hypotheses and other gambits that alter error probabilities do not affect the assessment of evidence; they may be blocked by moving to the “belief” category.

Notice that the last two points are identical in both lists. What’s a selling point for a Likelihoodist is a problem for an error statistician.

 

1.5 Trying and Trying again: Key Items

“Trying and trying again” to achieve statistical significance; stopping rules and their relevance/irrelevance
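A small simulation gives a feel for what trying and trying again does to error probabilities. This is only a sketch under assumptions I am supplying: normal data with known standard deviation 1, a true null of mean 0, a two-sided 1.96 cutoff, and a cap of 100 observations. The function names are hypothetical.

```python
import random
from math import sqrt

def reaches_significance(n_max, rng, z_crit=1.96, optional_stopping=True):
    """Draw up to n_max observations from N(0, 1), so H0: mu = 0 is true.
    With optional stopping, test after every observation and stop at the
    first |z| >= z_crit; otherwise test only once, at n_max."""
    total = 0.0
    z = 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)
        z = total / sqrt(n)  # z statistic with known sigma = 1
        if optional_stopping and abs(z) >= z_crit:
            return True
    return abs(z) >= z_crit

def error_rate(n_max, sims, seed, optional_stopping):
    """Proportion of simulated studies that report significance although H0 is true."""
    rng = random.Random(seed)
    hits = sum(
        reaches_significance(n_max, rng, optional_stopping=optional_stopping)
        for _ in range(sims)
    )
    return hits / sims

fixed = error_rate(n_max=100, sims=2000, seed=1, optional_stopping=False)
tried = error_rate(n_max=100, sims=2000, seed=1, optional_stopping=True)
print(f"fixed sample size: {fixed:.3f}")
print(f"optional stopping: {tried:.3f}")
```

With a fixed sample size the proportion should sit near the nominal 0.05; with testing after every observation it comes out several times larger. The likelihood function for whatever data are in hand is the same either way, which is why the holder of the LP regards the stopping rule as irrelevant while the severe tester does not.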

Edwards, Lindman, and Savage (E, L, & S, 1963).

Simmons, Nelson, and Simonsohn

The Likelihood Principle (LP).

Weak Repeated Sampling Principle.

(Cox and Hinkley 1974, p. 51). “[W]e should not follow procedures which for some possible parameter values would give, in hypothetical repetitions, misleading conclusions most of the time” (ibid., pp. 45–6).

The 1959 Savage Forum

Arguments from Intentions:

Error Probabilities Violate the LP

Problem of “known (or old) evidence” made famous by Clark Glymour (1980).

Souvenir C. A Severe Tester’s Translation Guide [i]

HOW TO FIND MATERIAL FROM EXCURSION 1 TOUR II (if you don’t have a copy of the book). I have not posted Excursion 1 Tour I. (Andrew Gelman may post a draft for a possible discussion on his blog.)

However, there are posts on this blog that cover much of the material (in blog form). For the material on Royall and the Law of Likelihood in 1.4 (including a link to an article by Royall), see this post; for stopping rules and the likelihood principle, see this post. That post also offers Museum links to the Savage Forum! You can also search this blog for terms of interest, and there’s quite a lot on those in 1.4 and 1.5. Have fun! Please share comments, queries, favorite quotes, etc.

[i] I may post Souvenir C separately.
