Brian Bucher (who describes himself as “just an engineer, not a statistician”) writes:
I’ve read your paper with John Carlin, Beyond Power Calculations. Would you happen to know of instances in the published or unpublished literature that implement this type of design analysis, especially using your retrodesign() function [here’s an updated version from Andy Timm], so I could see more examples of it in action? Would you be up for creating a blog post on the topic, sort of a “The use of this tool in the wild” type thing?
I [Bucher] found this from Clay Ford and this from Shravan Vasishth and plan on working my way through them, but it would be great to have even more examples.
I promised to write such a post asking for more examples—and here it is! So feel free to send some in. I have a couple examples in section 2 of this paper.
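For readers who haven't seen the paper, here is a minimal sketch of the kind of calculation retrodesign() performs, following Gelman and Carlin (2014). The function name and the example numbers are mine, for illustration only: given a hypothesized true effect size A and the standard error s of its estimate, it returns the power, the type S error rate (the probability that a statistically significant estimate has the wrong sign), and the exaggeration ratio (type M error: the factor by which a significant estimate overstates the true effect on average).

```r
# Minimal sketch of a retrodesign-style calculation (after Gelman and
# Carlin, 2014); the function name and example values below are mine.
#   A:     hypothesized true effect size
#   s:     standard error of the estimate
#   alpha: significance threshold
#   df:    degrees of freedom (Inf for the normal case)
retrodesign_sketch <- function(A, s, alpha = 0.05, df = Inf, n.sims = 10000) {
  z <- qt(1 - alpha/2, df)
  p.hi <- 1 - pt(z - A/s, df)  # P(estimate is significantly positive)
  p.lo <- pt(-z - A/s, df)     # P(estimate is significantly negative)
  power <- p.hi + p.lo
  type.s <- p.lo / power       # P(wrong sign | significant), for A > 0
  estimate <- A + s * rt(n.sims, df)
  significant <- abs(estimate) > s * z
  exaggeration <- mean(abs(estimate)[significant]) / A  # type M error
  list(power = power, type.s = type.s, exaggeration = exaggeration)
}

# Example: a small true effect measured noisily.
retrodesign_sketch(A = 2, s = 8)
```

The key point is that these are properties of the design (A and s), not of any particular dataset.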
After I told Bucher the post is coming, he threw in another question:
I’d also be curious whether you would apply this methodology in cases where there was technically no statistical significance. I’m thinking primarily of these two cases:
(a) There was no alpha value chosen before the study, and the authors weren’t testing a p-value against an alpha, but just reporting a p-value (such as 0.06) and deciding that it was sufficiently small to conclude that there was likely an effect and worth further experimentation/investigation. (Fisher-ian?)
(b) There was an alpha value chosen (0.05), and the t-test didn’t reject the null because the p-value was 0.08. However, in addition to the frequentist analysis, the authors generated a Bayes factor of 2.0, claimed this showed that a difference between the two groups was twice as likely as no difference, and therefore concluded that the groups differ.
Letter (a) is a decent description of the type of analyses that I often do (mostly DOEs, i.e., designed experiments), since I don’t use alpha-thresholds unless required by a third party.
Letter (b) is (basically) something from a paper that I’m analyzing, and it would be great if I could estimate the Type-S/M errors without violating any statistical laws.
I have my fingers crossed, because in your Beyond Power Calculations paper you do say,
If the result is not statistically significant, the chance of the estimate having the wrong sign is 49% (not shown in the Appendix; this is the probability of a Type S error conditional on nonsignificance)—so that the direction of the estimate gives almost no information on the sign of the true effect.
…so I do have hope that the methods are generally applicable to nonsignificant results as well.
Full disclosure: I [Bucher] posted a version of this question to Stack Exchange but have not (yet) received any comments.
My reply:
We were thinking of type M and type S errors as frequency properties. The idea is that you define a statistical procedure and then work out its average properties over repeated use. So far, we’ve mostly thought about the procedure which is “do an analysis and report it if it’s ‘statistically significant’”—in my original paper with Tuerlinckx on type M and type S errors (full text here), we talked about the frequency properties of “claims with confidence.”
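To illustrate what “frequency properties” means here, you can simulate the procedure directly. The values of A and s below are made up for this sketch; the last line computes the probability of a wrong sign among the nonsignificant results, the quantity that the 49% figure quoted above refers to (that number came from the paper’s example, not from these values):

```r
# Simulate the frequency properties of "do an analysis and report it
# if it's statistically significant." A and s are made-up values.
set.seed(123)
A <- 2                    # hypothesized true effect
s <- 8                    # standard error of the estimate
estimate <- rnorm(1e6, A, s)
significant <- abs(estimate) > 1.96 * s

# Properties conditional on significance:
mean(estimate[significant] < 0)        # type S error rate
mean(abs(estimate[significant])) / A   # exaggeration ratio (type M)

# And conditional on *non*significance:
mean(estimate[!significant] < 0)       # P(wrong sign | not significant)
```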
In your case it seems that you want inference about a particular effect size given available information, and I think you’d be best off just attacking the problem Bayesianly. Write down a reasonable prior distribution for your effect size and then go from there. Sure, there’s a challenge here in having to specify a prior, but that’s the price you have to pay: without a prior, you can’t do much in the way of inference when your data are noisy.
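To make that suggestion concrete, here is a minimal sketch using normal approximations throughout. All the numbers are hypothetical; est and se are chosen so that the estimate has a two-sided p-value of about 0.08, as in case (b):

```r
# Sketch of the Bayesian route, using normal approximations throughout.
# All numbers are hypothetical; est and se are chosen so the estimate
# has a two-sided p-value of about 0.08, as in case (b).
prior_mean <- 0   # skeptical prior centered at zero effect
prior_sd   <- 5   # scale of effects we'd consider plausible a priori
est <- 14         # observed estimate
se  <- 8          # its standard error

# Conjugate normal update: precision-weighted average of prior and data.
post_prec <- 1/prior_sd^2 + 1/se^2
post_mean <- (prior_mean/prior_sd^2 + est/se^2) / post_prec
post_sd   <- sqrt(1/post_prec)

c(post_mean, post_sd)
pnorm(0, post_mean, post_sd)  # posterior P(true effect is negative)
```

With these made-up numbers the posterior mean gets pulled most of the way toward zero, and there remains a nontrivial posterior probability that the true effect is negative, which is the Bayesian counterpart of the type S concern.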