Category: Miscellaneous Statistics

“The idea of replication is central not just to scientific practice but also to formal statistics . . . Frequentist statistics relies on the reference set of repeated experiments, and Bayesian statistics relies on the prior distribution which represents the population of effects.”

Rolf Zwaan (whom we last encountered here in “From zero to Ted talk in 18 simple steps”), Alexander Etz, Richard Lucas, and M. Brent Donnellan wrote an article, “Making replication mainstream,” which begins: Many philosophers of science and methodologists have argued that the ability to repeat studies and obtain similar results is an essential component […]

If you have a measure, it will be gamed (politics edition).

They sometimes call it Campbell’s Law: New York Governor Andrew Cuomo is not exactly known for drumming up grassroots enthusiasm and small donor contributions, so it was quite a surprise on Monday when his reelection campaign reported that more than half of his campaign contributors this year gave $250 or less. But wait—a closer examination […]

The statistical checklist: Could there be a list of guidelines to help analysts do better work?

[image of cat with a checklist] Paul Cuffe writes: Your idea of “researcher degrees of freedom” [actually not my idea; the phrase comes from Simmons, Nelson, and Simonsohn] really resonates with me: I’m continually surprised by how many researchers freestyle their way through a statistical analysis, using whatever tests, and presenting whatever results, strikes their […]

He wants to model a proportion given some predictors that sum to 1

Joël Gombin writes: I’m wondering what your take would be on the following problem. I’d like to model a proportion (e.g., the share of the vote for a given party at some territorial level) as a function of some compositional data (e.g., the sociodemographic makeup of the voting population), and to do this in a multilevel fashion (allowing […]
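
One common way to handle the sum-to-one constraint (my own sketch, not Gombin’s question or Gelman’s answer, and leaving out the multilevel part) is to re-express the composition as log-ratios against a reference component before regressing the (logit) proportion, since the raw components plus an intercept are perfectly collinear:

```python
# Minimal sketch: regress a logit-transformed proportion on additive log-ratio
# (ALR) transformed compositional predictors. Illustration only; all names and
# numbers here are made up for the example.
import numpy as np

rng = np.random.default_rng(0)
n, K = 500, 3                                        # observations, composition size
x = rng.dirichlet(alpha=[2.0, 3.0, 5.0], size=n)     # compositional predictors, rows sum to 1

# additive log-ratio transform: K-1 columns, last component as reference
alr = np.log(x[:, :-1] / x[:, [-1]])

# simulate a vote share that depends on the composition on the logit scale
true_beta = np.array([0.8, -0.5])
eta = -0.2 + alr @ true_beta + rng.normal(scale=0.3, size=n)
y = 1.0 / (1.0 + np.exp(-eta))                       # observed proportion in (0, 1)

# ordinary least squares of logit(y) on the log-ratio predictors
X = np.column_stack([np.ones(n), alr])
beta_hat, *_ = np.linalg.lstsq(X, np.log(y / (1 - y)), rcond=None)
print("estimated (intercept, beta):", beta_hat.round(2))
```

A multilevel version would add group-varying intercepts or slopes (e.g., by territorial unit), which is easier to fit in a dedicated tool such as Stan than by hand.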

Divisibility in statistics: Where is it needed?

The basics of Bayesian inference are p(parameters|data) proportional to p(parameters)*p(data|parameters). And, for predictions, p(predictions|data) = the integral over parameters of p(predictions|parameters,data)*p(parameters|data). In these expressions (and the corresponding simpler versions for maximum likelihood), “parameters” and “data” are unitary objects. Yes, it can be helpful to think of the parameter objects as being a list or vector of individual parameters; and […]
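
As a concrete instance of those two formulas (my illustration, not part of the post), here is a grid approximation for a simple Beta-Binomial model: the posterior is prior times likelihood, renormalized, and the posterior predictive averages the sampling distribution of new data over that posterior:

```python
# Grid-approximation example of p(parameters|data) and p(predictions|data).
# The model, priors, and data values are invented for illustration.
import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 999)          # grid over the parameter
prior = stats.beta.pdf(theta, a=2, b=2)          # p(parameters)

y, n = 7, 10                                     # observed data: 7 successes in 10 trials
likelihood = stats.binom.pmf(y, n, theta)        # p(data | parameters)

# p(parameters | data) proportional to p(parameters) * p(data | parameters)
posterior = prior * likelihood
posterior /= posterior.sum()                     # normalize on the grid

# p(predictions | data) = sum over the grid of
#   p(predictions | parameters) * p(parameters | data)
pred = np.array([(stats.binom.pmf(k, 5, theta) * posterior).sum()
                 for k in range(6)])             # successes in 5 future trials
print("posterior mean of theta:", (theta * posterior).sum().round(3))
print("posterior predictive over 0..5 future successes:", pred.round(3))
```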

On this 4th of July, let’s declare independence from “95%”

Plan your experiment, gather your data, do your inference for all effects and interactions of interest. When all is said and done, accept some level of uncertainty in your conclusions: you might not be 97.5% sure that the treatment effect is positive, but that’s fine. For one thing, decisions need to be made. You were […]
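
In the spirit of the post (my sketch, not an example from it): rather than asking whether a 95% interval excludes zero, you can summarize the posterior directly, for example Pr(treatment effect > 0), and carry that number into the decision:

```python
# Summarize uncertainty about a treatment effect without a 95% cutoff.
# The "posterior draws" below are simulated stand-ins for draws from a fitted model.
import numpy as np

rng = np.random.default_rng(1)
effect_draws = rng.normal(loc=0.12, scale=0.08, size=4000)   # pretend posterior draws

p_positive = (effect_draws > 0).mean()
print(f"Pr(effect > 0) = {p_positive:.2f}")                  # maybe 0.93 rather than 0.975; still usable
print("posterior 50% interval:", np.percentile(effect_draws, [25, 75]).round(3))
```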

Flaws in stupid horrible algorithm revealed because it made numerical predictions

Kaiser Fung points to this news article by David Jackson and Gary Marx: The Illinois Department of Children and Family Services is ending a high-profile program that used computer data mining to identify children at risk for serious injury or death after the agency’s top official called the technology unreliable. . . . Two Florida […]

Problems with surrogate markers

Paul Alper points us to this article in Health News Review—I can’t figure out who wrote it—warning of problems with the use of surrogate outcomes for policy evaluation: “New drug improves bone density by 40%.” At first glance, this sounds like great news. But there’s a problem: We have no idea if this means the […]

In my role as professional singer and ham

Pryor unhooks the deer’s skull from the wall above his still-curled-up companion. Examines it. Not a good specimen – the back half of the lower jaw’s missing, a gap that, with the open cranial cavity, makes room enough for Pryor’s head. He puts it on. – Will Eaves, Murmur So as we roll into the last […]

Regression to the mean continues to confuse people and lead to errors in published research

David Allison sends along this paper by Tanya Halliday, Diana Thomas, Cynthia Siu, and himself, “Failing to account for regression to the mean results in unjustified conclusions.” It’s a letter to the editor in the Journal of Women & Aging, responding to the article, “Striving for a healthy weight in an older lesbian population,” by […]
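
A quick simulation shows the phenomenon the letter is about (this is my illustration, not the Halliday et al. analysis): with no intervention effect at all, a group selected for extreme baseline values shows an apparent “improvement” at follow-up simply because the measurement noise does not repeat:

```python
# Regression to the mean under a pure null: no change in the underlying trait,
# only independent measurement noise at baseline and follow-up.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
true_value = rng.normal(0, 1, n)                 # stable underlying trait
baseline   = true_value + rng.normal(0, 1, n)    # noisy measurement 1
followup   = true_value + rng.normal(0, 1, n)    # noisy measurement 2, no intervention

selected = baseline > 1.5                        # enroll only high-baseline subjects
print("mean at baseline (selected): ", baseline[selected].mean().round(2))
print("mean at follow-up (selected):", followup[selected].mean().round(2))
# the follow-up mean is noticeably lower even though nothing changed
```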

Ways of knowing in computer science and statistics

Brad Groff writes: Thought you might find this post by Ferenc Huszar interesting. Commentary on how we create knowledge in machine learning research and how we resolve benchmark results with (belated) theory. Key passage: You can think of “making a deep learning method work on a dataset” as a statistical test. I would argue […]

Data science teaching position in London

Seth Flaxman sends this along: The Department of Mathematics at Imperial College London wishes to appoint a Senior Strategic Teaching Fellow in Data Science, to be in post by September 2018 or as soon as possible thereafter. The role will involve developing and delivering a suite of new data science modules, initially for the MSc […]

What is the role of qualitative methods in addressing issues of replicability, reproducibility, and rigor?

Kara Weisman writes: I’m a PhD student in psychology, and I attended your talk at the Stanford Graduate School of Business earlier this year. I’m writing to ask you about something I remember you discussing at that talk: The possible role of qualitative methods in addressing issues of replicability, reproducibility, and rigor. In particular, I […]

Power analysis and NIH-style statistical practice: What’s the implicit model?

So. Following up on our discussion of “the 80% power lie,” I was thinking about the implicit model underlying NIH’s 80% power rule. Several commenters pointed out that, to have your study design approved by NIH, it’s not required that you demonstrate that you have 80% power for real; what’s needed is to show 80% […]
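
For reference, here is the back-of-the-envelope arithmetic behind a standard 80% power claim (my sketch of the textbook calculation, not an NIH procedure): for a two-sided test at alpha = 0.05, the assumed true effect has to be about 1.96 + 0.84, or roughly 2.8 standard errors:

```python
# Standard normal-approximation power calculation; the effect size d = 0.5
# below is an arbitrary assumed value for illustration.
from scipy import stats

alpha, power = 0.05, 0.80
z_alpha = stats.norm.ppf(1 - alpha / 2)          # about 1.96
z_power = stats.norm.ppf(power)                  # about 0.84
print("effect needed, in standard errors:", round(z_alpha + z_power, 2))

# equivalently: sample size per group for a two-sample comparison
d = 0.5                                          # assumed standardized effect size
n_per_group = 2 * ((z_alpha + z_power) / d) ** 2
print("n per group for 80% power at d = 0.5:", round(n_per_group))
```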

Chasing the noise in industrial A/B testing: what to do when all the low-hanging fruit have been picked?

Commenting on this post on the “80% power” lie, Roger Bohn writes: The low power problem bugged me so much in the semiconductor industry that I wrote 2 papers about it around 1995. Variability estimates come naturally from routine manufacturing statistics, which in semicon were tracked carefully because they are economically important. The sample size is […]
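
“Chasing the noise” can be made concrete with a small simulation (mine, not Bohn’s analysis): when the true effect is small relative to the standard error, the few experiments that reach statistical significance overstate the effect by a large factor, which is the trap once the low-hanging fruit is gone:

```python
# Low-power exaggeration: among nominally significant results, the reported
# effect is far larger than the true one. All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
true_effect, se = 0.1, 0.2                       # true effect is only 0.5 standard errors
estimates = true_effect + rng.normal(0, se, 100_000)

significant = np.abs(estimates) > 1.96 * se      # the results a naive analysis would report
print("share of experiments reaching significance:", significant.mean().round(2))
print("average reported |effect| among those:",
      np.abs(estimates[significant]).mean().round(2), "vs true effect", true_effect)
```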
