February 13, 2017

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

The Kangaroo with a feather effect

OK, guess the year of this quote:

Experimental social psychology today seems dominated by values that suggest the following slogan: “Social psychology ought to be and is a lot of fun.” The fun comes not from the learning, but from the doing. Clever experimentation on exotic topics with a zany manipulation seems to be the guaranteed formula for success which, in turn, appears to be defined as being able to effect a tour de force. One sometimes gets the impression that an ever-growing coterie of social psychologists is playing (largely for one another’s benefit) a game of “can you top this?” Whoever can conduct the most contrived, flamboyant, and mirth-producing experiments receives the highest score on the kudometer. There is, in short, a distinctly exhibitionistic flavor to much current experimentation, while the experimenters themselves often seem to equate notoriety with achievement.

It’s from Kenneth Ring, Journal of Experimental Social Psychology, 1967.

Except for the somewhat old-fashioned words (“zany,” “mirth”), the old-fashioned neologism (“kudometer”) and the lack of any reference to himmicanes, power pose, or “cute-o-nomics,” the above paragraph could’ve been written yesterday, or five years ago, or any time during the career of Paul Meehl.

Or, as authority figures Susan Fiske, Daniel Schacter, and Shelley Taylor would say, “Every few decades, critics declare a crisis, point out problems, and sometimes motivate solutions.”

I learned about the above Kenneth Ring quote from this recent post by Richard Morey, who goes positively medieval on the recently retracted paper by psychology professor Will Hart, a case that was particularly ridiculous because it seems that the analysis in that paper was faked by the student who collected the data . . . but who was not listed as a coauthor or even thanked in the paper’s acknowledgments!

In his post, Morey describes how bad this article was, as science, even if all the data had been reported correctly. In particular, he describes how the hypothesized effect sizes were much larger than could make sense based on common-sense reasoning, and how the measurements were too noisy to possibly detect reasonable-sized effects. These are problems we see over and over again; they’re central to the Type M and Type S error epidemic and the “What does not kill my statistical significance makes it stronger” fallacy. I feel kinda bad that Morey has to use, as an example, a retracted paper by a young scholar who probably doesn’t know any better . . . but I don’t feel so bad. The public record is the public record. If the author of that paper was willing to publish his paper, he should be willing to let it be criticized. Indeed, from the standpoint of the scientist (not the careerist), getting your papers criticized by complete strangers is one of the big benefits of publication. I’ve often found it difficult to get anyone to read my draft articles, and it’s a real privilege to get people like Richard Morey to notice your work and take the trouble to point out its fatal flaws.
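To make the Type M and Type S point concrete, here is a minimal simulation sketch. It is not Morey's analysis or anything from the retracted paper: the function name `type_s_m` and the example numbers (a true effect of 0.1 measured with a standard error of 0.5) are hypothetical, chosen only to stand in for "small effect, noisy measurement." It assumes a normally distributed estimate and a two-sided 5% significance threshold.

```python
import random
import statistics


def type_s_m(true_effect, std_error, n_sims=100_000, seed=0):
    """Simulate many replications of one study design.

    Returns (power, type_s_rate, exaggeration_ratio), where the last two
    are computed only over the replications that reach significance.
    All inputs are illustrative assumptions, not values from any paper.
    """
    rng = random.Random(seed)
    # Two-sided 5% critical value of the standard normal (about 1.96).
    z_crit = statistics.NormalDist().inv_cdf(0.975)

    # Each replication yields an estimate drawn around the true effect.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_sims)]

    # Keep only the estimates that would be declared "significant".
    significant = [e for e in estimates if abs(e) > z_crit * std_error]
    if not significant:
        raise ValueError("no significant replications; increase n_sims")

    power = len(significant) / n_sims
    # Type S: a significant estimate whose sign is opposite to the truth.
    wrong_sign = sum(1 for e in significant if e * true_effect < 0)
    type_s_rate = wrong_sign / len(significant)
    # Type M: how much significant estimates overstate the true magnitude.
    mean_abs = statistics.fmean([abs(e) for e in significant])
    exaggeration_ratio = mean_abs / abs(true_effect)
    return power, type_s_rate, exaggeration_ratio


if __name__ == "__main__":
    # Hypothetical noisy study: small true effect, large standard error.
    print(type_s_m(true_effect=0.1, std_error=0.5))
    # Hypothetical well-measured study, for contrast.
    print(type_s_m(true_effect=2.0, std_error=0.5))
```

Under these assumptions, the noisy design rarely reaches significance, and when it does, the estimate is many times too large and has the wrong sign a sizable fraction of the time. The well-measured design shows neither problem. That is the sense in which conditioning on statistical significance makes a noisy study's results less trustworthy, not more.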

Oh, and by the way, Morey did not find these flaws in response to that well-publicized retraction. The story actually happened in the opposite order. Here’s Morey:

When I got done reading the paper, I immediately requested the data from the author. When I heard nothing, I escalated it within the University of Alabama. After many, many months with no useful response (“We’ll get back to you!”), I sent a report to Steve Lindsay at Psychological Science, who, to his credit, acted quickly and requested the data himself. The University then told him that they were going to retract the paper…and we never even had to say why we were asking for the data in the first place. . . .

The basic problem here is not the results, but the basic implausibility of the methods combined with the results. Presumably, the graduate student did not force Hart to measure memory using four lexical decision trials per condition. If someone claims to have hit a bullseye from 500m in hurricane-force winds with a pea-shooter, and then claims years later that a previously-unmentioned assistant faked the bullseye, you’ve got a right to look at them askance.

At this point I’d like to say that Hart’s paper should never have been accepted for publication in the first place, but that misses the point, as everything will get published if you just keep submitting it to journal after journal. If you can’t get it into Nature, go for PLOS ONE, and if they turn you down, there’s always Psychological Science or JPSP (but that’ll probably only work if you (a) are already famous and (b) write something on ESP).

The real problem is that this sort of work is standard operating practice in the field of psychology, no better and no worse (except for the faked data) than the papers on himmicanes, air rage, etc., endorsed by the prestigious National Academy of Sciences. As long as this stuff is taken seriously, it’s still a crisis, folks.
