Pizzagate update: Don’t try the same trick twice or people might notice

February 9, 2017

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

I’m getting a bit sick of this one already (also see Jesse Singal’s review), but there are a couple of interesting issues that arose in recent updates.

1. One of the weird things about the Brian Wansink affair (“Pizzagate”) was how he responded to such severe criticism (serious in its content and harsh in its tone) in such an equanimous way.

Critic A:

You pushing an unpaid PhD-student into salami slicing null-results into 5 p-hacked papers . . . Because more worthless, p-hacked publications = obviously better….? . . . I really hope this story is a joke. If not, your behaviour is one of the biggest causes of the proliferation of junk science in psychology and you are the one who should be shamed, not the postdoc.

Wansink’s reply:

I understand the good points you make. . . .

Critic B:

I’m very grateful to you for exposing how these valueless and misleading studies are generated. . . .

Wansink’s reply:

Hi George, You are right on target. . . .

Critic C:

There is something catastrophically wrong in the genesis of the papers you describe . . . the moment you start exploring your data, the p-value becomes worse than worthless: it is positively misleading. It misled you, the entire field of psychology and many more. To be honest, it’s even worse than this . . .

Wansink’s reply:

Outstanding! Thank you so much for point out those two paper. (I downloaded the draft form and will be making it required reading for my team). You make outstanding points. . . .

As I wrote the other day, Wansink’s reactions just don’t add up. When people point out major problems with your work and say to you that your work is worthless, you might react with anger, or you might realize they’re right and go clean up your act, or you might change your career.

But Wansink’s responses are just . . . off. He doesn’t act annoyed or offended at all; instead he thanks the critics. That’s fine, except then he doesn’t react to the criticisms. Wansink’s “You are right on target” and “I understand the good points you make” are empty, in that in effect he’s admitting to repeated research incompetence and perhaps unethical behavior in pressuring students and postdocs to use poor research practices.

Similarly puzzling is Wansink’s reaction to the 150 errors that an outside research team found in his four papers. Who has 150 errors in four papers? When does that ever happen? But Wansink’s like, no big deal, he’ll “correct some of these oversights.” 150 errors is not an “oversight”; it’s an absolute disaster!

So what’s going on?

I got some insight by reading this post by Tim Smits, who experienced the very same behavior from Wansink, five years ago, regarding a completely different paper! Here’s Smits:

A series of academic social media posts and a critical article target the lab’s inferior methodology and old school approach to rendering null-effects into (a set of) publishable papers. In this post, I [Smits] want to give my account of a previous similar situation that I had with the same lab in 2012. . . .

In 2011, the Cornell researchers published an article (Zampollo, Kiffin, Wansink & Shimizu, 2011) on how children’s preferences for food are differentially affected by how the foods are presented on a plate compared to adults. . . . some of the findings were incomprehensible from the article . . . I wrote a polite email, asking for some specific information about the statistics. This was the response I got.

Dear Tim, Thank you for being interested in our paper. Actually there are several errors in the results section and Table 1. What we did was two step chi-square tests for each sample (children and adults), so we did not do chi-square tests to compare children and adults. As indicated in the section of statistical analysis, we believe doing so is more conclusive to argue, for example, that children significantly prefer six colors whereas adults significantly prefer three colors (rather than that children and adults significantly differ in their preferred number of color). Thus, for each sample, we first compared the actual number of choices versus the equal distribution across possible number of choices. For the first hypothesis, say #1=0, #2=0, #3=1, #4=0, #5=2, #6=20 (n=23), then we did a chi-square test (df=5) to compare those numbers with 3.83 — this verified the distribution is not equal. Then, we did second chi-square test (df=1) to compare 20 and 0.6 (the average of other choices), which should yield 18.3. However, as you might already notice, some of values in the text and the table are not correct — according to my summary notes, the first 3 results for children should be: 18.3 (rather than 40.4), 16.1 (rather than 23.0), 9.3 (rather than 26.88). Also, the p-value for .94 (for disorganized presentation) should not be significant apparently. I am sorry about this confusion — but I hope this clarify your question.
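To the email’s credit, the arithmetic it describes does reproduce. Here’s a quick sketch in plain Python, using only the example counts given in the email (this is my reconstruction of the described procedure, not their code):

```python
# The lab's two-step procedure, on the example counts from the email:
# choices #1..#6 = [0, 0, 1, 0, 2, 20], n = 23.
counts = [0, 0, 1, 0, 2, 20]
n = sum(counts)  # 23

# Step 1: chi-square (df = 5) against a uniform distribution,
# i.e. an expected count of n/6 ~= 3.83 in every cell.
expected = n / 6
chi2_step1 = sum((o - expected) ** 2 / expected for o in counts)
print(round(chi2_step1, 1))  # 82.7 -- the distribution is clearly non-uniform

# Step 2: chi-square (df = 1) comparing the top cell (20) with the
# average of the other five cells (0.6); the email says this
# "should yield 18.3".
top = 20
avg_rest = (n - top) / 5          # 0.6
e = (top + avg_rest) / 2          # 10.3, the df = 1 expected count
chi2_step2 = (top - e) ** 2 / e + (avg_rest - e) ** 2 / e
print(round(chi2_step2, 1))  # 18.3, matching the email
```

So the numbers in the email are internally consistent. The problem, as Smits explains next, is that the second step doesn’t test what the paper claims it tests.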

Well, that was interesting. Just one email, and immediately a bunch of corrections followed. Too bad the answer was nonsensical. So I wrote back to them:

When reading the paper, I did understand the first step of the chi-square tests. I was puzzled by the second step, and to be honest, I still am a bit. The test you performed in that second step boils down to a binomial test, examining the difference between the observed number of counts in the most preferred cell and the H0 expected number of counts. Though this is informative, it does not really tell you something about how significant the preferences were. For instance, if you would have the following hypothetical cell counts [0 ; 0 ; 11; 0; 0 ; 12], cell 6 would still be preferred the most, but a similar binomial test on cell 3 would also be strongly significant. In my opinion, I thus believe that the tests do not match their given interpretations in the article. From a mathematical point of view, your tests on how much preferred a certain type of plate is raise the alpha level to .5 instead of .05. What you do test on the .05 level is just the deviation in the observed cell count from the hypothesized count in that particular cell, but this is not really interesting.
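Smits’s hypothetical is easy to check numerically. A sketch (plain Python; the df = 1 chi-square p-value comes from the complementary error function) applies the lab’s second-step test to his counts [0, 0, 11, 0, 0, 12] and shows that both cell 3 and cell 6 come out “significant,” even though only cell 6 is the most preferred:

```python
import math

def second_step_chi2(counts, cell):
    """The lab's 'second step': compare one cell's count against the
    average of the remaining cells via a df = 1 chi-square test."""
    top = counts[cell]
    avg_rest = (sum(counts) - top) / (len(counts) - 1)
    e = (top + avg_rest) / 2
    chi2 = (top - e) ** 2 / e + (avg_rest - e) ** 2 / e
    # Survival function of chi-square with df = 1: P(X > chi2)
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Smits's hypothetical: cell 6 is the most preferred (12),
# but cell 3 (11) is nearly tied.
counts = [0, 0, 11, 0, 0, 12]

for cell in (2, 5):  # cells 3 and 6, zero-indexed
    chi2, p = second_step_chi2(counts, cell)
    print(f"cell {cell + 1}: chi2 = {chi2:.2f}, p = {p:.3f}")
    # both p-values fall below .05
```

Both cells test as “significantly preferred,” which cannot both be the intended interpretation — the test measures only how far one cell’s count sits from a pooled average, not which option is actually preferred.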

Then, this remarkable response came. . . . they agree with the “shoddy statistics” . . . Moreover, they immediately confess to having published this before.

I carefully read your comments and I think I have to agree with you regarding the problem in the second-step analysis. I employed this two-step approach because I employed similar analyses before (Shimizu & Pelham, 2008, BASP). But it is very clear that our approach is not appropriate test for several cases like the hypothetical case you suggested. Fortunately, such case did not happen so often (only case happened in for round position picture for adults). But more importantly, I have to acknowledge that raising the p-value to .5 in this analysis has to be taken seriously. Thus, like you suggested, I think comparing kids counts and adults counts (for preferred vs rest of cells) in 2×2 should be better idea. I will try to see if they are still significant as soon as I have time to do.

You see what happened? Wansink did the exact same thing years ago! Someone sent him a devastating criticism, he or someone in his lab responded in a cordial and polite way, gave tons of thanks, and then they did essentially nothing. As Smits put it, these are “old school researchers and labs, still empowered by the false feedback of the publishing system that tends to reward such practices. . . . But how on earth can you just continue with “shoddy methodology” after someone else pointed that out for you?”

Smits concludes:

Just to take this one article as an example: Their own press releases and outreach about that study did not show a single effort of self-correction. You can still find some of that material on their website. Similarly, despite the recent turmoil, I have seen them just continue their online communication efforts.

Indeed, here are the most recent items on Wansink’s “Healthier and Happier” blog:

Behind-the-Scenes with Rachael Ray

Keeping the Change [about someone’s weight loss]

Foreign Weight [inviting people to participate in an online survey]

Congratulations, You’re Already Losing Weight

First Seen is First Eaten – The Solution

The Solution to Mindless Eating

I’ll decline the opportunity here to make a joke like, “The Solution to Mindless Publishing and Promoting.”

The point is:

(a) Over the past few months, Wansink has received harsh and on-the-mark criticism of his research methods and his claimed results. In his words, he’s accepted much of this criticism, but in his actions he’s ignored it and minimized it; indeed, he seems to be using his words to defuse the criticism without ever addressing it.

(b) The same thing happened five years ago. Back then, the strategy worked, in the sense that the revelation of research incompetence in that earlier paper did not stop him from continuing full steam ahead with his group, getting funding, publishing in respected journals, going on national TV, etc.

Hearing about (b) gives me a lot more insight into (a). I’m no longer puzzled by Wansink’s recent behavior. Now it all makes sense: he’s following the same strategy that worked before.

That said, from my perspective it would seem like less effort to just not write papers that have 150 errors. I still can’t figure out how that happened. But by Wansink’s own admission he puts his students and postdocs under a huge amount of pressure, and accuracy is clearly not high on anybody’s priority list over there. People don’t always respond to incentives, but responding to incentives is usually a lot easier than not responding to incentives.

2. Jesse Singal’s aforementioned news article had this wonderfully revealing quote from Cornell University’s public relations officer:

Recent questions have arisen regarding the statistical methods utilized by Professor Brian Wansink

That’s misleading. The issue is that Wansink doesn’t seem to be using any statistical method at all, as no known statistical method can produce the numbers in his tables.

Yah, yah, sure, the P.R. guy’s job is not to spread truth, it’s to promote (or, in this case, to minimize damage to) Cornell University. The whole thing’s kind of sad, though. Who’d want a job like that?

The P.R. guy also says, “we respect our faculty’s role as independent investigators to determine the most appropriate response to such requests, absent claims of misconduct.”

Which makes me wonder: If you publish four different papers on the same failed experiment, and you make 150 errors in the process, and in the meantime you’re pressuring your students and postdocs to do this work . . . does this count as “misconduct”? Recall Clarke’s Law.

Sure, all of this put together still only rates a zero on the Richter scale compared to what’s happening every day in Washington, D.C., but it still bothers me that this is standard operating procedure in big-time science. From a certain point of view, it’s just kind of amazing.
