Pizzagate gets even more ridiculous: “Either they did not read their own previous pizza buffet study, or they do not consider it to be part of the literature . . . in the later study they again found the exact opposite, but did not comment on the discrepancy.”

June 15, 2017

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)


Several months ago, Jordan Anaya​, Tim van der Zee, and Nick Brown reported that they’d uncovered 150 errors in 4 papers published by Brian Wansink, a Cornell University business school professor and who describes himself as a “world-renowned eating behavior expert for over 25 years.”

150 errors is pretty bad! I make mistakes myself and some of them get published, but one could easily go through an entire career publishing less than 150 mistakes. So many in a single paper is kind of amazing.

After the Anaya et al. paper came out, people dug into other papers of Wansink and his collaborators and found lots more errors.

Wansink later released a press release pointing to a website which he said contained data and code from the 4 published papers.

In that press release he described his lab as doing “great work,” which seems kinda weird to me, given that their published papers are of such low quality. Usually we would think that if a lab does great work, this would show up in its publications, but this did not seem to have happened in this case.

In particular, even if the papers in question had no data-reporting errors at all, we would have no reason to believe any of the scientific claims that were made therein, as these claims were based on p-values computed from comparisons selected from uncontrolled and abundant researcher degrees of freedom. These papers are exercises in noise mining, not “great work” at all, not even good work, not even acceptable work.

The new paper

As noted above, Wansink shared a document that he said contained the data from those studies. In a new paper, Anaya, van der Zee, and Brown analyzed this new dataset. They report some mistakes they (Anaya et al.) had made in their earlier paper, and many places where Wanink’s papers misreported his data and data collection protocols.

Some examples:

All four articles claim the study was conducted over a 2-week period, however the senior author’s blog post described the study as taking one month (Wansink, 2016), the senior author told Retraction Watch it was a two-month study (McCook, 2017b), a news article indicated the study was at least 3 weeks long (Lazarz, 2007), and the data release states the study took place from October 18 to December 8, 2007 (Wansink and Payne, 2007). Why the articles claimed the study only took two weeks when all the other reports indicate otherwise is a mystery.

Furthermore, articles 1, 2, and 4 all claim that the study took place in spring. For the Northern Hemisphere spring is defined as the months March, April, and May. However, the news report was dated November 18, 2007, and the data release states the study took place between October and December.

And this:

Article 1 states that the diners were asked to estimate how much they ate, while Article 3 states that the amount of pizza and salad eaten was unobtrusively observed, going so far as to say that appropriate subtractions were made for uneaten pizza and salad. Adding to the confusion Article 2 states:
“Unfortunately, given the field setting, we were not able to accurately measure consumption of non-pizza food items.”

In Article 3 the tables included data for salad consumed, so this statement was clearly inaccurate.

And this:

Perhaps the most important question is why did this study take place? In the blog post the senior author did mention having a “Plan A” (Wansink, 2016), and in a Retraction Watch interview revealed that the original hypothesis was that people would eat more pizza if they paid more (McCook, 2017a). The origin of this “hypothesis” is likely a previous study from this lab, at a different pizza buffet, with nearly identical study design (Just and Wansink, 2011). In that study they found diners who paid more ate significantly more pizza, but the released data set for the present study actually suggests the opposite, that diners who paid less ate more. So was the goal of this study to replicate their earlier findings? And if so, did they find it concerning that not only did they not replicate their earlier result, but found the exact opposite? Did they not think this was worth reporting?
Another similarity between the two pizza studies is the focus on taste of the pizza. Article 1 specifically states:

“Our reading of the literature leads us to hypothesize that one would rate pizza from an $8 pizza buffet as tasting better than the same pizza at a $4 buffet.”

Either they did not read their own previous pizza buffet study, or they do not consider it to be part of the literature, because in that paper they found ratings for overall taste, taste of first slice, and taste of last slice to all be higher in the lower price group, albeit with different levels of significance (Just and Wansink, 2011). However, in the later study they again found the exact opposite, but did not comment on the discrepancy.

Anaya et al. summarize:

Of course, there is a parsimonious explanation for these contradictory results in two apparently similar studies, namely that one or both sets of results are the consequence of modeling noise. Given the poor quality of the released data from the more recent articles . . . it seems quite likely that this is the correct explanation for the second set of studies, at least.

And this:

No good theory, no good data, no good statistics, no problem. Again, see here for the full story.

Not the worst of it

And, remember, those 4 pizzagate papers are not the worst things Wansink has published. They’re only the first four articles that anyone bothered to examine carefully enough to see all the data problems.

There was this example dug up by Nick Brown:

A further lack of randomness can be observed in the last digits of the means and F statistics in the three published tables of results . . . Here is a plot of the number of times each decimal digit appears in the last position in these tables:

These don’t look like so much like real data but they do seem consistent with someone making up numbers and not wanting them to seem too round, and not being careful to include enough 0’s and 5’s in the last digits.

And this discovery by Tim van der Zee:

Wansink, B., Cheney, M. M., & Chan, N. (2003). Exploring comfort food preferences across age and gender. Physiology & Behavior, 79(4), 739-747.

Citations: 334

Using the provided summary statistics such as mean, test statistics, and additional given constraints it was calculated that the data set underlying this study is highly suspicious. For example, given the information which is provided in the article the response data for a Likert scale question should look like this:

Furthermore, although this is the most extreme possible version given the constraints described in the article, it is still not consistent with the provided information.

In addition, there are more issues with impossible or highly implausible data.


Sığırcı, Ö, Rockmore, M., & Wansink, B. (2016). How traumatic violence permanently changes shopping behavior. Frontiers in Psychology, 7,

Citations: 0

This study is about World War II veterans. Given the mean age stated in the article, the distribution of age can only look very similar to this:

The article claims that the majority of the respondents were 18 to 18.5 years old at the end of WW2 whilst also have experienced repeated heavy combat. Almost no soldiers could have had any other age than 18.

In addition, the article claims over 20% of the war veterans were women, while women only officially obtained the right to serve in combat very recently.

There’s lots more at the link.

From the NIH guidelines on research misconduct:

Falsification: Manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record.

The post Pizzagate gets even more ridiculous: “Either they did not read their own previous pizza buffet study, or they do not consider it to be part of the literature . . . in the later study they again found the exact opposite, but did not comment on the discrepancy.” appeared first on Statistical Modeling, Causal Inference, and Social Science.

Please comment on the article here: Statistical Modeling, Causal Inference, and Social Science