“The issue of how to report the statistics is one that we thought about deeply, and I am quite sure we reported them correctly.”

Ricardo Vieira writes:

I recently came upon this study from Princeton published in PNAS:

Implicit model of other people’s visual attention as an invisible, force-carrying beam projecting from the eyes

In which the authors asked people to demonstrate how much you have to tilt an object before it falls. They show that when a human head is looking at the object in the direction that it is tilting, people implicitly rate the tipping point as being lower than when the person is looking in the opposite direction (as if the eyes either pushed the object down or prevented it from falling). They further show that no such difference emerges when the human head is blindfolded. The experiment was repeated a few times with different populations (online and local) and slight modifications.

In a subsequent survey, they found that 5% of the population seems to actually believe in some form of eye-beams (or extramission, if you want to be technical).

I have a few issues with the article. For starters, they do not directly compare the non-blindfolded and blindfolded conditions, although they emphasize several times that the difference in the first is significant and in the second is not. This point was actually brought up on the blog Neuroskeptic. The author of the blog writes:

This study seems fairly solid, although it seems a little fortuitous that the small effect found by the n=157 Experiment 1 was replicated in the much smaller (and hence surely underpowered) follow-up experiments 2 and 3C. I also think the stats are affected by the old erroneous analysis of interactions error (i.e. failure to test the difference between conditions directly) although I’m not sure if this makes much difference here.

In the discussion that ensued, one of the study authors responds to the two points raised. I feel the first point is not that relevant, as the first experiment was done on mturk and the subsequent ones in a controlled lab, and the estimated standard errors were pretty similar across the board. Now on to the second point. The author writes:

The issue of how to report the statistics is one that we thought about deeply, and I am quite sure we reported them correctly. First, it should be noted that each of the bars shown in the figure is already a difference between two means (mean angular tilt toward the face vs. mean angular tilt away from the face), not itself a raw mean. What we report, in each case, is a statistical test on a difference between means. If I interpret your argument correctly, it suggests that the critical comparison for us is not this tilt difference itself, but the difference of tilt differences. In our study, however, I would argue that this is not the case, for a couple of reasons:

In experiment 1 (a similar logic applies to exp 2), we explicitly spelled out two hypotheses. The first is that, when the eyes are open, there should be a significant difference between tilts toward the face and tilts away from the face. A significant difference here would be consistent with a perceived force emanating from the eyes. Hence, we performed a specific, within-subjects comparison between means to test that specific hypothesis. Doing away with that specific comparison would remove the critical statistical test. Our main prediction would remain unexamined. Note that we carefully organized the text to lay out this hypothesis and report the statistics that confirm the prediction.

The second hypothesis is that, when the eyes are closed, there should be no significant difference between tilts toward the face and tilts away from the face (null hypothesis). We performed this specific comparison as well. Indeed, we found no statistical evidence of a tilt effect when the eyes were closed. Thus, each hypothesis was put to statistical test.

One could test a third hypothesis: any tilt difference effect is bigger when the eyes are open than when the eyes are closed. I think this is the difference of tilt differences asked for. However, this is not a hypothesis we put forward. We were very careful not to frame the paper in that way. The reason is that this hypothesis (this difference of differences) could be fulfilled in many ways. One could imagine a data set in which, when the eyes are open, the tilt effect is not by itself significant, but shows a small positivity; and when the eyes are closed, the tilt effect shows a small negativity. The combination could yield a significant difference of differences. The proposed test would then provide a false positive, showing a significant effect while the data actually do not support our hypotheses.

Of course, one could ask: why not include both comparisons, reporting on the tests we did as well as the difference of differences? There are at least two reasons. First, if we added more tests, such as the difference of differences, along with the tests we already reported, then we would be double-dipping, or overlapping statistical tests on the same data. The tests then become partially redundant and do not represent independent confirmation of anything. Second, as easy as it may sound, the difference-of-differences is not even calculable in a consistent manner across all four experiments (e.g., in the control experiment 4), and so it does not provide a standardized way to evaluate all the results.

For all of these reasons, we believe the specific statistical methods reported in the manuscript are the simplest and the most valid. I totally understand that our statistics may seem to be affected by the erroneous analysis of interactions error, at first glance. But on deeper consideration, analyzing the difference-of-differences turns out to be somewhat problematic and also not calculable for some of our data sets.

Is this reasonable?

My other issues relate to the actual effect. First, the size of the difference is not clear (the average difference is around 0.67 degrees, which is never described in terms of visual angle). I tried to draw two lines separated by 0.67 degrees on Paint.net, and I couldn’t tell the difference unless they were superimposed, but I am not sure I got the scale correct. Second, they do not state in the article how much rotation is caused by each key-press (is this average difference equivalent to one key-press, half, two?). Finally, the participants do not see the full object rendered during the experiment, but just one vertical line. The authors argue that otherwise people would use heuristics such as moving the top corner over the opposite bottom corner. This necessity seems to undermine their hypothesis (if the eye-beam bias only works on lines, then it seems of little relevance to the 3D world).
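To get a sense of the scale, here is a rough back-of-the-envelope calculation (the line lengths in pixels are my own assumption; the paper does not give display dimensions): rotating a line about its base by 0.67 degrees shifts its top endpoint sideways by length × sin(0.67°), roughly 1.2% of the line’s length.

```python
import math

# How far does the top of a vertical line move when the line is
# rotated 0.67 degrees about its base? The line lengths in pixels
# are assumed for illustration; the paper gives no display dimensions.
tilt_deg = 0.67
for length_px in (100, 300, 600):
    dx = length_px * math.sin(math.radians(tilt_deg))
    print(f"{length_px:4d}-px line -> top shifts {dx:.1f} px")
```

On a 300-pixel line, that is about 3.5 pixels at the tip: not nothing, but easy to misjudge in a freehand drawing.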

Okay, perhaps what really bothers me is the last paragraph of the article:

We speculate that an automatic, implicit model of vision as a beam exiting the eyes might help to explain a wide range of cultural myths and associations. For example, in Star Wars, a Jedi master can move an object by staring at it and concentrating the mind. The movie franchise works with audiences because it resonates with natural biases. Superman has beams that can emanate from his eyes and burn holes. We refer to the light of love and the light of recognition in someone’s eyes, and we refer to death as the moment when light leaves the eyes. We refer to the feeling of someone else’s gaze boring into us. Our culture is suffused with metaphors, stories, and associations about eye beams. The present data suggest that these cultural associations may be more than a simple mistake. Eye beams may remain embedded in the culture, 1,000 y after Ibn al-Haytham established the correct laws of optics (12), because they resonate with a deeper, automatic model constructed by our social machinery. The myth of extramission may tell us something about who we are as social animals.

Before getting to the details, let me share my first reaction, which is appreciation that Arvid Guterstam, one of the authors of the published paper, engaged directly with external criticism, rather than ignoring the criticism, dodging it, or attacking the messenger.

Second, let me emphasize the distinction between individuals and averages. In the above-linked post, Neuroskeptic writes:

Do you believe that people’s eyes emit an invisible beam of force?

According to a rather fun paper in PNAS, you probably do, on some level, believe that.

And indeed, the abstract of the article states: “when people judge the mechanical forces acting on an object, their judgments are biased by another person gazing at the object.” But this finding (to the extent that it’s real, in the sense of being something that would show up in a large study of the general population under realistic conditions) is a finding about averages. It could be that everyone behaves this way, or that most people behave this way, or that only some people behave this way: any of these can be consistent with an average difference.
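To see how different individual-level stories can hide behind the same average, here is a toy simulation (all numbers invented for illustration): a population in which everyone has a small bias and a population in which only 5% of people have a large bias produce essentially the same average difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Scenario A (invented): everyone shares a small ~0.67-degree bias.
everyone = rng.normal(loc=0.67, scale=3.0, size=n)

# Scenario B (invented): 5% of people have a large ~13.4-degree bias,
# and the other 95% have no bias at all.
believer = rng.random(n) < 0.05
mixture = np.where(believer,
                   rng.normal(13.4, 3.0, n),
                   rng.normal(0.0, 3.0, n))

# Both scenarios yield roughly the same mean difference (~0.67 degrees).
print(round(everyone.mean(), 2), round(mixture.mean(), 2))
```

Group averages alone cannot distinguish these two stories; for that you would need to look at the distribution of individual responses.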

Also, Neuroskeptic’s summary takes a little poetic license, in that the study does not claim that most people believe that eyes emit any force; the claim is that people on average make certain judgments as if eyes emitted such a force.

This last bit is no big deal but I bring it up because there’s a big difference between people believing in the eye-beam force and implicitly reacting as if there was such a force. The latter can be some sort of cognitive processing bias, analogous in some ways to familiar visual and cognitive illusions that persist even if they are explained to you.

Now on to Vieira’s original question: did the original authors do the right thing in comparing significant to not significant? No, what they did was mistaken, for the usual reasons.
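The usual reasons, in brief: the difference between “significant” and “not significant” is not itself statistically significant. A toy calculation with invented numbers shows how one comparison can clear the threshold while the other does not, even when the data say essentially nothing about whether the two underlying effects differ:

```python
import math

# Invented numbers for illustration: two estimated tilt differences
# (eyes open vs. eyes closed), each with the same standard error.
est_open,   se_open   = 0.67, 0.30   # z = 2.23 -> "significant"
est_closed, se_closed = 0.20, 0.30   # z = 0.67 -> "not significant"

# The direct comparison: the difference of the two differences.
diff    = est_open - est_closed                 # 0.47
se_diff = math.sqrt(se_open**2 + se_closed**2)  # 0.42
print(round(diff / se_diff, 2))                 # 1.11 -> not significant
```

One comparison is “significant” and the other is not, yet the direct test of their difference comes nowhere near significance.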

The author’s explanation quoted above is wrong, and I believe it is wrong in an instructive way. The author talks a lot about hypotheses and a bit about the framing of the data, but that’s not so relevant to the question of what we can learn from the data. Procedural discussions such as “double-dipping” also miss the point: again, what we should want to know is what can be learned from these data (plus whatever assumptions go into the analysis), not how many times the authors “dipped” or whatever.

The fundamental fallacy I see in the authors’ original analysis, and in their follow-up explanation, is deterministic reasoning, in particular the idea that whether a comparison is “statistically significant” is equivalent to an effect being real.

Consider this snippet from Guterstam’s comment:

The second hypothesis is that, when the eyes are closed, there should be no significant difference between tilts toward the face and tilts away from the face (null hypothesis).

This is an error. A hypothesis should not be about statistical significance (or, in this case, no significant difference) in the data; it should be about the underlying or population pattern.

And this:

One could imagine a data set in which, when the eyes are open, the tilt effect is not by itself significant, but shows a small positivity; and when the eyes are closed, the tilt effect shows a small negativity. The combination could yield a significant difference of differences. The proposed test would then provide a false positive, showing a significant effect while the data actually do not support our hypotheses.

Again, the problem here is the blurring of two different things: (a) underlying effects and (b) statistically significant patterns in the data.
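To make the blurring concrete, take the very scenario the author worries about, with invented numbers: a small positive underlying effect with eyes open and a small negative one with eyes closed. If that is the true underlying pattern, a significant difference of differences is not a “false positive”; it is the interaction test correctly detecting a real difference between conditions.

```python
import math

# Invented numbers: small opposite effects with the same standard error.
est_open, est_closed, se = 0.40, -0.40, 0.25

# Neither effect is "significant" on its own.
print(round(est_open / se, 2), round(est_closed / se, 2))  # 1.6, -1.6

# But the difference of differences is.
diff    = est_open - est_closed   # 0.80
se_diff = math.sqrt(2) * se       # 0.35
print(round(diff / se_diff, 2))   # 2.26 -> significant
```

Whether that underlying pattern supports the paper’s particular eye-beam story is a substantive question about the effects themselves, not something settled by which individual tests happened to cross a threshold.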

A big problem

The error of comparing statistical significance to non-significance is a little thing.

A bigger mistake is the deterministic attitude by which effects are considered there or not, the whole “false positive / false negative” thing. Lots of people, I expect most statisticians, don’t see this as a mistake, but it is one.

But an even bigger problem comes in this sentence from the author of the paper in question:

The issue of how to report the statistics is one that we thought about deeply, and I am quite sure we reported them correctly.

He’s “quite sure”—but he’s wrong. This is a big, big, big problem. People are so so so sure of themselves.

Look. This guy could well be an excellent scientist. He has a Ph.D. He’s a neuroscientist. He knows a lot of stuff I don’t know. But maybe he’s not a statistics expert. That’s ok—not everyone should be a statistics expert. Division of labor! But a key part of doing good work is to have a sense of what you don’t know.

Maybe don’t be quite so sure next time! It’s ok to get some things wrong. I get things wrong all the time. Indeed, one of the main reasons for publishing your work is to get it out there, so that readers can uncover your mistakes. As I said above, I very much appreciate that the author of this article responded constructively to criticism. I think it’s too bad he was so sure of himself on the statistics, but even that is a small thing compared to his openness to discussion.

I agree with my correspondent

Finally, I agree with Vieira that the last paragraph of the article (the “We speculate that an automatic, implicit model of vision as a beam exiting the eyes might help to explain a wide range of cultural myths and associations…” passage quoted in full above) is waaaay over the top. I mean, sure, who knows, but, yeah, this is story time outta control!

P.S. One amusing feature of this episode is that the above-linked comment thread has some commenters who seem to actually believe that eye-beams are real:

If “eye beam” is the proper term then I have no difficulty in registering my belief in them. Any habitué of the subway is familiar with the mysterious effect where looking at another’s face, who may be reading a book or be absorbed in his phone, maybe 20 or 30 feet away, will cause him suddenly to swivel his glance toward the onlooker. Let any who doubt experiment.

Just ask hunters or bird watchers if they exist. They know never to look directly at the animal’s head/eyes or they will be spooked.

I have had my arse saved by ‘sensing’ the gaze of others. This ‘effect’ is real. Completely subjective…yes. That I am here and able to write this comment…is a fact.

No surprise, I guess. There are lots of supernatural beliefs floating around, and it makes sense that they should show up all over, including on blog comment threads.