(This article was originally published at Error Statistics Philosophy » Statistics, and syndicated at StatsBlogs.)
In the early ’80s, fresh out of graduate school, I persuaded Persi Diaconis, Jack Good, and Patrick Suppes to participate in a session I wanted to organize on ESP and statistics. It seems remarkable to me now, not only that they agreed to participate*, but also the extent to which PSI research was taken seriously at the time. It wasn’t much later that all the recurring errors and loopholes, and the persistent self-delusion (despite earnest attempts to trigger and analyze the phenomena), would lead nearly everyone to label PSI research a “degenerating programme” (in the Popperian-Lakatosian sense).
(Though I’d have to check names and dates, I seem to recall that the last straw was when some of the Stanford researchers were found guilty of (unconscious) fraud. Jack Good continued to be interested in the area, but less so, I think. I do not know about the others.)
It is interesting to see how background information enters into inquiry here. So, even though it’s late on a Saturday night, here’s a snippet from one of the papers that caught my interest in graduate school: Diaconis’s (1978) “Statistical Problems in ESP Research”, in Science, along with some critical “letters”.
Summary. In search of repeatable ESP experiments, modern investigators are using more complex targets, richer and freer responses, feedback, and more naturalistic conditions. This makes tractable statistical models less applicable. Moreover, controls often are so loose that no valid statistical analysis is possible. Some common problems are multiple end points, subject cheating, and unconscious sensory cueing. Unfortunately, such problems are hard to recognize from published records of the experiments in which they occur; rather, these problems are often uncovered by reports of independent skilled observers who were present during the experiment. This suggests that magicians and psychologists be regularly used as observers. New statistical ideas have been developed for some of the new experiments. For example, many modern ESP studies provide subjects with feedback—partial information about previous guesses—to reward the subjects for correct guesses in hope of inducing ESP learning. Some feedback experiments can be analyzed with the use of skill-scoring, a statistical procedure that depends on the information available and the way the guessing subject uses this information. (p. 131)
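The summary's last point, that feedback changes what counts as "chance", can be illustrated with a small simulation. This is only a sketch under stated assumptions: a 25-card deck with 5 symbols of 5 cards each, a guesser with no ESP who is shown every card after guessing, and a skill score taken to be the number of hits minus the sum of the conditional hit probabilities given the feedback so far. That reading of "skill-scoring" is an assumption drawn from the summary's description; the paper's exact definition may differ, and the function names are hypothetical.

```python
import random


def run_deck(rng, symbols=5, copies=5):
    """Play one shuffled deck with full feedback and no ESP.

    Returns (hits, skill_score), where skill_score is hits minus the sum of
    the conditional probabilities of a hit given the cards already revealed.
    The scoring rule is an assumed reading of skill-scoring, for illustration.
    """
    deck = [s for s in range(symbols) for _ in range(copies)]
    rng.shuffle(deck)
    remaining = [copies] * symbols  # cards of each symbol not yet revealed
    cards_left = len(deck)
    hits = 0
    expected_hits = 0.0
    for card in deck:
        # Strategy: guess the symbol with the most cards still in the deck.
        guess = max(range(symbols), key=lambda s: remaining[s])
        # Chance of a hit on this trial, given the feedback so far.
        expected_hits += remaining[guess] / cards_left
        hits += int(guess == card)
        # Feedback: the card is revealed, so update the counts.
        remaining[card] -= 1
        cards_left -= 1
    return hits, hits - expected_hits


rng = random.Random(1978)  # fixed seed so the run is reproducible
n_decks = 20_000
results = [run_deck(rng) for _ in range(n_decks)]
mean_hits = sum(h for h, _ in results) / n_decks
mean_skill = sum(s for _, s in results) / n_decks

print(f"mean hits per deck:        {mean_hits:.2f}  (no-feedback baseline: 5)")
print(f"mean skill score per deck: {mean_skill:.3f}")
```

In this toy setting the average hit count should sit well above the no-feedback baseline of 5 even though no ESP is simulated, while the average skill score should stay near zero. That is the sense in which the score depends on the information available and on how the guesser uses it.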
Is modern parapsychological research worthy of serious consideration? The volume of literature by reputable scientists, the persistent interest of students, and the government’s funding of ESP projects make it difficult to evade this question. Over the past 10 years, in the capacity of statistician and professional magician, I have had personal contact with more than a dozen paranormal experiments. My background encourages a thorough skepticism, but I also find it useful to recall that skeptics make mistakes. …
Critics of ESP must acknowledge the possibility of missing a real phenomenon because of the difficulty of designing a suitable experiment. However, the characteristics which lead many to be dubious about claims for ESP—its sporadic appearance, its need for a friendly environment, and its common association with fraud—require of the most sympathetic analyst not only skill in the analysis of nonstandard types of experimental design but appreciation of the differences between a sympathetic environment with flexible study design and experimentation which is simply careless or so structured as to be impossible to evaluate.
In this article I use examples to indicate the problems associated with the generally informal methods of design and evaluation of ESP experiments—in particular, the problems of multiple end points and subject cheating. I then review some of the commentaries of outstanding statisticians on the problems of evaluation. Finally, as an instance of using new analytic methods for non-standard experiments, I give examples of some new statistical techniques that permit appropriate evaluation of studies that allow instant feedback of information to the subject after each trial, an entirely legitimate device used to facilitate whatever learning process may be involved. (p. 131)
…
Statisticians and ESP (p. 133)
The only widely respected evidence for paranormal phenomena is statistical. Classical statistical tests are reported in each of the published studies described above. Most often these tests are ‘highly statistically significant.’ This only implies that the results are improbable under simple chance models. In complex, badly controlled experiments simple chance models cannot be seriously considered as tenable explanations; hence, rejection of such models is not of particular interest. For example, the high significance claimed for the famous Zenith Radio experiment is largely a statistical artifact (18). Listeners were invited to mail in their guesses on a random sequence of playing cards. The proportion of correct guesses was highly significant when calculations were based on the assumption of random guessing on the part of each listener. It is well known (19) that the distribution of sequences produced by human subjects is far from random, and hence the crucial hypothesis of independence fails in this situation. More sophisticated analysis of the Zenith results gives no cause for surprise.
In well-run experiments, statistics can aid in the design and final analysis. The idea of deliberately introducing external, well-controlled randomization in investigation of paranormal phenomena seems due to Richet (20) and Edgeworth (21). Later, Wilks (22) wrote a survey article on reasonable statistical procedures for analyzing paranormal experiments popular at the time. Fisher developed new statistical methods that allow credit for ‘close’ guesses in card-guessing experiments (23). Good (24) continues to suggest new experiments and explanations for ESP. The parascience community, well aware of the importance of statistical tools, has solved numerous statistical riddles in its own literature. Any of the three best known parascience journals is a source of a number of good surveys and discussions of inferential problems (25).
For the full article and citations: “Statistical Problems in ESP Research”
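The Zenith Radio point in the excerpt above, that "high significance" under a pure-chance model can be an artifact of how people produce sequences, can be made concrete with a toy simulation. It is not a reconstruction of the Zenith data. The assumptions are all illustrative: guesses are binary sequences of length 5, every listener (with no ESP) tends to alternate symbols with a hypothetical switch probability of 0.7, and the statistic is the number of listeners who match the whole target sequence, compared with a naive model in which each listener matches with probability 1/32.

```python
import math
import random


def human_like_guess(length, p_switch, rng):
    """One listener's guess: a binary sequence that switches symbols with
    probability p_switch at each step (p_switch = 0.5 is truly random)."""
    seq = [rng.randint(0, 1)]
    for _ in range(length - 1):
        prev = seq[-1]
        seq.append(1 - prev if rng.random() < p_switch else prev)
    return tuple(seq)


def naive_z_score(target, n_listeners, p_switch, rng):
    """z-score of the exact-match count against the naive null model in which
    every listener guesses each symbol independently and uniformly."""
    target = tuple(target)
    matches = sum(
        human_like_guess(len(target), p_switch, rng) == target
        for _ in range(n_listeners)
    )
    p0 = 0.5 ** len(target)  # match probability under random guessing
    mean = n_listeners * p0
    sd = math.sqrt(n_listeners * p0 * (1 - p0))
    return (matches - mean) / sd


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 10_000

# Same biased listeners, two different targets; then unbiased listeners.
z_alternating = naive_z_score([0, 1, 0, 1, 0], n, 0.7, rng)
z_constant = naive_z_score([0, 0, 0, 0, 0], n, 0.7, rng)
z_fair = naive_z_score([0, 1, 0, 1, 0], n, 0.5, rng)

print(f"alternating target, biased listeners: z = {z_alternating:.1f}")
print(f"constant target, biased listeners:    z = {z_constant:.1f}")
print(f"alternating target, random listeners: z = {z_fair:.1f}")
```

With these settings the naive test should report a large positive z when the target happens to be a sequence people favor, a large negative z when it is one they avoid, and an unremarkable z only when the listeners really do guess at random. Rejecting the simple chance model says something about guessing habits here, and nothing about ESP.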
The growing skepticism of the period rested on the obstacles to valid testing of the various ESP hypotheses. Examples include multiple end points, subject cheating, unconscious cueing, gaps between published records and actual experimental protocols, and experiments that were poorly designed, badly run, or inappropriately analyzed. “Even if there had not been subject cheating, the experiments described above would be useless because they were out of control. The confusing and erratic experimental conditions I have described are typical of every test of paranormal phenomena I have witnessed” (Diaconis, p. 133).
My takeaway message: The background knowledge here, insofar as it is relevant for inquiry, consists of very specific problems as well as specific recommendations/requirements for experimental designs. Communicating and using the background information in inquiry also involves describing specific protocols, checks, and stipulations for any future experimental demonstrations to pass muster.
You may be interested to read some critical “letters” by Tart, and Puthoff and Targ, with an author response.
*There’s more: it was part of a ‘popular culture society’ meeting!

