Oh, That ESP Paper
I have to hand it to Professor Daryl Bem. He has produced what some commentators have referred to as a publicist’s dream: a scientific paper to be published in a flagship journal of his discipline (Journal of Personality and Social Psychology) that involves the supernatural and sex. To quote from the abstract, “Data are presented for 4 time-reversed effects: precognitive approach to erotic stimuli and precognitive avoidance of negative stimuli; retroactive priming; retroactive habituation; and retroactive facilitation of recall.” All of these are ostensibly examples of the influence of not-yet-presented stimuli on here-and-now responses. In short, Bem is claiming to have found examples at the molar level of cause-effect reversal, with the effect occurring before the cause.
Naturally, this paper has generated considerable comment, controversy and some apprehension. My post here is a late entry, but that does give me the advantage of being able to comment on others’ earlier comments. A New York Times article on January 5th summarized the paper, and on the 7th the Times’s Opinion pages featured responses from nine high-profile scientists in several fields.
Some commentators say, in effect, that this is a tempest in a teapot. Trimble, Wiseman, Helfand and Kraus, in somewhat different ways, declare that this paper is like many others insofar as the proof of the pudding will emerge as other researchers repeat Bem’s experiments.
Douglas Hofstadter, on the other hand, wants the gate closed against people like Bem. Why? “If any of his claims were true, then all of the bases underlying contemporary science would be toppled, and we would have to rethink everything about the nature of the universe.” Really? And even if so, why is that a reason not to publish Bem?
Back in 1989 Oxford University Press published a book by physicist Roger Penrose (The Emperor’s New Mind) which included claims that quantum mechanical events play an important role in human cognition and that specific such events are the key to understanding consciousness. Unlike Bem, Penrose didn’t present any direct empirical evidence for his claims. Most experts on cognition and consciousness thought Penrose’s ideas were misguided and wrong, but they (including Hofstadter, whose views were among those attacked by Penrose) didn’t call for his book to be suppressed. Heterodoxy should not bar an experimental scientist from having their case given a fair hearing.
Let me declare myself here as a skeptic regarding the existence of psi. I’ll be very surprised if Bem’s results hold up under replication. Nevertheless, I don’t find Bem’s paper crazy or unscientific. He begins his paper not only by acknowledging widespread skepticism about psi but also by observing that this skepticism is greater among psychologists than among scholars in many other disciplines, including the natural sciences. He also knows the history of research on psi and soberly assesses the accumulated evidence prior to his own investigations. Moreover, his experimental methods seem sound. His analyses of his data and the inferences he draws from them are of the accepted standard in the discipline.
Not only did four qualified reviewers plus the relevant editor on the board of the journal deem his methods sufficiently rigorous, but none of the NY Times scientific commentators pointed to any faults in the paper (then again, some of them don’t seem to have read the paper in depth—if at all). Had I been a reviewer (and I’ve reviewed plenty of papers for psychology journals over three decades), I’d have given Bem’s paper the go-ahead.
Now, Bem’s evidence is almost entirely statistical. His entire case depends on demonstrating greater-than-chance experimental effects. This kind of demonstration hinges on three important matters: which school of statistical inference one subscribes to; how great a risk one is willing to run of mistaking a chance effect for a real one (a Type I error); and how great a risk of mistaking a real effect for one due to chance (a Type II error). Bem, along with the vast majority of experimental psychologists, subscribes to the Neyman-Pearson-Fisher school of statistical inference. This school has been strongly criticized for more than half a century, and among its most effective critics are those subscribing to Bayesian statistical methods.
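To make those two risks concrete, here is a minimal sketch in the Neyman-Pearson-Fisher mold. All numbers are hypothetical (this is not Bem’s data): a Bem-style binary “guess which curtain hides the picture” task is just a binomial trial with a 50% chance hit rate, and the two risks fall out of where we set the rejection criterion.

```python
# Hypothetical Bem-style detection task: n binary trials, chance = 0.5.
# The two risks in the text are the Type I and Type II error rates.
from scipy.stats import binom

n = 100            # trials (hypothetical)
alpha = 0.05       # tolerated risk of calling a chance effect real (Type I)
p_true = 0.53      # a small, hypothetical "psi" hit rate

# One-sided test: reject "chance only" at the smallest hit count whose
# upper-tail probability under chance falls to alpha or below.
k_crit = next(k for k in range(n + 1) if binom.sf(k - 1, n, 0.5) <= alpha)

type_i = binom.sf(k_crit - 1, n, 0.5)    # false-positive rate under chance
power = binom.sf(k_crit - 1, n, p_true)  # chance of detecting the real effect
type_ii = 1 - power                      # risk of missing the real effect

print(f"reject H0 when hits >= {k_crit}")
print(f"Type I risk  = {type_i:.3f}")
print(f"Type II risk = {type_ii:.3f} (power = {power:.3f})")
```

With these made-up numbers the test keeps its false-positive risk under 5%, yet a genuine 53% hit rate would usually go undetected in a single study — which is why the choice of risks, not just the school of inference, matters.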
So, enter the Bayesians with two rebuttals to be published along with Bem’s paper. The Bayesian critiques and correctives proposed by Wagenmakers et al. and Rouder and Morey are valid but not new. Without going into technical aspects, both of these critiques begin with claims that Bem’s statistical analyses are flawed because Neyman-Pearson-Fisher methods are flawed. I agree, as have numerous statistically sophisticated folks for more than 50 years: the conventional (statistical) significance tests used by psychologists are incoherent and biased against the “null hypothesis” (in Bem’s case, the absence of the retroactive psi effects he’s looking for). Again, let me declare myself as believing that the discipline of psychology would be much better off using Bayesian statistics.
But here’s the rub. The arguments against null-hypothesis significance tests and in favor of the Bayesian approach were valid before Bem’s paper appeared. Bem’s paper doesn’t add anything to the validity of those arguments. The same arguments could be raised against any experimental psychological paper (i.e., almost all of them) that uses the conventional Neyman-Pearson-Fisher significance tests.
Ah, but if psychologists did use Bayesian statistics, would experiments like Bem’s ever produce credible-looking evidence for something as anomalous as psi? The subtext answer in Wagenmakers et al. and, to a lesser extent, in Rouder and Morey seems to be “No”: Bayesian methods would future-proof psychology against straying into such errors. But of course this isn’t true. Bayesian methods won’t prevent anomalous, fallacious findings from cropping up from time to time, simply by chance. What they will do is help enormously to winnow such findings from valid ones as evidence from replications accumulates.
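The point can be made concrete with a toy simulation. Everything here is invented for illustration — the uniform prior on the hit rate and the BF > 3 threshold are my assumptions, not the priors Wagenmakers et al. actually use: run enough purely null experiments and a few will, by chance alone, produce a Bayes factor that favors a nonexistent effect.

```python
# Toy simulation: many null experiments (true hit rate exactly 0.5),
# each scored with a Bayes factor comparing H1 (hit rate unknown,
# uniform prior) against H0 (hit rate = 0.5). For a binomial with a
# uniform prior the marginal likelihood under H1 is 1/(n+1), giving:
#   BF10 = [1 / (n + 1)] / [C(n, k) * 0.5**n]
import math
import random

random.seed(1)
n_trials, n_experiments = 100, 2000

def bf10(k, n):
    # marginal likelihood under H1 (uniform prior) over likelihood under H0
    return (1 / (n + 1)) / (math.comb(n, k) * 0.5 ** n)

false_alarms = 0
for _ in range(n_experiments):
    k = sum(random.random() < 0.5 for _ in range(n_trials))
    if bf10(k, n_trials) > 3:        # "substantial" evidence for H1
        false_alarms += 1

print(f"{false_alarms} of {n_experiments} null experiments gave BF10 > 3")
```

A small but nonzero fraction of these purely-chance experiments clears the evidential bar, which is exactly the sense in which Bayesian methods cannot prevent rogue first-time findings — only replication sorts them out.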
How about adopting a more stringent criterion for “better than chance”? In part, this is what Wagenmakers et al. advocate: “We argue that in order to convince a skeptical audience of a controversial claim, one needs to conduct strictly confirmatory studies and analyze the results with statistical tests that are conservative rather than liberal.” This is not an unreasonable position, but it comes at a price and, again, fails to guarantee that a rogue (heretical, dogma-defying and fallacious) chance finding won’t occasionally find its way into the scientific literature.
What’s the price? By adopting more stringent criteria for rejecting the null hypothesis, the rate of false-positive findings will indeed decline. However, ceteris paribus, the false-negative rate will increase: more conservative tests make it harder to detect real experimental effects. In other words, raising the bar may also prevent valid non-chance findings from being recognized as such. Given the bias against publication of non-significant results and (I would claim) against even trying to replicate negative findings, the end result could be an increased tendency to overlook genuine effects.
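A quick sketch of that price, again with purely hypothetical numbers: tighten the criterion from alpha = .05 to alpha = .001 on a 100-trial binomial task, and watch what happens to the power to detect a real but modest effect.

```python
# Hypothetical trade-off: stricter alpha vs. power to detect a real effect.
from scipy.stats import binom

n, p_true = 100, 0.55   # trials and a hypothetical true hit rate

results = {}
for alpha in (0.05, 0.001):
    # smallest hit count whose tail probability under chance is <= alpha
    k_crit = next(k for k in range(n + 1)
                  if binom.sf(k - 1, n, 0.5) <= alpha)
    results[alpha] = binom.sf(k_crit - 1, n, p_true)  # power at this alpha
    print(f"alpha = {alpha}: reject when hits >= {k_crit}, "
          f"power = {results[alpha]:.2f}")
```

The stricter criterion buys a fifty-fold reduction in tolerated false positives, but the already-modest power to detect the (hypothetical) genuine 55% hit rate collapses to a few percent — the false-negative side of the ledger.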
One last point needs to be considered about the “raising-the-bar” remedy. Various commentators have been using Carl Sagan’s aphorism that extraordinary claims require extraordinary evidence to imply that we should hold claims like the existence of psi to stronger evidentiary criteria than “normal” claims, whatever those might be. Carl isn’t here to explain what he meant by his aphorism, but this implication makes no sense to me.
If I present an experiment that I claim replicates Asch’s conformity study, no matter that it already has been replicated more than 100 times, my claim about my experiment should be subjected to the same level of scrutiny and held to the same scientific standards as any other experimental claim of that kind. And so should Bem’s experimental claims. Otherwise, we end up with a double standard: More lenient tests of propositions we think we already know, and more stringent tests of propositions contrary to what we think we know. That’s just confirmation bias institutionalized.
Instead, the evidence we need is just what we should demand of any new findings, namely independent replications of Bem’s experiments and unbiased publication of the results thereof. All nine of Bem’s experiments are first-time studies. Bem is quoted as saying he’s received “hundreds” of requests for materials and descriptions of his experimental methods (what did I say two weeks ago about “positive replication bias”?). The more sensible commentaries on this affair have pointed out that replication is our best guide to deciding whether Bem’s results should be taken seriously.
To ward off the threat of publication bias (again, see my “disappearing truths” post two weeks ago), Richard Wiseman has offered to set up a registry of all attempts at replication. It will then remain for the statistically knowledgeable to provide appropriate (Bayesian) statistical analyses and assessments of the body of evidence as it accumulates. Better still, a peak body such as the Royal Society or even the American Psychological Association might form a task force of suitably qualified, impartial experts to manage the registry and the meta-analysis.
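One way a registry’s accumulating evidence could be aggregated is by computing a Bayes factor on the pooled data. The sketch below is illustrative only: the replication results are invented, the prior on the hit rate is uniform, and pooling assumes one common hit rate across studies (a fixed-effect view) — none of this is Wiseman’s actual proposal.

```python
# Pooled Bayes factor over a hypothetical replication registry.
# With a binomial task and a uniform prior on the hit rate under H1,
# the Bayes factor for the pooled data has the same closed form as for
# a single study: BF10 = [1/(n+1)] / [C(n, k) * 0.5**n].
import math

def bf10(k, n):
    # H1 (hit rate unknown, uniform prior) vs H0 (hit rate = 0.5)
    return (1 / (n + 1)) / (math.comb(n, k) * 0.5 ** n)

# invented registry entries: (hits, trials) per replication attempt
registry = [(53, 100), (49, 100), (51, 100), (48, 100), (50, 100)]

k_total = sum(k for k, _ in registry)
n_total = sum(n for _, n in registry)

print(f"pooled: {k_total}/{n_total} hits, "
      f"BF10 = {bf10(k_total, n_total):.3f}")
```

With these made-up, individually unremarkable results, the pooled Bayes factor comes out well below 1 — substantial evidence for the null — illustrating how a registry lets the evidence speak cumulatively rather than study by study.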
In short, as commentators Trimble, Wiseman, Helfand and Kraus have it, all that is required to resolve the controversy raised by Bem’s paper is good scientific methods and further research conducted by researchers with integrity. This could be a good occasion on which to recall that the great sociologist Emile Durkheim, when asked what is the most important component of scientific method, is said to have answered “honesty.” I’d put “impartiality” right up there too.