The Stapel Case and Data Fabrication
By now it’s all over the net (e.g., here) and international news media: Tilburg University sacked high-profile social psychologist Diederik Stapel, after he was outed as having faked data in his research. Stapel was director of the Tilburg Institute for Behavioral Economics Research, a successful researcher and fundraiser, and as a colleague expressed it, “the poster boy of Dutch social psychology.” He had more than 100 papers published, some in the flagship journals not just of psychology but of science generally (e.g., Science), and won prestigious awards for his research on social cognition and stereotyping.
Tilburg University Rector Philip Eijlander said that Stapel had admitted to using faked data, apparently after Eijlander confronted him with allegations by graduate student research assistants that his research conduct was fraudulent. The story goes that the assistants had identified evidence of data entry by Stapel via copy-and-paste.
Willem Levelt, psycholinguist and former president of the Royal Netherlands Academy of Arts and Sciences, is to lead a panel investigating the extent of the fraud. That extent could be very wide indeed. In a press conference the Tilburg rector made it clear that Stapel’s fraudulent research practices may have spanned a number of years. All of his papers are now suspect, and the panel will advise on which will have to be retracted. The editors of the journals in which Stapel published are likewise investigating the papers that appeared there. Then there are Stapel’s own students and research collaborators, whose data and careers may have been contaminated by his.
I feel sorry for my social psychological colleagues, who are reeling in shock and dismay. Some of my closest colleagues knew Stapel (one was a fellow graduate student with him), and none of them suspected him. Among those who knew him well and worked with him, Stapel apparently was respected as a researcher and trusted as a man of integrity. They are asking themselves how his cheating could have gone undetected for so long, and how such deeds could be prevented in the future. They fear its impact on public perception of their discipline and trust in scientific researchers generally.
An understandable knee-jerk reaction is to call for stricter regulation of scientific research, and alterations to the training of researchers. Mark Van Vugt and Anjana Ahuja’s blog post exemplifies this reaction, when they essentially accuse social psychologists of being more likely to engage in fraudulent research because some of them use deception of subjects in their experiments:
“…this means that junior social psychologists are being trained to deceive people and this might be a first violation of scientific integrity. It would be good to have a frank discussion about deception in our discipline. It is not being tolerated elsewhere so why should it be in our field.”
They make several recommendations for reform, including the declaration that “… ultimately it is through training our psychology students into doing ethically sound research that we can tackle scientific fraud. This is no easy feat.”
The most obvious problems with Van Vugt and Ahuja’s recommendations are, first, that there is no clear connection between using deception in research designs and faking data, and second, that many psychology departments already include research ethics in researcher education and training. Stapel was not ignorant of research ethics. But a deeper problem is that none of their recommendations, and thus far very few of the comments I have seen about this or similar cases, address three of the main considerations in any criminal case: means, opportunity, and motive.
Let me speak to means and opportunity first. Attempts to more strictly regulate the conduct of scientific research are very unlikely to prevent data fakery, for the simple reason that it’s extremely easy to do in a manner that is extraordinarily difficult to detect. Many of us “fake data” on a regular basis when we run simulations. Indeed, simulating from the posterior distribution is part and parcel of Bayesian statistical inference. It would be (and probably has been) child’s play to add fake cases to one’s data by simulating from the posterior and then jittering them randomly to ensure that the false cases look like real data. Or, if you want to fake data from scratch, there is plenty of freely available code for randomly generating multivariate data with user-chosen probability distributions, means, standard deviations, and correlational structure. So, the means and opportunities are on hand for virtually all of us. They are the very same means that underpin a great deal of (honest) research. It is impossible to prevent data fraud by these means through conventional regulatory mechanisms.
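To make the point concrete, here is a minimal sketch of how little it takes to generate plausible-looking multivariate "data" with a chosen correlational structure. All of the numbers (means, standard deviations, the target correlation, the sample size) are invented for the illustration; the same few lines of NumPy serve honest simulation studies every day.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented targets: means and SDs of two "measures", plus the
# correlation a fraudster might want the "study" to show
means = np.array([5.0, 3.2])
sds = np.array([1.1, 0.8])
corr = np.array([[1.0, 0.45],
                 [0.45, 1.0]])

# Build the covariance matrix and draw 120 plausible-looking "subjects"
cov = np.outer(sds, sds) * corr
fake_data = rng.multivariate_normal(means, cov, size=120)
```

Nothing here is exotic; it is the same machinery used in legitimate Monte Carlo work, which is exactly why no regulatory mechanism can flag its misuse.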
Now let us turn to motive. The most obvious and comforting explanations of cheats like psychologists Stapel or Hauser, or plagiarists like statistician Wegman and political scientist Fischer, are those appealing to their personalities. This is the “X cheated because X is psychopathic” explanation. It’s comforting because it lets the rest of us off the hook (“I wouldn’t cheat because I’m not a psychopath”). Unfortunately this kind of explanation is very likely to be wrong. Most of us cheat on something somewhere along the line. Cheating is rife, for example, among undergraduate university students (among whom are our future researchers!), so psychopathy certainly cannot be the deciding factor there. What else could be the motivational culprit? How about the competitive pressures on researchers generated by the contemporary research culture?
Cognitive psychologist E.J. Wagenmakers (as quoted in Andrew Gelman’s thoughtful recent post) is among the few thus far who have addressed possible motivating factors inherent in the present-day research climate. He points out that social psychology has become very competitive, and
“high-impact publications are only possible for results that are really surprising. Unfortunately, most surprising hypotheses are wrong. That is, unless you test them against data you’ve created yourself. There is a slippery slope here though; although very few researchers will go as far as to make up their own data, many will “torture the data until they confess”, and forget to mention that the results were obtained by torture….”
I would add to E.J.’s observations the following points.
First, social psychology journals (and journals in other areas of psychology) exhibit a strong bias towards publishing only studies that have achieved a statistically significant result, and researchers and their students widely believe in that bias. The obvious temptation arising from it is to ease an inconclusive finding into being conclusive by adding more “favorable” cases or making some of the unfavorable ones more favorable.
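How cheaply can an inconclusive result be eased over the significance line? The following is a hypothetical sketch, not anyone's actual method: the group sizes, the small true effect, and the handful of fabricated "favorable" scores are all invented for the example, and the t-test comes from SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# An honest but underpowered study: a small true effect, 30 subjects per group
control = rng.normal(0.0, 1.0, size=30)
treatment = rng.normal(0.3, 1.0, size=30)
p_honest = stats.ttest_ind(treatment, control).pvalue

# The temptation: append a few fabricated high scores to the treatment group
fabricated = np.full(8, 1.5)        # invented "subjects" who show the effect
padded = np.concatenate([treatment, fabricated])
p_padded = stats.ttest_ind(padded, control).pvalue
```

Eight fabricated cases out of sixty-eight would be very hard to spot in a published table of means and t-statistics, which is the point: the temptation is large precisely because the deed is small.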
Second, and of course readers will recognize one of my hobby-horses here, the addiction in psychology to hypothesis-testing over parameter estimation amounts to an insistence that every study yield a conclusion or decision: Did the null hypothesis get rejected? The obvious remedy for this is to develop a publication climate that does not insist that each and every study be “conclusive,” but instead emphasizes the importance of a cumulative science built on multiple independent studies, careful parameter estimates and multiple tests of candidate theories. This adds an ethical and motivational rationale to calls for a shift to Bayesian methods in psychology.
Third, journal editors and reviewers routinely insist on more than one study per article. On the surface, this looks like what I’ve just asked for: a healthy insistence on independent replication. It isn’t, for two reasons. First, even if the multiple studies are replications, they aren’t independent, because they come from the same authors and/or lab. Genuinely independent replications would be published in separate papers written by non-overlapping sets of authors from separate labs. However, genuinely independent replication earns no kudos and therefore is uncommon (not just in psychology, either; other sciences suffer from this problem, including some that used to have a tradition of independent replication).
The second reason is that journal editors don’t merely insist on multiple studies; they also favor studies that come up with “consistent” rather than “inconsistent” findings (i.e., privileging “successful” replications over “failed” replications). By insisting on multiple studies that reproduce the original findings, journal editors are tempting researchers into corner-cutting or outright fraud in the name of ensuring that the first study’s findings actually get replicated. E.J.’s observation that surprising hypotheses are unlikely to be supported by data goes double (squared, actually) when it comes to replication: support for a surprising hypothesis may occur once in a while, but it is unlikely to occur twice in a row. Again, the remedies are obvious: develop a publication climate that encourages or even insists on independent replication, that treats well-conducted “failed” replications identically to well-conducted “successful” ones, and that does not privilege “replications” from the same authors or lab as the original study.
None of this is meant to say that I fall for cultural determinism: most researchers face the pressures and motivations described above, but few cheat. So personality factors may also exert an influence, along with circumstances specific to those who give in to the temptation to cheat. Nevertheless, if we want to prevent more Stapels, we’ll get farther by changing the research culture and its motivational effects than we will by exhorting researchers to be good or lecturing them about ethical principles of which they’re already well aware. And we’ll get much farther than we would in a futile attempt to place the collection and entry of every single datum under surveillance by some Stasi-for-scientists.