## I Can’t Believe What I Teach

For the past 34 years I’ve been compelled to teach a framework that I’ve long known is flawed. A better framework exists and has been available for some time. Moreover, I haven’t been forced to do this by any tyrannical regime, nor under threat of great harm if I were to teach the alternative instead. And it gets worse: I’m not the only one. Thousands of other university instructors have been doing the same all over the world.

I teach statistical methods in a psychology department. I’ve taught courses ranging from introductory undergraduate through graduate levels, and I’m in charge of that part of my department’s curriculum. So what’s the problem? Why haven’t I abandoned the flawed framework for its superior alternative?

Without getting into technicalities, let’s call the flawed framework the “Neyman-Pearson” approach and the alternative the “Bayes” approach. My statistical background was formed as I completed an undergraduate degree in mathematics during 1968–72. My first courses in probability and statistics were Neyman-Pearson, and I picked up the rudiments of Bayes toward the end of my degree. At the time I thought these were simply two valid alternative ways of understanding probability.

Several years later I was a newly minted university lecturer teaching introductory statistics to fearful and sometimes reluctant students in the social sciences. The statistical methods used in social science research were Neyman-Pearson, so of course I taught Neyman-Pearson. Students, after all, need to learn to read the literature of their discipline.

Gradually, and through some of my research into uncertainty, I became aware of the severe problems besetting the Neyman-Pearson framework. I found that there was a lengthy history of devastating criticisms raised against Neyman-Pearson even within the social sciences, criticisms that had been ignored by practising researchers and gatekeepers to research publication.

However, while the Bayesian approach may have been conceptually superior, from the late ’70s through the early ’80s it suffered from mathematical and computational impracticalities. It provided few usable methods for dealing with complex problems. Disciplines such as psychology were held in thrall to Neyman-Pearson by a combination of convention and the practical requirements of complex research designs. If I wanted to provide students (or, for that matter, colleagues who came to me for advice) with effective statistical tools for serious research, then usually Neyman-Pearson techniques were all I could offer.

But what to do about teaching? No university instructor takes a formal oath to teach the truth, the whole truth, and nothing but the truth; but for those of us who’ve *been called to teach*, it feels as though we do. I was sailing perilously close to committing Moore’s Paradox in the classroom (“I assert Neyman-Pearson, but I don’t believe it”).

I tried slipping in bits and pieces alerting students to problematic aspects of Neyman-Pearson and to the existence of the Bayesian alternative. These efforts may have assuaged my conscience, but they did not have much impact, with one important exception: the more intellectually proactive students did seem to catch on to the idea that theories of probability and statistics are just that, theories, not god-given commandments.

Then Bayes got a shot in the arm. In the mid-’80s some powerful computational techniques were adapted and developed that enabled this framework to fight in the same weight class as Neyman-Pearson and even to beat it. These techniques sail under the banner of Markov chain Monte Carlo methods, and by the mid-’90s software to implement them was available (free!). The stage was set for the Bayesian revolution. I began to dream of writing a Bayesian introductory statistics textbook for psychology students that would set the discipline free and launch the next generation of researchers.

It didn’t happen that way. Psychology was still deeply mired in Neyman-Pearson and, in fact, in a particularly restrictive version of it. I’ll spare you the details, other than to say that it focused, for instance, on whether the researcher could reject the claim that an experimental effect was nonexistent. I couldn’t interest my colleagues in learning Bayesian techniques, let alone undergraduate students.

By the late ’90s a critical mass of authoritative researchers had convinced the American Psychological Association to form a task force to reform statistical practice, but this reform really amounted to shifting from the restrictive Neyman-Pearson orientation to a more liberal one that embraced estimating how big an experimental effect is and setting a “confidence interval” around it.

It wasn’t the Bayesian revolution, but I leapt onto this initiative because both reforms (estimating effect sizes and setting confidence intervals around them) were a long stride closer to the Bayesian framework and would still enable students to read the older, Neyman-Pearson-dominated research literature. So I didn’t write a Bayesian textbook after all. My 2000 introductory textbook was, so far as I’m aware, one of the first to teach introductory statistics to psychology students from a confidence interval viewpoint. It was generally well received by fellow reformers, and I got a contract to write a kind of researcher’s handbook on confidence intervals that came out in 2003. The confidence interval reform in psychology was under way, and I’d booked a seat on the juggernaut.

Market-wise, my textbook flopped. I’m not singing the blues about this, nor am I crying sour grapes. For whatever reasons, my book just didn’t take the market by storm. Shortly after it came out, a colleague mentioned to me that he’d been at a UK conference with a symposium on statistics teaching, where one of the speakers proclaimed my book the “best in the world” for explaining confidence intervals and statistical power. But when my colleague asked whether the speaker was using it in the classroom, he replied that he was writing his own. And so better-selling introductory textbooks continued to appear. A few of them referred to the statistical reforms supposedly happening in psychology, but the majority did not. Most of them are the *n*th edition of a well-established book that has long sold well to its base of long-serving instructors and their students.

My 2003 handbook fared rather better. I had put some software resources for computing confidence intervals on a webpage and these got a lot of use. These, and my handbook, got picked up by researchers and their graduate students. Several years on, the stuff my scripts did started to appear in mainstream commercial statistics packages. It seemed that this reform was occurring mainly at the advanced undergraduate, graduate and researcher levels. Introductory undergraduate statistical education in psychology remained (and still remains) largely untouched by it.

Meanwhile, what of the Bayesian movement? In this decade, graduate-level Bayesian textbooks oriented towards the social sciences began to appear. I recently reviewed several of them and have just sent off an invited review of another. In my earlier review I concluded that the market still lacked an accessible graduate-level treatment oriented towards psychology, a gap that may have been filled by the book I’ve just finished reviewing.

Have I tried teaching Bayesian methods? Yes, but thus far only in graduate-level workshops, and on my own time (i.e., not as part of the official curriculum). I’ll be doing so again in the second half of this year, hoping to recruit some of my colleagues as well as graduate students. Next year I’ll probably introduce a module on Bayes for our 4th-year (Honours) students.

It’s early days, however, and we remain far from being able to revamp the entire curriculum. Bayesian techniques still rarely appear in the mainstream research literature in psychology, so students still need to learn Neyman-Pearson in order to read that literature with a knowledgeably critical eye. A sea change may be happening, but it’s going to take years (possibly even decades).

Will I try writing a Bayesian textbook? I already know from experience that writing a textbook takes a lot of time and hard work, often for little reward. Moreover, in many universities (including mine) writing a textbook counts for nothing. It doesn’t bring in research money, it usually doesn’t enhance the university’s (or the author’s) scholarly reputation, it isn’t one of the university’s “performance indicators,” and it seldom brings much income to the author. Universities typically treat textbooks as if the stork brings them. Writing a textbook, therefore, has to be motivated mainly by a passion for teaching. So I’m thinking about it…

Written by michaelsmithson

May 2, 2011 at 8:32 am

Posted in Uncategorized

Tagged with Bayesian inference, Confidence interval, Jerzy Neyman, Markov chain Monte Carlo, Philosophy, Probability, Psychology, Social sciences, Statistics, Teaching, Thomas Bayes, Uncertainty

### 2 Responses


Having been a student in your introductory course, I’d say that the biggest flaw of your book compared with other books (which, admittedly, came out later) was that it was difficult to quickly look up what you were looking for; instead you had to read the whole page, or five pages, discussing the topic.

The second was that it lacked worked examples, so you didn’t really know what the formulas looked like in the real world.

A broader criticism of most statistics textbooks is that they find it very difficult to keep a consistent vocabulary while also covering all the important terms. One of my strategies for statistics courses is to find two or three different explanations of a concept so I can understand which parts are relevant, which parts aren’t, and which parts are equivalent.

NiroZ, May 2, 2011 at 9:12 pm

Yours are fair comments. People have different learning styles, and it’s probably the case that no single textbook is going to suit all of them. I opted to incorporate most of my worked examples in the form of software that could generate infinitely many practice problems and examples. This worked really well for many students, but not all.

And your point about inconsistent vocabulary is also fair. I’d add “inconsistent notation” as well, in reference to some textbooks. Part of the problem there is that various disciplines have contributed to the statistical literature without being aware of what the others were doing. As a result, a particular concept, technique, or statistic can end up being called by different names depending on which textbook one is reading.

michaelsmithson, May 3, 2011 at 8:33 am