ignorance and uncertainty

All about unknowns and uncertainties

My MOOC on Ignorance!

leave a comment »

After a year of preparation and hard slog, together with my colleague Gabriele Bammer, I’ve prepared a MOOC (Massive Open Online Course) on Ignorance. The MOOC is based on my work on this topic, and those of you who have followed my blog will find extensions and elaborations of the material covered in my posts. Gabriele’s contributions focus on the roles of ignorance in complex problems.

The course presents a comprehensive framework for understanding, coping with, and making decisions in the face of ignorance. Course participants will learn that ignorance is not always negative, but has uses and benefits in domains ranging from everyday life to the farthest reaches of science, where ignorance is simultaneously destroyed and new ignorance created. They will discover the roles ignorance plays in human relationships, culture, and institutions, and how it underpins important kinds of social capital.

In addition to video lectures, discussion topics, glossaries, and readings, I’ve provided some HTML5 games that give hands-on experience in making decisions in the face of various kinds of unknowns. The video lectures have captions in Chinese as well as English, and transcripts of the lectures are available in three languages: English, Simplified Chinese, and Traditional Chinese. There also are wiki glossaries for each week in both English and Chinese.

Putting all this together was quite a learning experience for me, and also a dream come true. I’d come to realize that an interdisciplinary course on ignorance was never going to be a core component of any single discipline’s curriculum, because it doesn’t belong to any one discipline; it sprawls across many. So a MOOC seems an ideal vehicle for this topic: it’s free and open to everyone. I’ve designed the course to be short enough that other instructors can include it as a component in their own courses and fill the rest with material tailored to their own discipline or profession.

The course will be provided in two five-week blocks. The first one begins on June 23, 2015 and the second on September 22, 2015. There are a lot of MOOCs out there, but there is no other course like this in the world. And ignorance is everyone’s business.  You can watch the promo video here or you can go straight to the registration page here.

Written by michaelsmithson

May 24, 2015 at 12:04 am

Posted in Uncategorized

Impact Factors and the Mismeasure of Quality in Research

with 2 comments

Recently Henry Roediger III produced an article in the APA Observer criticizing simplistic uses of Impact Factors (IFs) as measures of research quality. A journal’s IF in a given year is defined as the number of times papers published in the preceding two years have been cited during the given year, divided by the number of papers published in the preceding two years. Roediger’s main points are that IFs involve some basic abuses of statistical and number craft, and they don’t measure what they are being used to measure. He points out that IFs are reported to three decimal places, which usually is spuriously precise. IFs also are means, and yet the citations of papers in most journals are very strongly skewed, with many papers not being cited at all and a few being cited many times. In many such journals the mode and median number of citations per paper both are 0. Finally, he points out that other indicators of impact are ignored, such as whether research publications are used in textbooks or applied in real-world settings. Similar criticisms have been raised in the San Francisco Declaration on Research Assessment, plus the points that IFs can be gamed by editorial policy and that IFs are strongly field-specific.
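To see the skewness point concretely, here are a few lines of Python with made-up citation counts (not data from any real journal) that compute a two-year impact factor as a simple mean and compare it with the median and mode:

```python
from statistics import mean, median, mode

# Made-up citation counts for the 20 papers a hypothetical journal
# published in the preceding two years.
citations = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 6, 15, 45]

# The impact factor is just the mean: citations received this year by those
# papers, divided by the number of papers published in the two preceding years.
impact_factor = mean(citations)

print(f"Impact factor (mean): {impact_factor:.3f}")        # 4.000, to a spuriously precise 3 d.p.
print(f"Median citations per paper: {median(citations)}")  # 0.5
print(f"Modal citations per paper: {mode(citations)}")     # 0
# The mean is dragged up by a couple of highly cited papers, while the
# typical paper is cited once or not at all.
```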

I agree with these points but I don’t think Roediger’s critique goes far enough. The chief problem with IFs is their use in evaluating individual researchers, research departments, and universities. It should be blindingly obvious that the IF of a journal has almost nothing to do with the number of citations of your or my publications in that journal. The number of citations may be weakly driven by the journal’s reputation and focus, but it also strongly depends on how long your paper has been out there. Citation rates may be a bit more strongly connected with journal IFs than citation totals, but there still are many other factors influencing citation rates. These observations may be obvious, but they seem entirely lost on those who would like to rule academics. Instead, all too often we researchers find ourselves in situations where our fates are determined by an inappropriate metric, mindlessly applied by people who know nothing about our research and who lack even a modicum of numeracy. The use of IFs for judging the quality of individual researchers’ output should be junked.

Number of citations, the h-index, or the i10-index might seem reasonable measures, but there are difficulties with these alternatives too. Young academics can be the victims of the slow accrual of citations. My own case is a fairly extreme illustration. My citations currently number more than 2600. I got my PhD in 1976, so I am about 37 years out of my PhD. Nearly half of those citations (1290) have occurred in the past 5 1/2 years (since 2008). That’s right: roughly 31 years versus less than 6 years for the same number of citations. In 2012 alone I had approximately 250 citations. This isn’t because I didn’t produce anything of impact until late: two of my most cited works were published in 1987 and 1989 and have gained about half their citations since 2005 because, frankly, they were ahead of their time. There also seems to be confusion between numbers of citations and citation rates. The h and i10 indexes are based on numbers of citations, as is the list of your publications that Google Scholar provides. But the graph Google Scholar presents of citations to your works by year conveys information about citation rates. You get a substantially different view of a publication’s impact if you measure it by citations per year since publication than if you do so by total number of citations. For instance, two of my works have nearly identical total numbers of citations (197 and 192), but the first has 7.6 citations/year whereas the second has 19.9 citations/year. Finally, like IFs, numbers of citations and citation rates depend on field-specific characteristics such as the number of people working in your area and achievable publication rates.
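To make the totals-versus-rates distinction concrete, here is a tiny sketch using hypothetical publications (the years and counts are invented, not my actual works):

```python
def citations_per_year(total_citations: int, year_published: int,
                       current_year: int = 2013) -> float:
    """Citation rate: total citations divided by years since publication."""
    return total_citations / (current_year - year_published)

# Two hypothetical works with nearly identical citation totals but very different ages.
print(citations_per_year(200, 1988))  # 8.0 citations/year
print(citations_per_year(195, 2003))  # 19.5 citations/year
```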

Another thing while I’m on the soap-box: Books should count. My books account for slightly more than half of my citations, and occupy ranks 1, 2, 3, 4, 7, 10, 13, and 14 out of the 75 of my publications that have received any citations. My most widely cited work by a long chalk is a book: published in 1989, it now has more than 530 citations, approximately 22 per year, about half of them gained since 2005. However, in Australian science departments, books don’t count. Of course, this isn’t limited to the Australian scene. ISI ignores books and book chapters, and some journals forbid authors to include books or book chapters in their reference lists. This is a purely socially constructed, make-believe version of “quality” that sanctions blindness to literature that has real, measurable, and very considerable impact. I have colleagues who seriously consider rewriting highly-cited book chapters and submitting them to journals so that they’ll count. This is sheer make-work of the most wasteful kind.

Finally, IFs and citation numbers or rates have no logical connection or demonstrated correlation with the quality of publications. Very bad works can garner high citation rates because numerous authors attack them. Likewise, useful but pedestrian papers can have high citation rates. Conversely, genuinely pioneering works may not be widely cited for a long time. There simply is no substitute for experts making careful assessments of the quality of research publications in their domains of expertise. Numerical indices can help, but they cannot supplant expert judgment. And yet, I claim that this is precisely the attraction that bureaucrats find in single-yardstick numerical measures. They don’t have to know anything about research areas or even basic number-craft to be able to rank-order researchers, departments, and/or entire universities by applying such a yardstick in a perfectly mindless manner. It’s a recipe for us to be ruled and controlled by folk who are massively ignorant and, worse still, meta-ignorant.

Written by michaelsmithson

October 17, 2013 at 3:35 am

Posted in Uncategorized

Over-diagnosis and “investigation momentum”

leave a comment »

One of my earlier posts, “Making the Wrong Decisions for the Right Reasons”, focused on conditions under which it is futile to pursue greater certainty in the name of better decisions. In this post, I’ll investigate settings in which a highly uncertain outcome should motivate us more strongly than no outcome at all to seek greater certainty. The primary stimulus for this post is a recent letter to JAMA Internal Medicine (Sah, Elias, & Ariely, 2013), entitled “Investigation momentum: the relentless pursuit to resolve uncertainty.” The authors present what they refer to as “another potential downside” to unreliable tests such as prostate-specific antigen (PSA) screening, namely the effect of receiving a “don’t know” result instead of a “positive” or “negative” diagnosis. Their chief claim is that the inconclusive result increases psychological uncertainty which motivates people to seek additional diagnostic testing, whereas untested people would not be motivated to get diagnostic tests. They term this motivation “investigation momentum”, and the obvious metaphor here is that once the juggernaut of testing gets going, it obeys a kind of psychological Newton’s First Law.

The authors’ evidence is an online survey of 727 men aged between 40 and 75 years, with a focus on prostate cancer and PSA screening (e.g., participants were asked to rate their likelihood of developing prostate cancer). The participants were randomly assigned to one of four conditions. In the “no PSA” condition, participants were given risk information about prostate biopsies. In the other conditions, participants were given information about PSA tests and prostate biopsies. They then were asked to imagine they had just received their PSA test result, which was either “normal”, “elevated”, or “inconclusive”. In the “inconclusive” condition participants were told “This result provides no information about whether or not you have cancer.” After receiving the information and (in three of the conditions) the scenario, participants were asked to indicate whether, considering what they had just been given, they would undergo a biopsy, and to rate their level of certainty in that decision.

The study results revealed that, as would be expected, the men whose test result was “elevated” were more likely to say they would get a biopsy than the men in any of the other conditions (61.5% versus 12.7% for those whose result was “normal”, 39.5% for those whose result was “inconclusive”, and 24.5% for those with no PSA). Likewise, the men whose hypothetical result was “normal” were least likely to opt for a biopsy. However, a significantly greater percentage of those whose test was “inconclusive” opted for a biopsy than of those who had no test at all. This latter finding concerned the authors because, as they said, when “tests give no diagnostic information, rationally, from an information perspective, it should be equivalent to never having had the test for the purpose of future decision making.”

Really?

This claim amounts to stating that the patient’s subjective probability of having cancer should remain unchanged after the inconclusive test result. Whether that (rationally) should be the case depends on three things: The patient’s prior subjective probability of having cancer, the patient’s attribution of a probability of cancer to the ambiguous test result, and the relative weights assigned by the patient to the prior and the test result. There are two conditions under which the patient’s prior subjective probability should remain unchanged: (a) The prior subjective probability is identical to the probability attributed to the test, or (b) The test is given a weight of 0. Option (b) seems implausible, so option (a) is the limiting case. Now, it should be clear that if P(cancer|ambiguous test) > P(cancer|prior belief) then P(cancer|prior belief and ambiguous test) > P(cancer|prior belief). Therefore, it could be rational for people to be more inclined to get further tests after an initial test returns an ambiguous result than if they have not yet had any tests.
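A minimal sketch of that updating logic, treating the posterior as a weighted average of the prior probability and the probability imputed to the ambiguous test (illustrative numbers only; the formal Beta version is in the Technical Bits section below):

```python
def posterior_prob(prior: float, prior_weight: float,
                   test_prob: float, test_weight: float) -> float:
    """Posterior probability as a weighted average of the prior probability
    and the probability imputed to the (ambiguous) test result."""
    return (prior_weight * prior + test_weight * test_prob) / (prior_weight + test_weight)

# Illustrative numbers only: a prior of .30, an ambiguous result read as .50,
# and the test given the same weight as the prior.
print(posterior_prob(prior=0.30, prior_weight=1.0, test_prob=0.50, test_weight=1.0))
# 0.4 -- higher than the prior, so further testing can rationally become more attractive.
```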

Let us take one more step. It is plausible that for many people an ambiguous test result would cause them to impute P(cancer|ambiguous test) to be somewhere in the neighbourhood of 1/2. So, for the sake of argument, let’s set P(cancer|ambiguous test) = 1/2. It also is plausible that most people will have an intuitive probability threshold, Pt, beyond which they will be inclined to seek testing. For something as consequential as cancer, we may suppose that for this threshold, Pt < 1/2. Indeed, the authors’ data suggest exactly this. In the no-PSA condition, 16.2% of the men rated their chance of getting prostate cancer above 1/2, but 24.5% of them said they would get a biopsy. Therefore, assuming that all of the 16.2% are included in those who opt for a biopsy, that leaves 8.3% of them in the below-1/2 part of the sample who also opt for a biopsy. An interpolation (see the Technical Bits section below) yields Pt = .38 (group average, of course).

The finding that vexed Sah et al. is that 39.5% of those whose result was “inconclusive” opted for a biopsy, compared to 24.5% in the no-PSA condition. To begin, let’s assume that the “inconclusive” sample’s Pt also is .38 (after all, the authors find no significant differences among the four samples’ prior probabilities of getting prostate cancer). In the “inconclusive” sample 10.8% rated their chances of getting prostate cancer above 1/2 and 18.9% rated it between .26 and .5. So, our estimate of the percentage whose prior probability is .38 or more is 10.8% + 18.9%/2 = 20.3%. This is the percentage of people in the “inconclusive” sample who would have gone for a biopsy even with no test, given Pt = .38. That leaves 19.2% to account for in the boost from 20.3% to 39.5% after receiving the inconclusive test result. Now, at most 9.5% of these can come from the .26-.50 range, because that is all of this sample that remains in that range (the other half of the 18.9% has already been counted), so the remaining 9.7% must fall in the 0-.25 range. There are 70.3% altogether in that range, so a linear interpolation gives us a lowest probability for the 9.7% we need to account for of .25 × (1 – 9.7/70.3) = .2155.

Now we need to compute the maximum relative weight of the test required to raise a subjective probability of .2155 to the threshold .38. From our formula in the Technical Bits section, we have

w/(α + β) = (Pt – Pp)/(1/2 – Pt) = (.38 – .2155)/(.50 – .38) ≈ 1.37,

where w is the weight given to the test, α + β is the precision of the prior, and Pp is the prior mean probability (all defined in the Technical Bits section below).
That is, the test would have to be given at most about 1.37 times the weight that each person gives to their own prior subjective probability of prostate cancer. Weighting the test about 1.37 times as heavily as one’s prior probability hardly seems outlandish. Therefore, a plausible case can be made that the tendency for more people to opt for further testing after an inconclusive test result might not be due to psychological “momentum” at all, but instead the product of rational thought. I’m not claiming that the 727 men in the study actually are doing Bayesian calculations; I’m just pointing out that the authors’ findings can be just as readily explained by attributing Bayesian rationality to them as by inferring that they are in the thrall of some kind of psychological “momentum”. The key to understanding all this intuitively is the distinction (introduced by Keynes in 1921) between the weight and the strength (or extremity) of evidence. An inconclusive test is not extreme: it favours neither the hypothesis that one has the disease nor the hypothesis that one is clear of it. Nevertheless, it still adds its own weight.

Technical Bits

The Pt = .38 result is obtained by a simple linear interpolation. The 8.3% of the no-PSA sample left to account for fall in the next range below 1/2, which in the authors’ table runs from .26 to .50. All up, 16% of this sample are in that range, so assuming that the 8.3% are the top-raters in that bunch, our estimate of their lowest probability is .50 – 8.3(.50 – .26)/16 = .38.

The relative weight of the test is determined by assuming that the participant is reasoning like a Bayesian agent. Assuming that P(cancer|ambiguous test) = 1/2, we may determine when these conditions could occur for a Bayesian agent whose subjective probability, P(cancer|prior belief), has a Beta(α, β) distribution and therefore a mean of Pp = α/(α + β). To begin, the first condition would be satisfied if α/(α + β) > Pt. Now denoting the weight of the test by w, the posterior distribution of P(cancer|prior belief and ambiguous test) would be Beta(α + w/2, β + w/2). So, the second condition would be satisfied if (α + w/2)/(α + β + w) > Pt. Solving for the weight, we get

w > (α + β)(Pt – Pp)/(1/2 – Pt).
This makes intuitive sense in two respects. First, the numerator is positive only if Pp < Pt, i.e., if the first condition hasn’t already been satisfied. The further Pp is below Pt, the greater the weight needs to be in order for the above inequality to be satisfied. Second, the denominator tells us that the further Pt is below 1/2, the less weight needs to be given to the test for that inequality to be satisfied.

The “precision” of a Beta(α, β) distribution is α + β. The greater this sum, the tighter the distribution is around its mean. So the precision can be used as a proxy for the weight of evidence an agent attaches to their prior belief. The test adds to the precision because it is additional evidence. So now we can compare the weight of the test with the precision of the prior: given the minimal weight required by the previous inequality, we get

w/(α + β) > (Pt – Pp)/(1/2 – Pt),
where Pp is the agent’s mean prior probability of getting cancer.
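For readers who want to check the arithmetic, here is a short Python sketch that reproduces the interpolations and the relative-weight calculation from the percentages reported above (small differences in the last decimal place are just rounding):

```python
# Threshold Pt in the no-PSA condition: 24.5% opt for a biopsy but only 16.2%
# rate their chance above .5, so 8.3% must come from the .26-.50 band, which
# holds 16% of that sample. Linear interpolation within the band:
Pt = 0.50 - 0.083 * (0.50 - 0.26) / 0.16
print(Pt)                              # ~0.3755, i.e. about .38

# Inconclusive condition: 10.8% above .5 plus half of the 18.9% in the
# .26-.50 band would have opted for a biopsy even without a test.
no_test_share = 0.108 + 0.189 / 2      # ~0.203
boost = 0.395 - no_test_share          # ~0.192 extra after the inconclusive result
low_band_share = boost - 0.189 / 2     # ~0.098; this remainder must sit in the 0-.25 band
lowest_prior = 0.25 * (1 - low_band_share / 0.703)
print(lowest_prior)                    # ~0.215

# Maximum relative weight of the test needed to push that prior over Pt
# (using the rounded Pt = .38, as in the text above).
relative_weight = (0.38 - lowest_prior) / (0.5 - 0.38)
print(relative_weight)                 # ~1.37
```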

Written by michaelsmithson

August 26, 2013 at 2:41 am

Posted in Uncategorized

Digital “poster” and Integration and Implementation Sciences Conference

leave a comment »

I’ll indulge in a bit more shameless advertising here, but it’s directly relevant to this blog. First, I’m one among several plenary speakers at what’s billed as the First Global Conference on Research Integration and Implementation (8-11 September). As the entry webpage says, the main goal is bringing together everyone whose research interests include:

  • understanding problems as systems,
  • combining knowledge from various disciplines and practice areas,
  • dealing with unknowns to reduce risk, unpleasant surprises and adverse unintended consequences,
  • helping research teams collaborate more effectively, and
  • implementing evidence in improved policy and practice.

Although the physical conference takes place in Canberra, Australia, we have an extensive online conference setup and co-conferences taking place in Germany, The Netherlands, and Uruguay. Most of the conference will be broadcast live over the net.

In addition to the information about the program, participants, and events, you can get a fair idea of what the conference is about by browsing the digital posters. So this is where the shameless advert comes in. My digital poster is essentially a couple of slides of links describing my thinking about ignorance, uncertainty and the unknown over the past 25 years or so. The links go to my publications, posts on this blog, and works by others on this topic. You can get to my poster by scrolling down this list and clicking the relevant link.

Written by michaelsmithson

August 12, 2013 at 4:18 am

Posted in Uncategorized

A Few (More) Myths about “Big Data”

leave a comment »

Following on from Kate Crawford’s recent and excellent elaboration of six myths about “big data”, I should like to add four more that highlight important issues about such data that can misguide us if we ignore them or are ignorant of them.

Myth 7: Big data are precise.

As with analyses of almost any other kind of data, big data analyses largely consist of estimates. Often these estimates are based on sample data rather than population data, and the samples may not be representative of their referent populations (as Crawford points out, but also see Myth 8). Moreover, big data are even less likely than “ordinary” data to be free of recording errors or deliberate falsification.

Even when the samples are good and the sample data are accurately recorded, estimates still are merely estimates, and the most common mistake decision makers and other stakeholders make about estimates is treating them as if they are precise or exact. In a 1990 paper I referred to this as the fallacy of false precision. Estimates always are imprecise, and ignoring how imprecise they are is equivalent to ignoring how wrong they could be. Major polling companies gradually learned to report confidence intervals or error-rates along with their estimates and to take these seriously, but most government departments apparently have yet to grasp this obvious truth.
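As a minimal illustration of the kind of reporting the polling companies adopted (made-up figures, simple-random-sample assumptions):

```python
import math

# A made-up poll: 1,000 respondents, 52% favouring some proposition.
n, p_hat = 1000, 0.52

# Approximate 95% margin of error for a simple random sample.
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

print(f"Estimate: {p_hat:.0%} +/- {margin:.1%}")  # 52% +/- 3.1%
# Reporting "52.000%" with no interval would be false precision.
```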

Why might estimate error be a greater problem for big data than for “ordinary” data? There are at least two reasons. First, it is likely to be more difficult to verify the integrity or veracity of big data simply because they are integrated from numerous sources. Second, if big datasets are constructed from multiple sources, each consisting of an estimate with its own imprecision, then these imprecisions may propagate. To give a brief illustration, if estimate X has variance σx², estimate Y has variance σy², X and Y are independent of one another, and our “big” dataset consists of adding X + Y to get Z, then the variance of Z will be σx² + σy².
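A toy sketch of that propagation, with assumed variances:

```python
import math

# Hypothetical independent estimates X and Y and their variances.
var_x, var_y = 4.0, 9.0    # i.e., standard errors of 2.0 and 3.0

# For Z = X + Y with X and Y independent, the variances add.
var_z = var_x + var_y
print(math.sqrt(var_z))    # ~3.61: the sum is less precise than either input
```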

Myth 8: Big data are accurate.

There are two senses in which big data may be inaccurate, in addition to random variability (i.e., sampling error): biases and measurement confounds. Economic indicators of such things as unemployment rates, inflation, or GDP are biased in most countries. The bias stems from “shadow” (off-the-books) economic activity. There is little evidence that economic policy makers pay any attention to such distortions when using economic indicators to inform policies.

Measurement confounds are a somewhat more subtle issue, but the main idea is that data may not measure what we think they are measuring because they are influenced by extraneous factors. Economic indicators are, again, good examples, but there are plenty of others (don’t get me started on the idiotic bibliometrics and other KPIs that are imposed on us academics in the name of “performance” assessment). Web analytics experts are just beginning to face up to this problem. For instance, webpage dwell times are not just influenced by how interested the visitor is in the content of a webpage, but may also reflect such things as how difficult the contents are to understand, the visitor’s attention span, or the fact that they left their browsing device to do something else and then returned much later. As in Myth 7, bias and measurement confounds may be compounded in big data to a greater extent than they are in small data, simply because big data often combine multiple measures.

Myth 9: Big data are stable.

Data often are not recorded just once, but re-recorded as better information becomes available or as errors are discovered. In a recent Wall Street Journal article, economist Samuel Rines presented several illustrations of how unstable economic indicator estimates are in the U.S. For example, he observed that in November 2012 the first official estimate of net employment increase was 146,000 new jobs. By the third revision that number had increased by 68% to 247,000. In another instance, he pointed out that American GDP annual estimates each year typically are revised several times, and often substantially, as the year slides into the past.

Again, there is little evidence that people crafting policy or making decisions based on these numbers take their inherent instability into account. One may protest that often decisions must be made before “final” revisions can be completed. However, where such revisions in the past have been recorded, the degree of instability in these indicators should not be difficult to estimate. These could be taken into account, at the very least, in worst- and best-case scenario generation.

Myth 10: We have adequate computing power and techniques to analyse big data.

Analysing big data is a computationally intense undertaking, and at least some worthwhile analytical goals are beyond our reach, in terms of computing power and even, in some cases, techniques. I’ll give just one example. Suppose we want to model the total dwell time per session of a typical user who is browsing the web. The number of items on which the user dwells is a random variable, and so is the amount of dwell time for each item. The total dwell time, then, is what is called a “randomly stopped sum”. The expression for the probability distribution of a randomly stopped sum doesn’t have a closed form (it’s an infinite sum), so it can’t be modelled via conventional statistical estimation techniques (least-squares or maximum likelihood). Instead, there are two viable approaches: Simulation and Bayesian hierarchical MCMC. I’m writing a paper on this topic, and from my own experience I can declare that either technique would require a super-computer for datasets of the kind dealt with, e.g., by NRS PADD.
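To give a flavour of the simulation route (the only one of the two that fits in a few lines), here is a toy sketch; the Poisson and exponential choices and the parameter values are assumptions for illustration, not a claim about real browsing data:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_session(mean_items: float = 8.0, mean_dwell: float = 40.0) -> float:
    """Total dwell time for one session as a randomly stopped sum: a Poisson
    number of items viewed, each with an exponentially distributed dwell time."""
    n_items = rng.poisson(mean_items)
    return float(rng.exponential(mean_dwell, size=n_items).sum())

# Approximate the distribution of session totals by brute-force simulation.
totals = np.array([simulate_session() for _ in range(100_000)])
print(round(totals.mean()), round(np.percentile(totals, 95)))
```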

Written by michaelsmithson

July 26, 2013 at 6:57 am

A Book Ate My Blog

leave a comment »

Sad, but true: I’ve been off blogging since late 2011 because I was writing a book (with Ed Merkle) under contract with Chapman and Hall. Writing the book took up the time I could devote to writing blog posts. Anyhow, I’m back in the blogosphere, and for those unusual people out there who have interests in multivariate statistical modelling, you can find out about the book here or here.

Written by michaelsmithson

July 26, 2013 at 6:30 am

Statistical Significance On Trial

with 2 comments

There is a long-running love-hate relationship between the legal and statistical professions, and two vivid examples of this have surfaced in recent news stories, one situated in a court of appeal in London and the other in the U.S. Supreme Court. Briefly, the London judge ruled that Bayes’ theorem must not be used in evidence unless the underlying statistics are “firm”, while the U.S. Supreme Court unanimously ruled that a drug company’s non-disclosure of adverse side-effects cannot be justified by an appeal to the statistical non-significance of those effects. Each case, in its own way, shows why it is high time to establish an effective rapprochement between these two professions.

The Supreme Court decision has been applauded by statisticians, whereas the London decision has appalled statisticians of similar stripe. Both decisions require some unpacking to understand why statisticians would cheer one and boo the other, and why these are important decisions not only for both the statistical and legal professions but for other domains and disciplines whose practices hinge on legal and statistical codes and frameworks.

This post focuses on the Supreme Court decision. The culprit was a homoeopathic zinc-based medicine, Zicam, manufactured by Matrixx Initiatives, Inc. and advertised as a remedy for the common cold. Since 1999, Matrixx had ignored reports from users and doctors that Zicam caused some users to experience burning sensations or even to lose their sense of smell. When this story was aired by a doctor on Good Morning America in 2004, Matrixx’s stock price plummeted.

The company’s defense was that these side-effects were “not statistically significant.” In the ensuing fallout, Matrixx was faced with more than 200 lawsuits by Zicam users, but the case in point here is Siracusano vs Matrixx, in which Mr. Siracusano was suing on behalf of investors on grounds that they had been misled. After a few iterations through the American court system, the question the Supreme Court ruled on was whether a claim of securities fraud is valid against a company that neglected to warn consumers about effects that had been found to be statistically non-significant. As an insightful essay by Stephen Ziliak, who had insider knowledge of the case, points out, the decision will affect drug supply regulation, securities regulation, liability, and the nature of adverse side-effects disclosed by drug companies. Ziliak was one of the “friends of the court” providing expert advice on the case.

A key point in this dispute is whether statistical nonsignificance can be used to infer that a potential side-effect is, for practical purposes, no more likely to occur when using the medicine than when not. Among statisticians it is a commonplace that such inferences are illogical (and illegitimate). There are several reasons for this, but I’ll review just two here.

These reasons have to do with common misinterpretations of the measure of statistical significance. Suppose Matrixx had conducted a properly randomized double-blind experiment comparing Zicam-using subjects with those using an indistinguishable placebo, and observed the difference in side-effect rates between the two groups of subjects. One has to bear in mind that random assignment of subjects to one group or the other doesn’t guarantee equivalence between the groups. So, it’s possible that even if there really is no difference between Zicam and the placebo regarding the side-effect, a difference between the groups might occur by “luck of the draw.”

The indicator of statistical significance in this context would be the probability of observing a difference at least as large as the one found in the study if the hypothesis of no difference were true. If this probability is found to be very low (typically .05 or less) then the experimenters will reject the no-difference hypothesis on the grounds that the data they’ve observed would be very unlikely to occur if that hypothesis were true. They will then declare that there is a statistically significant difference between the Zicam and placebo groups. If this probability is not sufficiently low (i.e., greater than .05) the experimenters will decide not to reject the no-difference hypothesis and announce that the difference they found was statistically non-significant.
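Here is a sketch of that calculation for a hypothetical trial, using a standard two-proportion z-test on made-up counts (it is not a reconstruction of anything Matrixx actually did):

```python
import math

# Made-up counts: side-effect cases out of n subjects in each group.
drug_cases, drug_n = 8, 200
placebo_cases, placebo_n = 3, 200

p1, p2 = drug_cases / drug_n, placebo_cases / placebo_n
pooled = (drug_cases + placebo_cases) / (drug_n + placebo_n)
se = math.sqrt(pooled * (1 - pooled) * (1 / drug_n + 1 / placebo_n))
z = (p1 - p2) / se

# Probability of a difference at least this large if the no-difference
# hypothesis were true (one-sided normal approximation).
p_value = 0.5 * math.erfc(z / math.sqrt(2))
print(f"difference = {p1 - p2:.3f}, p = {p_value:.3f}")  # p is about .06: "not significant"
# Failing to clear the .05 bar here is not evidence that no difference exists.
```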

So the first reason for concern is that Matrixx acted as if statistical nonsignificance entitles one to believe in the hypothesis of no-difference. However, failing to reject the hypothesis of no difference doesn’t entitle one to believe in it. It’s still possible that a difference might exist and the experiment failed to find it because it didn’t have enough subjects or because the experimenters were “unlucky.” Matrixx has plenty of company in committing this error; I know many seasoned researchers who do the same, and I’ve already canvassed the well-known bias in fields such as psychology against publishing experiments that fail to find significant effects.

The second problem arises from a common intuition that the probability of observing a difference at least as large as the one found in the study if the hypothesis of no difference were true tells us something about the inverse—the probability that the no-difference hypothesis is true if we find a difference at least as large as the one observed in our study, or, worse still, the probability that the no-difference hypothesis is true. However, the first probability on its own tells us nothing about the other two.

For a quick intuitive, if fanciful, example, let’s imagine randomly sampling one person from the world’s population, with the hypothesis that s/he will be Australian. On randomly selecting our person, all that we know about her initially is that she speaks English.

There are about 750 million first- or second-language English speakers world-wide, and about 23 million Australians. Of the 23 million Australians, about 21 million fit the first- or second-language English description. Given that our person speaks English, how likely is it that we’ve found an Australian? The probability that we’ve found an Australian given that we’ve picked an English-speaker is 21/750 = .03. So there goes our hypothesis. However, had we picked an Australian (i.e., given that our hypothesis were true), the probability that s/he speaks English is 21/23 = .91.
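In code form, the two conditional probabilities being conflated (using the rough figures above, in millions):

```python
english_speakers = 750
australians = 23
australian_english_speakers = 21

# P(Australian | English speaker): what the evidence actually gives us.
print(australian_english_speakers / english_speakers)  # ~0.03
# P(English speaker | Australian): the inverse conditional, a very different number.
print(australian_english_speakers / australians)       # ~0.91
```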

See also Ziliak and McCloskey’s 2008 book, which mounts a swinging demolition of the unquestioned application of statistical significance in a variety of domains.

Aside from the judgment about statistical nonsignificance, the most important stipulation of the Supreme Court’s decision is that “something more” is required before a drug company can justifiably decide to not disclose a drug’s potential side-effects. What should this “something more” be? This sounds as if it would need judgments about the “importance” of the side-effects, which could open multiple cans of worms (e.g., Which criteria for importance? According to what or whose standards?). Alternatively, why not simply require drug companies to report all occurrences of adverse side-effects and include the best current estimates of their rates among the population of users?

A slightly larger-picture view of the Matrixx defense resonates with something that I’ve observed in even the best and brightest of my students and colleagues (oh, and me too). And that is the hope that somehow probability or statistical theories will get us off the hook when it comes to making judgments and decisions in the face of uncertainty. It can’t and won’t, especially when it comes to matters of medical, clinical, personal, political, economic, moral, aesthetic, and all the other important kinds of importance.

Written by michaelsmithson

October 22, 2011 at 11:31 pm
