Posts Tagged ‘Psychology’
Back in late May 2011, there were news stories of charges of manslaughter laid against six earthquake experts and a government advisor responsible for evaluating the threat of natural disasters in Italy, on grounds that they allegedly failed to give sufficient warning about the devastating L’Aquila earthquake in 2009. In addition, plaintiffs in a separate civil case are seeking damages in the order of €22.5 million (US$31.6 million). The first hearing of the criminal trial occurred on Tuesday the 20th of September, and the second session is scheduled for October 1st.
According to Judge Giuseppe Romano Gargarella, the defendants gave inexact, incomplete and contradictory information about whether smaller tremors in L’Aquila during the six months before the magnitude 6.3 quake on 6 April 2009, which killed 308 people, were to be considered warning signs of the quake that eventuated. L’Aquila was largely flattened, and thousands of survivors lived in tent camps or temporary housing for months.
If convicted, the defendants face up to 15 years in jail and almost certainly will suffer career-ending consequences. While manslaughter charges for natural disasters have precedents in Italy, they have previously concerned breaches of building codes in quake-prone areas. Interestingly, no action has yet been taken against the engineers who designed the buildings that collapsed, or government officials responsible for enforcing building code compliance. However, there have been indications of lax building codes and the possibility of local corruption.
The trial has, naturally, outraged scientists and others sympathetic to the plight of the earthquake experts. An open letter by the Istituto Nazionale di Geofisica e Vulcanologia (National Institute of Geophysics and Volcanology) said the allegations were unfounded and amounted to “prosecuting scientists for failing to do something they cannot do yet — predict earthquakes”. The AAAS has presented a similar letter, which can be read here.
In pre-trial statements, the defence lawyers have also argued that it is impossible to predict earthquakes. “As we all know, quakes aren’t predictable,” said Marcello Melandri, defence lawyer for defendant Enzo Boschi, who was president of Italy’s National Institute of Geophysics and Volcanology. The implication is that because quakes cannot be predicted, the accusations that the commission’s scientists and civil protection experts should have warned that a major quake was imminent are baseless.
Unfortunately, the Istituto Nazionale di Geofisica e Vulcanologia, the AAAS, and the defence lawyers were missing the point. It seems that failure to predict quakes is not the substance of the accusations. Instead, it is poor communication of the risks, inappropriate reassurance of the local population and inadequate hazard assessment. Contrary to earlier reports, the prosecution apparently is not claiming the earthquake should have been predicted. Instead, their focus is on the nature of the risk messages and advice issued by the experts to the public.
Examples raised by the prosecution include a memo issued after a commission meeting on 31 March 2009 stating that a major quake was “improbable,” a statement to local media that six months of low-magnitude tremors was not unusual in the highly seismic region and did not mean a major quake would follow, and an apparent discounting of the notion that the public should be worried. Against this, defence lawyer Melandri has been reported saying that the panel “never said, ‘stay calm, there is no risk’”.
It is at this point that the issues become both complex (by their nature) and complicated (by people). Several commentators have pointed out that the scientists are distinguished experts, by way of asserting that they are unlikely to have erred in their judgement of the risks. But they are accused of communicating “incomplete, imprecise, and contradictory information” to the public. As one of the civil parties to the lawsuit put it, “Either they didn’t know certain things, which is a problem, or they didn’t know how to communicate what they did know, which is also a problem.”
So, the experts’ scientific expertise is not on trial. Instead, it is their expertise in risk communication. As Stephen S. Hall’s excellent essay in Nature points out, regardless of the outcome this trial is likely to make many scientists more reluctant to engage with the public or the media about risk assessments of all kinds. The AAAS letter makes this point too. And regardless of which country you live in, it is unwise to think “Well, that’s Italy for you. It can’t happen here.” It most certainly can and probably will.
Matters are further complicated by the abnormal nature of the commission meeting on the 31st of March at a local government office in L’Aquila. Boschi claims that these proceedings normally are closed whereas this meeting was open to government officials, and that he and the other scientists at the meeting did not realize that the officials’ agenda was to calm the public. The commission did not issue its usual formal statement, and the minutes of the meeting were not completed until after the earthquake had occurred. Instead, two members of the commission, Franco Barberi and Bernardo De Bernardinis, along with the mayor and an official from Abruzzo’s civil-protection department, held a now (in)famous press conference after the meeting where they issued reassuring messages.
De Bernardinis, an expert on floods but not earthquakes, incorrectly stated that the numerous earthquakes of the swarm would lessen the risk of a larger earthquake by releasing stress. He also agreed with a journalist’s suggestion that residents enjoy a glass of wine instead of worrying about an impending quake.
The prosecution also is arguing that the commission should have reminded residents in L’Aquila of the fragility of many older buildings, advised them to make preparations for a quake, and reminded them of what to do in the event of a quake. This amounts to an accusation of a failure to perform a duty of care, a duty that many scientists providing risk assessments may dispute that they bear.
After all, telling the public what they should or should not do is a civil or governmental matter, not a scientific one. Thomas Jordan’s essay in New Scientist brings in this verdict: “I can see no merit in prosecuting public servants who were trying in good faith to protect the public under chaotic circumstances. With hindsight their failure to highlight the hazard may be regrettable, but the inactions of a stressed risk-advisory system can hardly be construed as criminal acts on the part of individual scientists.” As Jordan points out, there is a need to separate the role of science advisors from that of civil decision-makers who must weigh the benefits of protective actions against the costs of false alarms. This would seem to be a key issue that urgently needs to be worked through, given the need for scientific input into decisions about extreme hazards and events, both natural and human-caused.
Scientists generally are not trained in communication or in dealing with the media, and communication about risks is an especially tricky undertaking. I would venture to say that the prosecution, defence, judge, and journalists reporting on the trial will not be experts in risk communication either. The problems in risk communication are well known to psychologists and social scientists specializing in its study, but not to non-specialists. Moreover, these specialists will tell you that solutions to those problems are hard to come by.
For example, Otway and Wynne (1989) observed in a classic paper that an “effective” risk message has to simultaneously reassure by saying the risk is tolerable and panic will not help, and warn by stating what actions need to be taken should an emergency arise. They coined the term “reassurance arousal paradox” to describe this tradeoff (which of course is not a paradox, but a tradeoff). The appropriate balance is difficult to achieve, and is made even more so by the fact that not everyone responds in the same way to the same risk message.
It is also well known that laypeople do not think of risks in the same way as risk experts (for instance, laypeople tend to see “hazard” and “risk” as synonyms), nor do they rate risk severity in line with the product of probability and magnitude of consequence, nor do they understand probability—especially low probabilities. Given all of this, it will be interesting to see how the prosecution attempts to establish that the commission’s risk communications contained “incomplete, imprecise, and contradictory information.”
Scientists who try to communicate risks are aware of some of these issues, but usually (and understandably) uninformed about the psychology of risk perception (see, for instance, my posts here and here on communicating uncertainty about climate science). I’ll close with just one example. A recent International Commission on Earthquake Forecasting (ICEF) report argues that frequently updated hazard probabilities are the best way to communicate risk information to the public. Jordan, chair of the ICEF, recommends that “Seismic weather reports, if you will, should be put out on a daily basis.” Laudable as this prescription is, there are at least three problems with it.
Weather reports with probabilities of rain typically present probabilities neither close to 0 nor to 1. Moreover, they usually are anchored on tenths (e.g., .2, or .6 but not precise numbers like .23162 or .62947). Most people have reasonable intuitions about mid-range probabilities such as .2 or .6. But earthquake forecasting has very low probabilities, as was the case in the lead-up to the L’Aquila event. Italian seismologists had estimated that the probability of a large earthquake in the next three days had increased from 1 in 200,000, before the earthquake swarm began, to 1 in 1,000 following the two large tremors the day before the quake.
The first problem arises from the small magnitude of these probabilities. Because people are limited in their ability to comprehend and evaluate extreme probabilities, highly unlikely events usually are either ignored or overweighted. The tendency to ignore low-probability events has been cited to account for the well-established phenomenon that homeowners tend to be under-insured against low probability hazards (e.g., earthquake, flood and hurricane damage) in areas prone to those hazards. On the other hand, the tendency to over-weight low-probability events has been used to explain the same people’s propensity to purchase lottery tickets. The point is that low-probability events either excite people out of proportion to their likelihood or fail to excite them altogether.
The second problem is in understanding the increase in risk from 1 in 200,000 to 1 in 1,000. Most people are readily able to comprehend the differences between mid-range probabilities such as an increase in the chance of rain from .2 to .6. However, they may not appreciate the magnitude of the difference between the two low probabilities in our example. For instance, an experimental study with jurors in mock trials found that although DNA evidence is typically expressed in terms of probability (specifically, the probability that the DNA sample could have come from a randomly selected person in the population), jurors were equally likely to convict on the basis of a probability of 1 in 1,000 as a probability of 1 in 1 billion. At the very least, the public would need some training in, and accustoming to, minuscule probabilities.
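To put numbers on that contrast, here is a small sketch in Python. The function name is purely illustrative and nothing here comes from the seismologists' own analysis; it simply expresses the two changes in probability mentioned above as an absolute difference and as a ratio.

```python
from fractions import Fraction

def describe_change(before, after):
    """Return the absolute change and the fold-change between two probabilities."""
    before, after = Fraction(before), Fraction(after)
    return after - before, after / before

# Rain example from the text: the chance of rain rises from .2 to .6
rain_abs, rain_fold = describe_change(Fraction(2, 10), Fraction(6, 10))

# L'Aquila example from the text: 1 in 200,000 rises to 1 in 1,000
quake_abs, quake_fold = describe_change(Fraction(1, 200_000), Fraction(1, 1_000))

print(f"rain:  +{float(rain_abs):.3f} absolute, {float(rain_fold):.0f}-fold")
print(f"quake: +{float(quake_abs):.6f} absolute, {float(quake_fold):.0f}-fold")
```

The absolute change for the quake is less than a tenth of a percentage point, yet the risk has risen 200-fold, against a 3-fold rise for the rain example. Which of those two framings a listener latches onto is exactly where the communication problem lies.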
All this leads us to the third problem. Otway and Wynne’s “reassurance arousal paradox” is exacerbated by risk communications about extremely low-probability hazards, no matter how carefully they are crafted. Recipients of such messages will be highly suggestible, especially when the stakes are high. So, what should the threshold probability be for determining when a “don’t ignore this” message is issued? It can’t be the imbecilic Dick Cheney zero-risk threshold for terrorism threats, but what should it be instead?
Note that this is a matter for policy-makers to decide, not scientists, even though scientific input regarding potential consequences of false alarms and false reassurances should be taken into account. Criminal trials and civil lawsuits punishing the bearers of false reassurances will drive risk communicators to lower their own alarm thresholds, thereby ensuring that they will sound false alarms increasingly often (see my post about making the “wrong” decision most of the time for the “right” reasons).
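A rough sketch of why lower thresholds mean more false alarms, under two loudly stated assumptions that are not in the original discussion: the forecasts are perfectly calibrated, and a warning is issued exactly when the forecast probability sits at the threshold. Under those assumptions a warning issued at probability p turns out to be a false alarm with probability 1 - p. The function name is hypothetical.

```python
from fractions import Fraction

def false_alarms_per_hit(threshold):
    """Expected false alarms per correct warning when every warning is
    issued at exactly `threshold` and the forecasts are calibrated."""
    p = Fraction(threshold)
    return (1 - p) / p

for label, p in [("1 in 10", Fraction(1, 10)),
                 ("1 in 1,000", Fraction(1, 1_000)),
                 ("1 in 200,000", Fraction(1, 200_000))]:
    ratio = int(false_alarms_per_hit(p))
    print(f"threshold {label}: about {ratio:,} false alarms per correct warning")
```

At a threshold of 1 in 1,000 this gives 999 false alarms for every warning that is followed by a quake, which is the price of never issuing a false reassurance at that level of risk.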
Risk communication regarding low-probability, high-stakes hazards is one of the most difficult kinds of communication to perform effectively, and most of its problems remain unsolved. The L’Aquila trial probably will have an inhibitory impact on scientists’ willingness to front the media or the public. But it may also stimulate scientists and decision-makers to work together for the resolution of these problems.
I started this post in Hong Kong airport, having just finished one conference and heading to Innsbruck for another. The Hong Kong meeting was on psychometrics and the Innsbruck conference was on imprecise probabilities (believe it or not, these topics actually do overlap). Anyhow, Annemarie Zand Scholten gave a neat paper at the math psych meeting in which she pointed out that, contrary to a strong intuition that most of us have, introducing and accounting for measurement error can actually sharpen up measurement. Briefly, the key idea is that an earlier “error-free” measurement model of, say, human comparisons between pairs of objects on some dimensional characteristic (e.g., length) could only enable researchers to recover the order of object length but not any quantitative information about how much longer people were perceiving one object to be than another.
I’ll paraphrase (and amend slightly) one of Annemarie’s illustrations of her thesis, to build intuition about how her argument works. In our perception lab, we present subjects with pairs of lines and ask them to tell us which line they think is the longer. One subject, Hawkeye Harriet, perfectly picks the longer of the two lines every time—regardless of how much longer one is than the other. Myopic Myra, on the other hand, has imperfect visual discrimination and thus sometimes gets it wrong. But she’s less likely to choose the wrong line if the two lines’ lengths considerably differ from one another. In short, Myra’s success-rate is positively correlated with the difference between the two line-lengths whereas Harriet’s uniformly 100% success rate clearly is not.
Is there a way that Myra’s success- and error-rates could tell us exactly how long each object is, relative to the others? Yes. Let pij be the probability that Myra picks the ith object as longer than the jth object, and pji = 1 – pij be the probability that Myra picks the jth object as longer than the ith object. If the ith object has length Li and the jth object has length Lj, then if pij/pji = Li/Lj, Myra’s choice-rates perfectly mimic the ratio of the ith and jth objects’ lengths. This neat relationship owes its nature to the fact that a characteristic such as length has an absolute zero, so we can meaningfully compare lengths by taking ratios.
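As an illustration, since pji = 1 – pij, the condition pij/pji = Li/Lj can be solved for pij, giving pij = Li/(Li + Lj). The sketch below simulates a Myra who behaves exactly this way and recovers the length ratio from her choice counts. The line lengths, number of trials, and function names are all hypothetical, and this is a toy simulation rather than the model presented in the talk.

```python
import random

def myra_choice_prob(length_i, length_j):
    """P(Myra picks i as longer than j), solved from p_ij / (1 - p_ij) = L_i / L_j."""
    return length_i / (length_i + length_j)

def estimate_length_ratio(length_i, length_j, trials, rng):
    """Simulate Myra's paired comparisons and estimate L_i / L_j from her choice counts."""
    p_ij = myra_choice_prob(length_i, length_j)
    picks_i = sum(rng.random() < p_ij for _ in range(trials))
    picks_j = trials - picks_i
    return picks_i / picks_j

rng = random.Random(0)  # fixed seed so the sketch is repeatable
estimate = estimate_length_ratio(30.0, 10.0, trials=20_000, rng=rng)
print(f"true ratio 3.0, estimated from Myra's choices: {estimate:.2f}")
```

Harriet, by contrast, would never pick the shorter line, so her count of "wrong" picks is zero and the same ratio cannot be computed at all; her choices reveal the order of the two lengths and nothing more.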
How about temperature? This is slightly trickier, because if we’re using a popular scale such as Celsius or Fahrenheit then the zero-point of the scale isn’t absolute in the sense that length has an absolute zero (i.e., you can have Celsius and Fahrenheit readings below zero, and each scale’s zero-point differs from the other). Thus, 60 degrees Fahrenheit is not twice as warm as 30 degrees Fahrenheit. However, the differences between temperatures can be compared via ratios. For instance, 40 degrees F is twice as far from 20 degrees F as 10 degrees F is.
We just need a common “reference” object against which to compare each of the others. Suppose we’re asking Myra to choose which of a pair of objects is the warmer. Assuming that Myra’s choices are transitive, there will be an object she chooses less often than any of the others in all of the paired comparisons. Let’s refer to that object as the Jth object. Now suppose the ith object has temperature Ti, the jth object has temperature Tj, and the Jth object has temperature TJ which is lower than both Ti and Tj. Then if Myra’s choice-rate ratio is
piJ/pjJ = (Ti – TJ)/(Tj – TJ),
she functions as a perfect measuring instrument for temperature comparisons between the ith and jth objects. Again, Hawkeye Harriet’s choice-rates will be piJ = 1 and pjJ = 1 no matter what Ti and Tj are, so her ratio always is 1.
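A minimal worked example, assuming Myra's choice rates obey the relationship above exactly. Rearranging it gives Ti = TJ + (piJ/pjJ)(Tj – TJ), so if two of the temperatures are known the third can be read off her choice rates. The choice rates and temperatures below are made-up numbers, chosen to match the 10, 20 and 40 degree example earlier.

```python
def temperature_from_choice_rates(p_iJ, p_jJ, temp_j, temp_J):
    """Solve p_iJ / p_jJ = (T_i - T_J) / (T_j - T_J) for T_i."""
    return temp_J + (p_iJ / p_jJ) * (temp_j - temp_J)

# Hypothetical choice rates: Myra picks object i over the reference 90% of
# the time and object j over the reference 30% of the time.
temp_i = temperature_from_choice_rates(p_iJ=0.9, p_jJ=0.3, temp_j=20.0, temp_J=10.0)
print(f"Myra:    T_i = {temp_i:.1f} degrees F")

# Harriet always picks the warmer object, so both rates are 1 and the
# formula can only ever return T_j, whatever T_i really is.
harriet = temperature_from_choice_rates(p_iJ=1.0, p_jJ=1.0, temp_j=20.0, temp_J=10.0)
print(f"Harriet: T_i = {harriet:.1f} degrees F")
```

Myra's rates place the ith object three times as far above the reference as the jth object, whereas Harriet's perfect record collapses every object onto the same value.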
If we didn’t know what the ratios of those lengths or temperature differences were, Myra would be a much better measuring instrument than Harriet even though Harriet never makes mistakes. Are there such situations? Yes, especially when it comes to measuring mental or psychological characteristics for which we have no direct access, such as subjective sensation, mood, or mental task difficulty.
Which of 10 noxious stimuli is the more aversive? Which of 12 musical rhythms makes you feel more joyous? Which of 20 types of puzzle is the more difficult? In paired comparisons between each possible pair of stimuli, rhythms or puzzles, Hawkeye Harriet will pick what for her is the correct member of the pair every time, so all we’ll get from her is the rank-order of stimuli, rhythms and puzzles. Myopic Myra will less reliably and less accurately choose what for her is the correct member of the pair, but her choice-rates will be correlated with how dissimilar each pair is. We’ll recover much more precise information about the underlying structure of the stimulus set from error-prone Myra.
Annemarie’s point about measurement is somewhat related to another fascinating phenomenon known as stochastic resonance. Briefly paraphrasing the Wikipedia entry for stochastic resonance (SR), SR occurs when a measurement or signal-detecting system’s signal-to-noise ratio increases when a moderate amount of noise is added to the incoming signal or to the system itself. SR usually is observed either in bistable or sub-threshold systems. Too little noise results in the system being insufficiently sensitive to the signal; too much noise overwhelms the signal. Evidence for SR has been found in several species, including humans. For example, a 1996 paper in Nature reported a demonstration that subjects asked to detect a sub-threshold impulse via mechanical stimulation of a fingertip maximized the percentage of correct detections when the signal was mixed with a moderate level of noise. One way of thinking about the optimized version of Myopic Myra as a measurement instrument is to model her as a “noisy discriminator,” with her error-rate induced by an optimal random noise-generator mixed with an otherwise error-free discriminating mechanism.
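The sub-threshold case can be sketched with a toy simulation. Everything in it is an assumption made for illustration: the sine-wave signal, the threshold, the three noise levels, and the use of a simple correlation between the signal and the detector's output as a stand-in for signal-to-noise ratio. It is not a reconstruction of the fingertip experiment.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def detector_tracking(noise_sd, steps=20_000, threshold=1.0, seed=0):
    """Correlation between a sub-threshold sine wave and the 0/1 output of a
    threshold detector that sees the wave plus Gaussian noise."""
    rng = random.Random(seed)
    # The signal peaks at 0.8, so on its own it never crosses the threshold of 1.0.
    signal = [0.8 * math.sin(2 * math.pi * t / 100) for t in range(steps)]
    fired = [1.0 if s + rng.gauss(0.0, noise_sd) > threshold else 0.0 for s in signal]
    return pearson(signal, fired)

for noise_sd in (0.0, 0.3, 5.0):
    print(f"noise sd {noise_sd}: correlation {detector_tracking(noise_sd):.2f}")
```

With no noise the detector never fires, so its output carries no information about the signal. With a moderate amount it fires mostly near the signal's peaks, and with a large amount it fires almost at random, so the correlation is highest at the middle setting.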
Hi, I’m back again after a few weeks’ travel (presenting papers at conferences). I’ve already posted material on this blog about the “ignorance explosion.” Numerous writings have taken up the theme that there is far too much relevant information for any of us to learn and process and the problem is worsening, despite the benefits of the internet and effective search-engines. We all have had to become more hyper-specialized and fragmented in our knowledge-bases than our forebears, and many of us find it very difficult as a result to agree with one another about the “essential” knowledge that every child should receive in their education and that every citizen should possess.
Well, here is a modest proposal for one such essential: We should all become expert about experts and expertise. That is, we should develop meta-expertise.
We can’t know everything, but knowing an expert when we see one, being able to tell the difference between an expert and an impostor, and knowing what it takes to become an expert can guide our search for assistance in all things about which we’re ignorant. A meta-expert should:
- Know the broad parameters of and requirements for attaining expertise;
- Be able to distinguish a genuine expert from a pretender or a charlatan;
- Know when expertise is and when it is not attainable in a given domain;
- Possess effective criteria for evaluating expertise, within reasonable limits; and
- Be aware of the limitations of specialized expertise.
Let’s start with that strongly democratic source of expertise: Wikipedia’s take on experts:
“In many domains there are objective measures of performance capable of distinguishing experts from novices: expert chess players will almost always win games against recreational chess players; expert medical specialists are more likely to diagnose a disease correctly; etc.”
That said, the Wikipedia entry also raises a potentially vexing point, namely that “expertise” may come down to merely a matter of consensus, often dictated by the self-same “experts.” Examples readily spring to mind in areas where objective measures are hard to come by, such as the arts. But consider also domains where objective measures may be obtainable but not assessable by laypeople. Higher mathematics is a good example. Only a tiny group of people on the planet were capable of assessing whether Andrew Wiles really had proven Fermat’s Last Theorem. The rest of us have to take their word for it.
A crude but useful dichotomy splits views about expertise into two camps: Constructivist and performative. The constructivist view emphasizes the influence of communities of practice in determining what expertise is and who is deemed to have it. The performative view portrays expertise as a matter of learning through deliberative practice. Both views have their points, and many domains of expertise have elements of both. Even domains where objective indicators of expertise are available can have constructivist underpinnings. A proficient modern-day undergraduate physics student would fail late 19th-century undergraduate physics exams; and experienced medical practitioners emigrating from one country to another may find their qualifications and experience unrecognized by their adopted country.
What are the requirements for attaining deep expertise? Two popular criteria are talent and deliberate practice. Re deliberate practice, a much-discussed rule of thumb is the “10,000 hour rule.” This rule was popularized in Malcolm Gladwell’s book Outliers and some authors misattribute it to him. It actually dates back to studies of chess masters in the 1970’s (see Ericsson, K. A., R. Th. Krampe, and C. Tesch-Römer, 1993), and its generalizability to other domains still is debatable. Nevertheless, the 10K rule has some merit, and unfortunately it has been routinely ignored in many psychological studies comparing “experts” with novices, where the “experts” often are undergraduates who have been given a few hours’ practice on a relatively trivial task.
The 10K rule can be a useful guide but there’s an important caveat. It may be a necessary but it is by no means a sufficient condition for guaranteeing deep expertise. At least three other conditions have to be met: Deliberative and effective practice in a domain where deep expertise is attainable. Despite this quite simple line of reasoning, plenty of published authors have committed the error of viewing the 10K rule as both necessary and sufficient. Gladwell didn’t make this mistake, but Jane McGonigal’s recent book on video and computer games devotes considerable space to the notion that because gamers are spending upwards of 10K hours playing games they must be attaining deep “expertise” of some kind. Perhaps some may be, provided they are playing games of sufficient depth. But many will not. (BTW, McGonigal’s book is worth a read despite her over-the-top optimism about how games can save the world—and take a look at her game-design collaborator Bogost’s somewhat dissenting review of her book).
Back to the caveats. First, no deliberation makes practice useless. Having spent approximately 8 hours every day sleeping for the past 61 years (178,120 hours) hasn’t made me an expert on sleep. Likewise, deliberative but ineffective practice methods deny us top-level expertise. Early studies of Morse Code experts demonstrated that mere deliberative practice did not guarantee best performance results; specific training regimes were required instead. Autodidacts with insight and aspirations to attain the highest performative levels in their domains eventually realise how important getting the “right” coaching or teaching is.
Finally, there is the problem of determining whether effective, deliberative practice yields deep expertise in any domain. The domain may simply not be “deep” enough. In games of strategy, tic-tac-toe is a clear example of insufficient depth, checkers is a less obvious but still clear example, whereas chess and go clearly have sufficient depth.
Tic-tac-toe aside, are there domains that possess depth where deep expertise nevertheless is unattainable? There are, at least, some domains that are deeply complex where “experts” perform no better than less trained individuals or simple algorithms. Psychotherapy is one such domain. There is a plethora of studies demonstrating that clinical psychologists’ predictions of patient outcomes are worse than simple linear regression models (cf. Dawes’ searing indictment in his 1994 book) and that sometimes experts’ decisions are no more accurate than beginners’ decisions and simple decision aids. Similar results have been reported for financial planners and political experts. In Philip Tetlock’s 2005 book on so-called “expert” predictions, he finds that many so-called experts perform no better than chance in predicting political events, financial trends, and so on.
What can explain the absence of deep expertise in these instances? Tetlock attributes experts’ poor performance to two factors, among others: Hyperspecialization and overconfidence. “We reach the point of diminishing marginal predictive returns for knowledge disconcertingly quickly,” he reports. “In this age of academic hyperspecialization, there is no reason for supposing that contributors to top journals—distinguished political scientists, area study specialists, economists, and so on—are any better than journalists or attentive readers of the New York Times in ‘reading’ emerging situations.” And the more famous the forecaster the more overblown the forecasts. “Experts in demand,” Tetlock says, “were more overconfident than their colleagues who eked out existences far from the limelight.” Tetlock also claims that cognitive style counts: “Foxes” tend to outperform “hedgehogs.” These terms are taken from Isaiah Berlin’s popular essay: Foxes know a little about lots of things, whereas hedgehogs know one big thing.
Another contributing factor may be a lack of meta-cognitive insight on the part of the experts. A hallmark of expertise is ignoring (not ignorance). This proposition may sound less counter-intuitive if it’s rephrased to say that experts know what to ignore. In an earlier post I mentioned Mary Omodei and her colleagues’ chapter in a 2005 book on professionals’ decision making in connection with this claim. Their chapter opens with the observation of a widespread assumption that domain experts also know how to optimally allocate their cognitive resources when making judgments or decisions in their domain. Their research with expert fire-fighting commanders cast doubt on this assumption.
The key manipulations in the Omodei simulated fire-fighting experiments determined the extent to which commanders had unrestricted access to “complete” information about the fires, weather conditions, and other environmental matters. They found that commanders performed more poorly when information access was unrestricted than when they had to request information from subordinates. They also found that commanders performed more poorly when they believed all available information was reliable than when they believed that some of it was unreliable. The disquieting implication of these findings is that domain expertise doesn’t include meta-cognitive expertise.
Cognitive biases and styles aside, another contributing set of factors may be the characteristics of the complex, deep domains themselves that render deep expertise very difficult to attain. Here is a list of tests you can apply to such domains by way of evaluating their potential for the development of genuine expertise:
- Stationarity? Is the domain stable enough for generalizable methods to be derived? In chaotic systems long-range prediction is impossible because of initial-condition sensitivity. In human history, politics and culture, the underlying processes may not be stationary at all.
- Rarity? When it comes to prediction, rare phenomena simply are difficult to predict (see my post on making the wrong decisions most of the time for the right reasons).
- Observability? Can the outcomes of predictions or decisions be directly or immediately observed? For example in psychology, direct observation of mental states is nearly impossible, and in climatology the consequences of human interventions will take a very long time to unfold.
- Objective or even impartial criteria? For instance, what is “good,” “beautiful,” or even “acceptable” in domains such as music, dance or the visual arts? Are such domains irreducibly subjective and culture-bound?
- Testability? Are there clear criteria for when an expert has succeeded or failed? Or is there too much “wiggle-room” to be able to tell?
Finally, here are a few tests that can be used to evaluate the “experts” in your life:
- Credentials: Does the expert possess credentials that have involved testable criteria for demonstrating proficiency?
- Walking the walk: Is the expert an active practitioner in their domain (versus being a critic or a commentator)?
- Overconfidence: Ask your expert to make yes-no predictions in their domain of expertise, and before any of these predictions can be tested ask them to estimate the percentage of time they’re going to be correct. Compare that estimate with the resulting percentage correct. If their estimate was too high then your expert may suffer from over-confidence.
- Confirmation bias: We’re all prone to this, but some more so than others. Is your expert reasonably open to evidence or viewpoints contrary to their own views?
- Hedgehog-Fox test: Tetlock found that Foxes were better-calibrated and more able to entertain self-disconfirming counterfactuals than hedgehogs, but allowed that hedgehogs can occasionally be “stunningly right” in a way that foxes cannot. Is your expert a fox or a hedgehog?
- Willingness to own up to error: Bad luck is a far more popular explanation for being wrong than good luck is for being right. Is your expert balanced, i.e., equally critical, when assessing their own successes and failures?
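The overconfidence test in the list above boils down to one subtraction. Here is a minimal sketch, with a made-up expert and a hypothetical function name; it is only the crudest version of a calibration check.

```python
def overconfidence_gap(stated_percent_correct, outcomes):
    """Stated accuracy minus observed accuracy, in percentage points.
    `outcomes` is a list of booleans: True where a prediction came true."""
    observed = 100.0 * sum(outcomes) / len(outcomes)
    return stated_percent_correct - observed

# Hypothetical expert: claims 90% accuracy, then gets 13 of 20 predictions right.
outcomes = [True] * 13 + [False] * 7
gap = overconfidence_gap(90.0, outcomes)
print(f"gap: {gap:+.1f} percentage points")
```

A positive gap points towards overconfidence, but with only a handful of predictions a modest gap could easily be luck, so the test is more convincing the more predictions it is based on.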
In my previous post I attempted to provide an overview of the IPCC 2007 report’s approach to communicating about uncertainties regarding climate change and its impacts. This time I want to focus on how the report dealt with probabilistic uncertainty. It is this kind of uncertainty that the report treats most systematically. I mentioned in my previous post that Budescu et al.’s (2009) empirical investigation of how laypeople interpret verbal probability expressions (PEs, e.g., “very likely”) in the IPCC report revealed several problematic aspects, and a paper I have co-authored with Budescu’s team (Smithson, et al., 2011) yielded additional insights.
The approach adopted by the IPCC is one that has been used in other contexts, namely identifying probability intervals with verbal PEs. Their guidelines are as follows:
Virtually certain >99%; extremely likely >95%; very likely >90%; likely >66%; more likely than not > 50%; about as likely as not 33% to 66%; unlikely <33%; very unlikely <10%; extremely unlikely <5%; exceptionally unlikely <1%.
One unusual aspect of these guidelines is their overlapping intervals. For instance, “likely” takes the interval [.66,1] and thus contains the interval [.90,1] for “very likely,” and so on. The only interval that neither contains nor is contained in another is “about as likely as not,” and even it partly overlaps “more likely than not.” Other interval-to-PE guidelines I am aware of use non-overlapping intervals. An early example is Sherman Kent’s attempt to standardize the meanings of verbal PEs in the American intelligence community.
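To see the nesting concretely, here is a small sketch that encodes the thresholds quoted above and lists every PE that a given probability satisfies. The treatment of boundary values (strict inequalities for the one-sided phrases, inclusive ends for “about as likely as not”) is my assumption; the guidelines as quoted do not settle it.

```python
# Thresholds copied from the guidelines quoted above; boundary handling is assumed.
GUIDELINES = [
    ("virtually certain", lambda p: p > 0.99),
    ("extremely likely", lambda p: p > 0.95),
    ("very likely", lambda p: p > 0.90),
    ("likely", lambda p: p > 0.66),
    ("more likely than not", lambda p: p > 0.50),
    ("about as likely as not", lambda p: 0.33 <= p <= 0.66),
    ("unlikely", lambda p: p < 0.33),
    ("very unlikely", lambda p: p < 0.10),
    ("extremely unlikely", lambda p: p < 0.05),
    ("exceptionally unlikely", lambda p: p < 0.01),
]


def matching_expressions(p):
    """Return every PE whose guideline interval contains probability p."""
    return [name for name, holds in GUIDELINES if holds(p)]


print(matching_expressions(0.97))
print(matching_expressions(0.03))
```

A probability of .97 satisfies four different phrases at once, which is exactly what a non-overlapping scheme would rule out.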
Attempts to translate verbal PEs into numbers have a long and checkered history. Since the earliest days of probability theory, the legal profession has steadfastly refused to quantify its burdens of proof (“balance of probabilities” or “reasonable doubt”) despite the fact that they seem to explicitly refer to probabilities or at least degrees of belief. Weather forecasters debated the pros and cons of verbal versus numerical PEs for decades, with mixed results. A National Weather Service report on a 1997 survey of Juneau, Alaska residents found that although the rank-ordering of the mean numerical probabilities residents assigned to verbal PEs reasonably agreed with those assumed by the organization, the residents’ probabilities tended to be less extreme than the organization’s assignments. For instance, “likely” had a mean of 62.5% whereas the organization’s assignments for this PE were 80-100%.
And thus we see a problem arising that has been long noted about individual differences in the interpretation of PEs but largely ignored when it comes to organizations. Since at least the 1960’s empirical studies have demonstrated that people vary widely in the numerical probabilities they associate with a verbal PE such as “likely.” It was this difficulty that doomed Sherman Kent’s attempt at standardization for intelligence analysts. Well, here we have the NWS associating it with 80-100% whereas the IPCC assigns it 66-100%. A failure of organizations and agencies to agree on number-to-PE translations leaves the public with an impossible brief. I’m reminded of the introduction of the now widely-used cyclone (hurricane) category 1-5 scheme (higher numerals meaning more dangerous storms) at a time when zoning for cyclone danger where I was living also had a 1-5 numbering system that went in the opposite direction (higher numerals indicating safer zones).
Another interesting aspect is the frequency of the PEs in the report itself. There are a total of 63 PEs therein. “Likely” occurs 36 times (more than half), and “very likely” 17 times. The remaining 10 occurrences are “very unlikely” (5 times), “virtually certain” (twice), “more likely than not” (twice), and “extremely unlikely” (once). There is a clear bias towards fairly extreme positively-worded PEs, perhaps because much of the IPCC report’s content is oriented towards presenting what is known and largely agreed on about climate change by climate scientists. As we shall see, the bias towards positively-worded PEs (e.g., “likely” rather than “unlikely”) may have served the IPCC well, whether intentionally or not.
In Budescu et al.’s experiment, subjects were assigned to one of four conditions. Subjects in the control group were not given any guidelines for interpreting the PEs, as would be the case for readers unaware of the report’s guidelines. Subjects in a “translation” condition had access to the guidelines given by the IPCC, at any time during the experiment. Finally, subjects in two “verbal-numerical translation” conditions saw a range of numerical values next to each PE in each sentence. One verbal-numerical group was shown the IPCC intervals and the other was shown narrower intervals (with widths of 10% and 5%).
Subjects were asked to provide lower, upper and “best” estimates of the probabilities they associated with each PE. As might be expected, these figures were most likely to be consistent with the IPCC guidelines in the verbal-numerical translation conditions, less likely in the translation condition, and least likely in the control condition. They were also less likely to be IPCC-consistent the more extreme the PE was (e.g., less consistent for “very likely” than for “likely”). Consistency rates were generally low, and for the extremal PEs the deviations from the IPCC guidelines were regressive (i.e., subjects’ estimates were not extreme enough, thereby echoing the 1997 National Weather Service report findings).
One of the ironic claims by the Budescu group is that the IPCC 2007 report’s verbal probability expressions may convey excessive levels of imprecision and that some probabilities may be interpreted as less extreme than intended by the report authors. As I remarked in my earlier post, intervals do not distinguish between consensual imprecision and sharp disagreement. In the IPCC framework, the statement “The probability of event X is between .1 and .9” could mean “All experts regard this probability as being anywhere between .1 and .9” or “Some experts regard the probability as .1 and others as .9.” Budescu et al. realize this, but they also have this to say:
“However, we suspect that the variability in the interpretation of the forecasts exceeds the level of disagreement among the authors in many cases. Consider, for example, the statement that ‘average Northern Hemisphere temperatures during the second half of the 20th century were very likely higher than during any other 50-year period in the last 500 years’ (IPCC, 2007, p. 8). It is hard to believe that the authors had in mind probabilities lower than 70%, yet this is how 25% of our subjects interpreted the term very likely!” (pg. 8).
One thing I’d noticed about the Budescu article was that their graphs suggested the variability in subjects’ estimates for negatively-worded PEs (e.g., “unlikely”) seemed greater than for positively worded PEs (e.g., “likely”). That is, subjects seemed to have less of a consensus about the meaning of the negatively-worded PEs. On reanalyzing their data, I focused on the six sentences that used the PE “very likely” or “very unlikely”. My statistical analyses of subjects’ lower, “best” and upper probability estimates revealed a less regressive mean and less dispersion for positive than for negative wording in all three estimates. Negative wording therefore resulted in more regressive estimates and less consensus regardless of experimental condition. You can see this in the box-plots below.
In this graph, the negative PEs’ estimates have been reverse-scored so that we can compare them directly with the positive PEs’ estimates. The “boxes” (the blue rectangles) contain the middle 50% of subjects’ estimates and these boxes are consistently longer for the negative PEs, regardless of experimental condition. The medians (i.e., the score below which 50% of the estimates fall) are the black dots, and these are fairly similar for positive and (reverse-scored) negative PEs. However, due to the negative PE boxes’ greater lengths, the mean estimates for the negative PEs end up being pulled further away from their positive PE counterparts.
There’s another effect that we confirmed statistically and that is also clear from the box-plots. The difference between the lower and upper estimates is, on average, greater for the negatively-worded PEs. One implication of this finding is that the impact of negative wording is greatest on the lower estimates, and these are the subjects’ translations of the very thresholds specified in the IPCC guidelines.
If anything, these results suggest the picture is worse even than Budescu et al.’s assessment. They noted that 25% of the subjects interpreted “very likely” as having a “best” probability below 70%. The boxplots show that in three of the four experimental conditions at least 25% of the subjects provided a lower probability of less than 50% for “very likely”. If we turn to “very unlikely” the picture is worse still. In three of the four experimental conditions about 25% of the subjects returned an upper probability for “very unlikely” greater than 80%!
So, it seems that negatively-worded PEs are best avoided where possible. This recommendation sounds simple, but it could open a can of syntactical worms. Consider the statement “It is very unlikely that the MOC will undergo a large abrupt transition during the 21st century.” Would it be accurate to equate it with “It is very likely that the MOC will not undergo a large abrupt transition during the 21st century”? Perhaps not, despite the IPCC guidelines’ insistence otherwise. Moreover, turning the PE positive entails turning the event into a negative. In principle, we could have a mixture of negatively- and positively-worded PEs and events (“It is (un)likely that A will (not) occur”). It is unclear at this point whether negative PEs or negative events are the more confusing, but inspection of the Budescu et al. data suggested that double-negatives were decidedly more confusing than any other combination.
As I write this, David Budescu is spearheading a multi-national study of laypeople’s interpretations of the IPCC probability expressions (I’ll be coordinating the Australian component). We’ll be able to compare these interpretations across languages and cultures. More anon!
Budescu, D.V., Broomell, S. and Por, H.-H. (2009) Improving the communication of uncertainty in the reports of the Intergovernmental panel on climate change. Psychological Science, 20, 299–308.
Intergovernmental Panel on Climate Change (2007). Summary for policymakers: Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. Retrieved May 2010 from http://www.ipcc.ch/pdf/assessment-report/ar4/wg1/ar4-wg1-spm.pdf.
Smithson, M., Budescu, D.V., Broomell, S. and Por, H.-H. (2011) Never Say “Not:” Impact of Negative Wording in Probability Phrases on Imprecise Probability Judgments. Accepted for presentation at the Seventh International Symposium on Imprecise Probability: Theories and Applications, Innsbruck, Austria, 25-28 July 2011.
Any blog whose theme is ignorance and uncertainty should get around to discussing delusions sooner or later. I am to give a lecture on the topic to third-year neuropsych students this week, so a post about it naturally follows. Delusions are said to be a concomitant and indeed a product of other cognitive or psychological pathologies, and traditionally research on delusions was conducted in clinical psychology and psychiatry. Recently, though, some others have got in on the act: neuroscientists and philosophers.
The connection with neuroscience probably is obvious. Some kinds of delusion, as we’ll see, beg for a neurological explanation. But why have philosophers taken an interest? To get ourselves in the appropriately philosophical mood let’s begin by asking, what is a delusion?
Here’s the Diagnostic and Statistical Manual definition (2000):
“A false belief based on incorrect inference about external reality that is firmly sustained despite what almost everyone else believes and despite what constitutes incontrovertible and obvious proof or evidence to the contrary.”
But how does that differ from:
- A mere error in reasoning?
- Confirmation bias?
- Self-enhancement bias?
There’s a plethora of empirical research verifying that most of us, most of the time, are poor logicians and even worse statisticians. Likewise, there’s a substantial body of research documenting our tendency to pay more attention to and seek out information that confirms what we already believe, and to ignore or avoid contrary information. And then there’s the Lake Wobegon effect: the one where a large majority of us think we’re better drivers than average, less racially biased than average, more intelligent than average, and so on. But somehow none of these cognitive peccadilloes seems to be a “delusion” on the same level as believing that you’re Napoleon or that Barack Obama is secretly in love with you.
Delusions are more than violations of reasoning (in fact, they may involve no pathology in reasoning at all). Nor are they merely cases of biased perception or wishful thinking. There seems to be more to a psychotic delusion than any of these characteristics; otherwise all of us are deluded most of the time and the concept loses its clinical cutting-edge.
One approach to defining them is to say that they entail a failure to comply with “procedural norms” for belief formation, particularly those involving the weighing and assessment of evidence. Procedural norms aren’t the same as epistemic norms (for instance, most of us are not Humean skeptics, nor do we update our beliefs using Bayes’ Theorem or think in terms of subjective expected utility calculations, but that doesn’t mean we’re deluded). So the appeal to procedural norms excuses “normal” reasoning errors and confirmation and self-enhancement biases. Instead, procedural norms are more like widely held social norms. The DSM definition has a decidedly social constructionist aspect to it. A belief is delusional if everyone else disbelieves it and everyone else believes the evidence against it is incontrovertible.
So, definitional difficulties remain (especially regarding religious beliefs or superstitions). In fact, there’s a website here making an attempt to “crowd-source” definitions of delusions. The nub of the problem is that it is hard to define a concept such as delusion without sliding from descriptions of what “normal” people believe and how they form beliefs into prescriptions for what people should believe or how they should form beliefs. Once we start down the prescriptive track, we encounter the awkward fact that we don’t have an uncontestable account of what people ought to believe or how they should arrive at their beliefs.
One element common to many definitions of delusion is the lack of insight on the part of the deluded. They’re meta-ignorant: They don’t know that they’re mistaken. But this notion poses some difficult problems for the potential victim of a delusion. In what senses can a person rationally believe they are (or have been) deluded? Straightaway we can knock out the following: “My current belief in X is false.” If I know believing X is wrong, then clearly I don’t believe X. Similarly, I can’t validly claim that all my current beliefs are false, or that the way I form beliefs always produces false beliefs.
Here are some defensible examples of self-insight that incorporates delusions:
- I believe I have held false beliefs in the past.
- I believe I may hold false beliefs in the future.
- I believe that some of my current beliefs may be false (but I don’t know which ones).
- I believe that the way I form any belief is unreliable (but I don’t know when it fails).
As you can see, self-insight regarding delusions is like self-insight into your own meta-ignorance (the stuff you don’t know you don’t know). You can spot it in your past and hypothesize it for your future, but you won’t be able to self-diagnose it in the here-and-now.
On the other hand, meta-ignorance and delusional thinking are easy to spot in others. For observers, it may seem obvious that someone is deluded generally in the sense that the way they form beliefs is unreliable. Usually generalized delusional thinking is a component in some type of psychosis or severe brain trauma.
But what’s really difficult to explain is monothematic delusions. These are what they sound like, namely specific delusional beliefs that have a single theme. The explanatory problem arises because the monothematically deluded person may otherwise seem cognitively competent. They can function in the everyday world, they can reason, their memories are accurate, and they form beliefs we can agree with except on one topic.
Could some monothematic delusions have a different basis from others?
Some theorists have distinguished Telic (goal-directed) from Thetic (truth-directed) delusions. Telic delusions (functional in the sense that they satisfy a goal) might be explained by a motivational basis. A combination of motivation and affective consequences (e.g., believing Q is distressing, therefore better to believe not-Q) could be a basis for delusional belief. An example is the de Clerambault syndrome, the belief that someone of high social status is secretly in love with oneself.
Thetic delusions are somewhat more puzzling, but also quite interesting. Maher (1974, etc.) said long ago that delusions arise from normal responses to anomalous experiences. Take Capgras syndrome, the belief that one’s nearest & dearest have been replaced by lookalike impostors. A recent theory about Capgras begins with the idea that if face recognition depends on a specific cognitive module, then it is possible for that to be damaged without affecting other cognitive abilities. A two-route model of face recognition holds that there are two sub-modules:
- A ventral visuo-semantic pathway for visual encoding and overt recognition, and
- A dorsal visuo-affective pathway for covert autonomic recognition and affective response to familiar faces.
For prosopagnosia sufferers the ventral system has been damaged, whereas for Capgras sufferers the dorsal system has been damaged. So here seems to be the basis for the “anomalous” experience that gives rise to Capgras syndrome. But not everyone whose dorsal system is damaged ends up with Capgras syndrome. What else could be going on?
Maher’s claim amounts to a one-factor theory about thetic delusions. The unusual experience (e.g., no longer feeling emotions when you see your nearest and dearest) becomes explained by the delusion (e.g., they’ve been replaced by impostors). A two-factor theory claims that reasoning also has to be defective (e.g., a tendency to leap to conclusions) or some motivational bias has to operate. Capgras or Cotard syndrome (the latter is a belief that one is dead) sounds like a reasoning pathology is involved, whereas de Clerambault syndrome or reverse Othello syndrome (deluded belief in the fidelity of one’s spouse) sounds like it’s propelled by a motivational bias.
What is the nature of the “second factor” in the Capgras delusion?
- Capgras patients are aware that their belief seems bizarre to others, but they are not persuaded by counter-arguments or evidence to the contrary.
- Davies et al. (2001) propose that, specifically, Capgras patients have lost the ability to refrain from believing that things are the way they appear to be. However, Capgras patients are not susceptible to visual illusions.
- McLaughlin (2009) posits that Capgras patients are susceptible to affective illusions, in the sense that a feeling of unfamiliarity leads straight to a belief in that unfamiliarity. But even if true, this account still doesn’t explain the persistence of that belief in the face of massive counter-evidence.
What about the patients who have a disconnection between their face recognition modules and their autonomic nervous systems but do not have Capgras? Turns out that the site of their damage differs from that of Capgras sufferers. But little is known about the differences between them in terms of phenomenology (e.g., whether loved ones also feel unfamiliar to the non-Capgras patients).
Where does all this leave us? To begin with, we are reminded that a label (“delusion”) doesn’t bring with it a unitary phenomenon. There may be distinct types of delusions with quite distinct etiologies. The human sciences are especially vulnerable to this pitfall, because humans have fairly effective commonsensical theories about human beings (folk psychology and folk sociology) from which the human sciences borrow heavily. We’re far less likely to be (mis)guided by common sense when theorizing about things like mitochondria or mesons.
Second, there is a clear need for continued cross-disciplinary collaboration in studying delusions, particularly between cognitive and personality psychologists, neuroscientists, and philosophers of mind. “Delusion” and “self-deception” pose definitional and conceptual difficulties that rival anything in the psychological lexicon. The identification of specific neural structures implicated in particular delusions is crucial to understanding and treating them. The interaction between particular kinds of neurological trauma and other psychological traits or dispositions appears to be a key but is at present only poorly understood.
Last, but not least, this gives research on belief formation and reasoning a cutting edge, placing it at the clinical neuroscientific frontier. There may be something to the old commonsense notion that logic and madness are closely related. By the way, for an accessible and entertaining treatment of this theme in the history of mathematics, take a look at Logicomix.
Books such as Nassim Nicholas Taleb’s Fooled by Randomness and the psychological literature on our mental foibles such as the gambler’s fallacy warn us to beware randomness. Well and good, but randomness actually is one of the most domesticated kinds of uncertainty. In fact, it is one form of uncertainty we can and do exploit.
One obvious way randomness can be exploited is in designing scientific experiments. To experimentally compare, say, two different fertilizers for use in growing broad beans, an ideal would be to somehow ensure that the bean seedlings exposed to one fertilizer were identical in all ways to the bean seedlings exposed to the other fertilizer. That isn’t possible in any practical sense. Instead, we can randomly assign each seedling to receive one or the other fertilizer. We won’t end up with two identical groups of seedlings, but the differences between those groups will have occurred by chance. If their subsequent growth-rates differ by more than we would reasonably expect by chance alone, then we can infer that one fertilizer is likely to have been more effective than the other.
Another commonplace exploitation of randomness is random sampling, which is used in all sorts of applications from quality-control engineering to marketing surveys. By randomly sampling a specific percentage of manufactured components coming off the production line, a quality-control analyst can decide whether a batch should be scrapped or not. By randomly sampling from a population of consumers, a marketing researcher can estimate the percentage of that population who prefer a particular brand of a consumer item, and also calculate how likely that estimate is to be within 1% of the true percentage at the time.
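For the marketing example, the “how likely is the estimate to be within 1%” calculation can be sketched with the usual normal approximation to a sample proportion. This is a rough illustration under assumptions of my own: simple random sampling from a large population, and a sample large enough for the approximation to be reasonable. The function name and the figures plugged in are hypothetical.

```python
import math


def prob_within(margin, true_p, n):
    """Approximate probability that a sample proportion from n random draws
    lands within `margin` of the true proportion `true_p` (normal approximation)."""
    standard_error = math.sqrt(true_p * (1.0 - true_p) / n)
    z = margin / standard_error
    # erf(z / sqrt(2)) equals 2*Phi(z) - 1 for the standard normal CDF Phi.
    return math.erf(z / math.sqrt(2.0))


# Hypothetical survey: 10,000 consumers, true brand preference 50%.
print(round(prob_within(0.01, 0.5, 10_000), 4))
```

In practice the true percentage is unknown, so the sample estimate usually stands in for `true_p`.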
There is a less well-known use for randomness, one that in some respects is quite counter-intuitive. We can exploit randomness to improve our chances of making the right decision. The story begins with Tom Cover’s 1987 chapter which presents what Dov Samet and his co-authors recognized in their 2002 paper as a solution to a switching decision that has been at the root of various puzzles and paradoxes.
Probably the most famous of these is the “two envelope” problem. You’re a contestant in a game show, and the host offers you a choice between two envelopes, each containing a cheque of a specific value. The host explains that one of the cheques is for a greater amount than the other, and offers you the opportunity to toss a fair coin to select one envelope to open. After that, she says, you may choose either to retain the envelope you’ve selected or exchange it for the other. You toss the coin, open the selected envelope, and see the value of the cheque therein. Of course, you don’t know the value of the other cheque, so regardless of which way you choose, you have a probability of ½ of ending up with the larger cheque. There’s an appealing but fallacious argument that says you should switch, but we’re not going to go into that here.
Cover presents a remarkable decisional algorithm whereby you can make that probability exceed ½.
- Having chosen your envelope via the coin-toss, use a random number generator to provide you with a number anywhere between zero and some value you know to be greater than the largest cheque’s value.
- If this number is larger than the value of the cheque you’ve seen, exchange envelopes.
- If not, keep the envelope you’ve been given.
Here’s a “reasonable person’s proof” that this works (for more rigorous and general proofs, see Robert Snapp’s 2005 treatment or Samet et al., 2002). I’ll take the role of the game-show contestant and you can be the host. Suppose $X1 and $X2 are the amounts in the two envelopes. You have provided the envelopes and so you know that X1, say, is larger than X2. You’ve also told me that these amounts are less than $100 (the specific range doesn’t matter). You toss a fair coin, and if it lands Heads you give me the envelope containing X1 whereas if it lands Tails you give me the one containing X2. I open my envelope and see the amount there. Let’s call my amount Y. All I know at this point is that the probability that Y = X1 is ½ and so is the probability that Y = X2.
I now use a random number generator to produce a number between 0 and 100. Let’s call this number Z. Cover’s algorithm says I should switch envelopes if Z is larger than Y and I should retain my envelope if Z is less than or equal to Y. The claim is that my chance of ending up with the envelope containing X1 is greater than ½.
As the picture below illustrates, the probability that my randomly generated Z has landed at X2 or below is X2/100, and the probability that Z has landed at X1 or below is X1/100. Likewise, the probability that Z has exceeded X2 is 1 – X2/100, and the probability that Z has exceeded X1 is 1 – X1/100.
The proof now needs four steps to complete it:
- If Y = X1 then I’ll make the right decision if I decide to keep my envelope, i.e., if Z is less than or equal to X1, and my probability of doing so is X1/100.
- If Y = X2 then I’ll make the right decision if I decide to exchange my envelope, i.e., if Z is greater than X2, and my probability of doing so is 1 – X2/100.
- The probability that Y = X1 is ½ and the probability that Y = X2 also is ½. So my total probability of ending up with the envelope containing X1 is
½ of X1/100, which is X1/200, plus ½ of 1 – X2/100, which is ½ – X2/200.
That works out to ½ + X1/200 – X2/200.
- But X1 is larger than X2, so X1/200 – X2/200 must be larger than 0.
Therefore, ½ + X1/200 – X2/200 is larger than ½.
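If you’d rather check the argument by brute force than by algebra, the following sketch plays the game many times and compares the observed success rate with the closed form ½ + (X1 – X2)/200 from the proof. The function names, the `upper` parameter (any bound known to exceed the larger cheque), the number of trials and the seed are all my own choices for illustration.

```python
import random


def cover_success_rate(x1, x2, upper=100.0, trials=200_000, seed=1):
    """Estimate how often Cover's rule ends with the larger cheque x1 (x1 > x2)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Fair coin decides which envelope I open (y) and which stays closed.
        y, other = (x1, x2) if rng.random() < 0.5 else (x2, x1)
        z = rng.uniform(0.0, upper)      # the random number Z
        final = other if z > y else y    # switch only when Z exceeds what I saw
        wins += final == x1
    return wins / trials


def cover_exact(x1, x2, upper=100.0):
    """Closed form from the proof: 1/2 + (x1 - x2) / (2 * upper)."""
    return 0.5 + (x1 - x2) / (2.0 * upper)


print(cover_success_rate(70.0, 40.0), cover_exact(70.0, 40.0))
```

The simulated rate should sit close to the exact value, and both shrink towards ½ as the two cheques get closer together.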
Fine, you might say, but could this party trick ever help us in a real-world decision? Yes, it could. Suppose you’re the director of a medical clinic with a tight budget in a desperate race against time to mount a campaign against a disease outbreak in your region. You have two treatments available to you but the research literature doesn’t tell you which one is better than the other. You have time and resources to test only one of those treatments before deciding which one to adopt for your campaign.
Toss a fair coin, letting it decide which treatment you test. The resulting cure-rate from the chosen treatment will be some number, Y, between 0% and 100%. The structure of your decisional situation now is identical to the two-envelope setup described above. Use a random number generator to generate a number, Z, between 0 and 100. If Z is less than or equal to Y use your chosen treatment for your campaign. If Z is greater than Y use the other treatment instead. Your chance of having chosen the treatment that would have yielded the higher cure-rate under your test conditions will be larger than ½ and you’ll be able to defend your decision if you’re held accountable to any constituency or stakeholders.
In fact, there are ways whereby you may be able to do even better than this in a real-world situation. One is by shortening the range, if you know that the cure-rate is not going to exceed some limit, say L, below 100%. This helps because (X1 – X2)/(2L) will be greater than (X1 – X2)/200. The highest it can be, reached when L equals X1, is (1 – X2/X1)/2. Another way, as Snapp (2005) points out, is by knowing the probability distribution generating X1 and X2. Knowing that distribution boosts your probability of being correct to ¾.
However, before we rush off to use Cover’s algorithm for all kinds of decisions, let’s consider its limitations. Returning to the disease outbreak scenario, suppose you have good reasons to suspect that one treatment (Ta, say) is better than the other (Tb). You could just go with Ta and defend your decision by pointing out that, according to your evidence the probability that Ta actually is better than Tb is greater than ½. Let’s denote this probability by P.
A reasonable question is whether you could do better than P by using Cover’s algorithm. Here’s my claim:
- If you test Ta or Tb and use the Cover algorithm to decide whether to use it for your campaign or switch to the other treatment, your probability of having chosen the treatment that would have given you the best test-result cure rate will converge to the Cover algorithm’s probability of a correct choice. This may or may not be greater than P (remember, P is greater than ½).
This time, let X1 denote the higher cure rate and X2 denote the lower cure-rate you would have got, depending on whether the treatment you tested was the better or the worse.
- If the cure rate for Ta is X1 then you’ll make the right decision if you decide to use Ta, i.e., if Z is less than or equal to X1, and your probability of doing so is X1/100.
- If the cure rate for Ta is X2 then you’ll make the right decision if you decide to use Tb, i.e., if Z is greater than X2, and your probability of doing so is 1 – X2/100.
- We began by supposing the probability that the cure rate for Ta is X1 is P, which is greater than ½. The probability that the cure rate for Ta is X2 is 1 – P, which is less than ½. So your total probability of ending up with the treatment whose cure rate is X1 is
P*X1/100 + (1 – P)*(1 – X2/100).
The question we want to address is when this probability is greater than P, i.e.,
P*X1/100 + (1 – P)*(1 – X2/100) > P.
It turns out that a rearrangement of this inequality gives us a clue.
- First, we subtract P*X1/100 from both sides to get
(1 – P)*( 1 – X2/100) > P – P*X1/100.
- Now, we divide both sides of this inequality by 1 – P to get
(1 – X2/100) > P*(1 – X1/100)/(1 – P),
and then divide both sides by ( 1 – X1/100) to get
(1 – X2/100)/( 1 – X1/100) > P/(1 – P).
We can now see that the values of X2 and X1 have to make the odds of the Cover algorithm larger than the odds resulting from P. If P = .6, say, then P/(1 – P) = .6/.4 = 1.5. Thus, for example, if X2 = 40% and X1 = 70% then (1 – X2/100)/( 1 – X1/100) = .6/.3 = 2.0 and the Cover algorithm will improve your chances of making the right choice. However, if X2 = 40% and X1 = 60% then the algorithm offers no improvement on P and if we increase X2 above 40% the algorithm will return a lower probability than P. So, if you already have strong evidence that one alternative is better than the other then don’t bother using the Cover algorithm.
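The numerical examples in this paragraph are easy to reproduce. Here is a small sketch that evaluates P*X1/100 + (1 – P)*(1 – X2/100) and reports how far it sits above or below P; the function name is hypothetical, and the third case (X2 = 50%) is just one arbitrary value above 40% chosen for illustration.

```python
def cover_with_prior(p, x1, x2):
    """Probability of ending up with the better treatment under Cover's rule,
    when the tested treatment is the better one with probability p.

    x1 and x2 are the higher and lower cure rates, in percent.
    """
    return p * x1 / 100.0 + (1.0 - p) * (1.0 - x2 / 100.0)


p = 0.6
for x1, x2 in [(70, 40), (60, 40), (60, 50)]:
    prob = cover_with_prior(p, x1, x2)
    print(f"X1={x1}%, X2={x2}%: Cover gives {prob:.2f}, change from P = {prob - p:+.2f}")
```

The first case improves on P, the second breaks even, and the third does worse, matching the odds comparison above.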
Nevertheless, by exploiting randomness we’ve ended up with a decisional guide that can apply to real-world situations. If you are undecided about which of two alternatives is superior and can test only one of them, pick one at random to test and use Cover’s algorithm to decide which to adopt. You’ll end up with a higher probability of making the right decision than you would by tossing a coin alone.