Back in late May 2011, there were news stories of charges of manslaughter laid against six earthquake experts and a government advisor responsible for evaluating the threat of natural disasters in Italy, on grounds that they allegedly failed to give sufficient warning about the devastating L’Aquila earthquake in 2009. In addition, plaintiffs in a separate civil case are seeking damages on the order of €22.5 million (US$31.6 million). The first hearing of the criminal trial occurred on Tuesday the 20th of September, and the second session is scheduled for October 1st.
According to Judge Giuseppe Romano Gargarella, the defendants gave inexact, incomplete and contradictory information about whether the smaller tremors in L’Aquila during the six months before the magnitude 6.3 quake on 6 April, which killed 308 people, were to be considered warning signs of the quake that eventuated. L’Aquila was largely flattened, and thousands of survivors lived in tent camps or temporary housing for months.
If convicted, the defendants face up to 15 years in jail and almost certainly will suffer career-ending consequences. While manslaughter charges for natural disasters have precedents in Italy, they have previously concerned breaches of building codes in quake-prone areas. Interestingly, no action has yet been taken against the engineers who designed the buildings that collapsed, or government officials responsible for enforcing building code compliance. However, there have been indications of lax building codes and the possibility of local corruption.
The trial has, naturally, outraged scientists and others sympathetic to the plight of the earthquake experts. An open letter by the Istituto Nazionale di Geofisica e Vulcanologia (National Institute of Geophysics and Volcanology) said the allegations were unfounded and amounted to “prosecuting scientists for failing to do something they cannot do yet — predict earthquakes”. The AAAS has presented a similar letter, which can be read here.
In pre-trial statements, the defence lawyers also have argued that it was impossible to predict earthquakes. “As we all know, quakes aren’t predictable,” said Marcello Melandri, defence lawyer for defendant Enzo Boschi, who was president of Italy’s National Institute of Geophysics and Volcanology. The implication is that because quakes cannot be predicted, the accusations that the commission’s scientists and civil protection experts should have warned that a major quake was imminent are baseless.
Unfortunately, the Istituto Nazionale di Geofisica e Vulcanologia, the AAAS, and the defence lawyers were missing the point. It seems that failure to predict quakes is not the substance of the accusations. Instead, it is poor communication of the risks, inappropriate reassurance of the local population and inadequate hazard assessment. Contrary to earlier reports, the prosecution apparently is not claiming the earthquake should have been predicted. Instead, their focus is on the nature of the risk messages and advice issued by the experts to the public.
Examples raised by the prosecution include a memo issued after a commission meeting on 31 March 2009 stating that a major quake was “improbable,” a statement to local media that six months of low-magnitude tremors was not unusual in the highly seismic region and did not mean a major quake would follow, and an apparent discounting of the notion that the public should be worried. Against this, defence lawyer Melandri has been reported saying that the panel “never said, ‘stay calm, there is no risk'”.
It is at this point that the issues become both complex (by their nature) and complicated (by people). Several commentators have pointed out that the scientists are distinguished experts, by way of asserting that they are unlikely to have erred in their judgement of the risks. But they are being accused of communicating “incomplete, imprecise, and contradictory information” to the public. As one of the civil parties to the lawsuit put it, “Either they didn’t know certain things, which is a problem, or they didn’t know how to communicate what they did know, which is also a problem.”
So, the experts’ scientific expertise is not on trial. Instead, it is their expertise in risk communication. As Stephen S. Hall’s excellent essay in Nature points out, regardless of the outcome this trial is likely to make many scientists more reluctant to engage with the public or the media about risk assessments of all kinds. The AAAS letter makes this point too. And regardless of which country you live in, it is unwise to think “Well, that’s Italy for you. It can’t happen here.” It most certainly can and probably will.
Matters are further complicated by the abnormal nature of the commission meeting on the 31st of March at a local government office in L’Aquila. Boschi claims that these proceedings normally are closed, whereas this meeting was open to government officials, and that he and the other scientists at the meeting did not realize that the officials’ agenda was to calm the public. The commission did not issue its usual formal statement, and the minutes of the meeting were not completed until after the earthquake had occurred. Instead, two members of the commission, Franco Barberi and Bernardo De Bernardinis, along with the mayor and an official from Abruzzo’s civil-protection department, held a now (in)famous press conference after the meeting where they issued reassuring messages.
De Bernardinis, an expert on floods but not earthquakes, incorrectly stated that the numerous earthquakes of the swarm would lessen the risk of a larger earthquake by releasing stress. He also agreed with a journalist’s suggestion that residents enjoy a glass of wine instead of worrying about an impending quake.
The prosecution also is arguing that the commission should have reminded residents in L’Aquila of the fragility of many older buildings, advised them to make preparations for a quake, and reminded them of what to do in the event of a quake. This amounts to an accusation of a failure to perform a duty of care, a duty that many scientists providing risk assessments may dispute that they bear.
After all, telling the public what they should or should not do is a civil or governmental matter, not a scientific one. Thomas Jordan’s essay in New Scientist delivers this verdict: “I can see no merit in prosecuting public servants who were trying in good faith to protect the public under chaotic circumstances. With hindsight their failure to highlight the hazard may be regrettable, but the inactions of a stressed risk-advisory system can hardly be construed as criminal acts on the part of individual scientists.” As Jordan points out, there is a need to separate the role of science advisors from that of civil decision-makers who must weigh the benefits of protective actions against the costs of false alarms. This would seem to be a key issue that urgently needs to be worked through, given the need for scientific input into decisions about extreme hazards and events, both natural and human-caused.
Scientists generally are not trained in communication or in dealing with the media, and communication about risks is an especially tricky undertaking. I would venture to say that the prosecution, defence, judge, and journalists reporting on the trial will not be experts in risk communication either. The problems in risk communication are well known to psychologists and social scientists specializing in its study, but not to non-specialists. Moreover, these specialists will tell you that solutions to those problems are hard to come by.
For example, Otway and Wynne (1989) observed in a classic paper that an “effective” risk message has to simultaneously reassure by saying the risk is tolerable and panic will not help, and warn by stating what actions need to be taken should an emergency arise. They coined the term “reassurance arousal paradox” to describe this tradeoff (which of course is not a paradox, but a tradeoff). The appropriate balance is difficult to achieve, and is made even more so by the fact that not everyone responds in the same way to the same risk message.
It is also well known that laypeople do not think of risks in the same way as risk experts (for instance, laypeople tend to see “hazard” and “risk” as synonyms), nor do they rate risk severity in line with the product of probability and magnitude of consequence, nor do they understand probability—especially low probabilities. Given all of this, it will be interesting to see how the prosecution attempts to establish that the commission’s risk communications contained “incomplete, imprecise, and contradictory information.”
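For readers unfamiliar with the expert convention, here is a minimal sketch of risk scored as probability times magnitude of consequence. The two hazards and their numbers are hypothetical, chosen only so that their products coincide: an expert scoring rule ranks them as equally risky, even though laypeople typically would not.

```python
def expected_loss(probability, magnitude):
    """Risk as the expert convention scores it: probability times magnitude of consequence."""
    return probability * magnitude


# Hypothetical hazards: (probability per year, deaths if the event occurs).
hazards = {
    "rare catastrophe": (1 / 1_000, 1_000),
    "frequent small accident": (1 / 2, 2),
}

for name, (p, deaths) in hazards.items():
    print(f"{name}: expected deaths per year = {expected_loss(p, deaths):.2f}")
```

Both hazards come out at one expected death per year, which is precisely the equivalence that lay risk judgments tend not to honour.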
Scientists who try to communicate risks are aware of some of these issues, but usually (and understandably) uninformed about the psychology of risk perception (see, for instance, my posts here and here on communicating uncertainty about climate science). I’ll close with just one example. A recent International Commission on Earthquake Forecasting (ICEF) report argues that frequently updated hazard probabilities are the best way to communicate risk information to the public. Jordan, chair of the ICEF, recommends that “Seismic weather reports, if you will, should be put out on a daily basis.” Laudable as this prescription is, there are at least three problems with it.
Weather reports with probabilities of rain typically present probabilities close to neither 0 nor 1. Moreover, they usually are rounded to tenths (e.g., .2 or .6, but not precise numbers like .23162 or .62947). Most people have reasonable intuitions about mid-range probabilities such as .2 or .6. But earthquake forecasting has very low probabilities, as was the case in the lead-up to the L’Aquila event. Italian seismologists had estimated that the probability of a large earthquake in the next three days had increased from 1 in 200,000, before the earthquake swarm began, to 1 in 1,000 following the two large tremors the day before the quake.
The first problem arises from the small magnitude of these probabilities. Because people are limited in their ability to comprehend and evaluate extreme probabilities, highly unlikely events usually are either ignored or overweighted. The tendency to ignore low-probability events has been cited to account for the well-established phenomenon that homeowners tend to be under-insured against low probability hazards (e.g., earthquake, flood and hurricane damage) in areas prone to those hazards. On the other hand, the tendency to over-weight low-probability events has been used to explain the same people’s propensity to purchase lottery tickets. The point is that low-probability events either excite people out of proportion to their likelihood or fail to excite them altogether.
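One well-known formalization of the overweighting half of this pattern, not cited in the post, is the probability weighting function from Tversky and Kahneman’s (1992) cumulative prospect theory. The sketch below assumes their one-parameter form with their reported median estimate for gains, gamma = 0.61; it inflates small probabilities such as the L’Aquila figures and deflates large ones, but it does not capture the tendency to ignore very unlikely events altogether.

```python
def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman (1992) one-parameter probability weighting function.

    gamma = 0.61 is their median estimate for gains, used here only for illustration.
    Returns the decision weight a person is assumed to attach to probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    numerator = p ** gamma
    return numerator / (numerator + (1.0 - p) ** gamma) ** (1.0 / gamma)


for p in (1 / 200_000, 1 / 1_000, 0.2, 0.6, 0.9):
    print(f"p = {p:.6f} -> decision weight {tk_weight(p):.4f}")
```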
The second problem is in understanding the increase in risk from 1 in 200,000 to 1 in 1,000. Most people are readily able to comprehend the differences between mid-range probabilities such as an increase in the chance of rain from .2 to .6. However, they may not appreciate the magnitude of the difference between the two low probabilities in our example. For instance, an experimental study with jurors in mock trials found that although DNA evidence is typically expressed in terms of probability (specifically, the probability that the DNA sample could have come from a randomly selected person in the population), jurors were equally likely to convict on the basis of a probability of 1 in 1,000 as a probability of 1 in 1 billion. At the very least, the public would need some training in, and familiarity with, minuscule probabilities.
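The contrast can be made concrete with the numbers quoted in the post. The sketch below uses exact fractions so that rounding does not blur the small differences: the rain forecast triples with an absolute change of 0.4, whereas the quake probability becomes 200 times as large with an absolute change of less than 0.001, and the two DNA match probabilities differ by a factor of a million. The function name is illustrative.

```python
from fractions import Fraction


def describe_change(label, before, after):
    """Print and return the absolute difference and the ratio between two probabilities."""
    absolute = after - before
    relative = after / before
    print(
        f"{label}: {float(before):.3g} -> {float(after):.3g}; "
        f"absolute change {float(absolute):.3g}, ratio {float(relative):,.0f}x"
    )
    return absolute, relative


# Figures quoted in the post.
describe_change("rain", Fraction(2, 10), Fraction(6, 10))
describe_change("L'Aquila, next three days", Fraction(1, 200_000), Fraction(1, 1_000))
describe_change("DNA match", Fraction(1, 1_000_000_000), Fraction(1, 1_000))
```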
All this leads us to the third problem. Otway and Wynne’s “reassurance arousal paradox” is exacerbated by risk communications about extremely low-probability hazards, no matter how carefully they are crafted. Recipients of such messages will be highly suggestible, especially when the stakes are high. So, what should the threshold probability be for determining when a “don’t ignore this” message is issued? It can’t be the imbecilic Dick Cheney zero-risk threshold for terrorism threats, but what should it be instead?
Note that this is a matter for policy-makers to decide, not scientists, even though scientific input regarding potential consequences of false alarms and false reassurances should be taken into account. Criminal trials and civil lawsuits punishing the bearers of false reassurances will drive risk communicators to lower their own alarm thresholds, thereby ensuring that they will sound false alarms increasingly often (see my post about making the “wrong” decision most of the time for the “right” reasons).
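The threshold question can be framed with the classic cost-loss model from weather forecasting, a technique not discussed in the post and included here only as an illustration: protective action is worthwhile when the event probability p exceeds C/L, the cost of acting divided by the loss it would avert. Choosing C and L is the policy judgment described above. The sketch also shows why low thresholds guarantee frequent false alarms: a warning issued at probability p is, on average, followed by no event a fraction 1 − p of the time. The cost and loss values are hypothetical.

```python
def should_warn(p_event, cost_of_action, loss_averted):
    """Cost-loss rule: act when the expected loss avoided, p * L, exceeds the cost C."""
    return p_event * loss_averted > cost_of_action


def false_alarm_share(p_event):
    """Expected fraction of warnings issued at probability p that are not followed by the event."""
    return 1.0 - p_event


p_quake = 1 / 1_000  # the post's estimate for the next three days

# Hypothetical cost/loss pairs in arbitrary units; only the ratio C/L matters.
for cost, loss in [(1, 500), (1, 2_000)]:
    decision = "warn" if should_warn(p_quake, cost, loss) else "do not warn"
    print(f"C/L = {cost / loss:.5f}: {decision}")

print(f"Share of such warnings that would be false alarms: {false_alarm_share(p_quake):.3f}")
```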
Risk communication regarding low-probability, high-stakes hazards is one of the most difficult kinds of communication to perform effectively, and most of its problems remain unsolved. The L’Aquila trial probably will have an inhibitory impact on scientists’ willingness to front the media or the public. But it may also stimulate scientists and decision-makers to work together for the resolution of these problems.
It’s coming up to a year since I began this blog. In my usual fashion, I set myself the unrealistic goal of writing a post every week. This is only the 37th, so I’ve fallen short by a considerable margin. On the other hand, most of those posts have been on the order of 1500 words long, for a total of about 55,500 words thus far. That seems a fair whack of the keyboard, and it’s been fun too.
In an earlier post I proposed that because of the ways in which knowledge economies work, we increasingly live in an “ignorance society.” In the same year that Sheldon Ungar’s paper on ignorance as a public problem appeared, another paper came out by Joanne Roberts and John Armitage with the intriguing title “The Ignorance Economy.” Their stated purpose was to critique the notion of a knowledge economy via an investigation of ignorance from an economic standpoint.
As Roberts and Armitage (and many others) have said, knowledge as a commodity has several distinctive features. Once knowledge is consumed, it does not disappear and indeed its consumption may result in the development of more knowledge. The consumption of knowledge is non-zero-sum and can be non-excludable. Knowledge is a multiplier resource in this sense. Finally, knowledge is not subject to diminishing returns.
Interestingly, Roberts and Armitage do not say anything substantial about ignorance as a commodity. We already have some characterizations handy from this blog and elsewhere. Like knowledge, ignorance can be non-zero-sum and non-excludable in the sense that my being ignorant about X doesn’t prevent you from also being ignorant about X, nor does greater ignorance on my part necessarily decrease your ignorance. Ignorance also does not seem susceptible to diminishing returns. And of course, new knowledge can generate ignorance, and an important aspect of an effective knowledge-based economy is its capacity for identifying and clarifying unknowns. Even in a booming knowledge economy, ignorance can be a growth industry in its own right.
There are obvious examples of economies that could, in some sense, be called “ignorance economies.” Education and training are ignorance economies in the sense that educators and trainers make their living via a continual supply of ignoramuses who are willing to pay for the privilege of shedding that status. Likewise, governmental and corporate organizations paying knowledge experts enable those experts to make a living out of selling their expertise to those who lack it. This is simply the “underbelly” of knowledge economies, as Roberts and Armitage point out.
But what about making ignorance pay? Roberts and Armitage observe that agents in knowledge economies set about this in several ways. First, there is the age-old strategy of intellectual property protection via copyright, patents, or outright secrecy. Hastening the obsolescence of knowledge and/or skills is another strategy. Entire trades, crafts and even professions have been de-skilled or rendered obsolete. And how about that increasingly rapid deluge of updates and “upgrades” imposed on us?
A widespread observation about the knowledge explosion is that it generates an ensuing ignorance explosion, both arising from and resulting in increasing specialization. The more specialized a knowledge economy is, the greater are certain opportunities to exploit ignorance for economic gains. These opportunities arise in at least three forms. First, there are potential coordination and management roles for anyone (or anything) able to pull a large unstructured corpus of data into a usable structure or, better still, a “big picture.” Second, making sense of data has become a major industry in its own right, giving rise to ironically specialized domains of expertise such as statistics and information technology.
Third, Roberts and Armitage point to the long-established trend for consumer products to require less knowledge for their effective use. So consumers are enticed to become more ignorant about how these products work, how to repair or maintain them, and how they are produced. You don’t have to be a Marxist to share a cynical but wry guffaw with Roberts and Armitage as they confess, along with the rest of us, to writing their article using a computer whose workings they are happily ignorant about. One must admit that this is an elegant, if nihilistic, solution to Sheldon Ungar’s problem that the so-called information age has made it difficult to agree on a human-sized common stock of knowledge that we all should share.
Oddly, Roberts and Armitage neglect two additional (also age-old) strategies for exploiting ignorance for commercial gain and/or political power. First, an agent can spread disinformation and, if successful, make money or power out of deception. Second, an agent can generate uncertainty in the minds of a target population, and leverage wealth and/or power out of that uncertainty. Both strategies have numerous exemplars throughout history, from legitimate commercial or governmental undertakings to terrorism and organized crime.
Roberts and Armitage also neglect the kinds of ignorance-based “social capital” that I have written about, both in this blog and elsewhere. Thus, for example, in many countries the creation and maintenance of privacy, secrecy and censorship engage economic agents of considerable size in both the private and public sectors. All three are, of course, ignorance arrangements. Likewise, trust-based relations have distinct economic advantages over relations based on assurance through contracts, and trust is partially an ignorance arrangement.
More prosaically, do people make their living by selling their ignorance? I once met a man who claimed he did so, primarily on a consulting basis. His sales-pitch boiled down to declaring “If you can make something clear to me, you can make it clear to anyone.” He was effectively making the role of a “beta-tester” pay off. Perhaps we may see the emergence of niche markets for specific kinds of ignoramuses.
But there already is, arguably, a sustainable market for generalist ignoramuses. Roberts and Armitage moralize about the neglect by national governments of “regional ignorance economies,” by which they mean subpopulations of workers lacking any qualifications whatsoever. Yet these are precisely the kinds of workers needed to perform jobs for which everyone else would be over-qualified and, knowledge economy or not, such jobs are likely to continue abounding for some time to come.
I’ve watched seven children on my Australian middle-class suburban cul-de-sac grow to adulthood over the past 14 years. Only one of them has gone to university. Why? Well, for example, one of them realized he could walk out of school after 10th grade, go to the mines, drive a big machine and immediately command a $90,000 annual salary. The others made similar choices; theirs were not as lucrative as his, but they still compared favorably in the short term with those of their age-mates heading off to uni to rack up debts of tens of thousands of dollars. The recipe for maintaining a ready supply of generalist ignoramuses is straightforward: Make education or training sufficiently unaffordable and difficult, and/or unqualified work sufficiently remunerative and easy. An anti-intellectual mainstream culture helps, too, by the way.
I started this post in Hong Kong airport, having just finished one conference and heading to Innsbruck for another. The Hong Kong meeting was on psychometrics and the Innsbruck conference was on imprecise probabilities (believe it or not, these topics actually do overlap). Anyhow, Annemarie Zand Scholten gave a neat paper at the math psych meeting in which she pointed out that, contrary to a strong intuition that most of us have, introducing and accounting for measurement error can actually sharpen up measurement. Briefly, the key idea is that an earlier “error-free” measurement model of, say, human comparisons between pairs of objects on some dimensional characteristic (e.g., length) could only enable researchers to recover the order of object length but not any quantitative information about how much longer people were perceiving one object to be than another.
I’ll paraphrase (and amend slightly) one of Annemarie’s illustrations of her thesis, to build intuition about how her argument works. In our perception lab, we present subjects with pairs of lines and ask them to tell us which line they think is the longer. One subject, Hawkeye Harriet, perfectly picks the longer of the two lines every time—regardless of how much longer one is than the other. Myopic Myra, on the other hand, has imperfect visual discrimination and thus sometimes gets it wrong. But she’s less likely to choose the wrong line if the two lines’ lengths considerably differ from one another. In short, Myra’s success-rate is positively correlated with the difference between the two line-lengths whereas Harriet’s uniformly 100% success rate clearly is not.
Is there a way that Myra’s success- and error-rates could tell us exactly how long each object is, relative to the others? Yes. Let $p_{ij}$ be the probability that Myra picks the $i$th object as longer than the $j$th object, and $p_{ji} = 1 - p_{ij}$ be the probability that she picks the $j$th object as longer than the $i$th. If the $i$th object has length $L_i$ and the $j$th object has length $L_j$, then if $p_{ij}/p_{ji} = L_i/L_j$, Myra’s choice-rates perfectly mimic the ratio of the two objects’ lengths. This neat relationship works because a characteristic such as length has an absolute zero, so we can meaningfully compare lengths by taking ratios.
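To see what the ratio condition pins down, note that combining $p_{ij}/p_{ji} = L_i/L_j$ with $p_{ji} = 1 - p_{ij}$ gives $p_{ij} = L_i/(L_i + L_j)$, which is the Bradley-Terry-Luce choice rule. The sketch below assumes an idealized Myra who follows that rule exactly; the line lengths are hypothetical. Inverting her choice rate recovers each length ratio, whereas Harriet’s rates of 0 or 1 only reveal which line is longer.

```python
from itertools import combinations


def myra_choice_prob(L_i, L_j):
    """P(Myra calls i longer than j), assuming p_ij / p_ji = L_i / L_j and p_ji = 1 - p_ij."""
    return L_i / (L_i + L_j)


def harriet_choice_prob(L_i, L_j):
    """Harriet never errs: she picks the longer line with certainty."""
    return 1.0 if L_i > L_j else 0.0


def recovered_ratio(p_ij):
    """Length ratio L_i / L_j implied by a choice rate p_ij (requires 0 < p_ij < 1)."""
    return p_ij / (1.0 - p_ij)


lengths = {"a": 3.0, "b": 5.0, "c": 12.0}  # hypothetical line lengths in cm

for i, j in combinations(lengths, 2):
    p_myra = myra_choice_prob(lengths[i], lengths[j])
    p_harriet = harriet_choice_prob(lengths[i], lengths[j])
    print(
        f"{i} vs {j}: Myra p={p_myra:.3f} -> ratio {recovered_ratio(p_myra):.3f} "
        f"(true {lengths[i] / lengths[j]:.3f}); Harriet p={p_harriet:.0f}"
    )
```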
How about temperature? This is slightly trickier, because if we’re using a popular scale such as Celsius or Fahrenheit then the zero-point of the scale isn’t absolute in the sense that length has an absolute zero (i.e., you can have Celsius and Fahrenheit readings below zero, and each scale’s zero-point differs from the other). Thus, 60 degrees Fahrenheit is not twice as warm as 30 degrees Fahrenheit. However, the differences between temperatures can be compared via ratios. For instance, 40 degrees F is twice as far from 20 degrees F as 10 degrees F is.
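A short check of the Fahrenheit example: converting to Celsius (another scale with an arbitrary zero) turns the apparent 2:1 ratio of 60 °F to 30 °F into -14, but leaves the ratio of distances from a common reference untouched, because an affine change of scale cancels out of differences. The function names are illustrative.

```python
def f_to_c(f):
    """Fahrenheit to Celsius: an affine rescaling with a different zero point."""
    return (f - 32.0) * 5.0 / 9.0


def ratio(a, b):
    """Plain ratio; only meaningful on a scale with an absolute zero."""
    return a / b


def distance_ratio(a, b, ref):
    """How many times farther a is from ref than b is; unchanged by affine rescaling."""
    return abs(a - ref) / abs(b - ref)


# Plain ratios depend on where each scale puts its zero...
print(ratio(60, 30), ratio(f_to_c(60), f_to_c(30)))
# ...but ratios of distances from a shared reference point do not.
print(distance_ratio(40, 10, 20), distance_ratio(f_to_c(40), f_to_c(10), f_to_c(20)))
```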
We just need a common “reference” object against which to compare each of the others. Suppose we’re asking Myra to choose which of a pair of objects is the warmer. Assuming that Myra’s choices are transitive, there will be an object she chooses less often than any of the others in all of the paired comparisons. Let’s refer to that object as the $J$th object. Now suppose the $i$th object has temperature $T_i$, the $j$th object has temperature $T_j$, and the $J$th object has temperature $T_J$, which is lower than both $T_i$ and $T_j$. Then if Myra’s choice-rate ratio is
$$\frac{p_{iJ}}{p_{jJ}} = \frac{T_i - T_J}{T_j - T_J},$$
she functions as a perfect measuring instrument for temperature comparisons between the $i$th and $j$th objects. Again, Hawkeye Harriet’s choice-rates will be $p_{iJ} = 1$ and $p_{jJ} = 1$ no matter what $T_i$ and $T_j$ are, so her ratio always is 1.
If we didn’t know what the ratios of those lengths or temperature differences were, Myra would be a much better measuring instrument than Harriet even though Harriet never makes mistakes. Are there such situations? Yes, especially when it comes to measuring mental or psychological characteristics to which we have no direct access, such as subjective sensation, mood, or mental task difficulty.
Which of 10 noxious stimuli is the most aversive? Which of 12 musical rhythms makes you feel the most joyous? Which of 20 types of puzzle is the most difficult? In paired comparisons between every possible pair of stimuli, rhythms or puzzles, Hawkeye Harriet will pick what for her is the correct member of each pair every time, so all we’ll get from her is the rank-order of the stimuli, rhythms and puzzles. Myopic Myra will less reliably and less accurately choose what for her is the correct member, but her choice-rates will be correlated with how dissimilar the two members of each pair are. We’ll recover much more precise information about the underlying structure of the stimulus set from error-prone Myra.
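With many stimuli, Myra’s choice counts can be turned into a scale by fitting a Bradley-Terry model, the choice rule implied by the ratio condition for lengths discussed earlier. This is an illustration under stated assumptions rather than Annemarie’s method: it uses the classic minorization-maximization (Zermelo) iteration, hypothetical aversiveness values, and idealized expected counts in place of real data. For Harriet’s data, in which every pair is decided the same way every time, the maximum-likelihood scale values do not exist (the estimates diverge), so only her rank order can be recovered.

```python
from itertools import combinations


def fit_bradley_terry(wins, iterations=1000):
    """Estimate Bradley-Terry scale values from a square win-count matrix.

    wins[i][j] is how often item i was chosen over item j. Assumes the data are
    strongly connected (every item is linked to every other by a chain of wins),
    which guarantees a finite solution. The scale is identified only up to a
    constant factor, so estimates are normalized to sum to 1.
    """
    n = len(wins)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (scores[i] + scores[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        total = sum(updated)
        scores = [s / total for s in updated]
    return scores


# Hypothetical "true" aversiveness of four stimuli, on a ratio scale.
true_values = [1.0, 2.0, 4.0, 8.0]
trials_per_pair = 100
n = len(true_values)

# Idealized data: expected choice counts for a Myra who follows p_ij = v_i / (v_i + v_j).
wins = [[0.0] * n for _ in range(n)]
for i, j in combinations(range(n), 2):
    p_ij = true_values[i] / (true_values[i] + true_values[j])
    wins[i][j] = trials_per_pair * p_ij
    wins[j][i] = trials_per_pair * (1.0 - p_ij)

estimates = fit_bradley_terry(wins)
scale = sum(true_values)
print([round(e * scale, 3) for e in estimates])  # compare with true_values
```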
Annemarie’s point about measurement is somewhat related to another fascinating phenomenon known as stochastic resonance. Briefly paraphrasing the Wikipedia entry for stochastic resonance (SR), SR occurs when a measurement or signal-detecting system’s signal-to-noise ratio increases as a moderate amount of noise is added to the incoming signal or to the system itself. SR usually is observed either in bistable or sub-threshold systems. Too little noise results in the system being insufficiently sensitive to the signal; too much noise overwhelms the signal. Evidence for SR has been found in several species, including humans. For example, a 1996 paper in Nature reported a demonstration that subjects asked to detect a sub-threshold impulse via mechanical stimulation of a fingertip maximized the percentage of correct detections when the signal was mixed with a moderate level of noise. One way of thinking about the optimized version of Myopic Myra as a measurement instrument is to model her as a “noisy discriminator,” with her error-rate induced by an optimal random noise-generator mixed with an otherwise error-free discriminating mechanism.
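A toy threshold detector shows the inverted-U pattern. This is a sketch under assumed values, not a model of the 1996 fingertip experiment: a sub-threshold signal of 0.8 against a detection threshold of 1.0, equal numbers of signal and no-signal trials, and zero-mean Gaussian noise. With no noise the detector never fires and scores 50% correct; moderate noise lifts accuracy; heavy noise drags it back toward chance.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def proportion_correct(noise_sd, signal=0.8, threshold=1.0):
    """Accuracy of a detector that reports "signal" whenever its input exceeds threshold.

    Assumed setup (illustrative only): half the trials contain the signal, half do
    not, and zero-mean Gaussian noise with standard deviation noise_sd is added.
    """
    if noise_sd == 0:
        hit = 1.0 if signal > threshold else 0.0
        correct_rejection = 1.0 if threshold >= 0 else 0.0
    else:
        hit = 1.0 - normal_cdf((threshold - signal) / noise_sd)
        correct_rejection = normal_cdf(threshold / noise_sd)
    return 0.5 * (hit + correct_rejection)


for sd in (0.0, 0.1, 0.3, 0.55, 1.0, 3.0, 10.0):
    print(f"noise sd {sd:>5}: proportion correct {proportion_correct(sd):.3f}")
```

Under these assumptions accuracy peaks somewhere around a noise standard deviation of 0.5 to 0.6.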
Hi, I’m back again after a few weeks’ travel (presenting papers at conferences). I’ve already posted material on this blog about the “ignorance explosion.” Numerous writings have taken up the theme that there is far too much relevant information for any of us to learn and process, and that the problem is worsening despite the benefits of the internet and effective search-engines. We all have had to become more specialized and fragmented in our knowledge-bases than our forebears, and as a result many of us find it very difficult to agree with one another about the “essential” knowledge that every child should receive in their education and that every citizen should possess.
Well, here is a modest proposal for one such essential: We should all become expert about experts and expertise. That is, we should develop meta-expertise.
We can’t know everything, but knowing an expert when we see one, being able to tell the difference between an expert and an impostor, and knowing what it takes to become an expert can guide our search for assistance in all things about which we’re ignorant. A meta-expert should:
- Know the broad parameters of and requirements for attaining expertise;
- Be able to distinguish a genuine expert from a pretender or a charlatan;
- Know when expertise is and when it is not attainable in a given domain;
- Possess effective criteria for evaluating expertise, within reasonable limits; and
- Be aware of the limitations of specialized expertise.
Let’s start with Wikipedia, that strongly democratic source of expertise, and its take on experts:
“In many domains there are objective measures of performance capable of distinguishing experts from novices: expert chess players will almost always win games against recreational chess players; expert medical specialists are more likely to diagnose a disease correctly; etc.”
That said, the Wikipedia entry also raises a potentially vexing point, namely that “expertise” may come down to merely a matter of consensus, often dictated by the self-same “experts.” Examples readily spring to mind in areas where objective measures are hard to come by, such as the arts. But consider also domains where objective measures may be obtainable but not assessable by laypeople. Higher mathematics is a good example. Only a tiny group of people on the planet were capable of assessing whether Andrew Wiles really had proven Fermat’s Last Theorem. The rest of us have to take their word for it.
A crude but useful dichotomy splits views about expertise into two camps: Constructivist and performative. The constructivist view emphasizes the influence of communities of practice in determining what expertise is and who is deemed to have it. The performative view portrays expertise as a matter of learning through deliberate practice. Both views have their points, and many domains of expertise have elements of both. Even domains where objective indicators of expertise are available can have constructivist underpinnings. A proficient modern-day undergraduate physics student would fail late 19th-century undergraduate physics exams, and experienced medical practitioners emigrating from one country to another may find their qualifications and experience unrecognized by their adopted country.
What are the requirements for attaining deep expertise? Two popular criteria are talent and deliberate practice. As for deliberate practice, a much-discussed rule of thumb is the “10,000 hour rule.” This rule was popularized in Malcolm Gladwell’s book Outliers, and some authors misattribute it to him. It actually dates back to studies of chess masters in the 1970s (see Ericsson, K. A., R. Th. Krampe, and C. Tesch-Römer, 1993), and its generalizability to other domains is still debatable. Nevertheless, the 10K rule has some merit, and unfortunately it has been routinely ignored in many psychological studies comparing “experts” with novices, where the “experts” often are undergraduates who have been given a few hours’ practice on a relatively trivial task.
The 10K rule can be a useful guide, but there’s an important caveat. It may be a necessary condition for deep expertise, but it is by no means a sufficient one. At least three other conditions have to be met: the practice must be deliberate, it must be effective, and it must take place in a domain where deep expertise is attainable. Despite this quite simple line of reasoning, plenty of published authors have committed the error of viewing the 10K rule as both necessary and sufficient. Gladwell didn’t make this mistake, but Jane McGonigal’s recent book on video and computer games devotes considerable space to the notion that because gamers are spending upwards of 10K hours playing games, they must be attaining deep “expertise” of some kind. Perhaps some may be, provided they are playing games of sufficient depth. But many will not. (BTW, McGonigal’s book is worth a read despite her over-the-top optimism about how games can save the world; and take a look at her game-design collaborator Bogost’s somewhat dissenting review of her book.)
Back to the caveats. First, practice without deliberation is useless. Having spent approximately 8 hours every day sleeping for the past 61 years (178,120 hours) hasn’t made me an expert on sleep. Likewise, deliberate but ineffective practice methods deny us top-level expertise. Early studies of Morse Code experts demonstrated that mere deliberate practice did not guarantee the best performance; specific training regimes were required instead. Autodidacts with insight and aspirations to attain the highest performative levels in their domains eventually realise how important getting the “right” coaching or teaching is.
Finally, there is the problem of determining whether effective, deliberate practice yields deep expertise in a given domain. The domain may simply not be “deep” enough. In games of strategy, tic-tac-toe is a clear example of insufficient depth, checkers is a less obvious but still clear example, whereas chess and go clearly have sufficient depth.
Tic-tac-toe aside, are there domains that possess depth where deep expertise nevertheless is unattainable? There are, at least, some deeply complex domains where “experts” perform no better than less-trained individuals or simple algorithms. Psychotherapy is one such domain. A plethora of studies demonstrate that clinical psychologists’ predictions of patient outcomes are worse than those of simple linear regression models (cf. Dawes’ searing indictment in his 1994 book) and that experts’ decisions are sometimes no more accurate than beginners’ decisions and simple decision aids. Similar results have been reported for financial planners and political experts. In his 2005 book on “expert” predictions, Philip Tetlock finds that many so-called experts perform no better than chance in predicting political events, financial trends, and so on.
What can explain the absence of deep expertise in these instances? Tetlock attributes experts’ poor performance to two factors, among others: Hyperspecialization and overconfidence. “We reach the point of diminishing marginal predictive returns for knowledge disconcertingly quickly,” he reports. “In this age of academic hyperspecialization, there is no reason for supposing that contributors to top journals—distinguished political scientists, area study specialists, economists, and so on—are any better than journalists or attentive readers of the New York Times in ‘reading’ emerging situations.” And the more famous the forecaster the more overblown the forecasts. “Experts in demand,” Tetlock says, “were more overconfident than their colleagues who eked out existences far from the limelight.” Tetlock also claims that cognitive style counts: “Foxes” tend to outperform “hedgehogs.” These terms are taken from Isaiah Berlin’s popular essay: Foxes know a little about lots of things, whereas hedgehogs know one big thing.
Another contributing factor may be a lack of meta-cognitive insight on the part of the experts. A hallmark of expertise is ignoring (not ignorance). This proposition may sound less counter-intuitive if it’s rephrased to say that experts know what to ignore. In an earlier post I mentioned Mary Omodei and her colleagues’ chapter in a 2005 book on professionals’ decision making in connection with this claim. Their chapter opens by noting a widespread assumption that domain experts also know how to allocate their cognitive resources optimally when making judgments or decisions in their domain. Their research with expert fire-fighting commanders casts doubt on this assumption.
The key manipulations in the Omodei simulated fire-fighting experiments determined the extent to which commanders had unrestricted access to “complete” information about the fires, weather conditions, and other environmental matters. They found that commanders performed more poorly when information access was unrestricted than when they had to request information from subordinates. They also found that commanders performed more poorly when they believed all available information was reliable than when they believed that some of it was unreliable. The disquieting implication of these findings is that domain expertise doesn’t include meta-cognitive expertise.
Cognitive biases and styles aside, another contributing set of factors may be the characteristics of the complex, deep domains themselves that render deep expertise very difficult to attain. Here is a list of tests you can apply to such domains by way of evaluating their potential for the development of genuine expertise:
- Stationarity? Is the domain stable enough for generalizable methods to be derived? In chaotic systems, long-range prediction is impossible because of sensitivity to initial conditions. In human history, politics and culture, the underlying processes may not be stationary at all.
- Rarity? When it comes to prediction, rare phenomena simply are difficult to predict (see my post on making the wrong decisions most of the time for the right reasons).
- Observability? Can the outcomes of predictions or decisions be directly or immediately observed? For example, in psychology direct observation of mental states is nearly impossible, and in climatology the consequences of human interventions will take a very long time to unfold.
- Objective or even impartial criteria? For instance, what is “good,” “beautiful,” or even “acceptable” in domains such as music, dance or the visual arts? Are such domains irreducibly subjective and culture-bound?
- Testability? Are there clear criteria for when an expert has succeeded or failed? Or is there too much “wiggle-room” to be able to tell?
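The point about chaotic systems under the stationarity test can be made concrete with the logistic map, a standard textbook example of chaos (my choice of illustration, not one drawn from the studies above). The minimal Python sketch below starts two trajectories that differ by one part in ten billion and tracks how far apart they drift; `logistic_orbit` and its parameter values are hypothetical names and settings chosen for illustration.

```python
def logistic_orbit(x0, r=4.0, steps=50):
    """Iterate the logistic map x -> r * x * (1 - x), returning every value from x0 on."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting points that differ by 1e-10, far below any realistic measurement error.
a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)

for t in range(0, 51, 10):
    print(f"step {t:2d}: |difference| = {abs(a[t] - b[t]):.3g}")
```

At r = 4 the gap roughly doubles on average at each step (and can never grow by more than a factor of 4, the map's maximum slope on [0, 1]), so it stays tiny for the first few steps but is as large as the values themselves within a few dozen iterations. A perfectly known rule still yields no useful long-range forecast.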
Finally, here are a few tests that can be used to evaluate the “experts” in your life:
- Credentials: Does the expert possess credentials that have involved testable criteria for demonstrating proficiency?
- Walking the walk: Is the expert an active practitioner in their domain (versus being a critic or a commentator)?
- Overconfidence: Ask your expert to make yes-no predictions in their domain of expertise, and before any of these predictions can be tested, ask them to estimate the percentage of time they’re going to be correct. Compare that estimate with the resulting percentage correct. If their estimate was too high, then your expert may suffer from overconfidence.
- Confirmation bias: We’re all prone to this, but some more so than others. Is your expert reasonably open to evidence or viewpoints contrary to their own views?
- Hedgehog-Fox test: Tetlock found that Foxes were better-calibrated and more able to entertain self-disconfirming counterfactuals than hedgehogs, but allowed that hedgehogs can occasionally be “stunningly right” in a way that foxes cannot. Is your expert a fox or a hedgehog?
- Willingness to own up to error: Bad luck is a far more popular explanation for being wrong than good luck is for being right. Is your expert balanced, i.e., equally critical, when assessing their own successes and failures?
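The overconfidence test in the list above amounts to a simple calibration check. Here is a minimal Python sketch, assuming you have recorded the expert’s stated percentage and whether each yes-no prediction came true; the function names are hypothetical. The second function adds one step the test leaves implicit: with only a handful of predictions a shortfall could be bad luck, so it computes how likely a score this low would be if the expert’s stated accuracy were true (a binomial tail probability).

```python
from math import comb

def calibration_gap(stated_pct, outcomes):
    """Stated percentage correct minus observed percentage correct.

    outcomes: list of booleans, True where a yes-no prediction came true.
    A positive gap points toward overconfidence, a negative one toward underconfidence.
    """
    if not outcomes:
        raise ValueError("need at least one resolved prediction")
    observed_pct = 100.0 * sum(outcomes) / len(outcomes)
    return stated_pct - observed_pct

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the chance of k or fewer hits
    out of n if the expert's true accuracy really were p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical record: the expert claims 85% but gets 12 of 20 predictions right.
record = [True] * 12 + [False] * 8
print(calibration_gap(85, record))  # 25.0
print(prob_at_most(sum(record), len(record), 0.85))
```

A large gap together with a small tail probability (well under 5% in this made-up example) is reasonable evidence of overconfidence rather than mere bad luck.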
The title of this post, “You can never plan the future by the past,” is of course a famous quotation from Edmund Burke. This is a personal account of an attempt to find an appropriate substitute for such planning. My siblings and I persuaded our parents that the best option for financing their long-term in-home care was a reverse mortgage. At first glance, the problem seems fairly well-structured: Choose the best reverse mortgage setup for my elderly parents. After all, this is the kind of problem for which economists and actuaries claim to have appropriate methods.
There are two viable strategies for utilizing the loan from a reverse mortgage: Take out a line of credit from which my parents can draw as they wish, or receive a tenured (fixed) schedule of monthly payments into their nominated savings account. The line of credit (LOC) option’s main attraction is its flexibility. However, the LOC runs out when the equity in my parents’ property is exhausted, whereas the tenured payments (TP) continue as long as they live in their home. So if either of them is sufficiently long-lived, then the TP could be the safer option. On the other hand, the LOC may be more robust against unexpected expenses (e.g., medical emergencies or house repairs). Of course, one can opt for a mixture of TP and LOC.
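The trade-off between the two options can be illustrated with a deliberately crude toy model in Python. It assumes a line of credit of fixed size drawn at a constant monthly amount and a fixed tenured payment, and it ignores interest, fees, property values, and any growth in the credit limit; the dollar figures and function names are hypothetical.

```python
def loc_income(months_lived, loc_limit, monthly_draw):
    """Total drawn from a line of credit at a constant monthly draw.

    Toy assumptions: the limit is fixed (no interest, fees, or limit growth)
    and draws stop once it is exhausted.
    """
    months_funded = min(months_lived, int(loc_limit // monthly_draw))
    return months_funded * monthly_draw

def tp_income(months_lived, tenure_payment):
    """Tenured payments arrive every month the borrower remains in the home."""
    return months_lived * tenure_payment

def months_unfunded_loc(months_lived, loc_limit, monthly_draw):
    """Months of remaining life after the line of credit is exhausted."""
    return max(0, months_lived - int(loc_limit // monthly_draw))

# Hypothetical figures: a $200K line drawn at $2.5K/month vs. $1.5K/month tenured.
for months in (36, 120, 180):
    print(months,
          loc_income(months, 200_000, 2_500),
          tp_income(months, 1_500),
          months_unfunded_loc(months, 200_000, 2_500))
```

With these made-up numbers the line of credit delivers more money over a short horizon but is exhausted after 80 months, whereas the tenured payments keep arriving; at 180 months the tenured option has paid more and the line of credit has left 100 months unfunded.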
So, this sounds like a standard optimization problem: What’s the optimal mix of TP and LOC? Here we run into the first hurdle: “Optimal” by what criteria? One criterion is to maximize the expected remaining equity in the property. This criterion might be appealing to their offspring, but it doesn’t do my parents much good. Another criterion that should appeal to my parents is maximizing the expected funds available to them. Fortunately, my siblings and I are more concerned with our parents’ welfare than with what we’d get from the equity, so we’re happy to go with the second criterion. Nevertheless, it’s worth noting that this issue poses a deeper problem in general: How would a family with interests in both criteria come up with an appropriate weighting for each of them, especially if family members disagreed on the importance of these criteria?
Meanwhile, having settled on an optimization criterion, the next step would seem to be computing the expected payout to my parents for various mixtures of TP and LOC. But wait a minute. Surely we also should be worried about the possibility that some financial exigency could exhaust their funds altogether. So, we could arguably consider a third criterion: Minimizing the probability of their running out of funds. So now we encounter a second hurdle: How do we weigh up maximizing expected payout to our parents against the likelihood that their funds could run out? It might seem as if maximizing payout would also minimize that probability, but this is not necessarily so. A strategy that maximized expected payout could also increase the variability of the available funds over time so that the probability of ruin is increased.
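The claim that maximizing expected payout need not minimize the probability of ruin can be checked with a small Monte Carlo sketch in Python. The two strategies below are hypothetical stand-ins, not reverse-mortgage mixes: each adds a random net cash flow to a balance every year, and the “volatile” one has the higher average flow but much larger swings. Units, parameter values, and the function name `simulate_balance` are assumptions for illustration.

```python
import random

def simulate_balance(mean_flow, sd_flow, start=20.0, years=10, runs=10_000, seed=1):
    """Monte Carlo of a balance that receives a random net cash flow each year.

    Returns (expected final balance, probability the balance ever drops below zero).
    Units are arbitrary (think $K); all parameter values are hypothetical.
    """
    rng = random.Random(seed)
    total_final, ruined = 0.0, 0
    for _ in range(runs):
        balance, went_negative = start, False
        for _ in range(years):
            balance += rng.gauss(mean_flow, sd_flow)
            if balance < 0:
                went_negative = True
        total_final += balance
        ruined += went_negative
    return total_final / runs, ruined / runs

steady = simulate_balance(mean_flow=2.0, sd_flow=3.0)
volatile = simulate_balance(mean_flow=3.0, sd_flow=15.0)
print("steady:   E[final] = %.1f, P(ruin) = %.3f" % steady)
print("volatile: E[final] = %.1f, P(ruin) = %.3f" % volatile)
```

In runs like this the volatile strategy ends with the higher average balance and also the higher chance of the balance dipping below zero at some point, which is exactly the tension between maximizing expected funds and minimizing the probability of running out.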
Then there are the unknowns: How long our parents might live, what expenses they might incur (e.g., medical or in-home care), inflation, the behaviour of the LIBOR index that determines the interest rate on what is drawn down from the mortgage, and appreciation or depreciation of the property value. It is possible to come up with plausible-looking models for each of these by using standard statistical tools, and that’s exactly what I did.
I pulled down life-expectancy tables for American men and women born when my parents were born, more than two decades of monthly data on inflation in the USA, a similar amount of monthly data on the LIBOR, and likewise for real-estate values in the area where my parents live. I fitted several “lifetime” distributions to the relevant parts of the life-expectancy tables to model the probability of my parents living 1, 2, 3, … years longer given that they have survived to their mid-80’s, and arrived at a model that fitted the data very well. I modeled the inflation, LIBOR and real-estate data with standard time-series (ARIMA) models whose squared correlations with the data were .91, .98, and .91 respectively: all very good fits.
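The post does not say which lifetime distributions were fitted. As one hedged illustration of modeling remaining years given survival to the mid-80s, the Python sketch below uses a Gompertz model, in which the annual mortality hazard grows exponentially with age. The two parameter values are hypothetical placeholders, not estimates from any life table, and the function names are my own.

```python
import math
import random

# Hypothetical Gompertz parameters, not fitted to any actual life table:
HAZARD_NOW = 0.09  # annual mortality hazard at the current age (mid-80s)
GROWTH = 0.10      # exponential growth rate of the hazard per year of age

def survival_prob(t, hazard_now=HAZARD_NOW, growth=GROWTH):
    """P(alive t years from now | alive now) under a Gompertz hazard."""
    return math.exp(-(hazard_now / growth) * (math.exp(growth * t) - 1.0))

def sample_remaining_years(rng, hazard_now=HAZARD_NOW, growth=GROWTH):
    """Inverse-transform draw: solve survival_prob(t) = u for t, with u uniform."""
    u = 1.0 - rng.random()  # uniform on (0, 1], which avoids log(0)
    return math.log(1.0 - growth * math.log(u) / hazard_now) / growth

def expected_remaining_years(step=0.01, horizon=60.0, **params):
    """Integrate the survival curve with the trapezoid rule."""
    n = round(horizon / step)
    ends = 0.5 * (survival_prob(0.0, **params) + survival_prob(n * step, **params))
    inner = sum(survival_prob(i * step, **params) for i in range(1, n))
    return (ends + inner) * step

rng = random.Random(7)
draws = [sample_remaining_years(rng) for _ in range(20_000)]
print(f"P(alive in 5 years) = {survival_prob(5):.3f}")
print(f"mean remaining years: simulated {sum(draws) / len(draws):.2f}, "
      f"integrated {expected_remaining_years():.2f}")
```

Conditioning on survival to the current age is handled by expressing the hazard relative to that age (`HAZARD_NOW`), and the inverse-transform sampler is the piece a simulation would call to decide how many years each simulated future runs. The simulated and integrated means should agree closely, which is a cheap sanity check on the sampler.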
Finally, my brothers and sisters-in-law obtained the necessary information from my mother regarding our parents’ expenses in the recent past, their income from pensions and so on, and we made some reasonable forecasts of additional expenses that we can foresee in the near term. The transition in this post from “I” to “we” is crucial. This was very much a joint effort. In particular, my youngest brother’s sister-in-law made most of the running on determining the ins and outs of reverse mortgages. She has a terrifically analytical intelligence, and we were able to cross-check one another’s perceptions, intuitions, and calculations.
Armed with all of this information and well-fitted models, we might think that all we need to do is run a large enough batch of simulations of the future for each reverse-mortgage scenario under consideration to get reliable estimates of expected payout, expected equity, the probability of ruin, and so on. The inflation model would simulate fluctuations in expenses, the LIBOR model would do the same for interest rates, the real-estate model for the property value, and the life-expectancy model for how long our parents would live.
But there are at least two flaws in my approach. First, it assumes that my parents’ life-spans can best be estimated by considering them as if they are randomly chosen from the population of American men and women born when they were born who have survived to their mid-80’s. Should I take additional characteristics about them into account and base my estimates on only those who share those characteristics as well as their nation and birth-year? What about diet, or body-mass index, or various aspects of their medical histories? This issue is known as the reference-class problem, and it bedevils every school of statistical inference.
What did I do about this? I fudged my life-expectancy model to be “conservative,” i.e., so that it assumes my parents have a somewhat longer life-span than the original model suggests. In short, I tweaked my model as a risk-averse agent would—The longer my parents live, the greater the risk that they will run short of funds.
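The post does not say exactly how the life-expectancy model was made “conservative.” One simple way, sketched below in Python as an assumption, is to scale a Gompertz mortality hazard by a multiplier below 1, which raises the survival probability at every horizon and so lengthens expected remaining life. The parameters and the 0.8 multiplier are hypothetical.

```python
import math

def mean_remaining_years(hazard_now=0.09, growth=0.10, hazard_multiplier=1.0,
                         step=0.01, horizon=60.0):
    """Expected remaining lifetime (trapezoid integral of the survival curve) under a
    Gompertz hazard scaled by hazard_multiplier; values below 1 lower mortality at every age."""
    h = hazard_now * hazard_multiplier

    def surv(t):
        return math.exp(-(h / growth) * (math.exp(growth * t) - 1.0))

    n = round(horizon / step)
    total = 0.5 * (surv(0.0) + surv(n * step)) + sum(surv(i * step) for i in range(1, n))
    return total * step

baseline = mean_remaining_years()
conservative = mean_remaining_years(hazard_multiplier=0.8)  # hypothetical 20% cut in mortality
print(f"baseline {baseline:.2f} years, conservative {conservative:.2f} years")
```

Because the multiplier lowers the hazard at every age, the conservative survival curve lies above the baseline everywhere, so anything that worsens with longevity, such as the risk of running short of funds, is evaluated under less favourable assumptions.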
The second flaw in my approach is more fundamental. It assumes that the future is going to be just like the past. And before anyone says anything, yes, I’ve read Taleb’s The Black Swan (and was aware of most of the material he covered before reading his book), and yes, I’m aware of most criticisms that have been raised against the kind of models I’ve constructed. The most problematic assumption in my models is what is called stationarity, i.e., that the process driving the ups and downs of, say, the LIBOR index has stable characteristics. There were clear indications that the real-estate market fluctuations in the area where my parents live do not resemble a stationary process, and therefore I should not trust my ARIMA model very much despite its high correlation with the data.
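A high squared correlation between a fitted time-series model and the data says little about stationarity. The Python sketch below illustrates this with an AR(1) model (the simplest ARIMA special case, used here as a stand-in for the models in the post) fitted by ordinary least squares to a simulated random walk, a textbook nonstationary process. All names and settings are hypothetical.

```python
import random

def random_walk(n, start=100.0, seed=0):
    """Simulate a random walk: each value is the previous one plus N(0, 1) noise."""
    rng = random.Random(seed)
    x, path = start, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1].

    Returns (c, phi, r_squared), where r_squared is the squared correlation
    between the one-step-ahead fitted values and the observed values.
    """
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    syy = sum((q - mean_curr) ** 2 for q in curr)
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi, sxy * sxy / (sxx * syy)

walk = random_walk(1000)
c, phi, r2 = fit_ar1(walk)
print(f"phi = {phi:.3f}, squared correlation = {r2:.3f}")
```

The fit looks excellent because each value is mostly the previous value plus noise, yet the process has no stable mean to revert to, and the estimated phi sits close to 1, the boundary of stationarity. A formal check such as an augmented Dickey-Fuller test (for example `adfuller` in `statsmodels.tsa.stattools`) probes the stationarity assumption more directly, though such tests have limited power on short series.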
Let me also point out the difference between my approach and the materials provided to us by potential lenders and the HUD counsellor. Their scenarios and forecasts are one-shot spreadsheets that don’t simulate my parents’ expenses, the impact of inflation, or fluctuations in real-estate markets. Indeed, the standard assumption about the latter in their spreadsheets is a constant appreciation in property value of 4% per year.
My simulations are literally equivalent to 10,000 spreadsheets for each scenario, each spreadsheet an appropriate random sample from an uncertain future, and they can be tweaked to include possibilities such as substantial real-estate downturns. I also incorporated random “shock” expenditures on the order of $5K to $75K to see how vulnerable each scenario was to unexpected expenses.
The upshot of all this was that the mix of LOC and TP had a substantial effect on the probability of running out of money, but not a large impact on expected balance or equity (the other factors had large impacts on those). So at least we could home in on a robust mix of LOC and TP, one that would have a lower risk of running out of money than others. This criterion became the primary driver in our choice. We also can monitor how our parents’ situation evolves and revise the mix if necessary.
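The search for a robust mix can be sketched in Python as follows. This is a heavily simplified, hypothetical model, not the one used in the post: it ignores interest, LIBOR, property values and equity; it assumes that shifting a share of the loan into tenured payments shrinks the line of credit proportionally; and every dollar figure, probability, and the Gompertz lifetime parameters are placeholders. It does include random “shock” expenditures in the $5K to $75K range, and it reports, for each mix, the fraction of 10,000 simulated futures in which the money runs out.

```python
import math
import random

def remaining_years(rng, hazard_now=0.09, growth=0.10):
    """Hypothetical Gompertz draw of remaining lifetime in years (inverse transform)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return math.log(1.0 - growth * math.log(u) / hazard_now) / growth

def ruin_probability(tp_share, runs=10_000, seed=42, savings=20_000.0,
                     base_shortfall=30_000.0, full_tenure=36_000.0,
                     full_loc=300_000.0, shock_prob=0.1,
                     shock_range=(5_000.0, 75_000.0)):
    """Fraction of simulated futures in which savings plus credit run out.

    tp_share: fraction of the loan taken as tenured payments (the rest is LOC).
    base_shortfall: first-year expenses not covered by pensions (hypothetical).
    """
    rng = random.Random(seed)
    ruined = 0
    for _ in range(runs):
        years = math.ceil(remaining_years(rng))
        cash, loc = savings, (1.0 - tp_share) * full_loc
        shortfall, out_of_money = base_shortfall, False
        for _ in range(years):
            cash += tp_share * full_tenure - shortfall
            if rng.random() < shock_prob:  # occasional unexpected expense
                cash -= rng.uniform(*shock_range)
            if cash < 0:  # cover the gap from the line of credit if possible
                draw = min(loc, -cash)
                loc -= draw
                cash += draw
            if cash < 0:
                out_of_money = True
            shortfall *= 1.0 + rng.gauss(0.03, 0.01)  # random yearly inflation
        ruined += out_of_money
    return ruined / runs

mixes = [0.0, 0.25, 0.5, 0.75, 1.0]
risks = {m: ruin_probability(m) for m in mixes}
for m in mixes:
    print(f"TP share {m:.2f}: P(run out of money) = {risks[m]:.3f}")
print("lowest-risk mix:", min(risks, key=risks.get))
```

Using the same seed for every mix, with no early exit from the yearly loop, means each mix faces the same sequence of simulated lifetimes, shocks and inflation draws (common random numbers), so differences between mixes reflect the mix rather than sampling noise. A fuller analysis would also track the interest accruing on drawn funds and the property’s value, which is where fitted LIBOR and real-estate models would come in.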
What about maximizing expected utility? Or optimizing in any sense of the term? No, and no. The deep unknowns inherent even in this relatively well-structured problem make those unattainable goals. What can we do instead? Taleb’s advice is to pay attention to consequences instead of probabilities. This is known as “dominance reasoning.” If option A yields better outcomes than option B no matter what the probabilities of those outcomes are, choose option A. But life often isn’t that simple. We can’t do that here because the comparative outcomes of alternative mixtures of LOC and TP depend on probabilities.
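Dominance reasoning itself is easy to state in code, and a small Python sketch shows why it fails to settle the choice here. The states and payoff scores below are hypothetical, not outputs from the simulations: whenever one mix does better in some states and worse in others, neither dominates, and we are thrown back on probabilities.

```python
def dominates(a, b):
    """True if option a is at least as good as b in every state and strictly
    better in at least one. Payoffs: dicts mapping state -> score (higher is better).
    No probabilities are needed."""
    if a.keys() != b.keys():
        raise ValueError("both options must be scored on the same states")
    return all(a[s] >= b[s] for s in a) and any(a[s] > b[s] for s in a)

# Hypothetical scores for two mixes across three possible futures.
loc_heavy = {"short life": 9, "long life": 3, "large shock": 8}
tp_heavy = {"short life": 6, "long life": 8, "large shock": 4}

print(dominates(loc_heavy, tp_heavy))  # False
print(dominates(tp_heavy, loc_heavy))  # False
```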
Instead, we have ended up closer to the “bounded rationality” that Herbert Simon wrote about. We can’t claim to have optimized, but we do have robustness and corrigibility on our side, two important criteria for good decision making under ignorance (described in my recent post on that topic). Perhaps most importantly, the simulations gave us insights that none of our intuitions could provide into how variable the future can be and the consequences of that variability. Sir Edmund was right. We can’t plan the future by the past. But sometimes we can chart a steerable course into that future armed with a few clues from the past to give us an honest check on our intuitions, and a generous measure of scepticism about relying too much on those clues.