“I come not to praise forecasters but to bury them.”  With these unsubtle words, Barry Ritholtz opens an entertaining piece in the Washington Post, expressing a widely held view about forecasting in difficult domains such as geopolitics or financial markets.  The view is that nobody is any good at it, or if anyone is, they can’t be reliably identified.  This hard-line skepticism has seemed warranted by the persistent failure of active fund managers to statistically outperform dart-throwing monkeys, or the research by Philip Tetlock showing that geopolitical experts do scarcely better than random, and worse than the simplest statistical methods.

More recent research on a range of fronts – notably, by the Good Judgement Project, but also by less well-known groups such as Scicast and ACERA/CEBRA here at Melbourne University – has suggested that a better view is what might be termed “tempered optimism” about expert judgement forecasting. This new attitude acknowledges that forecasting challenges will always fall on a spectrum from the easy to the practically impossible.  However, in some important but difficult domains, hard-line skepticism is too dogmatic.  Rather,

  • There can be forecasting skill;
  • Some people can be remarkably good;
  • Factors conducive to good forecasting have been identified;
  • Forecasting is a skill which can be improved in broadly the same way as other skills;
  • Better forecasts can be obtained by combining forecasts.

A high-level lesson that seems to be emerging is that forecasting depends on culture.  That is, superior forecasting is not a kind of genius possessed (or not) by individuals, but emerges when a group or organisation has the right kinds of common beliefs, practices, and incentives.

The obvious question then is what such a culture is like, and how it can be cultivated.  As part of work for an Australian superannuation fund, I distilled the latest research supporting tempered optimism into seven guidelines for developing a culture of superior forecasting.

  1. Select.  When choosing who to hire – or retain – for your forecasting team, look for individuals with the right kind of mindset.  To a worthwhile extent, mindset can be assessed using objective tests.
  2. Train.  Provide basic training in the fundamentals of forecasting and generic forecasting skills.  A brief training session can improve forecasting performance over a multi-year period.
  3. Track. Carefully document and evaluate predictive accuracy using proper scoring rules.  Provide results to forecasters as helpful feedback.
  4. Team. Group forecasters into small teams who work together, sharing information and debating ideas.
  5. Stream. Put your best forecasters (see Track) into an elite team.
  6. Motivate. Incentives should reward predictive accuracy (see Track) and constructive collaboration.
  7. Combine. Generate group forecasts by appropriately combining individuals’ forecasts, weighting by predictive accuracy (see Track).
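As a rough illustration of the Combine guideline, here is a minimal sketch in Python. It assumes each forecaster already has a past error score from tracking (lower is better) and weights each forecast by the inverse of that score. The inverse-score weighting, the function name and the numbers are my own illustrative assumptions, not the scheme used by any of the research groups mentioned above.

```python
def combine_forecasts(probabilities, past_scores, floor=1e-6):
    """Weighted average of individual probability forecasts.

    probabilities: each forecaster's probability for the same event.
    past_scores:   each forecaster's past error score (lower = more accurate).
    Weight = 1 / score is an illustrative assumption, one simple choice among many.
    """
    if len(probabilities) != len(past_scores):
        raise ValueError("need one past score per forecast")
    # Guard against division by zero for a forecaster with a perfect record.
    weights = [1.0 / max(score, floor) for score in past_scores]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probabilities)) / total


# Hypothetical team of three forecasters judging the same event.
probabilities = [0.8, 0.6, 0.3]
past_scores = [0.10, 0.20, 0.40]   # the first forecaster has the best record

weighted = combine_forecasts(probabilities, past_scores)
simple_mean = sum(probabilities) / len(probabilities)
print(f"weighted: {weighted:.3f}, simple mean: {simple_mean:.3f}")
```

The weighted forecast is pulled toward the forecaster with the best track record, which is the whole point of tying Combine back to Track.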

The pivotal element here obviously is Track, i.e. measuring predictive accuracy using a proper scoring rule such as the Brier score.  According to Mellers (a key member of the Good Judgement Project) and colleagues, “This measure of accuracy is central to the question of whether forecasters can perform well over extended periods and what factors predict their success.”  Using proper scoring rules requires forecasters to commit to explicit probabilistic predictions, a practice that is common in fields such as weather forecasting where predictions are rigorously assessed, but very rare in domains such as geopolitics and investment.  This relative lack of rigorous assessment is a key enabler – and concealer – of ongoing poor performance.
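To make the Brier score concrete, here is a minimal Python sketch for yes/no questions. It uses the common binary form (the mean squared difference between the forecast probability and the 0/1 outcome), which ranges from 0 (perfect) to 1; Brier's original multi-category formulation ranges from 0 to 2. The track records below are invented for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal, non-empty lists of forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical track records on five resolved questions (1 = event happened).
outcomes = [1, 0, 1, 1, 0]
committed = [0.9, 0.1, 0.8, 0.7, 0.2]   # commits to explicit probabilities
hedger = [0.5, 0.5, 0.5, 0.5, 0.5]      # always says "fifty-fifty"

print(f"committed forecaster: {brier_score(committed, outcomes):.3f}")
print(f"perpetual hedger:     {brier_score(hedger, outcomes):.3f}")
```

Because the rule is proper, a forecaster minimises their expected score only by reporting what they actually believe, so neither habitual hedging nor overclaiming pays.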

In current work, we are developing training in generic forecasting skills, and practical methods for using scoring rules to evaluate predictive accuracy in domains such as funds management. Contact me if this may be of interest in your organisation.


I forwarded to Paul Monk a link to this video:

He replied, within minutes:

Truly awesome.

It prompts the thought that the biggest revolutions in worldview have been scientific and have entailed:

1. Moving from the Earth centred (Aristotelian/Biblical) cosmology (which had its counterparts in many tribal myths and the cosmogonies of many other civilizations; though the classical Atomists began to guess at the truth and this was picked up again by Giordano Bruno in the late 16th century, only to get him burned alive in Rome by the Inquisition) to first a heliocentric one, then a Milky Way one, then a Hubble 3D one, as it were, and finally to a multiverse one;

2. Discovering that we are evolved creatures and have a direct biological ancestry going back 3.8 billion years, but on a world that, in much less than that time into the future (regardless of what we do) will become uninhabitable, as the Sun swells to become a red giant and destroys the Goldilocks Zone which makes life on Earth possible;

3. Realizing that we live in and are imbricated in a world of microbes that used to dominate the planet, exist in a highly complex symbiosis with larger life forms (including predation upon them), and have played a substantial role in the mass extinctions;

4. Slowly getting to understanding human history from a global and cosmopolitan perspective instead of from narrowly local ones; and

5. Developing the elements of a universal cognitive humanism with the exploration of languages and linguistics, comparative mythology (Levi-Strauss and structuralism) and anthropology (including Durkheim’s Elementary Forms of the Religious Life, about a century ago).

My own worldview, if you like, is that all these things transcend (trump) the epistemological claims of the old religions and mythologies, as well as those even of 19th century political ideologies (to say nothing of crude 20th century ones such as Nazism and Marxism-Leninism). BUT the vast majority of human beings on the planet know almost nothing of all this and certainly have not been able to weave it together into a coherent new, shared, universal worldview for the 21st century.

Just a few thoughts on the run, or rather while viewing Andromeda.

In our consulting work we have periodically been asked to review how judgments or decisions of a particular kind are made within an organisation, and to recommend improvements.  This has taken us to some interesting places, such as the rapid lead assessment center of a national intelligence agency, and recently, meetings of coaches of an elite professional sports team.

On other occasions, we have been asked to assist a group to design and build, more or less from scratch, a process for making a particular decision or set of decisions (e.g., decisions as to what a group should consider itself to collectively believe).

Both types of activity involve thinking hard about what the current/default process is or would be, and what kind of process might work more effectively in a given real-world context, in the light of what academics in fields such as cognitive science and organisational theory have learned over the years.

This sounds a bit like engineering.  My favorite definition of the engineer is somebody who can’t help but think that there must be a better way to do this.  A more comprehensive and workmanlike definition is given by Wikipedia:

Engineering is the application of scientific, economic, social, and practical knowledge in order to invent, design, build, maintain, research, and improve structures, machines, devices, systems, materials and processes.

The activities mentioned above seem to fit this very broad concept: we were engaged to help improve or develop systems – in our case, systems for making decisions.

It is therefore tempting to describe some of what we do as decision engineering.  However this term has been in circulation for some decades now, shown in this Google n-gram:


and its current meaning or meanings might not be such a good fit with our activities.  So, I set about exploring what the term means “out there”.

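As usual in such cases, there doesn’t appear to be any one official, authoritative definition.  Threads appearing in various characterizations include:

  • Bringing standard engineering principles and techniques to bear on making decisions
  • Using more structured decision methods, including the application of decision analysis techniques
  • Basing decisions on “big data” and “data science,” such as predictive analytics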

While each such thread clearly highlights something important, my view is that individually they are only part of the story, and collectively are a bit of a dog’s breakfast.  What we need, I think, is a more succinct, more abstract, and more unifying definition.  Here’s an attempt, based on Wikipedia’s definition of engineering:

Decision engineering is applying relevant knowledge to design, build, maintain, and improve systems for making decisions.

Relevant knowledge can include knowledge of at least three kinds:

  • Theoretical knowledge from any relevant field of inquiry;
  • Practical knowledge (know-how, or tacit knowledge) of the decision engineer;
  • “Local” knowledge of the particular context and challenges of decision making, contributed by people already in or familiar with the context, such as the decision makers themselves.

System is of course a very broad term, and for current purposes a system for making decisions, or decision system, is any complex part of the world causally responsible for decisions of a certain category.  Such systems may or may not include humans.  For example, decisions in a Google driverless car would be made by a complex combination of sensors, on-board computing processors, and perhaps elements outside the car such as remote servers.

However the decision processes we have worked on, which might loosely be called organisational decision processes, always involve human judgement at crucial points.  The systems responsible for such decisions include

  • People playing various roles
  • “Norms,” including procedures, guidelines, methods, standards.
  • Supporting technologies ranging from pen and paper through sophisticated computers
  • Various aspects of the environment or context of decision making.

For example, a complex organisational decision system produces the monthly interest rate decisions of the Reserve Bank of Australia, as hinted at in this paragraph from their website:

The formulation of monetary policy is the primary responsibility of the Reserve Bank Board. The Board usually meets eleven times each year, on the first Tuesday of the month except in January. Hence, the dates of meetings are well known in advance. For each meeting, the Bank’s staff prepare a detailed account of developments in the Australian and international economies, and in domestic and international financial markets. The papers contain a recommendation for the policy decision. Senior staff attend the meeting and give presentations. Monetary policy decisions by the Reserve Bank Board are communicated publicly shortly after the conclusion of the meeting.

and described in much more detail in this (surprisingly interesting) 2001 speech by the man who is now Governor of the Reserve Bank.

In most cases, decision engineering means taking an existing system and considering how to improve it.  A system can be better in various ways, including:

  • First and foremost, achieving a higher decision hit rate, i.e. the proportion of decisions which are correct in the sense of choosing an optimal or at least satisfactory path of action;
  • Being more efficient, in the sense of using fewer resources or producing decisions more quickly;
  • Being more transparent or defensible.

Now, in order to improve a particular decision system, a decision engineer might use approaches such as:

  • Bringing standard engineering principles and techniques to bear on making decisions
  • Using more structured decision methods, including the application of decision analysis techniques
  • Basing decisions on “big data” and “data science,” such as predictive analytics

(i.e., the “threads” listed above).  However the usefulness of these approaches will depend very much on the nature of the decision challenges being addressed.  For example, if you want to improve how elite football coaches make decisions in the coaching box on game day, you almost certainly will not introduce highly structured decision methods such as decision trees.

In short, I like this more general definition of decision engineering (in four words or less, building better decision systems) because it seems to get at the essence of what decision engineers do, allowing but not requiring that highly technical, quantitative approaches might be used.  And it accommodates my instinct that much of what we do in our consulting work should indeed count as a kind of engineering.

Whether we would be wise to publicly describe ourselves as decision engineers is however quite another question – one for marketers, not engineers.

About a month ago The Age published an opinion piece I wrote under the title “Do you hold a Bayesian or Boolean worldview?”.  I had submitted it under the title “Madmen in Authority,” and it opened by discussing two men in authority who are/were each mad in their own way – Maurice Newman, influential Australian businessman and climate denier, and Cuban dictator Fidel Castro.  Both men had professed to be totally certain about issues on which any reasonable person ought to have had serious doubts given the very substantial counter-evidence.

Their dogmatic attitudes seemed to exemplify a kind of crude epistemological viewpoint I call “Booleanism,” in contrast with a more sophisticated “Bayesianism”. Here is the philosophical core of the short opinion piece:

On economic matters, Keynes said: “Practical men who believe themselves to be quite exempt from any intellectual influence, are usually the slaves of some defunct economist. Madmen in authority, who hear voices in the air, are distilling their frenzy from some academic scribbler of a few years back.”

Similarly, on matters of truth and evidence, we are usually unwittingly beholden to our background epistemology (theory of knowledge), partially shaped by unknown theorists from centuries past.

One such theory of knowledge we can call Boolean, after the 19th century English logician George Boole.  He was responsible for what is now known as Boolean algebra, the binary logic which underpins the computing revolution.

In the Boolean worldview, the world is organised into basic situations such as Sydney being north of Melbourne. Such situations are facts. Truth is correspondence to facts. That is, if a belief matches a fact, it is objectively true; if not, it is objectively false. If you and I disagree, one of us must be right, the other wrong; and if I know I’m right, then I know you’re wrong. Totally wrong.

This worldview underpins Castro’s extreme confidence.  Either JFK was killed by an anti-Castro/CIA conspiracy or he wasn’t; and if he was, then Castro is 100 per cent right. Who needs doubt?

An alternative theory of knowledge has roots in the work of another important English figure, the Reverend Thomas Bayes. He is famous for Bayes’ Theorem, a basic law of probability governing how to modify one’s beliefs when new evidence arrives.

In the Bayesian worldview, beliefs are not simply true or false, but more or less probable. That is, we can be more or less confident that they are true, given how they relate to our other beliefs and how confident we are in them. If you and I disagree about the cause of climate change, it is not a matter of me being wholly right and you being wholly wrong, but about the differing levels of confidence we have in a range of hypotheses.

Scientists are generally Bayesians, if not self-consciously, at least in their pronouncements. For example, the IPCC refrains from claiming certainty that climate change is human-caused; it says instead that it has 95 per cent confidence that human activities are a major cause.
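The contrast between the two worldviews can be made concrete with a toy calculation. The Python sketch below applies Bayes’ Theorem to a single hypothesis; the function name and all the probabilities are invented for illustration. It shows a moderate prior shifting as evidence arrives, and a “Boolean” prior of 100 per cent that no evidence can ever move.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) for one hypothesis H after observing evidence E."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    if denominator == 0:
        raise ValueError("the evidence has zero probability under this prior")
    return numerator / denominator


# Bayesian: start undecided, then see two pieces of evidence, each four
# times as likely if the hypothesis is true as if it is false.
belief = 0.5
for _ in range(2):
    belief = bayes_update(belief, p_evidence_if_true=0.8, p_evidence_if_false=0.2)
print(f"Bayesian after supporting evidence: {belief:.3f}")

# Boolean: start 100 per cent certain, then see evidence that strongly
# points the other way. The belief does not budge.
certain = bayes_update(1.0, p_evidence_if_true=0.2, p_evidence_if_false=0.8)
print(f"Boolean after contrary evidence:    {certain:.3f}")
```

Total certainty is the one starting point from which updating is mathematically impossible, which is a fair formal picture of dogmatism.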

On Thursday 9th October I’m doing a presentation at a conference of The Tax Institute, the Australian professional association for tax specialists, introducing decision analysis techniques.  The presentation will illustrate (with live demonstration) the following applications:

  • Using quantitative risk analysis (Monte Carlo simulation) to help a client gain better insight into the probable or possible outcomes of a certain tax strategy;
  • Using decision trees to help a client decide whether to pursue a dispute with the Tax Office through the courts.
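For readers unfamiliar with decision trees, here is a minimal Python sketch of the second application. It “rolls back” a small tree by taking probability-weighted averages at chance nodes and the best branch at decision nodes. The tree structure, the probabilities and the dollar amounts are all invented for illustration and are not taken from the conference paper.

```python
def expected_value(node):
    """Roll back a decision tree to its expected monetary value.

    A node is either a number (a final payoff), ("chance", [(prob, child), ...])
    or ("decision", {label: child, ...}).
    """
    if isinstance(node, (int, float)):
        return float(node)
    kind, branches = node
    if kind == "chance":
        if abs(sum(p for p, _ in branches) - 1.0) > 1e-9:
            raise ValueError("chance probabilities must sum to 1")
        return sum(p * expected_value(child) for p, child in branches)
    if kind == "decision":
        return max(expected_value(child) for child in branches.values())
    raise ValueError(f"unknown node type: {kind}")


def best_option(decision_node):
    """Label of the branch with the highest expected value."""
    _, branches = decision_node
    return max(branches, key=lambda label: expected_value(branches[label]))


# Hypothetical dispute, amounts in $ thousands (negative = cost to the client).
litigate = ("chance", [
    (0.4, -100),   # win: pay legal costs only
    (0.6, -600),   # lose: pay the disputed tax plus legal costs
])
dispute = ("decision", {"settle": -300, "litigate": litigate})

print("litigate EV:", expected_value(litigate))
print("recommended:", best_option(dispute))
```

Expected monetary value is only a starting point; a fuller analysis would also consider the client’s attitude to risk and how sensitive the recommendation is to the estimated chance of winning.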

The conference paper is available here.

An excerpt:

Decision analysis techniques are well-developed and used, more or less widely, in various other professions such as engineering and finance. However, they are rarely used by tax specialists, or by lawyers and accountants more broadly.

Why? One perspective is that decision analysis is fundamentally ill-suited to the kinds of reasoning and decision making involved in tax matters, which are thought to involve unquantifiable issues and nuances requiring intuitive nous of the kind only highly trained and experienced lawyers or accountants can provide.

An alternative perspective is that tax matters would almost always benefit from decision analysis, and that tax specialists fail to use it only because they are trapped behind boundaries imposed by their professional traditions, their training, and their intellectual inertia. A strong version of this view is that tax specialists are derelict in failing to provide their clients with an easily-obtainable level of clarity and rigour.

In the spirit of John Stuart Mill, this paper takes a middle position. It suggests that decision analysis is potentially useful for certain types of problems regularly handled by tax specialists, while not being appropriate for many others. Decision analysis may represent an important opportunity for tax specialists to provide greater value to sophisticated clients.

1.2.1 Three Thinking Modes

At a high level, the relation of decision analysis to the kinds of intellectual labour generally undertaken by tax specialists is summarized in this diagram.


To indulge in some useful caricatures, qualitative thinking is the domain of the lawyer. It uses no numbers at all, or at most simple arithmetic. The central concept is the argument. Making the most important decisions is always a matter of “weighing up” arguments expressed in legally inflected natural language.

Quantitative deterministic thinking is the speciality of the accountant. It is epitomised in the structures and calculations in an ordinary spreadsheet, in which specified inputs are “crunched” into equally specific outputs. The central concept is calculation; uncertainties are replaced by “assumptions”. Decisions generally boil down to comparing the magnitudes of numerical outputs, in the penumbral light cast by the background knowledge, intuitions and biases of the decision maker.

The third mode of thinking, probabilistic, is of course the decision analyst’s territory. The central concept is uncertainty, and the essential gambit is framing and manipulating probabilistic representations of uncertainty.

In this context, the “master” tax specialist has facility, or even advanced expertise, in all three modes of thinking.
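The difference between the deterministic and probabilistic modes can be sketched in a few lines of Python. The first calculation is the spreadsheet-style version, with uncertainties replaced by point “assumptions”; the second is a simple Monte Carlo simulation in which those assumptions become probability distributions. The model, the distributions and every number are hypothetical, chosen only to show the shape of the technique.

```python
import random


def net_benefit(tax_saving, advice_cost, disallowed, penalty):
    """Net outcome of a tax strategy in $ thousands (hypothetical model)."""
    if disallowed:
        return -advice_cost - penalty
    return tax_saving - advice_cost


def simulate(n_trials=10_000, seed=1):
    """Monte Carlo: draw the uncertain inputs afresh on every trial."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        saving = rng.triangular(50.0, 150.0, 100.0)   # low, high, most likely
        disallowed = rng.random() < 0.25              # assumed 25% chance
        results.append(net_benefit(saving, 20.0, disallowed, 30.0))
    return results


# Deterministic mode: one set of assumptions in, one number out.
point_estimate = net_benefit(100.0, 20.0, False, 30.0)

# Probabilistic mode: a distribution of outcomes.
results = simulate()
mean_outcome = sum(results) / len(results)
p_loss = sum(r < 0 for r in results) / len(results)

print(f"point estimate:      {point_estimate:.1f}")
print(f"simulated mean:      {mean_outcome:.1f}")
print(f"probability of loss: {p_loss:.2f}")
```

The single point estimate hides the chance of an outright loss, which is exactly the kind of insight the probabilistic mode is meant to surface for the client.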

There’s a familiar idea from the world of sport – that winning requires an elite team and not just a team of elite players.

Does something similar apply in the world of decision making?

In many situations, critical decisions are made by small groups.  The members of these groups are often “elite” in their own right.  For example, in Australia monthly interest rate decisions are made by the board of the Reserve Bank of Australia.  This is clearly a “team” of elite decision makers.

However it is not clear that they are an elite team of decision makers.   For current purposes, I define an elite decision team as a small decision group conforming to all or at least most of the following principles:

  1. The team operates according to rigorously thought-through decision making practices. Wherever possible these practices should be strongly evidence-based.
  2. The team has been trained to operate as a team using these practices. Members have well-defined and well-understood roles.
  3. Members have been rigorously trained as decision makers (and not just as, say, economists).
  4. The team, and members individually, are rigorously evaluated for their decision making performance.
  5. There is a program of continuous improvement.

Note also that the team should be a decision making team, i.e. one that makes decisions (commitments to courses of action) rather than judgements of some other kind such as predictions.

There are many types of teams which do operate according to analogs of these principles – for example elite sporting teams, as mentioned, and small military teams such as bomb disposal squads.  These teams’ operations involve decision making, but they are not primarily decision making teams.

I doubt the Board of the RBA is an elite decision team in this sense, but would be relieved to find out I was wrong.

More generally, I am currently looking for good examples of elite decision teams.  Any suggestions are most welcome.

Alternatively, if you think this idea of an elite decision team is somehow misconceived, that would be interesting too.

Well-known anti-theist Sam Harris has posted an interesting challenge on his blog.  He writes:

So I would like to issue a public challenge. Anyone who believes that my case for a scientific understanding of morality is mistaken is invited to prove it in under 1,000 words. (You must address the central argument of the book—not peripheral issues.) The best response will be published on this website, and its author will receive $2,000. If any essay actually persuades me, however, its author will receive $20,000,* and I will publicly recant my view. 

In the previous post on this blog, Seven Habits of Highly Critical Thinkers, habit #3 was Chase Challenges.  If nothing else, Harris’ post is a remarkable illustration of this habit.

The quality of his case is of course quite another matter.

I missed the deadline for submission, and I haven’t read the book, and don’t intend to, though it seems interesting enough. So I will just make a quick observation about the quality of Harris’ argument as formulated.

In a nutshell, simple application of argument mapping techniques quickly and easily shows that Harris’ argument, as stated by Harris himself on the challenge blog page, is a gross non-sequitur, requiring, at a minimum, multiple additional premises to bridge the gap between his premises and his conclusions.  In that sense, his argument as stated is easily shown to be seriously flawed.

Here is how Harris presents his argument:

1. You have said that these essays must attack the “central argument” of your book. What do you consider that to be?
Here it is: Morality and values depend on the existence of conscious minds—and specifically on the fact that such minds can experience various forms of well-being and suffering in this universe. Conscious minds and their states are natural phenomena, fully constrained by the laws of the universe (whatever these turn out to be in the end). Therefore, questions of morality and values must have right and wrong answers that fall within the purview of science (in principle, if not in practice). Consequently, some people and cultures will be right (to a greater or lesser degree), and some will be wrong, with respect to what they deem important in life.

This formulation is short and clear enough that creating a first-pass argument map in Rationale is scarcely more than drag and drop:


Now, as explained in the second of the argument mapping tutorials, there are some basic, semi-formal constraints on the adequacy of an argument as presented in an argument map.

First, the “Rabbit Rule” decrees that any significant word or phrase appearing in the contention of an argument must also appear in at least one of the premises of that argument.  Any significant word or phrase appearing in the contention but not appearing in one of the premises has suddenly appeared out of thin air, like the proverbial magician’s rabbit, and so is informally called a rabbit.  Any argument with rabbits is said to commit rabbit violations.

Second, the Rabbit Rule’s sister, the “Holding Hands Rule,” decrees that any significant word or phrase appearing in one of the premises must appear either in the contention, or in another premise.

These rules are aimed at ensuring that the premises and contention of an argument are tightly connected with each other.  The Rabbit Rule tries to ensure that every aspect of what is claimed in the contention is “covered” in the premises.  If the Rabbit Rule is not satisfied, the contention is saying something which hasn’t been even discussed in the premises as stated.  (Not to go into it here, but this is quite different from the sense in which, in an inductive argument, the contention “goes beyond” the premises.) The Holding Hands Rule tries to ensure that any concept appearing in the premises is doing relevant and useful work.

Consider then the basic argument consisting of Contention 1 and the premises beneath it.   It is obvious on casual inspection that much – indeed most – of what appears in Contention 1 does not appear in the premises.  Consider for example the word “purview”, or the phrase “falls within the purview of science”.  These do not appear in the premises as stated. What does appear in Premise 2 is “natural phenomena, fully constrained by the laws of the universe”.  But as would be obvious to any philosopher, there’s a big conceptual difference between these.
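The two rules are mechanical enough to sketch in code. The Python below treats each claim as a set of hand-tagged “significant” phrases and reports rabbits and danglers. The tagging is my own rough simplification of the quoted passage, made for illustration; it is not how Rationale works, and picking out the significant phrases still takes human judgement.

```python
def rabbits(contention_terms, premises_terms):
    """Rabbit Rule violations: contention terms that appear in no premise."""
    covered = set().union(*premises_terms)
    return set(contention_terms) - covered


def danglers(contention_terms, premises_terms):
    """Holding Hands violations: premise terms that appear neither in the
    contention nor in any other premise."""
    found = set()
    for i, terms in enumerate(premises_terms):
        elsewhere = set(contention_terms)
        for j, other in enumerate(premises_terms):
            if j != i:
                elsewhere |= set(other)
        found |= set(terms) - elsewhere
    return found


# Hand-tagged, simplified terms (an assumption, not the actual map).
contention = {"morality and values", "right and wrong answers", "purview of science"}
premises = [
    {"morality and values", "conscious minds"},
    {"conscious minds", "natural phenomena", "laws of the universe"},
]

print("rabbits: ", sorted(rabbits(contention, premises)))
print("danglers:", sorted(danglers(contention, premises)))

# A bridging premise linking the laws of the universe to the purview of
# science removes one rabbit and one dangler, but not all of them.
bridged = premises + [{"laws of the universe", "purview of science"}]
print("rabbits after bridging: ", sorted(rabbits(contention, bridged)))
print("danglers after bridging:", sorted(danglers(contention, bridged)))
```

Even on this crude tagging, one extra premise leaves violations behind, which is why fixing an argument like this usually takes several added or refined premises rather than one.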

What Harris’ argument needs, at a very minimum, is another premise.  My guess is that it is something like “Anything fully constrained by the laws of the universe falls within the purview of science.”   But two points.  First, this suggested premise obviously needs (a) explication, and (b) substantiation.  In other words, Harris would need to argue for it, not assume it. Second, it may not be Harris’ preferred way of filling gaps (one of them, at least) between his premises and his conclusion.  Maybe he’d come up with a different formulation of the bridging premise.  Maybe he addresses this in his book.

It would be tedious to list and discuss the numerous Rabbit and Holding Hands violations present in the two basic arguments making up Harris’ two-step “proof”.   Suffice it to say that if both Rabbit Rule and Holding Hands Rule violations are called “rabbits” (we also use the term “danglers”), then his argument looks a lot like the famous photo of a rabbit plague in the Australian outback:


Broadly speaking, fixing these problems would require quite a bit of work:

  • refining the claims he has provided
  • adding suitable additional premises
  • perhaps breaking the overall argument into more steps.

Pointing this out doesn’t prove that his main contentions are false.  (For what little it is worth, I am quite attracted to them.)  Nor does it establish that there is not a solid argument somewhere in the vicinity of what Harris gave us. It doesn’t show that Harris’ case (whatever it is) for a scientific understanding of morality is mistaken.  What it does show is that his own “flagship” succinct presentation of his argument (a) is sloppily formulated, and (b) as stated, clearly doesn’t establish its contentions.   In short, as stated, it fails.  Argument mapping reveals this very quickly.

Perhaps this is why, in part, there is so much argy bargy about Harris’ argument.

Final comment: normally I would not be so picky about how somebody formulated what may be an important argument.  However in this case the author was pleading for criticism.

