When is a forecaster performing well?  An increasingly common way to measure this is to use a scoring rule known as the Brier score.

The essential idea behind the Brier score is simple enough: it is the average gap (mean squared difference) between forecast probabilities and actual outcomes. This post tries to explain and motivate the Brier score by “composing” it from some other simple ideas about forecasting quality, unlike many presentations which start with the Brier score and then show how it can be decomposed. There is nothing surprising here for anyone well-versed in these topics, but others who (like me) are just beginning to explore these ideas might find the post helpful.

I’ll use a very small real-world dataset, a set of predictions about what the Reserve Bank of Australia will decide about interest rates at its monthly meetings.  The RBA generally leaves interest rates unchanged, but sometimes raises them and sometimes lowers them, depending on economic conditions.  The dataset consists of predictions implicit in the assessments of the ANU RBA Shadow Board, as found on the website RBA.Tips.  To keep things simple, the dataset reduces the predictions to binary outcomes – Change or No Change – and provides a numerical estimate of the probability of No Change.


The “Coded Outcome” column just translates the RBA’s decision into numbers – 1 for No Change, and 0 for Change.  This makes it possible to do the kind of mathematics described below.


One obvious thing about this dataset is that more often than not, there is No Change.  In this small sample, the RBA made no change 5/7 times or 71.4% of the time, which as it happens is quite close to the long term (1990-2015) average or overall base rate of 75%.  In other words, there isn’t a lot of uncertainty in the outcomes being predicted. Conversely, uncertainty would also be low if the RBA almost always changed the interest rate.  A simple way to put a single number on the uncertainty of either of these flavors is to take the base rate and multiply it by 1 minus itself, i.e.

Uncertainty = base rate * (1 – base rate).

For this dataset, Uncertainty is 0.714 * (1 – 0.714) = 0.204
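
For readers who like to check the arithmetic, here is a minimal sketch of that calculation. The list of coded outcomes is taken from the seven meetings discussed in this post; the variable names are just my own labels.

```python
# Coded outcomes for the seven meetings (Oct-14 to May-15):
# 1 = No Change, 0 = Change.
outcomes = [1, 1, 1, 0, 1, 1, 0]

# Base rate: the share of meetings that ended in No Change.
base_rate = sum(outcomes) / len(outcomes)

# Uncertainty: base rate times (1 - base rate).
uncertainty = base_rate * (1 - base_rate)

print(f"base rate   = {base_rate:.3f}")    # 0.714
print(f"uncertainty = {uncertainty:.3f}")  # 0.204
```

Uncertainty is largest (0.25) when the base rate is 0.5, and falls to 0 as the base rate approaches 0 or 1.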

This relative lack of uncertainty means that an attractive forecasting strategy would be to simply go with the base rate, i.e. always predicting that the RBA will do whatever it does most often.  How well would such a forecaster do? A simple way to measure this is in terms of hits and misses.  For the period above, the base-rate strategy would yield five hits and two misses out of a total of seven predictions, i.e. a hit rate of 5/7 = 71%.   Over a long period, this rate should converge on the base rate – as long, that is, as variations in economic conditions, and RBA decision making, tend in future to be similar to what they were in the past when the base rate was being determined.
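
Counting hits and misses for the base-rate strategy can be sketched as follows. One simplifying assumption: the "most common outcome" is worked out from the sample itself, whereas in practice it would come from the long-run base rate.

```python
# Coded outcomes for the seven meetings: 1 = No Change, 0 = Change.
outcomes = [1, 1, 1, 0, 1, 1, 0]

# The base-rate strategy always predicts the most common outcome
# (assumption: determined here from this same small sample).
majority = 1 if sum(outcomes) * 2 >= len(outcomes) else 0

hits = sum(1 for o in outcomes if o == majority)
misses = len(outcomes) - hits
hit_rate = hits / len(outcomes)

print(hits, misses, f"{hit_rate:.0%}")  # 5 2 71%
```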

The base rate strategy has some advantages (assuming you can access the base rate information).  First, it is better than the most naive approach, which would be to pick randomly, or assign equal probabilities to the possible outcomes.  Second, it is easy; you don’t have to know much or think hard about economic conditions and the interplay between those and RBA decision making. The downside is that over the long term you can’t do better than the base rate, and you can’t do better than anyone else who is also using the base rate strategy.  If you’re ambitious or competitive or just take pride in good work,  you’ll need to make predictions which are more sensitive to the underlying probabilities of the outcomes – i.e. more likely to predict No Change when no change is more likely, and vice versa for change.

This can be seen in our simple dataset.  Crude inspection suggests the predictions fall into two groups or “bins”.  From Oct-14 to Dec-14 the probabilities assigned to No Change were all 70% or above, and over this period, interest rates in fact never changed.  From Feb-15 to May-15, the probabilities were lower, in the 60-70% range, and twice there was in fact a change.   It seems that whoever made these predictions believed that the economic conditions made a change more likely in 2015 than it was in late 2014, and they correctly adjusted their predictions accordingly.  Note that they had two misses in 2015, suggesting that their probabilities had not been reduced sufficiently.  But intuitively the “miss” predictions were not quite as off-the-mark as they would have been if the probabilities had been at the higher 2014 level – an idea captured by the Brier score.


So in general a good forecaster will not make the same forecast regardless of circumstances but rather will have “horses for courses,” i.e. different forecasts when the actual probabilities of various outcomes are different.  Can we measure the extent to which a forecaster is doing this?  One way to do it is:

  • Put the forecasts into groups or bins with the same forecast probability
  • For each bin, measure how different the outcome rate of the predictions in the bin (the “bin base rate”) is from the overall base rate.
  • Add up these differences

Let’s see how this goes with our dataset.  Suppose we have two bins, the 70s (2014) bin and the 60s (2015) bin.  For forecasts in the 70s bin, the outcomes were all No Change, so the bin base rate is 1.  For the 60s bin, the bin base rate is 2/4 = 0.5.  So we get:

70s bin: 1 (the bin base rate) – 0.714 (the overall base rate) = 0.286
60s bin: 0.5 – 0.714 = -0.214

Before we just add up these differences, we need to square them to make sure they’re both positive, and then “weight” them by the number of forecasts in each bin:

0.286^2 * 3 = 0.245
(-0.214)^2 * 4 = 0.183

Then we add them and divide by the total number of forecasts (7), to get 0.061.
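
The same steps can be sketched in a few lines of code. I'm assuming each forecast is represented by its bin probability (0.7067 for the 70s bin, 0.66 for the 60s bin); the function name `resolution` is my own.

```python
from collections import defaultdict

# (bin forecast probability of No Change, coded outcome) pairs.
# Outcomes: 1 = No Change, 0 = Change.
forecasts = [(0.7067, 1), (0.7067, 1), (0.7067, 1),
             (0.66, 0), (0.66, 1), (0.66, 1), (0.66, 0)]

def resolution(pairs):
    """Weighted mean squared difference between bin base rates
    and the overall base rate."""
    bins = defaultdict(list)
    for prob, outcome in pairs:
        bins[prob].append(outcome)

    n = len(pairs)
    base_rate = sum(outcome for _, outcome in pairs) / n

    total = 0.0
    for outs in bins.values():
        bin_rate = sum(outs) / len(outs)
        # Square the difference and weight by the number of forecasts in the bin.
        total += len(outs) * (bin_rate - base_rate) ** 2
    return total / n

res = resolution(forecasts)
print(f"{res:.3f}")  # 0.061
```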

This number is known as the Resolution of the forecast set.  The higher the Resolution the better; a forecaster with higher Resolution is making forecasts which are more different to the overall base rate than a forecaster with a lower score, and in that sense more interesting or bold.


In order to define resolution we had to sort forecasts into probability-of-outcome bins.  A natural question to ask is how well these bins correspond to the rate at which outcomes actually occur.  Consider for example the 70s bin.  Forecasts in that bin predict No Change with a probability, on average, of 70.67%.  Does the RBA choose No Change 70.67% of the time in those months? No; it decided No Change 100% of the time.  So there’s a mismatch between forecast probabilities and outcome rates.  Since the latter is higher, we call the forecasts underconfident; the probabilities should have been higher.

Similarly forecasts in the 60s bin predicted No Change with probability (on average) 66%, but the RBA in fact made no change only half the time.  Since .66 is larger than 0.5, we call this overconfidence.
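
Here is a small sketch of that bin-by-bin comparison. It follows the convention used above (observed rate higher than the forecast means underconfident, lower means overconfident); the dictionary layout and function name are my own.

```python
# Mean forecast probability of No Change and coded outcomes for each bin.
bins = {
    "70s": {"mean_forecast": 0.7067, "outcomes": [1, 1, 1]},
    "60s": {"mean_forecast": 0.66, "outcomes": [0, 1, 1, 0]},
}

def calibration_label(mean_forecast, outcomes):
    """Compare a bin's forecast probability with its observed outcome rate."""
    observed = sum(outcomes) / len(outcomes)
    if observed > mean_forecast:
        return "underconfident"
    if observed < mean_forecast:
        return "overconfident"
    return "calibrated"

labels = {name: calibration_label(b["mean_forecast"], b["outcomes"])
          for name, b in bins.items()}

for name, label in labels.items():
    print(name, label)
# 70s underconfident
# 60s overconfident
```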

Calibration is the term used to describe the alignment between forecast probabilities and outcome rates.  Calibration is usually illustrated with a chart like this:


The orange line represents a hypothetical forecaster with perfect calibration, i.e. where the observed rate for every bin is exactly the same as the forecast probability defining that bin; the orange dots represent hypothetical bins with probabilities 0, 0.1, 0.2, etc.  The two bins from our dataset are shown as blue dots.  The 70s bin is out to the left of the line, indicating underconfidence; vice versa for the 60s bin.


So it seems our forecaster is not particularly well calibrated  (though be aware that we are dealing with a tiny dataset where luck of the draw can have undue effects). Can we quantify the level of calibration shown by the forecaster in a particular set of forecasts? Yes, using an approach very similar to the calculation in the previous section.  There we took the mean (average) squared difference between bin base rates and overall base rates.  To quantify calibration, we take the mean squared difference between bin probability and bin base rate.  If that sounds cryptic, let’s walk through the numbers.

For the 70s bin, the average forecast probability was 70.67%, and the bin base rate was 1, so the squared difference is

(.7067 – 1)^2 = 0.086

Similarly for the 60s bin:

(0.66 – 0.5)^2 = .026

Multiply each of these by the number of forecasts in the bin:

0.086 * 3 = 0.258
0.0256 * 4 = 0.102 (using the unrounded squared difference)

Add these together and divide by the total number of forecasts, to get 0.052.  This, as you guessed, is called the Reliability of the forecast set.  Note however that Reliability is good when the mean squared difference is minimized, so the lower the Reliability score, the better, unlike Resolution where higher is better.
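
A sketch of the Reliability calculation, again assuming each forecast carries its bin probability. With the rounded bin probabilities used here (0.7067 and 0.66) the unrounded result comes out at roughly 0.0515, which sits right on the rounding boundary of the 0.052 quoted above.

```python
from collections import defaultdict

# (bin forecast probability of No Change, coded outcome) pairs.
forecasts = [(0.7067, 1), (0.7067, 1), (0.7067, 1),
             (0.66, 0), (0.66, 1), (0.66, 1), (0.66, 0)]

def reliability(pairs):
    """Weighted mean squared difference between each bin's forecast
    probability and that bin's observed outcome rate."""
    bins = defaultdict(list)
    for prob, outcome in pairs:
        bins[prob].append(outcome)

    total = 0.0
    for prob, outs in bins.items():
        bin_rate = sum(outs) / len(outs)
        total += len(outs) * (prob - bin_rate) ** 2
    return total / len(pairs)

rel = reliability(forecasts)
print(f"{rel:.4f}")  # 0.0515
```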


Let’s briefly take stock. Our guiding question has been: how good are the forecasts in our little dataset? So far, to get a handle on this we’ve loosely defined four quantities

  1. Uncertainty in the outcomes.  Uncertainty indicates the degree to which outcomes are predictable.
  2. Resolution of the forecast set.  This is the degree to which the forecasts fall into subsets with outcome rates different from the overall outcome base rate, calculated as mean squared difference.
  3. Calibration – the correspondence, on a bin-by-bin basis, between the forecast probabilities and the outcome rates;
  4. Reliability – an overall measure of calibration, calculated as the mean squared difference between forecast bin probabilities and outcome rates – or in other words, mean squared calibration.

Brier Score

But wouldn’t it be good if we could somehow capture all this in a single, goodness-of-forecasts number?  That’s what the Brier score does.  The Brier score is yet another mean squared difference measure, but this time it compares forecast probabilities with outcomes on a forecast-by-forecast basis.  In other words, for each forecast, subtract the outcome (coded as 1 or 0) from the forecast probability and square the result; add up all the results and divide by the total number of forecasts.  For our little dataset we get

(0.7067 – 1)^2 = 0.086
(0.7067 – 1)^2 = 0.086
(0.7067 – 1)^2 = 0.086
(0.66 – 0)^2 = 0.436
(0.66 – 1)^2 = 0.116
(0.66 – 1)^2 = 0.116
(0.66 – 0)^2 = 0.436

Add these all up and divide by 7 to get 0.195 – the Brier Score for this set of forecasts.  (Note that because, in calculating Uncertainty, Resolution and Reliability we collapsed forecasts into bins with a single forecast probability, in calculating the Brier score we treat each forecast as having its “bin” probability.)

Like Reliability, lower is better for Brier scores; a perfect score is 0.
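
The forecast-by-forecast calculation is short enough to sketch directly. As above, each forecast is given its bin probability; the function name is my own. Without rounding the individual terms first, the result is about 0.194 rather than 0.195; the small gap is purely a rounding effect.

```python
# Forecast probability of No Change (bin probabilities) and coded outcomes.
forecasts = [0.7067, 0.7067, 0.7067, 0.66, 0.66, 0.66, 0.66]
outcomes = [1, 1, 1, 0, 1, 1, 0]

def brier_score(probs, outs):
    """Mean squared difference between forecast probabilities and outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outs)) / len(probs)

bs = brier_score(forecasts, outcomes)
print(f"{bs:.4f}")  # 0.1944
```

A forecaster who always said 100% and was always right would score 0; one who always said 100% and was always wrong would score 1.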

Brier Score Composition

It turns out that all these measures are unified by the simple equation

Brier Score = Reliability – Resolution + Uncertainty

or in our numbers

Brier Score = 0.195
Reliability – Resolution + Uncertainty = 0.052 – 0.061 + 0.204 = 0.195

In other words, the Brier score is composed out of Reliability (a measure of Calibration), Resolution, and Uncertainty.
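
The identity can be checked numerically. This sketch recomputes all four quantities from the same seven forecasts (each carrying its bin probability, which is the condition under which the identity holds exactly) and compares the two sides.

```python
from collections import defaultdict

forecasts = [0.7067, 0.7067, 0.7067, 0.66, 0.66, 0.66, 0.66]
outcomes = [1, 1, 1, 0, 1, 1, 0]

n = len(outcomes)
base_rate = sum(outcomes) / n

# Group outcomes by forecast probability (one bin per distinct probability).
bins = defaultdict(list)
for p, o in zip(forecasts, outcomes):
    bins[p].append(o)

reliability = sum(len(outs) * (p - sum(outs) / len(outs)) ** 2
                  for p, outs in bins.items()) / n
resolution = sum(len(outs) * (sum(outs) / len(outs) - base_rate) ** 2
                 for outs in bins.values()) / n
uncertainty = base_rate * (1 - base_rate)

brier = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / n
composed = reliability - resolution + uncertainty

print(f"Brier score                            = {brier:.4f}")
print(f"Reliability - Resolution + Uncertainty = {composed:.4f}")
```

Both lines print the same value (about 0.194 before rounding), whichever way it is computed.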

The equation above – which can be found in full formulaic glory on the Wikipedia page and in many other places – is attributed to Allan Murphy in a paper published in 1973 in the Journal of Applied Meteorology.  It is usually called the Brier score decomposition, but here I’ve called it the Brier Score Composition because I’ve approached it in a bottom-up way.

Interpreting the Brier Score

As mentioned at the outset, the Brier score is increasingly common as a measure of forecasting performance.  According to Barbara Mellers, a principal researcher in the Good Judgement Project, “This measure of accuracy is central to the question of whether forecasters can perform well over extended periods and what factors predict their success.”

Having followed how the Brier score is built up out of other measures of forecasting quality, we should keep in mind two important points.

  1. One of the Brier score components is Uncertainty, which is a function solely of the outcomes, not of the forecasts.  Greater Uncertainty will push up Brier scores.  This means that a forecaster trying to forecast in a highly uncertain domain will have a higher Brier score than a forecaster of the same skill level tackling a less uncertain domain.  In other words, you can’t directly compare Brier scores unless they are scoring forecasts on the same set of events (or two sets of events with the same Uncertainty).  As a rough rule of thumb, only compare Brier scores if the forecasters were forecasting the same events.
  2. The Brier score is convenient as a single number, but it collapses three other measures.  You can get more insight into a forecaster’s performance if you look not just at the “headline number” – the Brier score – but at all four measures.
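
To make the first point concrete, here is a sketch with two made-up domains (the outcome lists are hypothetical, not real data). The same unskilled strategy, always forecasting the base rate, earns very different Brier scores simply because the domains differ in Uncertainty.

```python
def brier_score(probs, outs):
    """Mean squared difference between forecast probabilities and outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outs)) / len(probs)

# Hypothetical domain A: outcomes split 50/50 (high Uncertainty).
outcomes_a = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
# Hypothetical domain B: outcome occurs 90% of the time (low Uncertainty).
outcomes_b = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

def base_rate_forecaster_score(outs):
    """Brier score for someone who always forecasts the base rate."""
    base_rate = sum(outs) / len(outs)
    return brier_score([base_rate] * len(outs), outs)

score_a = base_rate_forecaster_score(outcomes_a)
score_b = base_rate_forecaster_score(outcomes_b)

print(f"domain A: {score_a:.2f}")  # 0.25
print(f"domain B: {score_b:.2f}")  # 0.09
```

For a pure base-rate forecaster, Reliability and Resolution are both zero, so the Brier score is just the Uncertainty of the domain.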

“I come not to praise forecasters but to bury them.”  With these unsubtle words, Barry Ritholtz opens an entertaining piece in the Washington Post, expressing a widely held view about forecasting in difficult domains such as geopolitics or financial markets.  The view is that nobody is any good at it, or if anyone is, they can’t be reliably identified.  This hard-line skepticism has seemed warranted by the persistent failure of active fund managers to statistically outperform dart-throwing monkeys, or the research by Philip Tetlock showing that geopolitical experts do scarcely better than random, and worse than the simplest statistical methods.

More recent research on a range of fronts – notably, by the Good Judgement Project, but also by less well-known groups such as Scicast and ACERA/CEBRA here at Melbourne University – has suggested that a better view is what might be termed “tempered optimism” about expert judgement forecasting. This new attitude acknowledges that forecasting challenges will always fall on a spectrum from the easy to the practically impossible.  However, in some important but difficult domains, hard-line skepticism is too dogmatic.  Rather,

  • There can be forecasting skill;
  • Some people can be remarkably good;
  • Factors conducive to good forecasting have been identified;
  • Forecasting is a skill which can be improved in broadly the same way as other skills;
  • Better forecasts can be obtained by combining forecasts.

A high-level lesson that seems to be emerging is that forecasting depends on culture.  That is, superior forecasting is not a kind of genius possessed (or not) by individuals, but emerges when a group or organisation has the right kinds of common beliefs, practices, and incentives.

The obvious question then is what such a culture is like, and how it can be cultivated.  As part of work for an Australian superannuation fund, I distilled the latest research supporting tempered optimism into seven guidelines for developing a culture of superior forecasting.

  1. Select.  When choosing who to hire – or retain – for your forecasting team, look for individuals with the right kind of mindset.  To a worthwhile extent, mindset can be assessed using objective tests.
  2. Train.  Provide basic training in the fundamentals of forecasting and generic forecasting skills.  A brief training session can improve forecasting performance over a multi-year period.
  3. Track. Carefully document and evaluate predictive accuracy using proper scoring rules.  Provide results to forecasters as helpful feedback.
  4. Team. Group forecasters into small teams who work together, sharing information and debating ideas.
  5. Stream. Put your best forecasters (see Track) into an elite team.
  6. Motivate. Incentives should reward predictive accuracy (see Track) and constructive collaboration.
  7. Combine. Generate group forecasts by appropriately combining individuals’ forecasts, weighting by predictive accuracy (see Track).
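
As a rough illustration of how Track feeds into Combine, here is a sketch of accuracy-weighted averaging. The weighting scheme (weights inversely proportional to each forecaster's past Brier score) is my own assumption for illustration, not a method drawn from the research cited here, and all the numbers are hypothetical.

```python
def combine_forecasts(probs, past_brier_scores, eps=1e-6):
    """Weighted average of individual probabilities.

    Assumption: weight = 1 / past Brier score, so forecasters with
    better (lower) track-record scores count for more. `eps` avoids
    division by zero for a perfect past score.
    """
    weights = [1.0 / (b + eps) for b in past_brier_scores]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total

# Hypothetical team: three forecasters' probabilities for the same event,
# and their Brier scores on previously resolved questions.
individual_probs = [0.6, 0.8, 0.7]
past_brier_scores = [0.10, 0.20, 0.40]

combined = combine_forecasts(individual_probs, past_brier_scores)
simple_mean = sum(individual_probs) / len(individual_probs)

print(f"simple mean       = {simple_mean:.2f}")
print(f"accuracy-weighted = {combined:.2f}")
```

The weighted forecast lands closer to 0.6 than the simple mean does, because the forecaster with the best track record said 0.6.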

The pivotal element here obviously is Track, i.e. measuring predictive accuracy using a proper scoring rule such as the Brier score.  According to Mellers (a key member of the Good Judgement Project) and colleagues, “This measure of accuracy is central to the question of whether forecasters can perform well over extended periods and what factors predict their success.”  Using proper scoring rules requires forecasters to commit to explicit probabilistic predictions, a practice that is common in fields such as weather forecasting where predictions are rigorously assessed, but very rare in domains such as geopolitics and investment.  This relative lack of rigorous assessment is a key enabler – and concealer – of ongoing poor performance.

In current work, we are developing training in generic forecasting skills, and practical methods for using scoring rules to evaluate predictive accuracy in domains such as funds management. Contact me if this may be of interest in your organisation.


I forwarded to Paul Monk a link to this video:

He replied, within minutes:

Truly awesome.

It prompts the thought that the biggest revolutions in worldview have been scientific and have entailed:

1. Moving from the Earth centred (Aristotelian/Biblical) cosmology (which had its counterparts in many tribal myths and the cosmogonies of many other civilizations; though the classical Atomists began to guess at the truth and this was picked up again by Giordano Bruno in the late 16th century, only to get him burned alive in Rome by the Inquisition) to first a heliocentric one, then a Milky Way one, then a Hubble 3D one, as it were, and finally to a multiverse one;

2. Discovering that we are evolved creatures and have a direct biological ancestry going back 3.8 billion years, but on a world that, in much less than that time into the future (regardless of what we do) will become uninhabitable, as the Sun swells to become a red giant and destroys the Goldilocks Zone which makes life on Earth possible;

3. Realizing that we live in and are imbricated in a world of microbes that used to dominate the planet, exist in a highly complex symbiosis with larger life forms, including predation upon them, and have played a substantial role in the mass extinctions;

4. Slowly getting to understanding human history from a global and cosmopolitan perspective instead of from narrowly local ones; and

5. Developing the elements of a universal cognitive humanism with the exploration of languages and linguistics, comparative mythology (Levi-Strauss and structuralism) and anthropology (including Durkheim’s Elementary Forms of the Religious Life, about a century ago).

My own worldview, if you like, is that all these things transcend (trump) the epistemological claims of the old religions and mythologies, as well as those even of 19th century political ideologies (to say nothing of crude 20th century ones such as Nazism and Marxism-Leninism). BUT the vast majority of human beings on the planet know almost nothing of all this and certainly have not been able to weave it together into a coherent new, shared, universal worldview for the 21st century.

Just a few thoughts on the run, or rather while viewing Andromeda.

In our consulting work we have periodically been asked to review how judgments or decisions of a particular kind are made within an organisation, and to recommend improvements.  This has taken us to some interesting places, such as the rapid lead assessment center of a national intelligence agency, and recently, meetings of coaches of an elite professional sports team.

On other occasions, we have been asked to assist a group to design and build, more or less from scratch, a process for making a particular decision or set of decisions (e.g., decisions as to what a group should consider itself to collectively believe).

Both types of activity involve thinking hard about what the current/default process is or would be, and what kind of process might work more effectively in a given real-world context, in the light of what academics in fields such as cognitive science and organisational theory have learned over the years.

This sounds a bit like engineering.  My favorite definition of the engineer is somebody who can’t help but think that there must be a better way to do this.  A more comprehensive and workmanlike definition is given by Wikipedia:

Engineering is the application of scientific, economic, social, and practical knowledge in order to invent, design, build, maintain, research, and improve structures, machines, devices, systems, materials and processes.

The activities mentioned above seem to fit this very broad concept: we were engaged to help improve or develop systems – in our case, systems for making decisions.

It is therefore tempting to describe some of what we do as decision engineering.  However this term has been in circulation for some decades now, as shown in this Google n-gram:


and its current meaning or meanings might not be such a good fit with our activities.  So, I set about exploring what the term means “out there”.

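As usual in such cases, there doesn’t appear to be any one official, authoritative definition.  Threads appearing in various characterizations include:

  • Bringing standard engineering principles and techniques to bear on making decisions
  • Using more structured decision methods, including the application of decision analysis techniques
  • Basing decisions on “big data” and “data science,” such as predictive analytics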

While each such thread clearly highlights something important, my view is that individually they are only part of the story, and collectively are a bit of a dog’s breakfast.  What we need, I think, is a more succinct, more abstract, and more unifying definition.  Here’s an attempt, based on Wikipedia’s definition of engineering:

Decision engineering is applying relevant knowledge to design, build, maintain, and improve systems for making decisions.

Relevant knowledge can include knowledge of at least three kinds:

  • Theoretical knowledge from any relevant field of inquiry;
  • Practical knowledge (know-how, or tacit knowledge) of the decision engineer;
  • “Local” knowledge of the particular context and challenges of decision making, contributed by people already in or familiar with the context, such as the decision makers themselves.

System is of course a very broad term, and for current purposes a system for making decisions, or decision system, is any complex part of the world causally responsible for decisions of a certain category.  Such systems may or may not include humans.  For example, decisions in a Google driverless car would be made by a complex combination of sensors, on-board computing processors, and perhaps elements outside the car such as remote servers.

However the decision processes we have worked on, which might loosely be called organisational decision processes, always involve human judgement at crucial points.  The systems responsible for such decisions include

  • People playing various roles
  • “Norms,” including procedures, guidelines, methods, standards.
  • Supporting technologies ranging from pen and paper through sophisticated computers
  • Various aspects of the environment or context of decision making.

For example, a complex organisational decision system produces the monthly interest rate decisions of the Reserve Bank of Australia, as hinted at in this paragraph from their website:

The formulation of monetary policy is the primary responsibility of the Reserve Bank Board. The Board usually meets eleven times each year, on the first Tuesday of the month except in January. Hence, the dates of meetings are well known in advance. For each meeting, the Bank’s staff prepare a detailed account of developments in the Australian and international economies, and in domestic and international financial markets. The papers contain a recommendation for the policy decision. Senior staff attend the meeting and give presentations. Monetary policy decisions by the Reserve Bank Board are communicated publicly shortly after the conclusion of the meeting.

and described in much more detail in this (surprisingly interesting) 2001 speech by the man who is now Governor of the Reserve Bank.

In most cases, decision engineering means taking an existing system and considering how to improve it.  A system can be better in various ways, including:

  • First and foremost, a higher decision hit rate, i.e. a higher proportion of decisions which are correct in the sense of choosing an optimal or at least satisfactory path of action;
  • Greater efficiency, in the sense of using fewer resources or producing decisions more quickly;
  • Greater transparency or defensibility.

Now, in order to improve a particular decision system, a decision engineer might use approaches such as:

  • Bringing standard engineering principles and techniques to bear on making decisions
  • Using more structured decision methods, including the application of decision analysis techniques
  • Basing decisions on “big data” and “data science,” such as predictive analytics

(i.e., the “threads” listed above).  However the usefulness of these approaches will depend very much on the nature of the decision challenges being addressed.  For example, if you want to improve how elite football coaches make decisions in the coaching box on game day, you almost certainly will not introduce highly structured decision methods such as decision trees.

In short, I like this more general definition of decision engineering (in four words or less, building better decision systems) because it seems to get at the essence of what decision engineers do, allowing but not requiring that highly technical, quantitative approaches might be used.  And it accommodates my instinct that much of what we do in our consulting work should indeed count as a kind of engineering.

Whether we would be wise to publicly describe ourselves as decision engineers is however quite another question – one for marketers, not engineers.

About a month ago The Age published an opinion piece I wrote under the title “Do you hold a Bayesian or Boolean worldview?”.  I had submitted it under the title “Madmen in Authority,” and it opened by discussing two men in authority who are/were each mad in their own way – Maurice Newman, influential Australian businessman and climate denier, and Cuban dictator Fidel Castro.  Both men had professed to be totally certain about issues on which any reasonable person ought to have had serious doubts given the very substantial counter-evidence.

Their dogmatic attitudes seemed to exemplify a kind of crude epistemological viewpoint I call “Booleanism,” in contrast with a more sophisticated “Bayesianism”. Here is the philosophical core of the short opinion piece:

On economic matters, Keynes said: “Practical men who believe themselves to be quite exempt from any intellectual influence, are usually the slaves of some defunct economist. Madmen in authority, who hear voices in the air, are distilling their frenzy from some academic scribbler of a few years back.”

Similarly, on matters of truth and evidence, we are usually unwittingly beholden to our background epistemology (theory of knowledge), partially shaped by unknown theorists from centuries past.

One such  theory of knowledge we can call Boolean, after the 19th century English logician George Boole.  He was responsible for what is now known as Boolean algebra, the binary logic which underpins the computing revolution.

In the Boolean worldview, the world is organised into basic situations such as Sydney being north of Melbourne. Such situations are facts. Truth is correspondence to facts. That is, if a belief matches a fact, it is objectively true; if not, it is objectively false. If you and I disagree, one of us must be right, the other wrong; and if I know I’m right, then I know you’re wrong. Totally wrong.

This worldview underpins Castro’s extreme confidence.  Either JFK was killed by an anti-Castro/CIA conspiracy or he wasn’t; and if he was, then Castro is 100 per cent right. Who needs doubt?

An alternative  theory of knowledge has roots in the work of another important English figure, the Reverend Thomas Bayes. He is famous for Bayes’ Theorem, a basic law of probability governing how to modify one’s beliefs when new evidence arrives.

In the Bayesian worldview, beliefs are not simply true or false, but more or less probable. That is, we can be more or less confident that they are true, given how they relate to our other beliefs and how confident we are in them. If you and I disagree about the cause of climate change, it is not a matter of me being wholly right and you being wholly wrong, but about the differing levels of confidence we have in a range of hypotheses.

Scientists are generally Bayesians, if not self-consciously, at least in their pronouncements. For example, the IPCC refrains from claiming certainty that climate change is human-caused; it says instead that it has 95 per cent confidence that human activities are a major cause.
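
To show what modifying one's beliefs when new evidence arrives looks like in practice, here is a minimal sketch of Bayes' Theorem applied repeatedly to a single hypothesis. The starting confidence and the likelihoods are invented numbers, chosen only to illustrate the mechanics.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after seeing one piece of evidence."""
    numerator = prior * p_evidence_if_true
    evidence = numerator + (1 - prior) * p_evidence_if_false
    return numerator / evidence

# Hypothetical: start undecided (50%), then see three independent pieces of
# evidence, each twice as likely if the hypothesis is true (0.8 vs 0.4).
belief = 0.5
history = [belief]
for _ in range(3):
    belief = bayes_update(belief, p_evidence_if_true=0.8, p_evidence_if_false=0.4)
    history.append(belief)

print([round(b, 3) for b in history])  # [0.5, 0.667, 0.8, 0.889]
```

Confidence rises with each piece of supporting evidence but never reaches 100 per cent, which is the Bayesian attitude in miniature.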

On Thursday 9th October I’m doing a presentation at a conference of The Tax Institute, the Australian professional association for tax specialists, introducing decision analysis techniques.  The presentation will illustrate (with live demonstration) the following applications:

  • Using quantitative risk analysis (Monte Carlo simulation) to help a client gain better insight into the probable or possible outcomes of a certain tax strategy;
  • Using decision trees to help a client decide whether to pursue a dispute with the Tax Office through the courts.
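
To give a flavour of both techniques, here is a toy sketch using only Python's standard library. Every figure in it (the tax at stake, the settlement value, the chance of winning, the legal costs) is hypothetical and chosen purely for illustration; it is not taken from the conference paper or the live demonstration.

```python
import random
import statistics

# All figures are hypothetical, for illustration only.
TAX_AT_STAKE = 500_000      # amount saved if the client wins in court
SETTLEMENT_VALUE = 120_000  # amount saved by settling instead

def litigate_expected_value(p_win, legal_costs):
    """Expected value of the 'litigate' branch of a two-branch decision tree."""
    return p_win * TAX_AT_STAKE - legal_costs

def simulate_litigation(n_trials, rng):
    """Monte Carlo: sample the uncertain inputs, return one net outcome per trial."""
    results = []
    for _ in range(n_trials):
        p_win = rng.uniform(0.4, 0.6)                    # uncertain chance of winning
        costs = rng.triangular(50_000, 150_000, 80_000)  # low, high, most likely
        won = rng.random() < p_win
        results.append((TAX_AT_STAKE if won else 0) - costs)
    return results

# Decision tree: compare the expected value of each branch.
ev = litigate_expected_value(p_win=0.5, legal_costs=90_000)
best = "litigate" if ev > SETTLEMENT_VALUE else "settle"
print(f"EV(litigate) = {ev:,.0f}, settle = {SETTLEMENT_VALUE:,.0f} -> {best}")

# Monte Carlo: look at the spread of outcomes, not just the average.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
outcomes = simulate_litigation(10_000, rng)
mean_outcome = statistics.mean(outcomes)
share_worse = sum(o < SETTLEMENT_VALUE for o in outcomes) / len(outcomes)
print(f"simulated mean = {mean_outcome:,.0f}")
print(f"share of trials worse than settling = {share_worse:.0%}")
```

The point of running both is that they answer different questions: the tree says which branch has the higher expected value, while the simulation shows how often the client would end up worse off than if they had settled.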

The conference paper is available here.

An excerpt:

Decision analysis techniques are well-developed and used, more or less widely, in various other professions such as engineering and finance. However, they are rarely used by tax specialists, or by lawyers and accountants more broadly.

Why? One perspective is that decision analysis is fundamentally ill-suited to the kinds of reasoning and decision making involved in tax matters, which are thought to involve unquantifiable issues and nuances requiring intuitive nous of the kind only highly trained and experienced lawyers or accountants can provide.

An alternative perspective is that tax matters would almost always benefit from decision analysis, and that tax specialists fail to use it only because they are trapped behind boundaries imposed by their professional traditions, their training, and their intellectual inertia. A strong version of this view is that tax specialists are derelict in failing to provide their clients with an easily-obtainable level of clarity and rigour.

In the spirit of John Stuart Mill, this paper takes a middle position. It suggests that decision analysis is potentially useful for certain types of problems regularly handled by tax specialists, while not being appropriate for many others. Decision analysis may represent an important opportunity for tax specialists to provide greater value to sophisticated clients.

1.2.1 Three Thinking Modes

At a high level, the relation of decision analysis to the kinds of intellectual labour generally undertaken by tax specialists is summarized in this diagram.


To indulge in some useful caricatures, qualitative thinking is the domain of the lawyer. It uses no numbers at all, or at most simple arithmetic. The central concept is the argument. Making the most important decisions is always a matter of “weighing up” arguments expressed in legally-inflected natural language.

Quantitative deterministic thinking is the speciality of the accountant. It is epitomised in the structures and calculations in an ordinary spreadsheet, in which specified inputs are “crunched” into equally specific outputs. The central concept is calculation; uncertainties are replaced by “assumptions”. Decisions generally boil down to comparing the magnitudes of numerical outputs, in the penumbral light cast by the background knowledge, intuitions and biases of the decision maker.

The third mode of thinking, probabilistic, is of course the decision analyst’s territory. The central concept is uncertainty, and the essential gambit is framing and manipulating probabilistic representations of uncertainty.

In this context, the “master” tax specialist has facility, or even advanced expertise, in all three modes of thinking.

There’s a familiar idea from the world of sport – that winning requires an elite team and not just a team of elite players.

Does something similar apply in the world of decision making?

In many situations, critical decisions are made by small groups.  The members of these groups are often “elite” in their own right.  For example, in Australia monthly interest rate decisions are made by the board of the Reserve Bank of Australia.  This is clearly a “team” of elite decision makers.

However it is not clear that they are an elite team of decision makers.   For current purposes, I define an elite decision team as a small decision group conforming to all or at least most of the following principles:

  1. The team operates according to rigorously thought-through decision making practices. Wherever possible these practices should be strongly evidence-based.
  2. The team has been trained to operate as a team using these practices. Members have well-defined and well-understood roles.
  3. Members have been rigorously trained as decision makers (and not just as, say, economists).
  4. The team, and members individually, are rigorously evaluated for their decision making performance.
  5. There is a program of continuous improvement.

Note also that the team should be a decision making team, i.e. one that makes decisions (commitments to courses of action) rather than judgements of some other kind such as predictions.

There are many types of teams which do operate according to analogs of these principles – for example elite sporting teams, as mentioned, and small military teams such as bomb disposal squads.  These teams’ operations involve decision making, but they are not primarily decision making teams.

I doubt the Board of the RBA is an elite decision team in this sense, but would be relieved to find out I was wrong.

More generally, I am currently looking for good examples of elite decision teams.  Any suggestions are most welcome.

Alternatively, if you think this idea of an elite decision team is somehow misconceived, that would be interesting too.

