A colleague Todd Sears recently wrote:

I thought I’d write to let you know that I used an argument map last night to inform a public conversation about whether to change our school budget voting system from what it is (one meeting and you have to be physically present to vote), to the (of all things!) Australian Ballot system (secret ballot, polls open all day, and absentee ballots available).

So, I went through the articles, editorials, and opinion pieces I could find on the matter and collapsed them into a pretty simple argument; simple reasoning boxes get the job done.  Our voters had never really seen this kind of visualization.  It’s nice to be able to see an argument laid out in space, and to demonstrate by pointing and framing that a “yea” vote needs to buy into the green points, but also needs to reconcile the red points, somehow. It got a very good response.

Ultimately, the AB motion was defeated by five votes.  Still, it was a good example of a calm, reasonable, and civil dialogue.  A nice change from the typical vitriol and partisan sniping.

Here is his map:


When I suggested that readers of this blog might find his account interesting or useful, he added:

Let me clarify what I did because it wasn’t a classic facilitation.

1. I reviewed all of the on-line Vermont-centric AB content I could find in the more reputable news sources, and put a specific emphasis on getting the viewpoints of the more vociferous anti-AB folks in my town so that I could fairly represent them.

2. I created a map from that information and structured it in a way that spread out the lines of reasoning in an easily understandable way. I could have done some further abstraction and restructured things, or made assumptions explicit using the “Advanced” mode, but chose to focus on easily recognized reasoning chains.

3. I sent the map out to the entire school board, the administrators, a couple of politicians, the anti-AB folks and some of the other more politically engaged people in town.

4. The session was moderated by the town Moderator, who set out Robert’s Rules of Order. Then discussion began. In fact, the first anti-AB speaker had my map in his hand and acknowledged the balance and strength of both sides of the argument.

5. I let the session run its course, and then explained what I did and how I did it, and then reviewed the Green and Red lines of the debate, explaining that a vote for or against means that the due diligence has to be done in addressing the points counter to your own position, and I demonstrated how this should be done. Though I was in favor of AB, I maintained objectivity and balance, rather than a position of advocacy one way or another.

Overall the session was very civil, informed, and not one point was made (myriad rhetorical flourishes aside) that was not already on the map. Many variations on similar themes, but nothing that hadn’t been captured.

And followed up with:

BTW, just 30 minutes ago I received an e-mail which said this:


I love the map of the issues around Australian Ballot that you sent out. Is there an easy way to make such a map? We are tackling some issues that have our faculty at Randolph Union High School pretty evenly split and I think two such maps would be a powerful way for my colleague and I who are leading this change to communicate. It looks as if it was created in PowerPoint. If you are too busy to elaborate that’s fine too.

Thanks for your leadership on the Australian Ballot issue. I appreciate it.

“As soon as you start thinking probabilities, all kinds of things change. You’ll prepare for risks you disregarded before. You’ll listen to people you disagreed with before. You won’t be surprised when a recession or a bear market that no one predicted occurs. All of this makes you better at handling and navigating the future — which is the point of forecasting in the first place.”

From a good piece in the Motley Fool called Maybe Yes, Maybe No.  Core idea is pretty much the same as in my newspaper piece Do you hold a Bayesian or a Boolean worldview?

See also rba.tips – a site designed in part to help people better understand probabilistic thinking.

Are Treasury forecasts credible?

In a recent opinion piece economics columnist Ross Gittins defended Treasury on the grounds that:

  • It assesses its own performance and publishes the results
  • It bases its forecasts on reasonable assumptions and sophisticated modeling
  • Its critics don’t do either of these things

But still, are they credible?  What does “credible” mean?

As evidence of Treasury’s credibility Gittins points to a new section of the recent Budget Papers, Statement 7: Forecasting Performance and Scenario Analysis, in which Treasury describes its own performance.

Here is the first data displayed in that Statement:


It shows Treasury GDP growth forecasts (dots) and actual GDP growth (columns). Eyeballing the chart suggests that Treasury are usually out by half a percent or more, and sometimes much more.  But then, economic forecasting is notoriously difficult.  Is this performance good or bad or OK?  How do we tell?

One way is to compare against simple benchmarks.  For example, what if I were to “compete” with Treasury by forecasting that growth one year will be the same as growth in the previous year?  Clearly some years I’d do well, and some years I’d be way out.  Would I be on average more or less accurate than Treasury? You can’t tell just by looking, but it can be calculated easily enough.

I took the data from the chart and calculated the mean squared error for Treasury forecasts vs actual growth, and for two alternative “naive” strategies vs actual growth – the one described in the previous paragraph, and another where growth next year is forecast to be equal to the average of growth in all the previous years.  Here are the results:

Treasury: 1.12
Naive 1: 1.22
Naive 2: 0.93

Apparently, Treasury is doing about the same as one dumb strategy (same as last year) and worse than another (average of prior years).  (Note that for mean squared error measures, lower is better.)
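For anyone who wants to replicate this kind of check, here is a minimal sketch in Python. The growth and forecast series below are illustrative placeholders, not the figures I took from the Budget Papers chart; substitute the real numbers to reproduce the calculation.

```python
# Sketch of the benchmark comparison described above.  The 'actual' and
# 'treasury' series are illustrative placeholders, not the chart data.

def mse(forecasts, actuals):
    """Mean squared error between a list of forecasts and the actual outcomes."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)

actual   = [3.1, 3.8, 2.6, 1.4, 2.3, 3.9, 2.5]   # annual GDP growth, % (placeholder)
treasury = [3.0, 3.0, 3.5, 2.8, 3.0, 2.8, 3.3]   # Treasury forecasts, % (placeholder)

# Naive 1: forecast growth equal to last year's actual growth
naive1 = actual[:-1]
# Naive 2: forecast growth equal to the average of growth in all previous years
naive2 = [sum(actual[:i]) / i for i in range(1, len(actual))]

# Compare over years 2..n so every strategy has at least one prior year to draw on
print("Treasury:", round(mse(treasury[1:], actual[1:]), 2))
print("Naive 1: ", round(mse(naive1, actual[1:]), 2))
print("Naive 2: ", round(mse(naive2, actual[1:]), 2))
```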

In other words, on the face of it, despite all its effort, intelligence, data and modeling, Treasury forecasts GDP growth worse than a simple extrapolation well within the ability of most high school students.

If that’s right, I’d say Treasury forecasts are not credible.

To be sure, this is very rough and ready.  The analysis could be made more sophisticated in all sorts of ways.  The key question however is whether Treasury is managing to outperform simple benchmarks.  If they aren’t demonstrably doing so, why shouldn’t we ignore what they say and just go with the benchmarks?

Interestingly, Statement 7 of the Budget Papers makes no comparison with simple benchmarks.  They tell us how well they did, and provide a long list of reasons why they think doing as well as they did is pretty hard. They don’t tell us how well they would have done if they had used some other less expensive strategy.

The other thing they don’t tell us is how their forecasts compare with how good it is possible to be.  The implicit claim is that their forecasts are as good as anyone could do, but this is far from obvious.

I’m raising these points not to condemn Treasury forecasts but to throw out some challenges.

First, Treasury should compare how it is performing against simple benchmarks.  Indeed, I’d be surprised if they weren’t doing this already in some cubicle somewhere. They should make the results easily accessible to the public.

Second, Treasury and its critics should enter into independently-run public forecasting competitions.  These competitions should be open to anyone who’d like to try their hand at it, using whatever methods or data they like. Such competitions would be the best way to establish the level of credibility Treasury forecasts really have.

The new site www.rba.tips – a “tipping competition” for RBA interest rate decisions – is an example of the kind of approach that could be used.

Such steps might help make Treasury, in Gittins’ phrase, “the only honest players in this game.”

When is a forecaster performing well?  An increasingly common way to measure this is to use a scoring rule known as the Brier score.

The essential idea behind the Brier score is simple enough: it is the average gap (mean squared difference) between forecast probabilities and actual outcomes. This post tries to explain and motivate the Brier score by “composing” it from some other simple ideas about forecasting quality, unlike many presentations which start with the Brier score and then show how it can be decomposed. There is nothing surprising here for anyone well-versed in these topics, but others who (like me) are just beginning to explore these ideas might find the post helpful.

I’ll use a very small real-world dataset, a set of predictions about what the Reserve Bank of Australia will decide about interest rates at its monthly meetings.  The RBA generally leaves interest rates unchanged, but sometimes raises them and sometimes lowers them, depending on economic conditions.  The dataset consists of predictions implicit in the assessments of the ANU RBA Shadow Board, as found on the website RBA.Tips.  To keep things simple, the dataset reduces the predictions to binary outcomes – Change or No Change – and provides a numerical estimate of the probability of No Change.


The “Coded Outcome” column just translates the RBA’s decision into numbers – 1 for No Change, and 0 for Change.  This makes it possible to do the calculations described below.


One obvious thing about this dataset is that more often than not, there is No Change.  In this small sample, the RBA made no change 5/7 times or 71.4% of the time, which as it happens is quite close to the long term (1990-2015) average or overall base rate of 75%.  In other words, there isn’t a lot of uncertainty in the outcomes being predicted. Conversely, uncertainty would also be low if the RBA almost always changed the interest rate.  A simple way to put a single number on the uncertainty of either of these flavors is to take the base rate and multiply it by 1 minus itself, i.e.

Uncertainty = base rate * (1 – base rate).

For this dataset, Uncertainty is 0.714 * (1 – 0.714) = 0.204
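In code, the calculation is a one-liner. This minimal sketch uses the seven coded outcomes from the table above, in the monthly order discussed below:

```python
# Uncertainty for the seven RBA decisions (1 = No Change, 0 = Change), Oct-14 to May-15
outcomes = [1, 1, 1, 0, 1, 1, 0]

base_rate = sum(outcomes) / len(outcomes)     # 5/7, approx 0.714
uncertainty = base_rate * (1 - base_rate)     # approx 0.204
print(round(base_rate, 3), round(uncertainty, 3))
```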

This relative lack of uncertainty means that an attractive forecasting strategy would be to simply go with the base rate, i.e. always predict that the RBA will do whatever it does most often.  How well would such a forecaster do? A simple way to measure this is in terms of hits and misses.  For the period above, the base rate strategy would yield five hits and two misses out of a total of seven predictions, i.e. a hit rate of 5/7 = 71%.   Over a long period, this ratio should converge on the base rate – as long, that is, as variations in economic conditions, and RBA decision making, tend in future to be similar to what they were in the past when the base rate was determined.

The base rate strategy has some advantages (assuming you can access the base rate information).  First, it is better than the most naive approach, which would be to pick randomly, or assign equal probabilities to the possible outcomes.  Second, it is easy; you don’t have to know much or think hard about economic conditions and the interplay between those and RBA decision making. The downside is that over the long term you can’t do better than the base rate, and you can’t do better than anyone else who is also using the base rate strategy.  If you’re ambitious or competitive or just take pride in good work,  you’ll need to make predictions which are more sensitive to the underlying probabilities of the outcomes – i.e. more likely to predict No Change when no change is more likely, and vice versa for change.

This can be seen in our simple dataset.  Crude inspection suggests the predictions fall into two groups or “bins”.  From Oct-14 to Dec-14 the probabilities assigned to No Change were all 70% or above, and over this period, interest rates in fact never changed.  From Feb-15 to May-15, the probabilities were lower, in the 60-70% range, and twice there was in fact a change.   It seems that whoever made these predictions believed that economic conditions made a change more likely in 2015 than in late 2014, and they adjusted their predictions accordingly.  Note that they had two misses in 2015, suggesting that their probabilities had not been reduced sufficiently.  But intuitively the “miss” predictions were not quite as off-the-mark as they would have been if the probabilities had been at the higher 2014 level – an idea captured by the Brier score.


So in general a good forecaster will not make the same forecast regardless of circumstances but rather will have “horses for courses,” i.e. different forecasts when the actual probabilities of various outcomes are different.  Can we measure the extent to which a forecaster is doing this?  One way to do it is:

  • Put the forecasts into groups or bins with the same forecast probability
  • For each bin, measure how different the outcome rate for predictions in the bin – the “bin base rate” – is from the overall base rate.
  • Add up these differences

Let’s see how this goes with our dataset.  Suppose we have two bins, the 70s (2014) bin and the 60s (2015) bin.  For forecasts in the 70s bin, the outcomes were all No Change, so the bin base rate is 1.  For the 60s bin, the bin base rate is 2/4 = 0.5.  So we get:

70s bin: 1 (the bin base rate) – 0.714 (the overall base rate) = 0.286
60s bin: 0.5 – 0.714 = -0.214

Before we just add up these differences, we need to square them to make sure they’re both positive, and then “weight” them by the number of forecasts in each bin:

0.286^2 * 3 = 0.245
(-0.214)^2 * 4 = 0.183

Then we add them and divide by the total number of forecasts (7), to get 0.061.

This number is known as the Resolution of the forecast set.  The higher the Resolution the better; a forecaster with higher Resolution is making forecasts which are more different to the overall base rate than a forecaster with a lower score, and in that sense more interesting or bold.
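Here is the same Resolution calculation as a short Python sketch, using the two bins just described:

```python
# Resolution: forecast-count-weighted mean squared difference between each
# bin's base rate and the overall base rate.
bins = {
    "70s (2014)": [1, 1, 1],        # outcomes for forecasts in the 70s bin
    "60s (2015)": [0, 1, 1, 0],     # outcomes for forecasts in the 60s bin
}

all_outcomes = [o for outs in bins.values() for o in outs]
base_rate = sum(all_outcomes) / len(all_outcomes)          # approx 0.714

resolution = sum(
    len(outs) * (sum(outs) / len(outs) - base_rate) ** 2
    for outs in bins.values()
) / len(all_outcomes)
print(round(resolution, 3))                                # approx 0.061
```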


In order to define resolution we had to sort forecasts into probability-of-outcome bins.  A natural question to ask is how well these bins correspond to the rate at which outcomes actually occur.  Consider for example the 70s bin.  Forecasts in that bin predict No Change with a probability, on average, of 70.67%.  Does the RBA choose No Change 70.67% of the time in those months? No; it decided No Change 100% of the time.  So there’s a mismatch between forecast probabilities and outcome rates.  Since the latter is higher, we call the forecasts underconfident; the probabilities should have been higher.

Similarly forecasts in the 60s bin predicted No Change with probability (on average) 66%, but the RBA in fact made no change only half the time.  Since 0.66 is larger than 0.5, we call this overconfidence.

Calibration is the term used to describe the alignment between forecast probabilities and outcome rates.  Calibration is usually illustrated with a chart like this:


The orange line represents a hypothetical forecaster with perfect calibration, i.e. where the observed rate for every bin is exactly the same as the forecast probability defining that bin; the orange dots represent hypothetical bins with probabilities 0, 0.1, 0.2, etc..  The two bins from our dataset are shown as blue dots.  The 70s bin is out to the left of the line, indicating underconfidence; vice versa for the 60s bin.


So it seems our forecaster is not particularly well calibrated  (though be aware that we are dealing with a tiny dataset where luck of the draw can have undue effects). Can we quantify the level of calibration shown by the forecaster in a particular set of forecasts? Yes, using an approach very similar to the calculation in the previous section.  There we took the mean (average) squared difference between bin base rates and the overall base rate.  To quantify calibration, we take the mean squared difference between bin probability and bin base rate.  If that sounds cryptic, let’s walk through the numbers.

For the 70s bin, the average forecast probability was 70.67%, and the bin base rate was 1, so the squared difference is

(0.7067 – 1)^2 = 0.086

Similarly for the 60s bin:

(0.66 – 0.5)^2 = 0.026

Multiply each of these by the number of forecasts in the bin:

0.086 * 3 = 0.258
0.026 * 4 = 0.102

Add these together and divide by the total number of forecasts, to get 0.052.  This, as you guessed, is called the Reliability of the forecast set.  Note however that Reliability is good when the mean squared difference is small, so the lower the Reliability score the better – unlike Resolution, where higher is better.
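And again, the same Reliability calculation as a short sketch. (Computed without rounding the intermediate steps, the result is approximately 0.051 rather than 0.052; the small difference is just rounding.)

```python
# Reliability: forecast-count-weighted mean squared difference between each
# bin's average forecast probability and that bin's base rate.
bins = [
    (0.7067, [1, 1, 1]),      # 70s bin: average forecast probability, outcomes
    (0.66,   [0, 1, 1, 0]),   # 60s bin
]

n = sum(len(outs) for _, outs in bins)        # 7 forecasts in total
reliability = sum(
    len(outs) * (p - sum(outs) / len(outs)) ** 2
    for p, outs in bins
) / n
print(round(reliability, 3))                  # approx 0.051 (0.052 with rounded steps)
```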


Let’s briefly take stock. Our guiding question has been: how good are the forecasts in our little dataset? So far, to get a handle on this we’ve loosely defined four quantities:

  1. Uncertainty in the outcomes.  Uncertainty indicates the degree to which outcomes are unpredictable; it is highest when the base rate is 50%.
  2. Resolution of the forecast set.  This is the degree to which the forecasts fall into subsets with outcome rates different from the overall outcome base rate, calculated as mean squared difference.
  3. Calibration – the correspondence, on a bin-by-bin basis, between the forecast probabilities and the outcome rates;
  4. Reliability – an overall measure of calibration, calculated as the mean squared difference between forecast bin probabilities and outcome rates – or in other words, mean squared calibration.

Brier Score

But wouldn’t it be good if we could somehow capture all this in a single, goodness-of-forecasts number?  That’s what the Brier score does.  The Brier score is yet another mean squared difference measure, but this time it compares forecast probabilities with outcomes on a forecast-by-forecast basis.  In other words, for each forecast, subtract the outcome (coded as 1 or 0) from the forecast probability and square the result; add up all the results and divide by the total number of forecasts.  For our little dataset we get:

(0.7067 – 1)^2 = 0.086
(0.7067 – 1)^2 = 0.086
(0.7067 – 1)^2 = 0.086
(0.66 – 0)^2 = 0.436
(0.66 – 1)^2 = 0.116
(0.66 – 1)^2 = 0.116
(0.66 – 0)^2 = 0.436

Add these all up and divide by 7 to get 0.195 – the Brier score for this set of forecasts.  (Note that because we collapsed forecasts into bins with a single forecast probability when calculating Uncertainty, Resolution and Reliability, here we treat each forecast as having its bin’s average probability; this keeps the numbers consistent with the composition below.)

Like Reliability, lower is better for Brier scores; a perfect score is 0.
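The whole calculation in code, again using the bin-average probabilities for each forecast (without rounding the intermediate steps it comes out at approximately 0.194 rather than 0.195):

```python
# Brier score: mean squared difference between forecast probability and
# coded outcome, forecast by forecast.
forecasts = [0.7067, 0.7067, 0.7067, 0.66, 0.66, 0.66, 0.66]
outcomes  = [1, 1, 1, 0, 1, 1, 0]

brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)
print(round(brier, 3))        # approx 0.194 (0.195 with rounded intermediate steps)
```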

Brier Score Composition

It turns out that all these measures are unified by the simple equation

Brier Score = Reliability – Resolution + Uncertainty

or in our numbers

Brier Score = 0.195
Reliability – Resolution + Uncertainty = 0.052 – 0.061 + 0.204 = 0.195

In other words, the Brier score is composed out of Reliability (a measure of Calibration), Resolution, and Uncertainty.
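Putting the pieces together in one short sketch makes the composition easy to check. (Because nothing is rounded until the end, both numbers print as approximately 0.194; the 0.195 and 0.052 quoted above reflect rounding at intermediate steps.)

```python
# Compute Uncertainty, Resolution, Reliability and the Brier score from the
# binned forecasts, and check that the composition holds.
bins = [
    (0.7067, [1, 1, 1]),      # 70s bin: average forecast probability, outcomes
    (0.66,   [0, 1, 1, 0]),   # 60s bin
]

outcomes = [o for _, outs in bins for o in outs]
n = len(outcomes)
base_rate = sum(outcomes) / n

uncertainty = base_rate * (1 - base_rate)
resolution = sum(len(o) * (sum(o) / len(o) - base_rate) ** 2 for _, o in bins) / n
reliability = sum(len(o) * (p - sum(o) / len(o)) ** 2 for p, o in bins) / n
brier = sum((p - out) ** 2 for p, outs in bins for out in outs) / n

print(round(brier, 3))                                        # approx 0.194
print(round(reliability - resolution + uncertainty, 3))       # approx 0.194
```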

The equation above – which can be found in full formulaic glory on the Wikipedia page and in many other places – is attributed to Allan Murphy in a paper published in 1973 in the Journal of Applied Meteorology.  It is usually called the Brier score decomposition, but here I’ve called it the Brier Score Composition because I’ve approached it in a bottom-up way.

Interpreting the Brier Score

As mentioned at the outset, the Brier score is increasingly common as a measure of forecasting performance.  According to Barbara Mellers, a principal researcher in the Good Judgement Project, “This measure of accuracy is central to the question of whether forecasters can perform well over extended periods and what factors predict their success.”

Having followed how the Brier score is built up out of other measures of forecasting quality, we should keep in mind two important points.

  1. One of the Brier score components is Uncertainty, which is a function solely of the outcomes, not of the forecasts.  Greater Uncertainty will push up Brier scores.  This means that a forecaster trying to forecast in a highly uncertain domain will have a higher Brier score than a forecaster of the same skill level tackling a less uncertain domain.  In other words, you can’t directly compare Brier scores unless they are scoring forecasts on the same set of events (or two sets of events with the same Uncertainty).  As a rough rule of thumb, only compare Brier scores if the forecasters were forecasting the same events.
  2. The Brier score is convenient as a single number, but it collapses three other measures.  You can get more insight into a forecaster’s performance if you look not just at the “headline number” – the Brier score – but at all four measures.

“I come not to praise forecasters but to bury them.”  With these unsubtle words, Barry Ritholtz opens an entertaining piece in the Washington Post, expressing a widely held view about forecasting in difficult domains such as geopolitics or financial markets.  The view is that nobody is any good at it, or if anyone is, they can’t be reliably identified.  This hard-line skepticism has seemed warranted by the persistent failure of active fund managers to statistically outperform dart-throwing monkeys, and by Philip Tetlock’s research showing that geopolitical experts do scarcely better than random, and worse than the simplest statistical methods.

More recent research on a range of fronts – notably, by the Good Judgement Project, but also by less well-known groups such as Scicast and ACERA/CEBRA here at Melbourne University – has suggested that a better view is what might be termed “tempered optimism” about expert judgement forecasting. This new attitude acknowledges that forecasting challenges will always fall on a spectrum from the easy to the practically impossible.  However, in some important but difficult domains, hard-line skepticism is too dogmatic.  Rather,

  • There can be forecasting skill;
  • Some people can be remarkably good;
  • Factors conducive to good forecasting have been identified;
  • Forecasting is a skill which can be improved in broadly the same way as other skills;
  • Better forecasts can be obtained by combining forecasts.

A high-level lesson that seems to be emerging is that forecasting depends on culture.  That is, superior forecasting is not a kind of genius possessed (or not) by individuals, but emerges when a group or organisation has the right kinds of common beliefs, practices, and incentives.

The obvious question then is what such a culture is like, and how it can be cultivated.  As part of work for an Australian superannuation fund, I distilled the latest research supporting tempered optimism into seven guidelines for developing a culture of superior forecasting.

  1. Select.  When choosing who to hire – or retain – for your forecasting team, look for individuals with the right kind of mindset.  To a worthwhile extent, mindset can be assessed using objective tests.
  2. Train.  Provide basic training in the fundamentals of forecasting and generic forecasting skills.  A brief training session can improve forecasting performance over a multi-year period.
  3. Track. Carefully document and evaluate predictive accuracy using proper scoring rules.  Provide results to forecasters as helpful feedback.
  4. Team. Group forecasters into small teams who work together, sharing information and debating ideas.
  5. Stream. Put your best forecasters (see Track) into an elite team.
  6. Motivate. Incentives should reward predictive accuracy (see Track) and constructive collaboration.
  7. Combine. Generate group forecasts by appropriately combining individuals’ forecasts, weighting by predictive accuracy (see Track).

The pivotal element here obviously is Track, i.e. measuring predictive accuracy using a proper scoring rule such as the Brier score.  According to Mellers (a key member of the Good Judgement Project) and colleagues, “This measure of accuracy is central to the question of whether forecasters can perform well over extended periods and what factors predict their success.”  Using proper scoring rules requires forecasters to commit to explicit probabilistic predictions, a practice that is common in fields such as weather forecasting where predictions are rigorously assessed, but very rare in domains such as geopolitics and investment.  This relative lack of rigorous assessment is a key enabler – and concealer – of ongoing poor performance.

In current work, we are developing training in generic forecasting skills, and practical methods for using scoring rules to evaluate predictive accuracy in domains such as funds management. Contact me if this might be of interest to your organisation.


I forwarded to Paul Monk a link to this video:

He replied, within minutes:

Truly awesome.

It prompts the thought that the biggest revolutions in worldview have been scientific and have entailed:

1. Moving from the Earth centred (Aristotelian/Biblical) cosmology (which had its counterparts in many tribal myths and the cosmogonies of many other civilizations; though the classical Atomists began to guess at the truth and this was picked up again by Giordano Bruno in the late 16th century, only to get him burned alive in Rome by the Inquisition) to first a heliocentric one, then a Milky Way one, then a Hubble 3D one, as it were, and finally to a multiverse one;

2. Discovering that we are evolved creatures and have a direct biological ancestry going back 3.8 billion years, but on a world that, in much less than that time into the future (regardless of what we do) will become uninhabitable, as the Sun swells to become a red giant and destroys the Goldilocks Zone which makes life on Earth possible;

3. Realizing that we live in and are imbricated in a world of microbes that used to dominate the planet, exist in a highly complex symbiosis with larger life forms, including predation upon them and have played a substantial role in the mass extinctions.

4. Slowly getting to understanding human history from a global and cosmopolitan perspective instead of from narrowly local ones; and

5. Developing the elements of a universal cognitive humanism with the exploration of languages and linguistics, comparative mythology (Levi-Strauss and structuralism) and anthropology (including Durkheim’s Elementary Forms of the Religious Life, about a century ago).

My own worldview, if you like, is that all these things transcend (trump) the epistemological claims of the old religions and mythologies, as well as those even of 19th century political ideologies (to say nothing of crude 20th century ones such as Nazism and Marxism-Leninism). BUT the vast majority of human beings on the planet know almost nothing of all this and certainly have not been able to weave it together into a coherent new, shared, universal worldview for the 21st century.

Just a few thoughts on the run, or rather while viewing Andromeda.

In our consulting work we have periodically been asked to review how judgments or decisions of a particular kind are made within an organisation, and to recommend improvements.  This has taken us to some interesting places, such as the rapid lead assessment center of a national intelligence agency, and recently, meetings of coaches of an elite professional sports team.

On other occasions, we have been asked to assist a group to design and build, more or less from scratch, a process for making a particular decision or set of decisions (e.g., decisions as to what a group should consider itself to collectively believe).

Both types of activity involve thinking hard about what the current/default process is or would be, and what kind of process might work more effectively in a given real-world context, in the light of what academics in fields such as cognitive science and organisational theory have learned over the years.

This sounds a bit like engineering.  My favorite definition of the engineer is somebody who can’t help but think that there must be a better way to do this.  A more comprehensive and workmanlike definition is given by Wikipedia:

Engineering is the application of scientific, economic, social, and practical knowledge in order to invent, design, build, maintain, research, and improve structures, machines, devices, systems, materials and processes.

The activities mentioned above seem to fit this very broad concept: we were engaged to help improve or develop systems – in our case, systems for making decisions.

It is therefore tempting to describe some of what we do as decision engineering.  However this term has been in circulation for some decades now, as shown in this Google n-gram:


and its current meaning or meanings might not be such a good fit with our activities.  So, I set about exploring what the term means “out there”.

As usual in such cases, there doesn’t appear to be any one official, authoritative definition.  Threads appearing in various characterizations include:

  • Bringing standard engineering principles and techniques to bear on making decisions
  • Using more structured decision methods, including the application of decision analysis techniques
  • Basing decisions on “big data” and “data science,” such as predictive analytics

While each such thread clearly highlights something important, my view is that individually they are only part of the story, and collectively are a bit of a dog’s breakfast.  What we need, I think, is a more succinct, more abstract, and more unifying definition.  Here’s an attempt, based on Wikipedia’s definition of engineering:

Decision engineering is applying relevant knowledge to design, build, maintain, and improve systems for making decisions.

Relevant knowledge can include knowledge of at least three kinds:

  • Theoretical knowledge from any relevant field of inquiry;
  • Practical knowledge (know-how, or tacit knowledge) of the decision engineer;
  • “Local” knowledge of the particular context and challenges of decision making, contributed by people already in or familiar with the context, such as the decision makers themselves.

System is of course a very broad term, and for current purposes a system for making decisions, or decision system, is any complex part of the world causally responsible for decisions of a certain category.  Such systems may or may not include humans.  For example, decisions in a Google driverless car would be made by a complex combination of sensors, on-board computing processors, and perhaps elements outside the car such as remote servers.

However the decision processes we have worked on, which might loosely be called organisational decision processes, always involve human judgement at crucial points.  The systems responsible for such decisions include

  • People playing various roles
  • “Norms,” including procedures, guidelines, methods, standards.
  • Supporting technologies ranging from pen and paper through sophisticated computers
  • Various aspects of the environment or context of decision making.

For example, a complex organisational decision system produces the monthly interest rate decisions of the Reserve Bank of Australia, as hinted at in this paragraph from their website:

The formulation of monetary policy is the primary responsibility of the Reserve Bank Board. The Board usually meets eleven times each year, on the first Tuesday of the month except in January. Hence, the dates of meetings are well known in advance. For each meeting, the Bank’s staff prepare a detailed account of developments in the Australian and international economies, and in domestic and international financial markets. The papers contain a recommendation for the policy decision. Senior staff attend the meeting and give presentations. Monetary policy decisions by the Reserve Bank Board are communicated publicly shortly after the conclusion of the meeting.

and described in much more detail in this (surprisingly interesting) 2001 speech by the man who is now Governor of the Reserve Bank.

In most cases, decision engineering means taking an existing system and considering how to improve it.  A system can be better in various ways, including:

  • First and foremost, a better decision hit rate, i.e. a higher proportion of decisions which are correct in the sense of choosing an optimal or at least satisfactory course of action;
  • Greater efficiency, in the sense of using fewer resources or producing decisions more quickly;
  • Greater transparency or defensibility.

Now, in order to improve a particular decision system, a decision engineer might use approaches such as:

  • Bringing standard engineering principles and techniques to bear on making decisions
  • Using more structured decision methods, including the application of decision analysis techniques
  • Basing decisions on “big data” and “data science,” such as predictive analytics

(i.e., the “threads” listed above).  However the usefulness of these approaches will depend very much on the nature of the decision challenges being addressed.  For example, if you want to improve how elite football coaches make decisions in the coaching box on game day, you almost certainly will not introduce highly structured decision methods such as decision trees.

In short, I like this more general definition of decision engineering (in four words or less, building better decision systems) because it seems to get at the essence of what decision engineers do, allowing but not requiring that highly technical, quantitative approaches might be used.  And it accommodates my instinct that much of what we do in our consulting work should indeed count as a kind of engineering.

Whether we would be wise to publicly describe ourselves as decision engineers is however quite another question – one for marketers, not engineers.

