
Anyone familiar with this blog knows that it frequently talks about argument mapping.  This is because, as an applied epistemologist, I’m interested in how we know things.  Often, knowledge is a matter of arguments and evidence.  However, argumentation can get very complicated.  Argument mapping helps our minds cope with that complexity by providing (relatively) simple diagrams.

Often what we are seeking knowledge about is the way the world works, i.e. its causal structure.  This too can be very complex, and so it’s an obvious idea that “causal mapping” – diagramming causal structure – might help in much the same way as argument mapping.  And indeed various kinds of causal diagrams are already widely used for this reason.

What follows is a reflection on explanation, causation, and causal diagramming.  It uses as a springboard a recent post on the blog of the Lowy Institute which offered a causal explanation of the popularity of Russian president Putin.  It also introduces what appears to be a new term – “causal storyboard” – for a particular kind of causal map.


 

In a recent blog post with the ambitious title “Putin’s Popularity Explained,” Matthew Dal Santo argues that Putin’s popularity is not, as many think, due to brainwashing by Russia’s state-controlled media, but to the alignment between Putin’s conservative policies and the conservative yearnings of the Russian public.

Dal Santo dismisses the brainwashing hypothesis on very thin grounds, offering us only “Tellingly, only 34% of Russians say they trust the media.” However, professed trust is only weakly related to actual trust. Australians in surveys almost universally claim to distrust car salesmen, but still place a lot of trust in them when buying a car.

In fact, Dal Santo’s case against the brainwashing account seems to be less a matter of direct evidence than “either/or” reasoning: Putin’s popularity is explained by the conservatism of the public, so it is not explained by brainwashing.

He does not explicitly endorse such a simple model of causal explanation, but he doesn’t reject it either, and it seems to capture the tenor of the post.

The post does contain a flurry of interesting numbers, quotes and speculations, and these can distract us from difficult questions of explanatory adequacy.

The causal story Dal Santo rejects might be diagrammed like this:

[Diagram: the causal story Dal Santo rejects]

The dashed lines indicate the parts of the story he thinks are not true, or at least exaggerated. Instead, he prefers something like:

[Diagram: Dal Santo’s preferred causal story]
However, the true causal story might look more like this:

[Diagram: a fuller causal story, combining media influence with the alignment of government policies and public opinion]

Here Putin’s popularity is partly the result of brainwashing by a government-controlled media, and partly due to “the coincidence of government policies and public opinion.”

The relative thickness of the causal links indicates the differing degrees to which the causal factors are responsible. Often the hardest part of causal explanation is not ruling factors in or out, but estimating the extent to which they contribute to the outcomes of interest.

Note also the link suggesting that a government-controlled media might be responsible, in part, for the conservatism of the public. Dal Santo doesn’t explicitly address this possibility but does note that certain attitudes have remained largely unchanged since 1996. This lack of change might be taken to suggest that the media is not influencing public conservatism. However, it might also be the dog that isn’t barking. One of the more difficult aspects of identifying and assessing causal relationships is thinking counterfactually. If the media had been free and open, perhaps the Russian public would have become much less conservative. The government-controlled media may have been effective in counteracting that trend.

The graphics above are examples of what I’ve started calling causal storyboards. (Surprisingly, at time of writing this phrase turns up zero results on a Google search.) Such diagrams represent webs of events and states and their causal dependencies – crudely, “what caused what.”

For aficionados, causal storyboards are not causal loop diagrams or cognitive maps or system models, all of which represent variables and their causal relationships.  Causal loop diagrams and their kin describe general causal structure which might govern many different causal histories depending on initial conditions and exogenous inputs.  A causal storyboard depicts a particular (actual or possible) causal history – the “chain” of states and events.  It is an aid for somebody who is trying to understand and reason about a complex situation, not a precursor to a quantitative model.

Our emerging causal storyboard surely does not yet capture the full causal history behind Putin’s popularity. For example it does not incorporate any additional factors, such as his reputed charisma. Nor does it trace the causal pathways very far back. To fully understand Putin’s popularity, we need to know why (not merely that) the Russian public is so conservative.

The causal history may become very complex. In his 2002 book Friendly Fire, Scott Snook attempts to uncover all the antecedents of a tragic incident in 1994 when two US fighter jets shot down two US Army helicopters. There were dozens of factors, intricately interconnected. To help us appreciate and understand this complexity, Snook produced a compact causal storyboard:

[Diagram: Snook’s causal map of the 1994 friendly fire incident]

To fully explain is to delineate causal history as comprehensively and accurately as possible. However, full explanations in this sense are often not available. Even when they are, they may be too complex and detailed. We often need to zero in on some aspect of the causal situation which is particularly unusual, salient, or important.

There is thus a derivative or simplified notion of explanation in which we highlight some particular causal factor, or small number of factors, as “the” cause. The Challenger explosion was caused by O-ring leaks. The cause of Tony Abbott’s fall was his low polling figures.

As Runde and de Rond point out, explanation in this sense is a pragmatic business. The appropriate choice of cause depends on what is being explained, to whom, by whom, and for what purpose.

In an insightful discussion of Scott Snook’s work, Gary Klein suggests that we should focus on two dimensions: a causal factor’s impact, and the ease with which that factor might have been negated, or could be negated in future. He uses the term “causal landscape” for a causal storyboard analysed using these factors. He says: “The causal landscape is a hybrid explanatory form that attempts to get the best of both worlds. It portrays the complex range and interconnection of causes and identifies a few of the most important causes. Without reducing some of the complexity we’d be confused about how to act.”

This all suggests that causes and explanations are not always the same thing. It can make sense to say that an event is caused by some factor, but not fully explained by that factor. O-ring failure caused the Challenger explosion, but only partially explains it.

More broadly, it suggests a certain kind of anti-realism about causes. The world and all its causal complexity may be objectively real, but causes – what we focus on when providing brief explanations – are in significant measure up to us. Causes are negotiated as much as they are discovered.

What does this imply for how we should evaluate succinct causal explanations such as Dal Santo’s? Two recommendations come to mind.

First, a proposed cause might be ill-chosen because it has been selected from an underdeveloped causal history. To determine whether we should go along, we should try to understand the full causal context – a causal storyboard may be useful for this – and why the proposed factor has been selected as the cause.

Second, we should be aware that causal explanation can itself be a political act. Smoking-related lung cancer might be said to be caused by tobacco companies, by cigarette smoke, or by smokers’ free choices, depending on who is doing the explaining, to whom, and why. Causal explanation seems like the uncovering of facts, but it may equally be the revealing of agendas.

A colleague, Todd Sears, recently wrote:

I thought I’d write to let you know that I used an argument map last night to inform a public conversation about whether to change our school budget voting system from what it is (one meeting and you have to be physically present to vote), to the (of all things!) Australian Ballot system (secret ballot, polls open all day, and absentee ballots available).

So, I went through the articles, editorials, and opinion pieces I could find on the matter and collapsed those into a pretty simple argument, which it is. Simple reasoning boxes get the job done.  Our voters had never really seen this kind of visualization.  It’s nice to be able to see an argument exist in space, and to demonstrate by pointing and framing that a “yea” vote needs to buy into the green points, but also that they need to reconcile the red points, somehow. It had very good response.

Ultimately, the AB motion was defeated by five votes.  Still, it was a good example of a calm, reasonable, and civil dialogue.  A nice change from the typical vitriol and partisan sniping.

Here is his map:

[Argument map: the Australian Ballot debate]

When I suggested that readers of this blog might find his account interesting or useful, he added:

Let me clarify what I did because it wasn’t a classic facilitation.

1. I reviewed all of the on-line Vermont-centric AB content I could find in the more reputable news sources, and put a specific emphasis on getting the viewpoints of the more vociferous anti-AB folks in my town so that I could fairly represent them.

2. I created a map from that information and structured it in a way that spread out the lines of reasoning in an easily understandable way. I could have done some further abstraction and restructured things, or made assumptions explicit using the “Advanced” mode, but chose to focus on easily recognized reasoning chains.

3. I sent the map out to the entire school board, the administrators, a couple of politicians, the anti-AB folks and some of the other more politically engaged people in town.

4. The session was moderated by the town Moderator, who set out Robert’s Rules of Order. Then discussion began. In fact, the first anti-AB speaker had my map in his hand and acknowledged the balance and strength of both sides of the argument.

5. I let the session run its course, and then explained what I did and how I did it, and then reviewed the Green and Red lines of the debate, explaining that a vote for or against means that the due diligence has to be done in addressing the points counter to your own position, and I demonstrated how this should be done. Though I was in favor of AB, I maintained objectivity and balance, rather than a position of advocacy one way or another.

Overall the session was very civil, informed, and not one point was made (myriad rhetorical flourishes aside) that was not already on the map. Many variations on similar themes, but nothing that hadn’t been captured.

And followed up with:

BTW, just 30 minutes ago I received an e-mail which said this:

Hi,

I love the map of the issues around Australian Ballot that you sent out. Is there an easy way to make such a map? We are tackling some issues that have our faculty at Randolph Union High School pretty evenly split and I think two such maps would be a powerful way for my colleague and I who are leading this change to communicate. It looks as if it was created in PowerPoint. If you are too busy to elaborate that’s fine too.

Thanks for your leadership on the Australian Ballot issue. I appreciate it.

“As soon as you start thinking probabilities, all kinds of things change. You’ll prepare for risks you disregarded before. You’ll listen to people you disagreed with before. You won’t be surprised when a recession or a bear market that no one predicted occurs. All of this makes you better at handling and navigating the future — which is the point of forecasting in the first place.”

From a good piece in the Motley Fool called Maybe Yes, Maybe No.  The core idea is pretty much the same as in my newspaper piece Do you hold a Bayesian or a Boolean worldview?

See also rba.tips – a site designed in part to help people better understand probabilistic thinking.

Are Treasury forecasts credible?

In a recent opinion piece economics columnist Ross Gittins defended Treasury on the grounds that:

  • It assesses its own performance and publishes the results
  • It bases its forecasts on reasonable assumptions and sophisticated modeling
  • Its critics don’t do either of these things

But still, are they credible?  What does “credible” mean?

As evidence of Treasury’s credibility Gittins points to a new section of the recent Budget Papers, Statement 7: Forecasting Performance and Scenario Analysis, in which Treasury describes its own performance.

Here is the first data displayed in that Statement:

[Chart: Treasury GDP growth forecasts (dots) vs actual GDP growth (columns), from Statement 7]

It shows Treasury GDP growth forecasts (dots) and actual GDP growth (columns). Eyeballing the chart suggests that Treasury are usually out by half a percent or more, and sometimes much more.  But then, economic forecasting is notoriously difficult.  Is this performance good or bad or OK?  How do we tell?

One way is to compare against simple benchmarks.  For example, what if I were to “compete” with Treasury by forecasting that growth one year will be the same as growth in the previous year?  Clearly some years I’d do well, and some years I’d be way out.  Would I be on average more or less accurate than Treasury? You can’t tell just by looking, but it can be calculated easily enough.

I took the data from the chart and calculated the mean squared error for Treasury forecasts vs actual growth, and for two alternative “naive” strategies vs actual growth – the one described in the previous paragraph, and another where growth next year is forecast to be equal to the average of growth in all the previous years.  Here are the results:

Treasury: 1.12
Naive 1: 1.22
Naive 2: 0.93

Apparently, Treasury is doing about the same as one dumb strategy (same as last year) and worse than the other (average of prior years).  (Note that for mean squared error measures, low is good.)

In other words, on the face of it, despite all the effort, intelligence, data and modeling, Treasury forecasts GDP growth worse than a simple extrapolation well within the ability of most high school students.

If that’s right, I’d say Treasury forecasts are not credible.

To be sure, this is very rough and ready.  The analysis could be made more sophisticated in all sorts of ways.  The key question however is whether Treasury is managing to outperform simple benchmarks.  If they aren’t demonstrably doing so, why shouldn’t we ignore what they say and just go with the benchmarks?
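For anyone who would like to reproduce this rough-and-ready comparison, here is a minimal Python sketch of the calculation. The growth figures in it are placeholders only – substitute the actual and forecast values read off Chart 1 before drawing any conclusions.

# Minimal sketch of the benchmark comparison (not the script actually used).
# The lists below are illustrative placeholders; replace them with the actual
# and forecast GDP growth figures read off Chart 1 of Statement 7.
actual   = [4.0, 2.5, 3.0, 1.5, 2.5, 3.5]   # actual GDP growth, % (placeholders)
forecast = [3.0, 3.0, 2.5, 3.0, 2.0, 3.0]   # Treasury forecasts, % (placeholders)

def mse(preds, obs):
    # mean squared error between predictions and observations
    return sum((p - o) ** 2 for p, o in zip(preds, obs)) / len(obs)

# Naive 1: forecast that growth will be the same as the previous year's growth.
naive1 = actual[:-1]

# Naive 2: forecast that growth will equal the average of growth in all previous years.
naive2 = [sum(actual[:i]) / i for i in range(1, len(actual))]

# Compare against actual outcomes from the second year onwards, since the
# naive strategies need at least one prior year to work with.
print("Treasury:", round(mse(forecast[1:], actual[1:]), 2))
print("Naive 1: ", round(mse(naive1, actual[1:]), 2))
print("Naive 2: ", round(mse(naive2, actual[1:]), 2))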

Interestingly, Statement 7 of the Budget Papers makes no comparison with simple benchmarks.  They tell us how well they did, and provide a long list of reasons why they think doing as well as they did is pretty hard. They don’t tell us how well they would have done if they had used some other less expensive strategy.

The other thing they don’t tell us is how their forecasts compare with how good it is possible to be.  The implicit claim is that their forecasts are as good as anyone could do, but this is far from obvious.

I’m raising these points not to condemn Treasury forecasts but to throw out some challenges.

First, Treasury should compare how it is performing against simple benchmarks.  Indeed, I’d be surprised if they weren’t doing this already in some cubicle somewhere. They should make the results easily accessible to the public.

Second, Treasury and its critics should enter into independently-run public forecasting competitions.  These competitions should be open to anyone who’d like to try their hand at it, using whatever methods or data they like. Such competitions would be the best way to establish the level of credibility Treasury forecasts really have.

The new site www.rba.tips – a “tipping competition” for RBA interest rate decisions – is an example of the kind of approach that could be used.

Such steps might help make Treasury, in Gittins’ phrase, “the only honest players in this game.”

When is a forecaster performing well?  An increasingly common way to measure this is to use a scoring rule known as the Brier score.

The essential idea behind the Brier score is simple enough: it is the average gap (mean squared difference) between forecast probabilities and actual outcomes. This post tries to explain and motivate the Brier score by “composing” it from some other simple ideas about forecasting quality, unlike many presentations which start with the Brier score and then show how it can be decomposed. There is nothing surprising here for anyone well-versed in these topics, but others who (like me) are just beginning to explore these ideas might find the post helpful.

I’ll use a very small real-world dataset, a set of predictions about what the Reserve Bank of Australia will decide about interest rates at its monthly meetings.  The RBA generally leaves interest rates unchanged, but sometimes raises them and sometimes lowers them, depending on economic conditions.  The dataset consists of predictions implicit in the assessments of the ANU RBA Shadow Board, as found on the website RBA.Tips.  To keep things simple, the dataset reduces the predictions to binary outcomes – Change or No Change – and provides a numerical estimate of the probability of No Change.

[Table: ANU RBA Shadow Board predictions, probability of No Change, coded outcomes, and actual RBA decisions]

The “Coded Outcome” column just translates the RBA’s decision into numbers – 1 for No Change, and 0 for Change.  This makes it possible to do the calculations described below.

Uncertainty

One obvious thing about this dataset is that more often than not, there is No Change.  In this small sample, the RBA made no change 5/7 times or 71.4% of the time, which as it happens is quite close to the long term (1990-2015) average or overall base rate of 75%.  In other words, there isn’t a lot of uncertainty in the outcomes being predicted. Conversely, uncertainty would also be low if the RBA almost always changed the interest rate.  A simple way to put a single number on the uncertainty of either of these flavors is to take the base rate and multiply it by 1 minus itself, i.e.

Uncertainty = base rate * (1 – base rate).

For this dataset, Uncertainty is 0.714 * (1 – 0.714) = 0.204
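In code, a minimal sketch using the coded outcomes from the table above:

# Uncertainty = base rate * (1 - base rate), with outcomes coded as in the post:
# 1 = No Change, 0 = Change, for the seven meetings (Oct-14 to Dec-14 and Feb-15 to May-15).
outcomes = [1, 1, 1, 0, 1, 1, 0]

base_rate = sum(outcomes) / len(outcomes)    # 5/7, approx 0.714
uncertainty = base_rate * (1 - base_rate)    # approx 0.204
print(round(base_rate, 3), round(uncertainty, 3))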

This relative lack of uncertainty means that an attractive forecasting strategy would be to simply go with the base rate, i.e. always predicting that the RBA will do whatever it does most often.  How well would such a forecaster do? A simple way to measure this is in terms of hits and misses.  For the period above, the base-rate strategy would yield five hits and two misses out of a total of seven predictions, i.e. a hit rate of 5/7 = 71%.   Over a long period, this ratio should converge on the base rate – as long, that is, as variations in economic conditions, and RBA decision making, tend in future to be similar to what they were in the past when the base rate was being determined.
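As a quick sketch of how that strategy fares on our seven meetings:

# The base-rate strategy: always predict the most common outcome (No Change = 1).
outcomes = [1, 1, 1, 0, 1, 1, 0]
prediction = 1

hits = sum(1 for o in outcomes if o == prediction)
misses = len(outcomes) - hits
print(hits, misses, round(hits / len(outcomes), 3))   # 5 hits, 2 misses, hit rate approx 0.714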

The base rate strategy has some advantages (assuming you can access the base rate information).  First, it is better than the most naive approach, which would be to pick randomly, or assign equal probabilities to the possible outcomes.  Second, it is easy; you don’t have to know much or think hard about economic conditions and the interplay between those and RBA decision making. The downside is that over the long term you can’t do better than the base rate, and you can’t do better than anyone else who is also using the base rate strategy.  If you’re ambitious or competitive or just take pride in good work, you’ll need to make predictions which are more sensitive to the underlying probabilities of the outcomes – i.e. more likely to predict No Change when no change is more likely, and vice versa for change.

This can be seen in our simple dataset.  Crude inspection suggests the predictions fall into two groups or “bins”.  From Oct-14 to Dec-14 the probabilities assigned to No Change were all 70% or above, and over this period, interest rates in fact never changed.  From Feb-15 to May-15, the probabilities were lower, in the 60-70% range, and twice there was in fact a change.   It seems that whoever made these predictions believed that the economic conditions made a change more likely in 2015 than it was in late 2014, and they correctly adjusted their predictions accordingly.  Note that they had two misses in 2015, suggesting that their probabilities had not been reduced sufficiently.  But intuitively the “miss” predictions were not quite as off-the-mark as they would have been if the probabilities had been at the higher 2014 level – an idea captured by the Brier score.

Resolution

So in general a good forecaster will not make the same forecast regardless of circumstances but rather will have “horses for courses,” i.e. different forecasts when the actual probabilities of various outcomes are different.  Can we measure the extent to which a forecaster is doing this?  One way to do it is:

  • Put the forecasts into groups or bins with the same forecast probability
  • For each bin, measure how different the outcome rate for predictions in the bin – the “bin base rate” – is from the overall base rate.
  • Add up these differences

Let’s see how this goes with our dataset.  Suppose we have two bins, the 70s (2014) bin and the 60s (2015) bin.  For forecasts in the 70s bin, the outcomes were all No Change, so the bin base rate is 1.  For the 60s bin, the bin base rate is 2/4 = 0.5.  So we get:

70s bin: 1 (the bin base rate) – 0.714 (the overall base rate) = 0.286
60s bin: 0.5 – 0.714 = -0.214

Before we just add up these differences, we need to square them to make sure they’re both positive, and then “weight” them by the number of forecasts in each bin:

0.286^2 * 3 = 0.245
(-0.214)^2 * 4 = 0.183

Then we add them and divide by the total number of forecasts (7), to get 0.061.

This number is known as the Resolution of the forecast set.  The higher the Resolution the better; a forecaster with higher Resolution is making forecasts which are more different from the overall base rate than a forecaster with a lower score, and in that sense more interesting or bold.
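Here is the same Resolution calculation as a small Python sketch, with the forecasts grouped into the 70s and 60s bins used above:

# Resolution: weighted mean squared difference between each bin's base rate
# and the overall base rate, divided by the total number of forecasts.
outcomes = [1, 1, 1, 0, 1, 1, 0]                  # 1 = No Change, 0 = Change
bins = {"70s": [0, 1, 2], "60s": [3, 4, 5, 6]}    # indices of forecasts in each bin

n = len(outcomes)
base_rate = sum(outcomes) / n

resolution = 0.0
for label, idx in bins.items():
    bin_base_rate = sum(outcomes[i] for i in idx) / len(idx)
    resolution += len(idx) * (bin_base_rate - base_rate) ** 2
resolution /= n

print(round(resolution, 3))   # approx 0.061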

Calibration

In order to define resolution we had to sort forecasts into probability-of-outcome bins.  A natural question to ask is how well these bins correspond to the rate at which outcomes actually occur.  Consider for example the 70s bin.  Forecasts in that bin predict No Change with a probability, on average, of 70.67%.  Does the RBA choose No Change 70.67% of the time in those months? No; it decided No Change 100% of the time.  So there’s a mismatch between forecast probabilities and outcome rates.  Since the latter is higher, we call the forecasts underconfident; the probabilities should have been higher.

Similarly forecasts in the 60s bin predicted No Change with probability (on average) 66%, but the RBA in fact made no change only half the time.  Since .66 is larger than 0.5, we call this overconfidence.

Calibration is the term used to describe the alignment between forecast probabilities and outcome rates.  Calibration is usually illustrated with a chart like this:

[Chart: calibration plot – forecast probability vs observed outcome rate]

The orange line represents a hypothetical forecaster with perfect calibration, i.e. where the observed rate for every bin is exactly the same as the forecast probability defining that bin; the orange dots represent hypothetical bins with probabilities 0, 0.1, 0.2, etc..  The two bins from our dataset are shown as blue dots.  The 70s bin is out to the left of the line, indicating underconfidence; vice versa for the 60s bin.
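A chart along these lines can be sketched with matplotlib. Note that the orientation used here – forecast probability on the horizontal axis, observed rate on the vertical – is the conventional one and is an assumption on my part; the chart above may be drawn with the axes the other way around.

# Sketch of a calibration chart: a perfectly calibrated forecaster (orange line)
# plus the two bins from our dataset (blue dots).
import matplotlib.pyplot as plt

perfect = [i / 10 for i in range(11)]
plt.plot(perfect, perfect, "o-", color="orange", label="perfect calibration")

bin_probs = [0.7067, 0.66]   # average forecast probability of No Change in each bin
bin_rates = [1.0, 0.5]       # observed rate of No Change in each bin
plt.plot(bin_probs, bin_rates, "o", color="blue", label="our two bins")

plt.xlabel("forecast probability of No Change")
plt.ylabel("observed rate of No Change")
plt.legend()
plt.show()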

Reliability

So it seems our forecaster is not particularly well calibrated (though be aware that we are dealing with a tiny dataset where luck of the draw can have undue effects). Can we quantify the level of calibration shown by the forecaster in a particular set of forecasts? Yes, using an approach very similar to the calculation in the previous section.  There we took the mean (average) squared difference between bin base rates and overall base rates.  To quantify calibration, we take the mean squared difference between bin probability and bin base rate.  If that sounds cryptic, let’s walk through the numbers.

For the 70s bin, the average forecast probability was 70.67%, and the bin base rate was 1, so the squared difference is

(.7067 – 1)^2 = 0.086

Similarly for the 60s bin:

(0.66 – 0.5)^2 = .026

Multiply each of these by the number of forecasts in the bin:

0.086 * 3 = 0.258
.026 * 4 = 0.102

Add these together and divide by the total number of forecasts, to get 0.052.  This, as you guessed, is called the Reliability of the forecast set.  Note however that Reliability is good when the mean squared difference is minimized, so the lower the reliability score, the better, unlike Resolution where higher is better.
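In code, the calculation has the same shape as Resolution, but compares bin probabilities with bin base rates:

# Reliability: weighted mean squared difference between each bin's average
# forecast probability and that bin's base rate (lower is better).
forecasts = [0.7067, 0.7067, 0.7067, 0.66, 0.66, 0.66, 0.66]
outcomes  = [1, 1, 1, 0, 1, 1, 0]
bins = {"70s": [0, 1, 2], "60s": [3, 4, 5, 6]}

n = len(outcomes)
reliability = 0.0
for label, idx in bins.items():
    bin_prob = sum(forecasts[i] for i in idx) / len(idx)
    bin_base_rate = sum(outcomes[i] for i in idx) / len(idx)
    reliability += len(idx) * (bin_prob - bin_base_rate) ** 2
reliability /= n

print(round(reliability, 4))   # approx 0.0515 (0.052 above, after rounding)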

Recap

Let’s briefly take stock. Our guiding question has been: how good are the forecasts in our little dataset? So far, to get a handle on this we’ve loosely defined four quantities:

  1. Uncertainty in the outcomes.  Uncertainty indicates the degree to which outcomes are predictable.
  2. Resolution of the forecast set.  This is the degree to which the forecasts fall into subsets with outcome rates different from the overall outcome base rate, calculated as mean squared difference.
  3. Calibration – the correspondence, on a bin-by-bin basis, between the forecast probabilities and the outcome rates;
  4. Reliability – an overall measure of calibration, calculated as the mean squared difference between forecast bin probabilities and outcome rates – or in other words, mean squared calibration error.

Brier Score

But wouldn’t it be good if we could somehow capture all this in a single, goodness-of-forecasts number?  That’s what the Brier score does.  The Brier score is yet another mean squared difference measure, but this time it compares forecast probabilities with outcomes on a forecast-by-forecast basis.  In other words, for each forecast, subtract the outcome (coded as 1 or 0) from the forecast probability and square the result; add up all the results and divide by the total number of forecasts.  For our little dataset we get

(0.7067 – 1)^2 = 0.086
(0.7067 – 1)^2 = 0.086
(0.7067 – 1)^2 = 0.086
(0.66 – 0)^2 = 0.436
(0.66 – 1)^2 = 0.116
(0.66 – 1)^2 = 0.116
(0.66 – 0)^2 = 0.436

Add these all up and divide by 7 to get 0.195 – the Brier Score for this set of forecasts.  (Note that because, in calculating Uncertainty, Resolution and Reliability we collapsed forecasts into bins with a single forecast probability, in calculating the Brier score we treat each forecast as having its “bin” probability.)

Like Reliability, lower is better for Brier scores; a perfect score is 0.
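Or, in a few lines of Python, using the bin-average probabilities for each forecast as above:

# Brier score: mean squared difference between forecast probability and outcome,
# taken forecast by forecast.
forecasts = [0.7067, 0.7067, 0.7067, 0.66, 0.66, 0.66, 0.66]
outcomes  = [1, 1, 1, 0, 1, 1, 0]

brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)
print(round(brier, 3))   # approx 0.194 (0.195 above, after rounding the per-forecast terms)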

Brier Score Composition

It turns out that all these measures are unified by the simple equation

Brier Score = Reliability – Resolution + Uncertainty

or in our numbers

Brier Score = 0.195
Reliability – Resolution + Uncertainty = 0.052 – 0.061 + 0.204 = 0.195

In other words, the Brier score is composed out of Reliability (a measure of Calibration), Resolution, and Uncertainty.
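Putting the pieces together, here is a short sketch verifying the composition on our dataset, computed exactly rather than from the rounded intermediate figures:

# Verify that Brier = Reliability - Resolution + Uncertainty on the worked example.
forecasts = [0.7067, 0.7067, 0.7067, 0.66, 0.66, 0.66, 0.66]
outcomes  = [1, 1, 1, 0, 1, 1, 0]
bins = [[0, 1, 2], [3, 4, 5, 6]]
n = len(outcomes)

base_rate = sum(outcomes) / n
uncertainty = base_rate * (1 - base_rate)

resolution = sum(len(b) * (sum(outcomes[i] for i in b) / len(b) - base_rate) ** 2
                 for b in bins) / n
reliability = sum(len(b) * (sum(forecasts[i] for i in b) / len(b)
                            - sum(outcomes[i] for i in b) / len(b)) ** 2
                  for b in bins) / n
brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n

print(round(brier, 4))                                   # approx 0.1944
print(round(reliability - resolution + uncertainty, 4))  # approx 0.1944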

The equation above – which can be found in full formulaic glory on the Wikipedia page and in many other places – is attributed to Allan Murphy in a paper published in 1973 in the Journal of Applied Meteorology.  It is usually called the Brier score decomposition, but here I’ve called it the Brier Score Composition because I’ve approached it in a bottom-up way.

Interpreting the Brier Score

As mentioned at the outset, the Brier score is increasingly common as a measure of forecasting performance.  According to Barbara Mellers, a principal researcher in the Good Judgement Project, “This measure of accuracy is central to the question of whether forecasters can perform well over extended periods and what factors predict their success.”

Having followed how the Brier score is built up out of other measures of forecasting quality, we should keep in mind two important points.

  1. One of the Brier score components is Uncertainty, which is a function solely of the outcomes, not of the forecasts.  Greater Uncertainty will push up Brier scores.  This means that a forecaster trying to forecast in a highly uncertain domain will have a higher Brier score than a forecaster of the same skill level tackling a less uncertain domain.  In other words, you can’t directly compare Brier scores unless they are scoring forecasts on the same set of events (or two sets of events with the same Uncertainty).  As a rough rule of thumb, only compare Brier scores if the forecasters were forecasting the same events.
  2. The Brier score is convenient as a single number, but it collapses three other measures.  You can get more insight into a forecaster’s performance if you look not just at the “headline number” – the Brier score – but at all four measures.

“I come not to praise forecasters but to bury them.”  With these unsubtle words, Barry Ritholtz opens an entertaining piece in the Washington Post, expressing a widely held view about forecasting in difficult domains such as geopolitics or financial markets.  The view is that nobody is any good at it, or if anyone is, they can’t be reliably identified.  This hard-line skepticism has seemed warranted by the persistent failure of active fund managers to statistically outperform dart-throwing monkeys, or the research by Philip Tetlock showing that geopolitical experts do scarcely better than random, and worse than the simplest statistical methods.

More recent research on a range of fronts – notably, by the Good Judgement Project, but also by less well-known groups such as Scicast and ACERA/CEBRA here at Melbourne University – has suggested that a better view is what might be termed “tempered optimism” about expert judgement forecasting. This new attitude acknowledges that forecasting challenges will always fall on a spectrum from the easy to the practically impossible.  However, in some important but difficult domains, hard-line skepticism is too dogmatic.  Rather,

  • There can be forecasting skill;
  • Some people can be remarkably good;
  • Factors conducive to good forecasting have been identified;
  • Forecasting is a skill which can be improved in broadly the same way as other skills;
  • Better forecasts can be obtained by combining forecasts.

A high-level lesson that seems to be emerging is that forecasting depends on culture.  That is, superior forecasting is not a kind of genius possessed (or not) by individuals, but emerges when a group or organisation has the right kinds of common beliefs, practices, and incentives.

The obvious question then is what such a culture is like, and how it can be cultivated.  As part of work for an Australian superannuation fund, I distilled the latest research supporting tempered optimism into seven guidelines for developing a culture of superior forecasting.

  1. Select.  When choosing who to hire – or retain – for your forecasting team, look for individuals with the right kind of mindset.  To a worthwhile extent, mindset can be assessed using objective tests.
  2. Train.  Provide basic training in the fundamentals of forecasting and generic forecasting skills.  A brief training session can improve forecasting performance over a multi-year period.
  3. Track. Carefully document and evaluate predictive accuracy using proper scoring rules.  Provide results to forecasters as helpful feedback.
  4. Team. Group forecasters into small teams who work together, sharing information and debating ideas.
  5. Stream. Put your best forecasters (see Track) into an elite team.
  6. Motivate. Incentives should reward predictive accuracy (see Track) and constructive collaboration.
  7. Combine. Generate group forecasts by appropriately combining individuals’ forecasts, weighting by predictive accuracy (see Track).

The pivotal element here obviously is Track, i.e. measuring predictive accuracy using a proper scoring rule such as the Brier score.  According to Mellers (a key member of the Good Judgement Project) and colleagues, “This measure of accuracy is central to the question of whether forecasters can perform well over extended periods and what factors predict their success.”  Using proper scoring rules requires forecasters to commit to explicit probabilistic predictions, a practice that is common in fields such as weather forecasting where predictions are rigorously assessed, but very rare in domains such as geopolitics and investment.  This relative lack of rigorous assessment is a key enabler – and concealer – of ongoing poor performance.

In current work, we are developing training in generic forecasting skills, and practical methods for using scoring rules to evaluate predictive accuracy in domains such as funds management. Contact me if this may be of interest to your organisation.


I forwarded to Paul Monk a link to this video:

He replied, within minutes:

Truly awesome.

It prompts the thought that the biggest revolutions in worldview have been scientific and have entailed:

1. Moving from the Earth centred (Aristotelian/Biblical) cosmology (which had its counterparts in many tribal myths and the cosmogonies of many other civilizations; though the classical Atomists began to guess at the truth and this was picked up again by Giordano Bruno in the late 16th century, only to get him burned alive in Rome by the Inquisition) to first a heliocentric one, then a Milky Way one, then a Hubble 3D one, as it were, and finally to a multiverse one;

2. Discovering that we are evolved creatures and have a direct biological ancestry going back 3.8 billion years, but on a world that, in much less than that time into the future (regardless of what we do) will become uninhabitable, as the Sun swells to become a red giant and destroys the Goldilocks Zone which makes life on Earth possible;

3. Realizing that we live in and are imbricated in a world of microbes that used to dominate the planet, exist in a highly complex symbiosis with larger life forms, including predation upon them and have played a substantial role in the mass extinctions.

4. Slowly getting to understanding human history from a global and cosmopolitan perspective instead of from narrowly local ones; and

5. Developing the elements of a universal cognitive humanism with the exploration of languages and linguistics, comparative mythology (Levi-Strauss and structuralism) and anthropology (including Durkheim’s Elementary Forms of the Religious Life, about a century ago).

My own worldview, if you like, is that all these things transcend (trump) the epistemological claims of the old religions and mythologies, as well as those even of 19th century political ideologies (to say nothing of crude 20th century ones such as Nazism and Marxism-Leninism). BUT the vast majority of human beings on the planet know almost nothing of all this and certainly have not been able to weave it together into a coherent new, shared, universal worldview for the 21st century.

Just a few thoughts on the run, or rather while viewing Andromeda.
