“I come not to praise forecasters but to bury them.” With these unsubtle words, Barry Ritholtz opens an entertaining piece in the Washington Post, expressing a widely held view of forecasting in difficult domains such as geopolitics and financial markets: nobody is any good at it, or if anyone is, they can’t be reliably identified. This hard-line skepticism has seemed warranted by the persistent failure of active fund managers to statistically outperform dart-throwing monkeys, and by Philip Tetlock’s research showing that geopolitical experts do scarcely better than chance, and worse than the simplest statistical methods.
More recent research on a range of fronts – notably by the Good Judgement Project, but also by less well-known groups such as SciCast and ACERA/CEBRA here at Melbourne University – suggests that a better view is what might be termed “tempered optimism” about expert judgement forecasting. This new attitude acknowledges that forecasting challenges will always fall on a spectrum from the easy to the practically impossible. In some important but difficult domains, however, hard-line skepticism is too dogmatic. Rather,
- There can be forecasting skill;
- Some people can be remarkably good;
- Factors conducive to good forecasting have been identified;
- Forecasting is a skill which can be improved in broadly the same way as other skills;
- Better forecasts can be obtained by combining forecasts.
A high-level lesson that seems to be emerging is that forecasting depends on culture. That is, superior forecasting is not a kind of genius possessed (or not) by individuals, but emerges when a group or organisation has the right kinds of common beliefs, practices, and incentives.
The obvious question then is what such a culture is like, and how it can be cultivated. As part of work for an Australian superannuation fund, I distilled the latest research supporting tempered optimism into seven guidelines for developing a culture of superior forecasting.
- Select. When choosing who to hire – or retain – for your forecasting team, look for individuals with the right kind of mindset. To a worthwhile extent, mindset can be assessed using objective tests.
- Train. Provide basic training in the fundamentals of forecasting and generic forecasting skills. A brief training session can improve forecasting performance over a multi-year period.
- Track. Carefully document and evaluate predictive accuracy using proper scoring rules. Provide results to forecasters as helpful feedback.
- Team. Group forecasters into small teams who work together, sharing information and debating ideas.
- Stream. Put your best forecasters (see Track) into an elite team.
- Motivate. Incentives should reward predictive accuracy (see Track) and constructive collaboration.
- Combine. Generate group forecasts by appropriately combining individuals’ forecasts, weighting by predictive accuracy (see Track); a minimal sketch of one such combination follows this list.
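As a concrete illustration of the Combine step, here is a minimal Python sketch of one simple approach: a weighted average (a linear pool) of individual probability forecasts, with weights based on each forecaster’s historical error. The numbers and the particular weighting scheme are invented for illustration, not a prescription from the research.

```python
# Illustrative sketch only: combine individual probability forecasts into a
# group forecast, weighting each forecaster by a simple measure of past accuracy.

def combine_forecasts(forecasts, past_errors):
    """Weighted linear pool of probability forecasts for one event.

    forecasts   -- each forecaster's probability (0..1) that the event occurs
    past_errors -- each forecaster's mean squared error on past resolved
                   questions (lower means more accurate)
    """
    # More accurate forecasters (lower past error) get more weight.
    raw = [1.0 / (e + 1e-6) for e in past_errors]
    total = sum(raw)
    weights = [w / total for w in raw]  # normalise weights to sum to 1
    return sum(w * p for w, p in zip(weights, forecasts))

# Three forecasters give probabilities for the same event.
print(combine_forecasts([0.70, 0.55, 0.80], [0.10, 0.25, 0.15]))  # ~0.70
```

Other schemes are common – for instance, extremising the pooled probability or dropping persistently weak forecasters – but the essential point is the same: the track record built up under Track feeds directly into how individual views are combined.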
The pivotal element here obviously is Track, i.e. measuring predictive accuracy using a proper scoring rule such as the Brier score. According to Mellers (a key member of the Good Judgement Project) and colleagues, “This measure of accuracy is central to the question of whether forecasters can perform well over extended periods and what factors predict their success.” Using proper scoring rules requires forecasters to commit to explicit probabilistic predictions, a practice that is common in fields such as weather forecasting where predictions are rigorously assessed, but very rare in domains such as geopolitics and investment. This relative lack of rigorous assessment is a key enabler – and concealer – of ongoing poor performance.
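To make the Track step concrete, here is a minimal Python sketch of the Brier score in its common binary form, using invented data. (The Good Judgement Project used a variant that sums over answer categories and ranges from 0 to 2, but the idea is the same.)

```python
# Illustrative sketch only: the Brier score for binary forecasts.
# It is the average, over resolved questions, of (probability - outcome)^2,
# where outcome is 1 if the event happened and 0 otherwise.
# Lower is better: 0 is perfect, and always saying 50% scores 0.25.

def brier_score(probabilities, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)

# Invented track record over four resolved questions.
forecasts = [0.8, 0.3, 0.9, 0.6]  # stated probability that each event occurs
outcomes  = [1,   0,   1,   0]    # 1 = happened, 0 = did not

print(brier_score(forecasts, outcomes))  # 0.125
```

Because the Brier score is a proper scoring rule, forecasters minimise their expected score by reporting the probabilities they genuinely believe, which is what makes it suitable for tracking accuracy over time.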
In current work, we are developing training in generic forecasting skills, and practical methods for using scoring rules to evaluate predictive accuracy in domains such as funds management. Contact me if this may be of interest to your organisation.