Suppose you wanted better ratings of intelligence reports.  Where could that improvement come from? How could you possibly get it?

Some years ago the U.S. Office of the Director of National Intelligence proposed an answer of sorts. It was: use the method described in their Rating Scale document.  In previous posts, I described various issues with that approach, and pointed to an alternative “flaws”-based approach.

Here I’ll move up a level of abstraction and talk briefly about how, in the most general terms, ratings might be improved.

Every intelligence analyst and manager has at least some ability to assess the quality of an intelligence report, just as every normal person has at least some ability to evaluate the quality of deliberative material they encounter, such as discussions of contentious issues on Facebook or talk-back radio.

Consider all the reports produced by an organisation.  Presumably some will be very good indeed, some poor, and most of middling quality, relative to the standards of that organisation.

Similarly, consider all the ratings or evaluations of reports produced by that organisation. Rating a report is a complex task which can be done with varying levels of proficiency and diligence.  So with these also we would expect some to be very good, some poor, and most to be of middling quality.  Plausibly, there would be a roughly bell-shaped distribution of quality of ratings.

An organisation hoping to improve its ratings of reports can be understood as wanting to shift the distribution to the right, as schematically depicted here:

[Figure: distribution shift]

The distribution on the right has been shifted so that its mean is greater than the mean of the one on the left by one standard deviation.  Generally speaking, in these sorts of contexts, a one standard deviation gain is very large and very difficult to achieve.  For example, an educational intervention that improved critical thinking skills in undergraduate students by one standard deviation would be hugely successful.
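To make the size of that shift concrete, here is a small back-of-the-envelope calculation in Python. It assumes rating quality is roughly normally distributed (my simplifying assumption, purely for illustration). On that assumption, after a one standard deviation shift, about 84% of ratings would be better than the old average, up from 50%.

```python
from statistics import NormalDist

# Back-of-the-envelope illustration of a one standard deviation shift,
# assuming rating quality is roughly normal (a simplifying assumption).
before = NormalDist(mu=0, sigma=1)   # current distribution of rating quality
after = NormalDist(mu=1, sigma=1)    # the same distribution shifted right by one SD

# Proportion of post-shift ratings exceeding the old mean: ~0.84, up from 0.50.
print(1 - after.cdf(before.mean))
```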

So, returning to the main question: where could such gains come from? What could drive such a distribution shift?

One way of tackling this is to focus on three related areas:

  1. The method used to produce ratings
  2. The proficiency of individuals using the method
  3. The support provided to raters and rating teams

The rating method

The method is the distinctive set of activities which result in a rating being delivered. The first major opportunity for shifting the rating distribution is to improve the method, at either individual or collective levels.

At the individual level, a method might be a set of steps a rater goes through as they come up with a rating of a report.  For example, many rating approaches use a rubric-based method, where the rater follows a procedure specified in the rubric and along the way produces various sub-ratings, which might get aggregated into a single overall rating.
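To illustrate how such a rollup might work, here is a minimal sketch in Python. The criteria, weights and 1–5 scale are hypothetical placeholders, not drawn from any actual rubric.

```python
# Hypothetical rubric rollup: weighted average of per-criterion sub-ratings.
# The criteria, weights and 1-5 scale are illustrative assumptions only.
RUBRIC_WEIGHTS = {
    "sourcing": 0.3,
    "analytic_reasoning": 0.4,
    "clarity": 0.3,
}

def overall_rating(sub_ratings):
    """Aggregate per-criterion sub-ratings (1-5) into one weighted score."""
    return sum(RUBRIC_WEIGHTS[c] * sub_ratings[c] for c in RUBRIC_WEIGHTS)

# One rater's sub-ratings for a single report.
print(overall_rating({"sourcing": 4, "analytic_reasoning": 3, "clarity": 5}))  # 3.9
```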

At a higher or collective level, a method might involve a number of raters working together in some orderly way.  For example, the ODNI Rating Scale, used properly, involves two raters following a rubric-based procedure, working together to produce a consensus rating, which is then quality-checked by a third rater.  Alternatively, you might use a “wisdom of crowds” approach where the ratings produced by a number of raters working independently are mathematically aggregated.
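A “wisdom of crowds” aggregation can be equally simple in principle. The sketch below combines several hypothetical independent ratings of the same report; the numbers and the choice of aggregation rule (mean versus median) are illustrative only.

```python
from statistics import mean, median

# Several raters score the same report independently; their ratings are
# then combined mathematically. Ratings here are invented for illustration.
independent_ratings = [3, 4, 4, 2, 5, 4]

print(mean(independent_ratings))    # simple average: about 3.67
print(median(independent_ratings))  # median is more robust to an outlying rater: 4
```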

It is important to be aware that there are alternatives to rubric-based procedures.  Our research team has been developing and testing a “forced choice” method in which raters are required simply to say which of two reports is better.  If enough of these choices are made, a particular report can be assigned an overall quality rating.
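To give a feel for how pairwise choices can be turned into an overall quality score, here is a generic sketch using an Elo-style update – one standard way of scoring items from pairwise comparisons, and not necessarily the model used in our forced choice method.

```python
# Generic Elo-style scoring from forced-choice comparisons. This is one
# standard technique for ranking items from pairwise judgements; it is
# illustrative only, not the specific model used in our research.
K = 32  # how much a single comparison moves the scores

def elo_update(scores, winner, loser):
    """Update two reports' scores after a rater judges `winner` the better report."""
    expected_win = 1 / (1 + 10 ** ((scores[loser] - scores[winner]) / 400))
    scores[winner] += K * (1 - expected_win)
    scores[loser] -= K * (1 - expected_win)

scores = {"report_A": 1000.0, "report_B": 1000.0, "report_C": 1000.0}
comparisons = [("report_A", "report_B"),   # rater judged A better than B
               ("report_A", "report_C"),
               ("report_C", "report_B")]
for winner, loser in comparisons:
    elo_update(scores, winner, loser)

print(sorted(scores.items(), key=lambda kv: -kv[1]))  # report_A > report_C > report_B
```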

Individual proficiency

A second major opportunity is to improve the proficiency with which individual raters do their jobs.

This can be divided into improving the expertise of the raters, and improving the deployment of that expertise.

To improve expertise itself, raters will need good training.  They will also need plenty of good practice. Here I’m following the pioneering work of Ericsson and others, according to which expertise is acquired through extensive deliberate practice.  Deliberate practice involves focused engagement in an activity aimed at improvement, with the benefit of good coaching and rapid, high-quality feedback.

To improve the deployment of expertise, raters should be given appropriate conditions, and appropriate incentives, to do their job properly.  Conditions include factors such as having enough time to do a complex job thoroughly, and a suitable working environment – e.g. adequate freedom from fatigue, stress, noise and interruptions.

Incentives help ensure that a rater will deploy their expertise to greatest effect.  Note that incentives need not, and often should not, be “extrinsic” ones such as monetary rewards.  Intrinsic incentives such as reputation and professional satisfaction may be far more effective.

Support

A third major opportunity to improve ratings is to improve the quality of the support provided to raters and rating teams.

One form of support is the range of resources raters can draw on such as guidance documents and training materials.

Another form of support is the “technology” used to support rating activity.  This technology might be as simple as a one-page form to be filled in by hand.  (This is, in fact, how some agencies currently produce ratings.) However even a paper form can be well- or poorly-designed, and redesigning forms can be “low hanging fruit” for improving rating quality or productivity.

One step above paper forms, there are online forms and databases, which can efficiently record individual ratings and help manage collective rating procedures.

However we can now start to envisage much more sophisticated support platforms, which might for example use AI to help focus raters’ attention, provide adaptive context-sensitive guidance, and deliver some forms of feedback.  Such platforms could not merely improve the quality or productivity of rating activity; they could help transform the activity of rating into expertise building.  In other words, producing ratings should not be seen merely as an activity in which previously acquired expertise is deployed.  It should be seen as an opportunity for continuous cultivation of rating expertise.

What’s missing?

This breakdown is sure to be missing some options for improving ratings.  Suggestions most welcome!

Image by Gerd Altmann from Pixabay