The “Analysis of Competing Hypotheses” method, or ACH, is one of the most important tools on the intelligence analyst’s bench. It is a procedure for determining which of a range of hypotheses is most likely to be true, given the available evidence. At its heart is a matrix, wherein hypotheses are listed across the top, and items of evidence are listed down the left side. There is then a square or cell in the matrix corresponding to every hypothesis/evidence pair, and in that square one indicates whether the item of evidence is consistent with, inconsistent with, or neutral with respect to the hypothesis. For more information on ACH, see:
- the classic chapter by Richards Heuer
- This brief overview (pdf file) of ACH compared with argument mapping (AM). Some of the points made below are presaged in the overview.
ACH is based on some fundamental insights:
- The network of relationships between items of evidence and hypotheses is “many-many”. That is, one piece of evidence can bear, one way or another, on many hypotheses, and an hypothesis is generally considered in the light of a number of pieces of evidence. The ACH matrix is an obvious and natural way to accommodate this web of relationships.
- At least in many situations, structured thinking techniques yield better results than informal or intuitive “pondering”. ACH imposes a strong structure on hypothesis testing.
- Structured techniques are even more effective when making use of suitable external (i.e., outside the head or “on paper”) representations. Thus long division in the head is hard; on paper, following a standard procedure, it is easy. The ACH matrix is an external representation aiding hypothesis testing.
- When making informal or qualitative judgements, it is usually better to use coarse schemes such as consistent/neutral/inconsistent rather than more elaborate and precise schemes such as numerical consistency ratings.
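To make the matrix concrete, here is a minimal sketch in Python of the kind of structure described above: hypotheses across the top, items of evidence down the side, and a coarse three-way rating in every cell. The labels `h1`, `e1` and so on, the ratings, and the helper names `missing_cells` and `render` are placeholders of my own, not part of Heuer's method or of any ACH software.

```python
from enum import Enum
from itertools import product


class Rating(Enum):
    """Coarse three-way scheme, as opposed to numerical ratings."""
    CONSISTENT = "C"
    NEUTRAL = "N"
    INCONSISTENT = "I"


# Hypotheses run across the top, items of evidence down the left side.
# The labels and ratings are placeholders, not a real analysis.
hypotheses = ["h1", "h2", "h3"]
evidence = ["e1", "e2"]

# One cell for every evidence/hypothesis pair.
matrix = {
    ("e1", "h1"): Rating.CONSISTENT,
    ("e1", "h2"): Rating.NEUTRAL,
    ("e1", "h3"): Rating.INCONSISTENT,
    ("e2", "h1"): Rating.NEUTRAL,
    ("e2", "h2"): Rating.NEUTRAL,
    ("e2", "h3"): Rating.CONSISTENT,
}


def missing_cells(matrix, evidence, hypotheses):
    """Return the evidence/hypothesis pairs that have not been rated yet."""
    return [pair for pair in product(evidence, hypotheses) if pair not in matrix]


def render(matrix, evidence, hypotheses):
    """Lay the matrix out as plain text, one row per item of evidence."""
    header = "    " + " ".join(f"{h:>3}" for h in hypotheses)
    rows = [
        f"{e:<4}" + " ".join(f"{matrix[(e, h)].value:>3}" for h in hypotheses)
        for e in evidence
    ]
    return "\n".join([header] + rows)


print(render(matrix, evidence, hypotheses))
```

Keying the cells by (evidence, hypothesis) pair is just one way to capture the “many-many” relationship; a list of rows would do equally well.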
Nevertheless, in my experience, difficulties of various sorts arise rapidly when using ACH. As the mental effort of struggling with those difficulties mounts up, the alternatives look increasingly attractive: “muddling through” without the use of a tool such as ACH, or using some other tool such as argument mapping.
(Admittedly, I’ve never tried using ACH on a real, “industrial strength” problem of the kind that, presumably, intelligence analysts are engaging with on a daily or at least weekly basis. Perhaps the difficulties don’t arise so much in real cases; or perhaps they do arise, but are more than compensated for by the various benefits of using the ACH method, given the complexity of real cases. Perhaps; but I doubt it.)
Further, I’ve heard that while some intelligence analysts use the ACH technique regularly and perhaps even enthusiastically, the majority tend not to use it unless they really have to. It seems that their perception is that ACH is not worth the effort. Presumably this can be explained in part in terms of the various difficulties discussed here.
(1) Too many judgements to make
ACH, at least in a strong form, requires that you enter a judgement of consistency for every evidence item/hypothesis pair; i.e., you have to fill in every cell in the matrix. This is both a great strength of ACH, and a serious problem. It is a strength because it makes the process of comparing hypotheses against evidence exhaustive, thereby helping ensure that the evidential weight of all the items of evidence is properly accounted for.
The trouble is that the number of separate judgements becomes very large; for example, with 20 items of evidence and 5 hypotheses, you’d have to make 100 distinct judgements, each taking some modicum of conscious mental effort. Ugh! To make matters worse, many of these judgements return a “nil” verdict. In other words, in many cases, after careful consideration you conclude that item of evidence e is neutral (“neither here nor there”) with respect to hypothesis h.
So for example, suppose you are investigating the death of Princess Diana, and you are considering hypotheses including drunk-driving accident and assassination by MI5; and that one piece of evidence is that the driver had been drinking prior to the crash. This is clearly consistent with the drunk-driving hypothesis. ACH requires you to also consider whether this item of evidence is consistent, neutral or inconsistent with the assassination hypothesis. So you consider it, and you conclude that it is neutral; it really has nothing to do with that hypothesis.
In such a case, the mental effort of making the judgement seems to have yielded no immediate progress towards the goal of assessing the relative merits of the hypotheses. Arguably, that effort has in fact yielded some value in the context of the overall process, value which becomes apparent when you look across a row (to assess diagnosticity of evidence) or down a column (to assess the plausibility of an hypothesis). But it takes serious commitment to crank through dozens of such boring judgements in pursuit of some result at the end of the process. When in the midst of the ACH procedure, being forced to consider every e in relation to every h, only to conclude that it is (in and of itself) irrelevant, is a dispiriting activity; it feels like “makework” demanded arbitrarily by a tedious and laborious process.
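The arithmetic, and the row and column readings mentioned above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the second evidence item is made up, the function names are mine, and treating an item as “diagnostic” whenever its ratings differ across the row is one simple reading of diagnosticity, not an official definition.

```python
# Ratings: "C" consistent, "N" neutral, "I" inconsistent.
hypotheses = ["drunk-driving accident", "assassination by MI5"]

# Each row holds one rating per hypothesis, in the order given above.
matrix = {
    "driver had been drinking": ["C", "N"],               # ratings from the example in the text
    "hypothetical item, neutral throughout": ["N", "N"],  # invented for illustration
}


def judgement_count(n_evidence, n_hypotheses):
    """Number of cells to fill in when every pair must be judged."""
    return n_evidence * n_hypotheses


def is_diagnostic(row):
    """Assumed reading: an item discriminates only if its ratings differ."""
    return len(set(row)) > 1


def column(matrix, index):
    """Read down a column: all ratings recorded against one hypothesis."""
    return [row[index] for row in matrix.values()]


def neutral_share(matrix):
    """Fraction of all cells that came back with a "nil" verdict."""
    cells = [rating for row in matrix.values() for rating in row]
    return cells.count("N") / len(cells)


print(judgement_count(20, 5))  # the 20 x 5 case from the text
for item, row in matrix.items():
    print(item, "diagnostic" if is_diagnostic(row) else "not diagnostic")
print(neutral_share(matrix))
```

Even in this toy matrix, three of the four cells are neutral, which is the kind of proportion that makes the exercise feel like makework while you are in the middle of it.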
(2) No e is an island
Superficially, ACH treats an item of evidence as consistent or inconsistent on its own with each of the hypotheses. Thus it seems to make sense to ask whether [the driver’s drinking before the crash] is consistent with the hypothesis that [the death of Diana was a drunk-driving accident]. However, this is an illusion. In fact, and always, the evidential relationship between one proposition and another is mediated by other propositions. Put another way, an item of evidence is only consistent or otherwise with an hypothesis in the context of other relevant pieces of information or assertions. Thus the driver’s drinking before the crash is only consistent with the drunk-driving accident hypothesis given the general background knowledge that driving under the influence of alcohol increases the chances of an accident. If this were false – if drinking improved driving – then the driver’s drinking would be inconsistent with the drunk-driving hypothesis.
In argument mapping terms, we would say that every reason or objection is actually a multi-premise structure. In the philosophy of science, we would say that observations only confirm or disconfirm hypotheses in the context of auxiliary hypotheses. Sometimes we call these additional propositions assumptions. However we cast the point, the fundamental problem is that ACH’s way of structuring the evidence, hypotheses and their relationships leaves something important out of the picture. The kernel of the problem is the matrix representation at the heart of ACH; it naturally pairs individual items of evidence with individual hypotheses, and so is ill-suited to handling the actual structure of evidential relations even in the simplest case.
Does this matter? By necessity, every graphical or structural display of the web of evidential relations must select and simplify. The question is whether a particular display is, on balance, useful. Does the display enable us to think through the issues more effectively than using our default, informal and “in the head” methods? ACH enthusiasts of course think that the tradeoff is a good one. However, I think that while choosing a way to organise evidence and hypotheses which treats items of evidence discretely and independently of other information offers short-term gains, it does so at the cost of problems further down the track.
One such situation is where an additional item of information comes in, which has the effect of undermining a co-premise/auxiliary hypothesis/assumption. To illustrate: consider the question of what caused the Permian Extinction. One hypothesis is
h: it was a massive meteor collision.
A relevant piece of evidence is that
e1: there is no known meteor impact crater of the right age.
This appears to be inconsistent with the meteor hypothesis. Later, you find out that
e2: it is possible for lava to flow back up through the hole created in a large meteor impact, erasing the impact crater.
Now, the question is, how to accommodate this new piece of information in an ACH matrix? It seems to make little sense to treat it as a new, independent piece of evidence, against which each hypothesis can be tested. So the only option left is to leave it out of the matrix, but to change the “inconsistent” rating of e1 wrt h to neutral or consistent. However without e2, such a rating is mysterious. It seems e2 has to be recorded somewhere, but the ACH matrix offers no space for it.
A better treatment of this situation is to recognise that e1 is inconsistent with h only given the natural assumption that
a: A large meteor impact would leave a crater.
e1 alone is not inconsistent with h; rather, it is the bundle [e1, a] which is inconsistent with h; or alternatively, e1 is inconsistent with [h given a]. e2 is then a challenge to a.
However we express this verbally, the fundamental problem is that you can’t adequately represent, and make sense of, what is going on here in a basic ACH format. (You can handle this sort of situation quite easily in an argument mapping format, but that is another topic.)
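For what it is worth, the bundle [e1, a] and the challenge from e2 are easy to write down once a judgement is allowed to carry its assumptions with it. The following Python sketch is my own illustration, not a feature of ACH: the class names are hypothetical, any recorded challenge is treated as defeating the assumption outright (a deliberate simplification), and falling back to “neutral” is an assumption too, since the rating could arguably become “consistent” instead.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    text: str
    challenges: list = field(default_factory=list)  # information undermining it

    @property
    def intact(self):
        # Simplification: any recorded challenge defeats the assumption.
        return not self.challenges


@dataclass
class Judgement:
    evidence: str
    hypothesis: str
    rating_given_assumptions: str  # holds only while every assumption is intact
    assumptions: list = field(default_factory=list)

    def effective_rating(self):
        if all(a.intact for a in self.assumptions):
            return self.rating_given_assumptions
        return "neutral"  # assumed fallback once an assumption is undermined


a = Assumption("A large meteor impact would leave a crater")
e1_vs_h = Judgement(
    evidence="there is no known meteor impact crater of the right age",
    hypothesis="it was a massive meteor collision",
    rating_given_assumptions="inconsistent",
    assumptions=[a],
)

before = e1_vs_h.effective_rating()
# e2 arrives and is recorded against the assumption, not as a new matrix row.
a.challenges.append("lava can flow back up through the impact hole, erasing the crater")
after = e1_vs_h.effective_rating()

print(before, "->", after)
```

The point of the sketch is only that e2 has somewhere to live, so the changed rating is no longer mysterious.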
(3) Flat structure of hypotheses
Another major problem with ACH is that it cannot handle the hierarchical structure of hypotheses (or it can do so at best only in an ungainly and unilluminating manner).
Hypotheses can be more or less general or abstract, and a general hypothesis can have sub-hypotheses. So in the Princess Diana case, one general hypothesis is assassination and another general one is accident. The general assassination hypothesis can have sub-hypotheses such as assassination by MI5, assassination by mafia, etc.
This is important because distinct items of evidence can count for or against hypotheses at various levels. Thus a bullet hole in the limousine would count in favour of any assassination hypothesis (or at least many such hypotheses), while an internal MI5 document might count for or against only the MI5 sub-hypothesis.
The classic ACH matrix asks for all hypotheses to be entered individually across the top row, and then to be compared against all pieces of evidence. But in the case of an hierarchical structure of hypotheses, this will result in an absurd duplication of effort, in which for example a piece of evidence bearing on all assassination hypotheses is compared not only against the general assassination hypothesis but also against all its sub-cases.
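One way to picture the alternative is a hypothesis tree in which a judgement is recorded once, at the most general level to which it applies, and looked up from the sub-hypotheses. The Python sketch below is hypothetical: the `parent` table, the `rating_for` helper and the “consistent” rating given to the MI5 document are my own illustrative choices, and letting a rating flow down to every sub-hypothesis is a simplification (a rating recorded directly on a sub-hypothesis takes precedence).

```python
# Hypothesis tree as child -> parent; None marks a top-level hypothesis.
parent = {
    "assassination": None,
    "accident": None,
    "assassination by MI5": "assassination",
    "assassination by mafia": "assassination",
}

# Each judgement is recorded once, at the most general level it applies to.
# The specific ratings are illustrative assumptions.
ratings = {
    ("bullet hole in the limousine", "assassination"): "consistent",
    ("internal MI5 document", "assassination by MI5"): "consistent",
}


def self_and_ancestors(hypothesis):
    """Yield the hypothesis, then its parent, grandparent, and so on."""
    while hypothesis is not None:
        yield hypothesis
        hypothesis = parent[hypothesis]


def rating_for(evidence, hypothesis):
    """Use a rating on the hypothesis itself, else the nearest ancestor's."""
    for h in self_and_ancestors(hypothesis):
        if (evidence, h) in ratings:
            return ratings[(evidence, h)]
    return None  # not judged at any level


print(rating_for("bullet hole in the limousine", "assassination by mafia"))
print(rating_for("internal MI5 document", "assassination by mafia"))
```

Two recorded judgements cover what a flat matrix would spread over many more cells, most of them duplicates.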
(4) Subordinate deliberation
By its very nature, being based on a matrix structure, the ACH approach does not consider what is “behind” or “underneath” any given piece of evidence. From a piece of evidence, it looks “forwards” or “upwards” to its bearing on the hypotheses under consideration. However the weight of a piece of evidence wrt an hypothesis depends on information bearing upon that piece of evidence. e may be quite (in)consistent with h, but how seriously we take this (in)consistency depends on how seriously we take e itself (its plausibility or credibility). This can only be evaluated in the light of further information subordinate to e. If you like, think of e as itself an hypothesis, in relation to which there is supporting or opposing evidence. In the standard ACH framework there is no way to represent or display this layered structure. (Again, the ability to handle such structure is a strength of argument mapping.)
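The layered structure is simple enough to sketch if each item of evidence is allowed to be a claim in its own right, with further claims counting for or against it. This Python fragment is only a picture of the idea; the `Claim` class, the `outline` helper and the placeholder texts are all invented for illustration, and no attempt is made to compute a credibility score.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    supports: list = field(default_factory=list)  # claims counting for this one
    opposes: list = field(default_factory=list)   # claims counting against it


def outline(claim, depth=0, marker=""):
    """Return the claim and everything beneath it as indented lines."""
    lines = ["  " * depth + marker + claim.text]
    for child in claim.supports:
        lines += outline(child, depth + 1, "+ ")
    for child in claim.opposes:
        lines += outline(child, depth + 1, "- ")
    return lines


# Placeholder content: e is treated as an hypothesis with its own evidence.
e = Claim(
    "e: an item of evidence",
    supports=[Claim("placeholder: an independent report corroborating e")],
    opposes=[Claim("placeholder: a doubt about the source of e")],
)

print("\n".join(outline(e)))
```

A flat matrix row has room for the top line only; everything indented beneath it is what gets left out.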
(5) Decontextualisation and discombobulation
We’ve seen in points 2 and 4 above that the ACH matrix does not accommodate co-premises/assumptions, or subordinate deliberation. An ACH matrix is like a sieve on the web of evidence, letting through some items and relationships but keeping out many others. Unfortunately what is left out is the context which helps make sense of the relationship of any given item of evidence to an hypothesis. Absent that context, the judgement becomes difficult to make. For a not-very-exaggerated example, consider: Is
e: David Hicks was captured in Afghanistan
consistent, inconsistent, or neutral, with respect to:
h: David Hicks was a terrorist
The proper answer is: uh….dunno…it depends. Absent any other information, you’d probably choose neutral, but this is not because e is neutral wrt h. It is only because without surrounding information it is hard to tell what the evidential value of e is.
ACH, in demanding that we make so many judgements even as it strips the context of those judgements away, is constantly asking us to engage in these sorts of mentally taxing, even discombobulating exercises. After an extended bout of ACH, I tend to feel a bit dazed and confused, and have to stave off that feeling with redoubled mental effort to see the sense of the judgements I’m making.
We might reduce my complaints about ACH to two:
- ACH asks us to make too many distinct judgements; and
- Those judgements are emaciated due to the stripping away of relevant context, of both hypotheses and evidence.
These problems are deeply related to choice of structure of external representation, i.e., to the choice of a matrix as the way to organise evidence, hypotheses, and judgements. I’m inclined to think that the use of the matrix is the fatal mistake of ACH; it is a commitment which seems obvious and natural initially, but as things unfold the limitations and problems inherent in the matrix structure come to the fore.
Much of the ACH procedure, as outlined for example by Heuer, could be retained even if the matrix structure was dispensed with in favour of some richer, more flexible format. But if you throw out the matrix, you also must throw out all those further aspects of the classic ACH which only applied if a matrix was being used. It is doubtful that what you’d have left would be worth calling ACH.
If you wanted to replace the ACH matrix, what would you use? One candidate is the argument map (of the kind you can create in, for example, the Rationale software). However while these have some strengths, for hypothesis testing they have a complementary set of weaknesses. At Austhink we are working on a new structure, one which will take and blend the best elements of both ACH and argument mapping, thus superseding them both. This new structure enables users to rapidly and intuitively assemble a (hierarchical) set of hypotheses in relation to some issue, items of evidence bearing on multiple hypotheses, assumptions, subordinate considerations, etc.
I agree with your assessment, Tim. It’s interesting that the matrix representation seems to be a fundamental weakness of ACH. In the realm of decision making, which has many parallels to analysis but is directed toward the future, there is a corresponding tension between “issue mapping” (which, like AM, is hierarchical) and classical “multi-criterial decision matrix” techniques. Typically in the latter one has a matrix in which the rows are the options/alternatives and the columns are the criteria (which may also be weighted), and the process is to fill in each cell of the matrix with a number indicating the strength of support that the row (option) gets from the column (criterion). Then a simple calculation across the matrix ranks the options from most preferred to least. The method works well for “tame” problems, in which the options are finite and stable and the stakeholders can all agree on a finite and stable set of criteria, but it is unwieldy, tiresome, and unilluminating to go through the exercise for a wicked problem such as climate change, where you need the richer representation capabilities of an issue (or dialogue) map.
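For readers who haven’t met the decision-matrix technique, the “simple calculation” is typically a weighted sum along each row. A minimal Python sketch follows; the options, criteria, weights and scores are all invented for illustration, and a weighted sum is only one common scoring rule among several.

```python
# Columns: criteria with their weights (hypothetical values).
weights = {"cost": 5, "speed": 3, "risk": 2}

# Rows: options, each scored against every criterion (hypothetical values,
# higher meaning stronger support for the option).
scores = {
    "option A": {"cost": 3, "speed": 5, "risk": 2},
    "option B": {"cost": 4, "speed": 3, "risk": 4},
    "option C": {"cost": 1, "speed": 4, "risk": 5},
}


def weighted_total(option_scores, weights):
    """Sum each cell in the row multiplied by its column weight."""
    return sum(weights[c] * option_scores[c] for c in weights)


def rank(scores, weights):
    """Order the options from most preferred to least."""
    return sorted(
        scores,
        key=lambda option: weighted_total(scores[option], weights),
        reverse=True,
    )


for option in rank(scores, weights):
    print(option, weighted_total(scores[option], weights))
```

As with ACH, every cell has to be filled in before the calculation means anything, which is where the tedium comes from on larger problems.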
An interesting piece. Some of my thoughts below.
1. Too many judgments to make
This doesn’t strike me as a problem inherent in the matrix structure. It’s a problem inherent in any analytic method. Analysis does just that: break down global judgments into multiple smaller judgments. If you didn’t consider all of those e/h combinations, you’re not fully analysing. This is true in a hierarchical map format, too, if you’re fully analysing. So changing the format doesn’t change the number of judgments.
You may, of course, decide to leave things out of a map (just as you may decide to leave things out of a box). But if this is by decision rather than by thoughtlessness/accident, then you’d be making a judgment anyway (to leave it out or enter it in).
2. No e is an island
OK, so sometimes you have to justify a matrix rating because otherwise it looks mysterious. How often does this happen? We could easily modify the matrix to cope with it; e.g., where an additional premise is required:
– Reformulate the evidence to incorporate that premise/auxiliary information; OR
– If the problem occurs only in one cell, write the additional information (the argument supporting your judgment) in that cell, e.g. by adding a footnote.
We fill in the gaps with auxiliary information all the time, even in argument mapping. Most people operate like this most if not all of the time – and manage to make judgments, even good ones! I suspect that the number of times this becomes a crucial issue in ACH is small, and so a slight modification such as occasional footnoting would take care of it.
3. Flat structure of hypotheses
There is a genuine challenge here, but not with the matrix per se. It’s about the right level of granularity at which it’s useful to start any such process.
Heuer says (I think – or is this something I’ve found useful?) to ensure hypotheses are mutually exclusive and at the same level of abstraction/granularity. But what level is that? Ultimately, a hypothesis is likely to be quite a complex story, e.g. addressing the “Who, what, where, when, how and why” of an action or event. But starting with that degree of detail across the board can be wasteful and tedious, while starting too “high up” can be pointless, confusing and/or misleading (more on this later). If you can start high up and eliminate top-level possibilities (e.g. you can eliminate murder) then you’ve saved a lot of work. But sometimes that’s not possible. Identifying the right level at which to start is difficult.
Again, however, I don’t think this is a problem with a matrix format per se. It’s about what to put into the matrix. A hierarchical map structure confronts this problem in a different way, but no less awkwardly. For example, should the hypothesis “Prescribe antidepressants together with counselling” be at the same level as, or below, the hypotheses “Prescribe antidepressants” and “Prescribe counselling”? Conceptually they should be at the same level, but doing that results in duplication of work. It’s more practical to make it subordinate to one of the others, thereby doing violence to The Order of Things…
3.a The problem has worse implications for AM
Note that ACH may fare better than AM (Argument Mapping) in an important respect related to this issue. Unlike AM, which focuses on support or corroboration, ACH is a process of falsification. This means that, under certain circumstances, getting the level of granularity wrong in ACH is less misleading than getting it wrong in AM. For example, if you formulate the hypothesis as “Diana was murdered” (without specifying by whom or why), multiple bits of evidence can be consistent with that hypothesis, but each of those might be consistent with only some particular sub-hypothesis (that specifies a culprit). In ACH, the effect would simply be to keep the “murder” hypothesis alive (unfalsified) – but in AM it would look like lots of actual support – even though any particular murder hypothesis would be poorly supported. Cognitively speaking, believing she was murdered on poor grounds (AM) seems worse than simply failing to dismiss the possibility (ACH). This, of course, applies only if users don’t misuse ACH matrices. I’m quite sure that, regardless of the fact that it’s the contradictions that count, people find it difficult NOT to read consistency as support. Now that might be a drawback of ACH, but one that’s easily dealt with by not distinguishing between consistency and neutrality – but that’s another story.
I won’t tackle this here, but I suspect that things get even nastier if you consider evidence that’s mutually incompatible, i.e. sets where e1 supports/is consistent with h1a but is incompatible with h1b, e2 is consistent with h1b but inconsistent with h1a, and so forth, where h1a and h1b are both subordinate to (sub-hypotheses of) h1; e.g. h1a is assassination by MI5 and h1b is murder by a crazed bystander, and h1 is murder.
4. Subordinate deliberation
Any analytic method is a precision tool, not a Swiss army knife. That’s why we need a whole toolkit. This is not a problem. It’s essential to analysis, which, by definition, breaks down complex processes into discrete steps in order to make the process more rigorous and thereby (hopefully) minimise error. If a tool did everything, it’d be as complex as the process.
I don’t see why, therefore, you should be able to do something with ACH that another tool is better for.
What might genuinely be useful is a way to rate the reliability of evidence in a highly visible fashion. But we could introduce a convention into ACH, such as colour-coding of evidence and impacts, to show reliability. (The grounds for that judgment could then be shown in an argument map – perhaps hyperlinked to the table entry.)
5. Decontextualisation and discombobulation
Even in argument mapping we take a whole lot of context for granted – even when we articulate co-premises. I had a shock a while ago when I decided to take a map regarding a particular country and substitute a fictional country’s name. All of a sudden, things didn’t make any sense! I realised how much contextual historical knowledge about the relationships between countries like Iran, the US and Russia came into play in understanding the original map. Ordinarily, to have added all that as co-premises would have been madness. Context is ineliminable.
In the Hicks example, I would argue that there is insufficient granularity in the specification/description of the evidence. The claim “Hicks was captured in Afghanistan” is inappropriate, or insufficiently refined, to do the job that’s asked of it. Again, it’s not the matrix that’s at fault but the way the information is entered.
1. Does ACH ask us to make too many distinct judgments?
ACH asks us to make the same number of distinct judgments as any analytic method that’s equally thorough. What’s “too many”? If it’s laborious, that’s the cost of analysis. The only time the judgments are “too many” is when the incremental error resulting from the number of judgments is greater than the degree of error arising out of the global judgment. Perhaps what we want, for practicality, is a degree of analysis that gives enough improvement in hit rate with the least amount of effort/boredom. That might be a different tool.
2. Are ACH judgments emaciated due to the stripping away of relevant context, of both hypotheses and evidence?
If they’re emaciated, I doubt this has much impact except in a very restricted number of cases. In those cases, we can address this easily by expanding the technique in simple ways, as outlined.
OTHER PROBLEMS WITH ACH?
There are other problems with ACH. I mention some here briefly. Note that these may also be true of maps.
1. Malleability of hypotheses and evidence
Given that in many real-life cases from the human domain (as opposed to the laboratory) judgments of support/consistency/opposition don’t rely on direct evidence but on murkier facts or more indirect evidence (e.g. that someone has a motive is not clear-cut evidence for murder or against suicide), we can all too often interpret or reinterpret either the evidence or the hypothesis to make them consistent. This is a great feature of conspiracy theories: if you tell a sufficiently convoluted story, you can make just about anything consistent with it. How useful is ACH in such contexts?
2. Invisibility of aspect
This is something that was pointed out to me in the US: If all the evidence corroborates only one part/aspect of a hypothesis, the matrix makes this invisible.
My thoughts: Does it matter? Only if we read consistency as support – a dangerous thing anyway. It’s a worse problem for rough argument mapping, which really does show it as support…
3. Invisibility of source
Also from the US: What if all evidence (or all inconsistent evidence) comes from the same source or is of the same source type (e.g. imaging)? The matrix doesn’t show that either.
My thoughts: Perhaps a simple device (like the bases in argument maps) could take care of the source type issue.
I’m not convinced that the matrix is a problem. But given that both ACH and AM are considered laborious – too analytic – to use, perhaps a tool that’s less analytic would do well. However, you’d have to show that the tool made a significant improvement to judgments, for the time and effort it demanded, compared to an intuitive, more global, or less technologically supported judgment.
That’s it for now!
As ACH gets more popular, criticism and suggested improvements abound. You raise an important topic. According to Heuer, though, the key consideration is “the relative likelihood of each hypothesis”. This would seem to rule out many of the suggested interpretations and variants, and avoid their errors. Heuer says “The matrix should not dictate the conclusion to you.” He does acknowledge that simply operating his method without regard to his cautions can lead to error, and gives some guidance.
I agree that in (all too common) complex situations inexperienced users will need experienced guidance, but isn’t this at least implicit? It is also true that at first ACH seems tedious, but with experience one can take short-cuts, and as long as one understands that one is concerned with relative likelihoods, this is perfectly safe.
Argument Mapping (AM) is a form of formal reasoning. Its use would thus complement ACH’s empirical approach, for example by helping to generate hypotheses. As Heuer says: “It is useful to make a clear distinction between the hypothesis generation and hypothesis evaluation stages of analysis.”
My own view is that it would be helpful to develop some extensions to ACH (“ACH+”) that deal ‘out of the box’ with some common problems, whilst retaining the validity of being linked to likelihood and hence empirical logic. One could then get a balance between simplicity and applicability.
I agree that AM can complement ACH, but not by generating hypotheses. AM does not in any way help generate hypotheses any more than ACH does. Both techniques presuppose the ability to generate hypotheses. There are other techniques to help with that.
I see AM as a later stage than ACH in the thinking process. ACH gives you the focal hypothesis – the one that best survives the process of elimination. AM then asks, “But how good is the case for it?”, since failure to falsify (at the end of ACH) isn’t the same thing as evidential support, only consistency. AM can help uncover important gaps in the evidence, and identify what assumptions are being made along the way.
1. What kind of extensions would ACH+ incorporate?
2. Does anyone out there use ACH not just with existing evidence but with what we might call indicators – i.e. “what would I expect to see if this [hypothesis] were the case”? Is that the kind of thing that might go into ACH+? (It could be incorporated into AM.)
The choice of a method is inevitably a compromise between generality with precision on the one hand and trainability for a low execution error rate on the other. ACH is a good compromise, teachable in a week. I accept that no method can be perfect and that there will always be the need for experienced review, so that a huge investment in a notionally better method may not actually be worthwhile. However, my judgement is that there is a niche for an upgraded ACH (‘ACH+’) that is reliable for a wider range of situations. The extensions that I see can be grouped under:
– removing the restrictions on the hypotheses
– providing support for the proper determination of likelihood
– consequential changes to the interface, to maintain ease of use.
http://conferences.computer.org/vast/vast2007/ shows the use of formal logics followed by ACH, but the use of ACH can also stimulate the use of formal logics in the search for new or modified hypotheses. Ideally, I would see formal and empirical logics combined, with a natural to-ing and fro-ing until one has satisfactory hypotheses. A fully extended ACH (‘ACH++’?) would need to accept ‘plug-in’ formal logics. In particular, ACH may show that the evidence is inconsistent with all current hypotheses. Formal logics could then be used to suggest variants. For example, AM could be used to identify an assumption that ‘Iraq’s actions are generally in line with Saddam’s public statements’, which (if this leads to an inconsistency) suggests the hypothesis ‘Saddam is deliberately misleading in his public statements’ by quasi-negation.
The use of ACH with what you call indicators is already seen by many practitioners as part of the use of ACH, although maybe not part of ACH itself. I’m not sure. But ACH+ would need to support this too.
My main contentions are:
– much of the criticism of ACH is ill-founded
– where it is well-founded, ACH has some ‘stretch’ potential.
A very thoughtful piece and subsequent commentary. I have employed ACH in an intelligence analysis setting, but largely (as I think Heuer intended) as one tool to help sort out more or less plausible hypotheses and the arguments surrounding and underlying them. Some of the points about assumptions being brought into the analysis and reliability assessments of the evidence are already addressed in some ACH software implementations I have seen. I agree with those who argue that this is only one of the tools available and, as long as we are careful not to demand too much from it (mechanical computation rather than highlighting probabilities for evaluation) it is an irreplaceable one.