A common decision-making trap is thinking that more data = better decision – and so, to make a better decision, you should go out and get more data.
Let’s call this the datacentric fallacy.
Of course there are times when you don’t have enough information, when having more information (of the right kind) would improve the decision, and when having some key piece of information would make all the difference.
Victims of datacentrism, however, reflexively embark on an obsessive search for ever more information. They amass mountains of material in the hope that they’ll stumble across some critical piece, or critical mass, that will suddenly make clear what the right choice is. But they are usually chasing a mirage.
In their addiction to information, what they’re neglecting is the thinking that makes use of all the information they’re gathering.
As a general rule, quality of thinking is more important than quantity of data. This means you’ll usually be better rewarded by putting whatever time and energy you have available for decision making into quality control of your thinking, rather than into searching for more, better, or different information.
Richards Heuer made this point in his classic Psychology of Intelligence Analysis. Indeed, he devotes a chapter to it, called Do You Really Need More Information? (Answer – often, no. In fact it may hurt you.)
A similar theme plays out strongly in Phil Rosenzweig’s The Halo Effect… and the Eight Other Business Delusions That Deceive Managers. Rosenzweig provides a scathing critique of business “classics” such as In Search of Excellence, Good to Great and Built to Last, which purport to tell you the magic ingredients for success.
He points out how in such books the authors devote much time and effort to boasting about the enormous amount of research they’ve done, and the vast quantities of data they’ve utilised, as if the sheer weight of this information will somehow put their conclusions beyond question.
Rosenzweig points out that it doesn’t matter how much data you’ve got if you think about it the wrong way. And think about it the wrong way they did, all being victims of the “halo effect” (among other problems). In these cases, they failed to realise that the information they were gathering so diligently had been irretrievably corrupted even before they got to it.
Another place you can find datacentrism running rampant is in the BI or “business intelligence” industry. These are the folks who sell software systems for organising, finding, massaging and displaying data in support of business decision making. BI people tend to think decisions fall automatically out of data, and so presenting more and more data in ever prettier ways is the path to better decision making.
Stephen Few, in his excellent blog Visual Business Intelligence, has made a number of posts taking the industry to task for this obsession with data at the expense of insightful analysis.
The latest instance of datacentrism to come my way is courtesy of the Harvard Business Review. I’ve been perusing this august journal in pursuit of the received wisdom about decision making in the business world. In a recent post, I complained that the 2006 HBR article How Do Well-Run Boards Make Decisions? told us nothing very useful about how well-run boards make decisions.
I was hoping to be more impressed by the 2006 article The Seasoned Executive’s Decision Making Style. The basic story here is that decision making styles change as you go up the corporate ladder, and if you want to continue climbing that ladder you’d better make sure your style evolves in the right way. (Hint: become more “flexible.”)
In a sidebar, the authors make a datacentric dash to establish the irrefutability of their conclusions:
For this study, we tapped Korn/Ferry International’s database of detailed information on more than 200,000 predominantly North American executives, managers, and business professionals in a huge array of industries and in companies ranging from the Fortune 100 to startups. We examined educational backgrounds, career histories, and income, as well as standardized behavioral assessment profiles for each individual. We whittled the database down to just over 120,000 individuals currently employed in one of five levels of management from entry level to the top. We then looked at the profiles of people at those five levels of management. This put us in an excellent position to draw conclusions about the behavioral qualities needed for success at each level and to see how those qualities change from one management level to another.
These patterns are not flukes. When we computed standard analyses of variance to determine whether these differences occurred by chance, the computer spit out nothing but zeroes, even when the probability numbers were worked out to ten decimal points. That means that the probability of the patterns occurring by chance is less than one in 10 billion. Our conclusion: The observed patterns come as close to statistical fact (as opposed to inference) as we have ever seen.
This seems too good to be true. Maybe their thinking is going a bit off track here?
I ran the passage past a psychologist colleague who happens to be a world leader in statistical reform in the social sciences, Professor Geoff Cumming of La Trobe University. I asked for his “statistician’s horse sense” concerning these impressive claims. He replied [quoted here with permission]:
P-value purple prose! I love it!
Several aspects to consider. As you know, a p value is Prob(the observed result, or one even more extreme, will occur|there is no true effect). In other words, the conditional prob of our result (or more extreme), assuming the null hypoth is true.
It’s one of the commonest errors (often made, shamefully, in stats textbooks) to equate that conditional prob with the prob that the effect ‘is due to chance’. The ‘inverse probability fallacy’. The second last sentence is a flamboyant statement of that fallacy. (Because it does not state the essential assumption ‘if the null is true’.)
An extremely low p value, as the purple prose is claiming, often in practice (with the typical small samples used in most research) accompanies a result that is large and, maybe, important. But it no way guarantees it. A tiny, trivial effect can give a tiny p value if our sample is large enough. A ‘sample’ of 120,000 is so large that even the very tiniest real effect will give a tiny p. With such large datasets it’s crazy even to think of calculating a p value. Any difference in the descriptive statistics will be massively statistically significant. (‘statistical fact’)
Whether such differences are large, or important, are two totally different issues, and p values can’t say anything about that. They are matters for informed judgment, not the statistician. Stating, and interpreting, any differences is way more important than p-p-purple prose!
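Geoff’s point about sample size is easy to verify for yourself. The sketch below (my own illustration, not anything from the HBR study – the effect size and group sizes are made up) computes the two-sided p value for a simple two-group z test at a fixed, trivially small effect of 0.05 standard deviations, and shows how the p value is driven toward zero by sample size alone:

```python
# Illustrative only: a fixed, trivially small effect (d = 0.05 SD,
# far too small to matter in practice) becomes "massively significant"
# once the groups are large enough. Numbers are made up, not Korn/Ferry's.
from math import sqrt, erfc

def p_value(d, n_per_group):
    """Two-sided p for a two-sample z test with standardised effect size d."""
    z = d * sqrt(n_per_group / 2)   # test statistic grows with sqrt(n)
    return erfc(z / sqrt(2))        # 2 * P(Z > z) for a standard normal

tiny_effect = 0.05                  # one twentieth of a standard deviation
for n in (50, 500, 5_000, 60_000):
    print(f"n per group = {n:>6}: p = {p_value(tiny_effect, n):.2e}")
```

With 50 people per group the same trivial effect is nowhere near significant; with 60,000 per group (roughly the 120,000 individuals split in two) the p value has more than ten zeroes after the decimal point – exactly the “statistical fact” the sidebar boasts about, produced by nothing but sample size.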
So their interpretation of their data – at least, of its statistical reliability – amounts to a “flamboyant statement” of “one of the commonest errors.” Indeed, according to Geoff, it was “crazy even to think of” treating their data this way.
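To see why the inverse probability fallacy matters, here is a toy Bayesian calculation. The prior and power figures are invented purely for illustration; the point is that P(result | no real effect) – which is what a p value gives you – is not the same as P(no real effect | result), which is what the sidebar claims:

```python
# Toy numbers (assumed, not from the article): we start 90% confident
# there is no real effect, and run a test with alpha = 0.01, power = 0.80.
prior_null = 0.90   # P(H0): probability there is no real effect
alpha = 0.01        # P(significant result | H0) -- the p-value threshold
power = 0.80        # P(significant result | real effect)

# Bayes' theorem: probability the null is true GIVEN a significant result
p_sig = alpha * prior_null + power * (1 - prior_null)
posterior_null = alpha * prior_null / p_sig
print(f"P(no real effect | significant at {alpha}) = {posterior_null:.2f}")
```

Under these assumptions a result significant at p < 0.01 still leaves about a 10% chance that the pattern is “due to chance” – ten times the p value. Equating the two, as the sidebar does, is the fallacy Geoff is pointing to.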
The bulk of their article talks about the kinds of patterns they found, and maybe their main conclusions hold up despite the mauling of the statistics. Maybe. Actually I suspect their inferences have even more serious problems than committing the inverse probability fallacy – but that’s a topic for another time.
In sum, beyond a certain point, the sheer volume of your data or information matters much less than thinking about it soundly and insightfully. Datacentrism, illustrated here, is a kind of intellectual illness which privileges information gathering – which is generally relatively easy to do – over thinking, which is often much harder.
It has been a little too long since I did statistics. Does his point about large samples apply simply because they are large, or is it specific to properly conducted random samples?