The Problem with Aggregate Data

In this information age, we have access to more data and analytics tools than ever before. Between polling, surveys, the endless data gathering through web and device usage, and the algorithms that turn that data into action through advertisements and social media, we truly live in a very data-driven society. But there’s a problem with most of this data, at least in how it’s used in conversation: it’s almost always aggregate data.

And aggregate data can’t solve problems. It can’t even come close.

And yet we pretend it can. It’s used by business leaders to prove points or sell action plans, by media outlets to confirm stories, by politicians to garner support and validate policies. But despite all this usage, there is only one thing that aggregate data can do.

Aggregate data tells you where there might be a problem.

This is actually very helpful. It’s not always easy to diagnose whether there’s a problem or not, and having an objective analysis that helps sniff out potential issues can be an immense help and time saver.

The issue occurs when we pretend that aggregate data makes things obvious. Sometimes it might seem obvious from the data that there is a problem and how to solve it. But the great strength of this data is also its greatest weakness: its simplicity.

Aggregate data takes a lot of complexity and distills it into a simple number or graph. The pitfall is when we start believing the complexity has been done away with, and the problem is as simple as that number or graph. While this number is great for identifying overall progress, for spotting potential problems, and for impressing shareholders, it’s useless for finding out what causes a problem, or for pointing to how to solve it. You need multidimensional and highly detailed case studies for that.
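To make that concrete, here is a minimal sketch in Python. The two teams and their satisfaction scores are invented purely for illustration, and `summarize` is a hypothetical helper, not part of any real tool. It shows two groups that share the same average while looking nothing alike underneath.

```python
from statistics import mean

# Hypothetical satisfaction scores (1-10) for two made-up teams.
team_a = [7, 7, 7, 7, 7, 7]        # everyone is moderately satisfied
team_b = [10, 10, 10, 4, 4, 4]     # half are thrilled, half are unhappy


def summarize(scores):
    """Return the aggregate (mean) alongside the spread it hides."""
    return {"mean": mean(scores), "min": min(scores), "max": max(scores)}


summary_a = summarize(team_a)
summary_b = summarize(team_b)

# Both teams report the same mean, so the aggregate alone cannot
# tell them apart, let alone explain why team_b is split.
print(summary_a)
print(summary_b)
```

The shared average might flag that neither team is scoring a perfect 10, but only the individual responses reveal that the second team is divided, and even those do not say why.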

But because it’s a single number, it’s easy to share. It’s easy to throw around. News media, politicians, activists, salesmen, marketers, human resources departments, managers and leaders, even analytics specialists run amok when they grab onto aggregate data and use it to “confirm” their pre-existing assumptions about what causes a problem and what the solution is. They say, “See? The data confirms this is a problem. Naturally, the cause and solution most obvious to me must be true.”

This all-too-common misuse of the data happens for two reasons. A single number that lacks all complexity is easy to grab onto, remember, and integrate. It also minimizes the amount of work done with the data while still letting you pretend your strategies are data-driven. Then, when things go wrong, you can absolve yourself of the responsibility: “I was only doing what the data said.”

We need to end this lazy misuse of data. It’s a bad habit we’ve built into our companies, our strategies, and our accountability programs. It’s an excuse to feel comfortable while putting in the bare minimum of work. It’s a false belief that our decisions are backed by objective analysis, when we are simply attaching a number to what we would have done anyway.

Next time you see a single number validating an action, ask yourself: Does this number actually describe what it’s being said to describe? Or is it only confirming that there might be a problem, and the rest is baseless extrapolation? We are fooling ourselves on a mass scale into believing our systems are built on strong foundations, when we are doing the equivalent of consulting a Magic 8 Ball from a dollar store.

It’s better to be honest and use no data than to use data incorrectly and naively believe we will reap the benefits.
