But when data gets big, big problems can arise. That’s the message from Nate Silver, who works with data for a living. Silver, the founder of data-driven journalism site FiveThirtyEight (now owned by ESPN), spoke recently at the HP Big Data Conference in Boston, outlining some of the problems that can come along with big data.
Where to put it
Silver says that even small and medium amounts of data can be difficult to manage, both in terms of how to store it and how to analyze it. The more data a company has, the more complex the problems of managing it can become. Do you buy hardware? Do you store it in the cloud? How often will you need to access it? Can you deal with latency? These are the types of questions that will help you decide how to manage your data.
One issue with a lot of data is that it can create bias. If you have two polls, it can be pretty easy to decipher what those polls are saying. If you’re analyzing 100 surveys, there can be much more nuanced issues within that data. Ever heard the saying that you can make statistics say whatever you want? Well, the more data you have, the more wiggle room there can be to sway the stats.
Silver referenced a book by Daniel Kahneman titled “Thinking, Fast and Slow,” the point of which is that people sometimes rush decisions based on a subset of data (thinking fast). A better practice is to “think slow” and really reason through the data. With big data, thinking fast (not analyzing the data fully) can lead to false positives.
Silver calls this extracting the “signal from the noise.” Said another way, it’s the problem of finding the needle in the haystack. Sometimes, the more data you have, the harder it can be to find true value in it.
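To make the false-positive problem concrete, here is a minimal Python sketch. It is an illustration only, not something Silver presented. It generates one random "target" series and then checks how many other purely random series happen to correlate with it. The function names, the 30-point series length and the 0.36 cutoff (roughly the 5% significance level for 30 points) are all assumptions chosen for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(num_series, num_points=30, threshold=0.36, seed=0):
    """Count pure-noise series that look correlated with a pure-noise target.

    Hypothetical helper: every series here is random, so any "hit"
    is a false positive by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(num_points)]
    hits = 0
    for _ in range(num_series):
        noise = [rng.gauss(0, 1) for _ in range(num_points)]
        if abs(pearson(target, noise)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for size in (2, 100, 1000):
        print(size, "series checked:", count_spurious(size), "look significant")
```

Because nothing in this data is real signal, every series that clears the cutoff is noise. Roughly one in 20 will clear it by chance, so the more series you check, the more convincing-looking "findings" you collect.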
That’s not what I was looking for…
Imagine Google Maps giving you directions and suggesting an alternate, “faster” route. You take it, only to find that it’s a dirt road under construction. Sometimes big data systems think they have found a shortcut, but in reality, it’s not what the user was looking for.