Big Data isn't just about size
- 02 April, 2012 09:04
David Campbell, technical fellow at Microsoft and general manager of its data and modelling group, told a Microsoft symposium in Sydney on 29 March that he is seeing more and more people realise new value in Big Data. However, he said it is not just about being “big”, but about whether the volume creates value for the user.
“A lot of what's happening in the Big Data space is the ability to reason over connected graphs of data. One could argue that Google found a way to build a graph over Web pages and then turned that into value,” Campbell said.
“Facebook found a way to build a graph over people and then turned [it] into value ... We're doing work where we're building graphs of machines and how they're connected and how data flows from one machine to another and then reasoning over that to create new value.”
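As a rough illustration of what “reasoning over a connected graph” of machines might look like, the sketch below builds a directed graph of hypothetical machine-to-machine data flows and asks which machines a given source ultimately feeds. The machine names and flows are invented for illustration; they are not Microsoft's actual systems.

```python
from collections import defaultdict, deque

# Hypothetical data flows (assumptions, not real infrastructure):
# "A sends data to B" becomes a directed edge A -> B.
flows = [("web01", "etl01"), ("web02", "etl01"),
         ("etl01", "warehouse"), ("warehouse", "report")]

graph = defaultdict(list)
for src, dst in flows:
    graph[src].append(dst)

def downstream(node):
    """Every machine that ultimately receives data originating at `node`."""
    seen, queue = set(), deque([node])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(downstream("web01")))  # machines reached by web01's data
```

Once the flows are in graph form, questions like “what breaks downstream if this machine fails?” become simple traversals, which is the kind of value Campbell describes extracting from connected data.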
Big Data is also about velocity and variety, according to Campbell, and much of the data involved is unstructured; “information production” can transform it into data that can be “consumed by existing tools”.
“Quite often it's transforming it through information production into something that you can incorporate and run through with your existing systems ... So any data, any size, anywhere,” he said.
Campbell also touched on Cloud at the symposium, saying it is unknown how quickly the transition to Cloud will happen.
“So our strategy is about saying let's make our strategy less sensitive to that,” he said. “...I did some press and analyst discussions yesterday and the phrase garbage in, garbage out came [up] a whole bunch and it is clear that's still a challenge in the Big Data space.”
Campbell told the symposium about a scenario where two petabytes of data from an online service was run through a large map job in the Big Data environment, reducing it to around 14 terabytes. This was then loaded into a compressed in-memory engine, spread across 24 blade servers and fronted with Excel. The result? Around 100 billion rows accessible from Excel.
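The pipeline Campbell describes follows the classic map-reduce pattern: a map phase runs over partitions of raw data in parallel, and a reduce phase merges the partial results into a far smaller aggregate that an in-memory engine can serve. The toy sketch below shows the shape of that pattern on invented event data; the records, field names and counts are assumptions for illustration only, not the actual workload.

```python
from collections import Counter

# Invented sample: two partitions of raw page-view events.
partitions = [
    [{"user": "a", "page": "home"}, {"user": "b", "page": "home"}],
    [{"user": "a", "page": "search"}, {"user": "c", "page": "home"}],
]

def map_partition(events):
    """Map: emit partial page-view counts for one partition."""
    return Counter(e["page"] for e in events)

def reduce_counts(partials):
    """Reduce: merge the partial counts into one small aggregate."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

aggregate = reduce_counts(map_partition(p) for p in partitions)
print(aggregate)  # Counter({'home': 3, 'search': 1})
```

At petabyte scale the map phase runs on many machines at once, which is why hours of batch work can collapse terabytes of raw events into an aggregate small enough to explore interactively.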
The analysts watching the demonstration seemed unmoved by what had just occurred, so Campbell asked for the model to be run again.
“I said ‘do you realise what you just saw? That was 300 billion calculations [in] real time’, and they're like, ‘really?’” Campbell said.
“It's so mind boggling to be able to do it, then this notion of time to insight and can I keep up [at the] speed of thought - that is where a lot of this transformation will come from. So the ability to take two petabytes of data, run something that may take an hour or two but ultimately produce something [where] someone can reason over 100 billion facts ... that is transformative. So how do you take it from this scale and get it to something that matters in the end?”
Campbell also told the symposium he began to develop the idea of a “data market” on the internet in 2009, where users could exchange and share data.
“So this notion of exploratory versus explanatory visualisations [and] models and such is incredibly important. This pattern from exploratory to explanatory plays out several times over the course from petabyte scale down to terabyte, gigabyte and finally where an end user can get the experience,” he said.
Campbell said he is excited about the future of Big Data.
“There will be a bit of hype ... [but] this is really going to revolutionise a lot of different aspects, and even life sciences and such,” he said. “I'm talking to people in the pharmaceutical space, I'm talking finance, it's just - there are a lot of really great examples in businesses you really wouldn't think may benefit from this.”
Follow Stephanie McDonald on Twitter: @steph_idg
Follow Computerworld Australia on Twitter: @ComputerworldAU