Big data and the Cloud that loves it: part 1

More and more organisations are storing big data sets in the Cloud for cheap archiving, easier access or even online analytics — but is it too risky a proposition?

Think you’re managing a lot of data? Perhaps, but don’t bet James Bangay that you’ve got more than he does; you’re likely to lose.

As program director of Queensland utility Ergon Energy’s ROAMES (Remote Observation Automated Modelling Economic Situation) project, Bangay is helping lay plans for the ongoing management and analysis of a data set, generated by an aerial laser survey, that will map every centimetre of power infrastructure across the length and breadth of Queensland and build 3D models of hundreds of millions of trees within 500m of those assets.

Centimetre-resolution mapping will dramatically improve the accuracy of cadastral maps, which can be out by 60m in rural areas, and will let Ergon Energy better track and manage its assets, comparing tree growth from year to year to ensure early removal of obstacles that could damage its network. A target of $44m in annual savings has been mooted — but to get there, Bangay’s team will have to generate and manage over 400TB of data and update it on a rolling basis as new mapping data is collected.

Yes, four hundred terabytes. And it’s all going into the Cloud.

The data will be fed into Google’s new Earth Builder project, a collaborative effort to source geographical data sets for Google’s Cloud geoinformatics platform, where it will be available for anyone to access through the Google Earth and Google Maps interfaces. Earth Builder will also allow the addition of third-party data overlays showing features such as fish and koala habitats, noxious weed infestations, and sensitive ecosystems — including many that Ergon Energy will supply and make broadly accessible through Google’s public marketplace for a nominal fee.
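To give a sense of what such an overlay looks like in practice, the sketch below writes a single-placemark layer in KML, an open format that Google Earth and Google Maps can display. The coordinates, layer name and file path are hypothetical illustrations only, not part of Ergon Energy’s actual pipeline or the Earth Builder ingestion process.

```python
# Minimal sketch of a vegetation-style data overlay expressed as KML,
# the open format Google Earth can display. All values are hypothetical;
# this is not Ergon Energy's pipeline or Earth Builder's actual API.
KML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <name>Example vegetation overlay</name>
    <Placemark>
      <name>Tree cluster near a network asset (hypothetical)</name>
      <Point>
        <coordinates>{lon},{lat},0</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>
"""

def write_overlay(path: str, lon: float, lat: float) -> None:
    """Write a single-placemark KML overlay that Google Earth can open."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(KML_TEMPLATE.format(lon=lon, lat=lat))

if __name__ == "__main__":
    # Hypothetical location in Queensland (longitude, latitude).
    write_overlay("vegetation_overlay.kml", 146.82, -19.26)
```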

That makes the initiative a revenue earner for Ergon Energy, a way to improve its own processes, and a way to finally correct lingering errors in its geographic databases that make it all but impossible to accurately overlay data sets created to different scales; Queensland government data alone has been created to over 80 different mapping standards over the years. Better still, Ergon Energy will be able to derive these benefits without having to buy, configure, and manage a dedicated 400TB internal storage area network.

It is, in other words, a win-win-win-win situation. “For us to operate efficiently, we need hundreds of data sets from different government agencies and private-sector organisations,” Bangay explains.

“Every one of these comes from a different mapping agency and with different co-ordinate sets, which makes it very difficult for us trying to understand the actual situation. Putting it in the Cloud will provide live data and one version of the truth, served in open formats compatible with geographical information systems.”
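The “different co-ordinate sets” problem Bangay describes is, at its core, a reprojection exercise: each agency’s data must be transformed into a common coordinate reference system before its layers can be overlaid. As a rough illustration (not Ergon Energy’s actual tooling), the Python sketch below uses the pyproj library to convert a point from GDA94 / MGA zone 56 eastings and northings, a projection commonly used in south-east Queensland, into the WGS84 latitude and longitude that Web mapping tools expect; the sample coordinates are made up.

```python
# Rough illustration of reconciling data sets published in different
# coordinate reference systems (not Ergon Energy's actual tooling).
# Requires: pip install pyproj
from pyproj import Transformer

# GDA94 / MGA zone 56 (EPSG:28356) is a projected CRS used in
# south-east Queensland; WGS84 (EPSG:4326) is what Google Earth expects.
transformer = Transformer.from_crs("EPSG:28356", "EPSG:4326", always_xy=True)

# Hypothetical asset location expressed as easting/northing in metres.
easting, northing = 502_000.0, 6_960_000.0

lon, lat = transformer.transform(easting, northing)
print(f"Reprojected to WGS84: lon={lon:.6f}, lat={lat:.6f}")
```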

Out of site, out of mind

Given the way companies have jealously guarded their data, dumping a massive business-critical data set onto a Cloud-hosted data storage environment isn’t the kind of thing to be done lightly — or at all, in many cases. Yet that’s exactly what is happening as a growing movement to free up commercial and government data sets gains momentum.

“The key driver in this on-demand world is the consumerisation of IT,” says Carl Michael, enterprise architect with health provider Australian Unity. “Exploding numbers of customers mean huge growth in database sizes and processing speeds, and many enterprises — especially when they’re smaller — can’t easily provide this.” The Commonwealth government, which has more reasons than most to be slavishly diligent when it comes to sharing data, has recently taken a big step towards Cloud-hosted data by launching data.gov.au, a clearinghouse of raw database information modelled after a similar effort by the US government, data.gov.

The data.gov.au site is still a fraction of the size of its American counterpart, but nonetheless contains hundreds of data sets — many of them sourced from popular government-data mashup competitions — that cover everything from the locations of public toilets and ACT BBQs to a selection of catalogues from the Atlas of Living Australia and new data sets from the National Native Title Tribunal. Interested citizens can download the raw data sets and analyse and use them to their heart’s content.
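For those interested citizens, working with one of these raw data sets can be as simple as a few lines of Python. The snippet below sketches the idea; the URL and the printed summaries are hypothetical placeholders, since each data set on the site has its own download link and schema.

```python
# Sketch of pulling a raw CSV data set from data.gov.au and summarising it.
# The URL below is a hypothetical placeholder; real data sets on the site
# have their own download links and column layouts.
import pandas as pd

DATASET_URL = "https://data.gov.au/path/to/some-dataset.csv"  # placeholder

df = pd.read_csv(DATASET_URL)

print(df.head())            # peek at the first few rows
print(df.describe())        # quick statistical summary of numeric columns
print(df.columns.tolist())  # see what fields the agency published
```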

What’s even more interesting about these data sets, however, is the way they’re hosted. Setting up dedicated Web servers would require commercial contracts, server infrastructure and convoluted service level agreements (SLAs) — but data.gov.au is hosted entirely on Web servers running as virtual machines in Amazon’s Elastic Compute Cloud (EC2), where it runs with high availability and scalability at virtually zero cost.
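The hosting arrangement itself is easy to reproduce: a public Cloud provider’s API can stand up a small Web-serving virtual machine in a handful of calls. The sketch below uses AWS’s boto3 SDK to launch a single EC2 instance; the AMI ID, key pair and security group are placeholders, and this is an illustration of the general approach rather than how AGIMO actually provisions data.gov.au.

```python
# Minimal sketch of launching a small Web-serving virtual machine on EC2
# with the boto3 SDK. AMI ID, key pair and security group are placeholders;
# this illustrates the general approach, not AGIMO's actual setup.
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")  # Sydney region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # placeholder Linux AMI
    InstanceType="t3.micro",                    # small, cheap instance
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",                       # placeholder key pair
    SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder, allows HTTP
)

instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched instance {instance_id}")
```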

That makes it far more politically palatable than making the funding a budget line-item. “The good thing about that option is that it doesn’t cost the government an enormous amount,” explains John Sheridan, first assistant secretary within the Agency Services Division of the Australian Government Information Management Office (AGIMO).

“We’ve already got the data and, while we’re not going to curate it, we’re prepared to put it out in a form that people might be prepared to use. The public Cloud has low setup costs; it’s costed as OpEx, elastic, demand driven, and pay as you go — and for data sets that we intend to be in the public domain, there’s no problem putting them in the public Cloud. If people can find a use for it, that’s great.”

Far from being a clearinghouse of junk data, these sets are fuelling a new way of thinking about government-sourced information, and in many cases the response has been surprising.

The World Bank Open Data project, for example, offers over 7000 data sets containing key global financial and other data and has been accessed by over 4.5 million users.

“I’m astonished by the number of people apparently just waiting for our data to become free,” a World Bank data curator recently told the New York Times. “I had no idea how big this was going to be.”
