The petabyte is the new petaflop.
Interest in raw computational speed waned -- sorry, IBM -- after data center managers began turning away from super-expensive supercomputers toward massive grids composed of cheap PC servers.
Meanwhile, the rise of business intelligence and its even more technical cousin, business analytics, has spurred interest in super-large data warehouses that boost profits by crunching the behavior of millions of consumers at a time.
Take Yahoo's 2-petabyte, specially built data warehouse, which it uses to analyze the behavior of its half a billion Web visitors a month. The Sunnyvale, Calif., firm makes a strong claim that it is not only the world's single largest database, but also the busiest.
Based on a heavily modified PostgreSQL engine, the year-old database processes 24 billion events a day, according to Waqar Hasan, vice president of engineering for Yahoo's Data Group.
And the database, all of it constantly accessed and all of it stored in structured, ready-to-crunch form, is expected to grow into the multiple tens of petabytes by next year.
By comparison, large enterprise databases typically grow no larger than the tens of terabytes. Large databases about which much is publicly known include the Internal Revenue Service's data warehouse, which weighs in at a svelte 150 TB.
eBay reportedly operates databases that process 10 billion records per day and can also perform deep business analysis. They collectively store more than 6 PB of data, though the single largest system is estimated at about 1.4 PB or larger.
Even larger than Yahoo's or eBay's are the databases of the National Energy Research Scientific Computing Center in California, whose archives include 3.5 PB of atomic energy research data, and the World Data Centre for Climate in Hamburg, Germany, which has 220 TB of data in its Linux database but more than 6 PB of data archived on magnetic tape.
But Hasan says that archived data is far different from live, constantly accessed data.
"It's one thing to have data entombed, it's another to have it readily accessible for your queries," he said. He also points out that other large databases store unstructured data such as video or sound files. Those can bulk up a database's size without providing easily analyzable data.












