Yahoo claims 2-petabyte database is world's biggest, busiest
- 23 May, 2008 08:20
The petabyte is the new petaflop.
Interest in raw computational speed waned -- sorry, IBM -- after data center managers began turning away from super-expensive supercomputers toward massive grids composed of cheap PC servers.
Meanwhile, the rise of business intelligence and its even more technical cousin, business analytics, has spurred interest in super-large data warehouses that boost profits by crunching the behavior of millions of consumers at a time.
Take Yahoo's 2-petabyte, specially built data warehouse, which it uses to analyze the behavior of its half a billion Web visitors a month. The Sunnyvale, Calif., firm makes a strong claim that the warehouse is not only the world's single-largest database, but also the busiest.
Based on a heavily modified PostgreSQL engine, the year-old database processes 24 billion events a day, according to Waqar Hasan, vice-president of engineering for Yahoo's Data Group.
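To put that daily figure in perspective, it can be converted into an average per-second rate. The short Python sketch below is only back-of-the-envelope arithmetic: the function name is made up for illustration, and it assumes events arrive evenly around the clock, which real Web traffic does not.

```python
# Rough conversion of a daily event count into an average per-second rate.
# Assumption: events are spread evenly across the day (real traffic peaks and dips).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_second(events_per_day: int) -> float:
    """Return the average number of events per second for a daily total."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    yahoo_daily_events = 24_000_000_000  # 24 billion events a day, per the article
    rate = events_per_second(yahoo_daily_events)
    print(f"Average rate: about {rate:,.0f} events per second")
```

Under that even-spread assumption, the average works out to roughly 278,000 events every second; peak load would be higher.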
And the database, all of it constantly accessed and all of it stored in structured, ready-to-crunch form, is expected to grow into the multiple tens of petabytes by next year.
By comparison, large enterprise databases typically grow no larger than the tens of terabytes. Large databases about which much is publicly known include the Internal Revenue Service's data warehouse, which weighs in at a svelte 150 TB.
eBay reportedly operates databases that process 10 billion records per day, and are also able to do deep business analysis. They collectively store more than 6 PB of data, though the single largest system is estimated at about 1.4 PB or larger.
Even larger than Yahoo or eBay are the databases of the National Energy Research Scientific Computing Center in California, whose archives include 3.5 PB of atomic energy research data, and the World Data Centre for Climate in Hamburg, Germany, which has 220 TB of data in its Linux database but more than 6 PB of data archived on magnetic tape.
But Hasan says that archived data is far different from live, constantly accessed data.
"It's one thing to have data entombed, it's another to have it readily accessible for your queries," he said. He also points out that other large databases store unstructured data such as video or sound files. Those can bulk up a database's size without providing easily analyzable data.