How Facebook dealt with its big data problem
- 18 February, 2014 10:21
Facebook manages billions of messages and photo uploads per year, but its ability to keep up with that volume did not come without some “growing pains”, according to former technical lead Kannan Muthukkaruppan.
Muthukkaruppan, who now works as an engineer at Nutanix in the United States, spent almost six years at Facebook, from 2007 to 2013, a period in which the social media company made significant changes to its data storage.
Muthukkaruppan gave the example of user details, which included people’s profiles and lists of their friends, all of which were kept in an open source MySQL database.
“MySQL is good for some workloads but not data intensive workloads. Everything was bursting at the seams in 2007 so we had to look at this,” he said.
Within the company’s data warehouse, information was kept in the open source Hadoop Distributed File System (HDFS).
Muthukkaruppan said the social media company was constantly buying new servers because it was running out of storage space for photos. It was using network attached storage (NAS) for photo storage.
“The challenge Facebook had was to design infrastructure out of simple building blocks that you don’t have to throw away every two years,” he said.
To solve the growth issues, Muthukkaruppan considered HBase, a distributed database that can scale to petabytes.
Facebook messaging was the first application to use HBase, in 2010.
“[Facebook CEO] Mark Zuckerberg’s vision was to unify all mediums of communication into a single product. The new version of Facebook messages meant that every chat message had to be stored,” he said.
The scale was considerable: in 2010, Facebook chat was generating five billion messages per day from a user base of 350 million. Today, the number of messages is 10 billion.
In 2007, there were 1.7 billion photos uploaded to the site. By 2013, this had grown to 250 billion photos per year.
By November 2011, Facebook’s disk usage was growing at 20 petabytes per year.
“All of this data needs to be protected for disaster recovery purposes so we gave up the traditional NAS. We went with servers and x86 Intel boxes. That gave us huge cost savings,” he said.
Follow Hamish Barwick on Twitter: @HamishBarwick