How Amazon's DynamoDB helped reinvent databases
Behind every great ecommerce website is a database, and in the early 2000s Amazon.com's database was not keeping up with the company's business.
Part of the problem was that Amazon didn't have just one database; it relied on a series of them, each with its own responsibility. As the company headed toward becoming a $10 billion business, the number and size of its SQL databases exploded, and managing them became more challenging. By the 2004 holiday shopping rush, outages had become more common, caused in large part by overloaded SQL databases.
Something needed to change.
But instead of looking for a solution outside the company, Amazon developed its own database management system. It was a whole new kind of database, one that threw out the rules of traditional SQL varieties and was able to scale up and up and up. In 2007 Amazon shared its findings with the world: CTO Werner Vogels and his team released a paper titled "Dynamo: Amazon's Highly Available Key-value Store." Some credit the paper with marking the birth of the NoSQL database market.
The problem with SQL
Relational databases, which have been around for decades and are most commonly queried with SQL (Structured Query Language), are ideal for organizing data in neat tables and running queries against them. Their success is undisputed: Gartner estimates the SQL database market at $30 billion.
But in the early to mid-2000s, companies like Amazon, Yahoo and Google had data demands that SQL databases just didn't address well. (To throw a bit of computer science at you, the CAP theorem states that a distributed system, such as a big database, cannot simultaneously guarantee consistency, availability and partition tolerance, meaning the ability to keep working when the network between nodes fails. When such a failure happens, the system has to choose between staying consistent and staying available. SQL databases prioritize consistency over availability, speed and flexibility, which makes them great for managing core enterprise data such as financial transactions, but less well suited to other types of jobs.)
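Dynamo turns this trade-off into tuning knobs. In the Dynamo paper, each key is stored on N replicas, a read waits for responses from R of them and a write waits for acknowledgments from W of them. In a strict quorum system, choosing R + W > N guarantees that every read set overlaps every write set, so a read always reaches at least one replica that saw the latest acknowledged write. Smaller values keep requests fast and available when some replicas are slow or unreachable, but reads may then be stale. Dynamo itself uses a relaxed "sloppy quorum," so even R + W > N does not give it strict consistency. The Python sketch below checks the overlap rule. The function names are illustrative and are not part of any AWS API.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Rule of thumb: every read quorum shares a replica with every write quorum iff R + W > N."""
    return r + w > n


def brute_force_overlap(n: int, r: int, w: int) -> bool:
    """Verify the same property by enumerating every possible read and write quorum."""
    replicas = range(n)
    return all(
        set(read) & set(write)  # an empty intersection is falsy
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


# (3, 2, 2) is the setting the Dynamo paper reports as common; the others are for contrast.
for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    print((n, r, w), quorums_overlap(n, r, w), brute_force_overlap(n, r, w))
```

With (3, 1, 1), a read and a write can land on different single replicas. That is the fast, highly available end of the spectrum, and the one most exposed to stale data.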
Take Amazon's online shopping cart service, for example. Customers browse the ecommerce website and put items in their virtual shopping cart, where they are saved and potentially purchased later. Amazon needs the data in the shopping cart to always be available to the customer; lost shopping cart data is a lost sale. But it doesn't necessarily need every node of the database around the world to have the most up-to-date shopping cart information for every customer. A SQL/relational system would spend enormous compute resources keeping data consistent across the distributed system, instead of ensuring the information is always available and ready to be served to customers.
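The 2007 paper describes how Dynamo handles the shopping cart: it accepts writes even when replicas disagree and reconciles the versions later. Each version carries a vector clock, which is a per-node counter. If one version's clock descends from another's, the older version is discarded. If two versions are concurrent, the cart service merges them. The sketch below is a simplified illustration under those assumptions. The function names are hypothetical, the cart is modeled as a set of item IDs, and this is not Amazon's code.

```python
from typing import Dict, FrozenSet, List, Tuple

VectorClock = Dict[str, int]                 # node name -> update counter
Version = Tuple[VectorClock, FrozenSet[str]]  # (clock, item IDs in the cart)


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if clock a has seen every update that clock b has (a >= b per node)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(versions: List[Version]) -> Version:
    """Discard versions superseded by a newer one, then merge the concurrent survivors."""
    survivors = [
        (clock, items)
        for clock, items in versions
        if not any(descends(other, clock) and other != clock for other, _ in versions)
    ]
    merged_clock: VectorClock = {}
    merged_items: FrozenSet[str] = frozenset()
    for clock, items in survivors:
        for node, count in clock.items():
            merged_clock[node] = max(merged_clock.get(node, 0), count)
        merged_items = merged_items | items  # union: an added item is never dropped
    return merged_clock, merged_items


base = ({"A": 1}, frozenset({"book"}))
via_a = ({"A": 2}, frozenset({"book", "lamp"}))          # later update handled by node A
via_b = ({"A": 1, "B": 1}, frozenset({"book", "mug"}))   # concurrent update via node B
clock, items = reconcile([base, via_a, via_b])
print(clock, sorted(items))  # {'A': 2, 'B': 1} ['book', 'lamp', 'mug']
```

Because the merge is a union, an item added on any replica survives reconciliation. The Dynamo paper notes the flip side: an item deleted on one replica can reappear after the versions are merged.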
One of the fundamental tenets of Amazon's Dynamo, and of NoSQL databases in general, is that they sacrifice strict data consistency for availability. Replicas are allowed to disagree briefly and converge later, an approach known as eventual consistency. Amazon's priority is to keep shopping cart data safe and serve it to customers very quickly. Plus, the system has to be able to scale to serve Amazon's fast-growing demand. Dynamo solves all of these problems: it replicates data across multiple nodes and can handle tremendous load while maintaining fast and dependable performance.
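Dynamo uses consistent hashing to decide which nodes hold copies of each key. Node names and keys are hashed onto the same ring (the paper uses MD5). A key is stored on the first node clockwise from its position plus the next N - 1 nodes, a set the paper calls the key's "preference list." The sketch below assumes one position per node; Dynamo additionally gives each physical node many virtual positions to balance load. The node and key names are hypothetical.

```python
import bisect
import hashlib
from typing import List


def ring_position(name: str) -> int:
    """Map a key or node name to a point on a 128-bit hash ring using MD5."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes: List[str]) -> None:
        # Keep (position, node) pairs sorted so lookups can use binary search.
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key: str, n: int) -> List[str]:
        """Return the n nodes responsible for key: the first node clockwise
        from the key's position, followed by its successors on the ring."""
        if not 1 <= n <= len(self._ring):
            raise ValueError("replica count must be between 1 and the number of nodes")
        start = bisect.bisect_right(self._positions, ring_position(key))
        # The modulo wraps the walk around the end of the ring.
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(n)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = HashRing(nodes)
print(ring.preference_list("cart:customer-42", 3))
```

This design scales well because adding a node changes placement only for keys near the new node's spot on the ring. Most data stays where it is, which lets a Dynamo-style cluster grow without reshuffling everything.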
"It was one of the first NoSQL databases," explains Khawaja Shams, head of engineering at Amazon DynamoDB. "We traded off consistency and very rigid querying semantics for predictable performance, durability and scale; those are the things Dynamo was super good at."
DynamoDB: A database in the cloud
Dynamo fixed many of Amazon's problems that SQL databases could not. But throughout the mid-to-late 2000s, it still wasn't perfect. Dynamo boasted the functionality that Amazon engineers needed, but required substantial resources to install and manage.
The introduction of DynamoDB in 2012 proved to be a major upgrade, though. Built on the ideas behind the database Amazon uses internally, DynamoDB lives in Amazon Web Services' IaaS cloud and is fully managed. Amazon engineers and AWS customers don't provision database servers or manage storage of the data; all they do is request the throughput they need. Customers pay $0.0065 per hour for enough write capacity to handle about 36,000 writes per hour (roughly 10 writes per second), plus $0.25 per GB of data stored in the system per month. If the application needs more capacity, a few clicks spread the workload over more nodes.
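The launch-era pricing reduces to simple arithmetic. Because 36,000 writes per hour is 10 writes per second, a workload needs one $0.0065-per-hour block of write capacity for every 10 writes per second it sustains, plus storage. The sketch below estimates a monthly bill using only the article's two rates. It assumes a 730-hour month and rounds throughput up to whole blocks. It also ignores read capacity and any later price changes, so treat the result as illustrative rather than a quote.

```python
import math

# Rates quoted in the article (launch-era pricing; current AWS pricing differs).
WRITE_BLOCK_PRICE_PER_HOUR = 0.0065  # dollars per hour for one block of write throughput
WRITES_PER_BLOCK_PER_HOUR = 36_000   # about 10 writes per second
STORAGE_PRICE_PER_GB_MONTH = 0.25    # dollars per GB stored per month
HOURS_PER_MONTH = 730                # assumption: average number of hours in a month


def monthly_cost(writes_per_second: float, stored_gb: float) -> float:
    """Estimate monthly write-throughput plus storage cost; reads are not modeled."""
    writes_per_hour = writes_per_second * 3600
    blocks = math.ceil(writes_per_hour / WRITES_PER_BLOCK_PER_HOUR)  # assumed: whole blocks only
    throughput = blocks * WRITE_BLOCK_PRICE_PER_HOUR * HOURS_PER_MONTH
    storage = stored_gb * STORAGE_PRICE_PER_GB_MONTH
    return throughput + storage


print(round(monthly_cost(writes_per_second=100, stored_gb=50), 2))
```

For example, a sustained 100 writes per second with 50 GB stored works out to 10 blocks × $0.0065 × 730 hours ≈ $47.45 of throughput plus $12.50 of storage, or about $59.95 a month.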
AWS is notoriously opaque about how DynamoDB and many of its other infrastructure-as-a-service products run under the covers, but an AWS promotional video reveals that the service uses solid-state drives and that customers' DynamoDB data is spread across multiple availability zones (physically separate data centers) to ensure availability.
Forrester principal analyst Noel Yuhanna calls it a "pretty powerful" database and considers it one of the top NoSQL offerings, especially for key-value store use cases.
DynamoDB has grown significantly since its launch. While AWS will not release customer figures, AWS engineer James Hamilton said in November that the number of requests DynamoDB processes had tripled, and the amount of data it stores had quadrupled, compared with the year prior. Even at that scale and growth rate, DynamoDB has consistently returned queries in three to four milliseconds.
Feature-wise, DynamoDB has grown, too. NoSQL databases are generally broken into a handful of categories. Key-value databases store each piece of information as a value retrieved by a unique key. Document databases store semi-structured documents, often in JSON, and let users query the fields inside them. Graph databases track connections between data. DynamoDB originally started as a key-value database, but last year AWS expanded it into a document database by adding support for JSON documents. AWS last year also added Global Secondary Indexes to DynamoDB, which let users query a table by attributes other than its primary key. Each index is a separate copy of selected attributes, organized under a different key and kept up to date by DynamoDB automatically.
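The following toy in-memory model makes the index idea concrete; it is not DynamoDB's API. Items are JSON-style documents that need not share the same attributes, and the table looks them up by a primary key. A second map plays the role of a global secondary index keyed on another attribute. The class and attribute names (`Table`, `order_id`, `customer_id`) are hypothetical. DynamoDB updates global secondary indexes asynchronously, whereas this sketch updates its index in the same call.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


class Table:
    """Toy key-value table of JSON-style documents with one secondary index."""

    def __init__(self, key_attribute: str, index_attribute: str) -> None:
        self.key_attribute = key_attribute
        self.index_attribute = index_attribute
        self._items: Dict[str, Dict[str, Any]] = {}
        # index value -> {primary key -> item}
        self._index: Dict[Any, Dict[str, Dict[str, Any]]] = defaultdict(dict)

    def put(self, item: Dict[str, Any]) -> None:
        key = item[self.key_attribute]
        old = self._items.get(key)
        # When overwriting, drop the old item's index entry first.
        if old is not None and self.index_attribute in old:
            self._index[old[self.index_attribute]].pop(key, None)
        self._items[key] = item
        # Items lacking the index attribute are left out of the index.
        if self.index_attribute in item:
            self._index[item[self.index_attribute]][key] = item

    def get(self, key: str) -> Dict[str, Any]:
        return self._items[key]

    def query_index(self, value: Any) -> List[Dict[str, Any]]:
        return list(self._index.get(value, {}).values())


orders = Table(key_attribute="order_id", index_attribute="customer_id")
orders.put(json.loads('{"order_id": "o-1", "customer_id": "c-7", "items": [{"sku": "book", "qty": 1}]}'))
orders.put({"order_id": "o-2", "customer_id": "c-7", "gift_note": "Happy birthday"})
orders.put({"order_id": "o-3", "customer_id": "c-9"})
print(sorted(o["order_id"] for o in orders.query_index("c-7")))  # ['o-1', 'o-2']
```

The three items have different shapes, which is the flexible schema at work. Items without a `customer_id` would simply be left out of the index, similar in spirit to how DynamoDB omits items that lack an index's key attributes.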
NoSQL's use case and vendor landscape
The fundamental advantages of NoSQL databases are their ability to scale and their flexible schemas, meaning users can change how data is structured without redefining rigid tables and can query it in multiple ways. Many new web-based applications, such as social, mobile and gaming-centric ones, are being built using NoSQL databases.
While Amazon may have helped jumpstart the NoSQL market, it is now one of dozens of vendors attempting to cash in on it. Nick Heudecker, a Gartner researcher, stresses that even though NoSQL has captured the attention of many developers, it is still a relatively young technology. He estimates that NoSQL products bring in less than half a billion dollars in annual revenue (that's not an official Gartner estimate). Heudecker says the majority of his enterprise clients' inquiries still concern SQL databases.
NoSQL competitors MongoDB, MarkLogic, Couchbase and DataStax have strong standings in the market as well, and some seem to have greater traction among enterprise customers than DynamoDB, Heudecker says.
Living in the cloud
What's holding DynamoDB back in the enterprise market? For one, it has no on-premises version; it can only be used in AWS's cloud. Some users just aren't comfortable using a cloud-based database, Heudecker says. DynamoDB competitors offer users the opportunity to run databases on their own premises, behind their own firewall.
And many organizations still get great value out of SQL systems. Those RDBMSs (relational database management systems) aren't going away; they're still great for enterprise systems of record.
Perhaps the biggest criticism of DynamoDB, that it only lives in the cloud, is also one of its biggest selling points, AWS officials contend.
Shams, AWS's DynamoDB engineering head, says that because the technology is hosted in the cloud, users don't have to worry about configuring or provisioning any hardware. They just use the service and scale it up or down based on demand, while paying only for storage and throughput, he says.
For security-sensitive customers, there are options to encrypt data as DynamoDB stores it. Plus, DynamoDB is integrated with AWS, the market's leading IaaS platform (according to Gartner's Magic Quadrant report), which supports a variety of tools, including relational database services such as Amazon Aurora and Amazon RDS.
AdRoll rolls with AWS DynamoDB
Marketing platform provider AdRoll, which serves more than 20,000 customers in 150 countries, is among the organizations comfortable using the cloud-based DynamoDB. Here's how its service works: if an ecommerce site visitor browses a product page but does not buy the item, AdRoll bids on ad space on another site the user visits to show the product they were previously considering. It's an effective way to bring shoppers back to products they were already weighing.
It's really complicated for AdRoll to figure out which ads to serve to which users, though. Even more complicated, AdRoll needs to decide, in about the time it takes for a webpage to load, whether to bid on an ad spot and which ad to place. That's the job of CTO Valentino Volonghi, who has about 100 milliseconds to play with. Most of that time is gobbled up by network latency, so needless to say AdRoll requires a reliably fast platform. It also needs huge scale: AdRoll considers more than 60 billion ad impressions every day.
AdRoll uses DynamoDB and Amazon's Simple Storage Service (S3) to sock away data about customers and help its algorithm decide which ads to buy for customers. In 2013, AdRoll had 125 billion items in DynamoDB; it's now up to half a trillion. It makes 1 million requests to the system each second, and the data is returned in less than 5 milliseconds, every time. AdRoll also has 17 million files stored in Amazon S3, taking up more than 1.5 petabytes of space.
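AdRoll's figures are easier to compare once they are converted to common units. The quick back-of-envelope calculation below makes two assumptions: it treats the article's "more than" figures as exact lower bounds, and it takes a petabyte to be 10^15 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60

impressions_per_day = 60e9          # "more than 60 billion" ad impressions per day
items_2013, items_now = 125e9, 500e9
s3_bytes, s3_files = 1.5e15, 17e6   # "more than 1.5 petabytes" across 17 million files

impressions_per_second = impressions_per_day / SECONDS_PER_DAY
item_growth = items_now / items_2013
avg_file_mb = s3_bytes / s3_files / 1e6

print(f"{impressions_per_second:,.0f} impressions considered per second")
print(f"{item_growth:.0f}x growth in stored items since 2013")
print(f"about {avg_file_mb:.0f} MB per S3 file on average")
```

That works out to roughly 694,000 impressions to evaluate every second, a fourfold increase in DynamoDB items since 2013, and an average S3 file of about 88 MB.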
AdRoll didn't have to build a global network of data centers to power its product, thanks in large part to using DynamoDB.
"We haven't spent a single engineer to operate this system," Volonghi says. "It's actually technically fun to operate a database at this massive scale."
Not every company is going to have the needs of Amazon.com's ecommerce site or AdRoll's real-time bidding platform. But many are struggling to achieve greater scale without major capital investments. The cloud makes that possible, and DynamoDB is a prime example.