How Amazon's cloud solved the "perfect storm" in networking
- 14 November, 2014 04:14
LAS VEGAS - Amazon Web Services operates at huge scale: in its latest Magic Quadrant, research firm Gartner estimated AWS's capacity at five times that of its next 14 competitors combined.
The cloud has 11 regions around the world made up of 28 Availability Zones. Each zone has at least one data center, and each data center has between 50,000 and 80,000 servers. Every day, AWS adds enough new capacity to its cloud to power all of Amazon.com as it was in 2004, when it was a $7 billion revenue company.
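Those figures imply a rough floor on the size of the fleet. The short Python sketch below works it out using only the numbers quoted above. The function name `fleet_floor` is ours, and the result is a back-of-the-envelope lower bound, not a figure AWS or Hamilton reported: it assumes exactly one data center per zone, and some zones have more.

```python
# Back-of-the-envelope floor on AWS's server count, using only the
# figures quoted in the article. Not an official AWS number.

AVAILABILITY_ZONES = 28                     # quoted in the article
MIN_DATA_CENTERS_PER_ZONE = 1               # "at least one data center"
SERVERS_PER_DATA_CENTER = (50_000, 80_000)  # quoted range per data center


def fleet_floor(zones, data_centers_per_zone, servers_range):
    """Return (low, high) server counts for the minimum number of data centers."""
    low_per_dc, high_per_dc = servers_range
    data_centers = zones * data_centers_per_zone
    return data_centers * low_per_dc, data_centers * high_per_dc


low, high = fleet_floor(
    AVAILABILITY_ZONES, MIN_DATA_CENTERS_PER_ZONE, SERVERS_PER_DATA_CENTER
)
print(f"At least {low:,} to {high:,} servers")
```

Because this counts only one data center per zone, the real total would be higher than whatever the sketch prints.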
And there was one thing AWS Vice President and Distinguished Engineer James Hamilton was worried would slow the whole thing down: the network.
There's a "perfect storm" in the networking industry, Hamilton said during a presentation at AWS re:Invent. "It's a problem, a red-alert situation."
A variety of factors make networking such a large concern for AWS. First of all, the relative cost of networking compared to compute and storage is increasing. Server prices are falling, while networking prices are "frozen in time," Hamilton said.
As the cost of compute capacity falls, each server is more densely packed with virtual machines. That alone strains the network, and customers are also running more network-intensive advanced data analytics workloads, which adds further stress. The main concern was traffic within AWS's data centers, so-called "east-west" traffic, as opposed to the inbound and outbound traffic referred to as "north-south."
AWS's answer to this was a simple one: The company built its own network and equipment. Hamilton said it was an audacious move but at AWS's scale, it was a natural solution.
Starting years ago, AWS worked with original networking equipment manufacturers and designed its own custom networking gear. AWS has also built a customized protocol that it now uses to run its cloud. Not only have costs fallen, but availability has gone up. Hamilton says AWS gets advantages from working directly with manufacturers. Many networking hardware companies today are unable to tailor their equipment to their customers' exact needs. By working directly with the manufacturers, AWS can.
That's not enough to support AWS's massive scale, though. AWS has its own private network to connect its regions. By not relying on public providers, the company has dedicated tunnels that deliver increased availability, higher performance with less jitter, and reduced cost. "It's more reliable, a cheaper link and lower latency," Hamilton said. "It's just a happier place to be."
AWS has a unique architecture compared to other IaaS cloud providers. Behind each of the 11 regions are Availability Zones, which offer physically separated data centers to create fault tolerance in each region. In front of each region are transit centers, which provide three main sets of connections: between the Availability Zones and to other regions; to AWS's Direct Connect partners; and to the public Internet.
AWS has built redundant paths into its Availability Zones so that if one zone in a region goes down, the region can survive. Latency between the Availability Zones within a region is less than 2 milliseconds, and usually closer to 1 millisecond. Every Availability Zone has at least one data center of its own, and US-East, AWS's oldest region, has an Availability Zone with five data centers.
AWS has been notoriously closed-mouthed about the inner workings of its operations, but Hamilton shared the view to show how the company is able to operate and innovate so fast. This year the company expects to release about 500 updates to its cloud. Meanwhile, usage of its Simple Storage Service (S3) is growing at more than 120% year over year; the Elastic Compute Cloud (EC2) is growing 99% every year. Not every company will be able to take measures like AWS did to solve its problems, but not every company has AWS's scale and problems either.