Power key challenge in race to exascale computing
19 December, 2017 13:28
Lenovo ANZ DCG CTO, Joao Almeida
Addressing the enormous power demands of exascale computing remains the biggest challenge in the race to push the boundaries of supercomputing, according to Joao Almeida, CTO at Lenovo ANZ’s Data Centre Group.
Exascale supercomputers remain at least two or three years away, but teams in the US, the EU, China and Japan are hard at work tackling the technical barriers to systems that can deliver sustained performance in excess of 1000 petaflops, or one exaflop.
In June, China’s National Research Centre of Parallel Computer Engineering and Technology and the National Supercomputing Centre revealed plans to launch an exascale prototype system by June next year.
The Sunway system is one of three projects in China to build exascale prototypes.
Earlier this month, the Mont-Blanc 2020 effort to build a low-power processor for exascale computing kicked off. The project is funded by the European Commission under the Horizon2020 program.
Meanwhile, the Exascale Computing Project, a collaboration between the US Department of Energy’s Office of Science and the National Nuclear Security Administration, is seeking to push forward supercomputing development in the United States.
Meeting, and where possible reducing, the energy demands of exascale-capable systems remains a key focus for project teams around the world, Almeida said.
The current fastest supercomputer, Sunway TaihuLight, based at China’s National Supercomputing Centre, is capable of sustained performance of 93 petaflops and requires over 15 megawatts of power to run.
“An exascale system is 1000 times bigger than a petaflop system, so you can see the amount of power consumption that is needed to run a system that size is massive,” Almeida said. “It’s bigger than some of the cities in the world.”
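To put rough numbers on that scale, the TaihuLight figures quoted above can be extrapolated linearly. The following is only a back-of-the-envelope sketch: it assumes power grows in direct proportion to performance at today's efficiency, which is precisely what exascale projects aim to avoid, and it treats "over 15 megawatts" as exactly 15, so the result is a lower bound. The function names are illustrative and do not come from any vendor tool.

```python
# Back-of-the-envelope extrapolation from the figures quoted in the article.
# Assumption: power scales linearly with sustained performance (no efficiency gains).

TAIHULIGHT_PFLOPS = 93.0   # sustained performance quoted in the article
TAIHULIGHT_MW = 15.0       # "over 15 megawatts", taken here as a lower bound
EXASCALE_PFLOPS = 1000.0   # 1 exaflop = 1000 petaflops


def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """Energy efficiency; the 1e6 factors in PF->GF and MW->W cancel out."""
    return pflops / megawatts


def naive_power_mw(target_pflops: float,
                   ref_pflops: float = TAIHULIGHT_PFLOPS,
                   ref_mw: float = TAIHULIGHT_MW) -> float:
    """Power needed at the target performance if efficiency stayed unchanged."""
    return ref_mw * target_pflops / ref_pflops


if __name__ == "__main__":
    eff = gflops_per_watt(TAIHULIGHT_PFLOPS, TAIHULIGHT_MW)
    power = naive_power_mw(EXASCALE_PFLOPS)
    print(f"TaihuLight efficiency: about {eff:.1f} gigaflops per watt")
    print(f"Naive exascale power draw: about {power:.0f} MW")
```

Under these assumptions an exascale machine built with the same efficiency would draw roughly 160 megawatts, which is why CPU/GPU efficiency and cooling overheads dominate the discussion.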
The energy swing when a supercomputer receives a job can be “a challenge for a power grid to deliver”, the CTO added. “It’s like turning on an entire city at the same time.”
Almeida said that exascale requires technological advances on two key fronts. The first is CPU and GPU designs.
Over the last few years there has been a “very, very high level of innovation and improvement on CPU/GPU technologies,” he said. “They’re getting more and more compute capacity on the same power density.”
The second is watercooling technology, an area where Lenovo believes it can make a significant contribution to the exascale race, Almeida said.
“Any system that is going to be exascale has to be water cooled,” he said. “It’s simply impossible to air cool, like we used to do, a system of that size.”
The vendor expects to shortly announce details of a new supercomputer project that represents “a real breakthrough in terms of watercooling innovation”.
Almeida said that Lenovo’s key innovation is hot-water cooling, employing water at significantly warmer temperatures than most watercooling systems.
“We’re running watercooling technologies with 55-degree input water,” he said. “That’s warm water without the need to have it chilled. Just the fact that you don’t have to chill the water is an advantage on its own because of the wasted energy that it takes to chill water.”
“Running hot water you can actually do a very interesting exercise,” he added. “You take warm water at 55 degrees and you run it through the system – it comes out at about 65 degrees; so about 10 to 12 degrees variance.”
That output water can be fed through absorption chillers to cool systems, such as storage, that are not appropriate for direct watercooling.
Lenovo’s approach involves a “holistic system view,” Almeida said. “We cool the entire system and we go beyond the entire system — we go into other parts of the data centre,” he said.
The approach can deliver heat recovery efficiencies in excess of 90 per cent, he added.
“We started a very long time ago and this is our fourth generation of warm watercooling technology,” the CTO said. “If you look at the experience we have running very large systems worldwide, it puts us in a leading position in relation to our competitors.”
“We can go up to 93 per cent heat recovery, and the 7 per cent is really something we cannot do anything about because it’s radiant heat – there’s really nothing else we can do about it at the moment,” he added.
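The cooling figures Almeida quotes can be turned into a rough heat budget. The sketch below is illustrative only: the 1 MW load is a hypothetical example and not a Lenovo system, the temperatures are assumed to be in degrees Celsius, the water's specific heat is the textbook approximation of about 4.186 kJ/(kg·K), and the function names are invented for this example. It uses the standard relation heat = mass flow × specific heat × temperature rise.

```python
# Rough heat budget for a warm-water-cooled system, using figures from the article.
# Assumptions: 93% of heat captured by water, 10-degree rise (55 in, about 65 out),
# specific heat of water ~4.186 kJ/(kg*K). The 1 MW load is hypothetical.

RECOVERY_FRACTION = 0.93      # "up to 93 per cent heat recovery"
DELTA_T_K = 10.0              # low end of the quoted 10 to 12 degree variance
WATER_CP_KJ_PER_KG_K = 4.186  # textbook approximation, not from the article


def split_heat(load_kw: float, recovery: float = RECOVERY_FRACTION):
    """Return (heat carried away by water, heat lost to the room) in kW."""
    recovered = load_kw * recovery
    return recovered, load_kw - recovered


def water_flow_kg_per_s(heat_kw: float,
                        delta_t_k: float = DELTA_T_K,
                        cp: float = WATER_CP_KJ_PER_KG_K) -> float:
    """Mass flow needed to carry heat_kw at the given temperature rise."""
    return heat_kw / (cp * delta_t_k)


if __name__ == "__main__":
    recovered_kw, lost_kw = split_heat(1000.0)  # hypothetical 1 MW load
    flow = water_flow_kg_per_s(recovered_kw)
    print(f"Recovered in water: {recovered_kw:.0f} kW, lost to air: {lost_kw:.0f} kW")
    print(f"Required water flow: about {flow:.1f} kg/s")
```

For the hypothetical 1 MW load this works out to roughly 22 kilograms of water per second; a larger temperature rise, such as the 12 degrees at the top of the quoted range, would reduce the flow needed.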
The innovations that make exascale computing possible will flow through to smaller systems, Almeida said. “The race for exascale is going to produce what I believe is a breakthrough in CPU innovation,” the CTO said. “We’re going to be learning many lessons from that challenge, and those lessons will be trickling down.”
Lenovo’s hot-water approach to watercooling is already employed in a 30-node system as well as a number of similar-sized systems.
“Our solution is very modular,” Almeida said. The smallest system that it can work on is a 6U installation, the CTO said.
“So if a customer is focusing on energy efficiency, they can start with a 6U-high compute cluster which has 12 compute nodes and that can be directly watercooled. We’re already using that technology to cater to some of our customers with smaller workload needs.”
Almeida drew an analogy with Formula One and the development of anti-lock braking systems.
“It was developed for F1 and now every car in the world and motorcycle in the world has ABS,” the CTO said. “I think that the biggest advantage from the exascale race is going to be the innovation that we’re going to get from energy efficiency, cooling technologies, CPU efficiency and compute technologies.”