Garvan Institute eyes FPGA boost to genomics research
- 29 August, 2018 00:01
The parallel processing capabilities of field-programmable gate arrays (FPGAs) are set to boost the efforts of the Garvan Institute of Medical Research as it works to understand the human genome.
The Garvan Institute is one of Australia’s highest-profile biomedical research institutes and a leader in genomics, a key component of precision medicine: the idea of delivering medical treatment specifically tailored to a particular individual.
Advances in computing have dramatically reduced the cost and time it takes to sequence a genome, from close to US$3 billion and 10 years for the Human Genome Project to today where Garvan can sequence around 50 genomes a day for US$1000. However, the institute looks set to speed up the process even further following a significant expansion of its high-performance computing (HPC) cluster.
Beginning in June, the institute rolled out Dell EMC’s 14th generation PowerEdge HPC platform. The upgrade to Garvan’s HPC cluster has involved deploying 47 Dell EMC PowerEdge servers offering an additional 1632 Intel Xeon cores, 122,880 NVIDIA Tesla V100 CUDA cores, and 15,360 NVIDIA Tensor cores.
In addition, the new HPC system has 41TB of RAM and 744TB of NVMe, and access to 530TB of CephFS storage.
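As a rough cross-check of the accelerator figures quoted above, the short Python sketch below divides the stated core totals by per-GPU core counts. The per-GPU values (5,120 CUDA cores and 640 Tensor cores per Tesla V100) are NVIDIA's published specifications rather than figures from the briefing, and the function name is purely illustrative.

```python
# Per-GPU figures from NVIDIA's published Tesla V100 specifications.
# These are assumptions brought in from outside the article.
CUDA_CORES_PER_V100 = 5120
TENSOR_CORES_PER_V100 = 640

# Totals quoted for the Garvan HPC upgrade.
TOTAL_CUDA_CORES = 122_880
TOTAL_TENSOR_CORES = 15_360


def implied_gpu_count(total_cores: int, cores_per_gpu: int) -> int:
    """Return the GPU count implied by a core total.

    Raises ValueError if the total is not a whole multiple of the
    per-GPU figure, which would suggest the assumption is wrong.
    """
    gpus, remainder = divmod(total_cores, cores_per_gpu)
    if remainder:
        raise ValueError(
            f"{total_cores} is not a multiple of {cores_per_gpu}"
        )
    return gpus


gpus_from_cuda = implied_gpu_count(TOTAL_CUDA_CORES, CUDA_CORES_PER_V100)
gpus_from_tensor = implied_gpu_count(TOTAL_TENSOR_CORES, TENSOR_CORES_PER_V100)
print(gpus_from_cuda, gpus_from_tensor)
```

Under those assumptions both totals point to the same number of V100 cards, which suggests the two quoted figures are consistent with each other.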
The project also involved an upgrade to the power and cooling of the Garvan Institute’s data centre, as well as a boost to its network; HPC nodes are connected either via 25GbE or 100GbE.
The platform is expected to be able to deliver an increase of up to 330 per cent in the number of genomes processed each day – for about the same amount of energy consumed, according to Andrew Underwood, Dell EMC’s HPC lead for Australia and New Zealand.
The institute is now preparing to install 10 Intel Arria 10 GX FPGAs as part of the project.
FPGAs have become so commoditised that they can now be put into a common compute platform, Underwood told a media briefing at the Dell Technologies Forum in Sydney.
“If you think of a CPU processor it might have up to 28 compute cores and each core can undertake a task in serial – and they’re really designed to process data in serial,” Underwood said. “They’re parallelised through their ability to do multiple threads and multiple cores on each CPU — but an FPGA is designed for much higher throughput and for parallelisation of data.
“On a single FPGA device, we can actually see that it can move data on the device up to 30 times faster than a traditional central processing unit. What that means is larger datasets can be processed faster, potentially [at a] lower latency.”
The use of FPGAs means that not only will researchers get responses to their queries quicker, but the system will be able to process more datasets.
“One of the goals of this partnership is to drive the adoption and the use of FPGAs,” Underwood said.
By the end of the year a large portion of the Garvan Institute’s genomics analysis pipeline may have shifted to FPGAs, opening up the potential to explore next-generation sequencing techniques that can’t be performed using a traditional CPU, he said.
Dr Warren Kaplan, the chief of informatics at Garvan’s Kinghorn Centre for Clinical Genomics, said that the HPC upgrade was the third expansion of the institute’s compute cluster. The original cluster was rolled out in 2012, also in partnership with Dell.
“We still have great utility for the original machines we got six years ago,” he said. “They still run 24/7 and carry enormous workloads.”
Underwood said that the software-defined approach to HPC that the Garvan Institute has taken made a staged installation of the new nodes possible.
“As Garvan potentially expands in the future, should the dataset size grow or the computational demands grow, because it’s a software-defined system it’s a lot easier to continuously add to – [you’re] looking at minutes to add [a node], rather than hours or days,” he said.
Kaplan said that the institute was “really enthusiastic” about the potential of FPGAs partly because it relied heavily on the Broad Institute’s Genome Analysis Toolkit (GATK). Version 4 of GATK saw an influx of commercial investment in the project, including work to allow the software to take advantage of the capabilities of FPGAs.
“So that’s really going to be our starting point to be able to immediately get going in this area,” he told Computerworld.
However, over the years the institute has also seen a change in its demographics, and a lot of engineers with low-level programming experience have joined the organisation. Many of them have “expressed a lot of enthusiasm with the coming of the FPGA cards and all the things they’re going to do with it,” he added.
“So we have an immediate use case for the cards coming in and that’s the Genome Analysis Toolkit — in terms of workflow it’s running within the application itself and we just shift the location of where computing is occurring from the CPU to the FPGA — but we’re hoping that this will diversify our opportunities to [be] able to drive innovation in this area,” Kaplan said.
Dell EMC plans to work with researchers over the life of the system to help them take advantage of the FPGAs’ capabilities, Underwood said.