Big data is poised to drive advances in the highly promising area of cancer genomics as researchers use information from increasingly disparate sources to identify the genetic and external factors that trigger this group of potentially deadly diseases.
This is the view of Dr Ian Gibson, CEO at Intersect, Australia’s largest research support agency providing high performance computing services to NSW’s 10 universities as well as the University of Canberra in the ACT.
“There’s a number of areas in science that are well-known for requiring a lot of computational effort – genomics is one of them,” Dr Gibson told CIO Australia.
“Just at the moment we are on the verge of being in a position to have a lot of data being aggregated in one place from different and disparate sources.
“This didn’t really happen in the past – once that data all comes together, people can be more creative in how they look for causal relationships between things. I think this goes to the heart of many science and research disciplines.”
Dr Gibson said that, at its heart, cancer is influenced by causal factors tied to an individual’s genetic make-up. This is well understood in diseases such as breast cancer, he said.
“There are also lots of external environmental factors that influence cancers and other large-scale diseases,” said Dr Gibson.
“For example, if you have a database of all the melanoma sufferers and it was spatially-enabled – indicating where they live – together with environmental monitoring data that shows the intensity of ultraviolet [light] over a period of time … you would then be able to do a correlation between those two.”
This would remove the need to do a ‘post-op study’ identifying these melanoma sufferers and asking them about their history, said Dr Gibson.
“It becomes a data mining task rather than a more traditional scientific study which takes a lot longer. We’re in a position for people to speculate, and use computers to test a theory very quickly without disturbing anybody before moving on.
“This is going to be an important step for accelerating the progress in a lot of areas and this is just one example.”
Federal and state government policies on open data will be vital to the success of big data projects in the scientific community, as agencies capture rich sources of information that can be used in many ways.
“Traditionally, each university researcher has kept their data to themselves, and governments have kept their data and released their reports according to their legislative requirements,” Dr Gibson said.
“This has restricted any secondary use of that data that the original custodians couldn’t have possibly imagined.”
Although government departments across Australia now have legislation around open data, there’s a lag in implementation, Dr Gibson said.
“From my view, at the top they make a statement that government data should be released to the public unless there’s a good reason why not.”
He said all government agencies are going through the process of working out policies and processes around releasing information to ensure strict privacy rules are adhered to.
“So I don’t think there’s anything drastic that needs to change other than this is going to take some time,” he said. “There’s a lot of good government data available.
“If you look at where we were two years ago, there’s a pretty good acceleration of this program so I think if we have another look in two years, we’ll see a substantial step up.”