Dr Brian Boyle, CSIRO SKA director
You touched on data storage and processing. Just how big are the processing and storage challenges?
They’re phenomenal. We’re talking about exabytes of raw data coming off the telescopes. Now, there will be an element of data reduction at all stages: at each telescope you will reduce the data a little bit, and data from each of the telescopes will go to a central processing point where we will reduce it further (at least in the core, and then at all the various satellite stations around it). Then all that data from across Australia and New Zealand will go to the central processor, the exaflop machine, which actually forms the images from the telescope.
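The staged reduction described above can be pictured as a chain in which each stage passes on only a fraction of what it receives. The Python sketch below uses hypothetical keep fractions (50, 20 and 10 per cent) and a hypothetical 1 EB/day input. Neither comes from the SKA design, and the stage names simply mirror the answer.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    keep_fraction: float  # hypothetical share of the input passed downstream


def volumes_through_pipeline(raw_bytes: float, stages: list[Stage]) -> list[tuple[str, float]]:
    """Return (stage name, bytes leaving that stage) for each stage in order."""
    out = []
    volume = raw_bytes
    for stage in stages:
        if not 0.0 < stage.keep_fraction <= 1.0:
            raise ValueError(f"{stage.name}: keep_fraction must be in (0, 1]")
        volume *= stage.keep_fraction
        out.append((stage.name, volume))
    return out


# Hypothetical fractions: a little reduction at each telescope, more at the
# central processing point, most at the central (exaflop) processor.
STAGES = [
    Stage("telescope", 0.5),
    Stage("central processing point", 0.2),
    Stage("central processor", 0.1),
]

for name, vol in volumes_through_pipeline(1e18, STAGES):
    print(f"{name:>26}: {vol / 1e15:,.0f} PB")
```

With these made-up fractions the chain keeps 1 per cent of the raw input, so roughly 10 PB of a 1 EB day survives. Changing any one stage's fraction scales everything downstream of it, which is why the balance between processing, storage and transport matters.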
Are you considering discarding much of the data?
That’s one of the possible ways of doing it. There is a real balancing act between how much you can process, how much you can store and how much you [can] transport. And, to some level, it depends where you are in all those technologies. How expensive are exabyte storage solutions in 2020? How expensive is the exaflop computer? What is the power requirement for each of those? At the moment, we are looking to actually lose a lot of data — up to 99 per cent — but even losing 99 per cent of the data you’re still talking about 10 petabytes minimum a day of data to be stored, which is a remarkable amount.
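The figures in this answer can be checked with a little arithmetic. This is a minimal sketch. It assumes decimal units (1 PB = 10^15 bytes) and data arriving evenly over 24 hours. It also treats the quoted "10 petabytes minimum a day" and "99 per cent" as exact.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PB = 1e15  # decimal petabyte, in bytes


def raw_volume_per_day(stored_per_day: float, fraction_discarded: float) -> float:
    """Raw bytes/day implied by what is stored after discarding a fraction."""
    return stored_per_day / (1.0 - fraction_discarded)


def sustained_rate(bytes_per_day: float) -> float:
    """Average bytes per second needed to handle a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


stored = 10 * PB                         # "10 petabytes minimum a day"
raw = raw_volume_per_day(stored, 0.99)   # "lose ... up to 99 per cent"
print(f"raw input:   {raw / 1e18:.1f} EB/day, {sustained_rate(raw) / 1e12:.1f} TB/s")
print(f"stored data: {sustained_rate(stored) / 1e9:.0f} GB/s")
```

This gives about 1 EB of raw data a day, or roughly 11.6 TB/s sustained. Even the 10 PB that is kept works out to about 116 GB/s, written around the clock.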
How do you work out what is redundant and what is useful?
That’s part of the process: being able to take all that data. A lot of it is what we would call visibilities, in Fourier space. Radio telescopes do things in quite a funny way, because when you’re looking at the sky with a radio telescope it’s not like an optical telescope where you’ve got one big mirror. In a radio telescope, you’ve got lots of different mirrors, and the image actually forms by allowing the sky to rotate above you, so you fill in the gaps between all the antennas. So you have to process it in what we call Fourier space. It’s a mathematical transformation of normal X,Y space.
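To illustrate why gaps in Fourier space matter, here is a minimal NumPy sketch. It treats every Fourier cell of a toy 32 by 32 sky as a possible visibility. A random subset of those cells is sampled as a stand-in for the tracks that Earth rotation actually traces out, and the result is transformed back to X,Y space. The random sampling, the grid size and the two point sources are all assumptions for illustration. Real imaging pipelines grid irregular measurements and deconvolve, which this does not attempt.

```python
import numpy as np

N = 32
SKY = np.zeros((N, N))
SKY[10, 12] = 1.0  # two hypothetical point sources on otherwise blank sky
SKY[20, 5] = 0.5


def hermitian_closure(mask: np.ndarray) -> np.ndarray:
    """Add the mirror cell (-u, -v) of every sampled cell. A real-valued sky
    has conjugate-symmetric visibilities, so each measurement gives both."""
    mirrored = np.roll(np.flip(mask), 1, axis=(0, 1))  # mirrored[k] == mask[(-k) % N]
    return mask | mirrored


def dirty_image(sky: np.ndarray, uv_mask: np.ndarray) -> np.ndarray:
    """Keep only the sampled Fourier components, then transform back to X,Y."""
    visibilities = np.fft.fft2(sky)
    sampled = np.where(uv_mask, visibilities, 0)
    return np.fft.ifft2(sampled).real


def rms_error(coverage: float, seed: int = 0) -> float:
    """RMS difference between the true sky and an image made from a random
    fraction of Fourier cells. The same seed gives nested samples, so higher
    coverage always includes every cell sampled at lower coverage."""
    draws = np.random.default_rng(seed).random(SKY.shape)
    mask = hermitian_closure(draws < coverage)
    return float(np.sqrt(np.mean((dirty_image(SKY, mask) - SKY) ** 2)))


for coverage in (0.2, 0.6, 1.0):
    print(f"coverage {coverage:.0%}: rms error {rms_error(coverage):.2e}")
```

Because the sampled sets are nested, each extra bit of coverage strictly reduces the error. With every cell sampled, the image is recovered exactly. This is the sense in which letting the sky rotate "fills in the gaps".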
What are we going to lose? Are we going to lose real data and real information? We would hope that most of the stuff that we lose is essentially blank sky or blank information. We cannot rule out, of course, something happening. You might throw away data that averaged out to zero over a certain time period, when actually something happened: a blip, like a supernova or a big exploding star going off, that you didn’t pick up.
So, what we will try to do is keep as much of the data for as long as possible so that if we realised perhaps from another telescope something had happened that we hadn’t seen we could recover it from the data and then resurrect it later on. But the actual volumes of the data are so huge [that] our ability to store every single last piece of information from this constant data stream is going to be limited to about 30 seconds at the moment.
It is like drinking from a data fire hydrant. The real challenge is building a processor that can cope with this phenomenal data flow.
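The "30 seconds" limit behaves like a ring buffer. New data pushes the oldest out, and anything older than the window cannot be recovered. Below is a minimal Python sketch built on a hypothetical TransientBuffer class, assuming one-second chunks. A real system would hold raw samples in memory at vastly higher rates, and only the keep-the-last-N-seconds logic is shown here.

```python
from collections import deque


class TransientBuffer:
    """Hypothetical buffer holding only the most recent `window_seconds`
    one-second chunks of a stream."""

    def __init__(self, window_seconds: int = 30) -> None:
        # A deque with maxlen silently discards the oldest entry once full.
        self._chunks: deque[tuple[int, bytes]] = deque(maxlen=window_seconds)

    def push(self, t: int, chunk: bytes) -> None:
        """Store the chunk recorded during second t (assumed increasing)."""
        self._chunks.append((t, chunk))

    def recover(self, since: int) -> list[tuple[int, bytes]]:
        """Return buffered chunks from second `since` onward; anything older
        than the window has already been dropped."""
        return [(t, c) for t, c in self._chunks if t >= since]


buf = TransientBuffer(window_seconds=30)
for t in range(100):  # 100 seconds of a stand-in stream
    buf.push(t, f"chunk-{t}".encode())

# An alert about an event at t=80 can still be served in full...
print([t for t, _ in buf.recover(80)][:3])  # [80, 81, 82]
# ...but an alert about t=50 only gets what survived (t=70 onward).
print(buf.recover(50)[0][0])  # 70
```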
With that kind of processing, how would power and cooling work? Can you use solar energy?
The central site for the full SKA is going to be 60 megawatts. Now, you have a few options: you could build out the grid, or you could have your own renewable source of energy or your own diesel energy. But powering a 60-megawatt telescope with diesel generators is going to be pretty dirty, not to mention very expensive. So certainly, the international project team are looking at ways in which we can increase the renewable penetration.
What’s it going to be? This is a telescope that is operating 24/7, and solar power is not 24/7. So you have to think of efficient ways of either providing base-load power (it could be geothermal, it could be wind) or having significant battery storage or biodiesel backup, or you could look at things like the provision of gas-fired power stations.
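To see why solar alone is awkward for a 24/7 load, consider the sizing arithmetic. This sketch takes the 60 MW figure from the answer above and assumes a hypothetical 10 hours of useful sunshine a day. It also assumes flat solar output, lossless storage and no cloudy days.

```python
def solar_with_storage(load_mw: float, sun_hours: float) -> tuple[float, float]:
    """For a constant load running 24 h/day on solar plus storage, return
    (generation needed while the sun is up, in MW;
     storage needed to ride through the dark hours, in MWh).
    Assumes flat solar output and lossless storage."""
    if not 0 < sun_hours < 24:
        raise ValueError("sun_hours must be between 0 and 24")
    daily_energy_mwh = load_mw * 24
    generation_mw = daily_energy_mwh / sun_hours
    storage_mwh = load_mw * (24 - sun_hours)
    return generation_mw, storage_mwh


gen, store = solar_with_storage(load_mw=60, sun_hours=10)  # 10 h is hypothetical
print(f"daytime generation: {gen:.0f} MW, overnight storage: {store:.0f} MWh")
```

Under those assumptions the site would need about 144 MW of daytime generation plus 840 MWh of storage. Losses, seasonal variation and bad weather all push both numbers up, which is why base-load sources or backup generation stay in the mix.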
Lots of the work that we’re doing at the moment is looking into the best possible energy supply for this telescope. And the cost trends are interesting, because you’re building a telescope in 2020. What is going to be the better solution? Where are the costs? In the same way as we’re looking at processing versus storage versus data transport, we’re looking at energy generation versus storage versus distribution. They all have different cost curves in the way that they’re growing.
Right now, I don’t know the answer because these things are quite variable. We’re working on it — the CSIRO has its own renewable energy teams looking at this. We’re also working with people like the Fraunhofer Institute for renewable energy systems in Germany and with places like the Boeing company, all of whom are looking at this big challenge of how to power very large pieces of infrastructure.
Beyond generation and storage there is also the demand side, so you’re also looking at how to make computers much more efficient. If you scaled up today’s high-end computers, we wouldn’t just require 60 megawatts; it would be gigawatts.
Those are big challenges: Does Moore’s Law keep on going or are we going to be stymied, not by computer architecture, but by our ability to power the damn things. And cooling becomes an increasingly large factor.
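The gap between "60 megawatts" and "gigawatts" comes down to energy efficiency, measured in floating-point operations per second per watt. This sketch assumes that the exaflop machine means 10^18 FLOPS sustained. As an upper bound, it also assumes the whole 60 MW site budget goes to computing. The efficiencies in the loop are hypothetical values for illustration.

```python
EXAFLOP = 1e18  # floating-point operations per second


def power_mw(flops: float, gflops_per_watt: float) -> float:
    """Electrical power (MW) needed to sustain `flops` at a given efficiency."""
    return flops / (gflops_per_watt * 1e9) / 1e6


def required_efficiency(flops: float, budget_mw: float) -> float:
    """GFLOPS per watt needed to fit `flops` inside a power budget in MW."""
    return flops / (budget_mw * 1e6) / 1e9


print(f"needed for 60 MW: {required_efficiency(EXAFLOP, 60):.1f} GFLOPS/W")
for eff in (0.5, 1.0, 2.0):  # hypothetical efficiencies
    print(f"at {eff} GFLOPS/W: {power_mw(EXAFLOP, eff):,.0f} MW")
```

Fitting an exaflop into 60 MW would take at least about 16.7 GFLOPS per watt. At 1 GFLOPS per watt or less, the same machine draws a gigawatt or more.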
Did you have to factor in the Carbon Tax when you planned your submission?
We factored in the cost of power, based on the advice of the consultant engineers involved in the overall project. They would have made some allowance for the appropriate carbon tax.
Are you considering distributed computing options? You have the SkyNet, which runs on fixed datasets. Could you open it up globally and ask people to donate processing power and storage?
The trouble is that the torrent is so great that the process of farming it out to individual computers and then getting it back would give us an even greater headache than having the computer in one place. The distribution of data would be the real challenge. At the moment, given the nature of the data stream, it’s actually quite difficult to do. If we could parcel it out more discretely and more effectively then it’s a possibility, but our current preferred option is a single data stream.
[Having said that, we are also looking at] the provision of broadband data links around the world, because people will want real-time access to the reduced datasets that we provide from the central supercomputer, and we’re confident that we will be able to provide people with the appropriate level of access.
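A back-of-envelope count shows why farming the stream out to volunteers is hard. This sketch uses the 10 PB a day of retained data quoted earlier and assumes hypothetical home download speeds. It ignores both the computation itself and the return trip. The raw stream, roughly 100 times the retained volume, would be far worse.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def volunteers_needed(stream_bytes_per_s: float, download_bits_per_s: float) -> int:
    """Minimum number of volunteer machines needed just to receive a stream,
    if each can download at `download_bits_per_s` (compute time ignored)."""
    return math.ceil(stream_bytes_per_s * 8 / download_bits_per_s)


retained = 1e16 / SECONDS_PER_DAY  # 10 PB/day as bytes per second
for mbit in (10, 100):  # hypothetical per-volunteer download speeds
    print(f"{mbit} Mbit/s each: {volunteers_needed(retained, mbit * 1e6):,} volunteers")
```

At 10 Mbit/s per volunteer, 92,593 machines would be needed just to receive the retained data. Even at 100 Mbit/s, the figure is still 9,260.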
One of the biggest headaches for project managers is delivering value when the outcomes are unknown. How does that work in the context of the SKA?
We need to have very clear requirements for the SKA, but equally, I don’t know the universe well enough to say what those technical requirements, when they’re delivered, will actually discover. If we knew what we were looking for there would be no point in building the telescope. Astronomy is, fundamentally, an observational science, so you’re only constructing the next biggest thing because you want to push beyond the boundaries of where you were.
Columbus knew how to build his boat. He didn’t know what he was going to discover; he just knew that it would give him the vehicle to discover it. When he went to Queen Isabella he knew exactly how to provision himself for the journey, what it would cost, how he had to outfit the boat and who he needed on the boat. That’s a bit like the SKA.
We have to be clear about the technical requirements. We have to be clear about who is going to build it and we have to be able to resource that. Once we have all that, we deliver to the requirements — and then we embark on our voyage of discovery.
So what has been one of the biggest technical challenges of the Australian SKA Pathfinder (ASKAP) program to date?
In the past, radio telescopes operated by having a single-pixel camera at the focus and combining all the information together to generate an image. Now, there have been some experiments to increase the number of pixels that you use, so you increase the amount of sky you look at simultaneously. It’s kind of like a panoramic photograph, or more megapixels and better resolution on your camera, but those sorts of 10-pixel cameras have given patchy coverage rather than contiguous pixels. So at ASKAP, we decided to build a 100-pixel camera that is contiguous. It’s like a very, very big CCD in the back of your camera.
The trouble is that radio waves are much bigger than optical waves. Correspondingly, your cameras are the size of a 44-gallon oil drum! Also, because of the electromagnetics of radio waves, they interfere with each other and give you horrible problems at the boundaries. That is why you don’t usually have contiguous detectors.
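The size problem follows from the wavelength, λ = c / f. This short sketch compares an example radio frequency of 1.4 GHz (close to the 21 cm hydrogen line) with an example visible wavelength of 500 nm. Both values are chosen for illustration and are not taken from ASKAP specifications.

```python
C = 299_792_458.0  # speed of light, m/s


def wavelength_m(frequency_hz: float) -> float:
    """Wavelength in metres for a given frequency in hertz."""
    return C / frequency_hz


radio = wavelength_m(1.4e9)  # example radio frequency, 1.4 GHz
optical = 500e-9             # example visible wavelength, 500 nm
print(f"radio: {radio * 100:.1f} cm, about {radio / optical:,.0f} times an optical wave")
```

Radio waves at that frequency are about 21 cm long, more than 400,000 times longer than visible light. Receiving elements have to be comparable in size to the wavelength they collect, so a contiguous radio "camera" ends up enormous compared with an optical sensor.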
However, we managed to attract a chap called John O’Sullivan back to the CSIRO. He led the team behind the wireless LAN technology used in 802.11 (Wi-Fi). He thought something like this was one of the biggest engineering challenges you could face, and we recently released on our website our first images of a radio galaxy taken with the ASKAP radio camera. We’ve cracked the problem of delivering a radio camera that is sensitive enough to do radio astronomy.
From a project management perspective, this is our biggest risk and so I'm really delighted. Scientists dream dreams and engineers make them happen and I've never lost money yet by banking on a good engineer.
Tim Lohman contributed to this interview