Australian defence and national security organisations are "drowning in the data" they generate, Brenton Cooper from the Defence Systems Innovation Centre told a Canberra summit on big data today.
Addressing the Australian Information Industry Association's Navigating Big Data conference, Cooper, DSIC's technical director for information superiority, said Defence was generating a "huge amount of data".
"The challenge and the opportunities that we see [are making] the processes that the organisations are trying to deal with, rather than task-driven, make them data-driven," Cooper said.
"So let the data drive the business process."
"To do that, you really need to build the machine enablement, the smart analytics, that can turn the data into information the organisation can then use to make the decisions that they need to," Cooper said.
Cooper gave examples of two of DSIC's data-heavy research areas where the organisation has employed machine learning to deal with the onslaught of data.
"We've been doing a little bit of work in collaboration with Defence's Imagery and Geospatial Organisation," Cooper said.
"One of the tasks they deal with is looking at the satellite feeds, the imagery feeds that they receive from coalition systems."
One of DIGO's tasks is monitoring changes in facilities in other countries.
"So they might be looking at a port in a foreign country, or a nuclear facility in a foreign country, and trying to detect changes in it that are of significance," Cooper said.
If the organisation were receiving only a few images a week, an analyst could easily study and assess them.
"But if you're receiving 100,000 [images] a week and you have 40 analysts, it's very, very difficult to do," Cooper said.
DSIC has been investigating the use of machine learning "to really put some semantic understanding around those images, and therefore look at changes at that semantic level," Cooper said.
"By doing that, you're making it a data-driven process."
Citing 'The Unreasonable Effectiveness of Data', a paper by Google's Alon Halevy, Peter Norvig and Fernando Pereira published in IEEE Intelligent Systems in 2009, Cooper argued that the scale of data used for machine learning could be much more important than the completeness of a data set.
He said DSIC had recently conducted research into using machine learning to analyse free-text documents to discover relationships between entities such as people, organisations and locations.
"So for example you might see that Brenton Cooper is the technical director of DSIC, so therefore there's a relationship between myself and DSIC," Cooper said.
"What we've found is you can build a machine learning system to learn those relationships, and if you train it with a small number of examples, say 40,000 examples, the performance is really poor," Cooper said.
"It's almost unusable.
"But if you start talking about using several million training examples then the performance becomes as good as a human and so even [if] the data is incomplete, it's noisy, you can still use it effectively to make the decisions that you need."