The Australian Greens have called for a revamp of the government’s approach to releasing sensitive data after a University of Melbourne study revealed that two supposedly de-identified datasets could be re-identified.
The study, by University of Melbourne researchers Dr Chris Culnane, Dr Benjamin Rubinstein and Dr Vanessa Teague, found that individuals in health data released by the government could be re-identified.
“We found that patients can be re-identified, without decryption, through a process of linking the unencrypted parts of the record with known information about the individual such as medical procedures and year of birth,” Culnane said.
“This shows the surprising ease with which de-identification can fail, highlighting the risky balance between data sharing and privacy.”
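The linking process Culnane describes can be illustrated with a minimal Python sketch. It is not the researchers' code or their dataset: the record layout, field names and every value below are invented for illustration. It assumes only what the quote states, namely that the identifier is encrypted while year of birth and medical procedures are left readable.

```python
# Hypothetical "de-identified" release: patient IDs are encrypted, but year of
# birth and procedure history are published in the clear. All values invented.
released = [
    {"enc_id": "a91f", "birth_year": 1975,
     "procedures": {("2010", "appendectomy"), ("2014", "knee arthroscopy")}},
    {"enc_id": "c27b", "birth_year": 1975,
     "procedures": {("2010", "appendectomy")}},
    {"enc_id": "e530", "birth_year": 1975,
     "procedures": {("2012", "knee arthroscopy")}},
    {"enc_id": "7d4c", "birth_year": 1982,
     "procedures": {("2010", "appendectomy"), ("2014", "knee arthroscopy")}},
]


def link(records, birth_year, known_procedures):
    """Return the records consistent with everything known about one person.

    No decryption is attempted: only the unencrypted fields are compared.
    """
    return [
        r for r in records
        if r["birth_year"] == birth_year
        and known_procedures <= r["procedures"]  # known facts are a subset
    ]


# The more an attacker knows, the smaller the set of candidate records.
by_year = link(released, 1975, set())
by_one = link(released, 1975, {("2010", "appendectomy")})
by_two = link(released, 1975,
              {("2010", "appendectomy"), ("2014", "knee arthroscopy")})

print(len(by_year), len(by_one), len(by_two))  # prints: 3 2 1
if len(by_two) == 1:
    print("Unique match, encrypted ID:", by_two[0]["enc_id"])
```

In this toy example, year of birth alone leaves three candidates, one known procedure leaves two, and two known procedures single out one record. Once a record is unique, everything else in it is attached to a named person, even though the identifier itself was never decrypted.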
The government last year unveiled proposed laws to criminalise the act of re-identifying government data sets after the Melbourne Uni researchers revealed flaws in the Department of Health’s de-identification process for data it released via data.gov.au.
“Open publication of de-identified records like health, census, tax or Centrelink data is bound to fail as it is trying to achieve two inconsistent aims: the protection of individual privacy and publication of detailed individual records,” Teague said today.
“We need a much more controlled release in a secure research environment, as well as the ability to provide patients greater control and visibility over their data. Legislating against re-identification will hide, not solve, mathematical problems, and have a chilling effect on both scientific research and wider public discourse.”
“Given 10 per cent of Australians are included in this historical data, this public release can effectively be viewed as a data breach on the grandest scale,” the Greens’ digital rights spokesperson, Senator Jordon Steele-John, said in a statement after the research was published.
“Whilst I would agree that access to this kind of data is very important for research and innovation, it must be provided in a format that protects the security of detailed individual records.”
“It is critical that this kind of information is only ever given out in a secure research environment with greater control and visibility for patients over their data,” Steele-John said.
The government’s proposed criminalisation of re-identification has previously drawn criticism from Labor and Greens senators, as well as security researchers.
“Legislating against misuse of this kind of data will not stop it occurring, especially when it is this easy to re-identify individuals’ records,” Steele-John said today. “What are the implications for other publicly released data sets that are supposedly ‘de-identified’ and secure?”
“The Australian government holds vast quantities of information about individual Australians,” the Melbourne Uni researchers argue in their paper.
“It is not really ‘government data’. It is data about people, entrusted to the government’s care. Data about government should be published openly and freely - not so for sensitive data about people. That should be published only when a clear, public explanation of the encryption and anonymization methods has received enough peer review and public scrutiny to convince everyone that personal information will remain private.”
“For some datasets, including the MBS/PBS unit-record level data, this is probably not possible,” the researchers conclude.