The Australian Privacy Commissioner has launched an investigation following revelations that ID numbers for doctors and other service providers could be extracted from datasets released by the Department of Health.
The department has temporarily removed datasets that were drawn from the Pharmaceutical Benefits and Medicare Benefits schemes and published on the government’s open data portal, data.gov.au.
The revelation comes as the Attorney-General says he will introduce legislation to make re-identifying government data a crime, potentially criminalising the research that unearthed the health department privacy breach.
“No patient information has been compromised, and no information about the health service providers has been publicly identified or released,” the department said in a statement.
The department on 1 August released some 1 billion lines of historical health data relating to around 3 million Australians.
The department said the information included details on services provided to Australians by doctors, pathologists, diagnostic imaging and allied health professionals together with details of subsidised scripts.
The release included Medicare data dating from 1984 to 2014 and PBS data from 2003 to 2014.
The Department of Health pulled the dataset after being alerted by Melbourne University researcher Dr Vanessa Teague. Teague and her colleagues managed to decrypt some service provider ID numbers.
Teague, along with Dr Chris Culnane and Dr Benjamin Rubinstein, has published details of their research.
“Partial details about the linkable encryption algorithm were described online at data.gov.au, but were later removed at the same time as the dataset,” the researchers wrote. “Although neither the exact algorithm nor the details of subsequent processing were described in detail, we could guess those details for provider IDs and use the dataset to check our hypothesis. We were able to decrypt every service provider ID in the MBS dataset.”
The trio recommended that details of privacy protections employed for datasets should be published ahead of the datasets themselves, in order that they “be subject to empirical testing, scientific analysis, and open public review, before they are used on real data”.
“Then we can make sound, evidence-based decisions about how to benefit from open data without sacrificing individual privacy,” the researchers wrote.
“The primary purpose of the investigation is to assess whether any personal information has been compromised or is at risk of compromise, and to assess the adequacy of the Department of Health’s processes for de-identifying information for publication,” Privacy Commissioner Timothy Pilgrim said.
“I welcome the decision of the Department of Health to immediately suspend access to the data set.”
Attorney-General George Brandis yesterday said the government will introduce legislation amending the Privacy Act to make it a criminal offence to re-identify de-identified government datasets.
Brandis said the government will make it an offence “to counsel, procure, facilitate, or encourage anyone to do this, and to publish or communicate any re-identified dataset.”
The broad wording of the Attorney-General’s statement has caused concern in some quarters over the potential to capture legitimate research.
The Department of Health said it is “undertaking a full, independent audit of the process of compiling, reviewing and publishing this data and this dataset will only be restored when concerns about its potential vulnerabilities are resolved.”