Tax collects Linux for open source analytics

Despite its strong ties to proprietary software vendors, the Australian Taxation Office has finally dipped its toes into open source by establishing a Linux-based system for information analytics.

Tax has had a Teradata warehouse for 10 years, but the change program initiated by chief knowledge officer Philip Hind in November last year is a "series of activities to enhance the ATO's ability to collect and analyse data", assistant commissioner of information management John Body told Computerworld.

As a result, the ATO invested in Teradata's Warehouse Miner and SAS Enterprise Miner, and hired CSIRO's principal computer scientist for enterprise data mining, Dr Graham Williams, who pioneered open source analytics for organizations such as the Health Insurance Commission and NRMA.

Tax implemented the system in March and now has a number of open source products for analytics running on a stand-alone Debian GNU/Linux server. A spokesperson for the ATO said part of the advantage of the GNU/Linux environment is that it ships with a large collection of basic tools that interoperate, like Emacs, Vim, Perl and CVS.

"These provide many basic data manipulation capabilities over the very large datasets that we are dealing with for our analytics work," the spokesperson said.

In addition to the basic tools, Tax has also deployed open source data mining and data manipulation packages, including "R" - a statistical and data mining package widely used in industry and academia that supports modern data mining approaches and graphing capabilities. R is used for data summarization, manipulation and cleaning, modeling, and model evaluation.

Tax is also using Weka, a Java-based data mining toolkit with more than 60 traditional and modern analytic tools.

For development, Tax is using the Python scripting language, which is "ideally suited to automated data manipulation and transformation".

Analytics looks at the relationships within the data to make it easier for clients to pay tax, and for the ATO to identify cheats. Tax has been looking through data for some considerable time, but data mining with more sophisticated algorithms is a recent initiative.

The ATO's assistant commissioner of analytics, Stuart Hamilton, said the Office is using open source because a lot of the newer algorithms and techniques are available in open source tools before they make it into enterprise software.

"The data is explored with the open source tools and then SAS is used to do predictive modelling," Hamilton said, adding that the system seems to be "working satisfactorily".

"Open source provides some advantages in terms of flexibility and costs but we can't say it is industrial-strength enough to handle millions of records with hundreds of conditions."

Even with the early successes, open source is still strictly limited to a stand-alone system and is not allowed on the main network "because of some issues involved in using open source".

"For example, if something goes wrong, where is your fallback?" Hamilton said.

Hamilton said Tax will investigate open source further as the new analytics system forms part of a trial of certain software to evaluate risk and fit for purpose.

Open source is being used at the "innovative edge" at the ATO and Hamilton thinks it unlikely that within a year open source will be used on the main computing platforms.

"Trials will continue and we may find niches," he said.
