Tuesday | 7 October, 2008
Computerworld
Data shuffling: Safer way to analyze confidential data?
US professor explains data protection technique
Bob Brown (Network World) 24/06/2008 09:48:57

Oklahoma State University's Technology Business Assessment Group recently announced it will fund research on an approach to information protection called data shuffling. The project is led by Professor Rathindra Sarathy of OSU's Department of Management Science and Information Systems, who explains to us just what data shuffling is and why it could be coming to your network soon.

Can you give me a quick layman's explanation of data shuffling, then a little more technical one for our readers in IT security? Also, how's it different from encryption?

Data shuffling (US patent: 7200757) belongs to a class of data masking techniques that try to protect confidential, numerical data while retaining the analytical value of that data. Let us say that you want to provide confidential salary data to an analyst. The goal is to try to answer questions such as "Controlling for experience, education and other factors, is there a difference in salary between male and female managers?" or "What are the best predictors of salary among variables such as Age, Sex, Experience, Education, Race, etc.?"

You do not want to provide the original salary data to the analyst, for obvious confidentiality reasons. Even if you remove personally identifiable information before providing the original confidential data, security is not assured since it is usually easy to identify an individual if you know their characteristics. Conventional encryption techniques would not be of value, since the unencrypted original salary is necessary to perform analysis. Hence, one approach is to try to modify the numbers (masking the numbers) before you provide them to the analyst. Data shuffling intelligently reassigns the original salary numbers among the records so that the results of the analysis still come out correct. At the same time, it prevents the analyst from associating the original salary numbers with the correct individuals. The real power of data shuffling shows up when you want to maintain complicated relationships among several variables, both confidential and nonconfidential, such as in the second question above.
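To make the idea of reassigning values concrete, here is a minimal Python sketch of a rank-based shuffle. It is a simplified illustration only and should not be read as the patented procedure: the function name `rank_shuffle`, the single nonconfidential variable (years of experience), the straight-line model and the sample figures are all assumptions made for this example. The sketch draws a synthetic salary for each record from a simple model of salary given experience, then hands out the original salaries in the rank order of those synthetic values.

```python
import random


def rank_shuffle(confidential, covariate, seed=0):
    """Reassign the original confidential values across records.

    Simplified illustration (an assumption, not the patented method):
    1. fit a least-squares line: confidential ~ covariate
    2. build a synthetic value per record: fitted value plus a residual
       borrowed from a randomly chosen record
    3. give the record with the k-th smallest synthetic value the
       k-th smallest original value
    """
    rng = random.Random(seed)
    n = len(confidential)

    # Step 1: ordinary least squares with one predictor.
    mean_x = sum(covariate) / n
    mean_y = sum(confidential) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(covariate, confidential))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Step 2: synthetic values = fitted value + permuted residual.
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(covariate, confidential)]
    permuted = residuals[:]
    rng.shuffle(permuted)
    synthetic = [intercept + slope * x + r
                 for x, r in zip(covariate, permuted)]

    # Step 3: map original values onto records by rank of the synthetic values.
    order = sorted(range(n), key=lambda i: synthetic[i])
    sorted_original = sorted(confidential)
    shuffled = [None] * n
    for rank, record in enumerate(order):
        shuffled[record] = sorted_original[rank]
    return shuffled


# Hypothetical sample data, invented for this illustration.
experience = [1, 3, 5, 7, 10, 12, 15, 20]
salary = [42000, 48000, 51000, 60000, 66000, 75000, 82000, 98000]

masked = rank_shuffle(salary, experience)
print(masked)
```

Because every masked value is one of the original salaries, summaries of salary on its own, such as the mean, the median and the full distribution, are unchanged, while the link between a given salary and a given person is broken. How closely the relationship with experience survives depends on the model used to generate the synthetic values, which is where this sketch is far cruder than a production method would need to be.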

Data shuffling isn't something we've written about, though I do see a fair number of references to it on the Web. Do you have a sense of how hot a concept this is now?

Several researchers are working on data masking concepts. Data shuffling is a particular method of data masking that we have patented. We believe that it has strong potential. Unfortunately, organizations have not yet realized the power of data shuffling and the potential benefits that come from using this approach. Our main thrust in the next two years will be to educate organizations about data shuffling and promote its benefits.

As for commercial products, there are a couple of data masking products in the marketplace. But, unlike data shuffling, they handle only fairly simple situations. As a result, the masked data does not offer the same quality assurance that data shuffling provides.

I saw a presentation you did that focused on protecting data in healthcare settings. Is that where you see data shuffling taking hold initially, or what other vertical markets do you think are especially good fits?

Healthcare is definitely one of our current focus areas, but there are many other applications such as insurance claims data or other types of financial analysis applications where it can be useful. In fact, data shuffling can be used in any situation where an organization wishes to analyze or share any confidential data.
