IBM stores medical data in DNA streams

In a significant move towards realizing the promise of personalized medicine, IBM researchers have devised a computer language to create a "smart" stream of DNA code that includes a patient's medical record.

The report, authored by Barry Robson and Richard Mushlin of IBM's T.J. Watson Research Laboratory, was published online in July by the Journal of Proteome Research, and will appear in print next month.

The paper describes the "Genomic Messaging System (GMS)" - a basic computer language that enables the seamless "compression, encryption and transmission ... of clinical and genomic data, including bringing data together such as the clinical record and the patient DNA results from the sequencing laboratory."

In essence, GMS enables medical and supporting information to be embedded in streams of DNA data, adding human- and computer-generated content into the primary genetic sequence. The authors suggest that this stream is "capable of storing and transmitting an entire Integrated Medical Record (IMR)."

GMS language is inserted into genomic sequence streams at appropriate intervals via "plug-in" scripts called "cartridges," thereby coupling a patient's medical and genetic records, or annotating the primary sequence. GMS can also provide links to relevant medical data, and provide security passwords. The same technique could be used to encode data from patient X-rays and MRIs.

Once the data have been aggregated and are represented in GMS syntax, the encoding process begins, resulting in a compact binary representation of the data. GMS uses a minimum of 2 bits per base (A=00, G=01, C=10, T=11). The code uses Perl 5 with XML capabilities, as well as Java. The encoded, encrypted stream is then transmitted to the receiving system, and decoded. In future, some output cartridges might be specifically tailored for research use (preserving patient confidentiality), while others might be adapted for physician use.
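The two-bit mapping quoted above can be illustrated with a short Python sketch. This is not the GMS implementation (which the article says uses Perl 5 and Java), only a minimal illustration of packing four bases into one byte using the stated codes. The function names, the high-bits-first ordering, the zero padding of a short final byte, and the choice to carry the sequence length separately are all assumptions made for the example.

```python
# Illustrative sketch only, not GMS code.
# The base-to-bits mapping is the one given in the article: A=00, G=01, C=10, T=11.
BASE_TO_BITS = {"A": 0b00, "G": 0b01, "C": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_bases(sequence: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte.

    Assumption: the first base of each group occupies the highest two bits,
    and a short final group is padded with zero bits on the right.
    """
    packed = bytearray()
    for start in range(0, len(sequence), 4):
        chunk = sequence[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so the unused low bits are zero.
        byte <<= 2 * (4 - len(chunk))
        packed.append(byte)
    return bytes(packed)


def unpack_bases(packed: bytes, length: int) -> str:
    """Reverse pack_bases; `length` is the original number of bases.

    Assumption: the length travels alongside the packed data, because the
    zero padding is otherwise indistinguishable from trailing 'A' bases.
    """
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    sequence = "AGCTT"
    packed = pack_bases(sequence)
    print(packed.hex())
    print(unpack_bases(packed, len(sequence)))
```

At two bits per base, four bases fit in one byte, a quarter of the space needed to store each base as a one-byte text character. The article describes two bits as a minimum, so the real GMS stream presumably spends additional bits on the embedded annotations and commands.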

"GMS links archives of digital patient records to enable analysis of those records by a variety of bioinformatic and computational biology tools," says Robson. These tools include data mining to discover unexpected relationships, large-scale epidemiological studies and three-dimensional modeling of patient proteins to study the effect of single-nucleotide polymorphisms (SNPs).

In the example in the report, Robson and Mushlin combined medical data submitted from an IBM facility in Israel with DNA records from IBM New York (simulating a clinical laboratory). Based on the patient's DNA data, they used GMS to model SNPs from that patient encoded in an immunohistocompatibility protein, as a means to evaluate the impact of those variations on binding of an immunosuppressive drug. Although only a preliminary study, this gives an indication of how a patient's personalized genetic data could be factored into decisions about suitable drug administration. The entire analysis took about one minute.

The authors believe that GMS represents "a 'smart' plug-in multi-branched adaptor at any junction in a clinical or biological IT network where genomic data is at least partly involved." They acknowledge that this is only a proof-of-concept from the IT perspective, but argue that, in a pharmaceutical context, "there is no reason to believe that a profoundly different information technology is required at the GMS level of contribution."

Robson and Mushlin thanked several of their IBM colleagues in the report, including Blue Gene supercomputer director William Pulleyblank, "for their encouragement to initiate these studies at an earlier time when the DNA-augmented patient record seemed to many to be mere science fiction."
