Whenever we think about Y2K failures, we tend to focus on the "visible" problems -- for example, an embedded-systems failure that causes a refinery to explode, where the only possible response is "fix on failure." Meanwhile, there's another kind of Y2K failure that's far more insidious, one that will require attention and resources throughout next year: the data-corruption problem.
I'm not talking about data corruption that's massive, sudden and visible -- such as a payroll system that runs amok and sets every employee's salary to zero. What I worry about is the Y2K bug that corrupts only a tiny percentage of a database, in such a way that its impact is not immediately visible. For example, what if a bug updates an active database record correctly but also clobbers a small portion of a dormant one? Suppose a code rewrite correctly replaces a two-digit YY "year" field in an active record with a four-digit YYYY field but contains a bug that overwrites the first two bytes of an adjacent record. It may be months, or even years, before that dormant record is accessed, or before enough dormant records have been clobbered that the entire database collapses. And the problem can be more subtle still if the bug involves interfaces between systems operated by separate organisations.
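To make the adjacent-record scenario concrete, here is a minimal Python sketch. It rests on assumptions that are not in the scenario itself: a hypothetical 8-byte fixed-width record (a 6-byte key followed by a 2-byte YY field) stored back to back in one buffer, and invented names such as `buggy_expand_year`. It illustrates the class of bug only, not any real remediation tool.

```python
RECORD_LEN = 8    # assumed layout: 6-byte key + 2-byte YY field
YEAR_OFFSET = 6   # the year field starts after the key


def make_db(rows):
    """Pack (key, yy) pairs into one contiguous fixed-width buffer."""
    buf = bytearray()
    for key, yy in rows:
        buf += key.encode("ascii").ljust(6) + yy.encode("ascii")
    return buf


def buggy_expand_year(db, index, century="19"):
    """Hypothetical faulty rewrite: stores YYYY where only YY fits."""
    start = index * RECORD_LEN + YEAR_OFFSET
    yyyy = (century + db[start:start + 2].decode("ascii")).encode("ascii")
    # Four bytes are written into a two-byte field, so the last two
    # bytes land on the start of the next record.
    db[start:start + 4] = yyyy
    return db


def read_key(db, index):
    """Read the 6-byte key of the record at the given position."""
    start = index * RECORD_LEN
    return db[start:start + 6].decode("ascii")


db = make_db([("ACT001", "99"), ("DRM002", "85")])
buggy_expand_year(db, 0)
print(db[6:10].decode("ascii"))  # year of the active record: 1999
print(read_key(db, 1))           # key of the dormant record: 99M002
```

The active record now reads back as 1999, so a test that checks only the record being updated passes. The damage sits in the dormant neighbour, whose key has silently changed from DRM002 to 99M002.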
Data corruption isn't a new concept, and it's not unique to Y2K. But, ironically, some organisations learned about long-term data corruption problems in their databases only as they began working on their Y2K remediation efforts.
So, how do we cope with data corruption? Most organisations believe they can avoid the problem through rigorous testing and through whatever error-checking mechanisms are built into the application code and the vendor's DBMS package. But they may be fooling themselves; the odds of avoiding corruption in a database with 10 million records that has been running for 10 years are small. Indeed, it's likely that the only reason organisations do have stable systems is that they build them one at a time and modify them relatively slowly over time.
Y2K is fundamentally different because it involves making massive changes to all the systems at the same time. Yes, the testing effort has been extensive in most large organisations, and we'll probably eliminate most, if not all, of the visible bugs. But it requires enormous optimism to assume that we will have eliminated the subtle bugs that cause the insidious data-corruption problems -- especially when independent verification and validation vendors such as Cap Gemini, MatriDigm and Reasoning Systems report finding between 400 and 900 bugs per million lines of code that were supposedly remediated and supposedly tested. I believe it's more realistic to assume that the data-corruption problems will occur and that we might not see them for months or years after January 1.
So the question remains: How do we cope with data corruption? The solution is simple and obvious -- though by no means foolproof. We need to develop extensive data auditing, data verification and data integrity programs and then run them periodically throughout 2000 and possibly beyond. Depending on the size of the database and the number of spare CPU cycles available, we should run these programs daily or at least weekly for the first few months. Depending on what they find, we may be able to relax our vigil later and run the programs monthly.
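One possible shape for such an audit program, sketched in Python under stated assumptions: records are available as raw bytes keyed by some identifier, and the set of keys that were legitimately updated since the last run can be obtained, for instance from a transaction log. The function names `snapshot` and `audit` are illustrative, not taken from any commercial package.

```python
import hashlib


def snapshot(records):
    """Map each record key to a SHA-256 digest of its raw bytes."""
    return {key: hashlib.sha256(raw).hexdigest()
            for key, raw in records.items()}


def audit(baseline, records, updated_keys):
    """Return keys that changed or vanished without a logged update."""
    current = snapshot(records)
    suspects = []
    for key, digest in baseline.items():
        if key in updated_keys:
            continue  # a logged update explains the change
        if current.get(key) != digest:
            suspects.append(key)  # altered or missing, with no explanation
    return sorted(suspects)


# Baseline taken at the previous audit run (sample data is invented).
records = {"A": b"ACT00199", "D": b"DRM00285"}
baseline = snapshot(records)

# Later: record A is legitimately updated, record D is silently clobbered.
records["A"] = b"ACT0011999"
records["D"] = b"99M00285"
print(audit(baseline, records, updated_keys={"A"}))  # ['D']
```

A check like this flags only records that changed without explanation. It does not say which version is correct, so it complements rather than replaces backups and application-level validation. How often to run it is the same trade-off of database size against spare CPU cycles.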
There are commercially available data verification packages, and some organisations have developed their own programs to minimise data corruption. But it's not a common practice, and most of the organisations I visit haven't planned on spending money or computer resources on this kind of strategy next year. I believe this will be an expensive oversight and one that will exacerbate the Y2K problem far beyond what it should have been.
Yourdon heads the year 2000 service at Cutter Consortium in Arlington, Massachusetts. Contact him at firstname.lastname@example.org.