Computing platforms, most definitely including storage systems, are becoming increasingly self-sufficient when it comes to caring for themselves. In previous articles we have addressed their potential in the areas of self-configuration and self-optimization. Today a quick look at the promise of embedded self-healing and self-protection.
Self-healing capability refers to a system's ability to identify problems, diagnose them, and respond in an appropriate way. These problems include such obvious things as corrupted data and disk head crashes, but also take in subtler concerns such as adapting to network brownouts and storage pool suboptimization - with "suboptimization" defined in whatever way the local site chooses.
In most IT shops today, bad data means a fire drill: identify the disks that hold the bad data, find the appropriate set of backups or archive tapes (sometimes in a silo, but more often than not in a "run and fetch" configuration), and reload.
Those of you who mirror data have it easier, of course - unless you have mirrored the corrupted file blocks. And sites that snapshot all the delta blocks have it easiest of all ... unless they don't know enough about the time of the data failure to tell their management tool which snapshot to go and fetch.
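To make that snapshot-selection headache concrete, here is a minimal sketch (all names hypothetical, not any vendor's API): given an estimated corruption time, pick the latest snapshot taken before it - which is exactly the step that fails when the site cannot pin down when the data went bad.

```python
from datetime import datetime

def pick_restore_snapshot(snapshots, corruption_time):
    """Return the most recent snapshot taken strictly before the
    estimated corruption time, or None if no snapshot qualifies.
    `snapshots` is a list of (timestamp, snapshot_id) tuples."""
    candidates = [s for s in snapshots if s[0] < corruption_time]
    if not candidates:
        return None
    return max(candidates)[1]  # latest qualifying snapshot

snaps = [
    (datetime(2024, 1, 1, 2, 0), "snap-0100"),
    (datetime(2024, 1, 1, 6, 0), "snap-0200"),
    (datetime(2024, 1, 1, 10, 0), "snap-0300"),
]
# Corruption detected around 08:30 -> restore from the 06:00 snapshot
print(pick_restore_snapshot(snaps, datetime(2024, 1, 1, 8, 30)))
```

If the corruption time is unknown, `corruption_time` becomes a guess, and the wrong guess quietly restores already-contaminated blocks - which is why the management tool needs that timestamp.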
And Murphy's Law tells us that while all this is going on some key business process has been impacted for the worse.
An autonomic environment would heal by identifying the bad data, locating an uncontaminated data set, swapping out the bad data for the good, alerting the humanware about the event, and so on. Hopefully it would also go one step further, identifying the root cause of the problem, taking appropriate steps to fix it, and with the most advanced systems, learning from the experience so that future occurrences of the same sort could be preempted.
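The healing loop described above can be sketched roughly as follows - a toy model, with checksums standing in for whatever corruption detection a real system would use, and every name hypothetical:

```python
import hashlib

def digest(data: bytes) -> str:
    """Content checksum used to detect silent corruption."""
    return hashlib.sha256(data).hexdigest()

def scan(volume):
    """Identify bad data: blocks whose stored checksum no longer matches."""
    return [bid for bid, (data, cks) in volume.items() if digest(data) != cks]

def heal(volume, replicas, notify):
    """One autonomic healing pass: find corrupt blocks, locate an
    uncontaminated copy on a replica, swap bad for good, and alert
    the humanware about what happened."""
    bad = scan(volume)
    if not bad:
        return "healthy"
    for bid in bad:
        for replica in replicas:
            data, cks = replica.get(bid, (None, None))
            if data is not None and digest(data) == cks:
                volume[bid] = (data, cks)  # swap the bad data for the good
                break
    still_bad = scan(volume)
    notify(f"healed {len(bad) - len(still_bad)} of {len(bad)} bad blocks")
    return "healed" if not still_bad else "degraded"

good = b"payload"
vol = {1: (b"corrupt!", digest(good))}      # block 1 no longer matches its checksum
rep = {1: (good, digest(good))}             # a clean replica copy
events = []
print(heal(vol, [rep], events.append))      # -> healed
```

The root-cause analysis and learning steps are deliberately absent here; they are the genuinely hard part, and the subject the column returns to at the end.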
All with minimal or no human intervention.
The self-protection piece of an autonomic system will automatically guard critical data assets from all varieties of intrusion, intended and inadvertent, external and internal.
Some hardware companies have offered various kinds of protection (from overheating, for example) for some time now, but beyond shutting down and perhaps sending out an alert, they have not gone much further. What we mean here is a capability intertwined with all the other aspects of monitoring, analysis and management. In the beginning this will likely apply only to the software assets. But we are talking about more than just authenticating and authorizing access to the data. We will also see protection optimized for individual applications.
For example, the management system will know about the different needs of different databases. This means more than just monitoring tables for fragmentation; it also means understanding and protecting the route of the data across the entire data path of each application, and ensuring that data sent back to the storage environment is clean. (Whether application protection is best left in the hands of the application itself may be as much a philosophical question as a technical one. Admins who look at whole systems, however, are likely to want some centralized controlling mechanism.)

As we look at storage from the viewpoint of systems rather than as a sum of individual parts, it is clear that the four aspects of autonomic computing - self-healing, self-protection, self-configuration and self-optimization - are tightly coupled. It should also be clear that the interrelationships among these four broad categories of self-maintenance are both subtle and complex. Establishing and enhancing flowthrough across all of them will be a challenge for the industry for several years to come.
All this drives us to the conclusion that ultimately, at the heart of autonomic computing lies the issue of automating the complexities of the monitoring, analytic/predictive, and management functions. But that, along with the ability of systems to learn from their mistakes, is the subject for next time.