My first foray into the world of commercial IT was managing mainframe storage. Those were the days when we operated one or two systems, people talked in gigabytes (no-one knew what a terabyte was) and we thought ourselves awash with DASD (sorry, "disk" for those Unix people out there) when we had 300GB to manage. We could account for almost every dataset (oops, file) on the system and attribute it to an owner.
Thanks to the comparatively high cost of disk space, it was essential that our accounting processes were accurate and ran frequently. As we used HSM (Hierarchical Storage Management), both the system and its users could migrate data to a cheaper secondary medium such as tape. Some clever users worked out they could beat the charging by migrating all of their datasets from disk to tape just before the accounting job ran and recalling them afterwards, so reducing their bills. We had to increase the frequency of our auditing in order to stop this abuse.
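To see why more frequent auditing closed the loophole, here is a minimal sketch in Python. The figures and the `billed_gb` helper are invented for illustration and are not taken from our actual accounting job; the sketch simply assumes the bill is based on the average of whatever samples are collected.

```python
def billed_gb(samples):
    """Bill on the average disk usage across the samples collected."""
    return sum(samples) / len(samples)

# Hypothetical user: 100 GB on disk every day of a 30-day month,
# except the last day, when everything is migrated to tape just
# before the accounting job runs.
daily_usage = [100] * 30
daily_usage[29] = 0

# One sample per month, taken on the last day: the user pays nothing.
month_end_only = billed_gb(daily_usage[29:])

# One sample per day: the trick hides a single day out of thirty.
daily_sampled = billed_gb(daily_usage)

print(month_end_only, round(daily_sampled, 1))
```

With a single month-end sample the user is billed for nothing; with daily samples the same trick saves them only one day's worth of charges.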
Nowadays, we have a much more complex environment to deal with. We have multiple operating systems, multiple storage vendors and a more complex host-to-disk fabric interconnect. We talk in multiple terabytes (which for us is growing at a rate of 5TB per month) and we have no idea how the storage is used on each server. Now that the cost of storage is coming down, precise accounting is becoming less important; in reality, it would take too long and cost too much to be justified by any savings it produced. Currently we account for storage assigned to a host, whether the host is using that storage or not. We also collect statistics and charge only monthly, as new allocations to a host don’t happen that often.
So, is this the right approach? I’d have to say both yes and no.
On the one hand (the "Yes" camp), accounting at the host level is simple and can be automated from the documentation database of hosts that we maintain. We can also automatically validate what our database says each host has been allocated by checking it against the zoning and LUN masking attributes. The chargeback process then has a simple calculation to perform, based on the cost per gigabyte per month.
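As a rough illustration of how little is involved, the sketch below shows the calculation and the cross-check in Python. Everything in it is an assumption made for the example: the rate, the host names, the sizes and the `HostRecord` structure are hypothetical, and a real version would read from the documentation database and from the array's LUN masking configuration rather than from hard-coded values.

```python
from dataclasses import dataclass

RATE_PER_GB_MONTH = 0.50  # hypothetical cost per gigabyte per month

@dataclass
class HostRecord:
    name: str
    documented_gb: int  # allocation recorded in the documentation database
    masked_gb: int      # total size of the LUNs actually masked to the host

def monthly_charge(allocated_gb, rate=RATE_PER_GB_MONTH):
    """Charge for allocated storage, whether or not the host uses it."""
    return round(allocated_gb * rate, 2)

def find_mismatches(hosts):
    """Hosts whose documented allocation disagrees with the LUN masking."""
    return [h.name for h in hosts if h.documented_gb != h.masked_gb]

# Invented sample data
hosts = [
    HostRecord("dbhost01", documented_gb=2000, masked_gb=2000),
    HostRecord("filesrv01", documented_gb=500, masked_gb=750),
]

charges = {h.name: monthly_charge(h.masked_gb) for h in hosts}
mismatches = find_mismatches(hosts)

print(charges)
print(mismatches)
```

The sketch charges on what is actually masked to the host and flags any host whose database record disagrees, so that the documentation can be corrected before the next billing run.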
Saying "no" to my own argument sounds like I’m contradicting myself, but I don’t think I am. For our team, accounting at the host level is the best we can achieve, as we’re not responsible for managing host allocations. We don’t have the resources to perform active storage management on all hosts; instead we concentrate on providing storage to the host, on ensuring that storage is delivered and available in a timely manner, and on being able to account for its usage.
Managing storage at the host level is still a very necessary task, but I think environments have changed over the years. The majority of application data resides in large databases on dedicated servers, taking up most of the storage allocations. User communities tend to be supported on dedicated fileservers (rather than sharing space in the way they did in the mainframe environment) and these servers have dedicated quota management functionality built into the operating system. This means that there is no need to monitor application servers to the same degree that was required in the mainframe days. This split means we tend to allocate the majority of our storage to application servers, where the growth is consistent and usually comes in easily accountable quantities, for example to satisfy the implementation of a new database or the take-on of new data.
By comparison, user data tends to grow incrementally, without any perceptible reason. Products such as NetApp filers, or the storage servers from Microsoft, provide much better facilities for day-to-day management of this kind of data and allow the segregation of true "user" data from application information.
So, does that mean my job has become easier? In some respects, yes. However, there will always be a need to account for storage usage, regardless of how cheap and plentiful a resource it becomes.