Network Appliance's NearStore ushered in the era of using inexpensive Advanced Technology Attachment (ATA) disk arrays for disk-to-disk backup or secondary, near-line storage. The product, launched in March 2002, offers faster backup and recovery times at a cost per megabyte that's competitive with tape backup systems. Now vendors are rushing to add application-specific intelligence to ATA-based storage appliances that reduces application server workloads while offering more efficient ways to store and retrieve data.
Perhaps the best example is Centera, EMC Corp.'s system for indexing, storing and retrieving "fixed content" files. In Centera's Content Addressed Storage scheme, the client application bypasses the server's file system by making calls to a proprietary application programming interface (API). Centera intercepts each file storage request, strips off the metadata (such as date and time stamps) and runs a hashing algorithm to create a unique, 27-character content ID. It then returns a content descriptor file (CDF) to the client application that points to both the stored object and its metadata. Thereafter, the application need only request the stored object's content ID. Abstracted from the storage media in this way, the application needn't worry about disk I/O, tracking the file path or keeping up with changes in the back-end storage configuration.
The bottom line: "You should need less of a server . . . and the applications should run more efficiently on lower-cost compute platforms," says Steve Duplessie, an analyst at Milford, Mass.-based Enterprise Storage Group.
Centera's technology also eliminates redundant file storage by creating multiple references that point to a single instance of the stored file. For example, to store an archived e-mail file attachment sent to 1,000 users, Centera would create 1,000 CDF references to a single content ID, which in turn would reference a single, stored file.
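The content-addressing and single-instance ideas described above can be sketched in a few lines of Python. This is only an illustration, not EMC's API: the `ContentStore` class, its method names and the dictionary used as a descriptor are hypothetical, and SHA-256 stands in for Centera's hashing algorithm, which produces a 27-character ID by a method not detailed here.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: one blob per unique content ID."""

    def __init__(self):
        self.blobs = {}        # content ID -> bytes (each object stored once)
        self.descriptors = []  # one descriptor per storage request

    def store(self, data: bytes, metadata: dict) -> dict:
        # Assumption: SHA-256 hex digest stands in for the real content ID.
        content_id = hashlib.sha256(data).hexdigest()
        # Keep only the first copy of identical content.
        self.blobs.setdefault(content_id, data)
        # The descriptor plays the role of the CDF: metadata plus a pointer.
        descriptor = {"content_id": content_id, "metadata": dict(metadata)}
        self.descriptors.append(descriptor)
        return descriptor

    def retrieve(self, content_id: str) -> bytes:
        # The caller needs only the content ID, not a file path.
        return self.blobs[content_id]


store = ContentStore()
attachment = b"quarterly-report.pdf contents"
descriptors = [
    store.store(attachment, {"recipient": f"user{i}"}) for i in range(1000)
]
print(len(store.descriptors), len(store.blobs))  # 1000 1
```

In this sketch, 1,000 storage requests yield 1,000 descriptors but a single stored object, which mirrors the e-mail attachment example.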
Startup Avamar Technologies Inc. takes this technology one step further to address the problem of backup inefficiencies. While Centera's CDF technology can eliminate storage of redundant files, Avamar's Axion backup appliance indexes the individual data blocks that make up those files on disk in order to eliminate both file and partial file redundancies. When a sentence changes in a document, for example, Axion updates only the affected blocks within that file.
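A rough sketch of block-level redundancy elimination follows. It is not Avamar's implementation: the `BlockStore` class is hypothetical, the fixed 8-byte block size is chosen only to keep the example small, and SHA-256 is an assumed choice of hash. The example also assumes an in-place edit that does not change the file's length; with fixed-size blocks, an insertion would shift every later block.

```python
import hashlib

BLOCK_SIZE = 8  # Assumption: tiny fixed-size blocks, for illustration only


class BlockStore:
    """Toy backup store that keeps each unique block once."""

    def __init__(self):
        self.blocks = {}  # block hash -> block bytes

    def backup(self, data: bytes):
        """Store data; return (recipe, number of newly stored blocks)."""
        recipe, new_blocks = [], 0
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # Only blocks not seen before consume new space.
                self.blocks[digest] = block
                new_blocks += 1
            recipe.append(digest)
        return recipe, new_blocks

    def restore(self, recipe):
        """Rebuild a file from its ordered list of block hashes."""
        return b"".join(self.blocks[digest] for digest in recipe)


store = BlockStore()
monday = b"AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD"
tuesday = b"AAAAAAAABBBBBBBBXXXXXXXXDDDDDDDD"  # one block edited in place
recipe_mon, new_mon = store.backup(monday)
recipe_tue, new_tue = store.backup(tuesday)
print(new_mon, new_tue)  # 4 1
```

The second backup stores one new block instead of four, yet both versions of the file can still be restored from their recipes.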
"We're so much more efficient [that] we can store 10 to 100 times the amount of daily backups that you could on a [disk-to-disk backup system that is] mirroring tape backup," says Jed Yueh, Avamar's executive vice president. The result is a system that requires less space for backups, can restore faster and can efficiently back up distributed systems over a wide-area network, he says.
Another startup, Netezza Corp., has taken the intelligent storage concept the furthest by embedding parallel processing power with individual disk drives. It designed the Netezza Performance Server as a "data appliance" that optimizes business intelligence queries against very large databases, replacing the traditional Oracle database running on high-end Unix servers and EMC storage arrays. CEO and co-founder Jit Saxena says disk I/O is a bottleneck when querying such databases. Netezza's parallel processing architecture packages what it calls Snippet Processing Units (SPUs) with each disk drive -- up to 450 per appliance -- and integrates those with a symmetric multiprocessing front end that can accept SQL queries from any application that supports the Open Database Connectivity (ODBC) interface. Each SPU has dedicated memory and communicates over a Gigabit Ethernet connection.
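The general pattern of filtering next to each drive and combining results at a front end can be illustrated as follows. This is a loose analogy rather than Netezza's design: Python threads stand in for SPUs, in-memory lists stand in for drives, and the function names, the sample rows and the round-robin split across four partitions are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, min_amount):
    """Work done 'next to the drive': filter, then partially aggregate."""
    matching = [r["amount"] for r in rows if r["amount"] >= min_amount]
    return sum(matching), len(matching)


def parallel_query(partitions, min_amount):
    """Front end: fan the query out, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(
            pool.map(lambda part: scan_partition(part, min_amount), partitions)
        )
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total, count


# Hypothetical table of 12 orders, spread round-robin over 4 "drives".
rows = [{"order_id": i, "amount": i * 10} for i in range(1, 13)]
partitions = [rows[i::4] for i in range(4)]

# Roughly: SELECT SUM(amount), COUNT(*) FROM orders WHERE amount >= 60
print(parallel_query(partitions, min_amount=60))  # (630, 7)
```

Each worker returns only a small partial result, so the front end never has to pull every row across the network.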
"We have deployed huge amounts of intelligence right next to each drive," says Saxena. By keeping all drives processing in parallel, he says, "we provide 10 to 20 times the performance of a [traditional] system at half to one-third the cost." And because the system is read-intensive and application-specific, Saxena says ATA-based drives work well.
By using smart, inexpensive ATA-based storage appliances that offload I/O processing for application-specific tasks, vendors may eventually change how users view the traditional server's role, says Duplessie.
"What we're doing is taking distributed computing to the next level by ëappliance-izing' the intelligence in the server," he says. But even big-name products like Centera are still in early stages of acceptance. "It will take some time for people to make the best use of this," predicts Jamie Gruener, an analyst at The Yankee Group in Boston.
Centera Turns State's Evidence

The Southern California High Tech Task Force in Norwalk, Calif., became an early adopter of EMC's Centera, using it to archive forensic evidence gathered from suspects' computers. Prior to using the system, investigators burned evidence onto CD-ROMs -- as many as 100 for a 60GB drive image. "We needed something that was secure, very reliable," says project director Rick Craigo. Centera's design supported mirroring and provided an audit trail, since stored objects can't be changed without generating a new content ID. "Centera was almost a custom fit," says Craigo.

Using custom-developed software, investigators now store captured evidence on a Linux server cluster with 6TB of direct-attached storage. Completed cases migrate to the Centera archive before users erase them from the active storage area. Craigo says Centera was priced right. "Our sheriff's department has a Symmetrix system that cost a million bucks, and that's 1TB. We're at a quarter of that for 10TB. It's a day-and-night comparison," he says. But the system has another benefit: Craigo uses it to back up files on both the evidence network and Windows 2000 servers in the Task Force's offices. Backups run quickly and with minimum space because Centera saves only one copy of redundant files and updates only those files that have changed. "With the amount of archiving we do, we'll see the overall savings in about a year and a half," he says.
-- Robert L. Mitchell