BOSTON (06/12/2000) - Soon after Don Cawthorne accepted the position of disaster recovery systems officer for Northern Trust Co., he found himself facing a variety of technical and political problems.
Technical woes were at the top of the list. "We were spending 20 man-hours per day running around the bank changing tapes on a little more than 100 [Windows] NT servers and sending them to off-site storage locations," Cawthorne says.
On the Unix side, the IT department was using IBM Corp.'s ADSM software to back up the Unix servers. The last full backup had been done a year before Cawthorne took over. When he conducted a disaster-recovery test, data was still not fully recovered 48 hours later. This was clearly unacceptable at a prestigious custodial bank with a blue-chip clientele and $1.5 trillion in managed assets.
Cawthorne immediately set about overhauling the process. In the course of his research, he learned about storage-area networks (SANs) and Fibre Channel technology. Cawthorne noted the similarity between Fibre Channel and IBM's Enterprise Systems Connection (ESCON) technology, which the mainframe people at Northern Trust had been using for disaster recovery for the past five years.
Here's where the political problems arose. Cawthorne figured that with a little help from his mainframe colleagues, he could apply the ESCON disaster-recovery model to the open systems he was responsible for. But he knew communication between the mainframers and the NT/Unix folks was virtually nonexistent. Cawthorne decided to break down the platform bigotry and seek help from the mainframe team to solve his disaster-recovery problem.
Fibre Channel seemed like a perfect solution. Its 100M byte/sec speed would provide the bandwidth necessary to mirror data to a remote location. Furthermore, Fibre Channel's ability to traverse distances up to 10 kilometers would let him mirror data between the bank's three sites with room to spare. In addition, the bank already had the infrastructure in place, in the form of dark fiber that had been installed as part of the mainframe installation.
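A rough back-of-the-envelope check shows why those two numbers matter for mirroring. The sketch below assumes the roughly 100M bytes per second that first-generation Fibre Channel delivers and the 10-kilometer span cited above; the 360G-byte data volume and the speed of light in fiber (about 200,000 kilometers per second, a common rule of thumb) are illustrative assumptions, not figures from Northern Trust.

```python
# Back-of-the-envelope estimate; not measurements from the bank's SAN.
LINK_MB_PER_SEC = 100                 # approximate first-generation Fibre Channel rate
DISTANCE_KM = 10                      # maximum span cited in the article
FIBER_KM_PER_SEC = 200_000            # assumed speed of light in fiber (rule of thumb)


def transfer_hours(data_gb, link_mb_per_sec=LINK_MB_PER_SEC):
    """Hours needed to copy data_gb gigabytes at the full link rate (1 GB = 1000 MB)."""
    return data_gb * 1000 / link_mb_per_sec / 3600


def round_trip_ms(distance_km=DISTANCE_KM):
    """Round-trip propagation delay in milliseconds over distance_km of fiber."""
    return 2 * distance_km / FIBER_KM_PER_SEC * 1000


if __name__ == "__main__":
    # 360 GB is a hypothetical volume chosen only to make the arithmetic easy to follow.
    print(f"Initial copy of 360 GB: {transfer_hours(360):.1f} hours")
    print(f"Round trip over {DISTANCE_KM} km: {round_trip_ms():.2f} ms")
```

Under these assumptions, the link itself adds only a fraction of a millisecond to each mirrored write, and the estimate ignores protocol overhead and contention from other traffic.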
Cawthorne's next political hurdle became apparent when he pitched the SAN plan to upper management, which is known for using technology to gain business advantage, but is also known for keeping a safe distance from the bleeding edge.
He presented the proposal as an upgrade to the existing ESCON technology, which could exploit the SONET infrastructure already in place. Management gave Cawthorne the budget he requested and set an 18-month project deadline.
Cawthorne and his team evaluated all the major vendors that offered SAN platforms. At the time, the list consisted of Clariion, Compaq Computer Corp., Data General, EMC Corp., Hitachi Ltd. and IBM. He chose Hitachi because he found its SAN solution was more cost-effective for his needs.
The disaster recovery systems officer narrowed the list of Fibre Channel switch vendors to two: Brocade and Ancor Communications. He ultimately gave the nod to Ancor because that company's switches "were two or three revisions ahead in technology."
Because Cawthorne hadn't worked with Fibre Channel before, he decided he needed a consultant. After conferring with the people from Ancor, he chose Intelligent Solutions of Burlington, Mass., which had five years' experience deploying Ancor switches.
To successfully deploy a SAN, you need a unique skill set, says Intelligent Solutions President Kevin Metcalf. "You have to understand disks, networking, servers, operating systems and how to plug all of them together," he says.
The implementation of the SAN began in September 1998. The integration team made sure all of the boxes running SCSI could see the Hitachi disks, and they made sure the IP network worked. Then they tied three buildings together, connected 20 Lotus Notes servers to the SAN, established two Fibre Channel arbitrated loop connections, and did mirroring in the background. Ancor 15-port Fibre Channel switches were added during this phase.
The SAN pilot was not without problems. When Cawthorne added IP traffic to the switched fabric, he broke the microcode in the Hitachi 7700e. "I effectively sent data screaming across this high-speed backbone that hit not only the other server but every other port along the way, including the ports that the storage was attached to. The microcode thought it was illegal data and shut down the ports," he says.
Cawthorne was able to quickly diagnose the problem and apply a fix. "I disabled the ports on the switched environment for IP broadcast and enabled the rest. With this fix, I had it up and running the next day," he says. Cawthorne says that since the incident, Hitachi engineers have rewritten the microcode so it no longer rejects IP traffic as illegal data.
Despite this episode and a few minor problems, the pilot was deemed successful and the bank went to full production with the SAN in December.
During the pilot, Cawthorne and his team discovered additional benefits. For example, by building the SAN the way they did and with the common disk technology in place, the IT team found they could do what Cawthorne refers to as "true enterprise backup and recovery." For this purpose, the IT team is using the Harbor Backup and Recovery Systems from Beta Systems in Calgary, Alberta. The team is using an MVS mainframe as the host server, with clients on the Unix and NT machines. By year-end, the IT department plans to back up its 500 Unix and NT servers over the SAN and over ESCON to StorageTek tape silos.
With the SAN in place, senior managers are now confident that the company can weather floods and other natural disasters.