Preparing Exchange for High Availability

SAN MATEO (04/10/2000) - Microsoft Corp.'s Exchange Server has certainly woven its way into the corporate landscape. Many companies have now deployed tens or hundreds of Exchange Servers worldwide. But whether you have tens, hundreds, or only one Exchange Server, you need to protect your investment -- not so much your investment in the platform but your investment in all of the corporate data that the mail server now holds. You never know if disaster will strike, and if you haven't planned for every contingency, it probably will. Instead of leaving the integrity of your Exchange Server 5.5 to fate, you should create a DR (disaster recovery) plan.

In an increasingly electronic business world, e-mail has become crucial to the way we work. The typical corporate mail server stores thousands upon thousands of messages and documents that a company can't afford to lose. When the mail server goes down because of a hardware failure or some other catastrophe, work no longer gets done, thereby compounding the problem. Consequently, it is critically important for you to be able to bring a failed mail server back online as quickly as possible and with little or no data loss.

To do that, you need to be aware of some important aspects of the Exchange architecture, configuration tips that guard against data loss, different backup strategies, and the process of recovering from backups. In addition to helping you overcome an actual disaster, these DR techniques can help you when, for example, you need to expand disk space on your servers, move a server to a different physical box, or re-create a server to perform tests offline.

Anytime you deploy a new server, you hope that the last thing you'll need to do on that server is a DR operation. But failures happen, and DR is a fact of life. By planning ahead, you can minimize both data loss and downtime.

Before we dive into the actual restoration procedures, it is important to lay some architectural foundations up front, because understanding how the Exchange message store works will help you design the most effective backup and recovery processes.

Choosing between backups

You can perform two types of backups on an Exchange server: online backups and offline backups. Online backups are performed while Exchange is up and running.

The backup software works with Exchange APIs to back up the contents of the Exchange database without stopping the application or halting services, so users can continue sending and receiving e-mail.

Offline backups are performed after halting all Exchange services, allowing you to completely back up all of the files on the Exchange server, including all Exchange databases. With that out of the way, let's look at the Exchange database architecture.

The Exchange message store, termed the Exchange Information Store (IS), consists of two database files, PRIV.EDB and PUB.EDB. PRIV.EDB is the "private" store in which Exchange stores the contents of individual user mailboxes, and PUB.EDB is the "public" store in which Exchange stores Public Folder documents.

In addition, Exchange maintains a third database, DIR.EDB, which contains the contents of the Exchange Directory Service (DS).

Exchange uses a transactional model when writing new messages, documents, or directory entries into each of these three databases. In other words, Exchange records messaging transactions to log files first and then completes the more performance-intensive write operation to the IS or DS in the background as system resources permit.

This transactional approach provides two benefits. First, it yields better performance. Because transaction logs consist of unstructured data, writes to logs can be performed sequentially across the drive to a preallocated file. The second benefit is fault tolerance. If the IS ever becomes corrupted or damaged due to a disk failure or other problem, you can restore the last good backup of the IS and then play the Exchange logs forward to record transactions to the IS and DS that have been processed since the last backup. Obviously, this is crucial to disaster recovery.
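The toy Python model below makes the log-first idea concrete. It is a sketch under simplifying assumptions, not the Exchange storage engine. The hypothetical ToyStore class appends each transaction to an in-memory log and applies it to the "database" later, in flush(). The roll_forward() function replays the logged records onto a restored backup copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (item id, contents): a stand-in for one logged transaction


@dataclass
class ToyStore:
    """Hypothetical model of log-first writes; not the real Exchange storage engine."""
    database: Dict[str, str] = field(default_factory=dict)  # plays the role of PRIV.EDB
    log: List[Record] = field(default_factory=list)          # plays the role of the log files
    applied: int = 0                                          # records already in the database

    def write(self, item: str, contents: str) -> None:
        # Fast path: append the transaction sequentially to the log.
        self.log.append((item, contents))

    def flush(self) -> None:
        # Slow path, done later "as resources permit": update the database.
        for item, contents in self.log[self.applied:]:
            self.database[item] = contents
        self.applied = len(self.log)


def roll_forward(backup: Dict[str, str], records: List[Record]) -> Dict[str, str]:
    """Restore a backup copy, then replay the logged transactions written since it."""
    restored = dict(backup)
    for item, contents in records:
        restored[item] = contents
    return restored


store = ToyStore()
store.write("msg-1", "quarterly report")
store.flush()
backup, backup_point = dict(store.database), len(store.log)  # the last good backup

store.write("msg-2", "budget draft")
store.write("msg-3", "meeting notes")
# Suppose the database disk fails now, before flush(); the log disk survives.
recovered = roll_forward(backup, store.log[backup_point:])
print(sorted(recovered))  # ['msg-1', 'msg-2', 'msg-3']
```

Note that msg-2 and msg-3 were never written to the toy database. They survive only because they reached the log first, which is the point of the fault-tolerance benefit.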

When Exchange is installed on a single volume, both the IS and its associated logs are stored in the directory <drive>:\exchsrvr\mdbdata, and the DS and its logs are stored in <drive>:\exchsrvr\dsadata (where <drive> represents the drive on which Exchange is installed).

For both the IS and the DS, the current log is named EDB.LOG and is 5MB in size. After Exchange writes 5MB of data to EDB.LOG, it renames the file and starts a new EDB.LOG. The renamed file gets an eight-character base name made up of the prefix EDB followed by a sequence number in hexadecimal, plus a .LOG extension: EDB00001.LOG, EDB00002.LOG, EDB00003.LOG, and so on.
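The naming scheme lends itself to a quick script, for example one that sorts log files or spots gaps. The sketch below assumes the pattern shown above: the prefix EDB, a five-digit uppercase hexadecimal generation number and the .LOG extension. The functions log_file_name() and log_generation() are hypothetical helpers, not part of Exchange.

```python
def log_file_name(generation: int) -> str:
    """Name of a retired log, assuming the EDB + five hex digits + .LOG pattern."""
    if not 0 < generation <= 0xFFFFF:
        raise ValueError("generation must fit in five hexadecimal digits")
    return f"EDB{generation:05X}.LOG"


def log_generation(name: str) -> int:
    """Inverse of log_file_name(): recover the generation number from a file name."""
    upper = name.upper()
    if len(upper) != 12 or not (upper.startswith("EDB") and upper.endswith(".LOG")):
        raise ValueError(f"not a numbered log file: {name!r}")
    return int(upper[3:8], 16)  # the five characters between "EDB" and ".LOG"


print([log_file_name(n) for n in (1, 9, 10, 16)])
# ['EDB00001.LOG', 'EDB00009.LOG', 'EDB0000A.LOG', 'EDB00010.LOG']
```

Because the numbers are hexadecimal, EDB0000A.LOG follows EDB00009.LOG. A script that assumed decimal numbering would report a false gap at that point.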

As logged transactions are recorded to the IS and DS databases, Exchange updates a checkpoint file called EDB.CHK, which is located in the same directory as the DS and IS databases. EDB.CHK tracks which transactions have been recorded. If you run out of space on the log drive, Exchange will shut down the IS and DS services.

Two additional files in each log directory, RES1.LOG and RES2.LOG, reserve disk space. If the log drive becomes full, Exchange will use the space allocated in these files to write log entries for transactions that are in progress when the IS and DS shut down.

Once each transaction is actually written into the IS or DS, the log files are needed only for replaying in the event that you need to restore from a backup.

In this case, the only log files you will need are those that contain entries that were written since the last online backup. Because the older log files are unnecessary, Exchange automatically purges them after the successful completion of an online backup.

However, Exchange will not purge the old logs if the data does not pass an integrity check during backup. Exchange verifies the checksum of each database page, comparing it to the checksum that was recorded when the page was written to disk. This extra step provides some reassurance that you will not purge the logs after backing up an information store that has corrupt data due to bad data blocks on the disk or a faulty controller. Furthermore, a failed backup can often be a symptom of a hardware problem or a potentially corrupt information store.
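The following is a simplified sketch of that backup-time rule. It uses CRC-32 (Python's zlib.crc32) as a stand-in for Exchange's actual page checksum, and it treats purging as deleting the retired logs that are older than the checkpoint. The checkpoint parameter and the function names are illustrative assumptions.

```python
import zlib
from typing import Iterable, List, Tuple

Page = Tuple[bytes, int]  # (page contents, checksum recorded when the page was written)


def pages_verify(pages: Iterable[Page]) -> bool:
    """Recompute each page's checksum; CRC-32 stands in for Exchange's own algorithm."""
    return all(zlib.crc32(data) == recorded for data, recorded in pages)


def logs_safe_to_purge(pages: Iterable[Page], log_gens: Iterable[int], checkpoint: int) -> List[int]:
    """Log generations an online backup could delete once it completes.

    checkpoint: oldest generation not yet fully written to the database
    (the position EDB.CHK tracks). If any page fails verification,
    nothing is purged, so every log stays available for recovery.
    """
    if not pages_verify(pages):
        return []
    return sorted(g for g in log_gens if g < checkpoint)


good = [(b"page one", zlib.crc32(b"page one")), (b"page two", zlib.crc32(b"page two"))]
bad = [(b"page one", zlib.crc32(b"page one")), (b"page tw0", zlib.crc32(b"page two"))]

print(logs_safe_to_purge(good, [1, 2, 3, 4], checkpoint=4))  # [1, 2, 3]
print(logs_safe_to_purge(bad, [1, 2, 3, 4], checkpoint=4))   # []
```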

Lessons of the architecture

In a default installation, Exchange Server is configured with circular logging enabled. Circular logging uses only a handful of log files (usually five to seven) and simply overwrites previous logs. This feature is designed to protect smaller sites that handle minimal messaging volumes. It is a good feature if you don't want to concern yourself with running out of disk space due to failed backups. But because it overwrites older logs, circular logging provides almost no fault tolerance: after a failure, the logs written since your last backup are gone and cannot be replayed.
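The difference between circular and sequential logging comes down to whether every log written since the last backup is still on disk. This Python sketch assumes a circular window of five log files (the low end of the five-to-seven range above) and identifies logs by generation number. The can_roll_forward() function is a hypothetical helper.

```python
from collections import deque
from typing import Iterable


def can_roll_forward(logs_on_disk: Iterable[int], backup_gen: int, current_gen: int) -> bool:
    """True if every log generation written after the backup is still on disk."""
    kept = set(logs_on_disk)
    return all(g in kept for g in range(backup_gen + 1, current_gen + 1))


generations_written = range(1, 41)  # 40 logs written since installation
last_backup_gen = 20                # the last good backup covers logs 1-20

# Circular logging disabled: logs 1-20 were purged by the backup, 21-40 remain.
sequential = set(range(last_backup_gen + 1, 41))
# Circular logging enabled: only the newest five generations survive (assumed window).
circular = set(deque(generations_written, maxlen=5))

print(can_roll_forward(sequential, last_backup_gen, 40))  # True
print(can_roll_forward(circular, last_backup_gen, 40))    # False: logs 21-35 were overwritten
```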

As an enterprise Exchange administrator, one of your first tasks after installing a new Exchange Server should be to disable circular logging on the server.

When configuring your Exchange server for high availability, your second task should be to separate the IS and DS from their log files. Due to the nature of Exchange's transactional architecture, it is wise to store the IS and DS databases on a different physical disk set than the one on which their log files reside. That way, if the disk set containing the IS or DS is compromised, the log files will remain intact, and vice versa.

When adding more volumes to the configuration, you can run the Exchange Performance Optimizer Wizard to move the logs (or the IS and DS) to another volume.

A third piece of advice would be to provide enough disk space on the log drive to handle one week's worth of messages.

As the number of log files on a server grows, so does the amount of disk space consumed. You must be sure to allocate enough disk space for the logs, because if you run out of space on your log drive, the IS and DS services will shut down. You can free up space by copying log files to another drive or server. If you do this and you need to perform an online restore, you may need to expand the log drive temporarily to accommodate replaying all the log files.
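Sizing the log drive is simple arithmetic once you know how many 5MB logs your server generates per day. The sketch below assumes you have measured that daily count on a busy day. The figure of 80 logs per day and the 25 percent margin are example values only, and log_drive_mb() is a hypothetical helper.

```python
LOG_FILE_MB = 5  # every EDB log file is 5MB


def log_drive_mb(new_logs_per_day: int, days: int = 7, headroom_pct: int = 25) -> int:
    """Rough log-drive size for `days` of retained logs plus a safety margin.

    new_logs_per_day: EDBxxxxx.LOG files your server generates on a busy day
    (measure it; the value used below is only an example).
    headroom_pct: hypothetical margin for growth and for backups that fail.
    """
    logged_mb = new_logs_per_day * days * LOG_FILE_MB
    # Integer ceiling of logged_mb * (100 + headroom_pct) / 100, avoiding float rounding.
    return (logged_mb * (100 + headroom_pct) + 99) // 100


print(log_drive_mb(80))  # 80 logs/day x 7 days x 5MB = 2,800MB, plus 25% -> 3500
```

Keep in mind that this is the steady-state figure. If backups fail for several days, logs keep accumulating, which is why a week of headroom is the recommendation.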

Finally, consider performing full daily backups, and if possible, have the backup program write checksums to tape. When planning your online backup strategy, you can take an approach similar to that of typical file-server backups -- performing full backups once a week and incremental or differential backups each day.

However, if you have the capacity, you should consider doing full backups daily, including full file-level backups of the drives and full online backups of the Exchange databases. This provides the easiest recovery scenario if you need to perform a restore, because it requires only one restore operation. In addition, full backups will save disk space on your log drive, because Exchange purges the logs only after a full online backup.
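A small model of a generic weekly rotation shows why daily full backups need the fewest restore operations. This is a sketch of file-server backup logic under stated assumptions: schedule[i] names the backup taken on day i, and a rotation never mixes incremental and differential backups. It does not describe the behavior of any particular backup product, and tapes_to_restore() is a hypothetical helper.

```python
from typing import List


def tapes_to_restore(schedule: List[str], failure_day: int) -> List[int]:
    """Days whose backups must be restored, in order, to recover as of failure_day.

    schedule[i] is "full", "incremental", or "differential" for day i.
    Generic rotation logic; mixed rotations are not modeled.
    """
    last_full = max(i for i in range(failure_day + 1) if schedule[i] == "full")
    since_full = list(range(last_full + 1, failure_day + 1))
    kinds = {schedule[i] for i in since_full}
    if kinds <= {"incremental"}:
        return [last_full] + since_full  # the full, then every incremental in order
    if kinds == {"differential"}:
        return [last_full, failure_day]  # the full, then only the latest differential
    raise ValueError("this sketch does not model mixed rotations")


weekly_incr = ["full"] + ["incremental"] * 6
weekly_diff = ["full"] + ["differential"] * 6
daily_full = ["full"] * 7

for name, plan in [("weekly + incremental", weekly_incr),
                   ("weekly + differential", weekly_diff),
                   ("daily full", daily_full)]:
    print(name, tapes_to_restore(plan, failure_day=5))
# weekly + incremental [0, 1, 2, 3, 4, 5]
# weekly + differential [0, 5]
# daily full [5]
```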

Most backup administrators rarely verify data when they perform backups because the process carries a lot of overhead for little benefit. However, if your backup software has the option, you should choose to write checksums to tape.

This option allows you to verify your data when you restore it. This carries a little overhead, but much less than when performing a full verification during backup.

Restoring from an online backup

Usually when you must restore an Exchange server, it will be from an online backup. An online restore uses the Exchange backup APIs to restore the DS and IS and associated logs for transactions that were outstanding at the time of the last online backup.

Usually an online restore will be used to restore the server if the IS is compromised or corrupted. In this case, you will need to replay all of the logs for the transactions that occurred since your last good backup. So, the first step before you proceed is to halt all Exchange services and make a safe copy of the logs, either to another server on the network or by backing up to tape.
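If you script the step of copying the logs to a network share instead of doing it by hand, verify each copy. The Python sketch below copies every .LOG file and compares each copy byte for byte with filecmp.cmp. The paths in the comment are hypothetical examples. The demo() function runs against a throwaway directory rather than a live server.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path
from typing import List


def save_logs(log_dir: Path, safe_dir: Path) -> List[str]:
    """Copy every .LOG file from log_dir to safe_dir and verify each copy byte for byte.

    Run only after all Exchange services are stopped, so no log is still being written.
    Example (hypothetical paths):
        save_logs(Path(r"D:\\exchsrvr\\mdbdata"), Path(r"\\\\backupsrv\\exlogs"))
    """
    safe_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(log_dir.iterdir()):
        if src.is_file() and src.suffix.upper() == ".LOG":
            dest = safe_dir / src.name
            shutil.copy2(src, dest)  # copies data and timestamps
            if not filecmp.cmp(src, dest, shallow=False):
                raise OSError(f"copy of {src.name} does not match the original")
            copied.append(src.name)
    return copied


def demo() -> List[str]:
    """Exercise save_logs() on a throwaway directory instead of a real server."""
    with tempfile.TemporaryDirectory() as tmp:
        mdbdata = Path(tmp) / "mdbdata"
        mdbdata.mkdir()
        for name in ("EDB.LOG", "EDB00001.LOG", "RES1.LOG", "PRIV.EDB"):
            (mdbdata / name).write_bytes(b"sample data")
        return save_logs(mdbdata, Path(tmp) / "safe")


print(demo())  # ['EDB.LOG', 'EDB00001.LOG', 'RES1.LOG']
```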

When performing an online restore, you may or may not be performing a complete disaster recovery of the whole server. Depending on what's happened, you may only need to restore your last good backup of the IS and then roll the logs forward to recover from a corrupted database. Or you may need to reinstall the entire server, if all of your volumes are on the same array and you lose the array or if you want to restore a server on a different machine for, say, a server upgrade.

If you need to rebuild the whole machine, you can do so by installing Windows NT from scratch, installing Exchange in recovery mode (Setup /R), and applying various service packs, tools, and customizations for your environment; or you can perform a DR from your last good backup to recover the OS, Exchange, and other tools to prepare the machine for online restore.

Because performing a DR will get you running much faster, I recommend this method. (For installing Windows NT and Exchange from scratch, see Microsoft's white paper on Exchange disaster recovery at www.microsoft.com/exchange/55/whpprs/BackupRestore.htm.)

If your backup software has a DR option, you can attempt to use it to restore the OS volume. This generally requires a local tape drive and a set of boot floppies. Once the OS volume is back, you boot into the OS, restore the remaining volumes over the network, and perform the online restore.

If a DR option is not available, you can install a temporary copy of Windows NT to a different directory on the boot drive. For the steps in this process, see the chart that accompanies this article online at www.infoworld.com.

Once you have the boot partition and other volumes restored, you can proceed with the process of performing the online restore. You'll also find the steps for this procedure in a chart accompanying this article online. Here are a few notes on the online restore process that can help you troubleshoot problems.

First, when you perform an online restore, Exchange makes several entries in the Windows NT Registry so that the next time you start Exchange it knows that a restore has taken place from an online backup and that it needs to perform the process of replaying log files into the IS or DS as appropriate. These entries are made under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeIS\RestoreInProgress and HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeDS\RestoreInProgress.

Under each of these keys you will find values for the first and last logs in the sequence of log files that were restored from the online backup. These are the LowLog (first) and HiLog (last) entries. When Exchange finds the RestoreInProgress key, it then examines the LowLog and HiLog values to determine where to start the process of replaying logs.

If you want to replay additional logs, such as those that have been created since the online backup, simply copy the sequence of logs that begins after the HiLog value into the respective log directory before starting the DS or IS.

However, if any logs are missing from the sequence or if you have a corrupt log in the sequence, Exchange will stop playing the logs from that point forward, and you will lose transactions that occurred after that point in time.
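The replay rule described above can be modeled in a few lines: start at LowLog, continue through HiLog and any newer logs you copied in, and stop at the first missing or corrupt file. This sketch assumes LowLog and HiLog are plain log generation numbers. The replay_plan() function is a hypothetical planning helper, not something Exchange provides.

```python
from typing import Iterable, List


def replay_plan(low_log: int, hi_log: int, usable_logs: Iterable[int]) -> List[int]:
    """Generations replayed, in order, starting at LowLog and stopping at the first gap.

    low_log / hi_log: the LowLog and HiLog values under RestoreInProgress,
    treated here as plain log generation numbers (an assumption).
    usable_logs: generations present and undamaged in the log directory,
    including any newer logs copied in after HiLog.
    """
    present = set(usable_logs)
    plan = []
    gen = low_log
    while gen in present:  # a missing or corrupt log ends the replay here
        plan.append(gen)
        gen += 1
    if gen <= hi_log:
        print(f"warning: restored log {gen:05X} is unusable; replay stops before HiLog")
    return plan


restored = list(range(0x10, 0x15))  # LowLog=0x10 through HiLog=0x14, from the backup
copied_in = [0x15, 0x16, 0x18]      # newer logs saved before the restore; 0x17 was lost
plan = replay_plan(0x10, 0x14, restored + copied_in)
print([f"{g:X}" for g in plan])                         # ['10', '11', '12', '13', '14', '15', '16']
print("lost:", [f"{g:X}" for g in copied_in if g not in plan])  # lost: ['18']
```

Log 0x18 was saved safely, but its transactions are still lost because the log before it is missing. This is why the safe copy of the logs should be complete before you start the restore.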

Online restores have one limitation: You can play logs forward only from your last good backup. Because each successful backup purges the logs that were already committed at that point, the logs you would need to roll an older backup forward are always missing.

So, if for some reason the IS from your last complete backup is damaged or corrupt after you restore it or if the tape goes bad, you can still restore from a previous backup. However, you will lose data.

Restoring from an offline backup

As far as disaster recovery goes, performing a restore from an offline backup is less common than from an online backup. But performing an offline backup and restore is a good way to migrate to a new server, expand disk volumes, or recover from maintenance activities and upgrades that have somehow gone awry.

In addition, an offline restore may be your only option if you cannot call on a recent online backup.

You can perform an offline restore by using your backup software or by installing a temporary copy of Windows NT to restore all drive partitions and then following the chart that accompanies the online version of this article to finish the offline restore procedure. (For restoring an offline backup of an Exchange Server to a newly configured server, see Microsoft's white paper on Exchange disaster recovery, which covers the process in fairly good detail.)

Whether you are recovering from a server crash or simply using backup and restore procedures to ease maintenance or migration, this Test Center Action Plan should help guide you through the process. Naturally, you'll need to tailor the DR procedures discussed here to fit your environment. You should also familiarize yourself with the techniques in Microsoft's white paper on Exchange DR and DR articles posted on its knowledge base to build a DR strategy that matches the needs of your company.

Once you have a strategy in place, exercise the procedures on test servers and whenever possible during maintenance. This will keep you familiar with DR procedures and expose potential loopholes. When it comes to protecting your data, you will definitely want to practice what you preach.

Jeff Symoens (jeffsym@ix.netcom.com) supports Exchange Server as a senior IT systems engineer for a large Silicon Valley-based high-tech company.

Exchange 5.5 disaster defense step-by-step

(1) Disable circular logging, which avoids running out of disk space by overwriting previous logs but provides little fault tolerance. (Smaller sites should disable this after step 3.)

(2) Separate the IS and DS databases from their log files. If you keep them on separate disk sets, you won't lose data unless both disk sets are compromised.

(3) Provide enough space to store one week's worth of message logs. If you run out of disk space on your log drive, the IS and DS services will be shut down.

(4) Consider performing full online backups daily. Full online backups require only one restore operation, providing the fastest and easiest way to recover from a disaster.

Microsoft resources

Additional information on Exchange disaster recovery is available from a Microsoft white paper and the company's knowledge base.

Exchange Disaster Recovery White Paper

* www.microsoft.com/exchange/55/whpprs/BackupRestore.htm

Microsoft Knowledge Base

* support.microsoft.com
