Every company that does business on the Internet, regardless of the server or software chosen for the job, has experienced site performance crises. These woes can degenerate to site failure -- or even worse, to the loss or corruption of data.
If you lack a sensible strategy for addressing Web site performance problems, you have little hope of diagnosing them and meting out the appropriate cure before users are seriously affected. The appropriate response to reports of site performance trouble is not to shrug them off, overspend on site analysis software or consultants, or turn programmers loose on your site. Minimally invasive diagnostics will pinpoint most sites' choke points without breaking users' connections, and they don't require costly tools. In fact, some such tools are free.
Some companies unwittingly render their access and error logs useless by cutting back the level of detail recorded or by capping the maximum size of the logs to save disk space. If you're doing that, buy more storage space. Logging should always be enabled and should be set to gather enough detail to aid investigation. For example, Microsoft Corp.'s Internet Information Services (IIS) will record page processing time, an invaluable measure of site performance, but not by default.
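Once processing time is being recorded, a few lines of script can rank pages by how long they take to serve. The following is a minimal sketch, not a finished tool. It assumes the server writes W3C extended-format logs with a "#Fields:" header line and that the cs-uri-stem and time-taken fields (the latter in milliseconds) are both enabled. The sample entries are invented for illustration.

```python
from collections import defaultdict


def slowest_pages(lines, top=5):
    """Return (uri, average time-taken in ms, hits) tuples, slowest first."""
    fields = []
    totals = defaultdict(lambda: [0, 0])  # uri -> [sum of time-taken, hits]
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns that follow it.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other directives
        row = dict(zip(fields, line.split()))
        if "time-taken" not in row or "cs-uri-stem" not in row:
            continue  # processing time was not being logged for this entry
        try:
            elapsed_ms = int(row["time-taken"])
        except ValueError:
            continue  # ignore malformed entries
        entry = totals[row["cs-uri-stem"]]
        entry[0] += elapsed_ms
        entry[1] += 1
    ranked = [(uri, total / hits, hits) for uri, (total, hits) in totals.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Invented sample data, for illustration only.
SAMPLE = """\
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2001-05-01 12:00:01 GET /index.html 200 15
2001-05-01 12:00:02 GET /checkout.asp 200 2000
2001-05-01 12:00:03 GET /checkout.asp 200 3000
2001-05-01 12:00:04 GET /index.html 200 25
""".splitlines()

if __name__ == "__main__":
    for uri, average_ms, hits in slowest_pages(SAMPLE):
        print(f"{uri}: {average_ms:.0f} ms average over {hits} hits")
```

Averages hide spikes, so in a real investigation you would likely also want the worst-case time for each page.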
Commercial sites are often spread across a virtual or physical cluster of servers. As you drill into your logs, you'll need to coalesce multiple log files into one searchable, time-sequenced document. The Unix/Linux sort utility comes in handy here; Windows users can sort their logs in Excel.
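If you would rather script the merge than sort by hand, the sketch below shows one way to do it. It rests on two assumptions that may not hold for your logs: every entry begins with a 19-character "YYYY-MM-DD HH:MM:SS" time stamp, and each server's log is already in time order. The server names and entries are made up.

```python
import heapq


def tagged(server, lines):
    """Yield (time stamp, server, entry) for each log entry from one server."""
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            # Assumes the entry starts with "YYYY-MM-DD HH:MM:SS".
            yield (line[:19], server, line)


def merge_logs(logs):
    """Merge per-server logs (a dict of name -> lines) into one timeline."""
    streams = [tagged(server, lines) for server, lines in logs.items()]
    # heapq.merge walks the sorted streams lazily, so large files are not
    # loaded into memory all at once.
    for _stamp, server, line in heapq.merge(*streams):
        yield f"{server} {line}"


if __name__ == "__main__":
    # Hypothetical entries from two servers in a cluster.
    web1 = ["2001-05-01 12:00:01 GET /a 200", "2001-05-01 12:00:05 GET /c 200"]
    web2 = ["2001-05-01 12:00:03 GET /b 500"]
    for entry in merge_logs({"web1": web1, "web2": web2}):
        print(entry)
```

Prefixing each entry with its server name keeps the source visible after the merge, which matters when only one machine in the cluster is misbehaving.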
For this work, as for all tasks related to investigating troubles on your site, your servers' clocks must be closely synchronized, typically against a common time source using a protocol such as NTP. A merged log is misleading if one machine runs even a few seconds ahead of another. Also remember to sync the clocks of customer service and help desk systems, database servers, and all other machines that might contribute time-stamped logs to your investigation.
The Web server's logs tell you what is happening with the HTTP server, but they do not tell you anything about application code, databases, object brokers, or any other component of your site. Well-written applications can document interactions with critical services. You may need to turn on the application's debugging tool to produce logs. Do this sparingly on production sites; debug-level logging can hamper an application's performance.
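One low-impact way to document those interactions is to time each call to a back-end service and log it at debug level, promoting the entry to a warning only when the call is slow. The sketch below uses Python's standard logging module. The logger name, the timed() helper, and the 500-millisecond threshold are illustrative assumptions, not part of any particular product.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("site.backend")  # hypothetical logger name


@contextmanager
def timed(service, operation, slow_ms=500):
    """Log how long a block of code spends talking to a back-end service."""
    start = time.perf_counter()
    try:
        yield
    except Exception:
        log.exception("%s %s failed", service, operation)
        raise
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        # Slow calls are logged as warnings; everything else is debug detail
        # that is discarded unless the logger is set to DEBUG.
        level = logging.WARNING if elapsed_ms >= slow_ms else logging.DEBUG
        log.log(level, "%s %s took %.0f ms", service, operation, elapsed_ms)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)  # production-style setting
    with timed("database", "order lookup", slow_ms=50):
        time.sleep(0.1)  # stands in for a slow query
```

With the logger left at WARNING, only slow or failed calls are written, so the cost in normal operation stays small. Dropping the level to DEBUG for a short period gives the full picture when you need it.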
If Web and application logs don't point directly to a smoking gun, you'll need to sit in the user's chair. The logs, plus your knowledge of typical navigational paths through the site, should help you identify troubled Web pages or server-side code. Using free tools such as VeloMeter and OpenSTA, you can simulate hordes of users selectively hammering on your site's weakest points. The objective is to force your site to fail or degrade while you're monitoring it.
Running user simulations on production servers is dicey because the process itself places a strain on your site. In a perfect world, you'd have a parallel environment set aside just for testing, complete with dummy back ends and bogus data. If you must run tests on your production site, double-check your simulation to make sure that any data introduced is backed out of the system or is marked invalid.
Simulator testing is done in two stages. First, one or more actual users navigate the target site using the simulator as a proxy. As the simulator forwards data to the site, it captures the user's actions in a script. In the second stage of testing, the simulator plays back the script. Trim the script and the number of simulated users down to the minimum needed to reproduce the problem. If it takes dozens or hundreds of dummy users to stress the site, run the tests during an ebb in traffic. The alternative is to carve off a small subset of servers or partitions for testing, leaving the bulk of your processing power available to real users.
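The playback stage is simple enough to sketch in a few lines, which may help clarify what the simulators are doing on your behalf. This is not how VeloMeter or OpenSTA is implemented. It is a bare-bones illustration that assumes the recorded script has been reduced to a list of URLs, and it ignores cookies, form posts, and think time. The URLs shown are placeholders.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10):
    """Request one URL; return (HTTP status or None, elapsed seconds)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except OSError:
        status = None  # connection refused, timed out, and so on
    return status, time.perf_counter() - start


def play_script(script, users, request=fetch):
    """Have `users` simulated users each replay `script`, a list of URLs."""
    def one_user(_):
        return [(url, *request(url)) for url in script]

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = pool.map(one_user, range(users))
        return [hit for hits in per_user for hit in hits]


if __name__ == "__main__":
    # Placeholder URLs; point these at a test environment, not production.
    script = ["http://test.invalid/", "http://test.invalid/checkout"]
    for url, status, seconds in play_script(script, users=5):
        print(f"{status} {seconds:.3f}s {url}")
```

Start with a handful of users and raise the count gradually while you watch the servers, so that you see the point at which response times begin to climb rather than only the final failure.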
Two open-source load simulators, VeloMeter (www.velometer.com) and OpenSTA (www.opensta.org), meet most testing needs. If you suspect your proxy or cache server is interfering with your site's performance, a tool called Web Polygraph (www.web-polygraph.org) will help you get to the bottom of it. Finally, you'll find server-specific tools bundled with most open-source and commercial Web and application servers. Microsoft's site testing tools are part of the IIS Resource Kit.
The best strategy for dealing with Web site performance problems is to plan for them up front. Check your performance testing strategy by running tests when you don't need to -- drills are a great way to work the kinks out of your performance testing and response plan.
Tom Yager is the technical director of the InfoWorld Test Center. Send him e-mail at firstname.lastname@example.org.
THE BOTTOM LINE
Web site performance testing
Executive Summary: Companies typically react to early signs of performance trouble by doing nothing or by overspending on software and consultants to identify the problem. Investigating site performance problems doesn't require a huge budget, just knowledge and planning.
Test Center Perspective: Web site performance problems are inevitable, so plan for them and drill your staffers on the plan. Design low-impact logging into your site's server-side code. Get all you can from free tools such as Analog, VeloMeter, and OpenSTA before spending thousands on commercial performance testing utilities.