FRAMINGHAM (07/24/2000) - In a perfect world, your software vendor's (or your own company's) application programmers would precisely and fully understand the relationships between an application's consumption of computing resources and the available network bandwidth, server CPU speed, server disk speed and client CPU speed. Their grasp of these relationships would be so perfect they could tell you, for a specific set of network components, computers and users working at a given time of day, the exact response time for a particular application transaction or other unit of work.
Unfortunately, the world is far from perfect.
More realistically, your staff might include a skilled troubleshooter who tackles application performance problems by visiting users who complain of sluggish response times, protocol analyzer in hand. The troubleshooter measures response times, determines network utilization, identifies client and server CPU speeds, calculates disk I/O rates, draws pictures showing network links to the servers and gauges the per-unit-of-work capacity of the servers and the network. He or she then makes an educated guess to pinpoint the application's performance bottleneck.
As a more automated alternative for solving application performance problems, several vendors offer software tools that identify bottlenecks, determine capacity and gauge scalability. To help you decide on the best tool for your organization's application performance woes, we invited vendors to submit their performance monitoring tools for this review. Our evaluation focused on software products that inspect network, client and server application behavior to find the causes of snags, stoppages and snarls.
We tested Compuware Corp.'s EcoScope 4.1, EcoTools 7.1 and an EcoScope WAN Adapter Hardware Kit, Lucent Technologies Inc.'s VitalSuite 7.1, NextPoint Networks Inc.'s S3 2.51, Ganymede Software's Pegasus, Response Networks' ResponseCenter 2.5 and NetScout Systems Inc.'s AppScout 2.0.
Lucent's VitalSuite proved to be a superior, but pricey, performance problem solver and capacity-planning aid. Its high accuracy and support for a wide range of applications earn VitalSuite our Blue Ribbon Award for best performance monitoring tool.
The Vital solution
Surprisingly easy to use in light of its complexity, VitalSuite is a well-integrated collection of software modules for monitoring network activity, ensuring WAN link service-level agreement (SLA) compliance and tracking application performance. It accurately and easily pinpointed the bottlenecks we deliberately caused in our Visual Basic client/server application, Web-based search engine and other test software (see How We Did It, page xx).
We especially liked VitalSuite's responsive and intuitive user interface. Its flexible architecture also impressed us with its ability to handle a wide variety of business application environments.
The suite includes VitalNet, VitalAnalysis, VitalHelp, VitalAgent, VitalAgent AutoMon and the Transact toolkit. VitalNet gathers information from SNMP-aware devices and desktop machines on which you've installed the VitalAgent client software, then relays the information to VitalAnalysis and VitalHelp.
VitalAnalysis monitors applications and maintains a historical analysis of system and application performance and trends. For capacity planning and other purposes, it stores a year's worth of data in the included Sybase Inc. database or, optionally, in a Microsoft Corp. SQL Server database that you buy separately. VitalHelp assesses the health of TCP/IP-based applications, posting alerts to the network administrator when it determines the cause of a problem.
AutoMon is a script-driven synthetic transaction engine that generates traffic simulating an application's network activity for measurement purposes. (The alternative to synthetic transactions is a passive approach that watches the wire for an application's actual network traffic. Both approaches are useful.) The Transact toolkit lets programmers define unique business application transactions for VitalSuite to monitor. We found AutoMon and the Transact toolkit simple and straightforward to use, although both required rudimentary programming skills. AutoMon was particularly helpful when we used it to track down a database-related bottleneck in our Web-based test application.
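For readers who haven't worked with synthetic transactions, the core idea is simple: a script issues a known request on a schedule and times it. The Python sketch below illustrates only that idea; it is not AutoMon's scripting language, and `time_synthetic_transaction` and the `fake_transaction` stand-in (which merely sleeps 10 milliseconds) are hypothetical names of our own.

```python
import time
from typing import Callable, List


def time_synthetic_transaction(transaction: Callable[[], None], runs: int = 5) -> List[float]:
    """Run a scripted (synthetic) transaction `runs` times and return the
    elapsed wall-clock time of each run, in seconds."""
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        transaction()  # in practice: a scripted request to the application
        timings.append(time.perf_counter() - start)
    return timings


def fake_transaction() -> None:
    """Hypothetical stand-in for a real scripted request: waits 10 ms."""
    time.sleep(0.01)


if __name__ == "__main__":
    for i, seconds in enumerate(time_synthetic_transaction(fake_transaction, runs=3), 1):
        print(f"run {i}: {seconds * 1000:.1f} ms")
```

A real engine would replace the stand-in with a scripted client request and forward the timings to a central console for baselining and reporting.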
VitalSuite's key report, Heat Chart, made troubleshooting application bottlenecks a breeze with its at-a-glance identification of problems and their causes. Each Heat Chart displays a color-coded matrix that crosses application performance metrics with groups of computing components, termed "resource classes." Each cell corresponds to one resource class and one performance metric, and its color indicates the health of the computing resources that make up that resource class. For example, if VitalSuite detects excessively slow response times for Sybase database transactions, the Heat Chart cell defined by the intersection of "Sybase" and "effective transaction throughput" turns from green to red.
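The color logic behind a display like Heat Chart can be sketched in a few lines of Python. The metric names, the warning and critical thresholds and the three-color scheme below are our own assumptions for illustration; Lucent's actual thresholds and rules may differ.

```python
from typing import Dict, Tuple

# Hypothetical (warning, critical) thresholds per metric; higher values are worse.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "response time (s)": (2.0, 5.0),
    "lost packets (%)": (1.0, 5.0),
}


def cell_color(metric: str, value: float) -> str:
    """Translate one measurement into a health color."""
    warning, critical = THRESHOLDS[metric]
    if value >= critical:
        return "red"
    if value >= warning:
        return "yellow"
    return "green"


def heat_chart(measurements: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], str]:
    """Color every (resource class, metric) cell of the matrix."""
    return {
        (resource_class, metric): cell_color(metric, value)
        for (resource_class, metric), value in measurements.items()
    }


if __name__ == "__main__":
    chart = heat_chart({
        ("Sybase", "response time (s)"): 6.2,        # red
        ("Web servers", "response time (s)"): 0.8,   # green
        ("WAN links", "lost packets (%)"): 1.5,      # yellow
    })
    for (resource_class, metric), color in chart.items():
        print(f"{resource_class:12} {metric:18} {color}")
```

The appeal of the matrix is that a single red cell names both the suspect resource class and the metric that tripped it.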
VitalSuite reports application performance data in three views: Business, Applications and Reports. A configurable preference sets the Business view to either My Vital or My Business, each a different way of looking at performance metrics drawn from application and network statistics. My Vital typically presents a network-centric view of performance, while My Business is usually application-centric. The My Vital personal Web page is highly configurable and password-protected.
The Applications view groups tab-indexed information into categories such as domains, groups, clients and servers. Each tab displays network-related application performance criteria, including lost packets, round-trip delays, availability, response time, throughput and client, network and server delay times.
The Reports view is a high-level menu of available reports, categorized by job description. These descriptions include management, application monitoring, network monitoring and capacity planning. To show application performance trends, VitalSuite's planning report uses a simple trending arrow, pointing up or down, along with the current average, one-month, three-month, six-month and one-year utilizations. It offers only a relatively small number of preconfigured reports, but setting up new reports is easy. Moreover, linking the reports to show increasing levels of detail takes just a few mouse clicks and makes the reports highly effective and useful.
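A planning report's trend arrow can be derived from nothing more than averages over several windows. The sketch below assumes one utilization sample per day and uses a simple rule of our own invention (arrow up when the recent average exceeds the one-year average); VitalSuite's actual trending method may well differ.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Report windows in days, assuming one utilization sample per day.
WINDOWS: Dict[str, int] = {"1 month": 30, "3 months": 90, "6 months": 180, "1 year": 365}


def utilization_summary(daily_util: List[float], current_days: int = 7) -> Tuple[Dict[str, float], str]:
    """Summarize daily utilization percentages (oldest first) as a planning
    report might: a current average, longer-window averages and a trend arrow."""
    averages: Dict[str, float] = {"current": mean(daily_util[-current_days:])}
    for label, days in WINDOWS.items():
        # Slicing past the start simply uses all available history.
        averages[label] = mean(daily_util[-days:])
    # Hypothetical rule: up if recent utilization exceeds the one-year average.
    trend = "up" if averages["current"] > averages["1 year"] else "down"
    return averages, trend


if __name__ == "__main__":
    history = [50.0] * 358 + [70.0] * 7  # a busy final week
    averages, trend = utilization_summary(history)
    print(trend, {label: round(value, 1) for label, value in averages.items()})
```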
The VitalSuite installation process hints at the product's complexity with the many questions it asks and options it offers. Lucent's comprehensive documentation is a joy to use.
Synthesizing a solution
Like VitalSuite's AutoMon engine and Transact toolkit, S3 synthesizes transactions to artificially create conditions it can measure. For almost all Web-based applications (via a Web traffic interception module), as well as several well-known applications such as PeopleSoft, SAP R/3 and Exchange, S3 comes with preconfigured transaction monitors that you customize and launch to find performance problems. To uncover problems with our custom-written vertical-market software, we used S3's API to build our own synthetic transactions. The API supports both user-defined plug-ins, which are custom-written synthetic transactions S3 can track, and ARM-enabled applications based on the joint IBM Corp./Hewlett-Packard Co. Application Response Measurement (ARM) standard.
The process is simple and well-documented, and after less than a day's programming, we had S3 monitoring our Visual Basic insurance rating application. The Web traffic interception module made quick work of monitoring our Web-based search engine's performance troubles. In each test, S3 correctly identified the bottleneck we'd deliberately created. However, VitalSuite's synthetic transaction tools were easier to use, and we found S3's user interface more difficult to navigate and operate than VitalSuite's. S3's menus didn't always provide quick access to the tasks we needed to do, and the product's use of names and IP addresses was inconsistent and confusing.
S3 has two significant strengths: its ability to detect variances from a baseline it builds by repeatedly running the simulated application transactions, and its ability to predict performance (and identify trends) by analyzing historical data. NextPoint calls this baseline data "traffic signatures." Each signature represents application performance as a set of minimum, maximum and average response times.
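A traffic signature as described here (minimum, maximum and average response times) lends itself to a straightforward variance check. In this Python sketch, the 10% tolerance band and the function names are hypothetical; NextPoint's actual variance-detection algorithm isn't disclosed, so treat this as an illustration of the idea rather than a description of S3.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class TrafficSignature:
    """Baseline response-time profile, in seconds."""
    minimum: float
    maximum: float
    average: float


def build_signature(samples: List[float]) -> TrafficSignature:
    """Derive a signature from repeated runs of a simulated transaction."""
    return TrafficSignature(min(samples), max(samples), mean(samples))


def is_variance(sig: TrafficSignature, observed: float, tolerance: float = 0.10) -> bool:
    """True if an observation falls outside the baseline's min/max range,
    widened by a hypothetical tolerance (10% by default)."""
    low = sig.minimum * (1 - tolerance)
    high = sig.maximum * (1 + tolerance)
    return not (low <= observed <= high)


if __name__ == "__main__":
    signature = build_signature([1.0, 1.2, 1.4])
    for observed in (1.3, 2.0):
        print(observed, "variance" if is_variance(signature, observed) else "normal")
```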
S3's reports disclosed exactly the information we needed to solve performance bottlenecks in our application environments. For capacity planning, S3's performance predictions uncovered trends early and accurately in our tests.
Installing S3 is straightforward but tedious, and the online documentation is only adequate.
Scoping the problem
Out of the box, Compuware's EcoScope and EcoTools supported the widest range of applications. The extensive list includes Oracle Corp.'s database, SQL Server, Sybase Adaptive Server, Exchange, Lotus Development Corp.'s Notes, PeopleSoft Inc.'s applications and even a smattering of SNA-based software products. We especially liked the ability to define applications and business functions in EcoScope Interactive Viewer by associating and naming groups of specific software, server, client and network components.
EcoScope and EcoTools focus on application uptime, time to return to service after a failure and response times. Using data from software probes running on Windows NT, NetWare and Unix machines around the network, EcoScope and EcoTools span the Open Systems Interconnection reference model from the network layer through the application layer to decode all the major network protocols. The probes gather information on network utilization, traffic patterns and application response times. EcoScope uses the information to establish baselines of utilization and response times. The probes additionally act as sentries by sending traps to an SNMP management tool when baselines are exceeded.
To monitor application performance, the probes passively monitor actual application network traffic to determine average and maximum response times for user-selectable transactions. In addition, EcoScope monitors Web site usage by measuring traffic volumes and response times for a given URL. This passive approach supplied us with performance data from the running of the actual application, which meant we didn't need to use synthetic transactions to simulate application performance problems. On the other hand, it also meant EcoScope and EcoTools had to be active any time the application might experience a problem.
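Passive monitoring boils down to matching each request seen on the wire with its response and timing the gap. The sketch below assumes captured packets have already been decoded into hypothetical `(timestamp, connection, transaction, kind)` records, with at most one outstanding request per connection; real probes such as EcoScope's decode the protocols themselves and cope with far messier traffic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical decoded capture record:
# (timestamp in seconds, connection id, transaction name, "request" or "response")
Event = Tuple[float, str, str, str]


def response_times(events: List[Event]) -> Dict[str, Tuple[float, float]]:
    """Pair each request with the next response on the same connection and
    return {transaction: (average seconds, maximum seconds)}."""
    pending: Dict[str, Tuple[float, str]] = {}  # connection -> (start time, transaction)
    elapsed: Dict[str, List[float]] = defaultdict(list)
    for ts, conn, name, kind in sorted(events):  # process in time order
        if kind == "request":
            pending[conn] = (ts, name)
        elif kind == "response" and conn in pending:
            start, request_name = pending.pop(conn)
            elapsed[request_name].append(ts - start)
    return {name: (sum(times) / len(times), max(times)) for name, times in elapsed.items()}


if __name__ == "__main__":
    capture: List[Event] = [
        (0.0, "c1", "lookup", "request"), (0.5, "c1", "lookup", "response"),
        (1.0, "c2", "lookup", "request"), (2.5, "c2", "lookup", "response"),
    ]
    print(response_times(capture))
```

Because nothing is measured unless real users generate traffic, this approach depends on the monitor running whenever the problem occurs, which is exactly the trade-off noted above.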
EcoTools' Java-based console can remotely manage only some EcoTools functions; we had to use the native Windows interface to configure the monitoring tasks. On the other hand, while the Windows interface can show only one statistical chart at a time, the Java-based console impressed us by graphing and displaying multiple charts simultaneously.
We were disappointed that EcoScope and EcoTools offered little in the way of historical reports. Although the capacity planning and trending features are excellent, we had to export EcoScope data manually into a relational database so we could use query tools for advanced analysis of our historical data. Compuware suggests using a reporting tool (such as Seagate Software's Crystal Reports) for in-depth study of the EcoScope and EcoTools data.
Installing EcoScope and EcoTools is complicated only by the need to distribute the software probes around the network. Compuware's installer even automatically enables Active Server Pages (ASP) in Microsoft's IIS Web server if ASP isn't already installed. All the detailed technical documentation is online; the only printed documentation accompanying EcoTools is an Installation and Quick Start guidebook.
A Stellar approach
Ganymede's Pegasus impressed us with its support for a great many platforms. Its probes (Ganymede calls them endpoints) run on MVS, various flavors of Unix, Linux, Windows 3.x, Windows 95, 98 and NT, NetWare and OS/2. The endpoints are exceedingly small, which makes them highly unobtrusive on client computers, and their network utilization was likewise well within acceptable limits. We were delighted to find that each endpoint updates itself automatically when a network administrator installs an upgrade. However, we found removing endpoints much more difficult than installing them.
Pegasus takes a bimodal approach to performance monitoring: it can emulate an application's use of the network by timing its transmission of synthetic transactions, or it can simply eavesdrop on existing application-specific traffic. In both modes, Pegasus detects trends and signals existing or impending performance deviations by applying automatically calculated thresholds. Collectively, the deviations make up what Ganymede calls the Pegasus Severity Index. This index didn't always correctly identify the exact source of our deliberately caused performance problems. In one test, we flooded a subnet with excess traffic through a router; Pegasus singled out the right subnet but named the wrong router as the gateway through which the traffic flowed onto the net.
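The review doesn't describe how Pegasus calculates its thresholds or weights its Severity Index, so the following Python sketch substitutes a common technique: a threshold of the baseline mean plus k sample standard deviations (k = 3 is our assumption), with each monitored connection scored by how far its current response time exceeds that threshold. The names `auto_threshold` and `severity_scores` are hypothetical, and the scoring is not Ganymede's formula.

```python
from statistics import mean, stdev
from typing import Dict, List


def auto_threshold(baseline: List[float], k: float = 3.0) -> float:
    """Hypothetical automatic threshold: mean plus k sample standard deviations.
    Needs at least two baseline samples."""
    return mean(baseline) + k * stdev(baseline)


def severity_scores(baselines: Dict[str, List[float]], current: Dict[str, float]) -> Dict[str, float]:
    """Score each monitored connection by the fraction by which its current
    response time exceeds its threshold (0.0 when within the threshold).
    Assumes positive response times, so the threshold is never zero."""
    scores: Dict[str, float] = {}
    for name, history in baselines.items():
        limit = auto_threshold(history)
        scores[name] = max(0.0, (current[name] - limit) / limit)
    return scores


if __name__ == "__main__":
    baselines = {"subnet A": [0.20, 0.22, 0.21], "subnet B": [0.30, 0.31, 0.29]}
    current = {"subnet A": 0.90, "subnet B": 0.30}
    scores = severity_scores(baselines, current)
    print("worst:", max(scores, key=scores.get), scores)
```

The highest score points to the likeliest trouble spot, but like any threshold heuristic it reports where the symptom shows up, which is not always the component that caused it.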
Configuring Pegasus to synthetically mirror a custom application's network usage was particularly straightforward, involving only the editing of application-specific scripts. Pegasus comes with more than 30 application scripts, and the script language is simple, powerful and well-documented. Each script, representing an actual network-aware application, generates synthetic network traffic that Pegasus measures. Pegasus stores the result in a central proprietary repository for baselining and reporting.
The granularity of setting service levels is rather coarse. Because each application script can have only a single service level, the resulting trend data isn't as useful as it otherwise might be. We worked around this limitation by creating multiple application scripts for the same application.
The user interface lacks the flexibility and sophistication of VitalSuite's and S3's.
Those portions of the interface for viewing reports and network connections, called the Monitor Console, are accessible via a Web browser. In fact, the Pegasus reporting engine provides Web-based access to all reports, ranging from "Executive Overview" to in-depth statistics. However, most Pegasus configuration tasks, such as defining endpoints, identifying applications to monitor and designing reports, are available only through the Win32 user interface.
Installing Pegasus is a matter of establishing the server module on Windows NT, setting up the Win32 configuration software on Windows 95, 98, NT or Windows 2000 Professional and deploying endpoints around the network. The documentation is a set of well-written manuals.
The right Response
ResponseCenter is a monitoring tool that generates synthetic customer transactions and benchmarks the resulting network traffic to detect performance problems. It does an excellent job of identifying and measuring common types of traffic, such as Open Database Connectivity (ODBC), SQL Server, HTTP, FTP, Network News Transfer Protocol, POP3 and Simple Mail Transfer Protocol activity, but it is almost useless for tracking custom applications because it lacks the ability to synthesize application-specific network request and response messages.
The ResponseCenter probes were about twice the size of the Pegasus probes and did not support auto-updating. Updating the probes to a new version entails visiting each network segment and installing the new software.
To its credit, ResponseCenter's management console software quickly and correctly discovered the installed probes. The user interface is simple, uncluttered and intuitive. Setting up a ResponseCenter threshold is a matter of invoking the New Test wizard within its Explorer-like interface. The wizard asks for the type of traffic to watch for, the type of agent to collect data from, the identity of a server to monitor and the specific event to use as a trigger (HTTP page receipt, for example). It then produces an application monitoring process you can schedule to run on a regular basis.
The ResponseCenter Win32 console displays graphical reports that can also be distributed as Web pages. The response time graph shows response times in user-selectable intervals, while the candle chart (vertical lines representing response time ranges, with the maximum value a line's topmost point and the minimum value a line's lowest point) displays mean, minimum and maximum response times along with standard deviation information. A Status Grid report highlights multiple monitoring processes concurrently, and the Throughput graph is a 3-D picture of response times and traffic volume for each time interval.
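The statistics behind a candle chart are easy to compute per interval. This Python sketch groups hypothetical `(timestamp, response_time)` samples into fixed-length intervals and returns the minimum, maximum, mean and standard deviation for each; we chose the population standard deviation, and ResponseCenter may compute its figures differently.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def candle_stats(samples: List[Tuple[float, float]], interval: float) -> Dict[int, Dict[str, float]]:
    """Group (timestamp, response_time) samples into fixed-length intervals and
    compute what a candle chart plots: min, max, mean and standard deviation."""
    buckets: Dict[int, List[float]] = {}
    for ts, rt in samples:
        buckets.setdefault(int(ts // interval), []).append(rt)
    return {
        bucket: {"min": min(v), "max": max(v), "mean": mean(v), "stdev": pstdev(v)}
        for bucket, v in sorted(buckets.items())
    }


if __name__ == "__main__":
    # Three samples spanning two one-minute intervals.
    for bucket, stats in candle_stats([(0, 1.0), (10, 3.0), (65, 2.0)], interval=60).items():
        print(bucket, stats)
```

Each interval's min and max give a candle's endpoints, while the mean and standard deviation show where most responses fell.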
Unfortunately, ResponseCenter doesn't offer the array of comprehensive, analysis-oriented reports that VitalSuite and S3 do. In particular, ResponseCenter desperately needs predictive trend analysis reports.
Except for the need to distribute ResponseCenter probes across the network, installation is unremarkable. The 3-ring binder Startup Guide whets your appetite for detailed information but fails to supply it.
Scouting for slow applications
Like ResponseCenter, NetScout Systems' AppScout is a protocol-oriented rather than application-oriented performance monitoring tool. It can accurately and quickly tell you the volumes and generic types (FTP, HTTP, SMTP, SAP, etc.) of traffic crossing your network, but it can't pinpoint bottlenecks for custom-written applications. To use AppScout to solve an application performance problem, you'll need considerable expertise in how the application uses the network as you analyze AppScout's reports. Because AppScout requires NetScout SNMP probes on the various segments of the network, it's an appropriate companion to the NetScout Manager Plus product.
AppScout is based on Application Response Time (ART) MIB technology. It collects network utilization data from the probes you've dispersed on the network and displays the result by protocol type, location, server and client.
Publishing AppScout reports as Web pages is easy to set up, as is protecting those pages with passwords.
We loved its simple but effective user interface, which has an expandable tree view of network nodes on the left and multiple results panes in the main window. The results panes can be an enterprise view, an application view, a server view, a location view or a client view. The enterprise view reveals aggregate network volume for all application traffic, the application view shows server utilization and response times, the server view depicts activity for a specific server and the client view identifies network activity for individual clients.
AppScout's online help is especially comprehensive and informative. The printed documentation, which explains the installation and initial use of AppScout, is brief but clear. Installing AppScout is easy and takes almost no time at all.
Making applications hum
If you live in a perfect world, where your network-savvy application programmers have an intuitive grasp of performance considerations and your skilled troubleshooter only needs a pencil, paper and a protocol analyzer to solve problems, you may not need any of the tools reviewed here. Everyone else should give VitalSuite a try.
Nance, a software developer and consultant for 29 years, is the author of Introduction to Networking and Client/Server LAN Programming. He can be reached at email@example.com. Nance is also a member of the Network World Test Alliance, a cooperative of the premier reviewers in the network industry, each bringing to bear years of practical experience on every review. For more Test Alliance information, including what it takes to become a member, go to www.nwfusion.com/alliance.