Once you have a number of devices in the system, the simple ingenuity of the unified object database becomes clearer. Each device Zenoss monitors becomes linked to many other objects within the database -- most without any user intervention. For example, an HP ProLiant server running Windows Server 2003 with the HP Insight Manager agents installed will be related to automatically created objects that represent every piece of hardware and software within the box, all the way down to individual Microsoft hotfixes that are installed and RAID cards being used. Selecting any of these objects from within the device view will switch your perspective to the new object. You can then see what other devices share the same object. For example, if HP sends you an advisory that dictates a critical firmware upgrade for a specific type of RAID controller, it is very easy to identify all the monitored devices using that card. As such, Zenoss becomes more than just a monitoring framework -- it can just as easily perform a broad set of inventory management tasks just by virtue of the fact that it tracks the relationships among all the devices it has collected.
Events and alerts
As soon as a device is added and properly modeled, Zenoss will immediately start collecting performance and event data about the device. Performance data generally include network interface, CPU, memory, and disk statistics, which Zenoss stores in standard RRDtool round-robin databases. All of these performance metrics are displayed in an intuitive graph viewer. Different types of devices will have different degrees of performance data recorded by default. Fortunately, it's fairly easy to define new performance characteristics for monitoring, though it does require some knowledge of SNMP and which MIBs the devices in question will answer to. A built-in SNMP browser integrated with some kind of monitoring wizard would make this task far easier -- perhaps we should look for this in a future release.
Detailed event data is captured and recorded into a MySQL database back end. Events can be acknowledged and moved to history, and the admin handling them can make notes to provide a historical record of what happened, as well as how and when it was resolved. This provides the data necessary for determining the historical uptime of a device. It also provides a way of identifying recurring events and how they might be correlated. Each Zenoss user can define which types of devices they want to be alerted about, what method should be used to alert them, and when they do or don't want to be alerted, as well as for what types of failures. Alerting is generally done via e-mail, though Zenoss can also generate SNPP (Simple Network Paging Protocol) and TAP (Telelocator Alphanumeric Paging) pages through the use of a gateway package such as Sendpage if it's required.
Events are generated via several different means. Zenoss can automatically monitor a device via ICMP (Internet Control Message Protocol) pings, TCP probes to service ports, process table monitoring via SNMP, and Windows process and service monitoring via WMI (Windows Management Instrumentation). In previous versions of Zenoss, WMI discovery, modeling, and monitoring functionalities were implemented via an outboard service that would be installed on a Windows proxy server and communicate with the Linux-based Zenoss host. This was due to the absence of a Linux-compatible WMI stack, and it could prove to be unwieldy -- especially if the proxy server was experiencing a problem.
In Zenoss Core 2, this functionality has been integrated natively within the main installation of Zenoss through the use of the WMI implementation introduced in Samba 4.0. Zenoss also supports the use of standard Nagios plug-ins, which immediately provide a huge library of specialized monitoring tools. Setting these up is not a fully automated process, and doing it correctly does require some knowledge of how the plug-ins work.
Tackling the test network
In my testing, I implemented Zenoss in a production network consisting of approximately 30 servers and about as many network devices. The test network was largely Windows-based, but also included a number of Linux and VMS hosts as well as a huge variety of network equipment. I downloaded Zenoss Core 2.1 as an RPM and installed it into a CentOS Linux virtual machine. Within a few minutes, I had the Web interface up and running and manually added a few test devices. By far the most time-consuming part of the initial setup was configuring the servers and network devices in the test environment with the proper SNMP settings. Once that was done, getting them to be properly recognized by Zenoss was easy. If your environment is already configured correctly, you can use an automated network discovery feature to detect and model whole subnets en masse.
Slightly more difficult was getting the WMI functionality of Zenoss to operate properly. I found the WMI implementation to be sensitive to the case of the device name. In some cases it seemed to work with a capitalized name, and in other cases it would work only with a lowercase name -- regardless of the actual host name capitalization. Additionally, domain controllers required slightly different Windows user name syntax in order to function correctly. These wrinkles were easy to iron out in my test environment, but in a much larger Windows environment, they could take a significant amount of time to work through. Zenoss fixed many other WMI-related problems in previous minor releases, but it looks as though there are still a few left. But overall, once the WMI subsystem was configured properly, it worked well.