Zenoss: New dog masters old monitoring tricks
- 30 November, 2007 12:50
Since the dawn of the business network, there has been a need to ensure that the network services provided to the enterprise are alive and responsive. Traditionally, in midsized businesses, this role has been filled by complex, closed source, and fantastically expensive solutions from manufacturers such as BMC, CA, HP, and IBM. And while these extravagant expenses make no customer happy, many users of these packages also complain of their complexity. Enough administrators have spent enough time wrangling with their monitoring systems to make a lot of smart people imagine that there must be a better way.
Enter the latest salvo in this war: Zenoss Core 2, an enterprise-class open source monitoring package that has been built from the ground up to replace complex, closed source monitoring solutions. It certainly won't take the place of every function of these high-end solutions, but for the vast majority of IT shops, Zenoss Core will be deployed faster, be managed by fewer people, and cost a fraction as much as its closed source rivals.
Zenoss' key strength is a unified design that collects many types of information from numerous sources and displays them in an intelligent way. While many monitoring products feel like an amalgamation of several different pieces of software that have been stapled together, Zenoss stands apart with a unified, object-based repository and a tightly integrated set of tools and reports, yet doesn't paint itself into a corner as far as extensibility and future growth are concerned.
The latest Zenoss Core releases, Versions 2.0 and 2.1, sport several new features as well as a raft of bug fixes. New features include integration with Google Maps, providing interactive network topology views, plus network status visualizations and a drag-and-drop dashboard that allows users to assemble the dashboard components that they use most often. Although these features may seem somewhat superfluous, they are actually some of the most sought-after features in a management system. In a real-world scenario, it is sometimes far more helpful to have a large diagram with one system or site lit up in red than it is to receive an e-mail or SMS with an error code in it.
Inside Zenoss Core
Zenoss is built on the open source Zope application server and the Python programming language, which provide a solid, standards-based development platform that is largely responsible for Zenoss' meteoric growth. The class-based relationships between monitored devices, performance data, event logs, and all of their associated organizational containers are well thought out and easy to navigate. The underlying data structures are equally straightforward, making the software easy for developers to extend and grow. These factors fuel a dynamic open source project that is one of the most active on SourceForge.
Zenoss is distributed either in the form of Linux RPM (Red Hat Package Manager) packages or as a prebuilt VMware appliance. It is readily compatible with a wide range of popular Linux distributions as well as Apple's OS X. Distribution in the form of a VMware appliance makes Zenoss easy to evaluate and helps pave the way for shops with no Linux expertise or available dedicated hardware to implement it. The RPM installation is nicely scripted and works well enough that an admin with very limited Linux experience will find it relatively painless to get up and running. Upgrades are also relatively easy to accomplish -- usually only requiring the application of a new RPM and the execution of a data migration script.
After Zenoss is installed and running, the Web management GUI becomes available and you can start adding devices. Depending on the type of device being added, generally all that's needed for full discovery is a hostname and a read-only SNMP community string. The included device modeling software is intelligent enough to figure out whether it is parsing a switch, a server, a UPS, or a number of other basic types of devices, and determine which operating system is running. The vast majority of properly configured SNMP-capable devices will be automatically detected without very much direct intervention. If a modeler has been written to recognize the device (in the case of Dell or HP servers, for instance), a great deal of extra hardware-specific information can also be gleaned from the manufacturer's management agents.
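As a rough illustration of the kind of decision the modeler makes, the sketch below guesses a coarse device type from the text of an SNMP sysDescr value. The keyword rules and the function name are assumptions made up for this example; they are not Zenoss' actual modeling logic, which draws on much more than a description string.

```python
import re

# Hypothetical keyword rules, checked in order. These are illustrative
# assumptions only and do not reflect how Zenoss really models devices.
RULES = [
    ("switch", {"switch", "catalyst"}),
    ("ups", {"ups"}),
    ("server", {"windows", "linux"}),
]


def classify(sys_descr):
    """Return a coarse device type guessed from an SNMP sysDescr string."""
    # Compare whole words so that "ups" does not match inside "groups".
    words = set(re.findall(r"[a-z0-9]+", sys_descr.lower()))
    for device_type, keywords in RULES:
        if words & keywords:
            return device_type
    return "unknown"
```

A device whose description matches none of the rules comes back as "unknown", which corresponds to the case where an administrator has to tell the system what the device is.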
As with any auto-discovery process, there are always devices that won't be detected correctly, but these will generally be the exception rather than the rule -- even in fairly diverse environments. Moreover, it's fairly easy to suggest to Zenoss what types of devices they are and how they should be treated if the device modeler doesn't quite get it to begin with. Only the more unusual devices such as a network-attached Fibre Channel switch or tape backup unit will be entirely unrecognized. Even in these cases, some statistics can still be recorded about the Ethernet interfaces and basic SNMP information about the device so long as it implements a standard SNMP MIB (Management Information Base).
Device support is always a challenge for monitoring packages, and this is one area in which commercial software solutions are generally more capable; after all, they typically have had more capital to invest in developing monitoring plug-ins. However, this increased capability is usually at the expense of user and community extensibility -- not a trade-off Zenoss has made, and thankfully so. Zenoss is betting that a combination of internal commercial development and a great deal of community development will bridge this gap and make Zenoss' built-in device support comparable with that of much larger competitors. Although this is admittedly not the case today, Zenoss Core did correctly recognize and model more than 80 percent of a rather complex corporate network that I used for testing.
Once you have a number of devices in the system, the simple ingenuity of the unified object database becomes clearer. Each device Zenoss monitors becomes linked to many other objects within the database -- most without any user intervention. For example, an HP ProLiant server running Windows Server 2003 with the HP Insight Manager agents installed will be related to automatically created objects that represent every piece of hardware and software within the box, all the way down to individual Microsoft hotfixes that are installed and RAID cards being used. Selecting any of these objects from within the device view will switch your perspective to the new object. You can then see what other devices share the same object. For example, if HP sends you an advisory that dictates a critical firmware upgrade for a specific type of RAID controller, it is very easy to identify all the monitored devices using that card. As such, Zenoss becomes more than a monitoring framework -- it can also handle a broad set of inventory management tasks, simply because it tracks the relationships among all the devices it has collected.
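The value of tracking relationships in both directions can be shown with a minimal sketch. The class, device names, and component labels below are hypothetical and stand in for the idea only; they are not Zenoss' object model or API.

```python
from collections import defaultdict


class Inventory:
    """Toy two-way index between devices and the components they contain."""

    def __init__(self):
        self._devices_by_component = defaultdict(set)
        self._components_by_device = defaultdict(set)

    def link(self, device, component):
        # Record the relationship in both directions at once.
        self._devices_by_component[component].add(device)
        self._components_by_device[device].add(component)

    def devices_with(self, component):
        """Every device that shares the given component, sorted by name."""
        return sorted(self._devices_by_component.get(component, ()))

    def components_of(self, device):
        """Every component recorded for the given device, sorted by name."""
        return sorted(self._components_by_device.get(device, ()))


# Hypothetical data: two servers share a RAID controller model.
inventory = Inventory()
inventory.link("web01", "RAID controller model A")
inventory.link("db01", "RAID controller model A")
inventory.link("db01", "Hotfix 123")
inventory.link("mail01", "RAID controller model B")

print(inventory.devices_with("RAID controller model A"))  # ['db01', 'web01']
```

Answering the firmware advisory question then amounts to a single lookup by component rather than a walk through every device.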
Events and alerts
As soon as a device is added and properly modeled, Zenoss will immediately start collecting performance and event data about the device. Performance data generally include network interface, CPU, memory, and disk statistics, which Zenoss stores in standard RRDtool round-robin databases. All of these performance metrics are displayed in an intuitive graph viewer. Different types of devices will have different degrees of performance data recorded by default. Fortunately, it's fairly easy to define new performance characteristics for monitoring, though it does require some knowledge of SNMP and which MIBs the devices in question will answer to. A built-in SNMP browser integrated with some kind of monitoring wizard would make this task far easier -- perhaps we should look for this in a future release.
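For readers unfamiliar with the round-robin idea, here is a minimal sketch of a fixed-size store in which new samples overwrite the oldest ones. It is a simplification written for this article, not RRDtool's file format or API, and it leaves out the consolidation of older samples that RRDtool performs.

```python
class RoundRobinArchive:
    """Fixed-size sample store: once full, each write replaces the oldest."""

    def __init__(self, slots):
        if slots <= 0:
            raise ValueError("slots must be positive")
        self._values = [None] * slots
        self._next = 0   # index of the slot the next sample will go into
        self._count = 0  # how many slots currently hold real samples

    def update(self, value):
        self._values[self._next] = value
        self._next = (self._next + 1) % len(self._values)
        self._count = min(self._count + 1, len(self._values))

    def samples(self):
        """Return the stored samples, oldest first."""
        if self._count < len(self._values):
            return self._values[:self._count]
        return self._values[self._next:] + self._values[:self._next]


archive = RoundRobinArchive(slots=3)
for reading in (1, 2, 3, 4, 5):
    archive.update(reading)

print(archive.samples())  # [3, 4, 5]
```

The practical consequence is that storage per metric never grows, at the cost of discarding data older than the archive can hold.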
Detailed event data is captured and recorded into a MySQL database back end. Events can be acknowledged and moved to history, and the admin handling them can make notes to provide a historical record of what happened, as well as how and when it was resolved. This provides the data necessary for determining the historical uptime of a device. It also provides a way of identifying recurring events and how they might be correlated. Each Zenoss user can define which types of devices they want to be alerted about, what method should be used to alert them, and when they do or don't want to be alerted, as well as for what types of failures. Alerting is generally done via e-mail, though Zenoss can also generate SNPP (Simple Network Paging Protocol) and TAP (Telelocator Alphanumeric Protocol) pages through the use of a gateway package such as Sendpage if it's required.
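The per-user alerting choices described above (which devices, which failures, when, and by what method) can be pictured as a small rule record. The sketch below is only an assumption about how such a rule might be expressed; the field names, the path-style device classes, and the numeric severities are hypothetical and are not taken from Zenoss.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class AlertRule:
    """One user's alerting preference (all fields are illustrative)."""
    user: str
    device_class: str   # hypothetical path prefix, e.g. "/Server/Windows"
    min_severity: int   # hypothetical scale where higher means worse
    start: time         # beginning of the window in which alerts are wanted
    end: time           # end of that window
    method: str = "email"

    def matches(self, device_class, severity, at):
        """True if an event of this kind, at this time of day, should alert."""
        if self.start <= self.end:
            in_window = self.start <= at <= self.end
        else:
            # The window wraps past midnight, e.g. 22:00 to 06:00.
            in_window = at >= self.start or at <= self.end
        return (
            device_class.startswith(self.device_class)
            and severity >= self.min_severity
            and in_window
        )


day_rule = AlertRule("alice", "/Server/Windows", 4, time(8, 0), time(18, 0))
night_rule = AlertRule("bob", "/Network", 3, time(22, 0), time(6, 0), method="page")
```

With rules like these, an event would be checked against every user's rules and delivered by the method of each rule that matches.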
Events are generated via several different means. Zenoss can automatically monitor a device via ICMP (Internet Control Message Protocol) pings, TCP probes to service ports, process table monitoring via SNMP, and Windows process and service monitoring via WMI (Windows Management Instrumentation). In previous versions of Zenoss, WMI discovery, modeling, and monitoring functionalities were implemented via an outboard service that would be installed on a Windows proxy server and communicate with the Linux-based Zenoss host. This was due to the absence of a Linux-compatible WMI stack, and it could prove to be unwieldy -- especially if the proxy server was experiencing a problem.
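Of the checks listed above, the TCP probe to a service port is the simplest to picture. The following is a generic sketch using Python's standard socket module, not code from Zenoss; the function name and default timeout are assumptions.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # create_connection resolves the name and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A monitor built on this would call something like tcp_probe("mail.example.com", 25) on a schedule and raise an event when the result changes from True to False. A successful connection shows only that the port accepts connections, not that the service behind it is healthy.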
In Zenoss Core 2, this functionality has been integrated natively within the main installation of Zenoss through the use of the WMI implementation introduced in Samba 4.0. Zenoss also supports the use of standard Nagios plug-ins, which immediately provide a huge library of specialized monitoring tools. Setting these up is not a fully automated process, and doing it correctly does require some knowledge of how the plug-ins work.
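Part of the knowledge required is the plug-in convention itself: a Nagios plug-in reports its state through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and prints a line of text, optionally followed by a "|" and performance data. The sketch below shows one way a caller might interpret that; the function names are hypothetical, this is not how Zenoss itself invokes plug-ins, and the parsing is simplified (it does not handle quoted labels that contain spaces).

```python
import subprocess

# Exit codes defined by the Nagios plug-in convention.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_plugin_result(exit_code, output):
    """Split plug-in output into (state, message, metrics)."""
    stripped = output.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    message, _, perfdata = first_line.partition("|")
    metrics = {}
    for item in perfdata.split():
        label, sep, rest = item.partition("=")
        if sep:
            # Keep only the value; thresholds follow after semicolons.
            metrics[label.strip("'")] = rest.split(";")[0]
    # Any exit code outside 0-3 is treated as UNKNOWN.
    return STATES.get(exit_code, "UNKNOWN"), message.strip(), metrics


def run_plugin(argv, timeout=30):
    """Run a plug-in command line (a list of arguments) and interpret it."""
    done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return parse_plugin_result(done.returncode, done.stdout)
```

The path to a plug-in and its arguments vary by installation, so run_plugin takes the full command line from the caller rather than assuming one.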
Tackling the test network
In my testing, I implemented Zenoss in a production network consisting of approximately 30 servers and about as many network devices. The test network was largely Windows-based, but also included a number of Linux and VMS hosts as well as a huge variety of network equipment. I downloaded Zenoss Core 2.1 as an RPM and installed it into a CentOS Linux virtual machine. Within a few minutes, I had the Web interface up and running and manually added a few test devices. By far the most time-consuming part of the initial setup was configuring the servers and network devices in the test environment with the proper SNMP settings. Once that was done, getting them to be properly recognized by Zenoss was easy. If your environment is already configured correctly, you can use an automated network discovery feature to detect and model whole subnets en masse.
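Discovering a whole subnet starts with enumerating the addresses to try. As a small sketch of that first step only, Python's standard ipaddress module can list the usable hosts in a network; the function name is hypothetical, the example network is a documentation range, and nothing here reflects how Zenoss' own discovery is implemented.

```python
import ipaddress


def candidate_hosts(cidr):
    """Return the usable host addresses in a subnet as strings."""
    network = ipaddress.ip_network(cidr, strict=False)
    # hosts() skips the network and broadcast addresses of an IPv4 subnet.
    return [str(address) for address in network.hosts()]


addresses = candidate_hosts("192.0.2.0/29")
print(len(addresses), addresses[0], addresses[-1])  # 6 192.0.2.1 192.0.2.6
```

Each candidate would then be pinged or queried over SNMP, and only the addresses that respond would be handed on for modeling.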
Slightly more difficult was getting the WMI functionality of Zenoss to operate properly. I found the WMI implementation to be sensitive to the case of the device name. In some cases it seemed to work with a capitalized name, and in other cases it would work only with a lowercase name -- regardless of the actual host name capitalization. Additionally, domain controllers required slightly different Windows user name syntax in order to function correctly. These wrinkles were easy to iron out in my test environment, but in a much larger Windows environment, they could take a significant amount of time to work through. Zenoss fixed many other WMI-related problems in previous minor releases, but it looks as though there are still a few left. Overall, though, once the WMI subsystem was configured properly, it worked well.
Once my Windows machine names were sorted out, Zenoss discovered all Windows services running on the servers and made it easy to enable monitoring of the services per device or globally. Even given the hitches, it still took far less time to deploy Zenoss than an agent-based installation would have required.
Not counting the time getting SNMP configured on the servers (the only real client-side task involved in deployment), I spent perhaps two or three hours working with Zenoss to get the bulk of the servers and network devices added, monitored, and alerting with a useful set of service and performance thresholds. Much more tweaking would be required to get Zenoss to report on all the various services I'd want to monitor and to tune the alerting rules properly, but still, a few hours from zero to monitoring is darn fast. Even the basic level of reporting I enabled could take days of tedious work with other tools I've used.
Perhaps the biggest value any open source monitoring package can bring to the table is extensibility and community support. If an organization owns a critical piece of equipment for which it needs extended performance data, above and beyond what Zenoss supports by default, it's not terribly difficult for someone with the right skills to add -- but no one else can take advantage of that work unless there's an easy way to share it. One of the new features found within Zenoss Core 2 is the concept of a ZenPack, a collection of performance and event monitoring settings and commands that can be easily defined and made portable. The addition of this framework will make it far easier for the community to share customized monitoring templates, and can only serve to fuel interest in Zenoss, just as plug-in and templating systems have done so successfully for projects such as Nagios and Cacti.
From a commercial perspective, Zenoss has taken a distinctive approach to building the business. The Zenoss Core software is licensed under the GPL (General Public License) and will always be freely available as such. Zenoss makes money through the sale of support and maintenance contracts as well as training and consulting. Some users with previous Linux-based, open source monitoring experience may not see the need to maintain a support contract, and instead will rely on self-support. Zenoss recognizes this and hopes to draw more users into support contracts through the release of specialized, high-value Certified ZenPacks to comprehensively monitor services such as Microsoft Exchange and SQL Server.
In addition to the commercial support offerings available, Zenoss has also fostered an active user community through the use of mailing lists and forums that can be found on the company's Web site. In short, any organization can find the right balance of commercial or community support depending upon their needs and experience.
With the Version 2.0 and 2.1 releases, Zenoss Core continues to become a stronger product -- one that has the capability to draw a significant following from the open source community as well as from enterprises seeking alternatives to their legacy monitoring and management services. Whether or not Zenoss and open source projects like it (see the sidebar on GroundWork) will eventually break the stranglehold that the aging Goliaths have on network and systems management in the enterprise is yet to be seen. But there's no question that this latest release is a step in the right direction.
The Bottom Line: Zenoss Core 2.1
Zenoss, zenoss.com
Overall score: Very Good, 8.4/10
Device support: 8/10
Cost: Zenoss Core: Free (GPL); Enterprise Support starts at US$66 per device per year for 50 devices. Enterprise Support with access to Certified ZenPacks, Enterprise Report Library, and third-party helpdesk integration starts at US$100 per device per year for 50 devices.
Platforms: Red Hat Enterprise Linux, Fedora Core, Mac OS X, Ubuntu, Suse, CentOS, and Windows (via VMware Appliance)
Bottom Line: Zenoss Core 2.1 open source enterprise management system provides a cheap yet scalable and easy-to-use replacement for legacy management architecture. Initial deployment is relatively easy even for Linux neophytes, and the product is backed by an active community. Some Windows monitoring capabilities require minor workarounds, but are otherwise stable.