Tale of two cloud-based management tools
- 07 October, 2013 10:51
Anturis and CloudPassage Halo are complementary products that attack infrastructure monitoring from different directions. Anturis is a cloud-based portal that monitors system connectivity, systems, MySQL databases and websites. CloudPassage Halo monitors operating system instances and the comparatively sticky question of whether their configuration state stays compliant.
Both use portals, which contain the configuration lists of your infrastructure. The portals are designed to be external gatherers of information from instances. The products send alarms when there's an error condition, but neither "fixes" the problems it finds -- that's up to the administration and technical savvy of monitoring personnel.
Neither product talks natively to traditional enterprise systems management or help desk applications, although both offer some programmability. Instead, it's up to the administrator to deal with the alarm condition and with the underlying conditions that led to the identified fault. Both differ from IPMI- or SNMP-based monitoring tools, and blissfully, both monitor Windows and Linux instances (though neither monitors Mac OS X). CloudPassage Halo works with either fixed or cloud infrastructure, while Anturis is better suited to (but not limited to) fixed infrastructure. Both can handle "multi-tenant" deployments.
CloudPassage is all about configuration management, compliance, and keeping watchful eyes on deployed systems. Halo comes in three versions: Basic, NetSec, and Professional. At minimum, it provides multi-cloud firewall automation and security alerting. The NetSec version adds access control (which we feel is mandatory), two-factor authentication, support, and account management. The Professional version, the most costly, is the most comprehensive and feature-complete.
All three versions work on internal server infrastructure, in a cloud, or across multiple clouds. Halo can be deployed as part of cloud stacks, so compliance checks and firewall hardening are applied automatically as instances change or burst. It has several facets, but it's fundamentally a compliance reporting tool: except for firewall hardening, it plays a passive role and doesn't fix the problems it finds.
When used with active configuration control, however, it can help admins produce lean "gold images" in various combinations. These images have a smaller attack surface, are up to date on patches and fixes, and provide a baseline for image integrity. Were Halo also an active controller, i.e., a fixing tool, it would be downright dangerous in its role as a root/service-authorized tool -- easily explosive in hostile hands.
Halo scans on a schedule, comparing instance software and settings against the NIST database of known Common Vulnerabilities and Exposures (CVEs). It pipes this information to the CloudPassage portal for reporting, although it's also possible to query its server-based daemons directly -- if you have its keys.
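To make the CVE comparison concrete, here is a minimal sketch of the general technique of matching installed package versions against advisories that name the first fixed version. It is not Halo's engine. The advisory IDs, package inventory and function names are hypothetical, and the sketch assumes purely numeric dotted version strings. A real scanner would derive its entries from the NIST NVD data.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple:
    """Turn "2.4.6" into (2, 4, 6) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Advisory:
    advisory_id: str  # hypothetical placeholder, not a real CVE identifier
    package: str
    fixed_in: str     # first version that is no longer vulnerable


def find_vulnerable(installed: dict, advisories: list) -> list:
    """Return the IDs of advisories whose package is installed below the fixed version."""
    hits = []
    for adv in advisories:
        version = installed.get(adv.package)
        if version is not None and parse_version(version) < parse_version(adv.fixed_in):
            hits.append(adv.advisory_id)
    return hits


installed = {"openssl": "1.0.1", "httpd": "2.4.6"}  # hypothetical inventory
advisories = [
    Advisory("EXAMPLE-0001", "openssl", "1.0.2"),
    Advisory("EXAMPLE-0002", "httpd", "2.2.0"),
]
print(find_vulnerable(installed, advisories))  # ['EXAMPLE-0001']
```

The tuple comparison is the key design choice. Comparing "2.10" and "2.9" as strings would get the order wrong.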
The keys are static, although an admin can regenerate them. Because a key holder can view integrity data, a static key is, in our estimation, a bad idea. The longer a key lives, and the more widely it's distributed to other staff, admins, or third parties, the greater the potential for problems. Key control is a critical component.
To this end, CloudPassage also has a selectable multifactor authentication feature, GhostPorts. It closes the administrative ports on clients and opens them only when a user presents either a YubiKey (a USB single-key security device) plus the CloudPassage password, or an SMS authentication token sent through GhostPorts. In our opinion, GhostPorts should be mandatory.
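The GhostPorts idea is that an administrative port stays closed by default and opens only briefly, only for the address that proved two factors. The sketch below models that gate in memory. The class, its methods and the 60-second window are hypothetical illustrations, not CloudPassage's API. Verifying the password and the YubiKey or SMS token is assumed to happen elsewhere; the results are passed in as booleans.

```python
import time
from typing import Dict, Optional


class PortGate:
    """Hypothetical stand-in for a GhostPorts-style gate (not CloudPassage's API)."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self._open_until: Dict[str, float] = {}  # source IP -> time the opening expires

    def request_access(self, source_ip: str, password_ok: bool,
                       second_factor_ok: bool, now: Optional[float] = None) -> bool:
        """Open the port for source_ip only when BOTH factors have been verified."""
        if not (password_ok and second_factor_ok):
            return False
        now = time.monotonic() if now is None else now
        self._open_until[source_ip] = now + self.window
        return True

    def is_open_for(self, source_ip: str, now: Optional[float] = None) -> bool:
        """Closed to every other address, and closed again once the window lapses."""
        now = time.monotonic() if now is None else now
        return self._open_until.get(source_ip, float("-inf")) > now
```

The optional `now` argument only makes the behavior easy to demonstrate. For example, `gate.request_access("203.0.113.5", True, True, now=0.0)` leaves the port open to that address at `now=30.0` and closed at `now=61.0`.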
As mentioned, Halo doesn't fix the problems it finds. Some can't be fixed, and others require the discretion, skills, or active acknowledgement of the IT administrator. It's possible to use Puppet (which we tested), Chef, or RightScale app logic to alter or manage image/instance settings and address all of the dirt that Halo finds. We found it better, though, to strip images of vulnerable apps that aren't used and get rid of your sludge before you start -- retrofitting can become gruesome.
CloudPassage claims that Halo's testing/parsing/configuration-problem engine meets the compliance-testing requirements of various regulatory mandates, as well as various international systems benchmarks. As we don't perform these compliance tests, we can't vouch for Halo's claims. We do, however, like its methodology. We believe Halo goes a long way towards bringing configuration and security efforts together for systems administrators at the "street level".
We found it possible to perform a baseline scan of up to 10,000 objects per server. Objects can be files, folders, and even Windows Registry objects, and a SHA-256 hash is created for each. Files larger than a gigabyte can't be scanned, unfortunately. Objects like kernel files and most folders, however, can be scanned for changes in hash and in metadata (with some limitations). Once the baseline is established, changes to important files become the basis of errors and warnings, which are user-defined. Updates, patches, and fixes will, of course, trigger alerts. These must be resolved by establishing new baselines for the subsequent polled scans.
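The baseline-and-compare cycle can be sketched with Python's standard library. This covers file hashes only; Halo also tracks metadata and Registry objects, which this does not. The function names are hypothetical. The 10,000-object and one-gigabyte limits mirror what we observed, with "a gigabyte" taken as 2**30 bytes (an assumption).

```python
import hashlib
import os

MAX_BYTES = 2 ** 30      # "larger than a gigabyte" taken as 2**30 bytes (assumption)
MAX_OBJECTS = 10_000     # per-server object limit we observed


def sha256_of(path: str) -> str:
    """Hash a file in 64 KiB chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record a hash for each regular file that exists and is under the size cap."""
    baseline = {}
    for path in paths:
        if len(baseline) >= MAX_OBJECTS:
            break
        if os.path.isfile(path) and os.path.getsize(path) <= MAX_BYTES:
            baseline[path] = sha256_of(path)
    return baseline


def compare(baseline):
    """Return {path: 'changed' | 'missing'} for every baselined file that drifted."""
    findings = {}
    for path, expected in baseline.items():
        if not os.path.isfile(path):
            findings[path] = "missing"
        elif sha256_of(path) != expected:
            findings[path] = "changed"
    return findings
```

After a legitimate patch, calling `build_baseline(list(baseline))` accepts the new state. This is the re-baselining step described above.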
CloudPassage also supports deploying pre-built server hardening policies in all versions, from the free version to Professional. You start from the stock policies, edit them or add your own, then deploy them.
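A hardening policy is essentially a list of required settings checked against a host's configuration. The sketch below audits an OpenSSH `sshd_config` against a small policy. The policy contents and function names are hypothetical, not CloudPassage stock policies. The parser is deliberately simplified: it ignores `Match` blocks and `keyword=value` syntax, and it treats an unset keyword as a violation rather than applying sshd's defaults. It does follow two sshd conventions: keywords are case-insensitive, and the first value seen wins.

```python
# Hypothetical hardening policy: keyword (lowercased) -> required value.
POLICY = {"permitrootlogin": "no", "passwordauthentication": "no", "x11forwarding": "no"}


def parse_sshd_config(text: str) -> dict:
    """Simplified parse of sshd_config text into {lowercased keyword: value}."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        settings.setdefault(parts[0].lower(), parts[1].strip())  # first value wins
    return settings


def audit(text: str, policy: dict = POLICY) -> list:
    """Return one human-readable violation per policy setting that isn't met."""
    settings = parse_sshd_config(text)
    violations = []
    for key, wanted in policy.items():
        actual = settings.get(key)
        if actual != wanted:
            violations.append(f"{key}: expected {wanted!r}, found {actual!r}")
    return violations
```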
How it works
Using Halo requires establishing an account with CloudPassage, then downloading and configuring the daemon software on either operating system. The daemon runs on a schedule and collects information, which then runs through an analysis engine at CloudPassage. By default, the results show every policy violation it found. And it finds plenty of them. For compliance's sake, we found it best to scrutinize any policy exceptions heavily, because exceptions will be documented.
Access to Halo can be controlled. We could limit portal logins to authorized IPv4 (not IPv6!) addresses or to a CIDR block of addresses. The daemon registration keys can also be revoked and regenerated -- should someone leave an organization, as mentioned. At an assigned interval, the Halo daemons/services check into Halo's portal as a sign of life for their hosts, rather than relying on an ICMP "ping" reply.
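Restricting logins to addresses or CIDR blocks comes down to a membership test. This sketch uses Python's standard `ipaddress` module. The function name and the allowlist entries (drawn from documentation address ranges) are hypothetical. IPv6 clients are rejected outright to mirror the IPv4-only behavior we saw.

```python
import ipaddress


def is_allowed(address: str, allowlist: list) -> bool:
    """True if address is IPv4 and falls inside any allowlisted address or CIDR block."""
    ip = ipaddress.ip_address(address)
    if ip.version != 4:
        return False  # IPv4-only, mirroring the behavior we saw in Halo's portal
    # A bare address such as "198.51.100.7" becomes a /32 network.
    return any(ip in ipaddress.ip_network(entry, strict=False) for entry in allowlist)


ALLOWLIST = ["203.0.113.0/24", "198.51.100.7"]  # hypothetical office block and admin host
```

With this list, `is_allowed("203.0.113.42", ALLOWLIST)` is true, while `"198.51.100.8"` and any IPv6 address are refused.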
Procedurally, after authentication, download, and installation, the daemon/service verifies itself. It then examines its environment as a root system process on the platform where it has been installed and invoked. Data gets sent to the CloudPassage portal. The analysis engine is gruesomely tight, referencing CVSS bulletins and detailed policy descriptions. You can loosen it at will by turning down the sensitivity of the policies, by adding your own policies, or by lowering the CVSS threat severity levels.
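Turning down severity amounts to filtering findings by a rating floor and recording every manual downgrade. The sketch maps CVSS v2 base scores to the NVD rating bands of the era: Low 0.0 to 3.9, Medium 4.0 to 6.9, High 7.0 to 10.0. It then filters findings and logs each admin override, so that the policy alteration stays visible. Function names and the finding format are hypothetical; this is not how Halo stores findings.

```python
SEVERITY_ORDER = ["low", "medium", "high"]


def cvss_v2_severity(score: float) -> str:
    """Map a CVSS v2 base score to the NVD rating bands."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    return "low"


def filter_findings(findings, minimum="medium", overrides=None):
    """Keep findings at or above `minimum`; apply admin severity overrides and log each one.

    findings: list of (finding_id, cvss_score); overrides: {finding_id: severity}.
    """
    overrides = overrides or {}
    floor = SEVERITY_ORDER.index(minimum)
    shown, audit_log = [], []
    for finding_id, score in findings:
        severity = cvss_v2_severity(score)
        if finding_id in overrides:
            new = overrides[finding_id]
            audit_log.append(f"{finding_id}: {severity} -> {new} (policy altered)")
            severity = new
        if SEVERITY_ORDER.index(severity) >= floor:
            shown.append((finding_id, severity))
    return shown, audit_log
```

Returning the audit log alongside the filtered list reflects what we saw: you can quiet a finding, but the alteration is noted.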
Halo has an extensive list of configuration settings and tests that it performs on the instance where it resides. No matter how perfect and clean we thought we were, Halo will rat out every single configuration mistake it can find, and it can find plenty of them. Then the devil is in the details.
Configuration mistakes are then listed and categorized by criticality and external vulnerability, often with a CVE bulletin citing chapter and verse of a known vulnerability. The lists are specific to the operating system version, and the details are gruesome. No matter how clever you believe you are, Halo will out you for your mistaken configuration. It's depressing, and horridly effective.
For some applications, no fixes exist for the CVEs, or for the mere presence of software that triggers configuration vulnerabilities. One can turn down the noise level, but policy alterations are noted. Why wasn't that fixed? Ah -- we see you altered the policy to permit the deadly configuration we're not about to hang you by.
We attempted to build versions of both Linux and Windows that would satisfy Halo's obsessive-compulsive list. It's nigh impossible, so changing the criticality of the cited problem is the key to apparent happiness and to Halo's end goal of compliance. You will be subjected to policy control by Halo, and you will love to hate it.
Anturis offers individual component monitoring for both Windows (Vista and later, plus servers) and Linux (x86/x64) instances. It's a top-side view of major component functions, rather than the comparatively extreme configuration detail rendered by CloudPassage Halo. Once monitoring has started, Anturis watches everything from its portal. It also has the resources to give a "world view" of how your site and its components respond from different parts of the world.
The components can be servers, databases, mail systems, and more. One of the cleverest monitors is an interactive, automated web server monitor. It can be programmed to interact with a website to see whether active pages respond correctly to GET/POST transactions.
The Anturis system contains public and private agents. The main portal is hosted in Germany, but agent hosts located in Dallas, Moscow, Vancouver B.C., Amsterdam and at the HQ in Berlin can be used for tasks such as pinging hosts for response time.
The private agents are daemons that report to the Anturis portal on an administratively defined schedule. Communications from public or private agents are in one of two states: problem, or not-problem (meaning okay). One cannot acknowledge a problem and have it considered remedied.
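Anturis's two-state model can be sketched as a check on when each scheduled agent last reported. The function name and the grace factor of two reporting intervals are hypothetical choices, not Anturis settings. Note that there is deliberately no "acknowledged" state: a problem stays a problem until the agent reports again.

```python
def agent_states(last_seen: dict, interval: float, now: float, grace: float = 2.0) -> dict:
    """Return {agent: 'OK' | 'PROBLEM'} from each agent's last check-in time (seconds).

    An agent counts as healthy if it reported within `grace` reporting intervals.
    """
    deadline = interval * grace
    return {
        agent: "OK" if now - seen <= deadline else "PROBLEM"
        for agent, seen in last_seen.items()
    }
```

With a 30-second interval at time 120, an agent last seen at 100 is OK. One last seen at 10 is a PROBLEM.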
The External Monitors pages at the portal let us view actions as seen by the portal or its agents. Internal monitors are daemons installed in servers and other components that check things like CPU utilization, disk space, and other characteristics. Disk space too low on Server 21? Make it a problem, define how much space is needed, and it will report when disk space runs too low. Such monitoring traps are fairly rudimentary, but they often catch the crux of systems insanity.
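The "define what space is needed" trap is about as simple as monitoring gets. This sketch uses Python's standard `shutil.disk_usage`. The function name and the 5 GiB threshold in the usage note are hypothetical.

```python
import shutil


def check_free_space(path: str, min_free_bytes: int):
    """Return ('OK' | 'PROBLEM', free_bytes) for the file system holding `path`."""
    free = shutil.disk_usage(path).free
    return ("OK" if free >= min_free_bytes else "PROBLEM"), free
```

For example, `check_free_space("/", 5 * 2**30)` reports a problem once the root file system has less than 5 GiB free.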
Monitors can also check for errors in log files. Sadly, no stock examples exist, a decided failing, we judged. However, you can build your own log file monitors via a wizard. You specify the path (logs often live in the same place in a file system, like /var/log in Linux, or in the Event Logs in Windows), then the specifics of the datum. Finally, you choose whether the log is a rolling file or whether the entire file should be read, rather than triggering on the first instance found.
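The rolling-versus-entire-file choice can be sketched as a monitor that remembers how far it has read. In "rolling" mode it reports only lines added since the last pass and starts over if the log shrinks after rotation. In "whole" mode it rereads the entire file every time. The class and mode names are hypothetical, not Anturis's wizard options, and the log is assumed to be UTF-8 text.

```python
import os
import re


class LogMonitor:
    """Hypothetical log-file check: 'rolling' reads only new lines, 'whole' rereads everything."""

    def __init__(self, path: str, pattern: str, mode: str = "rolling"):
        if mode not in ("rolling", "whole"):
            raise ValueError("mode must be 'rolling' or 'whole'")
        self.path = path
        self.regex = re.compile(pattern)
        self.mode = mode
        self.offset = 0  # byte position of the first unread line

    def check(self) -> list:
        """Return every matching line found on this pass."""
        if self.mode == "whole" or os.path.getsize(self.path) < self.offset:
            self.offset = 0  # whole-file mode, or the log shrank (rotated/truncated)
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            data = handle.read()
        complete = data[: data.rfind(b"\n") + 1]  # leave a half-written last line for next time
        self.offset += len(complete)
        text = complete.decode("utf-8", errors="replace")
        return [line for line in text.splitlines() if self.regex.search(line)]
```

Working in bytes keeps the saved offset meaningful across passes. Holding back an unterminated last line avoids matching half a log entry.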
Agents can also test for signs of life in mail servers through external SMTP and POP3 contacts. Internally, the same servers that host mail services can be checked for varying conditions. These checks aren't as sophisticated as Microsoft's System Center Configuration Manager. Rather, they're a quick health check, easily deployed, if a bit superficial.
Server processes are monitored by private agents, but there are very few data points compared with very fine-grained monitoring products like Microsoft's System Center 2012. Monitorable items include: free physical memory; disk space; swap usage; the aforementioned presence of a log file line item; CPU or disk use; an SNMP device GET characteristic; or a custom check generated by a shell command. Also available are MySQL database checks -- more than 15 of them.
The shell command check can look for a numeric range or for textual content in the output of a command or script executed on the targeted host. To check an IP address, for instance, you could have the shell execute ifconfig/ipconfig, then search the output for the correct address as a substring. Or you can check for a range of numbers. Either way, you can specify how often the check runs. If the command or script doesn't return what you expect, an alarm can pop up to remind you, for example, to change tapes while you're at the beach.
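Here is a minimal sketch of such a custom check, using Python's standard `subprocess` module. It runs a command and accepts the result only if the command succeeds and its output contains an expected substring or parses to a number within a range. The function name and parameters are hypothetical, not Anturis's configuration fields. The demo uses the Python interpreter itself as the command so it runs anywhere.

```python
import subprocess
import sys


def run_check(command, expect_substring=None, expect_range=None, timeout=30.0):
    """Run command (a list of arguments) and return 'OK' or 'PROBLEM'.

    expect_substring: text that must appear in stdout.
    expect_range: (low, high) bounds for stdout parsed as a number.
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return "PROBLEM"  # command missing, not executable, or hung
    if result.returncode != 0:
        return "PROBLEM"
    output = result.stdout.strip()
    if expect_substring is not None and expect_substring not in output:
        return "PROBLEM"
    if expect_range is not None:
        low, high = expect_range
        try:
            value = float(output)
        except ValueError:
            return "PROBLEM"  # output was not a single number
        if not low <= value <= high:
            return "PROBLEM"
    return "OK"


if __name__ == "__main__":
    # Portable demo: the Python interpreter stands in for a real host command.
    print(run_check([sys.executable, "-c", "print(42)"], expect_range=(0, 100)))  # OK
    print(run_check([sys.executable, "-c", "print(42)"], expect_substring="99"))  # PROBLEM
```

The IP-address case from the text would look like `run_check(["ifconfig"], expect_substring="192.0.2.10")`, where the address is a hypothetical example.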
The MySQL monitor, which we tested only cursorily, reports failures triggered by percentage rates for both Warning and Error conditions. The attributes checked are: slow queries, slow threads, heavy joins, table lock contention, on-disk temporary tables, key cache misses, query cache misses, query cache pruning, InnoDB buffer pool misses, InnoDB pool waits, log cache waits, thread cache misses, InnoDB buffer pool utilization, and connection usage, all expressed as rates.
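Several of these rates can be computed from MySQL's `SHOW GLOBAL STATUS` counters. The sketch uses commonly cited ratios: slow queries over questions, key reads over key read requests, threads created over connections, on-disk temporary tables over all temporary tables, and InnoDB buffer pool disk reads over read requests. It then applies Warning and Error percentage thresholds. These formulas and thresholds are our assumptions, not Anturis's documented calculations. The counters are cumulative since server start, so a production monitor would usually difference two samples.

```python
# Rate name -> (numerator counter, denominator counter) from SHOW GLOBAL STATUS.
RATIOS = {
    "slow query rate": ("Slow_queries", "Questions"),
    "key cache miss rate": ("Key_reads", "Key_read_requests"),
    "thread cache miss rate": ("Threads_created", "Connections"),
    "on-disk temp table rate": ("Created_tmp_disk_tables", "Created_tmp_tables"),
    "InnoDB buffer pool miss rate": ("Innodb_buffer_pool_reads", "Innodb_buffer_pool_read_requests"),
}


def evaluate(status: dict, warning_pct: float, error_pct: float) -> dict:
    """Return {rate name: (percent, 'OK' | 'WARNING' | 'ERROR')} for rates we can compute.

    status maps counter names to values (strings or numbers, as a client library returns them).
    """
    results = {}
    for name, (num, den) in RATIOS.items():
        denominator = float(status.get(den, 0))
        if denominator == 0:
            continue  # counter missing or zero: the rate is undefined, so skip it
        pct = 100.0 * float(status.get(num, 0)) / denominator
        if pct >= error_pct:
            level = "ERROR"
        elif pct >= warning_pct:
            level = "WARNING"
        else:
            level = "OK"
        results[name] = (round(pct, 2), level)
    return results
```

One global threshold pair keeps the sketch short. Per-rate thresholds, as a real monitor offers, would replace `warning_pct` and `error_pct` with a lookup table.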
We could combine internal monitoring with agent monitoring for any server we chose. It's also possible to link components together (e.g., a server's internal, external, and web checks depending on another server) to set up dependency checks, both among services inside a server and between servers.
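The payoff of linking components is that a failing upstream component can explain its dependents' failures. The sketch below reports only failing components whose own direct dependencies are healthy, so one database outage doesn't bury you in web alarms. The function name and data shapes are hypothetical. The dependency graph is assumed to contain no cycles.

```python
def root_causes(status: dict, depends_on: dict) -> set:
    """Return failing components whose direct dependencies are all healthy.

    status: {component: 'OK' | 'PROBLEM'}; depends_on: {component: [components it needs]}.
    Assumes an acyclic dependency graph.
    """
    failing = {name for name, state in status.items() if state == "PROBLEM"}
    return {
        name for name in failing
        if not any(dep in failing for dep in depends_on.get(name, []))
    }
```

If the web server depends on the database and both are down, only the database is reported.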
Perhaps the best sauce on the platter is dynamic web server examination, optionally with a logon over SSL/HTTPS. This monitor can examine certificates, then go through a series of GETs and POSTs, checking the results of the POSTs and monitoring their response times. We could test this from our own server(s), or check our logic with the next click against response times from one of Anturis's hosts. Once it was working, we could poll our test site to see whether transactions were completing within the desired times.
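A scripted transaction monitor of this kind can be sketched with Python's standard `urllib`. It runs an ordered list of GET and POST steps, keeps cookies between steps so that a logon carries through, checks each response for expected text, and fails any step that exceeds a time budget. The function name and step format are hypothetical, not Anturis's. The sketch relies on urllib's default HTTPS handling, which verifies certificates. Proxies are bypassed on purpose so timings reflect the site itself.

```python
import http.client
import http.cookiejar
import time
import urllib.parse
import urllib.request


def run_transaction(steps, max_seconds, timeout=10.0):
    """Run (url, form_or_None, expected_text) steps in order; return (url, passed, seconds) per step.

    A form dict turns the request into a POST; None means GET.
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # connect directly so timings reflect the site
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()),  # keep logon cookies
    )
    results = []
    for url, form, expected in steps:
        body = urllib.parse.urlencode(form).encode() if form is not None else None
        start = time.perf_counter()
        try:
            with opener.open(url, data=body, timeout=timeout) as response:
                text = response.read().decode("utf-8", errors="replace")
            found = expected in text
        except (OSError, http.client.HTTPException):  # HTTP errors, timeouts, TLS failures
            found = False
        elapsed = time.perf_counter() - start
        results.append((url, found and elapsed <= max_seconds, round(elapsed, 3)))
    return results
```

A logon check might be `run_transaction([(login_url, None, "Sign in"), (login_url, {"user": "demo", "password": "..."}, "Welcome")], max_seconds=2.0)`. Here the URL, form fields and page text are placeholders for your own site.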
If any monitor traps (fails), we could have Anturis send an email or SMS, or make an automated call. It's also possible to assign people to groups and divide turf into locations.
Anturis cuts through a myriad of sophisticated competing products to deliver the basics, albeit with some sophisticated twists. It's a very different kind of beast from CloudPassage, but oddly, the two are opposite edges of the same sword of day-to-day deployment and ops. For small organizations needing a handle on the question "are we up?" it does a very good job. Think of it as UptimeMonitoring-as-a-Service.
Henderson is principal researcher for ExtremeLabs, of Bloomington, Ind. He can be reached at email@example.com.