Three weeks ago, LWN looked at the renewed interest in dynamic tracing, with an emphasis on SystemTap. Tracing is a perennial presence on end-user wishlists; it remains a handy tool for companies like Sun Microsystems, which wish to show that their offerings (Solaris, for example) are superior to Linux. It is not surprising that there is a lot of interest in tracing implementations for Linux; the main surprise is that, after all this time, Linux still does not have a top-quality answer to DTrace - though, arguably, Linux had a working tracing mechanism long before DTrace made its appearance.
Even a casual reader of the kernel mailing list will have noticed that there are a lot of tracing-related patches in circulation at the moment. There are so many, in fact, that it is hard to keep track of them all. So this article will take a quick look at the code which has been posted in an attempt to make the various options a bit clearer.
SystemTap remains the presumptive Linux tracing solution of choice. It is hampered by a few problems, though, including usability issues, a complete lack of static trace points in the mainline kernel, and no user-space tracing capability. On the usability side, we are seeing a few more kernel developers trying to put SystemTap to work and posting about the problems they are having. If one takes as a working hypothesis the notion that, if kernel hackers cannot make SystemTap work, many other users are likely to encounter difficulties as well, then one might conclude that addressing the reported problems would be a priority for the SystemTap developers.
The SystemTap developers do seem to be interested in these reports, which is a good sign. There are other things happening in the SystemTap arena, including the release of version 0.7 on July 15. This release adds a number of new features and tapsets, and a substantial set of examples as well. Meanwhile, Anup Shan has posted an interesting integration of SystemTap and the fault injection framework, allowing tapsets to control fault injection and trace the results.
James Bottomley has been playing some with the SystemTap code; one result of that work is changes to SystemTap's internal relocation code in an attempt to make it more acceptable for mainline kernel inclusion. There can be no doubt that the out-of-tree nature of much of the SystemTap support code has made it harder for that code to progress, so any improvement which makes it more likely that some of this code will be merged is welcome.
Also by James is this patch implementing a new way to put markers into the kernel. The addition of markers (or static tracepoints) has always been problematic in that many of these markers, by their nature, need to go into some of the hottest code paths in the kernel. To support dynamic tracing, these markers need to be available on production systems, so they must work without creating any significant performance regressions. Quite a bit of work has gone into the static marker code which is in the kernel (but mostly unused) now, but some developers are still uncomfortable with putting them into performance-critical paths.
James's patch addresses these concerns by putting the tracepoints entirely outside of the code paths. Rather than add some sort of marker to the code, these markers just make a note of just where in the code the marker is supposed to be; this note is stored in a separate part of the kernel binary. That information is enough for a run-time tool to patch in an actual jump to a tracing function should somebody want to see the information from that tracepoint. An additional benefit is that these markers do not interfere with any optimizations done by the compiler. Other solutions can insert optimization barriers which, while they do make life easier for the tracing subsystem, also affect the speed of the code even when the trace points are not active.