Once upon a time, a Linux distribution would be installed with a /dev directory fully populated with device files. Most of them represented hardware which would never be present on the installed system, but they needed to be there just in case. Toward the end of this era, it was not uncommon to find systems with around 20,000 special files in /dev, and the number continued to grow. This scheme was unwieldy at best, and the growing number of hotpluggable devices (and devices in general) threatened to make the whole structure collapse under its own weight. Something, clearly, needed to be done.
Stories by Jonathan Corbet
The TALPA malware scanning API was covered in LWN in December, 2007. Several months later, TALPA is back - in the form of a patch set posted by a Red Hat employee. The resulting discussion has certainly not been what the TALPA developers would have hoped for; it is, instead, a good example of how a potentially useful idea can be set back by poor execution and presentation to the kernel community.
Three weeks ago, LWN looked at the renewed interest in dynamic tracing, with an emphasis on SystemTap. Tracing is a perennial presence on end-user wishlists; it remains a handy tool for companies like Sun Microsystems, which wish to show that their offerings (Solaris, for example) are superior to Linux. It is not surprising that there is a lot of interest in tracing implementations for Linux; the main surprise is that, after all this time, Linux still does not have a top-quality answer to DTrace - though, arguably, Linux had a working tracing mechanism long before DTrace made its appearance.
Even the most casual observer of the linux-kernel mailing must have noticed that, in the shadow of the firmware flame war, there is also a heated discussion over the management of security issues. There have also been some attempts to turn this local battle into a multi-list, regional conflict. Finding the right way to deal with security problems is difficult for any project, and the kernel is no exception. Whether this discussion will lead to any changes remains to be seen, but it does at least provide a clear view of where the disagreements are.
One of the fundamental data structures in the networking subsystem is the transmit queue associated with each device. The core networking code will call a driver's hard_start_xmit() function to let the driver know that a packet is ready for transmission; it is then the driver's job to feed that packet into the hardware's transmit queue.
The kernel can't know if you want those low-priority processes to use all the CPU power on the system, or if you want them to pile up on one CPU and save power on the rest. Developers debate the best way to set the system's power rules.
On June 23, HP announced that it was releasing the source for the "Tru64 Advanced Filesystem" (or AdvFS) under version 2 of the GPL. This is, clearly, a large release of code from HP. What is a bit less clear is what the value of this release will be for Linux. In the end, that value is likely to be significant, but it will be probably realized in relatively indirect and difficult-to-measure ways.
Arjan van de Ven's kernel oops report always makes for interesting reading; it is a quick summary of what is making the most kernels crash over the past week. It thus points to where some of the most urgent bugs are to be found. Sometimes, though, this report can raise larger issues as well. Consider the June 16 report, which notes that quite a few kernel crashes were the result of a not-quite-ready wireless update shipped by Fedora. Ingo Molnar was quick to jump on this report with a process-related complaint:
Andrew Morton is well-known in the kernel community for doing a wide variety of different tasks: maintaining the -mm tree for patches that may be on their way to the mainline, reviewing lots of patches, giving presentations about working with the community, and, in general, handling lots of important and visible kernel development chores. Things are changing in the way he does things, though, so we asked him a few questions by email. He responded at length about the -mm tree and how that is changing with the advent of linux-next, kernel quality, and what folks can do to help make the kernel better.
The AIM benchmark attempts to measure system throughput by running a large number of tasks (perhaps thousands of them), each of which is exercising some part of the kernel. Yanmin Zhang reported that his AIM results got about 40% worse under the 2.6.26-rc1 kernel. He took the trouble to bisect the problem; the guilty patch turned out to be the generic semaphores code. Reverting that patch made the performance regression go away - at the cost of restoring over 7,000 lines of old, unlamented code. The thought of bringing back the previous semaphore implementation was enough to inspire a few people to look more deeply at the problem.
All communities develop rituals over time. One of the enduring linux-kernel rituals is the regular heated discussion on development processes and kernel quality. To an outside observer, these events can give the impression that the whole enterprise is about to come crashing down. But the reality is a lot like the New Year celebrations the author was privileged enough to see in Beijing: vast amounts of smoke and noise, but everybody gets back to work as usual the next day.
The kernel developers are generally quite good about responding to security problems. Once a vulnerability in the kernel has been found, a patch comes out in short order; system administrators can then apply the patch (or get a patched kernel from their distributor), reboot the system, and get on with life knowing that the vulnerability has been fixed. It is a system which works pretty well.
The last couple of years have seen a renewed push within the kernel community to avoid regressions. When a patch is found to have broken something that used to work, a fix must be merged or the offending patch will be removed from the kernel. It's a straightforward and logical idea, but there's one little problem: when a kernel series includes over 12,000 changesets (as 2.6.25 does), how does one find the patch which caused the problem? Sometimes it will be obvious, but, for other problems, there are literally thousands of patches which could be the source of the regression. Digging through all of those patches in search of a bug can be a needle-in-the-haystack sort of proposition.
In general, the kernel's memory allocator does not like to fail. So, when kernel code requests memory, the memory management code will work hard to satisfy the request. If this work involves pushing other pages out to swap or removing data from the page cache, so be it. A big exception happens, though, when an atomic allocation (using the GFP_ATOMIC flag) is requested. Code requesting atomic allocations is generally not in a position where it can wait around for a lot of memory housecleaning work; in particular, such code cannot sleep. So if the memory manager is unable to satisfy an atomic allocation with the memory it has in hand, it has no choice except to fail the request.
Linux enthusiasts like to point out just how scalable the system is; Linux runs on everything from pocket-size devices to supercomputers with several thousand processors. What they talk about a little bit less is that, at the high end, the true scalability of the system is limited by the sort of workload which is run. CPU-intensive scientific computing tasks can make good use of very large systems, but database-heavy workloads do not scale nearly as well. There is a lot of interest in making big database systems work better, but it has been a challenging task. Nick Piggin appears to have come up with a logical next step in that direction, though, with a relatively straightforward set of core memory management changes.