Kernel space: kerneloops, read-mostly, I/O port 80

New kernel changes will help organize bug reports, use cache more efficiently, and prevent crashes on some x86-64 systems.

Kerneloops. Triage is an important part of a kernel developer's job. A project as large and as widely-used as the kernel will always generate more bug reports than can be realistically addressed in the amount of time which is available. So developers must figure out which reports are most deserving of their attention. Sometimes the existence of an irate, paying customer makes this decision easy. Other times, though, it is a matter of making a guess at which bugs are affecting the largest numbers of users. And that often comes down to how many different reports have come in for a given problem.

Of course, counting reports is not the easiest thing to do, especially if they are not all sent to the same place. In an attempt to make this process easier, Arjan van de Ven has announced a new site at kerneloops.org. Arjan has put together some software which scans certain sites and mailing lists for posted kernel oops output; whenever a crash is found, it is stuffed into a database. Then an attempt is made to associate reports with each other based on kernel version and the call trace; from that, a list of the most popular ways to crash can be created. As of this writing, the current fashion for kernel oopses would appear to be in ieee80211_tx() in current development kernels. Some other information is stored with the trace; in particular, it is possible to see what the oldest kernel version associated with the problem is.

This is clearly a useful resource, but there are a couple of problems which make it harder to do the job properly. One is that there is no distinctive marker which indicates the end of an oops listing, so the scripts have a hard time knowing where to stop grabbing information. The other is that multiple reports of the same oops can artificially raise the count for a particular crash. The solution to both problems is to place a marker at the end of the oops output which includes a random UUID generated at system boot time. Patches to this effect are circulating, though getting the random number into the output turns out to be a little harder than one might have expected. So, for 2.6.24, the "random" number may be all zeroes, with the real problem to be solved in 2.6.25.

Read-mostly. Anybody who digs through kernel source for any period of time will notice a number of variables declared in a form like this:

static int __read_mostly ignore_loglevel;

The __read_mostly attribute says that accesses to this variable are usually (but not always) read operations. There were some questions recently about why this annotation is done; the answer is that it's an important optimization, though it may not always be having the effect that developers are hoping for.

As is well described in What every programmer should know about memory, proper use of processor memory caches is crucial for optimal performance. The idea behind __read_mostly is to group together variables which are rarely changed so they can all share cache lines which need not be bounced between processors on multiprocessor systems. As long as nobody changes a __read_mostly variable, it can reside in a shared cache line with other such variables and be present in cache (if needed) on all processors in the system.

The read-mostly attribute generally works well and yields a measurable performance improvement. There are concerns, though, that this feature could be over-used. Andrew Morton expressed it this way:

So... once we've moved all read-mostly variables into __read_mostly, what is left behind in bss? All the write-often variables. All optimally packed together to nicely maximise cacheline sharing.

Combining frequently-written variables into shared cache lines is a good way to maximize the bouncing of those cache lines between processors - which would be bad for performance. So over-aggressive segregation of read-mostly variables to minimize cache line bouncing could have the opposite of the desired effect: it could make the kernel's cache behavior worse.

The better way, says Andrew, would have been to create a "read often" attribute for variables which are frequently used in a read-only mode. That would leave behind the numerous read-rarely variables to serve as padding keeping the write-often variables nicely separated from each other. Thus far, patches to make this change have not been forthcoming.

Join the newsletter!

Or
Error: Please check your email address.

More about LinuxSpeed

Show Comments