Kernel space: timerfd() and system call review

Linux system calls never change. What, never? Hardly ever

One of the fundamental principles of Linux kernel development is that user-space interfaces are set in stone. Once an API has been made available to user space, it must, for all practical purposes, be supported (without breaking applications) indefinitely. There have been times when this rule has been broken, but, even in the areas known for trouble (sysfs, for example), the number of times that the user-space API has been broken has remained relatively small.

Now consider the timerfd() system call, which was added to the 2.6.22 kernel. The purpose of this call is to allow an application to obtain a file descriptor to use with timer events, eliminating the need to use signals. The system call prototype, as found in 2.6.22, is:

 long timerfd(int fd, int clockid, int flags, struct itimerspec *utimer);

If fd is -1, a new timer file descriptor will be created and returned to the application. Otherwise, a timer will be set using the given clockid for the time specified in utimer. The TFD_TIMER_ABSTIME flag can be set to indicate that an absolute timer expiration is needed; otherwise the specified time is relative to the current time. The flags argument can also be used to request a repeating timer.

There is another aspect to the timerfd() API, though: a read on the timer file descriptor will return an integer value saying how many times the timer has fired since the previous read. If no timer expirations have happened, the read() call will block. In the 2.6.22 kernel, the returned value was 32 bits (on all architectures). It has since been decided that a 64-bit value would have been more appropriate, and a patch making that change has been merged for 2.6.23. The 2.6.22.2 stable update also contained the API change.

That is not the full story, though. Michael Kerrisk, while writing manual pages for the new system call, encountered a couple of other shortcomings with the interface. In particular, it is not possible to ask the system for the amount of time remaining on a timer. Other timer-related system calls allow for this sort of query, either as a separate operation or when changing a timer. Michael thought that the timerfd() system call should work similarly to those which came before.

Michael has now posted a patch fixing up the timerfd() interface. With this patch, the system call would now look like this:

long timerfd(int fd, int clockid, int flags, struct itimerspec *utimer,

struct itimerspec *outmr);

The new outmr pointer must be NULL when the file descriptor is first being created. In any other context, it will be used to return the amount of time remaining at any timerfd() call. So user space can query a timer non-destructively by calling timerfd() with a NULL value for utimer. If both timer pointers are non-NULL, the timer will be set to utimer, with its previous value being returned in outmr.

This is, of course, an entirely incompatible change to an API which has already been exported to user space; any code which is using timerfd() now will break if it is merged. By the rules, such a change should not be merged, but it appears that there is a good chance that the rules will be bent this time around. One can argue that, in a real sense, the API has not yet been made available to user space: there has been no glibc release which supports timerfd(). The number of applications using this system call must be quite low - if, in fact, there are any at all. So a change at this point, especially if it can get into 2.6.23, will improve the interface without actually causing any user-space pain.

Join the newsletter!

Error: Please check your email address.

More about Linux

Show Comments