The containers developers have what would seem to be a relatively straightforward problem: they would like to control access to devices on a per-container basis. Then containers could safely be granted access to specific devices without compromising the overall security of the system - even if a container has a root-capable process which can create new device files. Implementing this feature has been a longer journey than these developers had imagined, though, with the "device whitelist" feature being sent around to different kernel subsystems almost like one of those famous garbage barges from years past. A final resting place may have been found, though, and it may signal a change in how some security decisions are made in the kernel in the future.
The original version of the patch, posted by Pavel Emelyanov, set up a control group for the management of device accessibility within containers. The actual rules - and their enforcement - were stored deep within the device model subsystem. This drew an objection from Greg Kroah-Hartman, who suggested that, instead, this kind of access control should done either with udev or with the Linux security module (LSM) subsystem. Udev does not give the desired degree of control and, apparently, can be problematic for those wanting to run older distributions within containers, so it was not seriously considered. The LSM suggestion was, after some resistance, taken to heart, though.
The result was the device whitelist LSM patch, posted by Serge Hallyn. It was a stacking security module which made changes to a number of hooks. This is where James Morris came in and suggested that, instead, the whitelist should just be added to the existing capabilities security module. Then there would be no need for a separate module and things could be generally simplified.
Moving this logic into LSM means that instead of the cgroups security logic being called from one place in the main kernel (where cgroups lives), it must be called identically from each LSM (none of which are even aware of cgroups), which I think is pretty obviously the wrong solution.
Casey Schaufler also didn't like this idea:
When the next feature comes along are we going to stuff it into capabilities, too? Maybe we'll cram it into audit or CIPSO instead, but how long can this go on? Eventually we need a mechanism that allows more or less general mix-and-match, maybe with a few rules like "don't mix plaids and stripes" to keep things sane or these lesser facilities have no chance. Seems like we're still making LSM too hard to use.
At this point, the complaint was clearly not with just the device whitelist, but with the capabilities module as well. It seems that capabilities are a bit of a poor fit with the LSM idea as a whole. The fact that they exist at all is a bit of a historical artifact; some developers wanted to see them implemented that way to show the flexibility of the LSM interface and to let capabilities be omitted from embedded setups. As it happens, it's still not possible to remove capabilities, and they impose a bit of a cost on all other security modules.
The core problem is this: LSM, fundamentally, is a restrictive mechanism. An LSM hook can deny an action, but it can never empower a process to do something it would not have been allowed to do in the absence of the security module. The decision to disallow "authoritative hooks" was made explicitly back in 2001 as a way of restricting the scope of LSM modules and, hopefully, ensuring that those modules would not themselves become security problems.
But capabilities are an inherently authoritative mechanism - a capability check verifies the existence of a special permission which would otherwise not be there. The device whitelist is the same sort of thing: it grants access which would otherwise be denied. So it fits poorly with the LSM model.