The Kernel Column with Jon Masters
Jon Masters examines the latest goings-on in the Linux kernel community – and the release of the 3.6 kernel, hot off the press
Linus Torvalds announced the release of the 3.6 kernel, saying: “When I did the -rc7 announcement a week ago, I said I might have to do an -rc8, but a week passed, and things have been calm…” All in all, the release was uneventful, meaning that it successfully demonstrated the process working as intended.
As I mentioned last month, the feature that personally excites me the most about the 3.6 kernel (aside from improved Apple MacBook support) is the ‘suspend to both’ option. This feature will allow Linux systems to behave similarly to their Mac brethren. At the request of the user (by closing the lid, for example), the system can be configured to simultaneously write out a suspend image to the swap partition – as is already the case for hibernation – and also to RAM. Therefore the system will resume from RAM if possible, but fall back to using the disk image if the battery dies before the system is resumed. In the absence of bugs, this – in theory – allows for an infinite suspend period.
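For the curious, here is a minimal sketch of how a script might request 'suspend to both'. It assumes the sysfs interface described in the kernel's power management documentation, in which writing suspend to /sys/power/disk selects the hybrid mode and writing disk to /sys/power/state triggers it. The function name and the power_dir parameter are hypothetical, added here so the logic can be tried against a scratch directory rather than a live system. On real hardware it needs root, and most users will let their desktop's power manager do this for them.

```python
import os


def request_suspend_to_both(power_dir="/sys/power"):
    """Ask the kernel to write a hibernation image and then suspend to RAM.

    power_dir is a hypothetical parameter for illustration; on a real
    system it would be /sys/power and the caller would need root.
    """
    # Select the hybrid mode for the next hibernation request.
    with open(os.path.join(power_dir, "disk"), "w") as f:
        f.write("suspend")
    # Trigger it. On a real system this write blocks until resume.
    with open(os.path.join(power_dir, "state"), "w") as f:
        f.write("disk")
```

Pointing power_dir at a temporary directory shows the two writes without putting the machine to sleep.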
Linux 3.6 includes a number of changes to file systems and block devices. For starters, it should be ‘reliable’ to run swap over NFS, meaning that those truly diskless network-booted systems requiring swap can do so over the network. This was always possible to an extent, but now it is expected to work ‘reliably’ (this author uses quotation marks because there are still many things that can go wrong with this approach). In addition, 3.6 includes reworked quota support on both Btrfs and ext4 file systems. The former gains support for quotas on subvolumes (smaller volumes made on the fly by users), and a new send/recv serialisation technology that allows for fast incremental backup of snapshots, for example to keep a remote copy of a volume in sync. Not to be outdone, ext4 removes the special files it had been using for quota support and fully integrates quotas into the file system metadata.
As Linus says, he had considered an RC8. Typical release cycles take the form of the ‘merge window’ (period of time during which disruptive kernel changes are permitted), followed by multiple weeks of release candidates, roughly one per week. It is common to have the final release following the RC7 kernel, for a total of about eight weeks following the merge window. Sometimes an RC8 is necessary (2.6.38), and even more rarely release candidates beyond that (3.1 had an RC10 due to a couple of serious bugs being tracked down).
With the release of 3.6 comes the opening of the merge window for the 3.7 kernel cycle. It’s too early to know exactly what will be submitted for merging in 3.7 (more in the next issue!), but it is near certain that one of those feature requests will be support for the new 64-bit architecture from ARM, known as AArch64, or to the kernel community as ‘arm64’. The code has been through several rounds of public review and is considered ready for merging into the official kernel.org (Linus Torvalds) kernel. This was accompanied by a discussion in which the existing 32-bit maintainer (Russell King) expressed some frustration that he was not to be most active part of the kernel and that a single maintainer simply will not scale. He also noted, “In any case, no one with excellent coding skills such as yours is ever going to lack… opportunities in this community.” Truer words were seldom spoken. Russell has done an excellent job, and will continue to do so.
Jon Mason of Intel posted version 3 of an RFC (Request For Comments) patch series implementing support for non-transparent PCI bridges. A PCI Express non-transparent bridge (NTB) provides a point-to-point, electrically isolated connection between two PCI Express-based systems. Neither side can see the entire memory addressing space of the other and so they operate independently, communicating through defined memory window apertures that are mirrored on both sides of the bridge. Communication can take the form, for example, of doorbell registers used to implement virtual networking, for which support could be implemented within higher level ‘client’ device drivers.
Rusty Russell proposed a more formal approach to virtio standardisation. Virtio is used within the Linux kernel for communication between the Linux kernel KVM hypervisor and guest systems. Up until now, compatibility has largely been assured through the relatively tight-knit upstream kernel community working on virtualisation technologies, and through diligence, but this will not scale in the longer term. As enterprises become increasingly reliant upon virtio, they will want the assurance of a standardised approach, and developers working independently on new features will benefit too. Rusty proposes that OASIS become the standards body and that the approach be as lightweight as can reasonably be achieved. In particular, by adopting PCI capabilities ASAP, there can be infinite feature bits for expressing possible future features that may be added to virtio.
Steven Rostedt pointed out a problem with the hwlat_detector originally written by this author. I wrote the hardware latency detector four or five years ago while working on a lot of real-time systems. In the case of real-time (such as systems used for stock trading) it is important to reduce overall system latencies – times when the system is doing non-real-time work. The preempt-rt kernel patch series adds many new features to the core kernel to reduce software-induced system latencies, but it originally overlooked the possibility that the system hardware vendor was periodically stealing CPU cycles to implement some ‘value add’ feature. On x86 systems in particular, vendors commonly use system management interrupts (SMIs) to place the system CPU(s) into a special mode in which they execute code hidden from the OS.
SMIs are convenient for emulation of hardware resources that are not physically present (but pretend to be so), or for working around hardware bugs by intercepting accesses and running replacement fixup code. They are also notorious for the feature-creep that vendors insert and their length of execution. It is possible to disable SMIs, but it’s not a good idea, because they are often used to implement system thermal and fan control. Users tend to get very upset when their systems physically melt. Therefore, the hwlat_detector aims only to detect the presence of SMIs and generate useful data that can be presented to system vendors in the form of a user demand to fix the problem. To detect SMIs, the detector makes use of the fact that certain system counters usually continue to increment even in the presence of SMIs. Therefore, time that cannot otherwise be accounted for must have been spent in a non-OS SMI handler. The existing detector would sample these internal system counters looking for a gap, but it failed to check whether a gap had been introduced between those samples. The fixed version is even more sensitive, just in time to be obsoleted by a brand new latency detector.
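The sampling idea can be sketched in a few lines. What follows is an illustration only, not the hwlat_detector code: the function name, the clock argument and the ‘inner’/‘outer’ labels are all hypothetical, and the reading of the bug (gaps checked within a pair of counter reads, but not between one pair and the next) is this sketch's assumption about what ‘between those samples’ means.

```python
def detect_gaps(clock, iterations, threshold):
    """Return (kind, size) for each unexplained jump in a free-running counter.

    Each iteration reads the counter twice, back to back.
    'inner' gaps fall between the two reads of one iteration.
    'outer' gaps fall between the end of one iteration and the start
    of the next, the window assumed here to have gone unchecked.
    """
    gaps = []
    last_t2 = None
    for _ in range(iterations):
        t1 = clock()
        t2 = clock()
        # Time lost between the previous pair of reads and this one.
        if last_t2 is not None and t1 - last_t2 > threshold:
            gaps.append(("outer", t1 - last_t2))
        # Time lost between the two reads of this pair.
        if t2 - t1 > threshold:
            gaps.append(("inner", t2 - t1))
        last_t2 = t2
    return gaps
```

Passing time.perf_counter_ns as the clock will run this from user space, but the gaps it reports there are mostly ordinary scheduling and interpreter noise. A detector that wants to blame SMIs has to run where the OS itself cannot interrupt the loop, which is why the real thing lives in the kernel.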
The ‘long term’ supported 3.2 series kernel is still alive. A new release was posted by Ben Hutchings, taking the series up to 3.2.29. Other releases included a number of more recent stable series kernels from Greg Kroah-Hartman, an updated Git utility (with few changes of note), and the return of stable RealTime (RT) tarballs from Steven Rostedt, for those who are following. There was some discussion about the return or otherwise of other content that had formerly been on kernel.org prior to the security compromise of last year. For example, the ‘userweb’ webpage hosting service has still not returned and may never do so. The administrators suggest that the best they can do for now is to offer redirection of older URLs that might be used in MAINTAINERS files or similar. Taking on the security risk of running a general web hosting service is apparently not on the cards for the moment.
Finally this month, comic relief was provided in the form of a series of emails from ‘Rich Lawlman’, who among other things suggested that support for all architectures other than ARM be immediately dropped from the kernel, along with support for access to files other than through defined interfaces (iPhone style). Cruz Julian Bishop asked, “Do you guys get paid (or are part of a bet) when you make suggestions like these, or do you just do it for, as a friend tells me, ‘sh*ts and giggles’?” I think we know the answer already, Cruz.