The Kernel Column with Jon Masters – Developing Linux Kernel 3.5
Jon Masters summarizes the latest goings-on in the Linux kernel community, including the release of the 3.5 kernel, and an unfortunately embarrassing incident for Microsoft’s virtualization team.
Linus Torvalds announced the latest 3.5 Linux kernel following a quiet
Release Candidate (RC) 7 that had only trivial fixes and some MIPS
architecture cleanups (which Linus called out as demonstrating a “horrible
track record” of late-breaking code churn). The 3.5 kernel includes new
support for Android-style “wake locks”, userspace probes (“Uprobes”), a
new system call level security filter (“seccomp” – “secure computing”),
metadata checksum support on ext4 filesystems similar to btrfs, and
support for Dan Magenheimer’s “frontswap”. Of the new features, the first
and last are particularly contentious, having taken many attempts to merge.
Frontswap allows Linux virtual machines to leverage the availability of
“transient” or “transcendent” memory – excess memory within a hypervisor that
is not directly addressable by guest Operating Systems (such as Linux) and
might disappear in the future, but is still faster than disk storage – to
store additional copies of data that will (also) be written to slow disks.
This allows a virtual machine to avoid many expensive reads from (virtual)
backing storage in the common case that the transcendent memory is not pulled
from under the virtual machine due to a more pressing need by the hypervisor.
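
The behaviour described above can be pictured with a toy model. This is only a sketch of the idea as this column describes it, not the kernel's frontswap interface: the class names (TransientStore, SwapDevice) and their methods are hypothetical, and the "disk" is just a dictionary that counts slow reads.

```python
class TransientStore:
    """Toy stand-in for hypervisor-owned memory that can vanish at any time."""

    def __init__(self):
        self._pages = {}

    def put(self, key, data):
        self._pages[key] = data

    def get(self, key):
        # Returns None if the page was never stored or has been reclaimed.
        return self._pages.get(key)

    def reclaim(self):
        # The hypervisor takes its memory back; every cached copy is lost.
        self._pages.clear()


class SwapDevice:
    """Hypothetical guest-side view: slow backing store plus a fast, unreliable copy."""

    def __init__(self, transient):
        self.transient = transient
        self.disk = {}
        self.disk_reads = 0

    def write_page(self, key, data):
        self.disk[key] = data          # the authoritative copy always reaches disk
        self.transient.put(key, data)  # the extra copy may disappear later

    def read_page(self, key):
        data = self.transient.get(key)
        if data is not None:
            return data                # common case: no disk read needed
        self.disk_reads += 1           # fallback: expensive read from backing storage
        return self.disk[key]
```

Reading a page back while the transient copy survives costs no disk read; after reclaim() the same read falls through to the slow path and still returns the correct data.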
The new Android-style “wake locks” are a means by which userspace applications
can signal that they need the system to stay awake. The kernel can then (on
certain systems) adopt a default aggressive global suspend policy in which the
system will try to sleep as much as possible, as is the case today in the
existing “suspend blockers” used by the Android kernel (an ongoing fork of the
official Linux kernel). Android’s “suspend blockers” weren’t loved or adopted
by Linux, but the adoption of something similar brings Linux and Android one
step closer to unification, something almost everyone thinks is a good idea.
The feature has some more aggressive options, such as the ability to default
to hibernating a system, but this has only very limited practical use.
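
As a rough illustration of how a userspace program might hold such a lock, here is a minimal sketch. It assumes the interface is exposed as the sysfs files /sys/power/wake_lock and /sys/power/wake_unlock, which take a lock name written as text; those paths are not given in this column, so treat them as an assumption and check the kernel's own documentation. The helper name wake_lock and the power_dir parameter are hypothetical.

```python
import os
from contextlib import contextmanager

# Assumed location of the wake lock files; not stated in the column.
POWER_DIR = "/sys/power"


def _write(path, text):
    with open(path, "w") as f:
        f.write(text)


@contextmanager
def wake_lock(name, power_dir=POWER_DIR):
    """Hold a named wake lock for the duration of a with-block."""
    _write(os.path.join(power_dir, "wake_lock"), name)
    try:
        yield
    finally:
        # Always release, even if the protected work raised an exception.
        _write(os.path.join(power_dir, "wake_unlock"), name)
```

A caller would wrap the work that must not be interrupted in "with wake_lock("my-download"):". Writing to the real files normally requires elevated privileges, so the directory is a parameter to allow the helper to be tried against a scratch directory.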
Linux 3.5 finally gains support for “Uprobes”, something that has been
languishing for about 5 years without ever quite getting merged. At long
last, it will be possible to insert special “probes” into the memory for
userspace programs at runtime that will trigger whenever the program executes
a specific instruction, similar to a dynamic breakpoint. Unlike a breakpoint,
however, uprobes are associated with files and so affect every executable
instance of those files. This allows for some interesting use cases, in
particular around debugging. The perf events subsystem has also been
extended so that one can register for an event to trigger whenever a
particular piece of program code runs, something that could be of interest
both to developers and to system administrators and performance engineers.
With the release of Linux 3.5, the floodgates have opened for new features
to be added to the next kernel. This period of time, known as the merge
window, is the only point in the three-month development cycle for a kernel
when Linus will take disruptive code changes into the kernel, and only then
after they have typically been in the “linux-next” test kernels for the
previous development cycle. New features merged in these early days have
included reworked CPU hotplugging support (with less overhead), lots of
changes to the DeviceTree code for ARM systems, and thousands of small
fixes, such as removal of (expensive) 64-bit division in the GFS2 driver
in favor of bitshifts after a recent public flaming over this by Linus.
In calling the merge window open, Linus added, “if you are a (probably
European) maintainer, and will be gone most of August, I’d rather you
just delay the whole thing until 3.7 rather than send me a merge request
for 3.6 and then effectively disappear for the next few weeks”.
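
The division-to-bitshift cleanup mentioned above relies on a simple identity: for a non-negative integer, dividing by a power of two gives the same result as shifting right by the matching number of bits. The sketch below illustrates only that identity. It assumes the divisor is a power of two (a block size, say), which this column does not spell out for the GFS2 case, and the function names are hypothetical.

```python
def is_power_of_two(n):
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def shift_for(divisor):
    """Return k such that divisor == 2**k, or raise if no such k exists."""
    if not is_power_of_two(divisor):
        raise ValueError("divisor must be a power of two")
    return divisor.bit_length() - 1


def div_by_shift(value, divisor):
    """Divide a non-negative integer by a power of two using a right shift."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return value >> shift_for(divisor)
```

In C on a 32-bit machine the payoff is that a 64-bit shift is a couple of cheap instructions, whereas a 64-bit division is typically a call into a slow helper routine.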
The “0xB16B00B5” incident
Paolo Bonzini posted an email entitled “0xB16B00B5? Really?” in which he
pointed out that the Microsoft HyperV implementation within the Linux kernel
identified Linux guests using the magic number “0xB16B00B5”, which reads as
a crude reference to female anatomy. Linux, like other Hypervisor-enabled
Operating Systems, can provide special features to guest Operating Systems
that run as virtual machines if they identify themselves as being such. In
a similar fashion, Microsoft’s Azure Cloud service can provide special
optimized features to guest Operating Systems when they provide an assigned
identifier that matches a known OS. It was this value that had been somewhat
unfortunately chosen by an anonymous engineer working within Microsoft.
The offending identification value is contained within a constant known as
“HV_LINUX_GUEST_ID_HI”, baked into all recent Linux kernels since support
for Microsoft’s HyperV hypervisor was integrated several releases ago (it
technically graduated from the unsupported “staging” kernel tree into the
officially supported codebase, but the code had been physically present
within the Linux kernel for a longer period). As such, it is a “magic”
value that is hard to change. Typically, such values are assigned by a
vendor (Microsoft) following a certain convention, and are rarely (if ever)
modified because dependencies upon that particular value are also baked
into other software (such as Microsoft’s Azure Cloud management software).
This was the concern that chief Microsoft Linux developer KY Srinivasan
expressed in responding to Paolo Bonzini’s message.
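
To see why such a value is awkward to change, it helps to picture how an identifier like this is typically built and consumed. The sketch below is an assumption-laden illustration, not the kernel's code: only the name HV_LINUX_GUEST_ID_HI and its value come from this column, while the companion low word, the 64-bit layout, and the functions make_guest_id and looks_like_linux are hypothetical.

```python
HV_LINUX_GUEST_ID_HI = 0xB16B00B5   # value reported in the column
HV_LINUX_GUEST_ID_LO = 0x00000000   # assumed low word, for illustration only

MASK32 = 0xFFFFFFFF


def make_guest_id(hi, lo):
    """Combine two 32-bit words into one 64-bit identifier (assumed layout)."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def looks_like_linux(guest_id):
    """Hypothetical host-side check that keys off the high 32 bits."""
    return (guest_id >> 32) & MASK32 == HV_LINUX_GUEST_ID_HI
```

Any management software containing a check like looks_like_linux would stop recognising guests the moment the kernel reported a different high word, which is the kind of dependency that makes a magic value sticky.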
After a few days – and presumably some internal red faces at Microsoft – KY
came back with a message that the existing value “does not conform to the
MSFT guidelines on guest ID. MSFT currently does not specify Linux specific
guidelines. MSFT however has plans to publish Linux specific guidelines”.
Furthermore, he suggested that his corrected value did conform to the (as
yet unpublished) “guidelines”. One can’t help but wonder whether those
guidelines came into being very shortly after someone in senior management
heard about this indiscretion. While it is often seen as amusing to poke fun
at Microsoft, problems such as this one do arise from time to time. The
Linux community is (mostly) good at self-restraint in the face of such
things, largely keeping the response on a professional level. This is
commendable, because at the end of the day we only gain by getting
Microsoft as a company more involved in Free and Open Source software.
Perhaps the most exciting development over the past month (at least to
this admitted ARM fanboy) is the posting of initial support for the new
“AArch64” architected processor state provided by the next generation
of 64-bit ARMv8 processors from ARM. As ARM moves into bigger systems,
and as cellphones and tablets become more complex, the need for a 64-bit
version of the architecture has become increasingly apparent. That new
architecture was announced at the end of last year, but it is only now
that code for Linux support has seen the light of day. Catalin Marinas
posted the initial “aarch64” patch series, which was cleanly designed
and implemented to live separately from the existing arch/arm code. The
only significant feedback to this point was around the name and the use
of a separate directory structure away from the existing ARM code, which
was eventually accepted by most due to the substantive differences from
the existing 32-bit ARM Architecture. The official name may be AArch64,
but many people will expect to see “arm64”, so the Linux name may change.