The Kernel Column with Jon Masters – Linux Kernel 3.7
Jon Masters summarises the latest goings-on in the Linux kernel community, including a look at the features being merged for the upcoming 3.7 release
Linus Torvalds announced the release of the 3.6 kernel, saying that while the release did not contain earth-shattering new architectures or file systems, it did overall represent “solid progress”. We summarised some of the new features that landed in Linux 3.6 last issue. With the release of 3.6 came the traditional opening of the merge window for 3.7. This is the period of time during which Linus is willing to pull potentially disruptive patches (changes) into the kernel. The window typically lasts for two weeks and is followed by a period of stabilisation, during which multiple RC (release candidate) kernels are made available for testing. Linus gave a heads-up that he would be travelling for much of the merge window, but that didn’t seem to pose much of a problem.
Features pulled in during the merge window included a brand new architecture (AArch64, also known as ARMv8 or ‘arm64’ in the kernel community). This is the latest architecture revision from ARM, the company whose designs power about 90 per cent of all cellphones and have shipped in billions of processors so far this year alone. ARM has traditionally been an ‘embedded’ architecture. The billions of ARM-powered processors in use worldwide are typically found within gadgets, such as this author’s ‘fitbit’ personal step counter, or in washing machines and automotive control and entertainment systems. In this context, there are many different levels of ARM processor, from the more deeply embedded simpler cores without the ability to run a full OS, to higher-end multiprocessor cores running Linux on Android smartphones.
ARM is known for its focus on low energy, as well as the licensed nature of the architecture. ARM doesn’t make processors – it licenses its designs for use by the many others who do make processors. Linux has run on suitable 32-bit ARM-based systems for well over a decade, and in recent years has gained popularity as the foundation upon which most Android devices are built. Organisations such as Linaro have helped to drive the development of Linux support for ARM by bringing together a wider community of the companies and ecosystem players involved. Over the same period, a new opportunity has emerged to take advantage of the low-energy DNA that drives ARM by using these processors in server-class systems.
Servers can be 32-bit based, but many workloads require 64-bit support. That’s where the new AArch64 ARM architecture comes in. It brings many new features to ARM, not least of which is 64-bit addressing. The new support within the Linux kernel, contained within arch/arm64 (renamed after community debate around the original ‘aarch64’ choice of directory) enables the core architecture features but does not yet have support for any real processors. Those will come later. The initial support was merged after several months of review on the Linux Kernel Mailing List by upstream maintainers such as Arnd Bergmann, who is responsible for many of the de facto standards required of new architecture code added to Linux.
Another feature pulled into Linux 3.7 is support for ‘supervisor mode access prevention’ (SMAP) on Intel processors. This aims to limit the damage that can be done when kernel code is exploited by passing bad values in from user space (as was done by various example ‘NULL’ pointer kernel exploits several years ago). By setting a special bit in the CPU control registers, the kernel can instruct the hardware to prevent the kernel from accessing user-space (regular user process) memory except under explicit control. Various classes of exploit are therefore removed: although the kernel has the power to disable the protection again, a simple pointer access to user space cannot simultaneously disable SMAP, so exploit code has no straightforward way to use such simple attacks.
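The access rule described above can be sketched as a small user-space model. This is only a toy, not kernel code: the struct and the toy_* function names are invented for illustration. The bit positions (CR4.SMAP as bit 21, EFLAGS.AC as bit 18) and the STAC/CLAC instructions that set and clear the AC flag come from Intel's architecture documentation rather than from this column, and the model ignores finer points such as implicit supervisor accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural bit positions on x86: CR4.SMAP is bit 21, EFLAGS.AC is bit 18. */
#define CR4_SMAP  (UINT64_C(1) << 21)
#define EFLAGS_AC (UINT64_C(1) << 18)

/* Hypothetical toy CPU state; only the two registers relevant to SMAP. */
struct toy_cpu {
    uint64_t cr4;
    uint64_t eflags;
};

/* Model of the STAC instruction: open a window for user-memory access. */
static void toy_stac(struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->eflags |= EFLAGS_AC;
}

/* Model of the CLAC instruction: close the window again. */
static void toy_clac(struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->eflags &= ~EFLAGS_AC;
}

/* Would a supervisor-mode data access to this page be permitted? */
static bool toy_supervisor_access_ok(const struct toy_cpu *cpu, bool page_is_user)
{
    assert(cpu != NULL);
    if (!page_is_user)
        return true;                 /* kernel pages are unaffected by SMAP */
    if (!(cpu->cr4 & CR4_SMAP))
        return true;                 /* SMAP not enabled on this CPU */
    return (cpu->eflags & EFLAGS_AC) != 0;  /* allowed only inside a STAC/CLAC window */
}
```

In this model, a stray pointer dereference into user memory fails unless the code has first run toy_stac(), which is the ‘explicit control’ the hardware insists upon.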
One final feature of particular note in 3.7 is the removal of udev from the critical path of loading some system firmware. The kernel’s built-in firmware loader will now always attempt to load firmware files directly from the file system, without invoking udev. Udev (the user-space device management daemon) typically handles firmware loading, as well as device driver requests and new hardware device detection, by receiving messages from the kernel over a special netlink socket and reacting according to various customisable rules. Unfortunately, recent changes to udev to restructure its approach to parallelised loading of drivers frustrated Linus into having the kernel handle this itself by default. Udev can still handle firmware loading, but the kernel will first attempt to load files itself, from /lib/firmware.
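The ‘try the file system first’ idea can be illustrated with a short user-space sketch. It is not the kernel’s loader: find_firmware() and its parameters are hypothetical names chosen for this example, and the list of directories is left to the caller (the only location named in this column is /lib/firmware).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Try each directory in order. Return the index of the first directory that
 * contains a readable file called `name`, or -1 if none does. On success the
 * full path has been written to `path`.
 */
static int find_firmware(const char *const dirs[], size_t ndirs,
                         const char *name, char *path, size_t pathlen)
{
    assert(dirs != NULL && name != NULL && path != NULL);

    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(path, pathlen, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= pathlen)
            continue;                /* path would not fit; skip this directory */

        FILE *f = fopen(path, "rb");
        if (f != NULL) {
            fclose(f);
            return (int)i;           /* found: a caller would read the image here */
        }
    }
    return -1;                       /* not found: fall back to another mechanism */
}
```

A caller might pass a one-entry list containing "/lib/firmware" and treat a return value of -1 as the cue to fall back to a helper such as udev.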
Alignment faults in 3.6
Linux supports many different architectures, some of which behave quite differently from the x86 Linus originally used way back in 1991. In particular, many modern RISC architectures embrace the notion of simple being better by having limited support for ‘misaligned’ memory accesses. On these architectures (such as ARM), it is not possible to directly perform an operation on a memory location. Instead, the architecture behaves in a ‘load store’ fashion such that every value must first be loaded from memory into a register, then manipulated, and the result then stored back.
Alignment is a natural property of all data types. A 4-byte integer value, the default on many systems, has a natural alignment on a 4-byte memory boundary. So, for example, attempting to load or store such a value at an odd-numbered memory address would be in clear violation of the natural alignment requirement of this type. Many modern architectures hide such alignment issues by having the hardware perform expensive multi-load operations behind the scenes under such circumstances, while others will generate an alignment fault and insist that the programmer (or the compiler) do the right thing and fix the underlying code. ARM is one architecture that started life with very strict requirements and has relaxed them more recently.
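As a minimal sketch of what ‘doing the right thing’ can look like in portable C, the fragment below checks whether an address meets a given alignment and reads a 32-bit value through memcpy(), which carries no alignment requirement. The function names are illustrative only and do not come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if `p` sits on a multiple of `align` bytes; `align` must be a power of two. */
static bool is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * Read a 32-bit value from any address. Because memcpy() works byte by byte
 * as far as the language is concerned, the compiler is free to emit whatever
 * access sequence is safe on the target, aligned or not.
 */
static uint32_t load_u32_any(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Casting a misaligned pointer to uint32_t * and dereferencing it is the pattern that tends to cause trouble; going through memcpy() leaves the choice of instructions to the compiler.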
Modern ARM systems do include (limited) support for unaligned data access. Such accesses are more expensive (in terms of performance overhead), but they are handled behind the scenes. There are some circumstances under which this is not possible due to specific instructions being used. In such cases, the hardware will generate an ‘alignment fault’, which will be handled by the kernel. The kernel typically performs a more expensive version of the intended load or store ‘transparently’, optionally recording a warning about the inefficient waste of processor resources. During this operation, it may make a call to the kernel’s schedule() function to give another process time to run. Unfortunately, there are some situations wherein the scheduler must not be called. These include certain critical ‘atomic’ parts of the kernel itself. In the case of Linux 3.6, it appeared as if this requirement was being violated, with warnings of ‘scheduling while atomic’ being emitted.
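To give a feel for the ‘more expensive version’ of a load or store, here is a hedged user-space sketch that rebuilds a 32-bit value from four single-byte accesses, each of which is always aligned. It assumes little-endian byte order, the names are hypothetical, and the counter merely stands in for the optional warning; it is not the kernel’s alignment handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy statistic standing in for the kernel's optional fix-up warning. */
static unsigned long toy_fixup_count;

/* Emulate a 32-bit little-endian load using four byte-sized loads. */
static uint32_t emulate_load_u32_le(const uint8_t *addr)
{
    assert(addr != NULL);
    toy_fixup_count++;
    return  (uint32_t)addr[0]
         | ((uint32_t)addr[1] << 8)
         | ((uint32_t)addr[2] << 16)
         | ((uint32_t)addr[3] << 24);
}

/* Emulate a 32-bit little-endian store using four byte-sized stores. */
static void emulate_store_u32_le(uint8_t *addr, uint32_t value)
{
    assert(addr != NULL);
    toy_fixup_count++;
    addr[0] = (uint8_t)(value & 0xFF);
    addr[1] = (uint8_t)((value >> 8) & 0xFF);
    addr[2] = (uint8_t)((value >> 16) & 0xFF);
    addr[3] = (uint8_t)((value >> 24) & 0xFF);
}
```

Four accesses plus shifting and masking in place of a single instruction is where the extra cost comes from, which is why drivers are better off keeping their data aligned in the first place.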
It ultimately turned out that certain device drivers were exposing a problem in the alignment handler. By accessing misaligned IP header fragments, the driver concerned was triggering an alignment exception within an atomic-critical section of kernel code, which was then resulting in the scheduler being called from within the alignment handler. Although the driver was later fixed to improve performance (by using only aligned data), the problem with the alignment handler itself did require fixing to prevent unwanted system crashes. A patch has been successfully tested and will be merged.
Finally this month, there has been an ongoing discussion around ext4 file system corruption that can occur under very specific circumstances involving a system crash during an update to an ext4 file system running with journal checksums turned on. This is not the default, and it is a rare situation, but all users are advised to update their systems.