The kernel column with Jon Masters
Jon Masters explores the latest happenings in the Linux kernel community as the merge window for Linux 4.5 closes
Linus Torvalds announced the first few release candidate (RC) kernels for what will become Linux 4.5. In his announcement, he noted that 4.5 was shaping up to be “a fairly normal release – neither unusually big or unusually small”. One item he singled out in the 4.5-rc1 announcement was the tremendous work done over the past five years by the 32-bit ARM Linux community towards “multi-platform” kernel support, which has culminated in 4.5. Praise from Linus for the ARM community is a far cry from his outbursts just a few years ago about the state of platform support.
The 4.5 kernel includes many fixes, though not wholesale subsystem changes of the kind we have sometimes seen in previous cycles. Instead, there are incremental features, including support for MADV_FREE, a new flag for the “madvise” system call that gives applications a “shrinker” interface for their own address space. This new feature allows applications to mark some of their virtual memory ranges as “volatile”. The kernel may reclaim such regions at any time when the system is running low on free memory, without breaking the application. Many of us have been hoping for and discussing such a feature (long a part of Windows) for years, so it is good to see it finally land in Linux 4.5.
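As a rough illustration of how an application might use the new flag, here is a minimal Python sketch. It assumes Python 3.8 or later on Linux, where the standard mmap module exposes madvise() and, if the platform defines it, the MADV_FREE constant. The function name demo_madv_free and the "cache" contents are made up for this example. MADV_FREE only applies to private anonymous mappings, so the sketch requests one explicitly.

```python
import mmap


def demo_madv_free(num_pages=4):
    """Mark a private anonymous mapping as lazily freeable.

    Returns "ok" if the advice was accepted and the buffer is still usable,
    or "unsupported" if this Python build or kernel lacks MADV_FREE.
    """
    length = num_pages * mmap.PAGESIZE
    # MADV_FREE is only valid for private anonymous memory.
    buf = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        buf[:5] = b"cache"  # pretend this is regenerable cache data

        advice = getattr(mmap, "MADV_FREE", None)
        if advice is None or not hasattr(buf, "madvise"):
            return "unsupported"
        try:
            buf.madvise(advice)  # kernel may now reclaim these pages
        except OSError:
            return "unsupported"

        # Until the kernel reclaims a page we still see the old data;
        # afterwards we would see zeros. Both outcomes are legal.
        seen = buf[:5]
        if seen not in (b"cache", b"\x00" * 5):
            return "unexpected"

        # Writing to a page again cancels the pending free for that page.
        buf[:5] = b"fresh"
        return "ok" if buf[:5] == b"fresh" else "unexpected"
    finally:
        buf.close()


print(demo_madv_free())
```

The point of the design is that the application never has to ask for the memory back: it simply touches it again, and must be prepared to find its old data gone.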
Other features that are promising in Linux 4.5 include support for direct memory access to persistent memory devices – the newer non-volatile memory technologies being advanced and promoted chiefly by Intel – and the next wave of efforts to restrict root level access from userspace applications to /dev/mem (system RAM). Such accesses, while requiring security privileges from the root user, can nonetheless be dangerous to system stability if not performed with great care.
The advent of large, direct-mapped “persistent memory” devices means that the “mistake surface” for erroneous accesses to /dev/mem by errant applications is significantly increased. To mitigate this, the latest set of patches will prevent access to IO memory regions via /dev/mem unless those regions are marked as “idle” (not associated with a device driver), making it impossible to accidentally write to persistent memory devices without first unbinding the Linux persistent memory driver.
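The rule described here can be sketched as a toy model in Python. This is not the kernel's implementation: the IoRegion type, the devmem_access_allowed function, the addresses and the driver names are all hypothetical, chosen only to show the "idle regions only" check. Addresses that fall outside every listed IO region are simply allowed in this simplified model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IoRegion:
    start: int             # first address in the region
    end: int               # last address in the region (inclusive)
    driver: Optional[str]  # None means "idle": no driver has claimed it


def devmem_access_allowed(regions: List[IoRegion], addr: int, length: int) -> bool:
    """Return False if [addr, addr + length) touches a driver-owned region."""
    last = addr + length - 1
    for region in regions:
        overlaps = addr <= region.end and last >= region.start
        if overlaps and region.driver is not None:
            return False
    return True


# Hypothetical memory map: one persistent memory range bound to a driver,
# and one range that no driver has claimed.
REGIONS = [
    IoRegion(0x1_0000_0000, 0x1_FFFF_FFFF, driver="pmem"),
    IoRegion(0xFED0_0000, 0xFED0_0FFF, driver=None),
]

print(devmem_access_allowed(REGIONS, 0x1_0000_1000, 4096))  # driver-owned
print(devmem_access_allowed(REGIONS, 0xFED0_0000, 16))      # idle
```

In this model, unbinding the driver corresponds to setting a region's driver back to None, after which the same access would be permitted.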
As we go to press, Linus has unleashed “a valentine for everybody” in the form of several more RCs. He recommended that “in between romancing your significant other, go out and test”. If things remain on track, the final 4.5 kernel will come in mid-March, with the 4.6 merge window closing just in time for an Easter Sunday surprise.
The ARM Linux community has grown into one of the largest contingents of kernel developers, pumping out support for a wealth of innovative and exciting devices. At the same time, the advent of the 64-bit ARM architecture (part of ARMv8) has led to many new opportunities for ARM outside of its traditional embedded scope.
Yet, for all the rise of 64-bit computing, there remain a great many 32-bit ARM devices on the market today, and many more are still to come. These ARMv6 and ARMv7 32-bit architectures (as well as, technically, the 32-bit AArch32 state of ARMv8) are supported by the kernel’s arch/arm directory (arch/arm64 contains the core for the newer, 64-bit architectural state). Early 32-bit ARM devices were generally embedded machines for which a dedicated kernel was compiled, complete with a static configuration and many built-in assumptions about the specific platform upon which that kernel would later run.
Over time, ARM devices became more complex and feature rich, and users sought to run mainstream Linux distributions on those devices. But distros had a problem: they are used to shipping one kernel for a given architecture, not one kernel for each different shipping system (and shipping configuration). The latter gives rise to many hundreds of possible kernel builds, far too many for any distribution to maintain.
Combine this with a desire to make Linux platform configuration a runtime option and you will see some of the reasons for the creation of the “DeviceTree” specification (a derivation of OpenFirmware with many additional bindings that were not part of the original POWER/PowerPC specification). DeviceTree is the source of the “dts” and “dtb” files you may see on embedded ARM boards, including the RPi. A DeviceTree describes a 32-bit ARM system in a flexible markup language that the kernel interprets at boot time to determine how it should be configured. Yet a DeviceTree alone won’t guarantee the desired “one kernel to rule them all” single binary.
Enter the ARM multi-platform work. This effort, led by Arnd Bergmann et al, sought to clean up the ARM kernel code such that the many varied built-in assumptions around specific combinations of devices were removed. Instead of individual device drivers and core code assuming that the very fact they were built means that they should be used, the many thousands of changes over the past five years have led to robust, flexible kernel code that can determine at runtime whether it should bind to any specific devices or go unused on a given platform. It is a direct result of this work that you can now get single binary pre-built Linux distributions that support a wide range of ARMv6 and v7 devices.
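The idea of deciding at runtime which drivers to use can be sketched with a toy model in Python. This is only an illustration, not kernel code: the driver table, the bind_drivers function and the two board descriptions are hypothetical, and the “compatible” strings are included purely as examples of the kind of identifier a DeviceTree node carries.

```python
# Every driver is built into the one "kernel"; each lists the hardware
# identifiers it knows how to handle. (Illustrative table only.)
DRIVERS = {
    "pl011-uart": ["arm,pl011"],
    "bcm2835-i2c": ["brcm,bcm2835-i2c"],
    "imx-gpio": ["fsl,imx35-gpio"],
}


def bind_drivers(nodes, drivers=DRIVERS):
    """Match each device node to the first driver that claims it.

    `nodes` maps a node name to its properties; the "compatible" list is
    ordered from most to least specific, and the first match wins.
    Nodes nobody claims are left out, and unmatched drivers go unused.
    """
    bound = {}
    for name, props in nodes.items():
        for compat in props.get("compatible", []):
            match = next(
                (drv for drv, ids in drivers.items() if compat in ids), None
            )
            if match is not None:
                bound[name] = match
                break
    return bound


# Two hypothetical boards described by data rather than by a custom kernel.
BOARD_A = {
    "uart0": {"compatible": ["arm,pl011", "arm,primecell"]},
    "i2c1": {"compatible": ["brcm,bcm2835-i2c"]},
}
BOARD_B = {
    "gpio0": {"compatible": ["fsl,imx35-gpio"]},
    "mystery": {"compatible": ["vendor,unknown-widget"]},
}

print(bind_drivers(BOARD_A))
print(bind_drivers(BOARD_B))
```

The same driver table serves both boards. Only the description handed over at boot differs, which is the property that makes a single binary possible.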
Planning for the upcoming Linux Storage Filesystem and MM Summit (LSF/MM) to be held in Raleigh, North Carolina, USA, from April 18-19 is ongoing. A number of proposals have been made for session tracks via the Linux Kernel Mailing List. One proposal came from core VM developer Rik van Riel, who wanted to gauge potential interest in discussing Virtual Machine (VM) containers.
The growth in container technology (such as Docker, and the broader Open Container Initiative, or OCI) has piqued interest in combining containers with virtual machines to get the best of both worlds – the isolation between OS instances that comes from true virtualisation, and the speed, low overhead, and convenience of Linux application containers.
Lee Jones posted an updated patch building upon a longstanding series of conversations related to “critical clocks” in embedded systems. Such systems expose a lot of platform-specific information to the operating system, including information about all of the clock networks (pulses that drive the individual components on mobile System-on-Chip, or “SoC” processors) connected to devices such as IO controllers.
Above: The Raspberry Pi uses DeviceTree to auto-configure HAT modules, among other things
This information is included in the same DeviceTree structures mentioned under the multi-platform work above. A problem exists, however, on contemporary Linux systems. The kernel will generally attempt to save power by powering down the clocks that are connected to currently unused devices (known as clock gating). However, Linux doesn’t always know which of these clocks drive things that really cannot ever be shut down safely at runtime without crashing the system. The new patches add a CLK_IS_CRITICAL flag that tells Linux to leave certain clocks well alone.
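To make the effect of such a flag concrete, here is a small Python model of clock gating. It is a sketch under stated assumptions rather than the kernel's clock framework: the ClkFlags and Clock types, the gate_unused function and the clock names are hypothetical, and the flag's numeric value is arbitrary. Only the name CLK_IS_CRITICAL comes from the patches.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import List


class ClkFlags(Flag):
    NONE = 0
    IS_CRITICAL = auto()  # stands in for CLK_IS_CRITICAL; value is arbitrary


@dataclass
class Clock:
    name: str
    users: int = 0                   # how many drivers currently hold it
    flags: ClkFlags = ClkFlags.NONE
    enabled: bool = True


def gate_unused(clocks: List[Clock]) -> List[str]:
    """Disable every enabled clock with no users, unless it is critical.

    Returns the names of the clocks that were gated.
    """
    gated = []
    for clk in clocks:
        critical = ClkFlags.IS_CRITICAL in clk.flags
        if clk.enabled and clk.users == 0 and not critical:
            clk.enabled = False
            gated.append(clk.name)
    return gated


# Hypothetical SoC: the interconnect clock has no driver holding it,
# but stopping it would hang the whole system.
clocks = [
    Clock("uart2"),
    Clock("interconnect", flags=ClkFlags.IS_CRITICAL),
    Clock("mmc0", users=1),
]
print(gate_unused(clocks))
```

Without the flag, the interconnect clock would look exactly like the idle UART clock to the power-saving pass, which is the failure the patches are meant to prevent.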
Dave Airlie noted that problems persist with the handling of GPUs integrated into laptops (and other devices) that ship with Windows 10. On these systems, the ACPI-driven methods for powering off the GPU differ from those used in previous OS releases, with the net result that many users are having problems with their hardware failing to power down correctly. Work is ongoing to address this through changes to the kernel graphics code.
On the subject of devices, Linus Walleij noted that Alan Cox is no longer maintaining the “Linux Assigned Numbers Authority” (LANA) that provides unique numbers for character devices used by the Linux kernel. He instead sent a patch which updates the Linux kernel documentation to reflect that this is now a collective document maintained by the overall community.
Finally, Andrey Ponomarenko posted to let everyone know he is working on a “new database of computer hardware configurations running Linux”. He has collected just shy of 5,000 entries so far and may use these to produce a coordinated catalogue of supported devices: http://linux-hardware.org.