The kernel column #88
Last month Linus Torvalds released the final 2.6.34 kernel, following a bumpy few weeks that saw a major virtual memory (the subsystem responsible for memory management in-kernel) glitch, the usual round of regressions, and a power outage that knocked vger.kernel.org – the LKML list server – offline for a couple of days. The latest kernel includes the two new file systems mentioned last month: Ceph (a distributed file system) and LogFS (used especially in embedded devices, potentially including future mobile phones running Google’s Android); support for asynchronous suspend and resume (also mentioned previously); and faster virtualised networking in KVM and other VMs (virtual machines) thanks to the great work that has gone into vhost net support. You can see a very detailed breakdown of the new features in the 2.6.34 kernel on the kernelnewbies.org website.
With the release of 2.6.34 came the opening of the merge window for 2.6.35, during which time we can expect a lot of very interesting patches, which will have gone through the daily linux-next testing development source tree already. The new features in 2.6.34 and beyond are of course interesting, and there will be more to say about these in future issues, but perhaps the most important thing to come out of this most recent development cycle was a tremendous feeling of satisfaction from many core kernel developers that a particularly nasty VM (virtual memory) glitch was tracked down and fixed as quickly as it was. Linus himself was likely particularly happy to take a break from maintaining the tree to do a lot of very heavy thought and reasoning about VM behaviour.
The problem reports first began to show up a matter of weeks ago, in particular from Borislav Petkov, who would find that every time he resumed his laptop from a suspend-to-disk operation, there would be a nasty kernel crash. He didn’t have a lot to go on at first, and it took a few days of instrumenting kernel code with diagnostic messages, and rebooting over and over to figure out roughly what was happening, but not the why of the matter. The what requires that you understand anonymous VMAs (virtual memory areas). VMAs are the kernel’s means to represent regions of allocated memory in tasks (known as processes to users) and they are what you see in /proc/pid/maps (where pid is any normal user process ID for some running application). The anonymous part refers to VMAs that are not representing a memory mapped file – created with a call to a kernel feature such as mmap – but are instead representing pure ‘anonymous’ memory used by the running process for some data structure, stack, etc.
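To make the distinction between file-backed and anonymous VMAs a little more concrete, here is a minimal Python sketch that parses text in the /proc/pid/maps format and separates the two kinds. The sample lines, the Mapping type and the function names are illustrative assumptions rather than anything taken from a real system, and treating only nameless regions plus [heap] and [stack] as anonymous is a simplification.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int   # first address of the region
    end: int     # address just past the end of the region
    perms: str   # e.g. "rw-p"
    path: str    # backing file or pseudo-name; empty if there is none


# Illustrative sample in /proc/<pid>/maps format (made-up values).
SAMPLE = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat
0060a000-0060b000 rw-p 0000a000 08:01 1234 /bin/cat
01c3e000-01c5f000 rw-p 00000000 00:00 0 [heap]
7f2c8a000000-7f2c8a021000 rw-p 00000000 00:00 0
7ffd4b1e0000-7ffd4b201000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps(text: str) -> List[Mapping]:
    """Parse maps-style text into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], path))
    return mappings


def is_anonymous(m: Mapping) -> bool:
    """Simplified rule: no backing path, or the heap/stack pseudo-names."""
    return m.path in ("", "[heap]", "[stack]")


if __name__ == "__main__":
    for m in parse_maps(SAMPLE):
        kind = "anonymous" if is_anonymous(m) else "file-backed"
        size_kib = (m.end - m.start) // 1024
        print(f"{m.start:#x} {m.perms} {size_kib} KiB {kind} {m.path}")
```

On a Linux machine the same function should work on live data, for example by passing it the contents of /proc/self/maps instead of the sample text.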
What was happening during suspend on Borislav’s laptop was that the tracking structures – known as anon_vma_chains – that link related anon_vma structs within the kernel were becoming corrupted such that they were linked in the wrong order. Usually, anon_vma_chains should be wired together from the oldest parent task to the youngest child so that a child process that terminates can release its link in the chain. Instead, they were wired backwards during the resume operation, such that when a child process terminated, the anon_vma its parent was referencing would cease to exist and silent corruption of kernel memory would ensue. The reasons for this corruption stemmed from the new introduction of anon_vma_chains into recent kernels, but the problem case here had been overlooked by everyone involved (virtual memory code is tricky).
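The ordering problem can be illustrated with a deliberately tiny toy model in Python. This is not the kernel’s real data structure or teardown logic: the AnonVma class, the list standing in for a chain, and the rule that an exiting task releases the last link are all assumptions made purely to show why the wiring order matters.

```python
class AnonVma:
    """Toy stand-in for an anon_vma: just an owner name and a freed flag."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.freed = False


def build_chain(parent_chain, own, correct_order=True):
    """Return a child's chain built from its parent's chain.

    Correct order is oldest ancestor first and the task's own entry last.
    The 'backwards' variant puts the task's own entry first instead.
    """
    if correct_order:
        return parent_chain + [own]
    return [own] + parent_chain


def exit_task(chain):
    """Toy teardown: release the last link, assumed to be the task's own."""
    victim = chain[-1]
    victim.freed = True
    return victim


def parent_left_dangling(correct_order: bool) -> bool:
    """Fork one child, let it exit, and report whether the parent's
    anon_vma was freed while the parent still refers to it."""
    parent_av = AnonVma("parent")
    child_av = AnonVma("child")
    parent_chain = [parent_av]
    child_chain = build_chain(parent_chain, child_av, correct_order)
    exit_task(child_chain)
    return parent_av.freed


if __name__ == "__main__":
    print("correct order, parent dangling:", parent_left_dangling(True))
    print("backwards order, parent dangling:", parent_left_dangling(False))
```

In the correctly ordered case the exiting child only releases its own entry. In the backwards case the same teardown step hits the parent’s entry, which is the flavour of silent corruption described above.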
After a few days of rolling up his sleeves and getting dirty, Linus was able to track down the problem, give a reason, and thank Borislav as well as the many others in the very long discussion on the LKML for their assistance.
While we’re on the subject of virtual memory, the issue of kernel stack size raised its ugly head again last month. The Linux kernel uses a fixed size stack for its own purposes – with a separate stack for interrupt handling on some systems – because automatically growing the kernel stack (as would happen with regular user applications without them even noticing) from within the kernel can be very tricky (especially in low-memory situations). This isn’t usually a problem, but today’s complex uses of the kernel often see very deeply nested function calls within the kernel – especially when using layered file systems on top of LVM, iSCSI and so forth – and these can push the limits of the fixed size stack. The risk is greatest when the system is under what is known as memory pressure: it is running low on available RAM and needs to enter a phase of ‘direct reclaim’ during one of these nested function calls. The usual suggestions have been made to grow the kernel stack, and others (including Mel Gorman) have been working on patches to reduce kernel stack use in general, but it seems that this discussion is not yet over.
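A rough back-of-the-envelope sketch in Python shows how a layered I/O path plus direct reclaim could exhaust a fixed stack. Every number here is hypothetical: the 8 KiB stack size and the per-layer frame sizes are assumptions chosen for illustration, not measurements from any real kernel.

```python
# Hypothetical fixed kernel stack size in bytes (an assumption for this sketch).
STACK_SIZE = 8 * 1024

# Hypothetical stack cost of each layer in a deeply nested write path.
IO_PATH = [
    ("syscall entry", 512),
    ("file system write", 1536),
    ("LVM layer", 1024),
    ("iSCSI and networking", 2048),
]

# Hypothetical extra cost if an allocation in that path triggers direct reclaim.
RECLAIM = [
    ("direct reclaim", 1536),
    ("writeback through the file system", 2048),
]


def stack_usage(frames):
    """Total bytes of stack consumed by a list of (name, bytes) frames."""
    return sum(size for _, size in frames)


def fits(frames, stack_size=STACK_SIZE):
    """True if the nested call chain stays within the fixed stack."""
    return stack_usage(frames) <= stack_size


if __name__ == "__main__":
    for label, frames in (("I/O path alone", IO_PATH),
                          ("I/O path plus reclaim", IO_PATH + RECLAIM)):
        used = stack_usage(frames)
        verdict = "fits" if fits(frames) else "OVERFLOWS"
        print(f"{label}: {used} of {STACK_SIZE} bytes, {verdict}")
```

With these made-up figures the I/O path alone leaves headroom, but adding the reclaim frames on top of it goes past the limit, which is why both growing the stack and trimming stack use have been proposed.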
Five years of Git
Christian Ludwig noted that it’s now been five years since Linus Torvalds – frustrated by the fallout from use of the proprietary BitKeeper software – wrote the guts of the Git distributed revision control system in the space of about a week. A lot more effort has gone in since then, and Junio C Hamano (as well as many others) continues to do an excellent job further developing and maintaining Linus’s original invention. Git (now at version 1.7.1) is used by a vast number of different open source projects, and tools like gitweb, GitHub and others make collaboration between developers easier than ever before. Christian Ludwig has also made a fun video showing Git kernel history.
Jon Masters is a Linux kernel hacker who has been working on Linux for almost 14 years, since he first attended university at the age of 13. Jon lives in Cambridge, Massachusetts, and works for a large enterprise Linux vendor. He publishes a daily Linux kernel mailing list summary at kernelpodcast.org.