The kernel column #95 by Jon Masters
Jon Masters talks about features in the 2.6.37 Linux kernel and describes debugging a kernel problem using the Git bisection feature…
We’re now free of the Big Kernel Lock (in many configurations – it’s a config option that will hide other not-yet-compatible options when used) and I have been running systems BKL-free for some time now. Arnd Bergmann and others have done an excellent job of ridding us of this last vestige of truly ancient non-scalable Linux and, unless you need a V4L (Video-4-Linux – TV tuner, webcam etc) device, you can probably run BKL-free today too. It is hoped that V4L will be fixed soon, maybe in time for 2.6.37. You probably won’t notice a huge performance benefit from running without the BKL unless you happen to have something more high end than a desktop, but it’s still pretty cool to know that you could get higher performance if only you could afford a system with dozens of CPUs to take advantage of it.
Scalability is great on the high end, but another more impressive feature for those working with more down-to-earth systems is (at last) near-complete support for running as a Xen ‘Dom0’ (or host kernel) under the Xen hypervisor. For years, the support for Xen host kernels lived in patches separate from the mainline kernel and had to be added separately. This constrained which Xen kernels could be used and made life more difficult for those using it in their virtualisation setups. It’s not (yet) possible to fully run an official Linus kernel without any patches as a Dom0 host kernel, but the remaining extra driver pieces and other work should be complete in time for 2.6.38. This incidentally prompted some folks in the Fedora kernel community to wonder about scheduling. They would like this to land in Fedora 15 (for those who want to use Xen instead of hardware-based KVM virtualisation) but are unwilling to accept large patches for things not yet in the official kernel (especially given historical experiences with having to maintain large patches for Xen). Only time can tell what will happen there.
Virtualisation also came up in the context of 3D graphics this month. Specifically, the idea is being floated that it might be time to implement a fully virtualised GPU (graphics processor) that provides a hybrid of OpenGL and DirectX to virtual machines, safely translating their operations on it into operations on the host GPU. Right now, there are some options for virtual machines to do 3D graphics, but they can involve more direct use of the host GPU, and this is not always safe against abuse, since graphics chips were never designed with such use cases in mind. Most of the current options involve hacks like giving each virtual machine its own GL context and trying hard not to allow guests to interfere with each other. But until the problem is solved nicely, various interim solutions such as virtio-gl do make sharing the host GPU a little easier to pull off in some kind of a clean fashion.
Google’s Android ‘suspend blockers’ code came up in discussion several times this month. These are a feature unique to Android wherein the Android kernel will attempt to suspend aggressively (in order to save as much power as possible on the mobile embedded devices for which it was primarily designed) and driver code must explicitly request not to be suspended when necessary – for example, when in the middle of a phone call. The existing heavyweight Google solution was not liked by upstream developers, but several new alternative suggestions have started to bubble up, and we might yet see a solution palatable to enough developers to produce a generic Linux mechanism that Android can also use in the future. There was more fun from Google in the way of nifty kernel features – look for coverage in my write-up of the 2010 Linux Plumbers Conference, also in this issue.
Finally, it looks as if those of us in the embedded Linux space are headed in a similar direction to those in the enterprise Linux space by declaring certain releases of the upstream kernel to be ‘long term supported’ (LTS) for use in various projects (and products). Sony, Google, and the MeeGo and Linaro projects have gotten behind an effort spearheaded by Tim Bird, a Sony employee and former head of the Consumer Electronics Linux Forum (now part of the Linux Foundation), who terms such supported versions ‘embedded flag’ releases. 2.6.35 is the first such kernel release.
Debugging regressions in the Intel graphics drivers
As one who has an interest in kernel development, I recently acquired a netbook intended solely to run the latest upstream kernels and distributions. The Eee PC model 1015PEM features a slightly more exciting revision to the Intel Atom processor (the Atom 470), Intel graphics and the SSD I shoehorned into the ‘non-user serviceable’ system after much yanking with screwdrivers. It runs Fedora Rawhide, the unstable work-in-progress distribution, and whatever kernel Linus (Torvalds) has just finished shoving out the door. In spite of all that apparent instability, it really doesn’t break horribly all that often.
Recently, I installed the 2.6.37-rc1 kernel on the netbook, only to find that the display was shifted about an inch (2.5cm) down, such that the top had a blank bar and the bottom was cut off. The display is a TFT panel attached to an Intel i915 within the Atom processor over an LVDS (low-voltage differential signalling) link. Displays built this century all implement the EDID specification, which is a protocol used by graphics chipsets like the i915 to read display geometry and colour information from a TFT panel or other screen device. EDID (one day maybe replaced by DisplayID) traditionally used several spare lines on a VGA connector to do I2C signalling and carried information like the ‘dotclock’ (maximum scan rate of a CRT) and manufacturer detail.
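As a rough illustration of what the driver is dealing with, here is a minimal sketch that sanity-checks a raw EDID blob. It relies only on two properties of the EDID base block: it starts with the fixed header bytes 00 FF FF FF FF FF FF 00, and its 128 bytes sum to zero modulo 256. The function name `edid_check` is made up for this example, and the sysfs path mentioned in the comment is an assumption, since connector names vary from machine to machine.

```shell
#!/bin/sh
# Sanity-check a raw EDID base block (hypothetical helper, for illustration).
# Example path (an assumption, varies by system):
#   edid_check /sys/class/drm/card0-LVDS-1/edid

edid_check() {
    file=$1

    # The base block always begins with 00 FF FF FF FF FF FF 00.
    header=$(od -An -v -tx1 -N8 "$file" | tr -d ' \n')
    if [ "$header" != "00ffffffffffff00" ]; then
        echo "bad header"
        return 1
    fi

    # Sum the first 128 bytes; a valid block sums to 0 modulo 256.
    sum=0
    count=0
    for byte in $(od -An -v -tu1 -N128 "$file"); do
        sum=$(( (sum + byte) % 256 ))
        count=$((count + 1))
    done
    if [ "$count" -ne 128 ] || [ "$sum" -ne 0 ]; then
        echo "bad checksum"
        return 1
    fi

    echo "ok"
}
```

A blob that passes both checks can still describe the wrong geometry, which is roughly the situation I was in; the check only rules out outright corruption.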
These days, we don’t worry about magnets and cathode rays, or the maximum speed at which we can drive them without them catching fire. Instead, we now deal in pixels and the maximum rate at which we can update them. But the display still needs to convey its mode (pixel resolution etc) capabilities to the graphics chipset, which needs to inform the driver, which needs to inform the kernel of the possible valid choices for mode to set. This was clearly not working correctly in my case, as reported by running xrandr and other EDID parsing tools on my system. I knew there was a bug in the driver (i915) or a related part of the kernel GPU subsystem – as evidenced by the visual glitch – but I didn’t have any idea where to look for it. Fortunately, Linux kernel engineers have a secret weapon in the fight against such bugs: Git.
Far from being merely the kernel source code version control utility, Git can also be used to do many more advanced things, like tracking down bugs through binary bisection of code. In a bisection, Git will take a last known good kernel and a known bad one, and produce a source tree that represents the halfway point between the two. Depending upon the results of a subsequent test build and boot, Git will then pick another halfway point, repeating until a single bad source code commit (patch) is found. This does require many kernel compiles/test boots, but it’s a good way to track down where a problem was introduced, and compile times are now fast enough to make the entire exercise of building 10+ kernels doable in under an hour of effort. This is especially true when using a config made with ‘make localmodconfig’ to build only the drivers that are actually needed.
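The halving idea is easy to see in a toy model. The sketch below is not how Git itself is implemented (real history is a graph rather than a straight line, and Git picks its test points accordingly); it simply assumes commits numbered 1 to N in a linear history, with a stand-in `is_bad` test in place of a kernel build and boot. The names `FIRST_BAD`, `is_bad` and `bisect` are all invented for the example.

```shell
#!/bin/sh
# Toy bisection over a linear history of numbered commits.
# FIRST_BAD is a hypothetical culprit; every commit from it onwards is "bad".
FIRST_BAD=700

# Stand-in for "build this kernel, boot it, and see if the bug is there".
is_bad() {
    [ "$1" -ge "$FIRST_BAD" ]
}

# bisect GOOD BAD: GOOD is a known good commit, BAD a known bad one.
# Prints the first bad commit and how many tests were needed to find it.
bisect() {
    good=$1
    bad=$2
    steps=0
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))   # the halfway point
        steps=$((steps + 1))
        if is_bad "$mid"; then
            bad=$mid                  # bug already present: look earlier
        else
            good=$mid                 # still fine: look later
        fi
    done
    echo "first bad: $bad, tests: $steps"
}
```

Running `bisect 1 1000` with the values above prints `first bad: 700, tests: 10`. The number of tests grows with the logarithm of the number of commits, which is why even a large range of changes comes down to a manageable number of builds.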
Bisection really requires only three commands:
[sourcecode language="bash"]# git bisect start
# git bisect good
# git bisect bad[/sourcecode]
The first starts a bisection, the second marks the current point as good, and the third indicates a ‘bad’ kernel. A ‘git bisect reset’ returns you to the master or head of the Git tree from which you started bisecting. Using this process, it became apparent that the bug I was seeing had been added in some cleanups to the i915 driver caching logic (which attempts to cache EDID data read from fixed panels, like my netbook display, that should not change). Unfortunately, that cleanup was incomplete, but with the aid of Git and a few resulting emails to the maintainer, a fix was soon in place. I really recommend you try out Git bisection sometime. Steven Rostedt even posted a script recently, called ktest.pl, that can be used to automate the process of running the build and boot tests, making bisection even easier to do than it already was and helping you nail down the next annoying bug.
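If you would like to try the workflow without committing to an afternoon of kernel builds, the following sketch runs a complete bisection in a throwaway repository. Everything in it is invented for the demonstration: the file name, the ten commits, and the ‘bug’ (a line reading BUG added in the seventh commit). It also uses ‘git bisect run’, which hands the good/bad decision to a command: exit status 0 means good, and a non-zero status (other than the special value 125, which means skip) means bad.

```shell
#!/bin/sh
# Full bisect session in a scratch repository (all contents are made up).
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Ten commits; the seventh introduces the "bug".
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -eq 7 ]; then
        echo "BUG" >> file.txt
    else
        echo "line $i" >> file.txt
    fi
    git add file.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

first=$(git rev-list --max-parents=0 HEAD)   # root commit: known good
culprit=$(git rev-parse HEAD~3)              # commit 7, kept to check the answer

git bisect start > /dev/null
git bisect bad HEAD > /dev/null
git bisect good "$first" > /dev/null

# The test command exits 0 (good) when BUG is absent and 1 (bad) when present.
git bisect run sh -c '! grep -q BUG file.txt' > /dev/null

# While the bisection is active, refs/bisect/bad names the first bad commit.
found=$(git rev-parse refs/bisect/bad)
git bisect reset > /dev/null 2>&1

echo "first bad commit: $(git log -1 --format=%s "$found")"
```

For a real kernel hunt the shape is the same, except that the endpoints are kernel versions (in my case the bad one was 2.6.37-rc1) and the test is a build, a reboot and a look at the screen, so each step is marked good or bad by hand.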