NickName:Light_handle Ask DateTime:2010-01-09T02:08:23

Debugging an Operating System

I was going through some general material about operating systems and hit on a question. How does a developer debug an operating system while developing it, i.e. debug the OS itself? What tools are available to the OS developer for this?

Copyright Notice：Content Author:「Light_handle」，Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article：https://stackoverflow.com/questions/2029639/debugging-an-operating-system

Answers

Max Caceres 2010-01-08T19:19:10

Remote debugging with kernel debuggers, which can also be done via virtualization.

Andres Jaan Tack 2010-01-08T18:53:21

Debugging a kernel is hard, because you probably can't rely on the crashing machine to communicate what's going on. Furthermore, the faulty code is probably in scary places like interrupt handlers.

There are four primary methods of debugging an operating system of which I'm aware:

1. Sanity checks, together with output to the screen.

Linux's "Oops" reports and kernel panics are a great example of this. The Linux folks wrote a function that prints out what it can find out (including a stack trace) and then, in the case of a panic, stops everything.

Even warnings are useful. Linux has guards set up for situations where you might accidentally go to sleep in an interrupt handler. The mutex_lock function, for instance, will check (in might_sleep) whether you're in an unsafe context and print a stack trace if you are.

2. Debuggers

Traditionally, a kernel under debugging reports what it is doing over a serial line to a stable test machine. With the advent of virtual machines, you can now wire one VM's virtual serial line to another program on the same physical machine, which is super convenient. Naturally, however, this requires that your operating system publish what it is doing and wait for a debugger connection. KGDB (Linux) and WinDbg (Windows) are examples of this kind of kernel debugging. VMware supports this setup explicitly.

More recently the VM developers out there have figured out how to debug a kernel without either a serial line or kernel extensions. VMware has implemented this in their recent releases.

The problem with debugging in an operating system is (in my mind) related to the Uncertainty principle. Interrupts (where most of your hard errors are sure to be) are asynchronous, frequent and nondeterministic. If your bug relates to the overlapping of two interrupts in a particular way, you will probably not expose it with a debugger, because attaching one changes the timing and the bug probably won't even happen. That said, it might, and then a debugger might be useful.

3. Deterministic Replay

When you get a bug that only seems to appear in production, you wish you could record what happened and replay it, like a security camera. Thanks to a professor I knew at Illinois, you can now do this in a VMware virtual machine. VMware and related folks describe it all better than I can, and they provide what looks like good documentation.

Deterministic replay is brand new on the scene, so thus far I'm unaware of any particularly idiomatic uses. They say it should be particularly useful for security bugs, too.

4. Moving everything to User Space.

In the end, things are still more brittle in the kernel, so there's a tremendous development advantage to following the Nucleus (or Microkernel) design, where you shave the kernel-mode components down to their bare minimum. For everything else, you can use the myriad of user-space dev tools out there, and you'll be much happier. FUSE, a user-space filesystem extension, is the canonical example of this.

I like this last idea, because it's like you wrote the program to be writeable. Cyclic, no?
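
The record-and-replay idea can be illustrated with a toy user-space simulation. This is only a sketch under stated assumptions: the names `run`, `random_gaps` and `recording` are hypothetical, the "kernel" is a loop with a non-atomic counter update, and the only nondeterministic input is the gap between simulated interrupts. It is not how VMware implements replay; it just shows why logging the nondeterministic inputs is enough to reproduce a timing bug.

```python
import random


def random_gaps(seed=None):
    """Hypothetical nondeterministic source: ticks until the next 'interrupt'."""
    rng = random.Random(seed)
    while True:
        yield rng.randint(1, 7)


def recording(source, log):
    """Pass values through from `source`, appending each one to `log`."""
    for value in source:
        log.append(value)
        yield value


def run(irq_gaps, iterations=50):
    """Toy kernel loop with a racy counter.

    The main loop increments `counter` in two steps (read, then write).
    An 'interrupt' that lands between the two steps also increments the
    counter, and that increment is overwritten by the write: a lost update.
    Returns (final counter, number of lost updates).
    """
    counter = 0
    irqs = 0
    tick = 0
    snapshot = 0
    next_irq = next(irq_gaps)
    for _ in range(iterations):
        for phase in ("read", "write"):
            if phase == "read":
                snapshot = counter
            else:
                counter = snapshot + 1
            tick += 1
            if tick >= next_irq:
                counter += 1  # the interrupt handler's own increment
                irqs += 1
                next_irq = tick + next(irq_gaps)
    return counter, iterations + irqs - counter


if __name__ == "__main__":
    log = []
    live = run(recording(random_gaps(), log))  # "production" run, recorded
    replayed = run(iter(log))                  # replay from the log only
    print(live, replayed, live == replayed)
```

Because the replay consumes exactly the recorded gaps, it follows the same interleaving as the live run, so a lost update seen once can be reproduced as often as needed.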

Seva Alekseyev 2010-01-08T18:12:38

In a bootstrap scenario (OS from scratch), you'd probably have to introduce remote debugging capabilities (memory dumping, logging, etc.) in the OS kernel early on, and use a separate machine. Or you could use a virtual machine/hypervisor.

Windows CE has a component called KITL - Kernel Independent Transport Layer. I guess the title speaks for itself.
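
One common shape for the early logging mentioned above is a fixed-size ring buffer that the kernel writes into and that can later be dumped over whatever transport is available. The sketch below models that in Python purely for illustration; a real kernel would do the same in C against a static buffer, and the `RingLog` class and its method names are hypothetical.

```python
class RingLog:
    """Fixed-size byte ring buffer, standing in for a static kernel log buffer."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0      # index of the next byte to write
        self.written = 0   # total bytes ever written

    def write(self, message):
        """Append a message, overwriting the oldest bytes once full."""
        for byte in message.encode("ascii", "replace"):
            self.buf[self.head] = byte
            self.head = (self.head + 1) % self.size
            self.written += 1

    def dump(self):
        """Return the surviving bytes, oldest first."""
        if self.written < self.size:
            return bytes(self.buf[:self.head])
        return bytes(self.buf[self.head:] + self.buf[:self.head])


if __name__ == "__main__":
    log = RingLog(32)
    log.write("boot: paging enabled\n")
    log.write("boot: irq table loaded\n")
    print(log.dump())
```

The appeal of this design is that writing never allocates or blocks, so it can be called from almost any context, and the most recent messages survive for a post-mortem memory dump.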

Draemon 2010-01-08T18:12:08

You can use a VM, e.g. debug ring 0 code with Bochs/gdb, or debug a NetBSD kernel with QEMU,

or a serial line with something like KDB.
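
For the gdb-based setups, the debugger talks to the emulator's stub (QEMU exposes one with its `-s` and `-S` options) using the GDB remote serial protocol, whose packets look like `$<payload>#<checksum>`, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. The following is a minimal sketch of just that framing; the function names are hypothetical, and it deliberately ignores escaping, run-length encoding and the `+`/`-` acknowledgements that the full protocol defines.

```python
def checksum(payload: bytes) -> int:
    """Sum of the payload bytes modulo 256, as the protocol specifies."""
    return sum(payload) % 256


def frame_packet(payload: str) -> bytes:
    """Wrap a command such as 'g' (read registers) in $...#xx framing."""
    data = payload.encode("ascii")
    return b"$" + data + b"#" + f"{checksum(data):02x}".encode("ascii")


def parse_packet(packet: bytes) -> str:
    """Validate the framing and checksum of a packet and return its payload."""
    well_formed = (
        len(packet) >= 4
        and packet.startswith(b"$")
        and packet[-3:-2] == b"#"
    )
    if not well_formed:
        raise ValueError("malformed packet")
    data = packet[1:-3]
    expected = int(packet[-2:].decode("ascii"), 16)
    if checksum(data) != expected:
        raise ValueError("checksum mismatch")
    return data.decode("ascii")


if __name__ == "__main__":
    print(frame_packet("g"))
    print(parse_packet(b"$OK#9a"))
```

The same framing works over a TCP socket to an emulator or over a physical serial line, which is why one debugger front end can cover both cases.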

NotMe 2010-01-08T18:13:00

- printf logging
- attach to process
- serious unit tests
- etc.