How do you debug your kernel? [problems with GDB etc.]

vvaltchev
Member
Posts: 274
Joined: Fri May 11, 2018 6:51 am

How do you debug your kernel? [problems with GDB etc.]

Post by vvaltchev »

Hi guys,
I'd like to open a discussion about the best ways to debug our kernels and to share some problems I've hit with GDB.

So, first of all: printk(). Like probably every other kernel, mine has a printk() implementation, and I'm super happy with it, but:
  • it cannot work when the console layer has a bug
  • it certainly cannot let me step instruction by instruction
  • it certainly cannot offer me things like hardware watchpoints
Therefore, my main debugging tool is GDB.
Typical scenario: my i686 kernel is running on a 32-bit x86 QEMU VM launched with -s, which makes QEMU start a GDB-compatible debug server (listening on TCP port 1234 by default). In a console I run gdb and then attach to the VM with:

target remote localhost:1234
I worked this way for years, and I was able to debug super-nasty bugs thanks to both QEMU and GDB. Does anybody else debug their kernel this way?

Actually, no: I used to run a 64-bit VM and just stay in 32-bit protected mode all the time, which is perfectly fine. Well, in some cases (UEFI) I used to boot in 64-bit mode and then switch back to 32-bit PM in order to run my kernel. Anyway, QEMU used to tell GDB (via their protocol) the architecture based on the current CPU mode, not on the machine type. This way I was able to debug both 32-bit and 64-bit code; I just had to change the architecture manually in GDB. It was a bit hacky, but it worked. It was cool being able to debug my code both before and after a mode switch.

Then, at some point, I don't remember when exactly, the QEMU guys decided that such behavior was too hacky and removed that. Some people and I commented about that on this bug:

https://bugs.launchpad.net/qemu/+bug/1686170

Anyway, I was able to kind of work around the problem by using only 32-bit VMs: this way I could debug my kernel, but I could no longer debug the real-world case where my hardware is 64-bit but runs in 32-bit protected mode. I survived, but I wasn't happy.

Now, recently I noticed that, after a regular package update, GDB started to behave weirdly in the same scenario where it used to work perfectly (32-bit VM, GDB remote debugging via TCP).
What happens is that GDB, running on my 64-bit host, somehow tries to extend the 32-bit pointers coming from QEMU, and that messes up everything when using breakpoints ("continue" doesn't work anymore), while dumping variables etc. still works.

Therefore, I reverted the update on the gdb package and everything started to work again as before. Clearly, I filed a bug describing the problem:

https://bugs.launchpad.net/ubuntu/+sour ... ug/1846557

The bug almost certainly has been introduced by:

http://launchpadlibrarian.net/431301516 ... .1.diff.gz

But, apparently, very few people care about it, even though it prevents anybody working on any project in the 32-bit VM -> 64-bit GDB scenario from debugging.

Now, my question is: does this issue affect any of you? If not, why? It might be a defect that exists only in the Debian/Ubuntu gdb package, but if you're using those distros it should affect you too. Does anybody know a workaround for this problem other than to keep using the older GDB version?

Also: given the previous "bug" due to a change in QEMU, I'm starting to believe that my scenario is becoming less supported day by day, and I'm concerned. After all, the Linux kernel (a major project) still supports i686 [recently it just dropped i486] and is going to support i686 for many years to come, not to mention all the other 32-bit operating systems that exist. I mean, is anybody else concerned about the risk of losing support for those scenarios?

Thanks,
Vlad
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck
eekee
Member
Posts: 892
Joined: Mon May 22, 2017 5:56 am
Location: Kerbin
Discord: eekee

Re: How do you debug your kernel? [problems with GDB etc.]

Post by eekee »

vvaltchev wrote:Then, at some point, I don't remember when exactly, the QEMU guys decided that such behavior was too hacky and removed that.
Qemu maintainers have made some odd decisions over the years. I keep considering sticking to an older version. I haven't done it yet because I'm not really OSdeving yet. I've just started planning, but my plans don't include long mode for a long time.
Kaph — a modular OS intended to be easy and fun to administer and code for.
"May wisdom, fun, and the greater good shine forth in all your work." — Leo Brodie
vvaltchev
Member
Posts: 274
Joined: Fri May 11, 2018 6:51 am

Re: How do you debug your kernel? [problems with GDB etc.]

Post by vvaltchev »

eekee wrote: Qemu maintainers have made some odd decisions over the years.
Agreed.

By the way, there is good news: the Ubuntu gdb bug has finally been fixed, after someone pointed out that it affected the debugging of any 32-bit program on 64-bit Ubuntu.

See:
https://bugs.launchpad.net/ubuntu/+sour ... ug/1848200
https://bugs.launchpad.net/ubuntu/+sour ... ug/1846557
Velko
Member
Posts: 153
Joined: Fri Oct 03, 2008 4:13 am
Location: Ogre, Latvia, EU

Re: How do you debug your kernel? [problems with GDB etc.]

Post by Velko »

Another interesting technique is to implement a GDB stub in the kernel itself. You do not have to worry about QEMU's weirdness, but then again, it may not work if there is a bug in the stub itself :)

For me it took quite a while to get it right, as sample code from GDB did not even compile. I can count ~30 commits of refactoring and fixes in my repo.
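
To make the stub idea a bit more concrete: the GDB remote serial protocol wraps every message as "$<payload>#<checksum>", where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. Below is a minimal sketch of that framing step only. The names rsp_checksum and rsp_frame are hypothetical, this is not Velko's code, and a real stub also needs payload escaping, ack handling, and the register/memory commands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum of all payload bytes modulo 256, as the remote protocol defines it. */
static uint8_t rsp_checksum(const char *payload, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + (uint8_t)payload[i]);
    return sum;
}

/*
 * Frame `payload` as "$<payload>#<two hex digits>" into `out`.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or 0 if `out` is too small.
 * Hypothetical helper: it does NOT escape '$', '#', '}' or '*'.
 */
static size_t rsp_frame(const char *payload, char *out, size_t out_size)
{
    static const char hex[] = "0123456789abcdef";

    assert(payload != NULL && out != NULL);  /* use your kernel's assert */
    size_t len = strlen(payload);

    if (out_size < len + 5)      /* '$' + payload + '#' + 2 hex + NUL */
        return 0;

    uint8_t sum = rsp_checksum(payload, len);
    out[0] = '$';
    memcpy(out + 1, payload, len);
    out[len + 1] = '#';
    out[len + 2] = hex[sum >> 4];
    out[len + 3] = hex[sum & 0xf];
    out[len + 4] = '\0';
    return len + 4;
}
```

For example, framing the reply "OK" yields "$OK#9a". Because the function is pure, it can be unit-tested on the host long before the stub is wired to a serial port.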

But I can't say I'm using it much, as lately I'm debugging most of the code on Linux before integrating it into the kernel. It's a great architectural exercise: getting the code to compile and work both in unit tests and in the kernel.
If something looks overcomplicated, most likely it is.
vvaltchev
Member
Posts: 274
Joined: Fri May 11, 2018 6:51 am

Re: How do you debug your kernel? [problems with GDB etc.]

Post by vvaltchev »

Velko wrote: Another interesting technique is to implement a GDB stub in the kernel itself. You do not have to worry about QEMU's weirdness, but then again, it may not work if there is a bug in the stub itself :)
I totally agree! But it's the only thing (except for special debugging hardware etc.) we can use for debugging our kernels on bare metal. Fortunately, until now I've been able to find bugs with GDB+QEMU, or on real hardware using stack traces and, sometimes, ad-hoc code instrumentation.
Velko wrote: For me it took quite a while to get it right, as sample code from GDB did not even compile. I can count ~30 commits of refactoring and fixes in my repo.
I totally believe that. That's why I'm not sure I want to do that for Tilck any time soon.
Velko wrote: But I can't say I'm using it much, as lately I'm debugging most of the code on Linux before integrating it into the kernel. It's a great architectural exercise: getting the code to compile and work both in unit tests and in the kernel.
I often do the same thing, when it's convenient. I love unit tests. But sometimes, you know, making the code testable both in the real kernel and in the "unit test environment" requires a lot of effort. For complicated algorithmic code I always do that; for the rest, not so much. But for that I have system tests, written as user-mode programs running on the OS, plus kernel self-tests.

You know one thing that helped me a lot? The decision I made from the beginning to be binary-compatible with i686 Linux. Sure, writing a lot of tests is great, but I cannot afford to have 100% line coverage. Also, while writing tests, I can make wrong assumptions and then make the kernel behave that (wrong) way. There's always the risk of bias. In contrast, the decision to be compatible with Linux made my life better (but also terrible at the same time): I can test my kernel with code never meant to run on it, like the Busybox suite, the ASH shell, vi, and other applications specifically written and compiled for Linux i686.

I'm wondering: does anybody else do the same thing?

I mean, I have no intention of rewriting Linux at all. Just sharing a common subset of syscalls with it makes total sense to me: no custom libc, no custom syscall interface, nothing. I use a pre-compiled libmusl GCC toolchain.
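
As a rough illustration of what "sharing a common subset of syscalls" can look like, here is a minimal sketch of a dispatch table indexed by i386 Linux syscall numbers (write is 4 and getpid is 20 in the int 0x80 ABI). Everything else here is an assumption made for the example and is not taken from Tilck: the table size NR_MAX, the handler bodies, and the names syscall_register and syscall_dispatch are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LINUX_ENOSYS 38      /* ENOSYS on Linux: "function not implemented" */

/* Two i386 Linux syscall numbers, plus a hypothetical table size. */
enum {
    NR_WRITE  = 4,
    NR_GETPID = 20,
    NR_MAX    = 32,
};

typedef long (*syscall_fn)(long a1, long a2, long a3);

static syscall_fn syscall_table[NR_MAX];

/* Kernel-internal registration; a bad index here is a kernel bug. */
static void syscall_register(unsigned nr, syscall_fn fn)
{
    assert(nr < NR_MAX);     /* use your kernel's assert */
    syscall_table[nr] = fn;
}

/* Hypothetical handlers; a real kernel would do actual work here. */
static long sys_getpid(long a1, long a2, long a3)
{
    (void)a1; (void)a2; (void)a3;
    return 1;                /* pretend the caller is PID 1 */
}

static long sys_write(long fd, long buf, long count)
{
    (void)fd; (void)buf;
    return count;            /* pretend every byte was written */
}

/*
 * Called from the int 0x80 handler with nr = eax and a1..a3 = ebx, ecx, edx.
 * Unimplemented calls return -ENOSYS, the Linux convention of passing
 * negative errno values back in eax.
 */
static long syscall_dispatch(unsigned long nr, long a1, long a2, long a3)
{
    if (nr >= NR_MAX || syscall_table[nr] == NULL)
        return -LINUX_ENOSYS;
    return syscall_table[nr](a1, a2, a3);
}
```

Returning -ENOSYS for everything not yet implemented is what lets unmodified Linux binaries fail in a predictable way instead of crashing the kernel, which in turn makes it easy to see which syscall to implement next.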
nullplan
Member
Posts: 1798
Joined: Wed Aug 30, 2017 8:24 am

Re: How do you debug your kernel? [problems with GDB etc.]

Post by nullplan »

For printing, I just use the serial line. That way I can do what I want with my main screen and the line will still be free. A bare-bones, write-only, polling serial driver is maybe a dozen lines. Pretty much foolproof.
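
A minimal sketch of such a write-only polling driver might look like the following. It assumes a 16550-compatible UART at the conventional COM1 port 0x3F8 and skips baud-rate/line setup, which emulators usually tolerate but real hardware may not. KERNEL_BUILD is a hypothetical macro: when it is not defined, the port I/O is replaced by a small mock so the same logic can be exercised in user space. This is not nullplan's code.

```c
#include <assert.h>
#include <stdint.h>

#define COM1_BASE 0x3F8  /* conventional I/O port of the first PC serial port */
#define UART_THR  0      /* transmit holding register (write) */
#define UART_LSR  5      /* line status register (read) */
#define LSR_THRE  0x20   /* bit 5: transmit holding register empty */

#ifdef KERNEL_BUILD      /* hypothetical: defined only for the real kernel */
static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}
static inline uint8_t inb(uint16_t port)
{
    uint8_t val;
    __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}
#else
/* Hosted stand-in: records transmitted bytes instead of touching ports. */
static char     mock_tx[64];
static unsigned mock_tx_len;

static void outb(uint16_t port, uint8_t val)
{
    assert(port == COM1_BASE + UART_THR);
    if (mock_tx_len < sizeof mock_tx - 1)
        mock_tx[mock_tx_len++] = (char)val;
}
static uint8_t inb(uint16_t port)
{
    assert(port == COM1_BASE + UART_LSR);
    return LSR_THRE;     /* the mock transmitter is always ready */
}
#endif

/* Busy-wait until the UART can accept a byte, then send it. */
static void serial_putc(char c)
{
    while (!(inb(COM1_BASE + UART_LSR) & LSR_THRE))
        ;
    outb(COM1_BASE + UART_THR, (uint8_t)c);
}

/* Write a NUL-terminated string, expanding '\n' to "\r\n" for terminals. */
static void serial_puts(const char *s)
{
    for (; *s; s++) {
        if (*s == '\n')
            serial_putc('\r');
        serial_putc(*s);
    }
}
```

Since the driver never uses interrupts or buffers, it keeps working in very early boot code and inside exception handlers, which is what makes it attractive as a last-resort debug channel.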

For debugging, I have pretty much sworn off source-level debugging, as it requires you to not optimize your build, and that is where you often find the most interesting problems. The machines I work with have debugging features built in, so why not use them directly? If I am tracking a memory corruption, for instance, why not just use the debug registers to alert me of any attempt to write to the area? No need for GDB. Then I'll just write some debug info to the serial line whenever the watchpoint hits. If need be, I can halt the system.

This requires a working interrupt system, of course, but that is one of the first things we have to add, anyway.
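
On x86, the debug-register approach described above comes down to putting the watched address in DR0..DR3 and setting the matching enable, R/W, and LEN fields in DR7; a hit then raises the debug exception (vector 1), and DR6 tells the handler which slot fired. The sketch below only shows the DR7 bit arithmetic plus a guarded example of loading the registers. The names dr7_enable and watch_write4 and the KERNEL_BUILD macro are hypothetical, and this is not nullplan's code.

```c
#include <assert.h>
#include <stdint.h>

/* R/W field: which access triggers the breakpoint. */
enum dr7_rw  { DR7_RW_EXEC = 0, DR7_RW_WRITE = 1, DR7_RW_READWRITE = 3 };
/* LEN field: size of the watched area (must be naturally aligned). */
enum dr7_len { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_4 = 3 };

/*
 * Return `dr7` with breakpoint slot `slot` (0..3) globally enabled
 * for the given access type and length.
 */
static uint32_t dr7_enable(uint32_t dr7, unsigned slot,
                           enum dr7_rw rw, enum dr7_len len)
{
    assert(slot < 4);                       /* use your kernel's assert */
    unsigned ctl_shift = 16 + slot * 4;     /* R/Wn at 16+4n, LENn at 18+4n */

    dr7 &= ~(0xFu << ctl_shift);            /* clear old R/W and LEN fields */
    dr7 |= (uint32_t)rw  << ctl_shift;
    dr7 |= (uint32_t)len << (ctl_shift + 2);
    dr7 |= 1u << (slot * 2 + 1);            /* Gn: global enable bit */
    return dr7;
}

#ifdef KERNEL_BUILD  /* hypothetical: privileged code, ring 0 only */
/* Watch 4-byte writes at `addr` (4-byte aligned) using slot 0. */
static void watch_write4(uintptr_t addr)
{
    uintptr_t dr7;

    __asm__ volatile ("mov %0, %%dr0" : : "r"(addr));
    __asm__ volatile ("mov %%dr7, %0" : "=r"(dr7));
    dr7 = dr7_enable((uint32_t)dr7, 0, DR7_RW_WRITE, DR7_LEN_4);
    __asm__ volatile ("mov %0, %%dr7" : : "r"(dr7));
}
#endif
```

For instance, dr7_enable(0, 0, DR7_RW_WRITE, DR7_LEN_4) gives 0x000D0002: G0 set, R/W0 = 01 (write), LEN0 = 11 (4 bytes). Keeping the bit arithmetic in a pure function means it can be checked on the host, and only the few privileged mov instructions need to be trusted inside the kernel.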
Carpe diem!