How do you debug your kernel? [problems with GDB etc.]
Posted: Thu Oct 17, 2019 3:40 pm
Hi guys,
I'd like to open a discussion about the best ways to debug our kernels, and to share some problems I've hit with GDB.
So, first of all: printk(). Like probably every other kernel, mine has a printk() implementation and I'm super happy with it, but:
- it cannot work when the console layer has a bug
- it certainly cannot let me step instruction by instruction
- it certainly cannot offer me things like hardware watchpoints
Typical scenario: my i686 kernel is running on a 32-bit x86 QEMU VM launched with -s, which makes QEMU run a gdb-compatible debug server. In a console I run gdb and then attach to the VM with:
    target remote localhost:1234
I worked this way for years, and I was able to debug super-nasty bugs thanks to both QEMU and GDB. Does anybody else debug their kernel this way?
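For anyone who wants to reproduce the setup, here is a minimal sketch of the two command lines involved, written as a small Python helper. The kernel path is hypothetical, and the -S flag (freeze the CPU at startup so gdb can attach before the first instruction) and the explicit "set architecture i386" are my additions, not something the workflow above requires.

```python
import shlex

GDB_PORT = 1234  # port opened by QEMU's -s option (shorthand for -gdb tcp::1234)


def qemu_argv(kernel_path):
    """Command line for a 32-bit x86 VM with the gdb stub enabled."""
    # -S is an addition for illustration: it halts the CPU until gdb says "continue".
    return ["qemu-system-i386", "-s", "-S", "-kernel", kernel_path]


def gdb_argv(elf_path, port=GDB_PORT):
    """Command line that loads symbols and attaches to the QEMU stub."""
    return [
        "gdb", elf_path,
        "-ex", "set architecture i386",
        "-ex", f"target remote localhost:{port}",
    ]


if __name__ == "__main__":
    # "build/kernel.elf" is a hypothetical path; use your own kernel image.
    print(shlex.join(qemu_argv("build/kernel.elf")))
    print(shlex.join(gdb_argv("build/kernel.elf")))
```

Run the first printed command in one terminal and the second in another; this assumes the kernel image is something QEMU's -kernel option can load directly (e.g. a multiboot ELF).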
Actually, no: I used to run a 64-bit VM and just stay in 32-bit protected mode the whole time, which is perfectly fine. Well, in some cases (UEFI) I used to boot in 64-bit mode and then switch back to 32-bit PM in order to run my kernel. Anyway, QEMU used to tell gdb (via their protocol) the arch based on the current CPU mode rather than on the machine type. This way I was able to debug both 32-bit and 64-bit code; I just had to manually change the arch in gdb. It was a bit hacky, but it worked. It was cool being able to debug my code both before and after a mode switch.
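In case "their protocol" is unfamiliar: gdb and QEMU's stub talk the GDB Remote Serial Protocol, where each packet is framed as $payload#checksum and the checksum is the payload bytes summed modulo 256, written as two hex digits. The sketch below only shows that framing, plus the request gdb typically uses to fetch a target description (which is where the architecture is normally advertised). It does not claim anything about how a particular QEMU version answers, and the helper names are mine.

```python
def rsp_checksum(payload: bytes) -> int:
    """Sum of all payload bytes, modulo 256."""
    return sum(payload) % 256


def rsp_frame(payload: str) -> bytes:
    """Wrap a payload as $payload#xx. Escaping of '#', '$', '}' and '*' is not handled."""
    data = payload.encode("ascii")
    return b"$" + data + b"#" + f"{rsp_checksum(data):02x}".encode("ascii")


def target_xml_request(offset: int = 0, length: int = 0xFFB) -> bytes:
    """Request a chunk of the stub's target description (offset and length in hex)."""
    return rsp_frame(f"qXfer:features:read:target.xml:{offset:x},{length:x}")


if __name__ == "__main__":
    print(rsp_frame("?"))          # "why did the target stop?" query
    print(target_xml_request())    # first chunk of target.xml
```

Running gdb with "set debug remote 1" before attaching shows the real packets, which is a handy way to see what arch the stub reports.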
Then, at some point (I don't remember exactly when), the QEMU guys decided that this behavior was too hacky and removed it. Some other people and I commented about that on this bug:
https://bugs.launchpad.net/qemu/+bug/1686170
Anyway, I was able to kind of work around the problem by using only 32-bit VMs: this way I could debug my kernel, but I could no longer debug the real-world case where my hardware is 64-bit but runs in 32-bit protected mode. I survived, but I wasn't happy.
Now, recently I noticed that, after a regular package update, GDB started to behave weirdly in the same scenario where it used to work perfectly (32-bit VM, gdb remote debugging via TCP).
What happens is that GDB, running on my 64-bit host, somehow tries to extend the 32-bit pointers coming from QEMU, and that messes up everything when using breakpoints (continue doesn't work anymore), while dumping variables etc. still works.
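To illustrate why extending a 32-bit pointer can break breakpoints, here is a small sketch comparing zero extension with sign extension. I am assuming the faulty behavior is a sign extension, which I have not confirmed, and the address used is a hypothetical higher-half kernel address. Any address with bit 31 set changes value under sign extension, so a breakpoint address computed that way would no longer match what the 32-bit target reports.

```python
MASK32 = 0xFFFF_FFFF
HIGH32 = 0xFFFF_FFFF_0000_0000  # upper half of a 64-bit value, all ones


def zero_extend32(addr: int) -> int:
    """Widen a 32-bit address to 64 bits, filling the upper half with zeros."""
    return addr & MASK32


def sign_extend32(addr: int) -> int:
    """Widen a 32-bit address to 64 bits, copying bit 31 into the upper half."""
    addr &= MASK32
    if addr & 0x8000_0000:
        return addr | HIGH32
    return addr


if __name__ == "__main__":
    kernel_addr = 0xC010_0000  # hypothetical address above 2 GB
    print(hex(zero_extend32(kernel_addr)))  # 0xc0100000
    print(hex(sign_extend32(kernel_addr)))  # 0xffffffffc0100000
```

Addresses below 0x80000000 are unaffected either way, which might explain why some things keep working while others break.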
Therefore, I reverted the gdb package update and everything started working again as before. Of course, I filed a bug describing the problem:
https://bugs.launchpad.net/ubuntu/+sour ... ug/1846557
The bug was almost certainly introduced by:
http://launchpadlibrarian.net/431301516 ... .1.diff.gz
But apparently very few people care about it, even though it prevents anyone working on a project in the 32-bit VM -> 64-bit gdb scenario from debugging.
Now, my question is: does this issue affect any of you? If not, why? It might be a defect that exists only in the Debian/Ubuntu gdb package, but if you're using those distros it should necessarily affect you. Does anybody know a workaround for this problem other than just continuing to use the older GDB version?
Also: given the previous "bug" caused by a change in QEMU, I'm starting to believe that my scenario is becoming less supported day by day, and I'm concerned. After all, the Linux kernel (a major project) still supports i686 [recently it just dropped i486] and it's going to support i686 for many years to come, not to mention all the other 32-bit operating systems that exist. I mean, is anybody concerned about the risk of losing support for those scenarios?
Thanks,
Vlad