[Solved] QEMU and GDB: "Can't compute CFA for this frame."

Mikumiku747
Member
Posts: 64
Joined: Thu Apr 16, 2015 7:37 am

Post by Mikumiku747 »

Hello,

Up until recently, I've been using the Bochs emulator to test my OS, but I got sick of not having a proper debugger, so I finally decided to switch to QEMU and GDB. I managed to get the debugger working mostly fine: I can set breakpoints and everything seems to run normally, except when I actually try to debug code. GDB reports that it "can't compute CFA for this frame" whenever it tries to show the values of local variables (the ones stored on the stack, right?). I'll admit I have a weak-ish understanding of the internals of the debugger, but from what I can tell, it's telling me that it can't work out the layout of the stack and where my local variables should be on it. At least, that's what I think it is, but I'm not 100% sure; debuggers aren't really second nature for me yet.

In any case, this problem is stopping me from debugging my code easily, and for now I have to fall back on my old hacky method: put a value like 0xDEADBEEF into the variable I want to monitor, break before that variable's value changes, and find my marker value's location in a dump of the stack memory. (Ugly and time consuming, I know.) I have a feeling it has something to do with the stack base pointer register (I think it's usually called EBP on 32-bit x86 machines?), which maintains a stack "frame" of sorts. I know it's important for debuggers and traceback functions, but no matter where I pause execution, it always holds strange values like 0 or garbage (nowhere near any code or stack memory; I thought it was normally supposed to point to a value on the stack). For some reason, the C compiler doesn't seem to be maintaining the stack frame properly, and that frame is what debuggers and tracebacks use to keep track of which function the CPU is currently in and where the local variables are (and, it seems, what GDB uses to calculate stack offsets for local variables).
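The marker trick described above boils down to a linear scan of a stack dump for a known 32-bit value. A minimal sketch, assuming the dump has already been copied into an array of 32-bit words; `find_marker` is a hypothetical helper, not part of GDB, QEMU or Bochs:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the first word in a stack dump
 * that equals `marker`, or -1 if it never appears. The marker's address is
 * then the dump's base address plus 4 * index. */
static long find_marker(const uint32_t *dump, size_t nwords, uint32_t marker)
{
    for (size_t i = 0; i < nwords; i++) {
        if (dump[i] == marker)
            return (long)i;
    }
    return -1;
}
```

A stray word on the stack can happen to equal the marker, which is one more reason this approach is only a stopgap compared with a debugger that can locate variables itself.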

So, is there a way to make sure the C compiler actually uses the EBP register as intended? (That is, pushing the old EBP onto the stack and then copying the stack pointer into EBP, or something like that; I'm not much of an expert on the internal workings of C function prologues either.) I'm using a normal GCC cross compiler built exactly as in the tutorial on the Bare Bones page, and my bootstrap ASM code calls my kernel's main function in much the same way Bare Bones does. So I thought it would behave fairly normally and use the EBP register like it usually does, but I guess that's something that's turned off in a cross compiler.
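For reference, when GCC keeps a frame pointer on i386 the prologue is `push %ebp` followed by `mov %esp,%ebp`, so EBP points at the caller's saved EBP with the return address directly above it. A traceback just follows that chain. A minimal sketch with hypothetical names (`struct stack_frame`, `walk_frames`), assuming the bootstrap code leaves EBP at 0 before calling into C so the chain ends in a null pointer; in a real kernel the starting pointer would come from EBP, for example via GCC's `__builtin_frame_address(0)`:

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of an i386 frame after the standard prologue
 *     push %ebp
 *     mov  %esp, %ebp
 * EBP points at the caller's saved EBP; the return address sits just above. */
struct stack_frame {
    const struct stack_frame *saved_ebp; /* caller's frame; NULL ends the chain */
    uintptr_t return_address;            /* where the caller resumes */
};

/* Hypothetical traceback helper: record up to `max` return addresses in
 * `out`, starting from frame `fp`, and return how many were recorded. */
static size_t walk_frames(const struct stack_frame *fp,
                          uintptr_t *out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->return_address;
        fp = fp->saved_ebp;
    }
    return n;
}
```

When `-fomit-frame-pointer` is in effect, EBP is just another general-purpose register, which fits the "0 or garbage" values described above; a walker like this then has nothing valid to follow. A kernel version should also check each pointer against the stack's bounds before dereferencing it.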

For completeness and reference, here are my build options and build lines. (While setting up GDB and QEMU, I've also been passing -g to the makefile from the command line by running: CFLAGS="-std=c99 -Wall -Wextra -O2 -g" make)

Code:

# Tool info
TOOL-PREFIX=i686-elf-
AS=nasm
ASFLAGS?=-f elf32
LD=$(TOOL-PREFIX)ld
LDFLAGS?= -ffreestanding -lgcc
CC=$(TOOL-PREFIX)gcc
CFLAGS?=-std=c99 -Wall -Wextra -O2 
CFLAGS:=$(CFLAGS) -ffreestanding

# Pattern recipe to build core kernel files
$(OBJDIR)/%.o: $(CDIR)/%.c $(OBJDIR)
	$(CC) $(CFLAGS) -c $< -o $@
$(OBJDIR)/%.o: $(ASMDIR)/%.asm $(OBJDIR)
	$(AS) $(ASFLAGS) $< -o $@

# Kernel compile and link recipe
$(BINDIR)/kernel.bin: $(KERNEL_DEPENDS) $(SRCDIR)/linker.ld $(BINDIR)
	$(CC) $(LDFLAGS) -T $(SRCDIR)/linker.ld -o $(BINDIR)/kernel.bin -nostdlib -lgcc $(KERNEL_DEPENDS)

Just to make your life easier, here are example lines from the compile and link of the kernel, so you don't have to evaluate that makefile stuff yourself:

Code:

# Kernel object file compile
i686-elf-gcc -std=c99 -Wall -Wextra -O2  -ffreestanding -Iinclude/kernel -c src/kernel/c/kernel.c -o obj/kernel.o

# Kernel Linking
i686-elf-gcc -ffreestanding -lgcc -T src/kernel/linker.ld -o bin/kernel.bin -nostdlib -lgcc  obj/mem.o  obj/input.o  ...Loads of object files...  obj/keyboard_asm.o  obj/io.o  obj/bootstrap.o  obj/gdt.o
Finally, here's a little paste of what I get when I try to use the debugger. (At this point I've definitely loaded the file; that's how I got it to break at a line number in the first place.) This should give you an idea of what's actually going wrong, because I know I'm pretty bad at explaining things to people. Ignore the stuff about paging; it's not enabled yet, and I'm still in plain 32-bit protected mode with paging off:

Code:

Breakpoint 1, initialise_kernel_paging (kernel_start=<error reading variable: can't compute CFA for this frame>, 
    kernel_end=<error reading variable: can't compute CFA for this frame>) at src/kernel/c/paging.c:30
30	{
Please respond if you know anything that might help. I'd really like to take full advantage of GDB, since it's so much better than debug print statements and memory dumps. If this turns out to be a common, easy-to-stumble-on problem and not just me doing something wrong, I'll add a little note about how to fix it to the QEMU wiki page (in the section about debugging with GDB). The wiki's been a big help, especially for common pitfalls, and I'd love to give something back to it as well. :)

- Mikumiku747
Last edited by Mikumiku747 on Thu Jul 14, 2016 7:51 am, edited 1 time in total.
linuxyne
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Re: QEMU and GDB: "Can't compute CFA for this frame."

Post by linuxyne »

The commit shows that the error message you get is likely from a version of gdb which throws just the plain 'can't compute' error.

The commit fixed the function by removing the conditions about dwarf2, and then adding a new condition about the validity of the stack.

Assuming that these lines are the ones which print the error message on your console, and I did not miss any places where the error is printed/thrown, it seems that you might be hitting the dwarf2 condition (the one which is no longer present in the code).

If necessary, I will try to set up a repro and then update to the latest gdb sources to see if anything changes. If the problem lies in one of the other conditions, the enhanced error messages (about memory, registers, or the frame base) will help tell them apart.


Edit0: The downloadable sources at gdb-7.11.tar.xz contain a .pot file. The only 'can't compute CFA' error messages that are present in the .pot file are these two below. It does not contain the vanilla dwarf2 'can't compute' error.

Code:

#: dwarf2-frame.c:1517
msgid ""
"can't compute CFA for this frame: required registers or memory are "
"unavailable"
msgstr ""

#: dwarf2-frame.c:1522
msgid "can't compute CFA for this frame: frame base not available"
msgstr ""
There's also the one below, which matches the exact format of your error output, though the message it wraps does seem to come from dwarf2-frame.c, shown above.

Code:

#: cp-valprint.c:326 guile/scm-pretty-print.c:640 mi/mi-cmd-stack.c:562
#: python/py-framefilter.c:473 python/py-prettyprint.c:295 stack.c:293
#, possible-c-format
msgid "<error reading variable: %s>"
msgstr ""
The earlier conclusion still stands - the cause could be the now-removed dwarf2 condition.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: QEMU and GDB: "Can't compute CFA for this frame."

Post by Combuster »

You do pass -O2 to the compiler, telling it to optimise. If possible, values should preferably be held in registers because that's faster. Especially for small functions, everything is pulled off the stack into registers, never to be put back again.

Try to communicate that behaviour to the debugger - it's difficult.
linuxyne
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Re: QEMU and GDB: "Can't compute CFA for this frame."

Post by linuxyne »

Combuster wrote: You do pass -O2 to the compiler, telling it to optimise. If possible, values should preferably be held in registers because that's faster. Especially for small functions, everything is pulled off the stack into registers, never to be put back again.

Try to communicate that behaviour to the debugger - it's difficult.
In this case, shouldn't the debugger at least show incorrect values for the parameters, presumably read from the stack on the mistaken assumption that the parameters are there when they are not?

On the other hand, if gdb thinks that the parameters are passed in registers instead of on the stack, it can simply show the register contents, (incorrectly) mapped to individual parameters. Here too it will be showing incorrect values.

In both these cases, gdb has no reason to simply error out.

Also, there are calling conventions to adhere to, and gcc, despite optimizations, should not violate the calling convention in effect, whether it's the default or one the programmer explicitly asked for.
Mikumiku747
Member
Posts: 64
Joined: Thu Apr 16, 2015 7:37 am

Re: QEMU and GDB: "Can't compute CFA for this frame."

Post by Mikumiku747 »

I'll try a different version of gdb in a couple of hours just to be sure, but I'm certain it has nothing to do with optimization, since the compiler records when something gets optimized away and GDB correctly displays "optimized out" for those variables. At the very least, I'd expect it not to optimize away a function's arguments, and it should be adhering to the calling convention.

That is, unless I need a special version of gdb to target i386-elf-none machines. The thought never even occurred to me: do I need a specially compiled version of gdb, or do I need to tell it which target triplet the remote machine uses? I load the ELF file for my kernel when running gdb; does that give it enough info?

In any case, as Combuster said, if it's doing those kinds of optimizations, is there even a way to tell the debugger about it? Is there some option the normal cross compiler uses that makes it ignore the calling convention? GCC has about a million options, so I wouldn't be surprised if there's a way to get it back to normal, or at least to something predictable.

Also, later on I'll disassemble a few of the functions in question and see whether they use the base pointer register at all. I can't right now; I'm away from my build machine.

Thanks everyone for the responses, I'll be sure to let you know how it goes soon.
Mikumiku747
Member
Posts: 64
Joined: Thu Apr 16, 2015 7:37 am

Re: QEMU and GDB: "Can't compute CFA for this frame."

Post by Mikumiku747 »

OK, progress report: I've compiled the latest version of GDB, this time specifying the target as "i686-elf" ("i686-elf-none" didn't work because bfd didn't support it, and I suspect the other components of gdb wouldn't either). It was about time I got a proper cross-debugger anyway, since it's probably a good idea for the debugger to know which architecture it's working on. Here's proof I'm actually using a good gdb now:

Code:

(gdb) show configuration
This GDB was configured as follows:
   configure --host=i686-pc-linux-gnu --target=i686-elf
             --with-auto-load-dir=$debugdir:$datadir/auto-load
             --with-auto-load-safe-path=$debugdir:$datadir/auto-load
             --with-expat
             --with-gdb-datadir=/home/daniel/OSdev/cross/share/gdb (relocatable)
             --with-jit-reader-dir=/home/daniel/OSdev/cross/lib/gdb (relocatable)
             --without-libunwind-ia64
             --without-lzma
             --without-guile
             --with-separate-debug-dir=/home/daniel/OSdev/cross/lib/debug (relocatable)
             --without-babeltrace

("Relocatable" means the directory can be moved with the GDB installation
tree, and GDB will still find it.)
However, the problem is now even more mysterious, so perhaps something much worse is going wrong in the debugger than I suspected. And, as always, EBP seems to be zero:

Code:

(gdb) break initialise_kernel_paging
Breakpoint 1 at 0x101970: file src/kernel/c/paging.c, line 30.
(gdb) c
Continuing.

Breakpoint 1, initialise_kernel_paging (kernel_start=<unavailable>, kernel_end=<unavailable>) at src/kernel/c/paging.c:30
30	{
(gdb) p kernel_start
$1 = <unavailable>
(gdb) p $ebp
$2 = (void *) 0x0
(gdb) p $esp
$3 = (void *) 0x10f678
So yeah, the good news is that I have a proper cross debugger; the bad news is that the error is now even more mysterious. I suspect it's the same error with a different message, because my cross debugger (7.11) is much newer than the system gdb (7.7.1; I'm just using Debian Linux).

I also disassembled the function to see if anything at all used EBP, and sure enough...

Code:

   0x00101970 <+0>:	sub    $0x10,%esp
=> 0x00101973 <+3>:	xor    %eax,%eax
   0x00101975 <+5>:	mov    0x18(%esp),%ecx
   0x00101979 <+9>:	lea    0x0(%esi,%eiz,1),%esi
   0x00101980 <+16>:	movl   $0x1a,0x109000(,%eax,4)
   0x0010198b <+27>:	movl   $0x1a,0x10a000(,%eax,4)
   0x00101996 <+38>:	add    $0x1,%eax
   0x00101999 <+41>:	cmp    $0x400,%eax

...Nothing. That first instruction allocates space for local variables, but there's nothing at all about EBP; the function only references stack values relative to the stack pointer.
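The `mov 0x18(%esp),%ecx` line can be decoded with a little arithmetic. Under the i386 System V (cdecl) convention, at function entry 0(%esp) holds the return address and argument number i lives at 4 + 4*i bytes above ESP; after `sub $0x10,%esp` every offset grows by 0x10, so 0x18 is argument 1 (kernel_end). In DWARF terms, the CFA (canonical frame address) is the value ESP had just before the `call`, i.e. entry ESP + 4, and the arguments sit at fixed offsets from it. Without EBP as an anchor, GDB can only find them if the compiler's call-frame information tells it how far ESP has moved. A minimal sketch of that arithmetic, assuming every argument is a 4-byte value (an assumption about `initialise_kernel_paging`'s signature; the helper names are hypothetical):

```c
#include <stdint.h>

/* ESP-relative offset of 4-byte argument `index` (0-based) under the i386
 * cdecl convention, once the function has executed `sub $frame_adjust, %esp`.
 * At entry: 0(%esp) = return address, 4(%esp) = arg 0, 8(%esp) = arg 1, ... */
static uint32_t arg_offset_from_esp(uint32_t frame_adjust, uint32_t index)
{
    return frame_adjust + 4u + 4u * index;
}

/* The CFA is the caller's ESP just before the `call`, i.e. entry ESP + 4.
 * Relative to the current ESP it shifts with every push/sub, which is what
 * DWARF call-frame info has to describe when there is no frame pointer. */
static uint32_t cfa_offset_from_esp(uint32_t frame_adjust)
{
    return frame_adjust + 4u;
}
```

So at 0x00101973 the CFA is ESP + 0x14 and kernel_end is at CFA + 4. With a conventional `push %ebp; mov %esp,%ebp` prologue, the CFA would instead be the fixed EBP + 8, no matter how ESP moves afterwards.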

So yeah, still waiting to see if there's any way to get it to use the regular calling convention.
linuxyne
Member
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Re: QEMU and GDB: "Can't compute CFA for this frame."

Post by linuxyne »

To me, it seems that the new error is more benign than the 'can't compute cfa' error.

The function probably got compiled with frame pointer omission, and gdb is unable to determine the parameters, although IMHO it should be able to handle the situation if enough debug info has been emitted by the compiler.

Edit0: There also should be a way to keep the optimizations and selectively disable the frame pointer omission for debug purposes. i.e. -Oxxx implies the blanket optimizations which may also imply FPO, while another switch selectively disables FPO.
Mikumiku747
Member
Posts: 64
Joined: Thu Apr 16, 2015 7:37 am

Re: QEMU and GDB: "Can't compute CFA for this frame."

Post by Mikumiku747 »

Thanks to linuxyne, I've discovered that the source of the problem is indeed frame pointer omission, which is enabled by optimizations. I've temporarily disabled optimizations and it now works perfectly. However, that's only a temporary solution, and it would be nice to keep optimizations. According to the gcc manual, "-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging." Evidently it does interfere with debugging on this architecture but is enabled anyway; the description of -fomit-frame-pointer does mention that it can break debugging, so that's what's happening here.

So now, the final question is how to keep optimizations enabled but disable frame pointer omission. Again, according to the manual, "The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag." But I'm not sure what a machine-description macro is, so I have no idea how to turn that on. Does anybody know how? It looks like something you configure when building GCC, but the gcc manual pages on machine descriptions are too heavy for me. Could somebody explain where I need to define this macro so that gcc understands that my cross-compile target needs base pointers?
Octocontrabass
Member
Posts: 5586
Joined: Mon Mar 25, 2013 7:01 pm

Re: QEMU and GDB: "Can't compute CFA for this frame."

Post by Octocontrabass »

Mikumiku747 wrote: So now, the final question is, how to have optimizations enabled, but also disable frame pointer omission.
-fno-omit-frame-pointer
Mikumiku747
Member
Posts: 64
Joined: Thu Apr 16, 2015 7:37 am

Re: QEMU and GDB: "Can't compute CFA for this frame."

Post by Mikumiku747 »

Oh... That's actually pretty intuitive, maybe I should have tried that before I posted. In any case, that's it, thanks for the help everyone. I'll add a little note on the QEMU wiki page about it, so hopefully other people don't end up making the same mistake as me.
linuxyne
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Re: [Solved] QEMU and GDB: "Can't compute CFA for this frame

Post by linuxyne »

I hope it's okay to add another solution for the original vanilla 'can't compute cfa' error to this solved thread. The first was to update gdb to the latest version, since the dwarf2 error path is no longer present there. The second one I just found, and it's below.


This page suggests forcing the compiler to emit DWARF 2 instead of the newer default formats. Presumably the older format lacks some features, but it should at least allow the basic debugging operations to work.

They link this patch to force the DWARF 2 debug format.
quadrant
Member
Posts: 74
Joined: Tue Apr 24, 2018 9:46 pm

Re: [Solved] QEMU and GDB: "Can't compute CFA for this frame

Post by quadrant »

=D> Had the same problem, and that patch saved my sanity!