Strange behaviour in Bochs debugger / Library call [Solved]

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
User avatar
xenos
Member
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia
Contact:

Strange behaviour in Bochs debugger / Library call [Solved]

Post by xenos »

I recently discovered some strange behaviourin Bochs debugger (64 bit) which has been bugging me for quite some time, and I still have no clue what's going on there.

Let me start with a module loaded by GRUB. At address 0x42, I have a function like this:

Code: Select all

FloppyDriver::FloppyDriver(char* cmd)
{
	TestFunc();
	while(1);
}
It calls some dummy function in the kernel. This function looks like this:

Code: Select all

extern "C"
{
	SECTION(".user") unsigned int TestFunc(void)
	{
		unsigned long ret;
		asm volatile ("cpuid" : "=a" (ret) : "0" (0));
		return(ret);
	}
}
and is mapped to 0xffff800000000000. This is just the beginning of user space functions in my kernel. The module is dynamically linked to the kernel. I wanted to check whether everything works fine, so I enable instruction tracing in Bochs debugger and let my code run from some point before the kernel call to some point after. I should mention the module is running in ring 3, and so is the library part of the kernel.

Here is a commented excerpt from a Bochs debugger (64 bit) session with instruction tracing enabled (yes, this is the complete sequence of instruction, as reported by Bochs, nothing removed):

Code: Select all

Here my function starts - so far, it looks fine. It calls the PLT entry for TestFunc().
(0).[151134490] [0x03fdf042] 002b:0000000000000042 (unk. ctxt): mov rax, 0x00000000000000b4 ; 48b8b400000000000000
(0).[151134491] [0x03fdf04c] 002b:000000000000004c (unk. ctxt): call rax                  ; ffd0

So, we're in the PLT. Time to look up the GOT entry for TestFunc() and call it. This should have been relocated properly.
(0).[151134492] [0x03fdf0b4] 002b:00000000000000b4 (unk. ctxt): jmp qword ptr ds:[rip+0x00000000000001c6] ; ff25c6010000

Here we go! Great, let's do some stuff...
(0).[151134493] [0x00104000] 002b:ffff800000000000 (unk. ctxt): xor eax, eax              ; 31c0
(0).[151134494] [0x00104002] 002b:ffff800000000002 (unk. ctxt): cpuid                     ; 0fa2

...and return.
(0).[151134495] [0x00104004] 002b:ffff800000000004 (unk. ctxt): ret                       ; c3

WTF? We should have returned to 0x4e, since that was pushed to the stack at the last call... Instead, we go 2(!!!) bytes forward?
(0).[151134496] [0x00104006] 002b:ffff800000000006 (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134497] [0x00104008] 002b:ffff800000000008 (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134498] [0x0010400a] 002b:ffff80000000000a (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134499] [0x0010400c] 002b:ffff80000000000c (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134500] [0x0010400e] 002b:ffff80000000000e (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134501] [0x00104010] 002b:ffff800000000010 (unk. ctxt): add byte ptr ds:[rax], al ; 0000

Even stranger, now we even jump 10 bytes in a 2 byte instruction.
(0).[151134502] [0x0010401a] 002b:ffff80000000001a (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134503] [0x00104024] 002b:ffff800000000024 (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134504] [0x00104026] 002b:ffff800000000026 (unk. ctxt): add byte ptr ds:[rax], al ; 0000

Huh? How did we get here? There is no jump before, just zeroes. Instead of returning to 0x4e, we end up at 0x50? OK, let's go further.
(0).[151134505] [0x03fdf050] 002b:0000000000000050 (unk. ctxt): xor esi, esi              ; 31f6
(0).[151134506] [0x03fdf052] 002b:0000000000000052 (unk. ctxt): mov rdi, 0x0000000000001290 ; 48bf9012000000000000
(0).[151134507] [0x03fdf05c] 002b:000000000000005c (unk. ctxt): mov r11, 0x0000000000000030 ; 49bb3000000000000000
(0).[151134508] [0x03fdf066] 002b:0000000000000066 (unk. ctxt): jmp r11                   ; 41ffe3

So, this jump seems to work.
(0).[151134509] [0x03fdf030] 002b:0000000000000030 (unk. ctxt): mov rax, 0x0000000000000090 ; 48b89000000000000000
(0).[151134510] [0x03fdf03a] 002b:000000000000003a (unk. ctxt): sub rsp, 0x0000000000000008 ; 4883ec08
(0).[151134511] [0x03fdf03e] 002b:000000000000003e (unk. ctxt): xor esi, esi              ; 31f6
(0).[151134512] [0x03fdf040] 002b:0000000000000040 (unk. ctxt): call rax                  ; ffd0

OK, and another call... No problem here.
(0).[151134513] [0x03fdf090] 002b:0000000000000090 (unk. ctxt): mov qword ptr ds:[rdi], rsi ; 488937
(0).[151134514] [0x03fdf093] 002b:0000000000000093 (unk. ctxt): ret                       ; c3

Now the return works as expected. So we're back at 0x42, everything should repeat. Again we call some stub in the PLT.
(0).[151134515] [0x03fdf042] 002b:0000000000000042 (unk. ctxt): mov rax, 0x00000000000000b4 ; 48b8b400000000000000
(0).[151134516] [0x03fdf04c] 002b:000000000000004c (unk. ctxt): call rax                  ; ffd0

Wait! What happened to the jmp into the address stored in the GOT? This is just rubbish.
(0).[151134517] [0x03fdf0b4] 002b:00000000000000b4 (unk. ctxt): lahf                      ; 9f
(0).[151134518] [0x03fdf0b5] 002b:00000000000000b5 (unk. ctxt): and eax, 0x000001c6       ; 25c6010000
(0).[151134519] [0x03fdf0ba] 002b:00000000000000ba (unk. ctxt): push 0x0000000000000000   ; 6800000000
(0).[151134520] [0x03fdf0bf] 002b:00000000000000bf (unk. ctxt): jmp .+0xffffffffffffffe0 (0x00000000000000a4) ; e9e0ffffff

So... This is some other entry in the PLT...
(0).[151134521] [0x03fdf0a4] 002b:00000000000000a4 (unk. ctxt): push qword ptr ds:[rip+0x00000000000001c6] ; ff35c6010000
(0).[151134522] [0x03fdf0aa] 002b:00000000000000aa (unk. ctxt): jmp qword ptr ds:[rip+0x00000000000001c8] ; ff25c8010000

...which points nowhere.
(0).[151134523] [0x03fdf000] 002b:0000000000000000 (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134524] [0x03fdf002] 002b:0000000000000002 (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134525] [0x03fdf004] 002b:0000000000000004 (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134526] [0x03fdf006] 002b:0000000000000006 (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134527] [0x03fdf008] 002b:0000000000000008 (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134528] [0x03fdf00a] 002b:000000000000000a (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134529] [0x03fdf00c] 002b:000000000000000c (unk. ctxt): add byte ptr ds:[rax], al ; 0000
(0).[151134530] [0x03fdf00e] 002b:000000000000000e (unk. ctxt): add byte ptr ds:[rax], al ; 0000
Any ideas what might be going on here? This simply doesn't make sense to me, so I'm at wit's end.
Last edited by xenos on Sun Apr 19, 2009 2:13 am, edited 1 time in total.
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
User avatar
bewing
Member
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Re: Strange behaviour in Bochs debugger / Library call

Post by bewing »

Possibility 1: your stack pointer is screwed during that first call/ret. So the ret opcode is actually working, but the "address" being popped happens to be 6. -- This possibility seems unlikely, considering just how screwey the rest of your trace is going.

Possibility 2: your version of bochs is screwed. How old is it? It seems likely that the CPUID function model in your bochs is borking everything. Have you tried commenting out the cpuid inline asm call?

Do these operations happen in exactly the same way if you are singlestepping through the code, instead of using the instruction trace?
User avatar
xenos
Member
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia
Contact:

Re: Strange behaviour in Bochs debugger / Library call

Post by xenos »

bewing wrote:Possibility 1: your stack pointer is screwed during that first call/ret. So the ret opcode is actually working, but the "address" being popped happens to be 6. -- This possibility seems unlikely, considering just how screwey the rest of your trace is going.
That was my first idea. But how could this happen? There is no memory write between the call and ret, the RSP is unchanged, there is only one CPU, no DMA... Strange enough: Also the instruction at 0xb4 changes...
Possibility 2: your version of bochs is screwed. How old is it?
Bochs x86 Emulator 2.3.7, Build from CVS snapshot, on June 3, 2008 - I will compile a new one from CVS and try again as soon as I'm at home tonight. But if it's really a problem with Bochs, I wonder why this never happened before...

I also tested the code with QEMU, but I could not figure out how to singlestep through the code / print an instruction trace, since GDB support for mixed 32 / 64 bit code seems to be broken... Hopefully on Monday I'll be back at my AMD64 machine (I currently don't have one here) where I can test the code in SimNow!.
It seems likely that the CPUID function model in your bochs is borking everything. Have you tried commenting out the cpuid inline asm call?
Yes, but that doesn't change anything. (At first I tried rdtsc since I wanted to implement a simple system clock, then I tried cpuid, but even if the function is just a bare ret, this strange thing happens.) I don't think the cpuid is broken since it works fine when I use it for feature detection.
Do these operations happen in exactly the same way if you are singlestepping through the code, instead of using the instruction trace?
Yes, it's exactly the same.
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
User avatar
bewing
Member
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Re: Strange behaviour in Bochs debugger / Library call

Post by bewing »

(screwed stack pointer) That was my first idea. But how could this happen? There is no memory write between the call and ret, the RSP is unchanged, there is only one CPU, no DMA... Strange enough: Also the instruction at 0xb4 changes...
Well, the point would be if RSP is pointing to an invalid memory location BEFORE the call happens.
User avatar
xenos
Member
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia
Contact:

Re: Strange behaviour in Bochs debugger / Library call

Post by xenos »

Good point. I just checked that - RSP points right to the end of the stack, exactly where it should be. The stack is mapped to a valid memory locaton. The call pushes the correct return address (0x4e) to the stack and it is still there when the ret comes. And RSP still points to the right location on the stack. It just doesn't return.

I downloaded and compiled the newest Bochs from CVS - it doesn't make things better. The behaviour is exactly the same up to the point when it hits the PLT entry at 0xb4 the second time. From then, it just repeats the same loop again and again, showing random instructions at 0xb4, that always cause a jump to 0xffff800000000000.
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
User avatar
xenos
Member
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia
Contact:

Re: Strange behaviour in Bochs debugger / Library call

Post by xenos »

I just tried the same code with AMD SimNow! - everything works as expected, the library function returns properly. It also works fine in QEMU - the library function gets called and returns as it should. (I figured out how to enable instruction tracing.) The strange behaviour only occurs in Bochs...
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
stlw
Member
Member
Posts: 357
Joined: Fri Apr 04, 2008 6:43 am
Contact:

Re: Strange behaviour in Bochs debugger / Library call

Post by stlw »

XenOS wrote:I just tried the same code with AMD SimNow! - everything works as expected, the library function returns properly. It also works fine in QEMU - the library function gets called and returns as it should. (I figured out how to enable instruction tracing.) The strange behaviour only occurs in Bochs...
Could you send me some test example, I wanna try and see if Bochs if ok ?

Thanks,
Stanislav
User avatar
xenos
Member
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia
Contact:

Re: Strange behaviour in Bochs debugger / Library call

Post by xenos »

Here are some files showing the strange behaviour. I just tested them with the new Bochs 2.4 Pre1, but the same strange things happen...

http://manuelhohmann.dyndns.org/nos.tar.gz
  • bootdisk.img: The floppy image to boot.
  • bochsrc.txt, bochsrc_mp.txt: Bochsrc files for single CPU / multiple CPU.
  • debugcmd.txt: Debug commands I used.
  • trace.txt: Debugger output I got.
To reproduce this behaviour, run Bochs debugger with the single CPU Bochsrc file and boot from the floppy image. Set a breakpoint at linear address 0x42. Start the simulation. When GRUB shows up, choose the "NOS x86_84" kernel. The simulation will stop at the breakpoint. Finally, single step through the following function calls.

Please let me know if you have any questions regarding my code or if you find any errors.

Thanks for testing!
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
stlw
Member
Member
Posts: 357
Joined: Fri Apr 04, 2008 6:43 am
Contact:

Re: Strange behaviour in Bochs debugger / Library call

Post by stlw »

XenOS wrote:Here are some files showing the strange behaviour. I just tested them with the new Bochs 2.4 Pre1, but the same strange things happen...

http://manuelhohmann.dyndns.org/nos.tar.gz
  • bootdisk.img: The floppy image to boot.
  • bochsrc.txt, bochsrc_mp.txt: Bochsrc files for single CPU / multiple CPU.
  • debugcmd.txt: Debug commands I used.
  • trace.txt: Debugger output I got.
To reproduce this behaviour, run Bochs debugger with the single CPU Bochsrc file and boot from the floppy image. Set a breakpoint at linear address 0x42. Start the simulation. When GRUB shows up, choose the "NOS x86_84" kernel. The simulation will stop at the breakpoint. Finally, single step through the following function calls.

Please let me know if you have any questions regarding my code or if you find any errors.

Thanks for testing!
Cool. This is exactly the way I dream to see bug reports !
Yes, this is Bochs bug related to the fact you jumping from 0xb4 to other side of the universe to 0xffff800000000000.
The 'xor eax, eax' is already fetched incorrectly and ret of course too.
Will be fixed before 2.4 final.

Thanks,
Stanislav
User avatar
xenos
Member
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia
Contact:

Re: Strange behaviour in Bochs debugger / Library call

Post by xenos »

You're welcome! I'm already looking forward to the final release :)
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
stlw
Member
Member
Posts: 357
Joined: Fri Apr 04, 2008 6:43 am
Contact:

Re: Strange behaviour in Bochs debugger / Library call

Post by stlw »

XenOS wrote:You're welcome! I'm already looking forward to the final release :)
You could start playing with CVS :)

Stanislav
User avatar
xenos
Member
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia
Contact:

Re: Strange behaviour in Bochs debugger / Library call

Post by xenos »

Great, now it works, thanks! :D

Code: Select all

(0).[211170914] [0x03fdf042] 002b:0000000000000042 (unk. ctxt): mov rax, 0x00000000000000b4 ; 48b8b400000000000000
(0).[211170915] [0x03fdf04c] 002b:000000000000004c (unk. ctxt): call rax                  ; ffd0
(0).[211170916] [0x03fdf0b4] 002b:00000000000000b4 (unk. ctxt): jmp qword ptr ds:[rip+0x00000000000001c6] ; ff25c6010000
(0).[211170917] [0x00104000] 002b:ffff800000000000 (unk. ctxt): xor eax, eax              ; 31c0
(0).[211170918] [0x00104002] 002b:ffff800000000002 (unk. ctxt): ret                       ; c3
(0).[211170919] [0x03fdf04e] 002b:000000000000004e (unk. ctxt): jmp .+0xfffffffffffffffe (0x000000000000004e) ; ebfe
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
Post Reply