Debugging the OS with VirtualBox

Posted: Sat Mar 20, 2021 2:00 am
by Bonfra
Until now I've debugged my OS with QEMU but, after discovering a weird bug in the boot process on a real computer, I started testing in other VMs as well. I tried to use the gdb stub with VMware but couldn't get it to work, so I decided to test with VirtualBox and its integrated debugger. I followed this page of the wiki to debug the bootloader code: I placed an infinite loop at the start of the code and stopped the execution from the integrated command line. Then I increased the value in RIP and started single-stepping with the `p` command. The first part works great and I manage to set RIP to the correct instruction, but whenever I single-step, execution ends up at a completely wrong address in memory and hangs there.
Am I doing something wrong or is VirtualBox bugged?
Is there a way to debug properly with VirtualBox?
Is there a way to configure QEMU to behave exactly like VirtualBox so that I can use GDB?
These are my questions.

PS:
VirtualBox hangs in what I think is the same spot where my test hardware hangs (judging from the printed text).

Re: Debugging the OS with VirtualBox

Posted: Sat Mar 20, 2021 4:36 am
by iansjack
Rather than using an infinite loop, and then changing the ip (which is potentially error prone) would it not be better to set a breakpoint at the address that you want to start single-stepping from?

Potential causes of errors are an assumption that uninitialised RAM is zeroed and assumptions about the value of segment registers. In both cases, unless the items are specifically set they may differ between different real and virtual machines.
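
A minimal NASM sketch of that advice, assuming a boot sector loaded at linear address 0x7C00: set every segment register and the stack explicitly, and clear any buffer you rely on being zero. `SCRATCH` and `SCRATCH_SIZE` are hypothetical placeholders, not taken from the code under discussion, and a real FAT volume boot record would also carry the jump and BPB at the start.

```nasm
bits 16
org 0x7C00

SCRATCH      equ 0x0500         ; hypothetical buffer in free low memory
SCRATCH_SIZE equ 0x0200         ; hypothetical size in bytes

start:
    jmp 0x0000:.set_cs          ; far jump: do not assume the firmware used CS = 0
.set_cs:
    cli                         ; keep interrupts off while SS:SP is being changed
    xor ax, ax
    mov ds, ax                  ; DS = 0
    mov es, ax                  ; ES = 0
    mov ss, ax                  ; SS = 0
    mov sp, 0x7C00              ; stack grows down from just below the boot sector
    sti
    cld                         ; string instructions move upwards

    ; Zero the buffer instead of assuming that RAM starts out as zero.
    mov di, SCRATCH
    mov cx, SCRATCH_SIZE
    xor al, al
    rep stosb                   ; store AL at ES:DI, CX times

    jmp $                       ; placeholder for the rest of the boot code

times 510 - ($ - $$) db 0       ; pad to 510 bytes
dw 0xAA55                       ; boot signature
```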

Re: Debugging the OS with VirtualBox

Posted: Sat Mar 20, 2021 7:56 am
by Bonfra
iansjack wrote:Rather than using an infinite loop, and then changing the ip (which is potentially error prone) would it not be better to set a breakpoint at the address that you want to start single-stepping from?
Absolutely, but it doesn't work for me: the virtual machine is not started yet when the debugger window opens, so I can type `bp 0x7C00` before anything runs, yet when I hit play the breakpoint I just set is never triggered.
iansjack wrote: Potential causes of errors are an assumption that uninitialised RAM is zeroed and assumptions about the value of segment registers.
I'm pretty sure about the segment registers since they are all (except cs) set to zero by the MBR before jumping to the VBR (which is the one that causes the problem, by the way), but about the assumption of zeroed memory I'm not certain... May I ask you to take a look at the code? This is the main asm file and this is the file that contains all the code for file loading with FAT16. I'm really tight on memory since I tried to squeeze everything into the 512-byte limit, so it's a bit messy.

Re: Debugging the OS with VirtualBox

Posted: Sat Mar 20, 2021 8:37 am
by sj95126
I've found that Bochs turns out to be the most suited for debugging problems in my bootsect (like yours, goes from real mode to long mode). It's not the easiest to use, and has some bugs (like identifying a 64-bit TSS as 32-bit) but it has some handy tricks like magic breakpoint (xchg bx,bx) that help out a lot.
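
For anyone who hasn't used it, a small sketch of the magic breakpoint. It assumes a Bochs build with the internal debugger and the line `magic_break: enabled=1` in the bochsrc; the macro name `BOCHS_BREAK` is just an illustrative choice.

```nasm
bits 16

; xchg bx, bx does nothing on real hardware or in other emulators,
; but Bochs drops into its debugger when it executes this instruction.
%macro BOCHS_BREAK 0
    xchg bx, bx
%endmacro

start:
    BOCHS_BREAK                 ; execution stops here; single-step from this point
    jmp $                       ; placeholder for the code being debugged
```

Unlike the `jmp $` trick, nothing has to be patched in the instruction pointer before stepping.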

Re: Debugging the OS with VirtualBox

Posted: Sat Mar 20, 2021 11:18 am
by Octocontrabass
Bonfra wrote:The first part works great and I manage to set RIP to the correct instruction, but whenever I single-step, execution ends up at a completely wrong address in memory and hangs there.
Is that address somewhere in the BIOS ROM? Does the instruction right before the jump have a 32-bit memory operand? QEMU doesn't emulate segment limits, so an instruction that works fine in QEMU could be causing an exception in VMware (and everywhere else).
Bonfra wrote:I'm pretty sure about the segment registers since they are all (except cs) set to zero by the MBR before jumping to the VBR
If you boot from USB, the firmware might completely skip the MBR and run the VBR directly.

Re: Debugging the OS with VirtualBox

Posted: Sat Mar 20, 2021 12:45 pm
by feryno
sj95126: a 32-bit TSS in legacy mode and a 64-bit TSS in long mode have the same type values (0x9 = available, 0xB = busy), so does Bochs report a 64-bit TSS when you run code in legacy mode, or a 32-bit TSS while running in long mode?

Re: Debugging the OS with VirtualBox

Posted: Sat Mar 20, 2021 1:24 pm
by sj95126
feryno wrote:sj95126: a 32-bit TSS in legacy mode and a 64-bit TSS in long mode have the same type values (0x9 = available, 0xB = busy), so does Bochs report a 64-bit TSS when you run code in legacy mode, or a 32-bit TSS while running in long mode?
It reports a 32-bit TSS when running in long mode. It's a known bug; they hadn't fixed it the last time I looked.

Code: Select all

<bochs:5> info gdt 1
Global Descriptor Table (base=0xffff800000007e20, limit=55):
GDT[0x0008]=Code segment, base=0x00000000, limit=0x00000000, Execute-Only, Non-Conforming, Accessed, 64-bit
<bochs:6> info gdt 5
Global Descriptor Table (base=0xffff800000007e20, limit=55):
GDT[0x0028]=32-Bit TSS (Busy) at 0x01fec000, length 0x00067

Re: Debugging the OS with VirtualBox

Posted: Sat Mar 20, 2021 3:31 pm
by Bonfra
sj95126 wrote:I've found that Bochs turns out to be the most suited for debugging problems in my bootsect [...] it has some handy tricks like magic breakpoint (xchg bx,bx) that help out a lot.
Sounds interesting, tomorrow I'll download it to make some tests. Thanks for the advice
Octocontrabass wrote: Is that address somewhere in the BIOS ROM?
Not really, it gets in an infinite loop around address f000:000017ae. I don't use that chunk of memory anywhere in that code or, at most, it could be part of the stack, since I placed the stack top at 0x7C00.
Octocontrabass wrote: If you boot from USB, the firmware might completely skip the MBR and run the VBR directly.
I boot from HDD; anyway, I placed a little print statement in the MBR, so I'm sure it is executed.
Octocontrabass wrote: Does the instruction right before the jump have a 32-bit memory operand?
If you mean the `jmp $` instruction used to trigger VirtualBox, it is the first instruction after the BPB, so I'll just post the bytes up to and including that jump instruction:

Code: Select all

eb58 906d 6b66 732e 6661 7400 0204 0400
0200 0200 00f8 4000 2000 4000 0000 0000
0000 0100 8000 2908 9997 9142 6f6e 734f
5320 2020 2020 4641 5431 3620 2020 0e1f
be5b 7cac 22c0 740b 56b4 0ebb 0700 cd10
5eeb f032 e4cd 16cd 19eb ebfe
ebfe being the `jmp $`

Re: Debugging the OS with VirtualBox

Posted: Sat Mar 20, 2021 6:36 pm
by Octocontrabass
Bonfra wrote:Not really, it gets in an infinite loop around address f000:000017ae.
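That address is inside the BIOS ROM: f000:17ae is linear address 0xF17AE, and the BIOS ROM is mapped at the top of the first MiB (0xF0000-0xFFFFF).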
Bonfra wrote:If you mean the `jmp $` instruction used to trigger VirtualBox, it is the first instruction after the BPB, so I'll just post the bytes up to and including that jump instruction
No, I mean the instruction that causes the jump into the BIOS ROM.

Re: Debugging the OS with VirtualBox

Posted: Sun Mar 21, 2021 2:00 am
by Bonfra
Octocontrabass wrote: No, I mean the instruction that causes the jump into the BIOS ROM.
It is the first instruction of a function:

Code: Select all

    cmp ax, -1
    je .done ; file not found

.loadFilePre:
    call PrepareFile

[...]

PrepareFile:
    ; get starting cluster
    mov ax, word [dword edi + 0x20000 + 0x001A]   ; retrieve cluster from root entry <--- this is the line that, when executed, jumps to the BIOS ROM
This piece of code assumes that the root table is already loaded in memory (by this function).

Re: Debugging the OS with VirtualBox

Posted: Sun Mar 21, 2021 7:22 pm
by Octocontrabass
Bonfra wrote:

Code: Select all

mov ax, word [dword edi + 0x20000 + 0x001A]
Does EDI have a value between 0xFFFDFFE6 and 0xFFFEFFE4? If not, EDI + 0x20000 + 0x001A will be above the segment limit (which is 0xFFFF in real mode) and cause #GP. QEMU doesn't emulate segment limits, so it won't cause #GP in QEMU.

You'll need to either move the buffer below your segment limit or choose a segment base that puts the buffer within the limit.
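
A rough sketch of the first option, under the assumption that DS is 0 and that there is enough free room for the buffer below 0x10000. `ROOT_BUFFER`, `DIR_FIRST_CLUSTER` and `GetFirstCluster` are hypothetical names, and 0x0500 is only an example address. With plain 16-bit addressing the offset stays within the 0xFFFF limit, so no 32-bit address is needed.

```nasm
bits 16

ROOT_BUFFER       equ 0x0500    ; hypothetical: any free area below 0x10000 that is big enough
DIR_FIRST_CLUSTER equ 0x001A    ; offset of the first-cluster word in a FAT directory entry

; In:  DS = 0, DI = offset of the directory entry inside the buffer
; Out: AX = first cluster of the file
GetFirstCluster:
    mov ax, [di + ROOT_BUFFER + DIR_FIRST_CLUSTER]  ; 16-bit offset, DS by default
    ret
```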

Re: Debugging the OS with VirtualBox

Posted: Mon Mar 22, 2021 1:15 am
by Bonfra
Octocontrabass wrote: Does EDI have a value between 0xFFFDFFE6 and 0xFFFEFFE4? If not, EDI + 0x20000 + 0x001A will be above the segment limit (which is 0xFFFF in real mode) and cause #GP. QEMU doesn't emulate segment limits, so it won't cause #GP in QEMU.
EDI contains the offset of the file's entry in the root directory, so in this specific case it is 0x20.
Octocontrabass wrote:You'll need to either move the buffer below your segment limit or choose a segment base that puts the buffer within the limit.
So if I move the buffer under 0x7C00, let's say I put it at 0x4600, it'll be ok?
Otherwise, to move the segment base, which register should I change?

Re: Debugging the OS with VirtualBox

Posted: Mon Mar 22, 2021 9:35 am
by Octocontrabass
Bonfra wrote:So if I move the buffer under 0x7C00, let's say I put it at 0x4600, it'll be ok?
How big is the buffer? If there's enough room for it, it'll work.
Bonfra wrote:Otherwise, to move the segment base, which register should I change?
DS, ES, FS, or GS.

Re: Debugging the OS with VirtualBox

Posted: Mon Mar 22, 2021 4:39 pm
by Bonfra
Octocontrabass wrote: How big is the buffer? If there's enough room for it, it'll work.
That memory area should store the root table of the FAT16 volume, so it can be really small or really big. When I planned the memory layout for the first MiB, I gave it a lot of space, but theoretically, if I avoid putting a lot of files in the root directory, it can be pretty small. If possible I'd like to leave it at 0x20000, but if tinkering with segments takes too much space and I can no longer fit the code in the boot sector, I'll move it to a lower address.
Octocontrabass wrote:
Bonfra wrote:Otherwise, to move the segment base, which register should I change?
DS, ES, FS, or GS.
I'm not very good at writing assembly... To translate that line to use segment addressing, should I write something like [es:edi]? Can you give me a hint?

Re: Debugging the OS with VirtualBox

Posted: Mon Mar 22, 2021 5:39 pm
by Octocontrabass
Bonfra wrote:That memory area should store the root table of the FAT16 volume, so it can be really small or really big. When I planned the memory layout for the first MiB, I gave it a lot of space, but theoretically, if I avoid putting a lot of files in the root directory, it can be pretty small. If possible I'd like to leave it at 0x20000, but if tinkering with segments takes too much space and I can no longer fit the code in the boot sector, I'll move it to a lower address
The root directory is almost always 0x4000 bytes for FAT16. The size is set when the volume is formatted and cannot be changed by adding or removing files, only by reformatting.
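
As a hedged illustration of where that size comes from: the BPB posted earlier in the thread has the bytes `00 02` at offset 17, i.e. 0x0200 = 512 root entries, and each directory entry is 32 bytes. The sketch below computes the number of root directory sectors from the BPB, assuming DS = 0 and the boot sector still at 0x7C00; the label and constant names are made up for the example.

```nasm
bits 16

BPB_BYTES_PER_SECTOR equ 0x7C00 + 11    ; word: bytes per sector
BPB_ROOT_ENTRIES     equ 0x7C00 + 17    ; word: number of root directory entries

; In:  DS = 0, boot sector (with its BPB) at 0x7C00
; Out: AX = number of sectors occupied by the root directory
; Clobbers CX, DX
RootDirSectors:
    mov ax, [BPB_ROOT_ENTRIES]          ; e.g. 512 entries
    mov cx, 32                          ; bytes per directory entry
    mul cx                              ; DX:AX = size of the root directory in bytes
    mov cx, [BPB_BYTES_PER_SECTOR]      ; e.g. 512
    add ax, cx
    adc dx, 0
    sub ax, 1
    sbb dx, 0                           ; DX:AX += bytes per sector - 1, to round up
    div cx                              ; AX = sectors
    ret
```

With 512 entries and 512-byte sectors this gives 32 sectors, which is the 0x4000 bytes mentioned above. A buffer of that size placed at 0x4600 would reach 0x8600 and overlap the boot sector at 0x7C00.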
Bonfra wrote:Can you give me a hint?
Pick a segment register, set the segment register to 0x2000, then use that segment register when you access the buffer.

If the segment you choose is the default segment for the instruction that accesses the buffer, you don't need an override:

Code: Select all

mov ax, 0x2000
mov ds, ax
mov ax, [di + 0x1A] ; uses DS by default
However, if the segment you choose is not the default, you must use an override:

Code: Select all

mov ax, 0x2000
mov ds, ax
mov ax, [ds:bp + 0x1A] ; uses SS by default, override to use DS instead
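
Applied to the line from `PrepareFile`, this could look roughly like the sketch below. It assumes the entry offset (0x20 in your case) is in DI rather than EDI, and it uses ES with an override so that DS can stay at 0 for the rest of the code; `ROOT_SEGMENT` is a name invented for the example.

```nasm
bits 16

ROOT_SEGMENT equ 0x2000             ; 0x2000 * 16 = linear address 0x20000

PrepareFile:
    push es                         ; keep the caller's ES
    mov ax, ROOT_SEGMENT
    mov es, ax                      ; ES base = 0x20000
    mov ax, [es:di + 0x001A]        ; first cluster of the entry at offset DI
    pop es
    ret
```

The offset used in the access is now DI + 0x1A, well below the 0xFFFF limit, so it should behave the same in QEMU, VirtualBox and on real hardware.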