General Protection Fault in VirtualBox Only
- Geometrian
- Member
- Posts: 77
- Joined: Tue Nov 20, 2012 4:45 pm
- Contact:
General Protection Fault in VirtualBox Only
Hi,
I am getting a GPF when running in VirtualBox. It happens basically immediately (i.e. after there's a GDT and IDT, it faults). I can check this by enabling/disabling interrupts at various times and hanging.
However, the problem does not happen at all in Bochs. In fact, as far as I can tell, Bochs isn't reporting any problem of any kind.
What could be causing this?
Thanks,
Last edited by Geometrian on Tue Dec 04, 2012 5:36 pm, edited 1 time in total.
- JackScott
- Member
- Posts: 1032
- Joined: Thu Dec 21, 2006 3:03 am
- Location: Hobart, Australia
- Mastodon: https://aus.social/@jackscottau
- GitHub: https://github.com/JackScottAU
- Contact:
Re: General Protection Fault in VirtualBox Only
Could you provide us with links to your source code, a disk image we could run, and (if possible; I'm not sure whether VirtualBox does this) a file of debugging output? That would make things much easier all around.
- xenos
- Member
- Posts: 1121
- Joined: Thu Aug 11, 2005 11:00 pm
- Libera.chat IRC: xenos1984
- Location: Tartu, Estonia
- Contact:
Re: General Protection Fault in VirtualBox Only
VirtualBox has only very limited debugging possibilities - have a look at http://www.virtualbox.org/manual/ch12.html for some help.
Maybe you could explain in more detail how you figured out when exactly the GPF happens? And as JackScott already suggested, provide some code, at least the part where the GPF seems to happen.
- Geometrian
- Member
- Posts: 77
- Joined: Tue Nov 20, 2012 4:45 pm
- Contact:
Re: General Protection Fault in VirtualBox Only
Hi,
My OS is now hosted on Google Code: http://code.google.com/p/ianmallett-moss/source/checkout. There are a lot of configuration scripts that are pretty specific to my machine, but the source is current and the prebuilt build/disk_img.bin exhibits the problem in VirtualBox, but not in Bochs. Most of the comments are fairly current, or what's happening is obvious in context.
I had tried to track the problem down by enabling and disabling interrupts. In the most extreme test, as soon as I entered the kernel, I disabled interrupts immediately, set up the GDT and IDT, then reenabled them; the kernel immediately jumps to the designated ISR, printing a nice debug message on the screen telling me it's a GPF. So, no, I wasn't able to pinpoint the problem--as soon as the IDT existed, it was being used.
I don't know what exactly happens when no IDT is available (the CPU is using the IVT instead? --which does nothing? --maybe?) so the problem might be happening previously.
Thanks!
Last edited by Geometrian on Wed Jan 31, 2024 5:35 pm, edited 1 time in total.
- xenos
- Member
- Posts: 1121
- Joined: Thu Aug 11, 2005 11:00 pm
- Libera.chat IRC: xenos1984
- Location: Tartu, Estonia
- Contact:
Re: General Protection Fault in VirtualBox Only
If you have a working GPF handler, you can use it to figure out the reason for the GPF, i.e., you can check the faulting instruction and the error code to see whether it was an internal / external event, an access violation, etc.
- Geometrian
- Member
- Posts: 77
- Joined: Tue Nov 20, 2012 4:45 pm
- Contact:
Re: General Protection Fault in VirtualBox Only
XenOS wrote: If you have a working GPF handler, you can use it to figure out the reason for the GPF, i.e., you can check the faulting instruction and the error code to see whether it was an internal / external event, an access violation, etc.
The error code is 5126 (decimal). Where can I find what that means?
- xenos
- Member
- Posts: 1121
- Joined: Thu Aug 11, 2005 11:00 pm
- Libera.chat IRC: xenos1984
- Location: Tartu, Estonia
- Contact:
Re: General Protection Fault in VirtualBox Only
Geometrian wrote: Where can I find what that means?
The Intel docs have a chapter devoted to interrupts - and explain the error code pretty well.
- Kazinsal
- Member
- Posts: 559
- Joined: Wed Jul 13, 2011 7:38 pm
- Libera.chat IRC: Kazinsal
- Location: Vancouver
- Contact:
Re: General Protection Fault in VirtualBox Only
Geometrian wrote: The error code is 5126 (decimal). Where can I find what that means?
GPF error codes are the selector index in which the fault occurred plus some extra information.
- Geometrian
- Member
- Posts: 77
- Joined: Tue Nov 20, 2012 4:45 pm
- Contact:
Re: General Protection Fault in VirtualBox Only
Blacklight wrote: GPF error codes are the selector index in which the fault occurred plus some extra information.
Hmmm, okay, so the error code is 5126 -> 0x1406 -> 0001010000000 11 0
-Internal exception (i.e. from the OS)
-Selector Index references a descriptor in the IDT
-Selector index is 0x0280 -> 640, though I think it might actually be LSB first: 0x0041-> 65
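For reference, the bit layout being discussed can be checked mechanically. The following is a minimal C sketch, not code from the thread: the names `selector_error_t`, `decode_selector_error`, `selector_error_is_plausible` and `check_thread_value` are made up for illustration. It splits a selector-style error code into the EXT, IDT and TI flags plus the 13-bit index, and runs the 5126 (0x1406) value reported above through it.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a selector-style exception error code (the format a GPF uses). */
typedef struct {
    unsigned ext;    /* bit 0: 1 = caused by an event external to the program */
    unsigned idt;    /* bit 1: 1 = the index refers to a gate in the IDT */
    unsigned ti;     /* bit 2: 1 = LDT, 0 = GDT (only meaningful when idt == 0) */
    unsigned index;  /* bits 3..15: descriptor or gate index */
} selector_error_t;

static selector_error_t decode_selector_error(uint32_t err_code)
{
    selector_error_t e;
    e.ext   = err_code & 1u;
    e.idt   = (err_code >> 1) & 1u;
    e.ti    = (err_code >> 2) & 1u;
    e.index = (err_code >> 3) & 0x1FFFu;
    return e;
}

/* Hypothetical sanity check: an IDT holds at most 256 gates (vectors 0..255). */
static int selector_error_is_plausible(selector_error_t e)
{
    return !(e.idt && e.index > 255u);
}

/* Worked example using the value from this thread. */
static void check_thread_value(void)
{
    selector_error_t e = decode_selector_error(5126u); /* 0x1406 */
    assert(e.ext == 0u);
    assert(e.idt == 1u);
    assert(e.index == 640u);
    assert(!selector_error_is_plausible(e));
}
```

With 5126 the IDT bit is set and the index comes out as 640, which is more gates than an IDT can hold, so the value itself looks suspect rather than the decoding. No byte swapping is involved: the bits are taken from the 32-bit value as it stands.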
Re: General Protection Fault in VirtualBox Only
Hi,
Geometrian wrote:
Blacklight wrote: GPF error codes are the selector index in which the fault occurred plus some extra information.
Hmmm, okay, so the error code is 5126 -> 0x1406 -> 0001010000000 11 0
-Internal exception (i.e. from the OS)
-Selector Index references a descriptor in the IDT
-Selector index is 0x0280 -> 640, though I think it might actually be LSB first: 0x0041-> 65
Erm, no. It's "little endian", which means the bytes on the stack would've been 0x06, 0x14, 0x00, 0x00; but it's still the 32-bit value 0x00001406, and you can't just swap one group of 4-bits with another in the hope it might make more sense.
Basically, the error code you got is impossible - either you got the wrong value from the stack, or the code you used to display it is buggy.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Geometrian
- Member
- Posts: 77
- Joined: Tue Nov 20, 2012 4:45 pm
- Contact:
Re: General Protection Fault in VirtualBox Only
Brendan wrote: Basically, the error code you got is impossible - either you got the wrong value from the stack, or the code you used to display it is buggy.
I checked the display code with some hardcoded values--it works perfectly, so it's likely the former.
The ISRs' code is taken from an example:
Code:
;Stub for an ISR which does NOT pass its own error code (adds a dummy errcode byte)
%macro ISR_NOERRCODE 1
    [GLOBAL isr%1]
    isr%1:
        cli             ;Disable interrupts
        push byte 0     ;Push a dummy error code
        push byte %1    ;Push the interrupt number
        jmp isr_common
%endmacro
;Stub for an ISR which passes its own error code
%macro ISR_ERRCODE 1
    [GLOBAL isr%1]
    isr%1:
        cli             ;Disable interrupts
        push byte %1    ;Push the interrupt number
        jmp isr_common
%endmacro
ISR_NOERRCODE 0
ISR_NOERRCODE 1
ISR_NOERRCODE 2
ISR_NOERRCODE 3
ISR_NOERRCODE 4
ISR_NOERRCODE 5
ISR_NOERRCODE 6
ISR_NOERRCODE 7
ISR_ERRCODE 8
ISR_NOERRCODE 9
ISR_ERRCODE 10
ISR_ERRCODE 11
ISR_ERRCODE 12
ISR_ERRCODE 13
ISR_ERRCODE 14
ISR_NOERRCODE 15
ISR_NOERRCODE 16
ISR_NOERRCODE 17
ISR_NOERRCODE 18
ISR_NOERRCODE 19
ISR_NOERRCODE 20
ISR_NOERRCODE 21
ISR_NOERRCODE 22
ISR_NOERRCODE 23
ISR_NOERRCODE 24
ISR_NOERRCODE 25
ISR_NOERRCODE 26
ISR_NOERRCODE 27
ISR_NOERRCODE 28
ISR_NOERRCODE 29
ISR_NOERRCODE 30
ISR_NOERRCODE 31
;This saves the processor state, sets up for kernel mode segments, calls the C-level fault handler, and finally restores the stack frame.
isr_common:
    pusha           ;Pushes edi,esi,ebp,esp,ebx,edx,ecx,eax
    mov ax, ds      ;Lower 16-bits of eax = ds.
    push eax        ;save the data segment descriptor
    mov ax, 0x10    ;Load the kernel data segment descriptor
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    call isr_handler
    ;jmp $
    pop ebx         ;Reload the original data segment descriptor
    mov ds, bx
    mov es, bx
    mov fs, bx
    mov gs, bx
    popa            ;Pops edi,esi,ebp...
    add esp, 8      ;Cleans up the pushed error code and pushed ISR number
    sti
    iret            ;Pops 5 things at once: CS, EIP, EFLAGS, SS, and ESP
Code:
typedef struct registers {
    uint32 ds; //Data segment selector
    uint32 edi, esi, ebp, esp, ebx, edx, ecx, eax; //Pushed by pusha
    uint32 int_no, err_code; //Interrupt number and error code (if applicable)
    uint32 eip, cs, eflags, useresp, ss; //Pushed by the processor automatically
} registers_t;
extern "C" void isr_handler(registers_t regs) {
    CONSOLE::Console::draw(5,5,"received interrupt: ");
    CONSOLE::Console::draw(5,6,get_interrupt_description(regs.int_no));
    CONSOLE::Console::draw(5,7,regs.err_code);
    CONSOLE::Console::draw(5,8,5126u); //just a test to demonstrate the console (it works)
    CONSOLE::Console::draw(5,9,"DONE");
}
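Because `isr_handler` takes `registers_t` by value, it only sees the right `err_code` if the words pushed by the stub and `isr_common` line up exactly with the struct's field order. The sketch below is an illustration under assumptions, not part of the posted kernel: it swaps the kernel's `uint32` for `uint32_t`, and `check_frame_layout`, `frame_int_no` and `frame_err_code` are hypothetical helper names. It spells out the byte offsets the handler depends on, counted from the last word pushed (`ds`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order as the posted struct, lowest stack address first. */
typedef struct registers {
    uint32_t ds;                                     /* pushed last by isr_common */
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax; /* pusha */
    uint32_t int_no, err_code;                       /* pushed by the stub / CPU */
    uint32_t eip, cs, eflags, useresp, ss;           /* pushed by the CPU */
} registers_t;

/* The offsets the push sequence has to reproduce for the handler to work. */
static void check_frame_layout(void)
{
    assert(offsetof(registers_t, ds) == 0);
    assert(offsetof(registers_t, edi) == 4);
    assert(offsetof(registers_t, eax) == 32);
    assert(offsetof(registers_t, int_no) == 36);
    assert(offsetof(registers_t, err_code) == 40);
    assert(offsetof(registers_t, eip) == 44);
    assert(sizeof(registers_t) == 64);
}

/* Read fields from a raw array of stack words, with words[0] being the saved ds. */
static uint32_t frame_int_no(const uint32_t *words)
{
    return words[offsetof(registers_t, int_no) / sizeof(uint32_t)];
}

static uint32_t frame_err_code(const uint32_t *words)
{
    return words[offsetof(registers_t, err_code) / sizeof(uint32_t)];
}
```

Two things are worth keeping in mind when comparing this against a debugger dump. The CPU only pushes `useresp` and `ss` when the interrupt causes a privilege level change, so in a kernel-only setup those two fields are whatever happened to be on the stack above `eflags`. And a by-value struct is argument storage that the compiled handler is free to modify, so the words `isr_common` pops afterwards are not guaranteed to be unchanged.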
Re: General Protection Fault in VirtualBox Only
Hi,
You're going to need to learn how to debug.
Start by getting hold of a decent emulator with a good debugger (e.g. Bochs with its inbuilt debugger enabled); and put a breakpoint or something (even just "jmp $") as the first instruction in those macros that create the interrupt stubs. Once you've done that the OS will stop when any interrupt occurs, and you can use the debugger to examine the raw data that the CPU put on the interrupt handler's stack (before any of your normal interrupt handling has a chance to mess any of it up).
This alone should tell you exactly what causes the first interrupt/exception.
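To make it concrete what "the raw data that the CPU put on the stack" looks like, here is a small C sketch. It assumes 32-bit protected mode and an exception taken without a privilege level change; `raw_exception_frame_t` and `read_raw_frame` are names invented for this example. It interprets the words found at ESP when execution is halted on the first instruction of a stub.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the CPU itself pushed, before any stub code has run. */
typedef struct {
    int      has_error_code;
    uint32_t error_code;   /* only valid when has_error_code is non-zero */
    uint32_t eip;          /* address of the faulting (or next) instruction */
    uint32_t cs;
    uint32_t eflags;
} raw_exception_frame_t;

/* esp points at the top of the stack as seen at the stub's first instruction.
 * For vectors that push an error code it sits at [esp], followed by EIP, CS
 * and EFLAGS; for the others the frame starts directly with EIP. */
static raw_exception_frame_t read_raw_frame(const uint32_t *esp, int has_error_code)
{
    raw_exception_frame_t f;
    assert(esp != NULL);
    f.has_error_code = has_error_code;
    f.error_code = 0;
    if (has_error_code) {
        f.error_code = *esp++;
    }
    f.eip    = esp[0];
    f.cs     = esp[1];
    f.eflags = esp[2];
    return f;
}
```

Reading the frame this way, directly from a memory dump at ESP, bypasses the stub, `isr_common` and the C handler entirely, so a mismatch between this value and what the handler prints points at the handling code rather than at the CPU or the emulator.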
Next; disable all optimisation and see if the problem goes away (hopefully it won't). Then you want to disassemble the code (especially for the "isr_handler()" function) to use as a reference; and single-step through the code one instruction at a time (from the start of the interrupt stub until you get all the way back to the IRET) with the debugger while checking *everything* contains what you think it should at each step. In general, the more complex a high level language and compiler is the more likely it is that it's doing something you wouldn't have expected.
Alternatively
Consider the following code:
Code:
void bork(int number);
void bar1(void);
void bar2(void);
void bar3(void);
void foo1(void) {
    bork(1);
}
void foo2(void) {
    bork(2);
}
void foo3(void) {
    bork(3);
}
void bork(int number) {
    switch(number) {
    case 1:
        bar1();
        break;
    case 2:
        bar2();
        break;
    case 3:
        bar3();
        break;
    }
}
void bar1(void) {
    // Special code specifically designed to handle the first case
}
void bar2(void) {
    // Special code specifically designed to handle the second case
}
void bar3(void) {
    // Special code specifically designed to handle the third case
}
Is this code brilliant, or a bad joke? Can you think of a way to "optimise out" the entirely stupid "switch(number) { ... }" and the entire "bork()" function?
Now think about what your stubs and "common" interrupt handler will actually need to do once you start having different behaviour for different exception handlers. For example, the debugging exception and breakpoint exception will eventually send something to a debugger (e.g. GDB); the general protection fault handler (and probably a few others) might send signals to the process that caused the problem; the page fault handler is going to have a whole pile of virtual memory management stuff banged into it (things like "copy on write", swap space support, memory mapped files, etc); the invalid opcode exception handler is going to have a lot of code to determine what the instruction was and emulate it (so that code designed for a more recent CPU still runs on older CPUs). The NMI, double fault and machine check exceptions are going to need some very special handling. The device not available exception might contain support for "delayed FPU/MMX/SSE state save" logic. Then you're probably going to have radically different interrupt handlers for things like the kernel API, and spurious IRQs, and the scheduler's timer, and maybe IPIs (Inter-Processor Interrupts sent from other CPUs), and maybe performance monitoring, and maybe thermal status.
Also note that some of those interrupt stubs will want to use "trap gates", and some will want "interrupt gates" (and some might want "task gates"). The interrupt stub for the page fault handler should save CR2 as soon as possible (in case a second page fault occurs and trashes the first page fault's CR2). When you start looking at adding support for debuggers you'll realise that half of them (those corresponding to "fault class" exceptions) will want to clear the RF flag to avoid issues. So...
Is the idea of having a "common interrupt handler" any less idiotic than the "bork()" function in my example above?
There will be no common code in your "common interrupt handler", except for maybe some kernel panic code that is nowhere near the critical path. Unless you're only writing a tutorial (and therefore don't have a reason to care if the code is sane or not as long as it helps explain things); there's no point having a "common interrupt handler" for anything other than actual IRQs. Instead, just have a "kernel panic" function that anything (including exception handlers) can call; and keep all of the very different exception handlers (and other interrupts) separate. Note: if your kernel detects something "impossible" (for a simple example, maybe it's an attempt to release a re-entrancy lock that hasn't been acquired) then it could call the "kernel panic" function even though no exception (or interrupt) was involved at all.
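As a rough illustration of that layout, the following C sketch keeps each exception handler separate and gives them one shared "kernel panic" routine. Everything in it is hypothetical: `kernel_panic`, `general_protection_handler`, `page_fault_handler` and `release_lock` are invented names, and the panic routine only records a message in a buffer so the sketch can run in user space, where a real kernel would print diagnostics and halt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static char panic_message[128];
static int  panic_count;

/* Shared by anything that detects an unrecoverable state. */
static void kernel_panic(const char *reason)
{
    assert(reason != NULL);
    snprintf(panic_message, sizeof panic_message, "PANIC: %s", reason);
    panic_count++;
}

/* Each exception has its own handler with its own arguments and logic. */
static void general_protection_handler(unsigned error_code)
{
    char reason[64];
    snprintf(reason, sizeof reason,
             "general protection fault, error code 0x%X", error_code);
    kernel_panic(reason);
}

static void page_fault_handler(unsigned long fault_address, unsigned error_code)
{
    char reason[64];
    /* A real handler would try copy-on-write, swap, etc. before giving up. */
    snprintf(reason, sizeof reason,
             "page fault at 0x%lX, error code 0x%X", fault_address, error_code);
    kernel_panic(reason);
}

/* Code that is not an exception handler can use the same routine. */
static int lock_held;

static void release_lock(void)
{
    if (!lock_held) {
        kernel_panic("released a lock that was not held");
        return;
    }
    lock_held = 0;
}
```

The only shared piece is `kernel_panic`, and it sits off the normal path; the handlers themselves have different signatures because they need different inputs (the page fault handler, for instance, wants the faulting address from CR2).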
As a general rule of thumb; it's a waste of time fixing code if that code needs to be redesigned/rewritten anyway. Is your exception handling code worth fixing?
Cheers,
Brendan
- linguofreak
- Member
- Posts: 510
- Joined: Wed Mar 09, 2011 3:55 am
Re: General Protection Fault in VirtualBox Only
Brendan wrote: Hi,
You're going to need to learn how to debug.
Start by getting hold of a decent emulator with a good debugger (e.g. Bochs with its inbuilt debugger enabled);
The OP has already stated that the problem doesn't show up in Bochs. The problem only occurs in VirtualBox (see the thread title), so he's limited to the facilities available there in diagnosing it.
Re: General Protection Fault in VirtualBox Only
Hi,
linguofreak wrote:
Brendan wrote: Start by getting hold of a decent emulator with a good debugger (e.g. Bochs with its inbuilt debugger enabled);
The OP has already stated that the problem doesn't show up in Bochs. The problem only occurs in VirtualBox (see the thread title), so he's limited to the facilities available there in diagnosing it.
VirtualBox has its own inbuilt debugger, and also lets you attach GDB to it.
Cheers,
Brendan
- Geometrian
- Member
- Posts: 77
- Joined: Tue Nov 20, 2012 4:45 pm
- Contact:
Re: General Protection Fault in VirtualBox Only
Brendan wrote: Consider the following code:
...
Is this code brilliant, or a bad joke? Can you think of a way to "optimise out" the entirely stupid "switch(number) { ... }" and the entire "bork()" function?
Now think about what your stubs and "common" interrupt handler will actually need to do once you start having different behaviour for different exception handlers. . . . Is the idea of having a "common interrupt handler" any less idiotic than the "bork()" function in my example above?
I realize that a common exception handler isn't necessarily a good design in a mature OS, but I would like to point out that I don't have a file system, protected mode disk IO, processes, a full C library, dynamic memory, let alone a GUI. I can barely get keyboard input. It does not make sense to spend time overengineering a collection of industry-quality individualized interrupt handlers when I can't even get a trivial example to work properly!
And while I do grasp that it's important to not misdesign something from the beginning, you'll notice that there aren't any functions analogous to your bar1(), bar2() and bar3()--I am deliberately redirecting all interrupts to the same place because I want to print the same kind of diagnostic information about each. And while I got that idea from a tutorial, I think as a design principle, at this stage, it's fundamentally sound and actually ideal for learning what's happening.
Brendan wrote: Start by getting hold of a decent emulator with a good debugger (e.g. Bochs with its inbuilt debugger enabled);
linguofreak wrote: The OP has already stated that the problem doesn't show up in Bochs. The problem only occurs in VirtualBox (see the thread title)
Thank you.
Brendan wrote: . . . and put a breakpoint or something (even just "jmp $") as the first instruction in those macros that create the interrupt stubs. Once you've done that the OS will stop when any interrupt occurs, and you can use the debugger to examine the raw data that the CPU put on the interrupt handler's stack (before any of your normal interrupt handling has a chance to mess any of it up).
This is a good idea. I had hoped someone might immediately spot the problem a priori, but if no one has any other suggestions, I will try attacking VirtualBox's debugging facilities again.
Thanks,