triple fault after context switch

szhou42
Member
Posts: 67
Joined: Thu Apr 28, 2016 12:40 pm

Post by szhou42 »

After my kernel finished setting up memory management and the file system, I decided to load a user-mode entry program (a flat binary, not ELF) from the hard disk and run it in user mode.

Here is how I do it in detail.
In kernel mode, I load a program called "userentry.bin", which is basically a program that loops forever:

Code: Select all

load_program("userentry.bin");
In load_program:

Code: Select all

    vfs_node_t * program = file_open(filename, 0);
    if(!program) {
        printf("Fail to open %s, does it even exist?\n", filename);
        return;
    }
    uint32_t size = vfs_get_file_size(program);
    char * program_code = kcalloc(size, 1);
    vfs_read(program, 0, size, program_code);
    // create a process, all the context registers are zeroed out by calloc(), set eip to the start of program code, also set IF flag 
    pcb_t * p1 = kcalloc(sizeof(pcb_t), 1);
    memcpy(p1, current_process, sizeof(pcb_t));
    p1->regs.eip = (uint32_t)program_code;
    p1->regs.eflags = 0x200; // enable interrupt
    // Insert the process into process list
    p1->self = list_insert_front(process_list, p1);
    // call yield via a system call (we are actually still in kernel mode here);
    // one asm statement with an "a" input keeps the compiler from touching eax between the load and the int
    asm volatile("int $0x80" : : "a"(1) : "memory");
The yield system call then calls the scheduler function, which picks the next process to run; that has to be the process created in load_program().
The scheduler then does a context switch to the program, and the program does start running.

However, a triple fault occurs after about 1 second.
I've tried setting breakpoints on the interrupt and exception handlers, but the kernel always triple faults before any of them is reached.
I also tried inserting an int 0x80 instruction into the program; it triple faults immediately and resets the machine.

I understand that my way of loading a process is a bit unusual, because I did not even create a separate address space for the new program, but I'm just experimenting with running a few programs concurrently in user mode.

I suspect that the context I manually created for the process leads to the triple fault.
Can someone explain what's going on?
My OS code is here for reference: https://github.com/szhou42/osdev/tree/master/src

Any help would be appreciated! Thanks!
iansjack
Member
Posts: 4706
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: triple fault after context switch

Post by iansjack »

What have you done in the way of debugging? Are you running under a debugger? Do you have exception handlers that will stop execution when there is an exception and display the contents of important registers? Are you using paging?

You really need to treat this as a chance to hone your debugging skills so that you at least know where the program is failing and what the contents of registers and memory are at that point. It should then be fairly simple to find the error. A likely cause is a corrupted stack.
Ch4ozz
Member
Posts: 170
Joined: Mon Jul 18, 2016 2:46 pm
Libera.chat IRC: esi

Re: triple fault after context switch

Post by Ch4ozz »

I don't see you allocating memory for the stack at all.
It seems you are currently using the stack of the kernel thread.
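
A minimal sketch of what giving the new process its own stack could look like. It is written as a hosted C program so it can be compiled and run outside the kernel: `calloc` stands in for the kernel's `kcalloc`, and `regs_sketch_t`, `USER_STACK_SIZE` and `context_init_stack` are hypothetical names, not taken from the posted code. In a real 32-bit kernel the register fields would be `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define USER_STACK_SIZE 4096u  /* hypothetical: one page of stack per process */

/* Hypothetical subset of a saved register context. */
typedef struct {
    uintptr_t eip;     /* where the process starts executing */
    uintptr_t esp;     /* initial stack pointer */
    uint32_t  eflags;  /* initial flags */
} regs_sketch_t;

/*
 * Allocate a private stack and point the context at its top.
 * Returns the stack base (so it can be freed later), or NULL on failure.
 */
void *context_init_stack(regs_sketch_t *regs, uintptr_t entry)
{
    assert(regs != NULL);

    unsigned char *stack = calloc(USER_STACK_SIZE, 1);  /* stand-in for kcalloc */
    if (stack == NULL)
        return NULL;

    /* x86 stacks grow downward, so esp starts at the top of the
     * allocation, rounded down to a 16-byte boundary. */
    uintptr_t top = (uintptr_t)stack + USER_STACK_SIZE;
    regs->esp    = top & ~(uintptr_t)0xF;
    regs->eip    = entry;
    regs->eflags = 0x200;  /* IF set, as in the original post */
    return stack;
}
```

Even a program that never pushes anything itself needs a valid esp once interrupts are enabled, because the CPU pushes the return frame on the current stack when an interrupt arrives without a privilege change.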

Re: triple fault after context switch

Post by szhou42 »

iansjack wrote:What have you done in the way of debugging? Are you running under a debugger? Do you have exception handlers that will stop execution when there is an exception and display the contents of important registers? Are you using paging?

You really need to treat this as a chance to hone your debugging skills so that you at least know where the program is failing and what the contents of registers and memory are at that point. It should then be fairly simple to find the error. A likely cause is a corrupted stack.
Thanks for your quick reply! But please don't assume we post here because we're too lazy to debug ourselves. My situation is a bit different: the whole machine resets before any useful information is printed, so it's not as easy as single-stepping to find exactly where the triple fault happens. Yes, I am using paging.

I've been debugging with gdb (in both source and machine mode) and stepped through every assembly instruction until the loaded program was actually running.

Code: Select all

loop:
xchg bx,bx
xchg bx,bx
xchg bx,bx
xchg bx,bx
xchg bx,bx
jmp loop
What I've found is that if I disable interrupts, nothing happens; the program just loops forever. However, if interrupts are enabled, a triple fault occurs within about 1 second.
I've tried setting breakpoints around my exception and interrupt handlers, but the program always seems to triple fault before any exception or interrupt handler is reached. I am still sure it has something to do with interrupt or exception handling, otherwise the program wouldn't run normally with the IF flag cleared.

I tried inserting an int 0x80 in the program to trigger an interrupt deliberately, to see whether something is wrong with interrupt delivery. The int 0x80 does immediately cause a triple fault, so I now suspect that the manually created context (registers) prevents the CPU from delivering interrupts and exceptions. I will look into some documentation about this now.

Re: triple fault after context switch

Post by szhou42 »

Ch4ozz wrote:I dont see you allocating memory for the stack at all.
You are currently using the stack of the kernel thread it seems.
Hi Ch4ozz, you're right. But the program does not use the stack at all; I'm just trying out loading and running a program.
onlyonemac
Member
Posts: 1146
Joined: Sat Mar 01, 2014 2:59 pm

Re: triple fault after context switch

Post by onlyonemac »

szhou42 wrote:What I've found is that, if I disable interrupt, nothing happen, the program just loops infinitely. However, if interrupt is enabled, triple fault will occur in 1 second.
I've tried to set breakpoints around my exception and interrupt handler, but the program seems to always cause a triple fault before any exception or interrupt handler. but I am still sure that it has something to do with exception, otherwise the program wouldn't have run normally with IF flag disabled.
What's happening is that the timer interrupt is coming along and somehow the CPU can't find the interrupt handler for it, so the CPU tries to fault but can't find the fault handler, so the CPU tries to double-fault but can't find the double-fault handler, so the CPU triple-faults. Most likely, the CPL has been messed up somewhere and your interrupt handler can't be called from whatever CPL your "userspace binary" is running in.
When you start writing an OS you do the minimum possible to get the x86 processor in a usable state, then you try to get as far away from it as possible.

Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing

Re: triple fault after context switch

Post by iansjack »

szhou42 wrote:Thanks for your quick reply! but please just don't assume we post here because we're too lazy to debug ourselves. My situation is a bit different, it resets the whole machine before any useful info is printed out, and it's not as easy as single stepped and find exactly where it cause the triple fault. yes i am using paging,
Rest assured that I assume nothing. This is why I need to ask when you give no details of what debugging steps you have taken.

Now that you have given that information, I would suggest that you write handlers for all exceptions. At their simplest they will just halt the processor. Then you can use gdb to determine exactly where the program is faulting and which exception it is throwing; this should give you a good idea of what the problem is. The triple fault will never occur as you will catch the very first exception.
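
As an illustration of the "handlers for all exceptions" suggestion, here is a sketch of how the 32 exception vectors of a 32-bit protected-mode IDT could all be pointed at one halting stub. The descriptor layout is the standard 8-byte gate format; the names `idt_gate`, `idt_make_gate`, `idt_fill_exceptions` and `GATE_INT32_KERNEL` are hypothetical, and the stub address and code selector are passed in as plain numbers so the encoding can be checked in an ordinary hosted build. The `packed` attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit protected-mode IDT gate descriptor (8 bytes). */
struct idt_gate {
    uint16_t offset_low;   /* handler address, bits 0..15 */
    uint16_t selector;     /* code segment selector the handler runs in */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* present bit, DPL, gate type */
    uint16_t offset_high;  /* handler address, bits 16..31 */
} __attribute__((packed));

static_assert(sizeof(struct idt_gate) == 8, "IDT gate must be 8 bytes");

/* Present, DPL 0, 32-bit interrupt gate. */
#define GATE_INT32_KERNEL 0x8Eu

/* Encode one gate from a handler address, selector and attribute byte. */
struct idt_gate idt_make_gate(uint32_t handler, uint16_t selector,
                              uint8_t type_attr)
{
    struct idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.zero        = 0;
    g.type_attr   = type_attr;
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}

/* Point CPU exception vectors 0..31 at the same halting stub. */
void idt_fill_exceptions(struct idt_gate idt[256], uint32_t halt_stub,
                         uint16_t kernel_cs)
{
    for (int vector = 0; vector < 32; vector++)
        idt[vector] = idt_make_gate(halt_stub, kernel_cs, GATE_INT32_KERNEL);
}
```

The stub itself would typically be a few instructions of assembly (`cli`, then `hlt` in a loop), so a breakpoint on it shows which exception arrived first. A gate that must be reachable through a software `int` from ring 3 needs DPL 3 (attribute byte 0xEE instead of 0x8E).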

Re: triple fault after context switch

Post by szhou42 »

iansjack wrote:Rest assured that I assume nothing. This is why I need to ask when you give no details of what debugging steps you have taken.

Now that you have given that information, I would suggest that you write handlers for all exceptions. At their simplest they will just halt the processor. Then you can use gdb to determine exactly where the program is faulting and which exception it is throwing; this should give you a good idea of what the problem is. The triple fault will never occur as you will catch the very first exception.
This is weird: I've written a handler for every exception and IRQ, and I've set breakpoints on each and every one of them, but none of them are hit.
But anyway, you are right that it's caused by stack corruption: esp was accidentally set to zero while the user program was running. So I guess it triple faults because the CPU is not able to push any data onto the stack.
Thank you very much, I've learned more about how to debug a triple fault now :).
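
Why no handler breakpoint is ever hit can be illustrated with a deliberately simplified toy model of interrupt delivery. It assumes no privilege-level change (so the CPU pushes the frame on the current stack), no task gate for the double fault, and that memory just below address 0 (where esp wraps to) is not mapped writable. All names here (`stack_region`, `can_push_frame`, `deliver_timer_irq`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum outcome {
    IRQ_HANDLER_RUNS,           /* frame pushed, IRQ handler entered */
    FAULT_HANDLER_RUNS,         /* IRQ push failed, fault handler entered */
    DOUBLE_FAULT_HANDLER_RUNS,  /* fault push failed, #DF handler entered */
    TRIPLE_FAULT                /* #DF push failed too: CPU shuts down/resets */
};

/* The only writable stack memory in this model: [base, base + size). */
struct stack_region {
    uint32_t base;
    uint32_t size;
};

/* Can `bytes` be pushed starting from `esp` without leaving the region? */
static bool can_push_frame(const struct stack_region *s, uint32_t esp,
                           uint32_t bytes)
{
    uint32_t new_esp = esp - bytes;  /* wraps modulo 2^32, like the CPU */
    return bytes <= s->size &&
           new_esp >= s->base &&
           new_esp - s->base <= s->size - bytes;
}

/* Model what happens when a timer IRQ arrives with the given esp. */
enum outcome deliver_timer_irq(const struct stack_region *s, uint32_t esp)
{
    /* An IRQ pushes EFLAGS, CS and EIP (12 bytes); the page fault and the
     * double fault additionally push an error code (16 bytes). A failed
     * push does not change esp, so every stage retries on the same stack. */
    static const uint32_t frame_bytes[3] = { 12, 16, 16 };

    for (int stage = 0; stage < 3; stage++) {
        if (can_push_frame(s, esp, frame_bytes[stage]))
            return (enum outcome)stage;
    }
    return TRIPLE_FAULT;
}
```

With esp equal to 0 the first push already lands outside any valid stack, and the fault and double-fault deliveries fail the same way, so the machine resets before the first instruction of any handler runs. That matches a breakpoint placed on the handlers never being reached.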

Re: triple fault after context switch

Post by szhou42 »

onlyonemac wrote:What's happening is that the timer interrupt is coming along and somehow the CPU can't find the interrupt handler for it, so the CPU tries to fault but can't find the fault handler, so the CPU tries to double-fault but can't find the double-fault handler, so the CPU triple-faults. Most likely, the CPL has been messed up somewhere and your interrupt handler can't be called from whatever CPL your "userspace binary" is running in.
Hi onlyonemac, I fixed the bug eventually. You're right that the CPU triple faults during the timer IRQ, but it's actually due to esp accidentally being set to 0 rather than the CPU being unable to find the handler. Thanks for your help :D !

Re: triple fault after context switch

Post by onlyonemac »

szhou42 wrote:Hi onlyonemac, I fixed the bug eventually. you're right that the CPU triple fault during the timer irq, but it's actually due to esp being set to 0 accidentally instead of CPU unable to find the handler. Thanks for your help :D !
I see. Yes, that would also prevent the CPU from being able to execute the interrupt handler and subsequent exception handlers.
Last edited by onlyonemac on Wed Aug 10, 2016 8:38 am, edited 1 time in total.

Re: triple fault after context switch

Post by iansjack »

One of the joys of 64-bit mode is that you can set different stacks for different interrupts/exceptions, so you can guarantee that there is always a valid stack for exceptions even if there isn't a change in privilege level.
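
For illustration, a sketch of where that per-vector stack choice lives in long mode: each 16-byte IDT gate has a 3-bit IST field that selects one of the seven Interrupt Stack Table pointers in the TSS (0 means "no IST switch"). The names `idt_gate64` and `idt_make_gate64` are hypothetical, the `packed` attribute assumes GCC or Clang, and the TSS setup that actually provides the IST stacks is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit (long mode) IDT gate descriptor (16 bytes). */
struct idt_gate64 {
    uint16_t offset_low;   /* handler address, bits 0..15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  ist;          /* bits 0..2: IST index 1..7, 0 = no IST switch */
    uint8_t  type_attr;    /* present bit, DPL, gate type */
    uint16_t offset_mid;   /* handler address, bits 16..31 */
    uint32_t offset_high;  /* handler address, bits 32..63 */
    uint32_t reserved;     /* must be 0 */
} __attribute__((packed));

static_assert(sizeof(struct idt_gate64) == 16, "IDT gate must be 16 bytes");

/* Encode a gate; ist_index picks the TSS stack used on entry. */
struct idt_gate64 idt_make_gate64(uint64_t handler, uint16_t selector,
                                  uint8_t ist_index, uint8_t type_attr)
{
    assert(ist_index <= 7);  /* only IST1..IST7 exist; 0 disables the switch */

    struct idt_gate64 g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.ist         = (uint8_t)(ist_index & 0x7u);
    g.type_attr   = type_attr;
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFFu);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}
```

A common choice is to give the double-fault vector (8) a non-zero IST index, so that a broken stack pointer in the interrupted code cannot prevent the double-fault handler from running.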