
new scheduler breaks in qemu-kvm but not elsewhere

Posted: Tue May 17, 2011 6:34 pm
by berkeleyrc
hey guys,

I just finished some minor upgrades to my scheduler, mostly simple refactoring. However, now it breaks when I run it in qemu with kvm. On one hand I'm not very worried, because everything seems to work right on real hardware, in qemu without kvm, and in bochs. On the other hand, I still want to support qemu-kvm; it seems most operating systems can work around its shortcomings.

I get a general protection fault, sometimes with error code 0 and sometimes with a seemingly random error code. It breaks in two situations: when returning to kernel code (e.g. the interrupt arrived during a syscall, probably my write() function) while three processes are running, and when returning to kernel code while only two processes are running after one process has been stopped. The fault is always on the 'iret' instruction under my 'restore_kernel' label.
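When a #GP carries a non-zero error code, the code uses the selector error-code format: bit 0 (EXT) marks an external event, bit 1 (IDT) means the index refers to an IDT gate, bit 2 (TI) selects the LDT instead of the GDT, and bits 3-15 hold the descriptor index. A "random" code can therefore often be decoded into the selector the CPU rejected and compared against the GDT table below. A minimal sketch of that decoding; `struct gp_error`, `decode_gp_error` and `gp_error_selector` are hypothetical helpers, not part of the kernel in this post:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of an x86 selector error code (hypothetical helper type). */
struct gp_error {
    bool external;   /* bit 0 (EXT): caused by an event external to the program */
    bool idt;        /* bit 1 (IDT): index refers to a gate in the IDT */
    bool ldt;        /* bit 2 (TI): index refers to the LDT (ignored when idt is set) */
    uint16_t index;  /* bits 3-15: descriptor index */
};

struct gp_error decode_gp_error(uint32_t code)
{
    struct gp_error e;
    e.external = (code & 0x1u) != 0;
    e.idt      = (code & 0x2u) != 0;
    e.ldt      = !e.idt && (code & 0x4u) != 0;
    e.index    = (uint16_t)((code >> 3) & 0x1FFFu);
    return e;
}

/* The selector value (index * 8, TI bit kept) the error code refers to,
 * for comparison with the selectors used in the GDT. */
uint16_t gp_error_selector(uint32_t code)
{
    return (uint16_t)(code & 0xFFFCu);
}
```

For example, an error code of 0x28 would point at GDT index 5, i.e. the process TSS descriptor, while a code with bit 1 set names an IDT vector instead of a segment.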

The only thing I'm not really comfortable with is where the stack for the irq0 handler is placed. When a user task is interrupted, the CPU takes ss0:esp0 from the TSS, right? Each of my user processes has two stacks allocated: one for kernel space (it goes into esp0 in the TSS) and one for user space. That's all that's necessary to prevent conflicts, right? I've seen people sometimes use more than this, but I don't understand why. I can see how it would allow recursive system calls if one wanted to trigger int 0x80 from inside a system call, but I don't see the need for that, since inside the kernel I can call any kernel function I want.
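For reference: on a ring 3 to ring 0 transition through an interrupt gate, the CPU loads the new stack from the SS0 and ESP0 fields of the current TSS and pushes the old SS, ESP, EFLAGS, CS and EIP there; when the interrupt arrives while already in ring 0, it stays on the current stack and pushes only EFLAGS, CS and EIP. The sketch below writes out the 32-bit TSS layout as a C struct so the ESP0/SS0 offsets are explicit. The field names, `struct tss32` and the `tss_set_kernel_stack` helper are my own illustrative names, not taken from the original code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit hardware TSS (104 bytes). Selector fields use the low 16 bits
 * of their 32-bit slot; the upper 16 bits are reserved. */
struct tss32 {
    uint32_t prev_task_link;          /* 0x00 */
    uint32_t esp0;                    /* 0x04  stack pointer loaded on entry to ring 0 */
    uint32_t ss0;                     /* 0x08  stack segment loaded on entry to ring 0 */
    uint32_t esp1, ss1;               /* 0x0C, 0x10 */
    uint32_t esp2, ss2;               /* 0x14, 0x18 */
    uint32_t cr3;                     /* 0x1C */
    uint32_t eip, eflags;             /* 0x20, 0x24 */
    uint32_t eax, ecx, edx, ebx;      /* 0x28 - 0x34 */
    uint32_t esp, ebp, esi, edi;      /* 0x38 - 0x44 */
    uint32_t es, cs, ss, ds, fs, gs;  /* 0x48 - 0x5C */
    uint32_t ldt_selector;            /* 0x60 */
    uint16_t trap;                    /* 0x64  bit 0: debug trap on task switch */
    uint16_t iomap_base;              /* 0x66 */
};

_Static_assert(sizeof(struct tss32) == 0x68, "32-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss32, esp0) == 0x04, "esp0 offset");
_Static_assert(offsetof(struct tss32, ss0) == 0x08, "ss0 offset");

/* With software task switching, only these two fields need updating
 * before returning to a process (hypothetical helper). */
void tss_set_kernel_stack(struct tss32 *tss, uint16_t ss0, uint32_t esp0)
{
    tss->ss0 = ss0;
    tss->esp0 = esp0;
}
```

With one kernel stack per process, updating esp0 on every switch is all the TSS needs, which matches what scheduler_pick does through `gdt_esp0`.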

Also, I realize my kernel switching could probably be much more efficient (don't bother saving registers to the struct, just keep them on the stack), but I'm not quite there yet :).



My GDT is set up like this:
0x08 code descriptor, DPL 0, no reads
0x10 data descriptor, DPL 0, read/write
0x18 code descriptor, DPL 3, no reads
0x20 data descriptor, DPL 3, read/write
0x28 TSS descriptor (for all my processes)
0x30 TSS descriptor (for double-fault task gate)
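A segment selector is the descriptor's GDT index shifted left by 3 (i.e. its byte offset), with the table indicator in bit 2 and the requested privilege level (RPL) in bits 0-1. That is where the constants in the assembly below come from: 0x23 is the ring 3 data descriptor at 0x20 with RPL 3, 0x1B would be the ring 3 code descriptor at 0x18 with RPL 3, and the 0x2A passed to `ltr` is the TSS descriptor at 0x28 with RPL 2. A small sketch; `make_selector`, `selector_index` and `selector_rpl` are hypothetical helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Build a segment selector: bits 3-15 index, bit 2 TI (0 = GDT, 1 = LDT),
 * bits 0-1 requested privilege level. */
uint16_t make_selector(uint16_t index, int use_ldt, uint8_t rpl)
{
    return (uint16_t)(((unsigned)index << 3) | ((use_ldt ? 1u : 0u) << 2) | (rpl & 3u));
}

/* Split a selector back into its descriptor index and RPL. */
uint16_t selector_index(uint16_t sel) { return (uint16_t)(sel >> 3); }
uint8_t  selector_rpl(uint16_t sel)   { return (uint8_t)(sel & 3u); }
```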

Currently the scheduler only runs when IRQ0 fires. I have no problems with its timing. The ISR I use (with the unrelated parts commented out) is:

isr_irq0_wrap:
    pushad
    cli
    cld
    ;mov     eax, [irq0_clock_tics]    ; this just counts tics
    ;add     eax, 1
    ;mov     [irq0_clock_tics], eax
    call    scheduler_save
    popad
    ;call    scheduler_tic   ;/*update sleeping threads */
    call    scheduler_pick  ;/*pick next process to run*/
    mov     ebx, eax
    push    0               ;/* signal end of irq0 */
    call    pic_eoi
    add     esp,4
    mov     eax, ebx
    jmp     switch_restore  ;/*bring the process back */
    ret                 ;/* ret should never run */

scheduler_save takes the registers that were pushed and saves them into a structure. The structure is:

struct process
{
    /* DO NOT CHANGE THE FIRST PART OF THIS STRUCT */

    /* 0x00 circularly linked list */
    struct process* next_process;
    struct process* prev_process;

    /* 0x08 gp registers */
    uint32_t eax;
    uint32_t ecx;
    uint32_t edx;
    uint32_t ebx;

    /* 0x18 */
    uint32_t esi;
    uint32_t edi;

    /* 0x20 stack registers */
    uint32_t esp;
    uint32_t ebp;

    /* 0x28 special registers */
    uint32_t eip;
    uint32_t eflags;

    /* 0x30 segments */
    uint32_t cs;
    uint32_t ss; /*same as the other data descriptors */

    /* 0x38 kernel stack */
    uint32_t ss0;
    uint32_t esp0; /* the kernel stack */
    
    /* 0x40 paging */
    uint32_t cr3;

    /* CHANGES AFTER THIS LINE SHOULD BE OKAY */

    /* 0x44 permissions */
    uint16_t real_uid;
    uint16_t effective_uid;
    uint16_t saved_uid;

    /* 0x4A scheduling info */
    uint8_t state;
    uint32_t sleep_time;
    uint16_t pid; 

};
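The assembly in switch_restore hard-codes the offsets from the comments in this struct, so it is worth letting the compiler check them. The list links are real pointers, which are 4 bytes only on a 32-bit target; the sketch below mirrors the frozen prefix with `uint32_t` link fields (a hypothetical `struct process_prefix`) so the checks also compile on a 64-bit host. In an i386 kernel build, the same `_Static_assert` lines could presumably be written against `struct process` directly:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* i386-layout mirror of the frozen prefix of struct process.
 * Links are stored as uint32_t so the layout matches a 32-bit build. */
struct process_prefix {
    uint32_t next_process, prev_process;  /* 0x00 */
    uint32_t eax, ecx, edx, ebx;          /* 0x08 */
    uint32_t esi, edi;                    /* 0x18 */
    uint32_t esp, ebp;                    /* 0x20 */
    uint32_t eip, eflags;                 /* 0x28 */
    uint32_t cs, ss;                      /* 0x30 */
    uint32_t ss0, esp0;                   /* 0x38 */
    uint32_t cr3;                         /* 0x40 */
};

/* Offsets used by switch_restore; a mismatch fails at compile time. */
_Static_assert(offsetof(struct process_prefix, eax)    == 0x08, "eax");
_Static_assert(offsetof(struct process_prefix, ecx)    == 0x0C, "ecx");
_Static_assert(offsetof(struct process_prefix, edx)    == 0x10, "edx");
_Static_assert(offsetof(struct process_prefix, ebx)    == 0x14, "ebx");
_Static_assert(offsetof(struct process_prefix, esi)    == 0x18, "esi");
_Static_assert(offsetof(struct process_prefix, edi)    == 0x1C, "edi");
_Static_assert(offsetof(struct process_prefix, esp)    == 0x20, "esp");
_Static_assert(offsetof(struct process_prefix, ebp)    == 0x24, "ebp");
_Static_assert(offsetof(struct process_prefix, eip)    == 0x28, "eip");
_Static_assert(offsetof(struct process_prefix, eflags) == 0x2C, "eflags");
_Static_assert(offsetof(struct process_prefix, cs)     == 0x30, "cs");
_Static_assert(offsetof(struct process_prefix, ss)     == 0x34, "ss");
_Static_assert(offsetof(struct process_prefix, esp0)   == 0x3C, "esp0");
_Static_assert(offsetof(struct process_prefix, cr3)    == 0x40, "cr3");
```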
And scheduler_save:

void scheduler_save(uint32_t edi, uint32_t esi, 
        uint32_t ebp, uint32_t esp2, uint32_t ebx, uint32_t edx, 
        uint32_t ecx, uint32_t eax, uint32_t eip, uint32_t cs, 
        uint32_t eflags, uint32_t esp, uint32_t ss)
{
    if(active_process != 0)
    {
        /* 1: save the state of the active process */

        if(cs == 0x8)
        {
            /* we came from kernel space; need to use other values */
            esp = esp2+0xC;
            ss = 0x10;
        }

        active_process->eax = eax;
        active_process->ecx = ecx;
        active_process->edx = edx;
        active_process->ebx = ebx;
        active_process->esi = esi;
        active_process->edi = edi;
        active_process->esp = esp;
        active_process->ebp = ebp;
        active_process->eip = eip;
        active_process->eflags = eflags;
        active_process->cs = cs;
        active_process->ss = ss;

    }
}
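scheduler_save can take the registers as plain arguments because `pushad` pushes EAX, ECX, EDX, EBX, the ESP value from before `pushad`, EBP, ESI and EDI in that order, so the C function sees them in reverse (EDI first), followed by the EIP, CS and EFLAGS the CPU pushed on interrupt entry, and then ESP and SS only when the interrupt caused a privilege change. Because nothing is pushed before `pushad` in isr_irq0_wrap, the saved "esp2" points at the EIP slot of the CPU frame, which is why the ring 0 case adds 0xC (three dwords) to recover the interrupted ESP. A sketch of the same layout; `struct irq_frame` and `interrupted_esp` are hypothetical names, and it checks the privilege level via the low two bits of CS instead of comparing with 0x08:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stack image seen by scheduler_save, lowest address first. */
struct irq_frame {
    /* pushed by pushad (EDI ends up at the lowest address) */
    uint32_t edi, esi, ebp, esp_at_pushad, ebx, edx, ecx, eax;
    /* pushed by the CPU on interrupt entry */
    uint32_t eip, cs, eflags;
    /* pushed by the CPU only on a ring 3 -> ring 0 transition */
    uint32_t user_esp, user_ss;
};

_Static_assert(offsetof(struct irq_frame, eip) == 8 * 4, "CPU frame follows the 8 pushad slots");

/* Stack pointer the interrupted code was using. */
uint32_t interrupted_esp(const struct irq_frame *f)
{
    if ((f->cs & 3u) == 0)            /* came from ring 0: no ESP/SS were pushed */
        return f->esp_at_pushad + 0xC; /* skip EIP, CS and EFLAGS */
    return f->user_esp;                /* came from ring 3 */
}
```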

All scheduler_pick does is return the address of the next process to be scheduled and set esp0 in the TSS:

struct process* scheduler_pick()
{

    if(active_process != 0)
        active_process = active_process->next_process;

    if(active_process == 0) 
    {
        if(start_process == 0)
            return 0; /* this schedules the idle task */
        else
            active_process = start_process;
    }

    /* (prepare to) Restore State */

    /*this clears the busy bit on the TSS and sets the esp0 */
    *((char*)GDT_TSS_TYPE_POSITION) = GDT_TSS_TYPE_VALUE;
    *gdt_esp0 = active_process->esp0;

/*   just some debugging code
    if(active_process->eip == 0)
    {
        printk("process %d has an EIP of 0!!!\n",active_process->pid);
    }
    serial_putchar('{');
    serial_putchar(active_process->pid + '0');
    if(active_process->cs == 0x08)
    {
        serial_putchar('+');
        if(active_process->ss != 0x10)
            printk("BAD SS\n");
    }
    serial_putchar('}');
*/

    return active_process;



}
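scheduler_pick does two independent things: it advances round-robin through the circular process list, and it rewrites the TSS descriptor's type so that the later `ltr` does not fault (`ltr` raises #GP if the descriptor is already marked busy). In the descriptor's access byte (byte 5), type 0x9 is an available 32-bit TSS and 0xB a busy one, so clearing bit 1 of that byte is sufficient. A host-testable sketch of both pieces; `struct proc`, `rr_next`, `tss_clear_busy` and `descriptor_type` are hypothetical names, and the descriptor is a plain byte array rather than the real GDT:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct proc {
    struct proc *next;  /* circular list */
    int pid;
};

/* Round-robin: step to the next entry, or start at the head when nothing
 * is running. A NULL result means "run the idle task". */
struct proc *rr_next(struct proc *active, struct proc *head)
{
    if (active != NULL && active->next != NULL)
        return active->next;
    return head;  /* may itself be NULL: no runnable processes */
}

#define TSS_TYPE_AVAILABLE_32 0x9
#define TSS_TYPE_BUSY_32      0xB

/* Clear the busy bit (bit 1 of the type nibble) of an 8-byte GDT descriptor. */
void tss_clear_busy(uint8_t desc[8])
{
    desc[5] &= (uint8_t)~0x02u;
}

/* Type nibble of the descriptor's access byte. */
uint8_t descriptor_type(const uint8_t desc[8])
{
    return (uint8_t)(desc[5] & 0x0Fu);
}
```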
Finally, here's the important stuff:

;/* takes eax as an argument, not the stack */
switch_restore:
    prefetcht0   [eax+0x30];
    prefetcht0   [eax+0x10];
    cmp     eax, 0
    je      restore_idle

    mov     ecx, dword [eax+0x30] ;/* new cs*/
    cmp     ecx, 0x8
    je      restore_kernel

restore_user:
    ;/* we're returning to user space */
    push    dword [eax+0x34]    ;/*ss*/
    push    dword [eax+0x20]    ;/*esp*/
    push    dword [eax+0x2C]    ;/*eflags*/
    push    dword [eax+0x30]    ;/*cs*/
    push    dword [eax+0x28]    ;/*eip*/

    ;/* general purpose registers */
    push    dword [eax+0x08]    ;/*eax*/
    push    dword [eax+0x0C]    ;/*ecx*/
    push    dword [eax+0x10]    ;/*edx*/
    push    dword [eax+0x14]    ;/*ebx*/
    push    0                   ;/*4 needless bytes*/
    push    dword [eax+0x24]    ;/*ebp*/
    push    dword [eax+0x18]    ;/*esi*/
    push    dword [eax+0x1C]    ;/*edi*/

    ;/* load segment selectors */
    mov     ecx, 0x23    ;/*user data selector (0x20 | RPL 3)*/
    mov     ds, cx
    mov     es, cx
    mov     fs, cx
    mov     gs, cx
    mov     ecx, 0x2A
    ltr     cx

    sti
    popad
    iret


restore_kernel:
    ;/* we're returning to kernel space */
    mov     esp,  [eax+0x20]   ;/*esp*/

    push    dword [eax+0x2C]    ;/*eflags*/
    push    dword [eax+0x30]    ;/*cs*/
    push    dword [eax+0x28]    ;/*eip*/

    ;/* general purpose registers */
    push    dword [eax+0x08]    ;/*eax*/
    push    dword [eax+0x0C]    ;/*ecx*/
    push    dword [eax+0x10]    ;/*edx*/
    push    dword [eax+0x14]    ;/*ebx*/
    push    0                   ;/*4 dumb bytes*/
    push    dword [eax+0x24]    ;/*ebp*/
    push    dword [eax+0x18]    ;/*esi*/
    push    dword [eax+0x1C]    ;/*edi*/

    ;/* reload task register */
    mov     ecx, 0x2A
    ltr     cx

    sti
    popad
    iret
restore_idle:
    hlt
    sti
restore_idle_loop:
    hlt
    jmp restore_idle_loop
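The two restore paths differ because `iret` pops five dwords (EIP, CS, EFLAGS, ESP, SS) when it returns to an outer privilege level but only three (EIP, CS, EFLAGS) when it stays in ring 0, and `popad` skips the ESP slot it would pop, which is why a dummy value is pushed there. The host-side sketch below builds the same stack images from the saved registers; `struct saved_regs` and `build_restore_frame` are hypothetical, and the ring is decided from the low two bits of CS rather than by comparing against 0x08:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct saved_regs {
    uint32_t eax, ecx, edx, ebx, esi, edi, esp, ebp;
    uint32_t eip, eflags, cs, ss;
};

/* Fill out[] with the stack image expected just before `popad; iret`,
 * lowest address first. Returns the number of dwords written (11 or 13). */
size_t build_restore_frame(const struct saved_regs *r, uint32_t out[13])
{
    size_t n = 0;
    /* consumed by popad: EDI first, EAX last; the ESP slot is skipped */
    out[n++] = r->edi;
    out[n++] = r->esi;
    out[n++] = r->ebp;
    out[n++] = 0;              /* ignored by popad */
    out[n++] = r->ebx;
    out[n++] = r->edx;
    out[n++] = r->ecx;
    out[n++] = r->eax;
    /* consumed by iret */
    out[n++] = r->eip;
    out[n++] = r->cs;
    out[n++] = r->eflags;
    if ((r->cs & 3u) != 0) {   /* returning to ring 3: iret also pops ESP and SS */
        out[n++] = r->esp;
        out[n++] = r->ss;
    }
    return n;
}
```

In the ring 0 case, restore_kernel first loads the saved ESP and builds this 11-dword image on the target's own kernel stack, so after `popad` and `iret` ESP is back at exactly the saved value.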
I appreciate any thoughts on this. Let me know if you want to see anything else.

Re: new scheduler breaks in qemu-kvm but not elsewhere

Posted: Wed May 18, 2011 11:35 am
by Combuster
I get a general protection fault
What's the exact cause and origin (analyse the values on the stack and everything they refer to)? What's the corresponding source code? There are many irets to choose from. Is your assembler sensitive to the difference between iret and iretd?