Multitasking problems

matthias · Post by **matthias** » Sun Aug 27, 2006 7:59 am

I have a big problem with multitasking. At first I got a lot of General Protection Faults, even triple faults. After changing the code I finally got it not to fault, but now it does something else instead :-X
It just hangs after the first task switch ::)

My scheduler code:

Code: Select all

unsigned long ticks = 0;

extern void switch_to(unsigned long);

thread* prev;

void schedule(struct regs* r)
{
   /* get next thread */
   SchedulerGetNextNode();

   if(ticks % 10000 == 0)
   {
      puts("1 second passed\n");
   }

   ticks++;

   int sw = 0;

   if(node != prev)
   {
      sw = 1;
   }

   prev = node;

   if(sw)
   {
      switch_to(node->gdt); /* after this nothing happens */
   }
}

Code: Select all

extern _schedule

; 32: IRQ0
_irq0:
    cli
    pusha
    push ds
    push es
    push fs
    push gs

    mov ax, 0x10
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov eax, esp

    push eax
    mov eax, _schedule
    call eax
    pop eax

    pop gs
    pop fs
    pop es
    pop ds
    popa
    add esp, 8      ; NOTE: no error code or IRQ number was pushed in this stub, so this skips the saved EIP and CS
    sti
    iret

Code: Select all

.globl _switch_to

_switch_to:
    ljmp *(%esp)
    ret

As an attachment I give you the source file. (Or this file)

p.s. have been debugging a whole week now :-[

Habbit · Post by **Habbit** » Sun Aug 27, 2006 3:22 pm

I really don't know a lot about multitasking, as I haven't yet reached that point of my kernel, but I can see a major flaw in your code: you jump to the task while you're still inside the interrupt handler.

I will explain myself:

Code: Select all

IRQ0:
    cli
    pusha and other pushes
    call _schedule <<-------- C function schedule
        {
            account_for_passed_ticks;
            find_next_thread();
            switch_to(thread->gdt);   <<---------- ASM proc switch_to
                ljmp *(%esp) <<--- jump to the procedure
                ret
        }
    other pops and popa
    sti
    iret

And the flaws are...

- Not really a flaw, but you're mixing Intel syntax (proc IRQ0) with AT&T syntax (switch_to)... What for? Use nasm or gas, but not both (well, do what you want, it's just an opinion).
- You're using the hardware task switching method. Again, do what you want, but even if hardware task management is far easier for the OSdever, it 1) is way slower than well-designed software task management, and 2) won't work on AMD64 processors, so it is not "future compatible".
- And the real flaw: you're jumping to the task while still inside the IRQ0 handler, as this language-mixed, call-nested view shows. So when you jump to the next thread, INTERRUPTS ARE STILL DISABLED (execution has not yet reached the STI). So even if the task switch succeeds (which I don't know), the next clock interrupt will never arrive and the scheduler won't be called again.

So what can you do about the real flaw? Well, you could, for example, modify the return information on the IRQ0 stack.
This works for software task management, but I don't know whether it will for the hardware, TSS-based version. Basically, the idea is this: the CS:EIP in force when the interrupt was detected is stored on the stack and used by IRET to return to it. Just replace it with the address you want to jump to.
Assuming that information was stored just before calling the handler (I'm not sure about that, look it up in the Intel/AMD manuals), the code would be like this:

Code: Select all

IRQ0:
    cli
    push %ebp
    mov %esp, %ebp <<---- return address would be at 4(%ebp)
    pusha and other pushes
    <<-- save current thread info from the stack to the thread table
    call _schedule <<---- C functions return value is at %EAX
    mov %eax, 4(%ebp) <<-- modify return address
    <<-- if the task will change, load the new state from the thread table to the stack
    other pops and popa
    pop %ebp
    sti
    iret <<-- this will return to the modified return address


void* schedule()
{
    account_for_ticks;
    nextThread = GetNextThread();
    return nextThread->eip;
}

This code is 100% naïve, does not account for changes in the code segment of apps, probably has more errors than your code, and a lot of etceteras, but I think (correct me please, OSdevers of the forum) these are the basics of multitasking.

[edit]aargh! i always go crazy with the stack direction T_T[/edit]

matthias · Post by **matthias** » Sun Aug 27, 2006 3:32 pm

Habbit wrote: lots of text

Well, I based this code on another OS's scheduling code. Some Polish guys were working on an OS called ChaOS. Two years ago I sort of copy&pasted their code into another kernel of mine. Now, two years later, I ported it to my new kernel with many changes to the scheduling lists, but I didn't really change much in the switching method, or even the task creation. That code worked like a choochoo in my previous kernel. As for the interrupts, putting a sti before the switch doesn't help.

Tried it but failed. My eflags are also set up OK, as in the interrupt enable bit is set. I've been mangling my code for days now -_-.

As for the x86-64 stuff, I do not intend to make such an OS yet. It will be supported in the future, so in 32-bit mode I prefer the security of TSS-based switching and in 64-bit mode I'll have to obey the power of software switching. I have reasons for TSS-based switching, for instance security and just testing.

Also, to do hardware task switching you have to do a call or jmp, or emulate an interrupt (i.e. syscall), to force context switches (task gates, interrupt gates, etc.)
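
For reference, a minimal sketch of the operand that a 32-bit indirect far jump reads: four bytes of offset followed by two bytes of selector. The names `far_ptr` and `make_task_far_ptr` are made up for illustration and are not from the posted kernel. When the selector refers to a TSS descriptor the CPU ignores the offset, which is why `switch_to` can point the jump at its own stack: the return address sits where the offset is expected, and the low 16 bits of the argument sit where the selector is expected (assuming `node->gdt` holds a TSS selector).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 48-bit far pointer (m16:32) as read by a 32-bit "jmp far [mem]".
 * __attribute__((packed)) is a GCC/Clang extension. */
struct __attribute__((packed)) far_ptr {
    uint32_t offset;    /* ignored by the CPU when the selector names a TSS */
    uint16_t selector;  /* GDT selector of the target TSS descriptor */
};

/* The jump reads exactly six bytes, selector last. */
static_assert(sizeof(struct far_ptr) == 6, "far pointer must be 6 bytes");
static_assert(offsetof(struct far_ptr, selector) == 4, "selector follows offset");

/* Hypothetical helper: build the operand for a far jump to a TSS selector. */
static struct far_ptr make_task_far_ptr(uint16_t tss_selector)
{
    struct far_ptr p = { 0, tss_selector };
    return p;
}
```

Building the operand explicitly like this is only an alternative to the stack trick; both end up handing the CPU the same six bytes.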

Habbit · Post by **Habbit** » Sun Aug 27, 2006 6:18 pm

matthias wrote: As for the interrupts, putting a sti before the switch doesn't help Tried it but failed.

How is the IRQ0 handler getting called? Through a task gate? If that is the case, a STI won't suffice: you will actually have to IRET from the IRQ0 handler or the BUSY bit in the task gate will be active, forbidding reentrance (and thus another call to the scheduler). Besides, the NT (nested task) bit in EFLAGS (or CRsomething, dunno) can be responsible.

Or so I think I read somewhere... ;D
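
To pin down where the bits mentioned above live, here is a small sketch; the helper names are illustrative only. NT is bit 14 and IF is bit 9 of EFLAGS. The busy flag is part of the type field of the TSS descriptor in the GDT: for a 32-bit TSS, type 0x9 means available and 0xB means busy, and a far jump to a busy TSS raises a general protection fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF (1u << 9)    /* interrupt enable flag */
#define EFLAGS_NT (1u << 14)   /* nested task flag */

/* Type field (low four bits of the access byte) of a 32-bit TSS descriptor. */
#define TSS32_AVAILABLE 0x9u
#define TSS32_BUSY      0xBu

/* True if IRET would treat the current task as nested. */
static bool eflags_nested_task(uint32_t eflags)
{
    return (eflags & EFLAGS_NT) != 0;
}

/* True if maskable interrupts are enabled in this EFLAGS image. */
static bool eflags_interrupts_enabled(uint32_t eflags)
{
    return (eflags & EFLAGS_IF) != 0;
}

/* True if the descriptor's access byte marks the TSS as busy. */
static bool tss_descriptor_busy(uint8_t access)
{
    return (access & 0x0Fu) == TSS32_BUSY;
}
```

Checking the EFLAGS image stored in the target TSS and the access byte of its GDT entry (0x89 available, 0x8B busy for a present ring 0 TSS) before the first switch rules out both suspects.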

matthias · Post by **matthias** » Mon Aug 28, 2006 5:05 am

Habbit wrote:
matthias wrote: As for the interrupts, putting a sti before the switch doesn't help Tried it but failed.
How is the IRQ0 handler getting called? Through a task gate? If that is the case, a STI won't suffice: you will actually have to IRET from the IRQ0 handler or the BUSY bit in the task gate will be active, forbidding reentrance (and thus another call to the scheduler). Besides, the NT (nested task) bit in EFLAGS (or CRsomething, dunno) can be responsible.

Or so I think I read somewhere... ;D

I set up the IDT entry as:

Code: Select all

idt_set_gate(32, (unsigned)irq0, 0x08, 0x8E);

idt code:

Code: Select all

/* Use this function to set an entry in the IDT. A lot simpler
*  than twiddling with the GDT ;) */
void idt_set_gate(unsigned char num, unsigned long base, unsigned short sel, unsigned char flags)
{
    /* The interrupt routine's base address */
    idt[num].base_lo = (base & 0xFFFF);
    idt[num].base_hi = (base >> 16) & 0xFFFF;

    /* The segment or 'selector' that this IDT entry will use
    *  is set here, along with any access flags */
    idt[num].sel = sel;
    idt[num].always0 = 0;
    idt[num].flags = flags;
}
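
As a sketch of what the flags byte 0x8E in the `idt_set_gate` call means, the decoder below splits it into its fields; `decode_gate_flags` and the enum names are illustrative, not part of the posted kernel. 0x8E decodes to present, DPL 0, type 0xE, which is a 32-bit interrupt gate rather than a task gate. The CPU clears IF when it enters a handler through an interrupt gate, while a trap gate (0x8F) leaves IF unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Gate types for the low four bits of the IDT flags byte (32-bit protected mode). */
enum gate_type {
    GATE_TASK        = 0x5,
    GATE_INTERRUPT32 = 0xE,  /* CPU clears IF on entry */
    GATE_TRAP32      = 0xF   /* CPU leaves IF unchanged on entry */
};

struct gate_flags {
    bool     present;  /* bit 7 */
    unsigned dpl;      /* bits 5-6: privilege level allowed to invoke the gate with INT */
    unsigned type;     /* bits 0-3: one of enum gate_type */
};

/* Split an IDT flags byte into its fields. */
static struct gate_flags decode_gate_flags(uint8_t flags)
{
    struct gate_flags g;
    g.present = (flags & 0x80u) != 0;
    g.dpl     = (flags >> 5) & 0x3u;
    g.type    = flags & 0x0Fu;
    return g;
}
```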

About the eflags: I flushed the eflags register, so I'm certain the nested task bit is not '1'. Even without flushing the eflags it does not work.

Also, when I don't call switch_to, it writes to the screen every time a second passes, so my interrupt handlers are OK. The only problem is that when it gets to the instruction that jumps to the TSS, it just doesn't do anything at all. I've tested some other nasm code:

Code: Select all

global _switch_to
_switch_to:
        push    ebp
        mov     ebp, esp
        jmp far [ss:ebp+4]
        pop     ebp
        ret

The code gets to the point of jmp far [ss:ebp+4] and then it's over with the fun

And yes I do send an EOI

Ryu · Post by **Ryu** » Mon Aug 28, 2006 1:48 pm

Habbit wrote: You're using the hardware task switching method. Again, do what you want, but even if hardware task management is far easier for the OSdever, it 1) is way slower than well-designed software task management, and 2) won't work on AMD64 processors, so it is not "future compatible"

I actually recommend starting with hardware switches, because this lets you get familiar with the processor's behavior, which will eventually lead to well-tuned software switching and scheduling. But even with software switching on Intel-based processors you still need a TSS to switch between different rings. Is this really "future compatible"? I've not yet looked into AMD processors, but if they are different I plan to create an entirely new task switcher, and most likely an entirely new scheduler, if the differences are great enough, for example if there is no equivalent of Intel's "Task Gates". But if that is the case, you still have to write two different software task switchers, which doesn't tell me a software switcher will be compatible in any case.

Habbit · Post by **Habbit** » Mon Aug 28, 2006 2:16 pm

Ryu wrote: But even with software switching on Intel-based processors you still need a TSS to switch between different rings. Is this really "future compatible"? I've not yet looked into AMD processors, but if they are different I plan to create an entirely new task switcher, and most likely an entirely new scheduler, if the differences are great enough, for example if there is no equivalent of Intel's "Task Gates".

There are still TSSs in AMD64. What is not available is the task switching mechanism. That is, the TSS is just information storage. In fact, it has been revamped for that purpose: its fields for storing CR3 and the general-purpose registers have been removed to make room for the new "interrupt stack table", a set of clean stacks for certain interrupts (used for something like what the hardware switcher was used for: providing a clean stack for the double fault, or for that matter any, interrupt handler). There are many more changes, so you should take a look at either the AMD or the Intel manual (or even better, at both).
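
For comparison, a sketch of the 64-bit TSS layout as documented in the AMD and Intel manuals: no register save area, just the stack pointers for rings 0 to 2, the seven interrupt stack table entries, and the I/O map base. The struct and helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS. __attribute__((packed)) is a GCC/Clang extension, needed here
 * because the 64-bit fields start at offset 4. */
struct __attribute__((packed)) tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];       /* RSP0..RSP2: stacks loaded on privilege change */
    uint64_t reserved1;
    uint64_t ist[7];       /* IST1..IST7: the interrupt stack table */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;   /* offset of the I/O permission bitmap */
};

static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
static_assert(offsetof(struct tss64, ist) == 36, "IST1 starts at offset 0x24");

/* Hypothetical helper: set ISTn (n = 1..7) to the top of a clean stack. */
static bool tss64_set_ist(struct tss64 *tss, unsigned n, uint64_t stack_top)
{
    if (n < 1 || n > 7)
        return false;
    tss->ist[n - 1] = stack_top;
    return true;
}
```

An IDT entry in long mode then selects one of these stacks by index, which is how a double fault handler gets a known-good stack without a hardware task switch.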

matthias · Post by **matthias** » Mon Aug 28, 2006 2:56 pm

I redesigned the IRQ handling, PIC remapping (auto EOI on master) and IDT, but I still don't get it to switch

Any ideas you guys?

matthias · Post by **matthias** » Fri Sep 01, 2006 6:21 am

I detected some strange behaviour. If I try the taskswitch on a real system, it just hangs, however in a virtual machine (MS VPC) it reboots and says it encountered a fault. What can be concluded from this?

And another question I had while reading this topic:

http://www.mega-tokyo.com/forum/index.php?board=1;action=display;threadid=10129

Could my problem be the same issue?

Once again my IRQ routine (updated)

Code: Select all

; 32: IRQ0
_irq0:
    cli
    push byte 0
    push byte 32
    jmp irq_common_stub

extern _irq_handler

irq_common_stub:
    pusha
    push ds
    push es
    push fs
    push gs

    mov ax, 0x10
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov eax, esp

    push eax
    mov eax, _irq_handler
    call eax
    pop eax

    pop gs
    pop fs
    pop es
    pop ds
    popa
    add esp, 8
    iret

irq_handler() calls the schedule() function
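
For orientation, a sketch of what the pointer passed to `irq_handler()` points at, derived only from the push order in the stub above. The real definition of `struct regs` is not shown in this thread, so the name `irq_frame` and the helper are illustrative. Lower addresses come first; SS and ESP of the interrupted code are only pushed by the CPU on a privilege change, so they are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct irq_frame {
    uint32_t gs, fs, es, ds;                          /* pushed last by the stub */
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;  /* pushed by pusha */
    uint32_t int_no, err_code;                        /* "push byte 32" and "push byte 0" */
    uint32_t eip, cs, eflags;                         /* pushed by the CPU on interrupt entry */
};

static_assert(offsetof(struct irq_frame, int_no) == 48, "12 dwords precede int_no");
static_assert(offsetof(struct irq_frame, eip) == 56, "14 dwords precede the saved EIP");

/* Hypothetical helper: change where IRET will resume, as suggested earlier
 * in the thread for software switching. */
static void irq_frame_set_return(struct irq_frame *f, uint32_t eip)
{
    f->eip = eip;
}
```

The `add esp, 8` before `iret` in the stub corresponds to skipping `int_no` and `err_code`, which leaves ESP pointing at the saved EIP.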

Combuster · Post by **Combuster** » Fri Sep 01, 2006 7:51 am

The stack is probably the problem here - what's missing in the taskswitch is the following:

(you push registers and all, good)
- save current esp, so that the next time it's scheduled you can restore it
- determine next task to run
- load the task's context (like page directory, esp0 in the tss, iopl bits)
- load the task's corresponding esp
(return, pop registers and iret back to execution)

By jumping to code you are simply throwing away your task's previous context - you'll never know what its esp was before you suspended it

modified excerpt from my taskswitcher

Code: Select all

; have your next task ready in [nexttask], 
; maybe on the current stack or somewhere in global memory 
; both should work
MOV EAX, [scheduler_curtask] ; load current task info pointer
MOV [EAX+STACK], ESP ; store ESP
MOV EAX, [nexttask] ; get next task info pointer
MOV [scheduler_curtask], EAX ; set current task info pointer
MOV ESP, [EAX+STACK]  ; load new ESP

Note: If you do userspace threads this way you'll have to set ESP0 in your TSS as well
For the rest, it is wise to check what is on a thread's stack before you try to switch to it the first time. Bochs'll help you big time with that.
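
The bookkeeping in the assembly excerpt above can be sketched in C as a function that takes the interrupted task's stack pointer and returns the one to load next. This is a simplified round-robin model under stated assumptions: the names `task`, `task_add` and `scheduler_switch` are made up, and the caller is expected to be an interrupt stub that moves the returned value into ESP before popping registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 4

struct task {
    uintptr_t saved_esp;  /* stack pointer captured when the task was suspended */
};

static struct task tasks[MAX_TASKS];
static size_t task_count;
static size_t current;  /* index of the running task */

/* Register a task whose stack already holds an initial register frame.
 * Returns its index, or MAX_TASKS if the table is full. */
static size_t task_add(uintptr_t initial_esp)
{
    if (task_count >= MAX_TASKS)
        return MAX_TASKS;
    tasks[task_count].saved_esp = initial_esp;
    return task_count++;
}

/* Store the outgoing task's ESP, pick the next task, return its ESP. */
static uintptr_t scheduler_switch(uintptr_t current_esp)
{
    if (task_count == 0)
        return current_esp;               /* nothing to switch to */
    tasks[current].saved_esp = current_esp;
    current = (current + 1) % task_count;
    return tasks[current].saved_esp;
}
```

The first task registered is treated as the one already running, so its stored value is overwritten on the first switch.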

Pype.Clicker · Post by **Pype.Clicker** » Fri Sep 01, 2006 8:13 am

matthias wrote: I redesigned the IRQ handling, PIC remapping (auto EOI on master) and IDT, but I still don't get it to switch Any ideas you guys?

i fear you've been fooled. The "auto EOI" feature cannot be used on a PC: you _have_ to send the EOI. that's how 80x86 processors expect the PIC to work, and programming the PIC to work in another way won't solve the issue.

matthias · Post by **matthias** » Fri Sep 01, 2006 8:16 am

I already noticed that, but I'm sending EOI already, so this is not the problem.

matthias · Post by **matthias** » Sun Sep 03, 2006 5:01 am

This is my new routine:

Code: Select all

tmp_stack: dd 0

_irq0:
    cli
    cld
    pushad
    push ds
    push es
    push fs
    push gs

    mov [tmp_stack], esp ; save current stack value
    mov esp, _tss_stack ; move to new stack
    
    call _schedule

    mov esp, [tmp_stack] ; restore stack
    
    pop gs
    pop fs
    pop es
    pop ds
    popad
    
    mov al, 0x20
    out 0x20, al
    sti
    iretd

SECTION .bss
    resb 4096               ; This reserves 4KBytes of memory here
[global _tss_stack]
_tss_stack:

This works ok, but the task still doesn't run

Pype.Clicker · Post by **Pype.Clicker** » Mon Sep 04, 2006 2:58 am

May I ask what's supposed to be on the outgoing stack? I mean, the first thing you'll do there is pop register values and the like. You do have values to pop in the first place, don't you?
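
A sketch of what "values to pop" could look like for a brand-new kernel thread, matching the pop order of the routine above (gs, fs, es, ds, popad, iretd). The function name and the choice of zeroed general registers are assumptions; the selectors 0x08 and 0x10 are the ones used elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_CS    0x08u
#define KERNEL_DS    0x10u
#define EFLAGS_FIXED (1u << 1)  /* bit 1 of EFLAGS always reads as 1 */
#define EFLAGS_IF    (1u << 9)  /* start the thread with interrupts enabled */
#define FRAME_WORDS  15

/* Push an initial frame below stack_top and return the ESP value to store
 * for the new thread. entry is the address the thread starts executing at. */
static uint32_t *build_initial_frame(uint32_t *stack_top, uint32_t entry)
{
    uint32_t *sp = stack_top;
    int i;

    *--sp = EFLAGS_FIXED | EFLAGS_IF;  /* popped by iretd: EFLAGS */
    *--sp = KERNEL_CS;                 /* popped by iretd: CS */
    *--sp = entry;                     /* popped by iretd: EIP */
    for (i = 0; i < 8; i++)
        *--sp = 0;                     /* popad: eax, ecx, edx, ebx, esp (ignored), ebp, esi, edi */
    *--sp = KERNEL_DS;                 /* pop ds */
    *--sp = KERNEL_DS;                 /* pop es */
    *--sp = KERNEL_DS;                 /* pop fs */
    *--sp = KERNEL_DS;                 /* pop gs (lowest address, popped first) */
    return sp;
}
```

Dumping these 15 dwords in Bochs just before the first switch is a quick way to confirm the frame is what the handler's epilogue expects.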

btw,
1. doing a CLI at the start of an IRQ routine usually means you're using a trap gate instead of an interrupt gate in your IDT, but it's (imho) bad practice.
2. doing a STI at the end of an IRQ routine is hazardous. If you've received an IRQ during the handler, it will be processed _before_ you've had the chance to completely clean the pushed CS and EIP off the stack. After a while, that can make your stack grow and grow.
3. I haven't read all the above posts in detail, but I fail to see why you *need* the "_tss_stack" in the first place. Handling _schedule on the incoming stack would be just as fine, no?

OSDev.org
