Page 1 of 1

Problems with task switching

Posted: Sat Jun 23, 2007 3:33 pm
by mmiikkee12
I decided my old task switcher was too... well, it sucked. So I attempted to rewrite it. So far I've started a task for the kernel and gotten it to jump to it. But as soon as the timer fires again, the scheduler gets called and promptly crashes the computer with a general protection fault, error code=0xf7cc.

Here's the setup code, and the worst scheduler ever created (currently it will just keep switching to the kernel/idle task, more tasks will be added later when this works)

Code: Select all

#define EFLAGS_INTERRUPTS (1 << 9)
#define EFLAGS_DEFAULTS (1 << 1)

namespace Tasks
{
	uint_32 next_pid;
	uint_32 cpid = 0;

	Task tasks[NUM_TASKS];

	void idle()
	{
		while (1)
			cdebug << "Idling\n";
	}

	extern "C" void do_bootstrap_task_switch(uint_32 newesp);
	void setup()
	{
		int n = 3;
		for (int i = 0; i < NUM_TASKS; i++)
		{
			GDT::set(GDT::gdt[n], (uint_32)&tasks[i].ldt, sizeof(tasks[i].ldt) - 1, 0x20, 0);
			n++;
		}
		
		// give the kernel its own task
		tasks[0].name = "[kernel]";
		
		// set up the ldt
		GDT::set(tasks[0].ldt[0], 0, 0xFFFFFFFF, 0x9A, 0xCF);
		GDT::set(tasks[0].ldt[1], 0, 0xFFFFFFFF, 0x92, 0xCF);
		
		// and the stack
		uint_32 *stack = new uint_32[PROCESS_STACK_SIZE / 4];
		assert(stack != NULL);
		
		tasks[0].stack = stack;
		int stack_index = (PROCESS_STACK_SIZE / 4) - 1;

		stack[stack_index--] = EFLAGS_DEFAULTS | EFLAGS_INTERRUPTS; // eflags
		stack[stack_index--] = 0x08;	// cs
		stack[stack_index--] = (uint_32)(&idle); // eip
		stack[stack_index--] = 0;	// ebp
		stack[stack_index--] = 0;	// not used for anything
		stack[stack_index--] = 0;	// edi
		stack[stack_index--] = 0;	// esi
		stack[stack_index--] = 0;	// edx
		stack[stack_index--] = 0;	// ecx
		stack[stack_index--] = 0;	// ebx
		stack[stack_index--] = 0;	// eax
		stack[stack_index--] = 0x10;	// ds
		stack[stack_index--] = 0x10;	// es
		stack[stack_index--] = 0x10;	// fs
		stack[stack_index--] = 0x10;	// gs
		stack[stack_index--] = 0x10;	// ss
		
		tasks[0].esp = (uint_32)&stack[stack_index + 1];
		do_bootstrap_task_switch(tasks[0].esp);
	}
	
	extern "C" void do_task_switch(uint_32 *oldesp, uint_32 eip, uint_32 newesp, uint_32 newldt);
	void task_switch(IDT::regs_t *regs)
	{
		do_task_switch(&tasks[cpid].esp, regs->eip, tasks[cpid].esp, (3 + cpid) << 3);
	}
};
And the assembly part:

Code: Select all

global do_task_switch
do_task_switch:
	; save old task's state
	pushf
	push cs
	push dword [esp+8+12]
	pusha
	push ds
	push es
	push fs
	push gs
	push ss
	mov eax, [esp+64+16]
	mov [eax], esp

	; and load the new one
	lldt [esp+64+4]
	mov esp, [esp+64+8]
	pop ss
	pop gs
	pop fs
	pop es
	pop ds
	popa
	iret

global do_bootstrap_task_switch
do_bootstrap_task_switch:
	mov esp, [esp+4]
	pop ss
	pop gs
	pop fs
	pop es
	pop ds
	popa
	iret
What am I doing wrong?

(By the way, I'd also like to be able to run all user tasks in ring 3, and services (this is a microkernel) in ring 0 or 1. Can I do that without messing with TSSes?)

Posted: Sat Jun 23, 2007 5:37 pm
by Kevin McGuire
I can not understand how the stack is not becoming corrupted if you are calling void task_switch(IDT::regs_t *regs), which calls the assembly function do_task_switch since it leaves all sorts of mubo jumbo on the stack which is never poped. Your thread was normally executing along before the timer interrupt:
[thread's data on stack]
Then the IRQ fires and you jump into the kernel which does something like:
[thread's data on stack][CPU pushed (EIP, EFLAGS, CS)]
Next, you start calling your C routine which make use of the stack.
[thread's data on stack][CPU pushed (EIP, EFLAGS, CS)][kernel function local storage and call stack]
Finally it appears you try to push the current thread state on top of all this.
[thread's data on stack][CPU pushed (EIP, EFLAGS, CS)][kernel function local storage and call stack][thread state]
When you load it seems all this garbage is present and never gets poped which looks like this.
[thread's data on stack][CPU pushed (EIP, EFLAGS, CS)][kernel function local storage and call stack][.......thread's data on stack(corrupted)....]
Where that thread's data on stack should have been _way_ back in the stack. You are most likely getting by since the thread does nothing crucial or noticable with it's stack since all it's data might be stored in the registers which you will succefully reload from the thread state, but after a while the stack overflows for the/each thread and causes a exception by violating the LDT/GDT or causing a page fault.

You could make it a simplier and slimmer. The processor already saved the code segment, and I have no idea what those other values are you are pushing and poping from over 64 bytes back in the stack.

mov esp, [current_thread->esp]
pushf
pushad
push ds, es, fs, gs, ss
mov [current_thread->esp], esp

lldt [next_thread->ldt]
mov esp, [next_thread->esp]
pop ss, gs, fs, es, ds
popad
popf
iret


The (next_thread || current_thread)->esp needs to be saved right after you enter the interrupt handler. This way you start at:
[thread's data on stack][CPU pushed (EIP, EFLAGS, CS)]
And have:
[thread's data on stack][CPU pushed (EIP, EFLAGS, CS)][kernel function local storage and call stack]
You jump back to:
[thread's data on stack][CPU pushed (EIP, EFLAGS, CS)]
And start pushing the thread state with pushf; pushad, ..... then you overwrite that with the new stack position containing the state, cpu pushed argument, and the thread original stack state.

Posted: Sun Jun 24, 2007 11:13 am
by mmiikkee12
Your thread was normally executing along before the timer interrupt:
[thread's data on stack]
I doubt it had anything on the stack, being that simple, but yes...
Then the IRQ fires and you jump into the kernel which does something like:
[thread's data on stack][CPU pushed (EIP, EFLAGS, CS)]
Plus a bunch of other stuff (I push all the registers onto the stack, then push esp so I'll get a pointer to an IDT::regs_t.)

Code: Select all

; Two bytes are already pushed before this: the error code and interrupt number.
isr_common:
	pusha
	push ds
	push es
	push fs
	push gs
	push ss
	mov eax, 0x10
	mov ds, eax
	mov es, eax
	mov fs, eax
	mov gs, eax
	mov ss, eax
	push esp
	call isr_handler
	add esp, 4
	pop ss
	pop gs
	pop fs
	pop es
	pop ds
	popa
	add esp, 8
	iret
Next, you start calling your C routine which make use of the stack.
[thread's data on stack][CPU pushed (EIP, EFLAGS, CS)][kernel function local storage and call stack]
Ah, I guess all my pushing and popping from the above snippet counted for the "kernel local storage"?
Finally it appears you try to push the current thread state on top of all this.
[thread's data on stack][CPU pushed (EIP, EFLAGS, CS)][kernel function local storage and call stack][thread state]
Yes... now that I think of it though, I'm pushing the thread state twice: once in the ISR and once in the task switcher.
When you load it seems all this garbage is present and never gets poped which looks like this.
[thread's data on stack][CPU pushed (EIP, EFLAGS, CS)][kernel function local storage and call stack][.......thread's data on stack(corrupted)....]
Yes, I see now. That would definitely be a problem :(

Now that I've noticed the task's state is already on the stack, I guess I could just use that. But how do I know how much to take off the stack to account for the C++ that it goes through?

Posted: Sun Jun 24, 2007 11:39 am
by Kevin McGuire
The way to do it is to place some code at the very beginning on your ISR:
extern g_kernel_scheduler_current_thread
pushf
pushad
mov eax, g_kernel_scheduler_current_thread
mov [eax->esp], esp /// <--- we save the point where the thread's state and stack data lies


When we save esp we save the point that is:
[thread's data on stack][CPU pushed arguments][thread state][ ...... ]
The .... part is all the garbage the kernel's C/C++ routine will add to the stack. We save esp right before the ..... part, and reload it when we resume/(switch back to) the thread.


Then you continue on by calling into your kernel's C/C++ functions. When you go to the point in which you would like to switch to the next thread.
mov eax, g_kernel_scheduler_next_thread
mov esp, [eax->esp]
popad
popf

Posted: Sun Jun 24, 2007 11:56 am
by mmiikkee12

Code: Select all

mov [eax->esp], esp
You do realize that eax is not a pointer to a struct right? :-)

Anyway, I'm trying that (well, something like that, but slightly less of an insane hybrid of two languages) now.

Posted: Sun Jun 24, 2007 12:50 pm
by mmiikkee12
Still not working. Based on what you gave me, I came up with this:

Code: Select all

; IRQ 0 (ISR 32) is a special case, since it's used for the scheduler also
extern current_esp
global _irq0
_irq0:
	pusha
	pushf
	mov eax, [current_esp]
	mov [eax], esp
	jmp isr32

Code: Select all

extern current_esp
global do_task_switch
do_task_switch:
	mov eax, [current_esp]
	mov esp, [eax]
	popa
	popf
	iret

Code: Select all

qemu: fatal: Trying to execute code outside RAM or ROM at 0xc0a19c60

EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000 EBP=00000000 ESP=001b94bc
EIP=c0a19c60 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
No, that's definitely not what I wanted to do :(

Posted: Sun Jun 24, 2007 2:03 pm
by Kevin McGuire
It looks correct to me as long as current_esp is a pointer that points to a valid memory location, and you are setting it correctly before jumping to the thread (bootstrapping the scheduler) and after choosing the next thread (before resuming - iret).

In C/C++ the current_esp should be, <some type> *current_esp.

You are going to have to produce more information about your environment or problem before I can make any "logical" deduction about what it might be. There are too many possibilities at the moment.

[edit]
Oh. Look at the order of the pusha, pushf, popa, and popf.

Posted: Sun Jun 24, 2007 4:24 pm
by mmiikkee12
In C/C++ the current_esp should be, <some type> *current_esp.
Actually, extern "C" uint_32 *current_esp so it doesn't get mangled.
Oh. Look at the order of the pusha, pushf, popa, and popf.
That was part of it, thanks.
The other part was that when I put in the extra code at the start of my timer handler, I was doing this:

Code: Select all

set(idt[32], (isrfunc_t)_irq0, 0x08, 0x8E);
When I meant to do this:

Code: Select all

set(idt[32], (isrfunc_t)[b]&[/b]_irq0, 0x08, 0x8E);
Therefore it was taking whatever garbage I had in _irq0 and trying to use it as an address. Never a good thing (unless you're trying as hard as you can to crash the computer - and I'm not :-)