Multitasking problems and the stack (I think)

Stevo14 · Post by **Stevo14** » Thu Mar 13, 2008 3:51 am

Hello everyone, this is my first post here, I am just starting OS development.

I am currently using the code from JamesM's multitasking tutorial as a learning tool. Everything works fine until I try to return back into my main function in two scenarios.
The first is this: I call fork() in main(), which forks the kernel process and returns as the parent. The parent task runs fine for several milliseconds, after which switch_task() gets called by my timer interrupt. Everything is saved and the new task (the child task) is loaded without problems. Then it page faults on the "return;" statement when the child tries to go back to the main() function. Here is a screen shot of the output from this scenario:

The other scenario is this: I don't call fork() in main(), the kernel process then runs for several milliseconds after which switch_task() gets called by my timer interrupt (like before). This time though the kernel task (at this point the only task) also page faults trying to "return;" back into main(). Here is a screen shot of this one:

My first thought was that the stack was at fault, because it would push or pop a bogus value on the return, causing a page fault. But I am new at this and am probably wrong.

JamesM · Post by **JamesM** » Thu Mar 13, 2008 3:56 am

Hi,

so, firstly, does your timer interrupt work without task switching? I.e., if you comment out the task switch code in your timer handler, does the program work? If so, your stack is not being killed by your irq handler.

If that *does* work, you need to look at your task switching code, because something is FUBAR - I suggest you try and fix scenario 2 before 1, because 2 is the more disturbing one (and a fix for 2 will possibly fix 1 also).

If you're still stuck, perhaps you could post some code?

Cheers,

James

Stevo14 · Post by **Stevo14** » Thu Mar 13, 2008 4:14 am

JamesM wrote:Hi,

so, firstly, does your timer interrupt work without task switching? I.e., if you comment out the task switch code in your timer handler, does the program work? If so, your stack is not being killed by your irq handler.

Yes, my timing code works otherwise. (It will even get the current date/time from the CMOS chip and intermittently print it to the screen.)
But just for the sake of consistency here is the relevant portion of the timer code:

Code: Select all

void timer_handler(struct regs *r)
{
    /* Increment our 'tick count' */
    timer_ticks++;

    // Every 4 ticks (40 milliseconds) we invoke the process manager's scheduler
    if (timer_ticks % 4 == 0)
    {
        // Schedule the next task
        write_string("Switching tasks...");
        switch_task();
        write_string("[done]\n");
    }

    /* ... (remainder of the handler not shown) ... */
}

"[done]" never gets printed but that is expected. It is mostly there to tell me if something went terribly wrong in switch_task() and it actually ended up returning back to the timer handler.

JamesM wrote: If that *does* work, you need to look at your task switching code, because something is FUBAR - I suggest you try and fix scenario 2 before 1, because 2 is the more disturbing one (and a fix for 2 will possibly fix 1 also).

If you're still stuck, perhaps you could post some code?

Cheers,

James

I would be surprised if the problem was in the switch_task() code, being that it is identical to the switch_task() function in your tutorial (with added debugging print functions of course). I'll look over it again though just to see if I can find anything.

JamesM · Post by **JamesM** » Thu Mar 13, 2008 4:47 am

What debug output have you got in switch_tasks? Could you get it to output exactly what register values it is poking? (Specifically, the EIP, ESP and EBP register values, and their values before change.)

Stevo14 · Post by **Stevo14** » Thu Mar 13, 2008 5:10 am

JamesM wrote:What debug output have you got in switch_tasks? Could you get it to output exactly what register values it is poking? (Specifically, the EIP, ESP and EBP register values, and their values before change.)

Hmm... this is very strange. While I was adding more debug output to the switch_task() function, it ended up working because of the debugging code that I added! It happens after loading the new eip, ebp, and esp values. The comments will explain:

Code: Select all

void switch_task()
{
    // If we haven't initialised tasking yet, just return.
    if (!current_task)
        return;

    // Read esp, ebp now for saving later on.
    unsigned int esp, ebp, eip;
    asm volatile("mov %%esp, %0" : "=r"(esp));
    asm volatile("mov %%ebp, %0" : "=r"(ebp));

    // Read the instruction pointer. We do some cunning logic here:
    // One of two things could have happened when this function exits -
    // (a) We called the function and it returned the EIP as requested.
    // (b) We have just switched tasks, and because the saved EIP is essentially
    // the instruction after read_eip(), it will seem as if read_eip has just
    // returned.
    // In the second case we need to return immediately. To detect it we put a dummy
    // value in EAX further down at the end of this function. As C returns values in EAX,
    // it will look like the return value is this dummy value! (0x12345).
    eip = read_eip();

    // Have we just switched tasks?
    if (eip == 0x12345)
    {
        write_string("Switched back to process "); // we have a successful task switch!
        write_number(getpid());
        return;
    }

    // No, we didn't switch tasks. Let's save some register values and switch.
    current_task->eip = eip;
    current_task->esp = esp;
    current_task->ebp = ebp;

    write_string("Before switch: esp:");
    write_hex(current_task->esp);
    write_string(" ebp:");
    write_hex(current_task->ebp);
    write_string(" eip:");
    write_hex(current_task->eip);
    write_string("\n");

    // Get the next task to run.
    current_task = current_task->next;
    // If we fell off the end of the linked list start again at the beginning.
    if (!current_task) current_task = ready_queue;

    // Now reload everything with the new task
    eip = current_task->eip;
    esp = current_task->esp;
    ebp = current_task->ebp;

    // Uncommenting any one or more of these lines makes it work fine
    //write_string("After switch: esp:");
    //write_hex(current_task->esp);
    //write_string(" ebp:");
    //write_hex(current_task->ebp);
    //write_string(" eip:");
    //write_hex(current_task->eip);
    //write_string("\n");

    // Make sure the memory manager knows we've changed page directory.
    current_directory = current_task->page_directory;

    // Here we:
    // * Stop interrupts so we don't get interrupted.
    // * Temporarily put the new EIP location in ECX.
    // * Load the stack and base pointers from the new task struct.
    // * Change page directory to the physical address (physicalAddr) of the new directory.
    // * Put a dummy value (0x12345) in EAX so that above we can recognise that we've just
    // switched task.
    // * Restart interrupts. The STI instruction has a delay - it doesn't take effect until after
    // the next instruction.
    // * Jump to the location in ECX (remember we put the new EIP in there).
    asm volatile("         \
      cli;                 \
      mov %0, %%ecx;       \
      mov %1, %%esp;       \
      mov %2, %%ebp;       \
      mov %3, %%cr3;       \
      mov $0x12345, %%eax; \
      sti;                 \
      jmp *%%ecx           "
                 : : "r"(eip), "r"(esp), "r"(ebp), "r"(current_directory->physicalAddr));
}
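
For anyone puzzled by the "returns twice" logic in the comments above: a rough user-space analogue is setjmp()/longjmp(), where the same call site is reached once normally and once after a jump, and the return value tells the two apart. The sketch below is only an illustration of that idea, not the tutorial's mechanism. The function name resume_demo is made up, and 0x12345 is reused here purely as the marker value.

```c
#include <setjmp.h>

static jmp_buf saved_context;

/* Returns how many times execution passed the setjmp() call site. */
static int resume_demo(void)
{
    /* volatile so the value is well defined after longjmp() */
    volatile int passes = 0;

    if (setjmp(saved_context) == 0x12345) {
        /* Second arrival: longjmp() made setjmp() appear to return the marker,
         * much like read_eip() appearing to return 0x12345 after a switch. */
        passes++;
        return passes;
    }

    /* First arrival: setjmp() returned 0, so "switch" by jumping back. */
    passes++;
    longjmp(saved_context, 0x12345);
    return -1; /* not reached */
}
```

Calling resume_demo() passes the call site twice: once on the direct return and once after the jump.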

JamesM · Post by **JamesM** » Thu Mar 13, 2008 5:15 am

Have you tried putting a "cli" at the start of the function to ensure interrupts are disabled?

Stevo14 · Post by **Stevo14** » Thu Mar 13, 2008 6:00 am

JamesM wrote:have you tried putting a "cli" at the start of the function to ensure interrupts are disabled?

I just did and it didn't help. I also figured out that, in order to make it work, I need to call some sort of function in between "esp = current_task->esp;" and the assembly code. I tried inserting different functions, like "write_char(0x00)" or "settextcolor(9,0)". Anything will work as long as it is between "esp = current_task->esp;" and the assembly code. Does this confirm that it is a problem with the stack?
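
One possible explanation, which nothing in this thread confirms: the final asm statement in switch_task() declares no clobber list, so the compiler is free to place one of the "r" inputs in ECX, which the block overwrites in its first instruction before the other inputs are read. Any change to the surrounding code, such as an extra function call, can shuffle register allocation and hide the problem. The sketch below is not the tutorial's code. It is a harmless user-space function (the name add_via_ecx is made up) that uses ECX as scratch in the same way and lists it as clobbered. It assumes GCC or Clang on x86.

```c
#include <stdint.h>

/* Adds two values using ECX as a scratch register.
 * The "ecx" clobber tells the compiler not to keep any operand in ECX;
 * without it, `b` could be allocated to ECX and be overwritten by the
 * first mov before it is read. "cc" is listed because add changes flags. */
static uint32_t add_via_ecx(uint32_t a, uint32_t b)
{
    uint32_t result;
    __asm__ volatile(
        "mov %1, %%ecx\n\t"
        "add %2, %%ecx\n\t"
        "mov %%ecx, %0"
        : "=r"(result)
        : "r"(a), "r"(b)
        : "ecx", "cc");
    return result;
}
```

The analogous change in switch_task() would be appending a clobber list such as "ecx", "eax", "memory" to the asm statement. Whether that cures this particular fault has not been tested here.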

Stevo14 · Post by **Stevo14** » Thu Mar 13, 2008 4:45 pm

Ok, I've made some more progress. I figured out that when the child process enters, if I read the esp and ebp right then, the stack pointer (esp) doesn't have the value that it was given when the parent created the child.

I will work on this more but it's late, and I need sleep.

EDIT: OK, I now have partial success. By this I mean that it does work, but not for the right reasons (i.e., I hacked it). I added a "write_char(0x00)" just before the asm code that updates all the registers, and it seems to work fine.

There was also a problem in my timer code: I was sending the "end of interrupt" signals after calling the handler. This caused problems because, when switching tasks, the timer handler never returns (by design, I believe), so the PICs were never told that I had received the interrupt. It seems like quite a silly mistake now that I have fixed it...
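
The ordering fix described above can be sketched roughly as follows. This is not the actual code from this thread: irq_dispatch, outb_stub, fake_timer_handler, and the event log are hypothetical names, and the port write is replaced by a stub that only records what happened so the ordering can be checked in user space. The port numbers (0x20, 0xA0) and the EOI command byte (0x20) are the standard 8259 PIC values.

```c
#include <stdint.h>

#define PIC1_COMMAND 0x20  /* master PIC command port */
#define PIC2_COMMAND 0xA0  /* slave PIC command port */
#define PIC_EOI      0x20  /* end-of-interrupt command */

enum event { EVENT_EOI_MASTER, EVENT_EOI_SLAVE, EVENT_HANDLER };

#define MAX_EVENTS 8
static enum event event_log[MAX_EVENTS];
static int event_count;

static void log_event(enum event e)
{
    if (event_count < MAX_EVENTS)
        event_log[event_count++] = e;
}

/* Hypothetical stand-in for a real port write: only records the EOI. */
static void outb_stub(uint16_t port, uint8_t value)
{
    (void)value;
    log_event(port == PIC2_COMMAND ? EVENT_EOI_SLAVE : EVENT_EOI_MASTER);
}

typedef void (*irq_handler_t)(void);

/* Stand-in for a timer handler that may switch tasks and never return. */
static void fake_timer_handler(void)
{
    log_event(EVENT_HANDLER);
}

/* Acknowledge the interrupt first, then run the handler, so the PICs
 * are notified even if the handler never comes back. */
static void irq_dispatch(int irq, irq_handler_t handler)
{
    if (irq >= 8)
        outb_stub(PIC2_COMMAND, PIC_EOI);  /* IRQs 8-15 also need the slave EOI */
    outb_stub(PIC1_COMMAND, PIC_EOI);

    if (handler)
        handler();
}
```

With this ordering, irq_dispatch(0, fake_timer_handler) records the master EOI before the handler runs.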