Software task switching problem

Posted: Wed Aug 18, 2004 5:50 am
by petrusss2
I'm having problems with my software task switching. I have a couple of threads (5), each of which prints text to the screen using my kprintf function.
After a while, the computer crashes (mostly 3rd page fault errors, but sometimes other errors; it looks like the stack is incorrect). If I instead write to the screen manually (at 0xB8000, without any function calls), it works correctly.
I've tried clearing the interrupt flag before calling kprintf in my threads, but that doesn't help.
The kprintf function just formats the data (%i, %s etc.) and writes it directly to the screen at 0xB8000+.
Is anyone familiar with this? Any ideas about what might be wrong?

PS: My PIT interrupt stub clears the interrupt flag, so no task switching will occur while scheduling another one.

Re:Software task switching problem

Posted: Wed Aug 18, 2004 6:02 am
by Candy
Is your code thread safe? Are interrupts disabled during interrupt service routine execution? Can we see your code?

Re:Software task switching problem

Posted: Wed Aug 18, 2004 6:34 am
by Pype.Clicker
I ran into something similar in my early multi-threading days. The problem was that I had a single 'display buffer' for formatting text that had to be shown on screen.

Behind the words "thread safe" or "reentrant" hides the question: "is there any resource (like the video pointer) that lives in a *global* variable (not on the stack) and that many threads could try to modify concurrently?" If there is, you should either duplicate that resource (give each thread its own buffer for preparing messages) or use a synchronizer object to prevent concurrent use.
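
A minimal sketch of the "duplicate the resource" option. The names `format_line_shared` and `format_line` are made up for illustration and are not petrusss2's kprintf, and the standard `vsnprintf` stands in for whatever formatter the kernel provides. The point is only that the second version keeps its working buffer on the caller's stack, so a thread that is preempted mid-format cannot have its text overwritten by another thread.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define LINE_MAX_CHARS 128

/* Non-reentrant version, shown for contrast: every thread shares this one
 * buffer, so a task switch in the middle of formatting lets another thread
 * overwrite it. */
static char g_shared_buffer[LINE_MAX_CHARS];

const char *format_line_shared(const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    vsnprintf(g_shared_buffer, sizeof g_shared_buffer, fmt, args);
    va_end(args);
    return g_shared_buffer;
}

/* Reentrant version: the caller supplies the buffer (typically a local
 * array on its own stack), so no state is shared between threads. */
int format_line(char *out, size_t out_size, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(out, out_size, fmt, args);
    va_end(args);
    return written;
}
```

Each thread would then declare something like `char line[LINE_MAX_CHARS];` locally and pass it in. This only helps if every thread really has its own stack.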

Re:Software task switching problem

Posted: Wed Aug 18, 2004 7:25 pm
by Dreamsmith
Regarding global variables...

Rule 1: Don't use global variables.
Rule 2: Make sure any global variable you're using in more than one thread is declared volatile.
Rule 3: Make sure any changes that affect global variables are either atomic or protected by some kind of lock.
Rule 4: Remember that a static local variable is just a global variable in disguise. See rule #1, and after you ignore it (like the rest of us), see the following two rules.
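
Rule 3 can be sketched with C11 atomics, assuming a compiler that ships `<stdatomic.h>`. The names `g_cursor`, `claim_cell` and `put_char_at` are hypothetical. Each thread reserves a screen cell with a single atomic read-modify-write, so two threads cannot end up with the same index even if a task switch lands between any two instructions.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SCREEN_CELLS (80u * 25u)

/* Shared by all threads: index of the next free character cell.
 * Static storage, so it starts at zero. */
static atomic_uint g_cursor;

/* Atomically reserve one cell and return its index, wrapping at the
 * end of the screen. */
unsigned claim_cell(void) {
    return atomic_fetch_add(&g_cursor, 1u) % SCREEN_CELLS;
}

/* Write one character into a text-mode style buffer (2 bytes per cell:
 * character, then attribute). */
void put_char_at(volatile uint8_t *video, unsigned cell, char c, uint8_t attr) {
    video[cell * 2]     = (uint8_t)c;
    video[cell * 2 + 1] = attr;
}
```

A plain `g_cursor++` on a non-atomic global would be a load, an add and a store, and a switch between those steps is exactly the kind of race rule 3 warns about.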

Other thoughts...

Is the printf function in the kernel? Are the threads using it user-mode threads? Did you make sure each thread has its own stacks? Note the plural: does each one have its own priv-0 stack, in addition to its own priv-3 stack? Are these stacks large enough to hold the data declared in the printf function, taking other function calls and recursion into account, if necessary?

Re:Software task switching problem

Posted: Wed Aug 18, 2004 7:30 pm
by Dreamsmith
petrusss2 wrote:...I have a couple of threads (5)...
Just one minor nit: in the English language, the word "couple" denotes the specific quantity of two (e.g. a married couple). "A couple of threads" means "two threads". Use "a few" if you mean a small number greater than two.

Re:Software task switching problem

Posted: Thu Aug 19, 2004 1:41 pm
by petrusss2
Well, it has nothing to do with the stacks. I've tried disabling the scheduling (by not sending an EOI) and using schedYield() instead. The threads look like this:

Code:

uint32 MyThread(uint32 _param) {
   FloodChar('*', 80);
   kprintf("\n");
   FloodChar(' ', 36);
   kprintf("Thread %i", _param);
   FloodChar(' ', 36);
   kprintf("\n");
   FloodChar('*', 80);
   kprintf("\n");

   while(1) {
      kprintf("## THREAD %i\n", _param);
      schedYield();
   }
   
   return 0xF000B000;
}

This works perfectly.
And, as I've said, I've also tried calling kprintf after a "cli" so that only one thread can execute it at a time.

This is my context switcher:

Code:

[global schedSwitchContext]
schedSwitchContext: 
   mov ebp, esp
   mov esp, [ebp + 4]     ; change stack
   
   add esp, 4   ; empty thingy
   pop ss
   pop gs
   pop fs
   pop es
   pop ds

   pop edi
   pop esi
   pop ebp
   add esp, 4
   pop edx
   pop ecx
   pop ebx
   pop eax
   add esp, 4      ; error code

   sti

   iret
   hlt

This has to be called from inside an interrupt handler, because it uses the stack frame set up by the interrupt.

And, here's my schedYield() function:

Code:

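void schedYield() {
   g_iCurrentThreadTicks = 0;
   pitSetYield();
   asm("sti");        /* posted as a bare "sti;", which is only valid C if sti is a macro */
   asm("int $0x20");  /* enter the scheduler through interrupt vector 0x20 */
}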

PS: I will never again say "a couple" of things > 2.

Re:Software task switching problem

Posted: Fri Aug 20, 2004 12:10 am
by Dreamsmith
Not sure if it's related to your problem, but why is there an "sti" before your "iret" instruction? It is either redundant and does nothing, or it's an error. It can't possibly be doing anything useful there, because the iret instruction that follows it will load the flags, including I, with new values.

Rule of thumb: your code should probably only ever execute STI once, during bootup. ISRs will have the I flag set and reset as necessary by the interrupt mechanism and IRET instruction, and other bits of code that need to call CLI should precede the call with a PUSHF and restore with POPF later, not blindly assume it was set to begin with.
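
The PUSHF/CLI ... POPF pattern can be sketched as a pair of helpers. This is only an illustration: `irq_save`, `irq_restore` and the `KERNEL_BUILD` macro are made-up names, the inline assembly assumes GCC-style syntax on x86, and the `#else` branch is a user-space stand-in (CLI faults outside ring 0) so the nesting behaviour can be tried on a normal machine.

```c
#include <stdint.h>

#ifdef KERNEL_BUILD  /* hypothetical macro: defined only when building for ring 0 */

/* Save the flags register, then disable interrupts. */
static inline uintptr_t irq_save(void) {
    uintptr_t flags;
    __asm__ volatile("pushf\n\tpop %0\n\tcli" : "=r"(flags) : : "memory");
    return flags;
}

/* Restore the flags, including the I flag, exactly as they were. */
static inline void irq_restore(uintptr_t flags) {
    __asm__ volatile("push %0\n\tpopf" : : "r"(flags) : "memory", "cc");
}

#else  /* hosted stand-in: models only the I flag */

static int g_sim_if = 1;  /* simulated I flag: 1 = interrupts enabled */

static inline uintptr_t irq_save(void) {
    uintptr_t flags = (uintptr_t)g_sim_if;
    g_sim_if = 0;
    return flags;
}

static inline void irq_restore(uintptr_t flags) {
    g_sim_if = (int)flags;
}

#endif

static int g_line_count;  /* some shared kernel state */

void inner_critical(void) {
    uintptr_t flags = irq_save();
    g_line_count++;
    irq_restore(flags);   /* puts back the old state, never blindly enables */
}

void outer_critical(void) {
    uintptr_t flags = irq_save();
    inner_critical();     /* interrupts are still off when this returns */
    g_line_count++;
    irq_restore(flags);
}
```

Because `inner_critical` restores the saved state instead of executing STI, it is safe to call both with interrupts enabled and from inside another critical section.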

I don't see in this code any attempt to set/maintain priv-0 stacks. Are these threads running at priv-0? If not, that INT is going to cause a stack switch. If you're using that stack to store state information, then switching to another user task, and it also calls INT 0x20, and you haven't altered the priv-0 stack, you are going to be clobbering things.

Alas, you've left out enough code from what you've posted that there's really no way to tell what's going on...

Re:Software task switching problem

Posted: Fri Aug 20, 2004 11:24 am
by petrusss2
Well, the sti in my context switcher was left over from some experiments and I forgot it was there. Thanks for pointing it out.
But removing it didn't solve the problem.
I've been doing some debugging and testing, and I came up with this:

Code:

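void MyTest(uint32 _id) {
   char *pVideo = (char *)0xB8000;   /* explicit cast: start of text-mode video memory */
   pVideo[0] = '1' + _id;            /* character cell */
   pVideo[1] = PCON_COLOR(PCON_COLOR_RED,PCON_COLOR_BLUE);   /* attribute byte */
}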

Every thread calls this in an infinite loop (like the one in my previous post), not atomically, and that produces some weird output.
The output address (which should be 0xB8000) differs from time to time (it's mostly 0xB8000), even though I've set it as a constant. How can this be?
Sometimes pVideo[0] points to the correct position (upper-left corner) and pVideo[1] points to a random memory location.
I've even tried declaring pVideo as volatile; it doesn't help a bit.

Should I write some kind of lock for variables, like a loop that spins until the variable is released? Mutexes would be too slow for that.
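
A loop that spins until the resource is released is usually called a spinlock. Here is a minimal sketch using C11 `atomic_flag`, assuming a compiler that provides `<stdatomic.h>`; `g_screen_lock` and the `screen_*` functions are hypothetical names. On a single CPU with a preemptive scheduler a waiting thread simply burns the rest of its time slice, so yielding inside the loop would probably be kinder. That part is left out here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One lock guarding the screen; clear means "free". */
static atomic_flag g_screen_lock = ATOMIC_FLAG_INIT;

/* Try once. test_and_set returns the previous value, so "false" means
 * the flag was clear and this caller now owns the lock. */
bool screen_try_lock(void) {
    return !atomic_flag_test_and_set_explicit(&g_screen_lock,
                                              memory_order_acquire);
}

/* Spin until the lock is acquired. */
void screen_lock(void) {
    while (!screen_try_lock()) {
        /* busy-wait */
    }
}

/* Release the lock so another thread's spin loop can get through. */
void screen_unlock(void) {
    atomic_flag_clear_explicit(&g_screen_lock, memory_order_release);
}
```

A thread would wrap its screen writes in `screen_lock()` / `screen_unlock()`. The lock must never be taken from an interrupt handler that can interrupt the holder on the same CPU, or the handler would spin forever.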

And you're talking about user stacks, well, I'm not using user privilege levels yet, so it's all running in PL 0.

Re:Software task switching problem

Posted: Fri Aug 20, 2004 12:02 pm
by proxy
the reason you get weird output is that a thread can be interrupted in the middle of that function, and another thread can then start writing its info there. the more threads doing this, the bigger the problem.

you can end up with partial writes to the location (especially on SMP systems, where you can literally have 2 threads write at the exact same time).

you need to protect this with some sort of mutex.

most likely (just for experimentation), if you disabled interrupts in MyTest, did the work, and re-enabled them at the end, it would work ok.

proxy

Re:Software task switching problem

Posted: Fri Aug 20, 2004 9:17 pm
by Brendan
Hi,
proxy wrote: the reason you get weird output is that a thread can be interrupted in the middle of that function, and another thread can then start writing its info there. the more threads doing this, the bigger the problem.
I'd like to offer an alternative problem :)

A re-entrancy problem wouldn't explain pVideo[1] pointing to a random memory location because it's a local variable (on the stack or in a register only).

Therefore I'm going to guess that the context switch code or the IRQ handlers are trashing one or more general registers (and/or the stack).


Cheers,

Brendan

Re:Software task switching problem

Posted: Sat Aug 21, 2004 7:49 am
by petrusss2
It's solved.
The problem was that I was sending EOI in my context switcher (which I forgot to copy to the code I pasted here), which overwrote eax.
Now I preserve eax and it works perfectly, both my "MyTest" function and my "kprintf".
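
To illustrate why a late EOI could corrupt a thread, here is a small user-space simulation. It assumes the EOI was sent with the common `mov al, 0x20` / `out 0x20, al` pair after the thread's eax had already been popped. The posted code does not show that part, so the ordering and all names (`cpu_t`, `send_eoi_sim`, `resume_buggy`, `resume_fixed`) are hypothetical.

```c
#include <stdint.h>

/* Only the one register that matters for this illustration. */
typedef struct { uint32_t eax; } cpu_t;

/* Models "mov al, 0x20 ; out 0x20, al": the port write needs AL, so the
 * low byte of EAX is overwritten with 0x20. */
static void send_eoi_sim(cpu_t *cpu) {
    cpu->eax = (cpu->eax & 0xFFFFFF00u) | 0x20u;
}

/* Buggy order: restore the thread's EAX first, then send the EOI. */
uint32_t resume_buggy(uint32_t saved_eax) {
    cpu_t cpu = { 0 };
    cpu.eax = saved_eax;   /* pop eax */
    send_eoi_sim(&cpu);    /* clobbers the value just restored */
    return cpu.eax;        /* what the thread sees after iret */
}

/* Fixed order: send the EOI while EAX is still scratch, then restore. */
uint32_t resume_fixed(uint32_t saved_eax) {
    cpu_t cpu = { 0 };
    send_eoi_sim(&cpu);
    cpu.eax = saved_eax;   /* pop eax */
    return cpu.eax;
}
```

Under these assumptions, a thread interrupted while holding 0xB8000 in eax would resume with 0xB8020 on the buggy path, which is consistent with a "constant" pointer changing between two consecutive writes. Saving and restoring eax around the EOI has the same effect as the fixed ordering.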