Software task switching problem

petrusss2

Software task switching problem

Post by petrusss2 »

I'm having problems with my software task switching. I have a few threads (5), and each of them prints text to the screen using my kprintf function.
After a while, the computer crashes (3rd page fault errors mostly, but sometimes there are other errors; it looks like the stack is incorrect). If I instead write to the screen manually (at 0xB8000, without any functions), it works correctly.
I've tried clearing the interrupt flag before calling kprintf in my threads, but that doesn't help.
The kprintf function just formats the data (%i, %s, etc.) and prints it directly to the screen at 0xB8000+.
Is anyone familiar with this? Or does anyone have ideas about what might be wrong?

PS: My PIT interrupt stub clears the interrupt flag, so no task switch will occur while another one is being scheduled.
Candy

Re:Software task switching problem

Post by Candy »

Is your code thread safe? Are interrupts disabled during interrupt service routine execution? Can we see your code?
Pype.Clicker

Re:Software task switching problem

Post by Pype.Clicker »

I got something similar in my early multi-threading days. The problem was that I had a single 'display buffer' for formatting text that had to be shown on screen.

Behind the words "thread safe" or "reentrant" hides the question: is there any resource (like the video pointer) that lives in a *global* variable (not on the stack) and that many threads could try to modify concurrently? If there is, you should either duplicate that resource (give each thread its own buffer for preparing messages) or use a synchronizer object to prevent concurrent use.
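
To illustrate the "duplicate that resource" option, here is a minimal sketch of a formatting helper that writes into a buffer supplied by the caller instead of a single global display buffer. The name format_uint and its signature are hypothetical and are not taken from petrusss2's kernel; the only point is that the helper keeps no global or static state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the decimal form of `value` into `out`
 * (capacity `cap`, including the terminating NUL) and returns the number
 * of characters written, or 0 if the buffer is too small. It touches no
 * global or static state, so concurrent callers cannot disturb each other. */
size_t format_uint(char *out, size_t cap, uint32_t value)
{
    char tmp[10];               /* a 32-bit value has at most 10 digits */
    size_t n = 0;

    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    if (n + 1 > cap)
        return 0;

    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];   /* digits were produced in reverse */
    out[n] = '\0';
    return n;
}
```

Each thread would call this with a buffer on its own stack, so a task switch in the middle of formatting cannot mix two threads' output in the buffer. The shared screen position still needs separate protection.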
Dreamsmith

Re:Software task switching problem

Post by Dreamsmith »

Regarding global variables...

Rule 1: Don't use global variables.
Rule 2: Make sure any global variable you're using in more than one thread is declared volatile.
Rule 3: Make sure any changes that affect global variables are either atomic or protected by some kind of lock.
Rule 4: Remember that a static local variable is just a global variable in disguise. See rule #1, and after you ignore it (like the rest of us), see the following two rules.
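
As a rough sketch of rule 3, the following protects a global cursor with a simple spinlock built on C11 atomic_flag. The names fake_screen, screen_lock, and screen_puts are assumptions made for illustration, and fake_screen is an ordinary array standing in for the text buffer at 0xB8000 so the code can be tried outside a kernel.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SCREEN_CELLS (80 * 25)

/* Stand-in for the text-mode frame buffer at 0xB8000 so the sketch can be
 * compiled and run outside a kernel. */
static uint16_t fake_screen[SCREEN_CELLS];

/* Shared state: the cursor is a global, so rule 3 applies to it. */
static size_t cursor;
static atomic_flag screen_lock = ATOMIC_FLAG_INIT;

static void screen_acquire(void)
{
    /* Spin until the previous value of the flag was clear. */
    while (atomic_flag_test_and_set_explicit(&screen_lock, memory_order_acquire))
        ;
}

static void screen_release(void)
{
    atomic_flag_clear_explicit(&screen_lock, memory_order_release);
}

/* Writes a whole string while holding the lock, so the cursor updates and
 * cell writes of two threads cannot interleave. */
void screen_puts(const char *s, uint8_t attr)
{
    screen_acquire();
    for (; *s != '\0'; s++) {
        fake_screen[cursor] = (uint16_t)((uint16_t)attr << 8 | (uint8_t)*s);
        cursor = (cursor + 1) % SCREEN_CELLS;
    }
    screen_release();
}
```

On a single CPU with preemption, a thread that spins on this lock simply burns the rest of its time slice until the holder runs again, and the lock must never be taken from an interrupt handler that could interrupt the holder. A kernel would likely combine it with yielding or with disabling interrupts.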

Other thoughts...

Is the printf function in the kernel? Are the threads using it user-mode threads? Did you make sure each thread has its own stacks? Note the plural: does each one have its own priv-0 stack, in addition to its own priv-3 stack? Are these stacks large enough to hold the data declared in the printf function, taking other function calls and recursion into account, if necessary?
Dreamsmith

Re:Software task switching problem

Post by Dreamsmith »

petrusss2 wrote:...I have a couple of threads (5)...
Just one minor nit: in the English language, the word "couple" denotes the specific quantity of two (e.g. a married couple). "A couple of threads" means "two threads". Use "a few" if you mean a small number greater than two.
petrusss2

Re:Software task switching problem

Post by petrusss2 »

Well, it has nothing to do with the stacks. I've tested disabling the scheduling (by not sending an EOI) and using schedYield() instead. The threads look like this:

Code:

uint32 MyThread(uint32 _param) {
   FloodChar('*', 80);
   kprintf("\n");
   FloodChar(' ', 36);
   kprintf("Thread %i", _param);
   FloodChar(' ', 36);
   kprintf("\n");
   FloodChar('*', 80);
   kprintf("\n");

   while(1) {
      kprintf("## THREAD %i\n", _param);
      schedYield();
   }
   
   return 0xF000B000;
}
This works perfectly.
And, as I've said, I've tested calling my kprintf after a "cli" so only one thread may execute it at a time.

This is my context switcher:

Code:

 
[global schedSwitchContext]
schedSwitchContext: 
   mov ebp, esp
   mov esp, [ebp + 4]     ; change stack
   
   add esp, 4   ; empty thingy
   pop ss
   pop gs
   pop fs
   pop es
   pop ds

   pop edi
   pop esi
   pop ebp
   add esp, 4
   pop edx
   pop ecx
   pop ebx
   pop eax
   add esp, 4      ; error code

   sti

   iret
   hlt
This has to be called inside an interrupt, because it uses the stack frame set up by the interrupt.

And, here's my schedYield() function:

Code:

void schedYield() {
   g_iCurrentThreadTicks = 0;
   pitSetYield();
   sti;
   asm("int $0x20");
}

PS: I will never more say "a couple" of things > 2.
Dreamsmith

Re:Software task switching problem

Post by Dreamsmith »

Not sure if it's related to your problem, but why is there an "sti" before your "iret" instruction? It is either redundant and does nothing, or it's an error. It can't possibly be doing anything useful there, because the iret instruction that follows it will load the flags, including I, with new values.

Rule of thumb: your code should probably only ever execute STI once, during bootup. ISRs will have the I flag set and reset as necessary by the interrupt mechanism and IRET instruction, and other bits of code that need to call CLI should precede the call with a PUSHF and restore with POPF later, not blindly assume it was set to begin with.
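
The save-and-restore pattern from this rule of thumb can be sketched as follows. This is a user-space model only: interrupts_enabled is a plain variable standing in for the I flag, and irq_save, irq_restore, and bump_counter are hypothetical names. In a real kernel, irq_save() would be PUSHF followed by CLI, and irq_restore() would be POPF.

```c
#include <stdbool.h>

/* Simulated interrupt-enable flag standing in for EFLAGS.IF, so the sketch
 * runs in user space. In a real kernel irq_save() would be PUSHF + CLI and
 * irq_restore() would be POPF. */
static bool interrupts_enabled = true;

typedef bool irq_state_t;

/* Remembers the current state, then disables interrupts. */
static irq_state_t irq_save(void)
{
    irq_state_t previous = interrupts_enabled;
    interrupts_enabled = false;
    return previous;
}

/* Puts the flag back to whatever it was before irq_save(). */
static void irq_restore(irq_state_t previous)
{
    interrupts_enabled = previous;
}

static int shared_counter;

/* Critical section that leaves the interrupt state exactly as it found it,
 * instead of blindly re-enabling interrupts on the way out. */
void bump_counter(void)
{
    irq_state_t saved = irq_save();
    shared_counter++;
    irq_restore(saved);
}
```

The design point is that bump_counter() behaves correctly whether it is called with interrupts enabled or already disabled, which a bare CLI/STI pair does not.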

I don't see in this code any attempt to set/maintain priv-0 stacks. Are these threads running at priv-0? If not, that INT is going to cause a stack switch. If you're using that stack to store state information, then switching to another user task, and it also calls INT 0x20, and you haven't altered the priv-0 stack, you are going to be clobbering things.

Alas, you've left out enough code from what you've posted that there's really no way to tell what's going on...
petrusss2

Re:Software task switching problem

Post by petrusss2 »

Well, the sti in my context switcher was there because I forgot it was there (from some experiments), thanks for pointing it out.
But it didn't solve the problem.
I've been doing some debugging and testing and I came up with this:

Code:

 
void MyTest(uint32 _id) {
   char *pVideo = 0xB8000;
   pVideo[0] = '1' + _id;
   pVideo[1] = PCON_COLOR(PCON_COLOR_RED,PCON_COLOR_BLUE);
}
Every thread calls this in an infinite loop (like the one in my previous post), not atomically, and that shows some weird output.
The output address (which should be 0xB8000) differs from time to time (though it's mostly 0xB8000), even though I've set it as a constant. How could this be?
Sometimes pVideo[0] points to the correct position (upper-left corner) and pVideo[1] points to a random memory location.
I've even tested declaring pVideo as volatile, and it doesn't help a bit.

Should I code some kind of lock for variables, like a loop that spins until the variable is released? Mutexes would be too slow for that.

And you're talking about user stacks. Well, I'm not using user privilege levels yet, so it's all running in PL 0.
proxy

Re:Software task switching problem

Post by proxy »

The reason you get weird output is that a thread can be interrupted in the middle of that function, and another thread can then start writing its info there. The more threads doing this, the more of a problem it is.

You can end up with partial writes to the location (especially on SMP systems, where you can literally have two threads write at exactly the same time).

You need to protect this with some sort of mutex.

Most likely (just for experimentation), if you disabled interrupts in MyTest, did the work, and re-enabled them at the end, it would work OK.

proxy
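
For the partial-write case described in this post, one option is to store the character and its attribute as a single 16-bit value rather than as two separate byte writes. The sketch below assumes a stand-in array named fake_vga in place of the real buffer at 0xB8000, and the helper names vga_cell and put_cell are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the VGA text buffer; a kernel would point this at 0xB8000. */
static volatile uint16_t fake_vga[80 * 25];

/* Packs character and attribute into one 16-bit cell (character in the low
 * byte, attribute in the high byte, as in VGA text mode). */
static uint16_t vga_cell(char ch, uint8_t attr)
{
    return (uint16_t)((uint16_t)attr << 8 | (uint8_t)ch);
}

/* Writes the whole cell with one 16-bit assignment instead of two byte
 * writes, so a task switch cannot land between character and attribute. */
void put_cell(unsigned index, char ch, uint8_t attr)
{
    fake_vga[index] = vga_cell(ch, attr);
}
```

On x86 a compiler will normally turn the assignment into one aligned 16-bit store, and interrupts are taken between instructions, so the cell is never seen half-written. This only addresses torn cells; it would not explain a write landing at a wrong address.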
Brendan

Re:Software task switching problem

Post by Brendan »

Hi,
proxy wrote: The reason you get weird output is that a thread can be interrupted in the middle of that function, and another thread can then start writing its info there. The more threads doing this, the more of a problem it is.
I'd like to offer an alternative problem :)

A re-entrancy problem wouldn't explain pVideo[1] pointing to a random memory location, because pVideo is a local variable (it lives on the stack or in a register only).

Therefore I'm going to guess that the context switch code or the IRQ handlers are trashing one or more general-purpose registers (and/or the stack).


Cheers,

Brendan
petrusss2

Re:Software task switching problem

Post by petrusss2 »

It's solved.
The problem was that I was sending the EOI in my context switcher (in a part I forgot to copy into the code I pasted here), and that overwrote eax.
Now I preserve eax and it works perfectly, both my "MyTest" function and my "kprintf".