Software task switching problem
I'm having problems with my software task switching. I have a couple of threads (5), and each of them prints some text to the screen using my kprintf function.
After a while, the computer crashes (mostly 3rd page fault errors, but sometimes other errors; it looks like the stack is corrupted). If I instead write to the screen manually (at 0xB8000, without any function calls), it works correctly.
I've tried clearing the interrupt flag before calling kprintf in my threads, but that doesn't help.
The kprintf function just formats the data (%i, %s, etc.) and writes it directly to the screen at 0xB8000+.
Is anyone familiar with this, or does anyone have ideas about what might be wrong?
PS: My PIT interrupt stub clears the interrupt flag, so no task switch can occur while another one is being scheduled.
Re:Software task switching problem
Is your code thread-safe? Are interrupts disabled while an interrupt service routine is executing? Can we see your code?
- Pype.Clicker
Re:Software task switching problem
I got something similar in my early multi-threading days. The problem was that I had a single 'display buffer' for formatting text that had to be shown on screen.
Behind the words "thread safe" or "reentrant" hides the question: is there any resource (like the video pointer) that lives in a *global* variable (not on the stack) and that many threads could try to modify concurrently? If there is, you should either duplicate that resource (give each thread its own buffer for preparing messages) or use a synchronization object to prevent concurrent use.
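To illustrate the "give each thread its own buffer" option, here is a minimal sketch in C. The helper names `format_shared` and `format_thread_line` are hypothetical and not taken from the posted kernel, and the hosted `snprintf` is used only so the sketch can be tried outside a kernel.

```c
#include <stdio.h>

/* Non-reentrant variant: every caller shares this one buffer, so a task
 * switch in the middle of formatting lets another thread overwrite it. */
static char g_shared_buf[64];

static const char *format_shared(int thread_id) {
    snprintf(g_shared_buf, sizeof g_shared_buf, "## THREAD %d", thread_id);
    return g_shared_buf;
}

/* Reentrant variant: the caller passes in a buffer it owns, typically
 * an array on its own stack, so no state is shared between threads. */
static int format_thread_line(char *buf, size_t len, int thread_id) {
    return snprintf(buf, len, "## THREAD %d", thread_id);
}
```

Two threads that each call `format_thread_line` with their own stack buffer cannot disturb each other's text. With `format_shared`, a second call replaces the text that the first caller is still holding a pointer to.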
Re:Software task switching problem
Regarding global variables...
Rule 1: Don't use global variables.
Rule 2: Make sure any global variable you're using in more than one thread is declared volatile.
Rule 3: Make sure any changes that affect global variables are either atomic or protected by some kind of lock.
Rule 4: Remember that a static local variable is just a global variable in disguise. See rule #1, and after you ignore it (like the rest of us), see rules #2 and #3.
Other thoughts...
Is the printf function in the kernel? Are the threads using it user-mode threads? Did you make sure each thread has its own stacks? Note the plural: does each one have its own priv-0 stack, in addition to its own priv-3 stack? Are these stacks large enough to hold the data declared in the printf function, taking other function calls and recursion into account, if necessary?
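As one possible reading of the "atomic" half of rule 3, here is a minimal sketch using C11 `<stdatomic.h>`. The names `g_cursor` and `reserve_cells` are hypothetical, and the 80x25 text screen is an assumption. Each thread reserves its own range of character cells in a single atomic step, so two threads never receive the same range.

```c
#include <stdatomic.h>

#define SCREEN_CELLS (80u * 25u)   /* assumed 80x25 text mode */

/* Shared by all threads: running count of character cells handed out. */
static atomic_uint g_cursor;

/* Atomically reserve `count` consecutive cells and return the index of
 * the first one. The read-modify-write is a single atomic operation, so
 * a task switch cannot split it. Ranges that straddle the end of the
 * screen are not handled in this sketch. */
static unsigned reserve_cells(unsigned count) {
    unsigned start = atomic_fetch_add(&g_cursor, count);
    return start % SCREEN_CELLS;
}
```

A thread would call `reserve_cells(len)` once per message and then write only into the cells it was given.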
Re:Software task switching problem
petrusss2 wrote: ...I have a couple of threads (5)...
Just one minor nit: in the English language, the word "couple" denotes the specific quantity of two (e.g. a married couple). "A couple of threads" means "two threads". Use "a few" if you mean a small number greater than two.
Re:Software task switching problem
Well, it has nothing to do with the stacks. I've tested disabling the scheduling (by not sending an EOI) and using schedYield() instead. The threads look like this:
Code: Select all
uint32 MyThread(uint32 _param) {
    FloodChar('*', 80);
    kprintf("\n");
    FloodChar(' ', 36);
    kprintf("Thread %i", _param);
    FloodChar(' ', 36);
    kprintf("\n");
    FloodChar('*', 80);
    kprintf("\n");
    while(1) {
        kprintf("## THREAD %i\n", _param);
        schedYield();
    }
    return 0xF000B000;
}
This works perfectly.
And, as I've said, I've tested calling my kprintf after a "cli" so that only one thread may execute it at a time.
This is my context switcher. It has to be called inside an interrupt, because it uses the stack set up by it:
Code: Select all
[global schedSwitchContext]
schedSwitchContext:
    mov ebp, esp
    mov esp, [ebp + 4]  ; change stack
    add esp, 4          ; empty thingy
    pop ss
    pop gs
    pop fs
    pop es
    pop ds
    pop edi
    pop esi
    pop ebp
    add esp, 4
    pop edx
    pop ecx
    pop ebx
    pop eax
    add esp, 4          ; error code
    sti
    iret
    hlt
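As a reading aid, the pop sequence above implies the following layout for the saved context on a thread's stack, lowest address first. This is only a sketch: the struct and field names are hypothetical, and it assumes 32-bit protected mode, where every pop (segment registers included) consumes 4 bytes and `iret` without a privilege change pops eip, cs and eflags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the frame that schedSwitchContext unwinds. */
struct saved_context {
    uint32_t unused;                 /* skipped by the first "add esp, 4" */
    uint32_t ss, gs, fs, es, ds;     /* segment registers, in pop order */
    uint32_t edi, esi, ebp;
    uint32_t skipped;                /* skipped by the second "add esp, 4" */
    uint32_t edx, ecx, ebx, eax;
    uint32_t error_code;             /* skipped just before iret */
    uint32_t eip, cs, eflags;        /* popped by iret itself */
};
```

Anything that modifies a general register after its `pop` and before `iret` changes the value the resumed thread sees, because nothing reloads it afterwards.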
And, here's my schedYield() function:
Code: Select all
void schedYield() {
    g_iCurrentThreadTicks = 0;
    pitSetYield();
    sti;
    asm("int $0x20");
}
PS: I will never again say "a couple" of things > 2.
Re:Software task switching problem
Not sure if it's related to your problem, but why is there an "sti" before your "iret" instruction? It is either redundant and does nothing, or it's an error. It can't possibly be doing anything useful there, because the iret instruction that follows it will load the flags, including I, with new values.
Rule of thumb: your code should probably only ever execute STI once, during bootup. ISRs will have the I flag set and reset as necessary by the interrupt mechanism and IRET instruction, and other bits of code that need to call CLI should precede the call with a PUSHF and restore with POPF later, not blindly assume it was set to begin with.
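The save-and-restore pattern from this rule of thumb can be sketched in C as follows. To keep the sketch runnable in user space, the interrupt flag is modelled by an ordinary variable; in a real kernel `irq_save` would wrap PUSHF followed by CLI and `irq_restore` would wrap POPF. All names here are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for the CPU's interrupt flag (an assumption for this sketch). */
static bool g_interrupts_enabled = true;

typedef bool irq_state_t;

/* Models PUSHF + CLI: remember the current state, then disable. */
static irq_state_t irq_save(void) {
    irq_state_t old = g_interrupts_enabled;
    g_interrupts_enabled = false;
    return old;
}

/* Models POPF: put back whatever state the caller had, enabled or not. */
static void irq_restore(irq_state_t old) {
    g_interrupts_enabled = old;
}

static int g_shared_counter;

/* A critical section that is safe to call with interrupts on or off. */
static void critical_update(void) {
    irq_state_t state = irq_save();
    g_shared_counter++;
    irq_restore(state);
}
```

Because the old state is restored rather than unconditionally re-enabled, `critical_update` can be called from code that already runs with interrupts disabled without switching them back on behind the caller's back.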
I don't see in this code any attempt to set/maintain priv-0 stacks. Are these threads running at priv-0? If not, that INT is going to cause a stack switch. If you're using that stack to store state information, then switching to another user task, and it also calls INT 0x20, and you haven't altered the priv-0 stack, you are going to be clobbering things.
Alas, you've left out enough code from what you've posted that there's really no way to tell what's going on...
Re:Software task switching problem
Well, the sti in my context switcher was there because I forgot it was there (from some experiments), thanks for pointing it out.
But it didn't solve the problem.
I've been doing some debugging and testing and I came up with this:
Every thread calls this function (MyTest) in an infinite loop (like the one in my previous post), not atomically, and it produces some weird output.
Code: Select all
void MyTest(uint32 _id) {
    char *pVideo = (char *)0xB8000;  /* start of VGA text-mode memory */
    pVideo[0] = '1' + _id;
    pVideo[1] = PCON_COLOR(PCON_COLOR_RED,PCON_COLOR_BLUE);
}
The output address (which should be 0xB8000) differs from time to time (though it's mostly 0xB8000), even though I've set it as a constant. How could this be?
Sometimes pVideo[0] points to the correct position (upper-left corner) and pVideo[1] points to a random memory location.
I've even tried declaring pVideo as volatile; it doesn't help a bit.
Should I code some kind of lock for variables, like a loop that spins until the variable is released? Mutexes would be too slow for that.
And you're talking about user stacks; well, I'm not using user privilege levels yet, so it's all running in PL 0.
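The "loop until the variable is released" idea is usually called a spinlock. A minimal sketch using C11 `atomic_flag` might look like this; the names `g_video_lock`, `video_lock`, `video_trylock` and `video_unlock` are hypothetical and not part of the posted kernel.

```c
#include <stdatomic.h>

/* One lock guarding the shared resource (here: the video memory). */
static atomic_flag g_video_lock = ATOMIC_FLAG_INIT;

/* Try once; returns 1 if the lock was acquired, 0 if it is already held. */
static int video_trylock(void) {
    return !atomic_flag_test_and_set_explicit(&g_video_lock,
                                              memory_order_acquire);
}

/* Spin until the current holder releases the lock. */
static void video_lock(void) {
    while (!video_trylock()) {
        /* busy-wait; a kernel might yield to the scheduler here instead */
    }
}

static void video_unlock(void) {
    atomic_flag_clear_explicit(&g_video_lock, memory_order_release);
}
```

On a single CPU a spinning thread simply burns the rest of its time slice until the holder gets to run again, and an interrupt handler must never spin on a lock held by the thread it interrupted, since that thread cannot run to release it.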
Re:Software task switching problem
The reason you get weird output is that a thread can be interrupted in the middle of that and another thread can start writing its info there. The more threads doing this, the bigger the problem.
You can end up with partial writes to the location (especially on SMP systems, where you can literally have 2 threads write at the exact same time).
You need to protect this with some sort of mutex.
Most likely (just for experimentation), if you disabled interrupts in MyTest, did the work, and re-enabled them at the end, it would work OK.
proxy
Re:Software task switching problem
Hi,
proxy wrote: the reason you get weird output is because you can have a thread be interrupted in the middle of that and have another thread start writing its info there. the more threads doing this the more it's a problem.
I'd like to offer an alternative problem:
A re-entrancy problem wouldn't explain pVideo[1] pointing to a random memory location because it's a local variable (on the stack or in a register only).
Therefore I'm going to guess that the context switch code or the IRQ handlers are trashing one or more general registers (and/or the stack).
Cheers,
Brendan
Re:Software task switching problem
It's solved.
The problem was that I was sending EOI in my context switcher (which I forgot to copy to the code I pasted here), which overwrote eax.
Now I preserve eax and it works perfectly, both my "MyTest" function and my "kprintf".
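The actual EOI code was never posted, so the following is only a model of how such a bug can produce the symptoms described earlier. It assumes the EOI was sent with the common `mov al, 0x20` / `out 0x20, al` pair after the thread's eax had already been restored. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Toy model of a thread's general registers. */
struct regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Models "mov al, 0x20 / out 0x20, al": loading the EOI command into AL
 * overwrites the low byte of the thread's EAX. */
static void send_eoi_clobbering(struct regs *r) {
    r->eax = (r->eax & 0xFFFFFF00u) | 0x20u;
}

/* Models "push eax / send EOI / pop eax": EAX survives the EOI. */
static void send_eoi_preserving(struct regs *r) {
    uint32_t saved = r->eax;
    send_eoi_clobbering(r);
    r->eax = saved;
}
```

If the compiler happened to keep pVideo (0xB8000) in eax when the timer interrupt hit, the clobbering variant would resume the thread with 0xB8020 in that register, which is consistent with output occasionally landing at the wrong address.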