Page 1 of 2
Scheduler and semaphores
Posted: Sun Jul 24, 2005 9:13 am
by FlashBurn
I changed my kernel so that it uses the APIC Timer. My problem is now that my scheduler doesn?t work anymore when I have more than 1 thread
- There where problems before on one of my real PCs -
I think that it has something to do with my semaphore and mutex functions. Because I set the APIC Timer to a 100Hz and tested it in Bochs and there I get a page fault when the scheduler executes.
My mutex code:
Code: Select all
;----------------------------
PROC mutex_acquire, mut
;----------------------------
BEGIN
mov eax,1
mov esi,[mut]
;----------------------------
align 4
.loop:
test dword[esi],1
jne .loop
xchg [esi],eax
test eax,eax
jnz .loop
;----------------------------
.end:
cli
RETURN
ENDP
;----------------------------
;----------------------------
PROC mutex_acquire_noint, mut
;----------------------------
BEGIN
mov eax,1
mov esi,[mut]
;----------------------------
align 4
.loop:
test dword[esi],1
jne .loop
xchg [esi],eax
test eax,eax
jnz .loop
;----------------------------
.end:
RETURN
ENDP
;----------------------------
;----------------------------
PROC mutex_release, mut
;----------------------------
BEGIN
mov esi,[mut]
xor eax,eax
xchg [esi],eax
sti
RETURN
ENDP
;----------------------------
;----------------------------
PROC mutex_release_noint, mut
;----------------------------
BEGIN
mov esi,[mut]
xor eax,eax
xchg [esi],eax
RETURN
ENDP
;----------------------------
[see next post]
Re:Scheduler and semaphores
Posted: Sun Jul 24, 2005 9:13 am
by FlashBurn
[next piece of prev post]
My semaphore code:
Code: Select all
;----------------------------
PROC semaphore_acquire, sem
VARS this_cpu
;----------------------------
BEGIN
APIC_GET_ID eax
mov ebx,[cpu_ptr+4*eax]
mov [this_cpu],ebx
;----------------------------
; make sure we are the only one to work on the semaphore
CALL mutex_acquire, dword[sem]
sub dword[esi+semaphore_t.count],1
;----------------------------
; look if we have to wait or we can go right away
cmp dword[esi+semaphore_t.count],0
jge .end
;----------------------------
; we have to wait
mov ebx,[this_cpu]
mov edi,[esi+semaphore_t.end]
mov eax,[ebx+cpu_t.schdl_act_thread]
test edi,edi
jz .first
xor ebx,ebx
mov [edi+thread_t.next],eax
mov [eax+thread_t.prev],edi
mov [eax+thread_t.next],ebx
mov [esi+semaphore_t.end],eax
jmp .scheduler
;----------------------------
; we are the first thread
align 4
.first:
mov [esi+semaphore_t.start],eax
mov [esi+semaphore_t.end],eax
mov [eax+thread_t.prev],edi
mov [eax+thread_t.next],edi
;----------------------------
; scheduler have to know that this thread wants to wait
.scheduler:
mov ebx,[this_cpu]
or dword[eax+thread_t.flags],thread_wait
or dword[ebx+cpu_t.scheduler_flags],scheduler_reschedule
CALL mutex_release, dword[sem]
hlt
; int INT_SCHEDULER
.end_wait:
RETURN
;----------------------------
align 4
.end:
CALL mutex_release, dword[sem]
RETURN
ENDP
;----------------------------
;----------------------------
PROC semaphore_release, sem
;----------------------------
BEGIN
;----------------------------
; make sure we are the only one to work on the semaphore
CALL mutex_acquire, dword[sem]
add dword[esi+semaphore_t.count],1
;----------------------------
; look if we need to awake a thread
cmp dword[esi+semaphore_t.count],0
jg .end
;----------------------------
; we have to awake the thread on the top of the queue
mov eax,[esi+semaphore_t.start]
mov ebx,[eax+thread_t.next]
xor ecx,ecx
test ebx,ebx
jz .last
mov [ebx+thread_t.prev],ecx
mov [esi+semaphore_t.start],ebx
mov [eax+thread_t.prev],ecx
mov [eax+thread_t.next],ecx
jmp .scheduler
;----------------------------
; there is only one thread on the queue
align 4
.last:
mov [esi+semaphore_t.start],ebx
mov [esi+semaphore_t.end],ebx
mov [eax+thread_t.prev],ebx
mov [eax+thread_t.next],ebx
;----------------------------
; scheduler needs to awaken the thread in eax
.scheduler:
and dword[eax+thread_t.flags],not thread_wait
push eax
CALL mutex_release, dword[sem]
CALL scheduler_add_scheduler ;par is in pushed eax
; int INT_SCHEDULER
;----------------------------
.end_awaken:
RETURN
;----------------------------
align 4
.end:
CALL mutex_release, dword[sem]
RETURN
ENDP
;----------------------------
And now the function that could do much harm to the system:
Code: Select all
;----------------------------
PROC scheduler_add_scheduler, ptr2thread
;----------------------------
BEGIN
CALL semaphore_acquire, sem_schdl_queue
cli
CALL mutex_acquire_noint, schdl_mutex
;----------------------------
; add thread to ready queue
mov esi,[ptr2thread]
mov ebx,1
mov ecx,[esi+thread_t.dyn_prio]
mov edi,[ready_queue_ptr]
shl ebx,cl
or [ready_queue_bitmap],ebx
mov eax,[edi+4*ecx]
test eax,eax
jz .first
mov ebx,[eax+thread_t.prev]
mov [esi+thread_t.next],eax
mov [esi+thread_t.prev],ebx
mov [eax+thread_t.prev],esi
mov [ebx+thread_t.next],esi
jmp .end
;----------------------------
; it is the 1st thread in this priority queue
align 4
.first:
mov [esi+thread_t.prev],esi
mov [esi+thread_t.next],esi
mov [edi+4*ecx],esi
;----------------------------
align 4
.end:
APIC_GET_ID eax
mov edx,[cpu_ptr+4*eax]
or dword[edx+cpu_t.scheduler_flags],scheduler_reschedule
CALL mutex_release_noint, schdl_mutex
sti
CALL semaphore_release, sem_schdl_queue
RETURN
ENDP
;----------------------------
You may have a look at the functions and can tell me if there is any logical fault in there or if you see anything that could not work the way it should!
Edit::
There are commented int calls, if I uncomment them the scheduler will work for some time, but then not only the threads are not running but also the scheduler isn?t!
Re:Scheduler and semaphores
Posted: Sun Jul 24, 2005 12:19 pm
by Brendan
Hi,
Both of your semaphore procedures use the scheduler's procedures, and the scheduler's procedures use both of your semaphore procedures - it's asking for trouble. For "scheduler_add_scheduler" (and the assumed "scheduler_remove_scheduler") you should avoid using semaphores and your problem will probably disappear.
Also, in general any critical section that is faster than a thread switch should be protected by a mutex and not semaphores (for performance reasons).
I also think it's a good idea to do deadlock checking within the "mutex_acquire" code. For e.g. "if mutex is already locked by this CPU then PANIC". This is one of the reasons I prefer using "lock bts [mutex],31" rather than "xchg [esi],eax" - there's 31 bits left over for sanity checks. For example:
Code: Select all
PROC mutex_acquire, mut
;----------------------------
BEGIN
mov esi,[mut]
%ifdef CHECK_LOCK_CORRUPTION
test dword [esi],0x7FFF0000
je .loop
call kernelPanic, LOCK_CORRUPT
%endif
%ifdef CHECK_DEADLOCKS
.loop:
test dword [esi],0x80000000
je .try
%ifdef MULTI-CPU
mov ax,THIS_CPU_NUMBER
cmp [esi],ax
jne .loop
%endif
call kernelPanic, DEADLOCK
%else
.loop:
test dword [esi],0x80000000
jne .loop
%endif
try:
lock bts dword[esi],31
jc .loop
;----------------------------
RETURN
ENDP
That way, during development you can have automatic error detection and you can easily remove all the checks for the final release (once you know it's right).
Cheers,
Brendan
Re:Scheduler and semaphores
Posted: Sun Jul 24, 2005 2:19 pm
by FlashBurn
You are not full right
Now the threads run for ca. 1 sec in Bochs, but not on the real PC. I also think that the problem with Bochs is a failure of Bochs, because if I write some debug code which outputs some data before the test of the mutex then it works ??? If I don?t have these debug code the code goes in an endless loop -> 2 threads are waiting (seems so) and the running thread tries to get the mutex, but the mutex is in use ???
But on my real PC it looks like there is a problem when it switches the thread. And because I have no error in Bochs I don?t know from where this comes - maybe it is the same as above, but occures when it switches the threads.
Edit::
I test the code further and there is no problem with the scheduler, it works - in Bochs -. There is only a problem when it comes to using semaphore and mutex!
Re:Scheduler and semaphores
Posted: Sun Jul 24, 2005 3:04 pm
by FlashBurn
Ok, it is in any case the semaphore code which produces the errors >:(
So maybe I should overthink my code:
Code: Select all
semaphore_acquire:
acquire the mutex for the semaphore ;the mutex disables ints
sub [count of semaphore],1
if count <= 0 then
put the thread onto the waiting queue of the semaphore
set the waiting flag in the thread flags ;the scheduler wont put the thread into the ready queue
set the reschedule flag of the scheduler
end if
release mutex ;which enable ints
return
semaphore_release:
acquire the mutex for the semaphore ;the mutex disables ints
add [count of semaphore],1
if count > 0 then
get the thread on top of the waiting queue of the semaphore
call scheduler_add_scheduler with ptr2thread
set the reschdule flag of the scheduler
end if
release mutex ;which enable ints
return
If the above is right, my code should be also right, but it doesn?t work :'(
Re:Scheduler and semaphores
Posted: Sun Jul 24, 2005 5:18 pm
by Brendan
Hi,
One problem is that your "semaphore_acquire" procedure adds the current thread to the semaphore's queue and then waits for the scheduler to restart the thread without releasing the mutex (which prevents the semaphore from being released).
You'd want something like:
Code: Select all
semaphore_acquire:
acquire the mutex for the semaphore
sub [count of semaphore],1
if count <= 0 then
put the thread onto the waiting queue of the semaphore
release the mutex for the semaphore
tell the scheduler to stop this thread from running
return
end if
release the mutex for the semaphore
return
Unfortunately this leaves another problem - another CPU could try to restart the blocked thread after it releases the mutex but before the scheduler has stopped running it.
The scheduler's code to block the thread and find another thread to run would need to use a mutex too, so one way around the problem would be to protect things with the mutex the scheduler uses. For example:
Code: Select all
semaphore_acquire:
acquire the mutex for the semaphore
sub [count of semaphore],1
if count <= 0 then
put the thread onto the waiting queue of the semaphore
acquire the scheduler's mutex (released by the scheduler)
release the mutex for the semaphore
tell the scheduler to stop this thread from running
return
end if
release the mutex for the semaphore
return
In this case, the scheduler's code to stop the thread from running needs to release the scheduler's mutex
but must not lock it first. This sounds ugly, but I can't think of a way to avoid it (except CLI on single-CPU computers).
Cheers,
Brendan
Re:Scheduler and semaphores
Posted: Mon Jul 25, 2005 1:53 am
by FlashBurn
Ok, I have to explain the working of this more! As long as the actual thread has the mutex it wont be interrupted because ints are disabled. Then I put it onto the semaphores queue and set the waiting flag for the thread and the reschedule flag for the scheduler. Then I release the mutex which enables the ints and I also use a "hlt" for beeing sure that the scheduler will be called before the thread gets out of the semaphore. The scheduler then checks if it has to reschedule (yes) and when it wants to put the thread back onto the ready queue it checks the flags of the thread and sees that it wants to wait, in future the scheduler will do something with the priority, but at the moment the only thing that is done is that the scheduler makes nothing with the thread!
With this algo every thread should print its number and then another thread should be running and the same starts again. The problem is now that this works for a ca. 1 sec, but then a thread is running 2 times and most times another thread will run and then I seem to have a deadlock!
What I can?t understand is how this can happen!?
Edit::
I had the time to test the actual changes on a real PC and it didn?t worked >:( I?m using now a mutex not a semaphore for my putchar function and this works perfectly in bochs, but on a real pc it only makes the thread switch from idle to thread 1 to thread 2 and nothing! And this time it can?t be a deadlock, because the ints are disabled during having the lock!
Re:Scheduler and semaphores
Posted: Mon Jul 25, 2005 4:58 am
by distantvoices
first I suppose you disable all hw interrupts. and maybe add some preempting stuff at the keyboard interrupt so you can trigger preemption at will and nothing else disturbs you.
second, have you declared all your mutexes
volatile?
then - have you placed already some crucial output at different stages of the scheduling/mutexing stuff - so you know where which task is at any given time?
welcome, my friend, to debug sessions darker than night. May the schwartz be with you.
Re:Scheduler and semaphores
Posted: Mon Jul 25, 2005 5:17 am
by FlashBurn
1st I have to say that I do all my programming stuff in assembly
so I don?t have such things such volatile. And the point which I don?t understand is that it works as long as I output something to the Bochs special port (0xe9) - it works in Bochs - ! But I will try the method of using the keyboard interrupt. Also it couldn?t be that it doesn?t work if I use "cli" and "sti" for my putchar function, because then when it returns from the function the scheduler will be called and a thread switch will occure and the scheduler has to work! But there is one more thing I should check (IDT).
Re:Scheduler and semaphores
Posted: Mon Jul 25, 2005 5:40 am
by Candy
Default guess is that you have a race condition. That is, you have two "threads" that are trying to do the exact same at the exact same moment, and one of them has the lock and the other is spinning on it. You make it less likely by introducing a manual slowdown (say, output to a port?) outside the mutexed code, and you make it more likely by introducing a manual slowdown inside the mutexed code.
Check mainly for mutexed sections of code that can be interrupted. If you don't have any, godspeed.
Re:Scheduler and semaphores
Posted: Mon Jul 25, 2005 7:26 am
by Brendan
Hi,
FlashBurn wrote:Ok, I have to explain the working of this more! As long as the actual thread has the mutex it wont be interrupted because ints are disabled. Then I put it onto the semaphores queue and set the waiting flag for the thread and the reschedule flag for the scheduler. Then I release the mutex which enables the ints and I also use a "hlt" for beeing sure that the scheduler will be called before the thread gets out of the semaphore.
There's no way you can guarantee that the first IRQ received will be the scheduler's IRQ - it could be a keyboard IRQ or anything else. While your OS is small this can be avoided by masking in the PIC, but that won't work when you start using the other IRQs (masking and unmasking the other IRQs would cause them to be lost, which would caould device driver problems). It's something that will need to be fixed sooner or later. It's also slow - how much time will your threads spend waiting for the scheduler's IRQ?
It's also easy enough to fix. For example:
Code: Select all
IRQ0handler:
call sendEOI
call reschedule
iretd
reschedule:
* find next thread to switch to *
call goto_thread
ret
goto_thread:
* switch to the selected thread *
ret
I'd also recommend using a single procedure for stopping the currently running thread:
Code: Select all
PROC stop_thread, flags
BEGIN
mov eax,[flags]
cli
mov ebx,[address_of_current_thread's_data]
or dword[ebx+thread_t.flags],eax
sti
call reschedule
RETURN
ENDP
Instead of doing the "HLT" and wait thing, you'd just do "call stop_thread, thread_wait". Later on you'd use the same thing with other flags (message_wait, sleep_wait, etc). I'd also have a procedure for starting a thread, for e.g.:
Code: Select all
PROC start_thread, threadAddress, flags
BEGIN
mov eax,[flags]
mov ebx,[threadAddress]
xor eax,0xFFFFFFFF
and dword[ebx+thread_t.flags],eax
cmp dword[ebx+thread_t.flags],0
jne .notReady
call reschedule
.notReady:
RETURN
ENDP
This can be improved later with "if thread became ready and thread has higher priority than current thread then call goto_thread", which could save the scheduler the overhead of finding the next thread to run.
BTW you are currently masking/disabling all other IRQs in the PIC aren't you? It could explain some of your symptoms if you currently don't..
Cheers,
Brendan
Re:Scheduler and semaphores
Posted: Mon Jul 25, 2005 3:32 pm
by FlashBurn
Yes I?me masking all irqs! I found 1 problem, the spurious interrupt and I?ve rewritten and change the queue handling code of my scheduler, now it works under Bochs - but not on real PC, it still doesn?t get the thread switching working - And I found that I should not send an EOI when I have a spurious interrupt so I will change the got for it and test it then on a real PC!
Would it be a good idea to use the PIT for inc a timer variable and then I could use the APIC Timer in one shot mode and every time a reschedule is need I would do an "int INT_SCHEDULER"?
Could a write to a not existing port be the reason for my problem on my real PC? Because I just saw that I?ve forgotten to comment the line where I output a char to the Bochs port for debugging!
Re:Scheduler and semaphores
Posted: Mon Jul 25, 2005 7:16 pm
by Brendan
Hi,
FlashBurn wrote:Would it be a good idea to use the PIT for inc a timer variable and then I could use the APIC Timer in one shot mode and every time a reschedule is need I would do an "int INT_SCHEDULER"?
Yes, mostly.
For the time variable, your OS may want to use the PIT if there is no local APIC. Also the PIT isn't connected to the IO APIC on some older multi-CPU motherboards (not sure if that matters for your OS). Both of these reasons are why I use the RTC/CMOS periodic IRQ (IRQ 8 ) for the time variable instead of the PIT.
As the local APIC timer IRQ handler is already running within the kernel you'd be able to use "call INT_SCHEDULER" (or "call reschedule") instead of a software interrupt (it's faster, and code running outside the kernel normally wouldn't need to use this code). You could also call the reschedule code from within your semaphore code to get rid of that HLT (and the IRQ masking/unmasking stuff).
FlashBurn wrote:Could a write to a not existing port be the reason for my problem on my real PC? Because I just saw that I?ve forgotten to comment the line where I output a char to the Bochs port for debugging!
A write to a non-existing IO port shouldn't matter, but if there's a device using the IO port it could be a problem. I think IO port 0xE9 is fairly safe though (I've never had any problems leaving Bochs debugging code in on real computers, but every real computer is different).
Cheers,
Brendan
Re:Scheduler and semaphores
Posted: Thu Jul 28, 2005 3:17 pm
by FlashBurn
It will be a while till I will post about my problems on this topic. I just get my internet back after a hdd crash >:( I lost some of my source files and the backup was from the 22th
- I should backup more often - I have to reprogram the lost files! And this could take some time.
Edit::
This was going faster than I thought
So now Bochs gives me - for the first time and I don?t know why just now ??? - the information that I?ve sent an EOI although there was no bit set in the ISR. Could this be a problem on a real PC? The problem to solve is that I have to call the scheduler if I do it with a call or an int its equal, but he will always sent an EOI!
Edit::
I?ve seen that it is impossible to call the scheduler with the call instruction because he does an iret the end and so I need to do an int!
Re:Scheduler and semaphores
Posted: Thu Jul 28, 2005 8:13 pm
by Brendan
Hi,
FlashBurn wrote:This was going faster than I thought
So now Bochs gives me - for the first time and I don?t know why just now ??? - the information that I?ve sent an EOI although there was no bit set in the ISR. Could this be a problem on a real PC? The problem to solve is that I have to call the scheduler if I do it with a call or an int its equal, but he will always sent an EOI!
Edit::
I?ve seen that it is impossible to call the scheduler with the call instruction because he does an iret the end and so I need to do an int!
You could do something like:
Code: Select all
local_APIC_timer_IRQ_handler:
call send_EOI
call reschedule
iretd
reschedule:
...find a thread and switch to it...
ret
Depending on your OS (and what it's doing), it's likely that most thread switches will be caused by threads needing to wait for something, rather than their CPU time running out. This implies that the local APIC timer IRQ isn't used as often as you'd think. For example, in my OS most threads stop running because they need to wait for a message to arrive.
BTW sorry to hear about your hard drive. It's a reminder for the rest of us - I haven't done a backup for 6 months, and should setup cron properly
...
Cheers,
Brendan