Page 1 of 2

Scheduler and semaphores

Posted: Sun Jul 24, 2005 9:13 am
by FlashBurn
I changed my kernel so that it uses the APIC Timer. My problem is now that my scheduler doesn?t work anymore when I have more than 1 thread :( - There where problems before on one of my real PCs -

I think that it has something to do with my semaphore and mutex functions. Because I set the APIC Timer to a 100Hz and tested it in Bochs and there I get a page fault when the scheduler executes.

My mutex code:

Code: Select all

;----------------------------
PROC mutex_acquire, mut
;----------------------------
BEGIN
   mov eax,1
   mov esi,[mut]
;----------------------------
align 4
.loop:
   test dword[esi],1
   jne .loop

   xchg [esi],eax

   test eax,eax
   jnz .loop
;----------------------------
.end:
   cli

   RETURN
ENDP
;----------------------------

;----------------------------
PROC mutex_acquire_noint, mut
;----------------------------
BEGIN
   mov eax,1
   mov esi,[mut]
;----------------------------
align 4
.loop:
   test dword[esi],1
   jne .loop

   xchg [esi],eax

   test eax,eax
   jnz .loop
;----------------------------
.end:
   RETURN
ENDP
;----------------------------

;----------------------------
PROC mutex_release, mut
;----------------------------
BEGIN
   mov esi,[mut]
   xor eax,eax

   xchg [esi],eax

   sti

   RETURN
ENDP
;----------------------------

;----------------------------
PROC mutex_release_noint, mut
;----------------------------
BEGIN
   mov esi,[mut]
   xor eax,eax

   xchg [esi],eax

   RETURN
ENDP
;----------------------------
[see next post]

Re:Scheduler and semaphores

Posted: Sun Jul 24, 2005 9:13 am
by FlashBurn
[next piece of prev post]

My semaphore code:

Code: Select all

;----------------------------
PROC semaphore_acquire, sem
VARS this_cpu
;----------------------------
BEGIN
   APIC_GET_ID eax
   mov ebx,[cpu_ptr+4*eax]
   mov [this_cpu],ebx
;----------------------------
;   make sure we are the only one to work on the semaphore
   CALL mutex_acquire, dword[sem]

   sub dword[esi+semaphore_t.count],1
;----------------------------
;   look if we have to wait or we can go right away
   cmp dword[esi+semaphore_t.count],0
   jge .end
;----------------------------
;   we have to wait
   mov ebx,[this_cpu]
   mov edi,[esi+semaphore_t.end]
   mov eax,[ebx+cpu_t.schdl_act_thread]

   test edi,edi
   jz .first

   xor ebx,ebx
   mov [edi+thread_t.next],eax
   mov [eax+thread_t.prev],edi
   mov [eax+thread_t.next],ebx

   mov [esi+semaphore_t.end],eax

   jmp .scheduler
;----------------------------
;   we are the first thread
align 4
.first:
   mov [esi+semaphore_t.start],eax
   mov [esi+semaphore_t.end],eax

   mov [eax+thread_t.prev],edi
   mov [eax+thread_t.next],edi
;----------------------------
;   scheduler have to know that this thread wants to wait
.scheduler:
   mov ebx,[this_cpu]
   or dword[eax+thread_t.flags],thread_wait
   or dword[ebx+cpu_t.scheduler_flags],scheduler_reschedule

   CALL mutex_release, dword[sem]

   hlt
;   int INT_SCHEDULER

.end_wait:
   RETURN
;----------------------------
align 4
.end:
   CALL mutex_release, dword[sem]

   RETURN
ENDP
;----------------------------

;----------------------------
PROC semaphore_release, sem
;----------------------------
BEGIN
;----------------------------
;   make sure we are the only one to work on the semaphore
   CALL mutex_acquire, dword[sem]

   add dword[esi+semaphore_t.count],1
;----------------------------
;   look if we need to awake a thread
   cmp dword[esi+semaphore_t.count],0
   jg .end
;----------------------------
;   we have to awake the thread on the top of the queue
   mov eax,[esi+semaphore_t.start]
   mov ebx,[eax+thread_t.next]
   xor ecx,ecx

   test ebx,ebx
   jz .last

   mov [ebx+thread_t.prev],ecx
   mov [esi+semaphore_t.start],ebx
   mov [eax+thread_t.prev],ecx
   mov [eax+thread_t.next],ecx

   jmp .scheduler
;----------------------------
;   there is only one thread on the queue
align 4
.last:
   mov [esi+semaphore_t.start],ebx
   mov [esi+semaphore_t.end],ebx
   mov [eax+thread_t.prev],ebx
   mov [eax+thread_t.next],ebx
;----------------------------
;   scheduler needs to awaken the thread in eax
.scheduler:
   and dword[eax+thread_t.flags],not thread_wait

   push eax

   CALL mutex_release, dword[sem]

   CALL scheduler_add_scheduler         ;par is in pushed eax
   
;   int INT_SCHEDULER
;----------------------------
.end_awaken:
   RETURN
;----------------------------
align 4
.end:
   CALL mutex_release, dword[sem]

   RETURN
ENDP
;----------------------------
And now the function that could do much harm to the system:

Code: Select all

;----------------------------
PROC scheduler_add_scheduler, ptr2thread
;----------------------------
BEGIN
   CALL semaphore_acquire, sem_schdl_queue
   
   cli
   
   CALL mutex_acquire_noint, schdl_mutex
;----------------------------
;   add thread to ready queue
   mov esi,[ptr2thread]
   mov ebx,1
   mov ecx,[esi+thread_t.dyn_prio]
   mov edi,[ready_queue_ptr]
   shl ebx,cl

   or [ready_queue_bitmap],ebx

   mov eax,[edi+4*ecx]

   test eax,eax
   jz .first

   mov ebx,[eax+thread_t.prev]
   mov [esi+thread_t.next],eax
   mov [esi+thread_t.prev],ebx
   mov [eax+thread_t.prev],esi
   mov [ebx+thread_t.next],esi

   jmp .end
;----------------------------
;   it is the 1st thread in this priority queue
align 4
.first:
   mov [esi+thread_t.prev],esi
   mov [esi+thread_t.next],esi
   mov [edi+4*ecx],esi
;----------------------------
align 4
.end:
   APIC_GET_ID eax
   mov edx,[cpu_ptr+4*eax]
   or dword[edx+cpu_t.scheduler_flags],scheduler_reschedule

   CALL mutex_release_noint, schdl_mutex
   
   sti
   
   CALL semaphore_release, sem_schdl_queue

   RETURN
ENDP
;----------------------------
You may have a look at the functions and can tell me if there is any logical fault in there or if you see anything that could not work the way it should!

Edit::

There are commented int calls, if I uncomment them the scheduler will work for some time, but then not only the threads are not running but also the scheduler isn?t!

Re:Scheduler and semaphores

Posted: Sun Jul 24, 2005 12:19 pm
by Brendan
Hi,

Both of your semaphore procedures use the scheduler's procedures, and the scheduler's procedures use both of your semaphore procedures - it's asking for trouble. For "scheduler_add_scheduler" (and the assumed "scheduler_remove_scheduler") you should avoid using semaphores and your problem will probably disappear.

Also, in general any critical section that is faster than a thread switch should be protected by a mutex and not semaphores (for performance reasons).

I also think it's a good idea to do deadlock checking within the "mutex_acquire" code. For e.g. "if mutex is already locked by this CPU then PANIC". This is one of the reasons I prefer using "lock bts [mutex],31" rather than "xchg [esi],eax" - there's 31 bits left over for sanity checks. For example:

Code: Select all

PROC mutex_acquire, mut
;----------------------------
BEGIN
   mov esi,[mut]

%ifdef CHECK_LOCK_CORRUPTION
   test dword [esi],0x7FFF0000
   je .loop
   call kernelPanic, LOCK_CORRUPT
%endif

%ifdef CHECK_DEADLOCKS
.loop:
   test dword [esi],0x80000000
   je .try
 %ifdef MULTI-CPU
   mov ax,THIS_CPU_NUMBER
   cmp [esi],ax
   jne .loop
 %endif
   call kernelPanic, DEADLOCK
%else
.loop:
   test dword [esi],0x80000000
   jne .loop
%endif

try:
   lock bts dword[esi],31
   jc .loop

;----------------------------
   RETURN
ENDP
That way, during development you can have automatic error detection and you can easily remove all the checks for the final release (once you know it's right).


Cheers,

Brendan

Re:Scheduler and semaphores

Posted: Sun Jul 24, 2005 2:19 pm
by FlashBurn
You are not full right ;) Now the threads run for ca. 1 sec in Bochs, but not on the real PC. I also think that the problem with Bochs is a failure of Bochs, because if I write some debug code which outputs some data before the test of the mutex then it works ??? If I don?t have these debug code the code goes in an endless loop -> 2 threads are waiting (seems so) and the running thread tries to get the mutex, but the mutex is in use ???

But on my real PC it looks like there is a problem when it switches the thread. And because I have no error in Bochs I don?t know from where this comes - maybe it is the same as above, but occures when it switches the threads.

Edit::

I test the code further and there is no problem with the scheduler, it works - in Bochs -. There is only a problem when it comes to using semaphore and mutex!

Re:Scheduler and semaphores

Posted: Sun Jul 24, 2005 3:04 pm
by FlashBurn
Ok, it is in any case the semaphore code which produces the errors >:(

So maybe I should overthink my code:

Code: Select all

semaphore_acquire:
 acquire the mutex for the semaphore ;the mutex disables ints
 sub [count of semaphore],1
 if count <= 0 then 
  put the thread onto the waiting queue of the semaphore
  set the waiting flag in the thread flags ;the scheduler wont put the thread into the ready queue
  set the reschedule flag of the scheduler
 end if 
  release mutex ;which enable ints  
  return

semaphore_release:
 acquire the mutex for the semaphore ;the mutex disables ints
 add [count of semaphore],1
 if count > 0 then
  get the thread on top of the waiting queue of the semaphore
  call scheduler_add_scheduler with ptr2thread
  set the reschdule flag of the scheduler
 end if
  release mutex ;which enable ints  
  return
If the above is right, my code should be also right, but it doesn?t work :'(

Re:Scheduler and semaphores

Posted: Sun Jul 24, 2005 5:18 pm
by Brendan
Hi,

One problem is that your "semaphore_acquire" procedure adds the current thread to the semaphore's queue and then waits for the scheduler to restart the thread without releasing the mutex (which prevents the semaphore from being released).

You'd want something like:

Code: Select all

semaphore_acquire:
 acquire the mutex for the semaphore
 sub [count of semaphore],1
 if count <= 0 then
  put the thread onto the waiting queue of the semaphore
  release the mutex for the semaphore
  tell the scheduler to stop this thread from running
  return
 end if 
  release the mutex for the semaphore
  return
Unfortunately this leaves another problem - another CPU could try to restart the blocked thread after it releases the mutex but before the scheduler has stopped running it.

The scheduler's code to block the thread and find another thread to run would need to use a mutex too, so one way around the problem would be to protect things with the mutex the scheduler uses. For example:

Code: Select all

semaphore_acquire:
 acquire the mutex for the semaphore
 sub [count of semaphore],1
 if count <= 0 then
  put the thread onto the waiting queue of the semaphore
  acquire the scheduler's mutex (released by the scheduler)
  release the mutex for the semaphore
  tell the scheduler to stop this thread from running
  return
 end if 
  release the mutex for the semaphore
  return
In this case, the scheduler's code to stop the thread from running needs to release the scheduler's mutex but must not lock it first. This sounds ugly, but I can't think of a way to avoid it (except CLI on single-CPU computers).


Cheers,

Brendan

Re:Scheduler and semaphores

Posted: Mon Jul 25, 2005 1:53 am
by FlashBurn
Ok, I have to explain the working of this more! As long as the actual thread has the mutex it wont be interrupted because ints are disabled. Then I put it onto the semaphores queue and set the waiting flag for the thread and the reschedule flag for the scheduler. Then I release the mutex which enables the ints and I also use a "hlt" for beeing sure that the scheduler will be called before the thread gets out of the semaphore. The scheduler then checks if it has to reschedule (yes) and when it wants to put the thread back onto the ready queue it checks the flags of the thread and sees that it wants to wait, in future the scheduler will do something with the priority, but at the moment the only thing that is done is that the scheduler makes nothing with the thread!

With this algo every thread should print its number and then another thread should be running and the same starts again. The problem is now that this works for a ca. 1 sec, but then a thread is running 2 times and most times another thread will run and then I seem to have a deadlock!

What I can?t understand is how this can happen!?

Edit::

I had the time to test the actual changes on a real PC and it didn?t worked >:( I?m using now a mutex not a semaphore for my putchar function and this works perfectly in bochs, but on a real pc it only makes the thread switch from idle to thread 1 to thread 2 and nothing! And this time it can?t be a deadlock, because the ints are disabled during having the lock!

Re:Scheduler and semaphores

Posted: Mon Jul 25, 2005 4:58 am
by distantvoices
first I suppose you disable all hw interrupts. and maybe add some preempting stuff at the keyboard interrupt so you can trigger preemption at will and nothing else disturbs you.

second, have you declared all your mutexes volatile?

then - have you placed already some crucial output at different stages of the scheduling/mutexing stuff - so you know where which task is at any given time?

welcome, my friend, to debug sessions darker than night. May the schwartz be with you. :-)

Re:Scheduler and semaphores

Posted: Mon Jul 25, 2005 5:17 am
by FlashBurn
1st I have to say that I do all my programming stuff in assembly 8) so I don?t have such things such volatile. And the point which I don?t understand is that it works as long as I output something to the Bochs special port (0xe9) - it works in Bochs - ! But I will try the method of using the keyboard interrupt. Also it couldn?t be that it doesn?t work if I use "cli" and "sti" for my putchar function, because then when it returns from the function the scheduler will be called and a thread switch will occure and the scheduler has to work! But there is one more thing I should check (IDT).

Re:Scheduler and semaphores

Posted: Mon Jul 25, 2005 5:40 am
by Candy
Default guess is that you have a race condition. That is, you have two "threads" that are trying to do the exact same at the exact same moment, and one of them has the lock and the other is spinning on it. You make it less likely by introducing a manual slowdown (say, output to a port?) outside the mutexed code, and you make it more likely by introducing a manual slowdown inside the mutexed code.

Check mainly for mutexed sections of code that can be interrupted. If you don't have any, godspeed.

Re:Scheduler and semaphores

Posted: Mon Jul 25, 2005 7:26 am
by Brendan
Hi,
FlashBurn wrote:Ok, I have to explain the working of this more! As long as the actual thread has the mutex it wont be interrupted because ints are disabled. Then I put it onto the semaphores queue and set the waiting flag for the thread and the reschedule flag for the scheduler. Then I release the mutex which enables the ints and I also use a "hlt" for beeing sure that the scheduler will be called before the thread gets out of the semaphore.
There's no way you can guarantee that the first IRQ received will be the scheduler's IRQ - it could be a keyboard IRQ or anything else. While your OS is small this can be avoided by masking in the PIC, but that won't work when you start using the other IRQs (masking and unmasking the other IRQs would cause them to be lost, which would caould device driver problems). It's something that will need to be fixed sooner or later. It's also slow - how much time will your threads spend waiting for the scheduler's IRQ?

It's also easy enough to fix. For example:

Code: Select all

IRQ0handler:
   call sendEOI
   call reschedule
   iretd

reschedule:
   * find next thread to switch to *
   call goto_thread
   ret

goto_thread:
   * switch to the selected thread *
   ret
I'd also recommend using a single procedure for stopping the currently running thread:

Code: Select all

PROC stop_thread, flags
BEGIN
  mov eax,[flags]
  cli
  mov ebx,[address_of_current_thread's_data]
  or dword[ebx+thread_t.flags],eax
  sti
  call reschedule
  RETURN
ENDP
Instead of doing the "HLT" and wait thing, you'd just do "call stop_thread, thread_wait". Later on you'd use the same thing with other flags (message_wait, sleep_wait, etc). I'd also have a procedure for starting a thread, for e.g.:

Code: Select all

PROC start_thread, threadAddress, flags
BEGIN
  mov eax,[flags]
  mov ebx,[threadAddress]
  xor eax,0xFFFFFFFF
  and dword[ebx+thread_t.flags],eax
  cmp dword[ebx+thread_t.flags],0
  jne .notReady
  call reschedule
.notReady:
  RETURN
ENDP
This can be improved later with "if thread became ready and thread has higher priority than current thread then call goto_thread", which could save the scheduler the overhead of finding the next thread to run.

BTW you are currently masking/disabling all other IRQs in the PIC aren't you? It could explain some of your symptoms if you currently don't..


Cheers,

Brendan

Re:Scheduler and semaphores

Posted: Mon Jul 25, 2005 3:32 pm
by FlashBurn
Yes I?me masking all irqs! I found 1 problem, the spurious interrupt and I?ve rewritten and change the queue handling code of my scheduler, now it works under Bochs - but not on real PC, it still doesn?t get the thread switching working - And I found that I should not send an EOI when I have a spurious interrupt so I will change the got for it and test it then on a real PC!

Would it be a good idea to use the PIT for inc a timer variable and then I could use the APIC Timer in one shot mode and every time a reschedule is need I would do an "int INT_SCHEDULER"?

Could a write to a not existing port be the reason for my problem on my real PC? Because I just saw that I?ve forgotten to comment the line where I output a char to the Bochs port for debugging!

Re:Scheduler and semaphores

Posted: Mon Jul 25, 2005 7:16 pm
by Brendan
Hi,
FlashBurn wrote:Would it be a good idea to use the PIT for inc a timer variable and then I could use the APIC Timer in one shot mode and every time a reschedule is need I would do an "int INT_SCHEDULER"?
Yes, mostly.

For the time variable, your OS may want to use the PIT if there is no local APIC. Also the PIT isn't connected to the IO APIC on some older multi-CPU motherboards (not sure if that matters for your OS). Both of these reasons are why I use the RTC/CMOS periodic IRQ (IRQ 8 ) for the time variable instead of the PIT.

As the local APIC timer IRQ handler is already running within the kernel you'd be able to use "call INT_SCHEDULER" (or "call reschedule") instead of a software interrupt (it's faster, and code running outside the kernel normally wouldn't need to use this code). You could also call the reschedule code from within your semaphore code to get rid of that HLT (and the IRQ masking/unmasking stuff).
FlashBurn wrote:Could a write to a not existing port be the reason for my problem on my real PC? Because I just saw that I?ve forgotten to comment the line where I output a char to the Bochs port for debugging!
A write to a non-existing IO port shouldn't matter, but if there's a device using the IO port it could be a problem. I think IO port 0xE9 is fairly safe though (I've never had any problems leaving Bochs debugging code in on real computers, but every real computer is different).


Cheers,

Brendan

Re:Scheduler and semaphores

Posted: Thu Jul 28, 2005 3:17 pm
by FlashBurn
It will be a while till I will post about my problems on this topic. I just get my internet back after a hdd crash >:( I lost some of my source files and the backup was from the 22th :( - I should backup more often - I have to reprogram the lost files! And this could take some time.

Edit::

This was going faster than I thought ;) So now Bochs gives me - for the first time and I don?t know why just now ??? - the information that I?ve sent an EOI although there was no bit set in the ISR. Could this be a problem on a real PC? The problem to solve is that I have to call the scheduler if I do it with a call or an int its equal, but he will always sent an EOI!

Edit::

I?ve seen that it is impossible to call the scheduler with the call instruction because he does an iret the end and so I need to do an int!

Re:Scheduler and semaphores

Posted: Thu Jul 28, 2005 8:13 pm
by Brendan
Hi,
FlashBurn wrote:This was going faster than I thought ;) So now Bochs gives me - for the first time and I don?t know why just now ??? - the information that I?ve sent an EOI although there was no bit set in the ISR. Could this be a problem on a real PC? The problem to solve is that I have to call the scheduler if I do it with a call or an int its equal, but he will always sent an EOI!

Edit::

I?ve seen that it is impossible to call the scheduler with the call instruction because he does an iret the end and so I need to do an int!
You could do something like:

Code: Select all

local_APIC_timer_IRQ_handler:
    call send_EOI
    call  reschedule
    iretd

reschedule:
    ...find a thread and switch to it...
    ret
Depending on your OS (and what it's doing), it's likely that most thread switches will be caused by threads needing to wait for something, rather than their CPU time running out. This implies that the local APIC timer IRQ isn't used as often as you'd think. For example, in my OS most threads stop running because they need to wait for a message to arrive.

BTW sorry to hear about your hard drive. It's a reminder for the rest of us - I haven't done a backup for 6 months, and should setup cron properly ;)...


Cheers,

Brendan