Coding semaphores

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to check whether your question is already answered in the wiki first! When in doubt, post here.
gommo

Coding semaphores

Post by gommo »

Hi,

I have a basic kernel running that simply executes two threads, each counting up and displaying its variable on the screen.

The threads are running in ring 3.

What are the basic steps in writing semaphores for my OS?

Do I have to disable interrupts inside the wait and signal functions? Should these jump to ring 0 to do this? Should I have these as system calls?

Thanks
Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

Re:Coding semaphores

Post by Pype.Clicker »

Ultimately yes, DPL0 is needed to implement semaphores properly. There may be some "user-only" code for the "best cases", where all the 'clients' of the semaphore are running as threads of the same process and the semaphore is still available. Check userland-only semaphores for details ...

But as soon as one 'client' doesn't trust the others, or once the 'sleep until the semaphore becomes available' action must be taken, it is necessary to get to kernel level (since only the kernel has the ability to suspend interrupts between the check and the completion of the 'switch to another free thread after the current one has been moved to the semaphore's pending queue').
BI lazy

Re:Coding semaphores

Post by BI lazy »

As Pype says: implement the basic service for semaphores in kernel land, say as semaphore.c. Have some call gate or software ISR at hand for the parameter passing. Make sure that the Down and Up operations are atomic. This means that a semaphore operation is not to be interrupted while executing, implying that it needs to be done in kernel land, with interrupts disabled. No scheduling, no message passed, no other processes woken up. Nothing.

As soon as you are in kernel land, the fun starts.

The semaphore mechanism *needs* to be tightly coupled with the scheduling and thread queue mechanism (I do a microkernel, so the kernel actually 'sees' threads), for you will have to block/ready threads upon the traditional semaphore operations: Down (to acquire the semaphore) and Up (to release it). The Down operation may result in suspending the process and adding it to the semaphore's pending queue. The Up operation may result in fetching the next waiting process from the pending queue and making it ready.
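A minimal sketch of that kernel-land Down/Up pair in C. Everything here is illustrative: `cli`/`sti` are stand-in stubs for the real interrupt-disable instructions, and a real kernel would suspend/ready threads via its scheduler instead of returning flags.

```c
/* Hypothetical kernel stubs: real versions would execute CLI/STI and
 * manipulate the scheduler's run queue. */
static int ints_enabled = 1;
static void cli(void) { ints_enabled = 0; }
static void sti(void) { ints_enabled = 1; }

#define MAX_WAITERS 8

struct semaphore {
    int count;                 /* units available */
    int pending[MAX_WAITERS];  /* FIFO of blocked thread ids */
    int head, tail;            /* queue indices */
};

/* Down: atomic w.r.t. interrupts; blocks the caller when count == 0.
 * Returns 1 if the thread must be suspended, 0 if it acquired a unit. */
int sem_down(struct semaphore *s, int tid)
{
    int must_block = 0;
    cli();                              /* no preemption inside the op */
    if (s->count > 0) {
        s->count--;
    } else {
        s->pending[s->tail++ % MAX_WAITERS] = tid;
        must_block = 1;                 /* scheduler would suspend tid here */
    }
    sti();
    return must_block;
}

/* Up: wake the next waiter if there is one, otherwise return the unit. */
int sem_up(struct semaphore *s)
{
    int woken = -1;
    cli();
    if (s->head != s->tail)
        woken = s->pending[s->head++ % MAX_WAITERS];  /* make it ready */
    else
        s->count++;
    sti();
    return woken;                       /* -1 means nobody was waiting */
}
```

On a uniprocessor, disabling interrupts is all the atomicity the pair needs; on SMP you would additionally need a spinlock around the count and queue.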

I go further. My semaphores have a special flag indicating whether the time a thread spends waiting on a semaphore shall be counted and added to its round-robin time slice - if it is a user-land thread - or not. But enough of this. The feature is not well tested and might as well prove to be the worst bug ever announced as a feature to the world of computing. *rofl*
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Coding semaphores

Post by Candy »

Pype.Clicker wrote: Ultimately yes, DPL0 is needed to implement semaphores properly. There may be some "user-only" code for the "best cases", where all the 'clients' of the semaphore are running as threads of the same process and the semaphore is still available. Check userland-only semaphores for details ...

But as soon as one 'client' doesn't trust the others, or once the 'sleep until the semaphore becomes available' action must be taken, it is necessary to get to kernel level (since only the kernel has the ability to suspend interrupts between the check and the completion of the 'switch to another free thread after the current one has been moved to the semaphore's pending queue').
Uhm... if I'm not mistaken, I've just implemented semaphores that can be used in usermode only, excluding the thread-blocking section. I don't see why that part couldn't be executed in kernel mode as a special BlockThreadOnSemaphore function, minimizing the number of kernel calls to only the case where the semaphore is empty.

Concept:

Taking a semaphore, you xchg the current value with -1 and check the retrieved value. If it's -1, loop the xchg until you do get a value.
If the value is 0, call BlockThreadOnSemaphore.
If the value is not 0, decrease it by one, move (just mov, nothing special) it back to the semaphore location, and continue.

Returning a semaphore, you xchg the current value with -1 until you get something other than -1. Add one to this value and mov the result back to the semaphore location.
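The three cases above can be sketched in C, with C11 `atomic_exchange` standing in for the x86 xchg. The kernel call is a hypothetical stub (here it just counts the block and restores the zero, simulating what the kernel would do before suspending the caller):

```c
#include <stdatomic.h>

#define LOCKED -1   /* sentinel: someone is mid-update */

static atomic_int sem = 2;          /* example: two units available */
static int blocked_calls = 0;

/* Stand-in for BlockThreadOnSemaphore: a real kernel would restore the
 * value and suspend the calling thread. */
static void kernel_block_on_sem(void) { blocked_calls++; atomic_store(&sem, 0); }

void sem_take(void)
{
    int v;
    /* xchg the current value with -1 until we see something else */
    while ((v = atomic_exchange(&sem, LOCKED)) == LOCKED)
        ;                            /* another thread is mid-update: spin */
    if (v == 0)
        kernel_block_on_sem();       /* empty: only now do we enter the kernel */
    else
        atomic_store(&sem, v - 1);   /* plain store releases the -1 "lock" */
}

void sem_return(void)
{
    int v;
    while ((v = atomic_exchange(&sem, LOCKED)) == LOCKED)
        ;
    atomic_store(&sem, v + 1);       /* give the unit back, releasing the -1 */
}
```

The trick is that -1 doubles as a tiny spin-held mutex over the counter, so only the block-on-empty case costs a kernel entry.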

IMO, this works. If it doesn't, or if I completely misunderstood semaphores (I wasn't paying attention in class, so it's well possible), please tell me.
Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

Re:Coding semaphores

Post by Pype.Clicker »

Okay, now let's figure out the current scenario ...
- the thread reads the semaphore and sees it should block.
- just before the thread can block, it is preempted and thread B receives the CPU.
- thread B releases the semaphore.
- when A gets the CPU again, how will it see that it no longer needs to block, since it actually received the semaphore from B?
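This lost-wakeup window can be shown deterministically without real threads: plain ints for the state, the interleaving played out by hand. All names here are invented for the illustration.

```c
static int count = 0;        /* semaphore value: empty */
static int a_blocked = 0;    /* 1 once thread A has parked itself */

/* Thread A, step 1: the naive check */
static int a_sees_empty(void) { return count == 0; }

/* Thread B: release, waking A only if A is already parked */
static void b_release(void)
{
    count++;
    if (a_blocked)
        a_blocked = 0;       /* wake-up... but A is not parked yet */
}

/* Thread A, step 2: block, acting on the now-stale decision */
static void a_block(int saw_empty)
{
    if (saw_empty)
        a_blocked = 1;       /* parks forever: the wake-up already happened */
}
```

After the interleaving check / release / block, a unit is free yet A sleeps, which is exactly why the check and the block must be made atomic (or the kernel must re-check under its own lock before suspending).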
sr5yh

Re:Coding semaphores

Post by sr5yh »

A is interrupted by B

(ahaha..)
ineo

Re:Coding semaphores

Post by ineo »

Pype.Clicker wrote: - when A gets the CPU again, how will it see that it no longer needs to block, since it actually received the semaphore from B?
That's called a spin loop: the loop is around the locked instruction. If it's well written, this section is really small and fast, so your CPU does some "active" waiting, but that's not really a problem. And you don't even need CLI ;) It can be used at any CPL you like, since it relies only on LOCK (or xchg). Typically you may use it to guard the semaphore, and avoid doing CLI/STI all the time.
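A minimal sketch of such a spinlock in C, with C11 `atomic_exchange` standing in for the locked xchg. It works at any CPL precisely because it relies only on the bus lock, never on CLI/STI:

```c
#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;

void spin_lock(spinlock_t *l)
{
    /* atomic_exchange compiles down to a locked xchg on x86 */
    while (atomic_exchange(&l->locked, 1) == 1)
        ;  /* active wait: someone else is in the critical section */
}

void spin_unlock(spinlock_t *l)
{
    atomic_store(&l->locked, 0);  /* a plain store is enough to release */
}
```

As long as the protected section stays a handful of instructions, the active wait costs less than a kernel round-trip; it only becomes a problem if a lock holder can be preempted for a long time.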

INeo
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Coding semaphores

Post by Candy »

Pype.Clicker wrote: - the thread reads the semaphore and sees it should block.
- just before the thread can block, it is preempted and thread B receives the CPU.
- thread B releases the semaphore.
- when A gets the CPU again, how will it see that it no longer needs to block, since it actually received the semaphore from B?
B cannot release the semaphore until A has completed the kernel portion of the block, and only after this kernel portion can B release the semaphore

so said the department of redundancy department :)

The idea is that the semaphore operation should ideally be placed within a donotinterrupt() ... youmayinterruptnow() block. This way these situations do not appear as often, and the semaphore will work a lot more efficiently than otherwise.

Only one thread may modify the current semaphore value, similar to a mutex. The restoring of the zero to the memory location is done in the kernel, so you can apply all your kernel tricks there (cli/IPI/whatever). The point is, the only time you enter the kernel is when a thread goes into the kernel to block itself.
BI lazy

Re:Coding semaphores

Post by BI lazy »

Hmm... shouldn't the semaphore per se be handled in *one* *atomic* operation?

How do you, Candy, guarantee an atomic operation in your approach?
Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

Re:Coding semaphores

Post by Pype.Clicker »

Candy wrote: B cannot release the semaphore until A has completed the kernel portion of the block, and only after this kernel portion can B release the semaphore
Of course. That's something that can easily be done when your *whole* semaphore is handled in kernel mode, but with the

Code: Select all

void get_the_sem() {
  int s = -1;
  while (s == -1) s = test_and_set(&sem->value, -1); /* spin while locked */
  if (s) sem->value = s - 1;     /* take one unit; the store releases the -1 */
  else kernel_wait(sem);         /* empty: the kernel restores the value */
}

void release_the_sem() {
  int s = -1;
  while (s == -1) s = test_and_set(&sem->value, -1);
  if (!s && sem->pending_threads) kernel_signal(sem);
  else sem->value = s + 1;       /* give the unit back, releasing the -1 */
}
The scenario I showed will keep thread B busy-looping until A gets the CPU back to perform "kernel_wait(sem)" ...
so said the department of redundancy department :)
;D
The idea is that the semaphore operation should ideally be placed within a donotinterrupt() ... youmayinterruptnow() block. This way these situations do not appear as often, and the semaphore will work a lot more efficiently than otherwise.
That seems wise, but I fail to see how it will actually be possible in usermode. IIRC, I heard you saying "we don't need no interruption. No VPI or VIF flag. Hey, Hacker! Leave Kernel alone ... all we need is just a lock sub [sem],1 ...", right?
Only one thread may modify the current semaphore value, similar to a mutex. The restoring of the zero to the memory location is done in the kernel, so you can apply all your kernel tricks there (cli/IPI/whatever). The point is, the only time you enter the kernel is when a thread goes into the kernel to block itself.
and when another thread needs to wake up a sleeping thread...
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Coding semaphores

Post by Candy »

BI lazy wrote: Hmm... shouldn't the semaphor per se be handled in *one* *atomic* operation?

How do you, candy, guarantee an atomic operation in your approach?
It should be handled atomically with respect to other processes/threads. Since no thread can modify the semaphore while one is adjusting it, it doesn't matter how long that one takes to adjust it. Since the semaphore code itself is very short, it's worthwhile to turn one setting of the semaphore (-1, or ~0, which is the least likely value for a semaphore) into the mutex-taken-by-someone marker. A mutex's function is to make a complex operation atomic, so by "mutexing" the semaphore operation, you assure that the semaphore is adjusted within that atomicity.

The only caveat is the block-on-empty, where you must let the kernel handle it, so that the kernel can take the thread-table mutex, release your semaphore back, then add you to the threads blocked on semaphores, and then release the mutex on the thread table. If any process/thread were to up the semaphore while you're still working on the thread-semaphore table, it won't get access because of the second mutex, and it can only actually signal any threads for that semaphore once you've released the second mutex, which is after you've added the thread.

In short, the semaphore operation is virtually mutexed, thereby assuring that it executes virtually atomically.
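The ordering in that kernel-side block path can be sketched with plain flags (a hypothetical `kernel_block_on_empty`; the caller has already left the -1 marker in the semaphore). The point of the sketch is only the order of the four steps:

```c
static int table_locked = 0;     /* thread-table mutex */
static int sem_value = -1;       /* caller left the -1 marker in place */
static int sleepers = 0;         /* threads parked on this semaphore */

void kernel_block_on_empty(void)
{
    table_locked = 1;            /* 1. take the thread-table mutex first */
    sem_value = 0;               /* 2. restore the zero: Ups may now run,
                                       but they queue on the table lock */
    sleepers++;                  /* 3. park the current thread */
    table_locked = 0;            /* 4. only now can an Up signal us */
}
```

Because the table lock is taken before the semaphore is given back, a racing Up either sees the sleeper already enqueued or waits on the table lock until it is; the lost-wakeup window is closed.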

I'll try to make a Microsoft Paint(tm) drawing that shows all possible points of failure, and why they do not apply.
ineo

Re:Coding semaphores

Post by ineo »

@Candy: In fact you're speaking of a kind of "local" atomicity.

However, I don't see the point here; maybe I am missing something. Since CLI/STI are not really required for semaphores (you could use spin loops to avoid them, and that's how it works on multiprocessor systems), you could use semaphores at any level you like without any kernel code.
However, you will need some "system managed" semaphores for shared resources. But there again, they may be implemented using a user task (aka a server in my microkernel).

Could you be more precise, or explain? I feel like I am missing something important.

Thanks,
INeo
BI lazy

Re:Coding semaphores

Post by BI lazy »

Semaphores as a *service* spinning around the microkernel?

So you'd send a message to the service: hey, I wanna take semaphore X down ... Are you sure this is a performant approach?

I for my part have decided to break with the microkernel paradigm - nearly everything being a task or so - and to provide semaphores as a system-wide means of synchronization and critical-section control in the microkernel itself. It also eases the wait-on-semaphore stuff.

Stay safe
proxy

Re:Coding semaphores

Post by proxy »

I have a question regarding semaphores:

Normally my operations boil down to something like this:

Code: Select all

void Semaphore::p() {
   // this is to ensure atomic operation
   AutoLock _(SysLock::instance());
   if(--m_Value < 0) {
      block();
   }
}
void Semaphore::v() {
   // this is to ensure atomic operation
   AutoLock _(SysLock::instance());
   if(++m_Value <= 0) {
      unblock();
   }
}
AutoLock is a class which ensures no interrupts until it is destructed (goes out of scope), so this works very nicely. You can assume block/unblock add/remove threads from a block list and mark them as runnable/blocked, etc.

Now, assuming I had proper dec_and_test and inc_and_test functions (using lock, etc.), could I not implement these like this:

Code: Select all

void Semaphore::p() {
   if(dec_and_testl(&m_Value)) {
      block();
   }
}

void Semaphore::v() {
   if(inc_and_testle(&m_Value)) {
      unblock();
   }
}
The logic that decides to block or unblock is atomic (and that's the point where semaphores are vulnerable, I believe), so I think it would work, assuming that block and unblock are atomic when modifying the threads and modifying the lists.
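One way to get such primitives, sketched with C11 fetch-and-add atomics (the names mirror the post; a real kernel might use lock dec/inc with the sign flag instead):

```c
#include <stdatomic.h>

static atomic_int m_value = 1;   /* example: one unit available */

/* atomically --value; true when the result went negative (caller must block) */
int dec_and_testl(atomic_int *v)
{
    return atomic_fetch_sub(v, 1) - 1 < 0;   /* fetch_sub returns the OLD value */
}

/* atomically ++value; true when the result is still <= 0 (a sleeper to wake) */
int inc_and_testle(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1 <= 0;
}
```

The read-modify-write and the test ride on the same atomic operation, so the decision can never be made on a stale value; what remains non-atomic is only the block/unblock bookkeeping, which is exactly the part handed to the kernel.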

This way, the only time you need to truly go to kernel space is when you are blocking/unblocking...

is this correct?
ineo

Re:Coding semaphores

Post by ineo »

BI lazy wrote:So you'd send a message to the service: Hey, I wanna take the semaphor X down ... Sure this is a performant approach?
In fact the performance problem is solved by using "local" semaphores that do not require any external help.
Furthermore, I am thinking of a generic way to handle asynchronous monitoring on objects (semaphores and so on). I'll see how to handle the performance issues, but I have no clues yet.

In fact a microkernel is for sure not the most performant architecture. But I like its modularity and scalability. And I am willing to lose some performance for that, while trying not to be too bad by finding hints.