
semaphore or spinlock, that's the question

Posted: Fri Feb 26, 2010 3:12 pm
by FlashBurn
Ok, as the title suggests, I want to discuss when to use a spinlock and when to use a semaphore. For now I'm only talking about kernel space. By "spinlock" I mean a lock where interrupts are disabled while you hold it (so on a single-CPU system you always get the spinlock immediately), and by "semaphore" I mean that if it is already taken, you go to sleep until you are woken up by the owner of the semaphore.

At the moment I only have spinlocks in my kernel, but as my code gets bigger and bigger I have to think about using semaphores for some things, and I need a criterion for deciding whether to use a spinlock or a semaphore. So when do you use a spinlock and when a semaphore?

Re: semaphore or spinlock, that's the question

Posted: Sat Feb 27, 2010 12:35 am
by Brendan
Hi,
FlashBurn wrote:At the moment I only have spinlocks in my kernel, but as my code gets bigger and bigger I have to think about using semaphores for some things, and I need a criterion for deciding whether to use a spinlock or a semaphore. So when do you use a spinlock and when a semaphore?
As a general "rule of thumb":
  • If you try to acquire a lock but can't (because someone else has already acquired it), then determine whether the overhead of 2 task switches is better or worse than the overhead of doing nothing until the lock is released. For example, on average 2 task switches might cost you 2000 cycles while, on average, the lock might be freed within 1500 cycles; 2000 is more than 1500, so in that case spinning is cheaper.
  • If the overhead of doing nothing is less, then use a spinlock
  • If the overhead of 2 task switches is less, then redesign/rewrite your code (so that the lock isn't held for far too long) and then use a spinlock. :lol:
Of course I'm used to micro-kernels. For a monolithic kernel you might have situations where redesigning/rewriting doesn't solve the problem, and maybe using a semaphore or mutex could be a good idea - I don't know for sure (but I wouldn't worry about semaphores/mutexes until you find yourself in this situation).
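To make the comparison concrete, here is a minimal sketch of the kind of spinlock described in the opening post (interrupts stay disabled while the lock is held), assuming 32-bit x86 and GCC atomic builtins; the names and details are illustrative, not code from any particular kernel:

Code:

#include <stdint.h>

typedef struct {
    volatile int locked;        /* 0 = free, 1 = held */
    uint32_t     saved_flags;   /* holder's EFLAGS, so IF can be restored */
} spinlock_t;

/* Save EFLAGS and disable interrupts on the local CPU. */
static inline uint32_t save_and_cli(void)
{
    uint32_t flags;
    __asm__ __volatile__("pushfl; popl %0; cli" : "=r"(flags) : : "memory");
    return flags;
}

static inline void spin_lock(spinlock_t *lock)
{
    uint32_t flags = save_and_cli();
    /* Atomically try to set the flag; spin (with PAUSE) while someone else holds it. */
    while (__atomic_exchange_n(&lock->locked, 1, __ATOMIC_ACQUIRE))
        __asm__ __volatile__("pause");
    lock->saved_flags = flags;
}

static inline void spin_unlock(spinlock_t *lock)
{
    uint32_t flags = lock->saved_flags;
    __atomic_store_n(&lock->locked, 0, __ATOMIC_RELEASE);
    /* Restore the caller's interrupt flag (re-enables interrupts if they were on). */
    __asm__ __volatile__("pushl %0; popfl" : : "r"(flags) : "memory", "cc");
}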


Cheers,

Brendan

Re: semaphore or spinlock, that's the question

Posted: Sat Feb 27, 2010 2:03 am
by FlashBurn
Brendan wrote: Of course I'm used to micro-kernels. For a monolithic kernel you might have situations where redesigning/rewriting doesn't solve the problem, and maybe using a semaphore or mutex could be a good idea - I don't know for sure (but I wouldn't worry about semaphores/mutexes until you find yourself in this situation).
I'm also writing a microkernel. Maybe I should "benchmark" my code and then decide what is better. I try to hold locks for as short a time as possible, but when I use one to protect some binary trees it can take some time until the code finishes. At the moment my concern is that it takes too long when I protect things like the address space, so that I can search for an area which is big enough and then map in the pages; I thought it might be good to use a semaphore for protecting the address space and a spinlock for protecting my physical memory manager.

Edit:

I just finished my "benchmark". I read the TSC in Bochs and on a P3-S 1.13GHz. The function which uses a spinlock and has the most code (I think) is my function for finding an area in the address space which is big enough. In Bochs I get 453-581 instructions (because Bochs only counts instructions) and on my P3 I get 689-1611 clock ticks. But I have to admit that these are best-case scenarios, because the search for an area isn't long as there is only one big area from which I allocate; but since I use an AVL tree for searching, it should not take much longer if I had many areas (they are sorted by size and address).
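
For anyone who wants to reproduce this kind of measurement, a rough sketch of timing a locked operation with the TSC is shown here; find_free_area is a hypothetical stand-in for the routine being measured, and on out-of-order CPUs a serializing instruction (e.g. CPUID) is often added around RDTSC for more stable numbers:

Code:

#include <stdint.h>

/* Read the CPU's time stamp counter. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical routine under test -- stands in for the real
 * "find a free area in the address space" function. */
void *find_free_area(uint32_t size);

uint64_t time_find_free_area(uint32_t size)
{
    uint64_t start = rdtsc();
    find_free_area(size);
    uint64_t end = rdtsc();
    return end - start;   /* elapsed TSC ticks (Bochs reports instruction counts) */
}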

Re: semaphore or spinlock, that's the question

Posted: Sat Feb 27, 2010 4:22 am
by Brendan
Hi,
FlashBurn wrote:I'm also writing a microkernel. Maybe I should "benchmark" my code and then decide what is better. I try to hold locks for as short a time as possible, but when I use one to protect some binary trees it can take some time until the code finishes. At the moment my concern is that it takes too long when I protect things like the address space, so that I can search for an area which is big enough and then map in the pages; I thought it might be good to use a semaphore for protecting the address space and a spinlock for protecting my physical memory manager.

Edit:

I just finished my "benchmark". I read the TSC in Bochs and on a P3-S 1.13GHz. The function which uses a spinlock and has the most code (I think) is my function for finding an area in the address space which is big enough. In Bochs I get 453-581 instructions (because Bochs only counts instructions) and on my P3 I get 689-1611 clock ticks. But I have to admit that these are best-case scenarios, because the search for an area isn't long as there is only one big area from which I allocate; but since I use an AVL tree for searching, it should not take much longer if I had many areas (they are sorted by size and address).
Ok - this is a good example. 689 to 1611 cycles probably means an average of about 1150 cycles for the complete operation; but another thread trying to get the lock might start trying just after the lock was acquired, just before the lock is about to be released, or anywhere in between. Because of this (assuming the lock isn't heavily contended), the actual time the other thread has to wait before the lock is released might be closer to half that, or about 575 cycles.

You don't say how quick 2 address space switches would be, so it's hard to compare. However, let's assume that 2 task switches are faster than 575 cycles. What would you do to make the searching quicker? The first thing I'd do is ask why the kernel is searching for an unused part of the address space to begin with. If the caller says "put_something_at(virtual_address)" instead of "put_something_somewhere_unused()", then there'd be no need for the kernel to search at all. In some cases this might mean the caller needs to search, but (depending on the situation) the caller might use a buffer at a fixed (virtual) address, or obstacks or "sbrk()" or some other method (where searching isn't necessary, or can be done a lot faster, or can be done without any locks).
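
As a sketch of the difference described above (all names here are made up for illustration, not anyone's actual API), a caller that manages its own region never needs the kernel to search under a lock:

Code:

#include <stddef.h>
#include <stdint.h>

/* "put_something_somewhere_unused()" style: the kernel must search its
 * free-area structures under a lock before it can map anything. */
void *map_anywhere(size_t size);

/* "put_something_at(virtual_address)" style: the caller names the address,
 * so the kernel only validates and maps -- no search, shorter lock hold. */
int map_at(void *virtual_address, size_t size);

/* Example caller: a trivial bump allocator over a fixed per-process window,
 * so no kernel-side search is ever needed. */
#define POOL_BASE ((uint8_t *)0x40000000)   /* assumed reserved region */
#define POOL_SIZE (16u * 1024u * 1024u)

static size_t pool_used;

static void *alloc_pages(size_t size)
{
    if (pool_used + size > POOL_SIZE)
        return NULL;
    void *where = POOL_BASE + pool_used;
    if (map_at(where, size) != 0)           /* kernel just maps at "where" */
        return NULL;
    pool_used += size;
    return where;
}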


Cheers,

Brendan

Re: semaphore or spinlock, that's the question

Posted: Sat Feb 27, 2010 11:00 am
by FlashBurn
Brendan wrote: You don't say how quick 2 address space switches would be, so it's hard to compare.
Could you give me a hint how I could measure that?
Brendan wrote: However, let's assume that 2 task switches are faster than 575 cycles. What would you do to make the searching quicker? The first thing I'd do is ask why the kernel is searching for an unused part of the address space to begin with. If the caller says "put_something_at(virtual_address)" instead of "put_something_somewhere_unused()", then there'd be no need for the kernel to search at all. In some cases this might mean the caller needs to search, but (depending on the situation) the caller might use a buffer at a fixed (virtual) address, or obstacks or "sbrk()" or some other method (where searching isn't necessary, or can be done a lot faster, or can be done without any locks).
The problem here is that with my system I have to keep track of which areas of the address space are free and which are not. So if I implemented something like letting the user program page-fault and inserting a page at that address, I would also need to search for that virtual address and take it out of the free list. The problem I have with this approach is: what if something else is already mapped at this address, like a library? So I say the user program needs to ask the kernel for memory, and if there is a free area it gets a pointer to it, because the user program can't know where a stack or a shared library is mapped. I mean, I could easily implement a function like "put_something_at(virtual_address)", but I would also need to check whether this address is free, so it would come down to the same thing.

I also don't like the usual scheme where the heap grows upwards and the stacks grow downwards. Where should I put libraries then, and what happens when I free a stack somewhere in the middle of 2 other stacks, and things like that? Every time I need to keep track of what is used and what is not, and the user program can't know all these things, so I keep it completely away from it.

My address space could look something like this:

Code:

0x400000 - 0x404000 code
0x405000 - 0x40A000 stack
0x40B000 - 0x410000 data
0x411000 - 0x413000 stack
0x414000 - 0x418000 data
0x418000 - 0x41A000 shared library
...

Re: semaphore or spinlock, that's the question

Posted: Sun Feb 28, 2010 12:22 am
by Brendan
Hi,
FlashBurn wrote:The problem here is that with my system I have to keep track of which areas of the address space are free and which are not. So if I implemented something like letting the user program page-fault and inserting a page at that address, I would also need to search for that virtual address and take it out of the free list. The problem I have with this approach is: what if something else is already mapped at this address, like a library?
For "present" pages there's at least 3 "available" bits in page tables entries, etc. These three bits could hold a value from 0 to 7, and this value can tell your kernel what the page is used for. For example:
  • 0 = normal RAM
  • 1 = part of a memory mapped file
  • 2 = part of a shared memory area
  • ...
  • 7 = part of a DLL or shared library
In a similar way, for "not present" pages there are at least 31 "available" bits in the page table entries, etc. In this case, maybe 2 of them could be used to describe what the page is used for (e.g. "not used at all", "memory mapped file" and "sent to swap") and the remaining bits could be some sort of ID (e.g. if it's part of a memory mapped file, bits 3 to 31 could be an identifier used to find the entry in the process' list of memory mapped files, and if it's a page sent to swap space then bits 3 to 31 could be the number of the page on the swap partition).
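
A rough sketch of what such an encoding could look like for 32-bit page table entries (bits 9 to 11 are the "available" bits of a present entry; in a not-present entry everything except bit 0 is free for the OS to use). The type values are just the examples above, and the helper names are made up:

Code:

#include <stdint.h>

/* Page types stored in the 3 "available" bits (9-11) of a present entry. */
enum page_type {
    PT_NORMAL_RAM = 0,
    PT_MMAP_FILE  = 1,
    PT_SHARED_MEM = 2,
    /* ... */
    PT_SHARED_LIB = 7,
};

#define PTE_PRESENT      0x001u
#define PTE_AVAIL_SHIFT  9
#define PTE_AVAIL_MASK   (0x7u << PTE_AVAIL_SHIFT)

static inline enum page_type pte_type(uint32_t pte)
{
    return (enum page_type)((pte & PTE_AVAIL_MASK) >> PTE_AVAIL_SHIFT);
}

static inline uint32_t pte_set_type(uint32_t pte, enum page_type type)
{
    return (pte & ~PTE_AVAIL_MASK) | ((uint32_t)type << PTE_AVAIL_SHIFT);
}

/* For a not-present entry (bit 0 clear): bits 1-2 say what the page is,
 * bits 3-31 hold an ID (e.g. a swap page number, or an index into the
 * process' list of memory mapped files). */
enum notpresent_kind { NP_UNUSED = 0, NP_MMAP_FILE = 1, NP_SWAPPED = 2 };

static inline uint32_t make_notpresent_pte(enum notpresent_kind kind, uint32_t id)
{
    return ((uint32_t)kind << 1) | (id << 3);   /* bit 0 stays clear */
}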

When a process asks the kernel to map something at a specific address, the kernel checks the area to see if there are problems (and returns an error if the caller is trying to do something completely insane). Otherwise the kernel frees or removes whatever was there (using the "available" bits to figure out how to free/remove it correctly) and maps something else there. The kernel does need to know which pages are being used for what, but it only really needs a more generic "page type". For example, the ".bss", heap space and stack/s are all just "read/write data" areas (or maybe "allocate on demand" areas) as far as the kernel cares.

When a new process is started, the process loader reads the executable file's header and parses it to figure out what should go where (and then loads shared libraries and does linking, etc). The kernel doesn't care how this works, and you could have several different process loaders and support several different executable file formats (or even start something like a byte-code interpreter instead of a process loader, if the executable file is Java byte-code or something).
FlashBurn wrote:I mean, I could easily implement a function like "put_something_at(virtual_address)", but I would also need to check whether this address is free, so it would come down to the same thing.
In most cases the kernel should do what it's told. For example, if a process says "memory map a file at <foo>" then the kernel should wipe out whatever was already at "<foo>" and memory map the file there (rather than complaining that the area isn't free, and forcing the programmer to use an extra system call to free the area first).
FlashBurn wrote:I also don't like the usual scheme where the heap grows upwards and the stacks grow downwards. Where should I put libraries then, and what happens when I free a stack somewhere in the middle of 2 other stacks, and things like that? Every time I need to keep track of what is used and what is not, and the user program can't know all these things, so I keep it completely away from it.
A process *does* know these things. Someone writes source code, and they probably have no idea what will end up where. A compiler and linker compile the source code, split it up into "sections", and create an executable header saying where all of these "sections" go in the virtual address space. The process loader uses the executable file's header (and the headers of any shared libraries, etc) to decide where everything is before the executable file starts; and the standard library keeps track of everything that happens after the executable file is started (dynamically created stuff - heap space, etc). The person who wrote the source code may have no idea where things are, but everything else in user space does.


Cheers,

Brendan

Re: semaphore or spinlock, that's the question

Posted: Sun Feb 28, 2010 5:23 am
by FlashBurn
Ok, I know I will never know as much about OS deving as you do, but this is turning into a discussion of "I like 'foo' more, so 'foo' is better than 'bar'!" I like your input and how you take the time to explain everything in detail, but I will keep my current VMM ;) I mean, the discussion could just as well be "a monolithic kernel is faster than a microkernel" and things like that. I don't know how Windows does the whole VMM thing, but the way you describe it is the Unix way (I think so, but I could be wrong) and I want to take a different approach!

So I know what I wanted to know: I can use a spinlock as long as it isn't held longer than 2 address space switches take; beyond that it would be better to use a semaphore.

Another thing I just read about is read-write locks; maybe I will look into whether I could use them.
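
For readers who haven't met them: a read-write lock lets any number of readers in at once but gives a writer exclusive access. A very small, writer-preferring sketch using GCC atomics (not from any poster's kernel, and without the blocking and fairness a real implementation would want):

Code:

typedef struct {
    volatile int readers;   /* number of readers currently inside */
    volatile int writer;    /* 1 while a writer holds the lock */
} rwlock_t;

static void read_lock(rwlock_t *l)
{
    for (;;) {
        while (__atomic_load_n(&l->writer, __ATOMIC_ACQUIRE))
            __asm__ __volatile__("pause");
        __atomic_add_fetch(&l->readers, 1, __ATOMIC_ACQUIRE);
        if (!__atomic_load_n(&l->writer, __ATOMIC_ACQUIRE))
            return;                     /* no writer slipped in: we're inside */
        __atomic_sub_fetch(&l->readers, 1, __ATOMIC_RELEASE);   /* back off, retry */
    }
}

static void read_unlock(rwlock_t *l)
{
    __atomic_sub_fetch(&l->readers, 1, __ATOMIC_RELEASE);
}

static void write_lock(rwlock_t *l)
{
    while (__atomic_exchange_n(&l->writer, 1, __ATOMIC_ACQUIRE))
        __asm__ __volatile__("pause");  /* wait for other writers */
    while (__atomic_load_n(&l->readers, __ATOMIC_ACQUIRE) != 0)
        __asm__ __volatile__("pause");  /* wait for readers to drain */
}

static void write_unlock(rwlock_t *l)
{
    __atomic_store_n(&l->writer, 0, __ATOMIC_RELEASE);
}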

Re: semaphore or spinlock, that's the question

Posted: Thu May 20, 2010 4:24 am
by rdos
There is also the interrupt latency issue. Therefore, my design goal is to use spinlocks only where they are absolutely necessary, in order to minimize interrupt latency. That is only in two places: to lock some lists and to lock the scheduler. The maximum number of clock cycles with interrupts disabled would be in the order of hundreds, if not less.

Everywhere else I use EnterSection / LeaveSection for reentrance protection of data. I also have Signal / WaitForSignal (the latter normally blocks), which are good for implementing consumer / producer synchronization. Signal can also be used from ISRs and can restart threaded handlers for devices that produce IRQs.
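
A rough sketch of that ISR-driven producer/consumer pattern, with hypothetical signal_set / wait_for_signal primitives standing in for the real API (which may look quite different):

Code:

#include <stdint.h>

/* Hypothetical kernel primitives -- stand-ins, not a real API. */
struct signal;
void signal_set(struct signal *s);        /* safe to call from an ISR */
void wait_for_signal(struct signal *s);   /* blocks the calling thread */

static struct signal *irq_signal;         /* created at driver init */
static volatile uint32_t irq_pending;     /* IRQs not yet handled */

/* Interrupt handler: do the bare minimum, then wake the driver thread. */
void device_isr(void)
{
    __atomic_add_fetch(&irq_pending, 1, __ATOMIC_RELEASE);   /* producer */
    signal_set(irq_signal);               /* restart the threaded handler */
}

/* Threaded handler: runs with interrupts enabled, so the slow work
 * needs no spinlock and adds nothing to interrupt latency. */
void device_thread(void)
{
    for (;;) {
        wait_for_signal(irq_signal);      /* consumer: sleep until signalled */
        while (__atomic_load_n(&irq_pending, __ATOMIC_ACQUIRE) > 0) {
            __atomic_sub_fetch(&irq_pending, 1, __ATOMIC_ACQ_REL);
            /* ...talk to the device, copy data, complete requests... */
        }
    }
}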

Re: semaphore or spinlock, that's the question

Posted: Thu May 20, 2010 5:16 am
by gerryg400
rdos wrote: There is also the interrupt latency issue. Therefore, my design goal is to use spinlocks only where they are absolutely necessary, in order to minimize interrupt latency.
Why do you disable interrupts when spinning? If the ISR will not be touching the memory that you are trying to protect, couldn't you spin with interrupts enabled?

- gerryg400

Re: semaphore or spinlock, that's the question

Posted: Thu May 20, 2010 5:57 am
by rdos
gerryg400 wrote:
rdos wrote: There is also the interrupt latency issue. Therefore, my design goal is to use spinlocks only where they are absolutely necessary, in order to minimize interrupt latency.
Why do you disable interrupts when spinning? If the ISR will not be touching the memory that you are trying to protect, couldn't you spin with interrupts enabled?

- gerryg400
The spinning part alternates between enabling and disabling interrupts: each time it fails to acquire the lock, it enables interrupts and executes a pause instruction. It is the code that runs while the lock is held that must run with interrupts disabled - at least it must if ISRs can touch what the lock protects, for example when restarting threads.
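
A sketch of that acquire loop (interrupts are left disabled only once the lock is actually held; it assumes interrupts were enabled on entry, and the names are made up):

Code:

/* Try to take the lock with interrupts disabled; if it is already taken,
 * briefly re-enable interrupts, PAUSE, and try again. */
static void spin_lock_irq(volatile int *lock)
{
    for (;;) {
        __asm__ __volatile__("cli");
        if (!__atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE))
            return;                          /* acquired: leave interrupts off */
        __asm__ __volatile__("sti; pause");  /* let pending IRQs run, then retry */
    }
}

static void spin_unlock_irq(volatile int *lock)
{
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
    __asm__ __volatile__("sti");             /* critical section is over */
}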

Re: semaphore or spinlock, that's the question

Posted: Thu May 20, 2010 6:03 am
by gerryg400
I see. That makes sense. In my case I check whether I was spinning and, if so, return from the interrupt back to the spinning code, and only enter the scheduler later once I get the lock.

- gerryg400