Hi,
IMHO either QNX docs are messed up or QNX isn't very re-entrant or pre-emptable. Either way I'd be tempted to ignore QNX.
Colonel Kernel wrote:
Again, my goal here is to understand all the possibilities for these kernel control-flow mechanisms. I guess to summarize, I want to understand how "the other guys do it" before I undertake designing this portion of my own kernel.
I'm not very familiar with other OSs, but I'm willing to describe mine. Perhaps people will point out alternatives with different advantages/disadvantages.
First a short intro to my OS. It's a microkernel where device drivers are user level processes. The kernel contains:
IO port protection
IRQ handling
Basic chip handling (ISA DMA, PIC/APICs, PIT and RTC)
Linear memory management
Physical memory management
Scheduler
IPC/messaging
It's designed for single CPU and multi-CPU systems (up to 256 CPUs). The kernel will be as re-entrant and as pre-emptable as possible.
IRQ HANDLING
When an IRQ is received by a CPU the kernel sends a message to any user level threads that have requested it (the priority of each thread determines the user level IRQ handler's priority). I don't allow nested interrupts - each IRQ is handled by an "interrupt gate" so interrupts are disabled until the message is sent.
On a multi-CPU computer the IO APIC delivers each IRQ using the "lowest priority delivery mode", and the kernel's IRQ handlers adjust the "Task Priority Register" so that, where possible, an IRQ is accepted by a CPU that isn't already handling another IRQ. In this way several IRQs can be handled by the kernel at the same time (by different CPUs) without IRQ nesting.
On a single-CPU computer this adds an (IMHO negligible) increase in interrupt latency. Because the kernel doesn't have to deal with nested IRQs, its IRQ handler can switch contexts directly from within the handler.
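Purely as illustration, here's roughly what that kernel-side path could look like in C. Every name in it (irq_listeners, send_message, lapic_read/lapic_write, reschedule_if_needed, MSG_IRQ) is an invented placeholder, not my kernel's actual API - it's only meant to show the order of events described above:
Code:
#include <stdint.h>

/* Placeholder declarations - the real kernel's types and functions differ. */
typedef struct thread thread_t;
struct thread { thread_t *next_listener; /* ... */ };

extern thread_t *irq_listeners[16];              /* threads that asked for each IRQ */
extern uint32_t  lapic_read(uint32_t reg);
extern void      lapic_write(uint32_t reg, uint32_t value);
extern void      send_message(thread_t *dest, int type, int data);
extern void      reschedule_if_needed(void);

enum { MSG_IRQ = 1 };
#define LAPIC_TPR 0x80       /* local APIC Task Priority Register offset */
#define LAPIC_EOI 0xB0       /* local APIC End Of Interrupt register offset */

void kernel_irq_handler(int irq)
{
    /* Entered through an interrupt gate, so interrupts are already disabled. */
    uint32_t old_tpr = lapic_read(LAPIC_TPR);
    lapic_write(LAPIC_TPR, 0xFF);       /* raise TPR so further IRQs are steered
                                           to other CPUs (lowest priority delivery) */

    /* Queue a message for every user-level thread that requested this IRQ;
       each message is handled at the receiving thread's priority. */
    for (thread_t *t = irq_listeners[irq]; t != NULL; t = t->next_listener)
        send_message(t, MSG_IRQ, irq);

    lapic_write(LAPIC_EOI, 0);
    lapic_write(LAPIC_TPR, old_tpr);

    /* No nested IRQs to worry about, so it's safe to switch directly to a
       thread that the messages just unblocked. */
    reschedule_if_needed();
}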
SPINLOCKS
The kernel uses 2 types of spinlocks for quick operations. The first can only be acquired by one thread/CPU at a time. This type of lock is used for data that is always modified and never just read.
The second type of spinlock allows read/write access so that multiple threads/CPUs can read at the same time, but no other thread can have any access while it's being modified. It uses a 32 bit value, where 0 = unused, positive values are a count of the number of readers, and -1 is used when it's locked for modifying. This type of lock is used when data may only need to be read.
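For anyone who prefers C, here's a sketch of that read/write encoding using GCC-style atomic builtins. It's not my kernel's code (which is asm), and it leaves out the interrupt disabling/restoring that both lock types do:
Code:
#include <stdint.h>

/* 0 = unused, positive = number of readers, -1 = locked for modifying. */
typedef int32_t rwspinlock_t;

static void rw_lock_read(rwspinlock_t *lock)
{
    for (;;) {
        int32_t old = __atomic_load_n(lock, __ATOMIC_RELAXED);
        if (old >= 0 &&
            __atomic_compare_exchange_n(lock, &old, old + 1, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            return;                     /* became one more reader */
        __builtin_ia32_pause();         /* spin politely */
    }
}

static void rw_unlock_read(rwspinlock_t *lock)
{
    __atomic_sub_fetch(lock, 1, __ATOMIC_RELEASE);      /* one less reader */
}

static void rw_lock_write(rwspinlock_t *lock)
{
    for (;;) {
        int32_t expected = 0;           /* only take it when nobody holds it */
        if (__atomic_compare_exchange_n(lock, &expected, -1, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            return;
        __builtin_ia32_pause();
    }
}

static void rw_unlock_write(rwspinlock_t *lock)
{
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}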
In any case both types of locks disable interrupts when they are acquired, and restore interrupts when the lock is released. All kernel data is split into the smallest lockable structures possible. For example, rather than having a large data structure and one lock for all message queues there's a separate lock for each message queue, so that separate queues can be modified at the same time.
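As a (made up) illustration of that granularity, each message queue would simply embed its own lock instead of sharing a global one:
Code:
#include <stdint.h>

/* Sketch only - not my kernel's real layout. Because each queue has its own
 * (simple, exclusive) lock, two CPUs can work on two different threads'
 * queues at the same time instead of serialising on one big lock. */
struct message;

typedef struct message_queue {
    int32_t         lock;       /* simple spinlock - protects only this queue */
    struct message *head;
    struct message *tail;
} message_queue_t;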
LONGER OPERATIONS
Longer operations are split into several small operations that each use the spinlocks above. The longest operations in the kernel are the ones that create and terminate a thread. As an example, the code that creates a thread is split into these smaller operations (there's a rough C outline after the list):
Create entry in thread state table with "state = init" (first lock)
Create initial address space (no lock needed)
Clone the process's address space (second lock)
Create thread data area (no lock needed as thread "state = init")
Add thread to the process data area (third lock)
Change entry in thread state table to "state = ready" (fourth lock)
Add thread to "ready to run" scheduler queue (fifth lock)
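Here's that sequence as a very rough C outline. Every name in it is an invented placeholder (the real code is quite different); it's only meant to show the pattern - each step briefly takes one small lock, and between steps the code can be pre-empted freely:
Code:
/* Rough outline of the create-thread path above - placeholder names only. */
thread_id_t create_thread(process_t *proc, void *entry_point)
{
    /* 1: allocate a slot in the thread state table, state = INIT (first lock) */
    lock(&thread_state_table.lock);
    thread_id_t tid = alloc_thread_slot(THREAD_STATE_INIT);
    unlock(&thread_state_table.lock);

    /* 2: create the initial address space (no lock needed) */
    addr_space_t *as = create_address_space();

    /* 3: clone the process's address space (second lock) */
    lock(&proc->addr_space_lock);
    clone_address_space(as, proc->addr_space);
    unlock(&proc->addr_space_lock);

    /* 4: build the thread data area - nothing else can touch the thread
          while its state is still INIT, so no lock is needed */
    init_thread_data(tid, as, entry_point);

    /* 5: add the thread to the process data area (third lock) */
    lock(&proc->thread_list_lock);
    add_thread_to_process(proc, tid);
    unlock(&proc->thread_list_lock);

    /* 6: mark the thread ready (fourth lock) */
    lock(&thread_state_table.lock);
    set_thread_state(tid, THREAD_STATE_READY);
    unlock(&thread_state_table.lock);

    /* 7: put it on the "ready to run" queue (fifth lock) */
    lock(&ready_queue.lock);
    enqueue_ready(tid);
    unlock(&ready_queue.lock);

    return tid;
}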
Notice that this code can be pre-empted most of the time, but does use a lot of locks (terminating a thread is worse - around 10 locks). Most of these locks aren't held for long though, and most operations only use one or two locks.
Consider the code for the multi-CPU version of the simpler (more common) spinlock:
Code:
acquire:                                   ; ("lock" can't be used as a label - it's an instruction prefix)
    pushfd                                 ; save EFLAGS (interrupt flag)
    mov edi,[GS:CPUlocalAPICaddress]
    cli
    push dword [edi+LAPICtaskPriority]     ; save the old task priority
    mov dword [edi+LAPICtaskPriority],0xFF ; steer further IRQs to other CPUs
    lock bts dword [theLock],1
    jnc .l2                                ; got the lock on the first try
.l1:
    push dword [esp+4]                     ; copy the saved EFLAGS...
    popfd                                  ; ...so interrupts are restored while spinning
    pause
    cli
    lock bts dword [theLock],1
    jc .l1                                 ; still locked - keep spinning
.l2:

unlock:
    lock btc dword [theLock],1
    jnc .criticalErrorHandler              ; the lock wasn't held - kernel bug!
    pop dword [edi+LAPICtaskPriority]      ; restore the old task priority
    popfd                                  ; restore EFLAGS (interrupt flag)
If the lock is free the overhead is negligible, but if it isn't many cycles can be wasted. More locks means less chance of a CPU spinning. I'm looking at over 300 little spinlocks (mostly due to the way I manage physical memory).
Cheers,
Brendan