Question regarding kernel preemption points
Posted: Sun Mar 25, 2007 8:35 am
by Captus
Hi everyone,
I have a question I hope some of you in here can answer.
The question:
In RTOS design a way to make a kernel preemptive could be to use preemption points. Why? Why doesn't preemption work during system calls using the timer interrupt that drives the CPU scheduler? As far as I can tell it's because interrupts are disabled during system calls, but I can't figure out why they are disabled during system calls.
Any thoughts or explanations on this would be greatly appreciated :)
preemption points
Posted: Sun Mar 25, 2007 1:50 pm
by Kevin McGuire
_A_ Reason Interrupts Are Disabled During System Calls
Suppose I have two threads running:
[Thread1]
Jump Table:
[thread code] -> [kernel code] -> [hdd driver] -> [issuing I/O and DMA commands].
[Thread2]
[thread code]...
Thread1, at _this_ point in time, has made a long jump into the kernel code, which has determined (by some mechanism) that Thread1 wants to read something from disk, and is now inside the disk driver's read function.
Thread2 would also like to read something from disk. If the timer dings and the scheduler switches from Thread1 to Thread2 while Thread1 is still inside the disk driver's function, and Thread2 then jumps into the kernel code, the hdd driver, and then the same function (or even a different one), you have a problem.
Thread1 could be inside a function that needs to do something simple like:
write a disk address to I/O port 0x50
read a value from I/O port 0x50
If Thread1 was preempted by the timer _when_ it had just issued the "write a disk address to I/O port 0x50" but had not yet issued the "read a value from I/O port 0x50" -- then Thread2 runs right into the same function and performs:
write a disk address to I/O port 0x50
read a value from I/O port 0x50
A problem occurs because the disk hardware needs the CPU to read a value from port 0x50 right after one is written. If that read does not happen, the disk hardware might ignore the second sequential write to the I/O port and keep the first disk address. So when the second thread reads, it will use Thread1's disk address, ultimately not get the correct data, and cause a big meltdown.
In reality the sequence of I/O writes is much more complex, and most of the time more than one device has to have commands issued, so the window for collisions becomes larger. You would actually disable interrupts before the I/O write and read, then enable them after the pair has completed.
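As a sketch of that critical section (port 0x50 is the port from the example above; the outb()/inb() helpers are generic x86 port I/O, and read_disk_value() is an illustrative name, not a real disk driver interface):
Code: Select all
/* Sketch of the critical section described above: interrupts are off
 * from the port write until the matching port read, so the timer
 * cannot preempt us between the two. */
static inline void outb(unsigned short port, unsigned char val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}

static inline unsigned char inb(unsigned short port)
{
    unsigned char val;
    __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

unsigned char read_disk_value(unsigned char disk_address)
{
    unsigned char value;

    __asm__ volatile ("cli");   /* no timer preemption from here...     */
    outb(0x50, disk_address);   /* write the disk address               */
    value = inb(0x50);          /* ...until the matching read completes */
    __asm__ volatile ("sti");

    return value;
}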
An Example Of Preemption Points
I don't know for certain, but I can guess that a preemption point is a spot where you place a tiny bit of code, or a macro, that lets the scheduler switch out of the driver or system function and into the other waiting threads instead of starving them. When performing a hardware operation, the work is sometimes done in blocks, so you may be able to let the operation take a break and allow other threads to run. Of course, a mutex blocking entry into that function from other threads could be required, so that the hardware device the original thread was working with at the preemption point cannot be clobbered by something else until it has finished.
[ready DMA for a transfer]
[lock DMA channel]
[preemption point] (goto here for another transfer)
[start disk transfer using DMA]
[lock disk channel and device]
[do another transfer or exit out of kernel system call and driver]
In this manner you could keep the latency down, so that the scheduler can move as many threads as possible into a running state and let them perform their required processing, instead of staying frozen until the entire disk operation has been performed.
The preemption point could be something simple, like invoking the actual software interrupt that triggers the scheduler to change threads, or it could be something that checks the elapsed ticks for the current thread to decide whether it needs to yield to prevent starvation of other threads. It would most likely not be as simple as just enabling interrupts, since the timer might not trigger before the code enters the next I/O block, which would cause the original problem stated at the beginning of this message.
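A sketch of the tick-checking variant just described (ticks_now(), TIMESLICE_TICKS, yield() and struct dma_transfer are all assumed names, not a real kernel interface):
Code: Select all
extern unsigned long ticks_now(void);   /* current timer tick count      */
extern void yield(void);                /* hand control to the scheduler */

#define TIMESLICE_TICKS 10              /* assumed slice length          */

struct dma_transfer {
    unsigned long start_tick;           /* when this thread began running */
    /* ... channel, buffers, block counts ... */
};

/* Placed between blocks of a long transfer.  The caller still holds the
 * mutex protecting the device, so nothing clobbers the half-finished
 * hardware state while we are switched out. */
static void preemption_point(struct dma_transfer *xfer)
{
    if (ticks_now() - xfer->start_tick >= TIMESLICE_TICKS) {
        yield();                            /* let other threads run */
        xfer->start_tick = ticks_now();     /* fresh slice on resume */
    }
}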
Posted: Sun Mar 25, 2007 2:02 pm
by Combuster
The most likely culprits are the locks: if a process acquires a lock and is then scheduled out, another process can be denied access to that resource, which causes stalls and failure to comply with the real-time requirements.
For comparison:
Desktop OS:
Code: Select all
thread 1: require some memory - scheduled
thread 2: require some memory - will wait for thread 1
... (long delay)
thread 1: complete request
... (short delay)
thread 2: complete request
where the time constraints depend on thread 1's scheduling (if thread 1 gets few timeslices compared to thread 2, thread 2 will have to wait a horribly long time compared to what the request actually needs).
Realtime OS:
Code: Select all
thread 1: require some memory (will not be scheduled)
... (short delay)
thread 1: complete request
thread 2: require some memory (will not be scheduled)
... (short delay)
thread 2: complete request
Notice how predictable the time usage has become.
The problem with this is that long requests will block short requests, so latency increases. For example (desktop OS):
Code: Select all
thread 1: request long operation (scheduled)
...(very short delay)
thread 2: perform independent short operation
instead of (realtime OS):
Code: Select all
thread 1: request long operation (will not be scheduled)
... (long delay)
thread 1: complete request
thread 2: perform independent short operation
Preemption points are most likely used to signal that being scheduled out does not impact the performance of other processes.
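A sketch of that idea in code (my reading of it; locks_held(), need_resched and yield() are assumed names): a pre-emption point placed where the thread holds no locks signals that scheduling it out cannot stall anyone waiting on a resource it owns.
Code: Select all
extern int locks_held(void);        /* locks the current thread holds */
extern volatile int need_resched;   /* set by the timer interrupt     */
extern void yield(void);

static void preemption_point(void)
{
    if (need_resched && locks_held() == 0)
        yield();                    /* nobody can be blocked on us */
}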
Posted: Sun Mar 25, 2007 10:37 pm
by mystran
In order to have a chance at designing an RTOS, it is first important to understand what exactly an RTOS is, and what makes it "real time".
Being real-time means the OS is able to guarantee things like "this thread will be able to run for 10ms every 100ms, without ever having to wait for more than 2ms in the ready state". The key to being "real-time" is being able to give such guarantees in terms of "real time" constants.
To be able to guarantee such a thing, you need two things: first of all, you'll need to be able to arrange threads to be scheduled in such an order that they all meet their goals (either by suitable priorities, or by some other scheme). The other thing you need is that if you want to say "will never have to wait more than 2ms", then you can't run for more than 2ms in the "interrupts disabled" state. Now, this of course means that if your system call could take longer than the shortest wait-time you guarantee, then you can't run the whole system call with interrupts disabled.
There are two ways to avoid that: one is to write your system calls such that only a small part (entry and exit, typically, and possibly some critical sections) needs to run with interrupts disabled. The other way is to put "known safe" pre-emption points into the system call path such that the time between pre-emption points is never longer than some constant.
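Put as a constraint (my phrasing, not mystran's): if W is the shortest wait you guarantee (2ms above) and D_i is the length of the i-th stretch of code that runs with interrupts disabled, then you need max_i D_i <= W, no matter which of the two schemes you pick.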
Such a pre-emption point then basically needs to turn interrupts on, possibly wait a cycle or two ("sti;cli" will never check for interrupts on x86; you'll need at least "sti;nop;cli" instead), and disable interrupts again, allowing pre-emption. You can of course use a combination of both schemes. The idea of pre-emption points is that it might be easier to find spots in the code where everything is in a consistent state, such that it's easy to suspend (and later resume) the code without blocking other stuff by holding locks or some such.
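As a sketch (x86, GCC-style inline assembly; mine, not mystran's exact code):
Code: Select all
/* A pre-emption point inside an otherwise interrupts-disabled path.
 * On x86, STI only takes effect after the next instruction, so
 * "sti; cli" opens no window at all -- the NOP is what lets a pending
 * timer interrupt (and thus the scheduler) in. */
static inline void preemption_point(void)
{
    __asm__ volatile ("sti; nop; cli" ::: "memory");
}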
Now of course pre-emption points are often used in non-realtime OSes as well, since even when you don't formally attempt to provide worst-case latencies, allowing pre-emption as often as possible helps lower the average latency. Indeed, on a typical desktop processor like x86 you'd have to account for TLB/cache misses, branch prediction failures and whatnot if you attempted to provide real-time guarantees.. but that doesn't mean you might not want to improve average-case latencies by not blocking pre-emption more than you have to.
Re: Question regarding kernel preemption points
Posted: Mon Mar 26, 2007 12:15 am
by Brendan
Hi,
Captus wrote:In RTOS design a way to make a kernel preemptive could be to use preemption points. Why? Why doesn't preemption work during system calls using the timer interrupt that drives the CPU scheduler? As far as I can tell it's because interrupts are disabled during system calls, but I can't figure out why they are disabled during system calls.
You can have a kernel that is preemptable by default that uses locking to disable preemption during critical sections. Alternatively you could do the opposite - have a kernel that is not preemptable by default that uses "anti-locking" to enable preemption between critical sections. It's mostly the same with different terminology.
Now, consider a kernel that has a single kernel stack that is shared by all tasks. In this case the kernel can't be easily preempted because you'd need to return to preempted tasks in the correct order to avoid trashing the kernel stack, and you'd risk kernel stack overflow if the kernel is preempted many times. However, if the kernel made sure it has nothing on the kernel stack before allowing itself to be preempted then these problems go away.
IMHO this is what preemption points are - a way to allow preemption in kernels that have a single shared kernel stack.
BTW there are advantages to having a single shared kernel stack - RAM usage (e.g. 1024 tasks with 4 KB kernel stacks adds up to 4 MB of RAM, instead of just 4 KB for one shared stack) and caching (a single kernel stack would almost always be in the CPU's caches).
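A sketch of how a preemption point might look under that single-stack reading (my own guess; every name here is hypothetical):
Code: Select all
/* In a kernel with one shared kernel stack, preemption is only allowed
 * where nothing live remains on that stack: everything needed to
 * continue is parked in the task structure first, so the stack can be
 * reused by whichever task runs next. */
struct task {
    void *user_state;                  /* saved user-mode registers     */
    void (*resume)(struct task *);     /* continuation for this call    */
};

extern volatile int need_resched;      /* set by the timer interrupt    */
extern void switch_task(void);         /* resets the shared stack and   */
                                       /* runs the next task's resume() */

static void preemption_point(struct task *t, void (*resume)(struct task *))
{
    if (need_resched) {
        t->resume = resume;            /* nothing we need is on the stack */
        switch_task();                 /* does not return here            */
    }
}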
I also think it'd be extremely difficult to support SMP if you're using preemption points. I'm not sure that any real time systems actually use SMP though, and I can't think of how SMP can help "worst case" times regardless of how it's done. In general, SMP means that a CPU may need to wait for other CPUs to leave a critical section before it can enter that critical section, and more CPUs give longer "worst case" times - something you don't want in a real time system.
Cheers,
Brendan
Posted: Mon Mar 26, 2007 12:45 am
by mystran
QNX supports SMP.
Posted: Mon Mar 26, 2007 1:56 am
by Brendan
Hi,
mystran wrote:QNX supports SMP.
From this web page (on the QNX web site):
"In an SMP system, QNX Neutrino maintains this philosophy of only one thread in a preemptable and restartable kernel. The microkernel may be entered on any processor, but only one processor will be granted access at a time."
So, what happens if you've got a simple (hypothetical) kernel with 3 functions, where on a single-CPU computer function A is guaranteed to take 1 ms or less, function B is guaranteed to take 2 ms or less, and function C is guaranteed to take 3 ms or less? On a computer with 128 CPUs, in the worst case, function A may need to wait for 127 CPUs to finish executing function C. This means that with 128 CPUs function A is guaranteed to take 382 ms ("127 * 3 + 1" ms) or less.
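Generalizing that arithmetic (my notation, not from the QNX page): with N CPUs and a longest kernel function time T_max, the worst case for a function f becomes T_worst(f) = (N - 1) * T_max + T_f. Plugging in N = 128, T_max = 3 ms and T_f = 1 ms gives 127 * 3 + 1 = 382 ms, matching the figure above.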
Of course you're right - the idea of a real time kernel is to have guaranteed worst case times, rather than fast worst case times.
Cheers,
Brendan