Hi,
I already have pre-emptive task switching but would like to add support for DPC's in my OS. I.e. I want the timer ISR to "schedule" a DPC for actually carrying out the task switch once this is required. But I have some trouble wrapping my head around how to actually control the task switch with respect to switching of stacks etc.
How things work currently: In my current OS, all ISR's start by an assembly stub that saves all "caller saves" (according to the fastcall convention my o/s uses), then calls into the registered handler which is in C (it is not necessary for the stub to save the "callee saves" because those are implicitly saved by the C code). The assembly also keeps track of our IRQL using a global variables which it pushes on the stack and sets to 1 (and pops and writes back on the way out). In this way, the IRQL always contains a "1" if we are executing some interrupt procedure, and "0" if we are not. When the ISR C-code handler is done, before returning back we check if we are now changing IRQL from "1" to "0" - if so, another C function is called to execute everything in the DPC queue.Note also that my task switcher currently, when switching away only saves the "callee-saves" stack because whoever called it (either the C code in the timer ISR or say the yield() syscall code etc.) couldn't expect the other to be saved anyway. A consequence of this design is that when there is an time-interrupt-with-task-switch scenario, every user registers get saved exactly once. Initially the "caller saves" get saved by the interrupt ISR. Then later, once the interrupt ISR invokes the taskSwitch() function the callee saves are saved prior to the switch. A consequence is that the state of the "interuptee" is not saved in one uniform place but is rather scattered over the kernel stack.
Anyway, so how would a task switch work in the DPC model? The ISR puts in a taskSwitch DPC in the queue and exits. Once all other ISR's are done, the stub will when about to change to IRQL=0 execute the DPCs. At some point it will get to the taskSwitch DPC. So what should this do? It could switch in one of the other tasks, but then it will be "switching away" from the current location in the code which will be in the middle of the taskSwitch DPC (which will thus be resumed later). This seems like a weird stack to break away from! Ideally, I would like it to break away more cleanly, maybe at the point-of-view the ISR entry or exit (the part that triggered the whole thing). What do o/s'es typically do? I.e. what do their taskSwitch DPC do and what does the stack look like once they "leave"?
Technical detail on task switching from DPC code
Re: Technical detail on task switching from DPC code
Hi,
For task switches; if anything tries to do a task switch while the current priority is non-zero I'd just set a (per CPU) "task switch was postponed" flag and wouldn't do the task switch at all; and then when the current priority is decremented to zero (in the "lower_current_priority(target_priority)" function) I'd check if the "task switch was postponed" was set and do the task switch then. Of course (depending on how the scheduler works) this could be improved by keeping track of "task ID to switch to", where if only one task switch was postponed and it was a switch to a specific task the scheduler can switch to that specific task (and avoid the overhead of figuring out which task to switch to), and where this can be set to "none" (if there are multiple postponed task switches and/or something didn't want to switch to a specific task) to make the scheduler figure out which task to switch to itself. In any case; if 20 different deferred procedure calls all ask for a task switch then only one task switch actually happens, and you'd avoid the overhead of 19 pointless task switches.
Also; once you have a "deferred procedure call" system like this, you can use it to influence the local APIC's "task priority register", to create a relationship between "deferred procedure call priority" and "local APIC IRQ priority". In this case the local APIC could/would automatically postpone/defer IRQs that are less important than the currently executing deferred procedure call.
Cheers,
Brendan
Assume that (for each CPU) you have a "current priority level" variable and 255 queues of function pointers (one for each non-zero priority level). When something calls "do_deferred_procedure_call(function, function_priority)" you compare the function's priority to the current priority level, and if it's lower priority you put the function pointer onto the queue corresponding to the function's priority, and if it's higher priority you change the current priority level and call the function and when the function returns you restore the previous/old current priority level using a "lower_current_priority(target_priority)" function. That "lower_current_priority()" function might have a loop that decrements "current_priority" then checks if the queue for the new current priority contains any deferred functions and calls them one at a time if there is. Note that this can not be done with a single stack (and can't be done with the kernel stack) because you need to do any deferred procedure calls in order of priority and not in "first in last out" order.kenz wrote:I already have pre-emptive task switching but would like to add support for DPC's in my OS. I.e. I want the timer ISR to "schedule" a DPC for actually carrying out the task switch once this is required. But I have some trouble wrapping my head around how to actually control the task switch with respect to switching of stacks etc.
For task switches; if anything tries to do a task switch while the current priority is non-zero I'd just set a (per CPU) "task switch was postponed" flag and wouldn't do the task switch at all; and then when the current priority is decremented to zero (in the "lower_current_priority(target_priority)" function) I'd check if the "task switch was postponed" was set and do the task switch then. Of course (depending on how the scheduler works) this could be improved by keeping track of "task ID to switch to", where if only one task switch was postponed and it was a switch to a specific task the scheduler can switch to that specific task (and avoid the overhead of figuring out which task to switch to), and where this can be set to "none" (if there are multiple postponed task switches and/or something didn't want to switch to a specific task) to make the scheduler figure out which task to switch to itself. In any case; if 20 different deferred procedure calls all ask for a task switch then only one task switch actually happens, and you'd avoid the overhead of 19 pointless task switches.
Also; once you have a "deferred procedure call" system like this, you can use it to influence the local APIC's "task priority register", to create a relationship between "deferred procedure call priority" and "local APIC IRQ priority". In this case the local APIC could/would automatically postpone/defer IRQs that are less important than the currently executing deferred procedure call.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Technical detail on task switching from DPC code
Hi Brendan
Thanks! I was starting to think along the same lines with respect to deferring the task switch until after all DPC's had completed (and before transitioning back to "normal" user/kernel mode from the first ISR that was triggered). It feels a bit like cheating because task switching gets special help but I don't see any way around it and it is nice and clean than the break-off boundary is always just before the transition back away from the ISR.
Regarding priorities, wouldn't ISR's be expected to always have higher priority than any DPC? So scheduling a DPC means simply adding it to a list, it will never be executed directly?
Then upon exit of the first ISR, we will execute the DPC's one-by-one in order of priority? In this way, no special stack magic etc. would be required. If a lower-priority DPC schedules a higher priority DPC then we can simply execute the DPC immediately. However, a case that might require special treatment is if a lower-priority DPC is interrupted by an ISR which then schedules a highere-priority DPC. When that ISR exits it won't execute DPCs (it knows there's another ISR active) so when that exits we go right back into the lower-priority DPC which will continue executing. Once that is done, we will then start work on the then highest-proritized DPC, but it could be argued it it is wrong that the lower-priority DPC was allowed to execute while a higher-priority DPC was waiting. Unless there's some easy solution to this I think I will accept this for now and get everything else working/stable because then management of DPCs gets simpler. Also, I don't have at the moment DPCs in mind that are extremely acute (or long-lived) where this priority might matter - the most important thing for me right now is that everything can be pre-empted by ISR's and that the ISR's are fast and that all non-acute stuff can execute in the DPC's under more normal conditions. What do you think about this? Also it might be useful (And interesting!) for more to think about the more general/correct solution for DPC's but I am a bit unsure about all the stack mess.
Thanks! I was starting to think along the same lines with respect to deferring the task switch until after all DPC's had completed (and before transitioning back to "normal" user/kernel mode from the first ISR that was triggered). It feels a bit like cheating because task switching gets special help but I don't see any way around it and it is nice and clean than the break-off boundary is always just before the transition back away from the ISR.
Regarding priorities, wouldn't ISR's be expected to always have higher priority than any DPC? So scheduling a DPC means simply adding it to a list, it will never be executed directly?
Then upon exit of the first ISR, we will execute the DPC's one-by-one in order of priority? In this way, no special stack magic etc. would be required. If a lower-priority DPC schedules a higher priority DPC then we can simply execute the DPC immediately. However, a case that might require special treatment is if a lower-priority DPC is interrupted by an ISR which then schedules a highere-priority DPC. When that ISR exits it won't execute DPCs (it knows there's another ISR active) so when that exits we go right back into the lower-priority DPC which will continue executing. Once that is done, we will then start work on the then highest-proritized DPC, but it could be argued it it is wrong that the lower-priority DPC was allowed to execute while a higher-priority DPC was waiting. Unless there's some easy solution to this I think I will accept this for now and get everything else working/stable because then management of DPCs gets simpler. Also, I don't have at the moment DPCs in mind that are extremely acute (or long-lived) where this priority might matter - the most important thing for me right now is that everything can be pre-empted by ISR's and that the ISR's are fast and that all non-acute stuff can execute in the DPC's under more normal conditions. What do you think about this? Also it might be useful (And interesting!) for more to think about the more general/correct solution for DPC's but I am a bit unsure about all the stack mess.
Re: Technical detail on task switching from DPC code
ISRs have always higher IRQLs (DIRQLs), DPC have their own IRQL, DPC_LEVEL (DISPATCH_LEVEL), ISR queues DPC into the queue (per CPU), and does nothing more, then when IRQL lowers to DPC_LEVEL, DPCs are "drained" one by one at this CPU. scheduler (dispatcher) routines are DPCs.Regarding priorities, wouldn't ISR's be expected to always have higher priority than any DPC? So scheduling a DPC means simply adding it to a list, it will never be executed directly?
Re: Technical detail on task switching from DPC code
Hi,
You need to define a relationship between "IRQ priority", "DPC priority" and "task priority" for your OS. Maybe none of the priority ranges overlap (e.g. all tasks are always lower priority than all DPCs, all DPCs are always lower priority than all IRQs"); and maybe some of the ranges overlap (e.g. a very high priority task can be higher priority than a very low priority DPC and/or a very high priority DPC can be higher priority than a very low priority IRQ); and maybe all of the ranges overlap perfectly to form a single standard priority system used by everything (e.g. "task priority N == DPC priority N == IRQ priority N").
What if (at the start of every ISR) you recorded "interrupted code's priority", and at the end of every ISR (e.g. just before the final "IRET") you executed all DPCs that are higher priority than the interrupted code's priority? In this case there's no reason to treat the first ISR as special at all; and if an ISR interrupts a low priority DPC and starts a high priority DPC, then the high priority DPC will be executed immediately before the ISR returns to the lower priority DPC.
Don't forget that DPCs may be used for things that have nothing to do with IRQs at all. For example, the kernel might use a higher priority DPC that does "acquire spinlock; do something with the data protected by the spinlock; release spinlock" so that any extra work that it notices while the lock is held can conveniently (without extra data structures/hassle) be deferred until after the lock is released (to allow the lock to be released sooner and improve lock contention, and reduce the chance that 50 other CPUs are all spinning/waiting). Of course if the DPC priority range and the IRQ priority range overlap (if high priority DPCs can be higher priority than low priority ISRs) and if local APIC's "task priority register" is effected by DPC priority; then you can also use a DPC to make sure that unimportant IRQs don't interrupt while you're holding a lock (to improve lock contention more). Maybe (if you can get the overhead of DPCs low enough) it might even makes sense to give each lock a priority level and having something like "do_function_with_lock(function, lock);" that increases the current priority to the lock's priority if necessary (if the current priority was lower than the lock's priority) as part of acquiring the lock, and then restores the priority when the lock is released.
Cheers,
Brendan
I wouldn't even assume that all IRSs always have higher priority than some user-space processes (e.g. for a "hard-real time" system and/or micro-kernel).kenz wrote:Regarding priorities, wouldn't ISR's be expected to always have higher priority than any DPC?
You need to define a relationship between "IRQ priority", "DPC priority" and "task priority" for your OS. Maybe none of the priority ranges overlap (e.g. all tasks are always lower priority than all DPCs, all DPCs are always lower priority than all IRQs"); and maybe some of the ranges overlap (e.g. a very high priority task can be higher priority than a very low priority DPC and/or a very high priority DPC can be higher priority than a very low priority IRQ); and maybe all of the ranges overlap perfectly to form a single standard priority system used by everything (e.g. "task priority N == DPC priority N == IRQ priority N").
Imagine you have a function called "foo" that occasionally starts a medium priority DPC; where "foo" is called from lots of different places - sometimes "foo" is called from code running at a high priority and sometimes "foo" is called from code running at a low priority. Whether or not the DPC that "foo" starts is executed immediately or is deferred depends on what priority the caller was using before it called "foo".kenz wrote:So scheduling a DPC means simply adding it to a list, it will never be executed directly?
To me, it seems like the cause of the problem is in your first sentence ("upon exit of the first ISR").kenz wrote:Then upon exit of the first ISR, we will execute the DPC's one-by-one in order of priority? In this way, no special stack magic etc. would be required. If a lower-priority DPC schedules a higher priority DPC then we can simply execute the DPC immediately. However, a case that might require special treatment is if a lower-priority DPC is interrupted by an ISR which then schedules a highere-priority DPC. When that ISR exits it won't execute DPCs (it knows there's another ISR active) so when that exits we go right back into the lower-priority DPC which will continue executing. Once that is done, we will then start work on the then highest-proritized DPC, but it could be argued it it is wrong that the lower-priority DPC was allowed to execute while a higher-priority DPC was waiting. Unless there's some easy solution to this I think I will accept this for now and get everything else working/stable because then management of DPCs gets simpler. Also, I don't have at the moment DPCs in mind that are extremely acute (or long-lived) where this priority might matter - the most important thing for me right now is that everything can be pre-empted by ISR's and that the ISR's are fast and that all non-acute stuff can execute in the DPC's under more normal conditions. What do you think about this? Also it might be useful (And interesting!) for more to think about the more general/correct solution for DPC's but I am a bit unsure about all the stack mess.
What if (at the start of every ISR) you recorded "interrupted code's priority", and at the end of every ISR (e.g. just before the final "IRET") you executed all DPCs that are higher priority than the interrupted code's priority? In this case there's no reason to treat the first ISR as special at all; and if an ISR interrupts a low priority DPC and starts a high priority DPC, then the high priority DPC will be executed immediately before the ISR returns to the lower priority DPC.
Don't forget that DPCs may be used for things that have nothing to do with IRQs at all. For example, the kernel might use a higher priority DPC that does "acquire spinlock; do something with the data protected by the spinlock; release spinlock" so that any extra work that it notices while the lock is held can conveniently (without extra data structures/hassle) be deferred until after the lock is released (to allow the lock to be released sooner and improve lock contention, and reduce the chance that 50 other CPUs are all spinning/waiting). Of course if the DPC priority range and the IRQ priority range overlap (if high priority DPCs can be higher priority than low priority ISRs) and if local APIC's "task priority register" is effected by DPC priority; then you can also use a DPC to make sure that unimportant IRQs don't interrupt while you're holding a lock (to improve lock contention more). Maybe (if you can get the overhead of DPCs low enough) it might even makes sense to give each lock a priority level and having something like "do_function_with_lock(function, lock);" that increases the current priority to the lock's priority if necessary (if the current priority was lower than the lock's priority) as part of acquiring the lock, and then restores the priority when the lock is released.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Technical detail on task switching from DPC code
a. If an activity is accessing private resource, it is not adversely affected (except performance-wise) by interruptions and simultaneous executions of other activities. The activity may complete after an unknown delay and with several interruptions, but its correctness is not affected.
b. If an activity is accessing shared resource, it is affected by the interruptions and simultaneous executions of its peer activities, since the accesses, in the general case, are non-atomic with respect to its peer activities. A peer is an activity which accesses the same resource which a given activity too does. Such an activity can either (b.1) handle the
interruption/simultaneous execution on its own (bulk of context switching, for e.g.), or (b.2) cannot handle the interruption/simultaneous execution on its own (synchronization with devices and other threads for e.g.)
Below I consider uniprocessor semantics alone.
Although type.a activities are not affected by ISE, in order to provide fairness, they are run with their importance elevated to the importance of the resource being accessed.
If one looks at the stack which has been switched away, its entire stack (save for the some frames at the top) consists of frames for activities of type.a. The top consists of the type.a and type.b.1 activities which carry out the actual switch.
An activity of certain importance x, A(x), can be interrupted by an activity of a higher importance x + k, A(x+k). While the latter runs, the system must allow accumulation of same-or-lower-importance activities. When A(x+k) finishes, the system must stagger down in an ordered fashion running any pending activities A(x+k), A(x+k-1), A(x+k-2), ..., in that order, before returning to A(x).
The flow is:
Regardless of their type, all activities share time as a resource. In the OP, the importance which is assigned to time is the same as assigned to the deferred calls of activities with importance higher than that of time. The importance/level is
labelled as "dpc-level".
The activity which assigns time to other activities is the context switching activity, taskSwitch.
It runs either
When the quota-expiration is signalled from a level higher than dpc-level, the switch-dpc activity is queued.
If the quota-expiration is signalled from a level lower than dpc-level, the voluntary switching of activity can be forced.
Regardless of the way the expiration is signalled, at some point in time, the dpc-level is reached and, the loop, which retires the queued dpcs, runs:
Voluntary switching is requested as:
The run-dpcs has an activity which accesses the shared resource, dpc-queue. Its importance is
, which may be higher than dpc-level. Hence, sync/drop-sync is required.
Some definitions:
(1) out-going-involuntary, in-coming-involuntary:
taskSwitch enters at Point X0 as part of one activity, and emerges at Point Y0 as part of the same or a different activity.The loop still works as it does not maintain any stale references. If it did, refreshing them and deciding based on that is enough.
(2) out-going-involuntary, in-coming-voluntary:
taskSwitch enters at Point X0 as part of one activity, and emerges at Point Y1 as part of a different activity. The drop-sync call ensures that pending dpcs are run.
(3) out-going-voluntary, in-coming-involuntary:
taskSwitch enters at Point X1 as part of one activity, and emerges at Point Y0 as part of a different activity. The loop still works as it does not maintain any stale references. If it did, refreshing them and deciding based on that is enough.
(4) out-going-voluntary, in-coming-voluntary:
taskSwitch is enters at Point X1 as part of one activity, and emerges at Point Y1 as part of the same or a different activity.The drop-sync call ensures that pending dpcs are run.
As far as running the dpcs is considered, changing threads from under the dpc-processing does not affect it.
About nesting of interrupts:
If nested interrupts are allowed, (or even if they are not they occur when running at lower levels such as the dpc-level), then multiple instances of the kernel's common interrupt handler activity (which also executes run-dpcs) will be on the stack.
Suppose IH0 is the very first instance. It records this fact atomically in the thread's context area. At any time if IH1 is pushed on top of IH0 (because of another interrupt), IH1 can know that it is not the 'primary' interrupt context, and so it can service the interrupts and let the IH0 instance deal with dpcs. IH1 knows that IH0 is still active because of the flag, and IH0 knows that, during its dpc-level processing, further dpcs can be queued.
Multiple instances of an activity can decide on the way to distribute the work. The wierdness of the stack stems from the task of deciding the freshness of the data which the interrupted activities were accessing around the switching boundary.
Suppose that each IHx instance is allowed to process dpcs, instead of only IH0. That would results in several instances of type.a and type.b.1 activities stacked on top of each other. The priorities of the dpcs would be inverted: If IH0 is about to run a dpc0, IH1 comes along and dequeues dpc1 which was later in the queue than dpc0, and begins running dpc1.
If, at level IHx, x > 1, the taskSwitch ran, the nested IH instances will be stashed away while the thread sleeps. Upon returning at Point Y0 after some amount of delay, if the IH instances refresh stale context, they can undo their stack and return back to the original user activity which was interrupted. Such a scheme requires IH instances and the dpcs to be of type.b.1, a quite complicated scheme.
About sub-priorities of DPCs:
If dpcs themselves have sub-priorities, then regard them as several distinct, levels.
b. If an activity is accessing shared resource, it is affected by the interruptions and simultaneous executions of its peer activities, since the accesses, in the general case, are non-atomic with respect to its peer activities. A peer is an activity which accesses the same resource which a given activity too does. Such an activity can either (b.1) handle the
interruption/simultaneous execution on its own (bulk of context switching, for e.g.), or (b.2) cannot handle the interruption/simultaneous execution on its own (synchronization with devices and other threads for e.g.)
Below I consider uniprocessor semantics alone.
Although type.a activities are not affected by ISE, in order to provide fairness, they are run with their importance elevated to the importance of the resource being accessed.
If one looks at the stack which has been switched away, its entire stack (save for the some frames at the top) consists of frames for activities of type.a. The top consists of the type.a and type.b.1 activities which carry out the actual switch.
An activity of certain importance x, A(x), can be interrupted by an activity of a higher importance x + k, A(x+k). While the latter runs, the system must allow accumulation of same-or-lower-importance activities. When A(x+k) finishes, the system must stagger down in an ordered fashion running any pending activities A(x+k), A(x+k-1), A(x+k-2), ..., in that order, before returning to A(x).
The flow is:
Code: Select all
sync(to = x):
raise-to-level-(x)
drop-sync(from = x, to = 0):
perform-pending-activities-A(x)
goto-level-(x-1)
perform-pending-activities-A(x-1)
goto-level-(x-2)
perform-pending-activities-A(x-2)
.
.
.
goto-level-(0)
perform-pending-activities-A(0)
labelled as "dpc-level".
The activity which assigns time to other activities is the context switching activity, taskSwitch.
It runs either
- involuntarily upon the quota-expiration of the current activity, or
- voluntarily upon the request of the current activity.
When the quota-expiration is signalled from a level higher than dpc-level, the switch-dpc activity is queued.
If the quota-expiration is signalled from a level lower than dpc-level, the voluntary switching of activity can be forced.
Regardless of the way the expiration is signalled, at some point in time, the dpc-level is reached and, the loop, which retires the queued dpcs, runs:
Code: Select all
run-dpcs // a A(dpc-level) activity
while (1) {
sync(max)
dequeue-dpc-from-listhead
drop-sync(max, dpc-level)
if-done-then-exit
// Point X0
else-call-dpc-routine // taskSwitch runs here
// Point Y0
}
Code: Select all
voluntary:
c = curr_level;
sync(dpc-level)
// Point X1
taskSwitch
// Point Y1
drop-sync(dpc-level, c)
Code: Select all
max = maximum(importance of all activities which access it)
Some definitions:
- out-going-involuntary is the current activity being forced to give up time.
- in-coming-involuntary is the new activity which went to sleep because it was forced to give up time.
- out-going-voluntary is the current activity voluntarily giving up time.
- in-coming-voluntary is teh new activity which went to sleep voluntarily.
(1) out-going-involuntary, in-coming-involuntary:
taskSwitch enters at Point X0 as part of one activity, and emerges at Point Y0 as part of the same or a different activity.The loop still works as it does not maintain any stale references. If it did, refreshing them and deciding based on that is enough.
(2) out-going-involuntary, in-coming-voluntary:
taskSwitch enters at Point X0 as part of one activity, and emerges at Point Y1 as part of a different activity. The drop-sync call ensures that pending dpcs are run.
(3) out-going-voluntary, in-coming-involuntary:
taskSwitch enters at Point X1 as part of one activity, and emerges at Point Y0 as part of a different activity. The loop still works as it does not maintain any stale references. If it did, refreshing them and deciding based on that is enough.
(4) out-going-voluntary, in-coming-voluntary:
taskSwitch is enters at Point X1 as part of one activity, and emerges at Point Y1 as part of the same or a different activity.The drop-sync call ensures that pending dpcs are run.
As far as running the dpcs is considered, changing threads from under the dpc-processing does not affect it.
About nesting of interrupts:
If nested interrupts are allowed, (or even if they are not they occur when running at lower levels such as the dpc-level), then multiple instances of the kernel's common interrupt handler activity (which also executes run-dpcs) will be on the stack.
Suppose IH0 is the very first instance. It records this fact atomically in the thread's context area. At any time if IH1 is pushed on top of IH0 (because of another interrupt), IH1 can know that it is not the 'primary' interrupt context, and so it can service the interrupts and let the IH0 instance deal with dpcs. IH1 knows that IH0 is still active because of the flag, and IH0 knows that, during its dpc-level processing, further dpcs can be queued.
Multiple instances of an activity can decide on the way to distribute the work. The wierdness of the stack stems from the task of deciding the freshness of the data which the interrupted activities were accessing around the switching boundary.
Suppose that each IHx instance is allowed to process dpcs, instead of only IH0. That would results in several instances of type.a and type.b.1 activities stacked on top of each other. The priorities of the dpcs would be inverted: If IH0 is about to run a dpc0, IH1 comes along and dequeues dpc1 which was later in the queue than dpc0, and begins running dpc1.
If, at level IHx, x > 1, the taskSwitch ran, the nested IH instances will be stashed away while the thread sleeps. Upon returning at Point Y0 after some amount of delay, if the IH instances refresh stale context, they can undo their stack and return back to the original user activity which was interrupted. Such a scheme requires IH instances and the dpcs to be of type.b.1, a quite complicated scheme.
About sub-priorities of DPCs:
If dpcs themselves have sub-priorities, then regard them as several distinct, levels.
Re: Technical detail on task switching from DPC code
Hi Brendan, thanks for your detailed reply. One thing I am wondering about are why is it necessary with dedicated DPC stacks? Why can't I just run everything off the kernel stack? Also, is one DPC stack (per processor - which means 1 since I am currently only focusing on uniprocessor) sufficient or do I need more DPC stacks (I suppose I would have to reuse them)? I mean, even if a DPC is already running, if an interrupt comes that ISR will execute off the kernel stack (interrupting the DPC) and if that starts another high priority DPC that DPC will execute as part of the ending of that high-priority ISR (still "below" the first interrupted DPC) which should work too. So there doesn't seem to be a need for more kernel stacks?
Finally another thing, which is not strictly DPC related (but maybe related to above) - I am still wondering a bit about how I can prevent deadlocks in the o/s. As I understand it, the DPC priority mechanism comes into play here by enforcing an ordering of locks. But let's take the following simple scenario. Suppose that the internal memory allocation function in the o/s need a lock "MemLock". Suppose then some DPC (spawn off by an ISR) needs to allocate some memory and grabs that lock. While holding the lock, some higher priority ISR comes in and starts a new high priority DPC (which will run immediately upon exit of that ISR), that DPC also needs memory and will also call some function that needs the MemLock. Now that DPC can wait forever. I can see two solutions:
1) The fact that a higher priority task now needs something owned by a low-priority task would automatically "elevate" the low priority task to a high priority task so that would now be "switched in" so it can complete its work and release the lock - I can see that this solution requires more stacks because you can't of course switch between things running on the same stack!
2) Maybe what should have happened was that the first DPC, at the point where it called its memory allocation function and grabbed the MemLock, it would automatically have had its priority raised (due to now being the owner of a high priority lock - requires all locks have an explicitly configured priority). If you combine this with a rule that only something of higher priority can interrupt something with lower priority, then you can be sure the above scenario can never happen - the second DPC will not execute until the first completes.
Maybe there are other options?
Finally another thing, which is not strictly DPC related (but maybe related to above) - I am still wondering a bit about how I can prevent deadlocks in the o/s. As I understand it, the DPC priority mechanism comes into play here by enforcing an ordering of locks. But let's take the following simple scenario. Suppose that the internal memory allocation function in the o/s need a lock "MemLock". Suppose then some DPC (spawn off by an ISR) needs to allocate some memory and grabs that lock. While holding the lock, some higher priority ISR comes in and starts a new high priority DPC (which will run immediately upon exit of that ISR), that DPC also needs memory and will also call some function that needs the MemLock. Now that DPC can wait forever. I can see two solutions:
1) The fact that a higher priority task now needs something owned by a low-priority task would automatically "elevate" the low priority task to a high priority task so that would now be "switched in" so it can complete its work and release the lock - I can see that this solution requires more stacks because you can't of course switch between things running on the same stack!
2) Maybe what should have happened was that the first DPC, at the point where it called its memory allocation function and grabbed the MemLock, it would automatically have had its priority raised (due to now being the owner of a high priority lock - requires all locks have an explicitly configured priority). If you combine this with a rule that only something of higher priority can interrupt something with lower priority, then you can be sure the above scenario can never happen - the second DPC will not execute until the first completes.
Maybe there are other options?
Re: Technical detail on task switching from DPC code
Hi,
If you're using the kernel stack, then "bar()" executes at high priority (same as "foo()") and creates a DPC on the kernel stack then returns to its caller (which restores the kernel stack back how it was and deletes its DPC before it can be executed).
If you're using one addition stack that is not the kernel stack (to fix the first problem); then the low priority DPC (which was created last) will be at the top of the stack and will be executed before the medium priority DPC (which was created first) is executed. In other words, it'll be broken and wrong because DPCs will be executed in "FILO order" and not in the order of their priority.
If you have a single list of DPCs, then you'd have to search to fix the second problem (either keep the list sorted and search for the correct place to insert new DPCs when they're created; or keep the list unsorted and search for the correct DPC to start). That searching is bad for performance. Using "one list per priority" avoids the need to search and therefore improves performance.
Note that this is why I was saying "it might even makes sense to give each lock a priority level and having something like "do_function_with_lock(function, lock);" that increases the current priority to the lock's priority if necessary" last time - it'd mean that most code could just acquire and release locks (without extra function calls to manage DPCs and without caring about DPCs) and it'd mean that global lock order is always followed. Of course there are cases where this can't work (e.g. lock acquired by one CPU and released by a different CPU), but those cases are relatively rare (and relatively bizarre), and it could still be beneficial for the majority of cases even if it's not useful for a minority of cases.
For "correct": while holding the lock, some ISR comes in and starts a new DPC that must be equal or lower priority than the priority of all code that could have acquired the lock already; and therefore that new DPC is deferred until after the lock is released.
Cheers,
Brendan
Assume you have something like this:kenz wrote:Hi Brendan, thanks for your detailed reply. One thing I am wondering about are why is it necessary with dedicated DPC stacks? Why can't I just run everything off the kernel stack? Also, is one DPC stack (per processor - which means 1 since I am currently only focusing on uniprocessor) sufficient or do I need more DPC stacks (I suppose I would have to reuse them)?
Code: Select all
void foo(void) { // Foo is executed as a high priority DPC
bar(); // Bar is a normal function that creates a medium priority DPC
baz(); // Baz is a normal function that creates a low priority DPC
}
If you're using one addition stack that is not the kernel stack (to fix the first problem); then the low priority DPC (which was created last) will be at the top of the stack and will be executed before the medium priority DPC (which was created first) is executed. In other words, it'll be broken and wrong because DPCs will be executed in "FILO order" and not in the order of their priority.
If you have a single list of DPCs, then you'd have to search to fix the second problem (either keep the list sorted and search for the correct place to insert new DPCs when they're created; or keep the list unsorted and search for the correct DPC to start). That searching is bad for performance. Using "one list per priority" avoids the need to search and therefore improves performance.
There are only 2 cases where this could possibly work. The first case is if DPCs happen to be created in "lowest priority first" order (regardless of when they're created and how many completely separate IRQs create them), which is almost impossible to ensure and therefore useless. The second case is if you only have "ISR priority" and "everything else priority" and nothing else; where there's no point using DPCs in normal code or from within other DPCs, and where you've lost so much of the flexibility that it's no longer worth bothering with DPCs at all.kenz wrote:I mean, even if a DPC is already running, if an interrupt comes that ISR will execute off the kernel stack (interrupting the DPC) and if that starts another high priority DPC that DPC will execute as part of the ending of that high-priority ISR (still "below" the first interrupted DPC) which should work too. So there doesn't seem to be a need for more kernel stacks?
Yes; a very common way to prevent deadlocks is to establish a "global lock order" to ensure that locks are always acquired in a specific order and always released in the reverse order; and one of the main uses of DPCs is to ensure that the "global lock order" is correctly followed - any function that can't be executed immediately because acquiring the locks it needs would break "global lock order" is deferred until it can acquire the locks in "global lock order". This means establishing a strict relationship between "DPC priority" and "global lock order". If you don't do that, then DPCs are being used wrong and you get deadlocks.kenz wrote:Finally another thing, which is not strictly DPC related (but maybe related to above) - I am still wondering a bit about how I can prevent deadlocks in the o/s. As I understand it, the DPC priority mechanism comes into play here by enforcing an ordering of locks. But let's take the following simple scenario. Suppose that the internal memory allocation function in the o/s need a lock "MemLock". Suppose then some DPC (spawn off by an ISR) needs to allocate some memory and grabs that lock.
Note that this is why I was saying "it might even makes sense to give each lock a priority level and having something like "do_function_with_lock(function, lock);" that increases the current priority to the lock's priority if necessary" last time - it'd mean that most code could just acquire and release locks (without extra function calls to manage DPCs and without caring about DPCs) and it'd mean that global lock order is always followed. Of course there are cases where this can't work (e.g. lock acquired by one CPU and released by a different CPU), but those cases are relatively rare (and relatively bizarre), and it could still be beneficial for the majority of cases even if it's not useful for a minority of cases.
This is an example of using DPCs wrong - by failing to follow a strict relationship between "DPC priority" and "global lock order" you've failed to ensure that locks are acquired in the correct order, and the end result is deadlocks.kenz wrote:While holding the lock, some higher priority ISR comes in and starts a new high priority DPC (which will run immediately upon exit of that ISR), that DPC also needs memory and will also call some function that needs the MemLock. Now that DPC can wait forever.
For "correct": while holding the lock, some ISR comes in and starts a new DPC that must be equal or lower priority than the priority of all code that could have acquired the lock already; and therefore that new DPC is deferred until after the lock is released.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.