Hi,
I've read many times that using the IA-32 hardware task switching mechanism is slower than using a software switching mechanism (e.g. saving registers on the stack).
With modern processors, is this still true? And if so, how much slower is the hardware method?
Thanks a lot,
pini
Task switching speed
Re:Task switching speed
I've read a few comparisons and they basically ended up concluding that while one method may be marginally slower than the other (which one depends on a lot of factors iirc), that difference is far overshadowed by things the processor has to do in both cases such as invalidating bits of cache and suchlike (I forget exactly what it has to do, but you get the point). So basically unless you're going for hard realtime there's not much in it.
Re:Task switching speed
Hi,
pini wrote: With modern processors, is this still true? And if so, how much slower is it to use the hardware method?
How much of the CPU state do you need to save and restore during a task switch?
For example, imagine you've got code like this:
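Code:
switchTasks:
    pushad                      ; save the general-purpose registers on the current stack
    selected_task = find another task to switch to
    if selected_task != current_task {
        do the task switch
    }
    popad                       ; restore the registers of whichever task is now current
    ret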
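Now let's assume that the OS uses flat paging only (where the same segment registers are re-used by all tasks and don't need to be changed during a task switch), that it doesn't use any LDTs, and that it uses "automatic FPU/MMX/SSE state saving".
Further, let's assume that the kernel is not re-entrant and the same kernel stack is used by all tasks.
In this case, the only things that need to be changed are ESP, CR3 and the TS flag in CR0. For an 80386 a hardware task switch costs around 303 cycles, while the "minimal software task switch" would cost around 15 cycles. For modern CPUs the difference is probably worse.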
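To put those cycle counts in perspective, here is a small sketch in C that turns them into CPU overhead. The 303 and 15 cycle figures are the ones quoted above; the function names are hypothetical, and the clock speed and switch rate used afterwards are purely illustrative assumptions, not measurements.

```c
#include <assert.h>

/* Cycle costs quoted above for an 80386. */
#define HW_SWITCH_CYCLES 303u
#define SW_SWITCH_CYCLES 15u

/* Cycles spent per second on task switching alone. */
unsigned long switch_cycles_per_second(unsigned cycles_per_switch,
                                       unsigned switches_per_second)
{
    return (unsigned long)cycles_per_switch * switches_per_second;
}

/* Fraction of the CPU (0.0 to 1.0) consumed by task switching. */
double switch_overhead(unsigned cycles_per_switch,
                       unsigned switches_per_second,
                       unsigned long clock_hz)
{
    assert(clock_hz > 0);
    return (double)switch_cycles_per_second(cycles_per_switch,
                                            switches_per_second)
           / (double)clock_hz;
}
```

Assuming a 33 MHz 80386 doing 1000 switches per second, switch_overhead(HW_SWITCH_CYCLES, 1000, 33000000UL) comes to a bit under 1% of the CPU, against roughly 0.05% for the software switch. That is about a 20x gap, although both numbers are small in absolute terms at that switch rate.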
Kemp wrote: I've read a few comparisons and they basically ended up concluding that while one method may be marginally slower than the other (which one depends on a lot of factors iirc), that difference is far overshadowed by things the processor has to do in both cases such as invalidating bits of cache and suchlike (I forget exactly what it has to do, but you get the point). So basically unless you're going for hard realtime there's not much in it.
I've read similar articles, and most of them are completely misleading.
For example, see the article quoted in the FAQ's context switching page. The main reason it is misleading is that it adds the cost of an interrupt to the cost of a task switch. For most OSs a task switch is often a side-effect of a system call which needs to occur regardless (e.g. blocking on "get_message", or sending a message to a higher priority task). Even for CPU bound tasks where the task switch is caused by the timer IRQ, the timer IRQ is usually needed to keep track of time anyway. For example:
Code:
timer_irq:
    current_time++;                                         /* time is updated on every IRQ */
    if(task_switch_time <= current_time) do_task_switch();  /* switch only when the time slice has expired */
    iret
In almost all cases the interrupt is needed regardless of whether a task switch occurs or not.
This is like saying cars are expensive because you need to build roads, and roads cost a lot. In reality the cost of roads has nothing to do with the cost of the car itself, just like IRQ and system call overhead has nothing to do with the cost of the task switch itself.
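The timer IRQ argument can be checked with a small user-space simulation in C. This is only a sketch: the struct, its fields other than current_time and task_switch_time, and the helper functions are hypothetical, and a counter stands in for do_task_switch(). It counts how many timer IRQs are handled versus how many of them actually lead to a task switch.

```c
/* Counters for a simulated timer IRQ; all names here are hypothetical. */
struct timer_sim {
    unsigned long current_time;      /* ticks since boot */
    unsigned long task_switch_time;  /* tick at which the current time slice ends */
    unsigned long time_slice;        /* length of a time slice, in ticks */
    unsigned long irq_count;         /* number of timer IRQs handled */
    unsigned long switch_count;      /* number of task switches performed */
};

void timer_sim_init(struct timer_sim *t, unsigned long time_slice)
{
    t->current_time = 0;
    t->task_switch_time = time_slice;
    t->time_slice = time_slice;
    t->irq_count = 0;
    t->switch_count = 0;
}

/* Models one timer IRQ: time is always updated, a switch happens only sometimes. */
void timer_irq(struct timer_sim *t)
{
    t->irq_count++;
    t->current_time++;
    if (t->task_switch_time <= t->current_time) {
        t->switch_count++;  /* stands in for do_task_switch() */
        t->task_switch_time = t->current_time + t->time_slice;
    }
}

/* Runs the simulated timer for a given number of ticks. */
void timer_sim_run(struct timer_sim *t, unsigned long ticks)
{
    for (unsigned long i = 0; i < ticks; i++)
        timer_irq(t);
}
```

With an assumed time slice of 10 ticks, running 100 ticks handles 100 IRQs but performs only 10 task switches. The other 90 interrupts were taken purely for timekeeping, so their cost is not part of the cost of a task switch.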
Of course there are other benefits to software task switching (like portability, not needing hacks to support more than 8190 tasks, etc.), and benefits to hardware task switching (I/O port permission bitmaps, easier virtual 8086 mode, etc.).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Pype.Clicker
Re:Task switching speed
Note that it's also possible to use _both_, getting the best of both options. In other words, you'd use e.g. pure software stack-switching when staying within the same process. On each switch you can check whether the incoming and outgoing threads use the same TSS, and do a hardware switch (after adjusting the next TSS, maybe) only if they're different. That gives you the opportunity to keep the "easier to handle vm86" benefit, or to serve any other purpose that could make you want a dedicated TSS for a specific (set of) threads ...
That's what's done in Clicker, at least.
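As a rough illustration of that decision (a sketch only, not Clicker's actual code), the check could look like this in C. The thread structure, its tss_selector field and the enum names are all hypothetical; the real stack switch and the far jump to a TSS selector are left out, since they need kernel-mode assembly.

```c
#include <stdint.h>

/* Hypothetical thread descriptor: only the field needed for the decision. */
struct thread {
    uint16_t tss_selector;  /* GDT selector of the TSS this thread runs under */
};

enum switch_kind {
    SWITCH_NONE,      /* same thread, nothing to do */
    SWITCH_SOFTWARE,  /* same TSS: plain software stack switch */
    SWITCH_HARDWARE   /* different TSS: hardware task switch */
};

/* Picks the cheapest switching method for a given pair of threads. */
enum switch_kind choose_switch(const struct thread *outgoing,
                               const struct thread *incoming)
{
    if (outgoing == incoming)
        return SWITCH_NONE;
    if (outgoing->tss_selector == incoming->tss_selector)
        return SWITCH_SOFTWARE;
    return SWITCH_HARDWARE;
}
```

Threads that share a TSS (the common case) take the cheap software path, and only a thread with a dedicated TSS, such as a vm86 one, pays for the hardware switch.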