Task switching speed

Post by **pini** » Sun Feb 12, 2006 7:39 am

Hi,

I've read many times that the IA-32 hardware task switching mechanism is slower than a software switching mechanism (e.g. saving registers on the stack).
With modern processors, is this still true? And if so, how much slower is the hardware method?

Thanks a lot,

pini

Post by **Kemp** » Sun Feb 12, 2006 12:06 pm

I've read a few comparisons and they basically ended up concluding that while one method may be marginally slower than the other (which one depends on a lot of factors iirc), that difference is far overshadowed by things the processor has to do in both cases such as invalidating bits of cache and suchlike (I forget exactly what it has to do, but you get the point). So basically unless you're going for hard realtime there's not much in it.

Post by **Brendan** » Sun Feb 12, 2006 7:48 pm

Hi,

pini wrote: With modern processors, is this still true? And if so, how much slower is the hardware method?

How much of the CPU state do you need to save and restore during a task switch?

For example, imagine you've got code like this:

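switchTasks:
    pushad                                          ; save the general-purpose registers
    selected_task = find another task to switch to  ; scheduler decision (pseudocode)
    if selected_task != current_task {
        do the task switch                          ; only when a different task was chosen
    }
    popad                                           ; restore the general-purpose registers
    ret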

Now let's assume that the OS uses flat paging only (where the same segment registers are re-used by all tasks and don't need to be changed during a task switch), that it doesn't use any LDTs, and that it uses "automatic FPU/MMX/SSE state saving".

Further, let's assume that the kernel is not re-entrant and the same kernel stack is used by all tasks.

In this case, the only things that need to be changed are ESP, CR3 and the TS flag in CR0. For an 80386 a hardware task switch costs around 303 cycles, while the "minimal software task switch" would cost around 15 cycles. For modern CPUs the difference is probably even larger.
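To make the "minimal software task switch" a bit more concrete, here is a rough sketch in plain C. It only models the three pieces of state listed above in ordinary structs, so it can be compiled and run anywhere; a real kernel would do this with a few privileged instructions instead. The names `cpu_model`, `task`, `saved_esp`, `page_directory` and `minimal_switch` are made up for illustration and don't come from any particular OS.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_TS (1u << 3)  /* Task Switched flag: bit 3 of CR0 */

/* Hypothetical stand-in for the three pieces of CPU state named above. */
struct cpu_model {
    uint32_t esp;
    uint32_t cr0;
    uint32_t cr3;
};

/* Hypothetical per-task record; the field names are illustrative only. */
struct task {
    uint32_t saved_esp;       /* stack pointer saved when the task was switched out */
    uint32_t page_directory;  /* value to load into CR3 for this task */
};

/* Model of the minimal switch: only ESP, CR3 and CR0.TS are touched. */
void minimal_switch(struct cpu_model *cpu, struct task *from, const struct task *to)
{
    assert(cpu && from && to);
    from->saved_esp = cpu->esp;     /* remember where the old task was */
    cpu->esp = to->saved_esp;       /* switch stacks */
    cpu->cr3 = to->page_directory;  /* switch address spaces */
    cpu->cr0 |= CR0_TS;             /* next FPU/MMX/SSE instruction traps, so that
                                       state can be saved and restored lazily */
}
```

Nothing else is saved or restored here, which is why the software path can be so short under the assumptions above.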

Kemp wrote:I've read a few comparisons and they basically ended up concluding that while one method may be marginally slower than the other (which one depends on a lot of factors iirc), that difference is far overshadowed by things the processor has to do in both cases such as invalidating bits of cache and suchlike (I forget exactly what it has to do, but you get the point). So basically unless you're going for hard realtime there's not much in it.

I've read similar articles, and most of them are completely misleading.

For example, see the article quoted in the FAQ's context switching page. It is misleading mainly because it adds the cost of an interrupt to the cost of a task switch. For most OSs a task switch is often a side-effect of a system call which needs to occur regardless (e.g. blocking on "get_message", or a task switch caused by sending a message to a higher priority task). Even for CPU bound tasks where the task switch is caused by the timer IRQ, the timer IRQ is usually needed to keep track of time anyway. For example:

timer_irq:
    current_time++;

    if(task_switch_time <= current_time) do_task_switch();

    iret

In almost all cases the interrupt is needed regardless of whether a task switch occurs or not.

This is like saying cars are expensive because you need to build roads, and roads cost a lot. In reality the cost of roads has nothing to do with the cost of the car itself, just like IRQ and system call overhead have nothing to do with the cost of the task switch itself.
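The timer handler above can be modelled in plain C to show the same point with numbers. This is only a sketch: the interrupt entry and exit are left out, `do_task_switch` just counts and starts a new slice, and `TIME_SLICE`, `task_switches` and `timer_tick` are names assumed for the example.

```c
#include <stdint.h>

#define TIME_SLICE 10u  /* hypothetical time slice length, in ticks */

uint64_t current_time;      /* ticks since boot; needed whether or not we switch */
uint64_t task_switch_time;  /* tick at which the running task's slice ends */
unsigned task_switches;     /* counts switches, for illustration only */

/* Stand-in for the real switch: just count it and start a new slice. */
void do_task_switch(void)
{
    task_switches++;
    task_switch_time = current_time + TIME_SLICE;
}

/* Body of the timer IRQ handler, without the interrupt entry/exit. */
void timer_tick(void)
{
    current_time++;  /* timekeeping happens on every tick */
    if (task_switch_time <= current_time)
        do_task_switch();  /* only some ticks end in a task switch */
}
```

If `task_switch_time` starts at `TIME_SLICE` and `timer_tick()` is called 25 times, the clock advances 25 times but only two task switches happen. The other 23 interrupts would have been taken anyway, so their cost can't sensibly be charged to task switching.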

Of course there are other benefits to software task switching (like portability, not needing hacks to support more than 8190 tasks, etc), and benefits to hardware task switching (I/O port permission bitmaps, easier virtual 8086 mode, etc).

Cheers,

Brendan

Post by **Pype.Clicker** » Mon Feb 13, 2006 4:57 am

Note that it's also possible to use _both_, getting the best of both options. In other words, you'd use e.g. pure software stack-switching when staying within the same process. You could check whether the incoming and outgoing threads use the same TSS, and do a hardware switch (after adjusting the next TSS, maybe) only if they're different. That would give you the opportunity to keep "easier to handle vm86", or to serve any other purpose that makes you want a dedicated TSS for a specific thread or set of threads ...

That's what's done in Clicker, at least.
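The decision described above could look roughly like the following C sketch. To be clear, this is not Clicker's actual code: `struct tss`, `struct thread`, `enum switch_kind` and `pick_switch` are hypothetical names, and the TSS is reduced to a single selector field so the example stays self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced TSS: only its GDT selector is kept. */
struct tss {
    uint16_t selector;
};

/* Hypothetical thread record: each thread points at the TSS it runs under. */
struct thread {
    const struct tss *tss;
};

enum switch_kind { SWITCH_NONE, SWITCH_SOFTWARE, SWITCH_HARDWARE };

/* Choose the cheapest switch that still honours a dedicated TSS. */
enum switch_kind pick_switch(const struct thread *outgoing,
                             const struct thread *incoming)
{
    assert(outgoing && incoming);
    if (outgoing == incoming)
        return SWITCH_NONE;      /* same thread: nothing to do */
    if (outgoing->tss == incoming->tss)
        return SWITCH_SOFTWARE;  /* shared TSS: plain stack switch */
    return SWITCH_HARDWARE;      /* different TSS: hardware task switch */
}
```

Threads that share a TSS would then always take the cheap software path, and only a thread with its own TSS (a vm86 task, say) would pay for the hardware switch.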

OSDev.org
