Why don't we use the TSS for Task Switching?

Combuster · Post by **Combuster** » Tue Jul 29, 2014 5:19 am

Since I can't be bothered to post a relevant debunking quote to each of your comments, I'll just call a troll on your lack of ability to create an argument that's actually meaningful and applicable.

bluemoon · Post by **bluemoon** » Tue Jul 29, 2014 6:11 am

Czernobyl wrote: A last thought concerning efficiency again: there's been sort of a chicken-and-egg problem there. Had mainstream OSes adopted HW task switches more enthusiastically, Intel engineers would undoubtedly have added dedicated caching hardware for TSSes to their CPU designs, like there are dedicated caches for segmentation, paging (TLBs), etc. However, I think even OS/2 1.0 did not do much task switching, did it?

I don't think so. HW switching has a major flaw: it can only perform a dumb load/save of context, not to mention its inability to do accounting. Without major design modifications the CPU just can't keep track of which tasks use the FPU, SSE, etc., so the hardware would need to maintain those contexts itself, and that involves a huge amount of data.

Czernobyl · Post by **Czernobyl** » Tue Jul 29, 2014 6:14 am

(Thank you, "Combuster" ! I was not feeling like arguing with you, anyway.)

To the OP :

I was reading up on the TSS, and I'm confused as to why I would want to do software task switching as opposed to using TSS entries in the GDT. Is it slower? Is it missing something? I've searched the wiki, searched the forums. I can't seem to find anything substantial on the topic.

In summary, you won't find much on full-fledged use of TSS-style task switching, nor on protection-by-segmentation, except by studying the manuals. Most tutorials are biased towards software tasks and crippled (aka flat) segments - which is perfectly good, just not what the x86 (16 and 32 bit) protected mode designers had in mind.

Good luck !

Czernobyl · Post by **Czernobyl** » Tue Jul 29, 2014 6:25 am

bluemoon wrote: HW switching has a major flaw: it can only perform a dumb load/save of context, not to mention its inability to do accounting. Without major design modifications the CPU just can't keep track of which tasks use the FPU, SSE, etc., so the hardware would need to maintain those contexts itself, and that involves a huge amount of data.

Admittedly, the major drawback of HW switching is its inflexibility - I think I've used the word earlier ;=) Take it or leave it, it's immutable hard/firmware. As for FPU state, I beg to differ: the system was cleverly designed, with the help of a bit in the MSW (the TS bit, bit 3 of CR0) and a dedicated interrupt (#NM, "device not available", vector 7), so that you don't have to save the state UNLESS and UNTIL it's needed.
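The lazy FPU scheme described here can be sketched as a toy model. This is Python rather than kernel code, and everything in it is an assumption made for illustration: the class and method names are hypothetical, the whole FPU state is reduced to a single float, and the trap is an ordinary method call. In a real kernel the handler would run on #NM, clear TS with CLTS, and use FNSAVE/FRSTOR (or FXSAVE/FXRSTOR) on a per-task save area.

```python
class LazyFpuModel:
    """Toy model of lazy FPU switching driven by CR0.TS and #NM."""

    def __init__(self):
        self.ts = False        # models the CR0.TS bit
        self.fpu_state = 0.0   # models the whole FPU register file
        self.owner = None      # task whose state currently lives in the FPU
        self.current = None    # task currently running
        self.saved = {}        # per-task save areas
        self.saves = 0         # how many real state saves were needed

    def task_switch(self, task):
        # A hardware task switch sets TS; a software switch sets it by hand.
        self.current = task
        self.ts = True

    def _device_not_available(self):
        # Models the #NM handler: only now is FPU state actually moved.
        self.ts = False  # CLTS
        if self.owner != self.current:
            if self.owner is not None:
                self.saved[self.owner] = self.fpu_state  # save old owner
                self.saves += 1
            self.fpu_state = self.saved.get(self.current, 0.0)  # restore
            self.owner = self.current

    def fpu_write(self, value):
        if self.ts:  # first FPU instruction after a switch traps
            self._device_not_available()
        self.fpu_state = value

    def fpu_read(self):
        if self.ts:
            self._device_not_available()
        return self.fpu_state


model = LazyFpuModel()
model.task_switch("A")
model.fpu_write(1.5)       # A becomes the FPU owner
model.task_switch("B")     # B never touches the FPU
model.task_switch("A")
print(model.fpu_read(), model.saves)  # A's state is intact, nothing was saved
```

The point of the model is the `saves` counter: switching through tasks that never execute an FPU instruction costs no FPU save or restore at all.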

SSE is newer and, I must admit (I'm an old guy), I have not studied the question of whether saving/restoring the state of the SSE registers is compatible with doing HW task switches. SSE was introduced with the Pentium III, so I'd assume at that point in time Intel engineers did care?

Brendan · Post by **Brendan** » Tue Jul 29, 2014 2:03 pm

Hi,

Combuster wrote:
Czernobyl wrote: It goes without saying, on the same hardware, HW task-switching would've been an order of magnitude slower if it were emulated in software instructions.
Ahem. Let's take the 486:

Software: int@71 + iret@31, pusha@9 + popa@9, 5 data segments = 5 * (9 (load) + 3 (save)), save cr3@4 + load cr3@4
= 188 cycles
Hardware task switch: 309 cycles.
That gives the software emulation 121 clock cycles to do other bits of administration.
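The arithmetic in the quoted 486 comparison can be checked with a short script. The cycle counts are exactly the ones quoted in the post (they are not re-measured here); the constant and function names are hypothetical.

```python
# Cycle counts for the 486 as quoted in the post above.
INT_CYCLES = 71
IRET_CYCLES = 31
PUSHA_CYCLES = 9
POPA_CYCLES = 9
DATA_SEGMENTS = 5
SEG_LOAD_CYCLES = 9
SEG_SAVE_CYCLES = 3
CR3_SAVE_CYCLES = 4
CR3_LOAD_CYCLES = 4
HW_TASK_SWITCH_CYCLES = 309


def software_switch_cycles():
    """Sum the quoted costs of emulating a task switch in software."""
    entry_exit = INT_CYCLES + IRET_CYCLES
    gprs = PUSHA_CYCLES + POPA_CYCLES
    segments = DATA_SEGMENTS * (SEG_LOAD_CYCLES + SEG_SAVE_CYCLES)
    cr3 = CR3_SAVE_CYCLES + CR3_LOAD_CYCLES
    return entry_exit + gprs + segments + cr3


software = software_switch_cycles()
slack = HW_TASK_SWITCH_CYCLES - software
print(software, slack)  # prints "188 121"
```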

More accurately; the cost of switching from user-space to kernel has nothing to do with task switching; and (for most OSs) you only ever really switch from kernel code to kernel code. With this in mind there's no need to save/load segment registers (which involves several protection checks each), and (depending on the calling conventions you're using) it's likely that several general purpose registers needn't be preserved either. Also, CR3 tends to be unmodified (doesn't need to be saved on task switch, and is only loaded).

Essentially; a hardware task switch can typically be replaced by about 4 stack pushes, 3 moves (store ESP, load new ESP and CR3) and 4 stack pops (maybe 25 cycles).
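As a rough illustration of that sequence, here is a toy Python model of a kernel-to-kernel switch: four pushes, three moves, four pops. It is a simulation, not kernel code. The `Cpu` and `Task` classes, the addresses, and the choice of EBX/ESI/EDI/EBP as the preserved registers (the usual 32-bit cdecl callee-saved set) are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field

# Assumption: only the cdecl callee-saved registers need preserving.
CALLEE_SAVED = ("ebx", "esi", "edi", "ebp")


@dataclass
class Task:
    name: str
    cr3: int            # page directory base (hypothetical value)
    saved_esp: int = 0  # kernel stack pointer while the task is not running


@dataclass
class Cpu:
    esp: int = 0
    cr3: int = 0
    regs: dict = field(default_factory=lambda: dict.fromkeys(CALLEE_SAVED, 0))
    memory: dict = field(default_factory=dict)  # address -> 32-bit word

    def push(self, value):
        self.esp -= 4
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value


def prepare_stack(cpu, task, stack_top):
    """Give a never-run task a frame that switch_to() can pop from."""
    for i in range(1, len(CALLEE_SAVED) + 1):
        cpu.memory[stack_top - 4 * i] = 0
    task.saved_esp = stack_top - 4 * len(CALLEE_SAVED)


def switch_to(cpu, old, new):
    for reg in CALLEE_SAVED:            # 4 stack pushes
        cpu.push(cpu.regs[reg])
    old.saved_esp = cpu.esp             # move 1: store ESP
    cpu.esp = new.saved_esp             # move 2: load new ESP
    cpu.cr3 = new.cr3                   # move 3: load CR3
    for reg in reversed(CALLEE_SAVED):  # 4 stack pops
        cpu.regs[reg] = cpu.pop()


cpu = Cpu(esp=0x1000, cr3=0xA000)
task_a = Task("A", cr3=0xA000)
task_b = Task("B", cr3=0xB000)
prepare_stack(cpu, task_b, 0x2000)

cpu.regs["ebx"] = 111           # state belonging to task A
switch_to(cpu, task_a, task_b)
cpu.regs["ebx"] = 222           # task B clobbers EBX
switch_to(cpu, task_b, task_a)
print(cpu.regs["ebx"], hex(cpu.esp), hex(cpu.cr3))  # A's EBX, ESP and CR3 are back
```

Note that nothing here saves CR3 or any segment register, which matches the argument that a kernel-to-kernel switch only needs to load CR3 and can leave the segments alone.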

Of course there's always additional work that hardware task switching doesn't do; like dealing with FPU/MMX/SSE/AVX state, accounting for time consumed by threads, etc. For the sake of making the comparison more accurate, let's say this additional work adds up to 100 cycles. That makes hardware task switching about 300+100 = 400 cycles and software task switching about 25 + 100 = 125 cycles.

There's no reason to stop there though. Imagine a system consisting of a collection of threads that communicate with each other (an example). In this case there's the work done by the tasks between task switches, and also the overhead of switching between user-space and kernel space, the overhead of IPC, plus the overhead of deciding which task to switch to. This can add up to maybe around 600 cycles of other stuff between task switches (assuming well designed code including SYSCALL/SYSENTER, an O(1) scheduler, "process context IDs" to avoid TLB flushing, no tasks using FPU/MMX/SSE/AVX, etc). Now we're looking at 600 + 400 = 1000 cycles for hardware task switching vs. 600 + 125 = 725 cycles for software task switching; or in other words software task switching may make the overall performance of the entire system roughly 38% faster (1000 / 725 is about 1.38) for the (relatively typical, especially for micro-kernels) "lots of communication" case.
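The totals in this estimate are easy to recompute. The sketch below uses only the rough cycle figures from the post (which the author himself calls approximate); the names are hypothetical. It also shows that "how much faster" depends on the baseline: the same numbers give about 38% more work per cycle, or 27.5% fewer cycles per iteration.

```python
# Rough per-iteration figures taken from the post (estimates, not measurements).
OTHER_WORK = 600         # syscalls, IPC, scheduling between switches
EXTRA_BOOKKEEPING = 100  # FPU/SSE handling, time accounting, etc.
HW_SWITCH = 300
SW_SWITCH = 25

HW_TOTAL = OTHER_WORK + HW_SWITCH + EXTRA_BOOKKEEPING
SW_TOTAL = OTHER_WORK + SW_SWITCH + EXTRA_BOOKKEEPING


def throughput_gain_percent(slow, fast):
    """How much more work gets done per cycle on the faster path."""
    return (slow / fast - 1) * 100


def time_saved_percent(slow, fast):
    """How many fewer cycles the faster path needs per iteration."""
    return (slow - fast) / slow * 100


print(HW_TOTAL, SW_TOTAL)  # prints "1000 725"
print(round(throughput_gain_percent(HW_TOTAL, SW_TOTAL), 1))
print(round(time_saved_percent(HW_TOTAL, SW_TOTAL), 1))
```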

Now I'm not saying that any of the figures above are 100% accurate. However, I am saying that for normal OSs hardware task switching is worse/slower and has no advantages at all, and is therefore worthless trash.

Czernobyl wrote:in summary, you won't find much on full-fledged use of TSS-style task switching, nor protection-by-segmentation, except by studying the manuals.

Correct. Various operating systems have tried it, they all found that it sucked, and the world moved to better alternatives. Heck, even Linux used hardware task switching once upon a time (back when Linus was a beginner).

Czernobyl wrote:Most tutorials are biased towards software tasks and crippled (aka flat) segments - which is perfectly good, just not what X86 (16 and 32 bit) protected mode designers had in mind.

Before Intel knew how software would use the protected mode, what they had in mind was to build a flexible system able to cope with many possibilities ("cast a wide net"). Because of this (and backward compatibility) we're left with worthless historical baggage.

Cheers,

Brendan

Primis · Post by **Primis** » Sat Aug 02, 2014 7:23 pm

It seems that I've gotten answers on both sides of the road here;
- "Don't use it! it's inflexible and slow!"
- "It's your kernel, do what you want to do, convention be damned!"

Thank you all for the help, I apologize if I started a flame war.
I will be experimenting with using the TSS and see if it fits my specific needs.

OSDev.org
