Hi,
Combuster wrote:
Czernobyl wrote:It goes without saying that, on the same hardware, HW task-switching would have been an order of magnitude slower if it were emulated with software instructions.
Ahem. Let's take the 486:
Software: int @ 71 + iret @ 31, pusha @ 9 + popa @ 9, 5 data segments @ 9 (load) + 3 (save) each = 60, save CR3 @ 4 + load CR3 @ 4
= 71 + 31 + 9 + 9 + 60 + 4 + 4 = 188 cycles
Hardware task switch: 309 cycles.
That gives the software emulation 121 clock cycles to do other bits of administration.
More accurately: the cost of switching from user space to the kernel has nothing to do with task switching, and (for most OSs) the task switch itself only ever happens from kernel code to kernel code. With this in mind there's no need to save/load segment registers (which involves several protection checks each), and (depending on the calling conventions you're using) it's likely that several general purpose registers needn't be preserved either. Also, CR3 tends to be unmodified (it doesn't need to be saved on a task switch, and is only loaded).
Essentially, a hardware task switch can typically be replaced by about 4 stack pushes, 3 moves (store ESP, load the new ESP and CR3) and 4 stack pops (maybe 25 cycles); a rough sketch follows.
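For concreteness, here's a rough sketch of that kind of switch for 32-bit x86 (GCC inline assembly). This is only my illustration of the idea, not code from any particular OS; the thread structure layout, offsets, and function name are assumptions, and the CR3 load could be skipped when the old and new tasks share an address space:

Code:
/* switch_to(old, new): save callee-saved registers and ESP for "old",
   then load "new"'s ESP and CR3 and return on the new task's stack. */
struct thread {
    unsigned int esp;    /* offset 0: saved kernel stack pointer */
    unsigned int cr3;    /* offset 4: physical address of page directory */
};

void switch_to(struct thread *old, struct thread *new);

__asm__(
    ".globl switch_to\n"
    "switch_to:\n"
    "    pushl %ebx\n"             /* 4 pushes (callee-saved registers) */
    "    pushl %esi\n"
    "    pushl %edi\n"
    "    pushl %ebp\n"
    "    movl 20(%esp), %eax\n"    /* eax = old thread */
    "    movl 24(%esp), %edx\n"    /* edx = new thread */
    "    movl %esp, 0(%eax)\n"     /* store ESP */
    "    movl 0(%edx), %esp\n"     /* load new ESP */
    "    movl 4(%edx), %eax\n"
    "    movl %eax, %cr3\n"        /* load new CR3 */
    "    popl %ebp\n"              /* 4 pops */
    "    popl %edi\n"
    "    popl %esi\n"
    "    popl %ebx\n"
    "    ret\n"
);

The scheduler just calls switch_to(current, next); the return address saved on the old task's stack is where that task resumes the next time it gets switched to.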
Of course there's always additional work that hardware task switching doesn't do, like dealing with FPU/MMX/SSE/AVX state, accounting for time consumed by threads, etc. For the sake of making the comparison more accurate, let's say this additional work adds up to 100 cycles. That makes hardware task switching about 300 + 100 = 400 cycles and software task switching about 25 + 100 = 125 cycles.
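To give a feel for one part of that extra work, the usual approach is "lazy" FPU/SSE switching: the task switch itself only sets the TS flag in CR0, and the expensive save/restore only happens if/when the new task actually touches the FPU and takes a device-not-available (#NM) exception. The sketch below is my own illustration with made-up names (fpu_defer, fpu_trap_handler); a kernel supporting AVX would use XSAVE/XRSTOR instead of FXSAVE/FXRSTOR:

Code:
#include <stdint.h>

struct thread {
    uint8_t fpu_state[512] __attribute__((aligned(16)));   /* FXSAVE area */
    /* ...other per-thread state... */
};

static struct thread *current;      /* currently running thread */
static struct thread *fpu_owner;    /* thread whose state is live in the FPU */

/* Called on every task switch: just make the next FPU use trap (cheap). */
static void fpu_defer(void)
{
    uint32_t cr0;
    __asm__ volatile("mov %%cr0, %0" : "=r"(cr0));
    cr0 |= (1u << 3);                                       /* set CR0.TS */
    __asm__ volatile("mov %0, %%cr0" : : "r"(cr0));
}

/* #NM handler: only pay for saving/restoring state when it's actually needed. */
void fpu_trap_handler(void)
{
    __asm__ volatile("clts");                               /* clear CR0.TS */
    if (fpu_owner == current)
        return;                                             /* state already loaded */
    if (fpu_owner)
        __asm__ volatile("fxsave %0" : "=m"(fpu_owner->fpu_state));
    __asm__ volatile("fxrstor %0" : : "m"(current->fpu_state));
    fpu_owner = current;
}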
There's no reason to stop there though. Imagine a system consisting of a collection of threads that communicate with each other (an example). In this case there's the work done by the tasks between task switches, plus the overhead of switching between user space and kernel space, the overhead of IPC, and the overhead of deciding which task to switch to. This can add up to maybe around 600 cycles of other stuff between task switches (assuming well designed code including SYSCALL/SYSENTER, an O(1) scheduler, "process context IDs" to avoid TLB flushing, no tasks using FPU/MMX/SSE/AVX, etc.). Now we're looking at 600 + 400 = 1000 cycles for hardware task switching vs. 600 + 125 = 725 cycles for software task switching; or in other words, software task switching may make the overall performance of the entire system roughly 38% faster (1000 vs. 725 cycles between switches) for the (relatively typical, especially for micro-kernels) "lots of communication" case.
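As a rough idea of what "an O(1) scheduler" means there (the names and layout below are mine, not taken from any specific kernel): keep one run queue per priority plus a bitmap of non-empty queues, so picking the next task is a single bit-scan and a list-head removal no matter how many tasks are ready:

Code:
#include <stdint.h>
#include <stddef.h>

#define NUM_PRIORITIES 32

struct thread {
    struct thread *next;
    int priority;                                    /* 0 = lowest, 31 = highest */
};

static uint32_t ready_bitmap;                        /* bit N set = queue N is non-empty */
static struct thread *ready_queue[NUM_PRIORITIES];   /* one list of ready threads per priority */

static void make_ready(struct thread *t)
{
    t->next = ready_queue[t->priority];
    ready_queue[t->priority] = t;
    ready_bitmap |= 1u << t->priority;
}

static struct thread *pick_next(void)
{
    if (ready_bitmap == 0)
        return NULL;                                 /* nothing ready: run the idle task */
    int best = 31 - __builtin_clz(ready_bitmap);     /* highest set bit = highest priority */
    struct thread *t = ready_queue[best];
    ready_queue[best] = t->next;
    if (ready_queue[best] == NULL)
        ready_bitmap &= ~(1u << best);
    return t;
}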
Now I'm not saying that any of the figures above are 100% accurate. However, I am saying that for normal OSs hardware task switching is worse/slower and has no advantages at all, and is therefore worthless trash.
Czernobyl wrote:In summary, you won't find much on full-fledged use of TSS-style task switching, nor protection-by-segmentation, except by studying the manuals.
Correct. Various operating systems have tried it; they all found that it sucked, and the world moved to better alternatives. Heck, even Linux used hardware task switching once upon a time (back when Linus was a beginner).
Czernobyl wrote:Most tutorials are biased towards software tasks and crippled (aka flat) segments - which is perfectly good, just not what X86 (16 and 32 bit) protected mode designers had in mind.
Before Intel knew how software would use protected mode, what they had in mind was to build a flexible system able to cope with many possibilities ("cast a wide net"). Because of this (and backward compatibility), we're left with worthless historical baggage.
Cheers,
Brendan