Why don't we use the TSS for Task Switching?
- Primis
- Member
- Posts: 62
- Joined: Fri May 14, 2010 3:46 pm
- Libera.chat IRC: Primis
- Location: New York, NY
- Contact:
I was reading up on the TSS, and I'm confused as to why I would want to do software task switching as opposed to using TSS entries in the GDT. Is it slower? Is it missing something? I've searched the wiki, searched the forums. I can't seem to find anything substantial on the topic.
Re: Why don't we use the TSS for Task Switching?
Hi,
Primis wrote: "I was reading up on the TSS, and I'm confused as to why I would want to do software task switching as opposed to using TSS entries in the GDT. Is it slower? Is it missing something? I've searched the wiki, searched the forums. I can't seem to find anything substantial on the topic."
Normally, something happens (IRQ or kernel API call) that causes the CPU to switch to kernel code; then that kernel code switches from one task to another. Task switches themselves typically only switch from one task that's running kernel code to another task that's running kernel code.
When switching from one task that's running kernel code to another task that's running kernel code; part of the CPU's state is already saved somewhere (e.g. on the stack) and part of it is constant (e.g. the kernel's segment registers). In this case the amount of state that actually needs to be saved and restored is "almost none" (e.g. ESP and a few general purpose registers).
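To make "almost none" concrete, here is a minimal sketch of what a typical kernel-to-kernel software switch keeps. It assumes a hypothetical assembly routine `switch_to(prev, next)` following the i386 System V calling convention, where EBX, ESI, EDI and EBP are callee-saved; the struct and field names are made up for illustration, and the assembly is only described in comments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kernel-stack frame left behind by a software switch.
   The assumed assembly routine switch_to(prev, next) roughly does:
       push ebp / push ebx / push esi / push edi
       mov  [prev->saved_esp], esp     ; remember where this frame is
       mov  esp, [next->saved_esp]     ; adopt the next task's stack
       pop  edi / pop esi / pop ebx / pop ebp
       ret                             ; resume next task where it called switch_to
   Fields are listed from the lowest address (the saved ESP) upwards. */
struct sw_switch_frame {
    uint32_t edi, esi, ebx, ebp; /* callee-saved registers */
    uint32_t eip;                /* return address pushed by CALL switch_to */
};

/* Hypothetical per-task field: the only thing the switch itself maintains. */
struct sw_task {
    uint32_t saved_esp; /* points at a struct sw_switch_frame on the task's kernel stack */
};

/* Five 32-bit words in total, versus the 104-byte TSS a hardware switch uses. */
static_assert(sizeof(struct sw_switch_frame) == 20, "frame is five 32-bit words");
```

Segment registers never appear here because kernel code runs with constant kernel selectors, which is exactly the saving described above.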
The hardware task switching mechanism saves and loads everything, even though most of that saving and loading is unnecessary. This makes it slower than necessary by default. Worse, segment register loads involve expensive lookups and protection checks. Because the segment register loads are unnecessary and hardware task switching doesn't avoid them, this makes hardware task switching a lot slower for no reason.
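For comparison, this sketch spells out the 32-bit TSS layout from Intel's SDM (Volume 3, task-management chapter). On a hardware switch the CPU writes the dynamic fields (general-purpose registers, segment selectors, EFLAGS, EIP) into the outgoing task's TSS and loads the incoming task's full state, including CR3 and the LDT selector, from its TSS. The struct and field names are my own; only the offsets are architectural. Note the seven selector fields: each one is reloaded with a descriptor lookup and protection checks on every hardware switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit TSS. Reserved halves are kept so each field sits at its
   architectural offset; all members are naturally aligned (no padding). */
struct tss32 {
    uint16_t prev_task_link, reserved0;  /* 0x00 */
    uint32_t esp0;                       /* 0x04: ring-0 stack */
    uint16_t ss0, reserved1;             /* 0x08 */
    uint32_t esp1;                       /* 0x0C */
    uint16_t ss1, reserved2;             /* 0x10 */
    uint32_t esp2;                       /* 0x14 */
    uint16_t ss2, reserved3;             /* 0x18 */
    uint32_t cr3;                        /* 0x1C: address space (loaded, not saved) */
    uint32_t eip, eflags;                /* 0x20, 0x24 */
    uint32_t eax, ecx, edx, ebx;         /* 0x28 .. 0x34 */
    uint32_t esp, ebp, esi, edi;         /* 0x38 .. 0x44 */
    /* Segment selectors: each is reloaded, with checks, on a HW switch. */
    uint16_t es, reserved4;              /* 0x48 */
    uint16_t cs, reserved5;              /* 0x4C */
    uint16_t ss, reserved6;              /* 0x50 */
    uint16_t ds, reserved7;              /* 0x54 */
    uint16_t fs, reserved8;              /* 0x58 */
    uint16_t gs, reserved9;              /* 0x5C */
    uint16_t ldt_selector, reserved10;   /* 0x60 */
    uint16_t trap;                       /* 0x64: bit 0 = debug trap flag */
    uint16_t iomap_base;                 /* 0x66: I/O permission bitmap offset */
};

static_assert(sizeof(struct tss32) == 104, "TSS is 0x68 bytes");
static_assert(offsetof(struct tss32, cr3) == 0x1C, "CR3 offset");
static_assert(offsetof(struct tss32, es) == 0x48, "first selector offset");
static_assert(offsetof(struct tss32, iomap_base) == 0x66, "I/O map base offset");
```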
Next; there are things that hardware task switching does not do. For example, most OSs keep track of how much time a task consumed, and the hardware mechanism does nothing to help with that. It also doesn't save or restore FPU/MMX/SSE/AVX state. This means that hardware task switching alone is not enough.
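Time accounting is a typical piece of bookkeeping the OS must add itself, with or without hardware switching. A minimal sketch, assuming a hypothetical hook `account_switch` that the scheduler calls on every switch and a monotonically increasing timestamp `now` from whatever clock source the OS uses (a timer tick count, the TSC, ...); both names are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task accounting record. */
struct task_acct {
    uint64_t cpu_time; /* total time units this task has run */
};

/* Timestamp of the most recent switch (same units as 'now'). */
static uint64_t last_switch_time;

/* Called by the (hypothetical) scheduler just before leaving 'prev':
   charge it for the interval since the previous switch. */
static void account_switch(struct task_acct *prev, uint64_t now) {
    assert(now >= last_switch_time); /* clock must not go backwards */
    prev->cpu_time += now - last_switch_time;
    last_switch_time = now;
}
```

For example, with `last_switch_time` at 100, switching away from task A at 250, from B at 300 and from A again at 400 leaves A charged 250 units and B 50.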
Finally; hardware task switching only works for 32-bit 80x86. It's not supported on 64-bit 80x86 or other CPUs (ARM, PowerPC, ....).
Basically; it's slow, inadequate and not portable. There's no valid reason to use it (excluding special purposes, like possibly the double fault exception handler).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Primis
Re: Why don't we use the TSS for Task Switching?
Brendan wrote: "Basically; it's slow, inadequate and not portable. There's no valid reason to use it (excluding special purposes, like possibly the double fault exception handler)."
What purpose would it serve in the double fault exception handler? On a side note, can you assign a specific TSS to a specific interrupt such as the double fault?
Re: Why don't we use the TSS for Task Switching?
Primis wrote: "On a side note, can you assign a specific TSS to a specific interrupt such as the double fault?"
http://wiki.osdev.org/IDT#I386_Task_Gate
Re: Why don't we use the TSS for Task Switching?
Hi,
Primis wrote: "What purpose would it serve in the double fault exception handler?"
Double fault occurs when the CPU fails to start one of the other exception handlers, which typically happens when the kernel is buggy (e.g. either the kernel's stack isn't valid, or the current virtual address space doesn't contain the exception handlers).
If you don't use a task gate for the double fault exception handler then the problem that caused the double fault still exists when the CPU tries to start the double fault exception handler, so the CPU can't start the double fault exception handler and ends up doing "triple fault" (reset the computer). Using a task gate for the double fault exception handler forces the CPU to switch to a "known good kernel state" (e.g. with a different kernel stack, different virtual address space, etc).
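As a sketch of what such a task gate looks like, here is a helper that builds a 32-bit IDT entry of the task-gate type, following the gate-descriptor format in Intel's SDM and on the OSDev wiki page linked in this thread. The helper name and the example selector 0x28 (GDT entry 5) are assumptions; in a real kernel that selector would refer to a TSS descriptor prepared for the double fault handler, with its own stack and CR3.

```c
#include <assert.h>
#include <stdint.h>

/* 8-byte protected-mode IDT entry. */
struct idt_entry32 {
    uint16_t offset_low;   /* unused (reserved) for task gates: left 0 */
    uint16_t selector;     /* for a task gate: the TSS selector */
    uint8_t  zero;
    uint8_t  type_attr;    /* P | DPL | gate type */
    uint16_t offset_high;  /* unused (reserved) for task gates: left 0 */
};

#define IDT_PRESENT         0x80u
#define IDT_TASK_GATE       0x05u  /* gate type 0101b */
#define DOUBLE_FAULT_VECTOR 8

/* Hypothetical helper: a DPL-0 task gate pointing at a TSS in the GDT. */
static struct idt_entry32 make_task_gate(uint16_t tss_selector) {
    assert((tss_selector & 0x4) == 0); /* TI clear: TSS descriptors live in the GDT */
    struct idt_entry32 e = {0};
    e.selector  = tss_selector;
    e.type_attr = IDT_PRESENT | IDT_TASK_GATE; /* 0x85 */
    return e;
}
```

Usage would be something like `idt[DOUBLE_FAULT_VECTOR] = make_task_gate(0x28);`, where `idt` is the kernel's own (hypothetical) table, so vector 8 switches to the prepared TSS instead of pushing onto the possibly broken current stack.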
Of course it's hard to say if using hardware task switching for the double fault handler is justified or not. It's possibly easier to make sure that the other exception handlers aren't buggy (to ensure double fault doesn't happen in the first place).
Note: In long mode (where there is no hardware task switching), there's special support ("IST", the Interrupt Stack Table) for forcing a stack switch for cases where the kernel's stack may be invalid. It serves mostly the same purpose (but does less, with a lot less overhead).
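A sketch of how the IST is wired up: in long mode the TSS no longer holds register state, but it still holds up to seven interrupt-stack pointers (IST1 to IST7), and bits 0 to 2 of byte 4 of a 64-bit IDT gate select which one the CPU switches to (0 means no IST switch). The helper name, handler address and selector used below are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* 16-byte long-mode IDT entry. */
struct idt_entry64 {
    uint16_t offset_low;   /* handler bits 0..15 */
    uint16_t selector;     /* kernel code segment */
    uint8_t  ist;          /* bits 0..2: IST index (0 = no stack switch) */
    uint8_t  type_attr;    /* 0x8E = present, DPL 0, 64-bit interrupt gate */
    uint16_t offset_mid;   /* handler bits 16..31 */
    uint32_t offset_high;  /* handler bits 32..63 */
    uint32_t reserved;
};

/* Hypothetical helper: an interrupt gate that always runs 'handler' on the
   stack whose top the kernel stored in TSS.IST[ist_index]. */
static struct idt_entry64 make_ist_gate(uint64_t handler, uint16_t code_selector,
                                        uint8_t ist_index) {
    assert(ist_index <= 7);
    struct idt_entry64 e = {0};
    e.offset_low  = (uint16_t)(handler & 0xFFFF);
    e.selector    = code_selector;
    e.ist         = ist_index;
    e.type_attr   = 0x8E;
    e.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    e.offset_high = (uint32_t)(handler >> 32);
    return e;
}
```

A double fault handler would typically get its own IST entry, e.g. `make_ist_gate((uint64_t)double_fault_entry, KERNEL_CS, 1)` with hypothetical names, so it always starts on a known-good stack without any of the other state a hardware task switch would reload.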
Cheers,
Brendan
Re: Why don't we use the TSS for Task Switching?
Actually, X86-style hardware task switching (using task gates) being too slow to consider is a myth. There used to be some basis for the myth _way back_, when CPUs (80286, 80386) did not contain cache memory and even motherboards did not have cache on them either, for economic reasons. Actually Intel never intended X86 protected mode to be usable without at least some amount of (external) cache, and it was expecting to make good money from the sale of static RAM... but IBM and the PC compatible market decided otherwise.
Since processors started to have large caches on die, speed (lack of) has ceased to be a valid reason not to base an OS on native X86 task switches. The real reasons were designers' sloth, and/or the desire not to rely on methods supported only on X86 in order to remain portable: the same kind of reasons that played against making use of X86 segmentation in 32-bit code.
As others said, AMD64 has finally removed HW tasking and segmentation for all practical purposes. Some may find this unfortunate. Anyway, for a pet 32-bit OS, nothing prevents you (OP) from experimenting with a design including full-featured task gates, call gates and up to 4 privilege levels (rings). X86-style floating point is not a problem, contrary to what someone wrote above, as the processor raises an exception when the first FP instruction is executed after a task switch...
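The mechanism meant here is the CR0.TS flag: a hardware task switch sets TS, and the next x87/MMX/SSE instruction then raises #NM (device-not-available, vector 7), whose handler can clear TS with CLTS and swap FPU state lazily. The save and restore are still done by the OS in that handler. Below is a user-space simulation of that logic, not kernel code: the CPU flag, the FPU registers and the handler are modelled as plain C variables and functions, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulation only: 'fpu_state' stands in for a whole FXSAVE area. */
struct task {
    int id;
    int fpu_state;              /* this task's saved FPU context */
};

static bool cr0_ts;             /* models CR0.TS */
static int live_fpu;            /* models the FPU registers */
static struct task *fpu_owner;  /* task whose context is in the FPU */
static struct task *current;    /* running task */
static int nm_faults;           /* #NM exceptions taken */

/* FPU-wise, a hardware task switch only sets TS. */
static void hw_task_switch(struct task *next) {
    current = next;
    cr0_ts = true;
}

/* Modelled #NM handler: clear TS (CLTS), swap contexts only if needed. */
static void nm_handler(void) {
    assert(current != NULL);
    nm_faults++;
    cr0_ts = false;
    if (fpu_owner != current) {
        if (fpu_owner != NULL)
            fpu_owner->fpu_state = live_fpu;  /* FXSAVE for the old owner */
        live_fpu = current->fpu_state;        /* FXRSTOR for the new one */
        fpu_owner = current;
    }
}

/* Any FPU instruction: traps to the handler first while TS is set. */
static void fpu_add(int delta) {
    if (cr0_ts)
        nm_handler();
    live_fpu += delta;
}
```

In this model, each task's first FPU use after a switch costs one #NM, and switching back to the task that still owns the FPU costs only the trap, with no save or restore.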
HTH
Re: Why don't we use the TSS for Task Switching?
Hi,
Czernobyl wrote: "Since processors started to have large caches on die, speed (lack of) has ceased to be a valid reason not to base an OS on native X86 task switches. The real reasons were designers' sloth, and/or the desire not to rely on methods supported only on X86 in order to remain portable: the same kind of reasons that played against making use of X86 segmentation in 32-bit code."
Have you got one of those mythical CPUs with "infinitely fast" caches then? I hear they're expensive...
Cheers,
Brendan
Re: Why don't we use the TSS for Task Switching?
Czernobyl wrote: "Since processors started to have large caches on die, speed (lack of) has ceased to be a valid reason not to base an OS on native X86 task switches."
Brendan wrote: "Have you got one of those mythical CPUs with 'infinitely fast' caches then? I hear they're expensive..."
No need for infinitely fast. The point is that any frequently used TSS (a relatively small structure) will most likely "live" in on-die cache when switching tasks, making memory access irrelevant; memory access was the dominant cost of task switching on a 286 or 386... BTDT.
Anyhow, I'm not preaching a religion here, just saying: anyone interested in designing a _non-conventional_ kernel is welcome to learn how full segmentation and native task switching are really supposed to work, and to experiment... Fun to be had, guaranteed.
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: Why don't we use the TSS for Task Switching?
Invalid argument, proof wanted. I don't see how memory access times alone would have made hardware task switching slower on the 386 when the equivalent software implementation on the same processor would demand at least the same number of memory accesses (actually more, for the additional code involved).
- linguofreak
- Member
- Posts: 510
- Joined: Wed Mar 09, 2011 3:55 am
Re: Why don't we use the TSS for Task Switching?
Combuster wrote: "Invalid argument, proof wanted. I don't see how memory access times alone would have made hardware task switching slower on the 386 when the equivalent software implementation on the same processor would demand at least the same number of memory accesses (actually more, for the additional code involved)."
Yeah. If neither hardware nor software task switching suffers a cache miss, the ratio of the time taken to do software task switching to the time taken to do hardware switching should be fairly close (if not identical) to the ratio with no cache at all.
Re: Why don't we use the TSS for Task Switching?
linguofreak wrote: "Yeah. If neither hardware nor software task switching suffers a cache miss, the ratio of the time taken to do software task switching to the time taken to do hardware switching should be fairly close (if not identical) to the ratio with no cache at all."
How about the hundreds of protection checks? At least on the 386, the descriptor cache is flushed on every hardware task switch, and those checks are way slower than accessing cache or even memory.
- Combuster
Re: Why don't we use the TSS for Task Switching?
bluemoon wrote: "How about the hundreds of protection checks?"
Combuster wrote: "the equivalent software implementation"
Here's one (rhetorical) pop quiz:
1) Which checks would be avoided in either case?
2) What has that argument to do with memory and the presence/absence of caches (besides the fact that the 386 already has various function-specific caches: the descriptor cache, TLB, prefetch queue, ...)?
It is certainly true that in some cases code can be written to avoid DS/ES loads if the OS doesn't require them, giving software switching a significant work advantage in those cases, but that doesn't hold for all models. The thing is that hardware switching was slower and not useful in general, and because everybody kept doing software switching afterwards, hardware switching has been gravely neglected and has only become progressively worse in comparison. None of this has anything to do with memory and improvements thereof, but rather with CPU silicon.
Re: Why don't we use the TSS for Task Switching?
It goes without saying that, on the same hardware, HW task switching would have been an order of magnitude slower if it were emulated with software instructions. However, if you chose to do software task switching, you did not want to reproduce the behavior of HW TS accurately, so you could save some time by saving/restoring exactly what was needed. What that bought you was some flexibility.
Whatever... my point is that the argument of slowness (is this a word?) given against hardware task switches is not quite valid by itself. What's at stake is (in)flexibility: the choice of doing HW tasking constrains your design. In exchange you get nice coroutine-like mechanics for (almost) free.
That would be my answer to the OP: don't be deterred from studying TS, and form your own opinion.
A last thought concerning efficiency again: there's been a sort of chicken-and-egg problem there. Had mainstream OSes adopted HW task switches more enthusiastically, Intel engineers would undoubtedly have added dedicated caching hardware for TSSes to their CPU designs, like the dedicated caches for segmentation and paging (TLBs), etc. However, I think even OS/2 1.0 did not do much task switching, did it?
- Combuster
Re: Why don't we use the TSS for Task Switching?
Czernobyl wrote: "It goes without saying that, on the same hardware, HW task switching would have been an order of magnitude slower if it were emulated with software instructions."
Ahem. Let's take the 486:
Software: int@71 + iret@31, pusha@9 + popa@9, 5 data segments = 5 * (9 (load) + 3 (save)), save cr3@4 load cr3@4
= 188 cycles
Hardware task switch: 309 cycles.
That gives the software emulation 121 clock cycles to do other bits of administration.
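The arithmetic above, redone as a small C check so the grouping is explicit (each of the five data-segment registers costs a 9-cycle load plus a 3-cycle save). The cycle figures are simply the 486 numbers quoted in this post, not independently verified, and the constant and function names are made up:

```c
#include <assert.h>

/* 486 cycle counts as quoted in the post above. */
enum {
    INT_CYCLES = 71,   IRET_CYCLES = 31,
    PUSHA_CYCLES = 9,  POPA_CYCLES = 9,
    DATA_SEGMENTS = 5, SEG_LOAD_CYCLES = 9, SEG_SAVE_CYCLES = 3,
    CR3_SAVE_CYCLES = 4, CR3_LOAD_CYCLES = 4,
    HW_TASK_SWITCH_CYCLES = 309
};

/* Segment-register cost: 5 x (9 + 3), not 5 x 9 + 3. */
static_assert(DATA_SEGMENTS * (SEG_LOAD_CYCLES + SEG_SAVE_CYCLES) == 60,
              "segment save/restore cost");

/* Software path: entry/exit, general registers, segments and CR3. */
static int software_switch_cycles(void) {
    return INT_CYCLES + IRET_CYCLES
         + PUSHA_CYCLES + POPA_CYCLES
         + DATA_SEGMENTS * (SEG_LOAD_CYCLES + SEG_SAVE_CYCLES)
         + CR3_SAVE_CYCLES + CR3_LOAD_CYCLES;
}

/* Cycles left for other scheduler work before matching the HW switch. */
static int software_budget_cycles(void) {
    return HW_TASK_SWITCH_CYCLES - software_switch_cycles();
}
```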
Myth: busted.
Re: Why don't we use the TSS for Task Switching?
Combuster wrote: "Let's take the 486:
Software: int@71 + iret@31, pusha@9 + popa@9, 5 data segments = 5 * (9 (load) + 3 (save)), save cr3@4 load cr3@4
= 188 cycles
Hardware task switch: 309 cycles.
That gives the software emulation 121 clock cycles to do other bits of administration."
Switching TSSs has much more work to do than your software (non-)equivalent.
And those cycle counts mean nothing in practice: they do NOT take into account possible waiting for memory accesses, which may dominate. Those waits depend on the relative speed and physical organisation of main memory, motherboard cache (if any), chipset, wait states and so on; and, on the processor's side, on the pipeline, caching again, and so on.
On anything more modern than an 8086/8088 (if even that), the only practical way to do meaningful code timing is to take... precise measurements. The manual's instruction counts are bogus (rather, they are of no use for actual timing as soon as memory and/or external device access is involved). This is why Intel finally stopped publishing those instruction counts as annexes to their processor manuals, starting with the Pentium (TM).
What was your point, anyway ?