Hardware Task switching

Ozguxxx · Post by **Ozguxxx** » Mon Jun 14, 2004 11:31 am

Hey, I use hardware task switching. I have about 30 tasks and I use 3 GDT descriptors in total: 1 for the kernel task, 1 for user tasks, 1 for the idle task. In fact, you could implement all of them with 1 GDT descriptor.

Brendan · Post by **Brendan** » Mon Jun 14, 2004 1:15 pm

Hi,

Ozgunh82 wrote: Hey, I use hardware task switching. I have about 30 tasks and I use 3 GDT descriptors in total: 1 for the kernel task, 1 for user tasks, 1 for the idle task. In fact, you could implement all of them with 1 GDT descriptor.

You'd need at least 2 GDT descriptors, wouldn't you? One for the task you're switching from and one for the task you're switching to...

Cheers,

Brendan

Ozguxxx · Post by **Ozguxxx** » Mon Jun 14, 2004 2:12 pm

I don't understand why we need 2 descriptors. Do you need a descriptor for the task that is currently running?

Brendan · Post by **Brendan** » Mon Jun 14, 2004 5:17 pm

Hi,

Ozgunh82 wrote: I don't understand why we need 2 descriptors. Do you need a descriptor for the task that is currently running?

Oops! You're right: the CPU caches the base address and limit of the current TSS (in the hidden part of the task register, TR).

Cheers,

Brendan

proxy · Post by **proxy** » Mon Jun 14, 2004 10:39 pm

So, just to get this straight: you could load a TSS into an entry in the GDT every time you switch, if you wanted? Is this terribly inefficient?

proxy

Candy · Post by **Candy** » Mon Jun 14, 2004 11:03 pm

proxy wrote: So, just to get this straight: you could load a TSS into an entry in the GDT every time you switch, if you wanted? Is this terribly inefficient?

proxy

AFAIK, the CPU doesn't cache the GDT anyway, so I guess not by much. Still, it's not fast either.
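To make the "re-point one GDT entry on every switch" idea concrete, here is a minimal C sketch, assuming a 32-bit protected-mode kernel. It lays out the 104-byte 32-bit TSS and encodes the 8-byte GDT entry for an available (type 0x9, not busy) 32-bit TSS. The names `struct tss32` and `make_tss_descriptor` are hypothetical, not taken from any particular kernel; writing the entry into the GDT and doing the far JMP to its selector are left out because they need inline assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit TSS layout (104 bytes); the _rN fields are reserved and must be 0. */
struct tss32 {
    uint16_t link, _r0;                 /* previous task selector (nested tasks) */
    uint32_t esp0; uint16_t ss0, _r1;   /* ring 0 stack */
    uint32_t esp1; uint16_t ss1, _r2;   /* ring 1 stack */
    uint32_t esp2; uint16_t ss2, _r3;   /* ring 2 stack */
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, _r4, cs, _r5, ss, _r6, ds, _r7, fs, _r8, gs, _r9;
    uint16_t ldtr, _r10;
    uint16_t trap;                      /* bit 0: debug trap on task switch */
    uint16_t iomap_base;                /* offset of the I/O permission bitmap */
};

_Static_assert(sizeof(struct tss32) == 104, "TSS must be 104 bytes");
_Static_assert(offsetof(struct tss32, cr3) == 0x1C, "CR3 image lives at 0x1C");

/*
 * Encode a GDT entry for an available 32-bit TSS: present, DPL 0,
 * S = 0 (system descriptor), type 1001b, byte-granular limit (G = 0).
 */
static uint64_t make_tss_descriptor(uint32_t base, uint32_t limit)
{
    assert(limit <= 0xFFFFFu);                       /* 20-bit limit with G = 0 */
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);                /* limit 15:0  -> bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;         /* base 23:0   -> bits 16..39 */
    d |= (uint64_t)0x89u << 40;                      /* access byte -> bits 40..47 */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;     /* limit 19:16 -> bits 48..51 */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;     /* base 31:24  -> bits 56..63 */
    return d;
}
```

In a kernel, the switch would overwrite the chosen GDT slot with something like `make_tss_descriptor((uint32_t)&next_tss, sizeof(struct tss32) - 1)` and then far-JMP to that slot's selector. One slot suffices because the outgoing task's state is saved through the TSS base the CPU has cached in TR, not through the GDT entry.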

Ozguxxx · Post by **Ozguxxx** » Tue Jun 15, 2004 2:04 am

Why is that terribly inefficient?

Brendan · Post by **Brendan** » Tue Jun 15, 2004 7:05 am

Hi,

Hardware task switching can be terribly inefficient in general. Loading a TSS descriptor into a GDT entry every time you switch tasks would add roughly 10 cycles to the task switch, which itself takes 300 cycles or more.

A software task switch could be up to 10 times faster, but this depends a lot on how segment registers are used and on whether CR3 is changed too.
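Working through these rough figures (they are estimates from this post, not measurements; $t_{hw} \ge 300$ and $t_{sw}$ denote the hardware and software switch costs in cycles):

```latex
\begin{aligned}
\text{GDT-reload overhead} &\approx \frac{10}{t_{hw}} \le \frac{10}{300} \approx 3.3\,\% \\
t_{sw} &\ge \frac{t_{hw}}{10} \ge \frac{300}{10} = 30 \text{ cycles}
\end{aligned}
```

So rewriting the GDT entry costs a few percent at most; the much larger difference comes from the hardware-versus-software choice itself.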

The "hardware task switching vs. software task switching" subject tends to come up fairly often, but I couldn't find it in the wiki. I'm going to have a go at adding a pile of information to the scheduling/context switch page (it'll probably take me an hour or so)...

Cheers,

Brendan

Pype.Clicker · Post by **Pype.Clicker** » Tue Jun 15, 2004 8:09 am

It's not in the wiki so far because whether you'll prefer hardware or software switching (or a hybrid) depends a lot on your design goals.

If you mainly have inter-process switches and those processes have their own I/O maps, for instance, hardware switching can outrun software switching, while for intra-process switches nothing can beat software switching (since the hardware doesn't know it shouldn't reload CR3 and flush the TLBs).
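A software switcher gets that intra-process advantage simply by comparing page-directory addresses before touching CR3. A minimal sketch, assuming each thread records the physical address of its process's page directory; `struct thread` and `needs_cr3_reload` are hypothetical names, and the actual `mov cr3` and stack switch would be done in assembly:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread context kept by a software task switcher. */
struct thread {
    uint32_t kernel_esp;   /* saved kernel stack pointer */
    uint32_t cr3;          /* physical address of the owning process's page directory */
};

/*
 * Threads of the same process share one page directory, so switching
 * between them can skip the CR3 write and keep the non-global TLB entries.
 */
static bool needs_cr3_reload(const struct thread *prev, const struct thread *next)
{
    assert(prev != NULL && next != NULL);
    return prev->cr3 != next->cr3;
}
```

A hardware task switch, by contrast, always loads CR3 from the new TSS, which is why the question of whether reloading an identical value flushes the TLBs matters for it.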

Also keep in mind that with software task switching you'll have to handle saving/restoring FPU registers yourself, while hardware task switching can assist you by raising an exception when you try to access FPU state after a switch has occurred (and thus a save/restore is needed), and by avoiding the save/restore entirely when switching from/to a task that doesn't use the FPU at all.

Brendan · Post by **Brendan** » Tue Jun 15, 2004 9:20 am

Hi,

Pype.Clicker wrote: It's not in the wiki so far because whether you'll prefer hardware or software switching (or a hybrid) depends a lot on your design goals.

I'm currently adding it; I hope you don't mind (feel free to remove or edit it if you do).

I feel it might be good to explain how each method could be used and what the problems and benefits of each are...

Pype.Clicker wrote: If you mainly have inter-process switches and those processes have their own I/O maps, for instance, hardware switching can outrun software switching, while for intra-process switches nothing can beat software switching (since the hardware doesn't know it shouldn't reload CR3 and flush the TLBs).

Also keep in mind that with software task switching you'll have to handle saving/restoring FPU registers yourself, while hardware task switching can assist you by raising an exception when you try to access FPU state after a switch has occurred (and thus a save/restore is needed), and by avoiding the save/restore entirely when switching from/to a task that doesn't use the FPU at all.

Points noted!

AFAIK, with hardware switching, modern CPUs (after the 486?) won't flush the TLBs if the new CR3 is the same as the old CR3.

Also, with software switching you can still use the hardware's help with FPU/MMX/SSE state saving: all you do is set the TS flag in CR0 (the same as the CPU would do on a hardware task switch). I'm also mentioning that this doesn't work so well on multi-processor systems, though.
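The "just set TS" approach can be modelled as a small state machine. This is a single-CPU simulation, not kernel code: `cr0_ts`, `fpu_owner` and the counters are hypothetical stand-ins, and the comments mark where a real kernel would use CLTS, FSAVE/FXSAVE and FRSTOR/FXRSTOR. The scheduler only sets TS; the #NM (device-not-available) handler does the save/restore, and only when a different task actually touches the FPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task { int id; };

static bool cr0_ts = false;            /* simulated CR0.TS flag */
static struct task *fpu_owner = NULL;  /* task whose state is loaded in the FPU */
static int fpu_saves = 0;              /* count of simulated FSAVE/FXSAVE */
static int fpu_restores = 0;           /* count of simulated FRSTOR/FXRSTOR */

/* Software task switch: do what a hardware task switch does, i.e. set TS. */
static void on_task_switch(void)
{
    cr0_ts = true;
}

/* Simulated #NM (device-not-available) handler. */
static void on_device_not_available(struct task *current)
{
    assert(current != NULL);
    cr0_ts = false;                    /* real kernel: CLTS */
    if (fpu_owner == current)
        return;                        /* its state is still in the FPU */
    if (fpu_owner != NULL)
        fpu_saves++;                   /* real kernel: save into fpu_owner's area */
    fpu_restores++;                    /* real kernel: load current's area (or FNINIT) */
    fpu_owner = current;
}

/* Simulated FPU/MMX/SSE instruction: traps with #NM while TS is set. */
static void use_fpu(struct task *current)
{
    if (cr0_ts)
        on_device_not_available(current);
}
```

On a multi-processor system the owner would have to be tracked per CPU, and a task whose FPU state is still live on one CPU cannot simply resume on another until that state is saved, which is the multi-processor weakness mentioned above.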

Thanks,

Brendan

Brendan · Post by **Brendan** » Tue Jun 15, 2004 9:48 am

Hi,

OK, I've finished messing about with the wiki.

The context switching page is: http://www.osdev.org/osfaq2/index.php/C ... 0Switching

If the moderators want to moderate it, or if anyone can think of something I've missed (or got wrong), feel free to let me know or make changes. One thing it's lacking is links to related material (I haven't figured that out yet).

Thanks,

Brendan

Pype.Clicker · Post by **Pype.Clicker** » Tue Jun 15, 2004 10:13 am

Brendan wrote: I'm currently adding it; I hope you don't mind (feel free to remove or edit it if you do).

I feel it might be good to explain how each method could be used and what the problems and benefits of each are...

No problem. It's nice to have it, and your explanation is neutral enough.

AFAIK, with hardware switching, modern CPUs (after the 486?) won't flush the TLBs if the new CR3 is the same as the old CR3.

I'd be interested in any evidence of such a thing. All the manuals I've read so far state the contrary (though not very clearly).

Pype.Clicker · Post by **Pype.Clicker** » Tue Jun 15, 2004 10:20 am

From the System Programming Manual, p. 101, order ID 245472-012:

All of the (non-global) TLBs are automatically invalidated any time the CR3 register is loaded (unless the G flag for a page or page-table entry is set, as described later in this section). The CR3 register can be loaded in either of two ways:
• Explicitly, using the MOV instruction, for example: MOV CR3, EAX where the EAX register contains an appropriate page-directory base address.
• Implicitly by executing a task switch, which automatically changes the contents of the CR3 register.

Brendan · Post by **Brendan** » Tue Jun 15, 2004 11:26 am

Hi,

OK, I'm not all that sure now.

I found this (from comp.lang.asm.x86, "Re: x86 architecture questions" posted by Jack Klein):

In the documentation for the first processor that supported paging, the 386, Intel specifically stated that loading CR3 (page directory register) did not flush the TLB if the new value was the same as the old one.

This statement was missing from the documentation for the 486, although it was verified to be true.

As for other, later processors, either consult the Intel documentation or contact Intel's technical support.

The only part of it I can actually verify is that "5.2.5 Page Translation Cache" from "INTEL 80386 PROGRAMMER'S REFERENCE MANUAL 1986" says:

The existence of the page-translation cache is invisible to applications programmers but not to systems programmers; operating-system programmers must flush the cache whenever the page tables are changed. The page-translation cache can be flushed by either of two methods:

1. By reloading CR3 with a MOV instruction; for example:

MOV CR3, EAX

2. By performing a task switch to a TSS that has a different CR3 image than the current TSS. (Refer to Chapter 7 for more information on task switching.)

Now, the latest manual says:

Implicitly by executing a task switch, which automatically changes the contents of the CR3 register.

Now this (to me) isn't very definitive. If the new value of CR3 is the same as the old value, has CR3 been "changed" or is it the same?

I've always assumed that the CPU manufacturers wouldn't flush the TLBs during a hardware task switch if CR3 is the same, as they have no reason (that I can think of) for doing so, and it could seriously affect performance when hardware task switching is used for multi-threading.

Assumptions aren't facts, though, so now that I've looked into it I'm not entirely sure...

Cheers,

Brendan

quaak · Post by **quaak** » Wed Jun 23, 2004 6:26 pm

Pype.Clicker wrote: It's not in the wiki so far because whether you'll prefer hardware or software switching (or a hybrid) depends a lot on your design goals.

If you mainly have inter-process switches and those processes have their own I/O maps, for instance, hardware switching can outrun software switching, while for intra-process switches nothing can beat software switching (since the hardware doesn't know it shouldn't reload CR3 and flush the TLBs).

Also keep in mind that with software task switching you'll have to handle saving/restoring FPU registers yourself, while hardware task switching can assist you by raising an exception when you try to access FPU state after a switch has occurred (and thus a save/restore is needed), and by avoiding the save/restore entirely when switching from/to a task that doesn't use the FPU at all.

I/O bitmap:

It's done via the virtual memory trick: for every address space, simply map that address space's I/O map at the same virtual address, right behind your single system TSS.
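The bitmap the CPU consults has one bit per port (8192 bytes for all 65536 ports); a set bit makes IN/OUT fault when CPL > IOPL, and the map must be followed by a byte with all bits set. The CPU finds it through the TSS's I/O map base field, so with the trick above each address space's own copy appears at that same virtual address and reloading CR3 swaps the bitmap too. A minimal sketch of the bitmap itself, with hypothetical names; the page mapping is not shown:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IO_PORTS    65536u
#define IOMAP_BYTES (IO_PORTS / 8u)   /* one bit per port: 8192 bytes */

/* Hypothetical per-address-space I/O permission bitmap. */
struct io_bitmap {
    uint8_t bits[IOMAP_BYTES + 1];    /* +1: trailing byte that must be 0xFF */
};

/* Start with every port denied (bit set = access faults with #GP). */
static void iomap_deny_all(struct io_bitmap *m)
{
    memset(m->bits, 0xFF, sizeof m->bits);
}

/* Grant access to one port by clearing its bit. */
static void iomap_allow(struct io_bitmap *m, uint16_t port)
{
    m->bits[port / 8u] &= (uint8_t)~(1u << (port % 8u));
}

/*
 * Mirror of the CPU's check for an IN/OUT of `width` bytes at `port`:
 * every bit covered by the access must be clear.
 */
static bool iomap_allowed(const struct io_bitmap *m, uint32_t port, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);
    for (unsigned i = 0; i < width; i++) {
        uint32_t p = port + i;
        if (p >= IO_PORTS)
            return false;             /* would hit the 0xFF terminator byte */
        if (m->bits[p / 8u] & (1u << (p % 8u)))
            return false;
    }
    return true;
}
```

In the TSS, `iomap_base` would hold the offset from the TSS base to where this bitmap is mapped, and the TSS limit must cover the bitmap plus its terminator byte.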

FPU:
The same goes for software task switching: let the CPU generate an exception when the FPU is accessed. Then you can also lazily save the FPU state with software task switching.

OSDev.org