Re:Alternative multitasking method; which is faster

Posted: Tue Feb 28, 2006 8:51 am
by JAAman
all processor architectures require some structure to store this information in, so basically you are criticizing them for what they called it?

you must have a structure with this information in it. if they had called it the YGG structure, you would like it, but because they called it a TSS you don't? -- this doesn't make sense to me

the TSS is simply a structure containing some required values:
RSP0
RSP1
RSP2

IST1
IST2
IST3
IST4
IST5
IST6
IST7

and the I/O permission map

other than a few RESERVED entries (appropriately located for expansion), that is all it is!
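
For reference, the list above can be sketched as a C struct. This is only a minimal sketch, assuming the 64-bit (long mode) TSS layout described in the AMD64 and Intel manuals and a GCC/Clang-style packed attribute. The struct and field names (tss64, iomap_base and so on) are made up for illustration. Note that the hardware numbers the IST slots 1 to 7; an IST value of 0 in an IDT gate means "don't use the IST", and the slot where an IST0 would sit is reserved.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical names; layout assumed from the long mode TSS format. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* RSP0..RSP2: stack pointers for CPL 0..2 */
    uint64_t reserved1;   /* where an "IST0" would be; not used */
    uint64_t ist[7];      /* IST1..IST7, stored at ist[0]..ist[6] */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;  /* offset from the TSS base to the I/O permission bitmap */
} __attribute__((packed));

/* Compile-time checks that the sketch matches the assumed offsets. */
static_assert(sizeof(struct tss64) == 104, "long mode TSS is 104 bytes");
static_assert(offsetof(struct tss64, rsp) == 0x04, "RSP0 at offset 0x04");
static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 at offset 0x24");
static_assert(offsetof(struct tss64, iomap_base) == 0x66, "I/O map base at 0x66");
```

The static_assert lines are there so that a mistake in the field order or padding shows up at compile time rather than as a triple fault.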

you cannot use ring transitions without this information! so that is why the structure must be present

Re:Alternative multitasking method; which is faster

Posted: Tue Feb 28, 2006 12:07 pm
by Brendan
Hi,
JAAman wrote:all processor architectures require some structure to store this information in, so basically you are criticizing them for what they called it?
Some architectures don't need this information, and there are other (cleaner) ways it could have been implemented.

I wouldn't blame AMD for keeping the "less clean" parts though - the CPU needs to support 32-bit protected mode (for marketing reasons), so designing long mode to re-use the silicon needed by protected mode makes perfect sense.
JAAman wrote:you cannot use ring transitions without this information! so that is why the structure must be present
Yes you can - have a look at SYSCALL (the CPL=3 stack is used even though the CPU changes to CPL=0, and none of the information in the TSS is used at all).


Cheers,

Brendan

Re:Alternative multitasking method; which is faster

Posted: Wed Mar 01, 2006 3:54 pm
by JAAman
oh, so you mean to run all your ISRs in ring3?
...or disable them altogether

unless there is a way i'm not familiar with, you cannot use syscall for ISRs


i cannot think of any 'cleaner' way it could possibly be implemented -- it's just a list of addresses, what about that isn't 'clean'? -- it isn't even closely related to the PMode TSS (and please don't make my kernel use the user stack!!)

Re:Alternative multitasking method; which is faster

Posted: Wed Mar 01, 2006 10:09 pm
by Brendan
Hi,
JAAman wrote:oh, so you mean to run all your ISRs in ring3?
...or disable them altogether
You stated that "you cannot use ring transitions without this information!". I merely showed that not only was a ring transition possible without this information, but that it's "standard practice" for the SYSCALL instruction.
JAAman wrote:unless there is a way i'm not familiar with, you cannot use syscall for ISRs
Of course - I didn't say all ring transitions can currently be done without the TSS.
JAAman wrote:i cannot think of any 'cleaner' way it could possibly be implemented -- it's just a list of addresses, what about that isn't 'clean'? -- it isn't even closely related to the PMode TSS (and please don't make my kernel use the user stack!!)
Have you considered what happens when an IRQ or exception occurs while the CPU is running at CPL=3? For a 64 bit OS, it goes something like this:
    - potential cache miss caused by TLB lookup for the page containing the IDT entry
    - potential cache miss for reading the IDT entry
    - get the IDT entry
    - potential cache miss caused by TLB lookup for the page containing the TSS
    - potential cache miss for reading RSP0 or an IST entry in the TSS
    - get the address of the kernel stack
    - potential cache miss caused by TLB lookup for the page of the kernel stack
    - potential cache miss for the first "push" to the kernel stack
    - push return values on the kernel stack
    - potential cache miss caused by TLB lookup for the page containing the interrupt handler
    - potential cache miss for the interrupt handler's first instruction
    - start executing the interrupt handler
For a 32 bit OS it's worse (you can add potential cache misses for GDT and/or LDT lookup to the list).
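
The "get the address of the kernel stack" step in the list above can be sketched in C. This is a rough model under stated assumptions, not a description of the microcode: the struct tss_stacks type and the select_interrupt_rsp function are hypothetical names, the struct holds only the stack pointers rather than the real TSS layout, and the 16-byte stack alignment that long mode applies is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model, NOT the hardware layout: only the stack pointers. */
struct tss_stacks {
    uint64_t rsp[3];   /* RSP0..RSP2 */
    uint64_t ist[7];   /* IST1..IST7, stored at ist[0]..ist[6] */
};

/*
 * Which stack pointer does a long mode interrupt or exception start on?
 * gate_ist is the 3-bit IST field of the IDT gate (0 means "no IST").
 */
static uint64_t select_interrupt_rsp(const struct tss_stacks *tss,
                                     unsigned gate_ist,
                                     unsigned old_cpl, unsigned new_cpl,
                                     uint64_t current_rsp)
{
    assert(gate_ist <= 7 && old_cpl <= 3 && new_cpl <= old_cpl);

    if (gate_ist != 0)
        return tss->ist[gate_ist - 1];  /* TSS read: IST entry */
    if (new_cpl < old_cpl)
        return tss->rsp[new_cpl];       /* TSS read: RSP0..RSP2 */
    return current_rsp;                 /* same privilege: TSS not read */
}
```

Both of the first two branches are memory reads from the TSS, which is where the two extra potential cache misses in the list come from. Only an interrupt that arrives at the same privilege level with no IST entry avoids them.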

Now consider SYSCALL (where the stack isn't changed, and everything else comes from MSRs):
    - save the return RIP in RCX and RFLAGS in R11 (nothing is pushed on the stack)
    - potential cache miss caused by TLB lookup for the page containing the SYSCALL handler
    - potential cache miss for the SYSCALL handler's first instruction
    - start executing the SYSCALL handler
Due to the huge difference between RAM speed and CPU speed, these potential cache misses are expensive, which is (IMHO) why SYSCALL has been implemented like it has and why it's so much faster.
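
To make "everything else comes from MSRs" a bit more concrete, here is a minimal sketch of the values involved. The MSR numbers are the ones given in the AMD64 documentation. The make_star helper and the selector values in the usage note are hypothetical, and actually writing the MSRs needs WRMSR at CPL=0, which is left out here.

```c
#include <assert.h>
#include <stdint.h>

/* MSR numbers as documented for AMD64. */
#define MSR_EFER   0xC0000080u  /* bit 0 (SCE) enables SYSCALL/SYSRET */
#define MSR_STAR   0xC0000081u  /* segment selector bases */
#define MSR_LSTAR  0xC0000082u  /* RIP of the 64-bit SYSCALL handler */
#define MSR_SFMASK 0xC0000084u  /* RFLAGS bits to clear on entry */

/*
 * Build the STAR value (hypothetical helper).
 * Bits 47:32 hold the kernel CS used by SYSCALL (SS is that value + 8).
 * Bits 63:48 hold the base selector used by SYSRET.
 * The low 32 bits (legacy mode target EIP) are left as zero.
 */
static uint64_t make_star(uint16_t syscall_cs, uint16_t sysret_base)
{
    return ((uint64_t)sysret_base << 48) | ((uint64_t)syscall_cs << 32);
}
```

With an assumed GDT where the kernel code selector is 0x08 and the SYSRET base is 0x1B, make_star(0x08, 0x1B) gives 0x001B000800000000. Once the MSRs are loaded, SYSCALL reads the handler address and selectors from registers inside the CPU, with no IDT or TSS access.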

Shifting RSP0, RSP1, RSP2 and the IST out of the TSS and into MSRs would prevent 2 potential cache misses for every interrupt. Shifting the IDT into the CPU would prevent another 2 potential cache misses. The "worst case" overhead of an interrupt could be halved.
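
The arithmetic behind "halved" can be checked by simply counting the potential misses in the two lists above. The sketch below is only a tally of those list entries (one potential miss for the TLB lookup plus one for the access itself, per structure touched). It says nothing about real latencies, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum source { SRC_IDT, SRC_TSS, SRC_STACK, SRC_HANDLER };

/* Each structure touched costs up to 2 misses: TLB lookup + the access. */
struct touch {
    enum source src;
    int potential_misses;
};

static const struct touch interrupt_path[] = {
    { SRC_IDT, 2 }, { SRC_TSS, 2 }, { SRC_STACK, 2 }, { SRC_HANDLER, 2 },
};

static const struct touch syscall_path[] = {
    { SRC_HANDLER, 2 },
};

/* Sum the potential misses, skipping sources whose bit is set in skip_mask. */
static int worst_case_misses(const struct touch *path, size_t n,
                             unsigned skip_mask)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        if (!(skip_mask & (1u << path[i].src)))
            total += path[i].potential_misses;
    return total;
}
```

Called on interrupt_path with no mask this gives 8; masking out SRC_TSS and SRC_IDT (the "move them into the CPU" case) gives 4, and syscall_path gives 2.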

Basically (IMHO), when the 80386 was designed (i.e. when 32-bit protected mode was designed) there wasn't a large difference between RAM speed and CPU speed (AFAIK they both ran at the same speed, no caches were needed and there weren't any cache miss or RAM access penalties). Things have changed - RAM speed didn't keep up with CPU speed, and the design of 32-bit protected mode (including the TSS, which has been recycled for long mode) isn't really suitable for modern CPUs because of this.


Cheers,

Brendan