all processor architectures require some structure to store this information in, so basically you are criticizing them for what they called it?
you must have a structure with this information in it, if they had called it the YGG structure, you would like it, but because they called it a TSS you don't? -- this doesn't make sense to me
the TSS is simply a structure containing some required values:
RSP0
RSP1
RSP2
IST1
IST2
IST3
IST4
IST5
IST6
IST7
and the I/O permission map
other than a few RESERVED entries (appropriately located for expansion), that is all it is!
you cannot use ring transitions without this information! so that is why the structure must be present
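For reference, here is a minimal C sketch of the long mode TSS described above. The field names are my own; only the offsets come from the architecture manuals, and `__attribute__((packed))` assumes GCC or Clang. The manuals number the IST slots 1 through 7, and the 8 bytes in front of IST1 are reserved (an IST index of 0 in an IDT entry means "don't use the IST").

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Long mode TSS. Field names are illustrative, offsets are architectural. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* RSP0..RSP2: stack pointers for CPL 0..2 */
    uint64_t reserved1;   /* reserved slot in front of IST1 */
    uint64_t ist[7];      /* IST1..IST7 stored in ist[0]..ist[6] */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;  /* offset of the I/O permission map from the TSS base */
} __attribute__((packed));

/* Compile-time checks that the layout matches the documented offsets. */
static_assert(offsetof(struct tss64, rsp) == 0x04, "RSP0 must be at offset 0x04");
static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 must be at offset 0x24");
static_assert(offsetof(struct tss64, iomap_base) == 0x66, "I/O map base at 0x66");
static_assert(sizeof(struct tss64) == 0x68, "the TSS proper is 104 bytes");
```

Apart from the reserved fields, that really is the whole structure: three stack pointers, seven IST pointers and the I/O map offset.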
Alternative multitasking method; which is faster
Re:Alternative multitasking method; which is faster
Hi,
JAAman wrote: all processor architectures require some structure to store this information in, so basically you are criticizing them for what they called it?
Some architectures don't need this information, and there are other (cleaner) ways it could have been implemented.
I wouldn't blame AMD for keeping the "less clean" parts though - the CPU needs to support 32 bit protected mode (for marketing reasons), so designing long mode to re-use the silicon needed by protected mode makes perfect sense.
JAAman wrote: you cannot use ring transitions without this information! so that is why the structure must be present
Yes you can - have a look at SYSCALL (the CPL=3 stack is used even though the CPU changes to CPL=0, and none of the information in the TSS is used at all).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re:Alternative multitasking method; which is faster
oh, so you mean to run all your ISRs in ring3?
...or disable them altogether
unless there is a way i'm not familiar with, you cannot use syscall for ISRs
i cannot think of any 'cleaner' way it could possibly be implemented -- it's just a list of addresses, what about that isn't 'clean' -- it isn't even close to any relation of the PMode TSS (and please don't make my kernel use the user stack!!)
Re:Alternative multitasking method; which is faster
Hi,
JAAman wrote: oh, so you mean to run all your ISRs in ring3?
...or disable them altogether
You stated that "you cannot use ring transitions without this information!". I merely showed that not only was a ring transition possible without this information, but that it's "standard practice" for the SYSCALL instruction.
JAAman wrote: unless there is a way i'm not familiar with, you cannot use syscall for ISRs
Of course - I didn't say all ring transitions can currently be done without the TSS.
JAAman wrote: i cannot think of any 'cleaner' way it could possibly be implemented -- it's just a list of addresses, what about that isn't 'clean' -- it isn't even close to any relation of the PMode TSS (and please don't make my kernel use the user stack!!)
Have you considered what happens when an IRQ or exception occurs while the CPU is running at CPL=3? For a 64 bit OS, it goes something like this:
- potential cache miss caused by TLB lookup for the page containing the IDT entry
- potential cache miss for reading the IDT entry
- get the IDT entry
- potential cache miss caused by TLB lookup for the page containing the TSS
- potential cache miss for reading RSP0 or an IST entry in the TSS
- get the address of the kernel stack
- potential cache miss caused by TLB lookup for the page of the kernel stack
- potential cache miss for the first "push" to the kernel stack
- push return values on the kernel stack
- potential cache miss caused by TLB lookup for the page containing the interrupt handler
- potential cache miss for the interrupt handler's first instruction
- start executing the interrupt handler
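The "get the address of the kernel stack" step in the list above can be sketched roughly like this. It's only a model of the documented long mode behaviour, not real kernel code; `struct tss_stacks` and `select_interrupt_stack` are names I made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of just the stack fields the CPU reads from the TSS. */
struct tss_stacks {
    uint64_t rsp[3];   /* RSP0..RSP2 */
    uint64_t ist[7];   /* IST1..IST7 stored in ist[0]..ist[6] */
};

/* Model of which stack pointer is loaded when an interrupt is delivered.
 * ist_index   : 3-bit IST field of the IDT entry (0 = don't use the IST)
 * old_cpl     : privilege level that was interrupted
 * new_cpl     : privilege level of the handler
 * current_rsp : RSP at the moment of the interrupt */
static uint64_t select_interrupt_stack(const struct tss_stacks *tss,
                                       unsigned ist_index,
                                       unsigned old_cpl, unsigned new_cpl,
                                       uint64_t current_rsp)
{
    assert(ist_index <= 7 && old_cpl <= 3 && new_cpl <= old_cpl);

    uint64_t rsp;
    if (ist_index != 0)
        rsp = tss->ist[ist_index - 1];   /* IST entry: read from the TSS */
    else if (new_cpl < old_cpl)
        rsp = tss->rsp[new_cpl];         /* privilege change: read from the TSS */
    else
        rsp = current_rsp;               /* same privilege: TSS not consulted */

    /* In long mode the stack is aligned to 16 bytes before the frame is pushed. */
    return rsp & ~(uint64_t)0xF;
}
```

Only the first two branches touch the TSS, which is where the two TSS-related potential misses in the list come from.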
Now consider SYSCALL (where the stack isn't changed, and everything else comes from MSRs):
- save the return RIP in RCX and RFLAGS in R11 (nothing is pushed; the stack isn't touched)
- potential cache miss caused by TLB lookup for the page containing the SYSCALL handler
- potential cache miss for the SYSCALL's first instruction
- start executing the SYSCALL handler
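To make the "everything else comes from MSRs" part concrete, here's a rough C model of what SYSCALL does to the register state in long mode. The MSR numbers are the architectural ones; the structs and function names are hypothetical, and this only models the behaviour rather than executing the instruction. Per the manuals, SYSCALL itself pushes nothing: the return RIP goes to RCX and the old RFLAGS to R11.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_STAR   0xC0000081u  /* segment selector bases for SYSCALL/SYSRET */
#define MSR_LSTAR  0xC0000082u  /* 64 bit SYSCALL entry point */
#define MSR_FMASK  0xC0000084u  /* RFLAGS bits to clear on entry */

/* Hypothetical containers for the state this sketch cares about. */
struct cpu_state    { uint64_t rip, rsp, rflags, rcx, r11; uint16_t cs, ss; };
struct syscall_msrs { uint64_t star, lstar, fmask; };

/* Build a STAR value: bits 47:32 = SYSCALL CS base, bits 63:48 = SYSRET base. */
static uint64_t make_star(uint16_t syscall_cs, uint16_t sysret_base)
{
    assert((syscall_cs & 3) == 0);   /* kernel selector should have RPL 0 */
    return ((uint64_t)sysret_base << 48) | ((uint64_t)syscall_cs << 32);
}

/* Model of SYSCALL: no IDT, no TSS and no stack access at all. */
static void model_syscall(struct cpu_state *c, const struct syscall_msrs *m,
                          uint64_t next_rip)
{
    c->rcx    = next_rip;            /* return address saved in RCX */
    c->r11    = c->rflags;           /* old flags saved in R11 */
    c->rip    = m->lstar;            /* entry point from LSTAR */
    c->rflags &= ~m->fmask;          /* clear the flags selected by FMASK */
    c->cs     = (uint16_t)((m->star >> 32) & 0xFFFC);
    c->ss     = (uint16_t)(((m->star >> 32) & 0xFFFF) + 8);
    /* c->rsp is deliberately left alone: the kernel starts on the CPL=3 stack */
}
```

A kernel that wants its own stack has to switch to it in the first few instructions of the SYSCALL handler, since the CPU won't do it.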
By shifting RSP0, RSP1, RSP2 and the IST out of the TSS and using MSRs instead it would prevent 2 potential cache misses for every interrupt. By shifting the IDT into the CPU it'd prevent another 2 potential cache misses. The "worst case" overhead of an interrupt could be halved (from 8 potential misses in the list above to 4).
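The arithmetic behind "halved" can be written down as a tiny tally. This is just a counting sketch under the assumption used in the lists above, namely that every memory structure touched costs one potential TLB-related miss plus one potential miss on the data itself; `potential_misses` is a made-up helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Each structure in memory costs 2 potential misses (TLB lookup + the access).
 * The handler's code is always fetched from memory, so it always counts. */
static int potential_misses(bool idt_in_memory, bool stack_ptr_in_tss,
                            bool writes_new_stack)
{
    int structures = 1;                    /* the handler's first instruction */
    structures += idt_in_memory ? 1 : 0;   /* IDT entry */
    structures += stack_ptr_in_tss ? 1 : 0;/* RSPn or IST entry in the TSS */
    structures += writes_new_stack ? 1 : 0;/* first push on the kernel stack */
    return 2 * structures;
}
```

With this, `potential_misses(true, true, true)` is the interrupt path as it is today, `potential_misses(true, false, true)` is the stack pointers moved to MSRs, `potential_misses(false, false, true)` adds an on-chip IDT, and `potential_misses(false, false, false)` is the SYSCALL path as counted above.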
Basically (IMHO), when the 80386 was designed (i.e. when 32 bit protected mode was designed) there wasn't a large difference between RAM speed and CPU speed (AFAIK they both ran at the same speed, no caches were needed and there wasn't any cache miss or RAM access penalties). Things have changed - RAM speed didn't keep up with CPU speed, and the design of 32 bit protected mode (including the TSS, which has been recycled for long mode) isn't really suitable for modern CPUs because of this.
Cheers,
Brendan