Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best OS!

rdos
Member
Member
Posts: 3307
Joined: Wed Oct 01, 2008 1:55 pm

Re: Best processor for 32-bit [rd]OS - a.k.a RDOS-OS is best

Post by rdos »

More results:

2-core Intel Atom (at 3GHz)
near: 24.4 million calls per second
gate: 2.4 million calls per second
syscall (load ss:esp): 1.3 million calls per second
syscall (don't load ss:esp): 3.8 million calls per second
syscall (load GS but not ss:esp): 3.4 million calls per second

Dual-core Atom is even more broken in its support for loading SS. When the SYSENTER interface is used in conjunction with loading ss:esp, performance drops to about half that of the call gate (1.3 vs. 2.4 million calls per second), while when ss:esp is not loaded, performance is about 60% higher than the call gate (3.8 vs. 2.4 million). General segment register loads are not very costly here either (loading GS only lowers the rate from 3.8 to 3.4 million); it is SS specifically that has a lousy implementation.

Just to prove that Brendan is wrong about SYSENTER/SYSEXIT being equal to a near call, I also tested removing the far return from the code. That results in 7.6 million calls per second, double the 3.8 million measured with the far return. It means that on Intel Atom, the far return takes about the same amount of time as the rest of the code.
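The arithmetic behind that conclusion can be checked with a small sketch. The helper names below are made up for illustration, and the only inputs are the Atom figures quoted in this post; this is not the benchmark code itself.

```c
#include <assert.h>

/* Nanoseconds per call for a rate given in million calls per second. */
static double ns_per_call(double mcalls_per_sec)
{
    assert(mcalls_per_sec > 0.0);
    return 1000.0 / mcalls_per_sec;
}

/* Approximate clock cycles per call at the given clock frequency in GHz. */
static double cycles_per_call(double mcalls_per_sec, double ghz)
{
    return ns_per_call(mcalls_per_sec) * ghz;
}

/* Time attributable to one step: the difference between the path that
   includes the step and the path that omits it. */
static double step_cost_ns(double mcalls_with, double mcalls_without)
{
    return ns_per_call(mcalls_with) - ns_per_call(mcalls_without);
}
```

With the Atom numbers, step_cost_ns(3.8, 7.6) gives about 131.6 ns for the far return, and ns_per_call(7.6) gives the same 131.6 ns for everything else, which is where "the same amount of time" comes from. At the stated 3 GHz, cycles_per_call(7.6, 3.0) works out to roughly 395 cycles.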
turdus
Member
Member
Posts: 496
Joined: Tue Feb 08, 2011 1:58 pm

Post by turdus »

Brendan wrote:The only difference between the "sysenter" and "alternative sysenter" method is that the former loads a different SS:ESP while the latter doesn't.
May I suggest reading the opinion of sysenter's creator on the subject: http://semipublic.comp-arch.net/wiki/SY ... ALL/SYSRET

Post by rdos »

Just for an interesting comparison, here are the results for the processor that has the fastest call gate implementation:

6-core AMD Phenom: (at 2.8 GHz)
near: 44.7 million calls per second
gate: 12.0 million calls per second
syscall (load ss:esp): 11.1 million calls per second
syscall (don't load ss:esp): 19.4 million calls per second
syscall (load GS but not ss:esp): 18.7 million calls per second
syscall (no far ret or ss:esp load): 29.8 million calls per second

Even on this processor, the trend is similar. Using SYSENTER and manually loading ss:esp is slower than a call gate on this processor as well (11.1 vs. 12.0 million calls per second). Loading a general segment register has a small impact, and the far return takes about half as long as the rest of the SYSENTER code.

I think it can be concluded that the primary segmentation issue on modern processors is changing the SS register. As little of the RDOS kernel-mode code manipulates the stack, it would be a good idea to use a flat SS selector in kernel mode on modern processors. That in itself, in conjunction with using SYSENTER/SYSEXIT, could provide a considerable speed-up of syscalls.

As long as the stack is not used for parameters and local variables, proper protection of the thread stack can be achieved like this:
1. Allocate 3 linear pages for the stack
2. Mark the lower and upper pages as invalid so that they page-fault when referenced in kernel
3. Set the initial ESP to base + 0x2000
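The three steps above can be sketched as plain address arithmetic. This is only an illustration assuming 4 KiB pages; the struct and function names are hypothetical and not taken from RDOS, and the actual page-table updates are left out.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u   /* assumption: 4 KiB pages */

/* Layout of a 3-page kernel stack with a guard page on each side. */
struct kstack_layout {
    uint32_t low_guard;    /* base of lower page, to be marked not present */
    uint32_t usable;       /* base of the single usable stack page */
    uint32_t high_guard;   /* base of upper page, to be marked not present */
    uint32_t initial_esp;  /* top of the usable page: base + 0x2000 */
};

/* Plan the layout for a page-aligned linear base address. */
static struct kstack_layout kstack_plan(uint32_t base)
{
    struct kstack_layout s;

    assert((base & (PAGE_SIZE - 1)) == 0);
    s.low_guard   = base;
    s.usable      = base + PAGE_SIZE;
    s.high_guard  = base + 2 * PAGE_SIZE;
    s.initial_esp = base + 2 * PAGE_SIZE;
    return s;
}

/* Returns 1 if an access at addr lands in one of the two guard pages. */
static int kstack_hits_guard(const struct kstack_layout *s, uint32_t addr)
{
    if (addr >= s->low_guard && addr - s->low_guard < PAGE_SIZE)
        return 1;
    if (addr >= s->high_guard && addr - s->high_guard < PAGE_SIZE)
        return 1;
    return 0;
}
```

The first push writes at initial_esp - 4, which is still inside the usable page. Running off the bottom of that page hits the lower guard, and popping past the top reads from base + 0x2000, which is the upper guard.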
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Post by Brendan »

Hi,
turdus wrote:
Brendan wrote:The only difference between the "sysenter" and "alternative sysenter" method is that the former loads a different SS:ESP while the latter doesn't.
May I suggest reading the opinion of sysenter's creator on the subject: http://semipublic.comp-arch.net/wiki/SY ... ALL/SYSRET
Please understand that RDOS's "sysenter method" is a lot of baggage that happens to use SYSENTER, and that RDOS's "alternative sysenter method" is slightly less baggage that also happens to use SYSENTER. At no point (including where RDOS calls it syscall) do any of these use or refer to the SYSCALL/SYSRET instructions.

As for which is better, an instruction that isn't supported can't be better than one that is supported. For RDOS (which is limited to 32-bit and 16-bit code) this gives the following cases:
  • "Pentium II or later" Intel CPU: SYSCALL isn't supported for 32-bit code (even if it's a modern CPU that supports SYSCALL in 64-bit code). SYSENTER is the only usable option.
  • Recent AMD CPU: Both SYSCALL and SYSENTER can be used for 32-bit code, and the difference between them will be negligible (especially when you add RDOS's baggage).
  • Less recent (32-bit only) AMD CPU: SYSENTER isn't supported. SYSCALL is the only option.
  • Older AMD CPUs and "Pentium Pro or earlier" Intel CPU: Neither SYSENTER nor SYSCALL is supported.
If you think about this, for 32-bit code (e.g. RDOS), SYSENTER is supported on a lot more CPUs than SYSCALL, and therefore SYSENTER is a lot more important than SYSCALL. Support for SYSCALL would help on the less recent (32-bit only) AMD CPUs (but it's easiest to pick the low-hanging fruit first). ;)
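A rough sketch of how the cases in that list could be told apart at boot, working on CPUID values that have already been read. The feature bits are the documented ones (SEP is CPUID.01H:EDX bit 11, SYSCALL is CPUID.80000001H:EDX bit 11), and the family/model/stepping test is the check Intel documents for the Pentium Pro, which sets the SEP bit without implementing the instruction. The struct, the function names, the is_intel flag and the preference order (SYSENTER first, then SYSCALL, then call gate) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum entry_method { ENTRY_CALL_GATE, ENTRY_SYSENTER, ENTRY_SYSCALL };

#define CPUID_1_EDX_SEP        (1u << 11)  /* CPUID.01H:EDX bit 11 */
#define CPUID_EXT1_EDX_SYSCALL (1u << 11)  /* CPUID.80000001H:EDX bit 11 */

/* Hypothetical container for CPUID results gathered elsewhere. */
struct cpu_info {
    int      is_intel;   /* vendor string was "GenuineIntel" */
    uint32_t leaf1_eax;  /* family/model/stepping from leaf 1 */
    uint32_t leaf1_edx;  /* feature flags from leaf 1 */
    uint32_t ext1_edx;   /* leaf 0x80000001 EDX, 0 if the leaf is absent */
};

static int has_sysenter(const struct cpu_info *c)
{
    uint32_t stepping = c->leaf1_eax & 0xF;
    uint32_t model    = (c->leaf1_eax >> 4) & 0xF;
    uint32_t family   = (c->leaf1_eax >> 8) & 0xF;

    if (!(c->leaf1_edx & CPUID_1_EDX_SEP))
        return 0;
    /* Pentium Pro reports SEP but does not support SYSENTER/SYSEXIT. */
    if (c->is_intel && family == 6 && model < 3 && stepping < 3)
        return 0;
    return 1;
}

/* SYSCALL from 32-bit code; treated as unusable on Intel CPUs. */
static int has_syscall_32(const struct cpu_info *c)
{
    return !c->is_intel && (c->ext1_edx & CPUID_EXT1_EDX_SYSCALL) != 0;
}

static enum entry_method pick_entry_method(const struct cpu_info *c)
{
    if (has_sysenter(c))
        return ENTRY_SYSENTER;
    if (has_syscall_32(c))
        return ENTRY_SYSCALL;
    return ENTRY_CALL_GATE;
}
```

Treating every non-Intel vendor as AMD-like is a simplification; a real kernel would check the vendor string properly before trusting the SYSCALL bit.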


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Post by Brendan »

Hi,
rdos wrote:I think that it can be concluded that the primary segmentation issue on modern processors is changing SS register. As little of RDOS kernel-mode code manipulates the stack, it would be a good idea to use a flat SS selector in kernel mode on modern processors. That in itself, in conjunction with using SYSENTER/SYSEXIT could provide a considerable speed-up of syscalls.
For all segment register loads the CPU needs to fetch data from the L1 data cache (or worse) to get to the GDT/LDT entry and then do protection checks. Accessing L1 cache alone probably costs about 12 cycles. For DS, ES, FS, GS segment loads the CPU can use things like out-of-order execution and register renaming to hide the performance problem; so these segment loads seem to suck less. For CS loads the CPU can't hide the performance problem - the CPU has to wait for the CS load to complete before it can fetch the next instruction. For SS loads I'd assume similar restrictions (e.g. all calls/returns/pushes/pops need to wait for the earlier segment load to complete).

Basically, all segment register loads suck, potentially including (e.g.) loading DS in code where all/most of the following instructions depend on DS, but in some cases the CPU can hide the suckage. Call gates suck twice as much (as the CPU has to fetch the gate descriptor before it can start fetching the code descriptor). Both SYSENTER and SYSCALL avoid the need to fetch data from the L1 data cache (or worse) and most of the protection checks; and therefore have far less impact on a typical CPU's out-of-order execution pipeline (it'd still cause a temporary blockage, but the blockage is cleared a lot sooner). The same would apply for SYSEXIT/SYSRET compared to "RETF".
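To make the "no descriptor fetch" point concrete, here is a simplified software model of what SYSENTER and SYSEXIT do to the register state in 32-bit mode, as I understand the documented behaviour: everything comes from three MSRs (IA32_SYSENTER_CS at 0x174, IA32_SYSENTER_ESP at 0x175, IA32_SYSENTER_EIP at 0x176) plus fixed offsets, and nothing is read from the GDT or LDT. The struct and function names are invented for this sketch, and flags, descriptor caches and fault behaviour are left out.

```c
#include <assert.h>
#include <stdint.h>

/* The three MSRs that drive SYSENTER/SYSEXIT. */
struct sysenter_msrs {
    uint16_t cs;   /* IA32_SYSENTER_CS  (MSR 0x174), low 16 bits */
    uint32_t esp;  /* IA32_SYSENTER_ESP (MSR 0x175) */
    uint32_t eip;  /* IA32_SYSENTER_EIP (MSR 0x176) */
};

/* Only the registers the two instructions replace. */
struct cpu_regs {
    uint16_t cs, ss;
    uint32_t esp, eip;
};

/* SYSENTER: all four values are derived from the MSRs. */
static void model_sysenter(struct cpu_regs *r, const struct sysenter_msrs *m)
{
    assert((m->cs & 0xFFFCu) != 0);       /* a null selector would fault */
    r->cs  = (uint16_t)(m->cs & 0xFFFCu); /* ring 0 code selector */
    r->ss  = (uint16_t)(r->cs + 8);       /* ring 0 stack is the next GDT slot */
    r->esp = m->esp;
    r->eip = m->eip;
}

/* SYSEXIT: selectors come from the MSR, ESP from ECX and EIP from EDX. */
static void model_sysexit(struct cpu_regs *r, const struct sysenter_msrs *m,
                          uint32_t ecx, uint32_t edx)
{
    r->cs  = (uint16_t)((m->cs + 16) | 3); /* ring 3 code selector */
    r->ss  = (uint16_t)((m->cs + 24) | 3); /* ring 3 stack selector */
    r->esp = ecx;
    r->eip = edx;
}
```

The model also shows why a kernel that wants a non-flat, per-thread SS has to reload ss:esp by hand after SYSENTER: the instruction always leaves SS at the fixed selector derived from the MSR.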


Cheers,

Brendan

Post by rdos »

Brendan wrote:If you think about this, for 32-bit code (e.g. RDOS), SYSENTER is supported on a lot more CPUs than SYSCALL, and therefore SYSENTER is a lot more important than SYSCALL.
I'd add that for older CPUs (for instance the AMD Geode or similar), there is absolutely no reason to use SYSENTER/SYSCALL, since at that time the performance of segmentation hadn't started to degrade. Even where such CPUs support SYSENTER or SYSCALL, using them would be pointless and slower. The AMD Geode does support SYSENTER, but it would be useless, since the call gate speed is excellent and SYSENTER would be slower even without an SS reload.

Post by turdus »

Brendan wrote:Please understand that RDOS's "sysenter method" is a lot of baggage that happens to use SYSENTER, and that RDOS's "alternative sysenter method" is slightly less baggage that also happens to use SYSENTER. At no point (including where RDOS calls it syscall) do any of these use or refer to the SYSCALL/SYSRET instructions.
It's not the exact instruction that's interesting, but the comparison of ways to enter kernel mode. One (SYSENTER) uses stack and segment manipulation (RDOS' "sysenter" method); the other (SYSCALL, like RDOS' "alternative sysenter" method) doesn't.

Post by rdos »

OK, so now the exception handling logic in kernel can handle 32-bit stacks, and so can the debugger and panic debugger, so now I should soon be able to carry on with switching to a 32-bit stack in kernel. :D

OTOH, I will probably not transition the kernel to a 32-bit code segment. It's just too much work for too little utility. That would once more break the exception handlers (segment register pushes have a different size in 32-bit mode, so I'd have to check each and every one of them to make sure nothing breaks). I might think about it if I have a lot of time and a working processor emulator with a debugger, as I think that would be required.

Post by rdos »

Now I can single-step the sysenter instruction. :D

Post by rdos »

A new winner:

2-core Intel Core Duo (at 3 GHz):
near: 51.6 million calls per second
gate: 13.4 million calls per second

This is the processor that has the fastest call gate performance, and it only lags slightly in near call performance (behind the i5).

Post by rdos »

A summary of the results (sorted by call gate performance):

2-core Intel Core Duo (at 3 GHz):
near: 51.6 million calls per second
gate: 13.4 million calls per second
sysenter: 10.5 million calls per second

6-core AMD Phenom (at 2.8 GHz):
near: 44.7 million calls per second
gate: 12.0 million calls per second
sysenter: 16.8 million calls per second

Intel i5 (at 2.9 GHz):
near: 56.2 million calls per second
gate: 7.2 million calls per second

2-core AMD Athlon I (at 1 GHz):
near: 19.7 million calls per second
gate: 7.0 million calls per second

Portable Intel Core Duo (at 2.13 GHz):
near: 35.4 million calls per second
gate: 6.7 million calls per second

AMD Geode (at 500 MHz):
near: 5.9 million calls per second
gate: 4.0 million calls per second

1-core AMD Athlon (at 1.2 GHz):
near: 15.3 million calls per second
gate: 3.8 million calls per second

1-core Intel Celeron (at 2.66 GHz):
near: 16.3 million calls per second
gate: 3.0 million calls per second

2-core AMD E-300 portable (at 1.2 GHz):
near: 20.0 million calls per second
gate: 2.7 million calls per second

2-core Intel Atom (at 3 GHz):
near: 24.4 million calls per second
gate: 2.4 million calls per second
sysenter: 3.6 million calls per second

Intel Celeron (at 400 MHz):
near: 5.8 million calls per second
gate: 1.8 million calls per second