Hi,
rdos wrote:Brendan wrote:For long mode; call gates must switch to 64-bit code; interrupts must switch to 64-bit code and instructions like SYSCALL/SYSENTER must switch to 64-bit code. The penalty of having stubs that switch back to 32-bit code is identical in all of these cases. It's a generic penalty (e.g. the penalty of not having a 64-bit kernel) that can't be avoided regardless of what you do (unless you have a 64-bit kernel).
They can be avoided with mode switches. If 32-bit applications run in protected mode and 64-bit applications run in long mode, both types of applications can use the fastest syscalls available on a particular processor and don't need the penalty of stubs (well, both SYSENTER and SYSCALL do have penalties for stubs, but those are unavoidable and part of the design).
If a 64-bit application running in long mode calls the kernel API and the kernel is running in protected mode, and you have to blow away all TLB entries (including any/all entries marked as "global" specifically to prevent unwanted/unnecessary TLB flushing); then in which way do 64-bit applications use the fastest syscalls available and avoid the penalty of stubs?
rdos wrote:Brendan wrote:Being able to switch video modes (after boot) is not essential, and no amount of V86 is going to help for modern UEFI systems anyway.
It does work on all the EFI/UEFI systems I've tested, but I haven't tested Macs and similar.
How many UEFI systems did you test it on?
To be clear; virtual 8086 mode should work and is supported on UEFI systems. However, there is no reason (other than ugly legacy hacks - see notes below) for UEFI firmware to ensure that the real mode IVT, any real mode BIOS functions, the real mode VBE interface, any other (protected mode) VBE interface or the video card's "ROM shadow" exist or are usable.
Note 1: We are currently in a type of transition period, as the industry adopts UEFI and abandons BIOS. At this time, it's likely that UEFI firmware on some systems actually does (internally) use the video card's ROM code that was designed for legacy/BIOS; simply because video card manufacturers may not have provided ROMs designed for UEFI in their video cards (yet). Even if the UEFI firmware itself does use VBE internally, this still doesn't mean that VBE is left in working order for anyone else (UEFI applications, UEFI boot loaders or UEFI OSs) to use, especially after "ExitBootServices()" is called. Basically, if VBE works on any UEFI system at all, then it only works due to luck and hacks, and it is not intentional or by design. It would be entirely foolish to rely on this behaviour.
Note 2: You may be able to "cheat" and extract the legacy ROM from the PCI card directly (e.g. by manipulating the video card's BARs in PCI configuration space to map the video card's ROM into the physical address space). This would bypass the problems mentioned in "note 1". Unfortunately this won't work reliably either. For a lot of systems with inbuilt video (especially laptops) the "video ROM" is actually built into the system ROM and isn't part of the video card's PCI device at all. Also, in the longer term, the legacy "VBE" ROM will cease to exist in any form at all (especially for systems with inbuilt video where there's less need for backward compatibility with PC BIOS). It would be entirely foolish to rely on this behaviour too.
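To make "note 2" a bit more concrete, here's a rough sketch (in C) of what checking an extracted ROM might look like. It assumes you've already copied the card's expansion ROM into a buffer somehow; the function name "find_rom_image" and the constants are made up for illustration, and the field offsets are the ones I'd expect from the PCI expansion ROM header and its "PCIR" data structure. The point is that even when a ROM is present, it may contain only an EFI image (code type 3) and no legacy x86 image (code type 0) at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Code type values from the "PCIR" data structure (names are illustrative). */
enum { ROM_CODE_TYPE_X86 = 0x00, ROM_CODE_TYPE_EFI = 0x03 };

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walk the chain of images in a copy of a PCI expansion ROM and return the
 * offset of the first image with the wanted code type, or -1 if there is no
 * such image (or the ROM looks malformed).
 */
long find_rom_image(const uint8_t *rom, size_t size, uint8_t wanted_type)
{
    size_t off = 0;

    assert(rom != NULL);
    while (off + 0x1A <= size) {
        const uint8_t *img = rom + off;
        const uint8_t *pcir;
        size_t pcir_off, len;

        if (img[0] != 0x55 || img[1] != 0xAA)   /* ROM header signature */
            return -1;
        pcir_off = read_le16(img + 0x18);       /* pointer to "PCIR" block */
        if (pcir_off == 0 || off + pcir_off + 0x18 > size)
            return -1;
        pcir = img + pcir_off;
        if (memcmp(pcir, "PCIR", 4) != 0)
            return -1;
        if (pcir[0x14] == wanted_type)          /* code type field */
            return (long)off;
        if (pcir[0x15] & 0x80)                  /* indicator: last image */
            return -1;
        len = (size_t)read_le16(pcir + 0x10) * 512;  /* image length */
        if (len == 0)
            return -1;
        off += len;
    }
    return -1;
}
```

If "find_rom_image(rom, size, ROM_CODE_TYPE_X86)" returns -1 there's no legacy image to run in virtual 8086 mode, which is exactly the situation described above.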
rdos wrote:Brendan wrote:Switching between long mode and protected mode means completely destroying TLBs and reloading almost everything (TSS, IDT, all segment registers, etc).
1. CR3 will always be reloaded when switching from a 32-bit application to a 64-bit application (or the reverse), because they will not use the same page tables (different applications), and thus the TLB flush is inevitable
You may be right; if the OS is crap and doesn't bother using the "global pages" feature to avoid unnecessary TLB flushes when CR3 is loaded, then switching CPU modes like this won't make the OS's "unnecessary TLB flushing" worse because the OS is already as bad as it possibly can be.
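For anyone who hasn't used global pages, here's a toy model (in C) of the difference. It is not real kernel code and every name in it is made up; it only models the rule that reloading CR3 (with CR4.PGE set) leaves TLB entries marked "global" alone, while changing CR0.PG (which you have to do to get in or out of long mode) discards everything, global or not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLB_SIZE 8

/* One cached translation; "global" mirrors the G bit in the page table entry. */
struct tlb_entry {
    bool valid;
    bool global;
    unsigned long vpn;   /* virtual page number */
};

struct tlb {
    struct tlb_entry e[TLB_SIZE];
};

void tlb_insert(struct tlb *t, size_t slot, unsigned long vpn, bool global)
{
    assert(slot < TLB_SIZE);
    t->e[slot].valid = true;
    t->e[slot].global = global;
    t->e[slot].vpn = vpn;
}

/* Models "mov cr3, ..." with CR4.PGE = 1: only non-global entries are lost. */
void tlb_reload_cr3(struct tlb *t)
{
    for (size_t i = 0; i < TLB_SIZE; i++)
        if (!t->e[i].global)
            t->e[i].valid = false;
}

/* Models changing CR0.PG during a mode switch: every entry is lost. */
void tlb_toggle_paging(struct tlb *t)
{
    for (size_t i = 0; i < TLB_SIZE; i++)
        t->e[i].valid = false;
}

size_t tlb_count_valid(const struct tlb *t)
{
    size_t n = 0;
    for (size_t i = 0; i < TLB_SIZE; i++)
        if (t->e[i].valid)
            n++;
    return n;
}
```

With kernel pages marked global, a normal address space switch keeps the kernel's translations; a switch between protected mode and long mode throws them away every time.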
rdos wrote:2. TR register is always reloaded with every thread-switch (per thread SS0 and IO-bitmaps)
OK - that's not well optimised either; so reloading the TSS during task switches doesn't make it worse.
rdos wrote:3. Segment registers will always be reloaded on thread switches.
OK, so the OS is already very bad at doing task switches (e.g. reloading segment registers during the task switch and not just reloading segment registers when you return to CPL=3); and because the OS is already bad it's hard to make it worse.
rdos wrote:Brendan wrote:Switching back again is equally expensive. For 64-bit applications; the total cost of this (including TLB misses, etc and not just the switch itself) is going to be several thousand cycles for every system call, IRQ and exception.
IRQs and syscalls will not switch mode. Only the scheduler will switch mode as it switches between a 32-bit and 64-bit process or the reverse.
Sigh.
First you ask about running a 32-bit kernel in long mode. I spend my time writing a (hopefully) useful reply, then I figure out that you're actually thinking of running a 32-bit kernel in protected mode (and only running 64-bit applications in long mode) and that I wasted my time.
Then I spend more of my time writing another (hopefully) useful reply, assuming that you want to run a 32-bit kernel in protected mode (and 64-bit applications in long mode).
Now I'm wondering if you actually want to run a 32-bit kernel in *both* protected mode and in long mode (and not just in protected mode); and I'm wondering if I've wasted my time again.
Of course I'm also starting to wonder if you're a schizophrenic crack addict.
rdos wrote:Brendan wrote:There are 2 main reasons for applications to use 64-bit. The first reason is that the application needs (or perhaps only benefits from rather than needing) the extra virtual address space. RDOS's 32-bit kernel probably won't be able to handle "greater than 4 GiB" virtual address spaces so he'll probably completely destroy this advantage.
Buffers in syscalls will need to be memmapped into the 32-bit address space. Other than that, 64-bit applications are free to use the entire address space with no penalties. Compared to the cost of syscalls, remapping buffers is a minor overhead.
So, you're planning to add buffer remapping and support for long mode paging to your "32-bit" kernel?
rdos wrote:Brendan wrote:The other reason for applications to use 64-bit is that the extra registers and the extra width of registers makes code run faster. RDOS will probably also completely destroy the performance advantages too.
Why? The application is free to use as many of the 64-bit registers it wants. The scheduler will need to save/restore additional state for 64-bit threads, but that overhead is required in any design.
It's impossible to save or restore a 64-bit process' state in 32-bit code; as 32-bit code can only access the low 32-bits of *half* the general purpose registers (and half of the 16 XMM registers, etc). To get around that you would have to do the state saving and state loading in 64-bit code. I thought you'd do it in stubs (e.g. saving the 64-bit state before passing control to the 32-bit kernel and restoring the 64-bit state afterwards, before returning to the 64-bit process), but now you're saying you won't need stubs.
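A toy illustration (in C) of what gets lost. This is only a model with made-up names, and it assumes anything 32-bit code can't see simply comes back as zero (on real hardware you shouldn't rely on any particular value). The 32-bit "save" only reaches the low halves of the first 8 general purpose registers, so anything in the upper halves or in R8 to R15 doesn't survive the round trip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_GPR64 16   /* RAX..RDI plus R8..R15 */
#define NUM_GPR32 8    /* EAX..EDI, all that 32-bit code can name */

struct state64 { uint64_t gpr[NUM_GPR64]; };
struct state32 { uint32_t gpr[NUM_GPR32]; };

/* What a save routine written as 32-bit code can capture. */
void save_in_32bit_code(const struct state64 *cpu, struct state32 *out)
{
    assert(cpu != NULL && out != NULL);
    for (size_t i = 0; i < NUM_GPR32; i++)
        out->gpr[i] = (uint32_t)cpu->gpr[i];   /* upper 32 bits dropped */
}

/* What the 64-bit process gets back (assumption: unseen state becomes 0). */
void restore_in_32bit_code(const struct state32 *in, struct state64 *cpu)
{
    assert(cpu != NULL && in != NULL);
    for (size_t i = 0; i < NUM_GPR64; i++)
        cpu->gpr[i] = (i < NUM_GPR32) ? in->gpr[i] : 0;
}

/* Number of registers whose value did not survive the round trip. */
size_t count_corrupted(const struct state64 *before, const struct state64 *after)
{
    size_t n = 0;
    for (size_t i = 0; i < NUM_GPR64; i++)
        if (before->gpr[i] != after->gpr[i])
            n++;
    return n;
}
```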
If you're making modifications to the memory management specifically to support 64-bit applications, and also making modifications to the scheduler's state saving/loading to support 64-bit applications; do you have a sane reason to bother with the 32-bit kernel at all, and would it be much better in the end to write a 64-bit kernel for 64-bit applications (and then provide a "compatibility layer" in that 64-bit kernel for your crusty old 32-bit processes)?
rdos wrote:Brendan wrote:There's 3 problems - NMIs (which you're dodgy enough to ignore),
Yes, I ignore them. I even setup NMI as a crash handler.
Brendan wrote:Machine check exceptions (which you're probably dodgy enough to have never supported anyway),
Exactly
Brendan wrote:and IRQ latency (e.g. having IRQs disabled for *ages* while you switch CPU modes). I'm guessing you're dodgy enough to ignore the IRQ latency problem too.
There is no larger IRQ latency involved in flushing the TLB with CR3 and with a change from PAE to IA32e or the reverse. Both flush the TLB. It requires a few more instructions to change mode, and reload CR0, CR3 and IDTR, but I suspect this time is minor compared to the effects of flushing TLB.
These are all just more cases of "My OS is so bad already that it's almost impossible to make it worse"...
rdos wrote:Brendan wrote:CPU designers have a tendency to assume that only new code will use new CPU modes. I doubt AMD expected anyone to want to switch from long mode back to protected mode often. To be honest, I see it as a curiosity with dubious practical applications myself. They intended for new 64-bit kernels (that are capable of supporting old 32-bit and 16-bit applications), not the other way around.
Erm. If they had avoided breaking existing modes the above would be logical, but this is not the case. I see the 32-bit segmented mode as the "super mode" of the processor, and 64-bit, 16-bit and V86 as modes that are better run as sub-modes.
I see my genitalia as a massive 14 inch meat sausage that is capable of breaking concrete. Unfortunately, sometimes reality is different to how I see things.
In the same way, reality is different to how you see things. For example; segmented mode is not a "super mode" - it's an obsolete piece of crud that no sane person has used for about 30 years (since OS/2), that was obsolete when it was first introduced. It only exists in modern CPUs because Intel never removes existing CPU features regardless of how badly everyone in the world (including Intel) wishes those features never existed. The (mostly theoretical) advantages of segmented mode have never justified the practical disadvantages, on any CPU or architecture, for any possible piece of software; and they never will.
The fact that you still think segmented mode is actually good makes me wonder how we (the OSdev community) have failed you; and if there's something we could do differently to educate severely misguided people better in future.
Cheers,
Brendan