Intel propose simplified X86-S

Octocontrabass · Post by **Octocontrabass** » Mon May 22, 2023 5:35 pm

sounds wrote:Octocontrabass, I appreciate all you contribute, so can you please clarify what you think "many modern PCs no longer include a CSM" entails? Like, which products are you seeing?

I've seen servers that outright don't include a CSM. I'm not sure if I've seen any typical PCs without a CSM yet, but I have seen ones where the CSM didn't work - usually because the display adapter or NVMe boot drive didn't include a legacy option ROM.

sounds wrote:nvidia still relies on Boot Mode = Legacy/CSM:

Only to work around a bug in their UEFI option ROM (or a bug in some motherboard UEFI implementations). The UEFI option ROM is present, it just doesn't correctly initialize the display.

sounds wrote:MSI shipped with Boot Mode = Legacy/CSM as the default in 2021 -- a later update changed that, causing tons of support issues:

That's not really a UEFI problem though, that's a MSI-made-a-bad-choice problem.

sounds wrote:So, if there's a BIOS bug and the CPU doesn't respond to the ACPI wakeup request... What will your OS do then?

Hang. That's probably the same thing Windows will do, which means that kind of BIOS bug is extremely unlikely.

sounds wrote:What if it's a chromebook and doesn't support the ACPI wakeup mailbox?

Google will either come up with some other firmware interface as a substitute for ACPI or point the IA32_SIPI_ENTRY_STRUCT_PTR MSR at RAM so the OS can control what the APs do in response to SIPI. (It's entirely possible that UEFI will also point the IA32_SIPI_ENTRY_STRUCT_PTR MSR at RAM. Intel hasn't explained how they expect it to be used, just how it will function.)

rdos wrote:The effective address will need to consider segment register bases (since FS & GS still can have non-zero bases), and so this cannot be removed.

No, but it can be moved entirely to microcode. The segment base will always be zero without a FS or GS prefix, which means the logic to check the segment base no longer needs to run at all for most instructions and therefore no longer needs to be extremely fast.

rdos wrote:It's possible that 16-bit addressing could be removed, but OTOH, there might be compilers that use 16-bit addressing with FS & GS, and so this could break things.

Probably not. Even if you might save one byte using 16-bit addressing, the resulting stall in the instruction decoder usually isn't worth it.

BigBuda · Post by **BigBuda** » Mon May 22, 2023 6:35 pm

My two cents:

I've been hoping for this to happen for a long time. One of the things holding x86 back, in my opinion, is how much it's tied to legacy and to backwards compatibility. My understanding is that this implies silicon IP that is rarely used nowadays, as most computers, except for edge cases (one of them being hobby OS development), now go to long mode and never leave it until they're turned off. Manufacturing costs would go down and yields would go up, and that's good (if it gets through the chain to us).

Octocontrabass is right, there are a lot of computers that do not allow booting in any other mode than UEFI. These are mainly server-class systems or budget laptops with only iGPUs that don't even have any useful options in the BIOS setup, but they do exist in substantial quantity (example: Lenovo IdeaPad 1 - I finished installing one of those just before commenting here, no CSM mode in sight, though I also didn't bother looking for one, as booting the live CD and installing Manjaro was flawless).

As for the other differences between x86 and x86-S, the adaptation of most mainstream OSs, if needed, would be trivial, as they mostly don't use the features proposed to be dropped or don't care about them once they're in long mode, which is the de facto statistical standard. Windows, Linux, probably the BSDs will have little to no trouble adapting to it and I suspect they'll probably welcome the change. Linux doesn't need the legacy modes, it can be booted straight from long mode, either with or without Grub (by using the stub loader - https://tecporto.pt/wiki/index.php/Boot ... bootloader). And, when used, Grub can be started from 64 bit UEFI as well. Windows the same.

Also, I'd bet x86-S CPUs and legacy x86 capable CPUs would co-exist for a while (different SKUs) because legacy support will still be required for a not insignificant timeframe and Intel knows virtualization only goes so far.

The biggest implication of this change, in my opinion, would likely be to niche cases including the hobby OS world or systems requiring backwards compatibility.

rdos · Post by **rdos** » Wed May 24, 2023 4:15 am

It's not so much how machines are booted, using CSM or UEFI, but the incompatibility of the CPU with older software that is the problem. I mean, I can boot using 64-bit EFI and the loader will turn off long mode & paging and set up protected mode. Even with 64-bit Windows, you never know which 32-bit applications will stop working because something in protected mode is broken.

I also very much question if these changes are for getting rid of old systems, or actually will improve anything in the cost & speed of the processor. Having an extra adder for the effective address hardly causes any significant difference in complexity or transistor counts. Intel already announced that using non-zero bases for descriptors would cause extra cycles, which is fine. What is not fine is breaking protected mode by ignoring base & limits of selectors.

Maybe I could wish for a new x86-P version that does not support long mode and that doesn't support virtualization? I'm pretty sure that would be a less complex processor that could run protected mode faster, have considerably lower transistor counts and probably a higher yield. After all, the CPU with the best protected mode performance is a bit back in time. It's probably an AMD Athlon or something. After that, performance has steadily decreased, even with higher clock speeds.

BigBuda · Post by **BigBuda** » Wed May 24, 2023 5:11 am

Well, I guess those applications will either adapt, or be condemned to run in a virtual machine?

But, in any case, we can't assume that x86 will exist forever. New architectures are emerging, others have been here for years and are gaining a lot of traction because of the momentum gathered from taking over other markets. Without serious improvement, x86 will eventually fade because developers will get tired of supporting it when faced with better alternatives.

Developing OSs for x86 has been a mess for decades and it's only gotten worse. It's a snowball of band-aids. A fix on a fix on a fix on a fix on a fix...

rdos · Post by **rdos** » Wed May 24, 2023 8:34 am

BigBuda wrote:Well, I guess those applications will either adapt, or be condemned to run in a virtual machine?

They won't adapt since they are no longer actively developed. It's end users that will find that some of the applications they got used to, or have to use for various reasons, no longer work on their X86-S computer.

The reason x86 is still in use is that it is backwards compatible. AMD broke virtual 86 mode within long mode, but they didn't break virtual 86 mode within protected mode, neither did they break anything else. Basically, long mode was a pure addition that broke nothing. What Intel is proposing now is not only to break real mode, virtual 86 mode and 16-bit protected mode, but also 32-bit protected mode. They will even break compatibility mode and processor core initialization. If they get away with this, then x86 no longer is a backwards compatible processor, and why would people bother with it instead of moving to ARM?

nullplan · Post by **nullplan** » Wed May 24, 2023 8:59 am

So I had a read through the PDF, and some of what they are proposing is really, really cool. For example, the 64-bit SIPI would mean that I essentially need no SMP trampoline anymore. With the ACPI mailbox system, I would still need a small trampoline to load CR0, CR3, and CR4 (before jumping to the main kernel), but with this mechanism, the CPU would read those values from memory already. That is awesome.

Also, the madlads finally did it. They actually went and got rid of lmsw. They've been threatening to do so since the 386, and now they are finally doing it. Oh my god.

Getting rid of the IO string operations is a bit of an oddity in there, but understandable. In an environment with paging, ins can lead to data loss, as when a page fault is caused, the input value is lost. And also I know of no use case for these except ATA PIO mode, and there they are easily replaced.

What I'm wondering about are the more glaring omissions from the list. Why are GDT, LDT, TSS, CS, DS, ES, and SS left around? Those are already vestigial, and the functions they still fulfill can easily be handled by additional MSRs, or possibly even flags. Also, why is the IDT still in memory? Was there not a single suitable block of 256 MSRs in range? After all, an IDT entry encodes 66 bits of information (64 bits entry address plus 1 bit for whether users are allowed to use them as soft interrupts, plus possibly one bit for whether it is a trap gate or an interrupt gate, though I have never found a use for trap gates) in 128 bits of memory, and the CPU has easier access to its own registers than to memory.
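To make the "66 bits in 128 bits" point concrete, here is a minimal sketch of how a long-mode IDT gate is laid out in memory. The function name `pack_idt_gate` and the selector value `0x08` used in the usage note are illustrative assumptions, not something taken from the X86-S proposal.

```python
import struct

def pack_idt_gate(handler, selector, ist=0, gate_type=0xE, dpl=0, present=True):
    """Pack one 16-byte long-mode IDT gate (hypothetical helper for illustration)."""
    # Attribute byte: P (bit 7), DPL (bits 5-6), gate type (bits 0-3).
    # Gate type 0xE is an interrupt gate, 0xF is a trap gate.
    type_attr = (0x80 if present else 0) | ((dpl & 0x3) << 5) | (gate_type & 0xF)
    return struct.pack(
        "<HHBBHII",
        handler & 0xFFFF,              # handler address bits 0..15
        selector,                      # code segment selector
        ist & 0x7,                     # IST index, 0 means no stack switch
        type_attr,
        (handler >> 16) & 0xFFFF,      # handler address bits 16..31
        (handler >> 32) & 0xFFFFFFFF,  # handler address bits 32..63
        0,                             # reserved
    )
```

Calling `pack_idt_gate(0xFFFFFFFF80123456, 0x08)` yields 16 bytes in which the handler address is split across three fields, and the only other information a flat 64-bit kernel varies per entry is the DPL and the interrupt/trap choice.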

This is a golden opportunity. Functionality is being removed for once. So this would be the perfect time to get rid of these vestiges of the 286 and finally leave the 80s behind for good.

However, I shouldn't get my hopes up. Just because someone at Intel wrote a paper doesn't mean they will actually implement it. They also proposed a simplified interrupt mechanism that was never implemented.

rdos · Post by **rdos** » Wed May 24, 2023 9:10 am

nullplan wrote:Why are GDT, LDT, TSS, CS, DS, ES, and SS left around? Those are already vestigial, and the functions they still fulfill can easily be handled by additional MSRs, or possibly even flags. Also, why is the IDT still in memory? Was there not a single suitable block of 256 MSRs in range? After all, an IDT entry encodes 66 bits of information (64 bits entry address plus 1 bit for whether users are allowed to use them as soft interrupts, plus possibly one bit for whether it is a trap gate or an interrupt gate, though I have never found a use for trap gates) in 128 bits of memory, and the CPU has easier access to its own registers than to memory.

Not so. Compatibility mode is dependent on descriptor registers and the GDT. In fact, compatibility mode makes sure you can still run your protected mode code. That's why you cannot remove any of it without breaking stuff.

What I'm more amazed about is why FS and GS are still part of the X86-S? These old bastards from the 386 processor surely should have been removed. After all, we don't want these pesky bases anymore, and you can define two MSRs instead.

My two cents is that Intel wants to make another 64-bit processor, now that they failed with their initial attempt that lacked proper backward compatibility.

Octocontrabass · Post by **Octocontrabass** » Wed May 24, 2023 11:52 am

nullplan wrote:What I'm wondering about are the more glaring omissions from the list.

I suspect those are all things that have been (or can be) moved entirely to microcode. Changing them doesn't make the silicon design any simpler, it just shuffles some bits in the microcode ROM.

nullplan wrote:They also proposed a simplified interrupt mechanism that was never implemented.

They mention FRED in the X86-S proposal, so they haven't forgotten about it.

nullplan · Post by **nullplan** » Wed May 24, 2023 2:51 pm

rdos wrote:Not so. Compatibility mode is dependent on descriptor registers and the GDT. In fact, compatibility mode makes sure you can still run your protected mode code. That's why you cannot remove any of it without breaking stuff.

I do know that these things are necessary now. That is precisely the fact I am lamenting. It should not have to be the case. In long mode, CS only contains two useful bits of information: Kernel or user land, and 32 or 64 bit mode. DS, ES, and SS contain no useful information whatsoever. Yet we still have to initialize them in this arcane way that keeps sending newbies to this very forum, confused.
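As a rough illustration of the "two useful bits" claim, the sketch below pulls the privilege level and the 64-bit flag out of a code-segment descriptor. The two descriptor constants are commonly seen flat-model values chosen here as examples, and the function name is hypothetical.

```python
def decode_code_descriptor(desc):
    """Extract the fields of a code-segment descriptor that still matter in long mode."""
    return {
        "dpl": (desc >> 45) & 0x3,            # 0 = kernel, 3 = user land
        "long": bool((desc >> 53) & 1),       # L bit: 64-bit code segment
        "default32": bool((desc >> 54) & 1),  # D bit: 32-bit default operand size
    }

# Example values (assumed, typical of a flat-model GDT):
KERNEL64_CODE = 0x00AF9A000000FFFF
USER32_CODE = 0x00CFFA000000FFFF
```

Everything else in the 64 bits (base, limit, granularity) is either ignored or fixed for a flat 64-bit kernel, which is the redundancy being lamented here.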

More sensible architectures, like PowerPC, save both bits of information that on x86 are contained in CS in a special register, and allow you to switch them on return from interrupt. No need for a special table.

The TSS contains 8 useful machine words (the stack pointers RSP0 and IST1-7). That could be done in MSRs, then the TSS would be superfluous. And since nobody needs an LDT when segmentation is as limited as it is on x86-64, nothing remains for the GDT, and so it can be removed also.
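For reference, a minimal sketch of the 64-bit TSS layout shows where those 8 words sit inside the 104-byte structure. The helper name `pack_tss64` is made up for this example, and the sketch leaves RSP1 and RSP2 at zero on the assumption that a kernel with only two privilege levels does not use them.

```python
import struct

# reserved u32, RSP0-RSP2, reserved u64, IST1-IST7, reserved u64, reserved u16, I/O map base
TSS64_FORMAT = "<I3QQ7QQHH"

def pack_tss64(rsp0, ist, iomap_base=104):
    """Build a 64-bit TSS image; only rsp0 and the seven IST pointers are filled in."""
    if len(ist) != 7:
        raise ValueError("expected exactly seven IST pointers")
    return struct.pack(TSS64_FORMAT, 0, rsp0, 0, 0, 0, *ist, 0, 0, iomap_base)
```

RSP0 lands at byte offset 4 and IST1 at offset 36, so 64 of the 104 bytes carry the pointers a typical kernel cares about.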

rdos wrote:What I'm more amazed about is why FS and GS are still part of the X86-S?

Because both Windows and ELF TLS use these to record the thread pointers.

rdos wrote:These old bastards from the 386 processor surely should have been removed. After all, we don't want these pesky bases anymore, and you can define two MSRs instead.

Their segment bases are already MSRs, and their limits and attributes are ignored. FS and GS these days serve the role of containing a single pointer. More sensible architectures, like PowerPC, actually use a register for that. But since both x86 and x86_64 are somewhat lacking in that department, the FS/GS mechanism will have to do. Although it means you cannot set these registers without kernel support. Even with WRFSBASE, you still need the kernel to tell the userspace that it's enabled.

rdos wrote:My two cents is that Intel wants to make another 64-bit processor, now that they failed with their initial attempt that lacked proper backward compatibility.

I think you are arguing in bad faith because they are finally burying segmentation, a feature which you hold very dear. Even though the rest of the world will only say "Good riddance!" Backwards compatibility still works to the extent anyone cares about these days. For userspace, x86-S is still compatible back to the 386. Kernel space will have to adapt, but kernels are more easily changed than user space programs. dosemu will no longer work, but we have dosbox for that now. In short:

rdos wrote:If they get away with this, then x86 no longer is a backwards compatible processor, and why would people bother with it instead of moving to ARM?

Because most of what they need, want, and are familiar with will work on x86-S. Yes, some specialized applications may break, but for those there are VMs, or failing that, legacy friendly systems. If nothing else, more will work on x86-S than will work on ARM, and already a lot of stuff is working on ARM.

Octocontrabass wrote:They mention FRED in the X86-S proposal, so they haven't forgotten about it.

Must have overlooked it. So here's hoping.

rdos · Post by **rdos** » Thu May 25, 2023 2:02 am

nullplan wrote:
rdos wrote:Not so. Compatibility mode is dependent on descriptor registers and the GDT. In fact, compatibility mode makes sure you can still run your protected mode code. That's why you cannot remove any of it without breaking stuff.
I do know that these things are necessary now. That is precisely the fact I am lamenting. It should not have to be the case. In long mode, CS only contains two useful bits of information: Kernel or user land, and 32 or 64 bit mode. DS, ES, and SS contain no useful information whatsoever. Yet we still have to initialize them in this arcane way that keeps sending newbies to this very forum, confused.

This might be true in long mode, but certainly not in compatibility mode. In compatibility mode, all segment registers should operate just like in protected mode, and bases, limits and attributes should be handled properly. It's pretty obvious that you cannot ignore the base in compatibility mode, as this would break code. Ignoring the limit is less severe.

nullplan wrote: More sensible architectures, like PowerPC, save both bits of information that on x86 are contained in CS in a special register, and allow you to switch them on return from interrupt. No need for a special table.

GDT and LDT are for descriptors, including making sure that CS can only be loaded with code descriptors, and not data descriptors, gates or TSSes.

nullplan wrote: The TSS contains 8 useful machine words (the stack pointers RSP0 and IST1-7). That could be done in MSRs, then the TSS would be superfluous. And since nobody needs an LDT when segmentation is as limited as it is on x86-64, nothing remains for the GDT, and so it can be removed also.

Possible, but the STR instruction can be used in usermode to determine a unique thread ID without doing syscalls.

nullplan wrote: Their segment bases are already MSRs, and their limits and attributes are ignored. FS and GS these days serve the role of containing a single pointer. More sensible architectures, like PowerPC, actually use a register for that. But since both x86 and x86_64 are somewhat lacking in that department, the FS/GS mechanism will have to do. Although it means you cannot set these registers without kernel support. Even with WRFSBASE, you still need the kernel to tell the userspace that it's enabled.

Only in long mode. In protected mode and compatibility mode, FS and GS must be loaded with descriptors from GDT or LDT.

nullplan wrote: For userspace, x86-S is still compatible back to the 386.

Not so. Callgates were defined with the 386 to allow the kernel to define trusted entry points for userspace. This is now broken in x86-S.

In fact, the syscall mechanism invented by Intel & AMD separately is a kind of throwback to the DOS era, when the OS used interrupts to request service. This is highly primitive and inefficient, but I guess Linux couldn't do any better, since its roots go back to (before) that era and it never managed to shed them.

sounds · Post by **sounds** » Thu May 25, 2023 8:52 am

More reasons to not move to X86-S:

* https://xorvoid.com/sectorc.html

Octocontrabass · Post by **Octocontrabass** » Thu May 25, 2023 11:43 am

rdos wrote:In compatibility mode, all segment registers should operate just like in protected mode, and bases, limits and attributes should be handled properly. It's pretty obvious that you cannot ignore the base in compatibility mode, as this would break code. Ignoring the limit is less severe.

Most 32-bit applications were written for a flat address space, so ignoring the base and limit makes no difference to them.

rdos wrote:GDT and LDT are for descriptors, including making sure that CS can only be loaded with code descriptors, and not data descriptors, gates or TSSes.

And x86-S ignores or reserves most of the bits in the descriptors, so there's no need for a table in RAM that can hold thousands of them when the OS will only use half a dozen or so.

rdos wrote:Possible, but the STR instruction can be used in usermode to determine a unique thread ID without doing syscalls.

This limits a 32-bit OS to around 8000 threads, depending on what else is in your GDT. A 64-bit OS is limited to about 4000 threads. If you place the thread ID at a fixed offset relative to FS or GS, you can access a unique thread ID without doing syscalls and without arbitrarily limiting the number of running threads in your OS.
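The numbers follow from the GDT size limit. A quick sketch of the arithmetic, assuming one TSS descriptor per thread and only the mandatory null descriptor besides them (the function names are illustrative):

```python
GDT_MAX_BYTES = 1 << 16  # the GDTR limit field is 16 bits wide
SLOT_BYTES = 8           # one legacy descriptor slot

def max_tss_descriptors(tss_descriptor_bytes, other_slots=1):
    """Upper bound on TSS descriptors in one GDT; other_slots counts non-TSS entries."""
    free_bytes = GDT_MAX_BYTES - other_slots * SLOT_BYTES
    return free_bytes // tss_descriptor_bytes

def selector_index(selector):
    """Table index encoded in a selector such as the one STR returns."""
    return selector >> 3  # the low three bits hold TI and RPL
```

With 8-byte TSS descriptors (32-bit) the bound is 8191, and with 16-byte descriptors (64-bit) it is 4095. Any code, data, or LDT descriptors in the same table reduce it further.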

rdos wrote:Only in long mode. In protected mode and compatibility mode, FS and GS must be loaded with descriptors from GDT or LDT.

In compatibility mode, the 64-bit OS may still directly set the FS/GS base without setting up a descriptor. But no matter how the OS does it, 32-bit applications must still use a system call.

rdos wrote:Callgates were defined with the 386 to allow the kernel to define trusted entry points for userspace. This is now broken in x86-S.

No major OSes ever used them, so most 32-bit software that would work on a 386 will still run on x86-S.

rdos wrote:In fact, the syscall mechanism invented by Intel & AMD separately is a kind of throwback to the DOS era, when the OS used interrupts to request service. This is highly primitive and inefficient, but I guess Linux couldn't do any better, since its roots go back to (before) that era and it never managed to shed them.

Where did you get the idea that fast system call instructions are inefficient? They're extremely fast compared to call gates. Windows at least forces applications to use library code provided by the OS to perform system calls, so even the oldest 32-bit Windows programs see the advantage of fast system calls.

sounds wrote:More reasons to not move to X86-S:

* https://xorvoid.com/sectorc.html

Lack of modern hardware has never stopped code-golfers before.

rdos · Post by **rdos** » Thu May 25, 2023 3:38 pm

Octocontrabass wrote: In compatibility mode, the 64-bit OS may still directly set the FS/GS base without setting up a descriptor.

Not so. FS and GS in compatibility mode can only be loaded with selector loads. In fact, FS & GS (as well as other segment registers) can be loaded with a GDT/LDT descriptor in long mode as well. The difference in the operation is that the selector base is disabled in long mode and instead MSRs are used for the FS and GS base. In protected mode and compatibility mode, the selector base is used and the MSRs are disabled.

Octocontrabass wrote:
rdos wrote:In fact, the syscall mechanism invented by Intel & AMD separately is a kind of throwback to the DOS era, when the OS used interrupts to request service. This is highly primitive and inefficient, but I guess Linux couldn't do any better, since its roots go back to (before) that era and it never managed to shed them.
Where did you get the idea that fast system call instructions are inefficient? They're extremely fast compared to call gates. Windows at least forces applications to use library code provided by the OS to perform system calls, so even the oldest 32-bit Windows programs see the advantage of fast system calls.

A very biased evaluation. First, using a central entrypoint requires loading a register with function number at the userspace side. At the kernel side, the function numbers must be decoded and the relevant server procedure must be called. None of this is for free.

Second, if the kernel is segmented, nothing is improved by using syscall or sysenter. On the contrary, they incur additional overhead. For a segmented kernel, call gates are far superior and are faster.

Third, by having a flat kernel, all pointers from userspace must be validated. When userspace is given selectors that can only address userspace, and not the kernel, pointers don't need to be validated.

Fourth, segmentation has been neglected for years by AMD and Intel, therefore the comparison should have been done on AMD Athlon or something and not on modern hardware.
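For readers who have not seen a central entry point, the two costs named in the first and third points (decoding a function number, validating user pointers) can be sketched as below. This is only a toy model: the table contents, the error values, and the `USER_TOP` split are all assumptions made up for the example, not any particular kernel's ABI.

```python
USER_TOP = 0x0000_8000_0000_0000  # assumed end of the user half of the address space

def user_range_ok(ptr, length):
    """Flat-kernel check: the whole buffer must lie below the user/kernel split."""
    return ptr >= 0 and length >= 0 and ptr + length <= USER_TOP

def sys_getpid():
    return 1  # placeholder result

def sys_write(buf_ptr, length):
    if not user_range_ok(buf_ptr, length):
        return -14  # hypothetical "bad address" error
    return length

SYSCALL_TABLE = [sys_getpid, sys_write]

def dispatch(number, *args):
    """Central entry point: bounds-check the function number, then index the table."""
    if not 0 <= number < len(SYSCALL_TABLE):
        return -38  # hypothetical "no such call" error
    return SYSCALL_TABLE[number](*args)
```

The extra work per call is one range check on the number, one table lookup, and one comparison per user pointer. Whether that outweighs the cost of a call gate is exactly what is being disputed in this exchange.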

Octocontrabass · Post by **Octocontrabass** » Thu May 25, 2023 5:25 pm

rdos wrote:The difference in the operation is that the selector base is disabled in long mode and instead MSRs are used for the FS and GS base. In protected mode and compatibility mode, the selector base is used and the MSRs are disabled.

Intel® 64 and IA-32 Architectures Software Developer’s Manual volume 3A section 3.4.4 wrote:The hidden descriptor register fields for FS.base and GS.base are physically mapped to MSRs in order to load all address bits supported by a 64-bit implementation.

[...]

Compatibility mode ignores the upper 32 bits when calculating an effective address.

The segment bases and the MSRs are the same thing. Any operation that modifies one modifies both.
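A toy model of that behaviour, under the assumption that a descriptor load writes a 32-bit base and clears the upper half of the same storage. The class and method names are invented for illustration, and canonical-address checks on MSR writes are left out.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

class FsBase:
    """One stored base value, reachable either as an MSR or through a descriptor load."""

    def __init__(self):
        self.base = 0

    def wrmsr(self, value):
        self.base = value & MASK64

    def rdmsr(self):
        return self.base

    def load_descriptor(self, descriptor_base):
        # Assumption: a legacy descriptor only carries 32 base bits,
        # so the upper 32 bits of the shared storage become zero.
        self.base = descriptor_base & MASK32

    def linear(self, offset, compat):
        address = (self.base + offset) & MASK64
        # Compatibility mode ignores the upper 32 bits of the result.
        return address & MASK32 if compat else address
```

After `wrmsr(0x100002000)`, `linear(0x10, compat=False)` gives `0x100002010` while `linear(0x10, compat=True)` gives `0x2010`, and a later `load_descriptor(0x3000)` is visible through `rdmsr()`.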

rdos wrote:First, using a central entrypoint requires loading a register with function number at the userspace side. At the kernel side, the function numbers must be decoded and the relevant server procedure must be called. None of this is for free.

It's not for free, but I don't think it costs 400 cycles.

rdos wrote:if the kernel is segmented

64-bit kernels aren't segmented.

rdos wrote:by having a flat kernel, all pointers from userspace must be validated.

I don't think pointer validation costs 400 cycles either.

rdos wrote:Fourth, segmentation has been neglected for years by AMD and Intel, therefore the comparison should have been done on AMD Athlon or something and not on modern hardware.

But this is a discussion about modern hardware. You're welcome to build and run the benchmark yourself, though.

rdos · Post by **rdos** » Fri May 26, 2023 2:19 am

I already did a benchmark based on the assumption that the kernel is segmented. AMD call gates won big time, and Intel had very poor performance. The benchmarks referred to here, which were done on Intel, indicate the same thing. The difference between call gate / interrupt and syscall on Intel is simply too large because of a poor segmentation implementation.

I have no interest whatsoever in how fast long mode runs, or how fast the native graphics driver runs on Windows or Linux. My concern mainly is the speed of segmentation and the speed of VBE. Therefore, I only buy AMD based systems nowadays.

If I some day wanted to write a 64-bit OS (which is highly unlikely), I'd pick ARM, not X86-64 or X86-S.

OSDev.org
