x86 is too bloated
I feel that x86, as well as IA-32, AMD64, Intel 64, x86-64, and all the like are very bloated architectures.
I feel that a lot of the hardware features are things that should be implemented in software, and not forced upon a developer.
For example, paging. The whole idea of forcing paging in order to enter long mode is stupid. I, personally, heavily dislike paging and try to avoid it at all costs. I feel like it is important for a mature OS, but it should be at the software level and therefore it would be optional and it would vastly differ from OS-to-OS. And a lot of times paging is useless in the cases where there is no suitable disk, like QEMU.
Also, access rings. Why is this a hardware feature? This is something that definitely should be implemented at the software level.
And data execution prevention. I mean that is so obviously a software issue that it only furthers my point on x86.
Skylight: https://github.com/austanss/skylight
I make stupid mistakes and my vision is terrible. Not a good combination.
NOTE: Never respond to my posts with "it's too hard".
Re: x86 is too bloated
rizxt wrote:I feel that x86, as well as IA-32, AMD64, Intel 64, x86-64, and all the like are very bloated architectures.
It's one architecture with several names depending on whether you're in 64-bit mode or not.
rizxt wrote:For example, paging. [...] I feel like it is important for a mature OS,
Nowadays every mature OS will use paging, so there's no reason to allow you to disable it, so AMD made it mandatory to enter 64-bit mode.
rizxt wrote:And a lot of times paging is useless in the cases where there is no suitable disk, like QEMU.
Paging is still useful for isolating different address spaces even with no disk. There's some disagreement about the terminology; the x86 manuals use "paging" to refer only to the hardware mechanism that translates linear addresses to physical addresses. Swapping data to disk is entirely up to the operating system, and many can run without it (see every Linux live CD for example).
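To make the "linear to physical" translation a little more concrete, here is a minimal sketch of how a 48-bit linear address is split into table indices under 4-level paging with 4 KiB pages. The struct and function names are made up for illustration; only the bit positions come from the architecture.

```c
#include <stdint.h>

/* Indices into the four table levels for a 48-bit linear address
   (4-level paging, 4 KiB pages). Names are illustrative only. */
struct va_parts {
    unsigned pml4;   /* bits 47:39 */
    unsigned pdpt;   /* bits 38:30 */
    unsigned pd;     /* bits 29:21 */
    unsigned pt;     /* bits 20:12 */
    unsigned offset; /* bits 11:0  */
};

/* Split a linear address into the index used at each level of the walk. */
static struct va_parts split_linear_address(uint64_t va)
{
    struct va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pd     = (unsigned)((va >> 21) & 0x1FF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}
```

None of this involves a disk: the hardware only walks the tables, and whether a missing page is ever fetched from swap is up to the OS.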
rizxt wrote:Also, access rings. Why is this a hardware feature? This is something that definitely should be implemented at the software level.
How do you propose we prevent user programs from stomping all over the kernel or each other if every program has privileges to do anything it wants with the CPU? I agree that four rings is excessive, though. No one uses ring 1 or ring 2 anymore.
rizxt wrote:And data execution prevention. I mean that is so obviously a software issue that it only furthers my point on x86.
No one's forcing you to use this one! It's entirely optional. (But it's a pretty good idea - why not let the CPU help you find software bugs?)
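As a rough illustration of how little the optional part costs, here is a sketch of marking a data page non-executable in a long-mode page table entry. The helper names are hypothetical; the bit positions are the architectural ones, and the NX bit is only honored once EFER.NXE (bit 11 of MSR 0xC0000080) has been set, which this sketch does not do.

```c
#include <stdint.h>

#define PTE_PRESENT   (1ULL << 0)
#define PTE_WRITABLE  (1ULL << 1)
#define PTE_NX        (1ULL << 63)            /* execute-disable */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL   /* physical address bits 51:12 */

/* Hypothetical helper: leaf entry for a page that may be read and
   written but never executed. */
static uint64_t make_data_pte(uint64_t phys_addr)
{
    return (phys_addr & PTE_ADDR_MASK) | PTE_PRESENT | PTE_WRITABLE | PTE_NX;
}

/* An entry is executable only if the NX bit is clear. */
static int pte_is_executable(uint64_t pte)
{
    return (pte & PTE_NX) == 0;
}
```

An OS that never sets the bit simply gets the old behaviour, which is the sense in which the feature is optional.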
Re: x86 is too bloated
So, use a different processor.
Re: x86 is too bloated
IMO, mandatory paging in long mode is not the best idea. But that's the way it is. If you don't want to use it, set it up once with 2MiB pages and forget about it. As for bloat in the x86(-64) architecture: its instruction set, more than anything else, is extremely bloated, but that's the price they had to pay for over 40 years of backward compatibility. New CPUs can still run original 8086 code! And x86-64 (AMD64) was just 64 bits slapped on top of the existing x86 architecture.
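For what "set it up once with 2MiB pages" might look like, here is a minimal sketch that identity-maps the first 1 GiB using three statically allocated tables. The table and function names are assumptions for illustration, and loading CR3 and enabling long mode are left out; only the entry layout (present, writable, and the PS bit for a 2 MiB mapping) is architectural.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES      512
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_HUGE     (1ULL << 7)   /* PS bit: a page-directory entry maps 2 MiB */
#define HUGE_PAGE    (2ULL << 20)  /* 2 MiB */

/* One table per level; each must be 4 KiB aligned. */
static _Alignas(4096) uint64_t pml4[ENTRIES];
static _Alignas(4096) uint64_t pdpt[ENTRIES];
static _Alignas(4096) uint64_t pd[ENTRIES];

/* Fill the tables so linear addresses 0..1 GiB map to the same physical
   addresses. pdpt_phys and pd_phys are the physical addresses of pdpt
   and pd (passed in because only the caller knows where they live). */
static void build_identity_map(uint64_t pdpt_phys, uint64_t pd_phys)
{
    assert((pdpt_phys & 0xFFF) == 0 && (pd_phys & 0xFFF) == 0);
    pml4[0] = pdpt_phys | PTE_PRESENT | PTE_WRITABLE;
    pdpt[0] = pd_phys | PTE_PRESENT | PTE_WRITABLE;
    for (uint64_t i = 0; i < ENTRIES; i++)
        pd[i] = (i * HUGE_PAGE) | PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
}
```

One page directory covers 512 x 2 MiB = 1 GiB; mapping more memory would mean filling further PDPT entries with additional page directories in the same way.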
I am sure that if a new major architecture were to replace x86 today, it would be a RISC one. Today, CISC is just a relic of the era when most code had to be written in assembler. But I am not so sure Intel would dare to get rid of all x86 compatibility just like that. They once tried. Remember Itanium CPUs? Their lack of compatibility with existing software was the main reason for their flop.
Re: x86 is too bloated
pvc wrote:IMO, mandatory paging in long mode is not the best idea. But that's the way it is. If you don't want to use it, set it up once with 2MiB pages and forget about it. As for bloat in the x86(-64) architecture: its instruction set, more than anything else, is extremely bloated, but that's the price they had to pay for over 40 years of backward compatibility. New CPUs can still run original 8086 code! And x86-64 (AMD64) was just 64 bits slapped on top of the existing x86 architecture.
I am sure that if a new major architecture were to replace x86 today, it would be a RISC one. Today, CISC is just a relic of the era when most code had to be written in assembler. But I am not so sure Intel would dare to get rid of all x86 compatibility just like that. They once tried. Remember Itanium CPUs? Their lack of compatibility with existing software was the main reason for their flop.
The requirement to keep backward compatibility is the best thing that ever happened. Software compatibility (like Unix) is a complete disaster. It creates completely unreadable code cluttered with ifdefs that basically nobody understands.
Re: x86 is too bloated
rizxt wrote:I feel that x86, as well as IA-32, AMD64, Intel 64, x86-64, and all the like are very bloated architectures.
I feel that a lot of the hardware features are things that should be implemented in software, and not forced upon a developer.
Broadly I agree with you. I'm very new to developing for the x86-64. My CPU dev experience is centred around the 68k and ARM32, but I have come to appreciate the design and understand how it has ended up as it has... Looking forward, ARM64 will probably be my hobby platform of choice, but right now I'm enjoying learning about this quirky dinosaur, with the help of the forum members here.
rizxt wrote:For example, paging. The whole idea of forcing paging in order to enter long mode is stupid. I, personally, heavily dislike paging and try to avoid it at all costs. I feel like it is important for a mature OS, but it should be at the software level and therefore it would be optional and it would vastly differ from OS-to-OS. And a lot of times paging is useless in the cases where there is no suitable disk, like QEMU.
I actually like the idea of paging as it solves a lot of problems: memory fragmentation, virtual memory, isolation of software, and security... but like with all forms of engineering, there are no perfect solutions, only the most acceptable compromise.
My own OS project doesn't use paging (at this time), as I consider it a very "heavyweight" approach to solving the above problems... I'm trying to think how I can use the x86's (frankly, really quite brilliant) MMU in a different, less one-size-fits-all way. Since I'm not aiming for any form of backward software compatibility (and I'm not making YetAnotherUnix™), I think I can take more liberties with this. Happy to discuss in a separate thread how one might take alternative approaches.
As for Long mode requiring Paging... That does make sense to me, as it has an address space so large that there needs to be some way to rationalise it, and it is reasonable to use a well established approach.
rizxt wrote:Also, access rings. Why is this a hardware feature? This is something that definitely should be implemented at the software level.
I'm not sure what you mean here? All CPUs require some sort of privilege level, as this is the most basic and efficient way to provide a security/stability partition between user code and operating system code. I agree the x86 has a surfeit of privilege levels; two levels is plenty for most situations.
Again ironically (given how in favour I am of such a hardware feature), my own OS project doesn't really take advantage of the privilege levels: as it is a microkernel, everything (except interrupts) runs in user mode (ring 3), and privilege is enforced at the message passing stage... I've sort of argued against myself there a bit.
rizxt wrote:And data execution prevention. I mean that is so obviously a software issue that it only furthers my point on x86.
I can't really think of a better execution prevention system than a hardware enforced one... again, I don't use it (yet), but I really really like it!
-edit- With the way you were going, I'm surprised you didn't have a paragraph about how you don't like interrupts!
CuriOS: A single address space GUI based operating system built upon a fairly pure Microkernel/Nanokernel. Download latest bootable x86 Disk Image: https://github.com/h5n1xp/CuriOS/blob/main/disk.img.zip
Discord:https://discord.gg/zn2vV2Su
Re: x86 is too bloated
rizxt wrote:For example, paging. The whole idea of forcing paging in order to enter long mode is stupid.
Why? All modern OSes separate address spaces, so they need paging anyway. How do you specify kernel addresses without page tables?
rizxt wrote:I, personally, heavily dislike paging and try to avoid it at all costs. I feel like it is important for a mature OS, but it should be at the software level
Show me one software level paging implementation! It's just not possible. The MMU is part of the CPU hardware for good reason. ALL architectures implement paging in hardware; there's nothing x86 specific about that.
rizxt wrote:Also, access rings. Why is this a hardware feature? This is something that definitely should be implemented at the software level.
I have a feeling you have absolutely no clue what protection rings are. It is a hardware feature because it must be a hardware feature. ALL architectures separate user space from kernel space in hardware, simply because no software solution exists for that.
rizxt wrote:And data execution prevention. I mean that is so obviously a software issue that it only furthers my point on x86.
ROTFL. Again, no software solution exists, and the NX bit is always implemented in hardware on ALL architectures; this isn't x86 specific.
pvc wrote:But I am not so sure if Intel would dare to get rid of all x86 compatiblility just like that. They once tried. Remember Itanium CPUs? Their lack of compatibility with existing software was the main reason for its flop.
I can imagine that they could create a 64-bit-only CPU, which supports neither real mode nor protected mode, only long mode. Without the real-mode BIOS, and with UEFI using long mode already, the backward-compatibility issues would be minimal (limited to SMP trampoline code only).
Cheers,
bzt
Re: x86 is too bloated
bloodline wrote: ... I'm trying to think how I can use the x86's (frankly, really quite brilliant) MMU in a different, less one size fits all way.
Not sure I'd ever call it "quite brilliant". It's certainly impressive Intel managed to keep it all together and backward compatible, but that's only because they literally had no other choice (they really messed up with the 286 MMU, and it's why we got stuck with real mode DOS for so long.)
The 286 could have been so much better if it was just privilege modes and paging bolted on to real mode. I.e., keep real mode x86 segment arithmetic for a per-process 1MB address space, and use paging to provide multiple protected address spaces. With 4KB pages, you'd only need a 256 entry page table, which would have been 1K per process, and a small number of TLB entries. The 286 had 134,000 transistors. By comparison, the IBM ROMP CPU MMU (which provided 32-bit to 40-bit virtual memory with TLB) was 61,500 transistors, which along with the 45,000 transistors of the ROMP CPU itself (106,500 in total) still comes in cheaper than the 286 transistor budget.
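The numbers for that hypothetical "paged real mode" are easy to check. The sketch below is only arithmetic for a design that never existed: the 4-byte entry size is an assumption inferred from "256 entries, 1K per process", and the wrap at 1MB is likewise assumed.

```c
#include <stdint.h>

#define ADDRESS_SPACE_BYTES (1u << 20)  /* 1 MB per-process address space */
#define PAGE_BYTES          (4u << 10)  /* 4 KB pages */
#define PTE_BYTES           4u          /* assumed: implied by "1K per process" */

/* Number of entries needed to cover the whole 1 MB space. */
static unsigned page_table_entries(void)
{
    return ADDRESS_SPACE_BYTES / PAGE_BYTES;
}

/* Size of that single-level table in bytes. */
static unsigned page_table_bytes(void)
{
    return page_table_entries() * PTE_BYTES;
}

/* Real-mode segment arithmetic: linear = segment * 16 + offset,
   wrapped to 1 MB (the wrap is an assumption of this sketch). */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & (ADDRESS_SPACE_BYTES - 1);
}

/* Index into the 256-entry table for a given linear address. */
static unsigned page_number(uint32_t linear)
{
    return linear / PAGE_BYTES;
}
```

So a segment:offset pair would first go through the usual shift-and-add, and the resulting 20-bit address would then index one small flat table per process.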
I too, originally had grand ideas about mixing paging and segmentation. I was going to have per-process segments, buddy allocated from the 4GB paged address space, that could be shared by smaller processes.
I would have started with perhaps 64MB address spaces, which is plenty big enough for most CLI utilities. A process would be migrated to a larger address space as required, doubling when it reaches its current limit transparently, up to the point where it needs its own complete address space.
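The doubling policy can be sketched as a size calculation. This is a guess at the shape of the idea, not the scheme's actual code: the function name is hypothetical, and returning 0 is just this sketch's way of saying "no shared segment is big enough".

```c
#include <stdint.h>

#define MIN_SPACE (64ULL << 20)  /* 64 MB starting segment */
#define MAX_SPACE (4ULL << 30)   /* 4 GB: the whole paged address space */

/* Smallest power-of-two segment size, starting from 64 MB, that can
   hold `needed` bytes. Returns 0 if even 4 GB is not enough. */
static uint64_t segment_size_for(uint64_t needed)
{
    uint64_t size = MIN_SPACE;
    while (size < needed) {
        if (size >= MAX_SPACE)
            return 0;
        size <<= 1;  /* double on reaching the current limit */
    }
    return size;
}
```

From 64 MB there are only six doublings before a process fills 4 GB, so migrations would be rare; power-of-two sizes are also what makes buddy allocation of the segments straightforward.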
Bigger processes can then just use their own 4GB address space, and rely on just paging for isolation, and the kernel address space uniformly mapped into all address spaces.
The idea would have been that sharing lots of common processes in a single paged address space might have avoided TLB flushing overhead on address space switches, in the hope that most switches would have been between processes within a single paged address space.
Alas, it would have added extra complexity, and while I could probably hide it sufficiently in x86 specific code that it would not affect portability of the majority of the kernel, the effort didn't seem worth trying as it was an architectural dead end anyway.
TL;DR
x86 MMU is a horrible mess.
Re: x86 is too bloated
bloodline wrote: ... I'm trying to think how I can use the x86's (frankly, really quite brilliant) MMU in a different, less one size fits all way.
thewrongchristian wrote:Not sure I'd ever call it "quite brilliant". It's certainly impressive Intel managed to keep it all together and backward compatible, but that's only because they literally had no other choice (they really messed up with the 286 MMU, and it's why we got stuck with real mode DOS for so long.)
Yeah, it's clear that Intel are paying the price for the ballsup that was the 286... They have been lucky that process and other architectural improvements have hidden the severity of this error.
My feeling is that Apple have shown with the M1 that, with proper investment, more modern architectures can not only compete with, but outperform, the x86 in the same space. At a time when 90% of day-to-day computing can be done in cross-platform browsers, legacy support for CPUs is essentially worthless.
thewrongchristian wrote:TL;DR
x86 MMU is a horrible mess.
You can call it a mess (and I won't disagree with you), but I am blown away by how many options it offers and that it all actually works! The x86-64 MMU is amazing, just perhaps not for the right reasons.
Re: x86 is too bloated
Software 'paging' could be done with excessive overuse of memcpy, though that would be pretty terrible.
There's also an argument to be made that the software pagetable walk used on some architectures would qualify. All of those have been pretty thoroughly rejected by actual users at this point though...
PowerPC for instance used a hashtable approach for the entries, but only searched two of them for any particular address. Failures resulted in an interrupt to have the OS figure out which entry to replace. I vaguely recall IA-64 going further than that, and having a TLB miss throw an interrupt and leave the rest to the OS. While I'm sure there was some researcher somewhere that loved the idea of inventing their own newer, better page table format, the efficiency loss is horrifying.
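As a toy model of that "probe two places, then trap to the OS" arrangement, here is a small sketch. It is not PowerPC's actual table format: the table size, the struct, the one-entry-per-slot simplification, and the eviction choice are all made up for illustration. The only detail borrowed from the real design is that the second probe uses the complement of the first hash.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64  /* toy table size; must be a power of two */

struct hpte { uint32_t vpn; uint32_t pfn; int valid; };
static struct hpte table[SLOTS];

static size_t primary_hash(uint32_t vpn)   { return vpn & (SLOTS - 1); }
static size_t secondary_hash(uint32_t vpn) { return (~vpn) & (SLOTS - 1); }

/* Returns 1 and stores the frame number on a hit. Returns 0 when both
   probes miss, which is where real hardware would raise an interrupt. */
static int hashed_lookup(uint32_t vpn, uint32_t *pfn_out)
{
    size_t probes[2] = { primary_hash(vpn), secondary_hash(vpn) };
    for (int i = 0; i < 2; i++) {
        const struct hpte *e = &table[probes[i]];
        if (e->valid && e->vpn == vpn) {
            *pfn_out = e->pfn;
            return 1;
        }
    }
    return 0;
}

/* Stand-in for the OS miss handler: use the primary slot if free, else
   the secondary slot, else evict whatever is in the primary slot. */
static void os_insert(uint32_t vpn, uint32_t pfn)
{
    size_t p = primary_hash(vpn), s = secondary_hash(vpn);
    size_t slot = !table[p].valid ? p : (!table[s].valid ? s : p);
    table[slot] = (struct hpte){ vpn, pfn, 1 };
}
```

Every call to os_insert in this model stands for a full interrupt round trip on real hardware, which is the cost being complained about.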
Re: x86 is too bloated
rdos wrote:Software compatibility (like Unix) is a complete disaster. It creates completely unreadable code cluttered with ifdefs that basically nobody understands.
That's the fault of poor programming language support/features, not of the software compatibility approach. Hardware compatibility has failed anyway due to the multi-platform nature of the market. Not only do we now have different OSes, but we also have different platforms like PC and mobile and they use different CPU and GPU architectures. Going forward, the software approach is the right way to go, and hardware backward compatibility is more or less an obsolete, old-hat idea.
Re: x86 is too bloated
thewrongchristian wrote:The 286 could have been so much better if ...
The only thing that would have made the 286 better is if it never existed in the first place.
Intel should have scrapped the entire idea and focused on the 386. They almost certainly could have brought it to market earlier and it would have saved us an infinite amount of headaches. It probably still would have included segments, but with the benefits of paging protections, it would have been a simpler system (like what we use in 64-bit mode). We wouldn't have grossness like GDT entries with fields split up all over the place if backwards compatibility with the 286 wasn't required. The list goes on.
By the time anyone implemented anything that used the 286 properly, the 386 was just about ready to ship. It was a complete waste of time.
Re: x86 is too bloated
rizxt wrote:Also, access rings. Why is this a hardware feature? This is something that definitely should be implemented at the software level.
How is the kernel supposed to prevent an app from doing a cli hlt without access rings? Every arch out there has protection rings in hardware, as that is the only way.
Re: x86 is too bloated
reapersms wrote: There's also an argument to be made that the software pagetable walk used on some architectures would qualify. All of those have been pretty thoroughly rejected by actual users at this point though...
PowerPC for instance used a hashtable approach for the entries, but only searched two of them for any particular address. Failures resulted in an interrupt to have the OS figure out which entry to replace. I vaguely recall IA-64 going further than that, and having a TLB miss throw an interrupt and leave the rest to the OS. While I'm sure there was some researcher somewhere that loved the idea of inventing their own newer, better page table format, the efficiency loss is horrifying.
It makes a lot of sense. Hardware page table walking requires extra complexity and microcode to do the actual walking. That's not free.
In contrast, you can use those resources instead for extra TLB entries, which will give you a bigger TLB reach. The MIPS R2000 has a fully associative 64 entry TLB, giving a reach of 256KB. The contemporary i386 had a 32 entry TLB, which was 4 way set associative, and so was not only smaller (128KB max reach) but also didn't handle collisions as well, potentially reducing its effective reach somewhat.
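The reach figures are just entries times page size, and the collision point follows from the number of sets. A minimal sketch of that arithmetic, with hypothetical helper names and both chips assumed to use 4KB pages:

```c
#include <stddef.h>

/* Memory covered by a full TLB: one page per entry. */
static size_t tlb_reach_bytes(size_t entries, size_t page_bytes)
{
    return entries * page_bytes;
}

/* Number of sets in a set-associative TLB; a fully associative TLB
   is the special case ways == entries, giving a single set. */
static size_t tlb_sets(size_t entries, size_t ways)
{
    return entries / ways;
}
```

With 32 entries in 4 ways there are 8 sets, so a fifth hot page landing in an already full set evicts one of the other four even if the rest of the TLB is idle. A fully associative 64 entry TLB has one set and no such conflicts.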
MIPS put all the complexity of translation lookup in software, kept the hardware simple, and made it fast in the common case.
Soft page lookup also provides more flexibility in hardware page size, as there is no table dictating layout. MIPS32 doesn't make use of this, but later 64-bit MIPS did.
Given the choice for performance and efficiency, would you choose a MIPS R2000 or an i386?
Re: x86 is too bloated
I haven't dealt with any MIPS older than the R3000, and none of the systems used paging/VM even when I did, so I don't have a particularly informed opinion there. Given the choice, I'd probably lean towards the i386, but that would be mostly due to familiarity, and a general sense that MIPS tended to get chosen for being cheap/easy to license. For the systems I did work with, they relied heavily on coprocessors to get anything done in a reasonable timeframe, and had pretty terrible performance for general purpose code relative to their competitors.
As for software paging, I suspect the hardware cost of the fixed function search/TLB fill for the page present case still works out to being a better/faster solution overall than routing to an interrupt every time, but have no hard data there. It certainly seems like there'd be tighter latency guarantees available (though if you care about that, you probably wouldn't have paging enabled anyways).
There's probably a tradeoff in there for a larger/more associative TLB, in that that likely slows down the initial lookup, all other things being equal. That slowdown may be generally insignificant (and probably is) overall, but it's something to keep in mind certainly. For the comparison given there, I'd expect the MIPS to have slightly slower TLB-found performance, but hitting that more often. The 386 would be slightly faster, but more likely to fall into a TLB fill situation, and win that one via the known table format. For the page not found case, neither one is going to be fast about it, but that's expected.
One other consideration is that a full software approach means you're going to be consuming some amount of those larger TLBs or caches to track the translation and cache lines for the code to walk your structures. The 386 would be able to avoid that, as the pagetables are referred to with direct physical addresses anyways. The tables themselves I assume dirty the cache either way. With what I recall of MIPS icache performance (when it existed), that would be a pretty heavy cost.
Hardware vs software systems are probably veering further off topic though...