Theoretical: Why the original 386 design was bad
Posted: Fri Jan 21, 2011 6:28 pm
by tom9876543
The 386 design is still with us in modern x86 CPUs. It can't be fixed so this is all a theoretical argument.
I say Intel really screwed up when designing the 386.
Intel should have completely dropped the 286 "protection" and made the 386 compatible with the 8086 only.
That eliminates the following:
GDT/LDT, Segments (in protected mode), Limits, CPL0-3, TSS, IO Permission Map, Call Gates, Instructions like ARPL, eflags I/O privilege level and NT, address / operand size overrides
What the 386 design should have been:
- protected mode as a 32 bit flat address space with no segments, gdt/ldt, tss etc
- of course it has paging in protected mode
- it maintains 8086 compatibility and starts in real mode
- virtual 8086 mode is also available (slightly different design because there's no TSS)
- eflags bit 1 should have been the cpuid bit, and cpuid should have been implemented on the first 386
- the SMSW instruction becomes an illegal opcode
- 386 reserved interrupts should have been the last 32 (224-255), not the first 32. They overlap with hardware interrupts, a stupid design
- address and operand size overrides are NOT supported. In real / v8086 mode only 16 bits of the registers are available
- the a20 bullshit should have been fixed up properly. One solution could have been a bit in CR0 that forced address lines 31-20 to zero until it was set
- there should have been special real mode instructions to allow accessing memory above 1 MB; unreal mode is dodgy and a consequence of poor Intel design
Naturally, a 286 operating system would NOT work on a 386. But really, what 286 operating systems were there? I think Windows 2.0 and 3.0 could run on an 8086 if a 286 wasn't detected, and from memory 286 applications could also run on an 8086 CPU; that's how Windows 2.0 / 3.0 worked. Of course, a 386 operating system has v8086 mode to run 286 applications. So the disadvantage of not supporting 286 operating systems really was not that bad.
Of course this is a theoretical idea, we can't fix Intel's historical mistakes.
Re: Theoretical: Why the original 386 design was bad
Posted: Fri Jan 21, 2011 7:34 pm
by Owen
You know, for all the brain damaged-ness that the x86 architecture contains, you sure picked on all the superficial, unimportant, prod-it-a-few-times-at-boot-and-it-gets-out-of-your-way stuff...
And yes, there was a very important 286 operating system: OS/2.
Re: Theoretical: Why the original 386 design was bad
Posted: Fri Jan 21, 2011 8:13 pm
by tom9876543
Ok how would you have designed the 386 CPU? What "brain damaged" features of x86 would you have fixed up????
Remember the 386 was built with around 275,000 transistors and it had to be backwards compatible with the 8086.
OS/2 - IBM would have been forced to write a pure 32 bit version of OS/2, and I believe it would have been able to run 16 bit OS/2 applications by using v8086 mode.
Re: Theoretical: Why the original 386 design was bad
Posted: Fri Jan 21, 2011 8:21 pm
by tom9876543
Yes, a lot of these things you never see in practice, such as segment overrides.
But a benefit of simplifying the 386 is that it allows you to reallocate opcodes to more useful purposes. For example, the segment overrides plus the address/operand size overrides are 7 single-byte opcodes that have been wasted.
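For reference, these are the single-byte prefixes being counted, sketched as NASM constants; the byte values are from the published opcode map, and whether the total is 7 or 8 depends on whether you count the FS/GS overrides that the 386 itself added:

PREFIX_ES   equ 0x26    ; segment override, 8086
PREFIX_CS   equ 0x2E    ; segment override, 8086
PREFIX_SS   equ 0x36    ; segment override, 8086
PREFIX_DS   equ 0x3E    ; segment override, 8086
PREFIX_FS   equ 0x64    ; segment override, new on the 386
PREFIX_GS   equ 0x65    ; segment override, new on the 386
PREFIX_OPSZ equ 0x66    ; operand-size override, new on the 386
PREFIX_ADSZ equ 0x67    ; address-size override, new on the 386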
Also, the protection checks are a complete waste of time; how many transistors are wasted checking the CS limit?
Re: Theoretical: Why the original 386 design was bad
Posted: Fri Jan 21, 2011 10:09 pm
by OSwhatever
Intel made the 80376 for embedded devices in the very late 80s. It was, however, quickly discontinued. The 80376 was basically a 386 but with the legacy compatibility thrown out.
http://en.wikipedia.org/wiki/Intel_80376
Today it makes a lot of sense, since a shaved-down x86 is the only way Intel is going to compete in the low power segment with their x86 architecture.
Re: Theoretical: Why the original 386 design was bad
Posted: Sat Jan 22, 2011 5:09 am
by Kevin
I think you forgot to actually write something on the subject of this thread: you bring up a list of what you think should have been done differently. However, you haven't said anything about why the original 386 design was bad.
tom9876543 wrote:That eliminates the following:
GDT/LDT, Segments (in protected mode), Limits, CPL0-3, TSS, IO Permission Map, Call Gates, Instructions like ARPL, eflags I/O privilege level and NT, address / operand size overrides
Why is it a good thing to eliminate features? Just because your specific OS doesn't need them, you think nobody has a use for them?
How am I supposed to write my microkernel (or rather, the drivers for it) on your "improved" CPU that doesn't support IOPL and the I/O Permission Bitmap any more? Give every process access to everything?
Also, you certainly don't want to do away with CPL0-3 without a replacement that allows at least distinguishing a kernel and a user mode.
- protected mode as a 32 bit flat address space with no segments, gdt/ldt, tss etc
That's a different design, but why is it a better one?
- it maintains 8086 compatibility and starts real mode
I think this one causes much more trouble than compatibility with the 286 PM.
- address and operand size overrides are NOT supported. In real / v8086 mode only 16bits of the registers are available
And in Protected Mode you cannot use 16 bit values any more?
- there should have been special real mode instructions to allow accessing memory above 1mb, unreal mode is dodgy and a consequence of the poor intel design
Why add stuff for a legacy mode?
Of course this is a theoretical idea, we can't fix Intel's historical mistakes.
As said above, before claiming that Intel has made stupid mistakes, you should explain why you think your approach would have worked better.
Re: Theoretical: Why the original 386 design was bad
Posted: Sat Jan 22, 2011 8:17 am
by tom9876543
Why is it a good thing to eliminate features?
- Because modern operating systems don't use them.
- Another reason is there is no equivalent on other architectures such as ARM or PowerPC, so any portable operating system simply won't use them.
How am I supposed to write my microkernel (or rather, the drivers for it) on your "improved" CPU that doesn't support IOPL and the I/O Permission Bitmap any more?
I guess your microkernel is NOT portable. Does ARM or PowerPC support IOPL???? If the IOPL / IO Bitmap was never implemented you would not be here today complaining about something that never existed.
Also, you certainly don't want to do away with CPL0-3 without a replacement that allows at least distinguishing a kernel and a user mode.
You should be aware that page tables have the User/Supervisor bit.
And in Protected Mode you cannot use 16 bit values any more?
Firstly, it is easy enough to load a 16-bit value by reading 4 bytes of memory and sign extending the 16 bits (or zeroing if it's unsigned).
I agree that is debatable; maybe it is better to keep the operand size override prefix. When designing the 386, Intel would have had to predict how often 32-bit operating systems and applications actually work better with 16-bit values, so no one knows the answer to that question. In the long term, I would say that today 16-bit values are rarely used in 32-bit code.
Why add stuff for a legacy mode?
A 386 has a 32-bit physical address space. BIOSes today have to use the Unreal Mode "feature" to access all of the address space while in real mode. I would say there needs to be a replacement for Unreal Mode, since it's not possible under this proposed design.
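For anyone who hasn't seen it, this is roughly what that Unreal Mode trick looks like; a minimal NASM sketch, assuming interrupts are masked and A20 is already enabled, with the labels and GDT layout purely illustrative:

[bits 16]
enter_unreal:
    cli
    lgdt [gdt_desc]            ; tiny GDT with one flat 4 GiB data descriptor
    mov eax, cr0
    or al, 1                   ; set CR0.PE: briefly enter protected mode
    mov cr0, eax
    mov bx, 0x08               ; selector of the flat data descriptor
    mov ds, bx                 ; DS's hidden (cached) limit becomes 4 GiB
    and al, 0xFE               ; clear CR0.PE: drop back to real mode
    mov cr0, eax
    sti
    ; the cached 4 GiB limit survives, so 32-bit offsets now work in real mode
    mov ebx, 0x00100000
    mov al, [ebx]              ; read a byte above 1 MB (the "unreal" access)
    ret

gdt:
    dq 0                       ; null descriptor
    dq 0x00CF92000000FFFF      ; base 0, limit 4 GiB, writable data segment
gdt_desc:
    dw gdt_desc - gdt - 1
    dd gdt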
Intel kept 286 compatibility even though any true 32 bit operating system does not really need the 286 protection model. The opcode map could have been improved with more single byte instructions by getting rid of 286 compatibility. How many transistors are wasted to implement call gates, task gates etc?????
Re: Theoretical: Why the original 386 design was bad
Posted: Sat Jan 22, 2011 9:02 am
by Owen
tom9876543 wrote:Why is it a good thing to eliminate features?
- Because modern operating systems don't use them.
- Another reason is there is no equivalent on other architectures such as ARM or PowerPC, so any portable operating system simply won't use them.
How am I supposed to write my microkernel (or rather, the drivers for it) on your "improved" CPU that doesn't support IOPL and the I/O Permission Bitmap any more?
I guess your microkernel is NOT portable. Does ARM or PowerPC support IOPL???? If the IOPL / IO Bitmap was never implemented you would not be here today complaining about something that never existed.
They don't have a separate I/O space to control access to, either. What is done is done. You need an IOPB (or I/O paging, or whatever) on x86 because there is a separate I/O space.
Also, you certainly don't want to do away with CPL0-3 without a replacement that allows at least distinguishing a kernel and a user mode.
You should be aware that page tables have the User/Supervisor bit.
You still need a way to distinguish the running mode. Perhaps you should have specified your replacement...
And in Protected Mode you cannot use 16 bit values any more?
Firstly, it is easy enough to load a 16-bit value by reading 4 bytes of memory and sign extending the 16 bits (or zeroing if it's unsigned).
I agree that is debatable; maybe it is better to keep the operand size override prefix. When designing the 386, Intel would have had to predict how often 32-bit operating systems and applications actually work better with 16-bit values, so no one knows the answer to that question. In the long term, I would say that today 16-bit values are rarely used in 32-bit code.
We are only talking about one prefix here. It's one opcode. It's a pretty inexpensive prefix.
Though yes, you could eradicate it and use mov[zs]x[wb] if you wanted. Everyone else lives with that. Hell, eliminate the H/L pairs while you're at it; they are quite useless...
But you'd end up needing an additional 2 opcodes for mov[wb]. Net loss.
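To make the trade-off concrete, these are the two alternatives in question (NASM, 32-bit protected mode assumed; val16 is just an example variable):

[bits 32]
    mov ax, [val16]            ; keeps the 16-bit register, so it needs the 66h prefix
    movzx eax, word [val16]    ; prefix-free alternative: zero-extend into EAX
    movsx eax, word [val16]    ; ...or sign-extend, for signed 16-bit values

val16: dw 1234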
Why add stuff for a legacy mode?
A 386 has a 32-bit physical address space. BIOSes today have to use the Unreal Mode "feature" to access all of the address space while in real mode. I would say there needs to be a replacement for Unreal Mode, since it's not possible under this proposed design.
The BIOS is obsolete anyway. Far better to dump real mode.
Intel kept 286 compatibility even though any true 32 bit operating system does not really need the 286 protection model. The opcode map could have been improved with more single byte instructions by getting rid of 286 compatibility. How many transistors are wasted to implement call gates, task gates etc?????
You are aware that Windows 9X required those features, right? So here we are, talking about an OS shipped post-386, which still needed 16-bit mode.
Re: Theoretical: Why the original 386 design was bad
Posted: Sat Jan 22, 2011 11:33 am
by Kevin
tom9876543 wrote:Why is it a good thing to eliminate features?
- Because modern operating systems don't use them.
But that's not true for most features that you mentioned:
GDT/LDT, Segments (in protected mode) - used e.g. for Thread Local Storage
Limits, IO Permission Map, eflags I/O privilege level - used for letting userspace processes access hardware
CPL0-3 - Xen PV uses Ring 0, 1 and 3
address / operand size overrides - Probably used a lot by any code
TSS, Call Gates, Instructions like ARPL, and NT flag - You could probably do without these, though tyndur uses some of them for VM86
Another reason is there is no equivalent on other architectures such as ARM or PowerPC, so any portable operating system simply won't use them.
Except that they concern only those parts of the kernel which are architecture specific anyway.
I guess your microkernel is NOT portable. Does ARM or PowerPC support IOPL???? If the IOPL / IO Bitmap was never implemented you would not be here today complaining about something that never existed.
It's not a big problem if you can't use in/out instructions to access the I/O ports from user space with ARM or PPC. In fact, you can't do so in kernel space either, because they don't even exist. So there's nothing to protect on these platforms. Your one absolute requirement was compatibility with 8086, though, so in/out must exist on your improved 386 and you need something to protect against unwanted accesses.
Also, you certainly don't want to do away with CPL0-3 without a replacement that allows at least distinguishing a kernel and a user mode.
You should be aware that page tables have the User/Supervisor bit.
You should be aware that the User/Supervisor bit is checked against CPL which you have just abandoned.
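For context, here is the bit in question; a minimal sketch of 386 page table entries in NASM, with the physical addresses purely illustrative. The U/S bit only means something because the CPU compares it against the privilege level of the currently running code:

PTE_PRESENT  equ 1 << 0
PTE_WRITABLE equ 1 << 1
PTE_USER     equ 1 << 2     ; clear = supervisor only (CPL 0-2), set = user (CPL 3) may access

user_pte:   dd 0x00400000 | PTE_PRESENT | PTE_WRITABLE | PTE_USER
kernel_pte: dd 0x00100000 | PTE_PRESENT | PTE_WRITABLE    ; supervisor-only mapping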
For the rest see Owen's answer, he has already said what I would reply.
Re: Theoretical: Why the original 386 design was bad
Posted: Sat Jan 22, 2011 2:57 pm
by bewing
In some sense, I think this shouldn't be theoretical. IMO, at some point, someone will design a CPU that is the x86-killer, and this is exactly how the design process should start -- by identifying what sux about x86 and all other current CPUs.
However, that 8086-compatibility thing that you put as step 2 wrecks the process. If I'm going to start listing the things that should be removed from x86 to build a more perfect CPU, 8086 compatibility and Rmode (and 32bit pmode, and segmentation, and the FPU, and MMX, and ...) are going to be the first things to go.
Owen wrote:
Hell, eliminate the H/L pairs while you're at it; they are quite useless...
I've got to disagree strongly with this statement, though. As an ASM programmer, I know quite well that the byte registers are the main key to getting 2 to 3 times better performance out of a CPU than any optimizing compiler can achieve these days (because the optimizers were written by idiots who don't know what they are doing).
Re: Theoretical: Why the original 386 design was bad
Posted: Sat Jan 22, 2011 3:38 pm
by tom9876543
They don't have a separate I/O space to control access to, either. What is done is done. You need an IOPB (or I/O paging, or whatever) on x86 because there is a separate I/O space.
OK, I realise that was going too far. There should be an IOPB; however, clearly the TSS doesn't exist any more. Maybe a control register (similar to CR3) specifies where the IOPB is for the process. The eflags I/O privilege level is not required, though.
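A rough sketch of what the I/O permission bitmap itself looks like, whether it hangs off the TSS as on the real 386 or off a hypothetical control register as proposed here: one bit per port, with a set bit meaning the access traps (NASM; the serial port range is just an example):

iopb: times 8192 db 0xFF               ; 65536 ports, one bit each, all denied

; grant unprivileged access to the serial port range 0x3F8-0x3FF
    mov byte [iopb + 0x3F8/8], 0x00    ; clear the 8 bits covering those ports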
You still need a way to distinguish the running mode. Perhaps you should have specified your replacement...
Oops, sorry, my mistake. I realise you meant the actual CS register in the CPU. The replacement is a single bit in CR0. Only supervisor code can modify it with MOV CR0, x, and the instruction MOV x, CR0 (read CR0) should be supervisor level only. Userland gets to supervisor mode via an interrupt, and the interrupt table determines whether the ISR is callable from userland. Also, by simplifying the CPU, it should be possible that in 32-bit protected mode the interrupt table only needs 4 bytes per interrupt, not 8 as is the case now.
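Nothing like this exists on any real x86, but as a purely hypothetical illustration of that proposal, the 4-byte interrupt entries could look something like this (assuming handlers are 4-byte aligned, so bit 0 is free to mark vectors reachable from userland; the handlers are placeholders):

[bits 32]
align 4
timer_isr:   iretd              ; placeholder ISR, illustration only
align 4
syscall_isr: iretd              ; placeholder ISR, illustration only

hyp_int_table:
    dd timer_isr                ; bit 0 clear: hardware / supervisor only
    dd syscall_isr + 1          ; bit 0 set: userland may reach it via INT n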
But you'd end up needing an additional 2 opcodes for mov[wb]. Net loss.
Ahhh but you miss the point. The new additional opcodes can be 2 byte opcodes. In terms of performance you want the most common instructions as single byte opcodes. I think in modern software 16 bit overrides almost never occur.
The BIOS is obsolete anyway. Far better to dump real mode.
Unfortunately, when the 386 was first sold to the public, MS-DOS ruled the world and it was essential to maintain compatibility with it.
You are aware that Windows 9X required those features, right? So here we are, talking about an OS shipped post-386, which still needed 16-bit mode.
Microsoft used those features on the CPU because they could. If Microsoft were forced to use V8086 mode for all their 16 bit code then I'm sure they would have done that.
GDT/LDT, Segments (in protected mode) - used e.g. for Thread Local Storage
Limits, IO Permission Map, eflags I/O privilege level - used for letting userspace processes access hardware
CPL0-3 - Xen PV uses Ring 0, 1 and 3
address / operand size overrides - Probably used a lot by any code
TSS, Call Gates, Instructions like ARPL, and NT flag - You could probably do without these, though tyndur uses some of them for VM86
What widely used operating system today uses those features? I only recognise Thread Local Storage (WindowsNT and descendants).
I believe applications today rarely ever use the address / operand size overrides, but I can be proven wrong.
In relation to TLS, how does Microsoft implement that on the ARM CPU (since apparently Windows will soon be running on ARM)?
While TLS uses segments on x86, Microsoft used segments because they could. If Intel never gave them that feature, I'm sure Microsoft could have implemented it in a different way.
Re: Theoretical: Why the original 386 design was bad
Posted: Sat Jan 22, 2011 5:07 pm
by Brendan
Hi,
Because of backward compatibility and standardisation, 80x86 achieved market dominance. At any time Intel could break backward compatibility, but if that happened everyone would need to convert any/all software, etc. over to Intel's new design, and they could just as easily shift to anyone else's design instead; Intel would be pushing away existing consumers who were "invested" in 80x86.
Basically backward compatibility is good for market share, and (because design costs are divided by units sold) more market share means lower prices and/or more $$ invested into improving the product, which results in cheaper/faster CPUs, which gets you more market share. It's a cycle of "win". If you have a look through history you'll see that every other CPU design got pushed out of the desktop/server market and is now either dead (68000, Alpha, PA-RISC, SPARC, etc), dying a slow death (POWER/PPC, Itanium) or constrained to the embedded market (ARM, MIPS). Even Intel's own Itanium can't compete with 80x86 for anything other than fault tolerance.
So, what we're looking at is a list of things that are "bad" from a programmer's perspective 15 years later; where (even back when the 80386 was first released) programmers' opinions were mostly irrelevant when compared to that cycle of "win".
tom9876543 wrote:- it maintains 8086 compatibility and starts in real mode
Ironically, I think the CPU should have started in protected mode (with all segments set to "base = 0, limit = max"). It would've made things easier for firmware (the BIOS normally switches to protected mode anyway before doing RAM sizing, POST, etc), and it would've made AP CPU startup easier for most OSs too. More importantly, it wouldn't have caused a backward compatibility problem because the BIOS switches back to real mode before starting the boot loader anyway and there was no SMP to worry about when 80386 was designed.
tom9876543 wrote:- 386 reserved interrupts should have been the last 32 (224-255), not the first 32. They overlap with hardware interrupts, a stupid design
That was not Intel's fault - since the beginning the first 32 interrupts were reserved by Intel for exceptions. It was IBM who decided to use these reserved interrupts for IRQs. If Intel reserved the last 32 interrupts instead, then IBM could've decided to use those reserved interrupts for IRQs and we still would've had the overlap.
tom9876543 wrote:- the a20 bullshit should have been fixed up properly. One solution could have been a bit in CR0 that forced address lines 31-20 to zero until it was set
This isn't Intel's fault either. Intel made a CPU and had no say in how that CPU was used. A20 wasn't Intel's problem (only IBM's problem) and it was never Intel's solution. However, I'd blame the stupid programmers who relied on the "1 MiB address wrap" in the first place, as they're the ones that forced IBM to consider A20 as a way of ensuring old software worked. Intel had to adopt IBM's solution later (when they started combining separate chips into a "southbridge"), even though they probably didn't like it either.
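For anyone who hasn't run into it, the wrap being relied on looks like this (NASM, real mode): the 8086 computes a 21-bit address but only has 20 address lines, so the top bit is silently dropped, and some 8086-era software depended on exactly that:

[bits 16]
    mov ax, 0xFFFF
    mov ds, ax                 ; segment base = 0xFFFF0
    mov al, [0x0010]           ; 0xFFFF0 + 0x10 = 0x100000, which wraps to 0x00000 on an 8086
                               ; on a 286/386 with A20 enabled this reads 0x100000 instead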
tom9876543 wrote:What widely used operating system today uses those features?
So you're suggesting that back in 1985, Intel should've used some sort of time travel to see what people would/wouldn't be doing 15 years later? Hindsight is easy. Foresight isn't.
Cheers,
Brendan
Re: Theoretical: Why the original 386 design was bad
Posted: Sat Jan 22, 2011 5:11 pm
by Kevin
tom9876543 wrote:What widely used operating system today uses those features? I only recognise Thread Local Storage (WindowsNT and descendants).
I believe applications today rarely ever use the address / operand size overrides, but I can be proven wrong.
TLS is implemented with segments both by Linux and Windows, and probably most other OSes on x86 as well. I/O port protection is done at least by Linux, no idea what Windows is doing there. For an example of ring 1 usage I already mentioned Xen. Do you really consider all of them "not widely used"?
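As a concrete illustration of the segment-based TLS (32-bit user code; the offsets shown are the conventional ones for these OSes, quoted from memory, so treat them as approximate):

; Windows (32-bit): FS is set up to point at the Thread Environment Block
    mov eax, [fs:0x18]         ; TEB self-pointer; the TLS slots live inside the TEB
; Linux/glibc (i386): GS points at the thread control block
    mov eax, [gs:0x00]         ; TCB self-pointer used for thread-local accesses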
In relation to TLS, how does Microsoft implement that on the ARM CPU (since apparently Windows will soon be running on ARM)?
While TLS uses segments on x86, Microsoft used segments because they could. If Intel never gave them that feature, I'm sure Microsoft could have implemented it in a different way.
Sure. And if Intel had never provided paging, we would probably do everything with segments. That's not a strong argument: You can almost always find some workarounds for missing features. The question is if it makes sense to have a feature in the CPU instead of emulating it or designing software differently to work without it.
Re: Theoretical: Why the original 386 design was bad
Posted: Sat Jan 22, 2011 6:25 pm
by bewing
tom9876543 wrote:
I believe applications today rarely ever use the address / operand size overrides, but I can be proven wrong.
Almost every time you access a "short", "uint16_t", or "int16_t", in any code, the compiler sticks in an operand size override. You may not happen to use 2-byte ints very often in your code, but other people do.
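For example, for something like "short a, b; ... a = a + b;" a 32-bit compiler emits roughly the following, and every 16-bit operation carries the 66h operand-size prefix:

[bits 32]
    mov ax, [b]                ; assembles with a leading 66h operand-size prefix
    add [a], ax                ; same again: 66h prefix on the 16-bit add

a: dw 0
b: dw 0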
Re: Theoretical: Why the original 386 design was bad
Posted: Sat Jan 22, 2011 8:15 pm
by tom9876543
Ironically, I think the CPU should have started in protected mode ..... the BIOS switches back to real mode
That is a great idea. 386 and higher CPUs have this convoluted design where the CPU reads the first instruction at 0xFFFFFFF0 even though it's in real mode. A bad design mistake by Intel.
That was not Intel's fault - since the beginning the first 32 interrupts were reserved by Intel for exceptions.
Yes you are correct, my bad. IBM is supposed to be the pinnacle of computing excellence but they stuffed that up.
Intel made a CPU and had no say in how that CPU was used. A20 wasn't Intel's problem
I would disagree with you there. I found the Intel 8086 Users Manual on the web. It clearly says the following:
- offsets wrap around
- the memory address space is limited to 1 megabyte
It does not clearly say what happens when the physical address overflows into a 21st bit, but based on the above, you would assume a wrap-around.
Intel's 8086 was poorly designed; it should have thrown an exception if the physical address overflowed into a 21st bit. Because it doesn't, the 386 needs to maintain backwards compatibility with the wrap-around.
Intel screwed up the 286 in this regard: it was not 100% backwards compatible with the 8086, so IBM introduced the A20 hack.
So you're suggesting that back in 1985, Intel should've used some sort of time travel to see what people would/wouldn't be doing 15 years later? Hindsight is easy. Foresight isn't.
I am suggesting Intel should have had the following philosophy when creating the 386:
Build a 32 bit CPU that is 100% compatible with the 8086 but make the design as elegant and clean as possible. Get rid of 286 compatibility as its "protection" is primitive and convoluted.
Sure. And if Intel had never provided paging, we would probably do everything with segments.
Let's say you design a car that has 6 wheels instead of the usual 4. You can go on about how wonderful your car is because it has extra wheels. But at the end of the day no other cars have them and the extra wheels are only a burden.
Let's say Intel designs a CPU that has segmentation. You can go on about how wonderful segmentation is, but at the end of the day other CPU designs don't have it and the other CPUs operate fine without it.
Paging exists on most CPU designs, so of course Intel CPUs must have it. Segmentation on the other hand is Intel's burden. Is that clear to you now??
Almost every time you access a "short", "uint16_t", or "int16_t", in any code, the compiler sticks in an operand size override.
I guess that could be replaced by:
mov eax, [memory]   ; load 4 bytes; the 16-bit value ends up in AX
cwde                ; sign-extend AX into EAX
No one here is going to get formal statistics on how much code uses 16-bit values, so it is arguable.