
Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sat Feb 17, 2018 6:03 am
by OSwhatever
Schol-R-LEA wrote:Getting back to the actual question: what experience do you have with the memory protection systems - not the paging systems, as I said paging and segmentation are orthogonal issues to that - in systems such as ARM, MIPS, 68000 (when matched to a 68881 or some other MMU), PowerPC, IBM POWER Systems (that is, the minis or blade servers, as opposed to the related PowerPC), SPARC, Alpha, VAX, or any other systems with such memory protection?
Speaking about ARM, their page table is very similar to x86 and ARM64 almost blatantly copied x86 PAE. However, ARM's real-time parts, such as the Cortex-R and Cortex-M lines that have no MMU, do have a kind of segment-style protection, which they call an MPU (Memory Protection Unit). The MPU lets the programmer set up memory protection regions in the system, usually limited to a few tens of regions. Since embedded systems usually have few, fixed regions, this is usually not a problem. In my experience this type of protection has worked very well and helps a lot during development compared to an embedded CPU with no protection at all.
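In case it helps to picture it, here is a minimal sketch of programming one such region on an ARMv7-M (Cortex-M) MPU using the architectural register addresses; the region number, base address and attribute values are just illustrative, and the Cortex-R parts use a slightly different register interface.

Code:
#include <stdint.h>

/* ARMv7-M MPU registers (System Control Space) */
#define MPU_CTRL  (*(volatile uint32_t *)0xE000ED94u)
#define MPU_RNR   (*(volatile uint32_t *)0xE000ED98u)
#define MPU_RBAR  (*(volatile uint32_t *)0xE000ED9Cu)
#define MPU_RASR  (*(volatile uint32_t *)0xE000EDA0u)

/* Illustrative: region 0 covers 32 KiB of SRAM at 0x20000000,
   read/write for everyone, no instruction fetches. */
void mpu_setup_example(void)
{
    MPU_RNR  = 0;                        /* select region 0                */
    MPU_RBAR = 0x20000000u;              /* region base (size-aligned)     */
    MPU_RASR = (1u << 28)                /* XN: no execute                 */
             | (3u << 24)                /* AP=011: full access            */
             | (1u << 17) | (1u << 16)   /* C,B: normal, write-back memory */
             | (14u << 1)                /* SIZE=14 -> 2^(14+1) = 32 KiB   */
             | 1u;                       /* region enable                  */
    MPU_CTRL = (1u << 2) | 1u;           /* PRIVDEFENA + enable the MPU    */
    __asm__ volatile ("dsb\n\tisb");     /* make the new map take effect   */
}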

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sat Feb 17, 2018 12:17 pm
by Schol-R-LEA
OSwhatever wrote:Speaking about ARM, their page table is very similar to x86 and ARM64 almost blatantly copied x86 PAE.
Uhm, no it didn't; they both copied designs going back to the mid-1960s - long enough ago that the patents held by IBM, DEC, and Honeywell had all expired. This wasn't anything new, it just hadn't been applied to microprocessors before 1981 (by Intel, on the i432 - and they did have to license some of those patents then) and didn't reach consumer-grade systems for almost another decade (Compaq announced the 386-based Deskpro 386 in September 1986, beating IBM's 386-based PS/2 models 70 and 80 to market by several months; neither vendor's machines really hit the market in volume until well into 1987, and again with Compaq going first).

Regarding the page table system, ARM actually came first, as paging was part of the design from the outset - even the ARM1, which was only produced as a prototype, had paged virtual memory using a co-processor (mind you, an MMU for the 80286 was also in the works at the time, and the 68451 for the Motorola 68000 had already taped out). Both the ARM and the MIPS (which had similar initial specs) were designed starting in early 1983, a few months before work on the 80386 began, and none of the three design teams knew what the others were doing - indeed, part of the reasoning behind the commercial development of RISC systems was that, after the i432 debacle, most people thought adding paging to a CISC design would never be feasible (though Motorola had already done it with the 68451 by the time ARM and MIPS were taping out).

Also, both Intel PAE and ARM LPAE apply to 32-bit systems; 64-bit systems don't need bolted-on 36-bit, 40-bit, or 48-bit physical address extensions, for obvious reasons.
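For what it's worth, the resemblance people point to is real enough - both x86 PAE and 32-bit ARM LPAE walk three levels of 64-bit entries, splitting a 32-bit virtual address roughly 2 + 9 + 9 + 12 - it just isn't evidence of copying. A rough sketch of that split (the struct and field names are mine, not from either manual):

Code:
#include <stdint.h>

/* Index split shared (more or less) by x86 PAE and ARM LPAE for a
   32-bit virtual address with 4 KiB pages. */
struct va_split { unsigned top, mid, low, offset; };

static struct va_split split_32bit_va(uint32_t va)
{
    struct va_split s;
    s.top    = (va >> 30) & 0x3;    /* PDPT index / LPAE level-1 index */
    s.mid    = (va >> 21) & 0x1FF;  /* page directory / level-2 index  */
    s.low    = (va >> 12) & 0x1FF;  /* page table / level-3 index      */
    s.offset =  va        & 0xFFF;  /* offset within the 4 KiB page    */
    return s;
}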

BTW, while ARM was late to the 64-bit party, most other RISC designs had been 64-bit long before AMD's x86 long mode came out: the DEC Alpha, which came out in 1992, was 64-bit from the start; SPARC rev. 9 was in 1994; the PowerPC 620 was in 1997 (though it only saw use in some obscure Unix servers, so practically speaking it wasn't until the 2003 PowerMacs that 64-bit PPCs were significant outside of IBM); and MIPS had already gone 64-bit with the R4000 back in 1991 (the MIPS64 branding only came in 1999).

So why did AMD get a jump on Intel in 2003? Because Intel was still planning to replace the x86 for server use with the failing Itanium (also 64-bit, first released in 2001, but very definitely not RISC), and didn't see a point in having 64 bits on the desktop.

Comments and corrections welcome.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sat Feb 24, 2018 7:48 am
by rdos
Schol-R-LEA wrote:So why did AMD get a jump on Intel in 2003? Because Intel was still planning to replace the x86 for server use with the failing Itanium (also 64-bit, first released in 2001, but very definitely not RISC), and didn't see a point in having 64 bits on the desktop.
I think Intel's failure was based on a lousy 64-bit design that emulated 32-bit code, which was bound to fail from the start. AMD did a better job by defining a 32-bit native mode that could run existing code at "normal" speed. Still, their design is pretty bad since it is hard to integrate 32-bit and 64-bit code. They also blew up the segmentation.

Ideally, Intel should have extended segment registers to 32-bit when doing the original 32-bit design, and then 64-bit mode could have been made as an easy extension without breaking segmentation. A central issue is that 64-bit linear addresses cannot be passed to 32-bit code without aliasing them to a 32-bit linear address. If segmentation hadn't been broken, then segments with 64-bit base addresses could have been constructed by using two entries. Another bad design choice is that 64-bit registers and operations are not possible in 32-bit mode. When Intel created their 32-bit environment, it was still possible to use 32-bit instructions in real mode. Not to mention that 32-bit operations will sabotage the higher order 32-bits of 64-bit registers. This is perhaps the worst of all since it means that all 64-bit registers need to be saved "just in case" when 32-bit code is called.
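For what it's worth, AMD did do something very much like the "two entries" idea, but only for system segments: in long mode a TSS or LDT descriptor occupies two consecutive 8-byte GDT slots so it can carry a full 64-bit base. A rough sketch of that 16-byte layout (the struct and field names are mine); ordinary code and data descriptors never got the same treatment, which is exactly the complaint above:

Code:
#include <stdint.h>

/* Long-mode TSS/LDT descriptor: 16 bytes, i.e. two legacy GDT slots,
   so the base address can be a full 64 bits. */
struct long_mode_system_descriptor {
    uint16_t limit_0_15;
    uint16_t base_0_15;
    uint8_t  base_16_23;
    uint8_t  type_attr;          /* type, DPL, present             */
    uint8_t  limit_16_19_flags;
    uint8_t  base_24_31;
    uint32_t base_32_63;         /* the second slot: upper 32 bits */
    uint32_t reserved;
} __attribute__((packed));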

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sat Feb 24, 2018 9:31 am
by Schol-R-LEA
rdos wrote:
Schol-R-LEA wrote:So why did AMD get a jump on Intel in 2003? Because Intel was still planning to replace the x86 for server use with the failing Itanium (also 64-bit, first released in 2001, but very definitely not RISC), and didn't see a point in having 64 bits on the desktop.
I think Intel's failure was based on a lousy 64-bit design that emulated 32-bit code, which was bound to fail from the start. AMD did a better job by defining a 32-bit native mode that could run existing code at "normal" speed.
I think you misunderstand the point of Itanium. The goal was to go into a market Intel hadn't been in, one where x86 compatibility wasn't important; the x86 emulation was a last-minute add-on to appease those server customers who, for some reason, had some piece of MS-DOS or 16-bit Windows software to run on their server systems (the assumption was that Windows 9x software would be ported instead - and even that was seen as unlikely, as they expected commercial Unix systems like HP-UX, or Windows NT - which at the time wasn't compatible with Win 9x even on x86 - to be the main OSes). Running x86 software was never the primary use case, because the whole point of Itanium was a role no one was using x86 for in 1995.

Let me be blunt: the x86 design is terrible. No one wants to keep it, no one wanted it in the first place, it was introduced solely to fill a gap in Intel's timetable. Its persistence is a historical accident. Yes, AMD put new life into it by creating long mode, but even they think it was a mistake today. They did it because they saw a market, and because they had a now-defunct idea of introducing their own replacement for x86 and needed to fund that. The x86 architecture is the undead nemesis of Intel and AMD, and neither of them seem to be able to put a stake in its heart, because the financial consequences would destroy them both and probably most of the software industry with them.

That, and all of the alternatives have been just as flawed, just in more subtle ways. It is easy to see why something like SUBLEQ is a bad idea; it isn't much harder to see why x86 is bad; but the faults in, say, the M68000, or RISC designs like the MIPS or ARM, or the VLIW designs such as Itanium, are harder to see, and only show up when it is too late to fix them.

The flaws in the Itanic as an ISA? They were never really the problem; the problem was simply getting people to buy it in the first place - the market they were targeting was dominated by Sun's SPARC rack-mounts and IBM iSeries minis at the time, and Intel couldn't convince anyone that Itanium would be better. Intel partnered with HP (who did most of the design work that went into the Itanic, actually, starting around 1989 or so, a time when Intel were still pinning their hopes on the i860) mainly because Intel was seen as a consumer (rather than HPC) chip designer and needed the credibility when entering that market, but that backfired, as HP was having their own troubles at the time.

The mistake of committing to using RAMBus for all early Itanium systems had a bigger impact than the poor x86 emulation. Why? Because the people who were going to be using it for the most part weren't going to be emulating x86. If anything, they were more likely to be emulating some antique Burroughs or Honeywell mainframe than emulating a PC.

One could also point out that the market they were aiming at was itself dying by then, though the truth is, there are a lot more such systems still around than most realize - and Itanium was a dominant player in that market in the mid-2000s. Fun fact: in 2008, the five most widely deployed current-gen (32-bit and 64-bit - the market for 8-bit and 16-bit microcontrollers like the AVR exceeds that for high-end processors by orders of magnitude) microprocessors were, in order, the ARM, the x86, the POWER, the MIPS, and... the Itanium. It is, in fact, a success, just one in a market that is moribund.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sat Feb 24, 2018 10:22 am
by Brendan
Hi,
rdos wrote:
Schol-R-LEA wrote:So why did AMD get a jump on Intel in 2003? Because Intel was still planning to replace the x86 for server use with the failing Itanium (also 64-bit, first released in 2001, but very definitely not RISC), and didn't see a point in having 64 bits on the desktop.
I think Intel's failure was based on a lousy 64-bit design that emulated 32-bit code, which was bound to fail from the start. AMD did a better job by defining a 32-bit native mode that could run existing code at "normal" speed. Still, their design is pretty bad since it is hard to integrate 32-bit and 64-bit code. They also blew up the segmentation.
Itanium's failure was mostly due to "compilers will fix all the problems caused by VLIW" (and pricing it for high-end servers and neglecting desktop machines that people use to write and test software for those high-end servers).

For 80x86 long mode, the only thing that really mattered was that it was relatively easy for Microsoft to recompile their kernel for 64-bit (with minor changes to things like task state saving/loading and paging) and relatively easy for Microsoft to support legacy "win32" executables. Microsoft (like all sane developers) never used segmentation for 32-bit code.
rdos wrote:Ideally, Intel should have extended segment registers to 32-bit when doing the original 32-bit design,
Intel's mistake was supporting segment registers for 32-bit code. Sadly, segmentation for 16-bit code was "needed" (to support "win16" applications that were intended to be compatible with real mode) and at the time Intel were still deluded by "capability" ideas (that were responsible for iAPX 432).
rdos wrote:and then 64-bit mode could have been made as an easy extension without breaking segmentation. A central issue is that 64-bit linear addresses cannot be passed to 32-bit code without aliasing them to a 32-bit linear address.
No; the central issue is that the entire world knows that segmentation is obsolete trash, but you've spent 2 decades convincing yourself that the entire world is wrong because you can't afford to redesign your OS.
rdos wrote:If segmentation hadn't been broken, then segments with 64-bit base addresses could have been constructed by using two entries. Another bad design choice is that 64-bit registers and operations are not possible in 32-bit mode. When Intel created their 32-bit environment, it was still possible to use 32-bit instructions in real mode.
The intention was that legacy 32-bit software (that doesn't use 64-bit instructions) would still run, and all new software would be 64-bit. Nobody was expected to care that 32-bit software can't use 64-bit instructions and almost nobody ever did. Also note that it's obvious AMD struggled to find opcodes that could be used for REX prefixes by the way they scavenged opcodes from previously valid instructions; and this "scavenging" couldn't have worked if they wanted to allow 64-bit instructions in 32-bit code (that used the previously valid instructions) - we would've ended up with 3-byte REX prefixes and very inefficient 64-bit instructions.
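To make the scavenging concrete: the one-byte INC/DEC encodings 0x40-0x4F were reclaimed as REX prefixes, so the same bytes decode differently depending on the mode. A small illustration (just a byte array with comments, nothing more):

Code:
/* The same machine-code bytes mean different things in the two modes. */
static const unsigned char bytes[] = { 0x48, 0x89, 0xD8 };
/*  32-bit mode:  48        dec eax
                  89 D8     mov eax, ebx
    64-bit mode:  48 89 D8  mov rax, rbx   (0x48 is now a REX.W prefix)
    Allowing 64-bit instructions inside 32-bit code would have meant
    leaving 0x40-0x4F alone and burning longer prefixes instead.       */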
rdos wrote:Not to mention that 32-bit operations will sabotage the higher order 32-bits of 64-bit registers. This is perhaps the worst of all since it means that all 64-bit registers need to be saved "just in case" when 32-bit code is called.
If you knew anything about instruction dependencies you'd know that zero extension (which breaks the dependency on the previous value in the register) is important for performance; and you'd know that Intel were "less smart" when they designed 32-bit because they ruined performance for 16-bit operations.
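A quick way to see the behaviour being argued about (GCC or Clang on x86-64; the inline asm is purely for illustration):

Code:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t r = 0xDEADBEEFCAFEBABEull;

    /* A write to the 32-bit sub-register zero-extends into the full
       64-bit register, breaking any dependency on the old value. */
    __asm__ ("movl $0x12345678, %k0" : "+r"(r));
    printf("after 32-bit write: %016llx\n", (unsigned long long)r);
    /* prints 0000000012345678 */

    /* A write to the 16-bit sub-register merges instead, keeping the
       dependency - the cost mentioned above for 16-bit operations.  */
    r = 0xDEADBEEFCAFEBABEull;
    __asm__ ("movw $0x1234, %w0" : "+r"(r));
    printf("after 16-bit write: %016llx\n", (unsigned long long)r);
    /* prints deadbeefcafe1234 */

    return 0;
}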


Cheers,

Brendan

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sat Feb 24, 2018 10:25 am
by Schol-R-LEA
Brendan wrote:Hi,
rdos wrote:
Schol-R-LEA wrote:So why did AMD get a jump on Intel in 2003? Because Intel was still planning to replace the x86 for server use with the failing Itanium (also 64-bit, first released in 2001, but very definitely not RISC), and didn't see a point in having 64 bits on the desktop.
I think Intel's failure was based on a lousy 64-bit design that emulated 32-bit code, which was bound to fail from the start. AMD did a better job by defining a 32-bit native mode that could run existing code at "normal" speed. Still, their design is pretty bad since it is hard to integrate 32-bit and 64-bit code. They also blew up the segmentation.
Itanium's failure was mostly due to "compilers will fix all the problems caused by VLIW" (and pricing it for high-end servers and neglecting desktop machines that people use to write and test software for those high-end servers).
All of this is predicated on a flawed narrative - the Itanium wasn't a failure; it just wasn't the dramatic success expected, because the market it targeted - and was a success in - was already shrinking in 1998 and had mostly vanished by 2012. That, and the whole RAMbus debacle, which slowed deployment until they got a chipset that worked with other types of memory (that only took about a year, and it was sorted out before any systems actually shipped, but it was a huge embarrassment for them).

It is also a flawed narrative because Intel didn't design the damn thing, or at least, they weren't the primary designers. Hewlett-Packard were, but they partnered with Intel to defray the development costs, which suited Intel because they wanted a high-end server system and didn't think x86 would cut it.

The compiler issue is apparently vastly overstated; it was less that the compilers didn't work than that no one other than Intel and HP themselves ever put in the effort to make them work well. Their own compilers apparently did pretty well with it, and even GCC (which was ported to it, in emulation, even before Itanium was ready for deployment) ran... well, about as well as you would expect from GCC of the time, not great but not terrible either. The main problem with compilers on Itanium was that no one wanted to write any unless it was already a smashing success, and in the target market, that was never going to be possible.

As for the matter of writing and testing software... I know you know why that is a ludicrous assertion, so why are you making it? Native development was never part of the use case for the Itanium; cross-development and porting were the assumed model. While HP did play around with an Itanium workstation, IIUC putting it onto developer desktops, never mind consumer desktops, was never a serious consideration.

But as for it being a failure, well, they did release a new 32nm iteration of it last year... though they have said it would be the last one, because the market for that type of server is dead. Note that they decided not to go for a 14nm or even 22nm version, because there just wasn't any need for one.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Mon Mar 05, 2018 8:26 pm
by StudlyCaps
Schol-R-LEA wrote:All of this is predicated on a flawed narrative - the Itanium wasn't a failure
This reminds me of the narrative that IBM is dead in the water because its desktop market share disappeared. Just because you don't buy a product doesn't mean it's a failure; it means you're not the target market.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sat Mar 10, 2018 9:24 pm
by Thomas
Hi,
I worked on Itanium for quite some time, mostly debugging kernel and application crashes. Even that was quite a while back, though. You need to write applications in such a way that they are optimized for Itanium, or else you may get horrible performance. You need to make sure that data is aligned properly; alignment faults are extremely expensive on Itanium. Instructions on Itanium are grouped into bundles; a bundle consists of 3 instructions, and Itanium can execute 2 bundles in parallel. So if you are lucky you may get up to a 6x performance improvement for some applications 8) . I think it is a horrible idea to run x86 applications on Itanium, as the performance may not be so great. You need to run an operating system and applications targeted for Itanium.
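A typical example of the alignment issue, in plain C (GCC-style attributes; the structs are made up): a packed layout forces misaligned loads, and on Itanium each of those can take an alignment fault that gets handled in software.

Code:
#include <stdint.h>

/* Packed layout: 'value' is misaligned, and every access to it may
   fault on Itanium, which is extremely slow. */
struct packed_record {
    uint8_t  tag;
    uint64_t value;
} __attribute__((packed));

/* The usual fix: let the compiler pad naturally, or force alignment,
   trading a little memory for fault-free accesses. */
struct aligned_record {
    uint8_t  tag;
    uint64_t value __attribute__((aligned(8)));
};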


Best setup would be OpenVMS with Itanium :D. I have been playing chess and not doing anything technical lately other than my day job. I am very rusty, so take everything with a grain of salt.



--Thomas

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sun Mar 11, 2018 2:10 pm
by Schol-R-LEA
I recommend we cut the topic drift on this point short. Comparing Itanium to x86 is at best apples to oranges; comparing x86 to PowerPC (which was aimed at the desktop, being a chopped-down version of the POWER architecture directly meant as a substitute for the x86 and the 68K) makes more sense, though even there the absurd and self-destructive politics within the manufacturing alliance (especially later on, though they were present from the outset) meant it was never going to be able to compete, for reasons unrelated to the actual technical merits of either ISA.

But we were discussing segmentation, and even that was only because rdos and some others were stubborn, clueless, or misinformed about what segmentation is, what it was for (past tense, except in their minds), and what does and does not count as a segmented architecture.

So let me repeat my earlier question to rdos: what, if any, non-x86 systems with hardware memory protection have you written system memory management software for, and what was your experience with those like?

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Fri Mar 30, 2018 6:00 am
by rdos
Brendan wrote: For 80x86 long mode, the only thing that really mattered was that it was relatively easy for Microsoft to recompile their kernel for 64-bit (with minor changes to things like task state saving/loading and paging) and relatively easy for Microsoft to support legacy "win32" executables. Microsoft (like all sane developers) never used segmentation for 32-bit code.
No wonder. Their older platforms were horrible merges of DOS, DOS extenders, and a new Windows system built on top of it all. Such a horrible design was bound to fail from the start. So, they gave up on their legacy and designed a new system, using what was available (a flat memory model).

Which, in fact, was their only choice since Intel hadn't extended segment registers to 32-bit, and so selectors would always be a scarce resource you would run out of.

I think it was mostly Microsoft's fault that Intel did it this way. They were too concerned with 16-bit portability and didn't look ahead to larger systems.
Brendan wrote: Intel's mistake was supporting segment registers for 32-bit code. Sadly, segmentation for 16-bit code was "needed" (to support "win16" applications that were intended to be compatible with real mode) and at the time Intel were still deluded by "capability" ideas (that were responsible for iAPX 432).
Not a mistake at all. The mistake was to not make it useful. With 32-bit selectors OSes would not run out of selectors. Ever. That would have been an easy addition at that point, but one that would break 16-bit code.
Brendan wrote: No; the central issue is that the entire world knows that segmentation is obsolete trash, but you've spent 2 decades convincing yourself that the entire world is wrong because you can't afford to redesign your OS.
Highly useful, not trash. Makes it possible to construct a complex & stable OS in assembly. Something that is largely impossible with a flat memory model.
Brendan wrote: The intention was that legacy 32-bit software (that doesn't use 64-bit instructions) would still run, and all new software would be 64-bit. Nobody was expected to care that 32-bit software can't use 64-bit instructions and almost nobody ever did. Also note that it's obvious AMD struggled to find opcodes that could be used for REX prefixes by the way they scavenged opcodes from previously valid instructions; and this "scavenging" couldn't have worked if they wanted to allow 64-bit instructions in 32-bit code (that used the previously valid instructions) - we would've ended up with 3-byte REX prefixes and very inefficient 64-bit instructions.
Intel made 32-bit extensions available to real mode and 16-bit code, so why would that not be possible for a 64-bit design? After all, there is a special descriptor type for 64-bit mode, and addressing could have been extended for 32-bit code to allow 64-bit register use. I'm not saying it would have to be very efficient (the data & address overrides are not efficient either), just that having the possibility would have provided great advantages.
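(Those 16-bit-mode extensions were just the operand/address-size prefixes; a small illustration, shown as raw bytes with comments:)

Code:
/* Real-mode / 16-bit code can already reach the 32-bit registers via
   the 0x66 operand-size prefix; these bytes are valid in real mode:  */
static const unsigned char real_mode_bytes[] =
    { 0x66, 0xB8, 0x78, 0x56, 0x34, 0x12 };   /* mov eax, 0x12345678 */
/* The complaint above: no comparable mechanism was defined to reach
   the 64-bit registers from 32-bit (compatibility) mode.             */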

And, besides, most 64-bit code still uses 32-bit relative addresses & operands, something they could have fixed by creating a 64-bit GDT with 64-bit base addresses & 64-bit limits.

Also, note the horrible kludge AMD designed for FS and GS. They use an MSR instead of a GDT entry to set a 64-bit base address.
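For reference, this is roughly what that kludge looks like from ring 0: the base goes into an architected MSR (IA32_FS_BASE / IA32_GS_BASE) via WRMSR, with no descriptor involved. The helper names below are mine, just a sketch:

Code:
#include <stdint.h>

#define IA32_FS_BASE 0xC0000100u
#define IA32_GS_BASE 0xC0000101u

/* Write a 64-bit value to an MSR (ring 0 only, GCC-style inline asm). */
static inline void wrmsr64(uint32_t msr, uint64_t value)
{
    __asm__ volatile ("wrmsr"
                      :
                      : "c"(msr),
                        "a"((uint32_t)value),
                        "d"((uint32_t)(value >> 32)));
}

static inline void set_gs_base(uint64_t base)
{
    wrmsr64(IA32_GS_BASE, base);   /* no GDT entry involved at all */
}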
Brendan wrote: If you knew anything about instruction dependencies you'd know that zero extension (which breaks the dependency on the previous value in the register) is important for performance; and you'd know that Intel were "less smart" when they designed 32-bit because they ruined performance for 16-bit operations.
At least, this was easy to accomplish for 16-bit code. After all, accessing AX in 16-bit code doesn't clobber the upper part of EAX. So, I actually don't believe this is a real issue, but more like a bug. Also, note that this is an issue in compatibility mode, one that might slow that code down, and not an issue with native 64-bit code.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Fri Mar 30, 2018 6:24 am
by rdos
Schol-R-LEA wrote: But we were discussing segmentation, and even that was only because rdos and some others were stubborn, clueless, or misinformed about what segmentation is, what it was for (past tense, except in their minds), and what does and does not count as a segmented architecture.
I think it is mostly you that is misinformed.
Schol-R-LEA wrote: So let me repeat my earlier question to rdos: what, if any, non-x86 systems with hardware memory protection have you written system memory management software for, and what was your experience with those like?
I have a whole lot of experience with applications using flat memory model (that's the native mode of RDOS), and how that compares to a segmented model (kernel). I'd say that these applications typically accumulate lots of bugs that are sometimes very hard to resolve and that often pass testing. Replacements for new and free can help locate them, but they can only run during the test stage, so they won't help in finding bugs in production code. Drivers, on the other hand, very seldom have these bugs, and when they do, they often give exceptions that are directly indicative of the problem. These run-time checks of course remain in production code, as they are enforced by segmentation.

The main culprit in the flat memory model is the heap, and the fact that all data is aggregated into a small memory region without any limit or validity checking. The paging system typically won't catch these issues, as paging uses a 4 KiB granularity. Paging might catch uninitialized pointers, but that's it.
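A trivial example of what slips through (nothing RDOS-specific, just plain C): the overrun below stays inside a mapped 4 KiB page next to other live heap data, so paging never faults, while a byte-granular segment limit would have caught it on the spot.

Code:
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *a = malloc(16);
    char *b = malloc(16);     /* usually nearby, in the same 4 KiB page */

    /* Writes 24 bytes into a 16-byte block: corrupts allocator metadata
       or 'b', but never leaves a mapped page, so there is no page fault. */
    memset(a, 'x', 24);

    free(b);
    free(a);                  /* may blow up much later, far from the bug */
    return 0;
}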

Which is what is "fixed" in Java and other interpreted languages, at the huge cost of interpretation (or just-in-time compilation).

But I'm sure that you will persist in claiming this is not a problem....

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Fri Mar 30, 2018 9:54 am
by Schol-R-LEA
rdos wrote:
Schol-R-LEA wrote: But we were discussing segmentation, and even that was only because rdos and some others were stubborn, clueless, or misinformed about what segmentation is, what it was for (past tense, except in their minds), and what does and does not count as a segmented architecture.
I think it is mostly you that is misinformed.
Schol-R-LEA wrote: So let me repeat my earlier question to rdos: what, if any, non-x86 systems with hardware memory protection have you written system memory management software for, and what was your experience with those like?
I have a whole lot of experience with applications using flat memory model (that's the native mode of RDOS), and how that compares to a segmented model (kernel).
You are evading the question. I am asking about systems other than the x86 that use a memory protection system.
rdos wrote:But I'm sure that you will persist in claiming this is not a problem....
I never said it wasn't a problem. For the record? I don't give a damn about flat versus segmented. Hell, I favor hardware capability-addressing architectures with tagged memory, but the rest of the world seems to have forgotten about those, because for most people security costs more than insecurity.

(Or as Ivan Godard of the Mill project put it: "I would love to make a capability architecture, but I can't sell a capability architecture, and neither can anyone else." The horrible lack of security in mainstream systems is a blight on the world, and the real costs of it are far beyond what the costs of a more secure system would be, but if no one is willing to pay the up-front price in both money and effort to use a system that doesn't actively undermine security - which all the mainstream OSes do - then nothing will improve. Short-sightedness is, and always has been, a key factor in everything people do.)

[EDIT: I mangled the Godard quote a bit; his actual statement, in the first lecture in the series on the Mill ISA, can be found here. It goes: "It is not a capability architecture. I would love to do a capability architecture, I think that caps are the most wonderful thing in the world. I know how to build one; I don't know how to sell one. We're in the goal (sic) to make money."]

But the real point is that segmentation isn't about protection. It was never about protection. The 80286 protection mechanisms worked with segmentation because segmentation was already there, and paging would have required a lot of extra chip real estate that wouldn't be available for at least two more nodes (the designs of the 80186 and 80286 both began in mid-1980; the design of the 80386 began in 1983, just a little more than 36 months later).

After those two nodes, they added paging to the 80386. Again, this wasn't about protection; it was about making virtual memory easier. The protection system worked with it, but again, that was only because they were adding paging anyway.

They kept segmentation in the 80386 only because they had to have some form of it for backwards compatibility, and the added hardware for the protected-mode form wasn't a significant cost. The chip designers' attitude towards it is reflected clearly in how little work they put into p-mode segmentation, and in how many flaws the new system had. They saw segmentation for what it was - a solution to a specific hardware problem that was no longer necessary.

Does segmentation have advantages in OS software? Perhaps, though I am not convinced either way. What I do know is that most current OSes blindly follow the models of Unix and VMS (which was the main design influence on Windows NT, which in turn became the primary basis for consumer Windows kernels from XP onward), neither of which ran on segmented systems originally and both of which made assumptions where segmentation didn't fit.

But all that is beside the point: for most purposes, using segmentation ties you to a segmented architecture - a specific one at that, as the various segmentation systems differ more than they resemble each other - and for better or worse, since the mid-1980s the general trend has been towards making portability easier (even if, as Brendan often points out, actual portability in an OS is a mirage). Windows and MacOS are exceptions to this, but since they dominated the actual consumer computer market for so long, that trend in OS design is often missed (since parts of the designs of both of those were fixed before Unix made portable designs popular - and even then, both made abortive attempts at portable versions later, and more recently are doing so again).

The fact that no new OS designs have had a significant market impact since Amiga Exec came out in 1985 (the newer OSes which have mattered even a little bit, namely Linux, FreeBSD, and Mach, are all re-implementations of Unix and thus don't count) doesn't change what is happening among people trying to write new OSes.

I have no problem with you using segmentation in RDOS. Honestly, if you can make it work, and fit it into your protection system, all good. But most OS devs right now don't seem to want to tie themselves to the x86, which might be wise given the current directions of things.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Wed Apr 11, 2018 11:42 pm
by linguofreak
rdos wrote:
Brendan wrote: Intel's mistake was supporting segment registers for 32-bit code. Sadly, segmentation for 16-bit code was "needed" (to support "win16" applications that were intended to be compatible with real mode) and at the time Intel were still deluded by "capability" ideas (that were responsible for iAPX 432).
Not a mistake at all. The mistake was to not make it useful. With 32-bit selectors OSes would not run out of selectors. Ever. That would have been an easy addition at that point, but one that would break 16-bit code.
I can actually think of a kludge that would have allowed 32-bit selectors with at least some back-compatibility with 16-bit code. You divide each selector into an upper and a lower half, and allow a segment register to be loaded using either the lower half of a selector (with the legacy opcodes) or the whole selector at once (with new opcodes). An upper-half selector indexes into a descriptor directory. Each entry in the descriptor directory can have one of three types: 8086 legacy, 80286 legacy, and native, as well as having some permissions bits that dictate when that DDE can be loaded. Upper-half selector 0xffff has some special properties, to be described later.

A native type DDE contains the address of a Native Descriptor Table. When the upper-half selector loaded into a segment register points to such a DDE, the lower-half selector is used as an index into the corresponding NDT. A Native Descriptor Table entry contains protection information for a segment, plus the address of a page directory. A native segment is purely a paged address space, with none of the legacy offset/limit nonsense (but unlike actual x86 paging, you don't have just one paged address space in use at a time: CS can be using one address space, DS another, SS yet another).

An 8086 legacy DDE contains a full 32-bit selector for a native segment. When the upper-half selector loaded points to an 8086 legacy DDE, the lower half selector is treated just like a segment register value in an 8086-style address calculation within the first megabyte of the address space for the corresponding native segment. Attempts to switch into protected mode by the 80286 legacy method result in a trap to upper-half-selector aware code, which handles it according to the situation and operating system policy (perhaps it terminates the running program, perhaps it replaces all loaded 8086 legacy segments with 80286 legacy segments, perhaps something else).

An 80286 legacy DDE also contains a full 32-bit selector for a native segment. When the upper-half selector loaded points to an 80286 legacy DDE, the lower half selector is treated as an 80286 protected mode segment selector and the first 16 MiB of the corresponding native segment's address space is used as the linear address space for 286 segmentation. The way I'm thinking the 286 legacy GDTR and LDTR would be done is that there would be one of each for the whole CPU. This would result in weird behavior if 80286 legacy segments were loaded with upper-half selectors that brought in different address spaces (at least, if the GDT and LDT addresses were not constant across address spaces), but would give behavior consistent with what legacy programs would expect on performing GDT and LDT loads.

The CPU would not have separate "real" and "protected" modes, nor a state in which paging was disabled. The distinction between legacy real and protected segmentation modes (and, for that matter, native, non-legacy segmentation) would be entirely a matter of the DDE loaded in each segment register. Allowing the CPU to operate before segmentation and paging structures are set up at boot is done by assigning special properties to upper-half selector 0xffff: namely, no paging is performed for memory accesses through any segment register loaded with upper-half selector 0xffff. As long as the DDE indexed by that selector has its valid bit set (or at boot before a valid Descriptor Directory Register has been loaded), all such accesses will be to physical memory. If the type for the relevant DDE entry is 8086 legacy or 80286 legacy, addressing through the lower-half selectors of segment registers loaded with upper-half selector 0xffff will behave exactly like real/protected mode segmentation in the lower 1 MiB/16MiB of physical memory, respectively. If the type for the relevant DDE entry is native, the lower-half selector is ignored and upper-half selector 0xffff behaves as a selector for a segment covering all of physical memory untranslated. At boot, all segment registers are loaded with permission values appropriate to operation at boot, 8086 legacy segment type, and selector 0xffff-ffff for CS and 0xffff-0000 for all other segment registers, yielding behavior identical to the 8086.
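To make the shape of that concrete, here is a purely illustrative sketch in C of the structures this scheme implies; every name and field width is my own invention, nothing architectural:

Code:
#include <stdint.h>

enum dde_type { DDE_8086_LEGACY, DDE_80286_LEGACY, DDE_NATIVE };

/* Descriptor Directory Entry, indexed by the upper-half selector. */
struct dde {
    enum dde_type type;
    uint32_t      permissions;      /* when this DDE may be loaded       */
    union {
        uint64_t  ndt_phys_addr;    /* native: address of an NDT         */
        uint32_t  native_selector;  /* legacy: full 32-bit selector of
                                       the native segment whose first
                                       1 MiB / 16 MiB hosts the legacy
                                       address space                     */
    } u;
};

/* Native Descriptor Table entry, indexed by the lower-half selector. */
struct ndt_entry {
    uint32_t protection;            /* access rights for the segment     */
    uint64_t page_directory_addr;   /* each native segment is its own
                                       paged address space               */
};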

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Thu Apr 12, 2018 4:39 am
by tom9876543
It is interesting to read about people's ideas of what Intel should have done with their CPU designs.

The big problem was Intel focussed on the iAPX 432 processor. Intel thought the 432 would become their premier CPU and biggest seller.
Intel neglected the 8086, and the 80286 was built by second-rate engineers; the "best" engineers were working on the 432.

Intel failed miserably with the 80286 design.
If my memory is correct, virtual memory paging (similar to the 386) had existed on mainframe CPUs since the 1960s.
What Intel should have done for 80286:
- implement CPUID
- extend all 8086 registers to 32 bit
- because CPU die space was limited, do NOT change the 16-bit 8086 instructions, e.g. all existing instructions are unchanged and remain 16-bit; when they update a register, the top 16 bits are sign/zero extended
- because CPU die space was limited, only implement an absolute minimum of new 32-bit instructions, e.g. PUSH32, POP32, MOV32, LEA32, AND32, OR32, XOR32, ADD32, SHL32, SHR32
- the new instructions MOV32 and LEA32 always operate in 32-bit flat mode. As an example, MOV32 AX, [BP + DI] would load 4 bytes from the 32-bit memory address [BP + DI] into the AX register. Segments are ignored. This allows access to all physical RAM, and all 32 bits of the registers, from any CPU mode.
- add CR0 to the 8086. The "PM" bit is NOT required in CR0. If paging is enabled, the User/Supervisor bit determines access to privileged instructions
- then obviously the PG bit enables paging, CR3 has the base address, etc.
- then the proposed 80286's CR3 bit 0 should have been a "32BITADDR" bit. When it is set, addressing is 32-bit flat mode via all 32 bits of the register, for existing 8086 instructions. CS, DS, ES and SS are ignored for addressing and can actually be used for holding 32-bit values. Segment overrides are invalid (the opcodes can be reused in future). Note the 8086 instructions still have 16-bit / 8-bit operands; it is not a true 32-bit CPU, due to limited CPU die space. If 32BITADDR is not set, addressing works exactly the same as on the 8086. As this value is in CR3, it is per process. This design allows 8086 16-bit code to work unchanged with paging turned on.
- when 32BITADDR is set, IP and SP are 32-bit flat values. Far calls / jmps are invalid. Example: CALL [CX] loads a 32-bit value from CX into the 32-bit IP and pushes the 32-bit return address onto [SP].
- make sure PG can be turned off, so the CPU can revert back to 8086 mode, keeping full compatibility with the original 8086 :)


The 80286 should have been a 16bit 8086 cpu with 32bit paging bolted on, as described above.
Instead Intel built a pile of crap.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Thu Apr 12, 2018 9:08 am
by Schol-R-LEA
I would argue that what they 'should have done' was scrap the 8086 line entirely, and I am pretty sure they would have preferred to, but they were seeing it as a stopgap on top of a stopgap. They had to make it backwards compatible, because they were realizing (even before the IBM deal that locked them into it) that there would be a market for more advanced 8086s (for some limited values of 'advanced') capable of running a modern operating system. They were also using it as a testbed for things meant to go into the 432.

I am guessing that they were hoping that, by the time the 80286 was ready for full-scale production, they would have the kinks in the 432 ironed out, and could then relegate the 80286 to having been just a trial run for 432 production processes, never actually having to market it. But between the start of the 80286 project and the tape-out of the design, the IBM deal came along, and to everyone's surprise the PC made headway in the market (rather than failing as most expected) - though not enough to give IBM the leverage to kill the home computer industry the way both IBM and Intel wanted it to. Anyway, by the time the dies were ready to ship, it had become clear that the problems with the 432 were terminal, so they were forced to push forward with a product they didn't want.

As for the suggestions @tom9876543 made, the problem with them is that they would introduce a lot of complexity, and perhaps more importantly, demand a lot of IC real estate. The team developing the 80286 had the same problem that was one of the major factors in the 432's failure: they couldn't fit it onto a single chip without making it so large that the failure rate would make it economically infeasible.¹ The 80286 was right at the far edge of what the then-current Moore's Law node allowed; the 432 was at least four nodes past that, probably more like six or seven (that is to say, it would have needed between 16 and 128 times as many transistors per mm² as were actually available in 1980), meaning that even as a two-chip set, the 432 simply was too much to produce economically with acceptable performance - and that's without even considering the design flaws in the CPU itself.

The suggested approach would have pushed the 80286 over the limit of the transistor densities of the time, resulting in what would have been, in effect, a new design altogether, at which point they might as well have scrapped the 80286 and done a completely new design. Since the entire point was to make it easier to port existing assembly code to the new chips - a consideration which was part of why the 8086 was a mess to begin with (the whole idea was to make it familiar to 8080 programmers) - that idea would have been a non-starter.

We are going well afield of the topic at this point, however.

footnote
1. This issue is still with us, for that matter; I have seen at least one video analyzing the differences between the Epyc CPUs and current Xeons, and arguing that the real advantage AMD is aiming for is maximizing dies-per-wafer yield - by using four dies connected with the 'Infinity Fabric' internal bus, rather than trying for one massive die, it reduces both the impact of die faults (on a larger die, a flaw in part of the die can lead to the whole die getting binned down, while a flaw in one of several small dies leaves the adjacent dies still viable) and makes better geometric use of the wafer as a whole (by reducing edge loss). By relying on Infinity Fabric, they get less overall performance for tightly coupled parallel problems - something you want to avoid in parallelization anyway - but can sell the whole CPU package for a lot less money, and reduce the average TDP and improve cycles per watt at the same time. As Ivan Godard often stresses in discussing the Mill, for servers and HPC, P/W/D is the real key factor, meaning that the Epyc has a real advantage even if its absolute performance is worse than the Xeons'.