
Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Thu Apr 12, 2018 11:24 am
by Schol-R-LEA
It just occurs to me that rdos might be conflating 'segmentation' with the concept of a 'modified Harvard architecture'.

Just to recap on this: the Harvard architecture, named after the Harvard Mark I electromechanical computer, is a type of stored-program computer in which the instruction store is physically separate from the data store. In the Mark I, this was done for practical reasons relating to how the instructions and data were routed to the CPU and ALU (which in most early systems were also physically separate units) - instructions would go to the CPU, computation data would go to the ALU, and the CPU would tell the ALU which operation to perform.

There was no straightforward way to transfer between the two memories. This wasn't seen as a problem, because the whole idea of a stored-program system was in its infancy, and it was assumed that the program store would always be the smaller of the two. Most of the other systems of the time (the Colossus, the Atanasoff-Berry Computer, the Zuse Z3, the ENIAC, and so forth) weren't stored-program systems at all (though ENIAC was later rebuilt as one), and the importance of that approach wasn't appreciated until around 1946.

The Von Neumann architecture (after Johnny von Neumann), which arose a bit later following the 'Summer Camp' conference in 1946 and was first used in the EDVAC and EDSAC computers, is the other common way to design a stored-program computer, and became almost but not quite universal in the 1950s and later. In this design, a single memory is used for both the instructions and the data, and the instructions are capable of modifying other instructions on the fly. This was risky, but was sometimes useful, or in some really early systems, necessary for basic operations such as function calls (as with most things being done for the first few times, the early designers were often guessing at what would or wouldn't be needed, and often made mistakes - some of which would get permanently enshrined in later systems).

As with the Harvard architecture, this was originally just an engineering solution - with memories based on things such as mercury delay lines, and a CPU and ALU built on vacuum tubes, it was easy to re-route the signals to the right part of the system, but expensive to build two separate memories.

There was a disagreement from the start about whether the Harvard design was safer than the Von Neumann design, apparently, but in the end, practical engineering issues trumped the questions about how safe self-modifying code was for the first and second generations of stored-program electronic computers.

By the time transistor-based systems with ferrite-core memories were making those engineering considerations moot, the Von Neumann approach had proven useful, if not necessarily as safe. Computer designers started trying to come up with ways to secure the instruction memory most of the time, while still allowing the privileged system software to load programs as needed, or even monkey-patch code (e.g., in a combined loader and linker) before locking it again in order to run the program securely.

This led to the 'Modified Harvard architecture', which is what we are really talking about when we discuss 'memory protection'. Any modern system with memory protection built into the CPU's memory management is, in effect, a Modified Harvard system (even though most introductory textbooks will still call it a 'Von Neumann' architecture). This would come to be standard on mainframes by 1970, and on minis by around 1978 or so, but wouldn't start to supplant pure Von Neumann designs in microcomputers until the late 1980s.

Paging? Segmentation? Separate matters entirely. They each solve a different set of problems from memory protection, as well as from each other. Segmentation, as I said before, is about stuffing an m-bit address space into n address lines when n < m. Paging is about moving parts of the data or instructions from a fast memory store to a slower one and back in a way that is transparent to the application programmers (that is, without having to explicitly use overlays and the like).
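To make the address-extension point concrete, this is roughly the arithmetic 8086 real-mode segmentation performs - a 16-bit segment and a 16-bit offset combine to drive 20 address lines. The C below is just an illustration of that arithmetic, not anything resembling how the chip itself is built:

Code:
#include <stdint.h>
#include <stdio.h>

/* 8086 real mode: a 16-bit segment and a 16-bit offset are combined
 * into a 20-bit physical address, letting a CPU with 16-bit registers
 * address 1 MiB of memory. */
static uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;   /* segment * 16 + offset */
}

int main(void)
{
    /* 0xB800:0x0000 -> 0xB8000, the familiar colour text-mode buffer. */
    printf("0x%05lX\n", (unsigned long)real_mode_phys(0xB800, 0x0000));
    return 0;
}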

All three overlap with yet another separate idea, virtual address spaces. While VAS is often mistakenly thought to provide additional memory protection, this is not actually the case on the x86 - it is always possible to access other memory address spaces, if the memory protection doesn't prevent it, because the separate address spaces are all built on top of either paging, or segmentation, or in the case of the x86, both at the same time. However, by default the memory protection does prevent this for all non-privileged (i.e., user) code.

A memory protection system may need to work in conjunction with whatever other memory management sub-systems exist on a CPU, and may even incorporate side properties of them in order to organize the memory being managed (more on this shortly), but segmentation and paging remain concerns orthogonal both to memory protection and to each other. You can have memory protection without either segmentation or paging at all.

Caching adds some complexity to this, but since cache consistency is a problem anyway, those issues get resolved as part of the caching itself. Caches basically add a limited form of content-addressable memory, where the tag says which block of memory the cache line is holding, and those cache blocks may or may not correspond to pages or segments - mind you, paging fits better, since cache blocks are of a fixed, small size, which can be mapped to similarly sized pages (hence the performance difference sometimes seen when actively manipulating segmentation on the x86). However, no current system applies tagging to the entire memory space, nor are the cache's tags accessible to the system software - they are entirely internal and managed by the hardware.

However, since both paging and segmentation involve breaking the memory into separate blocks, and the memory protection has to do the same, the memory protection can just use the blocks defined by the other sub-system rather than having its own blocks. This works out well, because the protection system has to check the validity of every memory access, while in most implementations, both paging and segmentation are translating between effective addresses and physical memory locations on each and every main memory access. Since the protection checks and the translations are both necessary every time, it is easiest to do them together as much as feasible - there is no point in repeating operations that overlap so much.
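As an illustration of what 'doing them together' means, here is a toy single-level walk in C. The entry layout is a made-up simplification, only loosely modelled on the x86 PTE bits, but the point is that finding the physical frame and checking the access rights happen in the same pass:

Code:
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical one-level page table: each entry holds a frame number
 * plus present/write/user bits. Not any real CPU's exact format. */
#define PTE_PRESENT 0x1u
#define PTE_WRITE   0x2u
#define PTE_USER    0x4u

typedef struct {
    uint32_t frame;   /* physical frame number */
    uint32_t flags;
} pte_t;

/* Translate and protection-check in one pass, as the hardware does:
 * the lookup that finds the physical frame is also where the access
 * rights are checked, so nothing gets done twice. */
bool translate(const pte_t *table, uint32_t vaddr, bool is_write,
               bool is_user, uint32_t *paddr_out)
{
    const pte_t *pte = &table[vaddr >> 12];          /* 4 KiB pages  */
    if (!(pte->flags & PTE_PRESENT)) return false;   /* page fault   */
    if (is_write && !(pte->flags & PTE_WRITE)) return false;
    if (is_user  && !(pte->flags & PTE_USER))  return false;
    *paddr_out = (pte->frame << 12) | (vaddr & 0xFFF);
    return true;
}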

To sum up: on the x86 in 32-bit protected mode, there is no difference whatsoever in the degree of protection one gets by actively using segments from that gotten by setting the segments to cover a flat virtual space. None. Period.

Segmentation only wins over paging if you are using separate segments for every individual data element - as in, every variable has its own segment. Even then, the only advantage is in how well the segment size matches the object's size; you can do the same thing with pages, but since page sizes are fixed there is almost always a size mismatch.

This isn't practical for either segments or paging on the x86, in any case, because of how the page tables and segment registers work. Individually protecting every object would require a radical redesign, something along the lines of... well, a capability-addressing system.

Capability-based addressing would be part of the memory protection as well, being basically a more fine-grained form of the same memory protections, except that it checks the source of the access rather than the element being accessed. Since the burden of proof is on the requester, rather than the object's state, it turns the entire approach on its head, and becomes a lot more flexible and secure.

My guess is that this is what rdos thinks segmentation is giving the system, but if so, he is mistaken. The memory protection system doesn't really provide that at all in any current CPU design, which is a damn shame because it would do exactly what he (and several others, myself included) seem to be looking for.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Fri Apr 13, 2018 5:50 am
by tom9876543
Schol-R-LEA wrote: As for those suggestions @tom9876543 made, the problem with it is that it would introduce a lot of complexity, and perhaps more importantly, demand a lot of IC real estate. The team developing the 80286 had the same problem that was one of the major factors in the 432's failure: they couldn't fit it onto a single chip without making it so large that the failure rate would make it economically infeasible ......
The suggested approach would have pushed the 80286 over the limit of transistor densities of the time
I disagree Schol-R-LEA.

My proposed 80286 does NOT have the following:
16 bit protected mode segments
limit checking
GDT
LDT
TSS
4 privilege levels
complicated transition between privilege levels
hardware task switching
instructions such as ARPL, LAR, LSL etc

My guess is the number of transistors required for a simple 32bit paging and "flat mode", would be similar to the number of transistors wasted on 16bit protected mode.
So my suggestion is feasible.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Fri Apr 13, 2018 8:58 am
by Schol-R-LEA
tom9876543 wrote:
Schol-R-LEA wrote: As for those suggestions @tom9876543 made [...] The suggested approach would have pushed the 80286 over the limit of transistor densities of the time
I disagree [...] My proposed 80286 does NOT have the following:
[ ... ]
My guess is the number of transistors required for a simple 32bit paging and "flat mode", would be similar to the number of transistors wasted on 16bit protected mode.
OK, I am not a CPU designer, but if I understand correctly, all of those things together would have been dwarfed by the addition of any kind of 'simple 32-bit paging'. Memory management units were seen as very expensive, rightly so given the limits of the technology of the time.

No one designing microprocessors had yet done paging on the same die as a CPU at that point, and it is my understanding that the reason for this was because paging would have taken up as much of the die as the rest of the CPU.

Indeed, no one had even made one for any of the mainstream microchips as a co-processor, AFAIK - for example, the M68451 MMU co-processor for the M68010 was released in the same year as the 80186 and 80286. The 8086 line had pinouts to communicate with one from the start (as someone - octocontrabass I think - pointed out already), but I don't think anyone made any for the 8086 until around that time, either. I believe that there were experimental ones being made in 1980 (e.g., the Berkeley RISC I and Stanford MIPS projects were getting started around then, though I don't know if an MMU was part of the plan at that stage given that they were supposed to be student projects, and I seem to recall that an effort to put a LispM on a set of chips was going on then too), and the 432 was certainly intended to have an MMU co-processor, but no one had actually made one for sale to the best of my knowledge.

So they were expensive to include. They were in general also seen as unnecessary given the uses that microprocessors were expected to be put to at the time. Even companies who took the home computer market seriously, such as MOS Technology (who by then had been bought up by Commodore), Zilog, and Motorola, didn't see a need for them as an integral part of the CPU design. My understanding is that most of the industry thought Intel were crazy for trying to make one a required sub-system for the 432 (especially one implementing capabilities) - and at the time, they were right to be skeptical.

I may be wrong about this, however, so if anyone more familiar with topic can chime in, I would appreciate it.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Fri Apr 13, 2018 7:13 pm
by tom9876543
Schol-R-LEA wrote:
tom9876543 wrote:
Schol-R-LEA wrote: As for those suggestions @tom9876543 made [...] The suggested approach would have pushed the 80286 over the limit of transistor densities of the time
I disagree [...] My proposed 80286 does NOT have the following:
[ ... ]
My guess is the number of transistors required for a simple 32bit paging and "flat mode", would be similar to the number of transistors wasted on 16bit protected mode.
OK, I am not a CPU designer, but if I understand correctly, all of those things together would have been dwarfed by the addition of any kind of 'simple 32-bit paging'. Memory management units were seen as very expensive, rightly so given the limits of the technology of the time.

No one designing microprocessors had yet done paging on the same die as a CPU at that point, and it is my understanding that the reason for this was because paging would have taken up as much of the die as the rest of the CPU.
The Motorola MC68451 MMU seems to have used 34,000 transistors:
http://blog.ehliar.se/post/58268464354/ ... 68851-pmmu
https://patpend.net/technical/68000/68000faq.txt

Wikipedia states the 8086 had about 29,000 transistors and the 80286 had about 134,000 transistors.

It is fairly clear: the 80286 could have been an 8086 + 32bit flat addressing mode + 32bit mmu.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sat Apr 14, 2018 3:13 pm
by Brendan
Hi,
tom9876543 wrote:
Schol-R-LEA wrote:OK, I am not a CPU designer, but if I understand correctly, all of those things together would have been dwarfed by the addition of any kind of 'simple 32-bit paging'. Memory management units were seen as very expensive, rightly so given the limits of the technology of the time.

No one designing microprocessors had yet done paging on the same die as a CPU at that point, and it is my understanding that the reason for this was because paging would have taken up as much of the die as the rest of the CPU.
The Motorola MC68451 MMU seems to have used 34,000 transistors:
http://blog.ehliar.se/post/58268464354/ ... 68851-pmmu
https://patpend.net/technical/68000/68000faq.txt

Wikipedia states the 8086 had about 29,000 transistors and the 80286 had about 134,000 transistors.
The Motorola MC68451 MMU didn't support paging - it had 96 "variable sized blocks" (segments). 80286 protected mode had up to 16383 segments (split into "global segments" and "local segments").
tom9876543 wrote:It is fairly clear: the 80286 could have been an 8086 + 32bit flat addressing mode + 32bit mmu.
It might be clear in hindsight, but foresight is never clear.

If you could travel back in time to when 80286 was being designed (around 1980) and tell Intel's engineers to include paging, they probably would have told you that paging isn't worth bothering with because almost nobody uses a multi-tasking OS and almost nobody cares about security. Even when Intel did include paging (80386, several years later) it was "mostly unused" for an entire decade (until Windows 95 was released).


Cheers,

Brendan

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sun Apr 15, 2018 1:37 pm
by Octocontrabass
Brendan wrote:The Motorola MC68451 MMU didn't support paging - it had 96 "variable sized blocks" (segments).
It does support paging. You can set those segments to be all the same size, and use them as a TLB for a much larger translation table. In fact, it's very similar to the MIPS R4000, except the R4000 manual calls them "pages" instead of "segments".

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sun Apr 15, 2018 3:01 pm
by simeonz
Octocontrabass wrote:
Brendan wrote:The Motorola MC68451 MMU didn't support paging - it had 96 "variable sized blocks" (segments).
It does support paging. You can set those segments to be all the same size, and use them as a TLB for a much larger translation table. In fact, it's very similar to the MIPS R4000, except the R4000 manual calls them "pages" instead of "segments".
Personally, I find such an option interesting, because it could be useful in specific cases. At the same time, it would be sacrificing quite a lot of flexibility.

First, it means no demand swap of memory. Then again, you probably shouldn't swap in a well designed system.

Also, no caching of memory mapped storage. That is - no zero-copy I/O path that provides system-wide caching. You can manually cache buffered I/O in the application, but that could be slower and has no way to respond to system-wide memory pressure. Nonetheless, it will be possible to share the allocation of file content, as long as it is kept entirely resident.

There will be fragmentation issues as well. For example, if a process grows its heap, it will have to relocate the heap segment to a larger vacancy in memory. Also, there won't be enough segments for every thread stack, so stacks will have to be heap allocated, and thus they will have to be of fixed size (the pointers to stack variables cannot be easily redirected).

That said, there will be lower latency on random access. This can be preferable to the large pages in x64, where the page sizes are either too small or too large. Or alternatively, this may enable a simpler CPU design without things like out-of-order execution which mitigate the various latencies. But you will be missing OS functionality, which may ultimately cause more redundant memory or even storage device traffic.

I find this interesting nonetheless, for things like special purpose embedded CPUs or as an alternative to large pages. I cannot see it becoming part of the mainstream without paging present to supplement it.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Mon Apr 16, 2018 4:12 am
by Octocontrabass
simeonz wrote:
Octocontrabass wrote:
Brendan wrote:The Motorola MC68451 MMU didn't support paging - it had 96 "variable sized blocks" (segments).
It does support paging. You can set those segments to be all the same size, and use them as a TLB for a much larger translation table. In fact, it's very similar to the MIPS R4000, except the R4000 manual calls them "pages" instead of "segments".
Personally, I find such option interesting, because it could be useful in specific cases. At the same time, it would be sacrificing quite a lot of flexibility.
I think you misunderstand. The "segments" in the MC68451 are functionally equivalent to pages, and absolutely nothing like segments in x86. They are always a fixed power-of-two size, with both physical and virtual addresses that are a multiple of the size. A single virtual address space may contain as many segments as you want; when you want more than 32 segments in a single address space you'll have to swap segments in and out of the MC68451 much like how an x86 CPU must periodically swap page definitions in and out of its TLB.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Mon Apr 16, 2018 7:32 am
by simeonz
Octocontrabass wrote:A single virtual address space may contain as many segments as you want; when you want more than 32 segments in a single address space you'll have to swap segments in and out of the MC68451 much like how an x86 CPU must periodically swap page definitions in and out of its TLB.
I didn't know you had such limited size control. So - does it generate a TLB miss exception that the OS handles, meaning you get OS-controlled TLB thrashing rather than hardware-controlled TLB thrashing? Apparently (according to Wikipedia) the Itanium also had an option for this kind of thing. But that is not the same performance advantage. So, I wonder, what would be the advantage - CPU circuitry optimization, or could it benefit software in particular scenarios?

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Mon Apr 16, 2018 8:17 am
by Octocontrabass
simeonz wrote:So - does it maybe generate TLB miss exception that the OS handles and you have OS controlled TLB thrashing rather than hardware controlled TLB thrashing.
Yep, this is exactly how the MC68451 and R4000 work.
simeonz wrote:So, I wonder, what would be the advantage - CPU circuitry optimization or could it benefit software in particular scenarios?
The main advantage is simpler (and therefore cheaper) MMU circuitry. Hardware TLB management takes a lot more transistors than software TLB management, and the way the MC68451 does it is about as simple as you can get. The flexibility it provides can also be an advantage, since you can define your own page table format and come up with your own TLB fill algorithm. (You can also mix page sizes, but you can do that with the MC68851 too, so it's not an advantage specific to the software-controlled TLB.)

The main disadvantage is that, unless your TLB fill algorithm is very very good, it will be slower than a hardware TLB fill.
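For anyone who hasn't seen a software-managed TLB before, the OS side boils down to something like the sketch below. Everything here - the slot count, the table format, the names - is invented for illustration; the real MC68451 and R4000 details differ:

Code:
#include <stdint.h>
#include <stddef.h>

/* Hypothetical software-managed TLB: the hardware only holds a small set
 * of fixed-size mapping descriptors; on a miss it raises an exception and
 * the OS fills a slot from whatever page-table format it likes. */
#define TLB_SLOTS  32
#define PAGE_SHIFT 12

typedef struct { uint32_t vpage, pframe, valid; } tlb_entry_t;

static tlb_entry_t tlb[TLB_SLOTS];   /* stands in for the MMU's descriptors */
static unsigned next_victim;         /* trivial round-robin replacement     */

/* The OS's own translation structure can be anything; here a flat array
 * mapping virtual page -> physical frame, with 0 meaning "unmapped". */
static uint32_t os_page_table[1u << 20];

void tlb_miss_handler(uint32_t fault_vaddr)
{
    uint32_t vpage  = fault_vaddr >> PAGE_SHIFT;
    uint32_t pframe = os_page_table[vpage];
    if (pframe == 0) {
        /* Genuine fault: no mapping exists, hand off to the pager. */
        return;
    }
    /* Evict a slot, install the new mapping, and retry the access. */
    tlb_entry_t *slot = &tlb[next_victim++ % TLB_SLOTS];
    slot->vpage  = vpage;
    slot->pframe = pframe;
    slot->valid  = 1;
}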

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Tue Apr 17, 2018 4:02 am
by simeonz
Octocontrabass wrote:The main advantage is simpler (and therefore cheaper) MMU circuitry. Hardware TLB management takes a lot more transistors than software TLB management, and the way the MC68451 does it is about as simple as you can get. The flexibility it provides can also be an advantage, since you can define your own page table format and come up with your own TLB fill algorithm. (You can also mix page sizes, but you can do that with the MC68851 too, so it's not an advantage specific to the software-controlled TLB.)

The main disadvantage is that, unless your TLB fill algorithm is very very good, it will be slower than a hardware TLB fill.
Understood. Thanks for clarifying.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sat Jul 07, 2018 6:21 am
by Qbyte
Brendan wrote:In my experience, people that say things like "segmentation consumes less memory" and "segmentation is faster" are people that have no idea how to use paging properly to avoid consuming a massive amount of RAM, and have no idea how expensive fragmentation/compaction of the physical address space becomes in practice.
The thing is, segmentation can be implemented in many ways, not just via the textbook "segment number + offset" method or Intel's dubious hack. For example, one such way is as follows: each entry in the segment table consists of a base address, a limit address (which together define a contiguous range of addresses) and an offset (and if desired, some r/w/x/d flags). If a virtual address issued by the CPU is within a defined range, the offset for that range is added to the virtual address to form the physical address, otherwise a segfault is raised. "Protection domains" are also trivially supported by setting the offset of a segment to 0 (so that a single address space can be shared and accessed by all processes, where everything can be directly referenced by pointers). The MMU could also be designed such that it can be configured to disable checks for reads and/or writes and/or executes from memory (on a per-process basis), so that a process could, for example (depending on the permissions it has been granted by the user/OS), read from anywhere in memory and execute code from anywhere in memory, but can only write to designated segments to prevent it from clobbering external code and data. This scheme combines the benefits of both paging and segmentation (and more), without introducing any meaningful drawbacks.
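In C-ish pseudo-hardware, the lookup described above would be something like the following - a linear scan standing in for what would really be an associative lookup inside the MMU, with all names and the flag encoding made up for the sake of the sketch:

Code:
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Each entry: a base..limit range of virtual addresses, an offset added
 * to reach physical memory (0 gives a shared identity-mapped "protection
 * domain"), and r/w/x flags. A hit translates; a miss is a segfault. */
#define SEG_R 0x1u
#define SEG_W 0x2u
#define SEG_X 0x4u

typedef struct {
    uint64_t base, limit;   /* inclusive virtual range [base, limit] */
    int64_t  offset;        /* added to the virtual address          */
    uint8_t  flags;
} seg_entry_t;

bool seg_translate(const seg_entry_t *table, size_t n, uint64_t vaddr,
                   uint8_t access, uint64_t *paddr_out)
{
    for (size_t i = 0; i < n; i++) {
        const seg_entry_t *s = &table[i];
        if (vaddr >= s->base && vaddr <= s->limit) {
            if ((s->flags & access) != access)
                return false;                  /* protection violation */
            *paddr_out = (uint64_t)((int64_t)vaddr + s->offset);
            return true;
        }
    }
    return false;                              /* no segment: segfault */
}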

From segmentation, it inherits the major advantage of being able to map arbitrarily large contiguous regions with a single TLB entry, which is extremely relevant in practice because almost every program or object can take advantage of this (for example, mapping each library or even all libraries on the entire system into a single segment). Many user-space programs (like those that don't require dynamic memory) will even be able to utilize the scheme to such an extent that they could be mapped entirely using anywhere from 1-3 segments. On a system with, say, 1024 TLB entries in the MMU, this would mean that many hundreds of processes could share the MMU simultaneously, practically eliminating TLB misses/thrashing and enabling cheap context switches.

Then from paging, it inherits the ability to efficiently avoid fragmentation/compaction issues, expand the address space as needed, and support efficient copy-on-write. It can do the former if required by simply emulating paging: the base address of each segment is set as the one after the limit address of the segment before it (supporting a flat model). Additionally, it can do this with whatever page granularity suits the application best, down to a single byte, on a per-page and/or per-process basis. Then to efficiently expand the address space, either the last segment of the data structure that is growing can extend its limit if there would be no conflict in physical memory, or a new segment(s) can simply be appended and mapped to a free portion of physical memory, obviating the need to ever compact memory. In a way, this form of segmentation acts as a run-length encoded version of paging, with additional benefits like being able to efficiently support a non-flat memory model, whereby if you have multiple dynamic objects within the same process, they can grow/shrink independently without affecting each other.
Brendan wrote:Let's design an OS based on persistent objects.
A major idea you seemed to neglect in your design was the concept of a single address space OS, which segmentation and 64-bit architectures lend themselves to extremely well, especially in the case where the architecture makes use of non-volatile RAM to create a persistent single level store for all code and data. Those features, taken together, should serve to drastically reduce the complexity of an OS and its API, while also increasing system-wide performance. Essentially, the OS would be an exokernel which does little more than allocate resources such as memory, CPU time and I/O to processes and objects. Each process would have direct access to the resources that have been allocated to it by the OS, such as certain I/O ports, GPU memory, etc, which would, among many other benefits, enable fast and secure user-space drivers. Global optimizations are now a non-issue because there is no need to worry about page replacement, caching disk blocks, planning of disk I/O, or related concerns, which is all made possible by the NVRAM. Simply put, I think you have underestimated the profound benefits that a persistent single level store can bring to the table for OS design and performance.

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sat Jul 07, 2018 8:00 pm
by Schol-R-LEA
Qbyte wrote:A major idea you seemed to neglect in your design was the concept of a single address space OS, which segmentation and 64-bit architectures lend themselves to extremely well,
WHAT.

I'm sorry, but... uhm... one of us is confused as to how the terms 'segmentation' and 'single address space OS' are defined, and I am not convinced it is me. Segmentation by its very nature involves dividing up the memory address space (in all forms I am familiar with, anyway - my impression is that what you seemed to be grasping towards earlier wasn't segmentation so much as per-object protection using rwx ACLs, or put another way, capability-based addressing but without any actual, you know, capabilities - and if you aren't familiar with how capability tokens work, well, you aren't alone; as much as I like the idea myself, I will admit I am a bit shaky on the details).

Now, I will admit that I have been a bit too adamant about equating segmentation with address space extension, but there is a reason for that - the terminology as used in the past has been inconsistent, as noted with how the Motorola docs for the MC68451 used 'segments' to describe what most of the world calls 'pages', and for that matter the completely different meanings (plural) of the term as used in object file formats (and while that topic and this one should be separate enough contexts that the overloading isn't ambiguous - even if the overloading within each of those contexts is - from another thread going today I get the sense that it might be anyway). Calling a horse a rabbit isn't going to foster clear communication, and neither is trying to reconcile two or more vocabularies that have undergone significant independent evolution without making an effort to pick one meaning over another.

(Pardon me while I go bludgeon down the snarky part of my brain that just said something to the effect of, "And yet we continue to write in English..." . OK, it is true that my native tongue happens to be The Great Thief of Vocabulary and Grammar, and has an orthography that would count as a crime against humanity were it deliberate, but for better or worse, it is the language in which the overwhelming majority of computer theory and documentation is written, and the only one which most of the posters here have in common with all the others.)

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sat Jul 07, 2018 10:32 pm
by Qbyte
Schol-R-LEA wrote:I'm sorry, but... uhm... one of us is confused as to how one of the terms 'segmentation' and 'single address space OS' are defined, and I am not convinced it is me.
The term "segment" has indeed been used in different ways by various manufacturers over the years, but fundamentally, a segment is nothing more than a contiguous range of addresses of arbitrary length, in contrast to a page which is a contiguous range of addresses of some fixed length. Whether or not these two schemes are used to implement memory virtualization (address translation) is a separate matter. Protection and virtualization are different concepts, but they are usually rolled into one in a given implementation.

In a 64-bit single address space OS, virtualization is not required (but can still be included) since every physical address within the entire machine can be directly accessed with a 64-bit reference. This means that all code and data can reference each other using their absolute addresses and the practical necessity for each process to have its own virtual address space like in a 32-bit system no longer exists.

In this scenario, segmentation can be used purely as a protection mechanism: each process is allocated 1 or more regions (segments) of physical memory that it is allowed to write to, and it can't write anywhere in memory outside of those regions, in order to protect the rest of the system from a buggy or malicious program. There's no address translation going on here; the physical memory addresses generated by the process are simply checked to make sure they are within an allowed region. Paging could be used to achieve the exact same thing. In this case, the hardware that sits between the CPU and memory and performs this task is called a Memory Protection Unit. Both segmentation and paging can then implement memory virtualization on top of that, by remapping a valid memory address generated by a process to another one designated by the OS, at which point the aforementioned hardware unit becomes a Memory Management Unit.
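A sketch of that protection-only case (names invented, nothing here is a real MPU's register layout): no translation happens at all, the address either passes through unchanged or faults.

Code:
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* The process issues physical addresses directly; the MPU only checks
 * that a write falls inside one of the regions granted to that process. */
typedef struct { uint64_t start, end; } region_t;   /* [start, end) */

typedef struct {
    const region_t *writable;   /* regions this process may write */
    size_t          count;
} process_grants_t;

bool mpu_check_write(const process_grants_t *p, uint64_t paddr)
{
    for (size_t i = 0; i < p->count; i++)
        if (paddr >= p->writable[i].start && paddr < p->writable[i].end)
            return true;    /* allowed: address passes through unchanged */
    return false;           /* fault: write outside the granted regions  */
}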

Re: Memory Segmentation in the x86 platform and elsewhere

Posted: Sun Jul 08, 2018 2:51 am
by simeonz
Qbyte wrote:With circa 2 TB of on-chip non-volatile RAM, segmentation would be well positioned to undergo a renaissance.
NVRAM would indeed simplify things, making page replacement and journaling (for filesystems and databases) redundant, but I am not sure how affordable such an amount of on-chip memory is going to be. Even if it became commoditized, which would drastically reduce its cost, it may not be a drop-in replacement financially. For analogy, hdds are essentially a niche technology at this point, and yet ssds are still fundamentally more expensive to produce. Their saving grace is that most people don't need large storage, but do need faster storage. On the other hand, for corporate nearline storage/active archives, it will be difficult to make ssds a drop-in replacement in terms of their total cost of ownership. (Unless the hdds become unprofitable, at which point there will be simply no choice.) The point is, are you suggesting that our hardware manufacturing methods will mature, or that our economy will grow?

I myself would prefer SAS, because process context switches are conceptually redundant. But when you think about it, the question is not between segments and pages (pages also offer large and huge variants), but between fine-grained and coarse-grained address space translation. And I am inclined to think that fine-grained translation cannot be easily done away with. It is instrumental in dealing with memory fragmentation without software translation structures, moving compaction, or reliance on coalescing allocators. It also enables space-saving tricks like COW.

The first question actually is - should address space translation be performed in software or hardware? One day the raw processing power may permit us to perform most hardware functions in software, but that cannot be predicted with the current halt in production technology. Alternatively, unifying the memory and storage with NVRAM as you suggested may replace the file system translation layer with the hardware translation mechanism. There is no point in having both, one way or the other.

The second question is whether the translation should use intervals of arbitrary sizes, indexed by ordered trees, or a fixed hierarchy of block sizes, indexed by prefix trees. The latter are generally faster and easier to implement in hardware, but tend to be taller. Whether we call the keys segments and use ordered trees is more or less an implementation detail. For large amounts of memory and after sufficiently long operation, the trees will become heavily populated with small physical fragments (small segments/pages). This will make the tree nodes slow to resolve. It has less to do with the choice of translating structure than with the non-sequential, dynamic nature of the physical allocations. With SAS, we may sometimes require slightly more lookups, since we are intermixing code that might access different data, but it may require fewer lookups as well, because we are not artificially breaking up code that accesses the same data at the protection boundary. The translations are likely to be reusable in the TLB for both models. At the moment, the closest you could get to SAS is PCID support with hybrid (cooperative and preemptive) multitasking, but it is not optimal.

And on a final related note - a third lookup technique can be used - hashtable directories, like the ones in the Itanium. This approach improves the average performance at the cost of some performance unpredictability. I personally think it has serious potential, despite its shortcomings.
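To illustrate the "intervals of arbitrary size, indexed by an ordered structure" option, a sorted array with binary search can stand in for the ordered tree (all of this is just an illustration of the lookup shape, not a claim about any real MMU). A prefix/radix walk, as in x86 paging, would instead take a fixed number of steps determined by the hierarchy depth:

Code:
#include <stdint.h>
#include <stddef.h>

/* Non-overlapping intervals, sorted by start address. */
typedef struct { uint64_t start, len, phys; } interval_t;

const interval_t *interval_lookup(const interval_t *iv, size_t n,
                                  uint64_t vaddr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (vaddr < iv[mid].start)
            hi = mid;
        else if (vaddr >= iv[mid].start + iv[mid].len)
            lo = mid + 1;
        else
            return &iv[mid];    /* hit: O(log n) in the number of intervals */
    }
    return NULL;                /* unmapped address */
}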

P.S. In a nutshell, what I meant to say is - SAS vs MAS is not the same as translated vs untranslated (or coarse-grain vs fine-grain translated). And you can't have your usual ext3 prefix-tree mapping in software for your NVRAM storage and still consider the need for a fine-grained hardware translation mechanism on top of it, because one of the two is then redundant.