Designing a new, protected, memory model

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
rdos
Member
Posts: 3346
Joined: Wed Oct 01, 2008 1:55 pm

Designing a new, protected, memory model

Post by rdos »

I think I'll implement a new memory model that will at least partly be able to reuse the large code base for 32-bit flat memory models, without all the problems of missing device-driver isolation.

There are basically two different approaches to this:

* Using a flat memory model in the kernel and letting device drivers corrupt each other's code and data freely. This is fast, but can lead to problems that are almost impossible to track down.

* Running device drivers in userspace in a separate process context. This provides very good isolation between device drivers, at the cost of lousy performance for inter-device-driver calls.

I propose a third alternative:

* Assign a single code segment and a single data segment to each device driver. Provide a heap allocator in the data segment, so that the driver code sees all of its local memory relative to the same segment (the default data segment). This gives both good isolation and a fast path for most inter-device-driver communication, except when pointers are passed: pointers must either be mapped into the default data segment, or the code must be able to handle 48-bit pointers. The most important point is that the bulk of a device driver's code can be C/C++ written for a flat memory model, which adds the advantage of reusing existing code.
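The in-segment heap idea above can be sketched as follows. This is a hypothetical illustration, not RDOS code: a static buffer stands in for the driver's data segment, and the allocator hands out plain 32-bit offsets, which is exactly what flat-model C code sees as near pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: the driver's data segment modelled as a byte
 * array.  On real hardware the segment base lives in a GDT descriptor;
 * here a static buffer stands in for it.  Everything the allocator
 * returns is a 32-bit offset from the segment base, i.e. a near
 * pointer relative to the default data segment. */
#define SEG_SIZE 4096u
static uint8_t  data_segment[SEG_SIZE];
static uint32_t heap_brk = 0;           /* bump-allocator watermark */

/* Allocate n bytes from the in-segment heap; returns a near pointer
 * (an offset into data_segment), or UINT32_MAX on exhaustion. */
uint32_t seg_alloc(uint32_t n)
{
    uint32_t off = (heap_brk + 3u) & ~3u;   /* 4-byte alignment */
    if (off + n > SEG_SIZE)
        return UINT32_MAX;
    heap_brk = off + n;
    return off;
}

/* Flat-model C dereferences near pointers through DS; in this
 * simulation that is simply indexing into the segment buffer. */
static inline void *seg_ptr(uint32_t off) { return &data_segment[off]; }
```

Because every address is segment-relative, two drivers compiled this way cannot scribble on each other's heaps even though they run in the same ring.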
tom9876543
Member
Posts: 170
Joined: Wed Jul 18, 2007 5:51 am

Re: Designing a new, protected, memory model

Post by tom9876543 »

It sounds like your memory model will be x86 specific. Do you care about portability to other CPUs?
rdos
Member
Posts: 3346
Joined: Wed Oct 01, 2008 1:55 pm

Re: Designing a new, protected, memory model

Post by rdos »

tom9876543 wrote:It sounds like your memory model will be x86 specific. Do you care about portability to other CPUs?
Not a bit. The only thing I care about is being able to use a larger code base for RDOS. RDOS is already x86 (32-bit) only, and will not be made into a portable OS. :wink:
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Re: Designing a new, protected, memory model

Post by bewing »

Hmmm. This already sounds interesting to me. My OS is also x86-specific, and I'm very willing to break the "microkernel rules" to get extra performance.

I was previously thinking of having two tiers of drivers for each device (that correspond exactly to your two categories), depending on how much the superuser trusted the coding skills of the provider. Obviously, your way simplifies my life.
b.zaar
Member
Posts: 294
Joined: Wed May 21, 2008 4:33 am
Location: Mars MTC +6:00
Contact:

Re: Designing a new, protected, memory model

Post by b.zaar »

If you are using data segments and 48 bit addressing does it still count as a flat memory model?
"God! Not Unix" - Richard Stallman

Website: venom Dev
OS project: venom OS
Hexadecimal Editor: hexed
rdos
Member
Posts: 3346
Joined: Wed Oct 01, 2008 1:55 pm

Re: Designing a new, protected, memory model

Post by rdos »

b.zaar wrote:If you are using data segments and 48 bit addressing does it still count as a flat memory model?
No, it is not a flat memory model. Watcom calls this the 32-bit small memory model: one code segment and one data segment, where the heap normally would be far and use 48-bit addressing. The thing is, with a far heap, most flat-memory-model code will not work, and C compilers are really bad at far addresses. By placing the heap inside the data segment, there are no far pointers internally, and most flat-memory-model C code will work, but it can never address data outside its own scope since it doesn't know about segmentation. The only thing that won't work is reading or writing code through data pointers, as CS will not be identical to DS; indirect calls, however, will work just fine.

The trick is to provide the heap for C in the default data segment. I think this can easily be done with paging: when the heap is exhausted, the C runtime simply allocates a new, larger linear address space, copies all the page tables from the old linear address space, and remaps the GDT data selector. The device driver won't notice that the data segment is now mapped to a new linear address range with a larger limit.
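The remapping trick can be simulated in user space. In this sketch (illustrative names, not RDOS code) realloc() plays the role of "allocate a larger linear range, copy the page tables, remap the GDT selector"; the property being demonstrated is that driver code holding only segment-relative offsets is unaffected when the segment's linear base moves.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simulation of growing the default data segment.  In RDOS terms this
 * would allocate a larger linear address range, copy the page tables,
 * and point the GDT data descriptor at the new range; here realloc()
 * stands in for all three steps. */
struct segment {
    uint8_t *base;     /* linear base (GDT descriptor base in reality) */
    uint32_t limit;    /* segment limit */
};

int segment_grow(struct segment *s, uint32_t new_limit)
{
    uint8_t *nb = realloc(s->base, new_limit);
    if (nb == NULL)
        return -1;
    s->base  = nb;     /* "remap the GDT data selector" */
    s->limit = new_limit;
    return 0;
}
```

Any offset the driver saved before the grow still dereferences to the same byte afterwards, which is why the driver "won't notice".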

There is one additional problem though. Normally SS will not be identical to DS, so pointers into the stack must be distinguished from pointers into data. This could be solved either by forcing pointers to stack data to be 48-bit pointers, or by providing a stack in the data segment and switching to it on entry to a function. The latter will not handle multitasking; instead, a lock would be needed on entry to a method. I suspect both methods have their uses. Another problem is that many C programs expect a huge stack, while RDOS normally assigns only a 512-byte stack in ring 0. Stack switching would also solve this for C code that uses too much stack. To protect the stack from overflows, it should be placed at the very beginning of the data segment, as an overflow there triggers a protection fault instead of overwriting data. The heap should be at the end (it grows by extending the data segment). Open Watcom also offers "based pointers" as an alternative solution.
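The stack-at-the-bottom argument can be made concrete with a small model (sizes and names are illustrative): since a segment offset cannot go below zero, a push past the bottom wraps around and exceeds the segment limit, so the CPU faults instead of silently corrupting the statics and heap above the stack.

```c
#include <stdint.h>

/* Hypothetical in-segment layout:
 *
 *   offset 0 ........ STACK_TOP ........ limit
 *   [ stack grows down ][ statics | heap grows up ]
 *
 * A push below offset 0 wraps the unsigned offset, which lands far
 * above the segment limit and raises a protection fault on real
 * hardware.  Here UINT32_MAX models that fault. */
#define STACK_TOP 512u   /* RDOS-style small ring-0 stack */

/* Returns the new stack pointer after pushing n bytes, or UINT32_MAX
 * to signal the segment-limit violation a real wraparound would raise. */
uint32_t push_bytes(uint32_t sp, uint32_t n)
{
    if (n > sp)          /* would wrap below offset 0 -> #GP/#SS */
        return UINT32_MAX;
    return sp - n;
}
```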

The model would imply that device drivers are separate modules, and not linked into a huge kernel.
OSwhatever
Member
Posts: 598
Joined: Mon Jul 05, 2010 4:15 pm

Re: Designing a new, protected, memory model

Post by OSwhatever »

If you look at the old Atari ST, it had an early form of address translation, but without the same performance penalty.

http://www.dadhacker.com/blog/?p=1355&c ... ent-144484
The MMUs I knew about did page table walks of a multi-level tree; those multiple indirections implied complex, stateful and slow machinery. There was no room in the ST’s memory controller for the caches required to make a table-based system perform reasonably, even if the gate count of table-lookup hardware had been possible. The ST was no VAX. We had to pay dearly for chip area, schedules were tight, and DRAM timing was even tighter. Nobody wanted to pay for a feature they’d never use.

Non-MMU-based systems used base-and-bounds; a favorite technique in mainframes and minis from the 60s and 70s. We could get protection by checking user accesses against limit registers, a pretty cheap operation, but that wouldn’t get you relocation. To do that you had to muck with the address bits, and do an addition.

The problem was, there wasn’t time to do an addition with the necessary carry-propagation on every single address issue, not to mention the gate count.

So how does a typical Unix process grow? The text and data are at the bottom of the address space and don’t move; the bss follows those, and grows up via the “brk.” The stack grows down. That’s it. Very simple, very hippy 70s.

So imagine something really minimal, like replacing a bunch of address lines with a chosen constant value for user-mode accesses. Leave everything untouched for supervisor accesses. That’s it, that’s your total relocation. It’s really simple to implement in hardware, just a latch and some muxes to choose which address lines go through and which get replaced.

For address space growth you have another register, a mask or a bit count, that checks to see if some number of the issued upper address bits are either all zero or all one. You start the stack at 0xfffffffe and grow down. You start the bss low and grow up. A variable number of bits in the middle of each address are simply substituted. If the upper N bits aren’t all 0000…00 or 11111…11 then you generate a fault.

Now you have a system that both relocates addresses and handles growth in two directions in powers of two. You use throwaway instructions to do stack growth probing (dud sequences that don’t need to be restarted if they fault), and that needs a little compiler work, but it’s not too bad. Processes are tiled in memory at power-of-two addresses, so there’s more physical copying going on than you probably like when stuff grows, but again, it’s not too bad. Welcome to the world of doesn’t-totally-suck, and probably runs rings around a PDP-11 despite its limitations. AT&T SVR-N didn’t have paging anyway (like I said, they should have stuck with phones).
Is this something you might be after? I think this could serve as protection on very simple systems, like small embedded ones. On larger systems, however, you want full paging despite the performance penalty.

Still, this early model requires contiguous physical memory, and you probably need to pre-partition the memory from the beginning. Since you have to pre-partition it, it becomes almost the same as the MPU regions found in ARM CPUs without an MMU.
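The substitution scheme quoted above can be sketched in C. This is a simplified illustration (constants are made up, and for brevity the substituted bits are the same upper bits that are checked, rather than bits "in the middle"): the upper N bits of a user address must be all zero or all one, and are then replaced by a latched per-process constant.

```c
#include <stdint.h>

/* Sketch of address-line substitution: for user-mode accesses, the
 * upper GUARD_BITS of the address must be all 0 (bss/heap side) or
 * all 1 (stack side); they are then replaced by a latched per-process
 * constant.  Supervisor accesses would bypass this entirely. */
#define GUARD_BITS 8u

/* Translate a 32-bit user address.  'latch' supplies the substituted
 * bits; on a mixed guard pattern, *fault is set and 0 returned, the
 * software analogue of the hardware fault. */
uint32_t translate(uint32_t addr, uint32_t latch, int *fault)
{
    uint32_t upper = addr >> (32u - GUARD_BITS);
    uint32_t ones  = (1u << GUARD_BITS) - 1u;
    if (upper != 0u && upper != ones) {  /* neither all-0 nor all-1 */
        *fault = 1;
        return 0;
    }
    *fault = 0;
    /* replace the guard bits with the process's latched constant */
    return (latch << (32u - GUARD_BITS)) |
           (addr & (0xFFFFFFFFu >> GUARD_BITS));
}
```

Relocation is a mux, growth checking is a wide AND/OR of the guard bits: no adder, no carry propagation, which is the whole point of the design.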
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Designing a new, protected, memory model

Post by Combuster »

This basically boils down to a system using small address spaces with kernel-mode privileges. If you instead run the driver code in ring 3, you have the L4 design, which actually is protected yet still faster than normal microkernels; the logical performance optimisation is to reserve half the address space for SAS drivers, and you still have a microkernel with fast-accessible drivers.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
rdos
Member
Posts: 3346
Joined: Wed Oct 01, 2008 1:55 pm

Re: Designing a new, protected, memory model

Post by rdos »

Combuster wrote:This basically boils down to a system using small address spaces with kernel-mode privileges.
No, it is not. It will isolate drivers from each other, while still allowing communication without address space switches or IPC.
Combuster wrote: You only have to run driver code in ring 3 and you have the L4 system which actually is protected but still faster than normal microkernels - the logical performance optimisation is to reserve half the address space for SAS drivers and you still have a microkernel with fast accessible drivers.
Splitting the address space in two halves is not even remotely similar to this concept.
rdos
Member
Posts: 3346
Joined: Wed Oct 01, 2008 1:55 pm

Re: Designing a new, protected, memory model

Post by rdos »

When recompiling the ACPICA driver with SS != DS, it emits a huge number of errors, so this concept does not work with the small memory model. A more appropriate choice is the 32-bit compact memory model: a single code segment (all procedure calls are near), with far pointers as the default. This memory model compiles the ACPICA driver without errors with SS != DS.

To optimize code, it would be an advantage to provide near-pointer versions of all C-library functions that take pointers, while still keeping compatibility with existing code (at poorer performance, due to the many segment register loads).
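The near/far split could look something like this sketch (function names are made up, not an existing library). Under Open Watcom the far variants would carry the __far qualifier and pay for segment-register loads; on a flat-model compiler the qualifier is defined away, so the example still builds and behaves identically.

```c
#include <stddef.h>

/* Near/far library variants for the 32-bit compact model.  The FAR
 * macro maps to Watcom's __far qualifier where available and expands
 * to nothing on flat-model compilers. */
#if defined(__WATCOMC__)
#define FAR __far
#else
#define FAR /* flat model: near and far collapse to the same thing */
#endif

/* Default (compact-model) variant: takes far pointers, works across
 * segments, slower due to segment-register traffic. */
void FAR *memcpy_far(void FAR *dst, const void FAR *src, size_t n)
{
    char FAR *d = dst;
    const char FAR *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Near variant: pointers are offsets into the default data segment,
 * avoiding segment loads in hot paths. */
void *memcpy_near(void *dst, const void *src, size_t n)
{
    char *d = dst;
    const char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```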
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Designing a new, protected, memory model

Post by Combuster »

rdos wrote:
Combuster wrote:This basically boils down to a system using small address spaces with kernel-mode privileges.
No, it is not. It will isolate drivers from each others, while still allowing communication without address space switches or IPC.
Yes it is. You are switching address spaces using segmentation. Shared memory is still IPC. And please, please look up your definitions before making erroneous comments.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]