Hello all,
I have been researching operating systems for a while now and would like to start designing my own. I am developing for x86 (32-bit protected mode) and plan on using segmented memory instead of paging. Please no "SEGMENTS SUCK" replies; I am aware of the issues of segments vs. paging. This is simply a project for my own learning pleasure and not necessarily something practical. I would really like your opinions on the implementation I describe next, as well as any tips from others who use segments.
The basics are as follows:
The kernel classifies segments as static or dynamic: static segments cannot be resized, while dynamic segments can. For simplicity, all code and stack segments are considered static. Data segments can be either static or dynamic.
Because static segments are never resized, they can all be packed next to each other at one end of memory to help reduce external fragmentation.
Dynamic segments are placed at the other end of memory using a variation of the worst-fit allocation policy. The largest chunk of free memory is split into three parts: the new segment and two free chunks, with the new segment ending up in between them. This gives the new dynamic segment room to grow and leaves room below it for a different dynamic segment to grow.
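The split-into-three policy above can be sketched in a few lines. This is a toy model in Python; the choice to center the new segment in the largest free chunk is an illustrative assumption, since any split that leaves free space on both sides fits the idea:

```python
# Toy sketch of worst-fit "middle" placement for dynamic segments.
def place_dynamic_segment(free_chunks, size):
    """free_chunks: list of (start, length) tuples. Returns the new base."""
    # Worst fit: take the largest free chunk.
    start, length = max(free_chunks, key=lambda c: c[1])
    if length < size:
        raise MemoryError("no free chunk large enough")
    free_chunks.remove((start, length))
    # Split into [low free | new segment | high free], centering the segment.
    base = start + (length - size) // 2
    if base > start:
        free_chunks.append((start, base - start))        # room below
    if base + size < start + length:
        free_chunks.append((base + size, start + length - base - size))  # room above
    return base

chunks = [(0, 100), (200, 1000)]
base = place_dynamic_segment(chunks, 100)   # lands mid-way through the big chunk
```

With the inputs above, the 100-byte segment is placed in the middle of the 1000-byte chunk, leaving a free chunk on each side of it.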
I realize this placement policy leaves memory heavily fragmented, but the goal is to keep dynamic segments as distant from each other as possible so they can grow without segments having to be moved around memory.
Segments have to be moved to avoid telling processes they cannot have memory when memory gets crowded. The first thing I can think of to improve performance is to move the smallest segment possible. For example: if segment A could not grow because segment B was in the way, move the smaller of A or B.
So what do you think about that?
- TJ
Implementing Segmented Memory
Re: Implementing Segmented Memory
Hi,
tjhastings wrote: Segments have to be moved to avoid telling processes they cannot have memory when memory gets crowded. The first thing I can think of to improve performance is to move the smallest segment possible. For example: if segment A could not grow because segment B was in the way, move the smaller of A or B. So what do you think about that?

Just for fun, write a prototype physical memory manager for research purposes. The prototype doesn't need to manage any memory and can be a simple C application running on any OS (e.g. something that just manages a linked list of "start + size + type" structures that describe areas of pretend physical memory).
Then add a random number generator and see how it handles random operations of random sizes, and take measurements (e.g. average total bytes copied per operation, worst case bytes copied per operation, etc). You could probably also get RAM bandwidth measurements (e.g. RAM bandwidth figures from SiSoft) and calculate the time used.
Then, you'd be able to tweak things and/or try alternative algorithms to see if you can improve performance, and wouldn't need to waste time writing the rest of the OS to find out.
Note: the absolute worst case would be copying about 3.5 GB of RAM just to allocate 2 bytes. For a modern computer with 1000 MB/s of RAM bandwidth, that works out to about 3.5 seconds, i.e. 1.75 seconds per byte allocated. Of course, the absolute worst case would be extremely unlikely...
Cheers,
Brendan
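For what it's worth, the prototype can be mocked up quickly in a high-level host language. Here is a hypothetical Python version (rather than C) in the spirit of the suggestion: a list of [start, size, kind] records describes pretend physical memory, random allocate/free operations drive it, and an allocation that finds no hole charges the cost of copying every used byte as a crude model of compaction. All policies here are illustrative assumptions:

```python
import random

FREE, USED = "free", "used"

def run_trial(total=1 << 20, ops=2000, seed=42):
    areas = [[0, total, FREE]]            # list of [start, size, kind]
    rng = random.Random(seed)
    copied = []                           # bytes "copied" per operation
    for _ in range(ops):
        moved = 0
        used = [a for a in areas if a[2] == USED]
        if rng.random() < 0.5 or not used:
            size = rng.randint(1, 4096)
            frees = [a for a in areas if a[2] == FREE and a[1] >= size]
            if frees:
                a = max(frees, key=lambda x: x[1])      # worst fit
                i = areas.index(a)
                areas[i] = [a[0], size, USED]
                if a[1] > size:                         # leftover stays free
                    areas.insert(i + 1, [a[0] + size, a[1] - size, FREE])
            else:
                # No hole fits: model compaction as copying every used byte.
                moved = sum(a[1] for a in used)
        else:
            rng.choice(used)[2] = FREE                  # free a random area
            # (coalescing adjacent free areas is omitted for brevity)
        copied.append(moved)
    return max(copied), sum(copied) / len(copied)

worst_case, average = run_trial()
```

From here, swapping in the real placement and move-the-smaller-segment policies and comparing the worst-case and average figures is straightforward.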
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Posts: 12
- Joined: Sat Feb 02, 2008 5:49 pm
Re: Implementing Segmented Memory
Brendan wrote: Just for fun, write a prototype physical memory manager for research purposes. The prototype doesn't need to manage any memory and can be a simple C application running on any OS (e.g. something that just manages a linked list of "start + size + type" structures that describe areas of pretend physical memory).

That is an excellent idea. Thanks for the tip.
- TJ
Hello again,
I have revised my design.
My original plan was to keep the process simple: implement a basic system using segmentation, and then add paging later on. However, after some thinking, I believe it is probably better to include paging in the design from the beginning. Otherwise I may end up spending a lot of time reworking the entire system (or polishing a turd).
The more I think about it, the more I like the idea of using both paging and segmentation together. I think they complement each other quite well.
Here is some of what I had in mind:
The entire system will use a single set of page tables; i.e. one 4GB linear space that is carved up into segments. Through the magic of paging, the problems with segments (nearly) vanish. They no longer require contiguous physical memory, or even have to consume as much physical memory as the length of the segment. Moving/compacting segments in linear space (by manipulating page tables) is orders of magnitude faster than compacting segments in physical memory. Plus, the system can take advantage of other niceties such as demand paging.
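As a toy illustration of that last point: relocating a page-aligned segment under paging touches only the page-table entries, not the data. A flat dict stands in here for the real two-level page tables, which is an assumption for brevity:

```python
PAGE = 4096

def move_segment(page_table, old_base, new_base, length):
    """Relocate a page-aligned segment by remapping its pages.

    page_table maps linear page number -> physical frame number."""
    npages = (length + PAGE - 1) // PAGE
    for i in range(npages):
        frame = page_table.pop(old_base // PAGE + i)
        page_table[new_base // PAGE + i] = frame
    # Cost: npages PTE updates (plus a TLB flush), independent of the
    # segment's length in bytes -- versus copying 'length' bytes when
    # compacting segments in physical memory.

pt = {i: 100 + i for i in range(4)}        # a 16 KB segment at linear 0
move_segment(pt, 0, 0x40000, 16 * 1024)    # same frames, new linear base
```

After the call, the same four physical frames are mapped at linear address 0x40000; nothing was copied.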
Having all processes share one 4GB address space is more limiting than 4GB per process. However, my system targets low-end PCs and I doubt this will be a problem. I suppose it could be argued that, by the time I finish, a PC with 4 GB of memory could be considered low-end...
Segmentation will be used for address space separation and protection in a typical manner. Each process will have at least 4 segments: one for code, one for data, one for a stack, and one for a kernel stack (perhaps one more for thread-local storage?). Sbrk/morecore/whatever is used to grow the data segment. Stack segments will grow automatically. A process will be able to request more segments from the kernel, and processes will be able to share segments with each other. All of a process's segment descriptors are indexed in its LDT. The kernel and shared libraries are indexed in the GDT. No surprises here.
Actually, the system would not have to force a process to use multiple segments. (Although I cannot fathom why somebody would not want to use segments...) An executable could be written to use one of a number of different memory models, kind of like DOS. Indeed, trivial programs (Hello, World!) could operate in one small, fixed-size flat space. Hmmm. Kind of like EROS or L4's small spaces, except the size of the space is set at load time instead of being fixed at 128KB. Other programs could have ds=ss and use a separate segment for code.
I was stuck for a bit trying to figure out how the system would keep track of segments and their associated pages, how to find a place in linear space for a new segment, etc. I was thinking too hard. It is actually quite simple if each segment has a base that starts on a page boundary and a length that is a multiple of 4 kilobytes. All the information the system requires is already available in the segment descriptors and page tables! When a segment is ready to be destroyed and all of its associated pages freed, it is easy to calculate the range of PTEs to free using the segment's base and limit. Similarly, finding a valid base/limit for a new segment is as easy as searching the page directory/tables. Neat.
Scanning the page tables every time a segment is allocated is not the best/fastest method. However, the page directory can be used to skip 1024 pages at a time. Perhaps clever use of the extra bits in the PDE/PTEs can be used to speed things up? Hmmm. Also, a pointer to the last free spot found could be kept; not unlike the method used when searching a large bitmap.
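The search just described can be sketched as follows: walk the directory first so a not-present directory entry skips 1024 pages at once, and accept an optional next-fit hint. The flat lists below are simplified stand-ins for the real structures:

```python
ENTRIES = 1024   # pages per page table, and tables per directory

def find_free_range(pdir, ptables, npages, hint=0):
    """Find 'npages' consecutive unmapped linear pages.

    pdir[i] is True if page table i is present; ptables[i][j] is True if
    page j of table i is mapped. Returns the first page of the free run."""
    run_start, run_len, page = None, 0, hint
    while page < ENTRIES * ENTRIES:
        d, t = divmod(page, ENTRIES)
        if not pdir[d]:
            mapped = False
            step = ENTRIES - t          # whole table absent: skip it at once
        else:
            mapped = ptables[d][t]
            step = 1
        if mapped:
            run_start, run_len = None, 0   # run broken; start over
            page += 1
            continue
        if run_start is None:
            run_start = page
        run_len += step
        page += step
        if run_len >= npages:
            return run_start
    raise MemoryError("no free linear range")
```

With a fully mapped first table and everything else absent, the search returns page 1024 in just two directory probes rather than thousands of PTE probes.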
I mentioned shared libraries being indexed in the GDT. If the index is constant, i.e. stdlib always has segment selector 0x5B, then loading and linking is extremely simple. Indeed, there is no linking, or relocation tables, or anything else. For example:
During system initialization, all the segment selectors in the GDT for shared libraries will be marked as 'not present'. When a program attempts to make a far call to a library, a segmentation fault will occur. The segfault handler will have to determine that a shared library has to be loaded, and create a valid segment descriptor for it. Small libraries can be loaded all at once, and larger libraries can be demand paged in. Simple!

Code: Select all
; stdlib.asm -- a jump table at fixed offsets. Each entry is padded to
; 8 bytes (a near jmp is 5 bytes, so 4-byte spacing would not fit), and
; each stdlib_*_func must return with retf since it is far-called.
align 8
malloc:
    jmp stdlib_malloc_func
align 8
free:
    jmp stdlib_free_func
align 8
printf:
    jmp stdlib_printf_func
...

; stdlib.inc -- the fixed selector and the entry offsets (8-byte spacing)
%define STDLIB 0x5B
%define malloc 0x0
%define free   0x8
%define printf 0x10
...

; hello.asm
[BITS 32]
[ORG 0x0]
%include "stdlib.inc"
[SEGMENT .text]
    mov esi, a_string
    call STDLIB:printf          ; direct far call: selector:offset
    ret                         ; return before falling through into the data
a_string: db "Hello, World!",0

Those are the basics of the design thus far. I would like to think I am using segments for what they are good at (address space separation/sharing) and paging for what it is good at (efficiently using physical memory). My plans may change, but that is alright, because I am designing and not coding!

Questions, comments, and criticisms are welcome.

- TJ
Hi,
tjhastings wrote: Questions, comments, and criticisms are welcome.

tjhastings wrote: Actually, the system would not have to force a process to use multiple segments. (Although I cannot fathom why somebody would not want to use segments...)

AFAIK there are no high level languages that support 32-bit segmentation (I know there are old 16-bit compilers for DOS, but they're 16-bit, and old, and not open source). No high level languages means everything (regardless of how mundane) must be implemented in assembly language, so you can say goodbye to 95% of all potential application developers. Worse, even for someone who has been using assembly language for a long time (e.g. me), segmentation is a pain in the neck, so you can say goodbye to the remaining 5% of potential application developers too.

So now you've got an OS with no applications (sure, ok, maybe 2 applications that you write yourself) that's slower than any other OS, because you've got the overhead of segment overrides and segment register reloads everywhere, and the overhead of paging (oh, yes).

You see, different processes will be using entirely different sets of pages (in the same linear address space). The TLB isn't large enough to cover the entire linear address space, so you get TLB misses after switching processes (even though the pages are in the same linear address space). In general, caches are really good if the "working set" fits in the cache. The problem with task switching is that you're switching between working sets (the previous task's working set is still in the cache after the task switch, and the new task's working set isn't).

Sharing the same linear address space will reduce the number of TLB misses, especially when one process switches to another process that switches back quickly (which is common for synchronous IPC), but (IMHO) most good OSs don't use synchronous IPC much (they're either monolithic kernels that use shared libraries and near calls for most things, or micro-kernels that use asynchronous IPC for most things).

That's enough of the disadvantages. Time for the advantages!

The first advantage is that you'd be able to allocate areas that are less than 4 KB in size without wasting memory.

tjhastings wrote: I was stuck for a bit trying to figure out how the system would keep track of segments and their associated pages, how to find a place in linear space to place a new segment, etc. I was thinking too hard. It is actually quite simple if each segment has a base that starts on a page boundary and a length that is a multiple of 4 kilobytes.

Um, ok, forget the first advantage.

The second advantage is that (in theory) programmers could use separate segments for separate purposes to improve their chance of finding bugs (e.g. dodgy pointers, exceeding array bounds, etc). In practice, unless you write a high level language that does this automatically, nobody will want to do it because it overcomplicates things, and anyone silly enough to try to do it manually will probably get it wrong anyway. If people care, it's easier to do your own checks without segmentation (e.g. the "BOUND" instruction) and/or use assertions.

Are there any practical advantages? Are the practical advantages likely to outweigh the disadvantages?

Cheers,
Brendan
tjhastings wrote: Having all processes share the 4GB address space is more limiting than 4GB per process. However, my system targets low-end PCs and I doubt this will be a problem. I suppose it could be argued that by the time I finish a PC with 4 GB of memory could be considered low-end...

The clusters we sell usually have 4-32GB per node and all have 64-bit CPUs.
Notebooks regularly come with 1-2GB of RAM, and many also have 64-bit CPUs.
Wal-Mart-style PCs are pretty much the same.
You are already developing for a dying dinosaur.
Amd64/Em64t doesn't have segments because nobody uses them in 32-bit mode anymore. If you base your OS on segments then you won't be able to port it to 64-bit without a major redesign. Given the negatives of segments in the first place, why not take them out of the picture altogether?
If you use one flat address space with paging for all processes, you have one problem though: you can't implement fork(). fork() would need a copy-on-write copy of the process's memory, and that would give you 2 different pages at the same linear address. A common flat address space can't do that.
Now that sounds really, really bad. But it actually isn't. There are quite a few systems without fork() capabilities that have no problem. The common uses of fork() fall into two categories:
- fork() + exec(): Use vfork() and exec()
- expensive threads: use actual threads and possibly thread local storage
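For illustration, the first replacement is exactly what spawn-style APIs provide. In Python (standing in here for whatever native interface the OS would expose), os.posix_spawn launches a child without ever duplicating the parent's address space, the same idea as vfork()+exec() packaged as one call:

```python
import os
import sys

# Spawn a child process directly: no copy-on-write duplicate of the
# parent's pages is ever created, so a shared flat address space is fine.
pid = os.posix_spawn(sys.executable,
                     [sys.executable, "-c", "raise SystemExit(0)"],
                     dict(os.environ))
_, status = os.waitpid(pid, 0)   # reap the child; it exited with status 0
```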
The idea of thread local storage is a nice one and you have 2 choices of implementing it:
- use a register to point to the thread local storage (segment?)
- not so flat address space: you could reserve a top-level page table entry for thread-local storage, and on each task switch reload that entry with the task's/thread's own entry.
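The second option can be modeled in a few lines. The slot number and data structures below are illustrative assumptions; the point is that a task switch touches only the one reserved directory slot, yet every thread sees its own storage at the same linear address:

```python
TLS_SLOT = 1023                        # e.g. reserve the top 4 MB of linear space

def switch_to(page_dir, thread):
    """Point the shared page directory's TLS slot at this thread's table."""
    page_dir[TLS_SLOT] = thread["tls_table"]

page_dir = [None] * 1024               # one directory shared by everyone
t1 = {"tls_table": "t1-tls-pagetable"}
t2 = {"tls_table": "t2-tls-pagetable"}

switch_to(page_dir, t1)
tls_for_t1 = page_dir[TLS_SLOT]        # t1 sees its own TLS mapping...
switch_to(page_dir, t2)
tls_for_t2 = page_dir[TLS_SLOT]        # ...and after a switch, t2 sees its own
```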
My 2c.
Mrvn
Life - Don't talk to me about LIFE!
So long and thanks for all the fish.
Brendan wrote: AFAIK there are no high level languages that support 32-bit segmentation...

Yes, I am aware of the non-existent support for segmentation in HLLs (even Intel's tools do not support it!). Indeed, "I cannot fathom why somebody would not want to use segments..." was meant to be a joke.
Brendan wrote: You see, different processes will be using entirely different sets of pages (in the same linear address space). The TLB isn't large enough to cover the entire linear address space, so you get TLB misses after switching processes (even though the pages are in the same linear address space).

Good point. I did not think about that.
Brendan wrote: The first advantage is that you'd be able to allocate areas that are less than 4 KB in size without wasting memory. [...] Um, ok, forget the first advantage.

Yes, the 4KB granularity is not something I like, and I did not intend to leave it at that.
Brendan wrote: The second advantage is that (in theory) programmers could use separate segments for separate purposes to improve their chance of finding bugs (e.g. dodgy pointers, exceeding array bounds, etc).

I was reading somewhere that somebody wanted to implement something like that. He wanted to place every string/linked list/array/object inside its own segment; kind of hyper-segmented. I aim somewhere between flat and hyper-segmented.
Brendan wrote: Are there any practical advantages? Are the practical advantages likely to outweigh the disadvantages?

Good question. Practical advantages? Probably none. Does there have to be? I consider this a learning project, and I like to experiment. I do not want to produce something useless, but I will not be heartbroken if that is the result, as long as I learn something.
I did have one idea that some may find interesting, though I am not sure if it is possible. Even if it is, I am unsure how useful it could be.
Consider a segmented kernel: the code and data in separate segments, and each thread/process with its own kernel stack segment. Each call/trap/interrupt gate in the IDT refers to the kernel via both a selector and an offset. Thus, I wonder how difficult it would be to swap kernels at runtime by atomically changing the kernel's code segment descriptor in the GDT. Once the descriptor has been changed, all new interrupts/system calls go to the new kernel code. I do not think anything executing during the swap would be affected, and it should continue to run and finish safely.
There are, of course, a number of problems and limitations. The new code segment would have to have the same interrupt/syscall entry points as the original. The new code would not be able to change any of the system's data structures, which limits the changes that can be made. I am unsure what would happen in an MP box with a GDT per CPU. I am also not sure how one would make sure nothing is still executing in the old code segment, so that its memory can be reclaimed.
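A toy model of the swap idea, with Python dicts standing in for the hardware tables (real descriptor changes also raise TLB/serialization questions this ignores): gates store (selector, offset), the "GDT" maps the selector to a code segment, and the selector is resolved at dispatch time, so replacing the GDT entry redirects every new interrupt while anything already dispatched keeps running the code it looked up.

```python
KERNEL_CS = 0x08                      # hypothetical kernel code selector

old_kernel = {0x100: lambda: "old syscall handler"}   # offset -> entry point
new_kernel = {0x100: lambda: "new syscall handler"}   # same entry points

gdt = {KERNEL_CS: old_kernel}
idt = {0x80: (KERNEL_CS, 0x100)}      # int 0x80 gate -> KERNEL_CS:0x100

def dispatch(vector):
    selector, offset = idt[vector]
    segment = gdt[selector]           # descriptor resolved at interrupt time
    return segment[offset]()          # same offset works in either kernel

before = dispatch(0x80)               # lands in the old kernel
gdt[KERNEL_CS] = new_kernel           # the "atomic" descriptor swap
after = dispatch(0x80)                # new interrupts land in the new kernel
```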
Thanks for the insight, good questions too. Makes one think. I like that.
mrvn wrote: You are already developing for a dying dinosaur. Amd64/Em64t doesn't have segments because nobody uses them in 32-bit mode anymore. If you base your OS on segments then you won't be able to port it to 64-bit without a major redesign.

I am well aware that x86-64 does not support segmentation in long mode, nor does almost any other architecture (at least not segmentation as Intel implemented it). Yes, I am developing for 'dying dinosaurs', but that is all I have to work with. Porting to other architectures was not part of my plan. (If somebody wanted to donate a shiny new 64-bit box, let me know! I am not picky and would settle for an Alpha.) I could use an emulator, but that is not a very good replacement for the real thing.
mrvn wrote: Given the negatives of segments in the first place, why not take them out of the picture altogether?

That seems to be the common sentiment. I only recall seeing two hobby OSs that utilize segmentation for protection purposes, but I have yet to examine them closely. I am simply exploring possibilities; nothing in my design is set in stone.
mrvn wrote: If you use one flat address space with paging for all processes you have one problem though.

I would have more than one problem, unless the goal was a single-tasking OS.
mrvn wrote: The idea of thread local storage is a nice one and you have 2 choices of implementing it:
- use a register to point to the thread local storage (segment?)
- not so flat address space: you could reserve a top-level page table entry for thread-local storage, and on each task switch reload that entry with the task's/thread's own entry.

I honestly know little about it, though I hope to learn more. I think Microsoft uses the FS register to locate TLS. And I read somewhere that AMD left the FS and GS segment registers usable in long mode so Microsoft could continue to do so. I am not sure if that is true, but Microsoft is influential enough.
- TJ
tjhastings wrote: Yes, I am aware of the non-existent support for segmentation in HLLs (even Intel's tools do not support it!). Indeed, "I cannot fathom why somebody would not want to use segments..." was meant to be a joke.

For the record, my dead OS was designed around segmentation models. I found that designing around segmentation was the right choice for me. It basically accomplished my goal of understanding the x86 processors better, and it goes hand in hand with assembly. I believe segmentation models were designed around x86 assembly anyway. If you're looking to keep things simple for educational purposes, then I do recommend segmentation over paging. I would also think that just about every one of us, no matter how much we try to prevent design changes to the kernel, will rework it at some point. Just my opinion.
Nice to see old faces still working on their OS in the forums.