OSDev.org

Posted: **Tue Dec 18, 2012 11:35 am**

At first I chose a single address space for my OS, but I have now noticed that my code modules rarely access data outside the context they run in. So what is the point of a single address space?

As I said in previous posts, I divided the memory space into two main segments, code and data; everything else is private stacks. Both code and data are shared between modules. Now I think that only code needs to be shared (for dynamic linking of modules) and data can be private.
This would simplify programming, because working with and managing local addresses is easier than doing so with global addresses.

Posted: **Tue Dec 18, 2012 11:53 am**

The main advantage of a single address space is not the ability for multiple processes to access each other's data - you can employ shared memory for this. The main benefit is that you do not have the expense of changing address spaces when switching tasks. On x86, a write to cr3 invalidates all TLB entries. Obviously with a single address space you cannot rely on the hardware isolating processes from each other.

Regards,
John.

Posted: **Tue Dec 18, 2012 12:17 pm**

jnc100 wrote:The main advantage of a single address space is not the ability for multiple processes to access each other's data - you can employ shared memory for this. The main benefit is that you do not have the expense of changing address spaces when switching tasks. On x86, a write to cr3 invalidates all TLB entries. Obviously with a single address space you cannot rely on the hardware isolating processes from each other.

So a single address space is used to reduce the cost of context switches caused by changing page tables. In x86 terms, that means every process uses the same linear address space, right?

Posted: **Tue Dec 18, 2012 12:57 pm**

Hi,

jnc100 wrote:The main benefit is that you do not have the expense of changing address spaces when switching tasks. On x86, a write to cr3 invalidates all TLB entries.

That's the theoretical reason for it.

In practice there isn't much difference between flushing the TLB and leaving the TLB full of entries for the previous task that the next task won't use. To quantify this a little: both Nehalem and Sandy Bridge have 512 TLB entries for normal 4 KiB pages. Suppose TaskA runs, then other tasks run, then TaskA runs again. If those other tasks (and the kernel itself) touch more than 2 MiB of the virtual address space (512 entries × 4 KiB), then any TLB entries that TaskA used the first time will be gone before TaskA runs again.

Basically, it only avoids TLB misses if tasks don't do much (e.g. TLB entries for TaskA are still in the TLB after other tasks have been run, because those other tasks didn't do much and touched less than 2 MiB of virtual address space). However if you've got a system where you're frequently switching between tasks that don't do much, then performance is going to be bad because of all the task switches anyway.
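The 2 MiB figure above is just the number of entries multiplied by the page size. A minimal Python sketch of that arithmetic follows, assuming the 512-entry figure quoted in the post and 4 KiB pages. The function names are made up for illustration, and the survival check is a deliberately rough model.

```python
PAGE_SIZE = 4 * 1024   # bytes per normal page (4 KiB)
TLB_ENTRIES = 512      # 4 KiB-page entries, the figure quoted in the post


def tlb_reach(entries: int = TLB_ENTRIES, page_size: int = PAGE_SIZE) -> int:
    """Bytes of virtual address space the TLB can map at once."""
    return entries * page_size


def entries_survive(bytes_touched_by_others: int) -> bool:
    """Rough model: TaskA's entries can only survive if everything that ran
    in between touched less than the TLB's reach. Real TLBs are
    set-associative, so entries may be evicted earlier than this suggests."""
    return bytes_touched_by_others < tlb_reach()


print(tlb_reach() // (1024 * 1024), "MiB")   # prints "2 MiB"
```

Anything that makes the intervening tasks touch more than `tlb_reach()` bytes means the entries would have been lost with or without a flush.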

Cheers,

Brendan

Posted: **Tue Dec 18, 2012 1:12 pm**

Hi,

Congdm wrote:So a single address space is used to reduce the cost of context switches caused by changing page tables. In x86 terms, that means every process uses the same linear address space, right?

For 80x86, that means every process uses the same virtual address space, and every process can access every other process's pages, so processes can trash each other.

I have seen someone try to fix the massive gaping security problem by changing page attributes during task switches, so that a process can only access its own pages. Of course, to change page attributes you have to invalidate the affected TLB entries, so this approach completely destroys the main (theoretical) benefit of using a single address space.

Also note that Intel added "Process-Context Identifiers (PCIDs)" in Westmere CPUs (launched in 2010). This feature allows the TLB to hold entries for several different processes at the same time, and avoids TLB flushing when you switch virtual address spaces. This means that (with a recent enough CPU and an OS that supports it) you get the same TLB benefits without using single address space.
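To illustrate why tagging TLB entries with an address-space identifier removes the need to flush, here is a toy Python model. It is only a sketch: the class and function names are hypothetical, it ignores capacity, associativity and global pages, and it does not describe how any particular CPU implements PCIDs.

```python
class ToyTLB:
    """Toy TLB whose entries are tagged with an address-space ID."""

    def __init__(self, tagged: bool):
        self.tagged = tagged
        self.entries = set()      # (asid, virtual_page) pairs
        self.current = 0
        self.misses = 0

    def switch_to(self, asid: int) -> None:
        if not self.tagged:
            self.entries.clear()  # untagged: a switch flushes everything
        self.current = asid

    def touch(self, vpage: int) -> None:
        key = (self.current, vpage)
        if key not in self.entries:
            self.misses += 1      # a page walk would be needed here
            self.entries.add(key)


def run(tagged: bool) -> int:
    """Alternate between two address spaces and count TLB misses."""
    tlb = ToyTLB(tagged)
    for _ in range(3):                # three rounds of task 1 -> task 2
        for asid in (1, 2):
            tlb.switch_to(asid)
            for vpage in range(4):    # each task touches the same 4 pages
                tlb.touch(vpage)
    return tlb.misses


print(run(tagged=False), run(tagged=True))   # prints "24 8"
```

In this model the untagged TLB misses on every page after every switch, while the tagged one only misses the first time each task touches a page.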

Cheers,

Brendan

Posted: **Tue Dec 18, 2012 1:29 pm**

But as you pointed out above, I doubt there is any observable benefit, since the chance that a complete cycle through the task list touches less than 2 MiB of memory is slim.

PCID could be used to distinguish just two contexts, the kernel and "all applications", to help the kernel and drivers stay in the TLB; but then, as with all kinds of cache, it comes with pollution problems.

Posted: **Tue Dec 18, 2012 2:12 pm**

It depends on your kernel model. If you have a microkernel where switching between servers is frequent, and just as likely as switching between user processes, you may find that relatively little is done in each time slice and/or only very small areas of memory are touched. In that case the overhead of flushing the TLB, compared with not flushing it, is no longer negligible. For example, my text mode VGA driver only touches its stack (one page), its static variables (one page) and the text mode frame buffer (one page). The PS/2 mouse/keyboard server likewise has a small footprint. If your heap is shared (per core) and contiguous then its TLB entries may also be relatively well preserved between task switches.

For a larger example, consider the effect of pressing a key when a console is active in a GUI context in a microkernel. Your messages may go something like: keyboard driver -> gui -> console window -> shell application -> console window -> gui -> video driver. These may require a task switch between each process (i.e. 6 task switches) to pass a relatively small amount of information, and it is unlikely that many TLB entries will be overwritten in this sequence.

Normally for this process you'd batch up several messages and process them at once, but if you make the overhead of task switches minimal (e.g. with a single address space) you can significantly reduce your batch size and thus improve responsiveness. Obviously this is all just guesswork, and unfortunately I'm several years away from being able to produce meaningful benchmarks.

As regards PCIDs for separating the kernel from 'all processes' you can achieve something similar on older processors (P6 and above) with the use of global pages for the kernel.

Regards,
John.

Posted: **Tue Dec 18, 2012 3:15 pm**

Hi,

jnc100 wrote:It depends on your kernel model. If you have a microkernel where switching between servers is frequent, and just as likely as switching between user processes, you may find that relatively little is done in each time slice and/or only very small areas of memory are touched. In that case the overhead of flushing the TLB, compared with not flushing it, is no longer negligible. For example, my text mode VGA driver only touches its stack (one page), its static variables (one page) and the text mode frame buffer (one page). The PS/2 mouse/keyboard server likewise has a small footprint. If your heap is shared (per core) and contiguous then its TLB entries may also be relatively well preserved between task switches. For a larger example, consider the effect of pressing a key when a console is active in a GUI context in a microkernel. Your messages may go something like: keyboard driver -> gui -> console window -> shell application -> console window -> gui -> video driver. These may require a task switch between each process (i.e. 6 task switches) to pass a relatively small amount of information, and it is unlikely that many TLB entries will be overwritten in this sequence.

Note that "unlikely that many TLB entries will be overwritten" also means "unlikely that many TLB entries will be used". For example (your example) if the keyboard driver only touches 3 pages (one code, one stack and one data) then there's a maximum of 3 TLB misses you'd be avoiding; and if these pages were recently used they're likely to be in the L2 or L3 cache and therefore these TLB misses wouldn't be as expensive as a full "fetch page tables, etc from RAM" TLB miss.

However, what was happening before the key was pressed? More likely, the CPU was busy doing some background task and the TLB originally contained none of these entries. The exact sequence may be more like:

Switch to keyboard driver. Keyboard driver touches several pages and causes several TLB misses.
Switch to GUI. GUI touches several pages and causes several TLB misses.
Switch to console window. Console window touches several pages and causes several TLB misses.
Switch to shell application. Shell application touches several pages and causes several TLB misses.
Switch to console window. Console window touches several pages and causes several TLB misses because the video output code and data is in different pages to the keyboard input code and data. Note: Here you might avoid one whole TLB miss because the stack is still in the TLB!
Switch to GUI. GUI touches several pages and causes several TLB misses because the video output code and data is in different pages to the keyboard input code and data. Note: Here you might avoid one whole TLB miss because the stack is still in the TLB!
Switch to video driver. Video driver touches many pages (I'm assuming it updates the screen, so about 1 MiB of pages copied to another 1 MiB of pages), causes many TLB misses and wipes out the entire previous contents of the TLB.
Switch back to that background task, which finds none of its old TLB entries are still present.

If you add that up, there may have been about 550 TLB misses that occur despite the single address space stuff, plus a relatively insignificant 2 TLB misses that the single address space stuff successfully avoided.
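The "about 550" total can be reproduced with a rough tally. The Python sketch below assumes that "several" means 6 misses per small task (that number is an assumption made here, not something stated in the post) and that the 1 MiB to 1 MiB copy costs one miss per 4 KiB page on each side. The misses the background task takes afterwards are not counted.

```python
PAGE_SIZE = 4 * 1024
MIB = 1024 * 1024

SEVERAL = 6  # assumption: what "several" TLB misses means for a small task

# (task, TLB misses incurred, TLB misses avoided by the single address space)
steps = [
    ("keyboard driver", SEVERAL, 0),
    ("GUI", SEVERAL, 0),
    ("console window", SEVERAL, 0),
    ("shell application", SEVERAL, 0),
    ("console window", SEVERAL, 1),   # stack page may still be in the TLB
    ("GUI", SEVERAL, 1),              # stack page may still be in the TLB
    # copy 1 MiB to another 1 MiB: one miss per source and destination page
    ("video driver", 2 * MIB // PAGE_SIZE, 0),
]

incurred = sum(misses for _, misses, _ in steps)
avoided = sum(saved for _, _, saved in steps)
print(incurred, avoided)   # prints "548 2"
```

The video driver's 512 pages dominate the total, so the exact value chosen for `SEVERAL` barely changes the picture.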

Cheers,

Brendan

Posted: **Tue Dec 18, 2012 3:40 pm**

Regardless of TLB misses, the whole point of a single address space is that you don't need a TLB and an MMU at all. Roughly, with an MMU enabled your CPU core is going to do ~1.3 times as many memory references as it would if the MMU wasn't used at all. An MMU basically slows down your system. Also, on silicon, an MMU together with a hardware page table walker takes a lot of area compared to the rest of the core (not including caches), which also increases power consumption.

Also, in a kernel with a single address space, you avoid many of the complexities in the memory manager if the MMU is removed.

The MMU might be a handy thing in some cases, but I'm not sure that adopting it, which began around the beginning of the 80s, was the right thing to do. I'm personally not too keen on the MMU; it feels similar to a flash translation layer for flash memory, which is something needed to hide the drawbacks of a technology.

Posted: **Tue Dec 18, 2012 4:46 pm**

You're forgetting that paging is quite efficient at solving the fragmentation problem, and that it doesn't need the additional complexity and bugs of a proving compiler.

Small address spaces using segmentation get the best of both worlds - if you grow out of the space, it typically means you'll be trashing too much of the TLB anyway.

Posted: **Tue Dec 18, 2012 5:08 pm**

Hi,

OSwhatever wrote:Regardless of TLB misses, the whole point of a single address space is that you don't need a TLB and an MMU at all. Roughly, with an MMU enabled your CPU core is going to do ~1.3 times as many memory references as it would if the MMU wasn't used at all.

Worst case would be random accesses where every access causes a TLB miss (which causes up to 4 extra memory fetches in long mode), resulting in up to 5 times as many memory references. Best case is "almost zero difference" (e.g. repeatedly accessing the same pages). For most software (where most of the time is spent in a small part of the code) the extra overhead of paging is typically insignificant.
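The worst and best cases above, and the "~1.3 times" figure being disputed, can be related with a simple model. This Python sketch assumes a 4-level page walk in which every level costs one memory reference, with no paging-structure caches, so it overstates the cost of a miss on real hardware. The function names are made up for illustration.

```python
def references_per_access(miss_rate: float, walk_levels: int = 4) -> float:
    """Average memory references per program access.

    Each access costs 1 reference, plus `walk_levels` extra references
    whenever the TLB misses.
    """
    return 1.0 + miss_rate * walk_levels


def miss_rate_for(factor: float, walk_levels: int = 4) -> float:
    """TLB miss rate needed to reach a given overhead factor."""
    return (factor - 1.0) / walk_levels


print(references_per_access(1.0))     # every access misses: 5.0
print(references_per_access(0.0))     # every access hits: 1.0
print(round(miss_rate_for(1.3), 3))   # miss rate implied by 1.3x: 0.075
```

Under these assumptions a 1.3x figure corresponds to a TLB miss on roughly one access in thirteen.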

OSwhatever wrote:An MMU basically slows down your system.

This is wrong. Copying large amounts of data around to cope with fragmentation issues, copying large amounts of data because you can't do "copy on write", loading data you won't ever need from disk because you can't do memory mapped files, wasting lots of RAM (and having a lot less for things like file caches) because you can't do things like allocate on demand, etc. Without an MMU performance suffers badly.

Basically, the relatively insignificant extra overhead of the MMU is dwarfed by the increased performance you get from using the MMU properly, which results in a large performance increase rather than a performance decrease.

OSwhatever wrote:Also, on silicon, an MMU together with a hardware page table walker takes a lot of area compared to the rest of the core (not including caches), which also increases power consumption.

For which toy CPU/s is the core so small and poorly optimised that the MMU's size is actually relevant? I'm guessing you're thinking of something without floating point or SIMD or out-of-order execution or anything else (maybe Z80?).

OSwhatever wrote:Also, in a kernel with a single address space, you avoid many of the complexities in the memory manager if the MMU is removed.

Wrong. Paging makes a lot of things easier because you don't need to care about processes trashing each other or which physical pages are being used. It only gets complicated when you start doing advanced things that are almost impossible without an MMU (where "complicated" is a lot less complicated than "OMG I need to write an emulator and run processes in a virtual machine just to do swap space efficiently because the CPU is worthless piece of puss").

OSwhatever wrote:The MMU might be a handy thing in some cases, but I'm not sure that adopting it, which began around the beginning of the 80s, was the right thing to do. I'm personally not too keen on the MMU; it feels similar to a flash translation layer for flash memory, which is something needed to hide the drawbacks of a technology.

Trust me - when people think the MMU is "bad" it has nothing to do with the MMU and everything to do with the person's lack of experience. They're learning, the MMU seems big and scary and they don't see the advantages or understand how the MMU can be used properly to improve performance, so they look for reasons to avoid it. It's natural.

[EDIT] I just wanted to add that not having an MMU *can* make sense for some small embedded systems - e.g. where you only have one "application" and can use fixed physical addresses for most things, and don't have any disk/storage IO. For example, for a little MIPS CPU in an ethernet switch or router you wouldn't need an MMU (and probably wouldn't even bother with malloc()/free() ). [/EDIT]

Cheers,

Brendan

Posted: **Tue Dec 18, 2012 7:09 pm**

Brendan wrote:Wrong. Paging makes a lot of things easier because you don't need to care about processes trashing each other or which physical pages are being used. It only gets complicated when you start doing advanced things that are almost impossible without an MMU (where "complicated" is a lot less complicated than "OMG I need to write an emulator and run processes in a virtual machine just to do swap space efficiently because the CPU is worthless piece of puss").

That's right. Virtual memory, be it paging or segmentation, makes things easier because managing global addresses is harder than managing each process's local addresses.

Brendan wrote:This is wrong. Copying large amounts of data around to cope with fragmentation issues, copying large amounts of data because you can't do "copy on write", loading data you won't ever need from disk because you can't do memory mapped files, wasting lots of RAM (and having a lot less for things like file caches) because you can't do things like allocate on demand, etc. Without an MMU performance suffers badly.

Brendan, can you explain why you can't do memory mapped files without virtual memory? I don't know much about it. As for fragmentation issues: in a 32-bit address space, with the amount of memory near 4 GB, paging is irrelevant, right?

Posted: **Tue Dec 18, 2012 7:51 pm**

Hi,

Congdm wrote:
Brendan wrote:This is wrong. Copying large amounts of data around to cope with fragmentation issues, copying large amounts of data because you can't do "copy on write", loading data you won't ever need from disk because you can't do memory mapped files, wasting lots of RAM (and having a lot less for things like file caches) because you can't do things like allocate on demand, etc. Without an MMU performance suffers badly.
Brendan, can you explain why you can't do memory mapped files without virtual memory? I don't know much about it. As for fragmentation issues: in a 32-bit address space, with the amount of memory near 4 GB, paging is irrelevant, right?

The normal way (for paging) is to wait until a page is accessed before allocating a page of RAM and loading the data into that page. This means that if you memory map a 12345 GiB file and only access 2 bytes of it then you only allocate 1 page of RAM and only load 1 page from disk. It also means that if you need that RAM for something else you can simply free it (and fetch the data from the file again later, if necessary).
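The load-on-first-access behaviour described above can be modelled in a few lines of Python. This is a toy sketch, not a real memory manager: `LazyMapping` and its methods are hypothetical names, a `bytes` object stands in for the file on disk, and the dictionary lookup stands in for the page fault.

```python
PAGE_SIZE = 4096


class LazyMapping:
    """Toy model of a demand-paged file mapping."""

    def __init__(self, backing: bytes):
        self.backing = backing    # stands in for the file on disk
        self.resident = {}        # page number -> bytes held in "RAM"
        self.loads = 0            # how many pages were read from "disk"

    def read_byte(self, offset: int) -> int:
        page = offset // PAGE_SIZE
        if page not in self.resident:          # models the page fault
            start = page * PAGE_SIZE
            self.resident[page] = self.backing[start:start + PAGE_SIZE]
            self.loads += 1
        return self.resident[page][offset % PAGE_SIZE]

    def evict(self, page: int) -> None:
        """Give the RAM back; the data can be re-read from the file later."""
        self.resident.pop(page, None)


mapping = LazyMapping(bytes(range(256)) * 4096)   # 1 MiB stand-in "file"
mapping.read_byte(500_000)
mapping.read_byte(500_001)
print(mapping.loads, len(mapping.resident))   # prints "1 1"
```

Two reads that fall in the same page cost a single load, no matter how large the backing file is, and `evict` shows why the RAM can be reclaimed at any time.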

With segmentation but without paging, you can't do that. You can load "x bytes" at the start of the segment or at the end of the segment and then increase the size of the segment when you get a general protection fault; but if you only read 2 bytes in the middle then this might mean loading half the massive file anyway. Increasing the size of the segment is also a huge nightmare due to fragmentation issues - for example if the application reads 1 byte at the start of the file and another byte at the end of the file, then you'll probably have to shift everything in RAM just to find a large enough piece of RAM to use for the file. For this method, the OS could still shrink the segment if it needs more free RAM but only at one end (which may not be the "least recently used" part that is less likely to be needed).
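For comparison, here is a rough Python sketch of how much data has to be resident under the "grow the segment from the start" scheme versus demand paging. It assumes the segment always begins at offset 0 of the file and can only be extended at its end; the function names and the 8 GiB file size are made up for illustration.

```python
def bytes_loaded_by_growing_segment(file_size: int, offsets: list) -> int:
    """Bytes that must be resident if the segment can only grow from offset 0.

    To make offset `o` valid, everything in [0, o] has to be inside the
    segment, so the segment ends just past the highest offset touched.
    """
    if not offsets:
        return 0
    return min(file_size, max(offsets) + 1)


def bytes_loaded_by_paging(offsets: list, page_size: int = 4096) -> int:
    """Bytes resident with demand paging: one page per distinct page touched."""
    return len({o // page_size for o in offsets}) * page_size


GIB = 1024 ** 3
size = 8 * GIB                          # hypothetical file size
touched = [size // 2, size // 2 + 1]    # two bytes in the middle of the file
print(bytes_loaded_by_growing_segment(size, touched) // GIB, "GiB vs",
      bytes_loaded_by_paging(touched), "bytes")   # prints "4 GiB vs 4096 bytes"
```

Reading two bytes in the middle forces about half the file into the segment, while paging needs a single 4 KiB page.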

Without segmentation or paging, you can't do anything except allocate a huge amount of (contiguous) RAM and load 12345 GiB of data from disk for almost no reason at all. This is obviously silly - it defeats the point of bothering with memory mapped files, and it'd be much better to make the application deal with the hassle of loading and caching what it needs (and freeing what it doesn't need when the OS says it needs some RAM back).

Cheers,

Brendan

Posted: **Tue Dec 18, 2012 9:32 pm**

Thanks, now I understand the problem. With a large memory mapped file, the physical address space feels too small, so we need to extend the address space with paging. Segmentation cannot do that: its main purpose is separation, while paging's main purpose is extension.

Posted: **Wed Dec 19, 2012 12:08 am**

Brendan wrote:Note that "unlikely that many TLB entries will be overwritten" also means "unlikely that many TLB entries will be used".

This is true for servers but not necessarily for user processes. My point was that a large number of task switches between servers may only overwrite a small number of TLB entries that have been occupied by user program data, which would generally use a much larger working set (think database or modelling app etc). In a monolithic kernel you'd isolate each process in its own address space, assuming that the large working set used by each process means that preserving TLB entries between task switches is not worth it, and that whenever the kernel is called, its pages would be marked global anyway and would therefore use global TLB entries, which wouldn't have been trashed. For a traditional microkernel you can't do this, and every switch would trash all the TLB entries of that large user process. If, on the other hand, you use a single address space with lightweight servers, then the switch through the keyboard/console/display servers etc would (hopefully) trash relatively few of the TLB entries of the user process.

Brendan wrote:Switch to video driver. Video driver touches many pages (I'm assuming it updates the screen, so about 1 MiB of pages copied to another 1 MiB of pages), causes many TLB misses and wipes out the entire previous contents of the TLB.

Whilst I agree with most of your calculations I think that any graphics driver which copies 1 MiB of pages through the CPU (rather than using hardware DMA copying) for updating a single character within one small (e.g. 16x16 pixel) area of the screen probably needs a rewrite.

Regards,
John.

What is the point of single address space?
