mapping kernel address space into user address space

Post by proxy »

A thought just occurred to me. So we implement independent address spaces by each process having its own page directory, right? And it is common practice to have the kernel address space mapped directly into the user space to simplify things... well, when the kernel does dynamic allocation, does that mean we have to iterate through all processes and add these new pages to their memory maps as well, or is there some neat trick I am unaware of?

proxy

Post by Candy »

proxy wrote: A thought just occurred to me. So we implement independent address spaces by each process having its own page directory, right? And it is common practice to have the kernel address space mapped directly into the user space to simplify things... well, when the kernel does dynamic allocation, does that mean we have to iterate through all processes and add these new pages to their memory maps as well, or is there some neat trick I am unaware of?

proxy
The neat trick is probably having all kernel-space page tables permanently mapped.

You could also use PAE; then you have a PD for each 1 GB, and since kernel space is usually exactly 1 or 2 GB, you end up with one or two PDs you can map into each new process.
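
In the non-PAE case the trick boils down to allocating every kernel page table once at boot and copying the same PDEs into each new page directory; after that, a kernel allocation only ever edits a shared page table. A rough sketch (every name below is made up for illustration, not taken from anyone's kernel):

Code: Select all

// Sketch only: allocate every kernel page table once at boot, then copy the
// same PDEs into each new page directory.  alloc_phys_page, PDE, SYSTEM_ONLY
// and friends are invented names.
#include <stdint.h>

#define PDE_COUNT        1024
#define KERNEL_PDE_FIRST  768              /* kernel mapped at 3-4 GB           */
#define PRESENT          0x001
#define SYSTEM_ONLY      0x000             /* user bit left clear               */
#define PDE(phys, flags) ((uint32_t)(phys) | (flags))

extern uint32_t alloc_phys_page(void);     /* returns a page-aligned phys addr  */

static uint32_t kernel_pde[PDE_COUNT];     /* template shared by every process  */

void init_kernel_page_tables(void) {
    /* One-time boot work: every kernel page table exists from here on, so the
       kernel PDEs never have to change again. */
    for (int i = KERNEL_PDE_FIRST; i < PDE_COUNT; i++)
        kernel_pde[i] = PDE(alloc_phys_page(), SYSTEM_ONLY | PRESENT);
}

void clone_kernel_mappings(uint32_t *new_pd) {
    /* A new process directory just reuses the shared kernel page tables; later
       kernel allocations only edit those tables, never any directory. */
    for (int i = KERNEL_PDE_FIRST; i < PDE_COUNT; i++)
        new_pd[i] = kernel_pde[i];
}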

Post by distantvoices »

hmmm ... would you suggest mapping them into an extra *kernel space page directory* too? Then the page tables of the kernel would be present in each process address space, and by updating them in the kernel page directory the mappings in the processes would be updated automagically(tm).

The only thing: it would require a page directory switch at every allocation of page frames in kernel land. Could one consider this a fair penalty for being saved the need of crawling through all the processes' page directories?

Post by Candy »

beyond infinity wrote: hmmm ... would you suggest mapping them into an extra *kernel space page directory* too? Then the page tables of the kernel would be present in each process address space, and by updating them in the kernel page directory the mappings in the processes would be updated automagically(tm).
No, not strictly. In PAE you have a PDPT that points to 4 PDs. In my setup the top one maps the kernel and the ones below it map the user space (see the sketch below). When you adjust a part of the top one, you automatically see that change everywhere it's mapped (and I intend to use 2M pages there, so the TLB entries are not shared with user space). With SMP or NUMA you'd have to send an IPI for TLB invalidation of that address, after wbinvd'ing the cache line. (Can you come up with a more complex sentence?)
The only thing: it would require a page directory switch at every allocation of page frames in kernel land. Could one consider this a fair penalty for being saved the need of crawling through all the processes' page directories?
Each time you allocate a page, it's mapped in the One True Ring - eh - page directory. You only need to wbinvd() and invlpg() it once for each processor, then every process and processor can use that page. No reloading of the PD or CR3 for this; it's easier.
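
A sketch of that PDPT layout, with invented helper names (PDPTE, alloc_zeroed_page, kernel_pd_phys) standing in for whatever your own paging code provides:

Code: Select all

// Sketch only: one shared PD for the top 1 GB (kernel), private PDs for the
// lower 3 GB of user space.
#include <stdint.h>

typedef uint64_t pdpte_t;

#define PDPTE_PRESENT 0x1ULL
#define PDPTE(phys, flags) (((uint64_t)(phys)) | (flags))

extern uint64_t alloc_zeroed_page(void);   /* returns a page-aligned phys addr   */
extern uint64_t kernel_pd_phys;            /* phys addr of the shared kernel PD  */

void init_process_pdpt(pdpte_t *pdpt) {
    /* Entries 0..2 cover 0-3 GB of user space: private, one fresh PD each. */
    for (int i = 0; i < 3; i++)
        pdpt[i] = PDPTE(alloc_zeroed_page(), PDPTE_PRESENT);

    /* Entry 3 covers 3-4 GB: every process points at the same kernel PD, so a
       change to a kernel mapping shows up in all address spaces at once
       (modulo an invlpg on each CPU). */
    pdpt[3] = PDPTE(kernel_pd_phys, PDPTE_PRESENT);
}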

Post by BI lazy »

hmmm ... I'm thinking too complicated. There'd be no need for that kernel page directory if I think correctly.

What if enough page tables were present in kernel space? You enter a page frame and, zapzarapp, it pops up in every process address space, for the kernel page tables are shared per se. You'd only need to update the process page directories when new kernel page tables are added.

Or am I talking rubbish and now near to becoming a fully qualified dodderer?

ah ... complex sentences: nay, no need for that. Let's keep 'em as simple as possible but not simpler :-)

PAE: that's this 36-bit stuff, isn't it? I'd need to read up on this.

Post by Candy »

BI lazy wrote: hmmm ... I'm thinking too complicated. There'd be no need for that kernel page directory if I think correctly.
True, in the case of non-PAE.
What if enough page tables were present in kernel space? You enter a page frame and, zapzarapp, it pops up in every process address space, for the kernel page tables are shared per se. You'd only need to update the process page directories when new kernel page tables are added.
Yes, but.

(There should be no need for an explanation, but in case it's not clear, here it is anyway): you still need invlpg, wbinvd and possibly IPIs.
PAE: that's this 36-bit stuff, isn't it? I'd need to read up on this.
That's the way to do NX protection on 64-bit AMDs (not on the 64-bit iX86-64's, they're too stupid to realise the usefulness). Also, it's required for 64-bit on AMD :). And it allows you to use a policy for the top 1G, removing the need for 64 page tables (and 256k) to be in use statically, so you can use an easy top-level map; it supports any amount of memory up to 4PB physical on 64-bit and 64GB physical on 32-bit; and you have something like 14 available bits instead of 3 - only improvements IMO :). And you can still pretend the PDPT is the same as the top PD is the same as the top PT is the same as the top page. The same goes for AMD64, but then you get another level, the PML4, above it. Still the same principle though; see my pager code for an overly obvious implementation (map.c and unmap.c).

Post by Pype.Clicker »

I don't want to be 'flaming', but isn't it a bit restrictive to ask people to have an AMD64 CPU just in order to run multiple address spaces? ... Are there major concerns that would prevent the simple "have kernel-tables mapped in all directories" approach from working?

Okay, if we have multiprocessors, removing/changing an entry in the kernel area will involve a signal being sent to the other processors so that they no longer rely on the TLB for that virtual address. But that's all I can see, and I'm not sure you can bypass this anyway...

So imho

Code: Select all

    pgEntry *KernelReference[] = {
        [DIRECTORY] = PDE(pdbr,              SYSTEM_ONLY | PRESENT),
        [KSTATIC]   = PDE(kstatic_table,     SYSTEM_ONLY | PRESENT),
        [KDYNAMIC]  = PDE(kernel_heap_table, SYSTEM_ONLY | PRESENT),
    };
would be nice. When you want to create a user directory, you will have

Code: Select all

   ud = newTable();                          /* physical frame for the new directory */
   UserDirectory = TemporaryMapped();        /* map it somewhere we can write to it  */
   UserDirectory[DIRECTORY] = PDE(ud, SYSTEM_ONLY | PRESENT);
   UserDirectory[KSTATIC]   = KernelReference[KSTATIC];   /* shared kernel mappings  */
   UserDirectory[KDYNAMIC]  = KernelReference[KDYNAMIC];

Post by Candy »

Pype.Clicker wrote: I don't want to be 'flaming', but isn't it a bit restrictive to ask people to have an AMD64 CPU just in order to run multiple address spaces? ... Are there major concerns that would prevent the simple "have kernel-tables mapped in all directories" approach from working?
You're not flaming. You're misunderstanding.

To use all those features I just named, you need an AMD64. For all of those except the NX bit, you need a PPro or higher (that is, PPro, P2, P3, P4, derivatives of those, K6-2 (afaik), Athlon, Duron, and up, things like that). Probably Efficeons too.

And even then, we also illustrated a method that works on 386 and up, in a similar fashion, including its drawbacks.
Okay, if we have multiprocessors, removing/changing an entry in the kernel area will involve a signal being sent to the other processors so that they no longer rely on the TLB for that virtual address. But that's all I can see, and I'm not sure you can bypass this anyway...
Imo:

Code: Select all

// Mapping a kernel page
pager_map_page(virt, phys, flags);
apic_ipi(IPI_INVALIDATE, virt);

Code: Select all

// Creating a new process address space; returns the physical address of the
// new top-level table so create_process() can load it later.
int create_process_space() {
    int virt = masa_alloc(1);                 /* grab a free kernel virtual page   */
    struct pte *ptes = (struct pte *)(virt << 12);
    int phys = fp_get_zeroed_page();          /* physical frame for the new table  */
    pager_map_page(virt, phys, flags);        /* 'flags' as for any kernel mapping */
#ifdef PAE
    ptes[0x1FF].p = 1;                        /* last of 512 entries: self-mapping */
    /* ... */
    ptes[0x1FF].addr = phys;
#else
    ptes[0x3FF].p = 1;                        /* last of 1024 entries: self-mapping */
    /* ... */
    ptes[0x3FF].addr = phys;
#endif
    pager_unmap_page(virt);
    return phys;
}
The last one is a function called from create_process or some such.

PS, I just came up with another thought about why in 32-bit PAE you need kernel space to be from 2-3G, but I'm going to work that out first :). Just assume my kernel screws up in PAE mode.

Post by Pype.Clicker »

You're not flaming, you're misunderstanding ...
doh. That's even worse ... I really need to get a tutorial about that "physical address extension" feature ...

Post by Brendan »

Hi,
Candy wrote: PS, I just came up with another thought about why in 32-bit PAE you need kernel space to be from 2-3G, but I'm going to work that out first :). Just assume my kernel screws up in PAE mode.
If your kernel screws up it's not likely to matter where it is ::)

If you map the kernel from 0-1 GB it will be easier to write a compatible 64-bit kernel that will run the same 32-bit software as your 32-bit OS.

Cheers,
Brendan

Post by Candy »

Brendan wrote:
Candy wrote: Just assume my kernel screws up in PAE mode.
If your kernel screws up it's not likely to matter where it is ::)
Well, yes it does in this case. The bug is that the mapping algorithm works in 32-bit non-PAE, does NOT work properly in 32-bit PAE, but works again in 64-bit PAE. The point is that the 32-bit PAE second-level table does not hold only the page tables, but a lot more. Updating them is not going to work nicely, so it bugs out with some mappings and multiple address spaces. A hard bug to find in practical tests, but easy in theory.
If you map the kernel from 0-1 GB it will be easier to write a compatible 64-bit kernel that will run the same 32-bit software as your 32-bit OS.
Why do you say that? My idea about syscalls is that the user executes a SYSCALL instruction, after which he is transferred automatically to whichever kernel is running at that moment. Using SYSRET he then returns to the normal code, running in 32-bit mode again. There's no difference whatsoever between 32- and 64-bit mode for me.

The only thing worth watching is the uasa, the user address space allocator: making sure it doesn't hand out 64-bit addresses to a 32-bit program :D. I'm considering making two different ones behind different syscalls...

Post by Brendan »

Hi,
Candy wrote:
If you map the kernel from 0-1 GB it will be easier to write a compatible 64-bit kernel that will run the same 32-bit software as your 32-bit OS.
Why do you say that?
Hmm - must've been an air bubble passing through the arteries in my brain at the time :-) There are minor reasons that are easy to avoid.

BTW - why do you say you need to use WBINVD each time you insert a page into the linear address space? I never have, and the caches use ranges in the physical address space (and are kept up to date on multi-processor systems via cache snooping).

Intel's recommended method is (system programmers manual, section 7.3):

1. Stop all other processors
2. Change the paging data
3. Invalidate TLBs on all CPUs
4. Resume processing
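
As a sketch only (the smp_* primitives below are made up, not from any real kernel), that sequence could look like:

Code: Select all

// Sketch of the stop-the-world update.  smp_pause_others() is assumed to park
// every other CPU in a spin loop via an IPI; smp_resume_others() releases them
// and has each one invlpg the address before carrying on.
#include <stdint.h>

extern void smp_pause_others(void);
extern void smp_resume_others(const void *virt);
extern void set_pte(const void *virt, uint64_t phys, unsigned flags);
extern void invlpg(const void *virt);

void update_kernel_mapping(const void *virt, uint64_t phys, unsigned flags) {
    smp_pause_others();           /* 1. stop all other processors            */
    set_pte(virt, phys, flags);   /* 2. change the paging data               */
    invlpg(virt);                 /* 3. invalidate the TLB entry locally...  */
    smp_resume_others(virt);      /* 3+4. ...and on each CPU as it resumes   */
}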


Cheers,

Brendan

Post by Candy »

Brendan wrote: Hmm - must've been an air bubble passing through the arteries in my brain at the time :-) There are minor reasons that are easy to avoid.

BTW - why do you say you need to use WBINVD each time you insert a page into the linear address space? I never have, and the caches use ranges in the physical address space (and are kept up to date on multi-processor systems via cache snooping).
You're not the only one with bubbles through the arteries :)
Intel's recommended method is (system programmers manual, section 7.3):

1. Stop all other processors
2. Change the paging data
3. Invalidate TLBs on all CPUs
4. Resume processing
I don't like abysmal performance on 4-way computers, so I really don't like disabling all processors. My idea is that you don't disable them. For kernel space, you unmap the page or map a different page (if all is well, the page shouldn't have been in use anyway, and the pager code is mutexed per page) and don't mind about the other processors. In user space you can alter the page, unmap it, or anything like that without notifying the other processors, until you pass them an invlpg for that address using some sort of directed IPI, handled by a pure-asm very fast invlpg routine that should fit in a cache line. Seems faster to me...
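
Something like the following is roughly what that directed-IPI scheme could look like; every name is invented, and a real version would need an acknowledgement from each CPU before shootdown_addr (or the freed frame) is reused:

Code: Select all

// Sketch only.  The CPU doing the unmap edits the shared kernel page table,
// flushes its own TLB entry, then sends an IPI; the handler on each other CPU
// is nothing more than an invlpg and an EOI.
extern void clear_pte(const void *virt);
extern void invlpg(const void *virt);
extern void apic_send_ipi_all_but_self(int vector);
extern void apic_eoi(void);

#define IPI_INVALIDATE 0xFD              /* arbitrary vector number            */

static const void *volatile shootdown_addr;

void kernel_unmap_page(const void *virt) {
    clear_pte(virt);                     /* pager code is mutexed per page     */
    invlpg(virt);                        /* our own TLB entry                  */
    shootdown_addr = virt;
    apic_send_ipi_all_but_self(IPI_INVALIDATE);
    /* A real implementation must wait for every CPU to acknowledge before
       shootdown_addr or the freed frame can be reused. */
}

void ipi_invalidate_handler(void) {      /* kept as small as possible          */
    invlpg(shootdown_addr);
    apic_eoi();
}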

Post by Brendan »

Hi,
Candy wrote:
Intel's recommended method is (system programmers manual, section 7.3):

1. Stop all other processors
2. Change the paging data
3. Invalidate TLBs on all CPUs
4. Resume processing
I don't like abysmal performance on 4-way computers, so I really don't like disabling all processors. My idea is that you don't disable them. For kernel space, you unmap the page or map a different page (if all is well, the page shouldn't have been in use anyway, and the pager code is mutexed per page) and don't mind about the other processors. In user space you can alter the page, unmap it, or anything like that without notifying the other processors, until you pass them an invlpg for that address using some sort of directed IPI, handled by a pure-asm very fast invlpg routine that should fit in a cache line. Seems faster to me...
Not disabling processors is a good thing (how about an 8-way Opteron motherboard? :-)) I'm a bit dubious as to the costs though... Also, "if all is well, the page shouldn't have been in use anyway" doesn't necessarily apply when you consider pages sent to swap and memory-mapped files, or space used for rapidly changing message queues.

Firstly if a page isn't present you can just map a page and flush the TLB on the local CPU. If another processor tries to access the page it might still be 'not present' in the TLB so you'd need to try INVLPG and see if that fixes things before mapping a page. INVLPG is much quicker than messing about with IPIs so we can forget about the costs involved with mapping a page.
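
As a sketch (helper names made up), the fault-handler side of that check could look like:

Code: Select all

// Sketch of the "try INVLPG before mapping" idea: if another CPU already
// mapped the page, our TLB may just hold a stale not-present entry, and a
// local INVLPG is far cheaper than allocating, mapping or sending IPIs.
extern int  pte_is_present(const void *virt);
extern void map_page(const void *virt, unsigned long phys, unsigned flags);
extern unsigned long alloc_phys_page(void);
extern void invlpg(const void *virt);

#define PTE_PRESENT  0x001
#define PTE_WRITABLE 0x002

void handle_not_present_fault(const void *fault_addr) {
    if (pte_is_present(fault_addr)) {
        invlpg(fault_addr);     /* mapping exists, only our TLB was stale   */
        return;
    }
    map_page(fault_addr, alloc_phys_page(), PTE_PRESENT | PTE_WRITABLE);
    invlpg(fault_addr);         /* flush locally; no IPIs needed for a map  */
}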

Next, if a page is unmapped in part of the address space that is only ever used by a single thread (my OS supports this & calls it "thread space") then you can forget about TLBs on the other CPU.

If a page is unmapped in part of the address space that is shared by multiple threads (my OS calls this "process space"), then you only have to worry about flushing TLB's on other CPUs if they are running threads that belong to the same process.

My processes (applications, drivers, etc.) have a flag in the executable file header that says whether the process supports multi-processor. If this flag is clear, the scheduler will not allow more than one thread from the process to be running at a time. I expect that most processes will not set this flag. Also, if the process only has one thread there's no problem. As a very rough estimate, I guess that I'll only need to worry about TLBs on other CPUs in process space about 15% of the time.

For parts of the address space that are shared by all threads (I call it "kernel space"), you have to worry about TLBs on other CPUs when unmapping a page.

My kernel rarely frees a page! All kernel pages are locked (never swapped out) and the kernel never uses memory-mapped files. Dynamic data structures (e.g. space used for thread state data, message queues, etc.) normally allocate pages when needed, but don't free them when they are no longer needed - this is done to minimize the CPU overhead of allocating pages when the OS is under load. If the OS is running low on free physical RAM it will free some of the pages used by the kernel, but that is done during idle time (when there's nothing important to do), so the overhead can be discounted.
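
A sketch of that allocate-now, free-at-idle policy (all names hypothetical, locking left out):

Code: Select all

// Sketch only: kernel subsystems park unused pages on a cache instead of
// freeing them; the idle thread hands them back to the physical allocator
// only when free RAM is getting low, so the cost (and any TLB shootdown it
// implies) happens when there is nothing better to do.
#include <stddef.h>

#define CACHE_SLOTS 1024

extern void free_phys_page(void *page);
extern int  low_on_free_ram(void);

static void  *page_cache[CACHE_SLOTS];
static size_t page_cache_count;

void kernel_page_release(void *page) {
    if (page_cache_count < CACHE_SLOTS)
        page_cache[page_cache_count++] = page;  /* keep it around for reuse */
    else
        free_phys_page(page);                   /* cache full, give it back */
}

void idle_reclaim(void) {
    /* called from the idle loop */
    while (low_on_free_ram() && page_cache_count > 0)
        free_phys_page(page_cache[--page_cache_count]);
}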

If my OS is running under a steady load (e.g. a server) the kernel won't have much (if any) overhead from TLB flushes. If the kernel is under a fluctuating load (a multi-processor desktop machine?) then there's some overhead in idle time (negligible).

In addition it's a micro-kernel (device drivers, VFS, etc are processes) - it's not going to be changing pages as much as a monolithic kernel.

Summary - IMHO some of the overhead will be avoidable and the remainder won't be very severe.


Cheers,

Brendan

Post by Candy »

Brendan wrote: Not disabling processors is a good thing (how about an 8-way Opteron motherboard? :-)) I'm a bit dubious as to the costs though... Also, "if all is well, the page shouldn't have been in use anyway" doesn't necessarily apply when you consider pages sent to swap and memory-mapped files, or space used for rapidly changing message queues.
Pages sent to swap, and unmapped pages of memory-mapped files, were not in use recently. When they are being mapped, you first load the page, then map it, and then invlpg it yourself. No problemo.
Firstly if a page isn't present you can just map a page and flush the TLB on the local CPU. If another processor tries to access the page it might still be 'not present' in the TLB so you'd need to try INVLPG and see if that fixes things before mapping a page. INVLPG is much quicker than messing about with IPIs so we can forget about the costs involved with mapping a page.
True, for mapping pages you don't have to signal anybody.
Next, if a page is unmapped in part of the address space that is only ever used by a single thread (my OS supports this & calls it "thread space") then you can forget about TLBs on the other CPU.

If a page is unmapped in part of the address space that is shared by multiple threads (my OS calls this "process space"), then you only have to worry about flushing TLB's on other CPUs if they are running threads that belong to the same process.
My OS doesn't support thread space, for instance. When unmapping a page you unmap it (1), you invalidate your own TLB entry (2), you send messages to all processors (not only those running the process's threads, see below) (3), and you continue doing whatever you were doing.

A theoretical situation with a race condition: you have a process A, a process B, and a process C. A uses 3 pages (0-2), B uses 3 pages (3-5). Process A is reading from all three of its pages on processor 0, B is reading from all three of its pages on processor 1, and C has no relevant issues other than not being A. Processor 0 switches from A to C. Processor 1 sees B needing a page and unmaps page #2 from process A's address space; the process is not active, so no IPIs are sent. Processor 0 switches back to A, which still has the old TLB entry mapping page 2 of A to the physical frame (now B's), and thereby overwrites B's data.

You want to signal all processors.
My processes (applications, drivers, etc.) have a flag in the executable file header that says whether the process supports multi-processor. If this flag is clear, the scheduler will not allow more than one thread from the process to be running at a time. I expect that most processes will not set this flag. Also, if the process only has one thread there's no problem. As a very rough estimate, I guess that I'll only need to worry about TLBs on other CPUs in process space about 15% of the time.
That's very ugly. Multiprocessor issues also occur with ALL multithreaded programs, and if a program doesn't multithread there's nothing for another processor to do with it anyway. What's the use of the bit?
If my OS is running under a steady load (e.g. a server) the kernel won't have much (if any) overhead from TLB flushes. If the kernel is under a fluctuating load (a multi-processor desktop machine?) then there's some overhead in idle time (negligible).
If it's under a steady load you'll see many processes starting and terminating, worker threads picking up tasks, and buffer pages being swapped in and out, so you'll have a lot of work on your hands.
In addition it's a micro-kernel (device drivers, VFS, etc are processes) - it's not going to be changing pages as much as a monolithic kernel.
What's different between a monolithic kernel and a microkernel that makes you say this? I actually dare say you'll keep getting TLB flushes. You might differ from a traditional monolithic kernel in that you don't load the code you never use in the first place. That doesn't make you any better though: all your processes are in separate address spaces, giving a load of overhead a monolithic kernel can beat easily. (Yes, an optimized microkernel can be faster than a non-optimized monolithic kernel, pretty *duh* if you ask me.) I'm still going for a hybrid :)