mapping kernel address space into user address space
A thought just occurred to me. So we implement independent address spaces by each process having its own page directory, right? And it is common practice to have the kernel address space mapped directly into the user space to simplify things... well, when the kernel does dynamic allocation, does that mean we have to iterate through all processes and add these new pages to their memory maps as well, or is there some neat trick I am unaware of?
proxy
Re:mapping kernel address space into user address space
proxy wrote: A thought just occurred to me. So we implement independent address spaces by each process having its own page directory, right? And it is common practice to have the kernel address space mapped directly into the user space to simplify things... well, when the kernel does dynamic allocation, does that mean we have to iterate through all processes and add these new pages to their memory maps as well, or is there some neat trick I am unaware of?
The neat trick is probably having all kernel space page tables permanently mapped.
proxy
You could also use PAE; then you have a PD for each 1GB, and since kernel space is usually exactly 1 or 2 GB, you end up with one or two PDs you can map into each new process.
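As a minimal sketch of the non-PAE version of that trick (the 3GB kernel split, alloc_page_table() and the flag values are assumptions, not anyone's real kernel here): all page tables covering kernel space are allocated once at boot, so creating a process directory is just a copy of the same PDEs, and later kernel allocations only touch the shared tables.
Code: Select all
#include <stdint.h>

#define KERNEL_PDE_FIRST 768                    /* 0xC0000000 >> 22: kernel at 3GB */
#define PDE_COUNT        1024
#define PDE_FLAGS        0x3                    /* present | writable */

extern uint32_t alloc_page_table(void);         /* assumed: phys addr of a zeroed page */

static uint32_t kernel_pde[PDE_COUNT - KERNEL_PDE_FIRST];

/* Boot: reserve a page table for every kernel PDE slot, once and for all. */
void init_kernel_page_tables(void)
{
    for (int i = 0; i < PDE_COUNT - KERNEL_PDE_FIRST; i++)
        kernel_pde[i] = alloc_page_table() | PDE_FLAGS;
}

/* Process creation: the kernel half of the new directory is a plain copy,
 * so kernel heap growth never has to walk every process again. */
void fill_kernel_half(uint32_t *page_directory)
{
    for (int i = KERNEL_PDE_FIRST; i < PDE_COUNT; i++)
        page_directory[i] = kernel_pde[i - KERNEL_PDE_FIRST];
}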
-
- Member
- Posts: 1600
- Joined: Wed Oct 18, 2006 11:59 am
- Location: Vienna/Austria
Re:mapping kernel address space into user address space
hmmm ... would you suggest mapping them into an extra *kernel space page directory* too? Then the page tables of the kernel would be present in each process address space, and by updating them in the kernel page directory the mappings in the processes would be updated automagically(tm).
The only thing: it would require a page dir switch at every allocation of page frames in kernel land. Could one consider this a fair penalty for being saved the need of crawling through all the processes' page directories?
... the osdever formerly known as beyond infinity ...
BlueillusionOS iso image
Re:mapping kernel address space into user address space
beyond infinity wrote: hmmm ... would you suggest mapping them into an extra *kernel space page directory* too? Then the page tables of the kernel would be present in each process address space, and by updating them in the kernel page directory the mappings in the processes would be updated automagically(tm).
No, not strictly. In PAE you have a PDPT that points to four PDs. In my setup the top one maps the kernel and the ones below it map user space. When you adjust a part of the top one, you automatically see the change in every place it is mapped (and I intend to use 2M pages there, so the TLB entries are not shared with user space). With SMP or NUMA you'd have to send an IPI for TLB invalidation of that address, after wbinvd'ing the cache line. (Can you come up with a more complex sentence?)
beyond infinity wrote: The only thing: it would require a page dir switch at every allocation of page frames in kernel land. Could one consider this a fair penalty for being saved the need of crawling through all the processes' page directories?
Each time you allocate a page, it's mapped in the One True Ring - eh - page directory. You only need to wbinvd() and invlpg() it once for each processor, and then every process and processor can use that page. No reloading of the PD or CR3 for this; it's easier.
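A hedged sketch of that PAE layout: every process gets its own PDPT, but entry 3 - the top gigabyte - points at one shared kernel page directory, so a kernel mapping made there shows up in every address space at once. alloc_zeroed_page(), map_temp()/unmap_temp() and kernel_pd_phys are assumed helpers, not anybody's real API.
Code: Select all
#include <stdint.h>

#define PDPTE_PRESENT 0x1ULL

extern uint64_t alloc_zeroed_page(void);        /* assumed: phys addr of a zeroed 4K page */
extern void    *map_temp(uint64_t phys);        /* assumed temporary mapping helpers      */
extern void     unmap_temp(void *virt);

extern uint64_t kernel_pd_phys;                 /* the one kernel PD, allocated at boot   */

/* Build a new PAE address space: three private user PDs, one shared kernel PD. */
uint64_t create_address_space(void)
{
    uint64_t  pdpt_phys = alloc_zeroed_page();
    uint64_t *pdpt      = map_temp(pdpt_phys);

    pdpt[0] = alloc_zeroed_page() | PDPTE_PRESENT;  /* user, 0-1GB */
    pdpt[1] = alloc_zeroed_page() | PDPTE_PRESENT;  /* user, 1-2GB */
    pdpt[2] = alloc_zeroed_page() | PDPTE_PRESENT;  /* user, 2-3GB */
    pdpt[3] = kernel_pd_phys      | PDPTE_PRESENT;  /* kernel, 3-4GB, shared by everyone */

    unmap_temp(pdpt);
    return pdpt_phys;                               /* value to load into CR3 */
}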
Re:mapping kernel address space into user address space
hmmm ... I'm thinking complicated. There'd be no need for that kernel page directory if I think correctly.
What if enough page tables were present in kernel space? You enter a page frame and zapzarapp, it pops up in every process address space - for the kernel page tables are shared per se. You'd only need to update the process page directories when new page tables are added to the kernel.
Or am I talking rubbish and now near to becoming a fully qualified dodderer?
Ah ... complex sentences: nay, no need for that. Let's keep 'em as simple as possible, but not simpler.
PAE: that's this 36-bit stuff, isn't it? I'd need to research that.
Re:mapping kernel address space into user address space
BI lazy wrote: hmmm ... I'm thinking complicated. There'd be no need for that kernel page directory if I think correctly.
True, in the non-PAE case.
BI lazy wrote: What if enough page tables were present in kernel space? You enter a page frame and zapzarapp, it pops up in every process address space - for the kernel page tables are shared per se. You'd only need to update the process page directories when new page tables are added to the kernel.
Yes, but (there should be no need for an explanation, but in case it's not clear, here it is anyway): you still need invlpg, wbinvd and possibly IPIs.
BI lazy wrote: PAE: that's this 36-bit stuff, isn't it? I'd need to research that.
It's the way to do NX protection on 64-bit AMDs (not on the 64-bit iX86-64s, they're too stupid to realise the usefulness), and it's required for 64-bit mode on AMD anyway. It also allows you to use a policy for the top 1G, removing the need for 64 page tables (256k) being statically in use just so you can have an easy top-level map; it supports any amount of physical memory up to 4PB on 64-bit and 64GB on 32-bit; and you get something like 14 available bits instead of 3 - only improvements, IMO. And you can still pretend the PDPT is the same as the top PD is the same as the top PT is the same as the top page. The same goes for AMD64, but then you get another level, the PML4, above it. Still the same principle though; see my pager code for an overly obvious implementation (map.c and unmap.c).
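For reference, a sketch of the 64-bit entry format being described: the NX bit at the very top and the roughly 14 software-available bits (3 low + 11 high) versus 3 in the plain 32-bit format. The field layout follows the architecture manuals, but the struct itself is only illustrative and assumes a compiler that packs bitfields this way.
Code: Select all
#include <stdint.h>

struct pae_pte {
    uint64_t present    : 1;   /* bit 0                              */
    uint64_t writable   : 1;   /* bit 1                              */
    uint64_t user       : 1;   /* bit 2                              */
    uint64_t pwt        : 1;   /* bit 3, write-through               */
    uint64_t pcd        : 1;   /* bit 4, cache disable               */
    uint64_t accessed   : 1;   /* bit 5                              */
    uint64_t dirty      : 1;   /* bit 6                              */
    uint64_t pat        : 1;   /* bit 7                              */
    uint64_t global     : 1;   /* bit 8                              */
    uint64_t avail_low  : 3;   /* bits 9-11, free for the OS         */
    uint64_t frame      : 40;  /* bits 12-51, physical frame number  */
    uint64_t avail_high : 11;  /* bits 52-62, free for the OS        */
    uint64_t nx         : 1;   /* bit 63, no-execute where supported */
};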
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
Re:mapping kernel address space into user address space
I don't want to be 'flaming', but isn't it a bit restrictive to ask people to have an AMD64 CPU just in order to run multiple address spaces? ... Are there major concerns that would prevent the simple "have kernel tables mapped in all directories" approach from working?
Okay, if we have multiprocessors, removing/changing an entry in the kernel area will involve a signal being sent to the other processors so that they no longer rely on the TLB for that virtual address. But that's all I can see, and I'm not sure you can bypass this anyway...
So imho, something like the following would be nice:
Code: Select all
pgEntry *KernelReference[] = {
    [DIRECTORY] = PDE(pdbr, SYSTEM_ONLY | PRESENT),              /* self-reference to the directory */
    [KSTATIC]   = PDE(kstatic_table, SYSTEM_ONLY | PRESENT),     /* static kernel mappings          */
    [KDYNAMIC]  = PDE(kernel_heap_table, SYSTEM_ONLY | PRESENT), /* dynamic kernel heap             */
};
When you want to create a user directory, you will then have:
Code: Select all
ud = newTable();                      /* physical page for the new user directory */
UserDirectory = TemporaryMapped(ud);  /* map it somewhere we can write to it      */
UserDirectory[DIRECTORY] = PDE(ud, SYSTEM_ONLY | PRESENT);  /* self-reference     */
UserDirectory[KSTATIC]   = KernelReference[KSTATIC];        /* shared kernel tables */
UserDirectory[KDYNAMIC]  = KernelReference[KDYNAMIC];
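To tie this back to proxy's original question, a sketch of what a kernel heap allocation would then look like under this scheme - only the shared table behind KDYNAMIC is touched, never a per-process directory. kernel_heap_table[] here stands for that same table and is assumed to be mapped writable in kernel space; PTE() is assumed by analogy with the PDE() macro above.
Code: Select all
extern pgEntry kernel_heap_table[1024];   /* the shared table referenced by KDYNAMIC */

void kheap_map_page(unsigned long vaddr, unsigned long paddr)
{
    unsigned idx = (vaddr >> 12) & 0x3FF;                     /* PTE slot within the table */
    kernel_heap_table[idx] = PTE(paddr, SYSTEM_ONLY | PRESENT);
    __asm__ volatile("invlpg (%0)" :: "r"(vaddr) : "memory"); /* flush the local TLB entry */
    /* on SMP, also send the invalidation IPI discussed in the replies below */
}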
Re:mapping kernel address space into user address space
Pype.Clicker wrote: I don't want to be 'flaming', but isn't it a bit restrictive to ask people to have an AMD64 CPU just in order to run multiple address spaces? ... Are there major concerns that would prevent the simple "have kernel tables mapped in all directories" approach from working?
You're not flaming. You're misunderstanding.
To use all those features I just named, you need an AMD64. For all of those, but without the NX bit, you need a PPro or higher (that is, PPro, P2, P3, P4, derivatives of those, K6-2 (afaik), Athlon, Duron, and up - things like that; probably Efficeons too).
And even then, we also illustrated a method that works on 386+, in a similar fashion, including its drawbacks.
Pype.Clicker wrote: Okay, if we have multiprocessors, removing/changing an entry in the kernel area will involve a signal being sent to the other processors so that they no longer rely on the TLB for that virtual address. But that's all I can see, and I'm not sure you can bypass this anyway...
Imo:
Code: Select all
// Mapping a kernel page
pager_map_page(virt, phys, flags);
apic_ipi(IPI_INVALIDATE, virt);
Code: Select all
int create_process_space() {
    // creating a new process: allocate one page for its top-level table
    int virt = masa_alloc(1);                      // kernel virtual page number
    struct pte *ptes = (struct pte *)(virt << 12); // where we will edit the new table
    int phys = fp_get_zeroed_page();
    pager_map_page(virt, phys, flags);
    // last entry points back at the table itself (self/kernel mapping)
#ifdef PAE
    ptes[0x1FF].p = 1;
    ...
    ptes[0x1FF].addr = phys;
#else
    ptes[0x3FF].p = 1;
    ...
    ptes[0x3FF].addr = phys;
#endif
    pager_unmap_page(virt);
    return phys;                                   // physical address for the new CR3
}
PS: just came up with another thought about why in 32-bit PAE you need kernel space to be from 2-3G, but I'm going to work that out first. Just assume my kernel screws up in PAE mode.
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
Re:mapping kernel address space into user address space
Candy wrote: You're not flaming. You're misunderstanding ...
Doh. That's even worse ... I really need to get a tutorial about that "physical address extensions" feature ...
Re:mapping kernel address space into user address space
Hi,
Candy wrote: PS: just came up with another thought about why in 32-bit PAE you need kernel space to be from 2-3G, but I'm going to work that out first. Just assume my kernel screws up in PAE mode.
If your kernel screws up it's not likely to matter where it is ::)
If you map the kernel from 0 - 1GB it will be easier to write a compatible 64-bit kernel that will run the same 32-bit software as your 32-bit OS.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re:mapping kernel address space into user address space
Brendan wrote: If your kernel screws up it's not likely to matter where it is ::)
Well, yes it does in this case. The bug is that mapping a page with the algorithm above does work in 32-bit non-PAE, does NOT work properly in 32-bit PAE, but works again in 64-bit PAE. The point is that the 32-bit PAE second-level table does not hold only the page tables, but a lot more. Updating it is not going to work nicely, so it bugs out with some mappings and multiple address spaces. A hard bug to find in practical tests, but easy in theory.
Brendan wrote: If you map the kernel from 0 - 1GB it will be easier to write a compatible 64-bit kernel that will run the same 32-bit software as your 32-bit OS.
Why do you say that? My idea about syscalls is that the user executes a =SYSCALL= instruction, after which he is transferred automatically to the kernel running at that moment. Using a SYSRET he then returns to the normal code, running in 32-bit mode again. There's no difference whatsoever between 32- and 64-bit mode for me.
The only thing worth watching is the uasa, the user address space allocator, so that it doesn't hand out 64-bit addresses to a 32-bit program. I'm considering making two different ones at different syscalls...
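A hedged sketch of the MSR setup behind that =SYSCALL=/SYSRET idea on AMD64: LSTAR holds the entry point taken from 64-bit callers, CSTAR the one taken from 32-bit (compatibility mode) callers, and SYSRET without a 64-bit operand drops straight back into 32-bit code. The MSR numbers and STAR layout follow the architecture manuals; the selector values, entry stubs and wrmsr() helper are assumptions.
Code: Select all
#include <stdint.h>

#define MSR_STAR   0xC0000081   /* SYSCALL/SYSRET segment selectors      */
#define MSR_LSTAR  0xC0000082   /* entry RIP for 64-bit callers          */
#define MSR_CSTAR  0xC0000083   /* entry RIP for 32-bit (compat) callers */
#define MSR_SFMASK 0xC0000084   /* RFLAGS bits cleared on entry          */

#define KERNEL_CS  0x08         /* assumed GDT layout */
#define USER_CS32  0x18         /* assumed GDT layout */

extern void syscall_entry64(void), syscall_entry32(void);  /* assumed entry stubs */

static inline void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ volatile("wrmsr" :: "c"(msr), "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
}

void init_syscall(void)
{
    wrmsr(MSR_STAR, ((uint64_t)USER_CS32 << 48) | ((uint64_t)KERNEL_CS << 32));
    wrmsr(MSR_LSTAR,  (uint64_t)(uintptr_t)syscall_entry64);
    wrmsr(MSR_CSTAR,  (uint64_t)(uintptr_t)syscall_entry32);
    wrmsr(MSR_SFMASK, 0x200);   /* mask IF: enter the kernel with interrupts off */
}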
Re:mapping kernel address space into user address space
Hi,
Brendan wrote: If you map the kernel from 0 - 1GB it will be easier to write a compatible 64-bit kernel that will run the same 32-bit software as your 32-bit OS.
Candy wrote: Why do you say that?
Hmm - must've been an air bubble passing through the arteries in my brain at the time. There are minor reasons that are easy to avoid.
BTW - why do you say you need to use WBINVD each time you insert a page into the linear address space? I never have, and the caches use ranges in the physical address space (and are kept up to date on multi-processor systems via cache snooping).
Intel's recommended method is (system programmers manual, section 7.3):
1. Stop all other processors
2. Change the paging data
3. Invalidate TLBs on all CPUs
4. Resume processing
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re:mapping kernel address space into user address space
Brendan wrote: Hmm - must've been an air bubble passing through the arteries in my brain at the time. There are minor reasons that are easy to avoid.
You're not the only one with bubbles through the arteries.
Brendan wrote: BTW - why do you say you need to use WBINVD each time you insert a page into the linear address space? I never have, and the caches use ranges in the physical address space (and are kept up to date on multi-processor systems via cache snooping). Intel's recommended method is (system programmers manual, section 7.3):
1. Stop all other processors
2. Change the paging data
3. Invalidate TLBs on all CPUs
4. Resume processing
I don't like abysmal performance on 4-way computers, so I really don't like disabling all processors. My idea is that you don't disable all processors. For kernel space you unmap the page / map a different page (if all is well, the page shouldn't have been in use anyway, and the pager code is mutexed per page) and don't mind about the other processors. In user space you can alter the page / unmap it / anything like that without notifying the other processors, until you pass them an invlpg for that address using some sort of directed IPI, handled by a pure-asm, very fast invlpg routine that should fit in a cache line. Seems faster to me...
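A rough sketch of that directed-IPI idea (not Candy's actual code), with the caveat that a real version needs a lock around the shared address slot and an acknowledgement before the old physical page can be reused; shootdown_addr, send_ipi_all_but_self(), apic_eoi() and the vector number are assumptions, not any real API.
Code: Select all
#include <stdint.h>

#define INVLPG_VECTOR 0xFD                          /* assumed free interrupt vector */

extern void send_ipi_all_but_self(uint8_t vector);  /* assumed APIC helpers */
extern void apic_eoi(void);

static volatile uintptr_t shootdown_addr;

/* Sender: edit the page tables first, then tell everyone which address died. */
void tlb_shootdown(uintptr_t virt)
{
    shootdown_addr = virt;
    send_ipi_all_but_self(INVLPG_VECTOR);
    __asm__ volatile("invlpg (%0)" :: "r"(virt) : "memory");   /* local flush */
}

/* Receiver: the whole handler body is one invlpg plus the EOI. */
void invlpg_ipi_handler(void)
{
    __asm__ volatile("invlpg (%0)" :: "r"(shootdown_addr) : "memory");
    apic_eoi();
}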
Re:mapping kernel address space into user address space
Hi,
Candy wrote: I don't like abysmal performance on 4-way computers, so I really don't like disabling all processors. My idea is that you don't disable all processors ... Seems faster to me...
Not disabling processors is a good thing (how about an 8-way Opteron motherboard). I'm a bit dubious as to the costs though... Also, "if all is well, the page shouldn't have been in use anyway" doesn't necessarily apply when you consider pages sent to swap and memory-mapped files, or space used for rapidly changing message queues.
Firstly, if a page isn't present you can just map a page and flush the TLB on the local CPU. If another processor tries to access the page it might still be 'not present' in its TLB, so you'd need to try INVLPG and see if that fixes things before mapping a page. INVLPG is much quicker than messing about with IPIs, so we can forget about the costs involved with mapping a page.
Next, if a page is unmapped in a part of the address space that is only ever used by a single thread (my OS supports this & calls it "thread space") then you can forget about TLBs on the other CPUs.
If a page is unmapped in a part of the address space that is shared by multiple threads (my OS calls this "process space"), then you only have to worry about flushing TLBs on other CPUs if they are running threads that belong to the same process.
My processes (applications, drivers, etc.) have a flag in the executable file header that says whether the process supports multi-processor. If this flag is clear the scheduler will not allow more than one thread from the process to be running. I expect that most processes will not set this flag. Also, if the process only has one thread there's no problem. As a very rough estimate, I guess that I'll only need to worry about TLBs on other CPUs in process space about 15% of the time.
For parts of the address space that are shared by all threads (I call it "kernel space") you have to worry about TLBs on other CPUs when unmapping a page.
My kernel rarely frees a page! All kernel pages are locked (never swapped out) and the kernel never uses memory-mapped files. Dynamic data structures (e.g. space used for thread state data, message queues, etc.) normally allocate pages when needed, but don't free them when they are no longer needed - this is done to minimize the CPU overhead of allocating pages when the OS is under load. If the OS is running low on free physical RAM it will free some of the pages used by the kernel, but this is done during idle time (when there's nothing important to do) so the overhead can be discounted.
If my OS is running under a steady load (e.g. a server) the kernel won't have any/much overhead from TLB flushes. If the kernel is under a fluctuating load (a multi-processor desktop machine?) then there's some overhead in idle time (negligible).
In addition it's a micro-kernel (device drivers, VFS, etc. are processes) - it's not going to be changing pages as much as a monolithic kernel.
Summary - IMHO some of the overhead will be avoidable and the remainder won't be very severe.
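A sketch of that decision (not Brendan's actual code), under assumed structures, just to make the three cases explicit: mp_allowed stands for the executable-header flag and running_elsewhere for a count the scheduler would have to maintain.
Code: Select all
#include <stdbool.h>

enum region { THREAD_SPACE, PROCESS_SPACE, KERNEL_SPACE };

struct process {
    bool mp_allowed;          /* the flag from the executable header      */
    int  running_elsewhere;   /* threads of this process on other CPUs    */
};

/* Do we need to interrupt other CPUs when unmapping a page in this region? */
bool need_remote_shootdown(enum region r, const struct process *proc)
{
    switch (r) {
    case THREAD_SPACE:        /* only ever mapped by one thread           */
        return false;
    case PROCESS_SPACE:       /* only if the process really runs elsewhere */
        return proc->mp_allowed && proc->running_elsewhere > 0;
    case KERNEL_SPACE:        /* shared by everything                     */
    default:
        return true;
    }
}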
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re:mapping kernel address space into user address space
Brendan wrote: Not disabling processors is a good thing (how about an 8-way Opteron motherboard). I'm a bit dubious as to the costs though... Also, "if all is well, the page shouldn't have been in use anyway" doesn't necessarily apply when you consider pages sent to swap and memory-mapped files, or space used for rapidly changing message queues.
Pages sent to swap, and unmapped pages of memory-mapped files, were not in use recently. When they are being mapped, you first load the page, then map it and then invlpg yourself. No problemo.
Brendan wrote: Firstly, if a page isn't present you can just map a page and flush the TLB on the local CPU. ... INVLPG is much quicker than messing about with IPIs, so we can forget about the costs involved with mapping a page.
True, for mapping pages you don't have to signal anybody.
Brendan wrote: Next, if a page is unmapped in a part of the address space that is only ever used by a single thread (my OS supports this & calls it "thread space") then you can forget about TLBs on the other CPUs. If a page is unmapped in a part of the address space that is shared by multiple threads (my OS calls this "process space"), then you only have to worry about flushing TLBs on other CPUs if they are running threads that belong to the same process.
My OS doesn't support thread space, for instance. When unmapping a page you unmap it (1), you clear your own cache (2), you send messages to all processors (not only those running the process's threads - see below) (3), and you continue doing whatever you were doing.
Theoretical situation with a race condition: you have a process A, a process B and a process C. A uses 3 pages (0-2), B uses 3 pages (3-5). Process A is reading from all three of its pages on processor 0, B is reading from all three of its pages on processor 1, and C has no relevant issues other than not being A. Processor 0 switches from A to C; processor 1 sees B needing a page and unmaps page #2 from process A's space - the process isn't active anywhere, so no IPIs. Processor 0 switches back to A, still using the old TLB entry mapping page 2 of A to the physical page that was just handed to B, and thereby overwrites B's data.
You want to signal all processors.
Brendan wrote: My processes (applications, drivers, etc.) have a flag in the executable file header that says whether the process supports multi-processor. If this flag is clear the scheduler will not allow more than one thread from the process to be running. ...
That's very ugly. Multiprocessor issues also occur with ALL multithreading programs, and if you don't multithread there's nothing the other processor can do about it anyway. What's the use of the bit?
Brendan wrote: If my OS is running under a steady load (e.g. a server) the kernel won't have any/much overhead from TLB flushes. ...
If it's under a steady load you'll see many processes starting and terminating, worker threads picking up tasks, and buffer pages being swapped in and out, so you'll have a lot of work on your hands.
Brendan wrote: In addition it's a micro-kernel (device drivers, VFS, etc. are processes) - it's not going to be changing pages as much as a monolithic kernel.
What's different between a monolithic kernel and a microkernel that makes you say this? I actually dare say you'll keep getting TLB flushes. You might differ from a traditional monolithic kernel in that you don't load the code you never use in the first place. That doesn't make you any better though: all your processes are in separate pages, giving a load of overhead a monolithic kernel can beat easily. (Yes, an optimized microkernel can be faster than a non-optimized monolithic kernel - pretty *duh* if you ask me.) I'm still going for hybrid.