Virtual mm solutions
Posted: Thu Mar 02, 2006 12:55 pm
by rootel77
I'm currently learning about virtual memory management, and I'm looking at various solutions to the following problem:
- Suppose we have paging enabled. To map virtual pages to page frames, the kernel must be able to access the page directory and the page tables, and to be accessible, those structures must be somewhere in the kernel virtual address space.
Here are the solutions I've found:
- 1st solution: maintain a fixed relation between physical addresses and virtual addresses. This is the Linux way: identity map the entire physical memory, with or without an offset translation. The kernel can then easily access physical memory and thus modify page tables. However, the amount of physical memory that can be mapped directly is limited by the kernel virtual address space (minus the virtual space needed for other purposes, like memory-mapped devices), and the kernel can't directly access the physical memory above that limit. The second problem is that, since physical memory is mapped one-to-one, we can't build contiguous virtual regions from fragmented physical page frames, so we have to implement some contiguous-space allocation in the physical allocator. (A small sketch of the offset translation appears at the end of this post.)
- 2nd solution: maintain a relation between the page directory's virtual address and the page tables' virtual addresses. The most common variant is "mirroring", i.e. place the page directory and all the page tables in the same 4 MB window, and treat the page directory as the page table that maps its own frame and all the other page tables' frames. This solution allows mapping non-contiguous physical frames to contiguous virtual pages. The catch is that when a kernel-mode thread running inside the address space of a user process modifies the kernel entries of the page directory, the page directories of all other processes must be synchronized.
All the solutions I've found implement either the 1st or the 2nd approach, and I'd like to know if someone knows another one.
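For illustration, here is a minimal sketch of the offset translation used in the 1st solution. PHYS_MAP_BASE and the 32-bit address width are assumptions for the example, not taken from any particular kernel.
Code:
#include <stdint.h>

/* Hypothetical base of the direct physical-memory mapping in kernel space. */
#define PHYS_MAP_BASE 0xC0000000u

/* Translate between a physical address and the kernel virtual address where
   that memory is permanently mapped. Only valid for physical addresses below
   the size of the direct-mapped window. */
static inline void *phys_to_virt(uint32_t paddr)
{
    return (void *)(uintptr_t)(paddr + PHYS_MAP_BASE);
}

static inline uint32_t virt_to_phys(void *vaddr)
{
    return (uint32_t)((uintptr_t)vaddr - PHYS_MAP_BASE);
}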
Re:Virtual mm solutions
Posted: Thu Mar 02, 2006 1:22 pm
by FlashBurn
There is another method, self-mapping. You use the first 3 GiB for userspace and the last GiB for kernel space. Then you use, e.g., the last 4 MiB for the page tables and another 4 KiB address for the page directory, so that you can access the page tables from the kernel space of the currently running process.
To map, for example, the first 4 KiB page in the current address space, you do the following (a rough sketch in C follows the steps):
-> write the physical address of the page table into pd(0)
-> write the physical address of the page table into pt(0) of pd(1023)
-> now write the physical address of the page into the first 4 bytes of the last 4 MiB
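A rough sketch of such a mapping operation, assuming the common recursive variant where pd(1023) points back at the page directory itself; the addresses, flags and helper name below are illustrative only.
Code:
#include <stdint.h>

/* With pd(1023) pointing back at the page directory itself:
     0xFFC00000..0xFFFFFFFF  window onto every page table of this space
     0xFFFFF000..0xFFFFFFFF  the page directory itself, seen as a page   */
#define CUR_PAGE_DIR    ((volatile uint32_t *)0xFFFFF000u)
#define CUR_PAGE_TABLES ((volatile uint32_t *)0xFFC00000u)

#define PAGE_PRESENT 0x1u
#define PAGE_WRITE   0x2u

/* Map one 4 KiB page in the currently loaded address space. pt_phys is only
   used when the page table does not exist yet (a real implementation would
   also clear the fresh table before using it). */
static void map_page(uint32_t vaddr, uint32_t page_phys, uint32_t pt_phys)
{
    uint32_t pd_index = vaddr >> 22;
    uint32_t pt_index = vaddr >> 12;   /* index into the whole 4 MiB window */

    if (!(CUR_PAGE_DIR[pd_index] & PAGE_PRESENT)) {
        /* Step 1: install the page table; it instantly becomes visible
           inside the 0xFFC00000 window through the recursive entry. */
        CUR_PAGE_DIR[pd_index] = pt_phys | PAGE_PRESENT | PAGE_WRITE;
    }

    /* Step 2: write the page's physical address into the page table entry,
       addressed through the window. */
    CUR_PAGE_TABLES[pt_index] = page_phys | PAGE_PRESENT | PAGE_WRITE;

    /* Flush any stale TLB entry for this virtual address. */
    __asm__ volatile ("invlpg (%0)" :: "r"(vaddr) : "memory");
}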
I hope I could explain it so that you can understand it. If not, ask about what you haven't understood.
It's too late to go into more detail!
Re:Virtual mm solutions
Posted: Thu Mar 02, 2006 1:36 pm
by rootel77
I think this falls into the 2nd solution, specifically mirroring: the 4 MB window used for mapping the page tables is determined by the index you choose for the page directory entry that maps the directory itself. If you choose index 0, you must use the first 4 MB window and entry 0 of the page directory maps the directory itself. If you choose the last index, then you must place the 4 MB window at the top of the 4 GB space and the page directory becomes the last "page table" in that window.
Re:Virtual mm solutions
Posted: Thu Mar 02, 2006 10:38 pm
by Brendan
Hi,
rootel77 wrote:All the solutions I've found implement either the 1st or the 2nd approach, and I'd like to know if someone knows another one.
There is at least one other method...
3) Temporarily map what you need into kernel space, make changes and then unmap it. For example, if you want to add a page to a page table you'd get CR3, map the 4 KB containing the page directory somewhere, find the physical address of the page table, map the page table at the same place, then add your new page to it and unmap the page table.
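A rough sketch of this temporary-mapping approach, assuming a reserved scratch page in kernel space and hypothetical helpers kmap_temp()/kunmap_temp() that point it at an arbitrary physical frame:
Code:
#include <stdint.h>

#define PAGE_PRESENT 0x1u
#define PAGE_WRITE   0x2u

/* Hypothetical helpers: map/unmap one physical frame at a fixed scratch
   virtual address reserved in kernel space (flushing its TLB entry). */
void *kmap_temp(uint32_t frame_phys);
void  kunmap_temp(void *vaddr);

/* Add one page to an address space that is not necessarily the current one
   (method 3): temporarily map the page directory, then the page table,
   make the change, and unmap again. */
void add_page_to_space(uint32_t pd_phys, uint32_t vaddr, uint32_t page_phys)
{
    uint32_t *pd = kmap_temp(pd_phys);
    uint32_t pde = pd[vaddr >> 22];
    kunmap_temp(pd);

    /* Assumes the page table already exists; a real implementation would
       allocate one here if (pde & PAGE_PRESENT) were clear. */
    uint32_t *pt = kmap_temp(pde & ~0xFFFu);
    pt[(vaddr >> 12) & 0x3FF] = page_phys | PAGE_PRESENT | PAGE_WRITE;
    kunmap_temp(pt);
}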
This is complicated and slow, but allows the OS to change address spaces that aren't currently being used (where a simple implementation of method 2 doesn't).
For example, imagine you need to add a new page table to kernel space. The new page table needs to be put into every address space because the kernel is in every address space. For method 2 (where the page directory is inserted as a page directory entry for example) this doesn't work because there's no way to access the page directories for other address spaces.
The tricks used to get around this problem depend on the type of paging being used.
For "32 bit paging", I put the page directory as the highest page directory entry, so that the highest 4 MB becomes a page table mapping. Then I also map every page directory into kernel space as a normal page. This creates a table of page directory contents (4 KB per address space), so that I can add a new page table to every page directory without problems.
For PAE it's different because there are 4 separate page directories (CR3 points to a PDPT or "page directory pointer table" with 4 entries). This actually makes things much easier - I have one "kernel page directory" that is used for all address spaces (i.e. put into every page directory pointer table). Because of this there is no problem with adding kernel page tables. Here, I map the kernel's page directory as the highest kernel page directory entry (creating a 2 MB kernel page table map at the end of address space), and then map the user level page directories into the highest user level page directory entries (creating a 6 MB user level page table mapping from 0xBFA00000 to 0xBFFFFFFF).
For long mode there's multiple page directory pointer tables and a "PML4" (page map level 4 table), where CR3 contains the address of the PML4. In this case I set the highest PML4 entry to the address of the PML4, which creates a 512 GB page table mapping for the current address space. The problem here is adding a new page directory pointer table in kernel space (it's the same problem as adding a new page table in kernel space with 32 bit paging). Here I ignore the problem and hope that the kernel will never need more than one page directory pointer table. This restricts the kernel to 512 GB (it could use up to 128 TB if I allowed it to add new kernel PDPTs), which IMHO is large enough.
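A small sketch of that long-mode self-mapping, assuming the recursive slot is PML4 entry 511; the helper names are made up for the example.
Code:
#include <stdint.h>

#define PAGE_PRESENT 0x1ull
#define PAGE_WRITE   0x2ull

/* Point the highest PML4 entry back at the PML4 itself, so the paging
   structures of this address space become visible in the top 512 GB of
   the virtual address space. pml4 is a kernel-visible pointer to the
   table, pml4_phys its physical address. */
static void install_recursive_pml4_entry(uint64_t *pml4, uint64_t pml4_phys)
{
    pml4[511] = pml4_phys | PAGE_PRESENT | PAGE_WRITE;
}

/* Virtual address of the page table entry that maps 'vaddr' in the current
   address space, derived from the recursive slot: PML4 index 511, then the
   original PML4/PDPT/PD indexes shifted down one level. */
static uint64_t *pte_vaddr(uint64_t vaddr)
{
    uint64_t addr = 0xFFFF000000000000ull        /* canonical sign extension */
                  | (511ull << 39)               /* recursive PML4 slot      */
                  | ((vaddr >> 9) & 0x7FFFFFFFF8ull);
    return (uint64_t *)addr;
}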
Of course I've simplified a lot here - for my OS design address spaces are actually split into 4 parts (process space, thread space, shared kernel space and domain specific kernel space) which creates more problems when using method 2.
Still, I prefer method 2, as method 1 breaks when there isn't enough kernel space and method 3 is too slow.
Cheers,
Brendan
Re:Virtual mm solutions
Posted: Fri Mar 03, 2006 3:40 am
by rootel77
This is complicated and slow, but allows the OS to change address spaces that aren't currently being used (where a simple implementation of method 2 doesn't).
I assume that if you have a page directory, you have already allocated it by some method and probably have a pointer to it. This applies to all other allocated page directories, so I don't see any problem in changing address spaces that are not currently in use; a simple loop over a linked list will do.
For PAE it's different because there are 4 separate page directories (CR3 points to a PDPT or "page directory pointer table" with 4 entries). This actually makes things much easier - I have one "kernel page directory" that is used for all address spaces (i.e. put into every page directory pointer table). Because of this there is no problem with adding kernel page tables. Here, I map the kernel's page directory as the highest kernel page directory entry (creating a 2 MB kernel page table map at the end of address space), and then map the user level page directories into the highest user level page directory entries (creating a 6 MB user level page table mapping from 0xBFA00000 to 0xBFFFFFFF).
For long mode there's multiple page directory pointer tables and a "PML4" (page map level 4 table), where CR3 contains the address of the PML4. In this case I set the highest PML4 entry to the address of the PML4, which creates a 512 GB page table mapping for the current address space. The problem here is adding a new page directory pointer table in kernel space (it's the same problem as adding a new page table in kernel space with 32 bit paging). Here I ignore the problem and hope that the kernel will never need more than one page directory pointer table. This restricts the kernel to 512 GB (it could use up to 128 TB if I allowed it to add new kernel PDPTs), which IMHO is large enough.
Certainly, this makes things a lot easier. In PAE mode I still prefer method 2. However, in long mode, where the kernel can map a large space for itself, I would opt for the Linux way, i.e. map the whole physical memory to simplify access and implement some contiguous-memory allocation policy.
Re:Virtual mm solutions
Posted: Fri Mar 03, 2006 6:39 am
by Brendan
Hi,
rootel77 wrote:This is complicated and slow, but allows the OS to change address spaces that aren't currently being used (where a simple implementation of method 2 doesn't).
I assume that if you have a page directory, you have already allocated it by some method and probably have a pointer to it. This applies to all other allocated page directories, so I don't see any problem in changing address spaces that are not currently in use; a simple loop over a linked list will do.
If you mean changing CR3 to each address space that needs to be modified, then yes this would work. Unfortunately it'd also wipe out the TLB (and might cause problems for interrupt/IRQ handlers if they assume that the current address space belongs to the currently running process, unless you disable interrupts).
rootel77 wrote:Certainly, this makes things a lot easier. In PAE mode I still prefer method 2. However, in long mode, where the kernel can map a large space for itself, I would opt for the Linux way, i.e. map the whole physical memory to simplify access and implement some contiguous-memory allocation policy.
This is what people thought when 32 bit protected mode was introduced. It didn't take long (relatively speaking) for the amount of RAM to exceed the amount of linear space the kernel could use. For me, 512 GB of kernel space sounds huge at the moment, but if you assume that RAM sizes will double every 2 years (and know that you can currently buy "off the shelf" servers with up to 32 GB of RAM) then you can estimate that 512 GB of kernel space will become a problem in less than ten years (or in about the same amount of time that it'll probably take for my OS to become useful).
Of course for the "architectural maximum kernel space size" of 128 TB it will take much longer - about 25 years before the kernel's linear address space is inadequate for mapping all of physical RAM. Still, I doubt I'd like the idea of rewriting my kernel's code in 25 years time...
To be honest, part of my reasoning is that I'm doing 3 different kernels in parallel (writing a plain 32 bit kernel, a PAE kernel and the long mode kernel at the same time). It's beneficial for me to make them as similar as possible - it's hard enough to keep track of them already.
Cheers,
Brendan
Re:Virtual mm solutions
Posted: Fri Mar 03, 2006 9:45 am
by rootel77
If you mean changing CR3 to each address space that needs to be modified, then yes this would work. Unfortunately it'd also wipe out the TLB (and might cause problems for interrupt/IRQ handlers if they assume that the current address space belongs to the currently running process, unless you disable interrupts).
I don't see why we must change CR3 each time we have to modify a page directory other than the current one. Here is an example extracted from a French OS tutorial, SOS (http://sos.enix.org/fr/PagePrincipale), that loops over all the user page directories to synchronize them
Code:
sos_ret_t sos_mm_context_synch_kernel_PDE(unsigned int index_in_pd,
                                          sos_ui32_t pde)
{
  sos_ui32_t flags;
  struct sos_mm_context * dest_mm_context;
  int nb_mm_contexts;

  sos_disable_IRQs(flags);

  /* Walk the list of all mm contexts and copy the new kernel PDE into
     each page directory (mapped at vaddr_PD in kernel space). */
  list_foreach_forward(list_mm_context, dest_mm_context, nb_mm_contexts)
    {
      sos_ui32_t * dest_pd;

      SOS_ASSERT_FATAL(dest_mm_context->ref_cnt > 0);

      dest_pd = (sos_ui32_t*) dest_mm_context->vaddr_PD;
      dest_pd[index_in_pd] = pde;
    }

  sos_restore_IRQs(flags);
  return SOS_OK;
}
Re:Virtual mm solutions
Posted: Fri Mar 03, 2006 11:17 am
by Brendan
Hi,
For my original comment:
"
This is complicated and slow, but allows the OS to change address spaces that aren't currently being used (where a simple implementation of method 2 doesn't)."
By the words "simple implementation of method 2" I meant putting the page directory in the highest page directory entry without any other mappings (and then I went on to explain how the problem of changing other address spaces is usually dealt with, when using method 2).
For SoS, based on this line:
Code:
dest_pd = (sos_ui32_t*) dest_mm_context->vaddr_PD;
As I read it, the source code line literally means "dest_pd = a pointer to the virtual address of the page directory for the destination memory manager context", so SOS uses "method 2" and does exactly what I described earlier:
"
For "32 bit paging", I put the page directory as the highest page directory entry, so that the highest 4 MB becomes a page table mapping. Then I also map every page directory into kernel space as a normal page. This creates a table of page directory contents (4 KB per address space), so that I can add a new page table to every page directory without problems."
I assume that if you have a page directory, you have already allocated it by some method and probably have a pointer to it. This applies to all other allocated page directories, so I don't see any problem in changing address spaces that are not currently in use; a simple loop over a linked list will do.
If you mean changing CR3 to each address space that needs to be modified, then yes this would work. Unfortunately it'd also wipe out the TLB (and might cause problems for interrupt/IRQ handlers if they assume that the current address space belongs to the currently running process, unless you disable interrupts).
I don't see why we must change CR3 each time we have to modify a page directory other than the current one. Here is an example extracted from a French OS tutorial, SOS (http://sos.enix.org/fr/PagePrincipale), that loops over all the user page directories to synchronize them
The conversation above may have confused me - were you saying "For method 3, I assume if you have a page dir, you have already allocated it by some method, and probably have a pointer to it", or alternatively were you saying "For method 2, I assume if you have a page dir, you have already allocated it by some method, and probably have a pointer to it"?
If you were referring to method 3, then the simple answer is that the pointer to the page directory is a physical address, not a virtual/linear address, and is therefore useless unless you use the physical address to map the page directory into the current address space.
If you were referring to method 2, then we've gone around in a circle - you'd use the physical address to map the page directory into the "table of page directories" that is in kernel space (in every address space).
Of course for "method 1" the physical address of the page directory is not useless - you can add a constant to it to find where the physical page was mapped in kernel space. This might be what SoS is doing (I may have been wrong above) - " dest_mm_context->vaddr_PD" might be equal to "dest_mm_context->paddr_PD + some_constant", but for some unknown reason I doubt this (I haven't downloaded their source and checked though).
Cheers,
Brendan
Re:Virtual mm solutions
Posted: Fri Mar 03, 2006 11:46 am
by rootel77
Of course for "method 1" the physical address of the page directory is not useless - you can add a constant to it to find where the physical page was mapped in kernel space. This might be what SoS is doing (I may have been wrong above) - " dest_mm_context->vaddr_PD" might be equal to "dest_mm_context->paddr_PD + some_constant", but for some unknown reason I doubt this (I haven't downloaded their source and checked though).
No, you are entirely right: SOS uses method 2, and thus it must synchronize the modified kernel entries in each page directory.
Re:Virtual mm solutions
Posted: Sun Mar 05, 2006 10:43 am
by JAAman
mapping all physical memory into kernel-space won't work once you start paging out to disk -- then the page tables for less commonly used processes may not be in memory
It is quite rare that you will need to update more than a single address-space (actually I cannot think of any reason) -- the destination of a shared memory (or passed, but not shared) request (if passing, then the current address-space would also need updating -- but that's easy)
What if you mark the transaction in a table in global space, then flag the destination process in the task handler to enter kernel mode first? That way it can be handled before the task is re-entered, at which point there will be a CR3 change anyway, so TLB invalidation won't cost anything extra, yet you are only changing it within that address space. This would solve all the problems -- including the possibility that the page tables are not even in physical memory (taken a step further, if they aren't in memory, you could wait to update them until they are referenced, at which point a page fault would occur and the entry could be updated then).
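A possible sketch of that scheme, with a global table of pending kernel page-directory changes and a per-process counter that is caught up in the task handler; all the names and the fixed table size are assumptions for illustration.
Code:
#include <stdint.h>

#define PENDING_MAX 64          /* arbitrary size for this sketch */

struct pde_change { unsigned index_in_pd; uint32_t pde; };

/* Global table of recent kernel page-directory changes, plus a running
   sequence number counting every change ever recorded. */
static struct pde_change pending[PENDING_MAX];
static uint64_t pending_seq;

struct process {
    uint32_t *pd;               /* page directory, visible in kernel space */
    uint64_t  applied_seq;      /* how far this PD has caught up           */
};

/* Record a kernel PDE change instead of touching every page directory now. */
void record_kernel_pde_change(unsigned index_in_pd, uint32_t pde)
{
    pending[pending_seq % PENDING_MAX] =
        (struct pde_change){ index_in_pd, pde };
    pending_seq++;
}

/* Called from the task handler just before the CR3 load, so the TLB flush
   that the address-space switch causes anyway covers these updates too.
   If a process falls more than PENDING_MAX changes behind, a real version
   would have to fall back to a full resync of its kernel entries. */
void apply_pending_changes(struct process *p)
{
    while (p->applied_seq < pending_seq) {
        struct pde_change *c = &pending[p->applied_seq % PENDING_MAX];
        p->pd[c->index_in_pd] = c->pde;
        p->applied_seq++;
    }
}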
Re:Virtual mm solutions
Posted: Sun Mar 05, 2006 1:22 pm
by Brendan
Hi,
JAAman wrote:mapping all physical memory into kernel-space won't work once you start paging out to disk -- then the page tables for less commonly used processes may not be in memory
In this case you'd need to read the page table back into memory (and update all page directories that it's being used in) when you modify its page table entries. I don't see why this wouldn't work (although I'd try to avoid it for performance reasons - e.g. don't swap shared page tables out to begin with).
JAAman wrote:It is quite rare that you will need to update more than a single address-space (actually I cannot think of any reason) -- the destination of a shared memory (or passed, but not shared) request (if passing, then the current address-space would also need updating -- but that's easy)
For most OS's, there's 1 GB or more of "kernel space" that is mapped into every address space. For example, for my OS, if the kernel needs another page table and it's running 1000 processes, then it needs to allocate a physical page and then put it into 1000 address spaces at the same time (while other CPUs may be running in some of those other address spaces).
Depending on the OS, shared memory, page swapping, "copy on write" process forking and memory mapped files (combined with file caches) may be other reasons for it.
JAAman wrote:What if you mark the transaction in a table in global space, then flag the destination process in the task handler to enter kernel mode first? That way it can be handled before the task is re-entered, at which point there will be a CR3 change anyway, so TLB invalidation won't cost anything extra, yet you are only changing it within that address space. This would solve all the problems -- including the possibility that the page tables are not even in physical memory (taken a step further, if they aren't in memory, you could wait to update them until they are referenced, at which point a page fault would occur and the entry could be updated then).
That would work, but there are some things you'd need to take care of. For example, if the target is sleeping and doesn't get any CPU time for an hour, then you'd need to keep a "queue of address space changes" in case there are a lot of changes it needs to make when it does get CPU time. Because different processes would get the CPU at different times, you'd need to have a separate "queue of address space changes" for each process.
To prevent these queues from becoming huge, you'd need to check if a new change cancels out another queued change (you couldn't just add the new change to the end of the queue). For example, if a shared page table is allocated and then freed, you'd want to check the queue to see if the "allocate" is still on it and then remove that "allocate" from the queue instead of putting the "free" on the queue. Otherwise, if a process doesn't get CPU time for a very long time you'd need a very long queue.
Also, when the process does get CPU time you'd need to switch to that process (load CR3), process its queue of address space changes and then invalidate the TLB again before that process runs any code that relies on those changes. This could become a problem for things like IRQ handlers.
For example, for a monolithic kernel, imagine you've got a serial port driver and you want to send 64 KB of data to the serial port. You might allocate a page (and page table) in kernel space to put the serial port data in, and then whenever the serial port generates a "transmit buffer empty" IRQ you'd send the next byte(s) of data. If the CPU switches to another process then this IRQ may happen before the data is mapped into the address space. For a microkernel that uses message queues, you'd have the same sort of problem with data used for the message queues. Disabling IRQs until the address space has been updated would fix this (and increase IRQ latency).
Cheers,
Brendan
Re:Virtual mm solutions
Posted: Sun Mar 05, 2006 9:57 pm
by proxy
I was thinking about your queue solution and I feel that it can be replaced entirely with a statically sized structure.
Basically, you could have a global object which is a copy of the kernel portion of the page directory. Every time you add/remove a kernel page table, you mirror the write to this object. Combine this with a timestamp meaning "last updated".
On any IRQ, you test the timestamp for the current address space against the timestamp of your copy; if they are out of sync, update the address space and its timestamp.
Since the copy would only reflect the most recent state of the page directory, it should remove the need for a queue entirely (only the most recent state is relevant anyway, right?). I also figure testing a few variables shouldn't be more expensive time-wise than testing if a queue is empty.
Perhaps for performance, you could omit areas known to not change or have a few timestamps for variable sized regions.
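A sketch of that timestamped-copy idea, assuming a 3 GB/1 GB split and a simple counter standing in for the timestamp; the structure and function names are invented for the example.
Code:
#include <stdint.h>
#include <string.h>

#define KERNEL_PDE_FIRST 768    /* kernel space starts at 3 GB here */
#define KERNEL_PDE_COUNT 256

/* Master copy of the kernel half of the page directory, plus a counter
   standing in for the "last updated" timestamp. */
static uint32_t master_kernel_pdes[KERNEL_PDE_COUNT];
static uint64_t master_stamp;

struct address_space {
    uint32_t *pd;               /* this space's page directory, kernel-visible */
    uint64_t  stamp;            /* stamp this PD was last synced to            */
};

/* Mirror every kernel page-table add/remove into the master copy. */
void set_kernel_pde(unsigned index_in_pd, uint32_t pde)
{
    master_kernel_pdes[index_in_pd - KERNEL_PDE_FIRST] = pde;
    master_stamp++;
}

/* Run on any IRQ (or kernel entry): if the current address space is out of
   sync with the master copy, copy the whole kernel region over and update
   its stamp. No queue is needed because only the latest state matters. */
void sync_current_space(struct address_space *as)
{
    if (as->stamp != master_stamp) {
        memcpy(&as->pd[KERNEL_PDE_FIRST], master_kernel_pdes,
               sizeof master_kernel_pdes);
        as->stamp = master_stamp;
    }
}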
proxy
Re:Virtual mm solutions
Posted: Mon Mar 06, 2006 6:32 am
by rootel77
@JAAman: Well, if you choose method 1, the users' page directories and page tables are mapped into kernel space (i.e. for the kernel they are just normal data structures with virtual addresses). It is not good practice to swap them out; if you are low on memory, it is more natural to choose pages from user processes to swap out.
Re:Virtual mm solutions
Posted: Mon Mar 06, 2006 7:26 am
by FlashBurn
I don't know if this was already said, but you could map the 253 kernel entries of the page directory to preallocated page tables, so you have all the page tables and don't have to update the entries every time you page in a new page, because you use the same page tables for the kernel space of every process. I think 1 MiB is not much on today's PCs.
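A sketch of that preallocation idea, assuming a 3 GB/1 GB split; 256 kernel page-directory entries are used for simplicity (the exact count, 253 in FlashBurn's layout, depends on what else is reserved in kernel space), and alloc_frame() is a hypothetical physical-frame allocator.
Code:
#include <stdint.h>

#define PAGE_PRESENT 0x1u
#define PAGE_WRITE   0x2u
#define KERNEL_PDE_FIRST 768    /* kernel space starts at 3 GB here */
#define KERNEL_PDE_COUNT 256    /* 256 page tables * 4 KiB = 1 MiB  */

/* Physical addresses of the kernel's page tables, allocated once at boot. */
static uint32_t kernel_pt_phys[KERNEL_PDE_COUNT];

/* Hypothetical physical-frame allocator. */
uint32_t alloc_frame(void);

/* Allocate every kernel page table up front. */
void preallocate_kernel_page_tables(void)
{
    for (int i = 0; i < KERNEL_PDE_COUNT; i++)
        kernel_pt_phys[i] = alloc_frame();
}

/* Creating a new address space just copies the fixed kernel PDEs; since all
   processes share the same kernel page tables, the kernel half of a page
   directory never needs to be synchronized afterwards. */
void init_kernel_pdes(uint32_t *pd)
{
    for (int i = 0; i < KERNEL_PDE_COUNT; i++)
        pd[KERNEL_PDE_FIRST + i] = kernel_pt_phys[i] | PAGE_PRESENT | PAGE_WRITE;
}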
Re:Virtual mm solutions
Posted: Mon Mar 06, 2006 9:06 am
by JAAman
For most OS's, there's 1 GB or more of "kernel space" that is mapped into every address space. For example, for my OS, if the kernel needs another page table and it's running 1000 processes, then it needs to allocate a physical page and then put it into 1000 address spaces at the same time (while other CPUs may be running in some of those other address spaces).
You should never have to update globally-shared page-tables, as these will be the same in all address spaces -- you just point all address spaces to the same actual tables, and all address spaces will be updated when you update the current address space.
rootel77 wrote:
@JAAman: Well, if you choose method 1, the users' page directories and page tables are mapped into kernel space (i.e. for the kernel they are just normal data structures with virtual addresses). It is not good practice to swap them out; if you are low on memory, it is more natural to choose pages from user processes to swap out.
But then, of course, you are limited in the maximum number of processes -- I prefer not to have this limit. Even if you have a full 4 GB of usable RAM you are limited to ~1000 processes (fewer, actually, because of other tables and code that must be in memory at all times); my more conservative 256 MB would only support 64 processes, and forget about running at all on less than 64 MB -- which is pretty close to the MAX possible on most P5 boards, and 8 MB is the most you can have on most 386/486 systems -- which I do intend to support.