"kernel page directory" in sync with "user pa

samueldotj · Post by **samueldotj** » Wed Jun 04, 2008 2:48 am

Hi,

How can the kernel page directory be kept in sync with the user page directories?

To be clear,
1) The kernel page directory contains the initial page directory entries.
2) Task 1 is started after copying the kernel page directory.
3) Task 1 -> sleep()
4) Some kernel_thread_x() allocates 10MB for some reason, so the memory manager creates a new entry in the kernel page directory.
5) Task 1 -> wakeup()
6) Task 1 -> some sys_call() -> switches to kernel mode and tries to access the memory (VA) created by kernel_thread_x(). This results in a page fault because Task 1's page directory doesn't have an entry for that VA.

The simplest solution is to copy the missing entry from the kernel page directory during the page fault if the VA is in the kernel address range. But is there any other way to solve it?

How is this handled in your OS?

How does Linux handle it? Which source file/function handles kernel page faults in Linux?

Thanks
Sam

JamesM · Post by **JamesM** » Wed Jun 04, 2008 3:23 am

Hi,

A normal way of handling this is to force the creation of page tables for all of kernel space.

For example - say your kernel code, data, heap, modules etc all reside at the address range 0xC0000000..0xFFFFFFFF. When you initialise paging you'd create page tables to cover this entire range.

Note that you don't need to create page table entries - just the tables themselves. This shouldn't take up too much extra RAM, and it means that when you create a new process you can reuse the same page tables for the area 0xC0000000..0xFFFFFFFF - so any changes that occur in one virtual address space are effective in any other virtual address space too (but only for 0xC0000000..0xFFFFFFFF - the rest of the address space is independent as normal).
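A minimal sketch of the idea, written as a user-space simulation rather than real paging code. It assumes IA-32 without PAE and a kernel range starting at 0xC0000000. All names (`kernel_tables`, `init_kernel_tables`, `new_address_space`, `map_kernel_page`, `lookup`) are made up for illustration, and plain pointers stand in for the physical addresses a real page directory would hold.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES      1024u                 /* entries per page directory / page table */
#define KERNEL_BASE  0xC0000000u           /* start of kernel space in this example */
#define KERNEL_PDE   (KERNEL_BASE >> 22)   /* first kernel directory slot: 768 */

/* Simulated structures: a real kernel stores physical addresses plus flags. */
typedef struct { uint32_t pte[ENTRIES]; } page_table;
typedef struct { page_table *pde[ENTRIES]; } page_directory;

/* One table per kernel directory slot, created once at boot. */
static page_table *kernel_tables[ENTRIES - KERNEL_PDE];

/* Create every kernel page table up front; the entries inside stay empty. */
static int init_kernel_tables(void)
{
    for (uint32_t i = 0; i < ENTRIES - KERNEL_PDE; i++) {
        kernel_tables[i] = calloc(1, sizeof(page_table));
        if (kernel_tables[i] == NULL)
            return -1;
    }
    return 0;
}

/* A new address space gets an empty user part and the shared kernel tables. */
static page_directory *new_address_space(void)
{
    page_directory *pd = calloc(1, sizeof(page_directory));
    if (pd == NULL)
        return NULL;
    for (uint32_t i = KERNEL_PDE; i < ENTRIES; i++)
        pd->pde[i] = kernel_tables[i - KERNEL_PDE];
    return pd;
}

/* Map one kernel page through whichever directory happens to be active. */
static void map_kernel_page(page_directory *pd, uint32_t va, uint32_t frame)
{
    assert(va >= KERNEL_BASE);
    pd->pde[va >> 22]->pte[(va >> 12) & 0x3FFu] = frame | 1u;  /* bit 0 = present */
}

/* Return the page table entry for va, or 0 if no table covers it. */
static uint32_t lookup(const page_directory *pd, uint32_t va)
{
    const page_table *pt = pd->pde[va >> 22];
    return pt ? pt->pte[(va >> 12) & 0x3FFu] : 0;
}
```

Because every directory points at the same tables, a mapping made through one address space shows up in all the others without any copying or fault handling.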

Hope this helps,

James

samueldotj · Post by **samueldotj** » Wed Jun 04, 2008 4:05 am

JamesM wrote: Hi,

A normal way of handling this is to force the creation of page tables for all of kernel space.

For example - say your kernel code, data, heap, modules etc all reside at the address range 0xC0000000..0xFFFFFFFF. When you initialise paging you'd create page tables to cover this entire range.

Hope this helps,

James

Allocating the kernel page tables at the beginning seems a simple alternative. However, this method reserves 1MB for kernel page tables on IA-32. On a 64-bit architecture with multilevel page tables it would reserve much more memory. So I think there must be some other way it is handled in Linux/Windows.
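For reference, the 1MB figure can be checked with a few lines of arithmetic. This sketch assumes non-PAE IA-32 with 4 KiB pages and the 0xC0000000 kernel base from the example above; `kernel_table_bytes` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Bytes of page tables needed to cover [base, 4 GiB) on non-PAE IA-32:
 * each page table occupies one page and maps 4 MiB of address space. */
static uint32_t kernel_table_bytes(uint32_t base)
{
    assert((base & 0x3FFFFFu) == 0);         /* base must be 4 MiB aligned */
    uint32_t tables = 1024u - (base >> 22);  /* directory slots from base upward */
    return tables * PAGE_SIZE;
}
```

With a base of 0xC0000000 this gives 256 tables of 4 KiB each, which is the 1MB mentioned here.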

Thanks
Sam

JamesM · Post by **JamesM** » Wed Jun 04, 2008 4:14 am

samueldotj wrote:On a 64 bit architecture with multilevel page table architecture it would reserve much more memory. So I think there should be some other way it is handled in Linux/windows.

That depends on just how much address space you reserve for kernel use. I believe it is done the same way on Linux/Windows, although don't quote me on that.

kmcguire · Post by **kmcguire** » Wed Jun 04, 2008 7:08 pm

samueldotj wrote:Allocating kernel page tables at the beginning seems simple and alternative option. However this method will reserve 1MB for kernel page tables on IA-32. On a 64 bit architecture with multilevel page table architecture it would reserve much more memory. So I think there should be some other way it is handled in Linux/windows.

The version of that which uses less memory at startup is to only allocate tables as they are needed. Once a table is allocated, go through all page directories and update the corresponding entry in each. Then just leave the tables allocated. If you create a new page directory (create a new process), just copy the reserved section of the page directory from another process into the new page directory.
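A rough sketch of that eager update, again as a simulation. It assumes IA-32 without PAE, kernel space starting at 0xC0000000, and that the kernel's own directory is registered first. The names `spaces`, `add_address_space`, `install_kernel_table` and the fixed `MAX_SPACES` cap are hypothetical; a real kernel would also need locking around the list.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES     1024u
#define KERNEL_PDE  768u    /* first kernel slot, assuming kernel space starts at 0xC0000000 */
#define MAX_SPACES  64u     /* fixed cap to keep the sketch simple */

typedef struct { uint32_t pde[ENTRIES]; } page_directory;

static page_directory *spaces[MAX_SPACES];   /* every live page directory */
static uint32_t        space_count;

/* Register a new directory and copy the kernel slice from an existing one. */
static int add_address_space(page_directory *pd)
{
    if (space_count == MAX_SPACES)
        return -1;
    memset(pd, 0, sizeof *pd);
    if (space_count > 0)
        memcpy(&pd->pde[KERNEL_PDE], &spaces[0]->pde[KERNEL_PDE],
               (ENTRIES - KERNEL_PDE) * sizeof pd->pde[0]);
    spaces[space_count++] = pd;
    return 0;
}

/* Called once when a kernel page table is first allocated: install it everywhere. */
static void install_kernel_table(uint32_t index, uint32_t entry)
{
    assert(index >= KERNEL_PDE && index < ENTRIES);
    for (uint32_t i = 0; i < space_count; i++)
        spaces[i]->pde[index] = entry;
}
```

The cost is one loop over all address spaces per new kernel page table, which happens at most 256 times with this split since tables are never freed.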

AJ · Post by AJ » Thu Jun 05, 2008 5:41 am

samueldotj wrote:On a 64 bit architecture with multilevel page table architecture it would reserve much more memory. So I think there should be some other way it is handled in Linux/windows.

It doesn't reserve more memory at all. You still only need to pre-assign the top level paging structures - i.e. create the shared PML4 entries and associated PDPTs. If you want to have a reserved space of 0x8000000000 bytes for kernel space, this means you need 1 PML4 entry (takes up no RAM - you need the PML4 anyway) and 1 PDPT (takes 4KiB of RAM). You still don't need to create the PDs and PTs until you actually need them - as long as the PML4 entry is the same across all address spaces, everything else follows.
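The numbers can be checked with a short sketch, assuming standard x86-64 4-level paging with 4 KiB pages: every paging structure has 512 entries of 8 bytes, so it fills one 4 KiB page, and one PML4 entry covers 512 GiB (0x8000000000 bytes). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_ENTRIES 512u   /* entries per paging structure in long mode */
#define ENTRY_BYTES   8u     /* each entry is 64 bits wide */

/* Virtual address space covered by one entry at each level (4 KiB pages). */
#define PT_SPAN    (UINT64_C(1) << 12)   /* 4 KiB   */
#define PD_SPAN    (UINT64_C(1) << 21)   /* 2 MiB   */
#define PDPT_SPAN  (UINT64_C(1) << 30)   /* 1 GiB   */
#define PML4_SPAN  (UINT64_C(1) << 39)   /* 512 GiB */

/* Index into the PML4 for a virtual address (bits 47..39). */
static unsigned pml4_index(uint64_t va)
{
    return (unsigned)((va >> 39) & 0x1FFu);
}

/* Number of PML4 entries needed to reserve `bytes` of kernel space, rounded up. */
static uint64_t pml4_entries_for(uint64_t bytes)
{
    assert(bytes > 0);
    return (bytes + PML4_SPAN - 1) / PML4_SPAN;
}

/* Size in bytes of one paging structure (PML4, PDPT, PD or PT). */
static uint32_t table_bytes(void)
{
    return TABLE_ENTRIES * ENTRY_BYTES;
}
```

So reserving 0x8000000000 bytes costs one shared PML4 slot plus one 4 KiB PDPT up front; everything below the PDPT can be filled in on demand.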

Cheers,
Adam

samueldotj · Post by **samueldotj** » Fri Jun 06, 2008 12:00 am

Found how it is done in Windows, from the book "Inside Windows 2000 3rd Edition", page 435:

The page tables that describe system space are shared among all processes, however. When a process is created, system space PDEs are initialized to point to the existing system page tables. But as shown in Figure 7-12, not all processes have the same view of system space. For example, if paged pool expansion requires the allocation of a new system page table, the memory manager doesn't go back and update all the process page directories to point to the new system page table. Instead, it updates the process page directories when the processes reference the new virtual address.

proxy · Post by **proxy** » Fri Jun 06, 2008 7:53 am

Yeah, one approach (which I believe is similar to what Windows does) is to keep a ghost "master system page dir" which is always in sync with everything you have allocated so far. An example will illustrate better:

Process A makes a system call which triggers a kernel heap extension.
Process A adds some new pages/page tables to its PD.
The same pages/page tables are added to the master system PD.

Process B makes a system call which uses the same memory process A just dragged in.
On the resulting page fault, the OS scans the master PD first and copies any relevant entries if they exist; otherwise it does what we just did for process A.
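Roughly, the fault-time sync could look like the sketch below. It is a simulation under assumptions: IA-32 without PAE, the usual 3/1 split, and bit 0 of a PDE meaning "present". `master_pd`, `add_kernel_table` and `sync_kernel_pde` are invented names, not taken from any real kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES      1024u
#define KERNEL_BASE  0xC0000000u   /* assumes the usual 3/1 split */
#define PDE_PRESENT  0x1u

typedef struct { uint32_t pde[ENTRIES]; } page_directory;

static page_directory master_pd;   /* the "ghost": holds every kernel PDE ever created */

/* Record a newly created kernel page table in the master and the current directory. */
static void add_kernel_table(page_directory *current, uint32_t va, uint32_t entry)
{
    assert(va >= KERNEL_BASE);
    uint32_t index = va >> 22;
    master_pd.pde[index] = entry | PDE_PRESENT;
    current->pde[index]  = entry | PDE_PRESENT;
}

/* Page-fault path: returns true if the fault was fixed by copying from the master. */
static bool sync_kernel_pde(page_directory *current, uint32_t fault_va)
{
    if (fault_va < KERNEL_BASE)
        return false;                   /* user-space fault: handled elsewhere */
    uint32_t index = fault_va >> 22;
    if (!(master_pd.pde[index] & PDE_PRESENT))
        return false;                   /* nobody has mapped this range yet */
    if (current->pde[index] & PDE_PRESENT)
        return false;                   /* PDE already there: the fault has another cause */
    current->pde[index] = master_pd.pde[index];
    return true;
}
```

If `sync_kernel_pde` returns true the faulting instruction can simply be retried; otherwise the fault falls through to the normal handling (heap growth, or a genuine error).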

Personally, I find pre-allocating the PTs easier, but this other approach is nice and has very little overhead (1 extra page).

hope this helps

AJ · Post by AJ » Fri Jun 06, 2008 7:58 am

I'm all for people designing their OSes as they see fit, but what is the advantage of this method over pre-allocating? Although this other method has little memory overhead, it seems as if it could have a good deal of CPU time overhead.

Cheers,
Adam

proxy · Post by **proxy** » Fri Jun 06, 2008 8:06 am

Eh, CPU overhead isn't too crazy if you think about it; it mostly boils down to:

masterPD[index] = entry;

Keep in mind that only the top-level entries are relevant, so at most there are 1024 of them (one 4096-byte page directory) minus the ones covering user space. If you do the usual 3/1 split, that's just 256 entries (1024 bytes) at most that you need to track.

A few extra moves that occur only during page faults aren't too horrible. The benefit... eh, if I have to be devil's advocate, I'd have to say space (not much, but it's there) and convenience: it's a very simple scheme which would work.

Like I said, I went with the pre-allocate top level PTs approach anyway.

proxy

AJ · Post by AJ » Fri Jun 06, 2008 8:53 am

proxy wrote:like i said, I went with the pre-allocate top level PTs approach anyway.

Understood. I wasn't too clear, but I wasn't trying to criticise your previous post - I was genuinely asking what is to be gained from this second method.

Like you, I'm sticking with the preallocate approach in both my 32 and 64 bit kernels, because I would like to avoid the unnecessary PFEs. Although the solution is just PD[index] = entry, you still have the fact that all the overhead that goes with a PFE has to happen in the first place.

Cheers,
Adam

OSDev.org