"kernel page directory" in sync with "user page directory"

samueldotj
Member
Posts: 32
Joined: Mon Nov 13, 2006 12:24 am

Post by samueldotj »

Hi,

How can the kernel page directory be kept in sync with the user (per-process) page directories?

To be clear:
1) The kernel page directory contains the initial page directory entries.
2) Task 1 is started with a copy of the kernel page directory.
3) Task 1 -> sleep()
4) Some kernel_thread_x() allocates 10MB for some reason, so the memory manager creates a new entry in the kernel page directory.
5) Task 1 -> wakeup()
6) Task 1 -> some sys_call() -> switches to kernel mode and tries to access the memory (VA) allocated by kernel_thread_x(). This results in a page fault because Task 1’s page directory doesn't have an entry for that VA.

The simplest solution is to copy the missing kernel page directory entries during the page fault if the VA is in the kernel address range. But is there any other way to solve it?

How is this handled in your OS?

How does Linux handle it? Which source file/function handles kernel page faults in Linux?

Thanks
Sam
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Post by JamesM »

Hi,

A normal way of handling this is to force the creation of page tables for all of kernel space.

For example - say your kernel code, data, heap, modules etc all reside at the address range 0xC0000000..0xFFFFFFFF. When you initialise paging you'd create page tables to cover this entire range.

Note that you don't need to create page table entries - just the tables themselves. This shouldn't take up too much extra RAM, and it means that when you create a new process you can reuse the same page tables for the area 0xC0000000..0xFFFFFFFF - so any changes that occur in one virtual address space are effective in any other virtual address space too (but only for 0xC0000000..0xFFFFFFFF - the rest of the address space is independent as normal).

Hope this helps,

James

Post by samueldotj »

JamesM wrote: Hi,

A normal way of handling this is to force the creation of page tables for all of kernel space.

For example - say your kernel code, data, heap, modules etc all reside at the address range 0xC0000000..0xFFFFFFFF. When you initialise paging you'd create page tables to cover this entire range.

Hope this helps,

James
Allocating all kernel page tables at the beginning is a simple alternative. However, this method reserves 1MB for kernel page tables on IA-32. On a 64-bit architecture with multilevel page tables it would reserve much more memory. So I think Linux/Windows must handle this some other way.
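For reference, the 1MB figure follows from IA-32 paging without PAE, assuming the kernel owns 0xC0000000..0xFFFFFFFF as in JamesM's example: each page table maps 4 MiB and itself occupies 4 KiB.

```latex
\frac{2^{32} - \mathtt{0xC0000000}}{4\,\text{MiB per table}}
  = \frac{1\,\text{GiB}}{4\,\text{MiB}} = 256 \text{ page tables},
\qquad
256 \times 4\,\text{KiB} = 1\,\text{MiB}.
```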

Thanks
Sam

Post by JamesM »

samueldotj wrote: On a 64 bit architecture with multilevel page table architecture it would reserve much more memory. So I think there should be some other way it is handled in Linux/windows.
That depends on just how much address space you reserve for kernel use. I believe it is done the same way on Linux/Windows, although don't quote me on that.
kmcguire
Member
Posts: 120
Joined: Tue Nov 09, 2004 12:00 am
Location: United States

Post by kmcguire »

samueldotj wrote: Allocating kernel page tables at the beginning seems simple and alternative option. However this method will reserve 1MB for kernel page tables on IA-32. On a 64 bit architecture with multilevel page table architecture it would reserve much more memory. So I think there should be some other way it is handled in Linux/windows.
The version of that which uses less memory at startup is to allocate tables only as they are needed. Once a table is allocated, go through all page directories and update their entry for it. Then just leave the table allocated. If you create a new page directory (i.e. create a new process), just copy the reserved kernel section of another process's page directory into it.
AJ
Member
Posts: 2646
Joined: Sun Oct 22, 2006 7:01 am
Location: Devon, UK

Post by AJ »

samueldotj wrote:On a 64 bit architecture with multilevel page table architecture it would reserve much more memory. So I think there should be some other way it is handled in Linux/windows.
It doesn't reserve more memory at all. You still only need to pre-assign the top-level paging structures, i.e. create the shared PML4 entries and their associated PDPTs. If you want a reserved space of 0x8000000000 bytes (512 GiB) for kernel space, you need 1 PML4 entry (takes up no extra RAM - you need the PML4 anyway) and 1 PDPT (takes 4 KiB of RAM). You still don't need to create the PDs and PTs until you actually need them - as long as the PML4 entry is the same across all address spaces, everything else follows.

Cheers,
Adam

Post by samueldotj »

Found how it is done in Windows, in the book "Inside Windows 2000 3rd Edition", page 435:
The page tables that describe system space are shared among all processes, however. When a process is created, system space PDEs are initialized to point to the existing system page tables. But as shown in Figure 7-12, not all processes have the same view of system space. For example, if paged pool expansion requires the allocation of a new system page table, the memory manager doesn't go back and update all the process page directories to point to the new system page table. Instead, it updates the process page directories when the processes reference the new virtual address.
proxy
Member
Posts: 108
Joined: Wed Jan 19, 2005 12:00 am

Post by proxy »

Yea, one approach (which I believe is similar to what Windows does) is to keep a shadow "master system page directory" which is always in sync with everything you have allocated so far. An example will illustrate this better:

process A makes a system call which triggers kernel heap extension.
process A adds some new pages/page tables to its PD.
same pages/page tables are added to the master system PD.

process B makes a system call which uses the same memory process A just dragged in
the OS will scan the master PD first and copy any relevant entries if they exist, otherwise do what we just did for process A.

Personally, I find pre-allocating the PTs easier, but this other approach is nice and has very little overhead (1 extra page).

hope this helps

Post by AJ »

I'm all for people designing their OSes as they see fit, but what is the advantage of this method over pre-allocating? Although this other method has little memory overhead, it seems as if it could have a good deal of CPU time overhead.

Cheers,
Adam

Post by proxy »

eh, the CPU overhead isn't too crazy if you think about it; it mostly boils down to:

masterPD[index] = entry;

keep in mind, only the top-level entries (the PDEs pointing at kernel page tables) are relevant, so there are at most 1024 minus the number of user-space PDEs to track (a non-PAE page directory has 1024 entries). If you do the usual 3/1 split, that's just 256 entries at most.

A few extra moves that only happen during page faults aren't too horrible. The benefit... eh, if I have to play devil's advocate, I'd say space (not much, but it's there) and convenience: it's a very simple scheme that works.

Like I said, I went with the pre-allocate top-level PTs approach anyway.

proxy

Post by AJ »

proxy wrote: like i said, I went with the pre-allocate top level PTs approach anyway.
Understood. I wasn't too clear, but I wasn't trying to criticise your previous post - I was genuinely asking what is to be gained with this second method :)

Like you, I'm sticking with the pre-allocate approach in both my 32-bit and 64-bit kernels, because I'd like to avoid the unnecessary PFEs (page fault exceptions). Although the fix itself is just PD[index] = entry, all the overhead that goes with a PFE still has to happen in the first place.

Cheers,
Adam