Infinitely recursive page table allocation

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
HyperAssembler
Member
Member
Posts: 36
Joined: Thu Sep 04, 2014 5:24 pm
Location: !SIGSEGV

Infinitely recursive page table allocation

Post by HyperAssembler »

1. Let's say my initial mapping looks like:

Code: Select all

[0x00100000, 0x00100000 + kernel_size] -> [0xC000000000000000, 0xC000000000000000 + kernel_size]
[Everything else] -> [Unmapped]
2. Then suppose I vmalloc 4MB for my kernel heap at 0xF000000000000000, where the page table does not exist in the initial mapping.

3. The VMM marks the 4MB area as allocated and calls it good.

4. The kernel calls kmalloc, triggering a page fault.

5. The page fault handler calls PMM and gets a physical page, tries to map it to 0xF000000000000000, however the page table does not exist, the page fault handler then calls kmalloc to allocate a 4k-aligned trunk for the page table.

6. The kmalloc triggers a page fault.

7. The page fault calls PMM and gets a physical page, tries to map it to, presumably, 0xF000000000001000, however the page table does not exist, the page fault handler then calls kmalloc to allocate a 4k-aligned trunk for the page table.

.....

I don't think I need to keep going and you should see the problem here.

One way I can think of to solve the problem is to reserve enough space for all possible page tables (PML4, PDPT, PDE, PTE) for the complete virtual address space somewhere in the virtual address space (like Windows does) and then pre-allocate page tables for that area (only allocate page tables, does not actually map) to guarantee that when page fault happens, page fault handler does not need to allocate a page for a page table.

So what happens if you are trying to map a page to a virtual address but the page table for the virtual address does not exist?
User avatar
neon
Member
Member
Posts: 1567
Joined: Sun Feb 18, 2007 7:28 pm
Contact:

Re: Infinitely recursive page table allocation

Post by neon »

Hello,

You did already discover one solution - called recursive page mapping. However this also solves another big problem - allowing you to actually update PTE's since the virtual address is known (recall that the paging structures store physical addresses not virtual. So you need a way to work around that.) If interested, this is the method we use.

You'll still need to allocate frames and map it into the address space though - but none of this has to do with your heap. If the page table does not exist, get a free frame and map it to the virtual address - with recursive page mapping you already know where its going to be. The tables aren't preallocated though - they are reserved areas in the kernel address space. The only page tables that get allocated initially are those used by kernel space. User space tables gets allocated via demand paging.
OS Development Series | Wiki | os | ncc
char c[2]={"\x90\xC3"};int main(){void(*f)()=(void(__cdecl*)(void))(void*)&c;f();}
HyperAssembler
Member
Member
Posts: 36
Joined: Thu Sep 04, 2014 5:24 pm
Location: !SIGSEGV

Re: Infinitely recursive page table allocation

Post by HyperAssembler »

So essentially, it's not a problem as long as you know the virtual address of your complete page tables.

Let's say:
PML4 BASE: 0xC0000000
PDPT BASE: 0xD0000000
PDE BASE: 0xE0000000
PTE BASE: 0xF0000000

For the actual page table allocation in page fault handler, you allocate a physical page and directly map it to the page table's virtual address as offsets to those bases (since you already know where it goes).
HyperAssembler
Member
Member
Posts: 36
Joined: Thu Sep 04, 2014 5:24 pm
Location: !SIGSEGV

Re: Infinitely recursive page table allocation

Post by HyperAssembler »

I'm lost again. That does not seem to solve the problem.

Let's say I want to map vaddr 0xF000 -> 0x0 paddr, and the page table for 0xF000 does not exist.

I know the virtual address for the complete page table: 0xD0000, then the offset of that page table would be 0xF000, which gives the final virtual address 0xDF000. So I want to modify the content of vaddr 0xDF000 to be 0x0

Since 0xDF000 is not mapped, I ask for a physical page, and map vaddr 0xDF000 to paddr of the page, let's say 0x1000. In order to do that, I need to modify the content of vaddr 0xD1000 to 0x1000, however, 0xD1000 is not mapped..... Recursion again.
User avatar
neon
Member
Member
Posts: 1567
Joined: Sun Feb 18, 2007 7:28 pm
Contact:

Re: Infinitely recursive page table allocation

Post by neon »

Pretty much. The problem with the initial design was using the heap allocator to allocate page tables for the heap itself. You always need to allocate frames for the page structures -- but they can be mapped anywhere, so why map them to an area that you are trying to map in the first place?

Some people have the boot loader build the kernel paging structures beforehand. With a recursive page directory structure (which is actually a little bit more complicated then your original idea - please see the Wiki for details) -- you will already know where to map it.

The problem here isn't allocation- its deciding where to map it in the address space. You can either predefine its location (what Windows and Linux does via recursive page mapping) or write a separate allocation scheme for finding free areas in the kernel virtual address space that you can map new paging structures to. This is different then the heap and there are ways to do it (bitmap, PTE free list, AVL tree, etc.)

You might even end up using multiple methods (e.g. we use both recursive page directory and kernel PTE pools.)
OS Development Series | Wiki | os | ncc
char c[2]={"\x90\xC3"};int main(){void(*f)()=(void(__cdecl*)(void))(void*)&c;f();}
User avatar
neon
Member
Member
Posts: 1567
Joined: Sun Feb 18, 2007 7:28 pm
Contact:

Re: Infinitely recursive page table allocation

Post by neon »

Not quite sure where the confusion is. If the page table is already mapped to, using your example, 0xD0000, why would writing to it cause a page fault? Sounds like a direct contradiction to me.

"and the page table for 0xF000 does not exist."

No problem. Grab a free page from the physical memory allocator and map it to 0xF000. You know the page table from the given virtual address that you are wanting to map. Update the page directory if needed, and its mapped now. When you get the actual page to modify, just write to the page table.
Last edited by neon on Mon Jun 20, 2016 7:46 pm, edited 1 time in total.
OS Development Series | Wiki | os | ncc
char c[2]={"\x90\xC3"};int main(){void(*f)()=(void(__cdecl*)(void))(void*)&c;f();}
HyperAssembler
Member
Member
Posts: 36
Joined: Thu Sep 04, 2014 5:24 pm
Location: !SIGSEGV

Re: Infinitely recursive page table allocation

Post by HyperAssembler »

Hello,

The problem is that I already know the virtual address I need to map it to, however, to perform the actual mapping, I might need to map another page since the page table used to map the previous page does not exist. This is where the infinite recursion comes from.
User avatar
neon
Member
Member
Posts: 1567
Joined: Sun Feb 18, 2007 7:28 pm
Contact:

Re: Infinitely recursive page table allocation

Post by neon »

Perhaps someone else can clarify it better then I can then. I don't quite know what you mean by "previous page" here nor how its relevant to the topic. I no longer know what your intent is here nor do I understand where the confusion lies. I can only wish you luck and hopefully someone else can clarify -- there isn't any infinite recursion in the designs mentioned above.
OS Development Series | Wiki | os | ncc
char c[2]={"\x90\xC3"};int main(){void(*f)()=(void(__cdecl*)(void))(void*)&c;f();}
HyperAssembler
Member
Member
Posts: 36
Joined: Thu Sep 04, 2014 5:24 pm
Location: !SIGSEGV

Re: Infinitely recursive page table allocation

Post by HyperAssembler »

neon wrote:Not quite sure where the confusion is. If the page table is already mapped to, using your example, 0xD0000, why would writing to it cause a page fault? Sounds like a direct contradiction to me.

"and the page table for 0xF000 does not exist."

No problem. Grab a free page from the physical memory allocator and map it to 0xF000. You know the page table from the given virtual address that you are wanting to map. Update the page directory if needed, and its mapped now. When you get the actual page to modify, just write to the page table.
I see what you mean. Let me think if I get this correct this time.

So basically you directly make the PDE to point at the physical address of the allocated page (PTE table). Therefore, 0xF000 is now magically mapped
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Infinitely recursive page table allocation

Post by Brendan »

Hi,
HyperAssembler wrote:1. Let's say my initial mapping looks like:

Code: Select all

[0x00100000, 0x00100000 + kernel_size] -> [0xC000000000000000, 0xC000000000000000 + kernel_size]
[Everything else] -> [Unmapped]
2. Then suppose I vmalloc 4MB for my kernel heap at 0xF000000000000000, where the page table does not exist in the initial mapping.

3. The VMM marks the 4MB area as allocated and calls it good.

4. The kernel calls kmalloc, triggering a page fault.

5. The page fault handler calls PMM and gets a physical page, tries to map it to 0xF000000000000000, however the page table does not exist, the page fault handler then calls kmalloc to allocate a 4k-aligned trunk for the page table.
Why are you using kmalloc here???

How about...

5. The page fault handler checks if the page directory is present. It's not, so it calls PMM, allocates a physical page and creates the page directory. Then page fault handler checks if the page table is present. It's not, so it calls PMM, allocates a physical page and creates the page table. Then page fault handler checks if the page is present. It's not, so it calls PMM, allocates a physical page and creates the virtual page.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
HyperAssembler
Member
Member
Posts: 36
Joined: Thu Sep 04, 2014 5:24 pm
Location: !SIGSEGV

Re: Infinitely recursive page table allocation

Post by HyperAssembler »

HyperAssembler wrote:I'm lost again. That does not seem to solve the problem.

Let's say I want to map vaddr 0xF000 -> 0x0 paddr, and the page table for 0xF000 does not exist.

I know the virtual address for the complete page table: 0xD0000, then the offset of that page table would be 0xF000, which gives the final virtual address 0xDF000. So I want to modify the content of vaddr 0xDF000 to be 0x0

Since 0xDF000 is not mapped, I ask for a physical page, and map vaddr 0xDF000 to paddr of the page, let's say 0x1000. In order to do that, I need to modify the content of vaddr 0xD1000 to 0x1000, however, 0xD1000 is not mapped..... Recursion again.
I think I mixed up vaddr and paddr somehow.

So mappinmg 0xF000 to 0x0 works like this:

vaddr of the cr3 is 0xD0000
So I look up index 0xF0 in vaddr 0xD0000 first, if that points to NULL, I grab a physical page and make index 0xF0 in 0xD0000 point to that physical page.

So how do I modify the physical page I just allocated? (what's the vaddr for that page) I think this is where my confusion comes from.
User avatar
neon
Member
Member
Posts: 1567
Joined: Sun Feb 18, 2007 7:28 pm
Contact:

Re: Infinitely recursive page table allocation

Post by neon »

The virtual address format already tells you what page directory entry and page table entry to write to. I.e. they are one to one. (Note that virtual address->page is one to one. Virtual address->frame is not - but that is something you shouldn't worry about now.)

So if you just updated some page - you already know the virtual address it corresponds to.
OS Development Series | Wiki | os | ncc
char c[2]={"\x90\xC3"};int main(){void(*f)()=(void(__cdecl*)(void))(void*)&c;f();}
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Infinitely recursive page table allocation

Post by Brendan »

Hi,

I think it's story time...

Once upon a time there was a massive noob that had no idea what they were doing and made a huge number of beginner mistakes (note: eventually Linus learnt a lot more and eventually became very knowledgeable, but he wasn't born like that). As time passed some of their original mistakes (like using hardware task switching, building a floppy disk boot loader directly into the kernel, etc) were fixed because they were easy to fix; and some of their original mistakes (like failing to separate physical memory manager from kernel heap, and assuming all physical RAM will always be small enough to be mapped into kernel space) couldn't be fixed because far too much other code (drivers, etc) grew to depend on the original idiocy.

About 20 years later, beginners writing their own kernel see incredible stupidity in Linux and make the mistake of assuming it makes sense, and copy the same retarded nonsense (like not separating physical memory manager from kernel heap, and assuming all physical RAM will always be small enough to be mapped into kernel space). Then these unfortunate victims come to this forum when it all breaks, and I tell them a little story... ;)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
HyperAssembler
Member
Member
Posts: 36
Joined: Thu Sep 04, 2014 5:24 pm
Location: !SIGSEGV

Re: Infinitely recursive page table allocation

Post by HyperAssembler »

neon wrote:The virtual address format already tells you what page directory entry and page table entry to write to. I.e. they are one to one. (Note that virtual address->page is one to one. Virtual address->frame is not - but that is something you shouldn't worry about now.)

So if you just updated some page - you already know the virtual address it corresponds to.

I eventually understand the concept of "reserving a space for yourself in yourself", i.e. recursive page tables. That is a really good trick. As the Windows Internals writes, it does not map the last PDE to itself but some other PDE, it works the same way though. Basically, the PDE becomes PTE and what the PDE was originally pointing to becomes what recursive page table access is writing to. Therefore I just need to write the physical address of the physical page to the PDE and everything else is just direct vaddr access. My mistake was that I didn't realize when accessing the page tables itself in the virtual address space actually shifts PDE to PTE and PTE to the address you are writing to. Now it's all clear!

Thanks so much!
HyperAssembler
Member
Member
Posts: 36
Joined: Thu Sep 04, 2014 5:24 pm
Location: !SIGSEGV

Re: Infinitely recursive page table allocation

Post by HyperAssembler »

Brendan wrote:Hi,

I think it's story time...

Once upon a time there was a massive noob that had no idea what they were doing and made a huge number of beginner mistakes (note: eventually Linus learnt a lot more and eventually became very knowledgeable, but he wasn't born like that). As time passed some of their original mistakes (like using hardware task switching, building a floppy disk boot loader directly into the kernel, etc) were fixed because they were easy to fix; and some of their original mistakes (like failing to separate physical memory manager from kernel heap, and assuming all physical RAM will always be small enough to be mapped into kernel space) couldn't be fixed because far too much other code (drivers, etc) grew to depend on the original idiocy.

About 20 years later, beginners writing their own kernel see incredible stupidity in Linux and make the mistake of assuming it makes sense, and copy the same retarded nonsense (like not separating physical memory manager from kernel heap, and assuming all physical RAM will always be small enough to be mapped into kernel space). Then these unfortunate victims come to this forum when it all breaks, and I tell them a little story... ;)


Cheers,

Brendan
Well I don't copy stuff/ideas form linux. I only use linux as a reference to how something might be implemented. E.g. I don't mix PMM and heap management - allocating page frame on kernel heap was my idea (didn't copy from linux). It just turned out to be not working.

Anyways, you are right. I still have tons of stuff to learn and linux is not the "standard" operating system (I don't like that linux does not have a HAL). Just like your signature - always pursue perfection!

Thanks! This forum is awesome!
Post Reply