Re: My image of memory map
Posted: Sat Feb 28, 2009 4:05 am
by AJ
Hi,
I don't see any problems with the above, so my suggestions are about design - of course, you may disagree with them:
1) What if you had the kernel itself at the 3 GiB mark and the "kernel startup code" somewhere higher up? That way, when you reclaim the physical memory for the bootstrap code, you could perhaps reclaim some of the virtual address space for your kernel heap too. I guess this would only work if:
a) The kernel bootstrap does not set up the kernel heap.
b) The size of the kernel image is known before the bootstrap is loaded.
2) How about having the user space heap directly above the user app, and the libraries starting at the top of user space and growing down? That may be better in the long run for lazy loading of user libraries and means you don't have to set a finite space for user libraries at load time.
3) I don't see any space for the user stack (this may be me missing something).
4) What do you mean by gaps 'n holes? Is that just another guard page?
5) Do you intend to map the current process' paging structures somewhere into the virtual address space?
Sorry if you've taken any of the above into account and I'm just being pedantic.
Cheers,
Adam
Re: My image of memory map
Posted: Sat Feb 28, 2009 9:42 am
by Brendan
Hi,
Assuming that the kernel itself isn't too buggy (doesn't use uninitialized pointers, etc), there's no need for the last invalid page (at 0xFFFFF000) or the guard page (at K_STACK_END). If you do have a guard page below the kernel stack then you'll probably get triple faults if the kernel fills up its stack, because the page fault handler (and double fault handler) won't have a valid/usable stack.
I don't know how you're doing kernel stacks. Some OSs have one kernel stack for each thread/task/process (which costs more RAM). Some OSs have one kernel stack per CPU (which uses less RAM but means the kernel can't easily be preempted). In both cases you need more than one kernel stack. However, if each CPU (or thread/task/process) uses a different page table you can have different stacks (using different pages) at the same linear address. This saves space but increases the overhead of task switching (because you need to patch page directories during task switches). If you never support multiple CPUs and never allow the kernel to be preempted, then you only need one kernel stack (but I wouldn't recommend this as it's too limiting).
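For the "same linear address, different physical pages" approach, the task switch ends up doing something like this (untested sketch; it assumes a single-page kernel stack, and every name here is made up for illustration):
Code: Select all
/* Untested sketch: each thread gets its own page table for the kernel
 * stack area, and the task switch patches the page directory so the
 * fixed linear address points at the next thread's stack. */
#include <stdint.h>

#define KSTACK_TOP        0xFFBFF000u          /* example fixed linear address */
#define KSTACK_PDE_INDEX  (KSTACK_TOP >> 22)   /* page directory slot for it   */

struct thread { uint32_t kstack_pt_phys; /* ... */ };

void switch_kernel_stack(uint32_t *page_directory, struct thread *next)
{
    /* Re-point the stack region at the next thread's private page table,
     * then flush the stale TLB entry for the (single page) stack. */
    page_directory[KSTACK_PDE_INDEX] = next->kstack_pt_phys | 0x003;
    asm volatile("invlpg (%0)" :: "r"(KSTACK_TOP) : "memory");
}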
I'm not sure what the "nothing, unmapped" area (at K_TRAP_PAGES_END) is for, or the "Usermode traps for kernel API" area.
I'm not sure why you've got one area for "Kernel initial ramdisk and init" and a separate area for "Kernel startup code", and I'm not sure why these things aren't in the area for the kernel heap (from K_HEAP_START to K_HEAP_END); e.g. so that later on you can "free()" these areas and return this space to the kernel heap (rather than leaving an unused hole below the heap). Typically I put things like this in user-space - basically the initial address space is built around the boot code and kernel, so that the boot code and kernel initialization stuff ends up looking like a normal process (and eventually just does "exit()" when it's finished).
I'm also not sure how you're planning to use the area labeled "Process pagetables". Normally I map the page directory into itself to get a 4 MiB mapping of all page directory entries and page table entries. On top of this I map the page directory for each process into a big table (so that the kernel can access the page directory for any process without messing with paging). I'm not sure if you've taken PAE into account either (for PAE you may need twice as much space, and you end up with a different page directory for each GiB of linear space, which is very handy in some cases - e.g. you end up with one shared page directory for kernel-space, with up to 3 page directories per process). PAE is probably important because it lets a 32-bit OS running on a 32-bit CPU support up to 64 GiB of RAM (on 64-bit CPUs it's even better - a 32-bit OS can usually use PAE to support a lot more RAM and NX/XD protection).
Finally, the entire user-space area is none of the kernel's business. The kernel should provide primitives (e.g. "allocPages()", "freePages()", etc) and let the process do its own memory management for the entire area from 0x00000000 to 0xBFFFFFFF. If you need an area for shared libraries and nothing else, then reserve an area for shared libraries and nothing else (e.g. from 0x80000000 to 0xBFFFFFFF) and shift it out of user-space (but this shouldn't be necessary). Also you may have problems with the userspace stacks - if the process has 200 threads, do all threads share the same stack?
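To be clear about what I mean by those primitives, it's roughly this sort of interface - the signatures are only my guess, the kernel just maps and unmaps pages wherever the process asks:
Code: Select all
#include <stddef.h>

/* Guessed signatures only. The kernel maps/unmaps pages on request; the
 * process' own code decides how 0x00000000 - 0xBFFFFFFF is carved up
 * (heap, thread stacks, shared libraries, memory mapped files, ...). */
void *allocPages(void *where, size_t count);   /* map 'count' pages at 'where'   */
int   freePages(void *where, size_t count);    /* unmap 'count' pages at 'where' */

/* e.g. a process growing its own heap at an address it picked itself:  */
/*   allocPages((void *)0x10000000, 16);       -- 64 KiB of new heap    */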
Cheers,
Brendan
Re: My image of memory map
Posted: Sun Mar 01, 2009 7:54 pm
by Brendan
Hi,
berkus wrote:Do you have estimates on how big a typical pagetable would be? I guess several thousands of pages active at a time for processes with typical working set (i.e. text editor or a web browser). And with careful memory allocation it shouldn't need too sparse page-tables, so their size will be less than typical 4MB.
You don't need an estimate - you can calculate exactly how much space the maximum would be. For example, (for plain 32-bit paging) if a process can use up to 3 GiB then you need at least 3 MiB space reserved for page table mappings (in case the process grows). However, in this case it'd be better/easier to reserve 4 MiB of space and just insert the page directory into itself (so that the same mapping can be used by the kernel to manage the kernel's part of the address space). Otherwise you'd need to have a page directory for the process and a page table in kernel space for the mapping (where both of these things contain almost identical data).
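The arithmetic behind those figures (plain 32-bit paging, 4 KiB pages, 4-byte page table entries) is simple enough to check:
Code: Select all
#include <stdio.h>

/* Where the "3 MiB" and "4 MiB" figures come from. */
int main(void)
{
    unsigned long long ptes_3g = (3ULL << 30) / 4096;      /* 786432 PTEs        */
    unsigned long long ptes_4g = (4ULL << 30) / 4096;      /* 1048576 PTEs       */

    printf("%llu KiB for 3 GiB\n", (ptes_3g * 4) >> 10);   /* 3072 KiB = 3 MiB   */
    printf("%llu KiB for 4 GiB\n", (ptes_4g * 4) >> 10);   /* 4096 KiB = 4 MiB   */
    return 0;
}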
Cheers,
Brendan
Re: My image of memory map
Posted: Mon Mar 02, 2009 5:48 am
by Brendan
Hi,
berkus wrote:Brendan wrote:You don't need an estimate - you can calculate exactly how much space the maximum would be. For example, (for plain 32-bit paging) if a process can use up to 3 GiB then you need at least 3 MiB space reserved for page table mappings (in case the process grows). However, in this case it'd be better/easier to reserve 4 MiB of space and just insert the page directory into itself (so that the same mapping can be used by the kernel to manage the kernel's part of the address space). Otherwise you'd need to have a page directory for the process and a page table in kernel space for the mapping (where both of these things contain almost identical data).
So there will be 4MiB per process reserved for page tables. But it can be mapped into the same location in virtual memory - since kernel pages are mapped into the process' page table, the kernel can just reuse it for its own addressing. I would think it's a waste to preallocate all 4MiB and use a special reserved address space area for this. To save space we can have only page directories in the special reserved area; the kernel's page tables will reside somewhere in the kernel heap area and the process' page tables somewhere in the process heap area, but accessible only for the kernel to manipulate. Are these correct assumptions or am I missing something?
If you pretend that the page directory is a page table and create a page directory entry for it (e.g. "pageDirectory[1023] = pageDirectory | flags"), then the page directory will point to itself. In this case the area from 0xFFC00000 to 0xFFFFFFFF will become a 4 MiB page table mapping automatically, and it won't cost you any RAM at all (only 4 MiB of space).
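In code it's a single entry to set up, and after that finding any paging structure is plain address arithmetic - an untested sketch (0x003 is just present + writable, the helper names are made up):
Code: Select all
#include <stdint.h>

/* Make the page directory double as its own last page table, so that
 * 0xFFC00000 - 0xFFFFFFFF becomes a window onto every page table. */
void map_directory_into_itself(uint32_t *page_directory, uint32_t page_directory_phys)
{
    page_directory[1023] = page_directory_phys | 0x003;    /* present + writable */
}

/* PTE that maps a given linear address: */
static inline uint32_t *pte_for(uint32_t virt)
{
    return (uint32_t *)(0xFFC00000u + (virt >> 12) * 4);
}

/* ...and the page directory itself shows up at the very top: */
#define CURRENT_PAGE_DIRECTORY ((uint32_t *)0xFFFFF000u)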
I guess I should also point out that having a "higher-half 1GB mapping" and using the kernel's heap to allocate physical pages is something that works great for a tutorial, but in practice it's only suitable for a toy OS (or maybe an OS designed for single-CPU 80386) - it's far too inflexible for a serious modern OS design.
Cheers,
Brendan
Re: My image of memory map
Posted: Mon Mar 02, 2009 6:06 am
by JohnnyTheDon
Would it be acceptable to extend mapping a page directory into itself to long mode? For example, could you map each process's PML4 into itself?
Re: My image of memory map
Posted: Mon Mar 02, 2009 7:15 am
by xenos
JohnnyTheDon wrote:Would it be acceptable to extend mapping a page directory into itself to long mode? For example, could you map each process's PML4 into itself?
That works indeed - I use it for my OS. I enter the PML4 as the last PDP, resulting in a nice chain reaction:
The last entry of the PML4 points to the PML4 again, thus the PML4 works as the last PDP.
=> The last entry of the last PDP points to the PML4 again, so the PML4 also works as the last page directory.
=> The last entry of the last page directory points to the PML4 again, so the PML4 also works as the last page table.
=> The last entry of the last page table points to the PML4, mapping the PML4 to the top of virtual memory.
=> Entries 0 - 511 of the last page table point to the PDPs 0 - 511, mapping them right beneath the PML4.
=> The last entry of the last page directory points to the PML4, so the PDPs are the last 512 page tables.
=> Each entry of a PDP points to a page directory. As these are the entries of the last 512 page tables, all page directories are mapped to the top of memory.
=> The last entry of the last PDP points to the PML4, so the PDPs are the last 512 page directories and the page directories are the last 262144 page tables. This finally maps the page tables at the top of memory.
So, the final virtual memory layout at the top of virtual memory looks like this:
- 0xfffffffffffff000 = last page table = last page directory = last PDP = PML4
- 0xffffffffffe00000 = first PDP
- 0xffffffffc0000000 = first page directory
- 0xffffff8000000000 = first page table
This also makes page allocation very simple. If a page table does not exist, I just allocate it by calling the page allocation function recursively, so it automatically creates a page table / page directory / PDP chain until it hits the PML4.
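Written out, the address arithmetic is the same as in the 32-bit case, just with 8-byte entries and slot 511 as the recursive entry (a rough sketch; the helper names are only for illustration):
Code: Select all
#include <stdint.h>

/* Base of the page table window: sign-extended (511 << 39). */
#define PT_WINDOW_BASE 0xFFFFFF8000000000ULL

/* Virtual address of the PTE that maps 'virt' (48-bit canonical address). */
static inline uint64_t *pte_for(uint64_t virt)
{
    uint64_t page = (virt >> 12) & 0xFFFFFFFFFULL;   /* 36-bit page number */
    return (uint64_t *)(PT_WINDOW_BASE + page * 8);
}

/* The structures themselves, at the addresses listed above: */
#define PML4_VIRT       ((uint64_t *)0xFFFFFFFFFFFFF000ULL)
#define FIRST_PDP       ((uint64_t *)0xFFFFFFFFFFE00000ULL)
#define FIRST_PAGE_DIR  ((uint64_t *)0xFFFFFFFFC0000000ULL)
#define FIRST_PAGE_TBL  ((uint64_t *)0xFFFFFF8000000000ULL)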
Re: My image of memory map
Posted: Mon Mar 02, 2009 9:00 am
by Brendan
Hi,
berkus wrote:Brendan wrote:I guess I should also point out that having a "higher-half 1GB mapping" and using the kernel's heap to allocate physical pages is something that works great for a tutorial, but in practice it's only suitable for a toy OS (or maybe an OS designed for single-CPU 80386) - it's far too inflexible for a serious modern OS design.
Can you point out the inflexibilities and viable alternatives for kernel memory allocation in this case? Links will do if you don't want to type it in (again).
For "plain 32-bit paging", if the computer has 3 GiB of RAM how will you map it into 1 GiB of kernel space; and for PAE, if the computer has 64 GiB of RAM how will you map that into 1 GiB of kernel space? If you have the 1 GiB mapping plus some other way to handle the extra RAM, then why do you need the 1 GiB mapping?
Also, there's plenty of ways that paging can be used to improve performance, reduce RAM usage and/or improve fault tolerance. If you use a 1 GiB mapping for kernel space, then you can't do any of these things in kernel-space. Examples include "zeroed page" optimization (instead of having many pages full of zeros, just use the same page full of zeros lots of times and allocate a new page when anyone tries to write to it), optimized data moving (move page table entries instead of the data itself), copy-on-write (copy page table entries and allocate a new page if anyone tries to write to the page), swap space (no reason you can't send kernel pages to swap space to free up some rarely used RAM, and page tables and page directories could be sent to swap too). Then there's NUMA optimizations (e.g. see this topic). Now imagine a computer with ECC RAM, where the RAM controller frequently needs to correct RAM that is being used by the kernel - why not mark that page of RAM as faulty and replace it with a good page of RAM that doesn't need to be repeatedly corrected (which would reduce risk because ECC doesn't catch all possible RAM errors, and would improve performance because it takes time for the hardware and/or SMM code to correct the RAM errors)?
Of course this is just a short/quick list from the top of my head - there's probably more, and it's possible that you might think of some new way of using paging in kernel space that nobody else has attempted.
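As one concrete example, the copy-on-write case is just a page fault handler doing roughly this (untested sketch - the helpers and the PTE_COW bit are assumed, not from any real kernel):
Code: Select all
#include <stdint.h>

#define PTE_PRESENT  0x001u
#define PTE_WRITABLE 0x002u
#define PTE_USER     0x004u
#define PTE_COW      0x200u    /* one of the "available to software" bits */

/* Assumed helpers, declared here only so the sketch is self-contained: */
extern uint32_t *pte_for(uint32_t virt);
extern uint32_t  alloc_phys_page(void);
extern void      copy_phys_page(uint32_t dest_phys, uint32_t src_phys);
extern void      put_phys_page(uint32_t phys);
extern void      invlpg(uint32_t virt);

void handle_cow_fault(uint32_t fault_addr)
{
    uint32_t *pte = pte_for(fault_addr);

    if (!(*pte & PTE_COW))
        return;                           /* not copy-on-write - not our problem */

    uint32_t shared_phys  = *pte & 0xFFFFF000u;
    uint32_t private_phys = alloc_phys_page();

    /* Give this process its own copy of the page, then make it writable. */
    copy_phys_page(private_phys, shared_phys);
    *pte = private_phys | PTE_PRESENT | PTE_WRITABLE | PTE_USER;
    invlpg(fault_addr);

    put_phys_page(shared_phys);           /* drop one reference to the shared page */
}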
Cheers,
Brendan
Re: My image of memory map
Posted: Mon Mar 02, 2009 10:17 am
by Brendan
Hi,
berkus wrote:So we just stick to regions and mappings and allocate memory at will without any fixed mappings, is that what you suggest, Brendan? This way only the kernel info page and syscalls table should have some known/fixed location and the rest is dynamic. Or did I misunderstand something?
I'd have a physical memory manager that's used by the linear memory manager, that's used by the heap. In kernel space I'd use a fixed location for the kernel and for the 4 MiB page table mapping, and let the kernel use all remaining kernel space for the kernel's heap.
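In other words, three layers with roughly these responsibilities (placeholder names, not an actual API):
Code: Select all
#include <stddef.h>
#include <stdint.h>

/* 1) Physical memory manager - hands out physical page frames.          */
uint32_t phys_alloc_page(void);
void     phys_free_page(uint32_t phys);

/* 2) Linear memory manager - maps frames into the address space,        */
/*    built on top of the physical memory manager.                       */
void *linear_alloc(void *virt, size_t pages, unsigned flags);
void  linear_free(void *virt, size_t pages);

/* 3) Heap - fine grained allocations, built on top of the linear        */
/*    manager (grows itself with linear_alloc() when it runs dry).       */
void *kmalloc(size_t bytes);
void  kfree(void *ptr);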
For boot, I'd do something like:
- GRUB loads setup code at 0x00100000 and initial RAM disk somewhere else (as a module)
- GRUB starts setup code
- (Optional) Setup code decompresses initial RAM disk
- Setup code initializes paging and identity maps itself at 0x00100000, and maps the initial RAM disk at a fixed location (e.g. 0x20000000)
- (Optional) Setup code detects some stuff and decides which kernel is needed (e.g. 32-bit plain paging, 32-bit PAE or 64-bit; single-CPU, SMP or NUMA; etc)
- Setup code finds the kernel in the initial RAM disk
- (Optional) Setup code decompresses kernel
- Setup code maps the kernel at 0xC0000000
- Setup code helps initialize the kernel
- Setup code finds the "init (kserver)" in the initial RAM disk, and maps it at 0x00001000
- Setup code starts the "init (kserver)"
- The "init (kserver)" frees any pages that were used by the setup code
- The "init (kserver)" is now running like a normal process
This leaves an address space that looks something like this:
Code: Select all
4 GiB +-------------------------+ 0xFFFFFFFF
| Page Table Mapping |
|_________________________| 0xFFC00000
| | K_HEAP_END
| |
| Kernel heap |
| |
|_________________________| K_HEAP_START
| | K_SPACE_END
| Kernel itself |
3 GiB |_________________________| 0xC0000000
| |
| Unused Space |
|_________________________|
| |
| Initial RAM disk image |
|_________________________|
| |
| Init/Kserver's Heap |
|_________________________|
| |
| Init/Kserver |
|_________________________|
| Unmapped 4K page |
0 GiB +_________________________+ 0x00000000
Of course for other processes the entire area from 0 GiB to 3 GiB would be different (depending on executable file header, etc).
Cheers,
Brendan
Re: My image of memory map
Posted: Mon Mar 02, 2009 6:14 pm
by Brendan
Hi,
berkus wrote:One probably impractical thing I'm thinking of is: will there be any benefit if kernel was mapped straight below page table mapping with kernel heap growing down towards 3GiB mark. I can see possible problems with that with PAE, but maybe I just don't understand it well enough.
For PAE, for page table mapping/s you'd need 4 separate 2 MiB areas (because there's 4 page directories that manage 1 GiB of space each), plus a 32 byte area for the Page Directory Pointer Table. The 4 separate 2 MiB areas can be contiguous (so it looks like one 8 MiB area), but they don't need to be. This also means that if kernel space is 3 GiB you can use the same "kernel page directory" in all address spaces (same page directory in all Page Directory Pointer Tables), which makes it easier to allocate/free page tables in kernel space (no need to modify the address spaces for all other processes).
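The figures fall straight out of the entry sizes (4 KiB pages, 8-byte entries under PAE):
Code: Select all
/* One page directory covers 1 GiB, so mapping its page tables takes
 * (1 GiB / 4 KiB) entries * 8 bytes = 2 MiB, hence 4 areas of 2 MiB,
 * and the Page Directory Pointer Table is 4 entries * 8 bytes = 32 B. */
unsigned long long pae_pt_space_per_gib = ((1ULL << 30) / 4096) * 8;   /* 2 MiB    */
unsigned long long pae_pdpt_size        = 4 * 8;                       /* 32 bytes */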
However, there shouldn't be any reason why different kernels (or different versions of the same kernel) need to have the same layout in kernel space - the only thing that really matters is that the kernel API is the same. For example, the PAE kernel might be mapped at 0xFF000000 with the heap below it, while the 32-bit plain paging kernel might be mapped at 0xC0DE0000 with the heap above it.
berkus wrote:A benefit could be more dense kernel pagetable and no artificial 3GiB limit for processes (although it will be still artificial but a bit higher above 3GiB, probably not worth pursuing for a desktop/mobile OS).
The page table mappings always use entire page table/s, so you can't pack anything else in (to get denser page table usage). About the only thing you could do is let the kernel's code share a page table with the kernel's heap (but you'll probably end up doing that regardless of where the kernel is).
Cheers,
Brendan