Kernel heap management
Hi!
I'm thinking of the best (or simply a better) way to manage kernel heap...
The previous implementation I was using was not great. Basically, this is what I'm doing now: at the beginning of kernel initialization, before I enable paging, I map the whole 0 to 1.5 GiB region into the kernel address space. The problem with this is that if the machine has more than 1.5 GiB of physical memory, the kernel can't access anything above 1.5 GiB, so that memory is unusable.
Now I've started thinking about a better way to manage the kernel heap. The first idea I had is the following: before enabling paging, I only map the addresses required by the kernel image (from 1 MiB to 1 MiB + kernel_size). When kmalloc() (the kernel version of malloc) is called, it allocates n pages to work with (if it hasn't done so already). The function kmalloc calls to allocate pages reserves some pages of physical memory and maps them into the kernel address space. After this kmalloc is happy, and I can use every piece of physical memory I want. The only problem is that if I want to manage the memory regions allocated by the kmalloc page allocator, I have to allocate memory for some kind of structures that store the required information; but the first time kmalloc calls its page allocator, there is no memory available for kmalloc to allocate those structures from.
What do you think about it? What would be the best idea to manage kernel heap?
Thanks,
k0ksz
Re: Kernel heap management
Hi,
k0ksz wrote:The only problem with this is if I want to manage the memory regions allocated by the kmalloc page allocator I have to allocate memory for some kind of structures that will store the required information. But the first time kmalloc calls its page allocator there will be no usable memory by kmalloc to allocate for the mentioned structure.
Think of it like this. You've got a physical memory manager that allocates/frees physical pages of RAM. On top of that you've got a virtual memory manager that allocates/frees virtual pages in the virtual address space (either at a specific virtual address for user-mode code, or at a specific virtual address in kernel space for kernel-mode code). The virtual memory manager mostly just gets physical pages from the physical memory manager and adds them to the page tables, or (for freeing pages) removes pages from the page tables and tells the physical memory manager the page/s are free (but it would also probably/eventually manage some other things, like swap space, memory-mapped files, etc.).
On top of that there's "malloc()" (in user space) and "kmalloc()" in kernel space; which both ask the virtual memory manager to allocate some pages and map them at some virtual address, then store their data for managing the heap in these pages, and then use this data to allocate/free space in the heap (while asking the virtual memory manager for more pages if/when necessary, and possibly freeing some pages if the heap has too many it's not using).
Your use of the words "kmalloc page allocator" makes me think you're trying to follow the tragically borked method that Linux has traditionally used; where they attempt to use the (high-level) heap manager as a (low-level) physical memory manager and then add a second physical memory manager to cope with everything that didn't fit in the massive wasted area that was mapped into kernel space for "kmalloc()" to use (note: if the kernel only needs 512 MiB for its own code and its own data, then you could leave 3.5 GiB for user space instead of mapping 1.5 GiB into kernel space and only leaving 2.5 GiB for user space).
If this is the case, then my advice is obvious - learn from their mistakes...

Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Kernel heap management
My question is on this topic, so I put it here instead of making a new thread. Mods feel free to split if this was an incorrect move.
Does the kernel's heap generally grow like user heaps? If it does, one has to write the new paging information (whether it be just a table entry, or linking a new table into a directory entry) to each process's tables (assuming each one has a different address space).
Assuming the kernel heap grows, would it be better to immediately put the new information into all directories, or for the page fault handler to recognize the need for change later?
Perhaps the fault handler is better: if the page is never used by the kernel in the context of the other process, it doesn't matter that it was never mapped in there.
Re: Kernel heap management
Hi,
Hangin10 wrote:If it does, one has to write the new paging information (whether it be just a table entry, or linking the new table to a directory entry) to each process's tables (assuming each one has a different address space).
The easiest way is to map in the kernel page tables for each process at boot time and copy this information when a process is initialised. This "wastes" 1 MiB of memory (assuming a 32-bit system with the kernel at 3 GiB (0xC0000000)), but means that each time you update the kernel memory space, each process gets the new information for free.
If you have a 64-bit kernel, do the same thing: simply fill in the PML4 entries for the kernel at boot time (in other words, create PDPTs for kernel space). In some cases you will only have your kernel in the top of RAM anyway, and will thus only "waste" 4 KiB for a single PDPT.
Cheers,
Adam
Re: Kernel heap management
But if kmalloc is called in the context of one process (say, during some syscall made by a user app) and the kernel heap expands, processes that were started before the current one will not have the paging info for the new part of the kernel heap.
Perhaps I'm not understanding correctly: based on your response, it would seem the kernel heap is statically allocated, and kmalloc does not expand it with new pages when needed (i.e., sbrk-style) like user malloc does for applications.
EDIT: I already know how to map the kernel into each process at process creation time; my question is only about possible kernel heap expansion. (just for greater clarification).
Re: Kernel heap management
This really works - try it! I would do an ASCII art picture to explain but am rubbish at that 
Let's continue with the example of a kernel mapped at 0xC0000000 (with kernel space extending to 0xFFFFFFFF). At boot time you create a single 'template' kernel page directory and 1MiB worth of page tables. You fill in the last quarter of the PD with entries pointing to these page tables (so you fill the PD from entries 0x300 to 0x3FF).
When a new process starts, the end of the PD (from entry 0x300 to 0x3FF) should be copied to the new PD for that process. All the kernel page tables are therefore present for every process.
This means that when you need a few more pages due to a PFE happening in an allocated area, you only need to worry about Page Table Entries. Mapping a PTE in kernel space in one process will automatically happen in every process due to the common page tables.
All this has nothing to do with the kernel heap - we are one level of abstraction under the heap at this point.
Cheers,
Adam

Re: Kernel heap management
*light bulb* Thanks!
So all the page tables for the address space of the kernel heap should be allocated at boot time and added to the template page directory; a lot of PTEs will just be non-present. Page Directory Entries don't need to be added except for the previously mentioned tables at boot, and invalidation takes care of itself when loading CR3.
Thanks a lot. I've been wondering about this for ages.
Re: Kernel heap management
Glad it helped! It's one of those things that can be a bit tricky to explain - too many PDs, PDEs, PTs and PTEs

Re: Kernel heap management
AJ wrote:This means that when you need a few more pages due to a PFE happening in an allocated area, you only need to worry about Page Table Entries. Mapping a PTE in kernel space in one process will automatically happen in every process due to the common page tables.
I am sorry but I lost you there. If every process has its own page directory, how will this work? How will mapping a kernel PTE (0x300 - 0x3ff) in one page directory automatically propagate to the other page directories?
Re: Kernel heap management
raghuk wrote:I am sorry but I lost you there. If every process has it's own page directory how will this work? How will mapping a kernel PTE (0x300 - 0x3ff) to one page directory will automatically clone to other page directories?
Each process has a page directory, which has entries (PDEs) for page tables, which have entries (PTEs) for pages (PGs) of RAM.
|---------------------------------------------------------------------------------------------------|
|Fig 1. Page Directory/Page Table/Page Relationship |
|--------------------------------------------Page Directory-----------------------------------------|
| PDE | PDE | PDE | PDE | PDE |
|---------------------------------------------------------------------------------------------------|
| PTE | PTE | PTE | PTE | PTE | PTE | PTE | PTE | PTE | PTE |
|---------------------------------------------------------------------------------------------------|
|PG|PG|PG |PG|PG|PG |PG|PG|PG |PG|PG|PG |PG|PG|PG |PG|PG|PG |PG|PG|PG |PG|PG|PG |PG|PG|PG |PG|PG|PG |
|---------------------------------------------------------------------------------------------------|
The kernel page tables are all allocated up front, so the configuration of kernel space (the page tables) is already in RAM, but can be modified later.
|---------------------------------------------Page Tables-------------------------------------------|
|Fig 2. Kernel Page Tables |
|(Assuming a 32-bit higher-half kernel at 3GB) (NP = Not Present)|
|0xC0000000<----------------------------------------------------------------------------->0xFFFFFFFF|
| PTE | PTE | PTE | PTE | PTE | PTE | PTE | PTE | PTE | PTE |
|---------------------------------------------------------------------------------------------------|
|PG|PG|PG |NP|NP|NP |NP|NP|NP |NP|NP|NP |NP|NP|NP |NP|NP|NP |NP|NP|NP |NP|NP|NP |NP|NP|NP |NP|NP|NP |
|---------------------------------------------------------------------------------------------------|
|---------------------------------------------------------------------------------------------------|
|Fig 3. Process Page Directory (Kern = Kernel Pagetables)|
|0x00000000< --------------------------------Page Directory--------------|0xC0000000----->0xFFFFFFFF|
|---------------------------------------------------------------------------------------------------|
| PDE | NP | NP | Kern | Kern |
|---------------------------------------------------------------------------------------------------|
With the kernel page tables being mapped into every process, rather than copied, you will see any new kernel pages at the next TLB flush (generally performed at the end of the system call).
Hope that's somewhat helpful :-S
-- inx
Re: Kernel heap management
Inx, Thanks for the explanation.
So while initializing the kernel you allocate space for all the page tables for the top 1 GiB, but only the PTEs needed for the resident kernel are initially marked present. When the kernel heap grows we keep filling in the PTEs, but the PDEs do not need to change.
However, these page tables will always occupy (1 GiB / 4096) * 4 = 1 MiB of memory (assuming 4 KiB pages). That is a complete waste of precious RAM.
Re: Kernel heap management
raghuk wrote:inx, Thanks for the explanation.
Not a problem, I'm just happy it made sense.

raghuk wrote:However, these page tables will always occupy (1GiB / 4096) * 4 = 1MiB of memory (assuming 4K pages)
That's quite true. There are a couple of ways to approach that if you're concerned about it:
1) You can flop the balance of processor vs memory usage and just copy each new allocation into each process, wasting processor time,
or
2) You can weigh your expectations and allocate less than 1GB of space for your kernel, balancing out the memory wastage.
Personally, my kernel uses option 2, allocating 512 MB to kernel space if there's more than 128 MB of RAM, and 128 MB if there's less. For my kernel, both of these numbers are ludicrously high. There's almost no way I would ever reach those maximums with my architecture, but either way it doesn't waste enough space for it to matter a whole lot.
Re: Kernel heap management
Thank you so much.
I think I'll go with a combination of 1 & 2. My kernel is pretty small at the moment, just 28 KB; I started only a couple of weeks back. I do not think it will go beyond 64 MiB in the near future. So I'll start with page tables to cover that, and then if I need more, I'll allocate page tables for another 64 MiB and update all tasks.
Re: Kernel heap management
I thought it was like so:
(I'm using Pascal... bear with the logic for now.)
The whole PD contains 1024 PDEs; each PDE points to a PT, and each PT contains 1024 PTEs.
Assuming I'm following y'all correctly, the PD itself is 4 KiB, each PT is 4 KiB (all 1024 PTs together come to 4 MiB), and a PTE (frame entry) is 4 bytes; the page directory as a whole covers 4 GiB of address space. Took me a minute to figure out what a frame was and rework the code.
I can alloc/free pages just fine.
I am having issues with the heap allocation, and the assumption was terrible *NIX logic...
Mario Ray copied the FPC RTL heap over (so we now have the same logic as the Linux kernel, with the double-heap crap...).
Paging works fine.
But I get bounds errors (MemAlloc exceeds the end of physical RAM) when using MemAlloc in my tasking unit.
If we don't use two methods of allocating a stack, then what DO we use?
(I can read your C just fine, but find it a bit obscure; that being a C fault. To each his own on language, really don't care... looking for the logic behind the code.)
BTW:
The whole need for the PAE/NX bit is due to an M$ problem (again, M$ declaring what the industry does...) with GDT/IDT emulation [which has been hacked, by the way... it's a ten-year-old unpatched exploit]. If they used the GDT like they were supposed to, they wouldn't have half the issues that they implement. (XP still has 16-bit apps lying around...)
If I've gotten to threads, this obviously isn't homework... I have major rewrites underway for the DPMI handling of interrupts (in BP7 of all compilers...). Wouldn't have spent nearly three years on homework anyway...
Re: Kernel heap management
shadowH3: agree, only 2 small things:
PAE was invented to expand the physical address size (to access more physical memory than 4 GiB), and it was first used by Linux (if I remember well).
NX was already implemented by almost every non-Intel chip (like SPARC) before M$ ran into its problem (nevertheless, you're correct that M$ pushed Intel for this).