
Memory Organization in kernel context

Posted: Sat Oct 03, 2015 7:34 am
by kailot2
Hi, I'm interested in how you dispose of the address space inside the kernel.
I had planned to reserve the first 16 MB of memory for the kernel and use it, but I don't like this scheme.
Share your ideas on this subject.

Re: Memory Organization in kernel context

Posted: Sat Oct 03, 2015 12:21 pm
by BASICFreak
kailot2 wrote: Hi, I'm interested in how you dispose of the address space inside the kernel.
I had planned to reserve the first 16 MB of memory for the kernel and use it, but I don't like this scheme.
Share your ideas on this subject.
I dislike that scheme too; the main reason is that I test with 32MB (so using half of it for nothing is a no-go).

For physical memory management here is my design:
Currently I reserve 512KB (0x80000) for ISA DMA and BIOS/boot info, then I reserve the kernel and module addresses.
Once the modules are initialized I free the space their ELFs were occupying. At the end of the day the kernel alone uses about 4MB (with the reserved space counted as used - 0x281000 bytes of that is kernel reserved areas).

I'm using something similar to the memory map from INT 0x15 E820h to keep track of free space. Each entry is 64 bits, DWORD Base and DWORD Length.
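
For illustration only, here's a minimal C sketch of how such base/length entries and a simple first-fit allocation over them might look (the type names, the fixed-size array, and phys_alloc() are assumptions for this sketch, not my actual code):

Code:
#include <stdint.h>

/* Hypothetical free-region entry: 64 bits total, a DWORD base and a
 * DWORD length, similar in spirit to an E820-style map entry.        */
typedef struct {
    uint32_t base;    /* physical start address of the free region */
    uint32_t length;  /* size of the free region in bytes          */
} free_region_t;

#define MAX_REGIONS 64                /* arbitrary limit for this sketch */
static free_region_t regions[MAX_REGIONS];
static int region_count;

/* Allocate 'size' bytes (assumed page-aligned) from the first region
 * that is big enough; returns the physical base, or 0 on failure.    */
static uint32_t phys_alloc(uint32_t size)
{
    for (int i = 0; i < region_count; i++) {
        if (regions[i].length >= size) {
            uint32_t base = regions[i].base;
            regions[i].base   += size;
            regions[i].length -= size;
            return base;
        }
    }
    return 0;   /* no free region large enough */
}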

Each thread takes a minimum of 8KB with no code and no stack, 16KB with minimal code and no stack, and 24KB with code and stack - but my temporary user heap manager initializes 516KB per thread (if used).



Hope this gives you some idea of things. If not, ask a specific question.

Re: Memory Organization in kernel context

Posted: Sat Oct 03, 2015 10:57 pm
by Brendan
Hi,
kailot2 wrote: Hi, I'm interested in how you dispose of the address space inside the kernel.
Disposing of a virtual address space is mostly just freeing the pages that correspond to user-space (e.g. a loop that checks whether each page table is present; if it is, an inner loop checks whether each page within it is present and frees it, and then the page table itself is freed). This would happen when the process is being terminated (while the kernel is running in that virtual address space). It leaves behind a virtual address space that has the kernel mapped into it and nothing else. From there you switch to a different task.
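
As a rough illustration of that loop, here's a hedged C sketch; it assumes 32-bit paging with a 3 GiB user/kernel split, and the helpers pde_to_table() and free_phys_page() are hypothetical names rather than anything from my kernel:

Code:
#include <stdint.h>

#define PAGE_PRESENT     0x1
#define KERNEL_PDE_START 768            /* assume kernel lives at 0xC0000000+ */
#define ADDR_MASK        0xFFFFF000u

/* Hypothetical helpers provided elsewhere in the kernel:
 *   pde_to_table(pde)  - virtual pointer to the page table a directory
 *                        entry refers to (e.g. via a recursive mapping)
 *   free_phys_page(p)  - return a physical page to the allocator        */
extern uint32_t *pde_to_table(uint32_t pde);
extern void      free_phys_page(uint32_t paddr);

/* Free every user-space page and page table; leave the kernel's mappings
 * alone.  The caller switches to a different address space afterwards.   */
void destroy_user_space(uint32_t *page_dir)
{
    for (int pd = 0; pd < KERNEL_PDE_START; pd++) {
        if (!(page_dir[pd] & PAGE_PRESENT))
            continue;                          /* no page table here */

        uint32_t *table = pde_to_table(page_dir[pd]);
        for (int pt = 0; pt < 1024; pt++) {
            if (table[pt] & PAGE_PRESENT)
                free_phys_page(table[pt] & ADDR_MASK);   /* free the page */
        }
        free_phys_page(page_dir[pd] & ADDR_MASK);  /* then the table itself */
        page_dir[pd] = 0;
    }
}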

For the "virtual address space that has the kernel and nothing else mapped into it"; you have 2 choices. Either you destroy it almost immediately (after switching to a different task, when your kernel isn't still using it); or you keep it around and re-cycle it when a new process is created (to avoid destroying it then rebuilding the same thing later). The latter is a little faster (but probably not much) but is also more complicated (you might need to keep it up-to-date when page tables are allocated/freed in kernel space, you might want to destroy them later if physical memory is running out, etc).
kailot2 wrote: I had planned to reserve the first 16 MB of memory for the kernel and use it, but I don't like this scheme.
Share your ideas on this subject.
The first 16 MiB of physical memory contains the most valuable memory - it can be used for literally everything (including ISA DMA buffers). The area from 16 MiB to 4 GiB contains the second most valuable memory (it can be used for everything except ISA DMA buffers). The area above 4 GiB is the least valuable memory (it can't be used for PCI devices that require 32-bit physical addresses). It makes sense to use the least valuable memory for the kernel (and avoid wasting more valuable memory for no reason), especially during boot when you can't know how much of the more valuable memory will be needed by drivers, etc.
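
One way to act on that ordering (a sketch only; the zone names and the zone_alloc_page() helper are assumptions, not a real allocator) is to allocate the kernel's own pages from the least valuable zone first and only fall back to more valuable zones when necessary:

Code:
#include <stdint.h>

/* Hypothetical memory "zones", ordered from most to least valuable. */
enum zone {
    ZONE_ISA_DMA,    /* below 16 MiB  - usable by everything          */
    ZONE_BELOW_4G,   /* 16 MiB..4 GiB - everything except ISA DMA     */
    ZONE_ABOVE_4G,   /* above 4 GiB   - no 32-bit PCI devices either  */
    ZONE_COUNT
};

/* Assumed per-zone page allocator implemented elsewhere; returns the
 * physical address of a free page, or 0 if the zone is exhausted.     */
extern uint64_t zone_alloc_page(enum zone z);

/* Allocate a page for the kernel itself: try the least valuable zone
 * first, so the more flexible memory stays available for drivers.     */
uint64_t alloc_kernel_page(void)
{
    for (int z = ZONE_COUNT - 1; z >= 0; z--) {
        uint64_t page = zone_alloc_page((enum zone)z);
        if (page != 0)
            return page;
    }
    return 0;   /* completely out of physical memory */
}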

However; there are a few other things that come into this. The first is that boot code is often fairly limited (e.g. running in protected mode with paging disabled) and can't access memory above 4 GiB (and therefore can't easily place the kernel above 4 GiB, at least not without temporarily enabling PAE or long mode).

The second thing to consider is NUMA optimisation. For NUMA, each group of CPUs typically has "close memory" that it can access quickly and "far memory" that takes longer to access. For example, you might have 16 GiB of RAM where half the CPUs can access the first 8 GiB quickly and the second 8 GiB slowly; and the other half of the CPUs access the first 8 GiB slowly and the second 8 GiB quickly.

If you ignore NUMA and just put the kernel "wherever" (e.g. in one memory area) then that memory will be "close memory" for some CPUs and "far memory" for other CPUs; so some CPUs will be fast and others will be punished. You can balance this out - e.g. deliberately use a mixture of memory areas such that all CPUs end up roughly equal. However; for some of the RAM the kernel uses (e.g. kernel's code and read-only data) nothing prevents you from using multiple copies, so that all CPUs use their own fastest/closest copy. This can be extended a little - e.g. for kernel data that is rarely modified, you can still have multiple copies and update all copies when the data is changed. For kernel data that changes frequently the overhead of updating multiple copies can cost more than not using "fast memory" though.
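
To illustrate the "multiple copies" idea, here's a hedged sketch that assumes 32-bit paging, a per-node physical copy of the kernel's code/read-only data prepared at boot, and a hypothetical map_page() helper; each node maps the same virtual range onto its own closest copy:

Code:
#include <stdint.h>

#define PAGE_SIZE        4096u
#define KERNEL_TEXT_VIRT 0xC0100000u  /* assumed virtual base of kernel code  */
#define KERNEL_TEXT_SIZE 0x00100000u  /* assumed 1 MiB of code/read-only data */
#define MAX_NODES        8
#define FLAG_PRESENT     0x1          /* present, read-only, supervisor       */

/* Physical address of each node's private copy of the kernel's code and
 * read-only data, filled in at boot by copying the original image into
 * node-local RAM (names and layout are assumptions for this sketch).    */
static uint32_t kernel_text_copy_phys[MAX_NODES];

/* Assumed helper that maps one 4 KiB page into the given page directory. */
extern void map_page(uint32_t *page_dir, uint32_t virt, uint32_t phys,
                     uint32_t flags);

/* Build the kernel-code mappings for one NUMA node: the same virtual
 * addresses on every node, but backed by that node's closest copy.     */
void map_kernel_text_for_node(uint32_t *page_dir, int node)
{
    uint32_t phys = kernel_text_copy_phys[node];
    for (uint32_t off = 0; off < KERNEL_TEXT_SIZE; off += PAGE_SIZE)
        map_page(page_dir, KERNEL_TEXT_VIRT + off, phys + off, FLAG_PRESENT);
}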

The third thing to consider is fault tolerance. For example, if you decide the kernel will always use RAM at 1 MiB, but the firmware says (part of) that memory happens to be faulty, then you're screwed and the kernel can't boot using other memory. Even for systems with ECC you can have areas of memory that have a much higher number of corrected single bit errors, which are slower to use (due to frequent corrections) and have a much higher chance of seeing uncorrectable errors. A smarter OS (that's designed for better fault tolerance) could decide which pages to use for kernel during boot and avoid (some or all of) these problems.

Finally; there's no real reason why a kernel can't replace its own physical pages with other/better physical pages while it's running. For example, maybe the kernel has a "next page to check" counter and (continuously) checks 10 kernel pages per minute, improving them if/when better pages are available. If the boot loader was using some pages during boot and couldn't use them for the kernel (even though they would've been better for the kernel), the kernel would automatically switch to the better pages later; if some RAM the kernel is using starts to get a bit dodgy (e.g. starts to cause a lot of corrected single-bit errors) while the kernel is running, then the kernel will fix that too; etc. In this case it means the boot loader wouldn't need to worry so much about making sure the kernel is using the best pages. Note: it's also interesting for things like hot-plug RAM (where the best pages for the kernel might not have even existed during boot) and hot-remove RAM (where the kernel has to make sure a specific area of RAM isn't being used by anything, including itself, before it can be removed).
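
As a sketch of that "next page to check" idea (every helper name here is hypothetical, and a real kernel would also need locking and TLB shootdown around the remap):

Code:
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE      4096u
#define PAGES_PER_PASS 10        /* "10 kernel pages per minute" from above */

/* Hypothetical helpers assumed to exist elsewhere in the kernel:
 *   kernel_page_virt(i)     - virtual address of the i-th kernel page
 *   kernel_page_phys(i)     - physical page currently backing it
 *   find_better_page(old)   - a better free physical page, or 0 if none
 *   phys_to_temp_mapping(p) - temporary virtual window onto a physical page
 *   remap_kernel_page(v, p) - point the virtual page at a new physical page
 *   free_phys_page(p)       - release the old physical page                */
extern uint32_t kernel_page_count;
extern void    *kernel_page_virt(uint32_t i);
extern uint32_t kernel_page_phys(uint32_t i);
extern uint32_t find_better_page(uint32_t old_phys);
extern void    *phys_to_temp_mapping(uint32_t phys);
extern void     remap_kernel_page(void *virt, uint32_t new_phys);
extern void     free_phys_page(uint32_t phys);

static uint32_t next_page_to_check;

/* Called periodically (e.g. once per minute): examine a few kernel pages
 * and migrate any of them to a better physical page if one is available. */
void improve_kernel_pages(void)
{
    for (int n = 0; n < PAGES_PER_PASS; n++) {
        uint32_t i = next_page_to_check;
        next_page_to_check = (next_page_to_check + 1) % kernel_page_count;

        uint32_t old_phys = kernel_page_phys(i);
        uint32_t new_phys = find_better_page(old_phys);
        if (new_phys == 0)
            continue;                   /* the current page is already fine */

        /* Copy the contents into the new page, switch the mapping over,
         * then free the old page.  A real kernel would need locking and
         * care with pages it is currently executing from at this point.  */
        memcpy(phys_to_temp_mapping(new_phys), kernel_page_virt(i), PAGE_SIZE);
        remap_kernel_page(kernel_page_virt(i), new_phys);
        free_phys_page(old_phys);
    }
}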

For my OS; I do most of the things I've mentioned above (and some I haven't mentioned). I have no idea which physical pages my kernel uses because they are dynamically allocated by the boot code.


Cheers,

Brendan