Paging and Memory Allocation Woes

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
hexcoder
Posts: 15
Joined: Thu Oct 20, 2016 12:26 pm

Paging and Memory Allocation Woes

Post by hexcoder »

Hi,

I'm at a point where I'm trying to design my kernel in such a way that it doesn't come back to bite me later. My main issue is working out how the memory "bootstrapping" process should be handled, for a few reasons:

In the .bss section of my kernel I have a few page tables, enough to allocate up to 2MB. Having done so, the next thing I need to do memory wise is enable the physical memory manager. Regardless of the data structure I use (stack, bitmap etc.), I don't think I can avoid the requirement that the (maximum) memory usage of that data structure will be proportional to the amount of actual RAM I have available (e.g for a bitmap, if I have 1GB RAM then I need 1GB/4096 bits, but for 2GB I need 2GB/4096 bits. Roughly the same thing applies for a stack structure).

So, in order to initialise this memory I need to map it into the virtual address space. In order to do this, I will need a specific number of page tables which will be proportional to the size of the PMM data structure (and thus also proportional to the amount of RAM available).

And herein lies the problem. I can't statically allocate all the page tables at compile time, because the number of tables I will need depend on the size of the PMM which depends on the size of available RAM. This means I need some (primitive) form of dynamic allocation... all whilst not having a proper PMM/VMM which I can use.

I think one potential solution would be to use recursive page table mapping, but 1) I'm not sure that's possible (since my kernel is currently mapped to the last PML4 entry in order to be able to use -mcmodel=kernel) but 2) even if it is possible I'd prefer not to use it so as to keep as much of the virtual address space for myself. (Of course, this is not too important since I'm using x86_64 and the virtual address space is massive, so I'm willing to reconsider). I'm also planning to adopt the linux strategy of mapping the whole physical address space into the higher half on boot.

How have you guys approached this problem in your own operating systems? If anyone knows how it is done in linux or other systems then I would be grateful to hear that information too.
Octocontrabass
Member
Member
Posts: 5586
Joined: Mon Mar 25, 2013 7:01 pm

Re: Paging and Memory Allocation Woes

Post by Octocontrabass »

hexcoder wrote:I think one potential solution would be to use recursive page table mapping, but 1) I'm not sure that's possible (since my kernel is currently mapped to the last PML4 entry in order to be able to use -mcmodel=kernel) but 2) even if it is possible I'd prefer not to use it so as to keep as much of the virtual address space for myself.
Recursive mapping works with any part of the virtual address space, not just the highest part. You can use a lower, more convenient address.
hexcoder wrote:I'm also planning to adopt the linux strategy of mapping the whole physical address space into the higher half on boot.
It's your kernel and you can do what you want, but I think this is extremely bad and I would never do it. I'm under the impression that Linux only does this because Linus didn't know any better when he wrote it.
hexcoder wrote:How have you guys approached this problem in your own operating systems?
I use a stack. The stack is initialized by "freeing" the usable RAM from the memory map. Whenever the stack outgrows its space, the page being freed is immediately allocated to extend the stack. It can scale for things like NUMA and cache coloring pretty easily (just add more stacks), but I'm not sure if there's a good way to handle mixed page sizes.
Korona
Member
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm
Contact:

Re: Paging and Memory Allocation Woes

Post by Korona »

Octocontrabass wrote:
hexcoder wrote:I'm also planning to adopt the linux strategy of mapping the whole physical address space into the higher half on boot.
It's your kernel and you can do what you want, but I think this is extremely bad and I would never do it. I'm under the impression that Linux only does this because Linus didn't know any better when he wrote it.
I think that this is a misconception. First, to clear things up: We're talking about the kernel heap when we say that Linux maps the whole physical address space. Linux is still able to map other pages into userspace programs, however the Linux kernel is not able to use them to allocate objects.

Mapping all physical memory (at least all non-persistent, non-hotplugable physical, actual RAM) has major advantages:
  • kmalloc() does not need to remap memory. This is a huge performance boost. Memory allocated by kmalloc() does not trigger page faults.
  • kfree() does not need to perform TLB shootdowns. No IPIs are required during kfree().
  • Kernel buffers (e.g. for IPC messages) can be accessed (page-wise) without remapping memory. This is also a huge performance boost (e.g. when accessing buffers in a different address space). There is no need to track which buffers are actually mapped and which are not mapped because all buffers are mapped.
  • No "in order to access new memory I need to setup a page table - in order to setup a page table I need to access new memory" cycles.
  • TLB shootdown is sane. Allocate a shootdown request in permanently mapped memory, IPI all processors, wait until all processors respond, free the request. No "how do I shootdown the memory holding the shootdown request?" shenanigans.
What do you lose? The ability to use all memory for the kernel heap on systems where physical space > ~virtual space/2. However those setups are almost broken anyway. If your kernel alone needs a heap larger than than virtual space/2 your OS is probably not going to do much anyways, as it has to constantly map/unmap buffers and send shootdown IPIs instead of doing work.

Furthermore it is really difficult to work around the last two points in the above list if you do not map large amounts of physical memory permanently. I'd like to hear how a sane TLB shootdown scheme works if you dynamically remap kernel memory.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
simeonz
Member
Member
Posts: 360
Joined: Fri Aug 19, 2016 10:28 pm

Re: Paging and Memory Allocation Woes

Post by simeonz »

Korona wrote:kmalloc() does not need to remap memory. This is a huge performance boost. Memory allocated by kmalloc() does not trigger page faults.
You can have non-paging dynamic allocator (such as the non-paged pool in Windows) without having linear map of the physical memory. The linear map approach is the most immutable, but any other approach when implemented sanely will not immediately reclaim physical memory for repurposing to other subsystems. The repurposing will be deferred until another subsystem becomes starved, e.g. filesystem cache, and then its space will be extended in bulk, anticipating for future growth. The mapping of the virtual regions used by the fs cache, the kernel allocators, the virtual memory manager, will hopefully change periodically, and not frequently. One situation where I expect the contrary can occur is when the memory has become very scarce, but it is probably not a very important case to target, because the system is reaching its limit at that point anyway.
Korona wrote:kfree() does not need to perform TLB shootdowns. No IPIs are required during kfree().
That is necessary only on subsystem memory contention events, not for each individual buffer allocation and release in the system. As I already suggested, semi-persistent mapping and linear mapping (as in the Linux kernel) are not the same thing. No sane system will map and unmap unless it has to rebalance the allocation of the physical memory pages between different virtual regions.
Korona wrote:Kernel buffers (e.g. for IPC messages) can be accessed (page-wise) without remapping memory. This is also a huge performance boost (e.g. when accessing buffers in a different address space). There is no need to track which buffers are actually mapped and which are not mapped because all buffers are mapped.
The only complication for the case you describe comes from TLB shootdown, which you have pinpointed below, but hopefully it will be a rare event according to my arguments. There is however a related problem. With full physical memory mapping, user buffers can be accessed directly from any system thread, whereas in the absence of such mapping they need to be mapped in kernel space or accessed only from threads within that user process. This is not an issue for requests that can directly be served by dma, but for software raid, compression, software encryption, this will be an issue. So, the approach has its advantages for certain types of I/O.
Korona wrote:No "in order to access new memory I need to setup a page table - in order to setup a page table I need to access new memory" cycles.
The linear mapping approach simplifies the PMM significantly IMO. Even if for no other reason, I think this is motivation enough to keep it around.
Korona wrote:TLB shootdown is sane. Allocate a shootdown request in permanently mapped memory, IPI all processors, wait until all processors respond, free the request. No "how do I shootdown the memory holding the shootdown request?" shenanigans.
This is cpu related request, and thus can use per-cpu memory. It does not need to be dynamically allocated on each request. If it is dynamically allocated at all, it will be done during a cpu hot-plug event, if such is supported.
Korona wrote:What do you lose?
The problem is not that the mapping is persistent. The mapping on systems like Windows is kept as persistent as possible under the changing system circumstances. In all honesty, I expect that it does thrash the virtual space more, but I don't think that it will produce perceptible difference unless the load is intentionally engineered with this goal in mind. The issue with the Linux approach is that the kernel pool and filesystem cache share the same virtual region (because they operate on the linearly mapped memory) and thus also share fragmentation with each other. It is difficult to make multi-page allocations, because the filesystem cache uses individual pages and has dusted the memory with them. The buddy allocator tries to remedy this by coalescing, but it can only do that for a few levels (it becomes exponentially less effective, the larger the allocation is.) There are even constants in the kernel itself that signify the largest allocation that can be reliably performed under normal statistical load.

I would consider a hybrid approach as a good strategy. But someone else will have to comment on the security implications. It is a little disconcerting that all of the physical memory is visible, even if the contents are generally at randomized locations. The general trend for security mitigations has become to limit the visibility and determinism as much as possible, including when kernel code is involved.

P.S. Forgot to mention, although it probably has become obvious from the entire tirade. The linear mapping approach can use large pages, even huge pages (i.e. 1GB). This means that depending on the limitation for the 1GB TLB entries in that CPU model, the accesses in that range may not thrash at all. This is a significant advantage actually. Windows tries to allocate physical memory to its virtual memory ranges in 2MB large page chunks, I think, but the efficiency of 1GB translations cannot be beat. Altogether, I have to agree that the TLB and translation efficiency is better with the linear mapping model, but I am not sure that it counters the issues with memory fragmentation. Reader's choice.
LtG
Member
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Paging and Memory Allocation Woes

Post by LtG »

hexcoder wrote: In the .bss section of my kernel I have a few page tables, enough to allocate up to 2MB. Having done so, the next thing I need to do memory wise is enable the physical memory manager. Regardless of the data structure I use (stack, bitmap etc.), I don't think I can avoid the requirement that the (maximum) memory usage of that data structure will be proportional to the amount of actual RAM I have available (e.g for a bitmap, if I have 1GB RAM then I need 1GB/4096 bits, but for 2GB I need 2GB/4096 bits. Roughly the same thing applies for a stack structure).
It doesn't apply to stack, stack doesn't need statically allocated memory (apart from a few variables), so that would be the easiest way.

Bitmap works too, but it requires you to first check how much memory there is, calculate (dynamically) how much memory is needed for the bitmap(s), then allocate that from the memory map you get from BIOS/GRUB.

I would say it's more difficult to sanitize the memory map (mmap) itself (sort, break overlaps, conservatively re-label ranges, etc) than it is to find memory in the sanitized mmap for the bitmap(s). Note, if you want to code for all possibilities you need to prepare to have multiple bitmaps. For example some system might give you a memory map:
0 - 4GiB free
...
128TiB - 130 TiB free

And if the middle part is mostly non-existent, you probably don't want to create a single bitmap from 0-130TiB..

Note also, stack effectively uses zero memory. Or rather, it only uses memory when that part of the memory is not in use, and when it becomes used the stack gets smaller and thus doesn't really use any.
LtG
Member
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Paging and Memory Allocation Woes

Post by LtG »

Korona wrote: Mapping all physical memory (at least all non-persistent, non-hotplugable physical, actual RAM) has major advantages
What if all of the memory is hotpluggable?
Korona wrote: kmalloc() does not need to remap memory. This is a huge performance boost. Memory allocated by kmalloc() does not trigger page faults.
Not sure why you would need to remap anyway? And how it's _huge_ perf boost?

Avoiding page faults can be done by pretty much any method anyway.
Korona wrote: kfree() does not need to perform TLB shootdowns. No IPIs are required during kfree().
Cheap IPIs are cheap, expensive ones are expensive. And TLB shootdowns can also be avoided in other ways.
Korona wrote: Kernel buffers (e.g. for IPC messages) can be accessed (page-wise) without remapping memory. This is also a huge performance boost (e.g. when accessing buffers in a different address space). There is no need to track which buffers are actually mapped and which are not mapped because all buffers are mapped.
Why would you need to access other process memory? Note, for IPC you need to either map the memory to the recipient or worse, copy the data, I prefer to map it. The kernel shouldn't care.
Korona wrote:No "in order to access new memory I need to setup a page table - in order to setup a page table I need to access new memory" cycles.
Not an issue for me, but maybe in some designs this might be problematic. If I understood correctly.
Korona wrote: TLB shootdown is sane. Allocate a shootdown request in permanently mapped memory, IPI all processors, wait until all processors respond, free the request. No "how do I shootdown the memory holding the shootdown request?" shenanigans.
Not sure if I understood, but for me, each core (CPU) is considered separate (even though RAM and other resources might be shared), so I think I don't have to worry about shooting down the shootdown memory =)
Korona wrote: What do you lose? The ability to use all memory for the kernel heap on systems where physical space > ~virtual space/2. However those setups are almost broken anyway. If your kernel alone needs a heap larger than than virtual space/2 your OS is probably not going to do much anyways, as it has to constantly map/unmap buffers and send shootdown IPIs instead of doing work.

Furthermore it is really difficult to work around the last two points in the above list if you do not map large amounts of physical memory permanently. I'd like to hear how a sane TLB shootdown scheme works if you dynamically remap kernel memory.
[/quote]
And you lose half the address space, you lose the ability to use special VAS "tagging" (as in, MSB has some other meaning than "kernel space", kernel space is at a "known" location(ish), shootdown IPIs in a good system isn't that big of a problem anyway, etc..

So I guess I disagreed with most of your points =)
hexcoder
Posts: 15
Joined: Thu Oct 20, 2016 12:26 pm

Re: Paging and Memory Allocation Woes

Post by hexcoder »

Thanks for the advice everyone, much appreciated. And an interesting debate regarding how Linux arranges memory too.

The stack option does seem like a rather enticing one; I'll certainly look into that further. I can't think of a good way of keeping track of the page sizes though - maybe using the lower 12 bits of the page addresses on the stack as flags to signify the page size?

And you'd ideally map the stack to contiguous virtual memory, correct? I feel like this is going to cause dependency issues again, since I'll need to allocate some more memory for page tables before freeing the next page onto the stack. Or maybe I'm overthinking it and there is no circular dependency chain.

Finally, I feel like the stack would have issues allocating contiguous physical memory (which would be required in moderation for kernel structures if I'm mapping the memory linearly).
FusT
Member
Member
Posts: 91
Joined: Wed Sep 19, 2012 3:43 am
Location: The Netherlands

Re: Paging and Memory Allocation Woes

Post by FusT »

What I do:
- Grab available memory from GRUB
- Allocate enough space for my PMM's bitmap right after the kernel stack
- Mark the available memory portions as "available"
- Mark the address space for the kernel (including the PMM's bitmap) as "used"
- Initialize paging

Edit:
I keep track of the "kernel's end of memory" by getting the address of the stack (ESP) from the bootstrapping code and incrementing it each time I "allocate" memory.
This is basically what JamesM does in his tutorial but with a few tweaks and improvements
LtG
Member
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Paging and Memory Allocation Woes

Post by LtG »

hexcoder wrote: I can't think of a good way of keeping track of the page sizes though - maybe using the lower 12 bits of the page addresses on the stack as flags to signify the page size?
You mean supporting large pages (2/4MiB)? I wouldn't support them at all until you have a full OS, then you can add support and actually measure if they do good or bad. Large pages can easily hurt performance, so you need actual ways of measuring _real_ performance if you want to do large page support properly.

I would likely try to have a separate stack for large pages, or something along that line..
hexcoder wrote: And you'd ideally map the stack to contiguous virtual memory, correct? I feel like this is going to cause dependency issues again, since I'll need to allocate some more memory for page tables before freeing the next page onto the stack. Or maybe I'm overthinking it and there is no circular dependency chain.

Finally, I feel like the stack would have issues allocating contiguous physical memory (which would be required in moderation for kernel structures if I'm mapping the memory linearly).
There's some different ways you can do this, one would be to use contiguous virtual memory, another would be to store the address of the next "stack page" in the last 4/8 bytes of a single page, so kind of a linked list of pages, where the page contains 1023/511 free page frames (addresses that can be returned to the VMM) and the last entry is the next "stack page" containing the next 1023/511 free page frames. Note, once you use up an entire "stack page", before you start allocating from the next one, you can return the physical address of the previous "stack page" since it's now empty and no longer needed.
Post Reply