CRoemheld wrote: Currently I'm declaring an array with (1 GiB / 4 KiB, i.e. > 250,000) page structs holding all pages for the complete RAM of 1 GiB. The whole array takes about 8 MiB of memory space.
Obviously this is a pretty stupid decision; it was just a test to get everything running. How can I fix this in the most efficient way, i.e., without using a gigantic array holding all the information?
You could design some kind of hierarchical (tree/trie based) description to save some space, but you will need to track per-page information eventually anyway. Given how frequently page state has to be accessed on the I/O path, it is not worth it.
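To make the trade-off concrete, here is a minimal sketch (purely illustrative, not anyone's actual code) of the flat per-frame descriptor array being discussed, assuming 1 GiB of RAM and 4 KiB frames; the struct fields and helper names are made up:

```c
#include <stdint.h>

/* Illustrative per-frame metadata. With 1 GiB of RAM and 4 KiB frames there
 * are 262,144 frames; at a few dozen bytes per descriptor the array lands in
 * the same single-digit-MiB range as the 8 MiB mentioned in the question. */

#define PAGE_SIZE   4096UL
#define RAM_BYTES   (1UL << 30)                /* 1 GiB, assumed */
#define NR_FRAMES   (RAM_BYTES / PAGE_SIZE)    /* 262,144 */

struct frame {                       /* hypothetical per-page descriptor */
    uint32_t refcount;
    uint32_t flags;                  /* reserved, free, slab, ... */
    struct frame *next_free;         /* free-list link for the buddy allocator */
    void *owner;                     /* e.g. owning slab cache or mapping */
};

static struct frame frames[NR_FRAMES];

/* O(1) translation between descriptors and physical addresses is the main
 * reason the flat array wins over a tree/trie on the I/O path. */
static inline struct frame *phys_to_frame(uintptr_t pa)
{
    return &frames[pa / PAGE_SIZE];
}

static inline uintptr_t frame_to_phys(const struct frame *f)
{
    return (uintptr_t)(f - frames) * PAGE_SIZE;
}
```

The ~8 MiB of overhead (well under 1% of RAM) is the price of those constant-time lookups.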
CRoemheld wrote: When reserving memory blocks, I currently reserve all memory regions marked as not available in the multiboot memory map (type != MULTIBOOT_MEMORY_AVAILABLE).
However, the last entry in the memory map oversteps the bounds of my physically available RAM, starting at address 0xFFFC0000, which is beyond 0x40000000 (the RAM size). How should I deal with this memory region?
This is the motherboard ROM; the reset vector points into this range. A physical address is not necessarily an address on the memory device bus, and consequently not all physical addresses have to be backed by your RAM. (The CPU's memory controller and the motherboard chipset are capable of translating and routing physical address ranges to different devices.)
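In practice that usually means you only create frame descriptors for addresses below the top of RAM and skip (or clamp) map entries above it. A hedged sketch, assuming the Multiboot 1 memory map layout; reserve_frame_range and ram_top are made-up names:

```c
#include <stdint.h>

#define MULTIBOOT_MEMORY_AVAILABLE 1

/* Multiboot 1 memory map entry as handed over by the bootloader. */
struct multiboot_mmap_entry {
    uint32_t size;      /* size of this entry, not counting this field */
    uint64_t addr;      /* base physical address */
    uint64_t len;       /* length in bytes */
    uint32_t type;      /* 1 = available RAM, anything else is reserved */
} __attribute__((packed));

/* Assumed helper that marks [start, end) as reserved in the frame array. */
extern void reserve_frame_range(uint64_t start, uint64_t end);

void reserve_from_mmap(void *mmap_addr, uint32_t mmap_length, uint64_t ram_top)
{
    uint8_t *p = mmap_addr;
    uint8_t *end = p + mmap_length;

    while (p < end) {
        struct multiboot_mmap_entry *e = (struct multiboot_mmap_entry *)p;

        if (e->type != MULTIBOOT_MEMORY_AVAILABLE && e->addr < ram_top) {
            uint64_t limit = e->addr + e->len;
            /* Clamp to the RAM we actually track; the ROM window above
             * ram_top has no frame descriptors and needs no reservation. */
            reserve_frame_range(e->addr, limit < ram_top ? limit : ram_top);
        }

        /* Entries are variable-sized; 'size' does not include itself. */
        p += e->size + sizeof(e->size);
    }
}
```

With ram_top = 0x40000000, the 0xFFFC0000 ROM entry is simply ignored.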
CRoemheld wrote: I am using the buddy allocator to allocate pages for the page tables and for the slabs (kmalloc, etc.). Am I right to assume that Linux creates some slabs with static object sizes beforehand?
For example, slabs for objects with sizes of 32, 64, 128, 256, ... bytes, as well as slabs for specific, frequently used structs, such as the page struct or task/process structs, to reduce allocation time?
I somewhat expected Linux to have a rather dynamic approach, but this seems pretty static to me.
You can check the ranges here. I assume the larger sizes do not progress linearly because the smaller slabs are busier and more likely to be "hot", whereas the larger slabs are already "cold", and dividing them further would make them even colder. It is desirable, both for coalescing and for performance, to keep the slabs busy. Also, the allocations are packed into pages and won't divide evenly into non-power-of-two sizes. For example, a size of 384 goes into 4K ten times with a remainder of 256 bytes, so even though some per-object waste is avoided compared to rounding up to 512, some waste is added to each slab page overall.
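To illustrate the divisibility point, here is a small, self-contained example (not kernel code) that prints how many objects of a few size classes fit into a bare 4 KiB page and how many bytes are left over; real slabs add metadata and can span multiple pages, so the numbers are only indicative:

```c
#include <stdio.h>

int main(void)
{
    const unsigned page = 4096;
    const unsigned sizes[] = { 192, 256, 384, 512, 1024 };

    for (unsigned i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        unsigned n = page / sizes[i];           /* objects per page */
        unsigned waste = page - n * sizes[i];   /* unusable remainder */
        printf("size %4u: %2u objects per page, %4u bytes unused\n",
               sizes[i], n, waste);
    }
    return 0;
}
```

For 384 this prints 10 objects with 256 bytes unused, which is the remainder mentioned above; the power-of-two classes leave nothing over.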
As OSwhatever pointed out, kmem_cache_create can be used to make dedicated slabs. One reason to use it is to separate long-lived objects from the more general allocations. Long-lived objects can pin the slabs they occupy. There is logic that tries to stop allocating from pages that are on their way to being vacated, so that they can be freed and their slab pages repurposed. But a single long-lived allocation occupying such a page breaks this strategy. Using a dedicated slab packs the long-lived allocations into their own pages, where they stay out of the way.
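For reference, a minimal sketch of such a dedicated cache. kmem_cache_create, kmem_cache_alloc and kmem_cache_free are the real kernel API (exact signatures vary a little between kernel versions); the struct, the cache name and the wrapper functions are made-up examples:

```c
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/errno.h>

/* Hypothetical long-lived object that would otherwise pin kmalloc-* pages. */
struct my_longlived {
    unsigned long id;
    void *data;
};

static struct kmem_cache *my_cache;

static int __init my_cache_init(void)
{
    /* Objects from this cache are packed into their own slab pages, keeping
     * long-lived allocations out of the general-purpose slabs. */
    my_cache = kmem_cache_create("my_longlived", sizeof(struct my_longlived),
                                 0, SLAB_HWCACHE_ALIGN, NULL);
    return my_cache ? 0 : -ENOMEM;
}

static struct my_longlived *my_obj_alloc(void)
{
    return kmem_cache_alloc(my_cache, GFP_KERNEL);
}

static void my_obj_free(struct my_longlived *obj)
{
    kmem_cache_free(my_cache, obj);
}
```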