Page 1 of 1

Why is page-aligning heap allocations necessary?

Posted: Mon Apr 23, 2018 2:34 pm
by tabz
I've been following James Molloy's OS tutorial and essentially copy-pasted his paging implementation, which is something I now regret and am planning to rewrite by myself. I wrote my own heap implementation but don't quite understand why he implemented page-aligning memory sizes and addresses in his own implementation, and so the fact that he did makes me think I should add that to my own code, but I don't understand why and how he does it. This could come from my general lack of understanding when it comes to paging (my own fault as I didn't try hard enough to understand the topic before diving into programming it). Would anyone care to explain?

The page in question, and my source.

Re: Why is page-aligning heap allocations necessary?

Posted: Mon Apr 23, 2018 2:52 pm
by zaval
pool allocations don't have to be page aligned. generally, they are aligned at much less sizes, say 8 byte multiple addresses. why so? I actually don't know, UEFI for example exposes this requirement due to Itanium 64-bit alignment requirement. this doesn't hurt 32 bit machines, but 64 bit ones won't have PITA due to misaligned access.
Page alignment is needed for mapping from processor address spaces (so called - virtual adresses) to system address space (so called physical adresses). Obviously, because page is the unit of that mapping. lol.

Re: Why is page-aligning heap allocations necessary?

Posted: Mon Apr 23, 2018 4:29 pm
by mariuszp
zaval wrote:pool allocations don't have to be page aligned. generally, they are aligned at much less sizes, say 8 byte multiple addresses. why so? I actually don't know, UEFI for example exposes this requirement due to Itanium 64-bit alignment requirement. this doesn't hurt 32 bit machines, but 64 bit ones won't have PITA due to misaligned access.
Page alignment is needed for mapping from processor address spaces (so called - virtual adresses) to system address space (so called physical adresses). Obviously, because page is the unit of that mapping. lol.
You want 16 bytes alignment on x86_64. Just because the CPU accepts misaligned accesses to 16-, 32- and 64-bit values, does not mean it's good: it actually takes longer to execute misaligned accesses, and they're also not atomic. And for 128-bit operations (like vectors) it is actually required. So heap allocations must be 128-bit-aligned to enable structure fields in that space etc to have the required alignment when necessary.

Re: Why is page-aligning heap allocations necessary?

Posted: Mon Apr 23, 2018 4:31 pm
by mariuszp
tabz wrote:I've been following James Molloy's OS tutorial and essentially copy-pasted his paging implementation, which is something I now regret and am planning to rewrite by myself. I wrote my own heap implementation but don't quite understand why he implemented page-aligning memory sizes and addresses in his own implementation, and so the fact that he did makes me think I should add that to my own code, but I don't understand why and how he does it. This could come from my general lack of understanding when it comes to paging (my own fault as I didn't try hard enough to understand the topic before diving into programming it). Would anyone care to explain?

The page in question, and my source.
It's because he uses the heap to allocate paging structures which must be page-aligned. Also, don't follow that tutorial, it's extremely broken; he does things such as "moving" a stack and doing sinful things to base pointers on said stack, there's also some interrupt handling errors IIRC, and basically your code will unexpectedly break.

Re: Why is page-aligning heap allocations necessary?

Posted: Mon Apr 23, 2018 5:04 pm
by zaval
so basically he follows the same approach - to impose the maximum possible alignment requirement on the whole pool as with SIMD rarities. even though it will be a waste of space in 99% cases. I see, that clients should take care on such rare occurences and do additional alignment for their data, whereas pool allocator only should guarantee machine-word defined alignment.

Re: Why is page-aligning heap allocations necessary?

Posted: Mon Apr 23, 2018 5:27 pm
by simeonz
zaval wrote:so basically he follows the same approach - to impose the maximum possible alignment requirement on the whole pool as with SIMD rarities. even though it will be a waste of space in 99% cases.
The headers of the allocations are usually two or four pointers in size, etc. So, having smaller allocations is of insignificant benefit, because more than 50% of the space will end up being used by metadata anyway. On top of this, the alignment comes kind-of naturally, because as long as your header size is multiple of two pointers, and your allocation payloads are multiple of two pointers, your alignment is also multiple of two pointers. Of course, if the goal is specifically to be efficient with smaller allocations, it can be done. But it will require so basic metadata that it will hurt other aspects - such as security mitigations, robustness, etc.

The stack alignment usually also matches the heap alignment and although the space there is indeed wasted, the stack size is rarely a big concern (on a modern machine and a well behaved program).

Aside from instructions like MOVDQA, there are instructions like CMPXCHG16B, which require double word alignment. Even when double word atomics do not expressly prohibit misalignment, they perform poorly with it, since the interconnect has to be locked for the duration of the instruction.

Re: Why is page-aligning heap allocations necessary?

Posted: Mon Apr 23, 2018 8:38 pm
by Sik
Yeah, I was gonna suggest something more like 32 bytes, in part because of stuff like SSE as mentioned, and in part because you need a header before the returned block anyway (so free() can work) and you may as well make the alignment related to the size of the header (32 bytes lets you store four 64-bit values e.g. pointers or sizes, for example). And this coming from somebody who's obsessed with saving memory.

Also worth noting that the alignment overhead should not be that big of a deal, since in practice this would be used for relatively large chunks of data anyway. For small allocations you're better off giving them a dedicated optimized heap where every allocation is the same size (often a power of two, pad out if needed).


Disclaimer: only bother with the technique in the second paragraph once you have a working "normal" heap. That heap should still cope with small allocations just fine, it's just more wasteful at it.

Re: Why is page-aligning heap allocations necessary?

Posted: Tue Apr 24, 2018 3:05 am
by tabz
Thanks for the feedback!

Thankfully I found the OSDev article on the known issues with James Molloy's tutorial, so I managed to fix a few of the issues. From now on I'm going to try to use other resources and implement as much as I can from my own back.

My heap implementation uses a header and footer design much like James Molloy's, with right and left joining when freeing and splitting when necessary during allocation.

From what you've all said, I'm assuming I should change my heap allocator to ensure that the addresses I return are a multiple of 16-bytes (or does the size of the memory allocated have to be a multiple of 16-bytes)? If so, then an issue I see is how you find the related header when freeing, as you don't know by how many bytes you had to push the address forward when allocating.

For example, the allocator finds a hole header at address 0x123, whose data block is at 0x12C (after adding the size of the header). The address to the data that is returned would have to be 0x130 as that is the next multiple of 16, how do you then find the header when freeing? As you won't know that you had to pad by 4 bytes when allocating. With my current implementation, the start of the header is always sizeof(heap_header_t) bytes before the data being freed, however here it would be sizeof(heap_header_t) + 4 bytes before.

Re: Why is page-aligning heap allocations necessary?

Posted: Tue Apr 24, 2018 6:01 am
by Brendan
Hi,
tabz wrote:From what you've all said, I'm assuming I should change my heap allocator to ensure that the addresses I return are a multiple of 16-bytes (or does the size of the memory allocated have to be a multiple of 16-bytes)? If so, then an issue I see is how you find the related header when freeing, as you don't know by how many bytes you had to push the address forward when allocating.
You should always be able to do "headerAddress = dataAddress - headerSize;" when freeing.

Note that "malloc()" was completely broken for alignment. For example, if you happen to be using SSE then you want 16-byte alignment; but if you're using AVX2 you want 32 byte alignment and if you're using AVX512 you want 64 byte alignment; and sometimes (e.g. arrays of larger data structures) you want "cache line size alignment" (likely to also be 64 byte); and sometimes (e.g. maybe for a large number of short strings where padding could double the amount of RAM used) you might not want alignment at all.

To fix this there were multiple non-standard messes, then POSIX defined a standard "int posix_memalign(void **memptr, size_t alignment, size_t size);" function. If you support any flavour of "aligned malloc" then the old "malloc()" can just be a wrapper, like:

Code: Select all

void *malloc(size_t size) {
    void *address;
    if(posix_memalign(&address, 16, size) == 0) return address;
    return NULL;
}
Also; for large allocations modern implementations typically bypass the heap's pool and use the virtual memory manager's interface directly (e.g. when the size is larger than maybe 256 KiB, "malloc()" might just use "mmap()" without looking for a free block in its pool at all). This allows you do to things like large page optimisations (e.g. if the size is many MiB, "mmap()" might allocate 2 MiB pages).

Finally; if you're not hampered by C library compatibility there are multiple other problems with "malloc()" that you don't have to suffer from - e.g. you can have multiple pools to improve cache locality; and you can give each pool hints/attributes to control whether the physical pages should be encrypted or not, or tied to a NUMA domain, etc; and you can have some control of which allocation strategy (first fit, best fit, ...) should be used; and you can add information for debugging (e.g. give each allocated block an optional "pointer to name string" so you figure out the cause of memory leaks easily, etc).


Cheers,

Brendan

Re: Why is page-aligning heap allocations necessary?

Posted: Tue Apr 24, 2018 6:56 am
by mallard
Brendan wrote:To fix this there were multiple non-standard messes, then POSIX defined a standard "int posix_memalign(void **memptr, size_t alignment, size_t size);" function.
Note that "aligned_alloc" is part of the C11 and C++17 standards, so there is no longer a need for the POSIX-ism. This is only really of practical concern if you're intending to keep your C runtime and POSIX compatibility layers seperate though.