With all 26 cores allocating in parallel, 4 KiB allocations slow down by a factor of ten (94 ns→984 ns), while 2 MiB allocations are even 27 times slower! This poor scalability affects many multicore and memory-heavy workloads [7]. The root causes are the scattered allocator state and the usage of global locks, both of which are also problematic [16, 17].
Compared to the Linux frame allocator, LLFree reduces the allocation time for concurrent 4 KiB allocations by up to 88 percent and for 2 MiB allocations by up to 98 percent. For memory compaction, LLFree decreases the number of required page movements by 64 percent.
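To put the quoted numbers on one scale, here is a small back-of-the-envelope sketch. It only does arithmetic on the figures quoted above; the helper names `slowdown` and `speedup_from_reduction` are my own and do not come from the paper or the repos.

```python
def slowdown(before_ns: float, after_ns: float) -> float:
    """How many times slower the 'after' latency is than the 'before' latency."""
    return after_ns / before_ns


def speedup_from_reduction(percent: float) -> float:
    """Convert 'time reduced by X percent' into an 'N times faster' factor."""
    return 1.0 / (1.0 - percent / 100.0)


# Linux 4 KiB allocations, single core vs. all 26 cores (figures quoted above).
print(round(slowdown(94, 984), 1))           # 10.5

# LLFree vs. Linux: "up to 88 percent" and "up to 98 percent" less time.
print(round(speedup_from_reduction(88), 1))  # 8.3
print(round(speedup_from_reduction(98), 1))  # 50.0
```

So "up to 98 percent" less allocation time corresponds to roughly a 50x best-case speedup for concurrent 2 MiB allocations, and "up to 88 percent" to roughly 8x for 4 KiB allocations.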
Paper: LLFree: Scalable and Optionally-Persistent Page-Frame Allocation
Video presentation: https://www.youtube.com/watch?v=yvd3D5VOHc8
The implementation and benchmarks are well documented in the repos:
Rust repo
C repo
I have no affiliation with this work. I just came across it while looking at frame allocators and thought you might find it interesting, too.