memory management with multiple cores
Re: memory management with multiple cores
In my case, I am trying to understand whether this memory model can be used to speed up the execution of MPI applications by running them on a unikernel. Each core runs an instance of the MPI application. I presented a first PoC of this at https://fosdem.org/2023/schedule/event/ ... ernel_mpi/.
Re: memory management with multiple cores
nexos wrote:
No, that leads to asymmetric treatment of the CPUs (i.e., your OS will be AMP, not SMP), which will turn into a bottleneck. Instead, I would start out by having global memory management structures (i.e., your physical memory slab / bitmap / free list, and your virtual memory structures) that all CPUs work with equally, as if there was one CPU.
One catch though: these global structures will need to have locks. If you haven't looked into locking, now's the time to do it - it comes up everywhere in multi-processor systems.
The other option would be to have per-CPU memory management structures - that would be lockless (which is faster), but would be a pain to implement right, as memory is typically a global resource, not a per-CPU resource.
One great guide to memory management is on the wiki: https://wiki.osdev.org/Brendan%27s_Memo ... ment_Guide

One idea I had recently was to combine the two:
Each core allocates a few megabytes from the global memory structure (locks required), then each core can allocate out of its local memory locklessly. The only time the slower, locked structures need to be accessed is when the local memory is used up.
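A minimal sketch of that hybrid scheme, under stated assumptions: the names (`alloc_page`, `refill`, `cpu_cache`), the 4 KiB page size, the 2 MiB refill size and the bump-counter global pool are all hypothetical and only serve to illustrate the idea. It handles allocation only, not freeing, and a real kernel would also have to keep the fast path from being interrupted or preempted on the same CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS     4
#define REFILL_PAGES 512u   /* 512 pages of 4 KiB = 2 MiB per refill (assumed sizes) */
#define TOTAL_PAGES  4096u  /* hypothetical 16 MiB of managed memory */

/* Global pool: a bump counter over page indices, guarded by a spinlock. */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;
static size_t global_next_page = 0;
size_t global_lock_acquisitions = 0;   /* instrumentation for the example only */

static void lock_global(void)
{
    while (atomic_flag_test_and_set_explicit(&global_lock, memory_order_acquire)) {
        /* spin until the lock is released */
    }
}

static void unlock_global(void)
{
    atomic_flag_clear_explicit(&global_lock, memory_order_release);
}

/* Per-CPU cache: a range of page indices [next, end) owned by one CPU. */
struct cpu_cache {
    size_t next;
    size_t end;
};
static struct cpu_cache caches[MAX_CPUS];

/* Slow path: grab a chunk of pages from the global pool under the lock.
 * Returns 1 on success, 0 if the global pool is exhausted. */
static int refill(struct cpu_cache *c)
{
    int ok = 0;
    lock_global();
    global_lock_acquisitions++;
    if (global_next_page + REFILL_PAGES <= TOTAL_PAGES) {
        c->next = global_next_page;
        c->end = global_next_page + REFILL_PAGES;
        global_next_page = c->end;
        ok = 1;
    }
    unlock_global();
    return ok;
}

/* Fast path: hand out the next page from this CPU's private range.
 * The global lock is only taken when the private range is empty.
 * Returns a page index, or SIZE_MAX if no memory is left. */
size_t alloc_page(unsigned cpu)
{
    assert(cpu < MAX_CPUS);
    struct cpu_cache *c = &caches[cpu];
    if (c->next == c->end && !refill(c))
        return SIZE_MAX;
    return c->next++;
}
```

With these numbers, a CPU touches the global lock once per 512 page allocations; the physical address of a page would be its index multiplied by the page size.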
Re: memory management with multiple cores
azblue wrote:
One idea I had recently was to combine the two:
Each core allocates a few megabytes from the global memory structure (locks required), then each core can allocate out of its local memory locklessly. The only time the slower, locked structures need to be accessed is when the local memory is used up.

Sounds a bit like jemalloc. Essentially, you allocate a malloc arena for each thread separately. Only, in your case it would be for each CPU. OK, that is possible, but you have to be careful about throwing terms such as "lockless" around. Since a block allocated by one CPU may be freed by another, each block must record which arena it came from, and multiple CPUs can still access the same arena at the same time. And even on a single CPU, you must prevent interrupts and preemption while allocating memory, or else you can get invalid data structures for a long time. The lock is going to be rarely contended, and most modern lock implementations only issue a single atomic instruction in that case, but you do need a lock nonetheless.
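To make the cross-CPU free problem concrete, here is a rough sketch in which every block carries a header naming its owning arena, and each arena has its own lock. All names (`arena_alloc`, `arena_free`, `struct block`) and sizes are made up for illustration, and this is not how jemalloc itself is implemented. A kernel version would additionally disable interrupts and preemption while holding the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_ARENAS       2    /* one arena per CPU (hypothetical count) */
#define BLOCKS_PER_ARENA 4
#define BLOCK_PAYLOAD    64

struct arena;

/* Header stored in front of every payload: records the owning arena. */
struct block {
    struct arena *owner;
    struct block *next_free;
    unsigned char payload[BLOCK_PAYLOAD];
};

struct arena {
    atomic_flag lock;          /* rarely contended, but still required */
    struct block *free_list;
    size_t free_count;
    struct block storage[BLOCKS_PER_ARENA];
};

static struct arena arenas[NUM_ARENAS];

static void arena_lock(struct arena *a)
{
    while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire)) {
        /* spin; uncontended acquisition costs a single atomic operation */
    }
}

static void arena_unlock(struct arena *a)
{
    atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

/* Put every block on its arena's free list and stamp its owner. */
void arenas_init(void)
{
    for (int i = 0; i < NUM_ARENAS; i++) {
        struct arena *a = &arenas[i];
        atomic_flag_clear(&a->lock);
        a->free_list = NULL;
        a->free_count = 0;
        for (int j = 0; j < BLOCKS_PER_ARENA; j++) {
            a->storage[j].owner = a;
            a->storage[j].next_free = a->free_list;
            a->free_list = &a->storage[j];
            a->free_count++;
        }
    }
}

/* Allocate from the arena belonging to the calling CPU. */
void *arena_alloc(unsigned cpu)
{
    assert(cpu < NUM_ARENAS);
    struct arena *a = &arenas[cpu];
    arena_lock(a);
    struct block *b = a->free_list;
    if (b) {
        a->free_list = b->next_free;
        a->free_count--;
    }
    arena_unlock(a);
    return b ? b->payload : NULL;
}

/* Free from any CPU: the header, not the caller, says which arena to lock. */
void arena_free(void *p)
{
    struct block *b =
        (struct block *)((unsigned char *)p - offsetof(struct block, payload));
    struct arena *a = b->owner;
    arena_lock(a);
    b->next_free = a->free_list;
    a->free_list = b;
    a->free_count++;
    arena_unlock(a);
}

size_t arena_free_count(unsigned cpu)
{
    assert(cpu < NUM_ARENAS);
    return arenas[cpu].free_count;
}
```

Note that `arena_free` takes no CPU argument: a block allocated on CPU 0 and freed on CPU 1 still goes back to arena 0, which is why arena 0 can be touched by two CPUs at once and needs its lock.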