Hi,
Owen wrote: As for a stack requiring 4 bytes per page and locks? No, it doesn't. Stop thinking "contiguous stack" and think "linked list". Atomic allocations can be done trivially using cmpxchg.
It can be done with a lockless algorithm. This still isn't "perfect" (e.g. there's still contention, fairness and "read for ownership" cache line issues); but most of those problems can be reduced by using many stacks rather than a single global stack.
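To make the lockless idea concrete, here is a minimal sketch of one such stack using C11 atomics (which compile down to cmpxchg on x86). Everything in it is an assumption for illustration: "physical memory" is simulated as a small array, the names (free_page, alloc_page, next_in_page, NO_PAGE) are made up, and the head carries a 32-bit counter next to the frame number to sidestep the ABA problem that a naive cmpxchg-based pop would have. A real kernel would also need some way to read the "next" field of an unmapped page, which is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PAGE_COUNT 8u            /* hypothetical: tiny simulated RAM */
#define NO_PAGE    0xFFFFFFFFu   /* hypothetical "stack is empty" marker */

/* Simulated physical memory: the only thing kept per free page is the
 * frame number of the next free page, stored "inside" the page itself. */
static _Atomic uint32_t next_in_page[PAGE_COUNT];

/* Stack head: low 32 bits = top frame, high 32 bits = change counter. */
static _Atomic uint64_t stack_head = NO_PAGE;

static uint64_t pack(uint32_t tag, uint32_t frame)
{
    return ((uint64_t)tag << 32) | frame;
}

/* Push a free page onto the stack. */
void free_page(uint32_t frame)
{
    assert(frame < PAGE_COUNT);
    uint64_t old = atomic_load(&stack_head);
    uint64_t desired;
    do {
        /* Link the page to the current top before publishing it. */
        atomic_store_explicit(&next_in_page[frame], (uint32_t)old,
                              memory_order_relaxed);
        desired = pack((uint32_t)(old >> 32) + 1, frame);
        /* On failure, 'old' is reloaded with the current head. */
    } while (!atomic_compare_exchange_weak(&stack_head, &old, desired));
}

/* Pop a free page; returns NO_PAGE if the stack is empty. */
uint32_t alloc_page(void)
{
    uint64_t old = atomic_load(&stack_head);
    for (;;) {
        uint32_t top = (uint32_t)old;
        if (top == NO_PAGE)
            return NO_PAGE;
        uint32_t next = atomic_load_explicit(&next_in_page[top],
                                             memory_order_relaxed);
        uint64_t desired = pack((uint32_t)(old >> 32) + 1, next);
        /* The counter makes this fail if the head changed in between,
         * even if the same frame happens to be back on top. */
        if (atomic_compare_exchange_weak(&stack_head, &old, desired))
            return top;
    }
}
```

Using many of these (e.g. one per CPU or per NUMA domain) instead of one global head is what cuts down the contention mentioned above.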
Stacks can also be done where the "next" field for the linked list is inside the free page itself, and the free pages still don't need to be mapped into any virtual address space. This means that if there's a stack managing 1234 GiB of free pages, it might cost you about 8 bytes of kernel space to manage that stack. If you need to consume 1 MiB of the kernel's virtual address space for each 1 GiB of free pages, then you're probably doing it wrong.
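For anyone wondering where the "1 MiB per 1 GiB" figure comes from, it is just the quoted 4 bytes per page multiplied out, assuming 4 KiB pages. A small sketch of the arithmetic (the function name table_bytes is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ull   /* assumption: 4 KiB pages */

/* Kernel space consumed by a contiguous table that stores
 * 'entry_bytes' for every page in 'free_bytes' of free RAM. */
uint64_t table_bytes(uint64_t free_bytes, uint64_t entry_bytes)
{
    return free_bytes / PAGE_SIZE * entry_bytes;
}

/* 1 GiB / 4 KiB = 262144 pages; at 4 bytes each that is 1 MiB. */
static_assert((1ull << 30) / PAGE_SIZE * 4 == (1ull << 20),
              "4 bytes per 4 KiB page is 1 MiB per GiB");
```

So a contiguous table for 1234 GiB of free pages would need about 1234 MiB, where the linked-list stack needs only its head.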
For managing swap space; you never need to send a free page to swap and therefore you never need any swap management data for free pages. In addition; unless/until you reach some threshold, you don't need any swap management data for individual allocated pages either. For example, you can track when each process was used last, and only begin tracking "least recently used" for individual pages when there's less than 10% of memory left (e.g. if free physical pages plus pages used for file system caches, etc. is less than 10% of total pages). Of course this only applies if the computer is configured to use swap space (if there is no swap space, then there's no point keeping track of "least recently used" pages even if less than 10% of memory is left).
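As a rough sketch of that threshold check (the struct, its field names and the function name are all hypothetical, and the 10% figure is just the example number from above):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters the kernel would already be maintaining. */
struct mem_stats {
    uint64_t total_pages;   /* all usable physical pages */
    uint64_t free_pages;    /* pages on the free page stacks */
    uint64_t cache_pages;   /* file system caches, etc. (easily reclaimed) */
    uint64_t swap_pages;    /* configured swap space, 0 if none */
};

/* Decide whether per-page "least recently used" tracking is worth doing. */
bool should_track_page_lru(const struct mem_stats *s)
{
    /* No swap space: nothing could be swapped out anyway. */
    if (s->swap_pages == 0)
        return false;

    /* Start tracking only when less than 10% of memory is left. */
    uint64_t available = s->free_pages + s->cache_pages;
    return available * 10 < s->total_pages;
}
```

Until this returns true, the only "last used" information needed is per process, not per page.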
For shared memory, you only need tracking data for pages that are actually being shared. This is likely to be far less than "all physical pages" and likely to be far less than "all allocated pages".
Finally; allocated pages are always mapped into at least one virtual address space somewhere. For "present" pages, you may be surprised how useful the 3 (or more) "available" bits in the page table entry can be (e.g. they could be used to indicate if the page is part of a shared memory area or copy on write area or whatever). For "not present" pages there are 31 (or more) "available" bits in page table entries (e.g. you could use 30 of those bits to determine where the data is in swap space, and with 4 KiB pages this would be enough for up to 4 TiB of swap space).
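Here is one way the bits could be laid out, as a sketch only. It assumes 32-bit x86 paging without PAE (bit 0 is "present", bits 9 to 11 are available to the OS in present entries, and the CPU ignores the other 31 bits when bit 0 is clear) and 4 KiB pages. The meanings given to the available bits, the use of bit 31 as an "in swap" flag, and all of the names are my own choices for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT      0x00000001u  /* bit 0: defined by the CPU */
#define PTE_AVAIL_SHARED 0x00000200u  /* bit 9: hypothetical "shared memory" */
#define PTE_AVAIL_COW    0x00000400u  /* bit 10: hypothetical "copy on write" */

#define PTE_SWAPPED      0x80000000u  /* bit 31: hypothetical "in swap" flag */
#define SWAP_SLOT_BITS   30
#define SWAP_SLOT_MAX    ((1u << SWAP_SLOT_BITS) - 1)

bool pte_is_present(uint32_t pte)
{
    return (pte & PTE_PRESENT) != 0;
}

/* Build a "not present" entry that records a swap slot in bits 1..30. */
uint32_t pte_make_swapped(uint32_t slot)
{
    assert(slot <= SWAP_SLOT_MAX);
    return PTE_SWAPPED | (slot << 1);
}

bool pte_is_swapped(uint32_t pte)
{
    return !pte_is_present(pte) && (pte & PTE_SWAPPED) != 0;
}

/* Recover the swap slot from a "not present" entry. */
uint32_t pte_swap_slot(uint32_t pte)
{
    assert(pte_is_swapped(pte));
    return (pte >> 1) & SWAP_SLOT_MAX;
}

/* 2^30 slots of 4 KiB each = 4 TiB of addressable swap space. */
uint64_t swap_capacity_bytes(void)
{
    return ((uint64_t)SWAP_SLOT_MAX + 1) * 4096u;
}
```

The separate flag bit is there so that an all-zero entry can still mean "nothing here at all" rather than "swap slot 0".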
Essentially; for physical memory management you shouldn't need much more than free page stacks at about 8 bytes each (plus something extra for bus mastering/DMA buffers, which is the only case where the physical address of allocated page/s actually matters). For virtual address space management, you only need extra data for things that don't fit in the "available" bits of page tables - lists of virtual address space ranges that are shared, etc.
Cheers,
Brendan