Hello,
I have been planning out my memory manager, but I had one question come to mind. I know many memory managers use the "clock algorithm", which is really approximate LRU. But, one question I have is how does this work on SMP? For example, if the clock hand changes the referenced bit, wouldn't that require a shootdown? So my question is, how do modern systems implement page replacement algorithms?
Thanks,
nexos
LRU on SMP memory manager
-
- Member
- Posts: 424
- Joined: Tue Apr 03, 2018 2:44 am
Re: LRU on SMP memory manager
The same way as UP, but with TLB shoot downs as required, as you say. How a page mapping is changed to clear a reference bit is very platform specific, so the implementation details are hidden behind an abstraction, but the process of scanning memory for reference bits to clear should be platform agnostic, as the referenced bit should be maintained in a central page frame database. [1]nexos wrote:Hello,
I have been planning out my memory manager, but I had one question come to mind. I know many memory managers use the "clock algorithm", which is really approximate LRU. But, one question I have is how does this work on SMP? For example, if the clock hand changes the referenced bit, wouldn't that require a shootdown? So my question is, how do modern systems implement page replacement algorithms?
Thanks,
nexos
In clock, the scan rate generally tends to be proportional to memory pressure. No memory pressure, very little scanning, and so little overhead from resulting TLB shoot downs. Under high memory pressure, you'll notice thrashing overhead more than TLB shoot down overhead, so it is worth a little bit of pain to avoid thrashing.
Of course, this very much depends on your abstractions. The SVR4 and BSD VMM have a much cleaner separation between platform independent and platform specific code, than say, Linux. Linux is more like how 4 BSD prior to 4.4 expected platform specific memory management to look VAX like, and interpret the VAX like page tables onto the local CPU. [2]
Sun threw that out for SunOS (originally 4.1 BSD based?), and replaced it with a VM subsystem that used a much more abstract interface to the MMU using HAT (Hardware Address Translation) API, and was eventually subsumed into SVR4.
4.4 BSD also threw that out, replacing it with the Mach VM, again with a much more abstract interface to the MMU using the pmap API.
But in both the SunOS/SVR4 and BSD/MachVM cases, the page scanning is very much done in the platform independent portion of the VMM, and both have since scaled quite happily SMP wise.
[1] Of course, how the reference bit makes it from the platform MMU to the platform independent page frame database is the crux of the problem. My own personal code doesn't export that information from the hardware reference bit, and instead is filled in by software in response to page faults. Thus, no reference bit is required in hardware, so I should be fine when I port to VAX
[2] Linux and Windows provide an idealised multi-level page table in which to communicate mappings, including maintenance of reference information. I'm not sure if Linux/Windows scans these page table structures directly for reference information, or whether that information is also mirrored to the central page database.
Re: LRU on SMP memory manager
Ok thanks! I guess I'll be looking into lazy shootdown algorithms
Re: LRU on SMP memory manager
The clock algorithm (at least the one described on Wikipedia) requires a reverse map (from physical to virtual addresses) though, if the CPU has virtually-mapped access bits. That sounds quite inconvenient. How do OSes that use such an algorithm typically solve this issue?
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
-
- Member
- Posts: 424
- Joined: Tue Apr 03, 2018 2:44 am
Re: LRU on SMP memory manager
MachVM/BSD provides pmap_is_referenced, which takes a vm_page structure.Korona wrote:The clock algorithm (at least the one described on Wikipedia) requires a reverse map (from physical to virtual addresses) though, if the CPU has virtually-mapped access bits. That sounds quite inconvenient. How do OSes that use such an algorithm typically solve this issue?
Rather than cycling through the physical page database, the BSDs link the vm_page objects onto LRU queues, so it doesn't look like even the BSDs require a reverse map. Forward mapping is enough.
I may be reading it wrong though, I was going from the NetBSD code here:
uvmpdpol_selectvictim