Hi,
quadrant wrote:Which page replacement algorithms are used in xv6 and Minix?
I am trying to learn how page replacement is implemented. Both operating systems are written for educational purposes, so they would be great starting points for this.
To be "perfect", an educational OS needs to be simple enough for beginners to learn from but also needs to be complex enough to cover all useful information. These requirements are mutually exclusive...
I wouldn't be too surprised if neither xv6 nor Minix supports swap space, and wouldn't be surprised if neither of them has a page replacement algorithm at all (especially if you mean "Minix 1.0").
In general; a "least recently used" algorithm is typically used as the basis for a page replacement algorithm. For this; you need some way to determine how recently used a page is. How this is done depends on the CPU. For 80x86 CPUs there are "accessed" and "dirty" flags in the page table entries that the CPU automatically sets for you; so you might scan the page table entries periodically and update the age of each accessed page. For other CPUs (e.g. those with "software TLB miss handling") you might update the age of each page that's accessed during page faults. Once you know (roughly) how recently each page was accessed you can determine which page/s were "least recently used", and you can send those to swap space (if you need to).
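For example (just a rough sketch - the page table layout, the array sizes and the names are all invented for illustration, and a real kernel would do this per process/per address space):

Code:
#include <stdint.h>

#define NUM_PAGES     1024
#define PTE_ACCESSED  0x20          /* bit 5 of an 80x86 page table entry */

static uint32_t page_table[NUM_PAGES];   /* hypothetical (32-bit) page table entries */
static uint32_t page_age[NUM_PAGES];     /* "scans since last access" for each page */

/* Called periodically (e.g. from a timer); higher age = less recently used */
void update_page_ages(void) {
    for (int i = 0; i < NUM_PAGES; i++) {
        if (page_table[i] & PTE_ACCESSED) {
            page_table[i] &= ~PTE_ACCESSED;   /* clear it so the next scan sees new accesses */
            page_age[i] = 0;                  /* page was used since the last scan */
        } else {
            page_age[i]++;                    /* page wasn't used, so it gets "older" */
        }
    }
    /* Note: after clearing "accessed" bits you'd also invalidate the TLB,
       otherwise the CPU may not set the bit again for pages it still has cached. */
}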
However; "pure least recently used" ignores things like the cost involved. For example, if a page was stored in swap space previously and hasn't been modified since, then you can free the page without storing it in swap space again (which would be faster). Also note that swap space isn't the only option - e.g. unmodified pages that are part of a memory mapped file can be freed and then fetched from the file if they're accessed again. Essentially, for each page you know how much time has passed since it was used last and you know the cost of evicting it; and you can combine them (e.g. "score = time / cost") and select the page/s with the highest score, so that (in some cases) you improve performance by selecting pages that aren't the least recently used pages.
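As a rough sketch of that kind of weighting (again, the names and cost values are invented - here the cost ends up as a divisor, so old pages that are cheap to evict get the highest score):

Code:
#include <stdint.h>

#define NUM_PAGES 1024

static uint32_t page_age[NUM_PAGES];     /* from the ageing sketch above */
static uint32_t evict_cost[NUM_PAGES];   /* e.g. 1 = just free it (clean, already in swap),
                                            4 = must be written to swap first; always >= 1 */

/* Pick the page with the highest "age / cost" score */
int select_victim_lru_cost(void) {
    int best = -1;
    uint32_t best_score = 0;

    for (int i = 0; i < NUM_PAGES; i++) {
        uint32_t score = page_age[i] / evict_cost[i];
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;    /* index of the victim page, or -1 if no page scored above zero */
}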
But...
"Least recently used" is just an estimate for "most likely to be used soon" that tends to get used for things like cache eviction and paging algorithms because it doesn't depend on other information (and it's hard to predict the future accurately anyway, so...). What if you had more information you could use? For example, if you knew that a task that is currently sleeping will wake up in 5 minutes, you could accurately predict that the task is going to use at least some of its pages when it wakes up. For another example; what if you kept track of past access patterns and knew that when the user presses a key there's a 50% probability that a web browser would access 20 specific pages and a 90% probability that a GUI would access 100 specific pages?
Let's make the model a little more flexible and say that for each page you have a prediction of how likely it is that it will be used soon (where the prediction may be based on "least recently used" if there's no other information, but could be from other information too), and each page has a cost; and you can do "score = prediction * cost" and select the page/s with the lowest score (pages that are unlikely to be needed soon and are cheap to evict).
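A sketch of what that generalised selection might look like (hypothetical structure - the prediction could have come from LRU ages, known wake-up times, recorded access patterns, or anything else):

Code:
#include <stdint.h>

#define NUM_PAGES 1024

struct page_info {
    double   prediction;    /* estimated probability the page will be used soon (0.0 to 1.0) */
    uint32_t cost;          /* cost of evicting the page and fetching it back later */
};

static struct page_info pages[NUM_PAGES];

/* Pick the page that's least likely to be needed soon and cheapest to evict */
int select_victim_predictive(void) {
    int best = -1;
    double best_score = 1e300;   /* lower "prediction * cost" = better victim */

    for (int i = 0; i < NUM_PAGES; i++) {
        double score = pages[i].prediction * (double)pages[i].cost;
        if (score < best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}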
Now...
This is only looking at one side (selecting which pages to send to disk to free RAM). The other side is figuring out which pages to bring back into RAM. This can be split into 2 parts - prefetching data before it's accessed, and having to fetch immediately because you failed to prefetch. Figuring out what to prefetch can be very complicated (but very important for performance). The problem is that (if you're not careful) these two systems can fight - you don't want the OS to prefetch pages and then send them straight back to disk, and you don't want to send pages to disk and then have to fetch them straight back.
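One (hypothetical) way to stop them fighting is to give every page a minimum "residency" period after it's been loaded or prefetched, so that the eviction side can't immediately undo the prefetching side's work:

Code:
#include <stdint.h>

#define NUM_PAGES      1024
#define MIN_RESIDENCY  8    /* scans a page must stay resident after being loaded/prefetched */

static uint32_t scans_resident[NUM_PAGES];   /* scans since the page was brought into RAM */

/* Returns nonzero if the page may be considered for eviction at all */
int is_evictable(int page) {
    return scans_resident[page] >= MIN_RESIDENCY;
}

That's a crude guard though; it only prevents the worst of the thrashing.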
To fix that you might want a more holistic model. Specifically; ideally you'd want to know (estimate) what the best contents of RAM would be for the future, then find the difference between "estimated best contents for future" and the current contents of RAM, then do whatever you can (sending pages to disk and bringing them back from disk) to reduce the differences while taking cost into account. This kind of thinking leads beyond "cutting edge" into "research"; and (in terms of complexity) there are no restrictions.
Cheers,
Brendan