Hi,
Boris wrote: I'm curious about "smart" usage of swap space (besides extending physical RAM).
Do you have cases where it's better to swap a physical RAM page to disk and reuse it for another purpose, rather than using a free physical page?
Let's forget about swap space and think about "ideal RAM contents", and things like pre-fetching.
Imagine there is:
- 1 GiB of temporary data used by running applications (stack, heap, etc)
- 1234 GiB of files on that computer's file system (including executable files that are currently running)
- 8 GiB of domain names that the OS could cache
- 100 EiB of static files on the internet that the OS could cache
Let's call this "the set of all data".
Now imagine if you split all that data into 4 KiB pages, and give each page a score representing the probability it will be needed soon. If the computer's RAM can store 1 million pages, then "ideal RAM contents" is when the 1 million pages that have the highest probability of being needed soon are in RAM. Of course maximum performance is achieved when "ideal RAM contents" is achieved (e.g. everything pre-fetched before it's needed and no delays for any IO).
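To make that concrete, here's a very rough sketch in C (the struct and function names are made up for illustration, and no real OS would literally sort the entire set of pages like this - it's only to show what "the highest-scoring pages go in RAM" means):

Code:
#include <stdint.h>
#include <stdlib.h>

/* One 4 KiB page of "the set of all data" and its estimated score. */
struct page_info {
    uint64_t id;        /* which page this is */
    double   score;     /* estimated probability it will be needed soon */
};

/* Sort descending by score. */
static int by_score_desc(const void *a, const void *b)
{
    double sa = ((const struct page_info *)a)->score;
    double sb = ((const struct page_info *)b)->score;
    return (sa < sb) - (sa > sb);
}

/* After sorting, the first ram_pages entries are the "ideal RAM contents". */
void choose_ideal_ram_contents(struct page_info *pages, size_t total, size_t ram_pages)
{
    qsort(pages, total, sizeof(pages[0]), by_score_desc);
    (void)ram_pages;    /* the winners are pages[0 .. ram_pages-1] */
}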
This means that (e.g.) if there's 100 KiB of data that the GUI used during initialisation and you know it's very unlikely it will be needed again, and if you know that the user that just logged in happens to go to Wikipedia fairly often; then you want to send those "unlikely to be used soon" GUI pages to swap space just so that you can pre-fetch "more likely to be used soon" data from Wikipedia (so if/when the user wants the data it's in RAM before they ask for it).
However; it's not this simple - disk and network bandwidth limits combined with frequently changing "most likely to be needed" scores mean that the OS will probably never reach the "ideal RAM contents" state. The other problem is that it takes a lot of work just to determine which pages are most likely to be needed (e.g. you'd have to keep track of a huge number of things, like access patterns for all software, end user habits, etc) and you need to compromise between the complexity/overhead of calculating the scores and the quality of the scores. Basically; the goal of the OS is to do the best job it can within those constraints - constantly trying to get closer to (a less than perfect approximation of) the "ideal RAM contents" state but never achieving it; and only being able to get closer to maximum performance.
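As one example of a cheap compromise, something like the classic "aging" trick can turn the CPU's "accessed" bits into a rough score (this is only an illustration of a low-overhead approximation, not the only or best way to calculate the scores, and the names are made up):

Code:
#include <stdint.h>
#include <stdbool.h>

struct tracked_page {
    uint8_t age;    /* higher = accessed more recently/frequently */
};

/* Assumed to exist: read and clear the page table entry's "accessed" bit. */
extern bool test_and_clear_accessed_bit(struct tracked_page *p);

/* Called periodically: halve each page's age and fold in whether the CPU
 * touched the page since last time, so "age" acts as a cheap score. */
void age_pages(struct tracked_page *pages, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        pages[i].age >>= 1;
        if (test_and_clear_accessed_bit(&pages[i]))
            pages[i].age |= 0x80;
    }
}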
Now, swap space...
Getting "most likely to be needed" data into RAM means getting "less likely to be needed" data out of RAM to make space. If the "less likely to be needed" data has been modified, then the only place you can put it is swap space. If the "less likely to be needed" data has not been modified then you could get it from wherever it originally came from and don't need to store it in swap space, but if swap space is faster (or more reliable?) than wherever the data originally came from then you might want to store the data in swap space anyway.
Basically swap space is just one of the pieces of the puzzle; but without it you can't improve performance (e.g. using pre-fetching to avoid IO delays) anywhere near as much.
Now let's think about how to design swap space support...
One of the things people often don't realise is that modern RAM chips have their own power saving. If you've got a computer with a pair of 4 GiB RAM chips and only use the first RAM chip, then the other RAM chip can go into a low power mode. If we put all the "most likely to be used" data in the first RAM chip and all the "slightly less likely to be used" data in the second RAM chip; then that second RAM chip can spend a lot more time in its low power mode; and that means longer battery life for mobile systems (and servers that last longer on emergency UPS power), and less heat, less fan noise, etc. Fortunately an OS typically has the logic needed for this - just use the second RAM chip for swap space (or more correctly, half of each NUMA domain's RAM as swap space).
If we're using RAM as swap space anyway; then we can try to compress data before storing it in "RAM swap". With 4 GiB of RAM being used for "RAM swap" half of the pages might not compress well and half might compress to 50% (or less) of their original size; and we might be able to store 6 GiB of data in "RAM swap".
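A rough sketch of storing one page in "RAM swap" (the compress_page() function is assumed/made up; the point is that you only keep the compressed copy when it's actually smaller):

Code:
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Assumed to exist: returns the compressed size, or 0 if the page didn't shrink. */
extern size_t compress_page(const void *src, void *dst, size_t dst_max);

struct ram_swap_slot {
    void   *data;    /* compressed (or raw) copy of the page */
    size_t  size;    /* bytes actually stored */
};

/* Store a page in "RAM swap", keeping the compressed copy only if it's smaller. */
int store_in_ram_swap(struct ram_swap_slot *slot, const void *page)
{
    unsigned char buf[PAGE_SIZE];
    size_t n = compress_page(page, buf, sizeof(buf));
    int compressed = (n != 0 && n < PAGE_SIZE);

    slot->size = compressed ? n : PAGE_SIZE;
    slot->data = malloc(slot->size);
    if (slot->data == NULL)
        return -1;
    memcpy(slot->data, compressed ? (const void *)buf : page, slot->size);
    return 0;
}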
The next thing you're going to want is a tiered system of swap providers. Maybe the video card has 1 GiB of video memory and is only using a quarter of it, so it offers 768 MiB of swap space, maybe the SSD drive has a swap partition and provides another 10 GiB of swap space, and maybe there's a mechanical hard drive with another swap partition that offers another 20 GiB of swap space. They're all different speeds. Obviously you want to use the swap space in order - so that "more likely to be used" data (in swap space) gets stored by the fastest swap provider and "least likely to be used" data gets stored by the slowest swap provider. The "RAM swap" is fastest so it's first, and if you run out of space in "RAM swap" you transfer potentially already compressed data from "RAM swap" to "video display memory swap". This means that for a total of "4+0.75+10+20=34.75" GiB of physical swap space you might store almost 50 GiB of data due to compression.
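As a rough sketch, the tiers from that example could be described with a table like this (the latency numbers are made up and would really be measured or configured at boot):

Code:
#include <stdint.h>

/* One swap provider (tier); the array is kept sorted from fastest to slowest. */
struct swap_provider {
    const char *name;
    uint64_t    capacity_mib;   /* swap space it offers */
    uint64_t    latency_ns;     /* rough cost of getting a page back */
};

/* Data overflows from tier N to tier N+1 as each tier fills up. */
static struct swap_provider swap_tiers[] = {
    { "RAM swap",                    4096, 1000     },
    { "video display memory swap",    768, 10000    },
    { "SSD swap partition",         10240, 100000   },
    { "hard disk swap partition",   20480, 10000000 },
};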
Of course when you're transferring data from a faster swap provider to a slower swap provider, if the original data wasn't modified (and swap is just being used for caching) then you'd check if the slower swap provider is faster or slower than the original supplier. For a file that came from SSD, it makes sense to cache it in "video display memory swap" to improve performance, but it doesn't make any sense to cache it in "mechanical hard disk swap" that's slower.
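As a one-line rule of thumb (made-up names again):

Code:
/* Should a page be demoted from a faster swap provider to a slower one,
 * or just dropped because re-reading the original would be faster? */
int worth_demoting(int dirty, unsigned slower_swap_cost, unsigned origin_cost)
{
    if (dirty)
        return 1;                            /* only copy we have, must keep it */
    return slower_swap_cost < origin_cost;   /* e.g. don't cache an SSD file on a slower hard disk */
}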
Now; let's go all the way back to the start (to "ideal RAM contents"). Imagine if you split all that (> 100 EiB) "set of all data" into 4 KiB pages, and give each page a score representing the probability it will be needed soon. The pages with the highest scores go in RAM, the pages with the next highest scores go in "RAM swap", the pages with the next highest scores go in "video display memory swap", and so on. This gives you "ideal RAM+swap contents". Maximum performance is achieved when "ideal RAM+swap contents" is achieved.
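A rough sketch of that placement (made-up names; it assumes the pages have already been sorted from most to least likely to be needed, and treats RAM itself as just "tier 0"):

Code:
#include <stddef.h>

struct tier {
    const char *name;            /* "RAM", "RAM swap", "video display memory swap", ... */
    size_t      capacity_pages;  /* how many 4 KiB pages it can hold */
};

/* placement[i] becomes the tier index that page i should live in (-1 = nowhere). */
void place_pages(int *placement, size_t total_pages,
                 const struct tier *tiers, size_t tier_count)
{
    size_t t = 0, used = 0;

    for (size_t i = 0; i < total_pages; i++) {
        while (t < tier_count && used >= tiers[t].capacity_pages) {
            t++;            /* current tier is full, move to the next (slower) one */
            used = 0;
        }
        placement[i] = (t < tier_count) ? (int)t : -1;
        if (t < tier_count)
            used++;
    }
}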
However; it's not that simple (disk and network bandwidth limits and.... ); but the goal of the OS is to do the best job it can; constantly trying to get closer to (a less than perfect approximation of) the "ideal RAM+swap contents" state but never achieving it; and only being able to get closer to maximum performance.
Cheers,
Brendan