Hi,
Yuji1 wrote:Can anyone provide some advice (and hopefully some post-thought poc) on what size paging I should use? 4KB or 2MB? I know it depends on a wide variety of things but hm, I kinda want to stick to one type alone and use that. At 2MB I'd have to implement process separation which in its own would have overhead but would it outweigh 4KB paging without process 'active' separation. Young, theorizing still.
For long mode there are 3 page sizes - 4 KiB, 2 MiB and 1 GiB. You can't allocate part of a page, so on average you waste half a page per area due to "rounding up". If a program has 4 areas (.text, .rodata, .data and .bss) this means that (there's a small sketch of this arithmetic after the list):
- For 4 KiB pages; you'd waste 8 KiB of RAM for "over allocation"
- For 2 MiB pages; you'd waste 4 MiB of RAM for "over allocation"
- For 1 GiB pages; you'd waste 2 GiB of RAM for "over allocation"
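To put numbers on it, here's that sketch (the four-area layout is just the example above - real programs have whatever areas their executable format gives them):

    #include <stdio.h>

    int main(void) {
        /* Average over-allocation: half a page wasted per area, 4 areas. */
        const unsigned long long page[] = { 4ULL << 10, 2ULL << 20, 1ULL << 30 };
        const char *name[] = { "4 KiB", "2 MiB", "1 GiB" };
        for (int i = 0; i < 3; i++) {
            unsigned long long waste = 4 * (page[i] / 2);          /* bytes */
            printf("%s pages: %llu KiB wasted on average\n", name[i], waste >> 10);
        }
        return 0;
    }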
On the other hand there's the cost of paging structures (e.g. page tables, page directories, etc), where larger page sizes mean less RAM consumed for paging structures (the counting is sketched in code after the list):
- 4 KiB pages cost you 4 KiB extra per 2 MiB of space (compared to using 2 MiB pages)
- 2 MiB pages cost you 4 KiB extra per 1 GiB of space (compared to using 1 GiB pages), because each 1 GiB needs one 4 KiB page directory full of 512 "2 MiB page" entries
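In code, that counting looks like this (a sketch; it assumes the mapped size is an exact multiple of 1 GiB, and ignores the higher-level structures because they're the same small cost either way):

    #include <stdio.h>

    #define PT_MAPS (2ULL << 20)   /* one 4 KiB page table maps 2 MiB */
    #define PD_MAPS (1ULL << 30)   /* one 4 KiB page directory maps 1 GiB */

    int main(void) {
        unsigned long long size = 1ULL << 30;                      /* 1 GiB */
        printf("4 KiB pages: %llu KiB of page tables per GiB\n",
               ((size / PT_MAPS) * 4096) >> 10);
        printf("2 MiB pages: %llu KiB of page directories per GiB\n",
               ((size / PD_MAPS) * 4096) >> 10);
        return 0;
    }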
This gives us "break even points" - a specific "amount of RAM used by a process" where overhead (over allocation + paging structures) is the same for 2 different page sizes:
- If a program uses 2 GiB of RAM; for 2 MiB pages it costs 4 MiB of RAM for over allocation, and for 4 KiB pages it costs 8 KiB for over allocation plus 1024 * 4 KiB = 4 MiB extra for paging structures.
- If a program uses 512 TiB of RAM; for 1 GiB pages it costs 2 GiB of RAM for over allocation, and for 2 MiB pages it costs 4 MiB for over allocation plus 524288 * 4 KiB = 2 GiB extra for paging structures.
From this you can say that (there's a quick sanity check in code after the list):
- If processes use 2 GiB of RAM or less on average, then 4 KiB pages consume the least RAM for overhead
- If processes use between 2 GiB and 512 TiB of RAM on average, then 2 MiB pages consume the least RAM for overhead
- If processes use more than 512 TiB of RAM on average (more than the whole 256 TiB of virtual space that 48-bit long mode gives you), then 1 GiB pages consume the least RAM for overhead
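Here's that sanity check (same assumptions as above: 4 areas, sizes that are whole multiples of 1 GiB, and only the lowest-level paging structures counted, since the higher levels are comparatively tiny):

    #include <stdio.h>

    /* Total overhead in bytes: over-allocation plus lowest-level structures.
       "maps" is how much space one 4 KiB structure maps (0 = none needed). */
    static unsigned long long overhead(unsigned long long used,
                                       unsigned long long page_size,
                                       unsigned long long maps) {
        unsigned long long over_alloc = 4 * (page_size / 2);
        unsigned long long structures = maps ? (used / maps) * 4096 : 0;
        return over_alloc + structures;
    }

    int main(void) {
        const unsigned long long used[] = { 2ULL << 30, 512ULL << 40 };
        for (int i = 0; i < 2; i++) {
            printf("%llu GiB used:\n", used[i] >> 30);
            printf("  4 KiB pages: %llu MiB overhead\n",
                   overhead(used[i], 4ULL << 10, 2ULL << 20) >> 20);
            printf("  2 MiB pages: %llu MiB overhead\n",
                   overhead(used[i], 2ULL << 20, 1ULL << 30) >> 20);
            printf("  1 GiB pages: %llu MiB overhead\n",
                   overhead(used[i], 1ULL << 30, 0) >> 20);
        }
        return 0;
    }

At 2 GiB it prints ~4 MiB for both 4 KiB and 2 MiB pages (the first break-even point), and at 512 TiB it prints ~2 GiB for both 2 MiB and 1 GiB pages (the second).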
However; this assumes that all of the data a process "uses" is in RAM all of the time. For an OS that supports swap space (or memory mapped files or a few other things) this isn't true - some of the data a process "uses" is on disk and not in RAM. This dramatically increases the amount of RAM wasted for over allocation, because every hot piece of data drags a whole page into RAM. For example, if a process has 50 TiB of data and frequently uses 1234 different 100 byte pieces of that data and doesn't actually touch the rest of the 50 TiB (so the rest stays on disk); then 2 MiB pages (or 1 GiB pages) are going to waste a massive amount of RAM.
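With made-up numbers (assuming the worst case where each hot 100-byte piece lands on its own page):

    #include <stdio.h>

    int main(void) {
        /* 1234 hot 100-byte pieces, each pinning one whole page in RAM. */
        unsigned long long pieces = 1234;
        printf("4 KiB pages: ~%llu MiB resident\n", (pieces * (4ULL << 10)) >> 20);
        printf("2 MiB pages: ~%llu MiB resident\n", (pieces * (2ULL << 20)) >> 20);
        return 0;
    }

That's roughly 4 MiB of RAM with 4 KiB pages versus roughly 2468 MiB with 2 MiB pages, for the same 1234 pieces of hot data.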
Basically; for "RAM consumed for overhead" it's almost always more efficient to use 4 KiB pages, even if most processes use many TiB of data each.
"RAM consumed for overhead" isn't the only important thing though - there's also things like TLB and TLB misses. It's extremely difficult to determine the effect different page sizes have on the cost of TLB misses, as it depends on the CPU (how many TLB entries there are and how good the CPU is at avoiding the overhead) and the software (specific access patterns, number of "virtual address space switches", etc).
In general, CPUs have lots of TLB entries for 4 KiB pages (e.g. 512 for Sandy Bridge), fewer TLB entries for 2 MiB pages (e.g. 32 for Sandy Bridge) and even fewer TLB entries for 1 GiB pages (e.g. 4 for Sandy Bridge). Software tends to use many things at the same time (e.g. a few different areas of code, a few different areas of stack and many different areas of data); and even though larger page sizes would mean fewer TLB misses per area, fewer TLB entries can mean more TLB misses overall (e.g. with 1 GiB pages you'd run out of TLB entries so fast that you'd end up thrashing the TLB and get a very high number of TLB misses).
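One way to look at it is "TLB reach" - entries times page size - using those Sandy Bridge style counts as example numbers:

    #include <stdio.h>

    int main(void) {
        /* TLB reach = entries * page size (example entry counts only). */
        const struct { const char *name; unsigned entries; unsigned long long page; }
        tlb[] = {
            { "4 KiB", 512, 4ULL << 10 },
            { "2 MiB",  32, 2ULL << 20 },
            { "1 GiB",   4, 1ULL << 30 },
        };
        for (int i = 0; i < 3; i++)
            printf("%s pages: %u entries, reach = %llu MiB\n",
                   tlb[i].name, tlb[i].entries,
                   (tlb[i].entries * tlb[i].page) >> 20);
        return 0;
    }

Reach grows with page size (2 MiB, then 64 MiB, then 4096 MiB here), but the number of separate hot areas you can cover at once shrinks - which is exactly how the thrashing described above happens.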
Because it's almost impossible to estimate which page size is better (for TLB) under which conditions, it would seem foolish to use larger page sizes in the hope that it might help performance.
Now; all of the above assumes you're only using one page size. What if you use multiple page sizes?
Using multiple page sizes seems like it'd be the best of everything - e.g. minimising page structure overhead, minimising over allocation, and helping with TLB misses. In practice it doesn't work like that: it complicates physical memory management and can cause a disaster.
If you run out of 4 KiB pages then you need to split up a 2 MiB page to get more; but if that's all you do then after a while all of the 2 MiB pages would be split up and you'd have none left. To avoid that you also have to be able to combine 4 KiB pages back into a 2 MiB page. For example, your physical memory manager might have 450 of the 4 KiB pages it needs to make a 2 MiB page, but then have to find the remaining 62 allocated pages (and replace them with other pages to make them free) to be able to combine them into a free 2 MiB page. This (or some variation of it) is possible, but (regardless of how it's done) it adds extra overhead and the complexity tends to make other optimisations (e.g. "O(1) alloc and dealloc") harder. It's likely that the performance improvement you get from supporting mixed page sizes will be too small compared to the extra overhead needed to provide that support.
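A minimal sketch of the bookkeeping this implies (free lists, locking, and the hard "steal back the last allocated pages" part are all omitted; all names here are made up):

    #include <stdint.h>

    #define SMALL_PER_LARGE 512            /* 4 KiB pages per 2 MiB page */

    /* Per-2 MiB-region bookkeeping. */
    struct region {
        uint16_t free_small;               /* free 4 KiB pages in this region */
        uint8_t  is_split;                 /* split into 4 KiB pages? */
    };

    /* Split one free 2 MiB page into 512 free 4 KiB pages. */
    static void split_large(struct region *r) {
        r->is_split = 1;
        r->free_small = SMALL_PER_LARGE;
        /* ...move the region's 512 small pages onto the 4 KiB free list... */
    }

    /* Free one 4 KiB page; recombine when the whole region is free again. */
    static void free_small(struct region *r) {
        r->free_small++;
        if (r->free_small == SMALL_PER_LARGE) {
            r->is_split = 0;
            /* ...pull the 512 small pages off the 4 KiB free list and put
               the region back on the 2 MiB free list... */
        }
    }

The painful case from the example above (450 pages free, 62 still allocated) is exactly what this sketch dodges: getting free_small up to 512 can mean relocating someone else's pages first.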
If you weigh all of this up, the most sane approach is likely to be only using 4 KiB pages for normal RAM (to reduce RAM overhead and physical memory management complexity); and to use the larger page sizes for "large enough" memory mapped IO areas, like video display memory (where there can't be any swapping or anything, over allocation isn't a problem, and there's no physical memory management involved).
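For example, mapping something like a framebuffer with 2 MiB pages just means building page directory entries with the PS bit set (a sketch; whether you want "cache disable" or write-combining via PAT depends on the device):

    #include <stdint.h>

    #define PDE_PRESENT (1ULL << 0)
    #define PDE_WRITE   (1ULL << 1)
    #define PDE_PCD     (1ULL << 4)       /* cache disable (or use PAT instead) */
    #define PDE_PS      (1ULL << 7)       /* this PDE maps a 2 MiB page directly */

    /* Build a PDE mapping one 2 MiB page of memory mapped IO.
       "phys" must be 2 MiB aligned. */
    static inline uint64_t mmio_pde(uint64_t phys) {
        return phys | PDE_PRESENT | PDE_WRITE | PDE_PCD | PDE_PS;
    }

That way a 16 MiB framebuffer becomes 8 of these PDEs, instead of 4096 page table entries plus 8 page tables.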
Cheers,
Brendan