Hi,
azblue wrote: If I were to create artificially large pages, would that eliminate the need for page coloring or reduce the number of colors needed?
For example, if I always map physical memory to virtual memory at 64K boundaries in 64K chunks (essentially creating a form of 64K pages), would I be correct in assuming I've just reduced the number of colors needed by a factor of 16? (Thus entirely eliminating the need for coloring on any system requiring 16 or fewer colors).
Similarly, if I use 2MB pages (real pages, not "artificial" ones), would that reduce the number of colors needed by a factor of 512?
It would reduce the number of page colours as you've described (all the way down to one page colour for 2 MiB pages in most cases, where "1 page colour" is the same as "no page colouring at all").
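To put rough numbers on that, here's a minimal sketch. It assumes the usual rule of thumb that the number of page colours is the size of one cache way divided by the page size (never less than one), and it borrows the 256 KiB 8-way and 8 MiB 16-way caches from the example further down. The function name and constants are just for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def page_colours(cache_size, associativity, page_size):
    """Rule-of-thumb colour count for a physically indexed cache.

    Assumption: colours = (bytes covered by one way) / page_size,
    clamped to 1, where 1 colour means no page colouring at all.
    """
    way_size = cache_size // associativity
    return max(1, way_size // page_size)


for page_size in (4 * KIB, 64 * KIB, 2 * MIB):
    l2 = page_colours(256 * KIB, 8, page_size)
    l3 = page_colours(8 * MIB, 16, page_size)
    print(f"{page_size // KIB:5d} KiB pages: L2 colours = {l2}, L3 colours = {l3}")
```

Under those assumptions the 8 MiB cache goes from 128 colours with 4 KiB pages to 8 colours with 64 KiB pages (the factor of 16 you mentioned), and to a single colour with 2 MiB pages.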
However...
Imagine you've got 50 tiny processes that each have 4 KiB of code at 0x00200000 and 4 KiB of data at 0x00400000 in their own/separate virtual address space. In this case, the code for all 50 processes will alias each other in the CPU's L2 instruction cache, and the data for all 50 processes would alias each other in the CPU's L2 data cache, and all the code and all the data for all 50 processes would alias in the L3 unified cache. Assuming the L2 caches are both 256 KiB 8-way associative you'd only be using 32 KiB of each L2 cache due to aliasing between processes, and assuming the L3 cache is 8 MiB with 16-way associativity you'd only be using 256 KiB of the L3 cache due to aliasing between processes.
Normally, to avoid this aliasing between processes you'd use skewing. Instead of doing "page_colour = (address/page_size)%page_colours" you do "page_colour = (address/page_size + process_ID)%page_colours", so that the page at address 0x00200000 is a different colour for each different process.
By increasing the page size, you decrease the amount of skewing you can do (all the way down to "no skewing possible at all") and end up with different processes aliasing in caches.
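A quick sketch of the skewing formula shows the effect. The process IDs 0 to 49 and the colour counts (128, 8 and 1, which is what you'd get for an 8 MiB 16-way cache with 4 KiB, 64 KiB and 2 MiB pages if colours = way size / page size) are assumptions for illustration only.

```python
KIB = 1024
MIB = 1024 * KIB
CODE_ADDRESS = 0x00200000  # same virtual address in every process


def page_colour(address, page_size, page_colours, process_id=0):
    """Skewed page colour: the process ID shifts the colour per process."""
    return (address // page_size + process_id) % page_colours


def distinct_colours(page_size, page_colours, processes=50):
    """Count how many different colours the page at CODE_ADDRESS gets
    across the given number of processes (hypothetical IDs 0..processes-1)."""
    return len({
        page_colour(CODE_ADDRESS, page_size, page_colours, pid)
        for pid in range(processes)
    })


print(distinct_colours(4 * KIB, 128))  # each of the 50 processes gets its own colour
print(distinct_colours(64 * KIB, 8))   # only 8 colours to spread 50 processes over
print(distinct_colours(2 * MIB, 1))    # one colour, so every process aliases
```

With fewer colours there are fewer distinct values the skew can produce, and with one colour the process ID makes no difference at all.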
Also note that increasing page size increases other problems, like RAM wasted due to "partially used" pages. The normal formula I use is "RAM_wasted = number_of_processes * average_number_of_sections_with_different_page_permissions * page_size/2". With 50 processes that each have 3 sections (executable, read only, read/write sections) you'd waste 300 KiB of RAM if you use 4 KiB pages, and you'd waste 150 MiB of RAM if you use 2 MiB pages. It also makes things like shared memory, memory mapped files and swap space less efficient (e.g. if you've got a memory mapped executable file that has 12 KiB of "only used during initialisation" code and 50 KiB of "used after initialisation" code, you can't free that 12 KiB of "not needed anymore" code after the process is initialised, because it sits in the same large page as code that is still in use). Of course more RAM wasted means less RAM for things like file caches, which means worse performance.
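The waste estimate is easy to reproduce. This is only the formula above written out, with a hypothetical function name; it assumes that on average half of the last page of each section goes unused.

```python
KIB = 1024
MIB = 1024 * KIB


def ram_wasted(processes, sections_per_process, page_size):
    """Estimated bytes lost to partially used pages.

    Assumption: each section wastes half a page on average.
    """
    return processes * sections_per_process * page_size // 2


small = ram_wasted(50, 3, 4 * KIB)
large = ram_wasted(50, 3, 2 * MIB)
print(f"4 KiB pages: {small // KIB} KiB wasted")
print(f"2 MiB pages: {large // MIB} MiB wasted")
```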
Cheers,
Brendan