Any Page/Cache coloring implementation available?
Hello guys,
do any of you know of a page/cache coloring implementation available online for the IA32 architecture?
I would like to take a look at some code that manipulates the memory allocator to provide page/cache coloring by controlling some bits of the physical address.
Thanks
Re: Any Page/Cache coloring implementation available?
Obviously homework, but okay.
L4 Microkernel has a page coloring algorithm.
Check out http://os.inf.tu-dresden.de/L4/
Re: Any Page/Cache coloring implementation available?
Thanks. It is not homework. I want to implement cache partitioning in our kernel based on cache coloring.
I want to learn more about how the physical address is changed to be mapped to a specific cache line.
Re: Any Page/Cache coloring implementation available?
Hi,
giovanig wrote:Thanks. It is not homework. I want to implement cache partitioning in our kernel based on cache coloring. I want to learn more about how the physical address is changed to be mapped to a specific cache line.

You can't change a page's physical address; you can only choose which physical page is used to back each virtual page.
You can (for example) have a separate pool of free pages for each different cache colour. In this case, when you allocate a page for a certain virtual address you'd determine which cache colour you want from the virtual address, then allocate a physical page from the pool of free pages that corresponds to that cache colour. When you free a page you'd use its physical address to determine which pool of free pages the page should be added to. I normally do this with free page stacks - e.g. for 256 cache colours I'll have 256 free page stacks (with no searching needed).
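For illustration, here's a minimal sketch of per-colour free page stacks, assuming 4 KiB pages and 256 colours; the names (free_page_stack, colour_of, phys_to_virt) are made up for the example, and phys_to_virt() stands in for however your kernel gets a usable mapping of a physical page:
Code: Select all
#include <stdint.h>

#define COLOURS   256
#define PAGE_SIZE 4096

void *phys_to_virt(uintptr_t paddr); /* hypothetical helper */

/* Top of the free page stack for each cache colour. Each free page's
 * first word stores the physical address of the next free page, so the
 * stacks need no metadata of their own. */
static uintptr_t free_page_stack[COLOURS];

static unsigned colour_of(uintptr_t paddr) {
    return (paddr / PAGE_SIZE) % COLOURS; /* colour from physical address */
}

void free_page(uintptr_t paddr) {
    unsigned c = colour_of(paddr);
    *(uintptr_t *)phys_to_virt(paddr) = free_page_stack[c]; /* push */
    free_page_stack[c] = paddr;
}

uintptr_t alloc_page(unsigned colour) {
    uintptr_t paddr = free_page_stack[colour];
    if (paddr)
        free_page_stack[colour] = *(uintptr_t *)phys_to_virt(paddr); /* pop */
    return paddr; /* 0 means no free page of this colour */
}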
You can also (for example) have one pool of free pages for all cache colours. In this case, when you allocate a page for a certain virtual address you'd search for a page with the corresponding cache colour. This is slower as it involves searching (but doesn't have problems in less common cases where the physical address matters - e.g. physical pages used for ISA DMA).
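A sketch of this variant, reusing the COLOURS and PAGE_SIZE definitions from the previous sketch and assuming one bit per physical frame in a free-frame bitmap:
Code: Select all
#include <stddef.h>
#include <stdint.h>

/* Frame number % COLOURS gives a frame's colour, so stepping through the
 * bitmap in strides of COLOURS visits only frames of the wanted colour. */
uintptr_t alloc_page_coloured(uint8_t *bitmap, size_t total_frames, unsigned colour) {
    for (size_t f = colour; f < total_frames; f += COLOURS) {
        if (bitmap[f / 8] & (1u << (f % 8))) { /* frame f is free */
            bitmap[f / 8] &= ~(1u << (f % 8)); /* mark it used */
            return (uintptr_t)f * PAGE_SIZE;
        }
    }
    return 0; /* no free frame of this colour */
}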
You can combine both these techniques. For example, for 256 cache colours you might have 256 separate free page stacks for memory above 0x01000000, plus one free page bitmap for memory below 0x01000000.
In both of these cases (one free page pool per cache colour and one free page pool for all cache colours), if you can't find a free physical page of the desired cache colour you'd allocate a "not so good" page with the wrong cache colour, because a slight performance loss is better than returning an "out of memory" error. This means that it might be worthwhile to have some sort of "memory optimiser" - for example, imagine a low priority kernel thread that checks which pages are being used by each process and (if it can) replaces "less good" pages with better pages. This is optional (I'd probably only worry about doing this if you're checking which pages are being used by each process for other reasons).
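The fallback could look something like this, building on the alloc_page() sketch above (alloc_page_prefer is a made-up name for illustration):
Code: Select all
/* Try the wanted colour first; if its stack is empty, take a page of
 * the nearest other colour rather than failing with "out of memory". */
uintptr_t alloc_page_prefer(unsigned colour) {
    for (unsigned i = 0; i < COLOURS; i++) {
        uintptr_t paddr = alloc_page((colour + i) % COLOURS);
        if (paddr)
            return paddr;
    }
    return 0; /* genuinely out of memory */
}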
The hard part is detecting how many cache colours you need - you need to know cache size and associativity, and different CPUs use different methods to return this information via CPUID (and some don't provide this information directly, so you have to use the "vendor, family, model" values and lookup tables). The basic idea is: for each cache, divide the cache size by the cache's associativity and the page size to get the number of cache colours you'd want for that cache; then take the highest number of cache colours across all caches. For example, if the CPU has a 256 KiB 4-way associative L1 data cache and a 1 MiB 8-way associative L2 data cache, then you'd want 16 cache colours for the L1 data cache and 32 cache colours for the L2 data cache; so you'd use 32 cache colours (as it's higher).
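As a worked version of that arithmetic (a sketch only; it says nothing about how you decode CPUID):
Code: Select all
/* Colours needed for one cache; take the maximum over all caches. */
unsigned colours_for_cache(unsigned cache_size, unsigned associativity,
                           unsigned page_size) {
    return cache_size / (associativity * page_size);
}

/* For the example above:
 *   colours_for_cache(256 * 1024, 4, 4096)  == 16   (L1 data cache)
 *   colours_for_cache(1024 * 1024, 8, 4096) == 32   (L2 data cache)
 * so you'd use 32 colours. */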
Once you know how many cache colours you want, you'd round that up to the next power of 2 (e.g. if you want 48 cache colours you'd round up to 64 cache colours) and clamp it to a sane range (e.g. if you want 32768 cache colours, maybe limit it to 1024 cache colours). Then you'd use the resulting number of cache colours to calculate a shift value and a mask, such that:
Code: Select all
cache_colour = (physical_address >> cache_colour_shift) & cache_colour_mask;
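Deriving the shift and mask might look like this (a sketch assuming 4 KiB pages, so the colour bits begin at bit 12, where the page offset ends):
Code: Select all
unsigned cache_colour_mask_from(unsigned wanted) {
    unsigned colours = 1;
    while (colours < wanted)
        colours <<= 1;        /* round up to a power of 2 */
    if (colours > 1024)
        colours = 1024;       /* clamp to a sane range */
    return colours - 1;       /* cache_colour_mask */
}

/* cache_colour_shift = 12 (log2 of the 4 KiB page size). */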
When freeing a physical page you'd use the cache_colour formula above to determine which free page pool the page should be added to.
To find the cache colour from a virtual address the formula should be a little different. You could use the same formula; but processes tend to use some virtual addresses more than others (for example, you might have 100 different processes that all have code starting at 0x00010000), which means that you'd run out of physical pages for some cache colours faster than others. To avoid that, you want to give each process a different "cache colour bias". That way (for example) virtual address 0x00010000 for one process might be a different cache colour than virtual address 0x00010000 for a different process. The basic idea is:
Code: Select all
cache_colour = ( (virtual_address >> cache_colour_shift) ^ process_cache_colour_bias) & cache_colour_mask;
However, the cache colour bias may be derived from the process ID (e.g. "process_cache_colour_bias = process_ID & (cache_colours - 1);"), which means that you could use this instead:
Code: Select all
cache_colour = ( (virtual_address >> cache_colour_shift) ^ process_ID) & cache_colour_mask;
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Any Page/Cache coloring implementation available?
Hi Brendan,
Thank you for your detailed answer.
Brendan wrote:You can't change a page's physical address.
You can (for example) have a separate pool of free pages for each different cache colour. In this case, when you allocate a page for a certain virtual address you'd determine which cache colour you want from the virtual address, then allocate a physical page from the pool of free pages that corresponds to that cache colour. When you free a page you'd use its physical address to determine which pool of free pages the page should be added to. I normally do this with free page stacks - e.g. for 256 cache colours I'll have 256 free page stacks (with no searching needed).

This is exactly what I thought. I am using an Intel i7-2600 processor with an 8 MB, 16-way set associative L3 cache and 64-byte cache lines, which gives 128 colors (I am not interested in the L2 cache for now).
The paging mode is 32-bit paging, in which 32-bit logical addresses are translated into 32-bit physical addresses. From the L3 cache's point of view, bits 0-5 are the offset used to access a cache line, bits 6-18 are the associative set number (8192 sets), and the rest is the tag. For the physical memory, bits 0-11 are the page offset and the rest is the physical page number. The OS cannot change the first 6 bits of the associative set number because they coincide with the page offset, but it can control the next 7 bits (2^7 = 128 colors), which coincide with the physical page number.
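In other words (a quick sketch of that layout as C macros):
Code: Select all
/* 64-byte lines, 8192 sets, 4 KiB pages: */
#define LINE_OFFSET(a) ((a) & 0x3F)           /* bits 0-5               */
#define SET_INDEX(a)   (((a) >> 6) & 0x1FFF)  /* bits 6-18              */
#define PAGE_COLOR(a)  (((a) >> 12) & 0x7F)   /* bits 12-18: 128 colors */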
Please correct me if I am wrong. I will try to explain my implementation.
Our OS has a heap for the application and a heap for the system. I created X different heaps for the applications, where X is at most 128. I divided the total heap size by the number of colors as follows:
Code: Select all
// Split the application heap into "colors" colored heaps; "size" is
// the per-color share of the total heap (total heap size / colors)
for(int i = 0; i < colors; i++) {
    Application::heap(i)->free(MMU::alloc(MMU::pages(size)), size);
}
Then, I changed the memory allocator to allocate a page from the correct colored heap and to remap a logical page to the correct colored physical page. Something like this:
Code: Select all
void *malloc(unsigned bytes, unsigned color) {
    void *rtn = allocated_from_colored_heap(bytes, color);
    // Remap the logical page to the colored physical address;
    // if bytes is greater than 4 KB, repeat for all pages
    Log_Addr laddr(rtn);
    Phy_Addr paddr = MMU::physical(laddr); // translate a logical address to a physical address
    paddr = MMU::align_page(paddr);        // get the beginning address of the page
    paddr = (paddr & 0xFFF80FFF) | (color << 12); // set the 7 color bits (bits 12-18)
    MMU::map_page_log_to_phy(laddr, paddr); // remap the logical address
    return rtn;
}
The remap method is:
Code: Select all
static void map_page_log_to_phy(Log_Addr laddr, Phy_Addr paddr) {
    Page_Directory * pd = current();
    Page_Table * pt = (*pd)[directory(laddr)]; // page table that maps laddr
    pt->remap_log_to_phy(laddr, paddr, IA32_Flags::APP);
    flush_tlb();
}

void remap_log_to_phy(Log_Addr laddr, Phy_Addr paddr, IA32_Flags flags) {
    _entry[page(laddr)] = paddr | flags; // _entry is the page table array
}
What do you guys think about it?
Giovani