Two ways of flushing TLB in IA32 and two different behaviors

Posted: Thu Jul 05, 2012 3:38 pm
by giovanig
Hi guys,

IA32 processors have two ways of flushing the TLB. Reloading CR3 with its own value:
ASMV("movl %cr3,%eax");
ASMV("movl %eax,%cr3");
and using invlpg instruction:
ASMV("invlpg %0" : : "m"(addr));
I am remapping some logical pages in the page table to different physical pages. After remapping, I must flush the TLB.

My test application allocates 10 char arrays of 4KB each, remaps the physical pages (it is a page coloring mechanism), and initializes the first position of the array with 'A' + (array number from 1 to 10). Then, it prints the first character.

When I use the first method of flushing the TLB, the character printed is not 'A' but a completely strange character. On the other hand, when I use the invlpg instruction, it correctly prints 'A'.

Does anyone have any clue about this behavior? The processor is an intel i7-2600.

Best regards,
Giovani

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Thu Jul 05, 2012 4:19 pm
by NickJohnson
Are you properly alerting the compiler that eax is being trashed by your inline assembly?

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Thu Jul 05, 2012 4:45 pm
by giovanig
Are you properly alerting the compiler that eax is being trashed by your inline assembly?
What do you mean by trashing eax? ASMV is a macro for __asm__ __volatile__. The function that flushes the TLB is the following:

#define ASMV  __asm__ __volatile__
static void flush_tlb() {
        ASMV("movl %cr3,%eax");
        ASMV("movl %eax,%cr3");
}
That is what Intel's manual says. Maybe the problem is related to my remap mechanism. A description can be found here: http://forum.osdev.org/viewtopic.php?f=1&t=25496

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Thu Jul 05, 2012 7:41 pm
by jbemmel
Are you setting the "global" bit in your PTEs?

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Thu Jul 05, 2012 10:36 pm
by bluemoon
giovanig wrote:

ASMV("movl %cr3,%eax");
ASMV("movl %eax,%cr3");
and using invlpg instruction:

ASMV("invlpg %0" : : "m"(addr));
The syntax looks like you're using gcc.
As noted by NickJohnson, you need to tell the compiler that you clobber eax in the first snippet.
Furthermore, you need to tell the compiler that you clobber memory in both snippets.
Check the gcc manual for the exact syntax of the clobber list.

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Fri Jul 06, 2012 1:28 am
by Owen
  • Firstly, in fact, that whole first code segment is invalid. It is entirely legal for the compiler to have emitted something like this...

    #line "yourfile.c" 23
        mov %cr3, %eax
    #line /*Mark this as a compiler generated section */
        mov 8(%esp), %eax
    #line "yourfile.c" 24
        mov %eax, %cr3
    
  • Secondly, assuming "addr" is a pointer, the "m"(addr) specifier will generate a reference to the address of that pointer. You probably wanted "m"(*addr)
  • Finally, in neither case are you informing the compiler that a region of memory is (from its point of view) being modified. You have two options here:
    • You can use the "memory" clobber, in which case the compiler will assume that all memory could have changed and reload all pointers
    • You can do something like

      struct PageSize { char contents[4096]; };
      ...
      PageSize* p = (PageSize*) addr;
      asm volatile("invlpg %0" : "+m"(*p));
      
      The "+m" specifier tells GCC that the inline assembly both reads from and writes to *p. Because *p is 4096 bytes in size, GCC assumes that the whole region may be modified, which ensures that reads and writes are not incorrectly reordered.
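
Owen's three points can be combined into a minimal sketch, assuming GCC or Clang on x86 and code running in ring 0. The names `tlb_flush_all`, `tlb_flush_page` and `page_base` are hypothetical and do not come from the thread. The CR3 reload lives in a single asm statement with a compiler-chosen register, and the invlpg operand is the pointed-to byte rather than the pointer variable.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages, as in the thread */

/* Round a linear address down to the start of its page (pure helper). */
static inline uintptr_t page_base(uintptr_t addr) {
    return addr & ~(uintptr_t)(PAGE_SIZE - 1);
}

/* Reload CR3 inside ONE asm statement, so the compiler cannot place its own
 * code between the read and the write. The compiler picks the scratch
 * register, so no register clobber is needed. Entries marked global survive
 * this. Privileged: faults outside ring 0. */
static inline void tlb_flush_all(void) {
    unsigned long cr3;
    __asm__ __volatile__("mov %%cr3, %0\n\t"
                         "mov %0, %%cr3"
                         : "=r"(cr3)
                         :
                         : "memory");
}

/* Invalidate the TLB entry for the page containing addr. The operand is the
 * byte that addr points at, not the variable addr itself. Privileged. */
static inline void tlb_flush_page(const volatile void *addr) {
    __asm__ __volatile__("invlpg %0"
                         :
                         : "m"(*(const volatile char *)addr)
                         : "memory");
}
```

The "memory" clobber is the simpler of the two options Owen lists; the "+m" variant on a page-sized struct would give the compiler a narrower hint.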

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Fri Jul 06, 2012 8:30 am
by giovanig
Thank you guys for the answers. I made some changes as you pointed out, but the problem remains.
Are you setting the "global" bit in your PTEs?
Yes. A PTE has the following bits set: 0 (present), 1 (read/write), 2 (user access), 5 (accessed), and 8 (global).
As noted by NickJohnson, you need to tell the compiler that you clobber eax in the first snippet.
Furthermore, you need to tell the compiler that you clobber memory in both snippets.
I changed the flush_tlb code to:

ASMV("movl %%cr3, %%eax \n"
     "movl %%eax, %%cr3 \n"
     : : : "eax", "memory");
Running the system with this code, I got the same result as before: a wrong character printed instead of 'A'. The strange thing is that if I print the remaining 9 characters, they are correct.
Secondly, assuming "addr" is a pointer, the "m"(addr) specifier will generate a reference to the address of that pointer. You probably wanted "m"(*addr)
Addr is already the logical address that points to a page, not a pointer.
The "+m" specifier tells GCC that the inline assembly both "reads and writes to" *p. Because *p is 4096 bytes in size, it assumes that whole region is/may be modified, and so this will ensure that writes and reads are not incorrectly reorganized.
I tried to change the ASMV, but g++ complained about the "+m", maybe because addr is not a pointer. I just added "memory" to the clobber list, to keep the optimizer from reordering accesses before invlpg:

ASMV("invlpg %0" : : "m"(addr) : "memory");
I am not sure if invlpg is really invalidating a TLB entry, since it behaves differently from flush_tlb.
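
A small user-space sketch may help here, assuming GCC or Clang on x86. If `addr` is an integer variable that holds the logical address, then `"m"(addr)` names the storage of the variable itself, so invlpg would invalidate the page containing that variable rather than the remapped page. The thread does not confirm that this is the cause of the differing behavior; the code below only shows which address each operand form refers to. It uses `lea`, which computes the operand's address without touching memory, so it runs unprivileged. The function names are hypothetical.

```c
#include <stdint.h>

/* Address referred to by "m"(addr): the location of the local variable
 * addr (typically a stack slot), NOT the value stored in it. */
static uintptr_t operand_of_variable(uintptr_t addr) {
    uintptr_t seen;
    __asm__ __volatile__("lea %1, %0" : "=r"(seen) : "m"(addr));
    return seen;
}

/* Address referred to by "m"(*(volatile char *)addr): the location that
 * addr designates, which is what invlpg needs as its operand. */
static uintptr_t operand_of_pointee(uintptr_t addr) {
    uintptr_t seen;
    __asm__ __volatile__("lea %1, %0"
                         : "=r"(seen)
                         : "m"(*(volatile char *)addr));
    return seen;
}
```

Under these assumptions, `operand_of_pointee(x)` returns `x`, while `operand_of_variable(x)` returns some stack address unrelated to `x`.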

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Fri Jul 06, 2012 9:03 am
by giovanig
I made another test to understand what is happening. I print the value of the first character in the loop that allocates memory:

char *array[10];
for(int i = 0; i < SIZE; i++) {
    // allocates 4096 bytes in total (the heap list uses 4 more bytes) and remaps a page
    array[i] = new (COLOR_2) char[4092];
    array[i][0] = 'A' + i;
    if(i == 0)
        cout << array[i][0];        // prints 'A' correctly
    else
        cout << " " << array[0][0]; // prints 'A' 8 times, but the last one is wrong
}
For some reason, the 10th time the code remaps a page, the value of 'A' changes.

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Fri Jul 06, 2012 9:12 am
by bluemoon
giovanig wrote:For some reason, in the 10th time that the code remaps a page, the value of 'A' changes.
And what did it change to? 'A'+i or random garbage?
It looks like a bug in the heap allocator. If you are not sure, replace your page invalidate function with a hardcoded "mov cr3" and see if the problem is still there.

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Fri Jul 06, 2012 9:29 am
by giovanig
Hi,
And what did it change to? 'A'+i or random garbage?
It changes to random garbage. When "invlpg" is used, this does not happen.

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Fri Jul 06, 2012 9:43 am
by Owen
giovanig wrote:Thank you guys for the answers. I made some changes as you pointed out, but the problem remains.
Are you setting the "global" bit in your PTEs?
Yes. A PTE has the following bits set: 0 (present), 1 (read/write), 2 (user access), 5 (accessed), and 8 (global).
Hang on...

you have the global bit set... and you're still wondering why reloading CR3 isn't causing that page to be flushed from the TLB?

Go and reread the paging section of the manual (in particular the definition of the global bit) again, please...

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Fri Jul 06, 2012 9:45 am
by jbemmel
giovanig wrote:Yes. A PTE has the following bits set: 0 (present), 1 (read/write), 2 (user access), 5 (accessed), and 8 (global).
Well, then that explains why your mov from/to cr3 does not flush these mappings.

"Global" means "do not flush on a CR3 reload; flush only when explicitly requested, e.g. through invlpg". So if you want these pages to be flushed when you change cr3, you should clear the global bit in their PTEs.
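
The flag set giovanig describes can be written out as a short sketch, assuming 32-bit non-PAE page table entries with 4 KiB pages. The macro and function names are hypothetical; only the bit positions come from the thread.

```c
#include <stdint.h>

/* Bit positions as listed in the thread. */
#define PTE_PRESENT  (1u << 0)
#define PTE_RW       (1u << 1)
#define PTE_USER     (1u << 2)
#define PTE_ACCESSED (1u << 5)
#define PTE_GLOBAL   (1u << 8)

/* The combination giovanig reports using. */
#define PTE_THREAD_FLAGS \
    (PTE_PRESENT | PTE_RW | PTE_USER | PTE_ACCESSED | PTE_GLOBAL)

/* Build an entry from a 4 KiB-aligned frame address and the low 12 flag bits. */
static inline uint32_t pte_make(uint32_t frame, uint32_t flags) {
    return (frame & ~0xFFFu) | (flags & 0xFFFu);
}

/* Nonzero if the entry is marked global, i.e. it survives a CR3 reload. */
static inline int pte_is_global(uint32_t pte) {
    return (pte & PTE_GLOBAL) != 0;
}

/* Return the entry with the global bit cleared; every other bit is kept. */
static inline uint32_t pte_clear_global(uint32_t pte) {
    return pte & ~PTE_GLOBAL;
}
```

Clearing the bit in the in-memory entry only affects translations loaded after that point; an entry already cached in the TLB keeps the attributes it was loaded with.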

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Fri Jul 06, 2012 10:16 am
by giovanig
Well, a quick test, removing the GLOBAL bit, did not change anything. I will investigate the memory allocator to see if there is a bug there.

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Fri Jul 06, 2012 4:20 pm
by Combuster
And changing the G bit in memory doesn't change the G bit in the TLB, which means still no flush...
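
One way to drop entries that are already cached as global is sketched below, assuming ring 0, GCC or Clang on x86, and that CR4.PGE is currently set. Per the Intel manual, a write to CR4 that changes the PGE bit invalidates all TLB entries, including global ones. The function names are hypothetical; for a single page, invlpg on its address is the lighter alternative.

```c
#define CR4_PGE (1ul << 7) /* CR4 bit 7: page global enable */

/* Pure helper: the CR4 value with PGE cleared, all other bits kept. */
static inline unsigned long cr4_without_pge(unsigned long cr4) {
    return cr4 & ~CR4_PGE;
}

/* Privileged: faults outside ring 0. */
static inline unsigned long read_cr4(void) {
    unsigned long v;
    __asm__ __volatile__("mov %%cr4, %0" : "=r"(v));
    return v;
}

/* Privileged: faults outside ring 0. */
static inline void write_cr4(unsigned long v) {
    __asm__ __volatile__("mov %0, %%cr4" : : "r"(v) : "memory");
}

/* Flush every TLB entry, global ones included, by toggling CR4.PGE.
 * Assumes PGE was set on entry; if it was clear, the first write changes
 * nothing and no flush happens. */
static inline void tlb_flush_all_including_global(void) {
    unsigned long cr4 = read_cr4();
    write_cr4(cr4_without_pge(cr4)); /* PGE changes: all entries invalidated */
    write_cr4(cr4);                  /* restore the original value */
}
```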

Re: Two ways of flushing TLB in IA32 and two different behav

Posted: Sat Mar 23, 2013 9:15 am
by sv75
(deleted)