Page 1 of 2

[SOLVED] Unexpected behaviour

Posted: Wed Jun 20, 2012 1:27 am
by nop
Hi to everyone!

This innocent code takes the value of the CR2 register and puts it in the provided variable.

Code: Select all

static inline void store_cr2(u64_t *cr2)
{
    bochs_bp();
    __asm__ __volatile__(
            "movq %%cr2,%%rax\n\t"
            :"=A"(*cr2));
}

As you can easily guess this is invoked in my do_page_fault ISR.
What I didn't expect was is that, in long mode, only the low four bytes are copied and the remaining are all set.

To be more explicit if the faulting address is 0xc0010000 what is copied in RAX is 0xFFFFFFFFc0010000.
Just think what happens to my mmap routines when they want to get the PML4 offset of the linear address: in the case above they get 511 rather than 0.
This gave me headaches because I looked elsewhere for days to find where the problem was.
In long mode the CR2 is 64 bits wide but the internal debugger of Bochs shows it 32 bits wide while CPU is running in long mode.
What am I missing?? I'm cross compiling in a 32 bit environment with gcc 4.x

Thank you in advance!

Teo

Re: Unexpected behaviour

Posted: Wed Jun 20, 2012 2:04 am
by iansjack
It may just be me, but I find it a bit confusing that you are using "cr2" as both a register name and a variable name. What happens if you use:

Code: Select all

static inline void store_cr2(long *cr2val)
{
 	__asm__ __volatile__("movq %%cr2,%%rax;"
            "movq %%rax, %0;"
            :"=m"(*cr2val));
}
instead?

Re: Unexpected behaviour

Posted: Wed Jun 20, 2012 3:21 am
by qw
The problem is in the "=A". The destination is somewhere in memory, so it must be "=m" (as in iansjack's code).

BTW this is a bit more efficient:

Code: Select all

static inline uint64_t get_cr2(void)
{
    uint64_t rval;
    __asm__ __volatile__ ("movq %%cr2, %0" : "=a" (rval));
    return rval;
}

Re: Unexpected behaviour

Posted: Wed Jun 20, 2012 3:37 am
by gerryg400
You also need to clobber something to stop the function from being re-ordered when its called, like this

Code: Select all

static __inline__ unsigned long rd_cr2() {

    unsigned long rval;
    __asm__ __volatile__ ("mov %%cr2, %0" : "=r" (rval) : : "memory");
    return rval;
}
This one works in both 32bit and 64bit programmes.

Re: Unexpected behaviour

Posted: Wed Jun 20, 2012 4:41 am
by qw
Gerryg400, in this case that is not necessary, because the asm instruction has side effects: "=r" (rval).

Re: Unexpected behaviour

Posted: Wed Jun 20, 2012 4:44 am
by gerryg400
Ooops, yes, true.

Re: Unexpected behaviour

Posted: Wed Jun 20, 2012 6:25 pm
by jbemmel
You are seeing canonical 64-bit addresses: http://en.wikipedia.org/wiki/X86-64#Can ... _addresses

x86-64 mode currently does not use all 64 available bits; the upper bits are required to be set to 1 (in order to keep OS designers from using them for internal purposes, hindering future expansion)

Re: Unexpected behaviour

Posted: Wed Jun 20, 2012 11:52 pm
by nop
Thanks for all your prompt replies!
Hobbes wrote:The problem is in the "=A". The destination is somewhere in memory, so it must be "=m" (as in iansjack's code).

BTW this is a bit more efficient:

Code: Select all

static inline uint64_t get_cr2(void)
{
    uint64_t rval;
    __asm__ __volatile__ ("movq %%cr2, %0" : "=a" (rval));
    return rval;
}
This is exactly my first implementation (if I'm not wrong it's also in the asm inline example of the wiki), the code I posted in the opening of the thread was a (wrong) variant - I was exploring the problem looking for a solution: the "=A" was because I wanted to make sure to use a 64 bits register.
jbemmel wrote:You are seeing canonical 64-bit addresses: http://en.wikipedia.org/wiki/X86-64#Can ... _addresses

x86-64 mode currently does not use all 64 available bits; the upper bits are required to be set to 1 (in order to keep OS designers from using them for internal purposes, hindering future expansion)
I thought the same, but actually I think this isn't the case: the faulting address doesn't have the 47th bit set, is in the lower half; however I'm thinking of a quirk that I'm missing related to this

To recap: the following code

Code: Select all

static inline u64_t get_cr2()
{
	u64_t val = 0;

	bochs_bp(); // Bochs magic breakpoint
	__asm__ __volatile__(
			"nop\n\t"
			"movq %%cr2,%0\n\t"
			"nop\n\t"
			:"=r"(val));
	return val;
}
moves the value 0xFFFFFFFFc0010000 to the destination register chosen (R12 in my case) when CR2 contains 0xc0010000.
Is this a standard behaviour? I don't know what to think, both bochs and qemu behave the same way and my code fails to track the page directory table of the faulting address to make the proper checks - I have to dump physical address (puke)
What happens in your long mode OS when you pagefault in the lower half?
In any way I'm taking again the Intel manuals in my hands - if it is a standard behaviour (as it probably is) shame on me for having missed this for all that time

Thanks again!

Re: Unexpected behaviour

Posted: Thu Jun 21, 2012 12:37 am
by iansjack
Look at the code that is actually generated by the variants. That way you can see what is really happening and which is the most efficient implementation. Personally I do this sort of thing as an assembler routine rather than using inline assembler. I can assure you that "mov %cr2, %rax" does exactly what you would expect it to.

It might also help to single-step the assembler code in a debugger to see exactly what is happeing to the various memory locations and registers. This will give a better understanding of the result you are getting.

Re: Unexpected behaviour

Posted: Thu Jun 21, 2012 1:32 am
by qw
nop wrote:This is exactly my first implementation (if I'm not wrong it's also in the asm inline example of the wiki), the code I posted in the opening of the thread was a (wrong) variant - I was exploring the problem looking for a solution: the "=A" was because I wanted to make sure to use a 64 bits register.
"=A" does not mean RAX but RDX:RAX (or EDX:EAX in 32-bits code). If you want to force the use of RAX, use "=a" instead of "=r". However, for optimization, you'd better leave the decision what register is used to the compiler.

Re: Unexpected behaviour

Posted: Thu Jun 21, 2012 1:58 am
by iansjack
Just as a matter of interest, here is the code (omitting the "bochs_bp()" call) generated by the OP version:

Code: Select all

000000000040049c <store_cr2>:
  40049c:	55                   	push   %rbp
  40049d:	48 89 e5             	mov    %rsp,%rbp
  4004a0:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
  4004a4:	0f 20 d0             	mov    %cr2,%rax
  4004a7:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  4004ab:	48 89 10             	mov    %rdx,(%rax)
  4004ae:	5d                   	pop    %rbp
  4004af:	c3
and here is the version produced by the version I suggested:

Code: Select all

000000000040049c <store_cr2>:
  40049c:	55                   	push   %rbp
  40049d:	48 89 e5             	mov    %rsp,%rbp
  4004a0:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
  4004a4:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
  4004a8:	0f 20 d0             	mov    %cr2,%rax
  4004ab:	48 89 00             	mov    %rax,(%rax)
  4004ae:	5d                   	pop    %rbp
  4004af:	c3
The original version is ignoring what it read from %cr2 and instead is putting the value of %rdx, on entry to the routine, into the input parameter which is not what is required. (You'd have to inspect the code at a deeper level to know what this value actually is.) My version puts the value of %cr2 into the parameter which is what you want.

Moral - if inline assembler isn't working as intended inspect the generated code.

Re: Unexpected behaviour

Posted: Thu Jun 21, 2012 3:26 am
by gerryg400
nop wrote:What happens in your long mode OS when you pagefault in the lower half?
In my OS cr2 contains the expected value (i.e. 0xc0010000). It really looks like you are moving cr2 to the EAX register and that it's being sign extended. As iansjack said look at the asm.

Re: Unexpected behaviour

Posted: Thu Jun 21, 2012 5:03 am
by iansjack
Actually, examining the code more carefully, my example doesn't give the correct result either. :oops:

It should be:

Code: Select all

static inline void store_cr2(long *cr2val)
{
   __asm__ __volatile__("movq %%cr2,%%rax;"
            "movq %%rax, %0;"
            :"=m"(*cr2val)
            :
            :"rax");
}
producing the assembler

Code: Select all

000000000040049c <store_cr2>:
  40049c:	55                   	push   %rbp
  40049d:	48 89 e5             	mov    %rsp,%rbp
  4004a0:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)
  4004a4:	48 8b 55 f8          	mov    -0x8(%rbp),%rdx
  4004a8:	0f 20 d0             	mov    %cr2,%rax
  4004ab:	48 89 02             	mov    %rax,(%rdx)
  4004ae:	5d                   	pop    %rbp
  4004af:	c3                   	retq   
which, at last, gives the desired result.

Morals, and red faces, all round!

Re: Unexpected behaviour

Posted: Thu Jun 21, 2012 6:33 am
by Combuster
Hobbes wrote:However, for optimization, you'd better leave the decision what register is used to the compiler.
For optimisation, return the value instead of writing an argument by reference. Eliminates bad pointer possibilities for free as well. :wink:

Re: Unexpected behaviour

Posted: Thu Jun 21, 2012 6:44 am
by iansjack
I agree with your second point, but optimisation? (Not that it was you who raised the subject of optimisation.) Why worry about optimisation in a function like this that is only going to be called occasionally. In fact, why make it inline. I would just write a simple assembler function:

Code: Select all

get_cr2:
   mov %cr2, %rax
   ret
and leave it at that. You may have the overhead of a function call (but you've eliminated a lot of unnecessary instructions), but so what? You'll declare the function in a header file as returning a long value, so the compiler knows that %rax is going to be clobbered.