OSDev.org

Posted: **Fri Apr 03, 2009 8:00 am**

Yeah, and my point is that, since GCC 4.0, they say GCC doesn't change the direction flag, and I set it in my assembly stub!

I got two solutions from the GCC mailing list, but neither is really "good". I mean, both work around the problem that GCC doesn't know what it is doing.

Solution 1:

Code:

static inline void zeromem4b(uint32t *dst, uint32t count) {
	uint32t tmp1, tmp2;
	
	asm volatile("xor %%eax,%%eax\n\trep stosl"
		:"=c"(tmp1),"=D"(tmp2)
		:"0"(count),"1"(dst)
		:"%eax", "cc", "memory");
}

Solution 2:

Code:

static inline void zeromem4b(uint32t *dst, uint32t count) {
	uint32t tmp1, tmp2;
	
	asm volatile("xor %%eax,%%eax\n\trep stosl"
		:"+c"(count),"+D"(dst)
		:
		:"%eax", "cc", "memory");
}

The last one is also silly because (I think) it changes the values of count and dst.

Posted: **Fri Apr 03, 2009 8:28 am**

Actually, is there any reason why you're not implementing these in straight assembly? You'd have far more control if the entire function was in assembly rather than using "asm volatile".

Posted: **Fri Apr 03, 2009 8:52 am**

Yes, the reason is inlining: this is code where I don't want the call overhead!

Posted: **Fri Apr 03, 2009 9:45 am**

FlashBurn wrote: Yeah, and my point is that, since GCC 4.0, they say GCC doesn't change the direction flag, and I set it in my assembly stub!

Actually, it is since GCC 4.3.0 that GCC no longer emits the cld instruction before inlined memory functions. The SysV ABI assumes that the direction flag is clear on entry to every function. Versions of GCC before 4.3.0 emitted cld to make sure of this, but that is no longer the case. You SHOULD clear the direction flag yourself before doing things like memset (or in your case, zeromem4b) to ensure that memory operations run in the proper direction. If your bootloader clears DF and never, ever sets it again, you may be safe. But these are not privileged instructions; userspace is free to set and clear DF as it pleases.

My larger point here is that you are writing a kernel. You are free to define your own ABI. Even if you don't and are using an existing one (like the SysV ABI), you cannot rely on the compiler to completely implement the ABI all by itself, especially when using such things as inline assembly.

FlashBurn wrote: Solution 1:

Code:

static inline void zeromem4b(uint32t *dst, uint32t count) {
	uint32t tmp1, tmp2;
	
	asm volatile("xor %%eax,%%eax\n\trep stosl"
		:"=c"(tmp1),"=D"(tmp2)
		:"0"(count),"1"(dst)
		:"%eax", "cc", "memory");
}

What's the point in saving ecx and edi to two variables that are otherwise never used? I considered doing that, but it's a waste of memory and time. It doesn't really tell GCC anything that it didn't already know. That is, it knows the edi register is already clobbered by your asm because it is using edi as an input register! That's why you cannot specify edi as a clobber as well. As I pointed out in my previous post (and as is pointed out in the URL I posted), using "cc" and "memory" is a good idea as well. You should use the memory clobber because you're writing to memory through a pointer (that is, through edi). Without the memory clobber, GCC knows that the EDI register itself has changed (because it was used as an input), but it doesn't know that a memory location was changed. The memory clobber fixes that.

I'm not going to bother explaining the cc clobber again. I'm also still not entirely sure that you need the %eax clobber since memory clobbers all registers (according to the GCC docs).

FlashBurn wrote: Solution 2:

Code:

static inline void zeromem4b(uint32t *dst, uint32t count) {
	uint32t tmp1, tmp2;
	
	asm volatile("xor %%eax,%%eax\n\trep stosl"
		:"+c"(count),"+D"(dst)
		:
		:"%eax", "cc", "memory");
}

The last one is also silly because (I think) it changes the values of count and dst.

Technically, ecx and edi are both changed by either solution you posted. The "=" modifier signifies an operand is write only (an output), and the "+" modifier says the operand is both an input and an output.
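To make the two operand styles concrete, here is a minimal sketch of both, side by side. The function names are hypothetical, `size_t` is used for the count so that the same code also builds for x86-64 (an assumption; the thread targets 32-bit x86), and the plain C fallback loop is only there so the example compiles on other targets. Like the posted code, the asm assumes DF is already clear.

```c
#include <stddef.h>
#include <stdint.h>

/* Shape of "Solution 1": throwaway outputs tied to the inputs
 * through matching constraints ("0" and "1"). */
static inline void zero_words_matching(uint32_t *dst, size_t count)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t *end;  /* receives the final EDI/RDI value; never used */
    size_t left;    /* receives the final ECX/RCX value; never used */
    __asm__ volatile ("xor %%eax,%%eax\n\trep stosl"
        : "=D"(end), "=c"(left)
        : "0"(dst), "1"(count)
        : "eax", "cc", "memory");
    (void)end;
    (void)left;
#else
    while (count--) *dst++ = 0;  /* portable fallback, assumption for non-x86 */
#endif
}

/* Shape of "Solution 2": the same two registers declared read-write ("+"). */
static inline void zero_words_readwrite(uint32_t *dst, size_t count)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile ("xor %%eax,%%eax\n\trep stosl"
        : "+D"(dst), "+c"(count)
        :
        : "eax", "cc", "memory");
#else
    while (count--) *dst++ = 0;  /* portable fallback, assumption for non-x86 */
#endif
}
```

Either way the caller's variables are unaffected: the first form writes the final register values into throwaway locals, and the second only updates the function's own by-value copies of dst and count.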

You could further optimize your function by using the following code:

Code:

static inline void zeromem4b(uint32t *dst, uint32t count) {
  asm volatile ( "cld; rep stosl"
    :
    : "a"(0), "c"(count), "D"(dst)
    : "cc", "memory" );
}

Note I used cld, because doing so is correct under the SysV ABI (especially for inlined functions!). I used "a"(0) because GCC may be able to arrange things so that zero is already in the EAX register, resulting in one less movl (or xor) during its optimization phase. That also means you don't have to specify "%eax" as a clobber, since GCC already knows it is clobbered (again, since it is used as an input, of which GCC is aware).
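One caveat worth sketching: the GCC documentation states that an asm must not modify its input-only operands, and rep stosl changes both ECX and EDI. A hedged variant that keeps the cld and the "a"(0) input but declares the two changing registers as read-write could look like the following. The function name is hypothetical, `size_t` is an assumption made so that the code also builds for x86-64, and the fallback loop exists only for non-x86 targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: zero `count` 32-bit words starting at dst. */
static inline void zeromem4b_cld(uint32_t *dst, size_t count)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile ("cld\n\trep stosl"
        : "+D"(dst), "+c"(count)  /* advanced and decremented by rep stosl */
        : "a"(0)                  /* EAX is only read, so a plain input is enough */
        : "cc", "memory");
#else
    while (count--) *dst++ = 0;   /* portable fallback, assumption for non-x86 */
#endif
}
```

Because EAX is a genuine input here, GCC is free to reuse a zero it already holds in that register, and no "%eax" clobber is required.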

Hmm, in case you're not aware, the way GCC handles 'static inline' has changed in GCC 4.3.0 as compared to previous versions as well, if you're using C99.

Please spend some time reading the GCC manual. It goes into all kinds of detail about inline assembly, and it seems clear to me that inline assembly is at least somewhat confusing to you. Don't worry though, it's a syntactical nightmare, and a lot of people have problems with the finer points of it, myself included.

Posted: **Fri Apr 03, 2009 10:03 am**

Yes, I know the manual says GCC knows that all registers in the input and output lists are clobbered, but the result (the asm code) tells me that this is not the case! As I wrote before, I can test this by printing the value of a label that I previously passed to zeromem4b: the address of the label has changed. If I use the first solution, the address is right and hasn't changed. So on this point either the manual is wrong or it is a bug in GCC.

As I said, I don't like either solution, because they work around a bug (from my point of view). If I read such code, I would ask myself why he or she is doing this when the variables are never used!

This code is used in my bootloader and not my kernel, so it should be no problem to make some assumptions. I'm not yet at the point where I'm writing production code for my kernel.

I don't like inline assembly, but sometimes it is the better way to produce something faster.

Posted: **Fri Apr 03, 2009 6:02 pm**

Can you show us the generated assembly? Have you actually *seen* the generated assembly, or only used the print method to check it's correct? If you have seen it can we have a look at it, please?

Posted: **Fri Apr 03, 2009 11:59 pm**

And here we go.

This is the C code:

Code:

	zeromem4b((uint32t *)&_bss_start,(&_loader_end - &_bss_start) >> 2);
	
	printf("bss start: %#X, loader end: %#X\n",&_bss_start,&_loader_end);

And asm code:

Code:

	movl	$_loader_end, %ecx
	subl	$_bss_start, %ecx
	movl	$_bss_start, %edi
	sarl	$2, %ecx
/APP
/ 28 "../src/include/memory.h" 1
	xor %eax,%eax
	rep stosl
/ 0 "" 2
/NO_APP
	addl	$12, %esp
	andl	$255, %ebx
	pushl	$_loader_end
	pushl	%edi
	pushl	$.LC1
	call	printf

I just copied and pasted from my loader.c, and I hope you believe me now.

Posted: **Sat Apr 04, 2009 1:52 am**

FlashBurn wrote: I just copied and pasted from my loader.c, and I hope you believe me now.

I do

I can now see for certain that GCC is assuming EDI is unchanged, which is what you were saying. I just wanted to make sure.

Just to clarify something...

FlashBurn wrote: I mean, both work around the problem that GCC doesn't know what it is doing.

No, it has no idea what it's doing in this case. That's why inline assembly can be such a pain to make work - you're telling GCC to insert your assembly, and it's expecting you to tell it what happens there so it can make the surrounding code work. If you don't tell it what happens correctly, you'll find yourself in a lot of trouble later on. It's not a GCC bug.
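As a minimal sketch of what "telling GCC what happens" means, assuming an x86 target and a hypothetical helper name: the asm below modifies the register it is handed, so that register is declared as a read-write operand ("+r") rather than as a plain input. With an input-only constraint, GCC would be entitled to assume the register still holds the old value afterwards.

```c
#include <stdint.h>

/* Hypothetical helper: returns x + 1, wrapping at 32 bits. */
static inline uint32_t add_one(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+r": the asm both reads and writes the register holding x.
     * incl also changes the flags, hence the "cc" clobber. */
    __asm__ ("incl %0" : "+r"(x) : : "cc");
#else
    x += 1u;  /* portable fallback, assumption for non-x86 */
#endif
    return x;
}
```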

The "best" (as in, works relatively cleanly) solution is probably:

Code:

static inline void zeromem4b(uint32t *dst, uint32t count) {
   asm volatile("xor %%eax,%%eax\n\trep stosl"
      :"+c"(count),"+D"(dst)
      :
      :"%eax", "cc", "memory");
}

With this test code in GCC 4.3.2 and Binutils 2.19:

Code:

  int yey;
  zeromem4b(reinterpret_cast<uint32_t*>(&yey), 1234);

  *((uint32_t*) (&yey)) = 0x1234;

This code assembles to (offsets due to this being in a rather large function):

Code:

     mov    0x10(%ebp),%esi
     lea    -0x10(%ebp),%edi
     mov    $0x4d2,%ecx
     xor    %eax,%eax
     rep stos %eax,%es:(%edi)
     movl   $0x1234,-0x10(%ebp)

I apologize for the AT&T syntax; however you can see that it is not assuming EDI is valid afterwards. I'm not sure what this will do in the context of your code, but it's worth a shot.

Could you please try this version of the function and see if the disassembly is what you're looking for? If not, then we can look into it further and figure out the optimal solution.

Posted: **Sat Apr 04, 2009 2:13 am**

Yes, this code works. But as I said, the GCC manual says that GCC knows all input and output registers are clobbered, so my version should work too; and if that is not so, then GCC should let me declare that %edi and %ecx will change!

OK, I think I should let it be and use some kind of workaround.

Edit:

I found something interesting: every tutorial I find says that I should tell GCC which registers are clobbered, and they also say that you have to tell GCC which registers change even if they are input registers! So the change that made this stop working can't have happened too long ago.

Bochs working, Qemu not anymore
