OSDev.org

Posted: **Fri Apr 13, 2007 7:25 pm**

I've been fighting with a problem where my delay() function loops indefinitely. Finally I turned to objdump, and I think I've found a bug in the optimizer that GCC 4.1.2 uses. My assembler really isn't all that great, so I'd appreciate it if someone could look this over for me.

Here's my C function:

Code: Select all

void delay( unsigned long ticks )
{
	unsigned long eticks;

	eticks = tick_counter + ticks;

	while ( tick_counter < eticks );
	
	return;
}

Here's the output of objdump -S for that function, after compiling with:

-g -W -Wall -fno-builtin -nostdinc -nostdlib -nostartfiles -nodefaultlibs

Code: Select all

void delay( unsigned long ticks )
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 10                sub    $0x10,%esp
        unsigned long eticks;

        eticks = tick_counter + ticks;
   6:   a1 00 00 00 00          mov    0x0,%eax
   b:   03 45 08                add    0x8(%ebp),%eax
   e:   89 45 fc                mov    %eax,0xfffffffc(%ebp)

        while ( tick_counter < eticks );
  11:   a1 00 00 00 00          mov    0x0,%eax
  16:   3b 45 fc                cmp    0xfffffffc(%ebp),%eax
  19:   72 f6                   jb     11 <delay+0x11>
        
        return;
}
  1b:   c9                      leave  
  1c:   c3                      ret

Here's the output of objdump -S when adding -O1:

Code: Select all

void delay( unsigned long ticks )
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
        unsigned long eticks;

        eticks = tick_counter + ticks;
   3:   8b 15 00 00 00 00       mov    0x0,%edx

        while ( tick_counter < eticks );
   9:   89 d0                   mov    %edx,%eax
   b:   03 45 08                add    0x8(%ebp),%eax
   e:   39 c2                   cmp    %eax,%edx
  10:   73 02                   jae    14 <delay+0x14>
  12:   eb fe                   jmp    12 <delay+0x12>
        
        return;
}
  14:   5d                      pop    %ebp
  15:   c3                      ret

And finally, the output of objdump -S after compiling with -O2:

Code: Select all

void delay( unsigned long ticks )
{
   0:   55                      push   %ebp
        unsigned long eticks;

        eticks = tick_counter + ticks;
   1:   8b 15 00 00 00 00       mov    0x0,%edx
   7:   89 e5                   mov    %esp,%ebp

        while ( tick_counter < eticks );
   9:   8b 4d 08                mov    0x8(%ebp),%ecx
   c:   89 d0                   mov    %edx,%eax
   e:   01 c8                   add    %ecx,%eax
  10:   39 c2                   cmp    %eax,%edx
  12:   73 02                   jae    16 <delay+0x16>
  14:   eb fe                   jmp    14 <delay+0x14>
        
        return;
}
  16:   5d                      pop    %ebp
  17:   c3                      ret    
  18:   90                      nop    
  19:   8d b4 26 00 00 00 00    lea    0x0(%esi),%esi

If I'm being stupid, don't be afraid to point it out and tell me I don't know what I'm doing.

Posted: **Fri Apr 13, 2007 8:29 pm**

I am guessing this from my ten minutes of research into Optimization many years ago... so don't take my word. I think the problem may be that the code assumes it is impssoble for the variable to change during execution and therefore optimizes it so it never checks if the variable has changed, knowing it won't have. To get round it there is a storage attribute you have to assign to the variable, i think it is "volatile" but check the documentation.

Posted: **Fri Apr 13, 2007 9:07 pm**

How is tick_counter declared? I ask this because if you don't have it declared as volatile then the compiler will optimize it into a register and will never change (unless the state of the register changes on it's own)

Posted: **Sat Apr 14, 2007 10:35 am**

Indeed, pre GCC3 you could expect the above code work even if tick_counter wasn't volatile, but from GCC3 onwards (or could have been some GCC3 minor version.. can't remember exactly), the optimizer caches it in a register if unless told not to (by declaring the variable volatile).

Volatile basicly says "any time code reads this variable, it should actually read it, and every time code writes this variable, it should actually write it, and volatile reads/writes should not be reordered with regards to each other, because I'm accessing my volatile variables from several threads (or ISRs or whatever) at the same time".

Posted: **Sat Apr 14, 2007 10:59 am**

mystran wrote:Indeed, pre GCC3 you could expect the above code work even if tick_counter wasn't volatile, but from GCC3 onwards (or could have been some GCC3 minor version.. can't remember exactly), the optimizer caches it in a register if unless told not to (by declaring the variable volatile).

Volatile basicly says "any time code reads this variable, it should actually read it, and every time code writes this variable, it should actually write it, and volatile reads/writes should not be reordered with regards to each other, because I'm accessing my volatile variables from several threads (or ISRs or whatever) at the same time".

Small adjustment to the definition: "This variable may change without you changing it, so everytime I try to read this, you must actually read it from the location I specify (implicitly: don't reorder around it since it could change the value I read or cause things to be too late or soon)". Writes always need to be done and you can't reorder the write since you have to read this thing anyway.

The main difference is that if you directly use a memory map to a device, it's best to declare it volatile as well since the device may also change it.

Posted: **Sat Apr 14, 2007 11:09 am**

Candy wrote: The main difference is that if you directly use a memory map to a device, it's best to declare it volatile as well since the device may also change it.

Yeah indeed. But the main point holds: every read/write must be done for real, and in order (with regards to other volatile accesses).

Posted: **Sat Apr 14, 2007 4:49 pm**

Thanks for the help on this.. I did think about using volatile briefly, but didn't try it before. I'm use to the behavior of GCC 3.3.6, where it worked fine without declaring tick_counter as volatile. Once I declared it volatile in 4.1.2, my delay function started working again. I had thought that since tick_counter is a global, the optimizer wouldn't need the volatile, but I guess I was wrong.

Again thanks for the help!

Posted: **Sat Apr 14, 2007 5:36 pm**

quok wrote:Thanks for the help on this.. I did think about using volatile briefly, but didn't try it before. I'm use to the behavior of GCC 3.3.6, where it worked fine without declaring tick_counter as volatile. Once I declared it volatile in 4.1.2, my delay function started working again. I had thought that since tick_counter is a global, the optimizer wouldn't need the volatile, but I guess I was wrong.

I remember having similar problems back when I ported some code from 2.95.3 to 3.1.x (or something), but it could be GCC 3.3.6 still had some special case check like "don't optimize if the loop looks like it's polling something" while a more advanced new optimizer in 4.1.2 doesn't make such assumptions...

Posted: **Tue Apr 17, 2007 2:18 pm**

mystran wrote:
quok wrote:Thanks for the help on this.. I did think about using volatile briefly, but didn't try it before. I'm use to the behavior of GCC 3.3.6, where it worked fine without declaring tick_counter as volatile. Once I declared it volatile in 4.1.2, my delay function started working again. I had thought that since tick_counter is a global, the optimizer wouldn't need the volatile, but I guess I was wrong.
I remember having similar problems back when I ported some code from 2.95.3 to 3.1.x (or something), but it could be GCC 3.3.6 still had some special case check like "don't optimize if the loop looks like it's polling something" while a more advanced new optimizer in 4.1.2 doesn't make such assumptions...

The first time GCC updated something that broke old code I wasn't really happy - then I figured, if I'd written the code the way it was supposed to have been, it would still work. I've started coding it like that and the only thing that broke for me was the switch from 3.x to 4.x, which changed some previously -Wall -Wextra -pedantic code into only -Wall -Wextra code. That's good then, so it's quite good enough.

Which is why I keep telling everybody to turn on as many warnings as you can and to fix them.

Small sidenote - we accelerated an application over a factor of 2 today using just volatile - if you have a limited poll loop that is retried, it'll become faster if you make the variable volatile.

OSDev.org

Bug in GCC 4.1.2 optimizer? [Solved]

Bug in GCC 4.1.2 optimizer? [Solved]