Bug in GCC 4.1.2 optimizer? [Solved]

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
quok
Member
Member
Posts: 490
Joined: Wed Oct 18, 2006 10:43 pm
Location: Kansas City, KS, USA

Bug in GCC 4.1.2 optimizer? [Solved]

Post by quok »

I've been fighting with a problem where my delay() function loops indefinitely. Finally I turned to objdump, and I think I've found a bug in the optimizer that GCC 4.1.2 uses. My assembler really isn't all that great, so I'd appreciate it if someone could look this over for me. :)

Here's my C function:

Code: Select all

void delay( unsigned long ticks )
{
	unsigned long eticks;

	eticks = tick_counter + ticks;

	while ( tick_counter < eticks );
	
	return;
}
Here's the output of objdump -S for that function, after compiling with:
-g -W -Wall -fno-builtin -nostdinc -nostdlib -nostartfiles -nodefaultlibs

Code: Select all

void delay( unsigned long ticks )
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 10                sub    $0x10,%esp
        unsigned long eticks;

        eticks = tick_counter + ticks;
   6:   a1 00 00 00 00          mov    0x0,%eax
   b:   03 45 08                add    0x8(%ebp),%eax
   e:   89 45 fc                mov    %eax,0xfffffffc(%ebp)

        while ( tick_counter < eticks );
  11:   a1 00 00 00 00          mov    0x0,%eax
  16:   3b 45 fc                cmp    0xfffffffc(%ebp),%eax
  19:   72 f6                   jb     11 <delay+0x11>
        
        return;
}
  1b:   c9                      leave  
  1c:   c3                      ret    
Here's the output of objdump -S when adding -O1:

Code: Select all

void delay( unsigned long ticks )
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
        unsigned long eticks;

        eticks = tick_counter + ticks;
   3:   8b 15 00 00 00 00       mov    0x0,%edx

        while ( tick_counter < eticks );
   9:   89 d0                   mov    %edx,%eax
   b:   03 45 08                add    0x8(%ebp),%eax
   e:   39 c2                   cmp    %eax,%edx
  10:   73 02                   jae    14 <delay+0x14>
  12:   eb fe                   jmp    12 <delay+0x12>
        
        return;
}
  14:   5d                      pop    %ebp
  15:   c3                      ret
And finally, the output of objdump -S after compiling with -O2:

Code: Select all

void delay( unsigned long ticks )
{
   0:   55                      push   %ebp
        unsigned long eticks;

        eticks = tick_counter + ticks;
   1:   8b 15 00 00 00 00       mov    0x0,%edx
   7:   89 e5                   mov    %esp,%ebp

        while ( tick_counter < eticks );
   9:   8b 4d 08                mov    0x8(%ebp),%ecx
   c:   89 d0                   mov    %edx,%eax
   e:   01 c8                   add    %ecx,%eax
  10:   39 c2                   cmp    %eax,%edx
  12:   73 02                   jae    16 <delay+0x16>
  14:   eb fe                   jmp    14 <delay+0x14>
        
        return;
}
  16:   5d                      pop    %ebp
  17:   c3                      ret    
  18:   90                      nop    
  19:   8d b4 26 00 00 00 00    lea    0x0(%esi),%esi
If I'm being stupid, don't be afraid to point it out and tell me I don't know what I'm doing. :)
Last edited by quok on Sat Apr 14, 2007 4:50 pm, edited 1 time in total.
Tyler
Member
Member
Posts: 514
Joined: Tue Nov 07, 2006 7:37 am
Location: York, England

Post by Tyler »

I am guessing this from my ten minutes of research into Optimization many years ago... so don't take my word. I think the problem may be that the code assumes it is impssoble for the variable to change during execution and therefore optimizes it so it never checks if the variable has changed, knowing it won't have. To get round it there is a storage attribute you have to assign to the variable, i think it is "volatile" but check the documentation.
User avatar
B.E
Member
Member
Posts: 275
Joined: Sat Oct 21, 2006 5:29 pm
Location: Brisbane Australia
Contact:

Post by B.E »

How is tick_counter declared? I ask this because if you don't have it declared as volatile then the compiler will optimize it into a register and will never change (unless the state of the register changes on it's own)
Image
Microsoft: "let everyone run after us. We'll just INNOV~1"
User avatar
mystran
Member
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

Indeed, pre GCC3 you could expect the above code work even if tick_counter wasn't volatile, but from GCC3 onwards (or could have been some GCC3 minor version.. can't remember exactly), the optimizer caches it in a register if unless told not to (by declaring the variable volatile).

Volatile basicly says "any time code reads this variable, it should actually read it, and every time code writes this variable, it should actually write it, and volatile reads/writes should not be reordered with regards to each other, because I'm accessing my volatile variables from several threads (or ISRs or whatever) at the same time".
The real problem with goto is not with the control transfer, but with environments. Properly tail-recursive closures get both right.
User avatar
Candy
Member
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Post by Candy »

mystran wrote:Indeed, pre GCC3 you could expect the above code work even if tick_counter wasn't volatile, but from GCC3 onwards (or could have been some GCC3 minor version.. can't remember exactly), the optimizer caches it in a register if unless told not to (by declaring the variable volatile).

Volatile basicly says "any time code reads this variable, it should actually read it, and every time code writes this variable, it should actually write it, and volatile reads/writes should not be reordered with regards to each other, because I'm accessing my volatile variables from several threads (or ISRs or whatever) at the same time".
Small adjustment to the definition: "This variable may change without you changing it, so everytime I try to read this, you must actually read it from the location I specify (implicitly: don't reorder around it since it could change the value I read or cause things to be too late or soon)". Writes always need to be done and you can't reorder the write since you have to read this thing anyway.

The main difference is that if you directly use a memory map to a device, it's best to declare it volatile as well since the device may also change it.
User avatar
mystran
Member
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

Candy wrote: The main difference is that if you directly use a memory map to a device, it's best to declare it volatile as well since the device may also change it.
Yeah indeed. But the main point holds: every read/write must be done for real, and in order (with regards to other volatile accesses).
The real problem with goto is not with the control transfer, but with environments. Properly tail-recursive closures get both right.
quok
Member
Member
Posts: 490
Joined: Wed Oct 18, 2006 10:43 pm
Location: Kansas City, KS, USA

Post by quok »

Thanks for the help on this.. I did think about using volatile briefly, but didn't try it before. I'm use to the behavior of GCC 3.3.6, where it worked fine without declaring tick_counter as volatile. Once I declared it volatile in 4.1.2, my delay function started working again. I had thought that since tick_counter is a global, the optimizer wouldn't need the volatile, but I guess I was wrong.

Again thanks for the help!
User avatar
mystran
Member
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

quok wrote:Thanks for the help on this.. I did think about using volatile briefly, but didn't try it before. I'm use to the behavior of GCC 3.3.6, where it worked fine without declaring tick_counter as volatile. Once I declared it volatile in 4.1.2, my delay function started working again. I had thought that since tick_counter is a global, the optimizer wouldn't need the volatile, but I guess I was wrong.
I remember having similar problems back when I ported some code from 2.95.3 to 3.1.x (or something), but it could be GCC 3.3.6 still had some special case check like "don't optimize if the loop looks like it's polling something" while a more advanced new optimizer in 4.1.2 doesn't make such assumptions... :)
The real problem with goto is not with the control transfer, but with environments. Properly tail-recursive closures get both right.
User avatar
Candy
Member
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Post by Candy »

mystran wrote:
quok wrote:Thanks for the help on this.. I did think about using volatile briefly, but didn't try it before. I'm use to the behavior of GCC 3.3.6, where it worked fine without declaring tick_counter as volatile. Once I declared it volatile in 4.1.2, my delay function started working again. I had thought that since tick_counter is a global, the optimizer wouldn't need the volatile, but I guess I was wrong.
I remember having similar problems back when I ported some code from 2.95.3 to 3.1.x (or something), but it could be GCC 3.3.6 still had some special case check like "don't optimize if the loop looks like it's polling something" while a more advanced new optimizer in 4.1.2 doesn't make such assumptions... :)
The first time GCC updated something that broke old code I wasn't really happy - then I figured, if I'd written the code the way it was supposed to have been, it would still work. I've started coding it like that and the only thing that broke for me was the switch from 3.x to 4.x, which changed some previously -Wall -Wextra -pedantic code into only -Wall -Wextra code. That's good then, so it's quite good enough.

Which is why I keep telling everybody to turn on as many warnings as you can and to fix them.

Small sidenote - we accelerated an application over a factor of 2 today using just volatile - if you have a limited poll loop that is retried, it'll become faster if you make the variable volatile.
Post Reply