GCC O2 'incorrectly' optimises my for loop

FrankRay78
Posts: 22
Joined: Fri Jan 05, 2024 10:10 am

GCC O2 'incorrectly' optimises my for loop

Post by FrankRay78 »

Hello,

I've read the following thread: viewtopic.php?f=13&t=33499 and noted both the comments about 'a workman and his/her tools' and the need to provide a self-contained example. I have a self-contained example (below) and I have diagnosed the issue, but I still don't understand why the optimisation behaves as it does. I'd be interested to hear what other people have to say about it.

Here are the functions that write a message to the debug port (in QEMU):

Code: Select all

void debug_writechar(char c)
{
    __asm__ ("movb %0, %%al\n\t"
            "outb %%al, $0xe9"
    :
    : "r" (c));
}

void debug(char* message)
{
    for (int i = 0; message[i] != 0; i++)
    {
        if (i == 0)
        {
            debug_writechar(' ');
        }

        debug_writechar(message[i]);
    }
}
The kernel is calling it as follows:

Code: Select all

debug("Monkey lives here");
debug("Snail lives here");
When I compile it with

Code: Select all

-ffreestanding -O2 -nostdlib -lgcc -Wall -Wextra -Werror -g -Wno-unused-parameter
I get the following output:

Code: Select all

qemu-system-i386 -kernel kernel.bin -no-reboot -no-shutdown -debugcon stdio
 M S
When I add the volatile keyword to the loop counter, i.e.

Code: Select all

for (volatile int i = 0; message[i] != 0; i++)
{
    if (i == 0)
    {
        debug_writechar(' ');
    }

    debug_writechar(message[i]);
}
And re-compile and run, I get the following output:

Code: Select all

qemu-system-i386 -kernel kernel.bin -no-reboot -no-shutdown -debugcon stdio
 Monkey lives here Snail lives here
I really don't understand why the volatile keyword is required or why, without it, the compiler optimises out all but the first iteration of the for loop. Not understanding the reason makes me nervous about continuing to code with optimisations enabled, given that it changes the behaviour of my code. (NB: I'm not blaming the tool; I'm acknowledging that I don't understand the tool.)


Furthermore, if I remove the volatile keyword and comment out the if (i == 0) branch inside the loop:

Code: Select all

void debug(char* message)
{
    for (int i = 0; message[i] != 0; i++)
    {
        /*if (i == 0)
        {
            debug_writechar(' ');
        }*/

        debug_writechar(message[i]);
    }
}
The optimised build at -O2 keeps every loop iteration:

Code: Select all

qemu-system-i386 -kernel kernel.bin -no-reboot -no-shutdown -debugcon stdio
Monkey lives hereSnail lives here

PS. The main repo is here and my (scrappy) exploratory code for the above on the following branch.
Better software requirements can change the world. Better Software UK.
Octocontrabass
Member
Posts: 5452
Joined: Mon Mar 25, 2013 7:01 pm

Re: GCC O2 'incorrectly' optimises my for loop

Post by Octocontrabass »

Your inline assembly changes AL without telling the compiler, and the optimized code is probably using AL for something else. Try this instead:

Code: Select all

asm( "outb %%al, $0xe9" :: "a"(c) );
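For anyone reading along, the "a" constraint asks the compiler to place c in the A register (AL for a char) before the asm runs, so the separate movb is no longer needed and the compiler knows AL is in use. Below is a rough user-space sketch of that idea, not the kernel code: outb is privileged outside the kernel, so this version copies AL into a buffer instead of writing to port 0xe9. The names sink, sink_len, debug_writechar_sim and debug_sim are made up for illustration, and the non-x86 branch is only a plain C fallback.

```c
#include <stddef.h>

/* Hypothetical stand-in for the debug port: bytes land here instead of port 0xe9. */
static char sink[64];
static size_t sink_len;

static void debug_writechar_sim(char c)
{
    char out;
#if defined(__i386__) || defined(__x86_64__)
    /* "a"(c): the compiler loads c into AL itself, so it knows AL is occupied.
       "=q"(out): the result comes back in a register that has a byte-sized name. */
    __asm__ ("movb %%al, %0" : "=q" (out) : "a" (c));
#else
    out = c; /* assumption: plain C fallback on non-x86 targets */
#endif
    if (sink_len < sizeof sink - 1)
        sink[sink_len++] = out; /* sink is zero-initialised, so it stays NUL-terminated */
}

static void debug_sim(const char *message)
{
    for (int i = 0; message[i] != 0; i++)
    {
        if (i == 0)
        {
            debug_writechar_sim(' ');
        }

        debug_writechar_sim(message[i]);
    }
}
```

Calling debug_sim on the two strings from the first post should leave the full text of both in sink, each preceded by a space, because the compiler no longer has AL changed behind its back.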
FrankRay78
Posts: 22
Joined: Fri Jan 05, 2024 10:10 am

Re: GCC O2 'incorrectly' optimises my for loop

Post by FrankRay78 »

OMG, it worked, Octocontrabass. Thank you.

I don't really understand the syntax differences, but it's clear yours informs the compiler.

I'll spend some time with https://godbolt.org/ I think, looking at what's happening under the covers.

Thanks again.
FrankRay78
Posts: 22
Joined: Fri Jan 05, 2024 10:10 am

Re: GCC O2 'incorrectly' optimises my for loop

Post by FrankRay78 »

Closely reading this page https://gcc.gnu.org/wiki/ConvertBasicAsmToExtended, I can see I fell into the trap of modifying the register without telling the compiler. Specifically, I note:
"Different C compilers use different semantics regarding which registers the asm code can overwrite. gcc assumes no registers get modified. If your asm modifies registers without informing the compiler (which requires using extended asm), undefined behavior will result."
It also gives an example, quoted here:
For another example, consider this ARM code:

Code: Select all

   __asm__ __volatile__ (
      "mov r0, #0x00\n\t"
      "vmsr fpscr, r0");
The code changes the r0 register without notifying the compiler. While this might be safe in some C compilers, it is incorrect for gcc.
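As far as I can tell, the other way to inform the compiler is to keep the explicit move and name the register in the clobber list (the third colon-separated section of extended asm). Here is a hedged user-space sketch of that variant of my original code: the outb is replaced by a copy into a buffer so it can run outside the kernel, and clobber_sink, clobber_len and debug_writechar_clobber are names invented for this example. I've used "q" rather than "r" for the input so that %1 always has a byte-sized register name.

```c
#include <stddef.h>

/* Hypothetical stand-in for the debug port, as outb cannot run in user space. */
static char clobber_sink[64];
static size_t clobber_len;

static void debug_writechar_clobber(char c)
{
    char out;
#if defined(__i386__) || defined(__x86_64__)
    /* The asm still overwrites AL by hand, but the "eax" clobber tells the
       compiler so: it will not keep a live value in that register across
       the asm, and it will not pick it for %0 or %1. */
    __asm__ ("movb %1, %%al\n\t"
             "movb %%al, %0"
             : "=q" (out)
             : "q" (c)
             : "eax");
#else
    out = c; /* assumption: plain C fallback on non-x86 targets */
#endif
    if (clobber_len < sizeof clobber_sink - 1)
        clobber_sink[clobber_len++] = out;
}
```

Octocontrabass's "a" constraint is still the tidier fix, since it lets the compiler do the load itself and saves an instruction; the clobber list is more useful when the asm needs a scratch register that is not one of its operands.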