Keeping up compiler optimizations

NickJohnson · Post by **NickJohnson** » Thu Aug 13, 2009 1:14 pm

So far in writing my system (mostly but not exclusively the kernel), I've been trying to make sure everything is safe for GCC to optimize with "-fomit-frame-pointer -Os" flags. Until recently, it's been working, but now it breaks randomly unless I set it to "-fomit-frame-pointer -O0". The optimizations at least make bootup faster, even though it's still fractions of a second on a 1MHz emulated i586. I believe I can fix it with some significant work; if I stop trying to keep it optimization safe, it will probably never be optimization safe again. Do you think it's worth it to try and get optimizations working, or will it cause more trouble than it's worth in the end?

Matthew · Post by **Matthew** » Thu Aug 13, 2009 1:32 pm

If optimization causes your kernel to crash then you are using some feature incorrectly. I think you should find it important to ensure that you are using GCC correctly. I know it can be frustrating to debug these problems, but let me tell you, -O0 can hide a lot of problems with inline asm. I ran into a lot of trouble myself because of incorrectly specified clobber-lists. This kind of problem doesn't show when -O0 is enabled because there is no function inlining and therefore the incorrect asm is often protected by register saving due to the x86 C calling conventions. Also it does not help that many of the examples of GCC inline asm available on the Internet are out-of-date and do not work correctly anymore. I had to learn this the hard way.

Don't be afraid to disassemble your functions and find out if GCC is really doing what you think it should be doing.
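To make the clobber-list point concrete, here is a minimal sketch (not taken from anyone's kernel; `add_via_scratch` is a made-up helper). The template uses ECX as a scratch register behind the compiler's back, so ECX has to appear in the clobber list. Without it, the code may still appear to work at -O0 and then break once the function is inlined into a caller that keeps a live value in ECX.

```c
#include <stdint.h>

/* Hypothetical helper: adds two values using ECX as a scratch register.
   The template overwrites ECX, so ECX is declared as clobbered. */
static inline uint32_t add_via_scratch(uint32_t a, uint32_t b)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t result;
    __asm__ ("movl %1, %%ecx\n\t"   /* ecx = a                      */
             "addl %2, %%ecx\n\t"   /* ecx += b (changes the flags) */
             "movl %%ecx, %0"       /* result = ecx                 */
             : "=r" (result)
             : "r" (a), "r" (b)
             : "ecx", "cc");        /* registers modified besides the operands */
    return result;
#else
    /* Fallback so the sketch also builds on non-x86 targets. */
    return a + b;
#endif
}
```

Disassembling a caller with and without the "ecx" clobber is a quick way to see how GCC's register allocation changes.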

manonthemoon · Post by **manonthemoon** » Thu Aug 13, 2009 1:49 pm

This article: http://codingrelic.geekhold.com/2008/03 ... n-and.html.

It discusses how the optimizer causes bugs to become more noticeable. If turning up optimization causes problems, you need to fix them instead of just turning off optimization. The bugs are still there, just harder to notice.

I personally use -O2 when compiling. It caused a few annoying bugs when I first tried it, but fixing them makes the code more reliable.

NickJohnson wrote: Do you think it's worth it to try and get optimizations working, or will it cause more trouble than it's worth in the end?

Just the opposite, it's more trouble to ignore. As I've already said, the bugs are still there, you just don't notice until the optimizer starts taking shortcuts (as described in the article I linked to).
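One common example of a bug that hides at -O0 and shows up with optimization (not necessarily the one from the linked article, which isn't reproduced here) is a flag shared with an interrupt handler that is not declared `volatile`. Without `volatile`, the compiler is allowed to read the flag once and reuse that value for the whole loop. A minimal sketch, with hypothetical names:

```c
/* Hypothetical flag; in a real kernel an interrupt handler would set it. */
static volatile int tick_pending;

/* Spin until the flag is set or the spin budget runs out.
   Returns the last observed value of the flag. */
static int wait_for_tick(unsigned max_spins)
{
    int seen = tick_pending;
    while (!seen && max_spins-- > 0)
        seen = tick_pending;   /* volatile forces a fresh load on every pass */
    return seen;
}
```

Dropping the `volatile` qualifier here is the kind of mistake that tends to work at -O0, where every access goes to memory anyway, and fail at -O2.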

NickJohnson · Post by **NickJohnson** » Thu Aug 13, 2009 1:59 pm

Interesting, I guess I'll keep trying to fix it with optimizations enabled. About the inline asm difficulties - is it safe to not specify a clobber list if no general purpose registers are modified? This is the extent of the inline asm I have:

Code:

asm volatile ("cli");
asm volatile ("hlt");
asm volatile ("outb %1, %0" : : "dN" (port), "a" (val));
asm volatile("inb %1, %0" : "=a" (ret) : "dN" (port));
asm volatile ("mov %0, %%cr3" :: "r" (map));
asm volatile ("invlpg %0" :: "m" (target));
asm volatile ("invlpg %0" :: "m" (page));
asm volatile("mov %%cr3, %0" : "=r" (cr3));
asm volatile("mov %0, %%cr3" : : "r" (cr3));
asm volatile ("sti");
asm volatile ("hlt");
asm volatile ("movl %%cr2, %0" : "=r" (cr2));
asm volatile ("sti");
asm volatile ("hlt");

Additionally, is it a problem that I don't use enter/leave or equivalent in assembly functions that are called from C? Mostly those functions just do some simple exclusively in-register transformation of arguments and then return.
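On the clobber question: as far as the GCC documentation goes, `volatile` keeps the asm statement from being deleted, but it does not by itself stop the compiler from moving ordinary loads and stores across it. For instructions like cli, sti, or a CR3 reload, where surrounding memory accesses must stay on the correct side, a "memory" clobber is the usual answer even though no general-purpose register is modified. A hedged sketch follows; the wrapper names are made up, and the privileged ones only make sense in ring 0.

```c
/* Compiler-level barrier: emits no instruction, but the "memory" clobber
   stops GCC from caching memory values in registers across this point
   and from moving loads/stores over it. */
static inline void compiler_barrier(void)
{
    __asm__ volatile ("" ::: "memory");
}

#if defined(__i386__) || defined(__x86_64__)
/* Hypothetical wrappers. These are privileged instructions, so they will
   fault if executed from user mode. */
static inline void irq_disable(void)
{
    __asm__ volatile ("cli" ::: "memory");
}

static inline void irq_enable(void)
{
    __asm__ volatile ("sti" ::: "memory");
}

static inline void load_cr3(unsigned long map)
{
    /* Page-table writes made before this call must not be moved after it. */
    __asm__ volatile ("mov %0, %%cr3" :: "r" (map) : "memory");
}
#endif
```

The in/out wrappers in the list above already declare every register they touch as operands, so they need no extra register clobbers.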

Craze Frog · Post by **Craze Frog** » Thu Aug 13, 2009 3:10 pm

enter/leave are useless instructions. They are not needed, ever.

NickJohnson · Post by **NickJohnson** » Thu Aug 13, 2009 3:11 pm

Craze Frog wrote: enter/leave are useless instructions. They are not needed, ever.

I mean the saving of the stack frame in general.

manonthemoon · Post by **manonthemoon** » Thu Aug 13, 2009 3:17 pm

Saving the stack frame shouldn't be necessary for a simple function. However, the C calling convention requires that certain registers are preserved within a function call. They are EBP, EBX, ESI, EDI, and ESP. So if you modify EBX, that could be a problem.

As far as the inline assembly, I sure hope there isn't a problem with it because that's pretty much how mine looks.

Craze Frog · Post by **Craze Frog** » Thu Aug 13, 2009 3:24 pm

NickJohnson wrote:
Craze Frog wrote: enter/leave are useless instructions. They are not needed, ever.
I mean the saving of the stack frame in general.

Stack frame pointers in a register are totally useless except for some cases of debugging. (PS. You can find out which function was currently executing when the program crashes without doing a stack trace.)

You should really know this before doing OSdev, it is basic knowledge of assembly and calling conventions.

NickJohnson · Post by **NickJohnson** » Thu Aug 13, 2009 3:39 pm

manonthemoon wrote: However, the C calling convention requires that certain registers are preserved within a function call. They are EBP, EBX, ESI, EDI, and ESP. So if you modify EBX, that could be a problem.

Facepalm. Turns out that was actually the entire optimization problem to begin with. I replaced my memcpy() and memset() functions with assembly ones that use rep movsb, but didn't preserve ESI or EDI in either. All of the functions using memcpy() or memset(), which is a ton of them, had a couple of their variables randomly modified to large addresses. I thought the calling convention only preserved ESP, EBP, and EBX. Should've looked that one up - I was just about to write some usermode system call stubs without preserving EDI/ESI!

I guess my C code is flawless then, because it works at -O3 now - that's a big relief.
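The original assembly routines aren't shown in the thread, so the following is only a sketch of what a fixed version might look like for i386 cdecl on an ELF target. The name `copy_bytes` is made up (to avoid clashing with the real memcpy), and a plain C fallback is included so the example builds on other targets. The point is the push/pop pair around the string instruction: ESI and EDI are callee-saved, so the routine restores them before returning.

```c
#include <stddef.h>

/* Hypothetical name; the thread does not show the original routine. */
void *copy_bytes(void *dest, const void *src, size_t n);

#if defined(__i386__) && defined(__ELF__)
/* i386 cdecl: arguments are on the stack, the result goes in EAX.
   EAX, ECX and EDX are scratch; EBX, ESI, EDI, EBP must be preserved. */
__asm__ (
    ".pushsection .text\n"
    ".globl copy_bytes\n"
    ".type copy_bytes, @function\n"
    "copy_bytes:\n"
    "    pushl %esi\n"               /* save callee-saved registers        */
    "    pushl %edi\n"
    "    movl 12(%esp), %edi\n"      /* dest (arg 1, past the two pushes)  */
    "    movl 16(%esp), %esi\n"      /* src  (arg 2)                       */
    "    movl 20(%esp), %ecx\n"      /* n    (arg 3)                       */
    "    movl %edi, %eax\n"          /* return value: original dest        */
    "    cld\n"
    "    rep movsb\n"
    "    popl %edi\n"                /* restore in reverse order           */
    "    popl %esi\n"
    "    ret\n"
    ".size copy_bytes, .-copy_bytes\n"
    ".popsection\n"
);
#else
/* Portable fallback so the sketch builds on non-i386 targets. */
void *copy_bytes(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}
#endif
```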

Matthew · Post by **Matthew** » Thu Aug 13, 2009 8:37 pm

Sounds like exactly the same issue I came across. These are my working versions of memset/memcpy:

Code:

static inline void *
memset (void *p, int ch, uint32 cb)
{
  void *ret = p;  /* rep stosb advances EDI, so keep the original pointer */
  asm volatile ("cld; rep stosb"
                :"=D" (p), "=a" (ch), "=c" (cb)
                :"0" (p), "1" (ch), "2" (cb)
                :"memory","flags");
  return ret;
}

static inline void *
memcpy (void *pDest, const void *pSrc, uint32 cb)
{
  void *ret = pDest;  /* rep movsb advances EDI and ESI */
  asm volatile ("cld; rep movsb"
                :"=c" (cb), "=D" (pDest), "=S" (pSrc)
                :"0" (cb), "1" (pDest), "2" (pSrc)
                :"memory","flags");
  return ret;
}

Modern GCC does not allow you to specify a register on both the inputs and the clobber list. For whatever reason, you have to treat the register as an in-out operand. That is why, for example, you see "=c" (cb) as an output, and "0" (cb) as an input referring to the output parameter.
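For comparison, GCC also accepts the "+" constraint modifier, which marks a single operand as both read and written and expresses the same in-out relationship more compactly. A minimal sketch, assuming a reasonably recent GCC; `fill_bytes` is a made-up name to avoid clashing with the C library's memset, and the non-x86 branch is only there so the example builds elsewhere:

```c
#include <stddef.h>

/* Hypothetical name, chosen to avoid clashing with the real memset. */
static inline void *fill_bytes(void *p, int ch, size_t cb)
{
    void *start = p;   /* rep stosb advances EDI/RDI, so remember the start */
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile ("cld; rep stosb"
                      : "+D" (p), "+c" (cb)   /* read and written by the asm */
                      : "a" (ch)              /* only read: AL is the fill byte */
                      : "memory", "cc");
#else
    /* Fallback for non-x86 targets. */
    unsigned char *q = p;
    while (cb--)
        *q++ = (unsigned char) ch;
#endif
    return start;
}
```

Since the fill value in EAX is never modified by rep stosb, it can stay a plain input operand.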

Though I should point out that while REP STOS is always supposed to be the fastest way to memset, the Intel docs do indicate various trade-offs (startup costs, etc) when using REP MOVS that may make it less efficient for some cases.

Personally, I relegated that bit of knowledge to the "really premature optimization" bin.

Clobbers on my inline asm syscall stubs were the other place I ran into troubles -- because there was no logic to enforce the C calling convention, there was nothing stopping a syscall from clobbering EBX and therefore when the call was inlined, I would get page faults on seemingly arbitrary addresses.

Putting a correct clobber list on every syscall stub cured that ill.
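The stubs themselves aren't shown, so as a stand-in here is the same idea for a different, well-documented interface: the x86-64 Linux `syscall` instruction, which overwrites RCX and R11 in addition to returning a value in RAX. This is only an illustration under that assumption, not the calling convention of any kernel discussed here; `raw_syscall0` is a made-up name, and other targets just get an error value back.

```c
/* Hypothetical stub for a system call with no arguments. */
static inline long raw_syscall0(long nr)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)                /* result comes back in RAX    */
                      : "0" (nr)                  /* syscall number goes in RAX  */
                      : "rcx", "r11", "memory");  /* overwritten by the CPU; the
                                                     kernel may touch memory     */
    return ret;
#else
    (void) nr;
    return -38;   /* -ENOSYS on Linux: this sketch only covers x86-64 Linux */
#endif
}
```

On x86-64 Linux, `raw_syscall0(39)` performs getpid. Leaving out the "rcx" and "r11" clobbers tends to go unnoticed until the stub is inlined into a function that keeps something live in one of those registers.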

NickJohnson · Post by **NickJohnson** » Fri Aug 14, 2009 10:53 am

Hmm... now I am getting a couple of problems that appear only *without* optimizations. Here's a more interesting question - if I work to support -O3 and -Os, should I also work to support -O0? Obviously there is a bug (and probably a small one), and I want my code to be right, so I'm going to fix it, but what would good general advice be? Because once you support -Os or -O3, you'll never want to go back to -O0, right?

Edit: actually, it's not just without optimizations, but more specifically without optimizations while using a beta version of TinyCC, but the point still holds.

Matthew · Post by **Matthew** » Fri Aug 14, 2009 11:30 am

You should ensure that your kernel compiles and runs correctly under all safe optimization settings -- otherwise you are violating some invariant that will come back to bite you later. -O0 through -O3 and -Os should all be "safe", i.e. they are semantics-preserving transformations of the program. Of course, the more optimizations you enable, the more likely that compiler bugs are going to manifest. You shouldn't assume compiler bugs are at fault when you have trouble ("beta versions of tiny CC" notwithstanding) but it can happen.

-O0 is meant to make debugging easier. If you are having trouble at this level of optimization, get your debugger out and go at it.

Solar · Post by **Solar** » Mon Aug 17, 2009 5:29 am

Don't think of the different optimization settings as "problems" or "settings to support". Correct code should compile and run correctly no matter what. Code that doesn't is buggy, no matter if you'd actually want to use that particular compiler option or not.

The optimization settings are merely what's needed to expose the bugs that are already there.

"Correct code" is not a matter of "it works for all my test cases" or "unless I use XYZ". It's a pure, unrelated-to-environment, feature of a piece of code to be correct. Difficult to attain in non-trivial projects, but the highest order of achievement in coding.

OSDev.org
