Do you use compiler optimizations?
Posted: Tue Feb 12, 2019 1:54 pm
Hi
Since I finally enabled SSE in my bootloader, I played a bit with compiler optimizers. Here are my findings:
First, I used code that is known to be correct and works perfectly (with "-O0") as a baseline. Then I recompiled it with "-O2" and various optimizer flags and "-fsanitize=*", turning some features off separately. I have to say I wasn't pleased with the results, but maybe someone here knows the solution to one or more of my problems.
Incorrect resolving of struct field addresses
I map the first page as supervisor-only, so that any user code that tries to dereference NULL will cause a page fault. This works great. But I didn't want to waste 4k, so I put some process-related, kernel-only variables there. Unfortunately both gcc and Clang miscompile the following struct reference (where pid is not the first field in the struct, hence the accessed memory address is definitely not 0):
Code: Select all
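p = ((proc_struct*)0)->pid;
and instead of a correct "mov rax, [8]" instruction, they generate a "UD2" instruction. Thankfully there's an easy solution: I added "-fno-delete-null-pointer-checks" to the compiler flags. Browsing the net revealed that this particular optimization also messed up the Linux kernel really badly (although for a different reason).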
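To make the "address is definitely not 0" point concrete, here is a minimal sketch. The two-field layout of proc_struct and the helper names pid_address and read_pid are assumptions for illustration only (the post does not show the real struct). The base address is a parameter so the same code can be exercised against an ordinary buffer in user space; in the kernel it would be 0, and "-fno-delete-null-pointer-checks" is, as far as I know, still the flag that tells gcc and Clang that page zero may hold valid data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the real proc_struct is not shown in the post. */
typedef struct {
    uint64_t magic;   /* offset 0 */
    uint64_t pid;     /* offset 8, so a struct at base 0 puts pid at address 8 */
} proc_struct;

/* Address of the pid field for a struct placed at `base`. */
static inline uintptr_t pid_address(uintptr_t base) {
    return base + offsetof(proc_struct, pid);
}

/* Read pid through a volatile pointer so the load itself is always emitted. */
static inline uint64_t read_pid(uintptr_t base) {
    assert(base % _Alignof(proc_struct) == 0);
    const volatile proc_struct *p = (const volatile proc_struct *)base;
    return p->pid;
}
```

With this layout, pid_address(0) is 8, which matches the "mov rax, [8]" the post expects.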
Bad code generation
Something I cannot understand is that gcc generated bad, misaligned code. I called a C function, and it crashed in its function prologue (before the first instruction compiled from the first expression in the C code). Debugging revealed that the faulting instruction was a "movaps [rsp], xmm0". Checking rsp, it was 0xfffffffffffa8, which is not 16-byte aligned. Not only does the SysV ABI expect 16-byte stack alignment upon function calls, movaps requires it too. Now the problem with this is that the programmer cannot influence the stack pointer from C directly, nor can he tell the compiler to use movups, so it is definitely the compiler's responsibility.
The only solution I could come up with was to add "-mno-sse", which quite defeats the whole purpose of my SSE optimization experiment. And this is the better part, as at least with this kind of error I could see at run-time that the generated code was wrong.
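One thing that may be worth trying before giving up on SSE: gcc has an x86 function attribute, force_align_arg_pointer, that makes the prologue realign the stack, and a "-mstackrealign" option that does the same for every function. The sketch below is only an illustration under assumptions: entry_from_asm is a hypothetical name for a C function reached from assembly with an unknown stack alignment, and the macro falls back to a no-op on other targets. Whether this applies to the crash described here depends on how the function was entered, which the post does not show.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK /* no-op on other compilers and targets */
#endif

/* Hypothetical entry point reached from assembly with unknown alignment. */
REALIGN_STACK int entry_from_asm(void) {
    /* A 16-byte aligned local, the kind of slot a movaps spill would use. */
    _Alignas(16) volatile unsigned char scratch[16] = {0};
    int aligned = ((uintptr_t)scratch % 16) == 0;
    assert(aligned);
    return aligned;
}
```

The attribute costs a few extra prologue instructions, so it is usually put only on the functions that assembly code calls directly.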
Changed semantics
The most annoying thing is that the optimizers changed the code's behaviour silently. No errors, no run-time faults; the code just does not do what the algorithm in the C source means. This should not happen under any circumstances IMHO. One problematic piece of code was:
Code: Select all
void kpanic(char *reason, ...) {
    va_list args;
    va_start(args, reason);
    char strbuf[128];
    vsprintf(strbuf, reason, args);
    kprintf("%s", strbuf);      /* <--- this worked */
    if (debugger_enabled) {
        debugger(strbuf);
    } else {
        kprintf("Panic: ");
        kprintf("%s", strbuf);  /* <--- this doesn't */
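...
Debugging revealed that the optimized code simply passed an incorrect address to the second kprintf() (it used the wrong register). I'd like to point out that I added the first kprintf() only after I hadn't seen the string printed, so it doesn't matter whether there's a kprintf() before the 'if' statement or not. I'm really curious what madness made gcc optimize this code incorrectly in the 'else' block, especially when the same code was ok before the 'if'.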
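For reference, this is the usual shape of a variadic formatter in standard C. It is a sketch, not a diagnosis of the behaviour reported here: format_reason is a hypothetical helper, and it uses the hosted libc vsnprintf where a kernel would use its own implementation. The points it illustrates are the argument order of va_start (the va_list first, the last named parameter second), the matching va_end, and bounding the write to the buffer size, since a plain vsprintf into a 128-byte array can overflow on a long message.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats into buf, never writing more than len bytes. */
static int format_reason(char *buf, size_t len, const char *reason, ...) {
    va_list args;
    int n;

    assert(buf != NULL && len > 0);
    va_start(args, reason);                 /* va_list first, last named parameter second */
    n = vsnprintf(buf, len, reason, args);  /* output is truncated to fit len */
    va_end(args);                           /* every va_start needs a matching va_end */
    return n;                               /* length the full message would have had */
}
```

A return value of len or more means the message was truncated.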
Another problematic piece of code was in the logger:
Code: Select all
char *old = ptr;
/* ...here are a bunch of sprintf()s to concatenate the formatted date and the message to ptr... */
/* if debug console enabled, print the log message */
if (debugconsole_enabled) {
    while (old<ptr)
        debugconsole_putc(*old++);
}
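Believe it or not, this only printed the first letter. Using objdump, it turned out that the entire loop was optimized away for some strange reason. I also tried
Code: Select all
for (;old<ptr;ptr++)
but that didn't work either. Finally, with
Code: Select all
while(*old)
gcc generated the correct code, but Clang failed no matter what. Clearly the compilers mistakenly assumed that ptr hadn't changed. But it has, so how on earth do you tell the compiler not to optimize away an important loop? Gcc at least supports __attribute__((optimize(0))) as a function attribute (which is not good, because that removes all the other, potentially good optimizations from the function), but Clang doesn't know that attribute. Using "-f*loop*" is not an option, because I wanted to optimize all the other loops in my code except for this one.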
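Two tools that relate to keeping one particular loop alive, sketched under assumptions. Clang does have a per-function opt-out, spelled optnone instead of optimize, so a small macro can cover both compilers. The narrower tool is to make the output routine's side effect visible to the optimizer, for example with a volatile store. Everything named here (NO_OPT, fake_console_reg, putc_sketch, flush_log) is hypothetical, and the volatile variable merely stands in for a real device register; this does not claim to explain why the original loop was dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Per-function opt-out: Clang spells it optnone, gcc spells it optimize. */
#if defined(__clang__)
#define NO_OPT __attribute__((optnone))
#elif defined(__GNUC__)
#define NO_OPT __attribute__((optimize("O0")))
#else
#define NO_OPT
#endif

/* Stand-in for a device register; a real one would be an MMIO address or port. */
static volatile unsigned char fake_console_reg;

/* Hypothetical output routine: the volatile store is a visible side effect. */
static void putc_sketch(char c) {
    fake_console_reg = (unsigned char)c;
}

/* Emits the bytes in [old, ptr) one by one and returns how many were written. */
static NO_OPT size_t flush_log(const char *old, const char *ptr) {
    size_t n = 0;
    assert(old <= ptr);
    while (old < ptr) {
        putc_sketch(*old++);
        n++;
    }
    return n;
}
```

NO_OPT is the blunt option and gives up every optimization in that one function; the volatile access alone is often enough, because the compiler may not remove or merge volatile stores.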
Conclusion: I didn't feel like guessing which part of my code's semantics would be changed silently next time, so for now I went back to "-O0" and manual optimization. Maybe it's just me, but I want my generated code to do what the C source says. I just don't want any "if(ptr!=NULL)" silently removed. And I obviously don't want broken, faulty code to be generated either.
What is your experience with gcc and Clang optimizers regarding kernel development?
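As a small illustration of how an "if(ptr!=NULL)" can disappear: dereferencing a null pointer is undefined behaviour in C, so once a pointer has been dereferenced the optimizer is allowed to assume it is non-null and drop a later check. The two functions below are hypothetical examples, not code from the kernel discussed here. "-fno-delete-null-pointer-checks" turns that inference off; ordering the check before the first dereference avoids relying on the flag at all.

```c
#include <stddef.h>

/* Risky order: *p is read first, so the optimizer may assume p != NULL
   and remove the check that follows. Shown for comparison only;
   never call this with NULL. */
static int read_then_check(const int *p) {
    int v = *p;
    if (p == NULL)
        return -1;
    return v;
}

/* Safe order: the check comes before the dereference, so it has to stay. */
static int check_then_read(const int *p) {
    if (p == NULL)
        return -1;
    return *p;
}
```

Comparing the "-O2" disassembly of the two functions shows whether a given compiler version actually drops the first check.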
bzt