Callback Handlers vs. GCC optimizer

jbanes · Post by **jbanes** » Fri Jan 03, 2014 3:48 am

I've based my ISR/IRQ setup on the Jame Molley tutorial and thus have a callback system in place that exposes a structure of register values that have been pushed on to the stack. These registers get popped back off as soon as control returns to assembly code.

This particular function:

Code: Select all

 
void key_press(registers_t regs)
{
   terminal_writestring("\nKey Code: ");
   terminal_writeNumber(inb(0x60));
}

...has the distinct displeasure of having its parameters optimized away due to them not being in use. This results in a stack underflow when the handler returns to assembly, thus resulting in the wrong Data Segment getting popped, thus resulting in (you guessed it!) a General Protection Fault.

For the moment, I've solved the problem with a few attribute flags:

Code: Select all

__attribute((optimize(0)))  
void key_press(__attribute((unused)) registers_t regs)
{
   terminal_writestring("\nKey Code: ");
   terminal_writeNumber(inb(0x60));
}

The flags turn off the optimizer for this function and stop the compiler from whining about the unused variable. This works. However...

My question to you lovely folks is: Is this the correct way to solve this problem?

I can't help but feel that this shouldn't be happening. It seems like the compiler should know better than to underflow the stack. Of course, there are a couple factors I see that make this a more interesting case:

The uber-stack structure is being passed as a copy rather than a pointer. I can't help but think a pointer would fair better.
The C compiler isn't aware of the assembly messing with the stack just below it. It may think it's cleaning house correctly.

Any thoughts?

zhiayang · Post by **zhiayang** » Fri Jan 03, 2014 10:33 am

From my own experience, there's a couple of ways to solve this:

1. As you have done, add a couple of attributes to the function. Although I don't think turning off optimisation is required -- what if you have a somewhat more complex key handling function? AFAIK doing an __attribute__((unused)) should be enough.

2. You could make an UNUSED(x) macro that would do this: either use __attribute__((unused)) or do ((void)x). This should 'trick' the compiler into thinking that you've used the function, and should stop it optimising it away.

IMO the 'correct' way to solve this is to reduce the dependency on the global handler as much as possible. If you don't need all the overhead of having the entire context passed to your handler (like a keyboard handler, or a IRQ8/IRQ0 handler), you skip the global stub system those tutorials use, write a couple of lines of asm (doing 'call' and 'iret') and you can skip the hassle.

Owen · Post by **Owen** » Fri Jan 03, 2014 11:03 am

Your problem is exactly that you're passing the parameter by value. Remember the rules that apply in this case:

The contents of that by-value parameter are entirely the callee's domain. The compiler is free to do whatever it wants with that parameter space as long as the callee's code doesn't observe it.
Struct parameters exceeding a certain defined size are passed by reference (and copy) anyway

Pass the parameter by reference. You're violating the ABI; of course everything is breaking.

If the optimizer is breaking things, 99/100 times it's an indication that your code isn't following the ABI or is relying on undefined behavior.

sortie · Post by **sortie** » Fri Jan 03, 2014 4:11 pm

Yes, the problem certainly isn't optimization. Disregard bad advise like requimrar's (various attributes and unused variables are not the solution) and listen to Owen.

Optimization is not the issue here and there's an easy way to realize this: Assume the big struct parameter is unused and the compiler somehow optimized it away. Effectively that changes the function prototype to accept no parameters. This means that all calls to the function must also know the prototype was changed. The function has external linkage, which allows assembly and other translation units to call the function, however such different translation units cannot possibly know that this optimization was used. The result will be that normal C code calling the function would malfunction because the prototype was changed. In other words, optimizing away the parameter and effectively changing the prototype is an unsafe optimization. We will assume the compiler is correct and therefore this will never happen - therefore optimization is not the problem.

The problem is rather that your assembly doesn't follow the ABI. This should be apparent if you objdump -d the relevant functions of your kernel and notice the key_press function corrupts the 'regs' value on the stack. Indeed, the i386 ABI (I'm assuming that platform) ABI says that values on the stack belong to the called function. It is allowed to modify them as it wishes and the contents of the stack parameters and scratch registers are undefined followed the function call. This means that your assembly is invoking undefined behaviour by using whatever values are on the stack following the function call. This fits with what you are experiencing as your operating system crashes.

The easiest solution is to duplicate the registers structure on the stack if you really wish to give the called function a copy of the registers according to call-by-value semantics. More likely, you simply wish to give it access to the current register values, so perhaps you should pass it a pointer instead. Mind you that the pointer would also be subject to the same rules and will be trashed after the function call completes, but it should be fairly easy to recover the location of the register's structure as the stack pointer is preserved by functions according to the ABI.

<strike-through>@requimrar: You really should be more careful with the term 'correct solution' as what you are proposing really isn't. This is just a work-around that happens to work, but doesn't solve the underlying problem or satisfactorily explain why the undesirable behaviour happens.</strike-through>

jbanes · Post by **jbanes** » Fri Jan 03, 2014 5:42 pm

My hat goes off to you gentlemen! I had a feeling that what the tutorial was doing was incorrect. Now I even have a far better sense of how to restructure it. Thank you very much!

@requimrar - FWIW, the unused attribute only muffles the compiler warning. As long as the compiler is allowed to optimize the code (and the parameter goes unused) the code will fail. I know because I tried!

Brendan · Post by **Brendan** » Fri Jan 03, 2014 5:48 pm

Hi,

sortie wrote:
requimrar wrote:IMO the 'correct' way to solve this is to reduce the dependency on the global handler as much as possible. If you don't need all the overhead of having the entire context passed to your handler (like a keyboard handler, or a IRQ8/IRQ0 handler), you skip the global stub system those tutorials use, write a couple of lines of asm (doing 'call' and 'iret') and you can skip the hassle.
@requimrar: You really should be more careful with the term 'correct solution' as what you are proposing really isn't. This is just a work-around that happens to work, but doesn't solve the underlying problem or satisfactorily explain why the undesirable behaviour happens.

I agree with requimrar. There is no sane reason for an IRQ handler to need access to the interrupted code's state in the first place, and the correct solution (for IRQs only) is to use assembly stubs designed for IRQ handlers (e.g. either "void key_press(void) { ... }" or "void key_press(int IRQnumber) { ... }" depending on where the code to send EOI is). Also note that for this case the assembly stub shouldn't need to save/restore "callee preserved" registers or push a useless dummy error code.

Of course for exception handlers and the kernel API (where the interrupt handler does need access to the interrupted code's state), Owen and sortie are also correct and the tutorial should be passing a "pointer to struct".

Cheers,

Brendan

Gigasoft · Post by **Gigasoft** » Fri Jan 03, 2014 6:10 pm

For exceptions and system calls, to correctly get at the trap frame in GCC without having to pass by reference, use __builtin_frame_address(0) + 8.

sortie · Post by **sortie** » Fri Jan 03, 2014 8:30 pm

Gigasoft wrote:For exceptions and system calls, to correctly get at the trap frame in GCC without having to pass by reference, use __builtin_frame_address(0) + 8.

Yeah, that probably works. It's a very dirty hack, though and it begs to break. It's much easier to just pass a pointer and there is no way that breaks.

@requimrar, Brendan: Sorry, I think I misunderstood what requimrar was saying. I too agree with that, I think.

ApproximateIdentity · Post by **ApproximateIdentity** » Sat Jan 04, 2014 3:35 pm

I don't have anything intelligent to add to this, but just wanted to thank all of you for the great responses in this thread. I posted about the same issues a couple days ago here and Owen solved my problem, but seeing the slightly expanded discussion here helped me really understand this better. Thanks a lot everyone!

Owen · Post by **Owen** » Sat Jan 04, 2014 4:42 pm

I knew I'd just answered that question. I just couldn't remember what the topic was called...

qw · Post by qw » Mon Jan 06, 2014 2:03 pm

I fail to see how parameters may be optimized away. The caller pushes the parameters on the stack, and the caller also restores the stack. It is the caller that keeps the stack balanced regardless of prototyping and optimization.

Rew · Post by **Rew** » Mon Jan 06, 2014 3:59 pm

Hobbes wrote:I fail to see how parameters may be optimized away. The caller pushes the parameters on the stack, and the caller also restores the stack. It is the caller that keeps the stack balanced regardless of prototyping and optimization.

The problem isn't that parameters can be optimized away, it is that parameter stack space belongs to the callee and stack space usage can be optimized. GCC sees that the space for the parameter "regs" that is already allocated on the stack by the caller is unused (no usage of "regs" parameter). GCC knows that this stack space belongs to the callee (key_press) and uses it for anything else that it wants to in the function (including compiler declared temporary local variables). This means that the caller's expectation that the register values pushed onto the stack and passed to a function can simply be popped off the stack is incorrect. It is not an unbalanced stack, but incorrect register values that are popped off the stack after call that is the issue. Turning off optimizations was simply telling GCC to allocate more stack space for temporary data instead of reusing unused parameter space. This type of optimization is very common. Compilers frequently perform flow analysis to reuse stackspace for user declared or compiler declared local variables

By pushing the registers onto the stack and then pushing a pointer to the function, GCC may still end up trashing the unused pointer but will not trash the underlying data.

qw · Post by qw » Mon Jan 06, 2014 4:03 pm

Rew wrote:This means that the caller's expectation that the register values pushed onto the stack and passed to a function can simply be popped off the stack is incorrect.

I have never seen an optimization like that. Why would the caller expect such a thing? This stack space belongs to the callee anyway.

Edit: the only cause I can think of, is that the function is called with too few parameters, and that it accesses the memory where the parameters are supposed to be.

Rew · Post by **Rew** » Mon Jan 06, 2014 4:34 pm

The caller in this case is user written assembly code. Specifically, there are some tutorials that do something like the following. (simplified to illustrate the problem)

Code: Select all

[EXTERN IRQ_HANDLER_WRITTEN_IN_C]
IRQ_STUB:
pusha
....
call IRQ_HANDLER_WRITTEN_IN_C
...
popa
iret

Code: Select all

typedef struct registers
{
   u32int edi, esi, ebp, esp, ebx, edx, ecx, eax; // Pushed by pusha.
   u32int eip, cs, eflags, useresp, ss; // Pushed by the processor automatically.
} registers_t;

void IRQ_HANDLER_WRITTEN_IN_C(registers regs)
{
}

Written this way, pushed registers by the assembly based caller are in the parameter space of the callee and then popped off after the callee had the opportunity to modify them. Additionally, registers pushed as part of the interrupt by the processor are now part of C callee stack space. As you said, you've never seen a caller do that and it is not sensible for the caller to expect that given the calling convention being used in this case. The real issue is that the caller (programmer written assembly) is not following the same ABI as GCC. The assembly code may be perfectly valid given a different calling convention, however in this case the assembly code is incorrect. Any "fix" in C code in this case is simply covering up the problem that the assembly code is not following the ABI.

Fixing it in assembly involves either copying the values before passing or simply pass by reference. It is also possible to change design and come up with a different function definition that both assembly and C agree on (as suggested by most in here).

qw · Post by qw » Tue Jan 07, 2014 4:17 am

Rew, that's perfectly clear, but it is not what I would call "optimizing the parameters away".

OSDev.org

Callback Handlers vs. GCC optimizer

Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer

Re: Callback Handlers vs. GCC optimizer