I'm experiencing a strange issue where adding a no-op, or changing a function's return type, makes the bug disappear. I originally suspected a race condition, but the fact that changing the return type in the function signature also fixes it makes me think something else is going on.
I have the following syscall interface that device drivers use:
Code:
// Block until an event is received
// An event will be either an interrupt or an IPC message
// Returns true if the call returned due to an interrupt needing servicing,
// or false if the call returned due to an IPC message arriving
bool adi_event_await(uint32_t irq) {
// ...
tasking_block_task(driver->task, IRQ_WAIT | AMC_AWAIT_MESSAGE);
// We're now unblocked
task_state_t unblock_reason = task->blocked_info.unblock_reason;
// Make sure this was an event we're expecting
assert(unblock_reason == IRQ_AWAIT || unblock_reason == AMC_AWAIT_MESSAGE, "ADI driver awoke for unknown reason");
return unblock_reason == IRQ_AWAIT;
}
When the issue triggers, this function returns the inverse of the correct value: "unblock_reason" indicates AMC_AWAIT_MESSAGE when the task was actually unblocked for an IRQ.
However, if I make very slight changes to the code, the issue disappears:
* If I print "unblock_reason" before returning, the issue disappears
* If I check the PID of the running process and do a no-op, the issue disappears
* If I change the return value from "bool" to "uint32_t", the issue disappears
* If I run in a debugger, the issue disappears
* If I change the code to explicitly return true or false based on the value, instead of taking the result of the equality, the issue disappears
To be sure, I checked the assembly generated for the return statement, and it looks perfectly sane (0x100 being the expected IRQ reason):
Code:
cmp dword [ss:ebp+var_14], 0x100
sete al