Intermittent bit flip in driver syscall
Posted: Mon Feb 08, 2021 6:15 am
Hi!
I am experiencing a strange issue in which adding a no-op or changing a function's return type makes the issue disappear. I originally suspected a race condition, but the fact that changing the return type makes it go away leads me to think something else is going on.
I have the following syscall interface that device drivers use:
[code]
// Block until an event is received
// An event will be either an interrupt that must be serviced, or an IPC message
// Returns true if the call returned due to an interrupt needing servicing,
// or false if the call returned due to an IPC message arriving
bool adi_event_await(uint32_t irq) {
    // ...
    tasking_block_task(driver->task, IRQ_WAIT | AMC_AWAIT_MESSAGE);
    // We're now unblocked
    task_state_t unblock_reason = task->blocked_info.unblock_reason;
    // Make sure this was an event we're expecting
    assert(unblock_reason == IRQ_AWAIT || unblock_reason == AMC_AWAIT_MESSAGE, "ADI driver awoke for unknown reason");
    return unblock_reason == IRQ_AWAIT;
}
[/code]
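For context, a driver built on this interface would typically loop on the call and branch on its result. The sketch below is a user-space mock under stated assumptions: `mock_event_await`, `driver_loop` and the numeric values of `IRQ_AWAIT` and `AMC_AWAIT_MESSAGE` are all hypothetical, and the blocking call is replaced by a scripted list of unblock reasons so the branching can be exercised without a kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical unblock reasons; the real values live in the kernel headers.
typedef enum {
    IRQ_AWAIT = 0x100,
    AMC_AWAIT_MESSAGE = 0x200,
} task_state_t;

// Hypothetical mock of adi_event_await(): instead of blocking the task,
// it takes the next unblock reason from a caller-supplied script.
static bool mock_event_await(const task_state_t *script, size_t *cursor) {
    task_state_t unblock_reason = script[(*cursor)++];
    // Same sanity check as the real syscall: only two reasons are expected
    assert(unblock_reason == IRQ_AWAIT || unblock_reason == AMC_AWAIT_MESSAGE);
    return unblock_reason == IRQ_AWAIT;
}

// Hypothetical driver-side event loop: service an interrupt when the call
// returns true, otherwise treat the wakeup as an incoming IPC message.
void driver_loop(const task_state_t *script, size_t n,
                 int *irqs_serviced, int *messages_handled) {
    size_t cursor = 0;
    *irqs_serviced = 0;
    *messages_handled = 0;
    while (cursor < n) {
        if (mock_event_await(script, &cursor)) {
            (*irqs_serviced)++;    // a real driver would run its IRQ handler here
        } else {
            (*messages_handled)++; // a real driver would read the AMC message here
        }
    }
}
```

Replaying `{IRQ_AWAIT, AMC_AWAIT_MESSAGE, IRQ_AWAIT}` through `driver_loop` counts two serviced interrupts and one handled message; with the symptom reported in this post, an interrupt wakeup would instead land in the message branch.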
When the issue triggers, this function returns the inverse of the correct value: "unblock_reason" indicates AMC_AWAIT_MESSAGE instead of IRQ_AWAIT.
However, if I make very slight changes to the code, the issue disappears:
* If I print "unblock_reason" before returning, the issue disappears
* If I check the PID of the running process and do a no-op, the issue disappears
* If I change the return type from "bool" to "uint32_t", the issue disappears
* If I run in a debugger, the issue disappears
* If I change the code to explicitly return true or false based on the value, instead of returning the result of the equality comparison directly, the issue disappears
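At the C level, the return styles in the list above should be interchangeable: the `==` operator yields an `int` that is 0 or 1, and converting that to `bool` or `uint32_t` preserves the value. A minimal sketch, assuming a hypothetical `IRQ_AWAIT` value of `0x100` and taking the reason as a plain `uint32_t` instead of the kernel's `task_state_t` (the function names are illustrative only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical value for the interrupt unblock reason (assumption).
#define IRQ_AWAIT 0x100u

// Form used in the post: return the result of the equality directly.
bool reason_is_irq_direct(uint32_t unblock_reason) {
    return unblock_reason == IRQ_AWAIT;
}

// Form from the last bullet: branch and return an explicit literal.
bool reason_is_irq_explicit(uint32_t unblock_reason) {
    if (unblock_reason == IRQ_AWAIT) {
        return true;
    }
    return false;
}

// Form from the "uint32_t" bullet: same comparison, wider return type.
uint32_t reason_is_irq_u32(uint32_t unblock_reason) {
    return unblock_reason == IRQ_AWAIT;
}

// Checks that the three forms agree for one input; aborts if they differ.
void check_forms_agree(uint32_t unblock_reason) {
    bool direct = reason_is_irq_direct(unblock_reason);
    assert(direct == reason_is_irq_explicit(unblock_reason));
    assert((uint32_t)direct == reason_is_irq_u32(unblock_reason));
}
```

Because the three functions have identical C semantics, a behavioral difference between them suggests looking below the source level (generated code, calling convention, or state modified elsewhere) rather than at the comparison itself.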
To be sure, I checked the assembly generated for the return statement, and it looks perfectly sane:
[code]
cmp dword [ss:ebp+var_14], 0x100
sete al
[/code]
I know this is probably some strange interaction in my system, but wanted to post here in case anyone might have an inkling of what could be going on. Thanks in advance!