Race Condition [SOLVED]
Posted: Mon Jan 26, 2015 8:41 pm
by jvc
I am working on wait queues in my kernel right now and am encountering some sort of race condition.
Essentially I have three processes running: one prints out the process tree, sleeps for a second, and repeats. Another continuously sets a condition variable to true and then calls the wakeup function. The third resets the condition variable before calling the wait_event function; that process also outputs a little test message every loop.
When I let it run, the output from the third process stops after a few seconds. Sometimes the output from the debugging process, the first one mentioned, stops too. The two functions are shown below:
Code:
void wait_event(sleep_queue_t* waitqueue, uint32_t* condition)
{
    if (!waitqueue || !condition || !waitqueue->queue) // Passing bad pointers
        return;

    while (!*condition)
    {
        spin_lock(&waitqueue->lock);
        list_append_node(waitqueue->queue, &currProc->sleepNode, currProc);
        IRQ_OFF;
        spin_unlock(&waitqueue->lock);
        proc_setState(currProc, TASK_UNINTERRUPTIBLE);
        if (!*condition) // If we get woken up between releasing the lock and the process state change
            yield();
        proc_setState(currProc, TASK_RUNNING);
        IRQ_RES;
    }
}
void wakeup_queue(sleep_queue_t* waitqueue)
{
    if (!waitqueue || !waitqueue->queue)
        return;

    spin_lock(&waitqueue->lock);
    list_node_t* wakeNode = list_dequeue(waitqueue->queue);
    while (wakeNode)
    {
        process_t* wakeProc = (process_t*)wakeNode->data;
        proc_setState(wakeProc, TASK_RUNNING);
        if (!list_find(waitqueue->queue, wakeProc))
        {
            sched_queueRunnable(wakeProc);
        }
        wakeNode = list_dequeue(waitqueue->queue);
    }
    spin_unlock(&waitqueue->lock);
}
Jacob
Re: Race Condition
Posted: Tue Jan 27, 2015 6:13 am
by Kevin
I'm not really sure about your interrupt disabling. Holding a spinlock with interrupts enabled doesn't sound like a great idea to start with, but it doesn't seem to be a reason for a deadlock in your code. What is this yield() doing, especially with respect to the disabled interrupts?
Re: Race Condition
Posted: Tue Jan 27, 2015 8:33 am
by jvc
The yield function re-enables interrupts and then triggers a context switch. The spinlock calls the yield function if it cannot be acquired.
Re: Race Condition
Posted: Tue Jan 27, 2015 3:26 pm
by gerryg400
jvc wrote:The spinlock calls the yield function if it cannot be acquired.
That's not a spinlock. A task spins if a spinlock cannot be acquired.
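For reference, here is roughly what a real spinlock looks like. This is a minimal sketch using GCC's atomic builtins; the names are illustrative, not taken from your code:
Code:
#include <stdint.h>

typedef volatile uint32_t spinlock_t;

static inline void spin_lock(spinlock_t* lock)
{
    // Atomically set the lock to 1; keep spinning while the old value was 1.
    while (__sync_lock_test_and_set(lock, 1))
    {
        __asm__ volatile("pause"); // be kind to the pipeline while busy-waiting
    }
}

static inline void spin_unlock(spinlock_t* lock)
{
    // Release store with the matching barrier semantics.
    __sync_lock_release(lock);
}
If acquisition failure yields instead of spinning, you have a sleeping lock, and then the interaction with your disabled interrupts and yield() becomes much harder to reason about.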
Re: Race Condition
Posted: Tue Jan 27, 2015 5:23 pm
by gardinirafael
Do you have any optimization flags in your makefile?
Because if the condition variable is not volatile, your compiler will not know that the pointed-to value can be changed by another task, and it will optimize the wait_event function. Your condition is never tested again, which produces a deadlock.
Pay attention to the .L4 label:
Code:
wait_event:
.LFB0:
    .cfi_startproc
    cmpl    $0, (%rdi)
    # ...code inside the while body...
    jne     .L1
.L4:
    jmp     .L4
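The usual fix is to qualify the pointed-to object as volatile, so every test of the loop reloads it from memory. Something like this, keeping your signature and just adding the qualifier:
Code:
// The volatile qualifier forces the compiler to reload *condition on
// every iteration instead of caching the first load in a register.
void wait_event(sleep_queue_t* waitqueue, volatile uint32_t* condition);
Note that volatile only fixes the compiler-caching problem; it does not make the check atomic with respect to your wakeup path.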
Re: Race Condition
Posted: Tue Jan 27, 2015 5:49 pm
by gerryg400
Code:
void wait_event(sleep_queue_t* waitqueue, uint32_t* condition)
{
    if (!waitqueue || !condition || !waitqueue->queue) // Passing bad pointers
        return;

    while (!*condition)
    {
        spin_lock(&waitqueue->lock);
        list_append_node(waitqueue->queue, &currProc->sleepNode, currProc);
        // gg - at this point currProc is in the waitqueue
        IRQ_OFF;
        spin_unlock(&waitqueue->lock);
        proc_setState(currProc, TASK_UNINTERRUPTIBLE);
        // gg - let's suppose that *condition is no longer 0, we will skip the yield
        if (!*condition) // If we get woken up between releasing the lock and the process state change
            yield();
        proc_setState(currProc, TASK_RUNNING);
        // gg - at this point currProc is still in the waitqueue but now running -- Is that okay ?
        IRQ_RES;
    }
}
Please see my inline comments.
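One conventional way to close that window, roughly the pattern Linux's wait queues use, is to enqueue the task and mark it non-runnable before testing the condition, all under the lock. A sketch in your terms, with interrupt masking elided; it assumes a list_delete helper that unlinks a node, and that calling yield() on a task that has already been set back to TASK_RUNNING simply reschedules it:
Code:
void wait_event(sleep_queue_t* waitqueue, uint32_t* condition)
{
    if (!waitqueue || !condition || !waitqueue->queue)
        return;

    for (;;)
    {
        spin_lock(&waitqueue->lock);
        // Enqueue and mark non-runnable BEFORE testing the condition, so a
        // concurrent wakeup_queue() either sees us on the queue or we see
        // the value it set. The lost-wakeup window is gone.
        list_append_node(waitqueue->queue, &currProc->sleepNode, currProc);
        proc_setState(currProc, TASK_UNINTERRUPTIBLE);
        if (*condition)
        {
            // Condition already true: undo and leave.
            list_delete(waitqueue->queue, &currProc->sleepNode); // hypothetical helper
            proc_setState(currProc, TASK_RUNNING);
            spin_unlock(&waitqueue->lock);
            return;
        }
        spin_unlock(&waitqueue->lock);
        yield(); // wakeup_queue() dequeues us and makes us runnable again
        // Loop and re-test the condition under the lock.
    }
}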
Re: Race Condition
Posted: Tue Jan 27, 2015 7:03 pm
by jvc
gardinirafael wrote:Do you have any optimization flags in your makefile?
Because if the condition variable is not volatile, your compiler will not know that the pointed-to value can be changed by another task, and it will optimize the wait_event function. Your condition is never tested again, which produces a deadlock.
No, I don't use volatile with the condition; I didn't even think of it. I'll try that out.
gerryg400 wrote:That's not a spinlock. A task spins if a spinlock cannot be acquired.
I realized the mistake in terminology after a bit of research.
Regarding your inline comments, I can see the design problems and I will look into them.
Thank you for the replies. I'm strongly considering redoing the way the scheduler handles the run/wait queues. I have an idea that will optimize it and hopefully make it less finicky.
Jacob
Re: Race Condition
Posted: Tue Jan 27, 2015 9:44 pm
by jvc
So I reworked part of my scheduler. It now uses a circular queue, and the set_state function is responsible (if asked through a parameter) for adding processes to, or removing them from, this queue. I replaced a bunch of the mutexes with "irq locks", which simply disable IRQs and only re-enable them if they were previously enabled, hence the parameter being passed in, which stores this state.
The yield function no longer re-enables interrupts when called; it is expected that they are enabled already. Also, the condition I am passing in is now volatile.
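Roughly, the idea behind the irq locks is this (a minimal sketch assuming a 32-bit x86 kernel and GCC inline assembly, not my exact code):
Code:
#include <stdint.h>

// Save EFLAGS and disable interrupts. The caller's variable remembers
// whether IF was set, so irq_unlock only re-enables interrupts if they
// were enabled when irq_lock was entered.
static inline void irq_lock(volatile uint32_t* flags)
{
    uint32_t f;
    __asm__ volatile("pushf\n\tpop %0\n\tcli" : "=r"(f) : : "memory");
    *flags = f;
}

static inline void irq_unlock(volatile uint32_t* flags)
{
    __asm__ volatile("push %0\n\tpopf" : : "r"(*flags) : "memory", "cc");
}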
It still doesn't seem to be working quite right. I am still using three processes. The processes that call the wait and wake functions are allowed to run at "full speed", as in they set the condition variable and then call their respective functions. The third process is in a loop that sleeps and then prints a short message. After 25 loops with 100 ms of sleeping, the output to the screen locks up.
Another thing: would the check for the condition before the yield call in the wait function even be necessary? The wakeup function handles placing the process back on the run queue, so that shouldn't create a race condition? I am having a bit of difficulty wrapping my head around this, unfortunately.
Here are the waiting and timed sleep functions:
Code:
// Sleep
int msleep(uint32_t ms)
{
    static volatile uint32_t msleep_eflags;

    irq_lock(&msleep_eflags); // Wakeup function will not be called
    currProc->ticks = timer_getTicks() + ms;

    list_node_t* before = 0;
    foreach (child, timedSleepQueue)
    {
        if (((process_t*)child->data)->ticks >= currProc->ticks)
        {
            break;
        }
        before = child;
    }
    list_insert_node_after(timedSleepQueue, before, currProc, &currProc->sleepNode);
    proc_setState(currProc, TASK_INTERRUPTIBLE, 1);
    irq_unlock(&msleep_eflags);
    yield();
    return 0;
}

void sched_wakeTimedSleep(uint64_t ticks) // Called from timer, aka IRQs are already off
{
    if (timedSleepQueue->length == 0) // Don't bother if zero length
        return;

    if (ticks >= ((process_t*)timedSleepQueue->head->data)->ticks)
    {
        list_node_t* sleepNode = list_dequeue(timedSleepQueue);
        process_t* sleeper = sleepNode->data;
        sleeper->ticks = 0;
        proc_setState(sleeper, TASK_RUNNING, 1);
    }
}

// Wait queues
// Block until woken up, and then block again if condition is false
void wait_event(sleep_queue_t* waitqueue, volatile uint32_t* condition)
{
    static volatile uint32_t wait_lock;

    if (!waitqueue || !condition || !waitqueue->queue) // Passing bad pointers
        return;

    irq_lock(&wait_lock);
    while (!*condition)
    {
        list_append_node(waitqueue->queue, &currProc->sleepNode, currProc);
        proc_setState(currProc, TASK_UNINTERRUPTIBLE, 1);
        irq_unlock(&wait_lock);
        if (!*condition) // If we get woken up between releasing the lock and the process state change
            yield();
        // At this point, a wakeup call will have put the process back in the running state, and back on the run queue.
        irq_lock(&wait_lock);
    }
    irq_unlock(&wait_lock);
}

void wakeup_queue(sleep_queue_t* waitqueue)
{
    static volatile uint32_t wakeup_lock;

    if (!waitqueue || !waitqueue->queue)
        return;

    irq_lock(&wakeup_lock);
    list_node_t* wakeNode = list_dequeue(waitqueue->queue);
    while (wakeNode)
    {
        process_t* wakeProc = (process_t*)wakeNode->data;
        proc_setState(wakeProc, TASK_RUNNING, 1);
        wakeNode = list_dequeue(waitqueue->queue);
    }
    irq_unlock(&wakeup_lock);
}
Jacob
Re: Race Condition [SOLVED]
Posted: Thu Jan 29, 2015 10:45 am
by jvc
I think I may have found it! It has to do with a very finicky situation the scheduler can set up under just the right circumstances, where the waiting process removes itself from the run queue. Because of the way the ring queue was set up, this would cause the next process in the queue to be skipped once. With three processes in the right order, this can result in the same process being skipped every time, starving it of processor time. It's quite a simple fix: a one-line change to the ring queue deletion handler.
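To illustrate the class of bug with a hypothetical ring queue (not my exact code): if deleting the node the scheduler's current pointer sits on moves the pointer forward, the deleted node's successor loses its next turn. Backing the pointer up instead is the kind of one-line fix I mean:
Code:
typedef struct node {
    struct node* next;
    struct node* prev;
    void*        data;
} node_t;

typedef struct {
    node_t* current; // node the scheduler advances from on the next pick
    int     length;
} ring_t;

void ring_delete(ring_t* q, node_t* n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    if (q->current == n)
        q->current = n->prev; // back up, so the next advance lands on
                              // n->next instead of skipping past it
    if (--q->length == 0)
        q->current = 0;
}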
Jacob