mutexes causing stack overflow

Posted: Wed Mar 05, 2008 11:45 pm
by jrussel316
I've been working on implementing a small OS for the Intel architecture, and am now attempting to implement mutexes and blocking. The mutexes seem to be working for the most part, but after repeated use they appear to be causing my process's stack to overflow. Here's the code for the mutexes:

Code:

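/* Acquire the mutex, blocking the caller if it is already held. */
void mutex_lock(mutex_t *m)
{
	asm("cli");	/* interrupts off while the lock state is examined */
	if(m->locked)
	{
		/* already held: wait on the mutex's queue of blocked processes */
		block(&(m->blockQueue));
	}
	m->locked = 1;
	asm("sti");
}


/* Release the mutex and wake one blocked process, if there is one. */
void mutex_unlock(mutex_t *m)
{
	asm("cli");
	pcb_t pcb;
	m->locked = 0;
	if(deQueue(&(m->blockQueue), &pcb) >= 0)
	{
		/* a process was waiting: make it ready and keep the mutex locked on its behalf */
		queue(&readyQueue, &pcb);
		m->locked = 1;
	}
	asm("sti");
}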
Any tips on how these could be messing up my stack are greatly appreciated. Also, any general information about good methods for debugging problems like this where the bug is a result of the interactions of many different pieces of code running in different processes would be helpful.

Thanks in advance

Re: mutexes causing stack overflow

Posted: Thu Mar 06, 2008 12:39 am
by Brendan
Hi,

This isn't enough code to determine why your stack overflows (no code for the "block()", "deQueue()" and "queue()" functions, and also no way to tell if relevant pieces of data are marked as "volatile" or not).

However, I have a feeling your code isn't correct - the lock needs to be atomic so that there are no race conditions when different CPUs try to get the same lock at the same time. Note: I assume you do care about multi-CPU.
jrussel316 wrote: Also, any general information about good methods for debugging problems like this where the bug is a result of the interactions of many different pieces of code running in different processes would be helpful.
For this specific problem, I'd start by replacing the mutex with a simple spinlock to see if that fixes things. Then I'd try to test each smaller piece by itself (e.g. write some temporary code to test the "block()" function by itself, then do the same for the "deQueue()" function, then the "queue()" function).
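
As a rough illustration of the "simple spinlock" idea, here is a minimal sketch built on an atomic test-and-set. It assumes a compiler that provides C11 <stdatomic.h>; the names spinlock_t, SPINLOCK_INIT, spin_trylock(), spin_lock() and spin_unlock() are made up for this example and are not from the original code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type for illustration only. */
typedef struct {
	atomic_flag held;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Make one attempt; returns true if the caller now owns the lock. */
static bool spin_trylock(spinlock_t *s)
{
	/* test-and-set returns the previous value of the flag:
	   false means it was clear and this call set it atomically. */
	return !atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire);
}

/* Busy-wait until the lock is acquired. */
static void spin_lock(spinlock_t *s)
{
	while (!spin_trylock(s)) {
		/* spin; nothing is queued and nothing blocks */
	}
}

/* Release the lock so the next test-and-set succeeds. */
static void spin_unlock(spinlock_t *s)
{
	atomic_flag_clear_explicit(&s->held, memory_order_release);
}
```

Because there is no block queue involved, swapping this in temporarily takes "block()", "deQueue()" and "queue()" out of the picture. On a single CPU with preemption, a waiter simply spins until the holder is scheduled again and releases the lock, so this is only a debugging aid, not a replacement for a blocking mutex.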

Alternatively you could try single-stepping with the Bochs debugger, or inserting debug output at strategic locations in the code. For example (using Bochs with the "0xE9 I/O port hack" enabled), you could do "outportb(0xE9, '+')" when a task is being put on a queue, "outportb(0xE9, '-')" when a task is removed from the queue, and "outportb(0xE9, '!')" when the queue becomes empty. In this case you might get something like "+-!++--!" which looks good, or "+++-++-++++-" which would probably indicate that the same task is being put on the queue more than once, or "+-!-----+-----" which might indicate that the same task is being removed from the queue more than once.
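
The tracing idea above could look something like the following sketch. Since the real queue code was not posted, task_queue_t, traced_enqueue() and traced_dequeue() are hypothetical stand-ins, and trace_char() appends to a buffer so the example can run outside a kernel; in the kernel it would write the byte to port 0xE9 instead.

```c
#include <stddef.h>

/* Stand-in for the 0xE9 port: collect trace characters in a buffer.
   In a kernel this could instead be (GCC inline asm, an assumption):
   asm volatile("outb %0, %1" : : "a"(c), "Nd"((unsigned short)0xE9)); */
#define TRACE_MAX 64
static char trace_buf[TRACE_MAX];
static size_t trace_len;

static void trace_char(char c)
{
	if (trace_len < TRACE_MAX - 1) {
		trace_buf[trace_len++] = c;
		trace_buf[trace_len] = '\0';
	}
}

/* Hypothetical fixed-size FIFO of task IDs, for illustration only. */
#define QUEUE_CAP 8
typedef struct {
	int items[QUEUE_CAP];
	size_t head, count;
} task_queue_t;

/* Emits '+' for every task queued. Returns 0, or -1 if the queue is full. */
static int traced_enqueue(task_queue_t *q, int task)
{
	if (q->count == QUEUE_CAP)
		return -1;
	q->items[(q->head + q->count) % QUEUE_CAP] = task;
	q->count++;
	trace_char('+');
	return 0;
}

/* Emits '-' for every task removed and '!' when the queue becomes empty.
   Returns 0, or -1 if the queue was already empty. */
static int traced_dequeue(task_queue_t *q, int *task)
{
	if (q->count == 0)
		return -1;
	*task = q->items[q->head];
	q->head = (q->head + 1) % QUEUE_CAP;
	q->count--;
	trace_char('-');
	if (q->count == 0)
		trace_char('!');
	return 0;
}
```

Queueing one task and removing it, then queueing two and removing both, leaves "+-!++--!" in trace_buf, which matches the healthy pattern described above.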


Cheers,

Brendan

clarification

Posted: Thu Mar 06, 2008 12:53 pm
by jrussel316
Thanks for the help - I'm trying that I/O port hack right now. Just for clarification, I am not trying to support multiple CPUs, and I know my queue / dequeue functions work well - I have been using them for my ready queue for some time now. One question though - I haven't been using volatile at all. Where and when is it important?

Thanks

Posted: Thu Mar 06, 2008 4:36 pm
by jrussel316
I figured it out - it turned out to be a problem with my stack initialization code and the part of my context switcher that switches from the idle process if that process was running. Although it turned out to have nothing to do with the mutexes, the tools and methods for debugging were very helpful. Thanks for helping me get a handle on what is a whole new level of debugging for me.