Page 1 of 1

An odd error - an arg as the return value from a function

Posted: Fri Dec 02, 2011 9:17 pm
by eryjus
Hi everyone....

I have been working this issue for several days and have narrowed it down to what I think the problem is, but I'm at a loss to completely explain it. I was wondering if it is a compiler bug or setting, but I would get consistent results which I am not. On the other hand, it feels like timing from a task switch at a "particularly vulnerable" state (is there such a thing) in a return statement.

Here is what I am seeing:
Screenshot-QEMU [Stopped].png
The message says that Resume is being called with PID 0, the null process.

My backtrace in gdb shows the following:

Code: Select all

(gdb) backtrace
#0  0x001002cb in hang ()
#1  0x001039d5 in Panic (message=0x105035 "Bad Resume() pid", file=0x105023 "src/proc.Resume.c", line=11) at src/cmn.Panic.c:17
#2  0x00100500 in Resume (pid=0) at src/proc.Resume.c:11
#3  0x00100842 in kmain (mbi=0x2000, magic=732803074) at src/init.kmain.c:45
#4  0x001002c9 in loader ()
... and line 45 in kmain() is:

Code: Select all

	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'D'));
So, CreateProcess is returning pid 0. However, I checked the return value of pid right before CreateProcess returned to ensure it is not 0 and the check never failed even though Resume received a 0 pid argument.

Any other ideas on how I can track down this odd bug? I have seen http://wiki.osdev.org/Troubleshooting and http://wiki.osdev.org/How_Do_I_Use_A_De ... With_My_OS.

[edit: this does not happen at the same line and does not happen every execution. It does happen at a Resume(CreateProcess( )) call.]

TASK_FUNC is a simple macro:

Code: Select all

typedef void (*TaskFunc)();
#define TASK_FUNC(x) (TaskFunc)(&(x))
Below are CreateProcess, Resume, and kmain in their entirety (respectively):

Code: Select all

#include "kernel.h"

uint16 CreateProcess(TaskFunc t, uint32 numParms, ...)
{
	ProcessQueueEntry *newProc = Dequeue(&freeQ);
	
	if (0 == newProc) PANIC("Ran out of process Threads to use\n");
	
	uint32 ps = DisableInterrupts();
	uint16 pid = newProc->pid;
	ProcessEntry *pe = &ProcessList[pid];
	uint32 flags;
	uint32 *esp;
	
	pe->state = PRC_SUSP;
	pe->priority = PTY_NORM;
	pe->sem = 0;
	pe->msg = 0;
	pe->hasMsg = 0;
	pe->stack = (uint32 *)kmalloc_a(INITIAL_STACK_SIZE);
	pe->stackLength = INITIAL_STACK_SIZE;
	esp = (uint32 *)((uint32)pe->stack + INITIAL_STACK_SIZE - 32);
	
	pe->ss = GetDS();		// Stack will be from the Heap (DS)
	pe->cr3 = GetCR3();
	flags = GetFlags();
	
	int *a = (int *)((&numParms) + (numParms + 1));
	
	// start building the stack for _SwitchTasks
	for (; numParms > 0; numParms --) *(--esp) = *(--a);
	*(--esp) = (uint32)&UserReturn;		// where to go when user task returns
	*(--esp) = (uint32)t;				// eip on the stack for entry point
	*(--esp) = (uint32)&EnableInterrupts;
	*(--esp) = 0;		// ebp
	*(--esp) = flags;
	*(--esp) = 0;		// eax
	*(--esp) = 0;		// ebx
	*(--esp) = 0;		// ecx
	*(--esp) = 0; 		// edx
	*(--esp) = 0;		// esi
	*(--esp) = 0;		// edi
	*(--esp) = (uint32)pe->ss;	// ds is the same as ss
	*(--esp) = (uint32)pe->ss;	// es is the same as ss
	*(--esp) = (uint32)pe->ss;	// fs is the same as ss
	*(--esp) = (uint32)pe->ss;	// gs is the same as ss
	
	pe->esp = (uint32)esp;
	
	processCount ++;
	
	Enqueue(&waitQ, newProc);
	RestoreInterrupts(ps);
	
	return pid;
}

Code: Select all

#include "kernel.h"

void Resume(uint16 pid)
{
	uint32 flags = DisableInterrupts();	
	
	if (IS_BAD_PID(pid) || ProcessList[pid].state != PRC_SUSP) {
		if (IS_BAD_PID(pid)) {
			kPutStr("Bad PID in Resume() call: ");
			kPutDec(pid);
			PANIC("Bad Resume() pid");
		}
		
		goto out;
	}
	
	ReadyProcess(pid);
	
out:
	RestoreInterrupts(flags);
}

Code: Select all

/*!
 * @file init.kmain.c
 * @author Adam Clark
 * @short Kernel Main initialization routine
 * 
 * This file contains the main initialization routine for the Century 
 * Hobby OS Kernel.  It performs the following functions:
 *  1.  Checks the magic number to see if it is valid
 *  2.  Initializes the GDT Table
 */

#include "kernel.h"

void PrintA(void);
void PrintB(void);
void PrintLetter(char);

void kmain (void *mbi, unsigned int magic)
{
	mbi = mbi;
	
	if ( magic != 0x2badb002 ) {
		kCls();
		kPutStr("PANIC: Invalid magic number!!");
		return;
	}
	
	
	kCls();
	
	UpdateStatus();
	
	InitGDT();
	InitIDT();
	InitPaging();
	InitProcess();
	InitTimer(2000);
	
	kPutStr("\nStarting the Operating System\n");
	EnableInterrupts();
	
	uint16 A = CreateProcess(&PrintA, 0);
	uint16 B = CreateProcess(&PrintB, 0);
	uint16 C = CreateProcess(TASK_FUNC(PrintLetter), 1, 'C');
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'D'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'E'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'F'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'G'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'I'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'J'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'K'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'L'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'M'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'N'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'O'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'P'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'Q'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'R'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'S'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'T'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'U'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'V'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'W'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'X'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'Y'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'Z'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, '1'));
	Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, '2'));

	Resume(A);
	Resume(B);
	Resume(C);  

	while (1) {
		Resume(A);
		Resume(B);
		
//		asm volatile("hlt");
	}
}
Thank you all!!!

Re: An odd error - an arg as the return value from a functio

Posted: Sat Dec 03, 2011 2:20 am
by Combuster
First of all, you are apparently not too familiar with gdb backtraces - a function call with arguments will often show the current value for such argument, which may no longer be the original value passed to the function.

Which brings me to the following question: Are you sure your macros use == instead of = for comparisons?

Re: An odd error - an arg as the return value from a functio

Posted: Sat Dec 03, 2011 8:39 am
by eryjus
@combuster, thank you for your reply. Yes, you are right, I am learning gdb - powerful, but I need a quick-reference guide. Good tip on current versus passed values. Yet, the code is reporting that the pid is 0 (see the screen shot), which is where I am reporting the 0 argument within Resume.

As for the = versus ==, I have changed my coding style to put the literal first, such as "if (0 == pid)". I'm certain that the compiler would complain about a bad lvalue is I missed an =. That said, I see it backwards in Resume(), though the != is complete -- I'm still retraining myself in the new coding style.

Here is the ASSERT I used for testing in CreateProcess:

Code: Select all

ASSERT(0 != pid);
and the macro

Code: Select all

#define ASSERT(x) if(!(x)) PANIC("Assertion Failed!")
And here is IS_BAD_PID

Code: Select all

#define MAX_PROCESSES 1024
#define IS_BAD_PID(x) (0 == (x) || (x) >= MAX_PROCESSES)
Last night I output assembly from gcc and walked the resulting code. Nothing significant there, which makes me think I have something in my code somewhere. It is most likely something not going well in a task switch. Again, it burps about 1 in 8 executions and is random when it does, though with the same problem each time. This appears to be my last issue before I can move on from multi-threading.

Again, does anyone have any ideas how I can dig further into the issue?

Re: An odd error - an arg as the return value from a functio

Posted: Sat Dec 03, 2011 5:00 pm
by gerryg400
How do Disable and Restore Interrupts work ? Do they play with the stack ? Or manipulate any registers ?

Re: An odd error - an arg as the return value from a functio

Posted: Sat Dec 03, 2011 5:14 pm
by eryjus
Sorry for a quick reply. I am away from my development VM.

Code: Select all

	global DisableInterrupts

DisableInterrupts:
	pushfd
	pop		eax
	cli
	ret

Code: Select all

	global EnableInterrupts

EnableInterrupts:
	sti
	ret

Code: Select all

	global	RestoreInterrupts

RestoreInterrupts:
	mov		eax,[esp+4]
	push	eax
	popfd
	ret
Could it be that I am trampling EAX in RestoreInterrupts? I don't recall seeing anything specific when I reviewed the asm last night, but that doesn't mean anything at all.

Re: An odd error - an arg as the return value from a functio

Posted: Sat Dec 03, 2011 5:44 pm
by Owen
I'm pretty certain that EAX is a scratch register, so that should be good. That said, I'd still go for

Code: Select all

RestoreInterrupts:
    push [esp+4]
    popfd
    ret

Re: An odd error - an arg as the return value from a functio

Posted: Sat Dec 03, 2011 9:16 pm
by eryjus
Thanks for the tip, @Owen. I did make the change to the code but unfortunately it did not change the result.

One thought that I have been mulling over: since I am dealing with a 16-bit pid and the underlying asm code is 32-bit.

For example, the C code

Code: Select all

Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'G'));
becomes

Code: Select all

	movl	$71, 8(%esp)
	movl	$1, 4(%esp)
	movl	$PrintLetter, (%esp)
	call	CreateProcess
	andl	$65535, %eax
	movl	%eax, (%esp)
	call	Resume
:?

Re: An odd error - an arg as the return value from a functio

Posted: Sat Dec 03, 2011 9:37 pm
by eryjus
Update:

I replaced each function call

Code: Select all

Resume(CreateProcess(TASK_FUNC(PrintLetter), 1, 'K'));
with

Code: Select all

	uint16 K = CreateProcess(TASK_FUNC(PrintLetter), 1, 'K');
	ASSERT(K);
	Resume(K);
At the end of CreateProcess, I now have

Code: Select all

....
        Enqueue(&waitQ, newProc);
	UpdateStatus();
	ASSERT(0 != pid);
	RestoreInterrupts(ps);
	ASSERT(0 != pid);
	return pid;
With that, the assertion ASSERT(K) fails...

Re: An odd error - an arg as the return value from a functio

Posted: Sat Dec 03, 2011 11:07 pm
by Owen
I presume you have common interrupt dispatching code. If so, I'd save an extra copy of EAX to a global variable at the start of any ISR, and check that EAX hasn't changed in the meantime at the end. At this point it looks very much like you may have an interrupt handler trashing EAX.

Re: An odd error - an arg as the return value from a functio

Posted: Sun Dec 04, 2011 12:43 am
by eryjus
@Owen,

That was it, thank you. I found I was trying to get cute and cut off a spurious IRQ before it got processed. The cute part was that I was (as you suspected) trampling eax before I saved the registers in order to start my checks. Anyway, after commenting out the offending code, I have more than 30 executions without any issues. I will keep pounding, but I think that was it.

I cannot thank you all enough for the "second (and third and fourth, etc) pair of eyes."


Thanks!