SOLVED Strange solution to strange problem.

Posted: Sun Apr 01, 2012 12:37 pm
by dontpanic42
Hi, I'm a longtime lurker, but now I've got a problem, so: first post.

I'm currently trying to get multitasking with usermode applications working. The task-switching and fork code follow the general ideas of JamesM's tutorials, with the exception that the kernel stacks (tss.esp0) are always mapped at the same address in userspace. So when forking, the current process's stack gets copied (I haven't implemented copy-on-write yet). In my kernel's main I do something like this:

Code:

	
switch_to_usermode();
int ret = syscall_fork();          // <-- this works
if(!ret) {
	syscall_execvp("/boot/test2"); // <-- inside there it doesn't
}
This totally works (kernel pages are still user-readable/writable). But when calling syscall_fork in the usermode application (test2), I get an illegal opcode exception inside fork for the calling process (no problems with other syscalls). Tell me if I'm wrong, but I think that can only happen when there is some kind of stack corruption. My fork:

Code:

int fork() {
	asm volatile("cli");
	task_t *parent_task = (task_t*) current_task;
	UINT32 physical;
	UINT32 esp, ebp, eip;
	task_t *new_task = (task_t*) kvmalloc(sizeof(task_t));
	
	clone_directory(FALSE, &physical);	
	
	new_task->pid = get_next_pid();
	new_task->esp = new_task->ebp = 0;
	new_task->eip = 0;
	new_task->directory_physical = physical;
	new_task->kernel_stack = parent_task->kernel_stack;
	new_task->next = 0;
        ...
	copy_open_files(new_task, parent_task);
	
	
	BOCHS_BREAKPOINT;
	
	task_t *tmpt = ready_queue;		
	while(tmpt->next) tmpt = tmpt->next;
	tmpt->next = new_task;	// <-- exactly here: illegal opcode exception (for the parent_task)
	
	eip = read_eip();
	
	
	if(current_task == parent_task) {
		asm volatile("mov %%esp, %0" : "=r" (esp));
		asm volatile("mov %%ebp, %0" : "=r" (ebp));
		new_task->esp = esp;
		new_task->ebp = ebp;
		new_task->eip = eip;
		
		return new_task->pid;
	} else {
		return 0;
	}
}
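The fork above relies on read_eip() "returning twice": once in the parent, and again when the scheduler later jumps the child to the saved eip with the saved esp/ebp, at which point current_task != parent_task. A rough hosted-C analogy, assuming nothing about this kernel, is setjmp()/longjmp(); the function name resume_demo is hypothetical. One difference matters in this design: the child resumes on the copy of the stack that clone_directory() took, so any local written after that call still holds its older value in the child.

```c
#include <assert.h>
#include <setjmp.h>

/* Hosted-C analogy only, not kernel code: setjmp() stands in for
   read_eip(), and longjmp() for the scheduler jumping to new_task->eip. */
static jmp_buf resume_point;

/* Returns how many times control arrived at the saved resume point (2). */
int resume_demo(void)
{
    /* volatile: modified after setjmp() and read again after longjmp() */
    volatile int passes = 0;

    if (setjmp(resume_point) == 0) {
        /* First return, like the parent right after read_eip(). */
        passes++;
        assert(passes == 1);
        /* "Switch" to the saved context, as the scheduler later does for the child. */
        longjmp(resume_point, 1);
    }
    /* Second return, like the child resuming at the saved eip. */
    passes++;
    return passes;
}
```

Unlike the kernel case, both returns here happen in the same stack frame, so the analogy covers only the control flow, not the copied stack.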
I've been trying to find my mistake for 3 days now and debugged with Bochs, but had no success. Then, by accident, I changed this

Code:

task_t *tmpt = ready_queue;
while(tmpt->next) tmpt = tmpt->next;
tmpt->next = new_task;
to this

Code:

if(parent_task->pid != 0) {
	task_t *tmpt = ready_queue;
	while(tmpt->next) tmpt = tmpt->next;
	tmpt->next = new_task;
} else {
	task_t *tmpt = ready_queue;
	while(tmpt->next) tmpt = tmpt->next;
	tmpt->next = new_task;
}
which is obviously nonsense. But with the 'if', everything works just fine. No problems or errors at all.
So my questions are: How does the 'if' construct affect the stack? What does that tell me about my problem? What the f***?
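Both branches of that 'if' perform the same tail append on the ready queue, so they can be factored into one helper. This sketch is only a restructuring and does not by itself address the stack problem; ready_queue_append and the two-field task_t are hypothetical. Walking a pointer to the link rather than to the node also covers an empty queue, which the original loop would dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal task type: only the fields this sketch needs. */
typedef struct task {
    int pid;
    struct task *next;
} task_t;

/* Append `t` at the tail of a singly linked ready queue. */
void ready_queue_append(task_t **queue, task_t *t)
{
    assert(queue != NULL && t != NULL);
    t->next = NULL;              /* the new tail terminates the list */
    task_t **link = queue;
    while (*link != NULL)        /* advance to the final NULL link */
        link = &(*link)->next;
    *link = t;
}
```

In the fork, the duplicated branches would then collapse to a single call such as ready_queue_append(&ready_queue, new_task).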

The gdb disassembly and the test2 code are at http://pastebin.com/T06VRuzP.

Sorry if my question is noobish; I'm still learning :-). And sorry for typos and grammar; English is not my native language.

Re: Strange solution to strange problem.

Posted: Sun Apr 01, 2012 12:45 pm
by bluemoon
It is very tricky to share stacks across processes. For simplicity, give each process a dedicated kernel stack.

Code:

new_task->kernel_stack = parent_task->kernel_stack;
If you don't have CoW yet, do a memcpy there. Since the kernel stack is small and will be written to soon anyway, it makes no difference whether you copy it now or very soon, when the process resumes.

And you want to provide two copies of the user stack too, either by copying it now or with CoW when needed.
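A minimal sketch of the dedicated-kernel-stack approach suggested above, assuming a fixed-size kernel stack per task (KSTACK_SIZE is hypothetical), malloc() standing in for kvmalloc(), and hypothetical task_t fields. It copies the parent's stack and rebases the saved esp/ebp to the same offsets inside the copy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KSTACK_SIZE 4096u   /* hypothetical per-task kernel stack size */

/* Hypothetical task fields: base of the kernel stack plus saved pointers. */
typedef struct task {
    uintptr_t kernel_stack;  /* lowest address of the stack region */
    uintptr_t esp, ebp;      /* saved stack and frame pointers */
} task_t;

/* Give `child` a private copy of `parent`'s kernel stack and shift the saved
   esp/ebp so they point at the same offsets inside the copy (assumes both lie
   within the stack). Frame pointers stored *inside* the copied frames still
   refer to the parent's stack; real code must fix them up or not rely on them. */
int clone_kernel_stack(task_t *child, const task_t *parent)
{
    assert(parent->esp >= parent->kernel_stack &&
           parent->esp <= parent->kernel_stack + KSTACK_SIZE);

    unsigned char *copy = malloc(KSTACK_SIZE);   /* stand-in for kvmalloc() */
    if (copy == NULL)
        return -1;
    memcpy(copy, (const void *)parent->kernel_stack, KSTACK_SIZE);

    child->kernel_stack = (uintptr_t)copy;
    child->esp = child->kernel_stack + (parent->esp - parent->kernel_stack);
    child->ebp = child->kernel_stack + (parent->ebp - parent->kernel_stack);
    return 0;
}
```

With one kernel stack per task, the scheduler would also have to point tss.esp0 at the top of the incoming task's stack on every switch, because that is where the CPU loads the ring-0 stack pointer from on a privilege change.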

Re: Strange solution to strange problem.

Posted: Sun Apr 01, 2012 12:55 pm
by dontpanic42
Thanks for answering,

The stack isn't shared: in clone_directory, all physical frames mapped in userspace are copied and mapped at the same location in the new_task's address space.
bluemoon wrote: And you want to provide two copies of the user stack too
The same applies to the user stack: since it's in userspace, it gets copied.

Re: Strange solution to strange problem.

Posted: Mon Apr 02, 2012 1:42 am
by JamesM
dontpanic42 wrote: Thanks for answering,

The stack isn't shared: in clone_directory, all physical frames mapped in userspace are copied and mapped at the same location in the new_task's address space.
bluemoon wrote: And you want to provide two copies of the user stack too
The same applies to the user stack: since it's in userspace, it gets copied.
Why would the kernel stack be in userspace? Surely it'd be in the top 2GB of address space and therefore not cloned.

Re: Strange solution to strange problem.

Posted: Mon Apr 02, 2012 3:39 am
by dontpanic42
It's in userspace (<0xC0000000) so that it gets copied in the initial kernel fork (otherwise it would be shared between pid 0 and 1). See this thread, second post. I could probably relocate the stack, but at this point I see no harm in doing it that way...
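Under the split described here, where clone_directory() copies every mapped frame below 0xC0000000 and the kernel mappings above it are shared rather than cloned, the address of the kernel stack decides whether each task gets its own copy. This toy model illustrates only that policy; toy_dir_t, toy_alloc_frame() and toy_clone() are hypothetical and stand in for real page directories and a physical frame allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define NUM_PAGES     8u            /* a tiny toy address space */
#define MAX_FRAMES    16u
#define KERNEL_BASE   0xC0000000u   /* user/kernel split used in this thread */

/* Toy page directory: entry i maps virtual page vaddr[i] to frame[i]. */
typedef struct {
    uint32_t vaddr[NUM_PAGES];
    unsigned char *frame[NUM_PAGES];   /* NULL means "not mapped" */
} toy_dir_t;

/* Toy frame pool standing in for a physical frame allocator. */
static unsigned char frame_pool[MAX_FRAMES][TOY_PAGE_SIZE];
static unsigned next_frame = 0;

unsigned char *toy_alloc_frame(void)
{
    assert(next_frame < MAX_FRAMES);
    return frame_pool[next_frame++];
}

/* Copy user pages (below KERNEL_BASE) into fresh frames; share kernel pages. */
void toy_clone(toy_dir_t *dst, const toy_dir_t *src)
{
    for (unsigned i = 0; i < NUM_PAGES; i++) {
        dst->vaddr[i] = src->vaddr[i];
        if (src->frame[i] == NULL) {
            dst->frame[i] = NULL;                 /* unmapped stays unmapped */
        } else if (src->vaddr[i] >= KERNEL_BASE) {
            dst->frame[i] = src->frame[i];        /* kernel half: same frame */
        } else {
            dst->frame[i] = toy_alloc_frame();    /* user half: private copy */
            memcpy(dst->frame[i], src->frame[i], TOY_PAGE_SIZE);
        }
    }
}
```

A write to the parent's user page after toy_clone() leaves the child's copy unchanged, since the copy is a snapshot taken at clone time; a write to a shared kernel page is seen by both.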

EDIT:

OK, problem solved. First, I had a stack overflow in my filesystem implementation. Second, I had to reorder the code as follows:

Code:

	
	UINT32 esp, ebp, eip;
	task_t *new_task = (task_t*) kvmalloc(sizeof(task_t));

	task_t *tmpt = ready_queue;		
	while(tmpt->next) tmpt = tmpt->next;
	tmpt->next = new_task;
	
	directory = clone_directory(FALSE, &physical);	
It seems my assumptions about when gcc puts local variables on the stack were wrong. I also learned that when compiling something like this

Code:

if(xy) { 
    int a; 
}else { 
    int a; 
}
with -O0, "a" actually gets allocated on the stack twice (I always thought that would be optimized away). That explains why it worked with the "if" construct described in my first post (further explanation here, second answer, if anyone is interested).
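One way to check this on a given compiler is to build two equivalent functions with gcc -O0 -S and compare the frame offsets assigned to each a in the generated assembly. The function names are hypothetical, and the exact offsets depend on the GCC version and target, so no particular output is claimed here.

```c
#include <assert.h>

/* One `a` in a single scope. */
int one_scope(int xy)
{
    int a = xy ? 1 : 2;
    return a;
}

/* Two `a`s in sibling scopes; at -O0, GCC may give each its own stack slot. */
int two_scopes(int xy)
{
    if (xy) {
        int a = 1;
        return a;
    } else {
        int a = 2;
        return a;
    }
}

/* Both functions compute the same value; only the frame layout can differ. */
void check_same_results(void)
{
    for (int xy = 0; xy <= 1; xy++)
        assert(one_scope(xy) == two_scopes(xy));
}
```

Because the fork's child resumes mid-function on a copied stack, a change in frame layout like this changes which slots exist at the moment of the copy, which is consistent with the 'if' construct changing the behavior even though it does nothing semantically.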