Bug Hunting!

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt, post here.
Whatever5k

Bug Hunting!

Post by Whatever5k »

Hello there.
I have had a bug in my FORK routine for about half a year now, and I just cannot find the cause. I hope that you might be able to find it, although it is quite a lot of code and it assumes a certain amount of background knowledge. First, a description:

My FORK system call calls the function do_fork(). It creates a new stack (malloc) for the child and copies the parent's stack onto it. Next, it changes the relevant bits (EIP, EBP, EAX and so on) and puts the child on the process queue. That is the simplified story; there is more to it, like getting CR3 and so on, but those parts are not interesting.
I attached my kernel/fork.c and added some more comments that may be useful for you.
The problem:

Code: Select all

void func(void)
{
    int pid = fork();

    if (pid > 0) {              /* parent */
        for (;;)
            printf("blabla");
    } else {                    /* child */
        void *tmp = &pid;       /* remember where pid lives on the stack */
        for (;;)
            if ((void *)&pid != tmp)
                panic("Stack changed");
    }
}
This code causes the following: at first, the parent's code is executed (printing blabla). Then the child gets control. It runs through the for loop once, but on the second iteration (or the third, does it matter?) I get a panic (stack changed). That is the problem. I have no idea how it happens, but it must have something to do with FORK. Please help.

Thanks,
Alexander

PS: Maybe Tim or Pype will be able to find the bug?

[attachment deleted by admin]
BI lazy

Re:Bug Hunting!

Post by BI lazy »

Could you be so kind as to point me to a location where I can download the whole thing? Maybe I can do some tracing. For bug hunting, I use several techniques of my own, which sometimes involve placing printf's at strategic places, just to get a picture of where the code is at a given time. Stack dumps are also always interesting, especially right before process switches.

But ad hoc, with the given code, I for my part can't track it down.

Sorry for not being able to help immediately.
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Bug Hunting!

Post by Candy »

abless wrote: This code causes the following: at first, the parent's code is executed (printing blabla). Then the child gets control. It runs through the for loop once, but on the second iteration (or the third, does it matter?) I get a panic (stack changed). That is the problem. I have no idea how it happens, but it must have something to do with FORK. Please help.
As an initial thought, looking at the code:

You allocate a new stack for the forked process, which inevitably ends up at a different place (as far as I can tell):

child->sbox.user_base = malloc(current->sbox.user_size) + current->sbox.user_size - 1;

Note that the - 1 leaves the stack top aligned only to a byte boundary. Try using - sizeof(long) instead, so that it is aligned for native GPR-sized data.
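
To make the alignment point concrete, here is a minimal sketch in plain C. The helper names (top_minus_one, top_minus_word, is_word_aligned) are made up for illustration and do not come from the attached fork.c. It assumes the stack size is a multiple of sizeof(long) and that the base address is itself aligned for long, which malloc guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the poster's fork.c. */

/* Top of stack computed as in the quoted line: base + size - 1. */
static uintptr_t top_minus_one(const void *base, size_t size)
{
    return (uintptr_t)base + size - 1;
}

/* Top of stack computed as suggested: base + size - sizeof(long). */
static uintptr_t top_minus_word(const void *base, size_t size)
{
    assert(size % sizeof(long) == 0);   /* assumption: whole words only */
    return (uintptr_t)base + size - sizeof(long);
}

/* Nonzero if addr is a multiple of sizeof(long). */
static int is_word_aligned(uintptr_t addr)
{
    return addr % sizeof(long) == 0;
}
```

With a word-aligned base and size, top_minus_one() always lands on the last byte of the block, which is never word aligned, while top_minus_word() lands on the last full word.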

As for the loop, did you verify it only at the C level, or by checking the generated assembly code? If only in C, that doesn't say a lot. There is a very big chance the compiler uses a different addressing method, or moves the computation of &pid to before the fork call, or it might just determine that the two addresses are always equal and leave out the entire comparison.

I think your EBP is desynchronised with your ESP, because you use a new stack (at a different place). The only possible solution I can see is a complete overhaul of the code. You are intending to make a POSIX-ish fork() if I'm correct, so if you duplicate all userspace pages to new pages, then add a new process entry under the new name, and then change one value on the stack, you would have a compliant fork.

As an aside, are you trying to reimplement the entire POSIX standard, or is this fork part of your own design?
Whatever5k

Re:Bug Hunting!

Post by Whatever5k »

I am indeed trying to make my FORK routine POSIX-compliant, but I am not copying any code; I am working from my own ideas, which (hopefully) lead to the same result. I do copy all of the parent's pages to the child.
Maybe you are right about EBP; that is what I think at the moment, too. But why does it not fail immediately, but instead work for a while and then fail (during execution, and not during a task switch)?

@BI lazy:
Give me your e-mail address and I'll send you the code ;)
Candy

Re:Bug Hunting!

Post by Candy »

abless wrote: I am indeed trying to make my FORK routine POSIX-compliant, but I am not copying any code; I am working from my own ideas, which (hopefully) lead to the same result. I do copy all of the parent's pages to the child.
Maybe you are right about EBP; that is what I think at the moment, too. But why does it not fail immediately, but instead work for a while and then fail (during execution, and not during a task switch)?
If I'm right, then you don't have to allocate a new stack for it as you do now (afaics), because the stack of the child process will be identical to that of the parent process, with the exception of the return value in EAX.

Allocating a new stack is a bad idea either way: you'd have to scan the ENTIRE return stack for all stored EBPs and modify them accordingly. I for one don't think you can pull it off in this fashion.

Allocating a new kernel stack is a must, but as you put it, you don't use it yet, so that's not an issue here.

If your OS can run on Bochs, try setting breakpoints and debugging from there to see exactly what happens.
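
The problem with stored EBPs can be shown in a toy model. The sketch below is not the poster's kernel code: it assumes a "stack" that is just an array of words, with three made-up frames whose saved-EBP slot holds the absolute address of the caller's slot. All names (build_frames, points_into, rebase_chain) are hypothetical. After a plain memcpy, every link in the copy still points into the original array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 16

/* Build three toy frames at slots 4, 8 and 12. Each saved-EBP slot holds
 * the ABSOLUTE address of the caller's slot; 0 ends the chain.
 * Returns the slot index of the innermost frame. */
static size_t build_frames(uintptr_t *stack)
{
    memset(stack, 0, STACK_WORDS * sizeof *stack);
    stack[8] = (uintptr_t)&stack[12];
    stack[4] = (uintptr_t)&stack[8];
    return 4;
}

/* Nonzero if addr lies inside the given toy stack. */
static int points_into(const uintptr_t *stack, uintptr_t addr)
{
    return addr >= (uintptr_t)stack &&
           addr <  (uintptr_t)(stack + STACK_WORDS);
}

/* Walk the chain in `copy` and rebase every link that still points into
 * `orig`. Returns the number of links that had to be fixed. */
static int rebase_chain(uintptr_t *copy, const uintptr_t *orig, size_t ebp)
{
    int fixed = 0;
    size_t i = ebp;

    assert(ebp < STACK_WORDS);
    while (copy[i] != 0) {
        if (points_into(orig, copy[i])) {
            size_t idx = (copy[i] - (uintptr_t)orig) / sizeof(uintptr_t);
            copy[i] = (uintptr_t)&copy[idx];
            fixed++;
        }
        if (!points_into(copy, copy[i]))
            break;                      /* unknown pointer: give up */
        i = (copy[i] - (uintptr_t)copy) / sizeof(uintptr_t);
    }
    return fixed;
}
```

The rebasing only works here because the toy guarantees that every frame keeps a frame pointer and that nothing else on the stack points into the stack. A real stack gives neither guarantee: a local such as tmp = &pid is just a word that happens to hold a stack address, and no chain leads to it.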
Whatever5k

Re:Bug Hunting!

Post by Whatever5k »

No, I do not think so.
I think it is better (and common) to allocate a new stack for the child, to separate the processes. What do the others think? Does anybody know how POSIX does it?
Candy

Re:Bug Hunting!

Post by Candy »

abless wrote: No, I do not think so.
I think it is better (and common) to allocate a new stack for the child, to separate the processes. What do the others think? Does anybody know how POSIX does it?
Well, POSIX REQUIRES that they have identical pages (called shared in the standard, but I object to that description because they are still COW). That means that if you want them to comply with POSIX, they must share the same stack; otherwise a LOT of stuff is going to have to change. You are making your situation nearly impossible.
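
On an existing POSIX system this is easy to observe from user space. The following is a minimal sketch, assuming a Unix-like host with fork(), pipe() and waitpid(); the function name same_address_different_value is made up. The child reports the address and the value of its own pid variable through a pipe, and the parent compares them with its own.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child saw `pid` at the same virtual address as the
 * parent but holding 0, returns 0 if not, and -1 on error. */
static int same_address_different_value(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                     /* child */
        uintptr_t msg[2] = { (uintptr_t)&pid, (uintptr_t)pid };
        ssize_t n = write(fds[1], msg, sizeof msg);
        _exit(n == (ssize_t)sizeof msg ? 0 : 1);
    }

    assert(pid > 0);                    /* parent from here on */
    close(fds[1]);

    uintptr_t report[2] = { 0, 1 };
    ssize_t n = read(fds[0], report, sizeof report);
    close(fds[0]);

    int status;
    waitpid(pid, &status, 0);
    if (n != (ssize_t)sizeof report)
        return -1;

    /* Same address in both processes, but only the child holds 0 there. */
    return report[0] == (uintptr_t)&pid && report[1] == 0;
}
```

Both processes find pid at the same virtual address on the stack; only the contents behind that address differ, because each process has its own copy of the page.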

For thread creation calls (and process creation calls that also create an initial thread, such as fork), they either have a new entry point and do not return from there (using exit or something similar), or they can return, in which case they must have the exact same stack as the original one.

You cannot adjust all EBPs on the stack (fact).
You can adjust the ESP and the current stack.
Most, if not all, compilers generate code with ENTER/LEAVE or equivalent code sequences, which copy ESP into EBP at the start of the function and copy EBP back into ESP at the end. So, if you change only ESP, it is completely useless, since at the end of the function it will be overwritten by the old value again. Even if you manage to fix the immediate EBP, which you can, you cannot trace the entire stack, so the calling function will not work, and so on.

You cannot be POSIX-compliant if you change the stack.

Therefore, my point remains: you cannot get it to work in a fully POSIX-compliant way, whether this is the current cause of your problems or not.

And as an aside, you cannot get it to work reliably this way without adding the prerequisite that there must be a non-returning function call after the fork() call (pushing the old EBP and loading EBP with the new ESP).
Whatever5k

Re:Bug Hunting!

Post by Whatever5k »

So how do you want it to work with two processes returning a different value? How should that work?
Candy

Re:Bug Hunting!

Post by Candy »

abless wrote: So how do you want it to work with two processes returning a different value? How should that work?
Sample code, as I hinted at a few posts ago:

Code: Select all

pid_t fork() {
   pid_t pid, oldpid;
   tid_t oldtid;

   enter_critical_region();

   pid = allocate_new_process();

   oldpid = get_current_pid();
   oldtid = get_current_tid();

   /* the parent gets the child's pid in its saved eax */
   processes[oldpid].thread[oldtid].stack[place_for_eax] = pid;

   /* user pages (including the user stack) keep their addresses */
   copy_all_pages_to_new_process(pid, oldpid);

   allocate_new_kernel_stack_for_new_process();

   copy_old_kernel_stack_to_new_one();

   /* the child gets 0 at the same place, on its own kernel stack */
   alter_pid_value_on_new_kernel_stack(using_stack_bases, to_zero, 0);

   leave_critical_region();
}
Note: both pids are at the same place, only in different stacks.
BI lazy

Re:Bug Hunting!

Post by BI lazy »

Since you copy the hardware context, you can also lay a context struct over the copied stack to correct some values.

Sidenote: copying the user stack in a paged environment is *maybe* tricky, but you simply don't need to relocate anything, because it lands in the same address region, only in a different address space (cr3 rulez! *hehe*).
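
A toy model may help to picture "same address region, different address space". The sketch below is not real paging code: struct address_space is a made-up stand-in for the page tables that CR3 selects, with tiny 16-byte "pages", and translate and clone_space are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 16u   /* tiny pages keep the toy small */
#define NUM_PAGES 4u

/* Toy address space: maps each virtual page to a backing frame. */
struct address_space {
    uint8_t *frame[NUM_PAGES];
};

/* Translate a virtual address to the byte that backs it. */
static uint8_t *translate(const struct address_space *as, uint32_t vaddr)
{
    assert(vaddr < PAGE_SIZE * NUM_PAGES);
    return as->frame[vaddr / PAGE_SIZE] + vaddr % PAGE_SIZE;
}

/* Give the child its own frames holding the parent's contents, mapped
 * at the same virtual pages. Nothing inside the pages is relocated. */
static void clone_space(struct address_space *child,
                        const struct address_space *parent,
                        uint8_t (*child_frames)[PAGE_SIZE])
{
    for (size_t i = 0; i < NUM_PAGES; i++) {
        memcpy(child_frames[i], parent->frame[i], PAGE_SIZE);
        child->frame[i] = child_frames[i];
    }
}
```

After clone_space(), any pointer stored in the copied pages is still valid in the child, because it is a virtual address and the child maps the same virtual pages. A write through the child's mapping does not show up in the parent.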