Dealing With Broken Stack Pointers After Fork

goolmoos · Post by **goolmoos** » Sat Apr 24, 2021 3:30 am

I am currently implementing fork.
The way I do my context switching is that when I decide I need to switch, I call the switch function that:
pushes the registers onto the stack,
saves the current stack pointer,
loads the new one,
[optionally copies some memory] <---- used by fork
pops the registers,
sets the curr_process variable.

So, I implemented fork by:
copying all user memory, and creating new pages tables,
allocating a kernel stack for the new process (all the kernel stacks are always mapped in the higher half),
making a "fake" switch, a switch to myself, that uses the copy option in the context switcher to copy the parent kernel stack to the child kernel stack.
adjusting the saved stack pointer of the child (child is switched out) to point to his own kernel stack.

after this, the child process looks exactly the way the parent looked paused, so I can now switch to it, and it would run just like the parent did (except different pid).

This currently works, but I have realized there is a problem. Although It hasn't occurred me, that is just pure luck, and probably won't last.

The problem is that since the kernel stack moved, all the pointers to it are invalid if made from the child, since they point to the parent stack, which we can now gurantee nothing about. the stack pointer itself I can easily adjust, but it is not the only one.
Explicit pointers to variables on the stack are obviously broken, but they are not all that breaks, so avoiding them wouldn't even help.
All the saved bp values, in all function frames, are now ruined (for the child), since they point to the parent. since all local variables are referenced relative to bp, ALL the local variables are messed up now.

To confirm my worries, running the following code in the kernel:

Code: Select all

int x = 10;
Process::fork();
klog("(pid: %d, x: %d)", getpid(), x);

prints (pid: 1, x: 10)(pid: 2, x: 0)

To solve this, I considered several approaches.

1: implementing fork by saving the context of the parent, and jumping directly to the child, while setting ax = 0, and in doing so avoiding any further use of the kernel stack until the next interrupt, at which point everything will be alright. However, since I want to be able to fork kernel processes too, this approach cannot work.

2: Instead of mapping all the kernel stacks at the same time to different kernel virtual addresses, mapping all kernel stacks(other than the special ones e.g. double fault stack) to the same virtual address. This does complicate the matter of copying the kernel stack, but I can always use the mapped physical memory (I have a mapping for all physical memory in continuous virtual address space).

Thus, my question is: Is there a better way (than option 2) to implement fork here?

nullplan · Post by **nullplan** » Sat Apr 24, 2021 5:22 am

I see what the problem is: fork() is really not a kernel space primitive. In kernel space, you should use something like a "spawn()" primitive, that takes a function pointer as argument, and runs the function in a new thread. This avoids this problem entirely, since the kernel stack is cleared up completely.

For user space fork(), you also create a new kernel stack entirely, this time with the stack set up such that after a context switch there, it will immediately just "return" to userspace with the registers on stack, and initialize those registers to be the same as the parent's, except the return value register is 0.

In your example, if the fork() is happening in userspace, nothing bad will happen, since the stack stays in place, it just refers to different memory now. And in kernel space, a completely new stack is used, so nothing can be left over.

In case you are unsure what I mean, here's some pseudo-code:

Code: Select all

/* For x86-64: */
struct kstack_init_frame {
  void *arg;
  void (*func)(void*);
  uint64_t startfunc;
  uint64_t kregs[6];
};
extern const char startkthread[];
int spawn_kthread(void (*func)(void*), void* arg) {
  struct task *tsk = get_new_task();
  struct kstack_init_frame *frame = (void*)((char*)tsk - sizeof (*frame));
  tsk->ti_flags = TI_KERNEL;
  memset(frame->kregs, 0, sizeof frame->kregs);
  frame->startfunc = (uint64_t)startkthread;
  frame->func = func;
  frame->arg = arg;
  tsk->sp = (uint64_t)frame;
  make_task_runnable(tsk);
  return 0;
}

Code: Select all

.global startkthread
startkthread:
  popq %rax /* func */
  popq %rdi /* arg */
  /* %rsp now points to current task */
  movq %rsp, %gs:CPU_CURRENT
  andq $-16, %rsp
  call *%rax
exit_thread:
  movq %gs:CPU_CURRENT, %rdi
  call condemn_thread
1:
  call schedule
  jmp 1b

See, in a switch to a kernel task, you never need to switch CR3, or change ownership of the FPU, so this works. For user space it's a bit more involved, but not all that much. You need to switch CR3, load FPU, then return to userspace. My task switching model looks like this:

Code: Select all

void arch_task_switch(struct task *new)
{
  struct task *this = get_current_task();
  if (this->ti_flags & TI_FPU)
    arch_fpu_save(this->fpu_save_area);
  arch_lowlevel_task_switch(&this->sp, new->sp);

  /* if we get here, we got scheduled back in. */
  set_current_task(this);
  if (!(this->ti_flags & TI_KERNEL))
    set_cr3(this->cr3);
  if (this->ti_flags & TI_FPU)
    arch_fpu_load(this->fpu_load_area);
}

Code: Select all

.global arch_lowlevel_task_switch
.type arch_lowlevel_task_switch, @function
arch_lowlevel_task_switch:
  /* %rdi points where we will put the current SP after we're done saving */
  /* %rsi is the new SP */
  pushq %r15
  pushq %r14
  pushq %r13
  pushq %r12
  pushq %rbp
  pushq %rbx
  movq %rsp, (%rdi)
  movq %rsi, %rsp
  popq %rbx
  popq %rbp
  popq %r12
  popq %r13
  popq %r14
  popq %r15
  retq

goolmoos · Post by **goolmoos** » Sat Apr 24, 2021 6:02 am

Thank you! This solves my issue. I think the part that I was missing was indeed the separation of userspace fork() and kernel thread spawning.

thewrongchristian · Post by **thewrongchristian** » Sat Apr 24, 2021 11:15 am

goolmoos wrote: The problem is that since the kernel stack moved, all the pointers to it are invalid if made from the child, since they point to the parent stack, which we can now gurantee nothing about. the stack pointer itself I can easily adjust, but it is not the only one.
Explicit pointers to variables on the stack are obviously broken, but they are not all that breaks, so avoiding them wouldn't even help.
All the saved bp values, in all function frames, are now ruined (for the child), since they point to the parent. since all local variables are referenced relative to bp, ALL the local variables are messed up now.

To solve this, I considered several approaches.

1: implementing fork by saving the context of the parent, and jumping directly to the child, while setting ax = 0, and in doing so avoiding any further use of the kernel stack until the next interrupt, at which point everything will be alright. However, since I want to be able to fork kernel processes too, this approach cannot work.

2: Instead of mapping all the kernel stacks at the same time to different kernel virtual addresses, mapping all kernel stacks(other than the special ones e.g. double fault stack) to the same virtual address. This does complicate the matter of copying the kernel stack, but I can always use the mapped physical memory (I have a mapping for all physical memory in continuous virtual address space).

Thus, my question is: Is there a better way (than option 2) to implement fork here?

I use option 3

Within my kernel level thread fork, I walk the child stack, looking for what might be pointers into the parent stack, and adjust them accordingly to point to the equivalent location in the child stack. My user level fork then already has a copy of the user registers on the now child stack, so I just return up the child call stack like the parent does, but with the 0 return value instead of the parent which returns the child pid, and return from the system call as normal.

It works OK for me, and in my child, all pointers to automatic variables are still valid (just in the child stack now) but it's not something I abuse.

I could, and have considered, switch to the scheme described by nullplan, the only reason I haven't is that my current scheme works and I've no breakage to fix.

Korona · Post by **Korona** » Sat Apr 24, 2021 12:41 pm

thewrongchristian wrote:Within my kernel level thread fork, I walk the child stack, looking for what might be pointers into the parent stack, and adjust them accordingly to point to the equivalent location in the child stack. My user level fork then already has a copy of the user registers on the now child stack, so I just return up the child call stack like the parent does, but with the 0 return value instead of the parent which returns the child pid, and return from the system call as normal.

Are you "fixing" pointers on the kernel stack or on the usermode stack? On the usermode stack, that is not necessary since the child does not share its address space with the parent. On the kernel stack, that sounds like a horrible security issue (you cannot know if an given word is a pointer or not...).

fork() should not mess with kernel stacks at all -- it should just copy the usermode half of the address space.

rdos · Post by **rdos** » Sat Apr 24, 2021 1:27 pm

Fork is a usermode function and should only make a copy of usermode, not kernel mode. Although, how much of the kernel state, like handles, really should be copied is quite unclear to me, but I currently do copy all handles and increase their reference count.

Another thing is that it's not my impression that you should copy user mode. Rather, both the parent and child should mark all their pages as copy-on-write, and actually share all data without any copy. When one side does a write, a new physical page is allocated, the content is copied and the copy-on-write bits are reset in both contexts. This is what is so complex with fork, particularly if there are nested forks without exec. When an exec is performed, the other process (typically the parent) will get all it's copy-on-write bits reset and in the child, pages that are not copy-on-write must be freed.

rdos · Post by **rdos** » Sat Apr 24, 2021 3:03 pm

Korona wrote:
thewrongchristian wrote:Within my kernel level thread fork, I walk the child stack, looking for what might be pointers into the parent stack, and adjust them accordingly to point to the equivalent location in the child stack. My user level fork then already has a copy of the user registers on the now child stack, so I just return up the child call stack like the parent does, but with the 0 return value instead of the parent which returns the child pid, and return from the system call as normal.
Are you "fixing" pointers on the kernel stack or on the usermode stack? On the usermode stack, that is not necessary since the child does not share its address space with the parent. On the kernel stack, that sounds like a horrible security issue (you cannot know if an given word is a pointer or not...).

I wouldn't need to fix any pointers on the kernel mode stack since all kernel mode stacks start at 0x1000 and grows down to 0. When a new thread or process is created a new kernel mode stack selector is allocated.

Korona wrote: fork() should not mess with kernel stacks at all -- it should just copy the usermode half of the address space.

Right. Fork is a usermode function and should not operate on the kernel mode stack.

thewrongchristian · Post by **thewrongchristian** » Sat Apr 24, 2021 5:28 pm

Korona wrote:
thewrongchristian wrote:Within my kernel level thread fork, I walk the child stack, looking for what might be pointers into the parent stack, and adjust them accordingly to point to the equivalent location in the child stack. My user level fork then already has a copy of the user registers on the now child stack, so I just return up the child call stack like the parent does, but with the 0 return value instead of the parent which returns the child pid, and return from the system call as normal.
Are you "fixing" pointers on the kernel stack or on the usermode stack? On the usermode stack, that is not necessary since the child does not share its address space with the parent. On the kernel stack, that sounds like a horrible security issue (you cannot know if an given word is a pointer or not...).

fork() should not mess with kernel stacks at all -- it should just copy the usermode half of the address space.

Fixing pointers on my kernel stack. I wrote all this before even making it to user mode, but you're right, there is potentially user state stored on the kernel stack (the user registers are saved on the kernel stack), so user state could be corrupted by this stack pointer fix up. I'm not sure how exploitable this is though, fork doesn't have any arguments, so I don't use the user register state in the system call, so user mode shouldn't be able to coerce the kernel into doing something in an uncontrolled manner.

nullplan · Post by **nullplan** » Sat Apr 24, 2021 9:33 pm

thewrongchristian wrote:Within my kernel level thread fork, I walk the child stack, looking for what might be pointers into the parent stack, and adjust them accordingly to point to the equivalent location in the child stack.

Dear god! And if you have a false positive, you corrupt the stack. Plus, you are wasting stack space. I have noticed that the general pattern for fork() usage seems to be:

Code: Select all

if (!fork())
  do_something(), exit();

This means, the child is created with all the context necessary to allow crawling back up the stack, but it will never do so. All the stack that is already used up in the parent is copied to the child, but mostly needlessly. But because fork() doesn't know what the child is supposed to do, you need to keep a single stack frame around, and you have no good way of knowing where exactly that is, and what needs fixing up. Speaking of fixing up: Do you clean the rBP register in the child? Because that one might currently be pointing to the parent stack. And might cause the child to switch back to the parent's stack.

Yeah, those are the reasons why I avoided fork() in the kernel in the first place. Then I saw James Molloy's fork implementation, had to evacuate my stomach into the toilet, and vowed to never get that hacky. The way he does it was to copy all state from the parent to the child, then use paging tricks to put a new stack into the same virtual space. Meaning that in his kernel, kernel threads cannot share stack pointers. Which would bum me out, since I often use that for synchronization.

thewrongchristian · Post by **thewrongchristian** » Mon Apr 26, 2021 12:24 pm

nullplan wrote:
thewrongchristian wrote:Within my kernel level thread fork, I walk the child stack, looking for what might be pointers into the parent stack, and adjust them accordingly to point to the equivalent location in the child stack.
Dear god! And if you have a false positive, you corrupt the stack. Plus, you are wasting stack space. I have noticed that the general pattern for fork() usage seems to be:
Code: Select all
if (!fork())
  do_something(), exit();
This means, the child is created with all the context necessary to allow crawling back up the stack, but it will never do so. All the stack that is already used up in the parent is copied to the child, but mostly needlessly. But because fork() doesn't know what the child is supposed to do, you need to keep a single stack frame around, and you have no good way of knowing where exactly that is, and what needs fixing up. Speaking of fixing up: Do you clean the rBP register in the child? Because that one might currently be pointing to the parent stack. And might cause the child to switch back to the parent's stack.

Err, so reading between the lines, you don't approve, right ?

I mostly follow the code pattern you identified, so yes, I do waste some stack space. It's not much, persistent kernel threads are created at boot, with only a couple of stack frames worth of stack wasted. Most kernel threads are created in response to fork, and I also use fork() in the kernel to create pid 1, so I do rely on walking back up the stack in the child in both cases, and no stack is wasted.

One benefit of the pattern is that the call to whatever function I want to execute doesn't have some fixed format in terms of what arguments can be passed in. Some of my thread functions take no arguments, some take some context pointer, but in all cases, the function prototype exactly matches what I intend the function to do, rather than some generic wrapper declaration.

I do fix up ebp (32-bit only at this point) so my stack frames in the child are coherent, but the only thing I am worried about is the potential for false positives in the user registers saved to the kernel stack when calling fork(), if the user state happen to contain register values that point to the kernel stack location, these registers will have their values corrupted.

In remediation, I have been considering having a more conventional thread spawn, and handling fork as you suggested a few posts back.

This is all just old code that I implemented early in my kernel experiments, and it's worked without incident so far so I haven't been obliged to fix it yet.

But rest assured, I will.

Korona · Post by **Korona** » Mon Apr 26, 2021 2:15 pm

Why don't you just run some function on a fresh (= empty) kernel stack instead of doing a fork()-like operation on kernel stacks? If you need to pass stage, pass it on the heap or let the function that creates the kernel-thread copy some data block onto the top of the stack.

thewrongchristian · Post by **thewrongchristian** » Mon Apr 26, 2021 4:13 pm

Korona wrote:Why don't you just run some function on a fresh (= empty) kernel stack instead of doing a fork()-like operation on kernel stacks? If you need to pass stage, pass it on the heap or let the function that creates the kernel-thread copy some data block onto the top of the stack.

That's what I intend to do. The thread will take a void* and return a void*, which should cover all my scenarios.

nullplan · Post by **nullplan** » Tue Apr 27, 2021 8:56 am

thewrongchristian wrote:Err, so reading between the lines, you don't approve, right ?

I don't, but it's your OS, so what business is it of mine? I merely pointed out the weaknesses of that design, but you already knew about them.

thewrongchristian wrote:One benefit of the pattern is that the call to whatever function I want to execute doesn't have some fixed format in terms of what arguments can be passed in.

Marginal benefit, since you can always create a context structure. Although that is more complicated: If the context is allocated automatically, you also need some kind of synchronization to ensure the lifetime of the object doesn't end before the target thread has managed to read/copy all the important values. Alternative is to use a heap object for context, but then that heap allocation may fail. *Sigh* software development is hard. Then there's also static allocation, but static allocation is just the wrong type for an ad-hoc communication structure. Even in the best case you'd just end up blocking some memory permanently just in case a thread start comes along.

thewrongchristian wrote:This is all just old code that I implemented early in my kernel experiments, and it's worked without incident so far so I haven't been obliged to fix it yet.

Yeah, I know all about that. It's called "technical debt". You will pay interest on that debt one way or the other. Best to tackle it sooner rather than later. That said, it's working now, and maybe you have more interesting/important things to work on right now.

thewrongchristian · Post by **thewrongchristian** » Tue Apr 27, 2021 1:41 pm

nullplan wrote:
thewrongchristian wrote:Err, so reading between the lines, you don't approve, right ?
I don't, but it's your OS, so what business is it of mine? I merely pointed out the weaknesses of that design, but you already knew about them.

I always welcome constructive criticism. Valid points are valid points, and no-ones points on this subject have been invalid.

nullplan wrote:
thewrongchristian wrote:One benefit of the pattern is that the call to whatever function I want to execute doesn't have some fixed format in terms of what arguments can be passed in.
Marginal benefit, since you can always create a context structure. Although that is more complicated: If the context is allocated automatically, you also need some kind of synchronization to ensure the lifetime of the object doesn't end before the target thread has managed to read/copy all the important values. Alternative is to use a heap object for context, but then that heap allocation may fail. *Sigh* software development is hard. Then there's also static allocation, but static allocation is just the wrong type for an ad-hoc communication structure. Even in the best case you'd just end up blocking some memory permanently just in case a thread start comes along.

In most cases, the context is already dynamically allocated in the form of some sort of device structure (in the case of device background threads) so allocation and lifecycles is unlikely to be a problem.

nullplan wrote:
thewrongchristian wrote:This is all just old code that I implemented early in my kernel experiments, and it's worked without incident so far so I haven't been obliged to fix it yet.
Yeah, I know all about that. It's called "technical debt". You will pay interest on that debt one way or the other. Best to tackle it sooner rather than later. That said, it's working now, and maybe you have more interesting/important things to work on right now.

To make it manageable, I'll implement the new API in terms of the old API, combining the fork/call pattern into a single function. Then I can refactor that later as and when I remove my thread fork. I like paying debt back in instalments, and the lessons learned are reasonable interest.

OSDev.org

Dealing With Broken Stack Pointers After Fork

Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork

Re: Dealing With Broken Stack Pointers After Fork