fork theory?


Post by yemista »

Why do UNIX systems use fork to create a new process? I am just wondering what the reasoning behind this is. I have been studying JamesM's tutorial on multitasking and looking at his code for creating a new process. It makes sense what he is doing, but my first inclination for implementing this in my own kernel was to do something like this:

Code:

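pde_t* create_user_space(u32 start, u32 end) {
  u32 i;
  // allocate and zero the page directory for the new process
  pde_t* user_pde = (pde_t*)prim_malloc_a(sizeof(pde_t));
  mem_set((u32*)user_pde, 0, sizeof(pde_t));
  
  // make sure address range is page aligned:
  // round start down, and move end past the page that contains it
  start &= 0xFFFFF000;
  end &= 0xFFFFF000;
  end += 0x1000;

  // create page tables and allocate a frame for every page in the range
  for(i = start; i < end; i += 0x1000) 
    alloc_frame(get_page(i, 1, user_pde), 0, 0);

  // make sure kernel space is mapped into user process.
  // tables[] is indexed by page-directory entry (address >> 22, one per
  // 4 MiB), so walk directory indices instead of using address / 1024
  for(i = KERNEL_START >> 22; i <= (KERNEL_END - 1) >> 22; i++) {
    user_pde->tables[i] = kernel_pde->tables[i];
    user_pde->tables_phys[i] = kernel_pde->tables_phys[i];
  }

  return user_pde;
}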
So basically, when you want to create a process, you call this function to get a page directory for it and create its address space. The function creates page tables for the address range you expect the process to occupy and copies in the kernel page tables as well. I admit this function isn't working 100% yet, but the basic idea is that you create a page directory, create and allocate tables for the address range the executable expects to run in, and just link in the kernel tables, as opposed to cloning a directory. Is there something I'm missing here that makes this a bad idea, or would this work for creating processes and switching to them? I don't plan to implement fork, but rather just set up processes from scratch.
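
The address arithmetic in a function like create_user_space can be checked on its own, outside the kernel. Below is a minimal sketch, assuming 32-bit x86 two-level paging with 4 KiB pages and one page-directory entry per 4 MiB, and treating end as the last address of the range (the posted loop always maps the page that contains end). The helper names page_floor, pde_index, pte_index and pages_spanned are hypothetical and not from the tutorial.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u      /* 4 KiB pages (assumption) */
#define PAGE_MASK 0xFFFFF000u  /* clears the 12 offset bits */
#define PDE_SHIFT 22           /* one directory entry covers 4 MiB */

/* Round an address down to the start of its page. */
static uint32_t page_floor(uint32_t addr)
{
    return addr & PAGE_MASK;
}

/* Index of the page-directory entry (page table) that covers addr. */
static uint32_t pde_index(uint32_t addr)
{
    return addr >> PDE_SHIFT;
}

/* Index of the page within that page table (0..1023). */
static uint32_t pte_index(uint32_t addr)
{
    return (addr >> 12) & 0x3FFu;
}

/* Number of pages needed to cover start..end, with end inclusive. */
static uint32_t pages_spanned(uint32_t start, uint32_t end)
{
    assert(start <= end);
    return (page_floor(end) - page_floor(start)) / PAGE_SIZE + 1;
}
```

For a kernel linked at 0xC0000000, pde_index gives 768, which is the kind of value that fits a 1024-entry directory array; dividing the address by 1024 does not.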

Re: fork theory?

Post by NickJohnson »

The idea behind fork() is that you are able to implicitly preserve all of the permissions, file descriptors, and memory of a process while making a new one. There are many advantages to this design.

First of all, the combination of fork() and exec() is more flexible than simply creating a new address space and executing in it. You can make extra lightweight processes at any point, and have them continue from the current execution state.

Second, the file descriptors a process has open are duplicated, which means the filesystem does not have to authenticate and reopen them. It also means descriptors can be redirected before the new program starts: a shell can fork(), replace its stdout and stdin with pipes, and then exec(). This is the only reasonable way to implement pipelines, an essential part of a *nix system.

Last, it is impossible not to marvel at how the undoubtedly most powerful system call possible takes no arguments. It is invaluable to be able to take all of the stress off of the system to make a new process and context, when there are many already in existence on the system.
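
The shell pipeline case described above can be sketched in user space with the standard POSIX calls pipe(), fork(), dup2() and execvp(). This is only an illustration of the pattern, not how any particular shell is written; the function name run_pipeline is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `producer | consumer`.
 * Returns the consumer's exit status, or -1 on error. */
int run_pipeline(char *const producer[], char *const consumer[])
{
    int fds[2];

    assert(producer && producer[0] && consumer && consumer[0]);
    if (pipe(fds) == -1)
        return -1;

    pid_t left = fork();
    if (left == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (left == 0) {
        /* Child 1: stdout becomes the write end of the pipe, then exec. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(producer[0], producer);
        _exit(127);               /* reached only if exec failed */
    }

    pid_t right = fork();
    if (right == -1) {
        close(fds[0]);
        close(fds[1]);
        waitpid(left, NULL, 0);
        return -1;
    }
    if (right == 0) {
        /* Child 2: stdin becomes the read end of the pipe, then exec. */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(consumer[0], consumer);
        _exit(127);
    }

    /* Parent: close both ends so the consumer sees EOF when the
     * producer finishes. */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    waitpid(left, NULL, 0);
    if (waitpid(right, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Neither program needs to know it is part of a pipeline. Each one just reads descriptor 0 and writes descriptor 1, and the redirection happens in the gap between fork() and exec().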

Re: fork theory?

Post by bewing »

The big disadvantage is that if you do not use "copy on write", then the overhead of forking something like a shell process is enormous -- and the overhead of "copy on write" itself is pretty large. In some cases, it can be a much more lightweight solution to be able to exec() directly from a child thread. And one small part of the exec function is to create an address space the way you did, above.
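
For comparison, POSIX also defines posix_spawn(), which creates a process running a new program in one call instead of a separate fork() and exec(). It is not mentioned in this thread and is brought in here only as an existing example of the "create directly" style; how it is implemented underneath varies by system. A minimal sketch, with spawn_and_wait as a hypothetical helper name:

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;   /* the caller's environment, passed to the child */

/* Start argv[0] (looked up in PATH) as a new process and wait for it.
 * Returns the child's exit status, or -1 on error. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status = 0;

    assert(argv && argv[0]);

    /* NULL file actions and attributes: the child keeps the caller's
     * open descriptors and default attributes. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything the child should have different from the parent has to be described up front through the file-actions and attribute arguments, which is the trade-off against doing it in ordinary code between fork() and exec().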

Re: fork theory?

Post by NickJohnson »

Well, you could implement lightweight processes and exec directly from one of those. Then the overhead wouldn't be too much.

Re: fork theory?

Post by yemista »

NickJohnson wrote: Last, it is impossible not to marvel at how the undoubtedly most powerful system call possible takes no arguments. It is invaluable to be able to take all of the stress off of the system to make a new process and context, when there are many already in existence on the system.
Yes, but all that matters is the total work done. Either way, if the process is going to be used to execute a new program, you will have to set up the address space for that program.

So basically, if I don't implement fork, I might have some trouble down the line when trying to implement more advanced features? I can certainly see the merits of implementing fork and the advantages to it, but since this is all in good fun, I will go about it the way I posted and maybe figure something out if I ever get to pipes. Who knows, maybe if I ever get far enough without fork, I'll learn to appreciate it because of the things that end up being much more difficult.

Re: fork theory?

Post by NickJohnson »

Well, it's less the address space than the other process metadata. If you create a new process with every exec() call, you would have to specify a lot of information all at once instead of just copying from the parent process. That's why on Windows you have to give CreateProcess() a long list of arguments, including a STARTUPINFO structure, while fork() takes none. This also makes things implicitly secure with no extra code, because any permission system set up on the parent process is automatically cloned onto the child process.
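
The "copied from the parent" point can be observed from user space. The sketch below, under the assumption of a POSIX system, sets a file-creation mask in the parent, forks, and has the child report the mask it sees through a pipe that it also inherited. The function name umask_seen_by_child is hypothetical.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set parent_mask as the umask, fork, and return the umask the child
 * observes (or -1 on error). Nothing is passed to the child explicitly. */
int umask_seen_by_child(mode_t parent_mask)
{
    int fds[2];

    assert((parent_mask & ~(mode_t)0777) == 0);
    if (pipe(fds) == -1)
        return -1;

    mode_t saved = umask(parent_mask);   /* state set in the parent only */
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: umask() returns the previous mask, i.e. the inherited
         * one. Report it through the inherited pipe descriptor. */
        mode_t seen = umask(0);
        ssize_t n = write(fds[1], &seen, sizeof seen);
        _exit(n == (ssize_t)sizeof seen ? 0 : 1);
    }

    umask(saved);                        /* restore the caller's mask */
    close(fds[1]);
    if (pid == -1) {
        close(fds[0]);
        return -1;
    }

    mode_t seen = 0;
    ssize_t n = read(fds[0], &seen, sizeof seen);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == (ssize_t)sizeof seen ? (int)seen : -1;
}
```

Calling umask_seen_by_child(027) should give back 027: both the mask and the pipe descriptor reached the child without being named in any argument list.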