technical difference between a fork and thread creation

Caleb1994 · Post by **Caleb1994** » Sat Jun 09, 2012 12:08 pm

As I am beginning to plan out my multitasking scheme, I was just thinking about how, when fork is called, the two processes share the same address space and therefore share global variables; upon fork the file descriptor table is simply copied, and all file system node reference counts are incremented accordingly. Would these not be the same actions taken when spawning a thread? I have never tackled multithreading before, but I figured that if it was built into the tasking system itself, it wouldn't be as hard (still a pain, I'm assuming).

If the procedure to fork a task and to spawn a thread is similar enough, would it not be possible to have a function which spawns a new thread simply by calling fork, then setting a few flags or fields in the task structure to indicate it is a child thread of another process? E.g. (pseudo):


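// Pseudocode: get_current_task(), lock_task(), unlock_task(), kill_self(),
// TASK_THREAD and EOKAY are kernel-internal names assumed by this sketch.
int spawn_thread(void (*thread_func)(void*), void* data)
{
     task_t* parent = get_current_task();
     pid_t pid = fork();
     if( pid < 0 ) return -1;      // fork failed
     if( pid != 0 ) return EOKAY;  // parent: nothing more to do
     // Child: mark this task as a thread belonging to the parent.
     task_t* this = get_current_task();
     lock_task();
     this->flags |= TASK_THREAD;
     this->parent = parent;
     unlock_task();
     thread_func(data);
     kill_self();
     return EOKAY; // never reached; keeps the compiler from complaining :P
}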

Thoughts? Ideas? Ridicule?

Combuster · Post by **Combuster** » Sat Jun 09, 2012 12:39 pm

You can have multiple threads per address space, whereas each process has its own address space.

Effectively, threads can observe and modify the same data as other threads, and each can see the effects of the other. If you call fork, a new process gets created, and any changes to memory made by either parent or child are not visible to the other.
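The difference can be observed from user space on a POSIX system. The following is a minimal sketch, not anything from the thread itself: the function names `counter_after_thread` and `counter_after_fork` are made up for illustration, and it assumes `pthread_create`, `fork` and `waitpid` are available.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;   /* a global that both experiments try to modify */

/* Thread entry point: increments the global in the shared address space. */
static void *bump(void *arg)
{
    (void)arg;
    counter++;
    return NULL;
}

/* Value of `counter` seen by the caller after a second thread incremented it. */
int counter_after_thread(void)
{
    pthread_t t;

    counter = 0;
    if (pthread_create(&t, NULL, bump, NULL) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return counter;        /* the thread's write is visible here */
}

/* Value of `counter` seen by the parent after a forked child incremented it. */
int counter_after_fork(void)
{
    pid_t pid;

    counter = 0;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter++;         /* changes the child's private copy only */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    return counter;        /* the parent's copy is untouched */
}
```

Calling `counter_after_thread()` should return 1, while `counter_after_fork()` should return 0, because the child's increment lands in its own copy of the address space.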

Brendan · Post by **Brendan** » Sat Jun 09, 2012 6:40 pm

Hi,

Caleb1994 wrote: As I am beginning to plan out my multitasking scheme, I was just thinking about how when fork is called, the two processes share the same address space, and therefore share global variables, and upon fork the file descriptor table is simply copied, and all file system node reference counting is incremented accordingly.

No. With "fork()" the two processes don't share an address space; the child is given a separate copy of the parent's address space. Typically this is done with "copy on write" paging tricks: all pages are marked "read only", and a write to one of these pages causes a page fault, where the page fault handler creates a copy of the page and marks it "read/write".

Of course, if a newly forked process calls "fork()" then you can have 3 processes all sharing the same page (and if the third process calls "fork()" you can have 4 processes sharing a page, etc). Basically you end up with an unlimited number of processes sharing the same page. This means you need reference counting: if the number of processes sharing a page is 1 then the page fault handler can make it "read/write" again without creating a copy of the page; and if the number of processes sharing the page is 2 or more then you have to allocate a new page, copy the data and make the copy of the page "read/write".

Also, if a process frees the page then the number of processes sharing the page can be decremented without creating a copy of it; and if a process terminates (or calls "exec()") you get to decrement the number of processes sharing the page for every page. Also, don't forget that any of these pages might be on swap space or part of a memory mapped file. You can see how this can be expensive (extra overhead all over the place) and very complicated.
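The reference counting rules above can be sketched as a small user-space simulation. This is only an illustration under stated assumptions: `struct page`, `struct pte`, `page_alloc`, `cow_share`, `cow_write_fault` and `cow_unmap` are hypothetical names, `malloc` stands in for a physical frame allocator, and swap and memory mapped files are ignored entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* A "physical" page plus the number of address spaces mapping it. */
struct page {
    int refcount;
    unsigned char data[PAGE_SIZE];
};

/* One page-table entry: which page is mapped and whether writes are allowed. */
struct pte {
    struct page *page;
    int writable;
};

/* Allocate a zeroed page owned by exactly one address space. */
struct page *page_alloc(void)
{
    struct page *p = calloc(1, sizeof *p);
    if (p != NULL)
        p->refcount = 1;
    return p;
}

/* fork(): the child maps the same page and both mappings become read-only. */
void cow_share(struct pte *parent, struct pte *child)
{
    parent->writable = 0;
    child->page = parent->page;
    child->writable = 0;
    parent->page->refcount++;
}

/* Handler for a write fault on a read-only copy-on-write mapping.
 * Returns 0 on success, -1 if no memory is available for the copy. */
int cow_write_fault(struct pte *pte)
{
    assert(pte->page != NULL && pte->page->refcount > 0);
    if (pte->page->refcount > 1) {
        /* Still shared: give this address space its own private copy. */
        struct page *copy = page_alloc();
        if (copy == NULL)
            return -1;
        memcpy(copy->data, pte->page->data, PAGE_SIZE);
        pte->page->refcount--;
        pte->page = copy;
    }
    /* Sole owner (either already, or of the fresh copy): allow writes. */
    pte->writable = 1;
    return 0;
}

/* Unmap on free, exit or exec: drop one reference, never copy. */
void cow_unmap(struct pte *pte)
{
    if (--pte->page->refcount == 0)
        free(pte->page);
    pte->page = NULL;
    pte->writable = 0;
}
```

A real kernel would do the same bookkeeping per physical frame and would also have to flush stale TLB entries when it changes a mapping's permissions, which this sketch leaves out.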

Spawning a thread is simple - the new thread shares the same address space, and you don't need to clone the address space.

Spawning a process is simple too - the new process has an entirely new address space, and you don't need to clone the address space.

Caleb1994 wrote: If the process to fork a task and to spawn a task thread is similar enough, would it not be possible to have a function which spawns a new thread, simply by calling fork, then setting a few flags or fields in the task structure to indicate it is a child thread of another process?

You can have a "meta thing". For example, the kernel could have a low level function to create a task, that accepts a bunch of other variables that tell it to either create a new address space, clone an existing address space or use an existing address space; and tell it to either clone file handles or not; and tell it to either clone signal handling or not; etc. This is how Linux does it.
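One possible shape for such a low level function is sketched below. It is not the Linux interface: `task_create`, `enum res_mode` and `struct resource` are hypothetical names, and the address space and file table are reduced to a reference-counted placeholder so that the three choices (new, clone, share) can be seen side by side.

```c
#include <assert.h>
#include <stdlib.h>

/* How the new task obtains a resource from its creator (hypothetical). */
enum res_mode { RES_NEW, RES_CLONE, RES_SHARE };

/* Placeholder for an address space or a file table. */
struct resource {
    int users;   /* number of tasks using this object */
    int value;   /* stand-in for the object's contents */
};

struct task {
    struct resource *as;      /* address space */
    struct resource *files;   /* file handles */
};

/* Drop one reference; free the object when the last user is gone. */
static void res_put(struct resource *r)
{
    if (r != NULL && --r->users == 0)
        free(r);
}

/* Produce the resource for a new task according to `mode`. */
static struct resource *res_get(struct resource *cur, enum res_mode mode)
{
    struct resource *r;

    if (mode == RES_SHARE) {          /* thread-like: same object */
        assert(cur != NULL);
        cur->users++;
        return cur;
    }
    r = calloc(1, sizeof *r);         /* RES_NEW and RES_CLONE both allocate */
    if (r == NULL)
        return NULL;
    r->users = 1;
    if (mode == RES_CLONE) {          /* fork-like: copy the contents */
        assert(cur != NULL);
        r->value = cur->value;        /* a kernel would copy or COW-map pages here */
    }
    return r;
}

/* Single entry point; `creator` may be NULL only when both modes are RES_NEW. */
struct task *task_create(const struct task *creator,
                         enum res_mode as_mode, enum res_mode files_mode)
{
    struct task *t = calloc(1, sizeof *t);

    if (t == NULL)
        return NULL;
    t->as = res_get(creator ? creator->as : NULL, as_mode);
    t->files = res_get(creator ? creator->files : NULL, files_mode);
    if (t->as == NULL || t->files == NULL) {
        res_put(t->as);
        res_put(t->files);
        free(t);
        return NULL;
    }
    return t;
}
```

Under these assumptions a thread would be `task_create(p, RES_SHARE, RES_SHARE)`, a fork would be `task_create(p, RES_CLONE, RES_CLONE)`, and spawning a fresh process would pass `RES_NEW` for the address space.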

You can also decide that "fork()" is a stupid pain in the neck, and avoid massive amounts of complexity and overhead by refusing to support it (and only supporting spawning threads and spawning processes). This is how I do it (but I don't care about POSIX compatibility).

Cheers,

Brendan

Caleb1994 · Post by **Caleb1994** » Mon Jun 11, 2012 8:26 am

Do you ever have that DOH! moment? I feel really stupid... It made sense in my head the other day, and I have no idea why...

I hate these moments, when you feel like an idiot for being *that guy* on the forum... sorry guys. Haha. Still some good information, though.

@Berkus:

No, I had not read that page, but I am reading right now. Thanks for the link.

Now, as for how this original idea came up: I think what I was imagining is having threads run as processes, yet use the same memory map as their parent. Essentially a fork without copying the address space; simply using the same one. Otherwise, would you not have to write a whole other scheduler, or add a chunk of code to the current scheduler to loop through each process's child threads, instead of iterating only through the process list? (By "iterating" I of course mean whatever sorting your scheduler uses.)
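A rough sketch of that idea, assuming a plain round-robin queue: every thread is an ordinary entry in the scheduler's task list, and threads of one process simply point at the same address space object. The names `struct task`, `schedule`, `load_address_space` and the `as_switches` counter are hypothetical; on real hardware `load_address_space` would reload the page-table base register (CR3 on x86).

```c
#include <assert.h>
#include <stddef.h>

struct address_space {
    int id;                       /* placeholder for a page directory */
};

struct task {
    struct address_space *as;     /* threads of one process share this pointer */
    struct task *next;            /* circular run queue, threads included */
};

static struct task *current;      /* task that is running right now */
static int as_switches;           /* counts address space reloads, for illustration */

/* Stand-in for switching page tables. */
static void load_address_space(struct address_space *as)
{
    (void)as;
    as_switches++;
}

/* Pick the next task; switch address space only when it actually differs. */
struct task *schedule(void)
{
    struct task *prev = current;
    struct task *next;

    assert(prev != NULL && prev->next != NULL);
    next = prev->next;
    if (next->as != prev->as)
        load_address_space(next->as);
    current = next;
    return next;
}
```

With this layout the scheduler needs no separate per-process thread loop: switching between two threads of the same process is just a task switch that skips the address space reload.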

Although, Brendan did mention one of my biggest fears with this idea. The whole reference counting thing. I figured it could get pretty complicated, but you have shown how much so. o.O Haha

Thanks for your help and your patience. It seems to me that my idea was a flop.

Jezze · Post by **Jezze** » Mon Jun 11, 2012 11:30 am

What you are describing is a thread or a lightweight process. For embedded systems there is also something called protothreads that might interest you.

Caleb1994 · Post by **Caleb1994** » Tue Jun 12, 2012 8:42 am

Interesting, I'll look up those "protothreads" you mentioned. Thanks for the input.

OSDev.org
