difference between fork() and threads...
Posted: Sat Oct 01, 2011 8:05 am
by skandalOS
Hello everybody!
I want to ask about something that has been bothering me.
What is the difference between fork() and a thread? And what similarities and relationships do they have?
fork() is built into the OS, is invoked via int 0x80, and works at the same level as kernel threads.
And what about execve()? It should also use fork() for creating a new process, or am I misunderstanding something?
Can anybody help me?
Thanks
Re: difference between fork() and threads...
Posted: Sat Oct 01, 2011 11:13 am
by AJ
Hi,
The method for creating a new process is fork() followed by execve().
Take a look here. Also, fork() is a well-defined function, whereas a thread is a programming concept.
IIRC, Linux has no notion of threads, just processes (please correct me if I'm wrong or out of date - *ducksandruns*). So, with fork() you spawn a child process identical to the parent except for the process ID. You then use fork()'s return value (0 in the child, the child's PID in the parent) to determine whether you are the parent or the child process. The child process then calls execve(), which loads the new binary and jumps to its entry point.
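A minimal sketch of the fork()-then-execve() sequence described above, in C. The helper name run_program is made up for illustration; the system calls themselves are the standard POSIX ones.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run the program at `path` in a child process and
 * return its exit status, or -1 if anything went wrong. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed, no child exists */

    if (pid == 0) {
        /* Child: fork() returned 0. Replace this process image. */
        execve(path, argv, environ);
        _exit(127);                 /* only reached if execve() failed */
    }

    /* Parent: fork() returned the child's PID. Wait for it to finish. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that execve() itself never creates a process; it only replaces the program running in the process that calls it, which is why it is paired with fork().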
Cheers,
Adam
ps: please don't use colour in your posts - not everyone uses the same theme.
Re: difference between fork() and threads...
Posted: Sat Oct 01, 2011 10:53 pm
by Solar
AJ wrote:IIRC, Linux has no notion of threads, just processes (please correct me if I'm wrong or out of date - *ducksandruns*).
Consider yourself told. NPTL has been integrated with the kernel since the 2.6 release.
Re: difference between fork() and threads...
Posted: Sat Oct 01, 2011 11:43 pm
by bluemoon
I think fork() duplicates the whole process address space except for a few items (see the man page). It is more practical to just duplicate the page tables (and retain resource handles), set up the child-specific items, and continue execution (by returning from fork()); when the child later modifies memory, those writes are handled with copy-on-write.
Re: difference between fork() and threads...
Posted: Sat Oct 01, 2011 11:58 pm
by Brendan
Hi,
skandalOS wrote:What is the difference between fork() and a thread?
For "fork()" the entire virtual address space is cloned (and then typically discarded soon after when a variation of "exec()" is called). This is typically done using "copy on write" - e.g. everything in the virtual address spaces is marked as "read only", and any write causes a page fault where a new copy of the page is allocated/created and changed to "read/write". Various other resources are also (temporarily, until "exec()"?) shared, including things like environment variables, file handles, signal handling, etc. It's relatively expensive.
When a thread is created, the same address space (and other resources) is used "as is". This should be faster as the OS/kernel doesn't need to set up cloned versions of the resources.
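The difference is easy to observe from user space. The sketch below (function names are made up for illustration) increments a global once in a forked child and once in a thread, then reports what the caller sees afterwards.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;

static void *bump(void *arg)
{
    (void)arg;
    counter++;                      /* same address space as the caller */
    return NULL;
}

/* The child increments its own copy-on-write copy; the caller's copy
 * is untouched. Returns the caller's view of counter, or -1 on error. */
int counter_after_fork(void)
{
    counter = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter++;                  /* triggers a private page copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;
}

/* The thread increments the one and only copy. */
int counter_after_thread(void)
{
    pthread_t t;
    counter = 0;
    if (pthread_create(&t, NULL, bump, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return counter;
}
```

counter_after_fork() returns 0 and counter_after_thread() returns 1. On older toolchains this needs to be built with -pthread.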
Note: Some OS's also have some sort of "spawnProcess()", which works like "fork()" and "exec()" combined. The benefit of this is that a new virtual address space is created (and the old address space is not cloned and then discarded) and other resources (file handles, etc) don't need to be shared; which is simpler and faster. Some OS's only have "spawnProcess()" and don't support "fork()" at all; which is a lot easier to implement (no need for the OS/kernel to support things like address space cloning, file handles that are shared by multiple processes, etc).
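On POSIX systems the closest standard call to such a "spawnProcess()" is posix_spawn(). A minimal sketch, with a made-up wrapper name; whether the C library implements it with a real combined kernel primitive or with fork()/vfork()/clone() underneath is an implementation detail that this example does not show.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical wrapper: start `path` as a new process in one call and
 * return its exit status, or -1 on failure. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;

    /* NULL file actions and NULL attributes: use the defaults. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The caller never runs any code "as the child", which is exactly the trade-off discussed further down the thread.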
AJ wrote:IIRC, Linux has no notion of threads, just processes (please correct me if I'm wrong or out of date - *ducksandruns*).
If I understand it correctly; internally Linux has a "meta-fork()" where the caller tells it what to do with various resources. For example, the "fork()" function would call "meta-fork()" and tell it to clone the parent process' virtual address space, while "spawnThread()" would call "meta-fork()" and tell it to re-use the existing address space. Basically Linux doesn't support threads, but does support processes that "share the same everything" (and therefore behave identically to threads).
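The "meta-fork()" described above corresponds to the Linux clone() call, where flags select what the new task shares with its creator. A Linux-only sketch, with made-up function names and an arbitrarily chosen 64 KiB stack, assuming a downward-growing stack:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static int shared_value = 0;

static int child_fn(void *arg)
{
    (void)arg;
    shared_value = 42;
    return 0;
}

/* Run child_fn in a new task created with the given clone() flags and
 * return shared_value as the caller sees it afterwards (-1 on error). */
int value_after_clone(int extra_flags)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    shared_value = 0;
    /* Pass the top of the buffer, since the stack grows downward.
     * SIGCHLD as the exit signal lets plain waitpid() reap the task. */
    pid_t pid = clone(child_fn, stack + stack_size,
                      extra_flags | SIGCHLD, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return shared_value;
}
```

value_after_clone(CLONE_VM) returns 42 because the address space is shared (thread-like), while value_after_clone(0) returns 0 because the new task gets its own copy (fork-like). Real thread libraries pass many more flags (CLONE_FS, CLONE_FILES, CLONE_SIGHAND, CLONE_THREAD and so on).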
Cheers,
Brendan
Re: difference between fork() and threads...
Posted: Sun Oct 02, 2011 1:02 am
by xenos
Brendan wrote:Note: Some OS's also have some sort of "spawnProcess()", which works like "fork()" and "exec()" combined. The benefit of this is that a new virtual address space is created (and the old address space is not cloned and then discarded) and other resources (file handles, etc) don't need to be shared; which is simpler and faster. Some OS's only have "spawnProcess()" and don't support "fork()" at all; which is a lot easier to implement (no need for the OS/kernel to support things like address space cloning, file handles that are shared by multiple processes, etc).
Windows, for example, has API functions like CreateProcess, CreateThread and so on, which create a new process from an executable file or a new thread within the same process. Actually this is what I implemented in my kernel, since it appears more logical to me and, as you said, requires no expensive address space cloning and discarding. I wonder why fork / exec has survived such a long time in Unix / POSIX operating systems. I read that the original reason was somehow related to pipes and filters, but I can hardly imagine that they are harder to implement with something like CreateProcess.
Re: difference between fork() and threads...
Posted: Sun Oct 02, 2011 6:20 am
by Owen
Fork is useful because it lets you do some setup in the context of the child process before handing over control. It provides a lot of flexibility that CreateProcess doesn't for doing things like massaging file descriptors.
There's no reason, as I see it, not to support both. For simple tasks, CreateProcess can be more efficient; for complex ones, fork() can be useful.
For file descriptors: a forked child shares all of them with its parent process. However, file descriptors can be marked with FD_CLOEXEC, which closes them when exec is invoked.
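As an illustration of that kind of descriptor massaging, here is a sketch that runs a shell command and captures its standard output through a pipe. The function name capture_output and the use of /bin/sh are choices made for this example only.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `sh -c cmd` and copy up to cap-1 bytes of its stdout into buf
 * (NUL-terminated). Returns the number of bytes captured, or -1. */
ssize_t capture_output(const char *cmd, char *buf, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: this setup runs in the child's own context. */
        fcntl(fds[0], F_SETFD, FD_CLOEXEC); /* read end closes at exec */
        dup2(fds[1], STDOUT_FILENO);        /* stdout now is the pipe  */
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                         /* exec failed */
    }

    /* Parent: drop the write end so read() sees EOF when the child exits. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fds[0], buf + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

The executed program needs no knowledge of the pipe; it simply writes to its standard output.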
Re: difference between fork() and threads...
Posted: Sun Oct 02, 2011 8:20 am
by Combuster
Owen wrote:There's no reason, as I see it, not to support both. For simple tasks, CreateProcess can be more efficient; for complex ones, fork() can be useful.
I decided to kick out the limiting abstractions altogether and went for a system consisting of CreateAddressSpace/CreateThread/TransferPage as the relevant system calls. (Which are powerful enough to implement any variation of the fork/createprocess calls)
Re: difference between fork() and threads...
Posted: Sun Oct 02, 2011 8:34 am
by gravaera
Same here, spawnProcess(), spawnThread().
Re: difference between fork() and threads...
Posted: Sun Oct 02, 2011 8:36 am
by OSwhatever
Combuster wrote:There's no reason, as I see it, not to support both. For simple tasks, CreateProcess can be more efficient; for complex ones, fork() can be useful.
I decided to kick out the limiting abstractions altogether and went for a system consisting of CreateAddressSpace/CreateThread/TransferPage as the relevant system calls. (Which are powerful enough to implement any variation of the fork/createprocess calls)
Isn't CreateProcess exactly what CreateAddressSpace does? These calls tend to differ between microkernels and monolithic kernels. A monolithic kernel wants to store more bureaucracy, such as the parent-child relationship, environment variables, access rights and so on. With microkernels much of this is moved to user space (for example, the process manager in QNX), so CreateProcess in the kernel becomes much simpler. Also, since the process manager is the only process that actually creates or forks a new process, fork (as a kernel API) cannot work the same way.
fork is one of those Unix institutions, and I have not fully understood it yet. Win32 has survived well without a fork API and I have no plans to introduce one in my kernel, since I cannot find a use case for it. CreateProcess does well for me.
Re: difference between fork() and threads...
Posted: Sun Oct 02, 2011 9:24 am
by Combuster
Textbook CreateProcess() does three things: create a new address space, load a program from disk into that address space, create a thread in the new address space. fork() follows the same steps except for copying itself rather than loading a new program at the call.
CreateAddressSpace, as per the name, only performs the first step. The calling program can then load a program as a copy of itself, shared from itself, an entirely different program, and can also set up debugging facilities and patch the result before an actual thread is created and started as the last step. In fact, CreateAddressSpace is little more than a security feature: a new program may very well be loaded to share the caller's address space.
The concept is also independent from mono/micro considerations: CPU management is in the end always done by the kernel, memory management is not necessarily specified, and the code to load an actual program may be part of a dedicated system call. Remote memory modifications are also not limited to either concept (though microkernel designs have a bigger tendency to require it)
Re: difference between fork() and threads...
Posted: Sun Oct 02, 2011 10:09 am
by gravaera
Fork is yet another "gift" we have been bestowed with from *nix, and like all the other gifts from *nix, at the time it was invented, it made sense. And when it obviously outlived its practicality, it was stubbornly clung to. Concurrent servers no longer need fork. There are threads and IPC now. Welcome to 2011.
Meanwhile,
in Australia...
Re: difference between fork() and threads...
Posted: Sun Oct 02, 2011 10:21 am
by fronty
POSIX also has vfork(2), which doesn't copy the parent's virtual memory; instead, the parent is blocked while the child uses its resources. If the child exists only to call exec*(2), fork(2) is not the best choice.
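A sketch of the vfork(2) pattern, with a made-up helper name. The child borrows the parent's memory, so it should do nothing except call an exec function or _exit(). Note that vfork() was marked obsolescent in POSIX.1-2001 and dropped from POSIX.1-2008, although Linux and the BSDs still provide it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: like fork()+execve(), but without copying the
 * parent's address space. Returns the exit status, or -1 on failure. */
int vfork_and_exec(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the parent is suspended until we exec or exit.
         * Touch nothing here; just exec. */
        execve(path, argv, environ);
        _exit(127);                 /* never return() from this frame */
    }

    /* Parent resumes here once the child has called execve()/_exit(). */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```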
Re: difference between fork() and threads...
Posted: Sun Oct 02, 2011 1:54 pm
by xenos
Owen wrote:Fork is useful because it lets you do some setup in the context of the child process before handing over control. It provides a lot of flexibility that CreateProcess doesn't for doing things like massaging file descriptors.
I see the benefits. Well, my idea is that CreateProcess may create a new process either in an active or inactive state, and a handle / PID is returned to the caller. When the new process is created in an inactive state, the caller can thus do things like granting resources to the new process and finally switch its state to active. I guess this would provide a similar functionality.
Re: difference between fork() and threads...
Posted: Sun Oct 02, 2011 2:36 pm
by FlashBurn
What would be the problem with creating a new process with a flag (start or waiting)? You could then do all the things you would do between a fork() and an exec(). This is the way I'm doing it.