difference between fork() and threads...

skandalOS
Posts: 15
Joined: Mon Sep 05, 2011 12:05 pm

Post by skandalOS »

Hello everybody!

I want to ask about something that has been bothering me.
What is the difference between fork() and a thread? What similarities and relationships do they have?
As I understand it, fork() is built into the OS, is invoked via int 0x80, and works at the same level as kernel threads.
And what about execve()? Shouldn't it also use fork() to create a new process, or am I misunderstanding something?

Can anybody help me?

Thanks
AJ
Member
Posts: 2646
Joined: Sun Oct 22, 2006 7:01 am
Location: Devon, UK

Post by AJ »

Hi,

The method for creating a new process is fork() followed by execve(). Take a look here. Also, fork() is a well-defined function, whereas a thread is a programming concept.

IIRC, Linux has no notion of threads, just processes (please correct me if I'm wrong or out of date - *ducksandruns*). So, with fork() you spawn a child process identical to the parent except for its process ID. You then use the return value of fork() (0 in the child, the child's PID in the parent) to determine whether you are the parent or the child process. The child process then uses execve() to actually load the new binary and jump to its entry point.

Cheers,
Adam

ps: please don't use colour in your posts - not everyone uses the same theme.
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Post by Solar »

AJ wrote:IIRC, Linux has no notion of threads, just processes (please correct me if I'm wrong or out of date - *ducksandruns*).
Consider yourself told. NPTL has been integrated with the kernel since the 2.6 release.
Every good solution is obvious once you've found it.
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Post by bluemoon »

I think fork() duplicates the whole process space except for a few items (see the man page). It sounds more practical to just duplicate the page tables (and retain resource handles), set up the child-specific items, and continue execution (by returning from the fork function); when the child later changes memory, the writes are handled with copy-on-write.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Post by Brendan »

Hi,
skandalOS wrote:What is the difference between fork() and a thread?
For "fork()" the entire virtual address space is cloned (and then typically discarded soon after when a variation of "exec()" is called). This is typically done using "copy on write" - e.g. everything in the virtual address spaces is marked as "read only", and any write causes a page fault where a new copy of the page is allocated/created and changed to "read/write". Various other resources are also (temporarily, until "exec()"?) shared, including things like environment variables, file handles, signal handling, etc. It's relatively expensive.

When a thread is created, the same address space (and other resources) is used "as is". This should be faster as the OS/kernel doesn't need to set up cloned versions of the resources.

Note: Some OS's also have some sort of "spawnProcess()", which works like "fork()" and "exec()" combined. The benefit of this is that a new virtual address space is created (and the old address space is not cloned and then discarded) and other resources (file handles, etc) don't need to be shared; which is simpler and faster. Some OS's only have "spawnProcess()" and don't support "fork()" at all; which is a lot easier to implement (no need for the OS/kernel to support things like address space cloning, file handles that are shared by multiple processes, etc).
AJ wrote:IIRC, Linux has no notion of threads, just processes (please correct me if I'm wrong or out of date - *ducksandruns*).
If I understand it correctly; internally Linux has a "meta-fork()" where the caller tells it what to do with various resources. For example, the "fork()" function would call "meta-fork()" and tell it to clone the parent process' virtual address space, while "spawnThread()" would call "meta-fork()" and tell it to re-use the existing address space. Basically Linux doesn't support threads, but does support processes that "share the same everything" (and therefore behave identically to threads).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
xenos
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia

Post by xenos »

Brendan wrote:Note: Some OS's also have some sort of "spawnProcess()", which works like "fork()" and "exec()" combined. The benefit of this is that a new virtual address space is created (and the old address space is not cloned and then discarded) and other resources (file handles, etc) don't need to be shared; which is simpler and faster. Some OS's only have "spawnProcess()" and don't support "fork()" at all; which is a lot easier to implement (no need for the OS/kernel to support things like address space cloning, file handles that are shared by multiple processes, etc).
Windows, for example, has API functions like CreateProcess, CreateThread and so on, which create a new process from an executable file or a new thread within the same process. Actually this is what I implemented in my kernel, since it appears more logical to me and, as you said, requires no expensive address space cloning and discarding. I wonder why fork / exec has survived such a long time in Unix / POSIX operating systems. I read that the original reason was somehow related to pipes and filters, but I can hardly imagine that they are harder to implement with something like CreateProcess.
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Post by Owen »

Fork is useful because it lets you do some setup in the context of the child process before handing over control. It provides a lot of flexibility that CreateProcess doesn't for doing things like massaging file descriptors.

There's no reason, as I see it, not to support both. For simple tasks, CreateProcess can be more efficient; for complex ones, fork() can be useful.

For file descriptors: a forked child shares all of them with its parent process. However, file descriptors can be marked with FD_CLOEXEC, which closes them when exec is invoked.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Post by Combuster »

Owen wrote:There's no reason, as I see it, not to support both. For simple tasks, CreateProcess can be more efficient; for complex ones, fork() can be useful.
I decided to kick out the limiting abstractions altogether and went for a system consisting of CreateAddressSpace/CreateThread/TransferPage as the relevant system calls. (Which are powerful enough to implement any variation of the fork/createprocess calls)
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
gravaera
Member
Posts: 737
Joined: Tue Jun 02, 2009 4:35 pm
Location: Supporting the cause: Use \tabs to indent code. NOT \x20 spaces.

Post by gravaera »

Same here, spawnProcess(), spawnThread().
17:56 < sortie> Paging is called paging because you need to draw it on pages in your notebook to succeed at it.
OSwhatever
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Post by OSwhatever »

Combuster wrote:
There's no reason, as I see it, not to support both. For simple tasks, CreateProcess can be more efficient; for complex ones, fork() can be useful.
I decided to kick out the limiting abstractions altogether and went for a system consisting of CreateAddressSpace/CreateThread/TransferPage as the relevant system calls. (Which are powerful enough to implement any variation of the fork/createprocess calls)
Isn't CreateProcess exactly what CreateAddressSpace does? These calls tend to differ between microkernels and monolithic kernels. A monolithic kernel wants to store more bureaucracy, like child-parent relationships, environment variables, access rights and so on. With microkernels much of this is moved to user space, for example into the process manager in QNX, so CreateProcess in the kernel becomes much simpler. Also, since the process manager is the only process that actually creates or forks a new process, fork (as a kernel API) cannot work the same way.

fork is one of those Unix institutions, and I've not fully understood it yet. Win32 has survived well without a fork API, and I have no plans to introduce one in my kernel since I cannot find a use case for it. CreateProcess does well for me.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Post by Combuster »

Textbook CreateProcess() does three things: create a new address space, load a program from disk into that address space, create a thread in the new address space. fork() follows the same steps except for copying itself rather than loading a new program at the call.

CreateAddressSpace, as per the name, only performs the first step. The calling program can then load a program as a copy of itself, shared from itself, an entirely different program, and can also set up debugging facilities and patch the result before an actual thread is created and started as the last step. In fact, CreateAddressSpace is little more than a security feature: a new program may very well be loaded to share the caller's address space.
The concept is also independent from mono/micro considerations: CPU management is in the end always done by the kernel, memory management is not necessarily specified, and the code to load an actual program may be part of a dedicated system call. Remote memory modifications are also not limited to either concept (though microkernel designs have a bigger tendency to require it)
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
gravaera
Member
Posts: 737
Joined: Tue Jun 02, 2009 4:35 pm
Location: Supporting the cause: Use \tabs to indent code. NOT \x20 spaces.

Post by gravaera »

Fork is yet another "gift" we have been bestowed with from *nix, and like all the other gifts from *nix, at the time it was invented, it made sense. And when it had obviously outlived its practicality, it was stubbornly clung to. Concurrent servers no longer need fork. There are threads and IPC now. Welcome to 2011.

Meanwhile, in Australia...
fronty
Member
Posts: 188
Joined: Mon Jan 14, 2008 5:53 am
Location: Helsinki

Post by fronty »

POSIX also has vfork(2), which doesn't copy the virtual memory of the parent process; instead, the parent is blocked while the child uses its resources. If the child process exists only to call exec*(2), fork(2) isn't the best choice.
xenos
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia

Post by xenos »

Owen wrote:Fork is useful because it lets you do some setup in the context of the child process before handing over control. It provides a lot of flexibility that CreateProcess doesn't for doing things like massaging file descriptors.
I see the benefits. Well, my idea is that CreateProcess may create a new process either in an active or inactive state, and a handle / PID is returned to the caller. When the new process is created in an inactive state, the caller can thus do things like granting resources to the new process and finally switch its state to active. I guess this would provide a similar functionality.
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Post by FlashBurn »

What would be the problem with creating a new process with a flag (start or waiting)? You could then do all the things you would do between a fork() and an exec(). This is the way I'm doing it.