fork() to speed things up

sancho1980
Member
Posts: 199
Joined: Fri Jul 13, 2007 6:37 am
Location: Stuttgart/Germany

fork() to speed things up

Post by sancho1980 »

Hi,

I had a bit of an argument at work today and thought I might get some advice from you guys:

I was looking at a C program where several child processes were created by means of the fork() system call. Because these child processes all did the same kind of thing (only with different parameters), I wondered why the programmer didn't simply do all the work sequentially instead of using fork().
The answer I was given was: to speed things up! We're talking about maybe 5 different child processes that all do the same kind of work (fetching data from a database, processing it in some way and writing it all to their respective output files), and I thought: how would this possibly speed things up, given that you're adding context switch overhead AND they're all doing pretty much the same thing (i.e. they'll be competing for the same resources more or less at the same time)? So I said there's no way this is going to make the program run faster than running everything in sequence... What do you guys think?
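Roughly, the structure was like this (not the real code, just the shape of it; do_one_job() and the count of 5 are made up for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* stand-in for the real work: fetch from the DB for one parameter set,
   process the rows, write them to that job's output file */
static void do_one_job(int id)
{
    printf("child %d (pid %d) doing its job\n", id, (int)getpid());
}

int main(void)
{
    const int njobs = 5;

    for (int i = 0; i < njobs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
        if (pid == 0) {            /* child: do one job, then exit */
            do_one_job(i);
            _exit(EXIT_SUCCESS);
        }
        /* parent: keep forking the remaining jobs */
    }

    while (wait(NULL) > 0)         /* parent waits for all children */
        ;
    return 0;
}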
Craze Frog
Member
Posts: 368
Joined: Sun Sep 23, 2007 4:52 am

Post by Craze Frog »

If the program is I/O bound, then forking will slow it down. If it's CPU bound, it will be slowed down if you have only one CPU; otherwise it will be sped up. It depends on the database.
sancho1980
Member
Posts: 199
Joined: Fri Jul 13, 2007 6:37 am
Location: Stuttgart/Germany

Post by sancho1980 »

Right,
but even if it's CPU bound and you HAD several processors in place: how do you know it's going to be sped up? The other CPU might very well be busy running a different job at the same time. Also, even if it's CPU bound, chances are you can't hold your working set entirely in registers, so you'll be accessing memory and competing for the bus constantly...
Is this really an advisable technique in the general case?
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Post by Combuster »

sancho1980 wrote:but even if it's CPU bound and you HAD several processors in place: how do you know it's going to be sped up?
Technically, there are too many variables to be certain about this.
the other CPU might very well be busy running a different job at the same time..
which frees the current CPU from performing that task
also, even if it's CPU bound, chances are you can't hold your working set entirely in registers, so you'll be accessing memory and competing for the bus constantly...
Why do you think the cache was invented? :wink:
is this really an advisable technique in the general case?
The conclusion is: multithreading for performance only works on multiprocessor systems with CPU-bound tasks.
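If the task really is CPU-bound, the least the program could do is ask how many CPUs are actually online before deciding how many children to fork. Just a sketch; _SC_NPROCESSORS_ONLN is a common extension (Linux, the BSDs, Solaris), not something POSIX guarantees everywhere:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);  /* CPUs currently online */
    if (ncpus < 1)
        ncpus = 1;                               /* fall back to one worker */
    printf("would fork at most %ld workers for CPU-bound work\n", ncpus);
    return 0;
}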
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
GuiltySpark
Posts: 7
Joined: Sun Dec 16, 2007 9:47 am
Location: The (Other) Counterweight Continent

Re: fork() to speed things up

Post by GuiltySpark »

sancho1980 wrote:...how would this possibly speed things up, given that you're adding context switch overhead AND they're all doing pretty much the same thing (i.e. they'll be competing for the same resources more or less at the same time)? So I said there's no way this is going to make the program run faster than running everything in sequence... What do you guys think?
I think this sounds like the work of im/premature optimisation.

In situations like these, I find it's best to actually measure the performance of both versions. It's no use trying to convince people whose understanding comes from one set of principles by arguing from another set of principles. Hard data shuts people up.
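Something as crude as this would already settle it; run_sequential() and run_forked() here are stand-ins for the two versions of the job, not functions from the actual program:

#include <stdio.h>
#include <time.h>

/* replace these stubs with the real sequential and forked versions */
static void run_sequential(void) { /* ... */ }
static void run_forked(void)     { /* ... */ }

static double seconds_taken(void (*fn)(void))
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    printf("sequential: %.2f s\n", seconds_taken(run_sequential));
    printf("forked:     %.2f s\n", seconds_taken(run_forked));
    return 0;
}

Or skip the instrumentation entirely and just run both binaries under time(1).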
"Pissing people off since 1986."

"Edible: n, As in a worm to a toad, a toad to a snake, a snake to a pig, a pig to a man, and a man to a worm."
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Post by Solar »

Hard to say without seeing the actual layout of the system, i.e. what kind of database it is, whether the database runs on the same machine, etc.

Even on a single-CPU machine, splitting a job into two processes can result in the job being done faster - many systems limit the amount of CPU / I/O a single process can get, so splitting the work into two processes can make more resources available overall.

But GuiltySpark is right. Generally speaking, there is only one way to be sure: measure, optimize, measure. The system I am working on runs on a 16-CPU server. We had a job split up into 12 processes, thinking we'd make good use of the available resources. We actually got faster by reducing it to 8 processes - but faster still when we used 20 processes... that's computers for you.
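If anyone wants to reproduce that kind of experiment, the simplest setup is to take the worker count from the command line and time each run; a minimal sketch, with do_one_job() standing in for whatever the real workers do:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static void do_one_job(int id)
{
    printf("worker %d: pid %d\n", id, (int)getpid());  /* real work goes here */
}

int main(int argc, char **argv)
{
    int nworkers = (argc > 1) ? atoi(argv[1]) : 4;     /* e.g. ./job 8 */
    if (nworkers < 1)
        nworkers = 1;

    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            do_one_job(i);
            _exit(EXIT_SUCCESS);
        }
    }
    while (wait(NULL) > 0)                             /* wait for all workers */
        ;
    return 0;
}

Then "time ./job 8", "time ./job 12", "time ./job 20" and compare.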
Every good solution is obvious once you've found it.
B.E
Member
Posts: 275
Joined: Sat Oct 21, 2006 5:29 pm
Location: Brisbane Australia

Re: fork() to speed things up

Post by B.E »

This reminds me of make's "-j x" switch, which lets it run more than one job at a time. Even on a single-processor computer you get a performance boost (depending on what you set it to - normally around twice the speed of a regular build). This is because compiling is IO bound.

From what you describe, each child process is IO bound ('writing it all to their respective output files' is IO bound, and while it's hard to say whether the database is IO bound, it more than likely is), and therefore you will get a performance boost.
Microsoft: "let everyone run after us. We'll just INNOV~1"
Craze Frog
Member
Posts: 368
Joined: Sun Sep 23, 2007 4:52 am

Post by Craze Frog »

This is because compiling is IO bound.
No, it's because compiling alternates between being CPU-bound and IO-bound. If it were only IO-bound there would be no performance increase. But when you run two processes, one can use the IO while the other is using the CPU, and vice versa.
babernat
Member
Posts: 42
Joined: Tue Jul 03, 2007 6:53 am
Location: Colorado USA

Post by babernat »

sancho1980 wrote: is this really an advisable technique in the general case?
That could be a can of worms. I agree with the others that the answer is not trivial, even in a general sense. You could look at Amdahl's law or the definition of speedup. Once you can quantify those, you have a starting point.
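For reference, the textbook form of Amdahl's law (nothing specific to this program): if a fraction p of the job can actually run in parallel, the best speedup N workers can give is

    speedup(N) = 1 / ((1 - p) + p / N)

So with, say, p = 0.8 (a made-up figure), five workers top out at 1 / (0.2 + 0.8/5) = 1 / 0.36, roughly a 2.8x speedup, no matter how fast the individual CPUs are.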

It's probably obvious to you, but making a program parallel can cause other "boundings" to arise that don't exist in the non-parallel case.
Thanks for all the fish.