bewing wrote: What's the point of multi-threading if the threads are running on a single CPU only?
Each thread can block independently.
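To make that concrete, here's a minimal userland sketch (this assumes POSIX threads; reader and cruncher are made-up names): one thread parks itself in a blocking read() while the other keeps the single CPU busy.

Code:
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Made-up worker: parks itself in a blocking read().
   While it sleeps in the kernel, the other thread keeps running. */
static void *reader(void *arg)
{
    char buf[64];
    ssize_t n;
    (void)arg;
    n = read(STDIN_FILENO, buf, sizeof buf);   /* blocks here */
    printf("reader woke up with %zd bytes\n", n);
    return NULL;
}

/* Made-up worker: pure computation, never blocks. */
static void *cruncher(void *arg)
{
    unsigned long i, sum = 0;
    (void)arg;
    for (i = 0; i < 100000000UL; i++)
        sum += i;
    printf("cruncher done: %lu\n", sum);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, reader, NULL);
    pthread_create(&b, NULL, cruncher, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

A single-threaded version of the same program would sit at the read() and get no crunching done, even though the CPU is free.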
wouldn't it be better if they could run on separate CPUs so both parts are done simultaneously, rather than on a single CPU where they are sharing cycles?
If that one process were the only one running on the machine (which it NEVER is), then that would be great. If there are 500 tasks sharing cycles on 8 cores, then you gain absolutely nothing by spreading one process's threads between cores. In fact, you lose a tremendous amount of "affinity" (the cache warmth a thread builds up by staying on one core).
(And I did oversimplify my load balancing method description. It won't suffer any of the "obvious" errors. But it may have some subtle ones.)
Yes, but even with 20 tasks running on 4 CPUs, if someone is playing a multi-threaded game at high priority (like the Quake 3 engine), it would be much more efficient to have its threads on different CPUs; that engine has seen nice gains on multi-CPU and multi-core setups. Running all the threads (most of which are non-blocking) on a single CPU would net no speed increase over a single-core CPU, beyond the fact that a few of the low-cycle processes could run on another CPU (which would mostly be idle during gameplay).

Also, take something like a very large web server (I know not everyone writes a gaming OS, so let's go to the other side of things). Spawning multiple threads is common there, so each connection can use blocking I/O. If we have 1000 threads for 1000 connected users, and all 1000 threads sit on a single CPU while the other CPUs do very little, that does not seem ideal to me. Sure, in some cases there are reasons to want a child thread on the same CPU as its parent, but those are typically best cases, not worst cases like the ones I have given. I would rather my OS be efficient in the worst case, under heavy load, than be the best at idling.
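As a rough sketch of the thread-per-connection pattern I mean (this assumes POSIX sockets and pthreads; the port, the echo behavior, and the handle_client name are just placeholders, and error checks are omitted for brevity): each connection gets its own thread that can sit in a blocking recv() without stalling anyone else.

Code:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Made-up per-connection worker: blocking I/O is fine here, because
   only this thread sleeps while waiting on its client. */
static void *handle_client(void *arg)
{
    int fd = (int)(long)arg;
    char buf[512];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0)  /* blocks */
        send(fd, buf, n, 0);                        /* echo it back */
    close(fd);
    return NULL;
}

int main(void)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(8080);      /* placeholder port */
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 128);

    for (;;) {
        int fd = accept(srv, NULL, NULL);    /* one thread per client */
        pthread_t t;
        pthread_create(&t, NULL, handle_client, (void *)(long)fd);
        pthread_detach(t);                   /* never joined */
    }
}

With 1000 clients that's 1000 mostly-sleeping threads, and the question in this thread is whether the scheduler piles the runnable ones onto one CPU or spreads them out.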
And if I have 500 threads sharing cycles and 99% of them are idle, then spreading them out among cores wouldn't help, especially if 480 of those threads were on the same CPU because they all spawned from one process, leaving 20 threads for the other 7 CPUs to share. That does not sound balanced at all. And if all 500 threads were separate and active, with only 2 of them being from the same process, would it really make a difference whether those 2 were on the same CPU or not? Not likely. It's not my OS; I'm just giving my perspective and reasoning for wanting to do things the way I explained. If you get really bored and want to run some benchmarks, I would be willing to write an implementation for your OS (if you've got specs) and do some testing to see which is more efficient, because I really haven't done any testing; it's just the way I view it in my head.

Sharing a CPU is fine if the other CPUs aren't starving, but I can see them starving during something intensive, like playing a game or running a server. I would like a few real-world examples of when your method of keeping each sub-thread on the same CPU would really be a bonus to speed. I have provided examples, and there are many more, like video compression: we've all seen the benchmarks between single- and multi-core CPUs doing compression and decompression, where running multiple threads on separate CPUs nets faster speeds, and these are things people do every day with their PCs.
For sharing threads among CPUs there are a few methods; I'll discuss the two main ones.
1.) Shared thread queue: Each CPU has its own timer interrupt firing at specific intervals; when it fires to switch tasks, the CPU locks the queue, searches for the next thread/process, and switches to it. Each CPU just pulls the next task off the one shared list. The problem is that the odds of a CPU spinning while it waits for the queue lock are pretty high, since they are all pulling from one pool. That means wasted cycles, but on the plus side there is no real load balancing to do: every CPU runs tasks from the same list, so none of them ever stops grabbing work. (Both approaches are sketched in code after this list.)
2.) List per CPU: Just how it sounds: each CPU has its own list. The benefit is that even when a list is locked (so tasks can be added or removed), the chance of another CPU spinning on that lock is much lower, since most of the time each CPU is only reading its own task list, which means the cycles are used more intelligently. The downside is that now you have to balance your processes among processors, keep track of multiple lists, and migrate tasks when one CPU has too much idle time, or whatever. There is a lot of housekeeping going on, so this method is only worth it if the housekeeping takes fewer cycles than the spinning under the other method.
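Here is a minimal sketch of the two queue layouts, just to make the locking difference visible (the spinlock, task struct, and every function name here are made up for illustration; a real scheduler would also handle priorities, sleeping tasks, and so on). In method 1 every CPU contends for one lock on every reschedule; in method 2 a CPU mostly touches only its own queue, and the lock matters only when someone migrates a task.

Code:
#include <stdatomic.h>
#include <stddef.h>

#define NCPUS 8

struct task {
    struct task *next;
    /* ... registers, state, priority ... */
};

/* Trivial test-and-set spinlock, for illustration only. */
typedef atomic_flag spinlock_t;
static void lock(spinlock_t *l)   { while (atomic_flag_test_and_set(l)) ; }
static void unlock(spinlock_t *l) { atomic_flag_clear(l); }

/* --- Method 1: one shared run queue ----------------------------- */
static spinlock_t   global_lock = ATOMIC_FLAG_INIT;
static struct task *global_head;

struct task *pick_next_shared(void)
{
    struct task *t;
    lock(&global_lock);               /* every CPU contends here */
    t = global_head;
    if (t)
        global_head = t->next;
    unlock(&global_lock);
    return t;                         /* NULL means go idle */
}

/* --- Method 2: one run queue per CPU ----------------------------- */
struct runqueue {
    spinlock_t   lock;
    struct task *head;
};
/* static zero-init leaves the flags clear (unlocked) on common ABIs */
static struct runqueue rq[NCPUS];

struct task *pick_next_percpu(int cpu)
{
    struct runqueue *q = &rq[cpu];
    struct task *t;
    lock(&q->lock);                   /* rarely contended: only this */
    t = q->head;                      /* CPU and the load balancer   */
    if (t)                            /* ever touch this queue       */
        q->head = t->next;
    unlock(&q->lock);
    return t;
}

/* The price of method 2: somebody has to move tasks around when a
   queue runs dry. Hypothetical balancer, called from the idle path. */
void steal_one(int hungry_cpu)
{
    int i;
    for (i = 0; i < NCPUS; i++) {
        struct task *t;
        if (i == hungry_cpu)
            continue;
        lock(&rq[i].lock);
        t = rq[i].head;
        if (t) {
            rq[i].head = t->next;
            unlock(&rq[i].lock);
            lock(&rq[hungry_cpu].lock);
            t->next = rq[hungry_cpu].head;
            rq[hungry_cpu].head = t;
            unlock(&rq[hungry_cpu].lock);
            return;
        }
        unlock(&rq[i].lock);
    }
}

Under heavy load, the global_lock in method 1 is exactly the spinning described in point 1; method 2 trades that contention for the bookkeeping in steal_one().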
Now, there are plenty of sub-methods for each of these, but those are the main overviews. There are also plenty of implementation tricks for getting around the negative sides, but those are the problems each specific OS has to solve depending on its goals. If I wanted to build a single-app gaming OS, my goals and implementation would probably be much different from someone shooting for a server, etc.