Hi,
AJ wrote: I think the person who is doing something closest to this on the OS Dev boards is probably Brendan at This Web Page. Unfortunately, he seems to be doing a site update at the moment and I can't access his specifications. I'm sure he was doing something similar to distributed computing that isn't quite traditional distributed computing - correct me if I'm wrong, Brendan
For me, processes run anywhere (on any computer within the cluster) and communicate with each other using messaging; the kernel routes messages to the receiver (regardless of which computer the receiver is running on), so processes don't need to care whether they're talking to something on the same computer or something on a remote computer. On top of this there's a "peer to peer" distributed virtual file system (where any file can be on any disk/s on any computer). However, I know that a good idea implemented poorly is useless, and I have spent ages making sure I've got a good foundation to build everything else on (which is just another way of saying it doesn't work yet).
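As a very rough sketch of the shape of that messaging interface (the names and types below are made up for illustration, not my actual API - the point is that you send to a thread ID rather than to an address, and the kernel worries about which computer that thread is on):
Code:
#include <stdint.h>

/* Illustrative only - made-up names/types, not the real API */

typedef uint64_t thread_id;              /* unique across the whole cluster */

typedef struct {
    thread_id sender;                    /* filled in by the kernel */
    uint32_t  type;                      /* application defined message type */
    uint32_t  length;                    /* bytes of data[] actually used */
    uint8_t   data[4096];
} message;

/* The kernel looks up which computer "receiver" is currently running on
   and forwards the message there if it isn't local. */
int send_message(thread_id receiver, const message *msg);

/* Block until a message arrives on the calling thread's queue. */
int get_message(message *msg);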
I have experimented with the idea of a distributed emulator though, and sadly it mostly doesn't work well. The problem is finding a way to share the work involved without causing a massive amount of overhead trying to keep everything synchronized.
What I tried was one process that emulated RAM, plus separate processes that emulated CPUs (one process per emulated CPU). To avoid the need to use IPC for every emulated RAM access I also implemented emulated caches and my equivalent of the MESI cache states, so that a process emulating a CPU could (mostly) run without any IPC except for emulated cache misses. Despite this (and despite the fact that I was using processes running on the same computer, without the additional overhead/latency of ethernet/networking hardware) it was slow. Also note that keeping emulated RAM in sync isn't the only problem - you need to keep the emulated CPUs and other emulated hardware (roughly) in step too, which for me meant time control messages that increased IPC (and reduced performance further).
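To give an idea of what the per-cache-miss IPC looked like (again, made-up names, reusing the illustrative messaging sketch above - this only shows the shape of the request/reply, not the full MESI-style protocol):
Code:
/* Illustrative only. An emulated load/store that hits the emulated cache
   costs no IPC; a miss becomes a request to the process emulating RAM,
   which replies with the cache line (and invalidates other emulated CPUs'
   copies if the line was requested in the exclusive/"about to write" state). */

#define MSG_FETCH_LINE_SHARED     1     /* read miss */
#define MSG_FETCH_LINE_EXCLUSIVE  2     /* write miss */
#define MSG_LINE_DATA             3     /* reply carrying the line contents */

#define LINE_SIZE 64

typedef struct {
    uint32_t type;                      /* one of the MSG_* values above */
    uint64_t guest_physical_address;    /* line-aligned address in emulated RAM */
    uint8_t  data[LINE_SIZE];           /* only valid in MSG_LINE_DATA replies */
} cache_line_request;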
There are other ways to share the work, but they either suffer from the same problem (IPC overhead costing more than potential gains) or don't share the work well (e.g. one process that emulates all CPUs and all RAM with a separate process that only emulates the I/O hub/devices, where one process does a huge amount of work while the other process does very little work, and where it won't scale to more than 2 processes).
Basically what I'm saying is that for distributed systems (and SMP for that matter) you get the best performance when the work done on one computer doesn't depend much on the work being done on another computer.
One way of doing this is "pipelining", where each computer does some stuff and sends the results to the next computer (which does more stuff and sends the results to the next computer, and so on). For an example of this, imagine a C compiler where the first computer parses the source code and compiles it into "intermediate language", the second computer optimizes the intermediate language, a third computer converts the intermediate language into assembly language, and the fourth computer creates the final binary. In this case each computer does a reasonable amount of work but there's very little communication between the computers (e.g. one "here's my output" message per computer/stage).
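The core loop of each stage ends up looking something like this (sketch only, reusing the made-up send_message()/get_message() from earlier; do_this_stages_work() is just a placeholder):
Code:
extern thread_id next_stage;            /* the stage on the next computer */

/* One pipeline stage: receive a work item from the previous computer,
   do this stage's share of the work, forward the result. The only
   communication is one "here's my output" message per item per stage. */
void pipeline_stage_loop(void) {
    message in, out;

    for (;;) {
        get_message(&in);                   /* e.g. intermediate language */
        do_this_stages_work(&in, &out);     /* e.g. optimize it */
        send_message(next_stage, &out);     /* pass it along */
    }
}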
Another method is "farming", where you've got a controller that splits a huge job into smaller pieces and sends each piece to other computers to be processed, and then combines the results from these other computers. An example of this is video rendering farms, where a master computer asks slave computers to generate one frame each and combines these frames into a movie.
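A sketch of the master's side of that (illustrative only; store_frame_in_movie() is a placeholder, and it reuses the made-up messaging bits from earlier):
Code:
#include <string.h>

#define MSG_RENDER_FRAME  10
#define MSG_FRAME_DONE    11

void render_farm_master(thread_id *slaves, int slave_count, int total_frames) {
    int next_frame = 0;
    int frames_done = 0;
    message msg;

    /* Prime every slave with one frame to render */
    for (int i = 0; i < slave_count && next_frame < total_frames; i++) {
        msg.type = MSG_RENDER_FRAME;
        memcpy(msg.data, &next_frame, sizeof(next_frame));
        send_message(slaves[i], &msg);
        next_frame++;
    }

    /* Collect results; each time a slave finishes, give it the next frame */
    while (frames_done < total_frames) {
        get_message(&msg);                  /* a MSG_FRAME_DONE from some slave */
        store_frame_in_movie(&msg);         /* placeholder */
        frames_done++;

        if (next_frame < total_frames) {
            msg.type = MSG_RENDER_FRAME;
            memcpy(msg.data, &next_frame, sizeof(next_frame));
            send_message(msg.sender, &msg); /* same slave, next frame */
            next_frame++;
        }
    }
}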
For SMP there are similar problems, but you can use shared memory to minimize the communication costs. For distributed systems it is possible to simulate shared memory (e.g. fetch a page from a central "page manager" during page faults), but that needs to be implemented on top of lower level communication systems and therefore isn't a way to avoid the overhead of those lower level communication systems (it's still slow, and is typically even slower because processes have less control over it).
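For example, the page fault handler for a "simulated shared memory" page ends up doing something like this (sketch only; map_page_and_copy() is a placeholder, and it sits on top of the same made-up messaging as before - which is exactly why it can never be cheaper than that messaging):
Code:
#define MSG_FETCH_PAGE  20
#define PAGE_SIZE 4096

extern thread_id page_manager;          /* the central "page manager" process */

void handle_shared_page_fault(uint64_t faulting_address) {
    message request, reply;
    uint64_t page = faulting_address & ~(uint64_t)(PAGE_SIZE - 1);

    /* Ask the central page manager for the page contents... */
    request.type = MSG_FETCH_PAGE;
    memcpy(request.data, &page, sizeof(page));
    send_message(page_manager, &request);

    /* ...and wait for the reply before the faulting thread can continue. */
    get_message(&reply);
    map_page_and_copy(page, reply.data);    /* placeholder */
}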
Cheers,
Brendan