Hi,
iammisc wrote:i have a bunch of computers all running the same daemon. The daemons are designed to work together. the network will get BIG BIG BIG and i need to make sure every single computer on the network gets a message. I need to make sure no computer gets it twice and that a computer does not need to keep in memory a HUGE list of ip addresses because that would eat up memory. Each computer only knows about the existence of 10 other computers due to a heartbeat.
What sort of structure are these computers/daemons arranged in?
For "client-server", all clients need to know one IP address (for the server) and the server needs to know all IP addresses (for all clients). A client would tell the server to broadcast a message and the server would send it out to all clients.
For "hierarchical client-server", all computers need to know the IP address of the "master server", and the IP addresses of the computers connected "downstream". Each client would tell the master server to broadcast a message and the master server would send it to the computers directly under it, the computers directly under the master server would forward the message to computers under them, and so on until all computers have received the message. This might work well if you're using ethernet subnets and can make use of a broadcast packet (e.g. one computer on each subnet uses a broadcast packet to send the message to all other computers on the same subnet at the same time).
For "peer-to-peer" every computer needs to know the IP address of every other computer. In this case it costs you about 4 bytes per IP address per computer (e.g. for 10000 computers it costs about 40 KB of RAM on each computer, and for 1 million computers it'd be about 4 MB of RAM on each computer). Not sure how you'd define "BIG BIG BIG", but this seems practical to me (unless you're thinking of every computer on the internet). The bonus here is that there's no double handling (i.e. no need to send a message up to a server before anything is broadcast). The bigger problem is the bandwidth available to each client (e.g. if a client is using a 56 K dial-up connection then sending 1 million messages from it might take a while).
You could combine approaches though. For example, for 1 million computers you could nominate one thousand "group controllers" (where each group controller controls 1000 computers). In this case each client only needs to know 1000 IP addresses (one for each group controller). When a message is broadcast it's sent to each group controller, and each group controller sends it to each computer in it's group. In this case, if N equals the square root of the maximum number of computers supported, then you'd use N * 4 bytes of RAM on each client, N * 4 * 2 bytes of RAM on each "group controller", and each computer needs enough bandwidth to be able to send N messages in a suitable amount of time.
Of course this is only a "2 level" system. For a 3 level system, N equals the maximum number of computers to the power of one third. For a million computers, you'd have 100 "master controllers" that know about 100 "group controllers", that each know about 100 clients. By expanding on this idea you could have a "6 level system", where (for a million computers) there's 10 "level 1 controllers", that each know about 10 "level 2 controllers", that each know about 10 "level 3 controllers", etc.
By taking this idea to extremes you could end up with an "N level binary system", where there's 2 "master controllers", and each controller knows about 2 other computers below it. This would have bad latency though - the time it takes for a message to find it's way from a master controller down to the furtherest client would be the sum of the latencies of all intermediate controllers (but RAM usage and bandwidth requirements would be extremely small).
The problem with this combined approach is figuring out which computers are "controllers", how each client finds out who the top level controllers are, and how each controller finds out who it's responsible for controlling (especially for a dynamic network, and/or if you want to nominate a new controller if an existing controller dies, and/or if you want to optimise it to minimise latency/bandwidth).
For a "chaotic" configuration (no structure at all), the only alternative I can think of is for each message to have a unique identifier, and for each computer to send the message to any/all computers it knows about. If a message is received you'd compare the unique identifier with previously received messages, and ignore it if it's been received already.
Anyway, none of this probably makes much sense, but it (hopefully) will give you some ideas....
Cheers,
Brendan