OSDev.org

Posted: **Sat Oct 01, 2016 9:34 pm**

In light of the things currently going on in the 'POSIX under Microkernels' thread, I think we need to step back - again - and consider a specific, well-defined example of an asynchronously event-driven operation working on an indeterminate number of threads which communicate with each other.

I want to propose such an example, which I call naive mapreduce. Despite its name, it is not a form, simplified or otherwise, of an actual Mapreduce algorithm; it does, however, simulate certain aspects of its operation. For the purposes of this particular discussion, we will start with the assumption that there will always be at least one CPU available for any thread when it is started or awakened from a waiting cycle, and that we are using kernel threads (to be exact: that the thread creation, wait, IPC, and termination are performed through system calls, either explicit or implicit, that all scheduling is by the kernel scheduler, and that threads can be scheduled to run on any free CPU). These conditions can be changed to explore different issues, but anyone discussing a variant should say so explicitly; for now, let's keep to fairly idealized assumptions about the CPU scheduling to avoid contention over which conditions apply.

The Naive Mapreduce problem, broadly speaking, is to fingerprint all the possible links reachable from a given hypertext document (or unspecified format), using one thread or process per document found during the tree-building process. The specific approach I want to suggest we discuss would be:

a request is made to map the document's link tree.
A primary thread is started for processing the links within that document. The thread contains three tables: one cataloging the links within the current page and their corresponding threads (more later), one cataloging links reachable from documents that the current document links, and one cataloging any incoming links back to it that the child threads find.
A child thread replicates the parent, save for its tables, thread ID, and parent ID. The incoming links table of a new child is seeded with the link to the page that first found it.
Each thread can receive five messages from another thread: An 'update incoming links' message (which can come from any thread); an 'update secondary links' message (from a child thread); a 'document processing complete' message (from a child thread); a 'all child document processes complete' message (from a child after all of its own chiuldren have reported completion); and a 'flush all data to log then forward to parent' (which the parent can issue to a completed child after both it and the child have completed their document processing ansd that of the child's children).
Upon finding a link in its own document, a thread searches for the link in its outgoing links table; if the link isn't already cataloged, it adds it to the table and spawns as child thread to process the links of that document. It also passes the new link up to its parent thread, if any, so that it can be added to the parent's secondary links table along with the thread id of the child it spawned to process the new link. Each parent thread stores this locally before propagating up to its own parent.
Upon finding a link that already exists in the table, a thread will look up the thread processing said document and pass it an update incoming links message.
Upon completing its own document processing, a thread will send a self completion message to its parent, if any; then wait for any messages from its own children (if any); when all of its own children have notified that they and all of their children have completed, it then sends them a flush request, causing them to log their own data in a file, then return a copy of their tables to the parent for possible further processing. It then notifies its own parent that it's fully completed it's task and waits for its own flush message. Once a thread has flushed, it requests a termination on itself.

This gives a framework to begin discussing the subject in a relatively formal manner. The process is highly inefficient, and the example probably could be simplified, but it is, I think, a good place to work from.

Posted: **Sat Oct 01, 2016 10:23 pm**

Hi,

Schol-R-LEA wrote:The specific approach I want to suggest we discuss would be:

By specifying an implementation (and not just specifying the behaviour), you tie my hands.

For a simple example; the first thing I'd want to do is establish a pool of worker threads. Otherwise I'd be too worried about "fork bomb" (an exponentially increasing number of threads where the only thing limiting worst case growth rate is networking).

More specifically; I'd be tempted to want "1 + number worker threads == number of CPUs on this computer" for each computer; where a worker thread only communicates with its master in the same process, and masters (different processes on different computers) communicate with each other. This allows for a "table of already seen links and their states" for each process that is shared by master and its worker threads; while using "shared nothing" between masters (where masters send messages to each other to update each other's tables). Of course for my model, a single (master?) thread can handle fetching an unlimited number of pages by itself, so I'd mainly be using worker threads to spread the load of processing pages across available CPUs (and possibly "lazy spawning" where it only spawns a new worker thread if/when processing pages becomes a bottleneck, while keeping under the "1 + number worker threads == number of CPUs on this computer" maximum).

Cheers,

Brendan

Posted: **Sun Oct 02, 2016 10:34 am**

Brendan wrote:By specifying an implementation (and not just specifying the behaviour), you tie my hands.

On the one hand, this is a good point, and I didn't really intend it as such, at least not in the long run; I was presenting it as sketch of an example solution.

OTOH, I did specifically write this as a simplified, naive, and flawed solution, because I wanted to start with a relatively abstract analysis (hence the stipulation about unlimited CPU availability) to isolate the issue that are inherent in the problem of asynchronous event handling, before addressing more practical concerns. I was aiming for an abstract problem space, along the lines of "Producers and Consumers" or "Dining Philosophers" (or at least "Byzantine Generals", which is more broadly defined but still abstract), as I wasn't able to find such a standard treatment for general asynchronous IPC (though it is likely I simply overlooked it). This isn't as simplified as I would like, in fact, and I am concerned that this will interfere with the ability to isolate the questions involved.

On the gripping hand, the main goal here is to find some problem that we can all study and tear apart, so for the time being, I think we should start with a specific design and each independently implement that, then review the implications of the specific solutions. We can look at other designs later as we progress through this.

OSDev.org

Modelling asynchronous event handling

Modelling asynchronous event handling

Re: Modelling asynchronous event handling

Re: Modelling asynchronous event handling