Tasks as sequential "processes" instead of monolithic tasks
Posted: Thu Jun 18, 2015 6:57 pm
While pondering an implementation of asynchronous I/O, I came up with a (possibly) "new" way of dividing computing time: not large monolithic threads running for long periods of time, but short sequential tasks that take information on starting and yield it to other tasks on completion, much like UNIX pipe programs but much smaller. A sequence of tasks would be specified via a dependency calculation. An interface program such as a shell would receive a request from the user to process some information and determine what input is needed to generate it, then invoke the task class for that input to generate that input's dependencies (if any), which in turn generate their own dependencies. Once this tree has been built, it is run, with each node executing after its dependencies have become available.
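To make the idea concrete, here is a minimal sketch of that dependency calculation (all class and function names are hypothetical, not from any existing system): each task class declares which task classes it depends on, and a small resolver expands the tree from the user's request, running each node only after its dependencies have produced their outputs.

```python
class Task:
    deps = ()                      # task classes whose output this task consumes
    def run(self, *inputs):
        raise NotImplementedError

class LoadFile(Task):
    def run(self):
        return "raw bytes"         # stand-in for actual file I/O

class LoadImage(Task):
    deps = (LoadFile,)
    def run(self, data):
        return f"image({data})"

class MaskImage(Task):
    deps = (LoadImage,)
    def run(self, image):
        return f"masked {image}"

def resolve(task_cls, results):
    """Depth-first: run dependencies first, then the task itself.

    `results` caches each task's output so shared dependencies run once.
    """
    if task_cls not in results:
        inputs = [resolve(d, results) for d in task_cls.deps]
        results[task_cls] = task_cls().run(*inputs)
    return results[task_cls]

# The shell's single request drives construction and execution of the tree:
print(resolve(MaskImage, {}))      # -> masked image(raw bytes)
```

This version resolves recursively and synchronously; the deferred-return mechanism described below would replace the blocking `run` calls with posted continuations.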
This system would still require some long-running threads for user interaction, but it would greatly simplify data flows and (if implemented efficiently) greatly speed up data processing by only loading/starting tasks when they can actually run, eliminating wait states.
Example:

[shell] >--wants image masked--> [image masker task] >--wants images--> 2x [image loader task]
[image loader task] >--wants file--> [file load task] >--wants bytes--> MANYx [block cache task]
[block cache task] >--wants blocks--> [disk driver task]

Here the dependency tree would be constructed, and because the file load task has no dependencies, execution begins there. Instead of blocking, it requests the required blocks and posts a deferred "return" task that runs once the inode/directory block and lookup tasks it has posted are done, and which then constructs the file object. The execution order would be:
1 shell
2 file load task (1) -> requests path lookup etc and exits with a deferred return task
3 path lookup
...
4 file load task (deferred return) -> request bytes, deferred return
5 disk driver task
6 block cache task
7 file load task (deferred return 2) -> construct file object, return bytes
8 load image task
9 mask image task
10 shell
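The deferred-return flow in steps 2-7 can be sketched as follows (a toy model with hypothetical names, not a real driver stack): instead of blocking on I/O, the file load task posts its request to a run queue together with a continuation, and a scheduler loop runs each posted task only when it reaches the front of the queue.

```python
from collections import deque

queue = deque()                                   # pending (task, args) pairs

def post(fn, *args):
    """Post a task for later execution instead of calling it synchronously."""
    queue.append((fn, args))

def file_load(path, done):
    # Step 2: request a path lookup and exit with a deferred return task.
    def after_lookup(inode):
        # Step 4 (deferred return): request the bytes, defer again.
        def after_read(data):
            # Step 7 (deferred return 2): construct the file object.
            done({"path": path, "bytes": data})
        post(block_cache, inode, after_read)
    post(path_lookup, path, after_lookup)

def path_lookup(path, k):                         # step 3
    k(f"inode:{path}")

def block_cache(inode, k):                        # step 6: miss, ask the disk
    post(disk_driver, inode, k)

def disk_driver(inode, k):                        # step 5
    k(f"blocks-of-{inode}")

result = []
post(file_load, "/pics/cat.png", result.append)
while queue:                                      # the scheduler loop
    fn, args = queue.popleft()
    fn(*args)
print(result[0])
```

No task here ever waits: each one either finishes outright or posts its successor, which is exactly the property that lets the scheduler keep the queue full of runnable work.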
This is still a raw idea and I haven't really been working on it yet, but I thought it was worth sharing.