OSDev.org

Posted: **Sat Oct 01, 2016 3:19 am**

Kevin wrote:Did you forget error handling here? Or are you assuming that you never need to check for errors because the next function would automatically fail, like read() when passed a -1 file descriptor?

You could of course add some kind of conditional execution here. And probably you'll soon find uses for loops (handling short reads maybe). Eventually it might turn out that what you just started to write is a VM.

Sik wrote:To be fair, he did say it wasn't well defined yet. My first thought was that the chain would immediately terminate the moment a call fails. That seems like the obvious approach, especially since errors would be isolated to that chain and wouldn't affect other chains.

Yes, I should have mentioned that I was pretty much assuming that if one of the scheduled operations fails, that any other operation that depends on it will end up being cancelled with a different error code to indicate that it has been cancelled (as you usually want to ignore cancelled operations anyway). That would work because those operations won't end up being executed until the operation(s) they depend on have actually completed. The difference is that if any of those operations fail, you simply end up not executing any dependent operations and inform the process that they have been cancelled.

On a less serious note, you could probably go as far as writing a Lisp interpreter for the system calls, but I think people would both hate and love you for that :').

Kevin wrote: Yes, coroutines are nice. And no, you don't need your asynchronous syscall interface if the kernel understands them. You already save and restore the register state when processing a syscall, so doing a context switch here comes for free. Even with a synchronous syscall interface, the kernel can just queue the syscall and switch to a different thread/coroutine until the operation has completed. The important part here is that the kernel is working asynchronously internally, but if you want the userspace to feel synchronous, there's little reason to change the traditional syscall interface. Essentially you get something that feels like blocking syscalls, except that they block only a single coroutine instead of the whole thread.

I always thought the point was that the kernel wasn't aware of coroutines/green threads, as they would otherwise be normal threads scheduled by the kernel. The trade off mostly is that they are a lot cheaper to create, because they don't have to be backed by kernel resources, but that normal threads can be assigned to any processor core by the kernel and that such threads don't have to share the same time slice. One of the big problems with green threads, however, is that you definitely want to use any kind of asynchronous interface, because if an operation ends up blocking the green thread, it actually ends up blocking all the green threads within that process. With an asynchronous system call interface, the userland scheduler could simply schedule the system call and switch to the next green thread and wait for any of the operations to complete in the meanwhile.

However, the use of green threads was just a suggestion, as I definitely like the programming model offered by programming languages like Erlang, and while that language has some weak points (e.g. string handling), it definitely has proven to be scalable. I am also quite sure that there are many other programming languages and libraries that benefit from something this (e.g. Go and libmill/libdill). Part of the reason to implement asynchronous system calls is that they are very flexible: you could end up implementing an existing (partially) synchronous interface for compatibility on top, you could end up implementing something like coroutines/green threads, or you could end up using the asynchronous system call interface directly. Another part of the reason is that they can help saving some of the costs of switching between the process and the kernel, or even other processes (in the case of microkernels), as I have outlined in my earlier comment.

Yours sincerely,
Stephan.

Posted: **Sat Oct 01, 2016 3:28 am**

Hi,

Sik wrote:You know, for all the talk Brendan makes about asynchronicity, he still insists on showing an explicit loop to handle them. Not only that sounds bundersome, it effectively forces the whole thing to become single threaded. If instead of having a loop the system called the callbacks directly, they could be run in separate threads and easily make the system very scalable without even trying - the only thing that would be done sequentially is events that by definition behave that way. In fact, doing things purely asynchronously would remove the need for the scheduler altogether, there isn't task switching, callbacks get called when needed, and the concept of process becomes more about isolation than anything else.

Each thread is single-threaded.

You can have as many threads as you like, all with their own message handling loops, all sending messages to each other. You can even have one thread per "OOP object" if you really wanted. However; because threads don't block unless they have nothing to do you only really want a maximum of one thread per CPU per thread priority; and any more than that is pointless bloat. For optimum data locality you want threads to be explicit. Think like "thread_number_to_send_request = identifier_hash % total_threads;" balancing load between threads, where each thread contains a fraction of the data and that data remains in that thread's CPU's caches.

You can also have multiple processes, each with multiple threads, all sending messages to each other. Like threads (but far more so) for optimum data locality you want processes to be explicit; not least of all because processes share nothing (unlike threads that belong to the same process) which means they can be running on different computers. You just send/receive messages and let the kernel worry about transport/networking. Keep thinking like "thread_number_to_send_request = identifier_hash % total_threads;" balancing load between threads (on many computers). That's how you scale.

Cheers,

Brendan

Posted: **Sat Oct 01, 2016 6:56 am**

Brendan wrote:Yes, the trivial example is trivial.

Apparently not trivial enough that you would actually implement it correctly. And also not trivial enough that it would be as readable as the synchronously written version which does semantically exactly the same.

Yes, you've failed to extrapolate from that.

Oh, yes, I did extrapolate. But you ignored it because you didn't like the result of the extrapolation:

Kevin wrote:And if we did want to have parallelism, we would have to add code here that handles replies to requests made within handle_data(). This is quickly becoming a mess. Manually programming global state machines like this doesn't make sense for more than Hello World programs.

Posted: **Sat Oct 01, 2016 8:43 am**

Hi,

Kevin wrote:
Brendan wrote:Yes, the trivial example is trivial.
Apparently not trivial enough that you would actually implement it correctly. And also not trivial enough that it would be as readable as the synchronously written version which does semantically exactly the same.

I didn't realise there was a "correct" for pseudo-code. The synchronous version isn't extendable in any way that leads to parallelism, and (as I've already explained) I'm not even sure it's possible to make my trivial example equally crippled.

Kevin wrote:
Yes, you've failed to extrapolate from that.
Oh, yes, I did extrapolate. But you ignored it because you didn't like the result of the extrapolation:
Kevin wrote:And if we did want to have parallelism, we would have to add code here that handles replies to requests made within handle_data(). This is quickly becoming a mess. Manually programming global state machines like this doesn't make sense for more than Hello World programs.

I didn't ignore it exactly. The fact is I didn't read your post properly, mostly because I think you're trying your hardest to remain ignorant and I'm wasting my time bothering.

Assume "handle_data()" function is just storing the file's data in memory (or calculating a checksum, or counting how many lines of text are in the file, or....). When the file has finished loading perhaps a less trivial version of the program breaks out of that message handling loop and starts using a completely different message handling loop (without any of the "no longer needed" file IO stuff) that handles a user interface or something else. This is what I meant earlier when I said "This is like breaking a large finite state machine into a nested "finite state machine of finite state machines"". Sadly, you still think you need a single global state machine as if you never learnt to split complex things into smaller simpler things like most children do, but whatever.

Cheers,

Brendan

Posted: **Sat Oct 01, 2016 9:30 am**

Brendan wrote:I didn't ignore it exactly. The fact is I didn't read your post properly, mostly because I think you're trying your hardest to remain ignorant and I'm wasting my time bothering.

If you had read it, you might notice that the people who disagree with you actually think about what they're saying and that it is you who is demonstrably trying to remain ignorant...

Posted: **Sat Oct 01, 2016 9:44 am**

Hi,

If you had read it, you might notice that the people who disagree with you actually think about what they're saying and that it is you who is demonstrably trying to remain ignorant...

I didn't ignore it exactly. The fact is I didn't read your post properly, mostly because I think you're trying your hardest to remain ignorant and I'm wasting my time bothering.

Yes, you've failed to extrapolate from that.
Oh, yes, I did extrapolate. But you ignored it because you didn't like the result of the extrapolation:

Yes, the trivial example is trivial.
Apparently not trivial enough that you would actually implement it correctly.

This thread is nothing more than a quote-fest right now. I suggest you, either try to discuss these matters more politely, either stop replying in this thread altogether if you can't resist flaming.

Regards,
glauxosdever

Posted: **Sat Oct 01, 2016 10:04 am**

Brendan wrote:For optimum data locality you want threads to be explicit. Think like "thread_number_to_send_request = identifier_hash % total_threads;" balancing load between threads, where each thread contains a fraction of the data and that data remains in that thread's CPU's caches.

This statement leads me, somewhat indirectly, to some important questions, ones which I think will explain a great deal about your views on programming in general. I would like you to rank - in general, as the issues will by their nature be subjective and contextual - the importance of some design principles and tradeoffs.

optimal time of service for system calls
optimal memory usage for kernel operations
reduction of the number and cost of interprocess context switches
optimal scheduling of user and kernel processes
optimal throughput in user applications
stability of the operating system in normal operation
stability of the operating system under greater than anticipated load
stability of individual user processes
securing access to kernel-space data
securing access to user-space data from kernel space
securing access to user-space data between user-space processes
decreasing cognitive load for kernel developers (relevant in both reducing programmer error and allowing more complex software design)
decreasing cognitive load for application developers

Hmmn, that doesn't seem to quite capture what I was aiming at, but we can start with this, I suppose.

My thoughts about this are not fully realized yet, but I get the distinct impression that you see black-box abstraction as a negative, not merely because of its performance cost, but because to you it indicates that the developers do not understand the system in detail. You seem to feel - and please correct me if I am mistaken - that a programmer who isn't aware of all aspects of the system they are working on at all times is incompetent, and that cognitive load (that is, how many pieces of information the individual needs to keep 'in scope', mentally) is of no concern.

This seems to be reflected in your language design, for example, which focuses not on providing means of abstraction for more elaborate programming systems, but on channeling the programmers' workflow and trapping common error modes - that is to say, you are seeing using a higher level language primarily as a way to add error detection and allow ease of portability for what is basically assembly level programming. Please correct me if I am mistaken, here.

And quite frankly, your aggressive defense of positions that are out of scope in discussions, and your equally aggressive mockery of anyone and everyone that you see as disagreeing with you (whether they actually are or not), comes across as rather insecure to me, and, I suspect, most of the posters here. I don't think that this is actually the case, but it is how you sound, and it interferes with clear communication in this group. How you say things affects how others interpret what you are saying, and in this case, it is impacting it in a very negative way.

Now, please do not take this as a personal attack. I am saying this mainly to give you a bit of reflection on how other perceive your statements, which should, if nothing else, help you clarify what you are saying in the future. Take it as feedback, not criticism, and I will try to do the same for your reply.

Posted: **Sat Oct 01, 2016 11:07 am**

Brendan wrote:I didn't realise there was a "correct" for pseudo-code. The synchronous version isn't extendable in any way that leads to parallelism, and (as I've already explained) I'm not even sure it's possible to make my trivial example equally crippled.

Start another thread running the same code. Voilà, parallelism.

Assume "handle_data()" function is just storing the file's data in memory (or calculating a checksum, or counting how many lines of text are in the file, or....).

In other words, handle_data() is trivial. Maybe you can finally decide whether you want to talk about more complex scenarios or if you don't? Every time I refer to your trivial examples, you say that that's not the real thing, and when I try to think of a real thing, you tell me that I should imagine something trivial instead.

When the file has finished loading perhaps a less trivial version of the program breaks out of that message handling loop and starts using a completely different message handling loop (without any of the "no longer needed" file IO stuff) that handles a user interface or something else. This is what I meant earlier when I said "This is like breaking a large finite state machine into a nested "finite state machine of finite state machines"". Sadly, you still think you need a single global state machine as if you never learnt to split complex things into smaller simpler things like most children do, but whatever.

I just expected that if you're willing to take all the cost of making everything async, you would like to do, you know, two things in parallel. Like loading a file and responding to UI events. At the same time, and not only returning to processing UI events when the file has completed loading.

All you've really been showing here so far is that even with asynchronous events and event loops everywhere, you can still build fully synchronous applications.

Posted: **Sat Oct 01, 2016 6:28 pm**

Hi,

Kevin wrote:
Brendan wrote:I didn't realise there was a "correct" for pseudo-code. The synchronous version isn't extendable in any way that leads to parallelism, and (as I've already explained) I'm not even sure it's possible to make my trivial example equally crippled.
Start another thread running the same code. Voilà, parallelism.

Unless you have an actual reason to want another thread (e.g. different priority work, needing another CPU); additional threads add more bloat for nothing (memory for stack, thread specific storage, message queues; thread switches between the thread you could've used and the unnecessary thread/s, etc).

Kevin wrote:
Assume "handle_data()" function is just storing the file's data in memory (or calculating a checksum, or counting how many lines of text are in the file, or....).
In other words, handle_data() is trivial. Maybe you can finally decide whether you want to talk about more complex scenarios or if you don't? Every time I refer to your trivial examples, you say that that's not the real thing, and when I try to think of a real thing, you tell me that I should imagine something trivial instead.

No, you're trying so hard to learn nothing that you don't notice things like "or calculating a checksum, or counting how many lines of text are in the file, or...." (above).

Kevin wrote:
When the file has finished loading perhaps a less trivial version of the program breaks out of that message handling loop and starts using a completely different message handling loop (without any of the "no longer needed" file IO stuff) that handles a user interface or something else. This is what I meant earlier when I said "This is like breaking a large finite state machine into a nested "finite state machine of finite state machines"". Sadly, you still think you need a single global state machine as if you never learnt to split complex things into smaller simpler things like most children do, but whatever.
I just expected that if you're willing to take all the cost of making everything async, you would like to do, you know, two things in parallel. Like loading a file and responding to UI events. At the same time, and not only returning to processing UI events when the file has completed loading.

It should be obvious to everyone that a thread can manage an unlimited number of events while also doing work of its own. Sadly...

Kevin wrote:All you've really been showing here so far is that even with asynchronous events and event loops everywhere, you can still build fully synchronous applications.

Sure - I don't really want to create a more elaborate example because I know you're going to waste my time nitpicking irrelevant details if I do.

Against my better judgement:

Code: Select all

    createProgressBar();

    for each fileNumber in fileNameArray {
        sendMessage(VFS, OPEN_FILE_REQUEST, fileDetailsArray[fileNumber].name);
    }

    do {
        if(stillCalculating) {
            while( stillCalculating && (getMessageWithTimeout(0) == TIMEOUT) ) {
                stillCalculating = doSomeCalculations();
            }
        } else {
            getMessage(0);
        }
        switch(message.type) {
            case PARENT_TERMINATED:
            case TERMINATE:
                exit();
                break;
            case GUI_BUTTON_CLICK:
                switch(message.buttonID) {
                    case PROGRESSBAR_CANCEL:
                        exit();
                        break;
                    case PROGRESSBAR_HELP:
                        if(!helpShowing) {
                            createHelpDialogBox();
                            helpShowing = true;
                        }
                        break;
                    case HELP_CLOSED:
                        helpShowing = false;
                        break;
                }
                break;
            case OPEN_FILE_REPLY:
                if(message.status != OK) exit();
                fileNumber = message.fileNumber;
                fileDetailsArray[fileNumber].fileSize = message.fileSize;
                sendMessage(VFS, READ_FILE_REQUEST, fileNumber, 0);
                break;
            case READ_FILE_REPLY:
                if(message.status != OK) exit();
                handleData();
                if(fileDetailsArray[fileNumber].filePos < fileDetailsArray[fileNumber].fileSize) {
                    sendMessage(VFS, READ_FILE_REQUEST, fileNumber, fileDetailsArray[fileNumber].filePos);
                } else {
                    sendMessage(VFS, CLOSE_FILE_REQUEST, fileNumber);
                    readFiles++;
                    redrawProgressBar();
                }
                break;
            case CLOSE_FILE_REPLY:
                break;
        }
    } while(readFiles < totalFiles);
    destroyProgressBar();

This would read and do something with an unlimited number of files, while handling a progress bar (to show how many files are left) and creating/destroying a "help" dialog box, while doing heavy processing/calculations; all in a single thread (even though GUI stuff should probably be in a higher priority thread than heavy processing). There may be more threads doing other things at the same time. This thread may do other things before or after (e.g. maybe it's just a loading screen for a large application).

Cheers,

Brendan

Posted: **Sat Oct 01, 2016 9:42 pm**

I have created a new thread for further discussion of this topic, in which I proposed a specific problem case for testing and comparing models of asynchronous event handling. Shall we adjourn to there?

Posted: **Sun Oct 02, 2016 6:01 am**

Brendan wrote:
Kevin wrote:
Brendan wrote:I didn't realise there was a "correct" for pseudo-code. The synchronous version isn't extendable in any way that leads to parallelism, and (as I've already explained) I'm not even sure it's possible to make my trivial example equally crippled.
Start another thread running the same code. Voilà, parallelism.
Unless you have an actual reason to want another thread (e.g. different priority work, needing another CPU); additional threads add more bloat for nothing (memory for stack, thread specific storage, message queues; thread switches between the thread you could've used and the unnecessary thread/s, etc).

That's a different discussion. You may claim that it would be suboptimal, but you can hardly say this isn't "any way that leads to parallelism".

In the end, the choice will be a tradeoff between multiple criteria, amongst them performance and code complexity. You don't seem to mind complexity, so you would only create new threads for different reasons, but for other people the tradeoff may look different.

Kevin wrote:In other words, handle_data() is trivial. Maybe you can finally decide whether you want to talk about more complex scenarios or if you don't? Every time I refer to your trivial examples, you say that that's not the real thing, and when I try to think of a real thing, you tell me that I should imagine something trivial instead.
No, you're trying so hard to learn nothing that you don't notice things like "or calculating a checksum, or counting how many lines of text are in the file, or...." (above).

As long as asynchronous operation is the focus of the discussion, calculating checksums, couting lines of text or doing anything else that doesn't involve messages is a trivial operation.

Sure - I don't really want to create a more elaborate example because I know you're going to waste my time nitpicking irrelevant details if I do.

Against my better judgement: [...]

I'm not sure if you will consider it nitpicking, but the most important finding for me in this piece of code is that you still have a single global event loop. Not because the calculation, the help dialog and processing the file were so tightly coupled - they are almost completely independent and the only common thing is that they are running at the same time - but because you have to in order to keep them running at the same time. There is no way to use your "state machine of state machines" here to simplify things.

For comparison, written with a threaded model, this needs almost exactly the same number of lines, but I would consider it much more readable and easier to understand because you get a lot more locality:

Code: Select all

func process_file(filename)                                                                            
{                                                                                                      
    ret = vfs_open_file(filename, &fd, &fileSize);                                                     
    if (ret != OK)
        exit();                                                                                        
                                                                                                       
    filepos = 0;                                                                                       
    do {
        ret = vfs_read_file(fd, filePos);
        if (ret != OK)
            exit();                                                                                    
        handleData();                                                                                  
    } while(filePos < fileSize);                                                                       
    
    vfs_close_file(fd);
    readFiles++;
    redrawProgressBar();                                                                               
}           
            
func calculation()
{           
    while (doSomeCalculations());
}                   
                    
func progressbar_cancel()
{                   
    exit();             
}                       
                    
func progressbar_help()
{               
    static helpShowing = false; 
    if (!helpShowing) {
        helpShowing = true;                                                                            
        helpDialogBox();                                                                               
        helpShowing = false;
    }       
}           
            
createProgressBar();
register_callback(GUI_BUTTON_CLICK, PROGRESSBAR_CANCEL, progressbar_cancel);                           
register_popup_thread(GUI_BUTTON_CLICK, PROGRESSBAR_HELP, progressbar_help);                           
            
for each fileNumber in fileNameArray {                                                                 
    thread_create(process_file, fileDetailsArray[fileNumber].name);
}           
thread_create(calculation);
                
while (readFiles < totalFiles) {
    /* Wait for event (including "thread exit") and run registered handler */                          
    mainLoop();
}       
            
destroyProgressBar();

Posted: **Sun Oct 02, 2016 9:28 am**

Hi,

Kevin wrote:I'm not sure if you will consider it nitpicking, but the most important finding for me in this piece of code is that you still have a single global event loop. Not because the calculation, the help dialog and processing the file were so tightly coupled - they are almost completely independent and the only common thing is that they are running at the same time - but because you have to in order to keep them running at the same time. There is no way to use your "state machine of state machines" here to simplify things.

There's no reason you couldn't have a "handleFileIOmessages()" function and a "handleGUImessages()" function, and use the message sender ID to determine which to call, and shift all the file IO stuff into its own/separate area, etc. You could even have a later/separate message handling loop that re-uses these functions (e.g. if you want the progress bar to stay after all the file IO is done).

Of course if you wanted to waste your time making it slower, you could build a generic "callback manager" on top (e.g. where you have one function to register a callback and another to discard/delete a previously registered callback), and never (explicitly) deal with a message handling loop after that.

Kevin wrote:For comparison, written with a threaded model, this needs almost exactly the same number of lines, but I would consider it much more readable and easier to understand because you get a lot more locality:

Except now it's inefficient because of all the pointless threads and excessive thread switching. Worse, now you're going to need to deal with concurrency control (locks, mutexes, etc) for everything mutable that's shared between threads, and for anything non-trivial getting the concurrency control right is going to be far more painful (and error-prone, in a "heisenbug" way) than "state machine" ever was.

Cheers,

Brendan

Posted: **Sun Oct 02, 2016 11:06 am**

Brendan wrote:Of course if you wanted to waste your time making it slower, you could build a generic "callback manager" on top (e.g. where you have one function to register a callback and another to discard/delete a previously registered callback), and never (explicitly) deal with a message handling loop after that.

Or, you could reuse the same dynamic state you need anyway to keep track of what you're planning to do when you receive a particular message, wrap it up with a language feature (async/await), and save time without losing any performance.

Brendan wrote:Except now it's inefficient because of all the pointless threads and excessive thread switching. Worse, now you're going to need to deal with concurrency control (locks, mutexes, etc) for everything mutable that's shared between threads, and for anything non-trivial getting the concurrency control right is going to be far more painful (and error-prone, in a "heisenbug" way) than "state machine" ever was.

Write it with async/await and you're back to no threads and no synchronization.

Posted: **Sun Oct 02, 2016 11:21 am**

In a pure asynchronous system there's no thread switching, a "thread" is just a function call and has no state on its own to be preserved, so the only context switching becomes the process itself instead. Actually this would be exactly the same thing as the worker threads idea (where a "control thread" issues tasks to whatever thread is idling), just enforced at the system level.

Hmmm, I suppose the real problem here is kernel/userspace switching actually (which if I recall correctly is indeed quite expensive on x86). I guess you could make it so the operating system provides its own "main loop" running in the userspace part of the process, leaving switching only for syscalls. Actually the part about buffering syscalls here would be useful, software would not expect them to work immediately due to the async nature, and buffering them like that would allow for multiple syscalls to be queued up to be sent all at once (reducing the switching).

I also wonder how all this would play out on a distributed system instead of one with just a CPU.

Posted: **Sun Oct 02, 2016 3:10 pm**

Sik wrote:Actually the part about buffering syscalls here would be useful, software would not expect them to work immediately due to the async nature, and buffering them like that would allow for multiple syscalls to be queued up to be sent all at once (reducing the switching).

I would guess for many cases the additional latency before actually issuing the syscall would destroy any advantages that you get by batching.

OSDev.org

the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX

Re: the suitability of microkernels for POSIX