OSDev.org

Posted: **Tue Feb 25, 2014 1:33 pm**

Hi,

I just implemented this weird thing, and I'm not sure yet whether it's a good idea, so I'd like some opinions. First I need to explain how my OS works so it's clear what I did.

A task can issue a read syscall that reads the content of a file. There are basically two types of files; let's call them slow and fast files (I know UNIX has some sort of equivalent). If you read a fast file, the kernel just copies the content to the buffer passed to the read syscall and the task resumes. Then there are slow files: files whose length we don't really know. An example would be a file corresponding to data sent from the keyboard. If a task reads this file, it gets data only when the user presses a key on the keyboard.

The return value of the read syscall is the number of bytes that have been read. A fast file can return anything from 0 to n bytes; a slow file always returns some n > 0 bytes, never 0. This is because on a slow file, if there is no data available, the task is suspended in a blocking IO state and cannot continue until data is available. Issuing a read syscall on a slow file is the only way for a task to get into this state.
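To make the two contracts concrete, here is a minimal user-space model in C. Everything in it is hypothetical: `fast_file` stands in for a file whose content is already in memory, `slow_file` for a keyboard-style ring buffer, and `READ_WOULD_BLOCK` for the internal signal that the kernel should suspend the caller instead of returning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical internal code: "no data yet, suspend the caller".
 * It never reaches user space; the kernel blocks the task instead. */
#define READ_WOULD_BLOCK (-2)

/* A "fast" file: all of its content is already in memory. */
struct fast_file {
    const char *data;
    size_t size;
    size_t pos;
};

/* Returns 0..len bytes; 0 only at end-of-file. */
ssize_t fast_read(struct fast_file *f, char *buf, size_t len)
{
    assert(f->pos <= f->size);
    size_t n = f->size - f->pos;
    if (n > len)
        n = len;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (ssize_t)n;
}

/* A "slow" file: a ring buffer filled by, e.g., the keyboard IRQ handler. */
#define RING_SIZE 64
struct slow_file {
    char ring[RING_SIZE];
    size_t head, tail;            /* head == tail means empty */
};

void slow_push(struct slow_file *f, char c)
{
    size_t next = (f->head + 1) % RING_SIZE;
    if (next != f->tail) {        /* silently drop input when full */
        f->ring[f->head] = c;
        f->head = next;
    }
}

/* Returns at least 1 byte, or READ_WOULD_BLOCK; never 0. */
ssize_t slow_read(struct slow_file *f, char *buf, size_t len)
{
    assert(len > 0);
    size_t n = 0;
    while (n < len && f->tail != f->head) {
        buf[n++] = f->ring[f->tail];
        f->tail = (f->tail + 1) % RING_SIZE;
    }
    return n > 0 ? (ssize_t)n : READ_WOULD_BLOCK;
}
```

In the OS described above, a `READ_WOULD_BLOCK` result is the point where the task is moved into the blocking IO state.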

Now, when data has become available, the task continues executing. In a normal operating system, I assume that at this point the data must already have been copied behind the scenes into the buffer the task passed to the read syscall; otherwise continuing would make no sense, because the buffer would still be empty. Here is where it gets weird(er). Because I thought it would take a lot of fiddling to do the copy to userspace behind the scenes like that, I asked myself: what if I just rewind the task's instruction pointer so that the first thing it does when it wakes up is re-issue the same syscall? This time it won't get blocked, because now there is data to fetch. I actually got this to work and thought it was a pretty smooth solution. It might seem weird to have a task issue the same syscall twice, before and after it was blocked, but on the other hand I don't need any code that does the copy behind the scenes.
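A rough sketch of the instruction-pointer rewind, assuming an x86-64 kernel whose entry stub saves a trap frame and leaves `rax` (the syscall number) and the argument registers untouched when the call blocks. `struct trap_frame`, `struct task`, `block_and_rewind` and `wake_task` are hypothetical names. Both `syscall` and `int 0x80` are 2-byte instructions, which is what makes a fixed rewind possible; Linux uses a similar rewind internally when it restarts a system call after a signal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical saved user context, as pushed by the syscall entry stub. */
struct trap_frame {
    uint64_t rip;   /* address of the instruction AFTER the syscall */
    uint64_t rax;   /* syscall number on entry, return value on exit */
};

enum task_state { TASK_RUNNING, TASK_BLOCKED_IO };

struct task {
    struct trap_frame frame;
    enum task_state state;
};

/* `syscall` (0F 05) and `int 0x80` (CD 80) are both 2 bytes long. */
#define SYSCALL_INSN_LEN 2

/* Called when a read on a slow file finds no data. Instead of arranging a
 * later copy into the user buffer, point the task back at its own syscall
 * instruction and suspend it. rax is deliberately not overwritten with a
 * return value, so it still holds the syscall number for the retry. */
void block_and_rewind(struct task *t)
{
    t->frame.rip -= SYSCALL_INSN_LEN;
    t->state = TASK_BLOCKED_IO;
}

/* Called by the producer (e.g. the keyboard IRQ) once data exists. */
void wake_task(struct task *t)
{
    assert(t->state == TASK_BLOCKED_IO);
    t->state = TASK_RUNNING;   /* when scheduled, it re-issues read() */
}
```

The trade-off is that correctness now depends on every entry path having a known, fixed instruction length and on the kernel leaving no partial side effects behind before it blocks.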

So what do you guys think?

Posted: **Tue Feb 25, 2014 2:48 pm**

Please note that Unix read() never returns 0 unless it is end-of-file. It promises to do at least 1 byte of input, or fail with an error of -1, or return 0 signifying an end-of-file condition.

I recommend you implement actual blocking system calls rather than whatever you have going here. I experimented with something similar when I started out (though I didn't return to user-space as such), and it blew up into a monstrosity of cooperative multitasking that took many months to dismantle and redo properly. Instead, you should simply have a small kernel stack for each thread and allow each thread to be either in user-space or in kernel-mode. You can then implement blocking system calls by having the thread, while in kernel-mode, wait on a condition variable or something similarly traditional. This is much simpler and much nicer.
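The condition-variable approach can be modelled in user space with POSIX threads. This is an analogy, not kernel code: `struct chan`, `chan_put` and `chan_read` are hypothetical, the mutex and condition variable stand in for whatever wait primitive the kernel provides, and in a real kernel the reader would be sleeping on its own kernel stack inside the read syscall.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A byte channel: a producer (think keyboard driver) fills it and a
 * reader blocks on it until data arrives. */
struct chan {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    char   buf[64];
    size_t count;
};

void chan_init(struct chan *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->nonempty, NULL);
    c->count = 0;
}

/* Producer side: append one byte (dropped if full) and wake a reader. */
void chan_put(struct chan *c, char ch)
{
    pthread_mutex_lock(&c->lock);
    if (c->count < sizeof c->buf)
        c->buf[c->count++] = ch;
    pthread_cond_signal(&c->nonempty);
    pthread_mutex_unlock(&c->lock);
}

/* Reader side: blocks until at least one byte is available, copies up to
 * len bytes straight into the caller's buffer and returns the count.
 * The waiting happens inside this call; nothing is retried by the caller. */
size_t chan_read(struct chan *c, char *out, size_t len)
{
    assert(len > 0);
    pthread_mutex_lock(&c->lock);
    while (c->count == 0)                  /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->nonempty, &c->lock);
    size_t n = c->count < len ? c->count : len;
    for (size_t i = 0; i < n; i++)
        out[i] = c->buf[i];
    for (size_t i = n; i < c->count; i++)  /* shift the remaining bytes down */
        c->buf[i - n] = c->buf[i];
    c->count -= n;
    pthread_mutex_unlock(&c->lock);
    return n;
}
```

Because the copy into `out` happens after the wait returns, in the same call, there is no need for a separate "copy behind the scenes" step.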

Posted: **Tue Feb 25, 2014 4:52 pm**

Typically you might have both kinds of read end up in the same place in the kernel, with 'slow files' blocking (perhaps on a mutex/semaphore) until there's data available, at which point the same code path continues and the syscall returns. For 'fast files', blocking will not occur; for 'slow files', it will. Because this all happens on the kernel side of the syscall, there's no need to retry syscalls in userspace (by playing with the IP, etc.).

Have you considered such a design?
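One way to get that single code path, sketched under assumptions: a hypothetical `file_ops` table whose `read` op reports a hypothetical `WOULD_BLOCK` code and whose `wait` op sleeps until data may be available. The retry loop lives inside `sys_read`, so user space never re-issues anything. `demo_read` and `demo_wait` fake a slow file whose "sleep" delivers one keystroke.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define WOULD_BLOCK (-2)   /* hypothetical internal code, never returned to user space */

struct file;
struct file_ops {
    ssize_t (*read)(struct file *f, char *buf, size_t len);
    void    (*wait)(struct file *f);   /* sleep until data may be available */
};

struct file {
    const struct file_ops *ops;
    void *priv;
};

/* One path for every file. Fast files never return WOULD_BLOCK, so they
 * never reach wait(); slow files sleep and retry here, in the kernel. */
ssize_t sys_read(struct file *f, char *buf, size_t len)
{
    assert(len > 0);
    for (;;) {
        ssize_t n = f->ops->read(f, buf, len);
        if (n != WOULD_BLOCK)
            return n;
        f->ops->wait(f);
    }
}

/* Demo slow file: priv points at a count of pending bytes. */
static ssize_t demo_read(struct file *f, char *buf, size_t len)
{
    int *pending = f->priv;
    if (*pending == 0)
        return WOULD_BLOCK;
    size_t n = (size_t)*pending < len ? (size_t)*pending : len;
    for (size_t i = 0; i < n; i++)
        buf[i] = 'k';
    *pending -= (int)n;
    return (ssize_t)n;
}

/* Stands in for "the scheduler runs other tasks until an IRQ arrives". */
static void demo_wait(struct file *f)
{
    int *pending = f->priv;
    *pending += 1;   /* simulate one key press arriving while we sleep */
}

const struct file_ops demo_slow_ops = { demo_read, demo_wait };
```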

> sortie wrote: It promises to do at least 1 byte of input, or fail with an error of -1, or return 0 signifying an end-of-file condition.

Just to build on this: if you're working with a non-blocking descriptor and the read would block (e.g., reading from a socket with no data pending), you'll get back -1 with errno=EAGAIN.
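This is easy to observe from user space with an ordinary POSIX pipe: with `O_NONBLOCK` set and nothing written yet, `read()` fails immediately instead of sleeping. POSIX allows the error to be either `EAGAIN` or `EWOULDBLOCK` (they are the same value on Linux), so the check below accepts both. The function name is just for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if reading an empty non-blocking pipe fails with EAGAIN
 * (or EWOULDBLOCK), 0 otherwise. */
int empty_nonblocking_read_gives_eagain(void)
{
    int fds[2];
    char c;
    if (pipe(fds) != 0)
        return 0;

    int flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }

    ssize_t n = read(fds[0], &c, 1);   /* write end is open but empty */
    int saved = errno;                 /* close() may change errno */
    close(fds[0]);
    close(fds[1]);
    return n == -1 && (saved == EAGAIN || saved == EWOULDBLOCK);
}
```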

> Because I thought it would require a lot of fiddling to make the copy to userspace behind the scenes like that, what if I just rewind the instruction pointer of the task so that it will re-issue the same syscall again the first thing it does when it wakes up and this time it won't get blocked because now there is data to fetch. I actually got this to work and thought it was a pretty smooth solution.

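This seems technically possible, but you'll want to make sure that winding back the instruction pointer doesn't cause memory leaks or leave the task in an inconsistent state (for example, if the kernel had already modified registers or consumed part of the data before deciding to block).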

The "POSIX" way to retry a syscall is to return -1 with errno=EINTR, which the application is supposed to handle by trying to perform the syscall again. Granted, that is typically caused by signals, but there's no reason why you couldn't have a similar concept.
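For a program that does choose to retry, the conventional idiom is a small wrapper around `read()`; `read_retry` is a hypothetical name. The retry is the application's decision, not something the kernel requests.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* If read() was interrupted by a signal before transferring any data,
 * try again. Every other result, including other errors and 0
 * (end-of-file), is passed straight through. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

Usage: `read_retry(fd, buf, sizeof buf)` behaves like `read()`, except that a signal arriving mid-wait does not surface as an error.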

Posted: **Tue Feb 25, 2014 5:26 pm**

In fact, even 'fast files' need to block when disk I/O needs to be performed.

To get reasonable performance, you will need to queue your disk I/O until the disk subsystem is ready and wake up the reader when the I/O is complete. If you treat disk reads and writes as synchronous, your system will feel very slow.
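A minimal sketch of such a request queue, with all names hypothetical: `io_submit` appends a request (a real kernel would then block the calling task and start the disk if it is idle), and `io_complete`, meant to run from the disk's completion interrupt, retires the oldest request and reports which task to wake. A real driver would also start the next transfer and might reorder requests to reduce seeks; that is left out here.

```c
#include <assert.h>
#include <stddef.h>

enum req_state { REQ_QUEUED, REQ_DONE };

struct io_request {
    unsigned long      block;    /* which disk block to read */
    enum req_state     state;
    int                waiter;   /* id of the task sleeping on this request */
    struct io_request *next;
};

struct io_queue {
    struct io_request *head, *tail;
};

/* Append a request in FIFO order. */
void io_submit(struct io_queue *q, struct io_request *r)
{
    assert(r != NULL);
    r->state = REQ_QUEUED;
    r->next = NULL;
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
}

/* Completion handler: finishes the oldest request and returns the id of
 * the task to wake, or -1 if nothing was queued. */
int io_complete(struct io_queue *q)
{
    struct io_request *r = q->head;
    if (!r)
        return -1;
    q->head = r->next;
    if (!q->head)
        q->tail = NULL;
    r->state = REQ_DONE;
    return r->waiter;
}
```

While a request sits in the queue, its owner is asleep and other tasks can run, which is where the responsiveness comes from.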

Posted: **Tue Feb 25, 2014 5:35 pm**

Err, EINTR means that the blocking system call was interrupted by a signal whose handler didn't have the 'automatically restart the system call' flag set. It doesn't mean that the system call should be restarted; rather, the code got a meaningful result: the system call was interrupted by a signal. What that means is up to the program, and it's not uncommon for a program to do something other than resume it.

There is no POSIX error code that lets the kernel tell user-space (libc) that the system call should be resumed, as that would be an operating system internal: the error code would never reach the application programmer, and POSIX is an interface meant for application programmers to use.

Posted: **Tue Feb 25, 2014 5:42 pm**

Ah yes, I forgot about sigaction and SA_RESTART. Thanks for picking that up.

My wording of "application is supposed to handle" should have been "application can choose to handle".
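For reference, this is how the flag is set with `sigaction()`. The handler and `install_alarm_handler` are illustrative names; `sigaction`, `SA_RESTART` and `SIGALRM` are standard POSIX. Exactly which calls get restarted is system-specific (on Linux, see signal(7)).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static void on_alarm(int sig)
{
    (void)sig;   /* nothing to do; only the restart behaviour matters here */
}

/* Installs a SIGALRM handler. With restart != 0, a slow call such as a
 * blocking read() that the signal interrupts is resumed transparently;
 * with restart == 0 it fails with -1 and errno == EINTR, and the program
 * decides what to do. Returns 0 on success, -1 on error. */
int install_alarm_handler(int restart)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(SIGALRM, &sa, NULL);
}
```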

Posted: **Wed Feb 26, 2014 1:38 am**

What you describe is quite similar to what an interrupt does: it redirects the task's execution to a particular point and suspends the rest until the handler is done. Is what you're trying to do roughly the equivalent of that, using threads and semaphore signaling?

I thought about doing something similar at some point: a system that can redirect a thread's execution to a specific point and then resume where it left off once that function has run. The idea was to execute some code that handles an I/O event before resuming the task, for the scenario where the task blocks waiting for a well-defined event, so that the task can process the event in a specific function before resuming normally.

I thought it could be a cool way to get something similar to the AsyncTask mechanism on Android: a callback in the task that runs when the background task has finished. It can be used to do any preprocessing on data the task needs before it resumes execution at the point where it left off.

Or to have some kind of interrupt-like handler that runs in the 'waiting task' when the I/O task is done, before the task resumes normally. Where there is this sort of blocking I/O, it would then become non-blocking I/O, but a specific handler would be "called" when the I/O task finishes. The code that handles the I/O data would 'wake up' and run the code that needed the data, as a sort of pre-switch callback, before switching back to the task where it left off, a bit like an interrupt.

If the application needs to read a 'slow file' and then use the data for something, it has to wait either for another task to complete or for an interrupt to fire before the data is available.

If you can already separate out the bit of code that needs to run when the 'I/O task' has completed, you can put that code in a separate function and pass a pointer to this function to the read call.

The read call stores this function pointer, associated with the task and the I/O stream. When the stream's data becomes available, either from an interrupt or from another task, the function pointer is scheduled for execution in the 'waiting'/'reading' task, so that the task resumes at the address of the callback the next time execution switches back to that thread.

When the scheduler switches back to the 'waiting'/'reading' task, it pushes the resume address onto the stack as the return address, then starts execution at the callback function, which returns to the normal code address once the routine handling the I/O result has run.

From the point of view of the code that runs when the data is available, it looks exactly as if the thread had blocked before running it; for the rest of the application, it looks as if the routine had run like an interrupt handler when the data became available.
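The callback scheme can be modelled roughly as follows; every name here is hypothetical. `read_async` records the callback instead of blocking, `io_ready` marks it pending when the data arrives, and `before_resume` runs it just before the scheduler returns to the task. A real kernel would push a frame onto the user stack so the callback returns into the interrupted code, much like Unix signal-handler delivery; this sketch simply calls it directly.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*io_callback)(const char *data, size_t len, void *ctx);

struct task {
    io_callback pending_cb;     /* set when I/O completes, run on next switch-in */
    void       *cb_ctx;
    const char *cb_data;
    size_t      cb_len;
    io_callback registered_cb;  /* stored by the non-blocking read call */
};

/* Non-blocking read: remember what to run later and return immediately. */
void read_async(struct task *t, io_callback cb, void *ctx)
{
    t->registered_cb = cb;
    t->cb_ctx = ctx;
}

/* Called by the producer (an IRQ handler or the I/O task) once data exists. */
void io_ready(struct task *t, const char *data, size_t len)
{
    assert(t->registered_cb != NULL);
    t->pending_cb = t->registered_cb;
    t->registered_cb = NULL;
    t->cb_data = data;
    t->cb_len = len;
}

/* Called by the scheduler just before it switches back to the task. */
void before_resume(struct task *t)
{
    if (t->pending_cb) {
        io_callback cb = t->pending_cb;
        t->pending_cb = NULL;   /* run each completion exactly once */
        cb(t->cb_data, t->cb_len, t->cb_ctx);
    }
}

/* Example callback: adds the number of bytes it was handed to *ctx. */
void count_bytes(const char *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}
```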

Opinion on method of blocking IO

Opinion on method of blocking IO

Re: Opinion on method of blocking IO

Re: Opinion on method of blocking IO

Re: Opinion on method of blocking IO

Re: Opinion on method of blocking IO

Re: Opinion on method of blocking IO

Re: Opinion on method of blocking IO