Asynchronous and/or non-blocking IO for disks?
Posted: Sun Apr 22, 2007 11:58 am
I'm currently wondering how to handle asynchronous I/O for file-systems. For stuff like pipes or sockets it's easy to notify a process that all data was sent, or that there's new data to be read. In fact, this works quite beautifully in my system right now. However, even with sockets, there are some issues if one wants to build the system in a completely asynchronous way: operations like TCP connect() can take a long time. I'm not worried about those though, because that's relatively easy to solve: just add a non-blocking connect() and then report the success/failure as an event, or something.
However, file-systems are nasty, for several reasons actually. Assuming we have an open file, reading/writing is more or less instant if all the required data is in the buffer cache, so there's hardly any point in making such operations non-blocking. But what to do when actual disk I/O (most of the time reading) into the buffer cache needs to be done? That would be an I/O wait one would like to make non-blocking... but how? Would it be acceptable to fail the request with "would block", then start the I/O operation, and then notify the process when the I/O operation completes? Should there be some guarantee that the data isn't evicted from the buffer cache before the process has a chance to retry its read()/write()? A kernel-side, file-descriptor-specific buffer could solve this, but then it's another level of indirection...
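To make the "fail with would-block, start the I/O, notify later" idea concrete, here is a rough user-space sketch in C. Everything in it is hypothetical (`nb_read_block`, `disk_complete`, `try_evict` and the `pinned` flag are made-up names, not an existing kernel API), and it assumes a single waiting reader per block. The pin is one possible answer to the "don't lose the cached data before the retry" question, not the only one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS    4
#define BLOCK_SIZE 16

enum block_state { BLOCK_ABSENT, BLOCK_LOADING, BLOCK_CACHED };

struct cache_block {
    enum block_state state;
    bool pinned;              /* set while a reader is still owed this block */
    char data[BLOCK_SIZE];
};

/* Hypothetical buffer cache; zero-initialised, so every block starts ABSENT. */
static struct cache_block cache[NBLOCKS];

/* Non-blocking read of one block.
 * Returns BLOCK_SIZE on a cache hit, or -EAGAIN after making sure a
 * disk read is in flight for the block. */
int nb_read_block(int blk, char *out)
{
    assert(blk >= 0 && blk < NBLOCKS);
    struct cache_block *b = &cache[blk];

    if (b->state == BLOCK_CACHED) {
        memcpy(out, b->data, BLOCK_SIZE);
        b->pinned = false;            /* the promise to the reader is fulfilled */
        return BLOCK_SIZE;
    }
    if (b->state == BLOCK_ABSENT) {
        b->state = BLOCK_LOADING;     /* a real kernel would queue the disk request here */
        b->pinned = true;             /* keep the block until the reader retries */
    }
    return -EAGAIN;
}

/* Stand-in for the disk interrupt handler: the data has arrived.
 * A real kernel would also post a "readable" event to the process here. */
void disk_complete(int blk, const char *from_disk)
{
    assert(blk >= 0 && blk < NBLOCKS);
    memcpy(cache[blk].data, from_disk, BLOCK_SIZE);
    cache[blk].state = BLOCK_CACHED;
}

/* Cache pressure: refuse to evict a block that is loading or pinned. */
bool try_evict(int blk)
{
    assert(blk >= 0 && blk < NBLOCKS);
    if (cache[blk].state != BLOCK_CACHED || cache[blk].pinned)
        return false;
    cache[blk].state = BLOCK_ABSENT;
    return true;
}
```

The obvious weakness is that a process which never retries keeps the block pinned forever, so a real design would probably need a timeout or a per-process limit on pinned blocks.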
Maybe non-blocking file I/O is an ill-defined idea? Maybe one should use special asynchronous I/O directly to user-space buffers instead? That would be relatively easy to do, and would avoid most of the problems with non-blocking.
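For comparison, the asynchronous alternative could look roughly like the following C sketch: the caller hands over a buffer, gets a request id back immediately, and later receives a completion event. All names here (`async_read`, `async_cancel`, `kernel_complete_one`, `next_event`) are invented for illustration, and the "disk" is just a static string. The fixed-size queues are an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_REQ 8

/* Simulated disk contents; stands in for a real block device. */
static const char disk[] = "0123456789abcdef";

struct io_event { int id; long result; };   /* result: bytes read or -errno */

struct io_request {
    int id;
    int active;
    char *user_buf;       /* destination in the caller's address space */
    size_t len;
    size_t offset;
};

static struct io_request pending[MAX_REQ];
static struct io_event events[MAX_REQ];
static int n_events;
static int next_id = 1;

static void post_event(int id, long result)
{
    assert(n_events < MAX_REQ);   /* sketch only: no handling of a full event queue */
    events[n_events].id = id;
    events[n_events].result = result;
    n_events++;
}

/* Submit a read and return at once: a request id (> 0), or -EAGAIN if the queue is full. */
int async_read(char *user_buf, size_t len, size_t offset)
{
    for (int i = 0; i < MAX_REQ; i++) {
        if (!pending[i].active) {
            pending[i].id = next_id++;
            pending[i].active = 1;
            pending[i].user_buf = user_buf;
            pending[i].len = len;
            pending[i].offset = offset;
            return pending[i].id;
        }
    }
    return -EAGAIN;
}

/* Cancel a request that has not completed yet; the caller still gets an event. */
int async_cancel(int id)
{
    for (int i = 0; i < MAX_REQ; i++) {
        if (pending[i].active && pending[i].id == id) {
            pending[i].active = 0;
            post_event(id, -ECANCELED);
            return 0;
        }
    }
    return -ENOENT;
}

/* Stand-in for the disk driver: finish one pending request, copying
 * straight into the user buffer. Returns 1 if a request was completed. */
int kernel_complete_one(void)
{
    for (int i = 0; i < MAX_REQ; i++) {
        struct io_request *r = &pending[i];
        if (!r->active)
            continue;
        size_t size = sizeof disk - 1;
        size_t avail = r->offset < size ? size - r->offset : 0;
        size_t n = r->len < avail ? r->len : avail;
        memcpy(r->user_buf, disk + r->offset, n);
        r->active = 0;
        post_event(r->id, (long)n);
        return 1;
    }
    return 0;
}

/* Fetch the oldest completion event; returns 1 if one was stored in *ev. */
int next_event(struct io_event *ev)
{
    if (n_events == 0)
        return 0;
    *ev = events[0];
    memmove(events, events + 1, (size_t)(n_events - 1) * sizeof events[0]);
    n_events--;
    return 1;
}
```

Since the data lands in the caller's buffer, there is no question of the cache being lost before a retry, and `async_cancel` gives the user a way out of an operation that is taking forever.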
But what to do with directory lookups and stuff like open() then? Resolving a path on a floppy, or over a network, could well take enough time that I'd rather not unconditionally block a thread for the whole time... so ehm? Ideas? With readdir() I think it's possible to simply read ahead one entry asynchronously, and then fail if the read-ahead isn't ready, but open() would need connect()-like partially open file descriptors or something. And what to do with other system calls which don't normally need any sort of file descriptor at all (rename(), link(), mkdir(), etc.)?
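The "partially open descriptor" idea for open() might be modelled like this, by analogy with a non-blocking connect() that reports EINPROGRESS. This is only a sketch: `open_nb`, `lookup_done`, `fd_status` and `fd_close` are hypothetical names, and the path resolution itself is not simulated. It does not answer the rename()/link()/mkdir() question, which would probably need some kind of operation handle instead of a file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_FDS  8
#define MAX_PATH 64

enum fd_state { FD_FREE, FD_OPENING, FD_OPEN, FD_FAILED };

struct fd_entry {
    enum fd_state state;
    int error;                 /* errno value if the lookup failed */
    char path[MAX_PATH];
};

/* Hypothetical per-process descriptor table; all slots start FD_FREE. */
static struct fd_entry fds[MAX_FDS];

/* Start opening a file and return immediately with a descriptor
 * in the OPENING state (or a negative errno). */
int open_nb(const char *path)
{
    assert(path != NULL);
    if (strlen(path) >= MAX_PATH)
        return -ENAMETOOLONG;
    for (int fd = 0; fd < MAX_FDS; fd++) {
        if (fds[fd].state == FD_FREE) {
            fds[fd].state = FD_OPENING;
            fds[fd].error = 0;
            strcpy(fds[fd].path, path);   /* length checked above */
            /* a real kernel would queue the path lookup here */
            return fd;
        }
    }
    return -EMFILE;
}

/* Called when path resolution finishes; error is 0 or an errno value.
 * Ignored if the descriptor was closed (cancelled) in the meantime. */
void lookup_done(int fd, int error)
{
    assert(fd >= 0 && fd < MAX_FDS);
    if (fds[fd].state != FD_OPENING)
        return;
    fds[fd].state = error ? FD_FAILED : FD_OPEN;
    fds[fd].error = error;
    /* a real kernel would also post an "open finished" event here */
}

/* 0 if usable, -EINPROGRESS while opening, -errno if the open failed. */
int fd_status(int fd)
{
    if (fd < 0 || fd >= MAX_FDS || fds[fd].state == FD_FREE)
        return -EBADF;
    if (fds[fd].state == FD_OPENING)
        return -EINPROGRESS;
    if (fds[fd].state == FD_FAILED)
        return -fds[fd].error;
    return 0;
}

/* Closing works in any state, so it doubles as "cancel the open". */
int fd_close(int fd)
{
    if (fd < 0 || fd >= MAX_FDS || fds[fd].state == FD_FREE)
        return -EBADF;
    fds[fd].state = FD_FREE;
    return 0;
}
```

A read() or write() on such a descriptor would simply return `fd_status(fd)` while it is non-zero, so the process never blocks in the kernel waiting for the lookup.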
Why do I care? Well, I'd like to avoid "hung" applications that are waiting for the kernel to do something that will take a day, because stuff isn't going right and the kernel needs to do retries and whatnot, while there's no way for the user to cancel the operation because we're stuck in the kernel.
Maybe I should just forget about a POSIX-like interface for good? Maybe that would help?