"Connectionless" file I/O (no open()/close())

NickJohnson · Post by **NickJohnson** » Wed Jul 01, 2009 6:41 pm

I was working on the design for I/O in my system (see this thread for details), when I realized that if I made it so files didn't have to be opened before they were read, it would radically simplify many mechanisms, especially facilities for closing files on process exit/crash. The idea would be to have files be sort of "connectionless", like UDP packets, and have read and write calls be security checked by the drivers at every instance instead of once. I don't see much of a speed penalty to this, because my security system is quite simple to enforce (and the reduction in other areas would greatly outweigh it).

The only thing that I foresee as a problem is that because files cannot be "open" or "closed", they also cannot be easily locked. It would still be possible to get around this by giving drivers a mechanism to restrict reads and/or writes to a specific pid instead of a uid or gid, thereby effectively locking the file.
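A minimal sketch of what such a per-request check might look like inside a driver. Everything here is an assumption made for illustration: the `file_meta` and `caller` structures, the Unix-style mode bits, and the `lock_pid` field are hypothetical and not taken from the actual system.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-file metadata kept by the driver (illustrative names). */
struct file_meta {
    uint32_t owner_uid;
    uint32_t owner_gid;
    uint16_t mode;      /* Unix-style rwxrwxrwx bits */
    int32_t  lock_pid;  /* pid holding an exclusive lock, or -1 if unlocked */
};

/* Hypothetical identity of the caller, delivered with every request. */
struct caller {
    int32_t  pid;
    uint32_t uid;
    uint32_t gid;
};

enum { ACCESS_READ = 4, ACCESS_WRITE = 2 };

/* Run on every read/write request; there is no open() to cache the result. */
bool request_allowed(const struct file_meta *f, const struct caller *c, int want)
{
    /* A pid lock overrides the uid/gid rules: only the locker gets through. */
    if (f->lock_pid != -1 && f->lock_pid != c->pid)
        return false;

    int shift;
    if (c->uid == f->owner_uid)
        shift = 6;            /* owner bits */
    else if (c->gid == f->owner_gid)
        shift = 3;            /* group bits */
    else
        shift = 0;            /* other bits */

    return ((f->mode >> shift) & want) == want;
}
```

The whole check is a handful of integer comparisons, which is the basis for the claim that doing it on every call costs little. Note that the lock only restricts access while `lock_pid` is set; something still has to clear it if the locker dies.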

I'm talking exclusively about mechanism as well, not interface - open() and close() calls would exist in the C library and be necessary for handling of usermode file structures - it's just that the system itself would not use the concept of files being "open" or "closed".

Are there any other caveats to having such a "connectionless" I/O architecture? And if it really is this easy, why are there no other systems like this?

piranha · Post by **piranha** » Wed Jul 01, 2009 8:20 pm

On one hand, yes, it makes things easier in the front end. But in the long run...what happens if one process deletes a file and then another reads from it at the same time? There's going to have to be a lot of checking, and locking and spinlock thingies to make it work without megadeath.

In my opinion, open() and close() are useful (kernel can keep track of resources, you know that files can or cannot be deleted at certain times, and having file descriptors can make other things easier (read 0, instead of read /devices/keyb)). Also, what happens if a device is renamed? Using file descriptors, it won't matter. But otherwise, you're going to have to go through every line of code that references that device and change it.

My opinion is that open() and close() are useful, and are not that difficult to implement (per process: file table. globally: master table. Generally: error checking, and space allocation, simple things.)

If you're going for POSIX, use open and close. Otherwise, I would still suggest them. You never know when you may need to see what processes are using what device.

-JL

NickJohnson · Post by **NickJohnson** » Wed Jul 01, 2009 9:50 pm

piranha wrote:On one hand, yes, it makes things easier in the front end. But in the long run...what happens if one process deletes a file and then another reads from it at the same time? There's going to have to be a lot of checking, and locking and spinlock thingies to make it work without megadeath.

But if you want to have a multithreaded filesystem implementation (which the "at the same time" implies), you're going to have to use various complex locking methods anyway, right? If one process deletes a file, then another reads from it, the reader will just get an error.

piranha wrote:In my opinion, open() and close() are useful (kernel can keep track of resources, you know that files can or cannot be deleted at certain times, and having file descriptors can make other things easier (read 0, instead of read /devices/keyb)). Also, what happens if a device is renamed? Using file descriptors, it won't matter. But otherwise, you're going to have to go through every line of code that references that device and change it.

Actually, the way I have it, the system *does* use a sort of file descriptor model, even though it doesn't make connections. Doing a VFS query every access would be insane. My VFS is sort of like a phonebook - the user process uses a call called find() (directed to the VFS) to get a driver pid and inode number for a specific file. This information is then kept by the process like a file descriptor, but in userspace. If the name of a file changes, its inode does not, so that is actually a non-issue. It's kind of like each process keeps a table of hardlinks to files.
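A rough sketch of the phonebook idea, under stated assumptions: only the name find() and the (driver pid, inode) pair come from the description above. The `file_ref` structure, the in-memory `phonebook` array standing in for the VFS, and the `my_open()`/`my_close()` wrappers are hypothetical, and the fixed-size table is a simplification for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical result of find(): which driver owns the file, and its inode. */
struct file_ref {
    int32_t  driver_pid;
    uint64_t inode;
};

/* Stand-in for the VFS "phonebook"; a real system would make an IPC call. */
struct vfs_entry { const char *path; struct file_ref ref; };

static const struct vfs_entry phonebook[] = {
    { "/devices/keyb", { 3, 1 } },
    { "/home/log.txt", { 7, 42 } },
};

/* Returns 0 and fills *out on success, -1 if the name is unknown. */
int find(const char *path, struct file_ref *out)
{
    for (size_t i = 0; i < sizeof phonebook / sizeof phonebook[0]; i++) {
        if (strcmp(phonebook[i].path, path) == 0) {
            *out = phonebook[i].ref;
            return 0;
        }
    }
    return -1;
}

#define MAX_REFS 16  /* fixed size only to keep the example short */

/* Per-process table living entirely in userspace; drivers never see it. */
static struct file_ref ref_table[MAX_REFS];
static int ref_used[MAX_REFS];

/* libc-level open(): one VFS lookup, then only local bookkeeping. */
int my_open(const char *path)
{
    struct file_ref r;
    if (find(path, &r) != 0)
        return -1;
    for (int fd = 0; fd < MAX_REFS; fd++) {
        if (!ref_used[fd]) {
            ref_table[fd] = r;
            ref_used[fd] = 1;
            return fd;
        }
    }
    return -1; /* table full */
}

/* libc-level close(): forgets the reference; no message is sent to anyone. */
int my_close(int fd)
{
    if (fd < 0 || fd >= MAX_REFS || !ref_used[fd])
        return -1;
    ref_used[fd] = 0;
    return 0;
}
```

A later read or write would send `ref_table[fd].inode` to `ref_table[fd].driver_pid`, so the file name is only needed once, at lookup time.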

I can also easily adapt the interface to a POSIX compliant one, because (as I said) the open() and close() functions in the libc are still needed, but only for a process's own bookkeeping. The connectionless-ness is on the driver level, not the API level. But I do see how it would be useful to keep track of which processes are using which files - but probably not useful enough to outweigh the advantages of this idea (esp. in context of some difficulties with my design and connections, mostly having to do with the IPC method).

piranha · Post by **piranha** » Wed Jul 01, 2009 11:28 pm

Yeah, but it takes more error checking, and in my opinion will be harder to coordinate. If a file is being read, and another tries to delete it, it should not be deleted, so you need a system to do that.

-JL

NickJohnson · Post by **NickJohnson** » Thu Jul 02, 2009 12:06 am

But why would you necessarily need to make sure that one process doesn't delete another process's file? There's no absolute reason that policy needs to be maintained. A process that is able to remove a file is already able to write to said file, and therefore can interfere with the reader either way. If a process really needs to lock a file, locking mechanisms are still perfectly possible, but just a bit more risky if the locker crashes (a problem that could be solved by occasional polling on the driver's part, because locks would be infrequent).

I'm also not sure what you mean by "hard to coordinate". The difference between this and a connection oriented system is that there is no coordination *at all* - nobody has to keep track of anyone else, except for some pids. There is no code or data at all that keeps track of connections - that's all replaced with a bit of extra security checking. What's faster, comparing a couple of ID numbers, or searching for a file descriptor and comparing an approximately equal amount of permission flags? And regardless of scale (which favors the former anyway), one is O(1) and another is probably O(nlgn).

Brendan · Post by **Brendan** » Thu Jul 02, 2009 1:13 am

Hi,

piranha wrote:Yeah, but it takes more error checking, and in my opinion will be harder to coordinate. If a file is being read, and another tries to delete it, it should not be deleted, so you need a system to do that.

Most OS's put a lot of work into improving "throughput". This includes having a reference count on each file (in the VFS), so that one process can ask for a file to be deleted while other processes are using the file - basically the file isn't actually deleted until the reference count becomes zero.
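The deferred delete described above can be sketched as follows. This is only an illustration of the general technique; the `vnode` structure and function names are hypothetical and not taken from any particular OS.

```c
#include <stdbool.h>

/* Hypothetical VFS node; field names are illustrative only. */
struct vnode {
    int  refcount;        /* number of outstanding open() calls */
    bool unlink_pending;  /* delete was requested while still in use */
    bool exists;          /* false once the storage has been released */
};

void vnode_open(struct vnode *v)
{
    v->refcount++;
}

/* Releases the storage only if nobody is using the file any more. */
static void vnode_maybe_free(struct vnode *v)
{
    if (v->unlink_pending && v->refcount == 0)
        v->exists = false;
}

void vnode_close(struct vnode *v)
{
    v->refcount--;
    vnode_maybe_free(v);
}

/* Delete request: returns at once, takes effect when the last user closes. */
void vnode_unlink(struct vnode *v)
{
    v->unlink_pending = true;
    vnode_maybe_free(v);
}
```

The count is only meaningful because every `vnode_open()` is eventually paired with a `vnode_close()`, which is exactly the information a design without open/close does not have.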

However, improving throughput also includes file sharing. For example, 3 processes should be allowed to read from the same file at the same time. This only works properly because of open() and close(), because the OS knows if a process intends to write to a file (even if it hasn't done a write yet), and the OS knows exactly how many processes are using a file, and exactly when each process finishes using the file. Some OSs even have "append" access (so that one process can write to the end of a file while other processes are reading it, which is good for things like logs).

Without open/close it doesn't work.

Imagine if 3 processes start reading from a file. This doesn't cause any problems so you allow the reads. Then one of the processes tries to do a write. Do you allow this write? You don't know if other processes are still reading from the file or not (no "close()"), so if you refuse to allow the write you may be refusing for no reason at all, and if you do refuse to allow the write then how long do you refuse for (until nobody has read for some arbitrary length of time, until the other processes terminate, until hell freezes over)? If you allow the write, then what happens if the other processes haven't finished reading? Do they get inconsistent data (half old and half new)? Do you refuse the reads from the other processes after a write occurs (and if so, how long for)?

Even if you get rid of all file sharing, then it still doesn't work because you've got no idea when any process has finished using any file (and when it's safe to allow another process to access the file). Maybe if a process accesses a file you could refuse to allow any other processes to access the same file until the first process has terminated; but this would severely cripple the OS (in terms of file sharing "throughput"), and it would also create some extra "denial of service" problems (e.g. a process that reads one byte from as many files as it can then repeatedly does "sleep()", which prevents any process from accessing any of these files forever).

Cheers,

Brendan

Combuster · Post by **Combuster** » Thu Jul 02, 2009 1:16 am

- ABA Problem
- Not being able to have exclusive access to resources
- Having to pass the filename every syscall.

open/close methods would be O(1) when accessing data, since the input range is restricted and you can often use lookup tables. connectionless methods would be O(nlogn) since you can't "index" a filename. And then you still have to look up the file in some permission structure.

Edit: Brendan beat me to some of the points

jal · Post by **jal** » Thu Jul 02, 2009 4:52 am

Though I agree with Brendan and Combuster on traditional file systems, a database approach would circumvent these problems. Yes, it creates other problems, but those have been solved various ways in the various database designs.

Even for traditional file systems, there may be no need to pass file names every time: one could do a lookup to get a GUID, and use the GUID instead. Very often, an application doesn't really care whether a file was changed. And if it does, it could always establish an exclusive lock, or a "change protection". The first would deny writing, the second could be a 'copy on write' approach, the old version being retained until the application either explicitly yields or exits (in case of a versioning file system, this would become a moot point, as writing is always done to a different/new version).

So, in conclusion, I basically agree with the OP, and though Brendan and Combuster make valid points, they are not the most innovative (when it comes to OS design) persons, imho.

JAL

Combuster · Post by **Combuster** » Thu Jul 02, 2009 5:58 am

jal wrote:one could do a lookup to get a GUID and use the GUID instead. Very often, an application doesn't really care whether a file was changed. And if it does, it could always establish an exclusive lock

In other words, add one syscall that does open() read() close() in one go, next to the separate open() (get lock) and close() (release lock) syscalls.

jal wrote:So, in conclusion, I basically agree with the OP, and though Brendan and Combuster make valid points, they are not the most innovative (when it comes to OS design) persons, imho.

Ignoring the obvious troll, I hope that the OP realizes that his design was braindead (for not accounting for any of the disadvantages). It is technically *not* possible to make a system without cleanup that still has any form of locking/transactions, and all the obvious ways to fix things involve adding open/close calls.

I'd almost bet a beer on it that the OP didn't think of a transactional model when he drew up that concept.

Brendan · Post by **Brendan** » Thu Jul 02, 2009 5:59 am

Hi,

jal wrote:Though I agree with Brendan and Combuster on traditional file systems, a database approach would circumvent these problems. Yes, it creates other problems, but those have been solved various ways in the various database designs.

I don't see how a database approach changes anything at all, unless you're talking of caching all writes and doing an atomic commit (if any data that was relied on remained unchanged). For databases this works because fields are typically small, but for a filesystem you may need to cache 50 GiB of "pending writes" in 2 GiB of RAM.

jal wrote:Even for traditional file systems, there may be no need to pass file names every time: one could do a lookup to get a GUID, and use the GUID instead. Very often, an application doesn't really care whether a file was changed. And if it does, it could always establish an exclusive lock, or a "change protection".

So instead of calling "open()" and later on calling "close()", you call "get_GUID()" and "get_lock()" and later on you call "free_lock()", and that's meant to be an improvement? Why not combine them into a single "get_GUID_and_lock()" function so that it's functionally exactly the same as "open()"?

jal wrote:So, in conclusion, I basically agree with the OP, and though Brendan and Combuster make valid points, they are not the most innovative (when it comes to OS design) persons, imho.

Innovative is good, but innovative *and* practical is even better. If you start with something innovative (but not practical), and then someone finds all the practical problems with it, and then you find solutions to all of these practical problems, *then* you've got something that's both innovative and practical. Of course if nobody mentions the practical problems then eventually (after wasting several years) you'll find them the hard way...

Cheers,

Brendan

ru2aqare · Post by **ru2aqare** » Thu Jul 02, 2009 6:11 am

Connectionless (UDP) network transfers work because 1) no one else has access to the data stream between your computer and the remote host (network sniffers and other issues notwithstanding) and 2) you can't seek on a network stream. Whereas files can be used by more than one entity at the same time (think log file producer and consumer processes) and you can seek to an arbitrary offset in files. You also need "open" and "close" operations on a network stream, they are just named differently ("connect" and "close" for example).

If there were no "open" operations in your system, how would you create a brand new empty file? How would you acquire a handle to an existing file (assuming whatever uses the files resides in user space, and you don't want to give it access to the kernel's data structures)?

Solar · Post by **Solar** » Thu Jul 02, 2009 6:48 am

Innovative does not necessarily equal effective... I'd like to see a cost / benefit comparison here.

NickJohnson · Post by **NickJohnson** » Thu Jul 02, 2009 11:01 am

Phew, I shouldn't have slept in so long - I could have given a little more rapid response.

Brendan wrote:Imagine if 3 processes start reading from a file. This doesn't cause any problems so you allow the reads. Then one of the processes tries to do a write. Do you allow this write? You don't know if other processes are still reading from the file or not (no "close()"), so if you refuse to allow the write you may be refusing for no reason at all, and if you do refuse to allow the write then how long do you refuse for (until nobody has read for some arbitrary length of time, until the other processes terminate, until hell freezes over)? If you allow the write, then what happens if the other processes haven't finished reading? Do they get inconsistent data (half old and half new)? Do you refuse the reads from the other processes after a write occurs (and if so, how long for)?

Even if you get rid of all file sharing, then it still doesn't work because you've got no idea when any process has finished using any file (and when it's safe to allow another process to access the file). Maybe if a process accesses a file you could refuse to allow any other processes to access the same file until the first process has terminated; but this would severely cripple the OS (in terms of file sharing "throughput"), and it would also create some extra "denial of service" problems (e.g. a process that reads one byte from as many files as it can then repeatedly does "sleep()", which prevents any process from accessing any of these files forever).

I would probably handle the first situation like this: all reads are allowed at all times (assuming appropriate permissions), and so are writes (writes include deletes), but if a process performs a write operation, it gains an implicit lock on the file (or possibly only the modified section of it). If another process tries to write, the driver queries the first writer on its use of that file, and then decides whether to release the lock. Remember, each process does keep track of open and closed files, but neither the kernel nor the drivers do - *that* is my limitation. If the writer performs a write, and the modifications are being read by a reader, the reader does the read, but is informed of the possibly inconsistent data.
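A minimal sketch of the implicit write lock, assuming a single lock per file rather than per section. The names are hypothetical, and the `proc_uses_file` array is only a stand-in for the IPC query that the driver would send to the current writer's libc.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_WRITER (-1)
#define MAX_PROCS 8

/* Hypothetical driver-side state for one file. */
struct wfile {
    int32_t writer_pid;   /* holder of the implicit write lock, or NO_WRITER */
};

/* Simulated answers: what each process's libc would report when asked
 * "do you still have this file open?". A dead process counts as "no". */
static bool proc_uses_file[MAX_PROCS];

/* Stand-in for the synchronous IPC query sent to the lock holder. */
static bool query_still_in_use(int32_t pid)
{
    return pid >= 0 && pid < MAX_PROCS && proc_uses_file[pid];
}

/* Returns true if the write may proceed, false if the holder refused. */
bool try_write(struct wfile *f, int32_t pid)
{
    if (f->writer_pid == NO_WRITER || f->writer_pid == pid) {
        f->writer_pid = pid;      /* first write takes the lock implicitly */
        return true;
    }
    if (query_still_in_use(f->writer_pid))
        return false;             /* holder still wants it: deny the write */
    f->writer_pid = pid;          /* holder is done or dead: hand the lock over */
    return true;
}
```

The driver stores one pid per file and nothing else; whether the holder answers truthfully is a matter of trusting its libc, as the post says.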

You could potentially do a DoS using the write locks by writing to a bunch of files and then purposefully not relinquishing the locks, but that could be done equally with open(), and if that process is killed the locks are effectively relinquished. But this would not happen by accident, because the libc (and/or libsys) is responsible for the file handler table, and would respond truthfully to any driver queries.

Combuster wrote:ABA Problem
Not being able to have exclusive access to resources
Having to pass the filename every syscall.

ABA Problem - fixed by driver-level synchronization notification
Exclusive write access is totally possible (see paragraph above)
Filenames are not passed every syscall - only inode numbers

jal wrote:Though I agree with Brendan and Combuster on traditional file systems, a database approach would circumvent these problems. Yes, it creates other problems, but those have been solved various ways in the various database designs.

Brendan wrote:I don't see how a database approach changes anything at all, unless you're talking of caching all writes and doing an atomic commit (if any data that was relied on remained unchanged). For databases this works because fields are typically small, but for a filesystem you may need to cache 50 GiB of "pending writes" in 2 GiB of RAM.

I think he meant that you could have a process that sits atop a file and does all I/O for other processes - but that would require open() and close() style things to work, and therefore defeats the original purpose.

ru2aqare wrote:Connectionless (UDP) network transfers work because 1) no one else has access to the data stream between your computer and the remote host (network sniffers and other issues notwithstanding) and 2) you can't seek on a network stream. Whereas files can be used by more than one entity at the same time (think log file producer and consumer processes) and you can seek to an arbitrary offset in files. You also need "open" and "close" operations on a network stream, they are just named differently ("connect" and "close" for example).

If there were no "open" operations in your system, how would you create a brand new empty file? How would you acquire a handle to an existing file (assuming whatever uses the files resides in user space, and you don't want to give it access to the kernel's data structures)?

But connect() and close() are only relevant to the socket on your machine in terms of a UDP transfer, not the connection itself - UDP in no way makes a "connection" abstraction at any point. I was only giving that as sort of an analogy anyway to explain my use of the word "connectionless". So TCP : UDP :: open/close : connectionless I/O, but UDP != connectionless I/O.

I would also provide a pair of system calls that would equate to touch and rm, to create and remove files. You could easily use that facility to create something like a pipe - you use touch on a pipe driver, it gives you an inode but does not inform the VFS (so it doesn't appear anywhere, and has no name), and then you pass that inode to a child process. That way, it's actually *more* flexible - it decouples creation and implicit naming due to the fact that you don't need to open() the file again.

Solar wrote:Innovative does not necessarily equal effective... I'd like to see a cost / benefit comparison here.

Okay, here's what I see as a cost/benefit weighing:

Advantages:
- fewer system calls (more compact interface)
- no tracking of connections (less data and code)
- no fixed maximum number of "open" files
- no closing on exit/crash (*very* important for my design, which makes this hard otherwise)
- higher throughput on writers with many readers due to worse-is-better error handling
- gives same interface on C API level (could be made POSIX compliant)

Disadvantages:
- unorthodox in general (people will think you're insane, mostly)
- requires rapid, synchronous, preemptible IPC (which I have)
- write lock release requires some cooperation (if process is not dead)
- more complex error handling for readers (could be wrapped in libc)
- strange interface for assembly programmers

And although innovative definitely doesn't mean effective, this whole open()/close() concept has been around since the dawn of UNIX, and I've never seen it questioned. Leaving any basic design untouched for that long, regardless of how good it is, is not a good idea. Plus, I think I may be able to make the connectionless design work, and quite well too.

Combuster · Post by **Combuster** » Thu Jul 02, 2009 11:44 am

Disadvantage:
- The ABA problem (and any other synchronisation problem) still exists. A thread can't know that another process modified the file and has since released the lock.

- The only way to synchronously read is to overwrite it first.

- (indefinite) stalls on a write access under lock contention situations.

NickJohnson · Post by **NickJohnson** » Thu Jul 02, 2009 12:58 pm

Combuster wrote:- The ABA problem (and any other synchronisation problem) still exists. A thread can't know that another process modified the file and has since released the lock.

But the driver could easily keep track of which parts of the file have changed, independent of any locking mechanism, and inform the reader that the data may be corrupt from concurrent changes.

Combuster wrote:- The only way to synchronously read is to overwrite it first.

Why would this be? Doesn't the concurrent change notification make it so the reader can make sure it is reading a fully up-to-date copy of the file (by retrying until there is no error)?
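One possible way to implement this kind of change notification is a per-file version counter that the driver bumps on every write and returns with every read reply. This is an assumption for illustration, not necessarily what is planned here; the `vfile` and `snapshot` names are hypothetical, and bounds checks are omitted to keep the sketch short.

```c
#include <stdint.h>
#include <string.h>

#define FILE_SIZE 8

/* Hypothetical driver-side file with a change counter. */
struct vfile {
    uint64_t version;          /* incremented by the driver on every write */
    char     data[FILE_SIZE];
};

void drv_write(struct vfile *f, size_t off, const char *src, size_t len)
{
    memcpy(f->data + off, src, len);
    f->version++;
}

/* Each read reply carries the version that was current when it was served. */
uint64_t drv_read(const struct vfile *f, size_t off, char *dst, size_t len)
{
    memcpy(dst, f->data + off, len);
    return f->version;
}

/* Reader-side bookkeeping for one multi-request read of the file. */
struct snapshot {
    uint64_t first_version;
    int      started;
    int      torn;   /* set if any chunk came from a different version */
};

/* Reads one chunk and records whether the file changed since the first chunk. */
void snapshot_read(struct snapshot *s, const struct vfile *f,
                   size_t off, char *dst, size_t len)
{
    uint64_t v = drv_read(f, off, dst, len);
    if (!s->started) {
        s->first_version = v;
        s->started = 1;
    } else if (v != s->first_version) {
        s->torn = 1;
    }
}
```

A reader that finds `torn` set would start over with a fresh `snapshot`. Because the counter only ever increases, a file that is changed and then changed back still shows a different version, which is the case the ABA objection is about.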

Combuster wrote:- (indefinite) stalls on a write access under lock contention situations.

Yes, but those would be unlikely, and could happen on any system with a locking mechanism. It wouldn't have to stall - the write request could just return an error if denied by the locker.

OSDev.org