Design of file access system calls

Candy · Post by **Candy** » Wed Aug 25, 2004 1:53 am

This isn't a poll because I'd like some open answers instead of a fixed voting on a set of things I've thought up.

How would you like to access files from an application development point of view? The current ideas:

POSIX-style: fopen returning a FILE * structure with information about it, using that to fread, fclose, fwrite, fgetc etc.
Linux-style: returning an FD that can be used with read/write/close etc.
Windows-style: I don't know but I guess it's similar to Linux
Mmap-style: Memory mapping a file that may/may not grow, with a maximum size, and using mainly memory mapped files (only?).

I'm hoping for a few fresh ideas, especially from the application developer's perspective, on how you would like to access your files. Suggestions:

A write_at() function that writes into a file without opening/closing semantics, thread safe
An append_to() function that appends a given string to a file without open/close, and no thread problems
Locked for writing, locked for multiple readers or Do-it-yourself file protection for file opening/closing?
Special modes, protection, access denial/giving
Using the library functions to access remote files in a uniform way (the prot:// format being the most obvious, with #program and ~user for program and user sections, # and ~ aliased to the current prog/user)
Allowing transparent caching of files, so if a file is created, marked as cache, then the file is deleted with the cache-backlink intact, the file can be opened again without any extra work, just like it's a local file
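
To make the write_at() and append_to() suggestions concrete, here is a minimal sketch of how they could be layered over existing POSIX calls. The signatures and return conventions are assumptions, not part of the proposal; the sketch relies on pwrite() taking an explicit offset and on O_APPEND positioning each write() at the end of the file. It does not retry on EINTR, and a write that gets split into several calls could still interleave with other writers.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes at a fixed offset without the caller
 * holding a descriptor. Returns 0 on success, -1 on error. */
int write_at(const char *path, off_t offset, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        /* pwrite() uses the offset we pass, not a shared file position. */
        ssize_t n = pwrite(fd, p, len, offset);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        offset += n;
        len -= (size_t)n;
    }
    return close(fd);
}

/* Hypothetical helper: append a string to a file in one call.
 * Returns 0 on success, -1 on error. */
int append_to(const char *path, const char *text)
{
    /* O_APPEND makes every write() land at the current end of the file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    while (len > 0) {
        ssize_t n = write(fd, text, len);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        text += n;
        len -= (size_t)n;
    }
    return close(fd);
}
```

Both helpers still open and close the file internally, so a kernel-level version would mainly save the round trips and could give stronger atomicity guarantees.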

What sort of naming system would you use to make every file / device unique, or more or less unique? Say, when playing a game you want the videos / audio on harddisk for speed, but when not playing the game you might want to delete them for space for encoding. How would you form the backlink for that file (cdrom!Civ3CD://... ?)

Do you consider the naming scheme of Windows or of Linux to be most intuitive? What do you think of a naming scheme where each device would be content-addressed, such as the Civ3CD example right above, which refers to that CD and not to a specific CD-ROM drive? Of these three, which would you consider best?

Thanks for answering in advance, they've become more questions than I'd expected

Legend · Post by **Legend** » Wed Aug 25, 2004 2:23 am

I think the Windows drive letter naming convention is weak in comparison to mounting devices into the file system tree in Linux. Why? I think you can control things more easily in Linux without having, for example, all your programs install to D:
You still install them to /opt/ in most cases (if you want) and the applications don't care if it is a separate volume or a directory in the root filesystem. Minor thing, but sometimes noticeable.

I would go for a C++ api that can fetch files over networks, etc. too, but that is a different story!

However, I would not go for POSIX-style as it does not seem to be able to do async I/O.

Solar · Post by **Solar** » Wed Aug 25, 2004 2:33 am

fopen() / fclose() / fprintf() etc. aren't POSIX, they are Standard C. Every C/C++ developer will expect them to be there, and that most likely includes developers of language interpreters (e.g., Perl).

When I implement fopen() (as I currently do for PDCLib), there's a lot I can (and must!) do in the C Library space: Setting up the FILE structure, assigning buffers etc. - but in the end, I have to call the OS kernel to actually open the file. Traditionally, the function I call is open(), which returns an FD (file descriptor, usually an int). That FD gets stored in the FILE structure for later reference.

When I implement fprintf(), I can again do most things in C library space: Assembling the string to print using the format string and variable list, and writing the string to the internal buffer. When the buffer is full, though, or fflush() / fclose() is called, I have to actually write the data, which again I must delegate to the kernel: By getting the FD from FILE, and calling a kernel function traditionally called write(). (It's similar for binary writing using fwrite(), but fprintf() is a better example of stuff done in-library and other stuff being delegated.)
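
The split between library work and kernel work can be sketched roughly as follows. This is not PDCLib code; struct my_file, my_fputs(), my_fflush() and the tiny buffer size are made-up names and values for illustration. The only kernel primitive used is write().

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define MY_BUFSIZ 64

/* Hypothetical stand-in for FILE: a descriptor plus a user-space buffer. */
struct my_file {
    int    fd;              /* what open() returned */
    size_t used;            /* bytes currently waiting in buf */
    char   buf[MY_BUFSIZ];
};

/* Hand the buffered bytes to the kernel. Returns 0 on success, -1 on error. */
int my_fflush(struct my_file *f)
{
    size_t off = 0;
    while (off < f->used) {
        ssize_t n = write(f->fd, f->buf + off, f->used - off);
        if (n <= 0)
            return -1;
        off += (size_t)n;
    }
    f->used = 0;
    return 0;
}

/* Library-side work: copy into the buffer and only call the kernel
 * when the buffer is full. Returns 0 on success, -1 on error. */
int my_fputs(const char *s, struct my_file *f)
{
    size_t len = strlen(s);
    while (len > 0) {
        size_t room = MY_BUFSIZ - f->used;
        size_t chunk = len < room ? len : room;
        memcpy(f->buf + f->used, s, chunk);
        f->used += chunk;
        s += chunk;
        len -= chunk;
        if (f->used == MY_BUFSIZ && my_fflush(f) != 0)
            return -1;
    }
    return 0;
}
```

A short string stays in the buffer until my_fflush() is called, which is the same behaviour described above for fprintf() followed by fflush() or fclose().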

You see, what you called "POSIX style" are actually standard C functions. Perl has another standard for accessing files, and Ruby has another, and Python has yet another...

In the end, though, they all have to call primitives provided by the kernel. Those primitives are completely up to you, the kernel designer, but whatever language you want to port to your kernel, you'll have to do a mapping from the primitives the library / interpreter expects to the primitives you provide.

newlib expects open / read / write / close / ... as specified by POSIX (what you called "Linux style"). glibc does the same. My PDCLib will most likely require quite similar primitives, since at this basic level, there isn't much choice really. (What should a file descriptor be if not an integer? Why should the kernel worry about buffering when every language has its own way of allowing / disallowing / modifying buffering?)

Now, what you call "mmap-style", that's a purely OS-dependent thing. POSIX defines it, Linux provides it, and most languages have some way to access it, but if you don't want your OS to be YAPI (yet another POSIX implementation), you can do whatever you want.

But developers will still expect fopen() to work.

I hope that clarifies it a bit.

Candy · Post by **Candy** » Wed Aug 25, 2004 2:44 am

Solar wrote: You see, what you called "POSIX style" are actually standard C functions. Perl has another standard for accessing files, and Ruby has another, and Python has yet another...

In the end, though, they all have to call primitives provided by the kernel. Those primitives are completely up to you, the kernel designer, but whatever language you want to port to your kernel, you'll have to do a mapping from the primitives the library / interpreter expects to the primitives you provide.

But, the main point is, you can implement those functions on top of open/close/write/read but also on top of mmap. Possibly some other method even (pretend-writes to a segment which represents the file). Which would be the preferred method?
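
As a rough illustration of the mmap route, a read-style call could be layered over POSIX mmap() like this. read_via_mmap() is a hypothetical name, and mapping the whole file for every call is a simplification that a real library would avoid by keeping the mapping around.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy up to len bytes starting at offset into dst by
 * mapping the file instead of calling read(). Returns the number of bytes
 * copied (0 at or past end of file), or -1 on error. */
ssize_t read_via_mmap(const char *path, off_t offset, void *dst, size_t len)
{
    if (offset < 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (offset >= st.st_size) {     /* also covers empty files */
        close(fd);
        return 0;
    }

    /* Map the whole file; the mapping stays valid after close(). */
    void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    size_t avail = (size_t)(st.st_size - offset);
    if (len > avail)
        len = avail;
    memcpy(dst, (const char *)map + offset, len);

    munmap(map, (size_t)st.st_size);
    return (ssize_t)len;
}
```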

But developers will still expect fopen() to work.

And the stdlib fopen() will work, but based on what? I'm trying to kickstart a discussion since I can't figure out in my mind which would be the best (I only know posix and my mind was poisoned with it before I could think about others, so now I can barely think of any other way) and I was hoping somebody here could help me, or had some really good idea but didn't publish it yet.

As Legend put it,

I would go for a C++ api that can fetch files over networks, etc. too, but that is a different story!

This can be done with fopen too. The default fopen() should work on the default program directory, but if you prepend it with a something:// it might access anything, even http or ftp files. Making the life of the developer as easy as it can (but no easier, no blue/red buttons for him to encode some very complex things, but easy syscalls to take the regular burden off him) is one of my priorities, so having a function that does this is nice.

But, of course, even this can be auto-mmapped and auto-deleted on unmmap.

Pype.Clicker · Post by **Pype.Clicker** » Wed Aug 25, 2004 2:51 am

naming scheme
Both the Windows approach and the Unix approach have their pros and cons. I like the "A:" much more than "/where_is/floppy_mounted", where each distro has its favourite mountpoint/naming scheme ... For hard disks, however, i dislike the "C: D: E:" autonaming feature of Windows: add a secondary drive to a multi-partition system and the whole stuff gets screwed up, renaming "D" to "E", etc. without any control of yours ... blah !

As a user, i tend to have "/music" in addition to "/home" and "/install" is one of the most frequent too ... so probably the best for me would be a system where i can describe logical sections that add a 'maximum space consumed by this subdir of this disk' and magically merge content from several disks ... but you know this already

access scheme
The 'write_at()' and 'append_to()' calls are interesting. Another thing that would be great would be the ability to create *connections* between data sources and sinks at kernel level.

For instance, if i wish to copy a file, i no longer write

Code: Select all

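   size_t n;
   /* Loop on the read result; testing feof() first would process a stale buffer. */
   while ((n = fread(buffer, 1, sizeof buffer, input)) > 0) {
        fwrite(buffer, 1, n, output);
   }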

but instead something like

Code: Select all

    connect(input,output,NO_SIZE_LIMIT);

Which could be especially handy in the case of a microkernel environment so that we directly have FS-server internal operations or FS-to-network direct connection established when providing some data:

Code: Select all

    println output "HTTP/1.1 200 ok";
    println output "Transfer-encoding: chunked";
    connect(resource, output, NO_SIZE_LIMIT|USE_PATTERN,
                 "%(hex:BlockSize)\r\n%(raw:Data)");

FS killer feature
as a programmer, i guess the things i miss the most in the current file systems is the ability to catch at user level a kind of "exception" telling that a program A is trying to access my file F for writing. And yes, i mean it would be kewl to have a copy-on-write option for filesystems.

In the same idea-drawer is kept the dream of being notified of directory events (new file added, file changed, etc.) and a consistent API to access a file's meta-data.
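
Returning to the connect() idea from the access scheme above: its semantics can be approximated in user space with a plain read/write pump, which is exactly the loop a kernel-level version would make unnecessary. The name connect_fds() is an assumption (chosen to avoid clashing with the sockets connect()), as is the use of file descriptors rather than streams; the USE_PATTERN framing is not covered. Linux's sendfile(2) is an existing call in the same spirit, moving data between descriptors inside the kernel.

```c
#include <stddef.h>
#include <unistd.h>

#define NO_SIZE_LIMIT ((size_t)-1)

/* Hypothetical user-space stand-in for connect(input, output, limit):
 * pump bytes from in_fd to out_fd until end of input or until limit
 * bytes were moved. Returns the number of bytes copied, or -1 on error. */
long connect_fds(int in_fd, int out_fd, size_t limit)
{
    char buf[4096];
    long total = 0;

    while (limit > 0) {
        size_t want = limit < sizeof buf ? limit : sizeof buf;
        ssize_t got = read(in_fd, buf, want);
        if (got < 0)
            return -1;
        if (got == 0)           /* end of input */
            break;

        /* write() may accept fewer bytes than offered, so loop. */
        for (ssize_t off = 0; off < got; ) {
            ssize_t put = write(out_fd, buf + off, (size_t)(got - off));
            if (put <= 0)
                return -1;
            off += put;
        }

        total += got;
        if (limit != NO_SIZE_LIMIT)
            limit -= (size_t)got;
    }
    return total;
}
```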

distantvoices · Post by **distantvoices** » Wed Aug 25, 2004 2:52 am

I'd go for both ways: providing the mmap way for those who want to treat the file like an array, and standard read/write for those who just need to do the day to day data saving job.

What I don't see is a wish to perform read/write operations on files without *opening* them previously. after all, the action *open* tells: Hey, Kernel, I want *this* file, give it to me.

How should the kernel treat a *not previously opened* file? Should it treat it as if opened? Where to put the actual position in the file if not in a file pointer structure (which usually is created upon the first open)? Create it per se upon read on that file - to avoid double and triple rechecking (does it exist, has the user access rights and so forth) on each read from that file? Ok, the idea has a certain thrill.

But I don't want to have a read/write operation be mmapped per default. I want to have the choice whether to operate on a stream (which would be mmapped) or per record (normal read/write).

Regarding the POSIX/Linux/Windows thing in the user space lib: I don't care as long as it does its job without bugs.

Solar · Post by **Solar** » Wed Aug 25, 2004 2:55 am

As for drive handling... I hate both Windows and Linux, and still yearn for golden Amiga times...

Windows Drive Letters

They shift around when you install new hardware / partitions.

They are installation-dependent. Having a ZIP drive named "F:\" on your system doesn't mean you can rely on a script finding "F:\file.txt" on another system.

They are unintuitive, unless you manually patch "R:\" to be your CD-R and "F:\" to be your compact flash reader.

There are only 26 of them.

Linux Mount Points

They are installation-dependent. Having a ZIP drive mounting to /mnt/zip/ on your system doesn't mean you can rely on a script finding /mnt/zip/file.txt on another system.

They require fiddling with /dev and /etc/fstab, including knowledge of device driver lore, several rather arcane flags, and details of file system lore.

Writing to /mnt/zip/file.txt could just as well not write to your ZIP because you forgot to mount it. There is no error message.

You have no (easy) way of getting a quick overview over installed devices, inserted media, and their used / free capacity.

AmigaOS-alike

Every device has a ("physical") name, like "ZIP", "HD0", "FD0" or whatever, determined by the driver config. That makes thousands of combinations instead of 26, and intuitive ones too. They also aren't likely to shift around when you install new hardware (as /dev/hda or D:\ would).

Every media (or, partition thereof) has a ("logical") name, determined by the user at formatting time.

File names could be qualified using either physical or logical device name.

Having a media "MyProject" allows you to access "MyProject:file.txt" on any system, regardless of what that other user called his hardware physically. Error messages would be like "Please insert MyProject in any drive" - which means your script would work on a ZIP, CD-R or floppy, as long as it has been baptized "MyProject".

With a feature called "assigns", you could assign "drive" names to directories. For example, you could "Assign MyProject: some/place/on/hd", and access that directory just like a media formatted as "MyProject".

With multiple assigns you can create something like a "search path". The AmigaOS equivalent to "export PATH=..." was "Assign C: SYS:C SYS:Prefs", and when your shell looked for a command, it would look in the specified locations in turn. Same goes for LIBS:, DEVS:, SYS: (boot device, which actually allowed booting from any media), and various other "search paths".

Having multiple media of the same name must be handled somehow by the OS, of course. AmigaOS appended serial numbers (MyProject, MyProject.1).
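
The multi-assign lookup can be sketched in a few lines. This is not how AmigaOS implemented it; struct assign, resolve_assign() and the fixed-size target list are invented for illustration, targets are given as ordinary Unix-style directories, and name matching is case-sensitive to keep the code short.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_TARGETS 4

struct assign {
    const char *name;                  /* logical name without the colon, e.g. "C" */
    const char *targets[MAX_TARGETS];  /* directories tried in order; NULL ends the list */
};

/* Resolve "NAME:rest" against the table. Writes the first candidate that
 * exists into out and returns 0; returns -1 if the name is unknown or no
 * target contains the file. */
int resolve_assign(const struct assign *table, size_t count,
                   const char *path, char *out, size_t outsize)
{
    const char *colon = strchr(path, ':');
    if (colon == NULL)
        return -1;
    size_t namelen = (size_t)(colon - path);

    for (size_t i = 0; i < count; i++) {
        if (strlen(table[i].name) != namelen ||
            strncmp(table[i].name, path, namelen) != 0)
            continue;

        /* Try each assigned directory in turn, like a search path. */
        for (size_t t = 0; t < MAX_TARGETS && table[i].targets[t] != NULL; t++) {
            struct stat st;
            int n = snprintf(out, outsize, "%s/%s", table[i].targets[t], colon + 1);
            if (n < 0 || (size_t)n >= outsize)
                continue;       /* candidate does not fit; skip it */
            if (stat(out, &st) == 0)
                return 0;
        }
        return -1;
    }
    return -1;
}
```

With a table entry for "C" that lists two directories, resolving "C:tool" yields the path in the first directory that actually contains tool.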

I could rant on, but I've work to do - I hope this gave you an idea of how it could be done if you break free of the "like Windows or like Linux?" trap.

Candy · Post by **Candy** » Wed Aug 25, 2004 3:21 am

beyond infinity wrote: I'd go for both ways: providing the mmap way for those who want to treat the file like an array, and standard read/write for those who just need to do the day to day data saving job.

What I don't see is a wish to perform read/write operations on files without *opening* them previously. after all, the action *open* tells: Hey, Kernel, I want *this* file, give it to me.

Well, for quick logging in an application it'd be a nice thing to use append_to("debug.log", "this is screwed, null pointer in function do_call\n"); instead of (FILE *x = fopen("debug.log", "a"); if (x != NULL) { fwrite("this is ... \n", 1, strlen("this is ... \n"), x); fclose(x); } else { exit(2); /* or something */ } ).

But I don't want to have a read/write operation be mmapped per default. I want to have the choice whether to operate on a stream (which would be mmapped) or per record (normal read/write).

You can make this distinction entirely transparent.

Solar · Post by **Solar** » Wed Aug 25, 2004 3:34 am

Candy wrote: Well, for quick logging in an application it'd be a nice thing to use append_to("debug.log", "this is screwed, null pointer in function do_call\n"); instead of (FILE *x = fopen("debug.log", "a"); if (x != NULL) { fwrite("this is ... \n", 1, strlen("this is ... \n"), x); fclose(x); } else { exit(2); /* or something */ } ).

You happily assume that your append_to() can never fail...

Your application should have the fopen("debug.log") at the start of main(), the fwrite() at the appropriate place, and the fclose() at the end.

I think you're trying to solve the problem at the wrong place. Why not implement a system-wide logging service, with issuing process, time stamp etc. comfortably added by the OS, and errors (disk full...) handled by the logging service, so that your application can issue a MyOS_logmessage("This is screwed...") without having to worry whether the message goes to file debug.log or per SMS to the system administrator?
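
What such a service would add on the application's behalf might look roughly like this. myos_format_record() and the record layout are purely hypothetical; the point is only that the issuing process and the time stamp get attached in one place instead of in every application. It uses gmtime(), which is not thread-safe, to stay within standard C.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical service-side helper: build one log record from the message
 * plus metadata the application never has to supply itself.
 * Returns 0 on success, -1 if the record does not fit or the time is invalid. */
int myos_format_record(char *out, size_t outsize,
                       long pid, time_t when, const char *msg)
{
    char stamp[32];
    struct tm *tm = gmtime(&when);      /* UTC, so records compare across machines */
    if (tm == NULL)
        return -1;
    if (strftime(stamp, sizeof stamp, "%Y-%m-%dT%H:%M:%SZ", tm) == 0)
        return -1;

    int n = snprintf(out, outsize, "%s [%ld] %s\n", stamp, pid, msg);
    if (n < 0 || (size_t)n >= outsize)
        return -1;
    return 0;
}
```

Where the finished record goes (a file, a network sink, an SMS gateway) would then be the service's decision, not the application's.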

Legend · Post by **Legend** » Wed Aug 25, 2004 4:42 am

Pype.Clicker wrote: FS killer feature
as a programmer, i guess the things i miss the most in the current file systems is the ability to catch at user level a kind of "exception" telling that a program A is trying to access my file F for writing. And yes, i mean it would be kewl to have a copy-on-write option for filesystems.

In the same idea-drawer is kept the dream of being notified of directory events (new file added, file changed, etc.) and a consistent API to access a file's meta-data.

Wouldn't Windows be able to do this? At least it does some of the things. (MSVC asks if it should reload files edited by other programs, and as far as I know the standard Explorer search window stays active waiting for changes after it has finished a search, etc.)

Solar · Post by **Solar** » Wed Aug 25, 2004 5:42 am

Check fcntl(2). It provides various notification features for Linux >= 2.4.

Candy · Post by **Candy** » Wed Aug 25, 2004 5:58 am

Solar wrote: Check fcntl(2). It provides various notification features for Linux >= 2.4.

fcntl also handles all the file operations that were thought of after the others were created. How about removing fcntl and coming up with a different interface?

Suggestion:

register_change_callback(FILE *file, int flags, void (*callback)(FILE *file, struct change *change));

first idea, might be lots better ideas
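
To get a feel for the callback shape, here is a user-space approximation that polls stat() instead of being driven by the kernel. Everything in it is an assumption: it watches a path rather than a FILE *, struct change only carries the old and new size, and the names watch_init(), watch_poll() and count_change() are invented. A real kernel interface would push events instead of waiting to be polled.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical description of what changed. */
struct change {
    const char *path;
    off_t old_size;
    off_t new_size;
};

typedef void (*change_callback)(const struct change *change, void *ctx);

/* One watched file together with the state seen at the last check. */
struct watch {
    const char *path;
    off_t size;
    time_t mtime;
    change_callback callback;
    void *ctx;
};

/* Record the current state of path. Returns 0 on success, -1 on error. */
int watch_init(struct watch *w, const char *path, change_callback cb, void *ctx)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    w->path = path;
    w->size = st.st_size;
    w->mtime = st.st_mtime;
    w->callback = cb;
    w->ctx = ctx;
    return 0;
}

/* Compare against the recorded state and fire the callback on a difference.
 * Returns 1 if the callback ran, 0 if nothing changed, -1 on error. */
int watch_poll(struct watch *w)
{
    struct stat st;
    if (stat(w->path, &st) != 0)
        return -1;
    if (st.st_size == w->size && st.st_mtime == w->mtime)
        return 0;

    struct change c = { w->path, w->size, st.st_size };
    w->size = st.st_size;
    w->mtime = st.st_mtime;
    w->callback(&c, w->ctx);
    return 1;
}

/* Example callback: counts notifications through ctx. */
void count_change(const struct change *change, void *ctx)
{
    (void)change;
    ++*(int *)ctx;
}
```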

Solar · Post by **Solar** » Wed Aug 25, 2004 6:11 am

I never said I liked it, but Legend asked whether Windows can't do file notification, and I just happened to read the fcntl(2) manpage this morning while working on PDCLib.

(Because RedHat's newlib requires fcntl(2) as underlying "primitive", and I wanted to know what it's about and whether PDCLib would need a similar "primitive". Looks like RedHat's and my idea of "primitive" vary wildly.)

Legend · Post by **Legend** » Wed Aug 25, 2004 8:07 am

Candy wrote:
Solar wrote: Check fcntl(2). It provides various notification features for Linux >= 2.4.
fcntl also handles all the file operations that were thought of after the others were created. How about removing fcntl and coming up with a different interface?

Suggestion:

register_change_callback(FILE *file, int flags, void (*callback)(FILE *file, struct change *change));

first idea, might be lots better ideas

At least the direction is very good! ;D
I think fcntl is for files what ioctl is for devices and as that - a mess! >:(

Colonel Kernel · Post by **Colonel Kernel** » Wed Aug 25, 2004 12:30 pm

Solar wrote: AmigaOS-alike

Every device has a ("physical") name, like "ZIP", "HD0", "FD0" or whatever, determined by the driver config. That makes thousands of combinations instead of 26, and intuitive ones too. They also aren't likely to shift around when you install new hardware (as /dev/hda or D:\ would).

Every media (or, partition thereof) has a ("logical") name, determined by the user at formatting time.

File names could be qualified using either physical or logical device name.

Having a media "MyProject" allows you to access "MyProject:file.txt" on any system, regardless of what that other user called his hardware physically. Error messages would be like "Please insert MyProject in any drive" - which means your script would work on a ZIP, CD-R or floppy, as long as it has been baptized "MyProject".

Now that is cool. Seems like a much more elegant way of doing things.

In terms of file system interfaces, I'm quite curious about some of the more "out-there" ideas being kicked around by researchers, but I haven't had time to look into them. Is anyone familiar with orthogonal persistence (and able to summarize)?
