
VFS theory

Posted: Tue Jul 28, 2009 7:29 am
by zity
Hello :)
I'm currently implementing filesystem stuff, including devfs, vfs, and fat12. But I'm uncertain whether I've done it the right way, or the most efficient way. My filesystem stuff currently works like this:

Filesystems are registered with the VFS by name, along with a set of functions for file operations. When a file operation is requested, it goes through the following steps:

1. An application calls the VFS for a file operation, e.g. open()
2. The VFS figures out which mountpoint the file belongs to.
3. The VFS redirects the operation to the appropriate filesystem driver.
4. The filesystem processes the request and returns its data to VFS.
5. VFS returns to the application.
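Here is a minimal sketch of the registration and dispatch described in the steps above. It assumes a hypothetical `struct fs_ops` table of function pointers (only `open` is shown) and a hypothetical stub `fat12_open()` standing in for a real driver. The mount-point lookup from step 2 is left out:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver interface: each filesystem registers a name and a
 * table of function pointers. Only open() is shown; read/write/close
 * would follow the same pattern. */
struct fs_ops {
    int (*open)(const char *path_in_fs);
};

struct fs_type {
    const char *name;
    const struct fs_ops *ops;
};

#define MAX_FS 8
static struct fs_type fs_table[MAX_FS];
static size_t fs_count;

/* Called once by each driver at startup. */
int vfs_register_fs(const char *name, const struct fs_ops *ops)
{
    if (fs_count == MAX_FS)
        return -1;
    fs_table[fs_count].name = name;
    fs_table[fs_count].ops = ops;
    fs_count++;
    return 0;
}

static const struct fs_ops *vfs_find_fs(const char *name)
{
    for (size_t i = 0; i < fs_count; i++)
        if (strcmp(fs_table[i].name, name) == 0)
            return fs_table[i].ops;
    return NULL;
}

/* Step 3: redirect to the driver. Steps 4 and 5: hand its result back. */
int vfs_dispatch_open(const char *fs_name, const char *path_in_fs)
{
    const struct fs_ops *ops = vfs_find_fs(fs_name);
    if (ops == NULL || ops->open == NULL)
        return -1;
    return ops->open(path_in_fs);
}

/* Stand-in driver: pretends only "/hello.txt" exists (returns handle 3). */
static int fat12_open(const char *path)
{
    return strcmp(path, "/hello.txt") == 0 ? 3 : -1;
}
static const struct fs_ops fat12_ops = { fat12_open };
```

The VFS only ever sees the `fs_ops` table, so adding another filesystem means registering another table rather than touching the dispatch code.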

I guess this is the right (or most widely used) approach for file operations?

Furthermore, my on-disk filesystems (like fat12) access the hardware through /dev/fd0. This means that a file operation goes through all of the following steps (it's a little tricky to sketch, but '->' means a call to a function and '<-' means a return to the caller):

Application -> VFS -> FAT12 -> VFS -> DEVFS -> FLOPPY DRIVER <- DEVFS <- VFS <- FAT12 <- VFS <- Application
So I have 6 function calls that each return to the previous function, and to me this seems like overkill?

PART II
My FAT12 driver stores the current directory in a structure inside the driver. This means that every time a file operation is required, the driver most likely has to change directory, because the calls come from different applications with different working directories, so there is a lot of directory changing going on all the time. Is this a normal way of working with a filesystem, or should I store data for the current directory for each open file?

I'm a little confused about how to implement the VFS in the most efficient way, and I can't really find anything about how to design a VFS in a smart way :)

Re: VFS theory

Posted: Tue Jul 28, 2009 10:11 am
by cyr1x
Why not let the FAT driver call the device driver directly?

Re: VFS theory

Posted: Tue Jul 28, 2009 11:32 am
by manonthemoon
zity wrote: Is this a normal way of working with a filesystem, or should I store data for the current directory for each open file?
I would store the current directory for each program/process. Why would the FAT12 driver need to know the current directory (except to resolve the full pathname of a file)? The current directory is more of a "virtual" thing that makes filenames shorter and independent of their exact path, so it belongs somewhere between the program and the VFS. For example, a program asks for "file.txt", and the VFS uses that program's current directory to ask the filesystem for "/folder/file.txt". The FAT driver doesn't need to know anything about current directories, or about wildcards if you ever use those.
cyr1x wrote: Why not let the FAT driver call the device driver directly?
I agree. That should cut down on some of those function calls. The VFS keeps track, for example, that /mnt/floppy is a mount point of FAT12 on /dev/fd0. So when it asks the FAT12 driver to open a file, it should pass "/dev/fd0" along with the request. That way the FAT driver doesn't have to keep track of anything or make extra function calls to the VFS. (or, it could keep track of things, for caching.)
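As a rough illustration of handing the device to the FAT driver at mount time, the sketch below assumes a hypothetical `struct blockdev` interface exported by the floppy driver and a `struct mount` record filled in once when mounting. `ram_read_sector()` is a RAM-backed stand-in for the real floppy driver. With this arrangement the FAT code calls the device directly instead of going back through the VFS and devfs:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical block-device interface exposed by a driver such as the
 * floppy driver behind /dev/fd0. */
struct blockdev {
    uint32_t sector_size;
    int (*read_sector)(struct blockdev *dev, uint32_t lba, void *buf);
    void *priv;                 /* driver-private state */
};

/* Per-mount state handed to the filesystem driver at mount time. */
struct mount {
    const char *mount_point;    /* e.g. "/mnt/floppy" */
    struct blockdev *dev;       /* resolved once, when mounting */
};

/* The FAT12 code reads its boot sector straight from the device:
 * no round trip through the VFS or devfs on every access. */
int fat12_read_boot_sector(struct mount *m, uint8_t *buf)
{
    return m->dev->read_sector(m->dev, 0, buf);
}

/* Stand-in for the floppy driver: serves sectors from a RAM buffer. */
static uint8_t ram_disk[4 * 512];

static int ram_read_sector(struct blockdev *dev, uint32_t lba, void *buf)
{
    if (lba >= sizeof ram_disk / dev->sector_size)
        return -1;
    memcpy(buf, ram_disk + lba * dev->sector_size, dev->sector_size);
    return 0;
}
```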

Re: VFS theory

Posted: Tue Jul 28, 2009 3:41 pm
by Brendan
Hi,
manonthemoon wrote:
zity wrote: Is this a normal way of working with a filesystem, or should I store data for the current directory for each open file?
I would store the current directory for each program/process. Why would the FAT12 driver need to know the current directory (except to resolve the full pathname of a file)? The current directory is more of a "virtual" thing that makes filenames shorter and independent of their exact path, so it belongs somewhere between the program and the VFS. For example, a program asks for "file.txt", and the VFS uses that program's current directory to ask the filesystem for "/folder/file.txt". The FAT driver doesn't need to know anything about current directories, or about wildcards if you ever use those.
Agreed, but I'd go a step further: the entire concept of a working directory only exists in user space (e.g. it's an illusion created by some sort of file I/O library). As far as the VFS is concerned, everything is relative to the root directory, and as far as each file system is concerned, everything is relative to its mount point. For example, if an application thinks its working directory is "/foo" and asks to open the file "bar/woot.txt", then the library the application is using asks the VFS to open the file "/foo/bar/woot.txt"; and if a FAT12 file system is mounted at "/foo/bar", then the VFS asks the FAT12 code for the file "/woot.txt".
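A sketch of the user-space side, assuming a hypothetical C library helper `make_absolute()` that combines the process's working directory (assumed to be absolute) with a possibly relative path and collapses "." and ".." before the request ever reaches the VFS. It uses `strtok()` for brevity, so it is not reentrant:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical libc helper: turn a path that may be relative to the
 * working directory into an absolute path for the VFS.
 * Returns 0 on success, -1 if a buffer is too small. */
int make_absolute(const char *cwd, const char *path, char *out, size_t outlen)
{
    char joined[512];
    int n = (path[0] == '/')
        ? snprintf(joined, sizeof joined, "%s", path)          /* ignore cwd */
        : snprintf(joined, sizeof joined, "%s/%s", cwd, path);
    if (n < 0 || (size_t)n >= sizeof joined)
        return -1;

    size_t len = 0;                        /* length of out so far */
    for (char *c = strtok(joined, "/"); c != NULL; c = strtok(NULL, "/")) {
        if (strcmp(c, "..") == 0) {
            /* Drop the last component; stays at the root if already there. */
            while (len > 0 && out[len - 1] != '/')
                len--;
            if (len > 0)
                len--;                     /* remove the separating '/' */
        } else if (strcmp(c, ".") != 0) {
            size_t clen = strlen(c);
            if (len + 1 + clen + 1 > outlen)
                return -1;
            out[len++] = '/';
            memcpy(out + len, c, clen);
            len += clen;
        }
    }
    if (len == 0) {                        /* everything collapsed to "/" */
        if (outlen < 2)
            return -1;
        out[len++] = '/';
    }
    out[len] = '\0';
    return 0;
}
```

With the example above, `make_absolute("/foo", "bar/woot.txt", ...)` yields "/foo/bar/woot.txt"; stripping the "/foo/bar" mount point is then the VFS's job.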
cyr1x wrote: Why not let the FAT driver call the device driver directly?
Because it's nice to be able to mount normal files, especially if you're an OS developer (who tends to work with disk images more often than most people, and who doesn't see why tools like "mtools" and loopback devices should be necessary).
zity wrote: So I have 6 function calls that each return to the previous function, and to me this seems like overkill?
I agree, but you're underestimating the problem and overestimating it at the same time.

If an application wants to open the file "/a/b/c/d/e/f/g/h/i/j/k/hello.txt", then the VFS may need to ask the file system(s) about each subdirectory (e.g. does "/a" exist, does "/a/b" exist, does "/a/b/c" exist, etc.), and you'll probably end up with something like 50 function calls and returns.
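To make the per-directory cost concrete, here is a sketch of a VFS resolving a path one component at a time. `lookup_fn`, the root reference 0 and the stub `every_name_exists()` are assumptions for illustration; each `lookup()` call is where the VFS -> file system -> device round trip would happen when nothing is cached:

```c
#include <string.h>

/* Hypothetical per-component lookup forwarded to the file system:
 * given the parent directory's reference and a name, return the
 * child's reference, or -1 if it does not exist. */
typedef long (*lookup_fn)(long parent_ref, const char *name);

/* Resolve an absolute path component by component, starting at the
 * root (reference 0 by assumption), counting the lookups needed. */
long walk_path(const char *path, lookup_fn lookup, int *calls)
{
    char buf[512];
    if (strlen(path) >= sizeof buf)
        return -1;
    strcpy(buf, path);

    long ref = 0;
    *calls = 0;
    for (char *c = strtok(buf, "/"); c != NULL; c = strtok(NULL, "/")) {
        ++*calls;
        ref = lookup(ref, c);
        if (ref < 0)
            return -1;          /* a component is missing: stop early */
    }
    return ref;
}

/* Stand-in file system where every name exists; child = parent + 1. */
static long every_name_exists(long parent_ref, const char *name)
{
    (void)name;
    return parent_ref + 1;
}
```

For "/a/b/c/d/e/f/g/h/i/j/k/hello.txt" this performs 12 lookups, one per component, before any file data is touched.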

However, you forgot about caching things (directory entries and file data) in the VFS layer. If the data that's needed is already in the VFS cache, then it becomes "Application -> VFS <- Application"; and if the data that's needed isn't cached by the VFS, the overhead of lots of function calls will be insignificant compared to the time it takes to fetch data from the disk drive.
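A minimal sketch of a directory-entry cache in the VFS, assuming lookups are keyed by (parent reference, name). It is direct-mapped with overwrite on collision, does not cache negative results, and uses a hypothetical `counting_fs()` stub to show that a repeated lookup never reaches the file system:

```c
#include <string.h>

typedef long (*lookup_fn)(long parent_ref, const char *name);

/* Tiny direct-mapped cache: (parent reference, name) -> child reference. */
#define DCACHE_SLOTS 64
struct dentry {
    int  valid;
    long parent;
    char name[32];
    long ref;
};
static struct dentry dcache[DCACHE_SLOTS];

static unsigned slot_for(long parent, const char *name)
{
    unsigned h = (unsigned)parent * 31u;
    for (; *name; name++)
        h = h * 31u + (unsigned char)*name;
    return h % DCACHE_SLOTS;
}

/* Hit: answer without touching the file system.
 * Miss: ask the file system once and remember a positive answer. */
long cached_lookup(long parent, const char *name, lookup_fn fs_lookup)
{
    struct dentry *d = &dcache[slot_for(parent, name)];
    if (d->valid && d->parent == parent && strcmp(d->name, name) == 0)
        return d->ref;

    long ref = fs_lookup(parent, name);
    if (ref >= 0 && strlen(name) < sizeof d->name) {
        d->valid = 1;
        d->parent = parent;
        strcpy(d->name, name);
        d->ref = ref;
    }
    return ref;
}

/* Stand-in file system that counts how often it is consulted. */
static int fs_calls;
static long counting_fs(long parent, const char *name)
{
    (void)name;
    fs_calls++;
    return parent * 100 + 1;
}
```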

I'd also look into asynchronous I/O - it'd be nice if several tasks could get data from the VFS cache (and from hard drives, etc) while another task is waiting for 1 MiB of data to be read from a slow floppy...


Cheers,

Brendan

Re: VFS theory

Posted: Tue Jul 28, 2009 11:08 pm
by zity
Thanks for all the replies, much appreciated! :)

I'll change my mount procedure so that file operations for /dev/fd0 are passed directly to the driver. I'll furthermore try to look into some sort of caching in the VFS to avoid unnecessary calls to the filesystem. To begin with, I'll try to cache directories and later on files.

I'm not sure whether I made myself clear about the working directory stuff. The FAT driver does not keep track of the working directory for each application. The application's working directory is currently stored in the task structure and changed by the VFS when an application calls chdir(). Should I move the handling of the CWD to my C library? The FAT12 driver only stores the last opened directory on the drive in a structure.

About mount points: if, for example, I have a mount point called /mnt/cdrom and the file /mnt/cdrom/folder/hello.txt is opened, the VFS strips /mnt/cdrom from the path and passes /folder/hello.txt along to the driver, without going through the /mnt/cdrom path itself. I can be sure the mount point exists, since I do not allow a mount point to be deleted.
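Stripping the mount point can be sketched as a longest-prefix match that only accepts whole path components, so that "/mnt/cdrom2" isn't mistaken for "/mnt/cdrom". `struct mount_entry` and `find_mount()` are hypothetical names, and the table is assumed to hold normalized absolute paths without trailing slashes:

```c
#include <stddef.h>
#include <string.h>

struct mount_entry {
    const char *point;      /* e.g. "/mnt/cdrom"; "/" for the root */
    const char *fs_name;    /* which registered driver handles it */
};

/* Pick the longest mount point that is a whole-component prefix of
 * path; *rest receives the remainder, always starting with '/'. */
const struct mount_entry *find_mount(const struct mount_entry *tab, size_t n,
                                     const char *path, const char **rest)
{
    const struct mount_entry *best = NULL;
    size_t best_len = 0;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(tab[i].point);
        if (strcmp(tab[i].point, "/") == 0)
            len = 0;                    /* the root matches everything */
        else if (strncmp(path, tab[i].point, len) != 0 ||
                 (path[len] != '/' && path[len] != '\0'))
            continue;                   /* not a whole-component prefix */
        if (best == NULL || len > best_len) {
            best = &tab[i];
            best_len = len;
        }
    }
    if (best != NULL)
        *rest = path[best_len] != '\0' ? path + best_len : "/";
    return best;
}
```

With "/mnt/cdrom" mounted, "/mnt/cdrom/folder/hello.txt" maps to that mount with "/folder/hello.txt" as the path handed to the driver.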

Re: VFS theory

Posted: Wed Jul 29, 2009 1:05 am
by Brendan
Hi,
zity wrote: I'll change my mount procedure so that file operations for /dev/fd0 are passed directly to the driver. I'll furthermore try to look into some sort of caching in the VFS to avoid unnecessary calls to the filesystem. To begin with, I'll try to cache directories and later on files.
Caching (especially for directory entries) isn't something I'd recommend adding later, as it changes everything.

For example, imagine an application asks to open the file "/mnt/cdrom/bar/foo.txt". The VFS looks in its cache and the directory entry isn't there, so the VFS asks the file system mounted at "/mnt/cdrom" for a complete listing of everything in the "/bar" directory. Each directory entry returned by the file system code has some sort of reference number. The VFS doesn't care what this reference number means, and it can have a different meaning for each type of file system (e.g. for FAT it could be the number of the first cluster in the chain of clusters used to store the file or subdirectory, and for ISO9660 it could be the LBA sector number of the first sector used to store the file or subdirectory). Later, when the VFS receives the list of directory entries from the file system, it finds all requests that were waiting for this information (including the original application's "open()" request, though there may be other requests by this time). For the application's "open()" request, the VFS can then tell the application that the file has been opened successfully (without bothering to tell the file system).
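A sketch of that flow under some simplifying assumptions: a single cached directory, a hypothetical `struct vfs_dirent` whose `fs_ref` is opaque to the VFS, and a fixed array of parked requests. `vfs_park_open()` records an open() that missed the cache, and `vfs_listing_arrived()` is what the file system's reply would trigger:

```c
#include <stdint.h>
#include <string.h>

/* One cached directory entry. The VFS never interprets fs_ref: for FAT it
 * might be a first cluster, for ISO 9660 a starting LBA. */
struct vfs_dirent {
    char     name[32];
    uint64_t fs_ref;
    uint32_t size;
};

/* A request parked until the directory listing arrives. */
struct pending_open {
    int  active;
    int  done;
    char name[32];      /* entry wanted in the directory being listed */
    int  result;        /* index into dir_cache, or -1 if not found */
};

#define MAX_ENTRIES 16
#define MAX_PENDING 8
static struct vfs_dirent dir_cache[MAX_ENTRIES];
static int dir_count;
static struct pending_open pending[MAX_PENDING];

/* Cache miss: remember the request and (elsewhere) ask for a listing. */
int vfs_park_open(const char *name)
{
    for (int p = 0; p < MAX_PENDING; p++) {
        if (!pending[p].active && strlen(name) < sizeof pending[p].name) {
            pending[p].active = 1;
            pending[p].done = 0;
            strcpy(pending[p].name, name);
            return p;
        }
    }
    return -1;
}

/* The file system delivered the full listing: cache it, then complete
 * every request that was waiting, without calling the file system again. */
void vfs_listing_arrived(const struct vfs_dirent *ents, int n)
{
    dir_count = n < MAX_ENTRIES ? n : MAX_ENTRIES;
    memcpy(dir_cache, ents, (size_t)dir_count * sizeof *ents);

    for (int p = 0; p < MAX_PENDING; p++) {
        if (!pending[p].active)
            continue;
        pending[p].result = -1;
        for (int i = 0; i < dir_count; i++)
            if (strcmp(dir_cache[i].name, pending[p].name) == 0)
                pending[p].result = i;
        pending[p].done = 1;
        pending[p].active = 0;
    }
}
```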

The application might just do an "fstat()" then close the file, and in this case the VFS can handle the "fstat()" and the "close()" without caring about the file system (as the file system still doesn't know that any file was opened and therefore doesn't need to know the file was closed, and the VFS would/should still have the directory information needed for the "fstat()" in its cache).

If the application reads from the file and if the data being read is in the VFS cache, then the VFS still doesn't need to tell the file system that the file is opened.

If the application reads data from the file that isn't in the VFS cache, or writes new data to the file; then the VFS can tell the file system that the file was opened (if it didn't already, due to a previous read or write) and ask the file system to read/write the data (and the VFS would also update the data stored in its cache). When the application calls "close()" the VFS checks to see if the file system knows that the file was opened, and only lets the file system know about the "close()" if the file system knew about the "open()".
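The bookkeeping for this lazy open/close fits in one flag per open file. In this sketch the `in_cache` argument stands in for a real VFS cache lookup, and `fs_open()`, `fs_read()` and `fs_close()` are hypothetical counters in place of calls into the file system driver; writes would follow the same path as cache-missing reads:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VFS-side open-file record. */
struct vfs_file {
    uint64_t fs_ref;    /* reference number from the cached dir entry */
    bool     fs_knows;  /* has the file system been told about open()? */
};

/* Counters standing in for real calls into the file system driver. */
static int fs_open_calls, fs_read_calls, fs_close_calls;
static void fs_open(uint64_t ref)  { (void)ref; fs_open_calls++; }
static void fs_read(uint64_t ref)  { (void)ref; fs_read_calls++; }
static void fs_close(uint64_t ref) { (void)ref; fs_close_calls++; }

/* open(): served entirely from the directory cache. */
void vfs_open(struct vfs_file *f, uint64_t ref)
{
    f->fs_ref = ref;
    f->fs_knows = false;
}

/* read(): only a cache miss reaches the file system, and the first such
 * miss is when the file system learns that the file is open. */
void vfs_read(struct vfs_file *f, bool in_cache)
{
    if (in_cache)
        return;                 /* copy from the VFS cache; fs not involved */
    if (!f->fs_knows) {
        fs_open(f->fs_ref);
        f->fs_knows = true;
    }
    fs_read(f->fs_ref);
}

/* close(): forwarded only if the file system ever saw the open. */
void vfs_close(struct vfs_file *f)
{
    if (f->fs_knows)
        fs_close(f->fs_ref);
}
```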

When the VFS asks the file system to add, delete or rename a directory entry, or when the VFS asks the file system to read or write file data, then VFS tells the file system its own reference number for the directory or file, so that the file system doesn't need to try to figure out where the subdirectory or file is on the disk. For example, the application (after opening "/mnt/cdrom/bar/foo.txt") might want to read 4096 bytes from offset 123456 in the file, so the VFS might ask the ISO 9660 file system to read 4096 bytes from the file associated with the reference number that the ISO9660 previously provided (note: there's no need for the VFS to tell the file system the file name), and the ISO9660 file system (that uses this reference number to track the starting sector) can find the data it needs to read without looking at (or keeping track of) any directory information at all.
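For the ISO 9660 case, locating the data from the reference number alone is plain arithmetic. This sketch assumes the usual 2048-byte logical block, a file stored as one contiguous extent (no interleaving), and that the reference number is the file's starting LBA; `iso_span()` is a hypothetical helper:

```c
#include <stdint.h>

#define ISO_BLOCK 2048u   /* usual ISO 9660 logical block size */

/* Sectors needed to satisfy read(len bytes at offset). */
struct sector_span {
    uint32_t first_lba;
    uint32_t count;
    uint32_t skip;        /* bytes to skip at the start of the first sector */
};

/* start_lba is the reference number the ISO 9660 driver gave the VFS.
 * No file name or directory lookup is involved. Assumes len > 0. */
struct sector_span iso_span(uint32_t start_lba, uint64_t offset, uint32_t len)
{
    uint64_t first = offset / ISO_BLOCK;
    uint64_t last  = (offset + len - 1) / ISO_BLOCK;
    struct sector_span s;
    s.first_lba = start_lba + (uint32_t)first;
    s.count     = (uint32_t)(last - first + 1);
    s.skip      = (uint32_t)(offset % ISO_BLOCK);
    return s;
}
```

For the example above (4096 bytes at offset 123456) and a file starting at LBA 1000, this gives sectors 1060 to 1062, skipping the first 576 bytes of sector 1060.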

Basically there's no need for the file system to keep track of the last opened directory on the drive in a structure, because the file system knows that the directory information last used by the VFS will be in the VFS cache, and that the VFS will keep track of the file system's reference numbers. Apart from managing (one or more) queues of pending requests, you'd be able to implement an ISO9660 file system without tracking any data at all in the file system code, and for FAT file systems the only thing you'd need to track in the file system code is the FAT (File Allocation Table, i.e. the cluster allocation table) itself.
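On the FAT side, the reference number (the first cluster) plus the FAT itself is enough to locate any byte of a file. This sketch decodes FAT12's packed 12-bit entries and follows the chain. The function names are hypothetical, `fat12_set()` exists only to build test tables, and reserved or bad-cluster values (e.g. 0xFF7) are not checked:

```c
#include <stddef.h>
#include <stdint.h>

/* FAT12 packs two 12-bit entries into three bytes. Entry n starts at
 * byte n + n/2: even entries use the low 12 bits of the little-endian
 * 16-bit word there, odd entries use the high 12 bits. */
uint16_t fat12_get(const uint8_t *fat, uint16_t n)
{
    size_t off = n + n / 2;
    uint16_t v = (uint16_t)(fat[off] | (fat[off + 1] << 8));
    return (n & 1) ? (uint16_t)(v >> 4) : (uint16_t)(v & 0x0FFF);
}

void fat12_set(uint8_t *fat, uint16_t n, uint16_t val)
{
    size_t off = n + n / 2;
    if (n & 1) {
        fat[off]     = (uint8_t)((fat[off] & 0x0F) | ((val << 4) & 0xF0));
        fat[off + 1] = (uint8_t)(val >> 4);
    } else {
        fat[off]     = (uint8_t)(val & 0xFF);
        fat[off + 1] = (uint8_t)((fat[off + 1] & 0xF0) | ((val >> 8) & 0x0F));
    }
}

/* Follow the chain from the first cluster to the one holding byte
 * `offset`. Returns 0 if the chain ends first (data clusters start at 2). */
uint16_t fat12_cluster_for(const uint8_t *fat, uint16_t first,
                           uint32_t offset, uint32_t cluster_bytes)
{
    uint16_t c = first;
    for (uint32_t hops = offset / cluster_bytes; hops > 0; hops--) {
        c = fat12_get(fat, c);
        if (c >= 0xFF8)         /* end-of-chain marker */
            return 0;
    }
    return c;
}
```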


Cheers,

Brendan

Re: VFS theory

Posted: Wed Jul 29, 2009 3:00 am
by zity
Thanks Brendan!
Your answers have definitely helped me clear up my mind. I'll start working on caching as soon as possible. Your explanation made me realize that there are a few more things I need to change in order to simplify my filesystem design and make it more efficient.

I'm glad I posted this topic at this early stage of my filesystem design, before it became too clumsy. I think I'll be able to create a much more efficient, lightweight and well-functioning design now :)

Re: VFS theory

Posted: Wed Jul 29, 2009 3:36 am
by cyr1x
Brendan wrote:
cyr1x wrote: Why not let the FAT driver call the device driver directly?
Because it's nice to be able to mount normal files, especially if you're an OS developer (who tends to work with disk images more often than most people, and who doesn't see why tools like "mtools" and loopback devices should be necessary).
You can pretend that the particular file is a "driver".
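A sketch of that idea: wrap an image file in the same kind of block-device interface a disk driver would expose (a "loopback" driver), so the FAT code can still call its "driver" directly. `struct blockdev` and `blockdev_from_file()` are hypothetical, and hosted C stdio (`FILE *`, `fseek()`, `fread()`) stands in for whatever file handle the kernel's VFS would actually provide:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical interface a real disk driver would also implement. */
struct blockdev {
    uint32_t sector_size;
    int (*read_sector)(struct blockdev *dev, uint32_t lba, void *buf);
    void *priv;
};

/* "Loopback" driver: a sector read becomes a read from the image file. */
static int file_read_sector(struct blockdev *dev, uint32_t lba, void *buf)
{
    FILE *img = dev->priv;
    if (fseek(img, (long)lba * (long)dev->sector_size, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, dev->sector_size, img) == dev->sector_size ? 0 : -1;
}

void blockdev_from_file(struct blockdev *dev, FILE *img, uint32_t sector_size)
{
    dev->sector_size = sector_size;
    dev->read_sector = file_read_sector;
    dev->priv = img;
}
```

The file system code can't tell this apart from a floppy driver, which keeps the direct FAT-to-driver call path while still allowing disk images to be mounted.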