in VFS or in FS

Beastie · Post by **Beastie** » Mon Dec 17, 2007 3:06 pm

Hi,

If I have a file "/dir1/dir2/file.txt", for example, and I want to read it, assume I have two layers: VFS and FS (FAT, for example).

Where should I handle splitting the path: in the VFS or in the FS?
To simplify my question, the steps to read this file are:
(1) read root dir
(2) get cluster(s) of dir1 & read it
(3) get cluster(s) of dir2 & read it
(4) get cluster(s) of file.txt

Should these steps be done in the VFS or in the FS? I think it's better handled in the VFS, since some directory may be mounted somewhere else, and so on.

What do you think, guys?

AndrewAPrice · Post by **AndrewAPrice** » Mon Dec 17, 2007 4:54 pm

The FS, of course. The VFS manages files at a much higher level than the FS. The VFS deals with names, sizes, permissions, and caching directories. The FS deals with reading the contents, managing clusters, etc.

elfenix · Post by **elfenix** » Mon Dec 17, 2007 8:53 pm

Beastie wrote:Hi,

If I have a file "/dir1/dir2/file.txt", for example, and I want to read it, assume I have two layers: VFS and FS (FAT, for example).

Where should I handle splitting the path: in the VFS or in the FS?
To simplify my question, the steps to read this file are:
(1) read root dir
(2) get cluster(s) of dir1 & read it
(3) get cluster(s) of dir2 & read it
(4) get cluster(s) of file.txt

Should these steps be done in the VFS or in the FS? I think it's better handled in the VFS, since some directory may be mounted somewhere else, and so on.

What do you think, guys?

I'd say you almost surely want this in the VFS. It sounds like you are trying to do things the "Unix" way. If you want to have "Volumes" or anything else similar, then this advice goes out the window.

To go the Unix way -

A file does not have a filename; it has a unique identifier known as an inode. Using the inode (and the filesystem) you can grab a chunk of data from the disk. An inode whose data groups other inodes with filenames is a directory. The directory separator (a / in *nix) tells the operating system that the whole group preceding it represents one of these special inodes.

So, a directory path -
/usr/src/myos/kernel.c

Really means -
the inode associated with kernel.c
found in the directory myos
found in the directory src
found in the directory usr
found in the mount root

This is also the setup that you describe for your kernel. The important part here is that these definitions apply across more than one file system, and every file system will need to behave the same way to match standard behavior (unless you don't want that).
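A rough model of that chain of lookups, in Python for brevity rather than kernel C. The `Inode` class, its fields and the inode numbers are invented for illustration; a real inode carries far more than this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Inode:
    """A file identified by number, not by name (hypothetical model)."""
    number: int
    data: bytes = b""
    # For a directory, the contents are a name -> inode mapping.
    entries: Optional[Dict[str, "Inode"]] = None

    @property
    def is_dir(self) -> bool:
        return self.entries is not None


def split_path(path: str) -> List[str]:
    """Split an absolute path on '/' and drop empty components."""
    return [part for part in path.split("/") if part]


def resolve(root: Inode, path: str) -> Inode:
    """Walk from the mount root, one directory lookup per component."""
    node = root
    for name in split_path(path):
        if not node.is_dir:
            raise NotADirectoryError(name)
        if name not in node.entries:
            raise FileNotFoundError(name)
        node = node.entries[name]
    return node


# The example path from the post, built inside out.
kernel_c = Inode(5, data=b"int main(void) { return 0; }")
myos = Inode(4, entries={"kernel.c": kernel_c})
src = Inode(3, entries={"myos": myos})
usr = Inode(2, entries={"src": src})
root = Inode(1, entries={"usr": usr})

found = resolve(root, "/usr/src/myos/kernel.c")
```

The name only ever appears in the parent directory's mapping; the inode itself never stores it.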

A filesystem driver will have a natural and required interface to obtain a directory or directory listing (mapping) - as that is part of what a filesystem does.

Now, it's possible that portions of your path live on different filesystems than your root directory. Placing the separation code in the FS means the filesystem driver has to call back into the VFS layer to determine whether any given directory is a mount point. So you end up duplicating that logic (possibly with conflicting results) in every one of your filesystems.

Now, that might be a good thing if you do NOT want the *nix behavior, but keeping the path code in the VFS allows all filesystems to have identical behavior in terms of searching, etc...

FWIW, the major *nix vendors place this code in the VFS.
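As a sketch of that split (Python, with made-up names): the driver below exposes only a single-component `lookup`, and the VFS owns both the path splitting and the mount table. `DictFS`, `VFS`, and the integer node ids are assumptions for the example, not any real kernel's interface.

```python
from typing import Dict, Optional, Tuple


class DictFS:
    """Hypothetical FS driver: it can only look up one name in one directory."""

    def __init__(self, dirs: Dict[int, Dict[str, int]], root_id: int = 0):
        self.dirs = dirs          # directory id -> {name: child id}
        self.root_id = root_id

    def lookup(self, dir_id: int, name: str) -> Optional[int]:
        return self.dirs.get(dir_id, {}).get(name)


class VFS:
    """Owns path splitting and the mount table, so every FS behaves the same."""

    def __init__(self, root_fs: DictFS):
        self.root_fs = root_fs
        # (file system, node id of the mount point) -> file system mounted there
        self.mounts: Dict[Tuple[DictFS, int], DictFS] = {}

    def mount(self, fs: DictFS, node_id: int, mounted: DictFS) -> None:
        self.mounts[(fs, node_id)] = mounted

    def resolve(self, path: str) -> Tuple[DictFS, int]:
        fs, node = self.root_fs, self.root_fs.root_id
        for name in filter(None, path.split("/")):
            child = fs.lookup(node, name)
            if child is None:
                raise FileNotFoundError(path)
            node = child
            mounted = self.mounts.get((fs, node))
            if mounted is not None:          # crossing a mount point
                fs, node = mounted, mounted.root_id
        return fs, node


root_fs = DictFS({0: {"dir1": 1}, 1: {"dir2": 2}})
other_fs = DictFS({0: {"file.txt": 7}})
vfs = VFS(root_fs)
vfs.mount(root_fs, 2, other_fs)              # /dir1/dir2 is a mount point
result = vfs.resolve("/dir1/dir2/file.txt")
```

Neither driver knows the other exists; the mount check lives in one place.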

Beastie · Post by **Beastie** » Tue Dec 18, 2007 1:38 pm

So I have two types of inodes: (1) VFS inodes and (2) FS inodes?

Beastie · Post by **Beastie** » Tue Dec 18, 2007 1:42 pm

I found a lot of structs (inode, superblock, dentry, ...) and I got confused.

Nothing better than a working example:

Assuming the file is not yet cached. (no inode-cache & dentry cache)
Path = /usr/src/myos/kernel.c

A user process invoked an OPEN syscall to this file (Path).

Can anyone please tell me the steps the kernel (*NIX) should take after the open syscall is executed (communication between VFS & FS)?

I'll be thankful for the answer

AndrewAPrice · Post by **AndrewAPrice** » Tue Dec 18, 2007 5:58 pm

Beastie wrote:So I have two types of inodes: (1) VFS inodes and (2) FS inodes?

If you want to think of it that way, yes. Basically, you have your VFS inode, and one of its members is a pointer to the FS inode. The VFS never looks inside the FS inode. What is stored there is completely up to the FS driver (e.g. a FAT driver would store clusters and the like in here).
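A minimal sketch of that containment, in Python. `VfsInode`, `FatInode`, `fs_private` and the cluster numbers are all hypothetical; in C the private member would typically be a `void *`.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class FatInode:
    """Driver-private data (hypothetical): the VFS never looks inside."""
    first_cluster: int
    cluster_chain: List[int]


@dataclass
class VfsInode:
    """Generic fields that every file system shares."""
    name: str
    size: int
    mode: int
    fs_private: Any = None   # opaque to the VFS; owned by the FS driver


def fat_clusters(inode: VfsInode) -> List[int]:
    """Only the FAT driver unpacks fs_private."""
    private = inode.fs_private
    if not isinstance(private, FatInode):
        raise TypeError("not a FAT inode")
    return private.cluster_chain


node = VfsInode("file.txt", size=1200, mode=0o644,
                fs_private=FatInode(first_cluster=9, cluster_chain=[9, 10, 14]))
clusters = fat_clusters(node)
```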

Brendan · Post by **Brendan** » Tue Dec 18, 2007 6:25 pm

Hi,

Beastie wrote:Can anyone please tell me the steps the kernel (*NIX) should take after the open syscall is executed (communication between VFS & FS)?

I'll be thankful for the answer

The simple answer would be to traverse a tree of entries (while keeping track of details for the current file system), asking the file system for more data when necessary, until you reach the correct entry. Then mark that entry as "opened" somehow and build a structure that describes how it's opened.

For example, for the file "/dir1/dir2/dir3/foo.txt" the VFS would start at "/" and the root file system. If the entry for "/" says none of the directory contents are in the tree then it'd ask the current file system for a listing of everything in the "/" directory and add this information into the tree (possibly getting rid of other entries to make space). Then it'd search for "dir1".

If it finds "dir1" it'd check if it's a mount point, and if it is a mount point it'd set a new current file system and a new current mount point name. If the entry for "/dir1" says none of the directory contents are in the tree then it'd ask the current file system for a listing of everything in the "/dir1" directory (after removing the current mount point name from the "/dir1" string) and add this information into the tree (possibly getting rid of other entries to make space). Then it'd search for "dir2".

If it finds "dir2" it'd check if it's a mount point, and if it is a mount point it'd set a new current file system and a new current mount point name. If the entry for "/dir1/dir2" says none of the directory contents are in the tree then it'd ask the current file system for a listing of everything in the "/dir1/dir2" directory (after removing the current mount point name from the "/dir1/dir2" string) and add this information into the tree (possibly getting rid of other entries to make space). Then it'd search for "dir3".

If it finds "dir3" it'd check if it's a mount point, and if it is a mount point it'd set a new current file system and a new current mount point name. If the entry for "/dir1/dir2/dir3" says none of the directory contents are in the tree then it'd ask the current file system for a listing of everything in the "/dir1/dir2/dir3" directory (after removing the current mount point name from the "/dir1/dir2/dir3" string) and add this information into the tree (possibly getting rid of other entries to make space). Then it'd search for "foo.txt".

Um, it's recursive...
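The walk described above could be sketched roughly like this (Python). `ListingFS`, `Entry` and `walk` are invented names, and eviction, permissions and the file/directory distinction are left out. An entry whose `children` is `None` stands for "none of the directory contents are in the tree".

```python
from typing import Dict, List, Optional


class ListingFS:
    """Hypothetical driver: lists the names in a directory, given a path
    relative to its own root."""

    def __init__(self, dirs: Dict[str, List[str]]):
        self.dirs = dirs
        self.requests: List[str] = []    # record of what the VFS asked for

    def list_dir(self, rel_path: str) -> List[str]:
        self.requests.append(rel_path)
        return self.dirs[rel_path]


class Entry:
    def __init__(self, name: str):
        self.name = name
        self.children: Optional[Dict[str, "Entry"]] = None  # None = not loaded
        self.mounted_fs: Optional[ListingFS] = None


def walk(root: Entry, root_fs: ListingFS, path: str) -> Entry:
    entry, fs, mount_name, current = root, root_fs, "", ""
    for name in filter(None, path.split("/")):
        if entry.children is None:
            # Strip the mount point name before asking the file system.
            rel = current[len(mount_name):] or "/"
            entry.children = {n: Entry(n) for n in fs.list_dir(rel)}
        if name not in entry.children:
            raise FileNotFoundError(path)
        entry = entry.children[name]
        current += "/" + name
        if entry.mounted_fs is not None:   # new current file system
            fs, mount_name = entry.mounted_fs, current
    return entry


root_fs = ListingFS({"/": ["dir1"], "/dir1": ["dir2"]})
other_fs = ListingFS({"/": ["dir3"], "/dir3": ["foo.txt"]})
root = Entry("/")
walk(root, root_fs, "/dir1/dir2").mounted_fs = other_fs   # mount on /dir1/dir2
foo = walk(root, root_fs, "/dir1/dir2/dir3/foo.txt")
```

The second walk never goes back to `root_fs`, because the first two levels are already in the tree, and `other_fs` is asked about "/dir3" rather than "/dir1/dir2/dir3".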

Some notes:
1) before the VFS does anything it'd sanitize the path name to create a unique absolute path. Something like "/a/../b/../c/d/../foo.txt" would be converted into "/c/foo.txt", and something like "~/bar.txt" might be converted into "/home/Brendan/bar.txt". If the application tries to create a new file called "/foo/*?*?*" then they get a bad filename error.
2) You'd keep track of the current mount point name, so that if the VFS is looking at "/dir1/dir2/dir3" and "/dir1/dir2" happens to be the most recent mount point, then it'd ask the file system mounted at "/dir1/dir2" about the directory "/dir3" (and not the directory "/dir1/dir2/dir3").
2) If at any step the next piece of the path is not found, then return "file not found".
3) There's file permission checks in there somewhere.
4) The VFS needs to handle "mount" and "unmount" too.
5) For fun, use hashes to speed it up (convert string into a number/hash, compare the number/hash with the number/hash stored in each entry in the directory, if numbers/hashes match compare strings to double check).
6) To speed it up more optimize for cache locality. Only put data you need in the tree (address of a list of children and name hash) and for each entry in the tree have a pointer to further information (name string, permissions, owner, group, etc). Then keep the entries in the tree separate from everything else (e.g. have two separate heaps, one for tree entries and another for everything else).
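Note 5 could be sketched as follows (Python). CRC-32 from `zlib` is only a stand-in for whatever hash a real VFS would use, and the `(hash, name, inode number)` tuple layout is an assumption for the example.

```python
import zlib
from typing import List, Optional, Tuple

DirEntry = Tuple[int, str, int]   # (name hash, name, inode number)


def name_hash(name: str) -> int:
    """Stand-in hash; any cheap, well-distributed hash would do."""
    return zlib.crc32(name.encode("utf-8"))


def make_entry(name: str, inode_number: int) -> DirEntry:
    return (name_hash(name), name, inode_number)


def find(entries: List[DirEntry], name: str) -> Optional[int]:
    """Compare numbers first; compare strings only when the hashes match."""
    wanted = name_hash(name)
    for entry_hash, entry_name, inode_number in entries:
        if entry_hash == wanted and entry_name == name:
            return inode_number
    return None


directory = [make_entry("dir1", 11), make_entry("dir2", 12), make_entry("foo.txt", 13)]
hit = find(directory, "foo.txt")
miss = find(directory, "bar.txt")
```

The string comparison stays in as the double check, since two different names can share a hash.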

When you find the correct entry (e.g. "/dir1/dir2/dir3/foo.txt" if an existing file is being opened, or possibly just "/dir1/dir2/dir3/" if the file is being created), create a new file handle. Add the file handle to a list of all file handles that have opened the file and keep track of how the file was opened so that file sharing works. For example, (usually) a file can be opened many times as "read only" but can only be opened once with write access. Some OSs support "append" where you can add data to the end of the file while other people read.
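One way to read that sharing rule, as a sketch (Python): any number of readers, at most one writer per file. `OpenFileTable`, `SharingViolation` and the "r"/"w" mode strings are made up here, and a real OS may well choose a different policy.

```python
import itertools
from typing import Dict, Tuple


class SharingViolation(Exception):
    pass


class OpenFileTable:
    """Tracks every open handle and how it was opened (hypothetical)."""

    def __init__(self) -> None:
        self._next_handle = itertools.count(1)
        self._open: Dict[int, Tuple[str, str]] = {}   # handle -> (path, mode)

    def open(self, path: str, mode: str) -> int:
        if mode not in ("r", "w"):
            raise ValueError(mode)
        has_writer = any(p == path and m == "w" for p, m in self._open.values())
        if mode == "w" and has_writer:
            raise SharingViolation(path)    # only one writer at a time
        handle = next(self._next_handle)
        self._open[handle] = (path, mode)
        return handle

    def close(self, handle: int) -> None:
        del self._open[handle]


table = OpenFileTable()
r1 = table.open("/dir1/dir2/dir3/foo.txt", "r")
r2 = table.open("/dir1/dir2/dir3/foo.txt", "r")
w1 = table.open("/dir1/dir2/dir3/foo.txt", "w")
```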

Some more notes:
7) All decent OSs do all the above asynchronously - build a structure describing the state of each request and come back to the request later if you can't complete it immediately (e.g. if you need to wait for the file system to do anything).
8) All decent OSs support notifications. Software can ask the VFS to let it know if a file or directory is modified, and the VFS does.
9) Some OSs support I/O priorities. For example, if a low priority thread wants to open "foo.txt" and a high priority thread wants to open "bar.txt" then the VFS doesn't do anything for "foo.txt" if it can do something for "bar.txt". I/O priorities should extend to file systems and device drivers too. For example, if a low priority request to read 50 MB of data from the disk is in progress and a high priority request to write 4 KB of data arrives, then the high priority request should preempt the low priority request. Imagine the high priority request is the swap space and the low priority request is defrag and you'll see why.
10) File systems may not be file systems - consider "/proc" and "/dev".
11) You'll want to keep track of "least recently used" entries so you can free up space when you need to.
12) You'll eventually want to find some way of balancing RAM, VFS cache and swap space - I imagine this is hard to do right. For example, if the VFS cache has data that was last used 10 seconds ago and a thread has data that was last used 8 seconds ago, then it might be better to send the thread's data to swap space (even though it's not "least recently used") if you can read it back from swap faster than you can get the VFS cache data back.
13) You might want to cache file data too. Resist the urge until everything else works perfectly (it's complicated enough already).
14) Don't forget that a lot of things depend on you and your OS!

15) There's lots of stuff you could add - versioning, fault tolerance, sparse files, encryption, compression, search and indexing, meta-data, snapshots/rollbacks, etc.
16) Don't forget I made this all up while I typed...
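For note 11, a small "least recently used" sketch (Python), built on the standard library's `OrderedDict`. The `EntryCache` class and its fixed entry-count capacity are assumptions; a real VFS would more likely size its cache by memory pressure.

```python
from collections import OrderedDict
from typing import Any, Optional


class EntryCache:
    """Keeps at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, Any]" = OrderedDict()

    def __contains__(self, path: str) -> bool:
        return path in self._entries

    def get(self, path: str) -> Optional[Any]:
        if path not in self._entries:
            return None
        self._entries.move_to_end(path)      # mark as most recently used
        return self._entries[path]

    def put(self, path: str, entry: Any) -> Optional[str]:
        """Insert an entry; return the evicted path, if any."""
        self._entries[path] = entry
        self._entries.move_to_end(path)
        if len(self._entries) > self.capacity:
            evicted, _ = self._entries.popitem(last=False)
            return evicted
        return None


cache = EntryCache(capacity=2)
cache.put("/dir1", "entry for dir1")
cache.put("/dir1/dir2", "entry for dir2")
cache.get("/dir1")                            # /dir1 is now the most recent
evicted = cache.put("/dir1/dir2/dir3", "entry for dir3")
```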

Cheers,

Brendan

elfenix · Post by **elfenix** » Tue Dec 18, 2007 6:51 pm

Brendan wrote:9) Some OSs support I/O priorities. For example, if a low priority thread wants to open "foo.txt" and a high priority thread wants to open "bar.txt" then the VFS doesn't do anything for "foo.txt" if it can do something for "bar.txt". I/O priorities should extend to file systems and device drivers too. For example, if a low priority request to read 50 MB of data from the disk is in progress and a high priority request to write 4 KB of data arrives, then the high priority request should preempt the low priority request. Imagine the high priority request is the swap space and the low priority request is defrag and you'll see why.

You want to be very wary of doing this until you have a decent priority scheme implemented, in place, stable, and understand all of the repercussions... Priority inversion is a common problem in such schemes and can lead to difficult, hard to catch, annoying, evil, disgusting, bugs...
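Just the ordering part of such a scheme can be sketched with a heap (Python). This does nothing about priority inversion, which is the hard part. `IoQueue`, the numeric priorities (lower number means more urgent) and the idea of splitting the big read into chunks so something else can jump in between are all assumptions for the example.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class IoQueue:
    """Pending requests, most urgent first; FIFO among equal priorities."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._sequence = itertools.count()   # tie-breaker keeps arrival order

    def submit(self, priority: int, request: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._sequence), request))

    def next_request(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = IoQueue()
queue.submit(10, "defrag: read chunk 1")
queue.submit(10, "defrag: read chunk 2")
queue.submit(0, "swap: write 4 KB")
order = [queue.next_request() for _ in range(3)]
```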

As for "VFS" vs "FS" inodes - they can be done by the VFS node containing the FS node. They can also be done using inheritance. If you are using an OO language and inheritance, then the whole problem really devolves into an ordered tree search. There are trade-offs here, and it really depends, again, on what you want your design to be good at.
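The inheritance variant might look something like this (Python). `VfsNode`, `FatNode` and the in-memory "clusters" are hypothetical; the point is only that the VFS calls through the base interface and never sees the FAT-specific members.

```python
from typing import List


class VfsNode:
    """Base class: the operations the VFS relies on."""

    def __init__(self, name: str) -> None:
        self.name = name

    def size(self) -> int:
        raise NotImplementedError

    def read(self, offset: int, length: int) -> bytes:
        raise NotImplementedError


class FatNode(VfsNode):
    """Hypothetical FAT node: adds a cluster chain to the base fields."""

    def __init__(self, name: str, clusters: List[bytes]) -> None:
        super().__init__(name)
        self.clusters = clusters             # stand-in for on-disk clusters

    def size(self) -> int:
        return sum(len(cluster) for cluster in self.clusters)

    def read(self, offset: int, length: int) -> bytes:
        return b"".join(self.clusters)[offset:offset + length]


def vfs_read_all(node: VfsNode) -> bytes:
    """VFS-side code: works for any subclass."""
    return node.read(0, node.size())


node = FatNode("file.txt", [b"hello ", b"world"])
contents = vfs_read_all(node)
```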

urxae · Post by **urxae** » Thu Dec 20, 2007 8:05 am

(Note: some of the stuff below may only apply to Linux, as I don't have any significant experience with other Unix-like systems. Even so, that still means the quoted points don't describe the only way and at least one system in somewhat common use does things differently.)

Brendan wrote:Some notes:
1) before the VFS does anything it'd sanitize the path name to create a unique absolute path. Something like "/a/../b/../c/d/../foo.txt" would be converted into "/c/foo.txt",

That's not how Linux does it, IIRC. If /a, /b or /c/d is a symlink the following '..' will go to the parent of the target.
If any of them do not exist it's an error (see your second "2)").
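The difference is easy to see from user space on a POSIX system (Python sketch). `posixpath.normpath` collapses ".." purely as text, the way the quoted note describes, while `os.path.realpath` follows the symlink before stepping to the parent. The helper name and the directory layout are made up for the demonstration.

```python
import os
import posixpath
import tempfile


def compare_dotdot():
    """Resolve '<base>/a/../foo.txt' both ways, where 'a' is a symlink to
    'real/target'. Returns (lexical, physical), each relative to base."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)
        os.makedirs(os.path.join(base, "real", "target"))
        os.symlink(os.path.join(base, "real", "target"), os.path.join(base, "a"))
        path = base + "/a/../foo.txt"
        lexical = posixpath.normpath(path)    # pure string manipulation
        physical = os.path.realpath(path)     # follows the symlink first
        return (os.path.relpath(lexical, base), os.path.relpath(physical, base))


lexical, physical = compare_dotdot()
```

Here `lexical` ends up in the base directory and `physical` ends up under "real", so the two "sanitized" paths name different files.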

and something like "~/bar.txt" might be converted into "/home/Brendan/bar.txt".

That's often done before it's passed to the kernel though, for example by the shell or perhaps some user-space library.

If the application tries to create a new file called "/foo/*?*?*" then they get a bad filename error.

Again, not on Linux:

urxae@urxae:~/tmp$ touch '*?*?*'
urxae@urxae:~/tmp$ ls -l '*'*
-rw------- 1 urxae urxae 0 2007-12-20 14:34 *?*?*

(The quotes are necessary to tell bash not to expand *?*?* to a list of filenames in the directory)
As far as I know, Linux allows any character other than '/' (and maybe '\0') in filenames.
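A check along those lines could be as small as this (Python sketch). The function name is made up, names are treated as bytes, and length limits are ignored. "." and ".." are rejected because they already exist in every directory and so can't be created as new files.

```python
def can_create_name(name: bytes) -> bool:
    """True if `name` is usable as a new file name under the rule above."""
    if name in (b"", b".", b".."):
        return False
    return b"/" not in name and b"\0" not in name


wildcards_ok = can_create_name(b"*?*?*")
slash_ok = can_create_name(b"a/b")
```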

10) File systems may not be file systems - consider "/proc" and "/dev".

I guess that depends on your definition of "file system". Specifically, whether it means the contents need to be saved onto some kind of non-volatile storage device attached to the local machine.
Some other kinds of "non-traditional" file systems: ramdisks, network filesystems (nfs/smb/ftp etc.), mounted disk images...

16) Don't forget I made this all up while I typed...

One might say it shows.

JamesM · Post by **JamesM** » Thu Dec 20, 2007 9:12 am

urxae@urxae:~/tmp$ touch '*?*?*' 
urxae@urxae:~/tmp$ ls -l '*'* 
-rw------- 1 urxae urxae 0 2007-12-20 14:34 *?*?*

Why is the default permissions mask on your system user-only read-writeable, and not readable by group or other?

urxae · Post by **urxae** » Sun Dec 23, 2007 8:22 am

JamesM wrote:
urxae@urxae:~/tmp$ touch '*?*?*' 
urxae@urxae:~/tmp$ ls -l '*'* 
-rw------- 1 urxae urxae 0 2007-12-20 14:34 *?*?*
Why is the default permissions mask on your system user-only read-writeable, and not readable by group or other?

Short answer: Because I'm paranoid.
Long answer: On the Linux server we can use at school, the default group for students is 'student'. I don't want others (especially other students) to be able to read my files, so I added 'umask 077' to .bashrc. Since I try to use the same .bashrc everywhere, that's also in my local .bashrc.
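The arithmetic behind that listing, as a quick sketch (Python). touch asks for mode 0666 and the umask bits are cleared from it; the helper name is made up, and `stat.filemode` is the standard library function that renders the mode the way ls does.

```python
import stat


def mode_after_umask(requested: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask."""
    return requested & ~umask


paranoid = mode_after_umask(0o666, 0o077)     # 0o600
common = mode_after_umask(0o666, 0o022)       # 0o644
paranoid_listing = stat.filemode(stat.S_IFREG | paranoid)   # '-rw-------'
common_listing = stat.filemode(stat.S_IFREG | common)       # '-rw-r--r--'
```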

OSDev.org
