A design of VFS

nbdd0121 · Post by **nbdd0121** » Wed Jan 01, 2014 6:19 am

Since the VFS in Linux is too complicated and I want to develop a more abstract interface between file systems, devices and applications, I came up with this idea for a VFS:
1. Everything is a file.
2. Every file can be mounted onto another file.
3. When a file is mounted, the file it is mounted onto becomes invisible to everyone, or `overridden`; any operation on that file is redirected to the mounted file.
4. File systems are also files (based on 1), so when we open a file system, we in fact mount an instance of the file system as a file at the destination.
5. No hard links are allowed; hard links should be handled by the specific FS instead of the VFS.
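A minimal sketch of rules 2 and 3 (mounting and redirection), not the poster's actual code. The names `vfs_node`, `vfs_resolve`, `vfs_mount` and `vfs_umount` are hypothetical, and refusing a second mount on an already covered node is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: files, directories, devices and file systems
 * are all represented by the same structure ("everything is a file"). */
struct vfs_node {
    const char *name;
    struct vfs_node *mounted;   /* node mounted on top of this one, or NULL */
};

/* Follow the mount chain so that callers only ever see the topmost node. */
struct vfs_node *vfs_resolve(struct vfs_node *n)
{
    while (n != NULL && n->mounted != NULL)
        n = n->mounted;
    return n;
}

/* Mount src over dst. Returns 0 on success, -1 if dst is already covered
 * or if the mount would create a cycle (assumed policy). */
int vfs_mount(struct vfs_node *dst, struct vfs_node *src)
{
    assert(dst != NULL && src != NULL);
    if (dst->mounted != NULL)
        return -1;
    if (vfs_resolve(src) == dst)
        return -1;
    dst->mounted = src;
    return 0;
}

/* Uncover dst again. Returns 0 on success, -1 if nothing was mounted. */
int vfs_umount(struct vfs_node *dst)
{
    assert(dst != NULL);
    if (dst->mounted == NULL)
        return -1;
    dst->mounted = NULL;
    return 0;
}
```

Every other VFS entry point would call `vfs_resolve` first, which is what makes the covered file invisible while something is mounted on it.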
This is the set of operations on a file (I don't think more are essential, but if they are, please tell me):
read
write
open
close
mmap
set_permission
get_permission
get_size
mount /* This is only the handler; the VFS itself handles most of the work */
umount /* For a file system, this needs to flush buffers, close the device, etc. */
delete /* This VFS does not deal with hard links, hence delete instead of unlink */
create, readdir, finddir /* only for directories */
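
One common way to express such an operation set in C is a table of function pointers attached to each file. The sketch below covers only a subset, and everything in it is an assumption for illustration: the names `vfs_ops`, `vfs_file`, `vfs_read`, `VFS_ENOSYS`, the explicit offset argument, and the in-memory `memfile` backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VFS_ENOSYS (-1)   /* hypothetical "operation not supported" code */

struct vfs_file;

/* Subset of the operation table. The remaining operations (mmap,
 * permissions, mount, umount, delete, create, readdir, finddir)
 * would be further function pointers of the same kind. */
struct vfs_ops {
    long (*read)(struct vfs_file *f, void *buf, size_t len, uint64_t off);
    long (*write)(struct vfs_file *f, const void *buf, size_t len, uint64_t off);
    int (*open)(struct vfs_file *f);
    int (*close)(struct vfs_file *f);
    uint64_t (*get_size)(struct vfs_file *f);
};

struct vfs_file {
    const struct vfs_ops *ops;  /* backend-specific handlers */
    void *priv;                 /* backend-specific state */
};

/* Generic entry point: dispatch to the backend if it implements read. */
long vfs_read(struct vfs_file *f, void *buf, size_t len, uint64_t off)
{
    if (f == NULL || f->ops == NULL || f->ops->read == NULL)
        return VFS_ENOSYS;
    return f->ops->read(f, buf, len, off);
}

uint64_t vfs_get_size(struct vfs_file *f)
{
    if (f == NULL || f->ops == NULL || f->ops->get_size == NULL)
        return 0;
    return f->ops->get_size(f);
}

/* Example backend: a read-only file held in memory. */
struct memfile { const char *data; size_t size; };

static long memfile_read(struct vfs_file *f, void *buf, size_t len, uint64_t off)
{
    const struct memfile *m = f->priv;
    assert(buf != NULL);
    if (off >= m->size)
        return 0;                          /* reading at or past EOF */
    if (len > m->size - (size_t)off)
        len = m->size - (size_t)off;       /* clamp to the file end */
    memcpy(buf, m->data + (size_t)off, len);
    return (long)len;
}

static uint64_t memfile_get_size(struct vfs_file *f)
{
    const struct memfile *m = f->priv;
    return m->size;
}

const struct vfs_ops memfile_ops = {
    .read = memfile_read,
    .get_size = memfile_get_size,
};
```

A backend that leaves a pointer NULL simply does not support that operation, so a device file and a directory can share the same table type.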

rdos · Post by **rdos** » Wed Jan 01, 2014 7:02 am

I think it is a bad idea to view everything as files. I also think it is a bad idea to view directories as files. However, these are my personal preferences, so you can just discard them.

What I think you do need to consider when implementing a fast VFS is how to cache disc blocks, directories and file contents. This is key to achieving good performance, so it should be dealt with in any VFS design.

In my design, I have three caches:
1. A raw sector cache, which also is used when requesting/writing specific sectors from/to the low-level disc driver.
2. A directory cache. This is implemented as part of the FAT-FS, and thus is not generic.
3. A file data cache. This is part of the VFS.
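
As an illustration of the first item only, here is a minimal sketch of a raw sector cache. It is not rdos's implementation: the direct-mapped layout, the sizes, and the names `sector_cache`, `sector_cache_get` and `fake_disk_read` are all assumptions, and the "driver" is a stand-in that fabricates sector contents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define CACHE_SLOTS 8

/* Hypothetical driver hook: reads one sector, returns 0 on success. */
typedef int (*sector_read_fn)(uint64_t lba, uint8_t *out);

struct sector_cache {
    uint64_t lba[CACHE_SLOTS];
    int valid[CACHE_SLOTS];
    uint8_t data[CACHE_SLOTS][SECTOR_SIZE];
    sector_read_fn read_sector;
    unsigned hits, misses;
};

void sector_cache_init(struct sector_cache *c, sector_read_fn fn)
{
    assert(c != NULL && fn != NULL);
    memset(c, 0, sizeof *c);
    c->read_sector = fn;
}

/* Return a pointer to the cached sector, or NULL if the driver failed.
 * Direct-mapped: each LBA can live in exactly one slot. */
const uint8_t *sector_cache_get(struct sector_cache *c, uint64_t lba)
{
    unsigned slot = (unsigned)(lba % CACHE_SLOTS);
    if (c->valid[slot] && c->lba[slot] == lba) {
        c->hits++;
        return c->data[slot];
    }
    c->misses++;
    c->valid[slot] = 0;                    /* old contents are evicted */
    if (c->read_sector(lba, c->data[slot]) != 0)
        return NULL;
    c->lba[slot] = lba;
    c->valid[slot] = 1;
    return c->data[slot];
}

/* Stand-in driver for the example: fills the sector with the low byte
 * of its LBA instead of touching real hardware. */
int fake_disk_read(uint64_t lba, uint8_t *out)
{
    memset(out, (int)(lba & 0xff), SECTOR_SIZE);
    return 0;
}
```

The directory and file data caches would sit above this layer and request sectors through `sector_cache_get` rather than calling the driver directly.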

nbdd0121 · Post by **nbdd0121** » Wed Jan 01, 2014 7:13 am

rdos wrote:In my design, I have three caches:
1. A raw sector cache, which also is used when requesting/writing specific sectors from/to the low-level disc driver.
2. A directory cache. This is implemented as part of the FAT-FS, and thus is not generic.
3. A file data cache. This is part of the VFS.

Well, actually, I think this can also be done with my VFS design. Firstly, where allowed, use the mmap operation to access files; this avoids many unnecessary memory copies, and the memory-mapped area can simply serve as the cache of the file. As for the raw sector cache, I think it can be handled by the device file's operations: for example, the device file maintains a cache by itself instead of the kernel, and the file system then uses the mmap operation to access the device file. I think this design can suit many devices: for hard disks, the device file can maintain a cache, but for hot-plug devices, the device file should write data back to the device ASAP.
However, as for the directory cache, I think implementing a dir cache (or dcache in Linux) is great, but I just cannot think of an efficient way to refresh it. If a directory is kept in the dcache and I delete that directory through a hard link, the dcache will not be able to detect the change. Thus, my design does not say anything about a directory cache, which means that all caching shall be done by the files' operations instead of the VFS. A file system can maintain a directory cache and use it when the VFS calls readdir or finddir.
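
The distinction between a cached hard disk and a hot-plug device that writes back ASAP corresponds to write-back versus write-through caching. A minimal sketch, assuming a single cached block and hypothetical names (`dev_cache`, `dev_write`, `dev_flush`); the counter stands in for real hardware writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum cache_policy {
    CACHE_WRITE_BACK,     /* e.g. fixed hard disk: flush later */
    CACHE_WRITE_THROUGH   /* e.g. hot-plug device: flush on every write */
};

struct dev_cache {
    enum cache_policy policy;
    uint8_t block[64];        /* one cached block, for illustration */
    int dirty;
    unsigned device_writes;   /* counts simulated writes to the hardware */
};

/* Push the cached block to the device if it has unwritten changes. */
void dev_flush(struct dev_cache *d)
{
    if (d->dirty) {
        d->device_writes++;   /* a real driver would issue I/O here */
        d->dirty = 0;
    }
}

/* Write into the cache; write-through devices are flushed immediately. */
void dev_write(struct dev_cache *d, size_t off, const void *src, size_t len)
{
    assert(off <= sizeof d->block && len <= sizeof d->block - off);
    memcpy(d->block + off, src, len);
    d->dirty = 1;
    if (d->policy == CACHE_WRITE_THROUGH)
        dev_flush(d);
}
```

With this split, the umount handler of a write-back device only has to call `dev_flush` once before closing the device.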

rdos · Post by **rdos** » Wed Jan 01, 2014 9:05 am

nbdd0121 wrote:Firstly, if allowed, use mmap operation to access file, and it can avoid many inessential memory copies, the memory mapped area can just be the cache of the file.

That causes other problems. First, applications do not access all file data through mmap; they could use read/write rather than memory-mapping the file. Second, in an effective scheme, caching a file should be done in a smart manner. Simple approaches that won't work very well are caching the whole file, or caching just the amount read/mmapped. A smart algorithm might decide to cache, for instance, 1MB even if the application only reads a single byte, in anticipation that it will eventually read the whole file.
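
The 1MB example can be sketched as a function that expands a small request into an aligned cache window. This is only one possible read-ahead policy; the window size, the alignment rule, and the names `cache_range` and `readahead_window` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define READAHEAD_BYTES ((uint64_t)1 << 20)   /* assumed 1MB window */

struct cache_range {
    uint64_t start;   /* first byte to cache */
    uint64_t end;     /* one past the last byte to cache */
};

/* Expand a request [off, off+len) into whole 1MB-aligned windows,
 * clamped to the file size. Returns an empty range for empty or
 * out-of-range requests. */
struct cache_range readahead_window(uint64_t off, uint64_t len, uint64_t file_size)
{
    struct cache_range r = { 0, 0 };
    if (len == 0 || off >= file_size)
        return r;

    uint64_t avail_req = file_size - off;
    uint64_t req_end = off + (len < avail_req ? len : avail_req);

    r.start = off - off % READAHEAD_BYTES;              /* align down */
    uint64_t want = req_end - r.start;                  /* bytes needed from start */
    uint64_t chunks = (want - 1) / READAHEAD_BYTES + 1; /* round up to windows */
    uint64_t avail = file_size - r.start;
    uint64_t span = (chunks > avail / READAHEAD_BYTES)
                        ? avail                          /* clamp to file end */
                        : chunks * READAHEAD_BYTES;
    r.end = r.start + span;

    assert(r.start <= off && r.end <= file_size);
    return r;
}
```

A single-byte read at offset 0 of a large file therefore asks the cache to hold the first 1MB, while a file smaller than the window is cached whole.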

You also need to consider what happens when your physical memory runs low and you need to free some in order to satisfy requests. This is when you want to free the caches of unused or seldom-used files. If you just mmap whole files, it would be hard for a generic swapper to know that a certain area is a file cache that isn't used a lot.
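
If the VFS tracks its file caches explicitly, freeing the seldom-used ones can be sketched as a least-recently-used reclaim pass. The structure, the logical timestamps, and the name `reclaim_lru` are assumptions for illustration; a real kernel would release pages where the comment indicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct file_cache {
    const char *name;
    size_t bytes;         /* memory currently held by this cache */
    uint64_t last_used;   /* logical timestamp of the last access */
};

/* Drop least-recently-used caches until at least `need` bytes have been
 * reclaimed or nothing is left to drop. Returns the bytes reclaimed. */
size_t reclaim_lru(struct file_cache *caches, size_t n, size_t need)
{
    size_t freed = 0;
    assert(caches != NULL || n == 0);
    while (freed < need) {
        struct file_cache *victim = NULL;
        for (size_t i = 0; i < n; i++) {
            if (caches[i].bytes == 0)
                continue;                  /* already empty */
            if (victim == NULL || caches[i].last_used < victim->last_used)
                victim = &caches[i];
        }
        if (victim == NULL)
            break;                         /* nothing left to free */
        freed += victim->bytes;
        victim->bytes = 0;                 /* a real kernel frees pages here */
    }
    return freed;
}
```

The linear scan keeps the example short; a list ordered by last use would avoid rescanning on every eviction.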

In a 32-bit (less likely in a 64-bit) kernel, you also risk running out of linear address space if you mmap whole large files.

Hot plug FSes (like USB sticks) pose special problems. In those cases read/write operations can stop working at any time, and you need to recover from those anywhere. You also need to be able to dismount the system at any point. These are issues that I have not fully solved. I've solved the issue for a non-active FS without active commits, but not the other cases.

sortie · Post by **sortie** » Wed Jan 01, 2014 10:28 am

Well, your set of operations is hardly ideal:

You don't have the 'truncate' operation that changes the size of a file. With your current set of operations, you can never make a file smaller except by deleting it and writing it out again.
You don't have a rename operation, though that may be by design.
You don't have a seek operation, though that might be done at a higher level, or through pread and pwrite primitives.
You don't have a proper 'stat' operation that gives file meta data.
You don't have a "file modification time" abstraction.
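
To make two of these missing pieces concrete, here is a minimal sketch of `truncate` and a `stat`-style query on an in-memory file. The types `membuf` and `vfs_stat`, the fixed 64-byte capacity, and passing the current time in as a parameter are all assumptions; zero-filling on growth follows what POSIX `truncate` does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory file with a fixed capacity. */
struct membuf {
    char data[64];
    size_t size;
    int64_t mtime;   /* last modification time, caller-supplied units */
};

/* Hypothetical metadata record returned by a stat-like operation. */
struct vfs_stat {
    uint64_t size;
    int64_t mtime;
};

/* Shrink or grow the file. Returns 0 on success, -1 if new_size does
 * not fit. Growing zero-fills the new region. */
int membuf_truncate(struct membuf *m, size_t new_size, int64_t now)
{
    assert(m != NULL);
    if (new_size > sizeof m->data)
        return -1;
    if (new_size > m->size)
        memset(m->data + m->size, 0, new_size - m->size);
    m->size = new_size;
    m->mtime = now;   /* changing the size counts as a modification */
    return 0;
}

/* Report size and modification time in one call. */
void membuf_stat(const struct membuf *m, struct vfs_stat *st)
{
    assert(m != NULL && st != NULL);
    st->size = m->size;
    st->mtime = m->mtime;
}
```

Without such an operation, shrinking a file would indeed require deleting it and writing it out again.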

I get the impression you are inexperienced with filesystem semantics and VFS design, which is alright. I think the problem here is that you are trying to map things out in advance, but things may well turn out quite different when you actually implement it. You likely already know some of the issues in your design. Anyway, my recommendation is not to think too much about the design in advance, but to implement it. Your "basic set of operations" is likely good enough in theory, but might not be well suited for a real implementation. So yeah, go implement your design rather than over-planning it, and learn from whatever mistakes you make. You should not make the mistake of ignoring how existing systems do this; it's perfectly alright to diverge, but you should understand the designs you disagree with.

nbdd0121 · Post by **nbdd0121** » Wed Jan 01, 2014 5:43 pm

sortie wrote:

You don't have the 'truncate' operation that changes the size of a file. With your current set of operations, you can never make a file smaller except by deleting it and writing it out again.

You don't have a rename operation, though that may be by design.

You don't have a seek operation, though that might be done at a higher level, or through pread and pwrite primitives.

You don't have a proper 'stat' operation that gives file meta data.

You don't have a "file modification time" abstraction.

Actually, I forgot to add a `truncate` operation, but you can see that I did consider size by providing `get_size`. As for rename, I think it should be an operation on the directory containing the file rather than on the file itself, since the directory may hold some information about the file. The offset will be passed as an argument to read. As for `stat`, in my opinion it will be treated as an individual file. In my design there is no difference between a file and a directory, so a file can also contain other files, such as metadata. The file modification time can also be a metadata file of a file. Thank you for your advice; it is true that I am inexperienced with VFS, since all the work I have done before was about the CPU. This is my first attempt at implementing an abstraction layer like a VFS.
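
The idea of metadata as files inside a file can be sketched with a node type that makes no distinction between files and directories. The `node` layout, the child names `mtime` and `size`, and the sample values are hypothetical; only the operation name `finddir` comes from the list in the first post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node: every node may carry both content and children,
 * so a regular file can contain metadata "files". */
struct node {
    const char *name;
    const char *content;
    const struct node *children;
    size_t nchildren;
};

/* Look up a direct child by name; returns NULL if there is none. */
const struct node *finddir(const struct node *dir, const char *name)
{
    assert(dir != NULL && name != NULL);
    for (size_t i = 0; i < dir->nchildren; i++) {
        if (strcmp(dir->children[i].name, name) == 0)
            return &dir->children[i];
    }
    return NULL;
}

/* Made-up example data: a text file whose metadata are child nodes. */
static const struct node notes_meta[] = {
    { "mtime", "1388598180", NULL, 0 },
    { "size",  "11",         NULL, 0 },
};

const struct node notes_file = { "notes.txt", "hello world", notes_meta, 2 };
```

Under this scheme, a `stat`-like query becomes `finddir(&notes_file, "mtime")` followed by an ordinary read of the returned node.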

bluemoon · Post by **bluemoon** » Wed Jan 01, 2014 6:20 pm

Those operations are blocking calls. Have you considered things like non-blocking send() and recv(), or async I/O?

On the other hand, how do meta info, tags and versioning enter the game?

OSDev.org
