VFS caching

zity
Member
Posts: 99
Joined: Mon Jul 13, 2009 5:52 am
Location: Denmark

VFS caching

Post by zity »

Hi
I have decided to rewrite the code for my virtual file system to implement some new features, one of them being caching of directory entries (I'm not going to worry about caching file data, not yet). My problem is simple; the solution apparently not so much. How should I store the directory entries in my cache?

I have been thinking about this for a while, but I cannot seem to find a solution that fits my needs. I want it to be easy to look up a given file or directory, and it should be possible to update the cache when a file is changed or deleted. When a file is changed, I will find it acceptable if the entire directory must be re-fetched into the cache from the actual file system, so the problem is more or less how to structure the cache itself.

My "solution" so far is a cache for each mount point with a two-level, tree-like structure. The structure contains a list of all cached directories, identifiable by file name and inode. Each of these directory entries contains a list of all files (and directories) inside that directory. What do you think? I would like some comments on this :)
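A minimal sketch of that two-level structure might look like this in C (all names, sizes, and fields here are illustrative, not zity's actual code):

```c
#include <stdint.h>

#define NAME_MAX_LEN 256    /* assumed maximum name length */

struct cache_entry {                /* one file or subdirectory */
    char     name[NAME_MAX_LEN];
    uint64_t inode;
    struct cache_entry *next;       /* next entry in the same directory */
};

struct cached_dir {                 /* one fetched directory */
    char     name[NAME_MAX_LEN];
    uint64_t inode;
    struct cache_entry *entries;    /* level two: the directory's contents */
    struct cached_dir *next;        /* next cached directory on this mount */
};

struct mount_cache {                /* level one: one cache per mount point */
    struct cached_dir *dirs;
};
```

Lookup is then a scan of `dirs` for the directory, followed by a scan of its `entries` for the file.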
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: VFS caching

Post by gerryg400 »

I don't think you need to keep the entire contents of a directory cached. The dcache only needs to contain entries that will be requested often.

It's also a big win if you design your cache so that it also lists objects that are known NOT to exist. A lot of filesystem time is wasted searching again and again for files that don't exist. Consider what happens when a (dumb) shell keeps statting for binary files in all the directories on the $PATH or a compiler continually searches for include files in all the directories in the include path list.

A simple idea might be to keep a tree which basically represents a partial filesystem. Not all directories need to be fully cached, though you should mark the ones that are so that you don't need to search them again.
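The negative-caching idea could be sketched like this (the flag name and helpers are hypothetical; the point is only the lookup pattern, where a hit on a negative entry means ENOENT without any disk I/O):

```c
#include <string.h>
#include <stddef.h>

#define DCACHE_NEGATIVE 0x1     /* entry records a name known NOT to exist */

struct dcache_entry {
    const char *name;
    unsigned    flags;
    struct dcache_entry *next;
};

/* Returns the entry if cached (positively or negatively), NULL on a true
 * miss -- only a true miss needs to go to the underlying file system. */
static struct dcache_entry *dcache_lookup(struct dcache_entry *head,
                                          const char *name)
{
    for (struct dcache_entry *d = head; d; d = d->next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

static int dcache_is_negative(const struct dcache_entry *d)
{
    return d && (d->flags & DCACHE_NEGATIVE);
}
```

When a real lookup fails, the VFS would insert a `DCACHE_NEGATIVE` entry so the next search for the same name fails immediately.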
If a trainstation is where trains stop, what is a workstation ?

Re: VFS caching

Post by zity »

gerryg400 wrote:I don't think you need to keep the entire contents of a directory cached. The dcache only needs to contain entries that will be requested often.
You are probably right, but I cannot know whether a file will be requested once or many times, so all I can do is cache a file when it's opened. When a file is opened via the VFS, the entire directory is read from the file system, so I could just as well store all the entries, the only disadvantage being wasted memory (not that memory doesn't matter).

It seems like a very good idea to store a list of non-existing files, hadn't thought of that. I will probably implement this feature.

Re: VFS caching

Post by gerryg400 »

zity wrote:You are probably right, but I cannot know whether a file will be requested once or many times, so all I can do is cache a file when it's opened. When a file is opened via the VFS, the entire directory is read from the file system, so I could just as well store all the entries, the only disadvantage being wasted memory (not that memory doesn't matter).
You need a caching algorithm to evict old entries, LRU or somesuch.

Also, there may be some very large directories in your file system. If you are searching a directory containing 1000 files and you find the file in the first position, there is little sense continuing. I think it's better not to limit your design in that way. For example, my /usr/bin contains more than 1086 files. A large number of these I never (or rarely) use. The ones that I do use must, for the sake of average performance, be in the directory cache.
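The LRU bookkeeping can be as simple as a doubly linked list: a cache hit moves the entry to the front, and the eviction victim is taken from the tail. An illustrative sketch (all names made up):

```c
struct lru_node {
    struct lru_node *prev, *next;   /* embedded in each cache entry */
};

struct lru_list {
    struct lru_node *head, *tail;   /* head = most recent, tail = victim */
};

static void lru_unlink(struct lru_list *l, struct lru_node *n)
{
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = 0;
}

static void lru_push_front(struct lru_list *l, struct lru_node *n)
{
    n->prev = 0;
    n->next = l->head;
    if (l->head) l->head->prev = n; else l->tail = n;
    l->head = n;
}

/* On a cache hit:  lru_unlink(l, n); lru_push_front(l, n);
 * When full:       evict l->tail, then lru_unlink it. */
```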

Re: VFS caching

Post by zity »

gerryg400 wrote:Also, there may be some very large directories in your file system. If you are searching a directory containing 1000 files and you find the file in the first position, there is little sense continuing. I think it's better not to limit your design in that way. For example, my /usr/bin contains more than 1086 files. A large number of these I never (or rarely) use. The ones that I do use must, for the sake of average performance, be in the directory cache.
If we only consider opening files, your point is definitely valid, because one rarely uses more than a few files in a directory. But if the entire directory is cached, I can avoid contacting the file system when listing the contents of a directory with readdir().

When programming my OS I want simplicity rather than maximum performance, to some extent at least. This is why I'm a little unsure how to do this: performance IS important, but I will not trade code simplicity for a minor performance boost. The simplest solution would be the one suggested by berkus, with a single array of files and directories identifiable by filename and inode (and maybe a hash of the filename). But if I want to cache entire directories, a single array is imho not the best solution.

So the question is how often the content of a directory is listed and how often a single file is opened.
pranjas
Member
Posts: 25
Joined: Sun Jan 16, 2011 9:18 pm

Re: VFS caching

Post by pranjas »

You can have an option that allows applications to hint your FS to cache the dentries, since this is more or less driven by a user opening files, which your FS of course can't guess. The amount of memory you are willing to set aside for the cache is another issue, but I guess that should be configurable by the user as well.

For prefetching dentries and putting them into the cache, you can set a minimum limit, also configurable by the user. Caching all the dentries in a directory wouldn't be a good idea: if a user changes to a directory and you cache all its entries, some older entries will have to be removed because the cache memory you've set aside is limited, and if the user then goes back to the original directory your prefetch is wasted.

Prefetching works well where you are fairly sure it will pay off, like block read-ahead; overdoing it is of course a bad idea even when reading disk blocks.

You can have something like:

Code: Select all

struct dentry
{
    struct dentry *parent;
    struct dentry *siblings_list;   /* next sibling in the same directory */
    struct dentry *children_list;   /* first child, if this is a directory */
    char name[MAX_LENGTH];
    unsigned int flags;             /* e.g. whether this dentry may be evicted:
                                     * the root dentry can't go away except at
                                     * umount, and a file currently in use should
                                     * also keep its dentry in the cache. */
};

The siblings_list could also be kept in a tree-like structure (an AVL tree, say) instead of a plain list.
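For what it's worth, resolving a single path component against such a dentry tree is just a scan of the parent's children list. A sketch using the field names above (MAX_LENGTH and the helper itself are assumptions):

```c
#include <string.h>
#include <stddef.h>

#define MAX_LENGTH 256              /* assumed name limit */

struct dentry {
    struct dentry *parent;
    struct dentry *siblings_list;   /* next sibling in the same directory */
    struct dentry *children_list;   /* first child, if a directory */
    char name[MAX_LENGTH];
};

/* Find the child of `parent` called `name`, or NULL if it isn't cached. */
static struct dentry *lookup_child(struct dentry *parent, const char *name)
{
    for (struct dentry *d = parent->children_list; d; d = d->siblings_list)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}
```

Full path resolution would call this once per component, falling back to the real file system on a NULL result.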

Re: VFS caching

Post by gerryg400 »

zity wrote:So the question is how often the content of a directory is listed and how often a single file is opened.
I'd estimate that on my PC, open (and stat) are at least an order of magnitude more common than listing a directory. Also, I suspect that directory listings are cached in application software (like explorer, finder, nautilus etc.) and so it doesn't matter as much if getting full listings is slow.

I guess it depends on the expectation of your apps.

Re: VFS caching

Post by zity »

I think I'll settle on a solution where I only cache single files and not entire directories. I still gain a measurable performance boost if I don't have to walk the entire file path the next time the file is opened. The solution is simple, but the number of calls to the file system is heavily reduced the second time a file is requested.
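That "cache single files only" approach can be sketched as a small hash table keyed on the full path, so the second open skips the path walk entirely (bucket count, sizes, and all names here are illustrative):

```c
#include <string.h>
#include <stddef.h>
#include <stdint.h>

#define PATH_BUCKETS 64
#define PATH_MAX_LEN 256

struct path_entry {
    char     path[PATH_MAX_LEN];    /* full path of an opened file */
    uint64_t inode;                 /* what the path walk resolved to */
    struct path_entry *next;        /* hash-chain link */
};

static struct path_entry *buckets[PATH_BUCKETS];

static unsigned hash_path(const char *p)    /* djb2-style string hash */
{
    unsigned h = 5381;
    while (*p)
        h = h * 33 + (unsigned char)*p++;
    return h % PATH_BUCKETS;
}

/* NULL means the VFS must do the full path walk (and then insert). */
static struct path_entry *path_lookup(const char *p)
{
    for (struct path_entry *e = buckets[hash_path(p)]; e; e = e->next)
        if (strcmp(e->path, p) == 0)
            return e;
    return NULL;
}
```

Invalidation is also simple here: on rename or unlink, drop the matching entry (or the whole chain) and let the next open repopulate it.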