Hi,
afr0ck wrote:I am a self-taught CS student and I was reading the book "The Design of the UNIX Operating System". In chapter 4, the author writes about the internal representation of a file system. In that version, the directory structure is just an unordered array of {filename, inode} structs. This implies that insertion time complexity will be O(1) but search time complexity is O(N) (linear search). In order to improve the search time complexity, the author suggests using k-ary trees or hash tables (actually, it is not a suggestion but a question to the reader to design such a system). K-ary trees will give O(log_k(N)) complexity, whereas hash tables will have a best case of O(1) and an average case of O(K) (amortized linear search).
I'm not sure if you're talking about the internal representation (in RAM) used by the VFS, or the internal representation (on disk) used by a specific file system.
From my perspective, the VFS layer should be used as a cache, to cache both directory information and file data; and when there's a "VFS cache miss" on a directory the VFS would ask the file system for the entire directory's contents (and not just one thing in that directory). For example, if someone opens the file "/foo/bar/baz.txt" but the directory "/foo/bar" isn't cached in VFS, then VFS would ask the file system to fetch the entire "/foo/bar" directory (and not just the entry for "/foo/bar/baz.txt" alone). This is partly because it simplifies the VFS code (either all of a directory's entries are cached or none are), partly because it's likely that something else in the same directory will be needed soon anyway, and partly because fetching the entire directory is likely to take a similar amount of time as fetching one entry in the directory (because the "on disk" structures used by a lot of file systems put all directory entries together in the same sectors of the disk).
Of course the VFS would also cache a few "file system specific location values" for each file and directory on behalf of the file system (e.g. where one of these values might be a "starting cluster number" for FAT, or a "starting LBA of the file's data" for ISO9660, or an "LBA for the list of extents" for ext2, or ... ; and where another might record where the directory information is stored). That way, when a file's data isn't cached in VFS but the file's directory info is, the VFS can pass the file system its own "file system specific location values" when it asks the file system to read or write the file, and the file system code doesn't need to find or cache any information about where a file is itself.
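As a rough illustration only (every field name and size here is an assumption of mine, not something from the book or any real VFS), a cached VFS node might look something like this, with a small opaque area that the file system fills in with its own location values:

Code:
/* Hypothetical VFS-cached node; every name and size here is invented. */
#include <stdint.h>

#define VFS_FS_PRIVATE_WORDS 2   /* e.g. "where the file's data starts" + "where its directory entry lives" */

struct vfs_node {
    struct vfs_node *parent;     /* containing directory, NULL for the root */
    char            *name;       /* file name within the parent directory */
    uint64_t         size;       /* cached file size */
    uint32_t         flags;      /* directory/file, "data cached", etc. */

    /* Opaque "file system specific location values" cached on behalf of the
       file system (e.g. starting cluster number for FAT, starting LBA for
       ISO9660, LBA of the extent list for ext2).  The VFS never interprets
       these; it just hands them back to the file system code when asking it
       to read or write the file. */
    uint64_t         fs_private[VFS_FS_PRIVATE_WORDS];
};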
This also means that the data structures used by the VFS and the data structures used by file systems have extremely different usage. Specifically, (except for finding free space) the file system's code will never need to search for or find anything itself, and the file system's "on disk" data structures can be designed knowing that the file system's code will never need to search for anything - e.g. storing directory entries as a linear array of entries, or a linked list of entries, or (my preference) a linked list of arrays of entries.
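To make that last option concrete (purely a sketch of mine, not a real on-disk format), a "linked list of arrays of entries" might be laid out so that one contiguous read fetches a whole block of entries:

Code:
/* Hypothetical on-disk directory layout: a linked list of blocks, where each
   block holds an array of entries.  All names and sizes are invented. */
#include <stdint.h>

#define DIR_ENTRIES_PER_BLOCK 63

struct disk_dir_entry {
    char     name[56];           /* file name, zero padded */
    uint64_t inode;              /* or "starting cluster number", etc. */
};                               /* 64 bytes per entry */

struct disk_dir_block {
    uint64_t              next_block_lba;   /* 0 = this is the last block in the chain */
    uint32_t              used_entries;     /* how many slots are in use */
    uint32_t              reserved;
    struct disk_dir_entry entry[DIR_ENTRIES_PER_BLOCK];
    uint8_t               padding[48];      /* pad the block out to 4096 bytes */
};

Because each block is one contiguous run of sectors, fetching the whole directory is mostly sequential reads, which suits "rotating disk" hard drives.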
afr0ck wrote:I was thinking about how to design the hash table and the only idea that comes to my mind is using a linear probing scheme. Each directory block would be organized as a hash table of {filename, inode} structs. In case of a collision, a linear search is attempted until we find a free slot; otherwise the block is full, a new block is allocated to the directory, and that entry is inserted into the hash table of that new block.
For "on disk data structures", to me this sounds horrible (especially for "rotating disk" hard drives where seeks are very expensive). If you're fetching the entire directory from disk you don't want the disk heads to bounce all over the place (as you seek to each different hashtable entry and follow the linked list of entries with the same hash). Of course a file system designed for "rotating disk" (where the main consideration is minimising seeks) is very different to a file system designed for SSD/NVRAM (where seeks are irrelevant).
afr0ck wrote:For the design with n-ary trees, I really don't have a clear idea how to do that. The reason is that we have no order between the nodes (other than the order of insertion).
"No order" just means that you're free to choose whatever order you feel like (e.g. maybe alphabetical order of file name, maybe numerical order of creation time, etc), and as soon as you make a choice it's ordered.
afr0ck wrote:If it is a binary search tree or a k-ary heap, then we could just implement it as an array, but there will be a cost associated with every insertion and deletion (that looks like a lot of overhead), and if we want to keep the tree balanced that's another level of overhead. I hear about B-trees being used in file systems (Btrfs) but I don't know if they are used in this context.
What do you propose as a design idea using those different data structures? And are there any possible flaws with the ideas I mentioned above? Thanks.
NB: I quote the question from the previously mentioned book: "11. Design a directory structure that improves the efficiency of searching for path names by avoiding the linear search. Consider two techniques: hashing and n-ary trees."
For VFS (where you're caching entries in RAM, don't care about seeks, and do care about things like CPU data cache misses and TLB misses); I'd optimise the data structures for the "everything cached" case (because a VFS cache miss will involve actual IO and will be slow regardless of how the VFS's data structures are optimised). I'd probably start with a single large (about half the size of the L3 cache, or a few MiB) hash table based on the full path, not the file name (e.g. "hash = getHash("foo/bar/baz.txt")" and not "hash = getHash("baz.txt")"); so that (when there's no collision, and if the entry is in the VFS's cache) you can go directly to the entry with the fewest CPU data cache misses and TLB misses. Directories (which would only be accessed for things like "ls", and wouldn't be used for things like "stat" or "open" because you can go directly to the right entry in those cases) would probably just be a simple array of pointers (because you'd never need to search them).
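A minimal sketch of that "hash the full path" lookup (the chaining, the hash function and the table size are placeholder choices of mine, not recommendations):

Code:
/* Hypothetical "single large hash table keyed on full path" for the VFS cache. */
#include <stdint.h>
#include <string.h>

#define VFS_HASH_BUCKETS (1 << 16)          /* placeholder size */

struct vfs_node;                            /* the VFS's cached file/directory info (not shown) */

struct vfs_hash_entry {
    struct vfs_hash_entry *next;            /* chain used when hashes collide */
    const char            *full_path;       /* e.g. "foo/bar/baz.txt", not "baz.txt" */
    struct vfs_node       *node;
};

static struct vfs_hash_entry *vfs_hash_table[VFS_HASH_BUCKETS];

/* FNV-1a style hash of the full path. */
static uint32_t getHash(const char *full_path)
{
    uint32_t hash = 2166136261u;

    while (*full_path != '\0') {
        hash ^= (uint8_t)*full_path++;
        hash *= 16777619u;
    }
    return hash % VFS_HASH_BUCKETS;
}

/* Look up a full path; returns NULL on a VFS cache miss. */
struct vfs_node *vfs_lookup(const char *full_path)
{
    struct vfs_hash_entry *entry = vfs_hash_table[getHash(full_path)];

    while (entry != NULL) {
        if (strcmp(entry->full_path, full_path) == 0)
            return entry->node;
        entry = entry->next;
    }
    return NULL;
}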
For a full example (a rough code sketch follows the list below), assume nothing is cached by the VFS; if a process asks to open the file "/foo/bar/baz.txt", the VFS would:
- Use the single large hash table to try to find the entry for "/foo/bar/baz.txt", and realise it's not in the cache
- Use the single large hash table to try to find the entry for "/foo/bar", and realise it's not in the cache
- Use the single large hash table to try to find the entry for "/foo", and realise it's not in the cache
- Ask the file system to fetch the entire "/foo" directory and add all that to the cache (while remembering where the entry for "/foo/bar" is when you add that to the cache)
- Ask the file system to fetch the entire "/foo/bar" directory and add all that to the cache (while remembering where the entry for "/foo/bar/baz.txt" is when you add that to the cache)
- Once the entry for "/foo/bar/baz.txt" is found (or not found because it doesn't exist) do whatever you need to do to open the file (create a file handle structure), possibly including telling the file system code to create the file if the file doesn't already exist.
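Roughly, that sequence might look like the sketch below; vfs_lookup(), vfs_root_node(), fs_fetch_directory() and vfs_create_handle() are invented names, and error handling, file creation, permission checks, mount points, locking, etc are all ignored:

Code:
/* Hypothetical sketch of the path walk described above. */
#include <stddef.h>
#include <string.h>

#define VFS_MAX_PATH 256

struct vfs_node;
struct handle;

struct vfs_node *vfs_lookup(const char *full_path);        /* the "single large hash table" lookup */
struct vfs_node *vfs_root_node(void);                      /* root directory (its entries are assumed
                                                              to have been cached when it was mounted) */
void             fs_fetch_directory(struct vfs_node *dir); /* ask the FS for ALL of dir's entries */
struct handle   *vfs_create_handle(struct vfs_node *node);

struct handle *vfs_open(const char *full_path)
{
    /* Fast path: if everything is already cached, one hash lookup finds the file. */
    struct vfs_node *node = vfs_lookup(full_path);
    if (node != NULL)
        return vfs_create_handle(node);

    if (full_path[0] == '\0' || strlen(full_path) >= VFS_MAX_PATH)
        return NULL;

    /* Slow path: walk the path one prefix at a time ("/foo", "/foo/bar", ...);
       whenever a prefix isn't cached, ask the file system to fetch the whole
       parent directory (not just one entry) and try again. */
    struct vfs_node *dir = vfs_root_node();
    char prefix[VFS_MAX_PATH];
    size_t len = 0;

    while (full_path[len] != '\0') {
        /* Extend the prefix by one more path component. */
        do {
            prefix[len] = full_path[len];
            len++;
        } while (full_path[len] != '\0' && full_path[len] != '/');
        prefix[len] = '\0';

        node = vfs_lookup(prefix);
        if (node == NULL) {
            fs_fetch_directory(dir);       /* fetch the entire directory into the cache */
            node = vfs_lookup(prefix);
            if (node == NULL)
                return NULL;               /* component doesn't exist (creating it isn't shown) */
        }
        dir = node;

        if (full_path[len] == '/') {
            prefix[len] = '/';             /* keep the separator in the growing prefix */
            len++;
        }
    }
    return vfs_create_handle(node);
}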
Of course if a process asks to open the file "/foo/bar/baz.txt" and everything is already cached by VFS, the VFS would only:
- Use the single large hash table to try to find the entry for "/foo/bar/baz.txt"
- Once the entry for "/foo/bar/baz.txt" is found (or not found because it doesn't exist) do whatever you need to do to open the file (create a file handle structure), possibly including telling the file system code to create the file if the file doesn't already exist.
Note that for my OS a process' working directory and things like ".." are just user-space things that don't exist as far as the VFS is concerned - the VFS only ever deals with full paths. However, I'm fairly sure that this doesn't work well for *nix due to the completely idiotic/counter-intuitive behaviour of things like (e.g.) "cd .." in the presence of symbolic links (e.g. where "cd /foo/.." may not take you to the root directory like any sane person would assume, but could take you to the directory "/unexpected/disaster" because "foo" happened to be a symbolic link that pointed to the directory "/unexpected/disaster/here").
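For what it's worth, the user-space side of that can be a purely lexical transform. A sketch (assuming my "no symbolic link surprises" semantics, which as noted is not what POSIX specifies; names and buffer handling are simplified):

Code:
/* Hypothetical user-space helper: combine the process' working directory (kept
   in user space) with a path, folding "." and ".." purely lexically.  This
   deliberately ignores the POSIX behaviour where ".." can pass through a
   symbolic link to somewhere unexpected.  Buffer size checks are omitted, and
   'cwd' is assumed to have no trailing slash (other than "/" itself). */
#include <string.h>

void make_full_path(char *out, const char *cwd, const char *path)
{
    if (path[0] == '/')
        strcpy(out, "/");                  /* absolute path: start from the root */
    else
        strcpy(out, cwd);                  /* relative path: start from the working directory */

    while (*path != '\0') {
        char component[256];
        size_t len = 0;

        while (*path == '/')
            path++;                        /* skip separators */
        while (*path != '\0' && *path != '/')
            component[len++] = *path++;
        component[len] = '\0';

        if (len == 0 || strcmp(component, ".") == 0)
            continue;                      /* empty component or "." changes nothing */

        if (strcmp(component, "..") == 0) {
            char *slash = strrchr(out, '/');
            if (slash != NULL && slash != out)
                *slash = '\0';             /* drop the last component */
            else
                strcpy(out, "/");          /* can't go above the root */
        } else {
            if (strcmp(out, "/") != 0)
                strcat(out, "/");
            strcat(out, component);
        }
    }
}

With this, make_full_path(out, "/foo", "../quux.txt") always gives "/quux.txt", regardless of whether "foo" happens to be a symbolic link.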
Cheers,
Brendan