Filesystem mounting

Caleb1994
Member
Posts: 83
Joined: Sun Feb 13, 2011 4:55 pm

Filesystem mounting

Post by Caleb1994 »

Hey, been gone for a while. I found that the mix of coding styles from different tutorials made debugging a pain in the butt, so I decided to go back and start from scratch with my OS. I had originally named my kernel "SimpleKernel" since I didn't plan on going very far. I have since renamed it "NSSKernel" or "Not So Simple Kernel" since I've realized how even a tiny kernel is definitely not simple lol. Like I said, I rewrote my entire kernel from scratch. I now have a higher-half (linked at 0xC0000000) kernel with basic TTY support similar to that of Linux. I have completely redesigned my paging interface, and have written a completely new scheduler. I have multitasking up and running, and am now moving on to an abstract filesystem design (I'm planning on linking my INITRD to the abstract filesystem so I can test loading executables without dealing with filesystems and devices).

I am following Unix-style paths with mounting/unmounting on arbitrary directories. I currently have a very basic fs_node structure with the readdir, finddir, open, close, etc... function pointers, and two pointers to these nodes in each task: one for the root filesystem, and one for the current working directory. I am trying to decide how I'm going to handle mounting, and I've come up with this structure:

Code: Select all

struct mount_point
{
	struct fs_node* src; // Mount source
	struct fs_node* dst; // Mount point
	u32 flags; // Mount flags
	
	struct list_head link; // Embedded link into the global mount list (Linux-style list_head)
};
where "src" is the directory/device node that was mounted at "dst"
(P.S. yes, I'm using the linux list_head implementation for lists. It is so nice, I couldn't resist :P)
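For reference, a minimal fs_node along the lines described above might look something like this (an untested sketch; the field names and callback signatures are only illustrative, not the actual NSSKernel definitions):

Code: Select all

struct fs_node
{
	char name[128];     // Node name (one path component)
	u32 flags;          // FS_FILE, FS_DIRECTORY, ...
	u32 ref_count;      // Nodes are shared, so they are reference counted
	void* fs_private;   // Per-filesystem data for the owning driver
	
	// Driver callbacks, filled in by whatever filesystem owns the node
	void (*open)(struct fs_node* node);
	void (*close)(struct fs_node* node);
	u32 (*read)(struct fs_node* node, u32 offset, u32 length, u8* buffer);
	u32 (*write)(struct fs_node* node, u32 offset, u32 length, u8* buffer);
	struct dirent* (*readdir)(struct fs_node* node, u32 index);
	struct fs_node* (*finddir)(struct fs_node* node, const char* name);
};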

I first put a list of these in each task, but then I realized that would be a bad idea, since one process could mount something without the rest of the system knowing :P So I decided that a global mount_point list was needed.

With this design, each time a path is referenced, you have to check each directory against the mount_point list to see whether something is mounted on it. If it is, you swap your pointer with mount_point::src, like this:

Code: Select all

dir = check_mounts(dir);
and check_mounts() would either return the same pointer, or a pointer to the directory node that was mounted on "dir". The only problem is that this would need to be done for every directory in every path that is parsed. :/ (Granted, on my machine there are only 6 things actually mounted, 7 or 8 sometimes, but still.)

My question: would this be a good design? If not, do you have an idea for a better design?
jnc100
Member
Posts: 775
Joined: Mon Apr 09, 2007 12:10 pm
Location: London, UK
Contact:

Re: Filesystem mounting

Post by jnc100 »

In general your os will be asked to perform an operation on a file or directory which is identified by a path in the form of a string. This string may contain things like .., . or entries which resolve to symbolic links. The first thing you need to do is parse this string into a 'hard path' (one without any of the above). Then you need some (quick) way of splitting this path into a mount point (i.e. a device) and the subdirectory below that. To do this, you need to check whether the whole path is a mount point (if so return mount point + subdirectory = "/"), else split it at the last '/' and try again, and keep going until you identify a mount point.

e.g.

mounts:
hd1 on /mnt
initrd on /

search for path /mnt -> is /mnt a mount-point? yes -> device = hd1, subdirectory = /
search for path /mnt/file1 -> is /mnt/file1 a mount-point? no -> is /mnt a mount-point? yes -> device = hd1, subdirectory = /file1
search for path /dev/null -> is /dev/null a mount-point? no -> is /dev a mount-point? no -> is / a mount-point? yes -> device = initrd, subdirectory = /dev/null

I find the quickest way to search for these is to keep your list of mount points as a hash table, indexed on the path of the mount-point (as a string).
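As a very rough sketch (mount_lookup, struct mount and the fixed buffer size here are just placeholders, not a real API), the splitting loop could look like:

Code: Select all

/* Split an already-canonicalised path into (mount, subdirectory).
 * mount_lookup() stands in for the hash-table lookup keyed on the path string. */
struct mount* resolve_mount(const char* hard_path, const char** subpath)
{
	static char buf[256]; /* not re-entrant; fine for a sketch */
	if( hard_path[0] != '/' || strlen(hard_path) >= sizeof(buf) ) return 0;
	strcpy(buf, hard_path);

	for(;;){
		struct mount* m = mount_lookup(buf);        /* hash table lookup */
		if( m ){
			if( buf[1] == '\0' )                    /* mounted on "/" itself */
				*subpath = hard_path;
			else if( hard_path[strlen(buf)] == '\0' )
				*subpath = "/";                     /* the path *is* the mount point */
			else
				*subpath = hard_path + strlen(buf); /* e.g. "/file1" below "/mnt" */
			return m;
		}
		if( buf[1] == '\0' ) return 0;              /* nothing mounted on "/": give up */
		char* slash = strrchr(buf, '/');
		if( slash == buf ) buf[1] = '\0';           /* parent is the root directory */
		else *slash = '\0';                         /* strip the last component, retry */
	}
}

so /mnt/file1 resolves to (hd1, "/file1") and /dev/null falls through to (initrd, "/dev/null"), matching the examples above.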

Regards,
John.
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: Filesystem mounting

Post by gerryg400 »

I keep my mount points in a tree in my RFS (root file system) server. When looking for a file I start at the root of the RFS and traverse the RFS tree. If I hit a mount point I send the request off to the server responsible.
If a trainstation is where trains stop, what is a workstation ?
jnc100
Member
Posts: 775
Joined: Mon Apr 09, 2007 12:10 pm
Location: London, UK
Contact:

Re: Filesystem mounting

Post by jnc100 »

gerryg400 wrote:I keep my mount points in a tree in my RFS (root file system) server. When looking for a file I start at the root of the RFS and traverse the RFS tree. If I hit a mount point I send the request off to the server responsible.
I may be misunderstanding you, but what if you have a device mounted on /a, and then another mounted on /a/b, and request file /a/b/c? Would not starting at the root and searching that way identify this as file /b/c on device 'a' and erroneously send the request to the server for device 'a'?

Regards,
John.
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: Filesystem mounting

Post by gerryg400 »

Hmm, you're right. So I checked my code...

Code: Select all

    while (*p) {
        if (*p == '/') {
            ++p;
            continue;
        }
        /* Look in the current dir for the entry and walk the tree as far
         * as we can
         */
        n = search_dir(curr_node, p);
        if (n == NULL)
            break;

        /* We continue to the next level */
        p = nameskip(p);
        curr_node = n;
    }
It goes as far as it can finding the deepest match. I didn't know that anyone read my posts !! I'll have to be more careful from now on.
If a trainstation is where trains stop, what is a workstation ?
Caleb1994
Member
Posts: 83
Joined: Sun Feb 13, 2011 4:55 pm

Re: Filesystem mounting

Post by Caleb1994 »

gerryg400 wrote:Hmm, you're right. So I checked my code...

Code: Select all

    while (*p) {
        if (*p == '/') {
            ++p;
            continue;
        }
        /* Look in the current dir for the entry and walk the tree as far
         * as we can
         */
        n = search_dir(curr_node, p);
        if (n == NULL)
            break;

        /* We continue to the next level */
        p = nameskip(p);
        curr_node = n;
    }
It goes as far as it can finding the deepest match. I didn't know that anyone read my posts !! I'll have to be more careful from now on.
That's actually very similar to the code I whipped up earlier today. Here is what I wrote (untested, more like pseudo):

Code: Select all

/*! find a node within a specified node (node must be a directory) */
struct fs_node* find_node(struct fs_node* node, const char* path)
{
	if( !(node->flags & FS_DIRECTORY) ) return 0;
	char name[64] = {0};
	
	const char* next = strchr(path, '/');
	while( next ){
		u32 len = (u32)(next - path); // Length of this path component
		if( len == 0 ){ // Leading or doubled '/': skip the empty component
			path = next+1;
			next = strchr(path, '/');
			continue;
		}
		if( len >= sizeof(name) ) return 0; // Component too long for the buffer
		memcpy(name, path, len);
		name[len] = 0;
		if( !(node->flags & FS_DIRECTORY) ) return 0;
		node = check_mount(finddir_fs(node, name)); // Find the item, and check if it is a mount point!
		if( !node ) return 0;
		path = next+1;
		next = strchr(path, '/');
	}
	/*! No more slashes. All that's left is the last name, or the path ended with a '/'. */
	if( path[0] == '\0' ) return node;
	else{
		if( !(node->flags & FS_DIRECTORY) ) return 0;
		node = finddir_fs(node, path); // This could return null, if so just return it
	}
	
	return node;
}
Edit: yes, I know the memcpy of the name isn't good. I was thinking of changing it so that it just changes the '/' pointed to by 'next' to a '\0', finds the node, and then changes it back to a '/'. That way finddir could use path like a normal null-terminated string with no '/'s.

In that, the initial node is the directory to search for a file/directory in. The loop goes down the path (from the start) and searches "node" for each component, until it either can't find one (that file/directory doesn't exist) or there are no more "/"s, in which case we just need to look up the last name in the path to make sure the requested file/directory exists. All of this is done through the node->finddir function (here via finddir_fs, which is a wrapper so that 1. a missing finddir implementation is handled gracefully and 2. you don't have to write node twice, as in node->finddir(node, path)), which should return the struct fs_node* for a given file name in a given directory.
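In case it helps, the wrapper is essentially just a null check plus the call (a sketch; the real finddir_fs may well do more):

Code: Select all

struct fs_node* finddir_fs(struct fs_node* node, const char* name)
{
	if( !node || !node->finddir ) return 0; // Driver didn't supply a finddir implementation
	return node->finddir(node, name);
}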

In that code, check_mount checks to see if the new directory node matches any mount points. If it doesn't, it returns the node as returned by finddir_fs, but if the node is a mount point, it returns the node that was mounted there (see struct mount_point above). Here is the source to that function:

Code: Select all

/*! Check all the mount points to see if this node should redirect
	This might be a slow way, but there should never be many mount points
	anyway. (On my current slackware machine, there are 6 mount points.) */
struct fs_node* check_mount(struct fs_node* node)
{
	if( !node ) return 0;
	
	if( !(node->flags & FS_DIRECTORY) ) return node; // Not a directory, just return the node
	
	struct list_head* iter = 0;
	struct mount_point* mp = 0;
	list_for_each(iter, &mount_list){
		mp = list_value(iter, struct mount_point, link);
		if( mp->dst == node ) return mp->src; // if the mount point is this node, return the mounted node
	}
	
	return node; // Otherwise return the original node
	
}
How does that look?
jnc100 wrote:I find the quickest way to search for these is to keep your list of mount points as a hash table, indexed on the path of the mount-point (as a string).
Well, then I'd need to implement a hash table... :roll: lol

Your way does make sense if I wanted a device and path, but I don't want to deal with devices, only fs_nodes, so (for the sake of abstraction) there is really no need to have any path left at all, is there? If I keep a list of which node is mounted where, then if I really do need the device, there should be a way to get it through the driver and the node.

Actually... using your method, instead of returning a device, I could just return a node for that device (that might be what you meant). So it would be a hash table of strings, each pointing to an fs_node. On a side note, wouldn't that make it (theoretically) possible to mount a device/directory on a completely bogus path? Let's say we did this:

mount /dev/sda1 /these/dont/exist

and it didn't check to see if the path "/these/dont/exist" was valid, and just put the string and node in the hash. Then when you did something like:

cat /these/dont/exist/text.txt

it would just look for "/these/dont/exist" in the mount hash table, and give you a valid fs_node. I don't see the usefulness of this, and I assume there would be a lot of problems, but it's interesting to think about.
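The check is cheap to add at mount time, though. A rough sketch (root_node, kmalloc and list_add stand in for whatever the kernel actually provides; error handling is simplified):

Code: Select all

/*! Mount src on the directory named by target, refusing bogus paths */
int do_mount(struct fs_node* src, const char* target, u32 flags)
{
	struct fs_node* dst = find_node(root_node, target);
	if( !dst ) return -1; // "/these/dont/exist" gets rejected right here
	if( !(dst->flags & FS_DIRECTORY) ) return -1; // Can only mount on a directory
	
	struct mount_point* mp = (struct mount_point*)kmalloc(sizeof(struct mount_point));
	if( !mp ) return -1;
	mp->src = src;
	mp->dst = dst;
	mp->flags = flags;
	list_add(&mp->link, &mount_list); // Linux-style list_add onto the global mount list
	return 0;
}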
jnc100
Member
Posts: 775
Joined: Mon Apr 09, 2007 12:10 pm
Location: London, UK
Contact:

Re: Filesystem mounting

Post by jnc100 »

Caleb1994 wrote:wouldn't that make it (theoretically) possible to mount a device/directory on a completely bogus path
That is why, as you noted, you should always check that the mount point is a viable path. The reason for identifying a device + path is if you write a microkernel and therefore want to dispatch a request for a particular file to the appropriate filesystem driver process. Yes it is entirely possible to build a tree of all file system objects within the scope of the vfs itself (possibly with links to drivers for underlying devices within each node) but that would depend on your design philosophy. In my idea of a microkernel the user library functions open/close/read/write etc (assuming you strive for posix compliance) should send a message to the vfs with the path of the file and the operation to perform on it which the vfs process then forwards to the appropriate driver. IMHO comparing parts of the provided path to a hash table of strings is one of the fastest ways for the vfs to decide which device to forward the message to. It is obviously not the only way.

Regards,
John.
Caleb1994
Member
Posts: 83
Joined: Sun Feb 13, 2011 4:55 pm

Re: Filesystem mounting

Post by Caleb1994 »

jnc100 wrote:Yes it is entirely possible to build a tree of all file system objects within the scope of the vfs itself (possibly with links to drivers for underlying devices within each node) but that would depend on your design philosophy.
That is my design, ATM. As I have not really written any code for drivers, this could change, but for now, it seems like a good design. The way I see it, it makes accessing the driver functions as simple as pNode->open(...) or pNode->readdir(...) etc... If you have any specific downsides in mind, I would love to hear them. Like I said, I haven't gotten to a formal driver interface as of yet. Right now, my initrd setup function returns the root node of the initrd with the correct function pointers set already, and any node created sets those pointers, so it's not a formal driver, but I guess you could call it that :P
jnc100 wrote:In my idea of a microkernel the user library functions open/close/read/write etc (assuming you strive for posix compliance) should send a message to the vfs with the path of the file and the operation to perform on it which the vfs process then forwards to the appropriate driver.
Well, my design is more monolithic. I was planning on things like this to be part of the kernel (to my understanding, that is what you would call a "Monolithic Kernel"), but I see your point anyway. Aside from the microkernel part, could you give any advantages of splitting the path into a device and a local path, and then sending it to the driver, as opposed to finding a node, and having the node hold pointers to the driver functions?
jnc100 wrote:IMHO comparing parts of the provided path to a hash table of strings is one of the fastest ways for the vfs to decide which device to forward the message to. It is obviously not the only way.
I wrote up a basic hash table library, which will just allocate an array of (void*)'s of a specified length, and then you can add a value like so:

Code: Select all

hash_t hash_add(struct hash_table* table, const void* key, void* data);
Each hash table has a pointer to a function of this type:

Code: Select all

typedef hash_t(*hash_func_t)(struct hash_table* table, const void* key);
which returns the hash of the specified key (it's set up so that the key doesn't necessarily have to point to a string, but could point to anything, provided the correct hash function is supplied).

My current hash function (for a string) is this:

Code: Select all

hash_t hash_string(struct hash_table* table, const char* str)
{
	hash_t hash = 0;
	hash_t x = 0;
	
	while(*str){
		hash = (hash << 4) + *str;
		if( (x = hash & 0xF0000000L) == 0 )
			hash ^= (hash >> 24);
		hash &= ~x;
		*str++;
	}

	return hash % table->size;
}
It's called the "UNIX ELF Hash." This was an arbitrary choice. It seems to work well with a 128 item hash. I tested it with some names like "/mnt" and "/mnt/cdrom" and "/mnt/windows" etc... no collisions, but that doesn't mean there won't be. I have seen suggested that in each has item I keep a linked list of all items that match that table index, but that would slow down the algorithm (since I would first have to hash the string, then search the list, and compare the original string with each item in the list) I don't like this at all [-X lol

Will this hash function/design work? What do you think? I definitely don't want lost mount points :D
fronty
Member
Posts: 188
Joined: Mon Jan 14, 2008 5:53 am
Location: Helsinki

Re: Filesystem mounting

Post by fronty »

Caleb1994 wrote:(since I would first have to hash the string, then search the list, and compare the original string with each item in the list)
Or just make your hash function return the plain hash, not hash mod size, and store that hash in the nodes in the list. If there are several items in the list, compare the original hashes first, and only if those are the same compare the strings. Comparing integers is fast, and that would cut out at least some of the string comparisons (I don't have any statistics, but I've always seen things done like that).
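As a sketch, the lookup then looks something like this (the struct and hash_string_full are made-up names; the point is just the order of the comparisons):

Code: Select all

struct hash_entry {
	hash_t full_hash;            /* hash *before* the "% size" step */
	const char* key;
	void* data;
	struct hash_entry* next;     /* chain of entries sharing this bucket */
};

void* hash_find(struct hash_entry** buckets, hash_t size, const char* key)
{
	hash_t h = hash_string_full(key);           /* plain hash, no modulo */
	struct hash_entry* e;
	for( e = buckets[h % size]; e; e = e->next ){
		if( e->full_hash == h && strcmp(e->key, key) == 0 )
			return e->data;      /* the cheap integer compare filters most misses */
	}
	return 0;
}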
Caleb1994
Member
Posts: 83
Joined: Sun Feb 13, 2011 4:55 pm

Re: Filesystem mounting

Post by Caleb1994 »

Code: Select all

You have a bug in elf_hash implementation
You are correct! It's odd that it still worked... I feel dumb... haha
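For reference, the textbook version only folds the top nibble back in when it is non-zero (the == 0 test above inverts that), and it XORs in that nibble rather than the whole hash:

Code: Select all

hash_t hash_string(struct hash_table* table, const char* str)
{
	hash_t hash = 0;
	hash_t x = 0;
	
	while(*str){
		hash = (hash << 4) + *str++;
		if( (x = hash & 0xF0000000L) != 0 ) /* only when the top nibble is set... */
			hash ^= (x >> 24);              /* ...fold that nibble back in */
		hash &= ~x;
	}

	return hash % table->size;
}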

Code: Select all

Do you know how hash collisions are resolved?
I've been searching online and have found things like the linked list approach above. Yesterday I also came across double hashing (in case that's common knowledge: I had never had to use hashes before... hahaha). Most things I read suggest a secondary hash function of:

h2(k) = p - (k % p)

where p is "a prime less then m (the size of the hash)"

Alongside double hashing, I also saw linear and quadratic probing, but it appears that the general consensus is that double hashing is faster than linear or quadratic.

I have also seen people keep a secondary buffer and, when they find a collision, search that whole buffer, but I think if I'm going to search a list anyway, the linked-list implementation would be faster (since you only search the items that hashed to the entry you are looking at).
fronty wrote:Or just make your hash function return the plain hash, not hash mod size, and store that hash in the nodes in the list. If there are several items in the list, compare the original hashes first, and only if those are the same compare the strings. Comparing integers is fast, and that would cut out at least some of the string comparisons (I don't have any statistics, but I've always seen things done like that).
That makes sense. I think this is looking like the best way, because linear, quadratic, or double hashing all still end up probing other slots across the table, while the linked list only searches the items that have actually collided with each other.
jnc100
Member
Posts: 775
Joined: Mon Apr 09, 2007 12:10 pm
Location: London, UK
Contact:

Re: Filesystem mounting

Post by jnc100 »

Caleb1994 wrote:Well, my design is more monolithic. I was planning on things like this to be part of the kernel (to my understanding, that is what you would call a "Monolithic Kernel"), but I see your point anyway. Aside from the microkernel part, could you give any advantages of splitting the path into a device and a local path, and then sending it to the driver, as opposed to finding a node, and having the node hold pointers to the driver functions?
No, if the whole vfs tree is cached in memory accessible by all the drivers anyway then dealing purely in nodes as you do is perfectly efficient.

I would advise you to keep a list of mounted shares somewhere though (albeit not necessarily as a hash table). This is especially important to throw errors should the user try something along the lines of: mount /dev/hda1 /mnt; mount /dev/hdb1 /mnt/another_mount; umount /mnt - i.e. check that you are not unmounting something that another mount depends on. You would also similarly want to keep a list of open files to prevent the user unmounting a filesystem that had a file in use.
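A sketch of that check at umount time, reusing the mount_point list from earlier in the thread (node_is_descendant_of is a placeholder for however you can tell that one directory lies below another):

Code: Select all

/*! Return 1 if dst may be unmounted, 0 if another mount still depends on it */
int umount_allowed(struct fs_node* dst)
{
	struct list_head* iter = 0;
	struct mount_point* mp = 0;
	list_for_each(iter, &mount_list){
		mp = list_value(iter, struct mount_point, link);
		if( mp->dst == dst ) continue;            // This is the mount being removed
		if( node_is_descendant_of(mp->dst, dst) ) // e.g. /mnt/another_mount below /mnt
			return 0;                             // Busy: another mount depends on it
	}
	return 1; // Nothing mounted underneath (a similar pass over open files is still needed)
}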

Regards,
John.
Caleb1994
Member
Posts: 83
Joined: Sun Feb 13, 2011 4:55 pm

Re: Filesystem mounting

Post by Caleb1994 »

jnc100 wrote:No, if the whole vfs tree is cached in memory accessible by all the drivers anyway then dealing purely in nodes as you do is perfectly efficient.
Yeah, currently I have a function like "struct fs_node* aquire_node(struct fs_node*)" which will either 1. allocate a new node and return it (if the parameter was NULL) or 2. increment the reference count of the passed node. Then, "u32 release_node(struct fs_node*)" will check the reference count and delete the node from memory if necessary. It always returns the node's reference count from after the call, so if the return value is 0, the node was free'd; otherwise, its reference count was decremented. The VFS is allocated on the kernel heap, so the drivers should be able to access the nodes.

That way, all nodes are allocated and free'd in one place, and I don't have to worry about drivers allocating nodes without me knowing it. (Well I guess they still could, but this should make it a little easier to maintain :P).
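Roughly like this (the ref_count field, kmalloc/kfree and memset are placeholders for whatever the node struct and kernel heap actually provide):

Code: Select all

struct fs_node* aquire_node(struct fs_node* node)
{
	if( !node ){
		node = (struct fs_node*)kmalloc(sizeof(struct fs_node));
		if( !node ) return 0;
		memset(node, 0, sizeof(struct fs_node)); // Fresh node starts with ref_count == 0
	}
	node->ref_count++; // New nodes go from 0 to 1 here
	return node;
}

u32 release_node(struct fs_node* node)
{
	u32 refs = --node->ref_count;
	if( refs == 0 ) kfree(node); // Last reference gone: free the node
	return refs;                 // 0 means the node was actually free'd
}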
jnc100 wrote:I would advise you to keep a list of mounted shares somewhere though (albeit not necessarily as a hash table).
I was thinking of just grabbing the array at "mount_hash->arr" and searching that, instead of keeping another array. I would have to search each list inside the array (I went with the linked list approach; it seemed the safest to me), but that way I don't have to maintain two arrays.


Right now, I have the VFS working. I wrote up a little kernel test (I have a set of functions called kernel_*_test which test different functionalities). Here's an image!
Attachments
Exploring my VFS (screenshot)
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: Filesystem mounting

Post by Solar »

jnc100 wrote:I would advise you to keep a list of mounted shares somewhere though (albeit not necessarily as a hash table). This is especially important to throw errors should the user try something along the lines of: mount /dev/hda1 /mnt; mount /dev/hdb1 /mnt/another_mount; umount /mnt - i.e. check that you are not unmounting something that another mount depends on.
This might be the right time to mention that once upon a time - before the Church of GNU came to bless us all - there actually was some dispute whether the "single root" concept is a good idea after all.

Of course, today you're burned at the stake if you bring up the topic, because due to some strange reflex the GNUists believe that not having a single root would somehow automatically mean having drive letters - which, of course, is equally brain-dead. And of course, GNUists are always right.
Every good solution is obvious once you've found it.
Caleb1994
Member
Posts: 83
Joined: Sun Feb 13, 2011 4:55 pm

Re: Filesystem mounting

Post by Caleb1994 »

Solar wrote:
jnc100 wrote:I would advise you to keep a list of mounted shares somewhere though (albeit not necessarily as a hash table). This is especially important to throw errors should the user try something along the lines of: mount /dev/hda1 /mnt; mount /dev/hdb1 /mnt/another_mount; umount /mnt - i.e. check that you are not unmounting something that another mount depends on.
This might be the right time to mention that once upon a time - before the Church of GNU came to bless us all - there actually was some dispute whether the "single root" concept is a good idea after all.

Of course, today you're burned at the stake if you bring up the topic, because due to some strange reflex the GNUists believe that not having a single root would somehow automatically mean having drive letters - which, of course, is equally brain-dead. And of course, GNUists are always right.
Hahaha, well right now I do technically have a mount point at "/" but I also have a member in task_s called "root" which is the effective root node for that task. At boot, the initrd is mounted to "/" and when tasking is enabled, the first task gets tsk->root set to the mount point found at "/". Then any subsequent calls to fork inherit the root and cwd (current working directory) fields of the current process. Upon a call to an execve-like function, I plan on setting the "new" task's root field to the node mounted to "/". Although, now that I think of it, I'm not sure whether the "new" task created by execve should inherit the root field or not. The root field is mostly just for things like "chroot". But since *technically* it's a new task I don't think it should inherit what chroot did to the old task... Maybe it should... idk, I haven't needed to use chroot much in a working system, so I don't know what the norm is.

I can kind of see what they mean though. If you don't have one single root tree that everything is mounted on, then you have to reference multiple root trees in some way. You could reference them by device name like "sda1/..." but you would need a way to distinguish between device names and immediate directory names... which is essentially drive lettering. No matter how you designate it ("[sda1]/" or "sda1:/" or "drive=sda1/", off the top of my head), it would be almost identical to drive lettering. You might as well make it easy on yourself at that point and shorten the drive names to letters, and incorporate the ":". I'm not saying that there are only two ways to design it; I'm merely stating that without one single root node, you have to have a logical way to reference different drives. The simplest way I can think of, off the top of my head, is drive letters. They are short and easy to process.
Kevin
Member
Posts: 1071
Joined: Sun Feb 01, 2009 6:11 am
Location: Germany
Contact:

Re: Filesystem mounting

Post by Kevin »

Can you give some more detail on what kind of multi-root layout you're thinking of, Solar? If you use it for the same purpose as drive letters, it seems mostly equivalent to a single "super root" that contains all the roots as subdirectories. So perhaps you have more than that in mind?

tyndur does have multiple roots, and it actually makes a difference there because of another mechanism. While file:/ could well be represented as /file/, things become a bit more complicated when you use a resource as an input for another service. For example, you could access the content of an ISO image by using the path file:/test.iso|iso9660:/ - and that seems to be hard to represent with a Unix-style layout because /file/test.iso would already be a file. Even if you could magically turn it into a directory, it would have to contain all services that can use this file as input, which isn't obvious either. So there's certainly a difference here, but it's only partially related to the multi-root nature.

(No, I'm not absolutely convinced that this system is a good idea, but I guess it's relatively unique at least ;))
Developer of tyndur - community OS of Lowlevel (German)