Root-Level Filesystem Dilemma

Rhob · Post by **Rhob** » Fri Feb 22, 2013 11:23 am

Hi everyone,

I've recently gotten back into OS development. I've written a custom bootloader and code that has successfully done things like enter protected mode, enumerate the PCI bus, and perform I/O on a parallel ATA hard drive in PIO mode. I've now started writing a protected-mode kernel in C++.

One dilemma I have at this point is how best to organize the (virtual) filesystem at the root level. I say "dilemma" because I've narrowed the question down to two choices. The first choice is the standard Unix model, where the filesystem containing the OS is set up as the root file system per se, and all other filesystems that may exist are mounted to it. The other choice is what I'm calling the CP/M model, where each filesystem encountered is mounted separately under an implicit root.

So what I'm wondering is whether you guys think there are any clear advantages that one choice has over the other. With that said, please keep in mind the following:

I'm not trying to conform to any existing OS standard (e.g. POSIX).
The target environment is home desktop computers.

Thanks in advance for your input.

SDS · Post by **SDS** » Fri Feb 22, 2013 11:47 am

Depends on how you want users/software to interact with it.

If you have everything mounted under an implicit root, then working with that structure is not very different programmatically from working with a real root filesystem. In that context, I would be tempted to just use a real root filesystem so that you maintain uniformity in your access routines - why add a part of your filesystem that works in a different way?

Regarding users, it depends on how you want to present things. If you want different storage components to be entirely logically separate, then that is the way to go. However, this makes no sense if you want to have the capacity to have logical sub-mounts...

Rhob · Post by **Rhob** » Fri Feb 22, 2013 12:46 pm

Thanks for your reply, SDS.

SDS wrote:If you have everything mounted under an implicit root, then working with that structure is not very different programmatically from working with a real root filesystem. In that context, I would be tempted to just use a real root filesystem so that you maintain uniformity in your access routines - why add a part of your filesystem that works in a different way?

I'm not sure I understand this. I guess I don't see how the "CP/M model" would make my access routines non-uniform. Could you explain?

SDS wrote:Regarding users, it depends on how you want to present things. If you want different storage components to be entirely logically separate, then that is the way to go. However, this makes no sense if you want to have the capacity to have logical sub-mounts...

From what I've read (which is by no means exhaustive - I'm still quite a noob at this), the Unix filesystem design was closely tied to its original target environment, namely a single (relatively powerful) computer shared by many users accessing it through individual terminals. In such an environment, the single computer's hardware is typically maintained by a special group of users, while the rest of the users don't have to care about the hardware at all.

Contrast that with the typical home computer setup, where there's a single user at a time, and the (main) user is much closer to the hardware. I think the typical user of a home computer cares much more (at least) about which physical devices are hosting which of his files. I also doubt that he'd divide any of his hard drives into multiple partitions - the most common case of that IMO is if he wants to boot two or more operating systems from the same drive, and even that case seems to be pretty rare.

My overriding concern with this design issue is presenting things to users and software in ways that are both consistent and intuitive.

Brendan · Post by **Brendan** » Sat Feb 23, 2013 12:46 am

Hi,

Rhob wrote:So what I'm wondering is whether you guys think there are any clear advantages that one choice has over the other.

The biggest disadvantage of the CP/M model is that minor changes break everything. For example, imagine you've got one file system that contains everything, but the disk is getting full, so you buy another hard drive and plug it in. For the CP/M model you'd start moving files from one drive to another (to make room on the first drive), causing all sorts of things to break because the paths to a lot of files change (e.g. "C:/foo/bar/" becomes "D:/foo/bar", so any script or software that expects files to be in "C:/foo/bar" becomes broken).

For the Unix model, you can move files and directories from "/foo" to the new disk, then mount the new disk at "/foo". In this case everything looks the same and nothing breaks at all (e.g. "/foo/bar/hello" is still "/foo/bar/hello" even though it's on a completely different file system now).
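To make that concrete, here is a minimal sketch of how a unified namespace might resolve paths through a mount table, using longest-prefix matching. The `MountTable` class and the file system names `disk0` and `disk1` are hypothetical and only illustrate the idea; this is not how any particular kernel implements it.

```python
import posixpath


class MountTable:
    """Hypothetical VFS mount table: maps mount points to file system names."""

    def __init__(self, root_fs):
        self.mounts = {"/": root_fs}

    def mount(self, mount_point, fs):
        self.mounts[posixpath.normpath(mount_point)] = fs

    def resolve(self, path):
        """Return (file system, path relative to that file system's root)."""
        path = posixpath.normpath(path)
        # Pick the longest mount point that is a prefix of the path.
        best = "/"
        for mp in self.mounts:
            if mp == "/":
                continue
            if (path == mp or path.startswith(mp + "/")) and len(mp) > len(best):
                best = mp
        rel = path[len(best):].lstrip("/")
        return self.mounts[best], "/" + rel


vfs = MountTable("disk0")
before = vfs.resolve("/foo/bar/hello")  # served by disk0
vfs.mount("/foo", "disk1")              # new disk takes over "/foo"
after = vfs.resolve("/foo/bar/hello")   # same global path, now served by disk1
```

The path that scripts and applications use never changes; only the file system behind it does.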

Basically, for the purpose of system administration (especially for servers, etc), for "non-removable media" you'd want to use the Unix model.

In my opinion, removable media is entirely different, and both the Unix model and the CP/M model are bad. For example, imagine you've got 4 different USB flash sticks and plug them all into your computer in a random order. For the CP/M model you might get "F:", "G:", "H:" and "I:", and there's no sane/easy way to figure out which one corresponds to which USB flash stick. For the Unix model it's typically a mess - you end up with something (scripts?) to auto-mount the devices to avoid hassle, and you might end up with "/mnt/usb1", "/mnt/usb2", etc (and still have no sane/easy way to figure out which one corresponds to which USB flash stick).

For removable media you want to use labels that are stored on the device/file system itself. For example you might create a file system on a USB flash stick and give that file system the label "my_black_and_silver_stick", so that it always becomes "/mnt/my_black_and_silver_stick" and it doesn't matter which USB port you plug it into or which order you plug USB devices in.
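A rough sketch of what label-based naming could look like is below. The function name `mount_point_for_label`, the sanitising rule, and the numeric suffix for duplicate labels are all assumptions made for illustration; nothing above specifies how clashes should be handled.

```python
import re


def mount_point_for_label(label, in_use, base="/mnt"):
    """Derive a mount point from a volume label (hypothetical policy).

    label  -- label read from the file system itself
    in_use -- set of mount points that are already taken
    """
    # Replace anything that would be awkward in a path component.
    name = re.sub(r"[^A-Za-z0-9_-]", "_", label.strip()) or "unlabeled"
    candidate = f"{base}/{name}"
    n = 1
    # Two sticks may carry the same label, so add a suffix on a clash.
    while candidate in in_use:
        n += 1
        candidate = f"{base}/{name}_{n}"
    return candidate


first = mount_point_for_label("my_black_and_silver_stick", set())
second = mount_point_for_label("my_black_and_silver_stick", {first})
```

Because the name comes from the file system rather than from the port or the plug-in order, the same stick ends up at the same path every time, as long as no other mounted volume shares its label.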

Cheers,

Brendan

Combuster · Post by **Combuster** » Sat Feb 23, 2013 12:54 am

Brendan wrote:For the Unix model it's typically a mess - you end up with something (scripts?) to auto-mount the devices to avoid hassle, and you might end up with "/mnt/usb1", "/mnt/usb2", etc (and still have no sane/easy way to figure out which one corresponds to which USB flash stick).

Across my Linux machines, they're pretty consistently mounted at /media/FAT_VOLUME_NAME_HERE, so that's already an issue of the past...

Rhob · Post by **Rhob** » Mon Feb 25, 2013 10:16 am

Brendan wrote:The biggest disadvantage of the CP/M model is that minor changes break everything. For example, imagine you've got one file system that contains everything, but the disk is getting full, so you buy another hard drive and plug it in. For the CP/M model you'd start moving files from one drive to another (to make room on the first drive), causing all sorts of things to break because the paths to a lot of files change (e.g. "C:/foo/bar/" becomes "D:/foo/bar", so any script or software that expects files to be in "C:/foo/bar" becomes broken).

For the Unix model, you can move files and directories from "/foo" to the new disk, then mount the new disk at "/foo". In this case everything looks the same and nothing breaks at all (e.g. "/foo/bar/hello" is still "/foo/bar/hello" even though it's on a completely different file system now).

That's a really good point. For some reason, I hadn't thought of it already. But I completely agree, so I don't see any good reason now to favor the CP/M model over the Unix model. I really appreciate your input!

Brendan wrote:Basically, for the purpose of system administration (especially for servers, etc), for "non-removable media" you'd want to use the Unix model.

Agreed. Although I'm not targeting servers, using the Unix model would make it easier to do so in the future, so that's an added bonus.

Brendan wrote:In my opinion, removable media is entirely different, and both the Unix model and the CP/M model are bad. For example, imagine you've got 4 different USB flash sticks and plug them all into your computer in a random order. For the CP/M model you might get "F:", "G:", "H:" and "I:", and there's no sane/easy way to figure out which one corresponds to which USB flash stick. For the Unix model it's typically a mess - you end up with something (scripts?) to auto-mount the devices to avoid hassle, and you might end up with "/mnt/usb1", "/mnt/usb2", etc (and still have no sane/easy way to figure out which one corresponds to which USB flash stick).

For removable media you want to use labels that are stored on the device/file system itself. For example you might create a file system on a USB flash stick and give that file system the label "my_black_and_silver_stick", so that it always becomes "/mnt/my_black_and_silver_stick" and it doesn't matter which USB port you plug it into or which order you plug USB devices in.

Agreed here as well. Thanks again!

OSChan · Post by **OSChan** » Mon Feb 25, 2013 9:55 pm

I like the CP/M model better if you don't stick with the crappy drive letter nonsense. There's no reason why you can't refer to a drive by its volume label or as a named device.

For example you could have a USB labelled my_black_and_silver_stick and access files as my_black_and_silver_stick:my/directory/file
You can refer to USB drives as USB0: and USB1: instead of F: or whatever.

I just feel the Unix way is sloppy and confusing, especially since you can have files and directories under the /mnt directory along with mounted filesystems. Then what is on the main HD and what is on something mounted? Sure, that's awesome power for a server, but it's confusing for the end user.

turdus · Post by **turdus** » Wed Feb 27, 2013 7:47 am

I've come across this problem; I support both path styles in a simple way.

In order to do that, I broke POSIX compatibility and allow two entries with the same name in a directory. One must be a file, and the other must be a directory (or a mount point, as a matter of fact).
Now if you need a CP/M-style path and no device is given, I add the prefix "root:", which stands for the root ramdisk (holding the root filesystem; google "Filesystem Hierarchy Standard" if you don't know what it is). If you need a UNIX-style path and a device is given, I do a simple string manipulation: (dev):(path) -> /dev/(dev)/(path) or, if the device file does not exist, -> /mnt/(dev)/(path). Let's assume you have a device called disk1:
/dev/disk1 - it's a file; it provides raw access to the device
/dev/disk1/ - it's a directory, auto-mounted on request.
As for the labels, the mounter creates symlinks for removable devices, like /mnt/my_black_and_silver_stick/ -> /dev/disk1/, and deletes them on umount.

As a result, all of these paths refer to the same directory on disk1:
disk1:/folder
/dev/disk1/folder
root:/dev/disk1/folder
/mnt/my_black_and_silver_stick/folder
my_black_and_silver_stick:/folder
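The string manipulation described above can be sketched roughly as follows. The function name `to_unix_path` and the `device_files` set (standing in for a check of whether /dev/(dev) exists) are hypothetical; the translation rules themselves are the ones given in this post.

```python
def to_unix_path(path, device_files):
    """Translate '(dev):(path)' into a Unix-style path.

    device_files -- set of names for which a /dev/<name> file exists
                    (an assumption standing in for a real lookup)
    """
    dev, sep, rest = path.partition(":")
    if not sep or not dev or "/" in dev:
        return path  # no device prefix, so it is already Unix-style
    rest = rest.lstrip("/")
    if dev == "root":
        return "/" + rest  # "root:" names the root filesystem itself
    # Known device files live under /dev; anything else is tried as a label.
    base = "/dev/" if dev in device_files else "/mnt/"
    return base + dev + "/" + rest


devices = {"disk1"}
paths = [
    to_unix_path("disk1:/folder", devices),
    to_unix_path("/dev/disk1/folder", devices),
    to_unix_path("root:/dev/disk1/folder", devices),
    to_unix_path("my_black_and_silver_stick:/folder", devices),
]
```

The label form only reaches the same directory because of the symlink the mounter creates under /mnt; the translation step itself just builds the string.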

For removable media, I prefer CP/M style paths, and use UNIX style paths otherwise.

To push it to the limits and gain full disk-order and label independence, I also allow the use of GPT partition UUIDs in paths: uuid:(uuid)/(path). If no GPT is found on a disk, UUIDs are generated for each fs on the disk. I use a deterministic algorithm for that, meaning it always generates the same UUID for the same disk/partition. Using such a hash for every fs behind the curtain makes path lookups very fast and easy.
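One way to get a UUID that is always the same for the same disk/partition is a name-based (version 5) UUID. The sketch below is only an illustration: the post does not say which algorithm or which inputs are actually used, so the disk serial number, the partition index, and the namespace value are all assumptions.

```python
import uuid

# Hypothetical namespace for this OS; any fixed UUID would serve.
FS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "fs.example.invalid")


def fs_uuid(disk_serial, partition_index):
    """Derive a stable UUID from properties that do not change between boots."""
    return uuid.uuid5(FS_NAMESPACE, f"{disk_serial}/{partition_index}")


def uuid_path(disk_serial, partition_index, path):
    """Build a 'uuid:(uuid)/(path)' style path for a file on that partition."""
    return f"uuid:{fs_uuid(disk_serial, partition_index)}/{path.lstrip('/')}"


a = fs_uuid("WD-123", 1)
b = fs_uuid("WD-123", 1)
c = fs_uuid("WD-123", 2)
example = uuid_path("WD-123", 1, "/folder/file")
```

Since the UUID is a pure function of its inputs, it does not depend on which port the disk is attached to or the order in which disks are detected.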

Rhob · Post by **Rhob** » Thu Feb 28, 2013 10:09 am

Hey Brendan,

I thought more about what you said and I've concluded that the argument you brought up is more limited than I first thought. With the example you gave, everything would work fine under the Unix model as long as a single existing directory is moved to the new disk. But if I wanted to move files from both "/foo" and "/bar" to the new disk, then (assuming I still mounted the new disk at "/foo") the paths to my files would become "/foo/foo" and "/foo/bar", respectively, and things would still break.

What are your thoughts on this? Please correct me if I'm wrong here, as I'm not a Unix expert (I'm mainly a Windows user).

Combuster · Post by **Combuster** » Thu Feb 28, 2013 10:21 am

That's because filesystem support IRL generally lacks the ability to mount anything but its root folder, and filesystems themselves were never designed to be dynamically moved.

The first item on the list can be fixed if you give it some thought.

On the other hand, many systems nowadays provide a system/library call to retrieve the RO/RW/cache folders associated with an application, and especially on cellphones their use is pretty much enforced in practice, unlike Windows, where every nut can hardcode "C:\Program Files\" and get away with it for quite a while.

Rhob · Post by **Rhob** » Thu Feb 28, 2013 12:03 pm

Combuster wrote:That's because filesystem support IRL generally lacks the ability to mount anything but its root folder, and filesystems themselves were never designed to be dynamically moved.

So my understanding is correct?

On the other hand, if filesystems were never designed to be dynamically moved, then what's the point of the mount and umount commands in Unix? Or am I misunderstanding what you mean by "dynamically moved"?

Combuster wrote:The first item on the list can be fixed if you give it some thought.

That assumes I think it's a problem in the first place.

Combuster wrote:On the other hand, many systems nowadays provide a system/library call to retrieve the RO/RW/cache folders associated with an application, and especially on cellphones their use is pretty much enforced in practice, unlike Windows, where every nut can hardcode "C:\Program Files\" and get away with it for quite a while.

For one thing, I'm not targeting cell phones in the slightest. For another, I'm not sure what you mean by "the RO/RW/cache folders associated with an application". To me that implies some sort of standard or convention regarding the names and locations of those folders - is that correct?

Combuster · Post by **Combuster** » Thu Feb 28, 2013 12:51 pm

Rhob wrote:if filesystems were never designed to be dynamically moved, then what's the point of the mount and umount commands in Unix?

Changing the mount point is a different thing - and generally not useful either. Moving 200GB of data on the hard disk because the filesystem is physically stored in the wrong place is just not done, and only really provides a fix for people who only install Windows (which takes just about everything) and are nevermore able to find space on their hard drive to install Linux.

That process comes with the feature that if you stop halfway, the filesystem doesn't know a thing about what happened and you simply lose everything. Hence, filesystems aren't movable on a whim, let alone 10 times a day.

Rhob · Post by **Rhob** » Fri Mar 01, 2013 2:40 pm

Okay, so you're talking about file systems being able to be moved physically while staying in the same place logically. Then yeah, changing the mount point is different from that - it's the exact opposite.

Anyway, I think we've gotten rather far afield of where I wanted this discussion to go. To reiterate, my concerns are consistent and intuitive presentation to both users and application software. Since I'm targeting home desktop computers, I'm trying to tailor my use-cases around them. I can readily envision a home desktop computer user running out of space on his hard drive, buying an additional one, and moving part of his existing file structure onto the new drive to make room on the old one. If anyone thinks this use case is unlikely, please let me know. Again, I'm not concerned with the type of system administration that typically happens on Unix servers.

That being said, I've thought some more about what you (Combuster) said here:

Combuster wrote:That's because filesystem support IRL generally lacks the ability to mount anything but its root folder

Right, mounting has traditionally (at least) been done only on entire file systems, which involves mapping the file system's local root directory to a global non-root directory. For the ability to mount non-root directories, I suppose a higher level of abstraction would be needed - i.e. treat a mounted non-root directory as a (logical) file system in its own right. I think the structure would look like this:

Code:

Physical Device -> Physical File System -> Logical File System

Does that make sense?

Either way, I'm still wondering about 1) how easy such an ability would/could be for home desktop users to use, and 2) how much of a need those users would have for it.
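As a minimal sketch of the "Physical File System -> Logical File System" layering, assume a logical file system is just a physical file system plus the directory inside it that acts as the root. The `LogicalFS` class, the `resolve` helper, and the names `disk0` and `disk1` are hypothetical and exist only to illustrate the structure.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalFS:
    physical_fs: str   # name of the backing physical file system
    subdir: str = "/"  # directory inside it that serves as the logical root


def resolve(mounts, path):
    """Map a global path to (physical file system, path inside it).

    mounts -- dict of mount point -> LogicalFS; must contain "/"
    """
    path = posixpath.normpath(path)
    # Longest mount point that is a prefix of the path wins.
    best = max(
        (mp for mp in mounts
         if mp == "/" or path == mp or path.startswith(mp + "/")),
        key=len,
    )
    lfs = mounts[best]
    rel = path[len(best):].lstrip("/")
    return lfs.physical_fs, posixpath.normpath(posixpath.join(lfs.subdir, rel))


# The new disk holds both moved directories; each is mounted separately.
mounts = {
    "/": LogicalFS("disk0"),
    "/foo": LogicalFS("disk1", "/foo"),
    "/bar": LogicalFS("disk1", "/bar"),
}
foo_file = resolve(mounts, "/foo/a.txt")
bar_file = resolve(mounts, "/bar/b.txt")
```

With this arrangement, files moved from both "/foo" and "/bar" onto one new disk keep their original global paths, instead of ending up under "/foo/foo" and "/foo/bar".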

Combuster · Post by **Combuster** » Fri Mar 01, 2013 3:11 pm

The desired end result as described would look something like the following:

Code:

Dir "c:\program files\"
app1 -> filesystem_1\program files\app1
app2 -> filesystem_1\program files\app2
game1 -> filesystem_2\program files\game1
game2 -> filesystem_3\program files\game2

On an *nix system it's completely possible to demonstrate this with symlinks, and you can actually move around stuff without anything noticing by copying a folder to a different filesystem and pointing the symlinks to the new location.
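A small demonstration of the symlink approach, assuming a POSIX system: two temporary directories stand in for separate file systems (an assumption made so the sketch runs anywhere), and the helper name `relocate` is hypothetical. It does not deal with files that are in use or with a crash halfway through the copy.

```python
import os
import shutil
import tempfile


def relocate(link_path, new_parent):
    """Copy the directory behind a symlink to new_parent and repoint the link."""
    old_target = os.path.realpath(link_path)
    new_target = os.path.join(new_parent, os.path.basename(old_target))
    shutil.copytree(old_target, new_target)
    # Create the new link under a temporary name, then rename it over the
    # old one so the visible path never disappears.
    tmp_link = link_path + ".tmp"
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)
    shutil.rmtree(old_target)


root = tempfile.mkdtemp()
fs1 = os.path.join(root, "filesystem_1")  # stand-in for the first file system
fs2 = os.path.join(root, "filesystem_2")  # stand-in for the second one
os.makedirs(os.path.join(fs1, "game1"))
os.makedirs(fs2)
with open(os.path.join(fs1, "game1", "save.dat"), "w") as f:
    f.write("progress")

programs = os.path.join(root, "programs")
os.makedirs(programs)
link = os.path.join(programs, "game1")
os.symlink(os.path.join(fs1, "game1"), link)

relocate(link, fs2)  # the data moves; the visible path stays the same
with open(os.path.join(link, "save.dat")) as f:
    content = f.read()
```

Anything that opens files through the link keeps working after the move, because the path it uses is unchanged.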

The real problems start to occur when you try to move something that's in use, or when a power failure happens. The system you'd probably make would need to take care of such scenarios at a more fundamental level, and you'll have to break with current implementations somewhere to make this work safely and transparently.

Rhob wrote:1) how easy such an ability would/could be for home desktop users to use

Typical Windows installs gather so much junk that many people are better off reinstalling it. Beyond that, it takes some good design of both user interface and implementation for this to be ever workable for end users. Most programmers can't deal with my mom as an end user.

Rhob wrote:2) how much of a need those users would have for it.

There'll probably be a need, as even I have had that problem on occasion. The question is if your time is better spent on making the world a better place elsewhere.

Rhob · Post by **Rhob** » Fri Mar 01, 2013 3:58 pm

Combuster wrote:The desired end result as described would look something like the following:
Code:
Dir "c:\program files\"
app1 -> filesystem_1\program files\app1
app2 -> filesystem_1\program files\app2
game1 -> filesystem_2\program files\game1
game2 -> filesystem_3\program files\game2

Are you saying here that e.g. "/app1" would map to "filesystem_1/program files/app1"? Or what? Sorry but it's a bit cryptic to me.

Combuster wrote:On an *nix system it's completely possible to demonstrate this with symlinks, and you can actually move around stuff without anything noticing by copying a folder to a different filesystem and pointing the symlinks to the new location.

True, symbolic links are another way to deal with the issue, although I think using them would be a bit more complicated than what I have in mind. To me it's still a question of 1) what common use cases exist for home desktop computer users and 2) what would be too confusing for the average home desktop computer user.

Combuster wrote:The real problems start to occur when you try to move something that's in use, or when a power failure happens. The system you'd probably make would need to take care of such scenarios at a more fundamental level, and you'll have to break with current implementations somewhere to make this work safely and transparently.

To be honest, those issues are way off my radar. But as I mentioned in my OP, I'm not worried in the slightest about breaking with current implementations. I'm not trying to make a Windows clone, a Unix clone, or any other kind of clone.

Combuster wrote:Typical Windows installs gather so much junk that many people are better off reinstalling it.

I'm not sure what you mean by this, sorry.

Combuster wrote:Beyond that, it takes some good design of both user interface and implementation for this to be ever workable for end users. Most programmers can't deal with my mom as an end user.

Agreed, but I think that's another issue entirely. What I'm thinking right now is that, while the Unix model (i.e. unified file system) is at least marginally more flexible, the CP/M model (i.e. each file system presented separately) may be more transparent - and thus easier to use - for home desktop computer users, in spite of the lesser flexibility.

Combuster wrote:There'll probably be a need, as even I have had that problem on occasion. The question is if your time is better spent on making the world a better place elsewhere.

Whether my time is better spent on making the world a better place elsewhere is entirely up to me.

OSDev.org
