OSDev.org

Posted: **Fri Jan 13, 2006 10:03 am**

Your talk of directory and file name limits made me think of something. Isn't it incredibly wasteful repeating the entire path in the name of every single file?

Posted: **Fri Jan 13, 2006 10:24 am**

Hi,

I've put together a "preliminary draft specification". This draft is incomplete - it currently lacks a copyright/licencing notice and the "Algorithms" section. Everything else is (hopefully) adequately described.

There are 2 notable differences between the draft specification and the descriptions in this thread.

The first difference is that I've added an "entry type" field so that an entry in the index area can be used to mark unusable areas of the disk (bad sectors), deleted entries and directories. The original plan was to use special values for the file length instead, but this method makes it easier to "undelete" (i.e. no need to try to guess what the original file length was).

I'll let you read about the second difference in the "compatibility" section of the specification. ::)

Tomorrow (Saturday) it is likely that my web server will be rebooted often for the purpose of tracking down a bug in my OS (it's the only computer I have that is effected by this bug). Therefore I have attached the draft specification to this message, and also put it on my web site at http://bcos.hopto.org/sfs.html.

The forums didn't like the ".html" file extension, so I've repaced the extension and called it "sfs.txt" instead of "sfs.html" - after downloading it from the attachment link below, just rename it...

Cheers,

Brendan

Posted: **Fri Jan 13, 2006 10:37 am**

Cool. Was going to compress all the suggestions in this post for a draft. Will read this and give feedback.

I like this project we have all agreed on,for once!! ;D

Posted: **Fri Jan 13, 2006 11:48 am**

Kemp wrote: Your talk of directory and file name limits made me think of something. Isn't it incredibly wasteful repeating the entire path in the name of every single file?

Yes.

But, if you think about it from the point of transferring files, it doesn't matter. The overhead will be filecount * avgdirlength => filecount for calculation example is 10000, avgdirlength is about 50 (guess), so the overhead is half a megabyte (very roughly) for a very full filesystem.

You gain very very easy lookup for given file names.

Of course, we could transform that into links to previous inode (file entry, whatever) where an inode would point to its parent directory. For implementation however, I think the second is easier for being simple.

Anybody want a vote on this?

------- on the topic of the draft

Nice... I like the way you can stuff a FAT12 short-file-name-only thing in the reserved sectors and boot block. You can then include the driver for common (most used) OSes so those people don't have to find a driver for it. Not sure you intended that, but I do consider it quite nice

Posted: **Fri Jan 13, 2006 12:41 pm**

Pype.Clicker wrote:Yes i have. And hopefully enough, there is ID3 tags that virtually any program will prefer to display compared to a filename. Same kind of things occur with pictures (your digicam will just call it 00105613.jpg, but it will have album name, shooting date, etc. in comments) ...

I agree in principle that using file metadata to describe each file is a better way to do it. However, that's not the way the (90% Windows) world works today. It's very difficult to put genies back into bottles, and trying to usually annoys end users ("What do you mean, I have to change my AudioGrabber settings and re-name my entire mp3 collection to just so I can copy it to my flash drive? I'll just use FAT, thanks...")

Posted: **Fri Jan 13, 2006 1:03 pm**

I'm not sure the 90 byte limit on dir+file name is enough. I'm doing a check on the linux systems at my school now to see what the average + max file name length is...

procfs + . + .. + sysfs + devfs kind of mess up my results, trying again...

files: 466106
Sizes: min 0, avg 57, max 201

That indicates that 90 is less than half... 57 would be average, so I'm guessing that you need more than 90 to be able to use it consistently...

I suggested earlier using a mechanism of chaining together a few entries not similar to FAT to do so. Any comments on that?

Posted: **Fri Jan 13, 2006 2:06 pm**

Nice... I like the way you can stuff a FAT12 short-file-name-only thing in the reserved sectors and boot block. You can then include the driver for common (most used) OSes so those people don't have to find a driver for it. Not sure you intended that, but I do consider it quite nice

This ability makes it possible to have a floppy disk or flash memory device (for example) with a small FAT file system that contains any instructions, drivers or files needed to access the Simple File System on the rest of the media.

Quoting from the spec already, lol.

Just one minor thing:

For example, for the file "foo/bar.txt" this field will contain the characters "foo/bar.txt" followed by a zero to mark the end of the string

This implies a character 0 rather than a 0 value.

Posted: **Fri Jan 13, 2006 2:07 pm**

Kemp wrote:
Nice... I like the way you can stuff a FAT12 short-file-name-only thing in the reserved sectors and boot block. You can then include the driver for common (most used) OSes so those people don't have to find a driver for it. Not sure you intended that, but I do consider it quite nice

This ability makes it possible to have a floppy disk or flash memory device (for example) with a small FAT file system that contains any instructions, drivers or files needed to access the Simple File System on the rest of the media.
Quoting from the spec already, lol.

Well... ok, I posted that when at the start of the spec, so that's not entirely fair. Should've edited the post though.

Posted: **Fri Jan 13, 2006 5:32 pm**

Also, one final thought, why is the last modified date signed? What are the chances of having a file that hasn't been modified since before 1970?

Posted: **Fri Jan 13, 2006 10:17 pm**

Hi,

Ok, it was late at night when I posted last (early in the morning actually), so I posted the draft and went to sleep without reading previous posts. This post is a quick catch-up of those post and posts made since then (in order of newest to oldest).

Kemp wrote:Also, one final thought, why is the last modified date signed? What are the chances of having a file that hasn't been modified since before 1970?

Part of the idea here is that a well designed OS could use the same format for "clock_t" and "time_t", and applications could use the same format for calandars and historical purposes.

The "signed 64 bit mS since <epoch>" format is capable of storing values for times that are (roughly) 292 million years before and after the epoch. If the value was unsigned, then it'd go for 584 million years after the epoch. A file modified before 1970 is much more likely than the file system being used for millions of years, so making the value signed makes better use of the most significant bit.

However, I have been thinking that "signed 64 bit 1/65536ths of a second" might be more practical - it gives much higher accuracy (about 15.25 uS) and would still go (roughly) 4.46 million years before and after the epoch. This would also make it possible to read an unsigned 32 bit value at "offset+2", which would give you seconds since 1970, and would work as long as the time stamp is after the year 1970 and before the year 2106.

Candy wrote:Nice... I like the way you can stuff a FAT12 short-file-name-only thing in the reserved sectors and boot block. You can then include the driver for common (most used) OSes so those people don't have to find a driver for it. Not sure you intended that, but I do consider it quite nice

Thanks - I thought it'd be nice too. Unfortunately I'm not too sure if it'd work on Microsoft OSs in practice (the inbuilt FAT driver might detect the FAT file system and not give the installable SFS code a chance to detect the SFS file system).

Candy wrote:I'm not sure the 90 byte limit on dir+file name is enough. I'm doing a check on the linux systems at my school now to see what the average + max file name length is...

procfs + . + .. + sysfs + devfs kind of mess up my results, trying again...

files: 466106
Sizes: min 0, avg 57, max 201

That indicates that 90 is less than half... 57 would be average, so I'm guessing that you need more than 90 to be able to use it consistently...

How often would you do "cd /; cp -r * /mnt/temp"? I'm guessing that "cd /home/Candy/mywork; cp -r * /mnt/temp" or "cd /usr/src/linux-2.6.14-gentoo-r5; cp -r * /mnt/temp" would be far more likely.

For fun, I got a list of files in the linux source code (/usr/src/linux-2.6.14-gentoo-r5). Out of 20966 files the longest was "include/asm-cris/arch-v32/hwregs/iop/asm/iop_fifo_out_extra_defs_asm.h" which is 60 characters. Of course for this I'd actually use "tar -z -cf linux-2.6.14-gentoo-r5 /mnt/temp/linux-2.6.14-gentoo.tz", which only needs 22 characters.

Despite this, it would be easy to make the "index area entries" 256 bytes long instead of 128 bytes long. That would make the maximum file name length 230 bytes.

[countined]

Posted: **Fri Jan 13, 2006 10:20 pm**

[countined]

Candy wrote:Of course, we could transform that into links to previous inode (file entry, whatever) where an inode would point to its parent directory. For implementation however, I think the second is easier for being simple.

When the file system is mounted, I'd scan through the "index area" and build a sorted linked list of free areas and an array of hashed file names. This array would allow for fast file searches, using "repne cmpsd" on the array rather than "strcmp()" on every file name.

For example, if the user does "cat foo/bar.txt" you'd convert "foo/bar.txt" into the hash, then do a "repne cmpsd" to find the index area entry, then load this index area entry and do one "strcmp()" on it to make sure the file name is correct (and then grab the "starting block number" and "file length").

If the file system used file names (without paths) and links to the parent, then you'd need to do at least 2 searches (one for the directory "foo" and the other for the file "bar.txt"), and the second search would give false matches more often (for e.g. if you're searching for "a/makefile" you'd find the entry for "b/makefile" and "c/makefile", and you'd need to look at the index entry itself to determine if the parent directory is correct). To make it worse, trying to find the file "a/b/c/d/e/bar.txt" would involve a minimum of 6 searches instead of one.

These file searches are very frequent (every time you open a file, delete a file, create a file, etc).

Kemp wrote:Your talk of directory and file name limits made me think of something. Isn't it incredibly wasteful repeating the entire path in the name of every single file?

It is a little wasteful, but if I was concerned about efficiency I'd look at the data area instead of the index area. The chance of the length of a file's data being a multiple of the block size is small (for e.g. a one byte file consumes one whole block). With a block size of 512 bytes, this works out to an average of 256 bytes wasted per file (twice as much as the entire index entry). By using "starting and ending byte numbers" instead of "starting and ending block numbers" this could save a lot of space (especially for many small files), but the effect on performance wouldn't be so nice...

Cheers,

Brendan

Posted: **Sat Jan 14, 2006 3:37 am**

Brendan wrote: However, I have been thinking that "signed 64 bit 1/65536ths of a second" might be more practical - it gives much higher accuracy (about 15.25 uS) and would still go (roughly) 4.46 million years before and after the epoch. This would also make it possible to read an unsigned 32 bit value at "offset+2", which would give you seconds since 1970, and would work as long as the time stamp is after the year 1970 and before the year 2106.

Very big pluspoint for my OS. I could bitshift the entire time into exact preciseness for my own format.

Thanks - I thought it'd be nice too. Unfortunately I'm not too sure if it'd work on Microsoft OSs in practice (the inbuilt FAT driver might detect the FAT file system and not give the installable SFS code a chance to detect the SFS file system).

I figured you could install a recogniser for that, or make something similar to "partitioning"... In the worst case, you can default-partition partitionable media into two bits, one FAT, one SFS. Just the floppy case that remains, although I don't consider that one important (but others could). I'm damn sure you can partition USB sticks.

Despite this, it would be easy to make the "index area entries" 256 bytes long instead of 128 bytes long. That would make the maximum file name length 230 bytes.

What about making it stretchable entries? You just have a first byte/word/dword that indicates how many 64byte blocks it is and it can be any size. The entry placement algorithm is then exactly identical to the file placement algorithm, so you can use the same implementation or knowledge/experience and you can have infinite length filenames (at least, stretchable lengths). My main point is, you don't get arbitrary failures "file path too long" such as were common in DOS (with a limit of iirc 64 or 80 something characters).

Brendan wrote: When the file system is mounted, I'd scan through the "index area" and build a sorted linked list of free areas and an array of hashed file names. This array would allow for fast file searches, using "repne cmpsd" on the array rather than "strcmp()" on every file name.

Think I'd go for an array of hashed filenames with trees below each hash entry with alphabetic sorting for speed, agree ont he linked list of free areas (but with indexes so atomic speed could be achieved - have to document that algorithm).

For example, if the user does "cat foo/bar.txt" you'd convert "foo/bar.txt" into the hash, then do a "repne cmpsd" to find the index area entry, then load this index area entry and do one "strcmp()" on it to make sure the file name is correct (and then grab the "starting block number" and "file length").

<snip> text on file searching

That's what I said, easier lookup of known filenames etc. It does limit the path length as opposed to the directory length, so I would then vote for a long path allowance. People test stuff by doing tests they normally wouldn't (what would happen if I make a document with 50 pages on my 4MB 486? yep - it crashed, what length filename is the limit etc.) and if it passes all those tests they assume it works in any case. If it doesn't they know limits and they dislike the system for any that doesn't impose those limits. So, if you could stretch the filename/path limit to a length that the file system would not be their problem, 100% good.

It is a little wasteful, but if I was concerned about efficiency I'd look at the data area instead of the index area. The chance of the length of a file's data being a multiple of the block size is small (for e.g. a one byte file consumes one whole block).

Assuming an even distribution, which it isn't. I'll run a test and report back. I'm betting that it's more or less evenly distributed but with a peak at non-prime numbers and a very large peak at the exact sector size, due to for example fixed size binary structures and memory mapped files.

It fits in one message!

Posted: **Sat Jan 14, 2006 8:45 am**

That's looking good thusfar Brendan! Nice job.

I have some thoughts to add:

1. It looks like it is a conscious decision to have only one date for a file. Perhaps it would be a good idea to have a create date as well? (I don't care for the access date which Windows also has, for example). That might be a piece of crucial information you want to give the receiver.
2.
For the directory entry type, all other fields in the index entry except for the entry name must be ignored when the entry is read and filled with zeros when the entry is being written
Shouldn't a directory have a (create, see point 1) modify date as well? I assume it can be renamed?
3.
For the deleted directory entry type and the delete file type, all other fields are the same as they would be for the directory type and the file type. This allows a file or directly to be marked as deleted by changing the entry type, and restored (or "un-deleted") by restoring the entry type.
Perhaps it would be wise to include guidelines/recommendations/algorithms in the spec to advice people that when a new file get's added or an existing file expanded the list of deleted files gets checked. If it intersects with a deleted file that entry in the index should be set to 0 (unused).

This overcomes the problem when un-deleting a file that you get garbage back. You can't do this at un-delete since you could've added/updated another file and undone that operation in the meantime. That would update the sectors but not necessarely indicate so in the index.

Assuming that most drivers can probably maintain the index (or at least a sector list) in memory this should be fairly easy to implement.

I wouldn't go as far as making it a requirement of the spec. Just a recommendation to make the experience for the end user as pleasant as possible.

Of course this doesn't apply to deleted directory entries. However we may add some guidelines/recommendations for those as well. Specifically about when creating now index entries what to do with deleted entries. Keep them? If so, for how long?

Any undelete algorithm should also make sure that the proper directorie entries get undeleted as well. An undelete action has some very similar impacts as my next point (4)!!
4. Keep in mind that renaming a directory is an "expensive" operation. Since you also need to rename all (including all sub directories) all files and directories under that directory.

This can cause a very dangerous problem I think. And that's one of concurrency. If I rename the directory entry and I haven't yet been able to rename all of the files / directories under it and another process / thread reads the file system it is corrupt (basically).

A solution could be to collect all blocks / index entries in memory
that are effected, update them there and write them out as
fast as possible. However, a "deadlock" could still happen.

Of course these are problems of the driver that implements this filesystem. But I strongly vote we address the issue in guidelines/recommendations/algorithms. Sample drivers should correctly implement this as well.

Such an implementation could have an internal flag for a directory rename with three states:

0 -> no rename in progress
1 -> rename in progress (in memory)
2 -> changes are being written to disk

A second rename or any other index change must be queued if the flags is anything else than zero. Any read from the index must also be queued when the flag is 2 (unless the complete index is also in memory and it can service the request).

It might be wise to have the driver implement undelete functionality as well (see my point 3). This to make sure it happens correctly and the updates don't intervere with other operations going on (same as directory rename)

I expect most of these points to be especially important when it is used on a harddisk or as a boot platform for the Operating System (as hinted at by the spec). Then there could be lots of reads and writes going on all the time.

The joys of filesystem design!

Posted: **Sat Jan 14, 2006 8:57 am**

As a side note another perhaps possible way to implement a directory rename would be to do this:

1. create a new directory entry with the new name

2. update all entries below the old directory entry so that it points to the new one

3. delete the old entry

This has the advantage that the driver code should be far less "complicated" to rename a directory. The downside is that if another process / thread is reading it could see files in the old directory before they are "moved".

That could lead to problems if an application opens a file, keeps the path, and then later wants to do some additional things with that path.

Thinking a bit more about these a rename might even need to block all access. Even if the changes are done in memory first. Otherwise another process could still get the old path.

So in my previous points reading should be queued with boths steps 1 and 2, I think.

Hmmmm

Posted: **Sun Jan 15, 2006 6:08 am**

Wasn't there a reply under my posts about single / multi-
threading?

Anyway, what about a volume label? I think Windows needs
it. Not 100% sure if it is really required though.

OSDev.org

Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System