Universal File System
Posted: Tue Jan 10, 2006 11:49 pm
by Brendan
Hi,
Now that Microsoft have patented the FAT file system, there is no simple file system that can be used for transferring data easily between computers, and OS developers who are using FAT are screwed (may be sued if Microsoft get bored).
It seems obvious that a common file format is needed to replace FAT. Such a file system should be extremely simple to implement and be licenced under a non-restrictive licence (e.g. BSD licence or public domain). File security isn't needed, performance doesn't matter much, and long file names (hopefully in UTF8) would be nice.
IMHO existing file systems (ext2, ext3, reiser, etc) are too complex for this - they support advanced features that aren't needed (file permissions, file owner/group, sparse files, journalling, etc).
Anyway, the idea would be to develop an extremely simple standard file system that all OS developers could use, then try to get someone to write Linux and Windows support for it, and hope that other manufacturers (flash memory, digital cameras, etc) will eventually adopt it.
Any ideas?
Cheers,
Brendan
Re:Universal File System
Posted: Wed Jan 11, 2006 2:40 am
by dushara
Sounds good. I'm happy to join in such an effort.
Re:Universal File System
Posted: Wed Jan 11, 2006 2:44 am
by Candy
Brendan wrote:
Any ideas?
Take our *FS project and strip it to the bare bones, with no multiple slice / section support (just one section), plain file support (leave out the timestamping etc, although you might want to add that). The scheme I published here a while ago can be adjusted for that goal easily. I'll write up a prop if you want me to, but it'll have to wait until the weekend of 21 january or possibly the weekend after that.
Re:Universal File System
Posted: Wed Jan 11, 2006 3:11 am
by kataklinger
Did Microsoft sue Linux developers for supporting NTFS? There are a lot of companies which can be sued by MS before they begin with hobby OSes.
I'll help anyway. Like Candy said we could use some well known FS and strip all advanced things.
Re:Universal File System
Posted: Wed Jan 11, 2006 3:12 am
by Solar
AFAIK, there are ext2fs drivers for Windows readily available. I also think that implementing an ext2fs driver for YourOS shouldn't be too hard, with it being as well-documented as it is. (Easier in any way than designing your own FS, and drivers for Windows, Linux, *BSD and all the other OS out there to make it viable for data interchange...)
Just IMHO.
Re:Universal File System
Posted: Wed Jan 11, 2006 4:30 am
by Brendan
Hi,
Solar wrote:AFAIK, there are ext2fs drivers for Windows readily available. I also think that implementing an ext2fs driver for YourOS shouldn't be too hard, with it being as well-documented as it is. (Easier in any way than designing your own FS, and drivers for Windows, Linux, *BSD and all the other OS out there to make it viable for data interchange...)
I'm talking simple - so simple that a camel stuck in the desert with nothing to use but a dry stick could implement it, so simple that if you threw a handful of little magnets at the fridge and rubbed a floppy drive head over it it'd boot into a small single tasking OS, so simple that, well you get the idea...
For example, the first sector would contain 5 values:
- number of bytes per sector (8 bit)
- the total number of sectors (64 bit)
- the offset for first unused "index entry" (64 bit)
- the logical sector number for first used "data sector" (64 bit)
- the number of reserved sectors
The "reserved sectors" would be at the beginning, and used for compatibility (partition table, BIOS parameter block, whatever) and/or boot sector/s.
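As a rough sketch of how a driver might read those five values: the post does not fix byte order, field offsets, or the width of the reserved-sector count, so the little-endian packed layout below and the 64-bit reserved count are assumptions, and the names `SUPERBLOCK` and `parse_superblock` are made up for illustration. The 8-bit sector size is kept as a raw code because its encoding is not defined at this point.

```python
import struct

# Assumed layout (NOT fixed by the post): little-endian, packed from byte 0,
# reserved-sector count stored as 64 bits.
SUPERBLOCK = struct.Struct("<BQQQQ")


def parse_superblock(sector: bytes) -> dict:
    """Decode the five header values from the first sector."""
    (sector_size_code, total_sectors, first_unused_index_offset,
     first_data_sector, reserved_sectors) = SUPERBLOCK.unpack_from(sector, 0)
    return {
        "sector_size_code": sector_size_code,  # raw 8-bit value
        "total_sectors": total_sectors,
        "first_unused_index_offset": first_unused_index_offset,
        "first_data_sector": first_data_sector,
        "reserved_sectors": reserved_sectors,
    }
```

With this layout the header needs only 33 bytes, so it fits in any realistic sector size.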
Immediately following the reserved sectors would be the "index" portion of it. The index portion contains any number of 256 byte entries, with the following format:
- a 64 bit logical starting sector for the file (zero if entry not used)
- a 64 bit logical ending sector for the file
- a 64 bit file length (in bytes)
- the zero terminated file name in UTF8 (consuming the rest of the entry)
There are no directories or anything - instead these are built into the file name. For example, the file name "foo/bar.txt" would be stored directly as the file name (instead of one entry containing "foo" and another entry containing "bar.txt"). All file names that begin with "foo/" are therefore considered part of the same directory.
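A minimal sketch of the 256 byte entry and of "directories as name prefixes" could look like this. Little-endian byte order and all the names here (`pack_entry`, `unpack_entry`, `list_directory`) are assumptions for illustration; only the field order and sizes come from the description above.

```python
import struct

ENTRY_SIZE = 256
HEADER = struct.Struct("<QQQ")            # start sector, end sector, length in bytes
NAME_MAX = ENTRY_SIZE - HEADER.size - 1   # room left for the name, minus the terminator


def pack_entry(start: int, end: int, length: int, name: str) -> bytes:
    """Build one index entry; the name is zero padded to fill the entry."""
    raw = name.encode("utf-8")
    if len(raw) > NAME_MAX:
        raise ValueError("file name too long")
    return HEADER.pack(start, end, length) + raw.ljust(ENTRY_SIZE - HEADER.size, b"\0")


def unpack_entry(entry: bytes):
    """Return (start, end, length, name) for one 256 byte entry."""
    start, end, length = HEADER.unpack_from(entry, 0)
    name = entry[HEADER.size:ENTRY_SIZE].split(b"\0", 1)[0].decode("utf-8")
    return start, end, length, name


def list_directory(index: bytes, prefix: str) -> list:
    """Scan every entry and keep used ones whose name starts with prefix."""
    names = []
    for off in range(0, len(index) - ENTRY_SIZE + 1, ENTRY_SIZE):
        start, _end, _length, name = unpack_entry(index[off:off + ENTRY_SIZE])
        if start != 0 and name.startswith(prefix):
            names.append(name)
    return names
```

Listing "foo/" is a linear scan of the whole index, which is the price of having no directory structures at all.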
The "index area" is followed by unused/free space. File data is stored at the end of the disk, growing downwards towards this unused/free space.
This gives the following limits:
- file names must be 231 characters or less (including any parent directory names)
- the index area is limited to 4 GB in size (or 16777216 files)
- the data area is limited to "sector_size * 16777216 TB" (less whatever is used by the index area)
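The three limits can be checked with a few lines of arithmetic. This assumes "GB" and "TB" mean 2^30 and 2^40 bytes, and note that the 231 figure is really bytes of UTF8, so names with multi-byte characters get fewer characters.

```python
ENTRY_SIZE = 256          # bytes per index entry
FIXED_FIELDS = 3 * 8      # three 64 bit fields

# Name space left in an entry, minus one byte for the zero terminator.
max_name_bytes = ENTRY_SIZE - FIXED_FIELDS - 1        # 231

# Number of entries that fit in a 4 GB index area.
max_entries = (4 * 2**30) // ENTRY_SIZE               # 16777216

# 2^64 sectors, expressed as "TB per byte of sector size".
tb_per_sector_byte = 2**64 // 2**40                   # 16777216
```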
Files can never be fragmented, but free space would become fragmented. This would only occur when files are deleted or data is appended to them. The OS can deal with free space fragmentation by ignoring it (wasting space) or by doing time consuming copying. This shouldn't matter as it's only meant to be used for transferring files between systems, rather than as a normal file system.
Well, that's a start maybe - I can't get it much simpler, and at least it gives people some idea of how "advanced" I think the file system should be...
Cheers,
Brendan
Re:Universal File System
Posted: Wed Jan 11, 2006 4:45 am
by Candy
I can only see a few very minor things:
- Empty directories can't exist, so how do you create a directory and a file in it, if the intermediate state doesn't exist?
- No timestamps. If anywhere, transferable media are the place where timestamps are a very good thing. They helped my girlfriend figure out when a given document was written last night.
- Aforementioned space fragmentation. If you intend it to be used as semi-permanent storage for, for instance, photos, it's a problem. The camera would have to defragment or use memory inefficiently. Also, those long file names are very uncommon iirc, you might just as well shrink them to 64-byte entries leaving 40 bytes for the file name. Even that allows pretty long names. Although, on second thought, if you include the directory prefix, make them 128 bytes, still leaving 104 bytes for it.
Of course, concerning FAT I'm going to be the arse that just implements it and ignores the rest. Yes, that's a liability. No, I don't care.
Re:Universal File System
Posted: Wed Jan 11, 2006 5:03 am
by JoeKayzA
Candy wrote:
- Empty directories can't exist, so how do you create a directory and a file in it, if the intermediate state doesn't exist?
My proposal:
Create an index entry with start_sector, end_sector and file_length = 0, then use a filename with a trailing slash, like "emptydir/". In the same way you could also implement empty files, just strip the trailing slash then (although empty files are pretty pointless without timestamps, permissions or extended attributes...).
There is a set of rules though that we'll have to make up: Is an empty directory entry mandatory, even when there are some files in the directory? If not, and the user deletes all files from a given one, you'll explicitly have to create an empty directory, because he/she hasn't yet deleted it.
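One wrinkle worth spelling out: a start sector of zero was earlier the marker for "entry not used", so an all-zero empty directory needs some other way to be told apart from a free slot. The sketch below assumes a free slot also has an empty name; that rule is an assumption, not something settled in the thread, and `classify` is a made-up helper name.

```python
def classify(start: int, end: int, length: int, name: str) -> str:
    """Classify one index entry under the trailing-slash convention."""
    if start == 0 and name == "":
        return "unused"        # assumed rule: free slots carry no name
    if name.endswith("/"):
        return "directory"     # e.g. "emptydir/" with all-zero sector fields
    if length == 0:
        return "empty file"    # same entry without the trailing slash
    return "file"
```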
I like the idea of encoding the file path directly in its name, though. I didn't think that you can make it _that_ simple.
cheers Joe
Re:Universal File System
Posted: Wed Jan 11, 2006 5:21 am
by Pype.Clicker
Brendan wrote:
For example, the first sector would contain 5 values:
- number of bytes per sector (8 bit)
- the total number of sectors (64 bit)
- the offset for first unused "index entry" (64 bit)
- the logical sector number for first used "data sector" (64 bit)
- the number of reserved sectors
bytes-per-sector on 8 bits ? you know neither 512 nor 4096 fits 8 bit, so i guess you forgot to mention by what factor that value should be multiplied.
Immediately following the reserved sectors would be the "index" portion of it. The index portion contains any number of 256 byte entries, with the following format:
I'd be tempted to allow packing of two 128-byte index entries in a 256-byte index entry or two 64 bytes entries in one 128-byte entry, etc.
- a 64 bit logical starting sector for the file (zero if entry not used)
- a 64 bit logical ending sector for the file
- a 64 bit file length (in bytes)
- the zero terminated file name in UTF8 (consuming the rest of the entry)
There are no directories or anything - instead these are built into the file name.
that means if i *do* want to list boot/*, i still have to scan all the files ... well, i guess that's not a big problem ...
Files can never be fragmented, but free space would become fragmented. This would only occur when files are deleted or data is appended to them. The OS can deal with free space fragmentation by ignoring it (wasting space) or by doing time consuming copying. This shouldn't matter as it's only meant to be used for transferring files between systems, rather than as a normal file system.
That's a major problem, imho. I'd be more happy with something stating "index entries points towards the first and the last 'extents' of the file". Extents are runs of contiguous sectors that are part of the file. If needed (for fragmented stuff), the first sector of an extent could be used to list more extents of the file.
Of course, requiring 512 bytes for a single-extent file will sound like an excessive waste of space, which suggests grouping all the 'extents' information in a single area of the disk.
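To make the extent idea concrete, here is a small sketch of how a read would locate a sector once a file is described as a list of runs. The in-memory representation (a list of `(first_sector, sector_count)` pairs in file order) and the 512 byte default are assumptions; how the runs are stored on disk is left open, as in the post.

```python
def sector_for_offset(extents, offset, sector_size=512):
    """Map a byte offset in a file to the disk sector holding it.

    extents: list of (first_sector, sector_count) runs, in file order.
    """
    index = offset // sector_size      # sector number within the file
    for first, count in extents:
        if index < count:
            return first + index       # falls inside this run
        index -= count                 # skip the whole run
    raise ValueError("offset past end of file")
```

A single-extent file degenerates to one pair, which is exactly the start/end information the fixed-size index entry already carries.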
Re:Universal File System
Posted: Wed Jan 11, 2006 5:24 am
by Brendan
Hi,
Candy wrote:I can only see a few very minor things:
- Empty directories can't exist, so how do you create a directory and a file in it, if the intermediate state doesn't exist?
I agree with JoeKayzA here - zero length files with a trailing slash would work fine, and we could make separate directory entries a requirement to avoid the need to check if a directory becomes empty when a file is deleted.
In this case we'd need to check if the directory entry exists before creating a file in a directory, but this is normal for any file system. It'd use up some extra space in the "index area", but it does make much more sense - thanks for this JoeKayzA
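The check before creating a file could be sketched roughly like this, assuming the names already present in the index have been collected into a set. `parent_directory` and `can_create` are hypothetical helper names, and treating the root as always present is an assumption.

```python
def parent_directory(path: str) -> str:
    """Return the directory entry name a path lives in ("" for the root)."""
    trimmed = path[:-1] if path.endswith("/") else path
    head, sep, _tail = trimmed.rpartition("/")
    return head + sep


def can_create(existing_names: set, path: str) -> bool:
    """A new entry needs a free name and an existing parent directory entry."""
    parent = parent_directory(path)
    return path not in existing_names and (parent == "" or parent in existing_names)
```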
Candy wrote:- No timestamps. If anywhere, transferable media are the place where timestamps are a very good thing. They helped my girlfriend figure out when a given document was written last night.
Timestamps would be easy to add, but deciding the format for them could be a little tricky - I prefer "signed 64 bit mS since 1/1/2000", but other OSs do it differently.
Candy wrote:- Aforementioned space fragmentation. If you intend it to be used as semi-permanent storage for, for instance, photos, it's a problem. The camera would have to defragment or use memory inefficiently.
Devices that use flash memory could copy data relatively quickly (as compared to floppies for example). The easiest way would be to shift all data when a file is deleted, but more complex methods could be implemented instead. If all files are the same size (likely for a digital camera) then a new file/photo would fit perfectly in the hole left by deleting a file/photo.
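The "shift all data when a file is deleted" approach might be planned like this, for the layout where data is packed against the end of the disk. The in-memory tuples and the function name `compact` are assumptions; the sketch only works out the new positions and the list of sector moves, it does not touch a disk.

```python
def compact(files, total_sectors):
    """Repack files against the end of the disk after a deletion.

    files: list of (name, start, end) with inclusive sector ranges.
    Returns (new_files, moves); each move is (old_start, new_start, count).
    """
    next_end = total_sectors - 1
    packed, moves = [], []
    # Walk from the highest file downwards so each file only moves up.
    for name, start, end in sorted(files, key=lambda f: f[1], reverse=True):
        count = end - start + 1
        new_start = next_end - count + 1
        if new_start != start:
            # Source and target may overlap: copy last sector first.
            moves.append((start, new_start, count))
        packed.append((name, new_start, next_end))
        next_end = new_start - 1
    return packed, moves
```

When every file has the same size, a new file can simply reuse the hole instead, and no moves are needed at all.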
Candy wrote:Also, those long file names are very uncommon iirc, you might just as well shrink them to 64-byte entries leaving 40 bytes for the file name. Even that allows pretty long names. Although, on second thought, if you include the directory prefix, make them 128 bytes, still leaving 104 bytes for it.
I agree - it'd be more efficient, and 104 bytes (or 103 characters) is still plenty long enough.
The other problem I'm seeing is appending data to files. It'd be better if the data area was at the start of the file system (just after the reserved sectors) growing upwards, with the index area at the end of the file system growing downwards.
Candy wrote:Of course, concerning FAT I'm going to be the arse that just implements it and ignores the rest. Yes, that's a liability. No, I don't care.
I can afford to wait. My boot code uses raw disk sectors without any file system, and it'll take a while before I need to worry about sharing files with other OSs.
Other people aren't as fortunate, and IMHO it'd be nice if the world stopped relying on Microsoft's proprietary formats...
Cheers,
Brendan
Re:Universal File System
Posted: Wed Jan 11, 2006 5:35 am
by Candy
Brendan wrote:
I agree with JoeKayzA here - zero length files with a trailing slash would work fine, and we could make separate directory entries a requirement to avoid the need to check if a directory becomes empty when a file is deleted.
In this case we'd need to check if the directory entry exists before creating a file in a directory, but this is normal for any file system. It'd use up some extra space in the "index area", but it does make much more sense - thanks for this JoeKayzA
Agreed upon (as in, I vote in favor).
Timestamps would be easy to add, but deciding the format for them could be a little tricky - I prefer "signed 64 bit mS since 1/1/2000", but other OSs do it differently.
It wouldn't matter really, as long as you can determine seconds from them it's ok. milliseconds, seconds, 100-nanosecond-intervals, whatever takes your fancy. For terms of getting people to use it as something they know, my vote is on unix timestamps (seconds since 1/1/1970) since most people at the level of filesystems know them and recognise them.
Devices that use flash memory could copy data relatively quickly (as compared to floppies for example). The easiest way would be to shift all data when a file is deleted, but more complex methods could be implemented instead. If all files are the same size (likely for a digital camera) then a new file/photo would fit perfectly in the hole left by deleting a file/photo.
Devices on flash have limited writes for retries, since flash memory goes bad relatively quickly (as compared to floppies which just go bad anyway or harddisks which can be written until the mounting thing fails or a number of years). They need fragmentation, although for a start I can consider it not a very important topic. For plain exchange, it suffices.
I agree - it'd be more efficient, and 104 bytes (or 103 characters) is still plenty long enough.
What about making them in 64-byte intervals with an 8-bit "size bit" at the front, indicating how many 64-byte blocks were used? Microsoft does this as well in their index.dat files (and I can't recall a patent on that ...).
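For a feel of what the 64-byte scheme costs per name, the block count can be computed as below. The fixed part of the entry is assumed to be the size byte plus the three 64 bit fields from the earlier format; that layout and the name `blocks_needed` are illustrative only.

```python
BLOCK = 64
FIXED = 1 + 3 * 8   # assumed: size byte + start, end and length fields


def blocks_needed(name: str) -> int:
    """Number of 64-byte blocks an entry with this name would occupy."""
    payload = FIXED + len(name.encode("utf-8")) + 1   # +1 for the zero terminator
    blocks = -(-payload // BLOCK)                     # ceiling division
    if blocks > 255:
        raise ValueError("entry does not fit an 8-bit block count")
    return blocks
```

Under these assumptions a name of up to 38 bytes fits in a single block, and each further block adds 64 bytes of name.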
The other problem I'm seeing is appending data to files. It'd be better if the data area was at the start of the file system (just after the reserved sectors) growing upwards, with the index area at the end of the file system growing downwards.
Would only help a single file. Pathological test case: program having two files open, writing a byte to each alternatingly, for 2GB each.
Fragmentation is good.
I can afford to wait. My boot code uses raw disk sectors without any file system, and it'll take a while before I need to worry about sharing files with other OSs.
Other people aren't as fortunate, and IMHO it'd be nice if the world stopped relying on Microsoft's proprietary formats...
So very much agreed. Busy with linux device drivers atm so I think it'd be no problem figuring out how to make Linux recognise it.
I'll try to implement it in atlantisos as soon as I get the need for *any* fs driver.
Re:Universal File System
Posted: Wed Jan 11, 2006 5:52 am
by Brendan
Hi,
Pype.Clicker wrote:bytes-per-sector on 8 bits ? you know neither 512 nor 4096 fits 8 bit, so i guess you forgot to mention by what factor that value should be multiplied.
You're right - I was thinking of "2^(N + 7)", so N = 0 would be 128 bytes per sector, N = 1 would be 256 bytes per sector, N = 2 would be 512 bytes per sector and N = 255 would be incredibly huge. This is the same format used by the floppy drive controller, and the same format used by DOS's "BIOS parameter block".
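The encoding is easy to get off by one, so here is a small sketch of both directions. The function names are made up; the formula is the one given above.

```python
def sector_size(n: int) -> int:
    """Bytes per sector for an 8-bit code N: 2 ** (N + 7)."""
    if not 0 <= n <= 255:
        raise ValueError("code must fit in 8 bits")
    return 1 << (n + 7)


def sector_size_code(size: int) -> int:
    """Inverse mapping; rejects sizes that are not 128 * a power of two."""
    n = size.bit_length() - 8
    if n < 0 or n > 255 or size != 1 << (n + 7):
        raise ValueError("unsupported sector size")
    return n
```

So the common 512 and 4096 byte sectors are stored as 2 and 5.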
Pype.Clicker wrote:I'd be tempted to allow packing of two 128-byte index entries in a 256-byte index entry or two 64 bytes entries in one 128-byte entry, etc.
I'm not sure that the additional complexity would be worth the space savings and/or extra file name length. Entries that are always the same length are easier for camels to scratch into the sand with their sticks.
Pype.Clicker wrote:Files can never be fragmented, but free space would become fragmented. This would only occur when files are deleted or data is appended to them. The OS can deal with free space fragmentation by ignoring it (wasting space) or by doing time consuming copying. This shouldn't matter as it's only meant to be used for transferring files between systems, rather than as a normal file system.
That's a major problem, imho. I'd be more happy with something stating "index entries points towards the first and the last 'extents' of the file". Extents are runs of contiguous sectors that are part of the file. If needed (for fragmented stuff), the first sector of an extent could be used to list more extents of the file.
I wouldn't consider it a major problem, but it is a hassle. Of course it is only really meant for file transfers, and isn't designed for high performance.
For example, consider how I use floppies now - I format the floppy, dump files on it, shift it to another computer and copy the files to the other computer's file system. I don't actually modify or change the files on the floppy, but do occasionally add some extra files.
For this file system, the code to add files can be very simple and defragging free space can be optional. The file system code could be more complex (i.e. defragging free space on the fly or to make room for more files when needed), but doesn't need to be. Even if the user (occasionally) has to copy the files, format the disk and then copy the files back, then it's still not much of a problem.
IMHO if we were to add the complexity of managing extents, then it'd probably make more sense to implement "ext2"...
Cheers,
Brendan
Re:Universal File System
Posted: Wed Jan 11, 2006 6:49 am
by Brendan
Hi,
Candy wrote:It wouldn't matter really, as long as you can determine seconds from them it's ok. milliseconds, seconds, 100-nanosecond-intervals, whatever takes your fancy. For terms of getting people to use it as something they know, my vote is on unix timestamps (seconds since 1/1/1970) since most people at the level of filesystems know them and recognise them.
I have a problem with "seconds since", as it can be too inaccurate for some things - I had a problem with this for my "System Build Utility", which resulted in reduced efficiency (same html files being generated multiple times) as the problem couldn't be resolved on Gentoo/ReiserFS.
Using "signed 64 bit mS since 1/1/1970" would be a good compromise though - for my OS it'd mean adding a constant, and for *nix it'd mean dividing by a constant - simple in both cases.
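The two conversions mentioned here are a single add and a single divide. In this sketch the epochs are taken as midnight UTC (the thread does not say which time zone is meant), and the function names are illustrative.

```python
from datetime import datetime, timezone

# Milliseconds between 1/1/1970 and 1/1/2000, both at 00:00 UTC (assumed).
EPOCH_2000_MS = int(datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp()) * 1000


def ms2000_to_ms1970(t: int) -> int:
    """An OS counting mS since 1/1/2000 adds a constant."""
    return t + EPOCH_2000_MS


def ms1970_to_unix_seconds(t: int) -> int:
    """A *nix system divides by a constant (floor, so pre-1970 values round down)."""
    return t // 1000
```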
Candy wrote:Devices on flash have limited writes for retries, since flash memory goes bad relatively quickly (as compared to floppies which just go bad anyway or harddisks which can be written until the mounting thing fails or a number of years). They need fragmentation, although for a start I can consider it not a very important topic. For plain exchange, it suffices.
It would be possible to write complex file system code to manage free space fragmentation more efficiently (and do other things, like caching and sorting the "index area", or deliberately adding free space at the end of some files to make appending data easier/faster). This should all be optional though - someone who wants to quickly add support for any file system could skip it.
Candy wrote:Other people aren't as fortunate, and IMHO it'd be nice if the world stopped relying on Microsoft's proprietary formats...
So very much agreed. Busy with linux device drivers atm so I think it'd be no problem figuring out how to make Linux recognise it.
I'll try to implement it in atlantisos as soon as I get the need for *any* fs driver.
I'll implement it as the very first file system my OS supports (when I get to the need for any FS) - it'd be the easiest way to find design problems in my VFS code.
In the meantime, I'll wait a week or so for any other suggestions or improvements and then create a formal draft specification for it. After that I can write a generic "boot loader" for it - something that will find a file named "boot.bin", load it and jump to it.
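The lookup such a boot loader needs is just a linear scan of the index for one exact name. A real loader would be a few hundred bytes of assembly; the Python below only illustrates the logic, assuming the original 256 byte entry format, little-endian fields, and an index already read into memory. `find_file` is a made-up name.

```python
import struct

ENTRY_SIZE = 256
FIELDS = struct.Struct("<QQQ")   # start sector, end sector, length (byte order assumed)


def find_file(index: bytes, wanted: str):
    """Return (start, end, length) of the first used entry named wanted, or None."""
    target = wanted.encode("utf-8")
    for off in range(0, len(index) - ENTRY_SIZE + 1, ENTRY_SIZE):
        start, end, length = FIELDS.unpack_from(index, off)
        name = index[off + FIELDS.size:off + ENTRY_SIZE].split(b"\0", 1)[0]
        if start != 0 and name == target:
            return start, end, length
    return None
```

Because files are never fragmented, the result is all the loader needs: read sectors start through end into memory and jump.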
Thanks,
Brendan
Re:Universal File System
Posted: Wed Jan 11, 2006 7:09 am
by Candy
Brendan wrote:
I have a problem with "seconds since", as it can be too inaccurate for some things - I had a problem with this for my "System Build Utility", which resulted in reduced efficiency (same html files being generated multiple times) as the problem couldn't be resolved on Gentoo/ReiserFS.
Using "signed 64 bit mS since 1/1/1970" would be a good compromise though - for my OS it'd mean adding a constant, and for *nix it'd mean dividing by a constant - simple in both cases.
I was going for 1/(2^64)th of a second as unit for my OS, with crudeness as rdtsc will allow me to do, especially for these cases. Although I can't see any such problem with file transfers, as they really don't have any need for such precision. Hence my vote for seconds, since people usually (I do for instance) don't care about subsecond differences. You can't properly switch any medium within a second.
Brendan wrote:
It would be possible to write complex file system code to manage free space fragmentation more efficiently (and do other things, like caching and sorting the "index area", or deliberately adding free space at the end of some files to make appending data easier/faster). This should all be optional though - someone who wants to quickly add support for any file system could skip it.
What if we define two levels of support, with level 1 being default with only static allocation (no fragmentation) and level 2 allowing fragmentation? That way you could promote level 1 as common exchange format (and I'll slave at making all drivers as fast as possible, so they offer an advantage to the common user) and level 2 as common storage format (for more advanced things such as your personal USB stick section etc).
Brendan wrote:
In the meantime, I'll wait a week or so for any other suggestions or improvements and then create a formal draft specification for it. After that I can write a generic "boot loader" for it - something that will find a file named "boot.bin", load it and jump to it.
Why wait a full week? If everybody (or at least some amount) agree to the current idea, fix it up and publish the draft, we'll comment again and it'll be done before 5PM.
Re:Universal File System
Posted: Wed Jan 11, 2006 7:52 am
by Pype.Clicker
Brendan wrote:
I have a problem with "seconds since", as it can be too inaccurate for some things - I had a problem with this for my "System Build Utility", which resulted in reduced efficiency (same html files being generated multiple times) as the problem couldn't be resolved on Gentoo/ReiserFS.
i'd say that's a matter for the OS or the VFS to cope with such inconsistency even if the filesystem itself cannot. I mean, if you need <1s precision on the building date, you could have your VFS caching the ultraprecise timing (converted into nanoseconds or whatever) of the latest modified files so that you can resolve race conditions. You might also have your build system aware of the filesystem restriction and have it 'post-dating' or 'ante-dating' the files so that everything remains consistent.
... now, i suppose you won't be using UFS for precision timing anyway, will you ?