Universal File System
Hi,
Now that Microsoft have patented the FAT file system, there is no simple file system that can be used for transferring data easily between computers, and OS developers who are using FAT may be sued if Microsoft get bored.
It seems obvious that a common file format is needed to replace FAT. Such a file system should be extremely simple to implement and be licenced under a non-restrictive licence (e.g. BSD licence or public domain). File security isn't needed, performance doesn't matter much, and long file names (hopefully in UTF8) would be nice.
IMHO existing file systems (ext2, ext3, reiser, etc) are too complex for this - they support advanced features that aren't needed (file permissions, file owner/group, sparse files, journalling, etc).
Anyway, the idea would be to develop an extremely simple standard file system that all OS developers could use, then try to get someone to write Linux and Windows support for it, and hope that other manufacturers (flash memory, digital cameras, etc) will eventually adopt it.
Any ideas?
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re:Universal File System
Brendan wrote: Any ideas?
Take our *FS project and strip it to the bare bones: no multiple slice / section support (just one section) and plain file support only (leave out the timestamping etc., although you might want to add that). The scheme I published here a while ago can easily be adjusted for that goal. I'll write up a proposal if you want me to, but it'll have to wait until the weekend of 21 January or possibly the weekend after that.
- kataklinger
- Member
- Posts: 381
- Joined: Fri Nov 04, 2005 12:00 am
- Location: Serbia
Re:Universal File System
Did Microsoft sue Linux developers for supporting NTFS? There are a lot of companies that MS can sue before they start on hobby OSes.
I'll help anyway. Like Candy said we could use some well known FS and strip all advanced things.
Re:Universal File System
Brendan wrote: Now that Microsoft have patented the FAT file system, there is no simple file system that can be used for transferring data easily between computers...
AFAIK, there are ext2fs drivers for Windows readily available. I also think that implementing an ext2fs driver for YourOS shouldn't be too hard, with it being as well-documented as it is. (Easier in any case than designing your own FS, plus drivers for Windows, Linux, *BSD and all the other OSes out there to make it viable for data interchange...)
Just IMHO.
Every good solution is obvious once you've found it.
Re:Universal File System
Hi,
Solar wrote: AFAIK, there are ext2fs drivers for Windows readily available. I also think that implementing an ext2fs driver for YourOS shouldn't be too hard, with it being as well-documented as it is. (Easier in any way than designing your own FS, and drivers for Windows, Linux, *BSD and all the other OS out there to make it viable for data interchange...)
I'm talking simple - so simple that a camel stuck in the desert with nothing to use but a dry stick could implement it, so simple that if you threw a handful of little magnets at the fridge and rubbed a floppy drive head over it, it'd boot into a small single tasking OS, so simple that, well, you get the idea...
For example, the first sector would contain 5 values:
- number of bytes per sector (8 bit)
- the total number of sectors (64 bit)
- the offset for first unused "index entry" (64 bit)
- the logical sector number for first used "data sector" (64 bit)
- the number of reserved sectors
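As a rough illustration of how small the first-sector parser could be, here is a minimal Python sketch. The post does not fix a byte order, field packing, or a width for the "reserved sectors" field, so the little-endian packed layout, the 32 bit reserved-sector count, and all names below are assumptions for illustration only.

```python
import struct
from typing import NamedTuple

# Assumed on-disk layout: little-endian, fields packed in the order listed,
# no padding. The 32 bit width of "reserved sectors" is a guess; the post
# does not give one.
SUPERBLOCK = struct.Struct("<BQQQI")


class Superblock(NamedTuple):
    sector_size_code: int         # raw 8 bit "bytes per sector" value
    total_sectors: int            # 64 bit
    first_free_index_offset: int  # 64 bit offset of first unused index entry
    first_data_sector: int        # 64 bit logical sector of first used data sector
    reserved_sectors: int         # width assumed


def parse_superblock(sector: bytes) -> Superblock:
    """Decode the five values from the start of the first sector."""
    return Superblock(*SUPERBLOCK.unpack_from(sector, 0))


def build_superblock(sb: Superblock, sector_bytes: int = 512) -> bytes:
    """Encode the five values and zero-pad to a full sector."""
    return SUPERBLOCK.pack(*sb).ljust(sector_bytes, b"\0")
```

The 8 bit sector-size value is kept raw here, since the list above does not say how it maps to a byte count.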
Immediately following the reserved sectors would be the "index" portion of it. The index portion contains any number of 256 byte entries, with the following format:
- a 64 bit logical starting sector for the file (zero if entry not used)
- a 64 bit logical ending sector for the file
- a 64 bit file length (in bytes)
- the zero terminated file name in UTF8 (consuming the rest of the entry)
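The entry format above could be packed and unpacked roughly like this. This is only a sketch: the little-endian byte order and the function and class names are assumptions, not part of the proposal.

```python
import struct
from typing import NamedTuple, Optional

ENTRY_SIZE = 256
HEADER = struct.Struct("<QQQ")         # start sector, end sector, length (byte order assumed)
NAME_BYTES = ENTRY_SIZE - HEADER.size  # 232 bytes, including the terminating zero


class IndexEntry(NamedTuple):
    start_sector: int
    end_sector: int
    length: int  # file length in bytes
    name: str    # full path, e.g. "boot/kernel.bin"


def pack_entry(entry: IndexEntry) -> bytes:
    """Encode one 256 byte index entry."""
    raw_name = entry.name.encode("utf-8")
    if b"\0" in raw_name or len(raw_name) > NAME_BYTES - 1:
        raise ValueError("file name does not fit in a 256 byte entry")
    header = HEADER.pack(entry.start_sector, entry.end_sector, entry.length)
    return header + raw_name.ljust(NAME_BYTES, b"\0")


def unpack_entry(raw: bytes) -> Optional[IndexEntry]:
    """Decode one entry; returns None when the starting sector is zero (unused)."""
    start, end, length = HEADER.unpack_from(raw, 0)
    if start == 0:
        return None
    name = raw[HEADER.size:ENTRY_SIZE].split(b"\0", 1)[0].decode("utf-8")
    return IndexEntry(start, end, length, name)
```

Because every entry has the same size, entry number i simply lives at byte offset i * 256 within the index area.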
The "index area" is followed by unused/free space. File data is stored from the end of the disk to the end of this unused/free space.
This gives the following limits:
- file names must be 231 characters or less (including any parent directory names)
- the index area is limited to 4 GB in size (or 16777216 files)
- the data area is limited to "sector_size * 16777216 TB" (less whatever is used by the index area)
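The arithmetic behind these limits can be checked in a few lines of Python. The constants are taken from the post; reading "TB" as 2^40 bytes is an assumption, as are the variable names.

```python
ENTRY_SIZE = 256     # bytes per index entry
HEADER_BYTES = 3 * 8 # three 64 bit fields

# The name field holds the remaining bytes minus the terminating zero.
max_name_bytes = ENTRY_SIZE - HEADER_BYTES - 1

# A 4 GB index area divided into 256 byte entries.
INDEX_AREA_LIMIT = 4 * 2**30
max_files = INDEX_AREA_LIMIT // ENTRY_SIZE

# 64 bit sector numbers address 2**64 sectors; expressed in units of 2**40
# sectors this is the multiplier in "sector_size * 16777216 TB".
data_limit_multiplier = 2**64 // 2**40

print(max_name_bytes)         # 231
print(max_files)              # 16777216
print(data_limit_multiplier)  # 16777216
```

Note that 231 is a byte count. Since names are stored as UTF-8, a name containing non-ASCII characters fits fewer than 231 characters.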
Well, that's a start maybe - I can't get it much simpler, and at least it gives people some idea of how "advanced" I think the file system should be...
Cheers,
Brendan
Re:Universal File System
I can only see a few very minor things:
- Empty directories can't exist, so how do you create a directory and a file in it, if the intermediate state doesn't exist?
- No timestamps. If anywhere, transferable media are the place where timestamps are a very good thing. They helped my girlfriend figure out when a given document was written just last night.
- The aforementioned free-space fragmentation. If you intend it to be used as semi-permanent storage, for instance for photos, it's a problem: the camera would have to defragment or use memory inefficiently. Also, file names that long are very uncommon iirc; you might just as well shrink the entries to 64 bytes, leaving 40 bytes for the file name. Even that allows pretty long names. Although, on second thought, if you include the directory prefix, make them 128 bytes, still leaving 104 bytes for it.
Of course, concerning FAT I'm going to be the arse that just implements it and ignores the rest. Yes, that's a liability. No, I don't care.
Re:Universal File System
Candy wrote: - Empty directories can't exist, so how do you create a directory and a file in it, if the intermediate state doesn't exist?
My proposal:
Create an index entry with start_sector, end_sector and file_length = 0, then use a filename with a trailing slash, like "emptydir/". In the same way you could also implement empty files, just strip the trailing slash then (although empty files are pretty pointless without timestamps, permissions or extended attributes...).
There is a set of rules though that we'll have to make up: Is an empty directory entry mandatory, even when there are some files in the directory? If not, and the user deletes all files from a given one, you'll explicitly have to create an empty directory, because he/she hasn't yet deleted it.
I like the idea of encoding the file path directly in its name, though. I didn't think that you can make it _that_ simple.
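To show what this convention means for a driver, here is a small Python sketch that lists one directory by scanning all names in the index. It assumes names use "/" as the separator and that directories (empty or not) appear as entries ending in "/"; the function name is made up for illustration.

```python
from typing import Iterable, List


def list_directory(names: Iterable[str], directory: str) -> List[str]:
    """Return the direct children of `directory`.

    `directory` is "" for the root, otherwise a path ending in "/".
    Subdirectories are returned with their trailing slash.
    """
    children = set()
    for name in names:
        if name == directory or not name.startswith(directory):
            continue
        rest = name[len(directory):]
        head, sep, _ = rest.partition("/")
        children.add(head + sep)  # sep is "/" for anything below a subdirectory
    return sorted(children)


names = ["boot/", "boot/boot.bin", "boot/grub/", "boot/grub/menu.lst",
         "emptydir/", "readme.txt"]
print(list_directory(names, ""))           # ['boot/', 'emptydir/', 'readme.txt']
print(list_directory(names, "boot/"))      # ['boot.bin', 'grub/']
print(list_directory(names, "emptydir/"))  # []
```

Every listing is a linear scan over the whole index, which is the price of having no directory structure on disk.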
cheers Joe
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
- Contact:
Re:Universal File System
Brendan wrote: For example, the first sector would contain 5 values:
- number of bytes per sector (8 bit)
- the total number of sectors (64 bit)
- the offset for first unused "index entry" (64 bit)
- the logical sector number for first used "data sector" (64 bit)
- the number of reserved sectors
bytes-per-sector on 8 bits? you know neither 512 nor 4096 fits in 8 bits, so i guess you forgot to mention by what factor that value should be multiplied.
Brendan wrote: Immediately following the reserved sectors would be the "index" portion of it. The index portion contains any number of 256 byte entries, with the following format:
- a 64 bit logical starting sector for the file (zero if entry not used)
- a 64 bit logical ending sector for the file
- a 64 bit file length (in bytes)
- the zero terminated file name in UTF8 (consuming the rest of the entry)
I'd be tempted to allow packing of two 128-byte index entries in a 256-byte index entry or two 64-byte entries in one 128-byte entry, etc.
Brendan wrote: There are no directories or anything - instead these are built into the file name.
that means if i *do* want to list boot/*, i still have to scan all the files ... well, i guess that's not a big problem ...
Brendan wrote: Files can never be fragmented, but free space would become fragmented. This would only occur when files are deleted or data is appended to them. The OS can deal with free space fragmentation by ignoring it (wasting space) or by doing time consuming copying. This shouldn't matter as it's only meant to be used for transferring files between systems, rather than as a normal file system.
That's a major problem, imho. I'd be happier with something stating "index entries point to the first and the last 'extents' of the file". Extents are runs of contiguous sectors that are part of the file. If needed (for fragmented stuff), the first sector of an extent could be used to list more extents of the file.
Of course, requiring 512 bytes for a single-extent file will sound like an excessive waste of space, which suggests grouping all the 'extents' information in a single area of the disk.
Re:Universal File System
Hi,
Candy wrote: I can only see a few very minor things:
- Empty directories can't exist, so how do you create a directory and a file in it, if the intermediate state doesn't exist?
I agree with JoeKayzA here - zero length files with a trailing slash would work fine, and we could make separate directory entries a requirement to avoid the need to check if a directory becomes empty when a file is deleted.
In this case we'd need to check if the directory entry exists before creating a file in a directory, but this is normal for any file system. It'd use up some extra space in the "index area", but it does make much more sense - thanks for this JoeKayzA
Candy wrote: - No timestamps. If anywhere, transferable media are the place where timestamps are a very good thing. They helped my girlfriend figure out when a given document was written just last night.
Timestamps would be easy to add, but deciding the format for them could be a little tricky - I prefer "signed 64 bit ms since 1/1/2000", but other OSs do it differently.
Candy wrote: - The aforementioned free-space fragmentation. If you intend it to be used as semi-permanent storage, for instance for photos, it's a problem: the camera would have to defragment or use memory inefficiently.
Devices that use flash memory could copy data relatively quickly (as compared to floppies for example). The easiest way would be to shift all data when a file is deleted, but more complex methods could be implemented instead. If all files are the same size (likely for a digital camera) then a new file/photo would fit perfectly in the hole left by deleting a file/photo.
Candy wrote: Also, file names that long are very uncommon iirc; you might just as well shrink the entries to 64 bytes, leaving 40 bytes for the file name. Even that allows pretty long names. Although, on second thought, if you include the directory prefix, make them 128 bytes, still leaving 104 bytes for it.
I agree - it'd be more efficient, and 104 bytes (or 103 characters) is still plenty long enough.
The other problem I'm seeing is appending data to files. It'd be better if the data area was at the start of the file system (just after the reserved sectors) growing upwards, with the index area at the end of the file system growing downwards.
Candy wrote: Of course, concerning FAT I'm going to be the arse that just implements it and ignores the rest. Yes, that's a liability. No, I don't care.
I can afford to wait. My boot code uses raw disk sectors without any file system, and it'll take a while before I need to worry about sharing files with other OSs.
Other people aren't as fortunate, and IMHO it'd be nice if the world stopped relying on Microsoft's proprietary formats...
Cheers,
Brendan
Re:Universal File System
Brendan wrote: I agree with JoeKayzA here - zero length files with a trailing slash would work fine, and we could make separate directory entries a requirement to avoid the need to check if a directory becomes empty when a file is deleted. In this case we'd need to check if the directory entry exists before creating a file in a directory, but this is normal for any file system. It'd use up some extra space in the "index area", but it does make much more sense - thanks for this JoeKayzA
Agreed upon (as in, I vote in favor).
Brendan wrote: Timestamps would be easy to add, but deciding the format for them could be a little tricky - I prefer "signed 64 bit ms since 1/1/2000", but other OSs do it differently.
It wouldn't really matter; as long as you can determine seconds from them it's ok. Milliseconds, seconds, 100-nanosecond intervals, whatever takes your fancy. In terms of getting people to use something they know, my vote is for unix timestamps (seconds since 1/1/1970), since most people at the level of filesystems know and recognise them.
Brendan wrote: Devices that use flash memory could copy data relatively quickly (as compared to floppies for example). The easiest way would be to shift all data when a file is deleted, but more complex methods could be implemented instead. If all files are the same size (likely for a digital camera) then a new file/photo would fit perfectly in the hole left by deleting a file/photo.
Flash devices can only take a limited number of writes, since flash memory goes bad relatively quickly (compared to floppies, which just go bad anyway, or hard disks, which can be written until the mounting thing fails or for a number of years). They need fragmentation, although to start with I don't consider it a very important topic. For plain exchange, it suffices.
Brendan wrote: I agree - it'd be more efficient, and 104 bytes (or 103 characters) is still plenty long enough.
What about making them multiples of 64 bytes, with an 8-bit size field at the front indicating how many 64-byte blocks are used? Microsoft does this as well in their index.dat files (and I can't recall a patent on that ... ).
Brendan wrote: The other problem I'm seeing is appending data to files. It'd be better if the data area was at the start of the file system (just after the reserved sectors) growing upwards, with the index area at the end of the file system growing downwards.
That would only help a single file. Pathological test case: a program with two files open, writing a byte to each alternately, for 2 GB each.
Fragmentation is good.
Brendan wrote: I can afford to wait. My boot code uses raw disk sectors without any file system, and it'll take a while before I need to worry about sharing files with other OSs. Other people aren't as fortunate, and IMHO it'd be nice if the world stopped relying on Microsoft's proprietary formats...
So very much agreed. Busy with linux device drivers atm so I think it'd be no problem figuring out how to make Linux recognise it.
I'll try to implement it in atlantisos as soon as I get the need for *any* fs driver.
Re:Universal File System
Hi,
Pype.Clicker wrote: bytes-per-sector on 8 bits ? you know neither 512 nor 4096 fits 8 bit, so i guess you forgot to mention by what factor that value should be multiplied.
You're right - I was thinking of "2^(N + 7)", so N = 0 would be 128 bytes per sector, N = 1 would be 256 bytes per sector, N = 2 would be 512 bytes per sector and N = 255 would be incredibly huge. This is the same format used by the floppy drive controller, and the same format used by DOS's "BIOS parameter block".
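The 2^(N + 7) encoding is a one-liner in each direction. A minimal Python sketch, with function names made up for illustration:

```python
def sector_size_from_code(n: int) -> int:
    """Bytes per sector for an 8 bit code N, i.e. 2 ** (N + 7)."""
    if not 0 <= n <= 255:
        raise ValueError("code must fit in 8 bits")
    return 1 << (n + 7)


def code_from_sector_size(size: int) -> int:
    """Inverse mapping; only powers of two from 128 upwards are encodable."""
    if size < 128 or size & (size - 1):
        raise ValueError("sector size must be a power of two >= 128")
    return size.bit_length() - 8


print(sector_size_from_code(0))     # 128
print(sector_size_from_code(2))     # 512
print(code_from_sector_size(4096))  # 5
```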
Pype.Clicker wrote: I'd be tempted to allow packing of two 128-byte index entries in a 256-byte index entry or two 64 bytes entries in one 128-byte entry, etc.
I'm not sure that the additional complexity would be worth the space savings and/or extra file name length. Entries that are always the same length are easier for camels to scratch into the sand with their sticks.
Pype.Clicker wrote: (Brendan wrote: Files can never be fragmented, but free space would become fragmented. This would only occur when files are deleted or data is appended to them. The OS can deal with free space fragmentation by ignoring it (wasting space) or by doing time consuming copying. This shouldn't matter as it's only meant to be used for transferring files between systems, rather than as a normal file system.) That's a major problem, imho. I'd be more happy with something stating "index entries points towards the first and the last 'extents' of the file". Extents are runs of contiguous sectors that are part of the file. If needed (for fragmented stuff), the first sector of an extent could be used to list more extents of the file.
I wouldn't consider it a major problem, but it is a hassle. Of course it is only really meant for file transfers, and isn't designed for high performance.
For example, consider how I use floppies now - I format the floppy, dump files on it, shift it to another computer and copy the files to the other computer's file system. I don't actually modify or change the files on the floppy, but do occasionally add some extra files.
For this file system, the code to add files can be very simple and defragging free space can be optional. The file system code could be more complex (i.e. defragging free space on the fly or to make room for more files when needed), but doesn't need to be. Even if the user (occasionally) has to copy the files, format the disk and then copy the files back, then it's still not much of a problem.
IMHO if we were to add the complexity of managing extents, then it'd probably make more sense to implement "ext2"...
Cheers,
Brendan
Re:Universal File System
Hi,
Candy wrote: It wouldn't matter really, as long as you can determine seconds from them it's ok. milliseconds, seconds, 100-nanosecond-intervals, whatever takes your fancy. For terms of getting people to use it as something they know, my vote is on unix timestamps (seconds since 1/1/1970) since most people at the level of filesystems know them and recognise them.
I have a problem with "seconds since", as it can be too inaccurate for some things - I had a problem with this for my "System Build Utility", which resulted in reduced efficiency (same html files being generated multiple times) as the problem couldn't be resolved on Gentoo/ReiserFS.
Using "signed 64 bit ms since 1/1/1970" would be a good compromise though - for my OS it'd mean adding a constant, and for *nix it'd mean dividing by a constant - simple in both cases.
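Both conversions can be sketched in a few lines of Python. This assumes the epochs are midnight UTC on 1/1/1970 and 1/1/2000 (the post does not name a time zone), and the function names are made up for illustration.

```python
from datetime import datetime, timezone

MS_PER_SECOND = 1000

# Milliseconds between 1970-01-01 and 2000-01-01, both at 00:00 UTC (assumed).
EPOCH_2000_OFFSET_MS = (
    int(datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp()) * MS_PER_SECOND
)


def fs_ms_to_unix_seconds(ms: int) -> int:
    """*nix side: divide by a constant (floor division, so -1 ms maps to -1 s)."""
    return ms // MS_PER_SECOND


def unix_seconds_to_fs_ms(seconds: int) -> int:
    return seconds * MS_PER_SECOND


def fs_ms_to_ms_since_2000(ms: int) -> int:
    """An OS counting from 1/1/2000 only shifts by a constant."""
    return ms - EPOCH_2000_OFFSET_MS


def ms_since_2000_to_fs_ms(ms: int) -> int:
    return ms + EPOCH_2000_OFFSET_MS


print(EPOCH_2000_OFFSET_MS)                  # 946684800000
print(fs_ms_to_unix_seconds(946684800123))   # 946684800
print(fs_ms_to_ms_since_2000(946684800123))  # 123
```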
Candy wrote: Devices on flash have limited writes for retries, since flash memory goes bad relatively quickly (as compared to floppies which just go bad anyway or harddisks which can be written until the mounting thing fails or a number of years). They need fragmentation, although for a start I can consider it not a very important topic. For plain exchange, it suffices.
It would be possible to write complex file system code to manage free space fragmentation more efficiently (and do other things, like caching and sorting the "index area", or deliberately adding free space at the end of some files to make appending data easier/faster). This should all be optional though - someone who wants to quickly add support for any file system could skip it.
Candy wrote: (Brendan wrote: Other people aren't as fortunate, and IMHO it'd be nice if the world stopped relying on Microsoft's proprietary formats...) So very much agreed. Busy with linux device drivers atm so I think it'd be no problem figuring out how to make Linux recognise it. I'll try to implement it in atlantisos as soon as I get the need for *any* fs driver.
I'll implement it as the very first file system my OS supports (when I get to needing any FS) - it'd be the easiest way to find design problems in my VFS code.
In the meantime, I'll wait a week or so for any other suggestions or improvements and then create a formal draft specification for it. After that I can write a generic "boot loader" for it - something that will find a file named "boot.bin", load it and jump to it.
Thanks,
Brendan
Re:Universal File System
Brendan wrote: I have a problem with "seconds since", as it can be too inaccurate for some things - I had a problem with this for my "System Build Utility", which resulted in reduced efficiency (same html files being generated multiple times) as the problem couldn't be resolved on Gentoo/ReiserFS. Using "signed 64 bit ms since 1/1/1970" would be a good compromise though - for my OS it'd mean adding a constant, and for *nix it'd mean dividing by a constant - simple in both cases.
I was going for 1/(2^64)th of a second as the unit for my OS, with whatever granularity rdtsc allows me, especially for these cases. I can't see any such problem with file transfers though, as they really don't need that kind of precision. Hence my vote for seconds, since people usually (I do, for instance) don't care about subsecond differences. You can't properly switch any medium within a second.
Brendan wrote: It would be possible to write complex file system code to manage free space fragmentation more efficiently (and do other things, like caching and sorting the "index area", or deliberately adding free space at the end of some files to make appending data easier/faster). This should all be optional though - someone who wants to quickly add support for any file system could skip it.
What if we define two levels of support, with level 1 being default with only static allocation (no fragmentation) and level 2 allowing fragmentation? That way you could promote level 1 as common exchange format (and I'll slave at making all drivers as fast as possible, so they offer an advantage to the common user) and level 2 as common storage format (for more advanced things such as your personal USB stick section etc).
Brendan wrote: In the meantime, I'll wait a week or so for any other suggestions or improvements and then create a formal draft specification for it. After that I can write a generic "boot loader" for it - something that will find a file named "boot.bin", load it and jump to it.
Why wait a full week? If everybody (or at least a fair number of people) agrees with the current idea, fix it up and publish the draft; we'll comment again and it'll be done before 5PM.
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
- Contact:
Re:Universal File System
Brendan wrote: I have a problem with "seconds since", as it can be too inaccurate for some things - I had a problem with this for my "System Build Utility", which resulted in reduced efficiency (same html files being generated multiple times) as the problem couldn't be resolved on Gentoo/ReiserFS.
i'd say that's a matter for the OS or the VFS to cope with such inconsistency, even if the filesystem itself cannot. I mean, if you need <1s precision on the build date, you could have your VFS cache the ultraprecise timing (converted into nanoseconds or whatever) of the most recently modified files so that you can resolve race conditions. You might also make your build system aware of the filesystem restriction and have it 'post-date' or 'ante-date' the files so that everything remains consistent.
... now, i suppose you won't be using UFS for precision timing anyway, will you ?