Universal File System

JoeKayzA

Re:Universal File System

Post by JoeKayzA »

Brendan wrote: In this case we'd need to check if the directory entry exists before creating a file in a directory, but this is normal for any file system. It'd use up some extra space in the "index area", but it does make much more sense - thanks for this JoeKayzA :)
You are welcome. ;)

Candy wrote: What about making them in 64-byte intervals with an 8-bit "size byte" at the front, indicating how many 64-byte blocks are used? Microsoft does this as well in their index.dat files (and I can't recall a patent on that ... :)).
When the index area grows upside-down (as Brendan suggested, and I agree that this will make things easier), we'd have to provide that size-byte at the end of an entry as well. IMO, 103 characters for the filename and path should be sufficient for most human-generated names, but could easily be exceeded by system-related, more complex structures. So if we go for that limitation, we clearly make it unsuitable for those purposes (like using it as the root fs for a Linux system... ;D ).
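A minimal sketch of the "size byte at both ends" idea, so the index can be walked from the last entry back to the first. The entry contents are not defined anywhere in this thread, so `pack_entry`, `walk_backwards` and the zero-padded payload are purely hypothetical; only the 64-byte granularity and the 8-bit block count come from the posts above.

```python
BLOCK = 64  # entry granularity suggested in the thread


def pack_entry(payload: bytes) -> bytes:
    """Pad payload to whole 64-byte blocks, with the block count at both ends."""
    # Ceiling division; +2 accounts for the leading and trailing count bytes.
    blocks = -(-(len(payload) + 2) // BLOCK)
    if blocks > 255:
        raise ValueError("entry too large for an 8-bit block count")
    body = payload.ljust(blocks * BLOCK - 2, b"\0")
    return bytes([blocks]) + body + bytes([blocks])


def walk_backwards(index: bytes):
    """Yield entry payloads (zero padding stripped) from the last entry to the first."""
    end = len(index)
    while end > 0:
        blocks = index[end - 1]            # trailing count byte
        if blocks == 0:
            raise ValueError("corrupt entry: zero block count")
        start = end - blocks * BLOCK
        if start < 0 or index[start] != blocks:
            raise ValueError("corrupt entry: count bytes disagree")
        yield index[start + 1:end - 1].rstrip(b"\0")
        end = start


# Hypothetical payloads: just the names, no other metadata.
index = pack_entry(b"readme.txt") + pack_entry(b"x" * 100)
names = list(walk_backwards(index))
```

The trailing count byte is what lets a reader step backwards without parsing the entry; the leading one serves the forward direction and doubles as a cheap consistency check.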

Maybe we could make 'long file names'-support ( ;) ) optional too, along with the fragmented files support?

cheers Joe
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re:Universal File System

Post by Brendan »

Hi,
Candy wrote:I was going for 1/(2^64)th of a second as unit for my OS, with crudeness as rdtsc will allow me to do, especially for these cases. Although I can't see any such problem with file transfers, as they really don't have any need for such precision. Hence my vote for seconds, since people usually (I do for instance) don't care about subsecond differences. You can't properly switch any medium within a second.
The problem isn't so much the transfer times (the chance of 2 computers having their clocks synchronized accurately enough isn't high anyway), but the differences between timestamps, e.g. is the file "foo" older than the file "bar" when both files are on the simple FS (or have been copied from the simple FS)?
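To make the ordering problem concrete, here is a small sketch comparing two files written within the same second. The tick rate of 65536 per second is an assumption picked only for illustration; no resolution has been agreed in this thread.

```python
from fractions import Fraction


def to_ticks(seconds: Fraction, ticks_per_second: int) -> int:
    """Truncate an exact time to a whole number of ticks."""
    return int(seconds * ticks_per_second)


# Two hypothetical files created half a second apart.
foo_created = Fraction(1000) + Fraction(1, 4)   # 1000.25 s
bar_created = Fraction(1000) + Fraction(3, 4)   # 1000.75 s

# One-second resolution: both timestamps collapse to the same value.
coarse = (to_ticks(foo_created, 1), to_ticks(bar_created, 1))

# Sub-second resolution (assumed 1/65536 s): the order is preserved.
fine = (to_ticks(foo_created, 65536), to_ticks(bar_created, 65536))
```

With whole seconds a tool like make cannot tell which of the two files is newer; with the finer unit it can.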
Brendan wrote:What if we define two levels of support, with level 1 being default with only static allocation (no fragmentation) and level 2 allowing fragmentation? That way you could promote level 1 as common exchange format (and I'll slave at making all drivers as fast as possible, so they offer an advantage to the common user) and level 2 as common storage format (for more advanced things such as your personal USB stick section etc).
That would depend on what you mean by "fragmentation". All file system code would support free space fragmentation (with varying complexity in the algorithms and optimizations used in each implementation). Fragmentation of files is completely different - the file system could be extended or improved to be more efficient or give better performance, but this would (IMHO) be beyond the scope of "level 1", and a "level 2" would be appropriate if anyone wished to add support for fragmentation of files (as it'd involve breaking the implied simplicity of "level 1").
Candy wrote:Why wait a full week? If everybody (or at least some amount) agree to the current idea, fix it up and publish the draft, we'll comment again and it'll be done before 5PM.
I'm currently over-committed. I've stated that I'll do a "bug fix" release of my project, which is now overdue by several days. I've also implied that I'll add support for hard drives and CD-ROMs to the Bochs BIOS I'm working on before the end of the month, and haven't started it. Other than that, it's almost 1 AM here and I'd estimate a formal draft would take me half a day.

Waiting a week gives time for the FS idea to settle in my mind while I clear some of the backlog. It also allows time for improvements or alternatives to be suggested, for additional design flaws to be found and for details I've skipped to be considered, like "endian-ness", reserved characters in file names, volume labels, some sort of "media changed" detection mechanism (for crappy floppy drives), and anything else that I may have failed to think about so far.

I guess there's also the question of finding a name for the thing...


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Candy
Member
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Universal File System

Post by Candy »

JoeKayzA wrote: When the index area grows upside-down (as Brendan suggested, and I agree that this will make things easier), we'd have to provide that size-byte at the end of an entry as well.
Hm... that's still a quite trivial thing to do, since you're iterating them anyway. You could turn it around too, put the index at the begin and the files at the end.
Kemp

Re:Universal File System

Post by Kemp »

I haven't contributed yet, but this seems like a good idea so I might as well put my 2c in.
for details I've skipped to be considered, like "endian-ness", reserved characters in file names, volume labels, some sort of "media changed" detection mechanism (for crappy floppy drives)
I like Intel-endian ;D I'm guessing most of our OSes are based on Intel/AMD x86/x64 architectures so it'd probably work out best that way. Reserved characters aren't too hard really, just the ones you use for directory separators and useful ones for the system when doing searching, so pretty much '\' or '/' (or both) and *, possibly also ? for simple pattern matching Windows style.

Quick point... Deleting files. How is that handled? I know windows replaces the first character of the name with a ! (IIRC), so if you take a version of that method then that'd be one more reserved character.

Edit:
Hmmm.... how does that first character thing even work? The previously occupied space still wouldn't be marked as free, so you'd have to check for free space in the places pointed to by deleted file entries as well as in areas marked as free. Of course, I'm still thinking in terms of FAT, so this might not matter at all in this design.
Dex4u

Re:Universal File System

Post by Dex4u »

Maybe you could take a look at the v2os file system, as it's similar to the ideas here and comes with an open licence.
http://v2os.v2.nl/old/
Go to FAQ, v2_FS.
Crazed123

Re:Universal File System

Post by Crazed123 »

Brendan wrote: Hi,

Now that Microsoft have patented the FAT file system, there is no simple file system that can be used for transferring data easily between computers, and OS developers who are using FAT are screwed, or rather may be sued if Microsoft get bored.

It seems obvious that a common file format is needed to replace FAT. Such a file system should be extremely simple to implement and be licenced under a non-restrictive licence (e.g. BSD licence or public domain). File security isn't needed, performance doesn't matter much, and long file names (hopefully in UTF8) would be nice.

IMHO existing file systems (ext2, ext3, reiser, etc) are too complex for this - they support advanced features that aren't needed (file permissions, file owner/group, sparse files, journalling, etc).

Anyway, the idea would be to develop an extremely simple standard file system that all OS developers could use, then try to get someone to write Linux and Windows support for it, and hope that other manufacturers (flash memory, digital cameras, etc) will eventually adopt it.

Any ideas?


Cheers,

Brendan
I dunno. On My Pet Operating System there will be extremely important Capability Security that will require a special filesystem feature to store the capabilities. I think this project is impossible, because forcing my OS to interact with an insecure Universal File System could make its safety choke and die.

And what about Unununium and CapROS? They use Orthogonal Persistence and don't even have filesystems! How will they use the Universal File System?
</sarcasm>
Brendan

Re:Universal File System

Post by Brendan »

Hi,
Kemp wrote:Reserved characters aren't too hard really, just the ones you use for directory separators and useful ones for the system when doing searching, so pretty much '\' or '/' (or both) and *, possibly also ? for simple pattern matching Windows style.
There's also whitespace and control characters - like tab, newline, linefeed, etc which probably shouldn't be allowed. Unicode has normal spaces and "non-break" spaces, where I'd prefer to allow non-break spaces but not allow normal spaces (normal spaces would be silently replaced by non-break spaces). The file system can also assume that text is written left to right unless the file name contains the special Unicode code to reverse the direction. If an OS can't display some characters in a file name, then it should display a '?' (rather than any other type of "can't be displayed" symbol) so that a command line user can type the file name as they see it and the OS's pattern matching would still find the file.
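A rough sketch of those naming rules. `normalise_name` and `display_name` are made-up helpers, and using the Unicode "Cc" category to detect control characters is an assumption rather than anything decided in this thread.

```python
import unicodedata

NBSP = "\u00a0"  # Unicode NO-BREAK SPACE


def normalise_name(name: str) -> str:
    """Reject control characters and silently turn normal spaces into non-break spaces."""
    for ch in name:
        if unicodedata.category(ch) == "Cc":   # tab, newline, carriage return, ...
            raise ValueError(f"control character {ch!r} not allowed in file names")
    return name.replace(" ", NBSP)


def display_name(name: str, can_display) -> str:
    """Show '?' in place of every character the OS cannot display."""
    return "".join(ch if can_display(ch) else "?" for ch in name)


stored = normalise_name("my file.txt")
shown = display_name("你好.c", str.isascii)   # pretend the console is ASCII-only
```

Because '?' is also the single-character wildcard, typing the displayed name back in as a pattern still matches the real file.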
Kemp wrote:Quick point... Deleting files. How is that handled? I know windows replaces the first character of the name with a ! (IIRC), so if you take a version of that method then that'd be one more reserved character.
Originally, when a file was deleted all that happened was that its length was set to zero. I don't like this now as it doesn't allow for zero length files. Instead I'm thinking that length = -2 could mean a directory entry (also with trailing slash) and length = -1 could mean deleted. This should work (a deleted directory would contain a trailing slash while a deleted file wouldn't).
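As a sketch, the sentinel scheme could be decoded like this. The signed 64-bit little-endian length field is an assumption (neither the field width nor the endianness has been settled yet), and `classify` is a hypothetical helper.

```python
import struct

DIRECTORY = -2   # length value proposed for directory entries
DELETED = -1     # length value proposed for deleted entries


def classify(length_field: bytes, name: str) -> str:
    """Classify an index entry from its length field and name."""
    (length,) = struct.unpack("<q", length_field)   # assumed: signed 64-bit, little-endian
    if length == DIRECTORY:
        return "directory"
    if length == DELETED:
        # A deleted directory keeps its trailing slash; a deleted file has none.
        return "deleted directory" if name.endswith("/") else "deleted file"
    if length < 0:
        raise ValueError(f"unknown sentinel {length}")
    return "file"   # includes zero-length files, which the old scheme could not represent
```

For example, `classify(struct.pack("<q", 0), "empty.txt")` is an ordinary file, while the same name with length -1 is a deleted file.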
Kemp wrote: Hmmm.... how does that first character thing even work? The previously occupied space still wouldn't be marked as free, so you'd have to check for free space in the places pointed to by deleted file entries as well as in areas marked as free. Of course, I'm still thinking in terms of FAT, so this might not matter at all in this design.
For FAT, I think they also change the file attributes and mark blocks in the FAT as unused.

For the simple file system, there is no FAT (it's "fat-free" ;) ). Free space is anywhere that a file isn't. This means to find free space between files you'd need to scan the "index area" (which would be slow), but there's also values in the first sector which can be used to find the boundaries of the free space between the index area and data area quickly.
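A minimal sketch of "free space is anywhere that a file isn't": given the extents collected by scanning the index area, the gaps fall out of one sorted pass. The `(start, length)` tuples in blocks and the `data_start`/`data_end` bounds are assumptions for illustration, not part of any draft.

```python
def free_gaps(extents, data_start, data_end):
    """Return (start, length) gaps, in blocks, not covered by any live file.

    extents: (start, length) pairs for live files, as found by scanning the index.
    """
    gaps = []
    cursor = data_start
    for start, length in sorted(extents):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < data_end:
        gaps.append((cursor, data_end - cursor))
    return gaps


files = [(10, 5), (2, 3), (20, 4)]           # hypothetical files, in index order
gaps = free_gaps(files, data_start=2, data_end=30)
```

The last gap is the large one between the data area and the index area, which the values in the first sector would let you find without the scan.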

If I've got time I'll draw up a "preliminary draft" today, including a full explanation of everything in this thread and pseudo-code for common operations.


@Dex4U: V2_FS does seem fairly similar, but I see a few problems. Firstly, the sector size is always 512 bytes, which isn't so good for CD-ROMs or non-standard floppy formats. The length of a file is expressed in "number of sectors" rather than "number of bytes", which can create problems (e.g. up to 511 random characters at the end of text files). The "FileType" field would need some form of controlling body to ensure that different people don't use the same identifiers for different types of files. Also the documentation is more of a description than a formal specification.
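The "up to 511 random characters" figure follows directly from rounding a byte length up to whole 512-byte sectors, as this small sketch shows (`trailing_junk` is a hypothetical helper):

```python
SECTOR = 512  # fixed sector size


def trailing_junk(length_in_bytes: int) -> int:
    """Bytes of padding a reader sees if only the sector count is stored."""
    sectors = -(-length_in_bytes // SECTOR)   # ceiling division
    return sectors * SECTOR - length_in_bytes


worst = trailing_junk(1)      # a 1-byte text file still occupies a full sector
exact = trailing_junk(512)    # a file that exactly fills a sector has no padding
```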


Cheers,

Brendan
Candy

Re:Universal File System

Post by Candy »

Brendan wrote: If I've got time I'll draw up a "preliminary draft" today, including a full explanation of everything in this thread and pseudo-code for common operations.
I've been messing with linux drivers all last week and for the occasion I've been sifting through some filesystem drivers, to find out which is easiest to adapt. There's a filesystem called BFS (not BeFS) that is also based on non-fragmented allocation and so on. It's very small (printed it and it was around 15 pages, which is awfully small for a physical filesystem) and it's very clear for Linux filesystem drivers (have compared it with UFS, ext2 and FAT, which are all a mess). You might want to check it out first, it seems to be very similar to what we are doing. If it isn't going to suffice in its current state, I'll make a linux driver from it.
Brendan

Re:Universal File System

Post by Brendan »

Hi,
Candy wrote:I've been messing with linux drivers all last week and for the occasion I've been sifting through some filesystem drivers, to find out which is easiest to adapt. There's a filesystem called BFS (not BeFS) that is also based on non-fragmented allocation and so on. It's very small (printed it and it was around 15 pages, which is awfully small for a physical filesystem) and it's very clear for Linux filesystem drivers (have compared it with UFS, ext2 and FAT, which are all a mess). You might want to check it out first, it seems to be very similar to what we are doing. If it isn't going to suffice in its current state, I'll make a linux driver from it.
One of the problems with UTF8 is that characters are in groups according to the type/s of language/s that use them, and the encoding means that a character consumes between 1 and 4 bytes. This means that for English 20 characters take 20 bytes, for Byelorussian 20 characters typically take 40 bytes, for Chinese 20 characters typically take 60 bytes, etc.

BFS is limited to 14 byte file names and doesn't support directories. This works out to 4 Chinese characters or 7 Byelorussian characters, and is too limited to do "pretend" directories (i.e. by prepending the path to the file's name).
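The byte counts are easy to check. In this sketch the sample strings are my own picks, and the 14-byte limit is the figure quoted above for BFS:

```python
def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits(name: str, limit: int = 14) -> bool:
    """True if the UTF-8 encoded name fits in a fixed-size name field."""
    return utf8_len(name) <= limit


samples = {
    "english": "hello",         # 1 byte per character
    "cyrillic": "прывітанне",   # 2 bytes per character
    "chinese": "你好世界",       # 3 bytes per character
}
sizes = {lang: utf8_len(text) for lang, text in samples.items()}
```

Four Chinese characters just fit in 14 bytes; a fifth does not, and ten Cyrillic letters are already well over.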

The entire file system is also limited to 4 GB, which isn't important at the moment, but may become a problem in the future (e.g. storing a few high definition videos or something).

Also (and I may be completely wrong here), the super block is stored at the very beginning of the disk and the first 32 bits (at offset 0) are used for a magic number. This makes it difficult to use on bootable 80x86 media, where the BIOS is hard coded to jump to the start of the first sector (which is where the magic number would be). BFS is designed for booting, so perhaps I've overlooked something.

Maybe I'm just too fussy...


Cheers,

Brendan
Candy

Re:Universal File System

Post by Candy »

Brendan wrote: One of the problems with UTF8 is that characters are in groups according to the type/s of language/s that use them, and the encoding means that a character consumes between 1 and 4 bytes. This means that for English 20 characters take 20 bytes, for Byelorussian 20 characters typically take 40 bytes, for Chinese 20 characters typically take 60 bytes, etc.

BFS is limited to 14 byte file names and doesn't support directories. This works out to 4 Chinese characters or 7 Byelorussian characters, and is too limited to do "pretend" directories (i.e. by prepending the path to the file's name).

The entire file system is also limited to 4 GB, which isn't important at the moment, but may become a problem in the future (e.g. storing a few high definition videos or something).

Also (and I may be completely wrong here), the super block is stored at the very beginning of the disk and the first 32 bits (at offset 0) are used for a magic number. This makes it difficult to use on bootable 80x86 media, where the BIOS is hard coded to jump to the start of the first sector (which is where the magic number would be). BFS is designed for booting, so perhaps I've overlooked something.
That's pretty much limiting enough for it in its current form to be unusable. However, the driver for it is clear and similar to what we're trying to do, so I'm hereby volunteering to make a linux driver for our FS. Makes testing your OS a lot easier, when you have a simple filesystem you can write to (esp. if it isn't patented from here to japan and back). Also, it makes for practice for my own fs in linux (which'll be quite a lot harder to make).
Maybe I'm just too fussy...
Surely not. Still, compared to other drivers and filesystems (procfs, cramfs, ext2, fat, romfs, ramfs) it's the closest I could find, and it's pretty clean. It was more about either using the fs itself (which I didn't know up to now, so that's a no-no) or using the driver as basis for our own.

Then there's the point of a driver for Windows and one for MacOS X. I might be able to do the Windows one in some time, but I can't make a MacOS driver. The Linux one will be PD, the Windows one as far as I can make it also, and I would prefer the MacOS one to be PD as well. That way the filesystem is entirely open and free in every sense, so people can use it just as easily as FAT (sometimes more easily). I'd also prefer if somebody wrote an OS-independent version in multiple languages (I'll help translate to some), among which at least pure C and assembly, since these are used by quite a few people doing OSdev who are not specifically going to try to support it.

As a final point, we'll need a public website that carries the message and content, publication in technical magazines (write to Linux Magazine, Slashdot etc.), and we could use some active approaching of flash disk device manufacturers for support of the file system.

I've noticed the short name SFS is still free. It could be used for StarFS, but I think we'll stick with the long one, so it could be used for this one. Two suggestions: Slim (as opposed to FAT) or SimpleFS (or just SFS).
Rob

Re:Universal File System

Post by Rob »

Brendan wrote: If an OS can't display some characters in a file name, then it should display a '?' (rather than any other type of "can't be displayed" symbol) so that a command line user can type the file name as they see it and the OS's pattern matching would still find the file.
What happens if you have two or more files with unknown characters in identical places? In other words, the original files were two (or more) different files, but when shown with ? characters they all appear the same. This is not really a problem if you have a file dialog or some directory browser, but if you have a command line, how will it know which of the files you meant?
Candy

Re:Universal File System

Post by Candy »

Rob wrote:
Brendan wrote: If an OS can't display some characters in a file name, then it should display a '?' (rather than any other type of "can't be displayed" symbol) so that a command line user can type the file name as they see it and the OS's pattern matching would still find the file.
What happens if you have two or more files with unknown characters in identical places? In other words, the original files were two (or more) different files, but when shown with ? characters they all appear the same. This is not really a problem if you have a file dialog or some directory browser, but if you have a command line, how will it know which of the files you meant?
That's a pretty good point. If you get a bunch of files all with filenames written in a language you don't understand, you'll only see ?'s. That way, any file with an equal amount of characters will be equal. That's pretty likely.

What about \x1234 escape sequences? They can't be confused.
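A quick sketch comparing the two display schemes. `mask`, `escape` and `collisions` are hypothetical helpers, "cannot display" is approximated as "not ASCII", and the escape is assumed to be unambiguous only if '\' is itself a reserved character and every code point fits in four hex digits.

```python
from collections import defaultdict


def mask(name: str) -> str:
    """Show '?' for every character that cannot be displayed."""
    return "".join(ch if ch.isascii() else "?" for ch in name)


def escape(name: str) -> str:
    """Show a \\xNNNN escape for every character that cannot be displayed."""
    return "".join(ch if ch.isascii() else f"\\x{ord(ch):04x}" for ch in name)


def collisions(names, render):
    """Map each displayed name to the real names behind it, where more than one."""
    groups = defaultdict(list)
    for name in names:
        groups[render(name)].append(name)
    return {shown: real for shown, real in groups.items() if len(real) > 1}


names = ["你好.c", "世界.c", "Makefile"]
masked_clashes = collisions(names, mask)       # both .c files are shown as "??.c"
escaped_clashes = collisions(names, escape)    # every escaped name stays distinct
```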
User avatar
Pype.Clicker
Member
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away
Contact:

Re:Universal File System

Post by Pype.Clicker »

What about \x1234 escape sequences? They can't be confused.
And have "Défense de Thèse" displayed as "D\x234fense de Th\x233se" ? I'd prefer "Defense de These" if I had the choice, or yes, even "D?fense de Th?se" which at least would be directly reusable (since '?' would be single-letter jokers).

"????" would leave me as clueless as "你好世界" anyway ...

Isn't that more a matter of the user interface anyway? You might want to cut & paste the output of the console (where the terminal might be smart enough to copy the _real_ characters rather than their on-screen representation).

The 'LS' program could also detect that the locale settings says you're unlikely to parse chinese characters and display

[001] 你好世界
[002] 你好.c
[003] Makefile

or if you prefer
[001] ????
[002] ??.c
[003] Makefile

so that you can just type "gcc [002]" instead of trying to retrieve that 'chinese hello'.c, or "gcc 你好.c" or even "gcc ??.c" :P
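A sketch of how such a numbered listing could be resolved back to real names. `listing` and `resolve` are made up for illustration; a real shell would have to remember the most recent listing somewhere.

```python
import re


def listing(names):
    """Format names the way the hypothetical 'LS' above would print them."""
    return [f"[{i:03d}] {name}" for i, name in enumerate(names, start=1)]


def resolve(arg: str, names):
    """Turn an argument like '[002]' into the real file name; pass anything else through."""
    match = re.fullmatch(r"\[(\d+)\]", arg)
    if match is None:
        return arg
    number = int(match.group(1))
    if not 1 <= number <= len(names):
        raise LookupError(f"no entry {arg} in the last listing")
    return names[number - 1]


names = ["你好世界", "你好.c", "Makefile"]
lines = listing(names)
target = resolve("[002]", names)   # what "gcc [002]" would compile
```

This would also mean '[' and ']' need care in real file names, or a literal file called "[002]" could never be named directly.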

<note> this post was powered by BabelFish </note>
RetainSoftware

Re:Universal File System

Post by RetainSoftware »

I thought the universal file system was supposed to be simple and basic, and now we're talking about UTF8 and different languages. In my opinion, choose only English in the beginning (even with UTF8 if so needed) and add other languages later, in version 1.1 or so.
User avatar
df
Member
Member
Posts: 1076
Joined: Fri Oct 22, 2004 11:00 pm
Contact:

Re:Universal File System

Post by df »

I'd rather make some changes to minix v1fs, which is just as simple as FAT, if not simpler: extend v1fs to support larger file sizes/partitions....

minix v2fs is more complex than needed. I have not looked at 'v3' fs yet.
-- Stu --