SimpleFS - Missing from WIKI and Brendan's site
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re:SimpleFS - Missing from WIKI and Brendan's site
So far I've only seen the hex numbers changed (as in the previous post), so I think it's just an updated version, and since nothing has seriously changed I think we can just go ahead with proofreading where we were.
One extra post of 'sorry for bothering you' wouldn't take too much space anyway.
And as far as I'm concerned, I have found nothing else really to complain about in the specs. In the meantime I got file writing and dir listings into some working state (i.e. it gives the right results amongst 100 lines of debug), and since I couldn't find anything contradictory yet I'll agree if he decides that this spec is final.
I hope to have a full implementation done by the end of the week, by which time I hopefully will have come across all (if any) leftover bugs.
Re:SimpleFS - Missing from WIKI and Brendan's site
Hi,
Habbit wrote: @Brendan: Er... I see the spec is changed, but is it finished? I mean, can we start reviewing this "second draft" or are you still making changes?
I added a "Notice" and made some changes regarding reserved areas and version numbers in the "Super-Block Format" section (correcting things that you and Combuster have pointed out recently), but I'm not really finished...
I still need to establish some way of ensuring consistency between directory entries and file names - specifying that directory entries must be present for all directories, or that they are optional when it can be determined that the directory exists from one or more file names, or that they must only be used for empty directories, etc.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Combuster
Re:SimpleFS - Missing from WIKI and Brendan's site
Brendan wrote: I still need to establish some way of ensuring consistency between directory entries and file names - specifying that directory entries must be present for all directories, or that they are optional when it can be determined that the directory exists from one or more file names, or that they must only be used for empty directories, etc.
I have designed a set of reading algorithms that are independent of the directory situation, so basically you can leave that architectural decision to the osdever (or the one calling mkfs.sfs).
The main difference is that implied directories are slower but smaller in storage, while having entries for all directories allows algorithms to run one order of magnitude faster. And to be honest, I don't think it's a good thing to enforce a particular size/speed decision on the end user, but that's my opinion.
Re:SimpleFS - Missing from WIKI and Brendan's site
Combuster wrote: One extra post of 'sorry for bothering you' wouldn't take too much space anyway.
If that was aimed at me ??? and you think I have been so extremely disrespectful, sorry for bothering you all, guys...
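On track again: I think there's something mismatched in the new spec: about the reserved area, it states:
Code:
The size of the reserved area can be determined from values in the super-block, and can be as small as 1 block, consume the entire media (except for the last 2 blocks), or anything in between.
However, the superblock layout assigns an 8-byte number to the total block count and just 4 bytes to the reserved blocks count. Not that it really matters - I don't think any partition with > 4 billion _blocks_ will ever be formatted with SimpleFS - but it's a conceptual rift in the spec.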
By the way, as modern and cool as 64-bit ints may seem next to 32-bit ones, I see no point in using them where they are not needed: "size of data area in blocks" or "total number of blocks" could do perfectly well with 32-bit numbers: 4 billion blocks * minimum block size of 256 bytes = 1 TiB. Reducing those fields to 32 bits would eliminate the discrepancy with the "reserved blocks" field and leave 8 unused bytes in the superblock for a possible future extension.
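The 1 TiB figure above can be checked with a few lines of Python. This is only a worked version of the arithmetic in the post; the constant names are made up for the illustration and do not come from the spec.

```python
# Worked check of "4 billion blocks * 256 bytes = 1 TiB".
# Names are illustrative only, not taken from the SFS spec.
MIN_BLOCK_SIZE = 256        # bytes, the minimum block size quoted in the post
MAX_BLOCKS_32 = 2 ** 32     # how many blocks a 32-bit block count can address

capacity = MAX_BLOCKS_32 * MIN_BLOCK_SIZE
print(capacity == 2 ** 40)          # True: 1 TiB is 2**40 bytes
print(capacity // 2 ** 40, "TiB")   # 1 TiB
```

Note that this is the limit only at the minimum block size; a larger block size raises it proportionally.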
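Code:
All future versions of the Simple File System will remain compatible with "SFS Version 1.0" (...) Alternatively, code that implements "SFS Version 1.0" can replace the version number with 0x10 to automatically revert the file system such that it complies with this specification (any data in the file system that belongs to optional features added in later versions of SFS become discarded/ignored)
That would be plainly unacceptable for a user: think about someone inserting eir SFS 2.0 disk in eir friend's computer to copy some files, then getting home just to realise that the disk was automagically reverted to SFS 1.0 and all the permissions, thumbnails or whatever information on it has been lost. If a driver does not understand something it must not touch it unless it is specifically told to do so (i.e. with --forcerw or --revert10).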
Combuster wrote: I have designed a set of reading algorithms that are independent of the directory situation, so basically you can leave that architectural decision to the osdever (or the one calling mkfs.sfs).
(...) And to be honest, I don't think it's a good thing to enforce a particular size/speed decision on the end user, but that's my opinion.
Oh, the good ol' directory mess... The best way to make the first S in SFS go down the drain is "leaving that architectural decision to (whoever else)". Why? Because then, anyone writing an SFS driver would have to think and work out those same reading algorithms you said you've written. They may not be very clumsy, but - I think - a simple FS should need as few if's as possible. Or did you think simplicity came free?
I see that my idea of express directory entries and parent directory links went away to never come back. A pity, but, if we want to continue with this system, we have to take one path or the other: either all directories have to have an entry in the index area - so one missing entry equals orphaned files, or empty (and only empty) directories have one - so we have to devise a way to reliably create and delete those entries as needed without losing the S in front of SFS.
To see what would happen if such a decision was left to the driver writer, let's say we both write drivers: I think all directories must have an explicit entry, while you think directory entries are only needed for empty ones. Needless to say, if a floppy written by your driver is read by mine, all folders without entries would not be visible (even with files in them) - my driver sees their files as orphaned, and takes them as free space, ready for collection: bye bye, files. Another example: I want my driver so darn fast that, when I delete a folder, I just mark its entry as unused - its children are not marked immediately, because the priority in my realtime OS is returning from I/O as fast as possible, and the "ugly job" is left to a background daemon. Say I've just deleted a folder with hundreds of files when I remove the floppy and insert it into my laptop (with your driver). Guess what? Many deleted files will show up, back from the dead!
If you want simplicity, you can't avoid making decisions. I'm not saying the total blocks superblock field should be eliminated because "we all know SFS is for 1.38 MiB floppies" - that would be just a stupid limitation of its possibilities - but leaving the decision about directory entries to the implementor/user is definitely the way to go if we want constant incompatibilities, files dying and resurrecting mysteriously, etc.
- Combuster
Re:SimpleFS - Missing from WIKI and Brendan's site
Habbit wrote: Oh, the good ol' directory mess... The best way to make the first S in SFS go down the drain is "leaving that architectural decision to (whoever else)". Why? Because then, anyone writing an SFS driver would have to think and work out those same reading algorithms you said you've written. They may not be very clumsy, but - I think - a simple FS should need as few if's as possible. Or did you think simplicity came free?
I see that my idea of express directory entries and parent directory links went away to never come back. A pity, but, if we want to continue with this system, we have to take one path or the other: either all directories have to have an entry in the index area - so one missing entry equals orphaned files, or empty (and only empty) directories have one - so we have to devise a way to reliably create and delete those entries as needed without losing the S in front of SFS.
To see what would happen if such a decision was left to the driver writer, let's say we both write drivers: I think all directories must have an explicit entry, while you think directory entries are only needed for empty ones. Needless to say, if a floppy written by your driver is read by mine, all folders without entries would not be visible (even with files in them) - my driver sees their files as orphaned, and takes them as free space, ready for collection: bye bye, files. Another example: I want my driver so darn fast that, when I delete a folder, I just mark its entry as unused - its children are not marked immediately, because the priority in my realtime OS is returning from I/O as fast as possible, and the "ugly job" is left to a background daemon. Say I've just deleted a folder with hundreds of files when I remove the floppy and insert it into my laptop (with your driver). Guess what? Many deleted files will show up, back from the dead!
If you want simplicity, you can't avoid making decisions. I'm not saying the total blocks superblock field should be eliminated because "we all know SFS is for 1.38 MiB floppies" - that would be just a stupid limitation of its possibilities - but leaving the decision about directory entries to the implementor/user is definitely the way to go if we want constant incompatibilities, files dying and resurrecting mysteriously, etc.
I think we both missed the point here.
If you know that both cases are possible, you should write your driver accordingly and support both, that is, always generate the implied directories upon request. Your OS would then check whether there are orphaned entries and add directories accordingly, while my OS would filter out the useful entries. The results are the same: all directories can be found and all files in existence can be queried and read, without compatibility issues.
Also, if you know directories can be implied, it will never suffice to remove the directory now and the orphaned files later, since you haven't actually deleted anything that way - after a reboot the system would simply be in a state halfway through the deletion of the files.
Basically, if you know of both cases, you won't suffer from either of those problems.
As for my stupidity, I completely forgot to think the SFS way...
Namely, if you do have all directories, all algorithms regarding the index become as stupid as a single for loop instead of performance code with a lot of algorithmic abracadabra which would still be slower.
So yes, I think that having all directories simplifies everything a lot, and I think that's what we are looking for.
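To make the "single for loop" point concrete, here is a minimal Python sketch of a directory listing when every directory has its own entry. The in-memory index as a list of (kind, path) pairs, the "/" separator and the name list_directory are all assumptions made for the illustration; none of them is taken from the spec.

```python
# Sketch only: assumes every directory has an explicit entry in the index.
# `index` is a hypothetical in-memory list of (kind, path) pairs.
def list_directory(index, directory):
    """Return (kind, name) for each entry directly inside `directory`."""
    prefix = directory + "/" if directory else ""
    children = []
    for kind, path in index:                      # one pass, no bookkeeping
        rest = path[len(prefix):]
        if path.startswith(prefix) and rest and "/" not in rest:
            children.append((kind, rest))
    return children

index = [
    ("dir", "docs"),
    ("file", "docs/readme.txt"),
    ("dir", "docs/img"),
    ("file", "docs/img/logo.png"),
    ("file", "kernel.bin"),
]
print(list_directory(index, "docs"))  # [('file', 'readme.txt'), ('dir', 'img')]
print(list_directory(index, ""))      # [('dir', 'docs'), ('file', 'kernel.bin')]
```

Because sub-directories show up as their own entries, the loop never has to infer anything from deeper paths and never sees the same name twice.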
Re:SimpleFS - Missing from WIKI and Brendan's site
Combuster wrote: As for my stupidity, I completely forgot to think the SFS way...
Namely, if you do have all directories, all algorithms regarding the index become as stupid as a single for loop instead of performance code with a lot of algorithmic abracadabra which would still be slower.
So yes, I think that having all directories simplifies everything a lot, and I think that's what we are looking for.
I definitely support the "all directories must have explicit entries" approach. The only problem I see is that this approach suffers from the same problem as my parent-link idea: one missing/unreadable entry would orphan files. What would we do with orphan files?
- We can consider them deleted (as my example realtime OS sfs driver did), but that is a Bad Idea, because without any way of protecting the only copy of the index area from damage, many floppies would regularly "lose files". That is unacceptable.
- We can show them in a lost+found folder (without actually moving them). As no path information is lost, this is not as bad as the first option, but still a bit crappy.
- We can just automagically create a folder entry for the required directories, but this would require the same amount of algorithmic abracadabra as the "no directory entries are needed" approach, so we're back to the Start square.
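Whichever of these options is chosen, the driver first has to find the orphans. A rough Python sketch of that step follows; the (kind, path) index, the "/" separator and the name find_orphans are assumptions for the illustration only, and only the direct parent of each entry is checked.

```python
# Sketch only: find entries whose parent directory has no entry of its own.
# `index` is a hypothetical in-memory list of (kind, path) pairs.
def find_orphans(index):
    """Return the paths whose direct parent directory is missing from the index."""
    known_dirs = {path for kind, path in index if kind == "dir"}
    orphans = []
    for kind, path in index:
        parent = path.rpartition("/")[0]   # "" for entries in the root
        if parent and parent not in known_dirs:
            orphans.append(path)
    return orphans

index = [
    ("dir", "docs"),
    ("file", "docs/a.txt"),
    ("file", "lost/b.txt"),   # no ("dir", "lost") entry exists
    ("file", "c.txt"),
]
print(find_orphans(index))    # ['lost/b.txt']
```

A lost+found view would present this list to the user, while the automagic option would add a ("dir", parent) entry for each missing parent instead.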
Off-topic but still on-topic: I'm currently writing my driver's mksimplefs utility (for Linux, of course: I can't pay $1000 to get Windoze's IFS kit). However, having been a mostly Redmond guy until some months ago, I can't completely understand the Linux paradigm. I reckon I can open block devices just as files, with fopen or ifstream/ofstream, but how can I get their size? At first I thought just seeking to the end of the "file" with ifstream::seekg and then getting the "current position" marker would suffice. I tested with some floppies and I was able to read the boot sector (the would-be SFS superblock) and get the length of the disk (1474560 bytes, or the BPB-advertised 2880 sectors). However, one disk I inserted buggered me: even though its BPB said it had the normal 2880 sectors, the "goto last byte, get position" function said more than 1600000 bytes (>3000 sectors!). What can I do then? Should I, when asked to format a floppy device, force the size I get from the BPB/last position to one of the "known floppy sizes"? Should I, when encountering a hitherto unknown floppy size, force the user to specify the size ey wants (like "Floppy size (1600300 bytes) unrecognized. Please select a valid floppy size with the --fsize switch")? I'm completely puzzled ::)
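For reference, the "seek to the end, read the position" approach described above looks roughly like this in Python. This is a sketch of the technique, not the mksimplefs code from the post: the function name stream_size is made up, and an in-memory buffer stands in for the floppy device so that the example runs anywhere.

```python
import io
import os

def stream_size(stream):
    """Size in bytes of a seekable binary stream, found by seeking to its end."""
    saved = stream.tell()
    size = stream.seek(0, os.SEEK_END)   # seek() returns the new absolute position
    stream.seek(saved)                   # restore the caller's position
    return size

# Stand-in for a 1.44 MB floppy: 2880 sectors of 512 bytes.
fake_floppy = io.BytesIO(bytes(2880 * 512))
size = stream_size(fake_floppy)
print(size, size // 512)                 # 1474560 2880
```

With a real device the stream would come from opening the device node in binary mode. The function only reports what the operating system says the device length is, so it cannot by itself explain a disk whose reported length disagrees with its BPB.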
Re:SimpleFS - Missing from WIKI and Brendan's site
Hi,
Brendan wrote: I still need to establish some way of ensuring consistency between directory entries and file names - specifying that directory entries must be present for all directories, or that they are optional when it can be determined that the directory exists from one or more file names, or that they must only be used for empty directories, etc.
Combuster wrote: I have designed a set of reading algorithms that are independent of the directory situation, so basically you can leave that architectural decision to the osdever (or the one calling mkfs.sfs).
The main difference is that implied directories are slower but smaller in storage, while having entries for all directories allows algorithms to run one order of magnitude faster. And to be honest, I don't think it's a good thing to enforce a particular size/speed decision on the end user, but that's my opinion.
It can create compatibility problems between different implementations - for example, if one SFS driver assumes that all directory entries are always present while others don't. If the specification explicitly states "directory entries are optional when it can be determined that the directory exists from one or more file names", then this problem would be solved and implementors could still make their own size/speed trade-offs without compatibility problems.
Combuster wrote: Namely, if you do have all directories, all algorithms regarding the index become as stupid as a single for loop instead of performance code with a lot of algorithmic abracadabra which would still be slower.
So yes, I think that having all directories simplifies everything a lot, and I think that's what we are looking for.
For some reason I can't see how requiring a directory entry for every directory could improve overall performance in any case - the additional disk I/O required to maintain a larger index area (e.g. 100 file entries and 50 directory entries) is likely to be far more significant than any additional processing needed for a smaller index area (e.g. 100 file entries with no directory entries).
I haven't really examined how this affects every file I/O operation, but the only operation I can think of where "directory entries for empty directories only" would increase processing overhead is when files are deleted (where the SFS driver needs to find out if the file being deleted was the last file in the directory).
Habbit wrote: On track again: I think there's something mismatched in the new spec: about the reserved area, it states:
Code:
The size of the reserved area can be determined from values in the super-block, and can be as small as 1 block, consume the entire media (except for the last 2 blocks), or anything in between.
However, the superblock layout assigns an 8-byte number to the total block count and just 4 bytes to the reserved blocks count. Not that it really matters - I don't think any partition with > 4 billion _blocks_ will ever be formatted with SimpleFS - but it's a conceptual rift in the spec.
You are right - it is inconsistent. I've changed the "Reserved Area" section to clarify how the maximum and minimum sizes of the reserved area should be calculated.
Habbit wrote: By the way, as modern and cool as 64-bit ints may seem next to 32-bit ones, I see no point in using them where they are not needed: "size of data area in blocks" or "total number of blocks" could do perfectly well with 32-bit numbers: 4 billion blocks * minimum block size of 256 bytes = 1 TiB. Reducing those fields to 32 bits would eliminate the discrepancy with the "reserved blocks" field and leave 8 unused bytes in the superblock for a possible future extension.
I shall remind you of this post in 20 years' time, when you're trying to find a suitable way to transfer several 60 TB movies from one computer to another using 700 TB holographic disks.
[continued]
Re:SimpleFS - Missing from WIKI and Brendan's site
[continued]
Habbit wrote:
Code:
All future versions of the Simple File System will remain compatible with "SFS Version 1.0" (...) Alternatively, code that implements "SFS Version 1.0" can replace the version number with 0x10 to automatically revert the file system such that it complies with this specification (any data in the file system that belongs to optional features added in later versions of SFS become discarded/ignored)
That would be plainly unacceptable for a user: think about someone inserting eir SFS 2.0 disk in eir friend's computer to copy some files, then getting home just to realise that the disk was automagically reverted to SFS 1.0 and all the permissions, thumbnails or whatever information on it has been lost. If a driver does not understand something it must not touch it unless it is specifically told to do so (i.e. with --forcerw or --revert10).
It's not possible for SFS 1.0 code to modify an SFS 2.0 file system without destroying "SFS 2.0" extensions. For example, let's assume SFS 2.0 introduces "thumbnails" where index entry type 0x13 is used to determine where the thumbnail's data is stored in the data area. In this case, the SFS 1.0 driver would trash the data area while leaving an index area entry that points to trashed data. There's no way for the SFS 1.0 driver to know which parts of the data area are used by SFS 2.0 extensions, so there's no way to prevent the SFS 1.0 driver from trashing this data.
The only alternative would be to say that an SFS 1.0 driver must not modify any data if the file system is from a later specification. For normal file systems this would be the best way to go, but for a file system designed specifically for "the free exchange of data on physical media" it's a pain in the neck. For example, if someone sends me a floppy containing files that I can read but can't modify, then I'd do some swearing, copy the files to my hard drive, format the disk and then return an SFS 1.0 format disk anyway.
The SFS 1.0 driver may refuse to modify the SFS 2.0 data, or it can choose to revert the file system back to SFS 1.0 format (which should only occur if the SFS 2.0 data is modified, if the recommendation is followed). This mostly leaves it up to the implementation to decide. For example, some sort of "Should I revert it" dialog box might be appropriate (for an OS like Windows), or a special "revert if necessary" or "never revert" mount option (for an OS like Unix), or always reverting the filesystem to SFS 1.0 might be better (for a digital camera).
BTW I'm considering writing some "example" source code - a portable command line utility to work on "SFS 1.0" format disk images (not a true OS specific file system). The idea would be to create the simplest possible code without caring about performance (e.g. no caching, etc) with a development framework so that disk data can be examined/dumped, etc. I'm thinking it could make a good reference, and make it easier for me to find things I've missed...
Cheers,
Brendan
Re:SimpleFS - Missing from WIKI and Brendan's site
Habbit wrote: By the way, as modern and cool as 64-bit ints may seem next to 32-bit ones, I see no point in using them where they are not needed: "size of data area in blocks" or "total number of blocks" could do perfectly well with 32-bit numbers: 4 billion blocks * minimum block size of 256 bytes = 1 TiB. Reducing those fields to 32 bits would eliminate the discrepancy with the "reserved blocks" field and leave 8 unused bytes in the superblock for a possible future extension.
Brendan wrote: I shall remind you of this post in 20 years' time, when you're trying to find a suitable way to transfer several 60 TB movies from one computer to another using 700 TB holographic disks.
If in 20 years or whenever I try to format such a disk with SFS 1.0, please, book a room for me in a sanatorium (or invite me to your SFS party, whichever you prefer).
Brendan wrote: [continued]
That's cheating!!! Do you know how much magnificent poetry I had to cut out of my last posts to fit them in a single message? >:(
Brendan wrote: It's not possible for SFS 1.0 code to modify an SFS 2.0 file system without destroying "SFS 2.0" extensions. For example, let's assume SFS 2.0 introduces "thumbnails" where index entry type 0x13 is used to determine where the thumbnail's data is stored in the data area. In this case, the SFS 1.0 driver would trash the data area while leaving an index area entry that points to trashed data. There's no way for the SFS 1.0 driver to know which parts of the data area are used by SFS 2.0 extensions, so there's no way to prevent the SFS 1.0 driver from trashing this data.
The only alternative would be to say that an SFS 1.0 driver must not modify any data if the file system is from a later specification. For normal file systems this would be the best way to go, but for a file system designed specifically for "the free exchange of data on physical media" it's a pain in the neck. For example, if someone sends me a floppy containing files that I can read but can't modify, then I'd do some swearing, copy the files to my hard drive, format the disk and then return an SFS 1.0 format disk anyway.
The SFS 1.0 driver may refuse to modify the SFS 2.0 data, or it can choose to revert the file system back to SFS 1.0 format (which should only occur if the SFS 2.0 data is modified, if the recommendation is followed). This mostly leaves it up to the implementation to decide. For example, some sort of "Should I revert it" dialog box might be appropriate (for an OS like Windows), or a special "revert if necessary" or "never revert" mount option (for an OS like Unix), or always reverting the filesystem to SFS 1.0 might be better (for a digital camera).
The point I was trying to make is that the spec should explicitly state that "volumes with newer versions of SFS must always be mounted read-only unless the opposite is explicitly requested". As you said, this works well for normal OSes, which can interpret this "explicitly requested" as they want - a dialog box in Windows, a prompt at mounting or a --forcerw switch in Linux, or even an environment variable in any OS. What I hate is computers doing things that can't be undone to my data automagically.
You said that if you found an SFS disk you could read but not write, you'd format it anew: there is no need to do that, just click the "yes" button in the dialog box, mount with --forcerw... Whatever. If you do like automagic, you can use it too: check the "do this for all disks" box in the Windows dialog, or add the --forcerw switch to /etc/fstab in Linux to always use that switch - but YOU are freely doing it, and knowingly taking the risk that any SFS > 1.0 disk you insert will be reverted; it is not an OS default setting.
I reckon this scheme would be impractical for embedded devices, such as digital cameras with SFS formatted cards. They could also display an "SFS version newer than camera driver - revert?" prompt, but I think the way to go for them would be deliberately ignoring that part of the spec. They would be SFS-compatible except for that one thing, which the spec would forbid, and their implementors would know they are introducing non-standard behaviour for a good reason. It is not a biggie - filesystem implementations are usually more incompatible with each other and with the spec than that - but that way all other implementations would not use dark automagic...
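The policy argued for here ("read-only unless the opposite is explicitly requested") can be sketched as a small decision function. This is an illustration of the proposal, not spec text: the function name, the returned mode strings and the idea that later versions carry a larger version byte (0x20 is used as a made-up "2.0") are all assumptions; only the 0x10 value for SFS 1.0 comes from the quoted spec passage.

```python
# Sketch of the proposed mount policy for a driver that only understands SFS 1.0.
SFS_VERSION_1_0 = 0x10   # version byte for SFS 1.0, as quoted from the spec

def choose_mount_mode(disk_version, force_rw=False):
    """Pick a mount mode; `force_rw` models an explicit user request (--forcerw)."""
    if disk_version <= SFS_VERSION_1_0:
        return "rw"          # fully understood, safe to modify
    if force_rw:
        return "rw-revert"   # user accepted that newer data may be lost
    return "ro"              # default: never touch what is not understood

print(choose_mount_mode(0x10))                  # rw
print(choose_mount_mode(0x20))                  # ro
print(choose_mount_mode(0x20, force_rw=True))   # rw-revert
```

An embedded device that always reverts would simply call the function with force_rw=True every time, which is exactly the deliberate deviation described above.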
Everyone, keep the good work up!! The filesystem of the future is finally arriving!!
;D
- Combuster
Re:SimpleFS - Missing from WIKI and Brendan's site
About the directories:
Implying directory entries causes a lot of extra overhead at these points:
- Deleting files: the entire index must be scanned to check whether the underlying directory is now empty, in which case a directory marker must be created
- Deleting directories: see deleting files
- Moving files: when contents are moved, a directory entry may need to be recreated, which is quite common after a large move
- Moving directories: see moving files
These operations are also strictly poorer in complexity:
- Listing directory contents: the entire index must be scanned for implied directories to be listed, after which you need to run a filter algorithm to cut duplicates
It also needs adaptations to the following parts of code and makes optimizing them more difficult:
- Adding files: instead of looking for the existence of the host directory, all file entries must be processed as well to see if that directory exists and the file is allowed to be written
- Adding directories: see adding files
In short, most if not all operations are affected by that architectural decision, and I still think that for simplicity we are better off with explicit directory entries than without them.
Furthermore, implying directories will in several cases require that the entire index area is in memory, which is bad if you have a lot of files and limited memory. If you want to list a directory's contents, for example, you can do that in n-log-n time only if it's entirely loaded in memory. In-place you'll probably spend n^2 time, which is a horror.
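As an illustration of the listing case, here is a small Python sketch of what a driver has to do when directories may be implied by file names. The (kind, path) index, the "/" separator and the function name are assumptions for the example, not part of the spec.

```python
# Sketch only: listing a directory when sub-directories may exist solely
# because some deeper file name mentions them.
def list_directory_implied(index, directory):
    """Return (sorted sub-directory names, file names) directly inside `directory`."""
    prefix = directory + "/" if directory else ""
    files, subdirs = [], set()
    for kind, path in index:                 # the whole index must be scanned
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        if not rest:
            continue
        head, sep, _ = rest.partition("/")
        if sep or kind == "dir":
            subdirs.add(head)                # the set is the duplicate filter
        else:
            files.append(head)
    return sorted(subdirs), files

index = [
    ("file", "docs/readme.txt"),
    ("file", "docs/img/logo.png"),
    ("file", "docs/img/icon.png"),   # second mention of docs/img
    ("dir", "docs/empty"),           # only empty directories get an entry
]
print(list_directory_implied(index, "docs"))  # (['empty', 'img'], ['readme.txt'])
```

The set of names seen so far is the extra state that has to live in memory; without it, each candidate name would have to be compared against the index again, which is where the n^2 figure comes from.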
As for directory ratio:
I found an old DOS floppy: 3 dirs, ~50 files, which is a more likely combination. And although I'm not sure what directory usage is like among OS floppies, I don't think the directory overhead is as big as 33% of the index size, more like 5-10%.
As for the resulting directory inconsistencies:
If directories must be listed and they aren't, then the end user has a problem. The filesystem is still usable, and you can find all the files that are not in a missing directory.
Which brings us to a very infamous necessity: CHKDSK
In the FAT days, this was used to find inconsistencies and attempt to resolve them, as well as to redo or undo incomplete transactions. The downside is that a run takes a lot of time to complete, which brings us to the need to avoid it.
As such, not running it when the disk is known to be consistent is probably a good thing (and we know M$ does that).
Since SFS can keep working in an inconsistent state, drivers need not be aware of "possibly if not likely implied directories" and can just keep it simple, as long as there is a utility that verifies the integrity of the entire system.
On the other hand, SFS is slow by design: you need to read the entire index to check if a file exists instead of walking a directory structure, so running a scandisk upon mount is not likely to be such a time horror when compared to loading the index into RAM.
As for 64 or 32 bit numbers:
Now to cancel your party: the 1 TB is not a hard limit of SFS in 32-bit mode - think FAT and change the block size: 2^(7+255) * 2^32 = a lot (I am so not calculating this number). Shame the spec reads 'must equal sector size', but then who says holographic disks don't have 256 MB sectors?
Personally, I don't mind 64 bits; it's annoying only because BASIC is hard-limiting me to 32 bits here ::) (hands out "insult combuster" invitations)
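For anyone who does want the number, Python's arbitrary-precision integers make it a one-liner. The sketch simply evaluates the expression from the post and assumes, as that expression implies, that the block size is 2^(7+n) bytes with n stored in one byte (so n is at most 255); the variable names are invented for the example.

```python
# Evaluate 2^(7+255) * 2^32 from the post.
# Assumption: block size is 2**(7 + n) bytes with a one-byte exponent n.
max_block_size = 2 ** (7 + 255)   # bytes per block at the largest exponent
max_blocks = 2 ** 32              # blocks addressable with a 32-bit count

max_volume = max_block_size * max_blocks
print(max_volume == 2 ** 294)         # True
print(max_volume.bit_length() - 1)    # 294, i.e. the volume is 2**294 bytes
```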
As for compatibility
It looks like we're throwing the same arguments over and over again. (And I'm happily joining in)
Honestly, if a driver finds a disk it doesn't fully understand, it should not attempt to alter it without permission. We won't know whether --rw is a do or a don't until SFS 2.0 is out. The only advice I have is to include a reserved field in the superblock; for the rest, keep the backwards compatibility issues for the 2.0 design.
It also reminds me of PNG: it uses a set of flags to tell what to do with unknown chunks. One of them says whether an unknown chunk should be kept or removed when the file is modified, which was designed for forward compatibility. You could for instance split the index codes into 0-F and 10-1F for keep/don't keep, so SFS 1.0 might retain some SFS 2.0 data without bringing it entirely down to SFS 1.0 level.
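A minimal sketch of how an SFS 1.0 driver could apply such a split. The ranges 0x00-0x0F (keep) and 0x10-0x1F (don't keep) are just the "for instance" from this post, and the set of known entry types is hypothetical; none of this is in the spec.

```python
from typing import FrozenSet


def action_for_entry(entry_type: int, known_types: FrozenSet[int]) -> str:
    """Decide what to do with an index entry when the volume is modified."""
    if entry_type in known_types:
        return "handle"   # the driver understands this entry
    if 0x00 <= entry_type <= 0x0F:
        return "keep"     # unknown, but safe to carry along unchanged
    if 0x10 <= entry_type <= 0x1F:
        return "drop"     # unknown, and stale once anything else changes
    return "refuse"       # outside both ranges: do not modify the volume


known = frozenset({0x01, 0x02, 0x12})  # hypothetical set of understood types
for code in (0x02, 0x0A, 0x1A, 0x40):
    print(hex(code), action_for_entry(code, known))
```

The "refuse" branch matches the point above that a driver should not alter a disk it doesn't fully understand.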
That's all folks
[edit]I misread something in the spec, fixed it[/edit]
Re:SimpleFS - Missing from WIKI and Brendan's site
Hi,
Combuster wrote:
About the directories:
Ok, I'll change the specification today to state that directory entries are required for every directory.
Combuster wrote:
If directories must be listed and they aren't, then the end user has a problem. The filesystem is still usable, and you can find all the files that are not in a missing directory.
In this case, a utility like CHKDSK (or the file system code itself) can automatically create the missing directory entries, even though they are required. Habbit's earlier example (deleting a directory by removing the directory entry then removing the files) may create inconsistent data if interrupted, but IMHO that would be due to a "less than perfect" implementation rather than something that'd need to be avoided by the specification. I guess what I'm saying is that a missing directory entry makes the file system data inconsistent, and is incorrect and avoidable, but easily corrected (rather than being normal, as would be the case for "implied directories").
Combuster wrote:
As for 64 or 32 bit numbers:
Now to cancel your party, the 1 TB is not a hard limit of SFS in 32-bit mode. Think FAT and change the block size: 2^(7+255) * 2^32 = a lot (I am so not calculating this number). Shame the spec reads 'must equal sector size', but then who says holographic disks don't have 256MB sectors?
Personally, I don't mind 64 bits; it's annoying only because BASIC is hard-limiting me to 32 bits here ::) (hands out "insult combuster" invitations)
In this case it's enough for your code to check the "Total number of blocks" field in the super-block and display a "The file system is too large for this code to mount" error message if the highest 32 bits aren't zero. After that you can safely do everything using unsigned 32 bit (and set the highest 32 bits of any modified 64 bit fields to zero if/when necessary). The same can be done if your language only supports signed 32 bit integers (just check if "Total number of blocks" & 0xFFFFFFFF80000000 == 0).
IMHO the only reason to restrict the specification to 32 bit is to save some bytes in the super-block and to occasionally save a continuation entry for file or directory names. To me, increasing the time it takes for the file system to become obsolete is more important than saving a very small amount of disk space...
Combuster wrote:
It also reminds me of PNG: it uses a set of flags to tell what to do with unknown chunks. One of them says whether an unknown chunk should be kept or removed when the file is modified, which was designed for forward compatibility. You could for instance split the index codes into 0-F and 10-1F for keep/don't keep, so SFS 1.0 might retain some SFS 2.0 data without bringing it entirely down to SFS 1.0 level.
That is a very nice idea - it would give 3 possible actions that SFS 1.0 code could take to deal with SFS 2.0 data: revert to "pure" SFS 1.0 format, maintain as much SFS 2.0 as possible, or refuse to modify (where any implementation could choose to support any of these options, and "revert to pure SFS 1.0 format" is strongly discouraged).
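The 32-bit mount check described in this post could look roughly like the sketch below. It assumes the "Total number of blocks" field is a 64-bit little-endian integer that has already been read out of the super-block as 8 raw bytes; the function names and the byte order are assumptions for the example, not taken from the spec.

```python
import struct

MASK_UNSIGNED_32 = 0xFFFFFFFF00000000  # bits that must be zero for unsigned 32-bit code
MASK_SIGNED_32 = 0xFFFFFFFF80000000    # bits that must be zero for signed 32-bit code


def fits_in_32_bits(total_blocks: int, signed: bool = False) -> bool:
    """True if every block number on the volume fits the 32-bit type in use."""
    mask = MASK_SIGNED_32 if signed else MASK_UNSIGNED_32
    return (total_blocks & mask) == 0


def check_mountable(raw_field: bytes, signed: bool = False) -> int:
    """Decode the 8-byte field (assumed little-endian) and refuse oversized volumes."""
    (total_blocks,) = struct.unpack("<Q", raw_field)
    if not fits_in_32_bits(total_blocks, signed):
        raise ValueError("The file system is too large for this code to mount")
    return total_blocks


print(check_mountable(struct.pack("<Q", 2880)))  # a 1440K floppy: prints 2880
```

After this check passes, the rest of the driver can work with 32-bit values only, as long as it writes zero into the upper half of any 64-bit field it modifies.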
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Combuster
Re:SimpleFS - Missing from WIKI and Brendan's site
I just stumbled upon a loophole in the spec: Hard links
Right now the spec doesn't say anything about allowing two file entries to point to the same region in the data area, effectively pointing two names at the same data.
More generally, you can set up file markers to point to parts of files or even several files, which is nothing short of idiocy, but as the spec doesn't say a word, why not?
The same story goes for bad area markers (to some extent).
Now obviously, having partial overlap is generally wrong, and since nobody really supports it we can scrap it without worries.
The real question is: do we allow fully overlapping files as hard links, or not?
Advantage:
- obviously, links
Disadvantage:
- all appends, overwrites, relocations, and deletes will need to check for file overlap so that all other pointers are updated accordingly and remain consistent.
In the spirit of SFS, I think the pros aren't worth the cons, although the checks needed for these aren't as complex to implement as implicit directories.
If you want some luxury, this is probably the cheapest thing to implement.
Basically, the spec should have some line about full/partial/no file overlap. Discussion?
As for the bad area markers - they are less of an issue since overlap doesn't mean anything extra anyway. If a region is marked as bad twice, it doesn't add anything. I doubt somebody would ignore the bad area markers and need something like index(2): "don't write here", index(7): "no seriously, don't". So I don't care whether overlap for bad area markers is allowed or not, but I think some sort of reference would be good.
Also, if a bad area is inside the index, it might matter from what side you look for bad area markers...
Some other things that are common sense but missing from the spec:
- a file should not be partially inside the index area, or generally outside the data area
- a bad area marker should not point to blocks outside the limits of the volume
- data area and index should not overlap
And that's everything for today
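To illustrate what a checker would have to distinguish for the full/partial/no overlap question and the range rules listed above, here is a small sketch. Extents are modelled as inclusive `(start_block, end_block)` pairs, which is an assumption for the example rather than the exact field semantics of the spec.

```python
from typing import Tuple

Extent = Tuple[int, int]  # (start_block, end_block), both inclusive (assumed)


def classify_overlap(a: Extent, b: Extent) -> str:
    """Return "none", "full" (identical extents) or "partial"."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start or b_end < a_start:
        return "none"
    if a == b:
        return "full"      # the only case that could be read as a hard link
    return "partial"       # always an error, whatever is decided about links


def inside_area(extent: Extent, area_start: int, area_end: int) -> bool:
    """True if the extent lies completely within the given (inclusive) area."""
    start, end = extent
    return area_start <= start <= end <= area_end


print(classify_overlap((10, 19), (10, 19)))  # prints full
print(classify_overlap((10, 19), (15, 25)))  # prints partial
print(classify_overlap((10, 19), (20, 25)))  # prints none
print(inside_area((10, 19), 1, 100))         # prints True
```

If hard links are allowed, every write path would need to run something like `classify_overlap` against all other file entries, which is the cost listed under "Disadvantage".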
- Combuster
Re:SimpleFS - Missing from WIKI and Brendan's site
As of now, I've got a limited, yet WORKING tool for SFS filesystems.
http://dimensionalrift.homelinux.net/combuster/vdisk/
The biggest problem is that you can't modify anything yet (but you can copy it out, reformat, then upload it back).
For the fans, this means you can start writing SFS-based boot floppies.
Source + Binaries available from the URL.
Unfortunately it's Windows only, so not everybody can run it, but then again, Habbit seems to be making a Linux driver...
Oh and, if you seem to get nowhere, try to right-click a few things.
Re:SimpleFS - Missing from WIKI and Brendan's site
Oh. My. God.
So many really good ideas and I wasn't here to criticise their creators into a nervous breakdown. I shouldn't have gone on holiday >:(
Well, I'll try to bring myself up to date while being succinct (3G wireless links are a bit expensive):
about directory entries: I absolutely support the "explicit directory entries" approach. As you pointed out, this makes a "chkdsk" utility indispensable, especially since SFS only has one copy of the index, but I think it could be run automatically on mount
about the linux driver: hehehe... everything I've got at the moment is a moderately complete mkfs.simplefs that can do quick and full formats, merge a binary bootsector image into the superblock and reserve some sectors. However, it can only format 1440K floppies and - I think - their images. How do I obtain the drive geometry? On the "real" SFS driver, I'm still battling with the extremely poor FUSE documentation
about hard links: how do ext2, reiser or ntfs (yes, it supports them) do it?
about everything: the "start block", "end block" and everything else in file / unusable index area entries are volume relative, aren't they? so a file in the first block of the data area would start at block 1 if there are no user reserved blocks?
- Combuster
Re:SimpleFS - Missing from WIKI and Brendan's site
I hope you enjoyed your holiday - this thread has been dying a bit without you...
about the links:
At least ext2 (and probably reiser, xfs and the whole lot) has an inode table, sort of what SFS calls the index. It only contains references to files. Directory tables point at inodes and give them a name there. Inodes contain a reference count which tells how many hard links there are. Basically, all files are hard links here.
No real clue about NTFS workings, but as far as I know it can only hardlink directories, which is a different story compared to hardlinking files. (e2fs and the lot can't do that)
example:
Code:
contents of /home/somebody/example
file1.txt (inode 124)
file2.bin (inode 150)
file3.mp3 (inode 212)
file4.c (inode 124)
file5.c (/home/somebody/example/file4.c)
where files 1 thru 3 are normal files, file 4 is hardlinked to file 1, and file 5 is softlinked to file 4.
Basically, directories and files are kept in separate tables. But that's unfortunately not how SFS works...
Not that it's unable to support it, which is what my original post was about.
An idea for SFS 2 would possibly be softlinks: one index entry with two filenames, file1 pointing to file2, but that's something for later.
About offsets, volumes and such:
The spec, afaik, states something along the lines of "relative to the start of the data area", which means block 0 is indeed sector 1 on a regular floppy (or partition offset + 1 on hard disks).
There is and always will be one reserved block: the MBR / superblock. (To ease the calculation: start block = volume start + reserved sectors, or 1 = 0 + 1 for floppies.)
As for geometry, there is a thing called LBA....
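Going back to the inode listing earlier in this post, the "all files are hard links" idea can be sketched as a directory table that maps names to inode numbers, with the link count derived from how many names share an inode. This is only a toy model of the example directory, not how ext2 actually stores anything on disk.

```python
from collections import Counter
from typing import Dict, List

# Toy directory table for /home/somebody/example: name -> inode number.
directory: Dict[str, int] = {
    "file1.txt": 124,
    "file2.bin": 150,
    "file3.mp3": 212,
    "file4.c": 124,   # same inode as file1.txt, so a hard link
}

# Soft links store a path instead of an inode number.
symlinks: Dict[str, str] = {"file5.c": "/home/somebody/example/file4.c"}


def link_counts(table: Dict[str, int]) -> Counter:
    """Reference count per inode, as an inode table would track it."""
    return Counter(table.values())


def names_for_same_data(table: Dict[str, int], name: str) -> List[str]:
    """All names in the table that refer to the same inode as `name`."""
    inode = table[name]
    return sorted(n for n, i in table.items() if i == inode)


print(link_counts(directory)[124])                # prints 2
print(names_for_same_data(directory, "file4.c"))  # prints ['file1.txt', 'file4.c']
```

SFS has no such indirection: the name and the data location live in the same index entry, which is why hard links there would have to be expressed as fully overlapping entries.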