Universal File System

Pype.Clicker · Post by **Pype.Clicker** » Sun Jan 15, 2006 6:40 am

Rob wrote: This has the advantage that the driver code should be far less "complicated" to rename a directory. The downside is that if another process / thread is reading it could see files in the old directory before they are "moved".

I'd say that's the matter of the OS to make sure there is no race condition such as the one you're talking about (e.g. ensure no process attempt to open something in a "being modified" directory). This can all be handled by structures present in the driver without needing to modify any bits on the disk.

What's is more a concern for the FS, imho, is the fact that the on-disk structure remains consistent (e.g. don't delete the "directory" entry as long as not all files have been moved, etc.)

"What do you mean, I have to change my AudioGrabber settings and re-name my entire mp3 collection to just so I can copy it to my flash drive? I'll just use FAT, thanks..."

not exactly what i meant. you can copy it on your flash drive: the name will just get truncated and made unique (you know, those MyFileIsSoooooooLongThatItUnfortunatelyGetTrunca~2.mp3) .
Imho, an "import/export" xml file could be carrying all those information you normally couldn't have fit SFS, e.g.
- extra dates such as creation, last modification and the like
- extra ownership, etc.
- extra-long filenames.

IIrc, windows and unix doesn't have the same semantic for date stuff (e.g. there's nothing like an "initial file create time" in unix, only "inode creation time", which may be quite different ...)

Brendan · Post by **Brendan** » Sun Jan 15, 2006 7:52 am

Hi,

[off topic]
My apoligies to all, as I've spent most of the last 2 days tracking down an annoying bug - isolating parts, searching through heaps of code, examining CPU detection and AP startup code, etc. I even rewrote all the video parts of my boot manager to work in 80 * 50 text mode (instead of 640 * 480 graphics mode) just to make sure the video code wasn't the problem (it seemed like the only thing that could have been the problem at one point).

After all of this I found the bug about an hour ago. Here it is:

Code: Select all

   mov ecx,(256 * 2048) / 4            ;ecx = size of table in dwords
   mov edi,BOOTMEMCPUdataTable
   clr eax
   cld
.l1:   mov [gs:edi],eax
   add edi,4
   loop .l1

This loop is meant to fill a 512 KB table (which eventually contains information on each CPU detected) with zeros - can you see the bug?

In reality, this loop only clears 256 KB instead of 512 KB, which makes my OS think other CPUs are present if the computer/BIOS leaves data in the last half of the table (which is why the OS worked on almost all of my computers, and died in spectacularly confusing ways otherwise).

If you're still trying to see where the bug is, the loop instruction at the end is running in real mode and therefore uses CX instead of ECX.

[/off topic]

Anyway, I shall try to make amends, in order of newest to oldest.

Rob wrote:Anyway, what about a volume label? I think Windows needs it. Not 100% sure if it is really required though.

I'm not sure if any OS actually needs the volume label, but for floppy disks the file system needs a way to determine if the disk has been changed (for old/dodgy floppy drives where the "media changed" flag doesn't work). For this purpose I'll add a volume label and a time stamp (time when when the media was first formatted) - the chance of both being the same would be very small.

Rob wrote:
1. It looks like it is a conscious decision to have only one date for a file. Perhaps it would be a good idea to have a create date as well? (I don't care for the access date which Windows also has, for example). That might be a piece of crucial information you want to give the receiver.

2.
For the directory entry type, all other fields in the index entry except for the entry name must be ignored when the entry is read and filled with zeros when the entry is being written
Shouldn't a directory have a (create, see point 1) modify date as well? I assume it can be renamed?

Most file systems have a creation time, a modification time and an accessed time, where the modification time is updated when the attributes change or when the file's data changes. On Linux, most people disable the "accessed time" to improve performance (not sure about other OSs). For example, my "/etc/fstab" begins with these 2 lines:

[tt]/dev/md0 / reiserfs noatime 0 1
/dev/sda1 /boot ext2 noatime 0 2[/tt]

This leaves the creation time and the modification time. The modification time is what you see when you type "ls" or "dir" or view the file's properties. It's also used by utilities (archivers, make, etc).

I can't think of any practical use for the creation time. At 8 bytes per timestamp having more than one isn't too efficient, especially for a simple file system that is meant used for transferring files (like FAT) rather than as an advanced native OS partition (like NTFS or ext3). DOS/FAT also only has the modification time.

[continued]

Brendan · Post by **Brendan** » Sun Jan 15, 2006 7:53 am

[continued]

Rob wrote:Perhaps it would be wise to include guidelines/recommendations/algorithms in the spec to advice people that when a new file get's added or an existing file expanded the list of deleted files gets checked.

This will be in the last part of the document (Appendix A), which will contain advice, algorithms suggestions for optimization, etc. I'm intending to do all common operations (creating a file or directory, deleting a file or directory, changing the size of a file, etc).

As for reclaiming the space used by deleted files, this is an "OS specific" option. Some OSs might try to keep as much deleted stuff as possible to maximize the chance of "un-deleting", some OSs might convert the file directly to free space to make it easier to implement, and some OSs might have a "nuke" operation (in addition to the normal "delete" operation) that converts the file to free space and overwrites it several times to deliberately make recovery almost impossible (for security purposes).

The "Algorithms" section will also include some suggestions for avoiding concurrency problems (but not many as this is higher level problem), and some suggestions for making the file system "uncorruptable" (e.g. updating things on disk in an order that allows the file system itself to remain consistant in the case of power failures, etc even though the most recent changes may be lost). Your directory renaming algorithm is a good example of this, but the same sort of thing can be applied to all file operations (at the cost of performance).

I guess what I'm saying is that the specification should have 3 different algorithms for each operation - one for "simplest implementation possible", one for performance and the other for stability. Despite this it will only be a simple guideline, as most people implementing it will probably have better ideas (more suited to their situation) than I'll think of. An example of this would be Candy's last post - "an array of hashed filenames with trees below each hash entry with alphabetic sorting for speed".

Candy wrote:
Despite this, it would be easy to make the "index area entries" 256 bytes long instead of 128 bytes long. That would make the maximum file name length 230 bytes.
What about making it stretchable entries? You just have a first byte/word/dword that indicates how many 64byte blocks it is and it can be any size. The entry placement algorithm is then exactly identical to the file placement algorithm, so you can use the same implementation or knowledge/experience and you can have infinite length filenames (at least, stretchable lengths). My main point is, you don't get arbitrary failures "file path too long" such as were common in DOS (with a limit of iirc 64 or 80 something characters).

You're right - I intend to reduce each "index area entry" to 64 bytes and then allow the file name to stretch over any number of sequential index area entries. I'm thinking of using 2 flags in the "entry flags" field, one to say that the file name is continued in the next entry and the other to say that the entry is a continuation of the previous entry. This means file names that are less than 30 bytes long will consume one entry, names that are less than 93 bytes long will take 2 entries, and a name that takes 63030 bytes will consume 1001 entries.

Cheers,

Brendan

Candy · Post by **Candy** » Sun Jan 15, 2006 9:01 am

Brendan wrote: You're right - I intend to reduce each "index area entry" to 64 bytes and then allow the file name to stretch over any number of sequential index area entries. I'm thinking of using 2 flags in the "entry flags" field, one to say that the file name is continued in the next entry and the other to say that the entry is a continuation of the previous entry. This means file names that are less than 30 bytes long will consume one entry, names that are less than 93 bytes long will take 2 entries, and a name that takes 63030 bytes will consume 1001 entries.

That's awfully close to the FAT patent. Also, it requires linking the entries. I would vote for making the entry itself variable-sized and just stretching it across one or more "dirent"-blocks. Say:

Code: Select all

struct direntry {
  int blockcount:16; // think 8 would suffice
  struct other_info oi;
  char filename[enough];
};

That simplifies reading the entries significantly, with the sidenote that you must read through them all so you don't use a part of a filename. For slow implementations, this is easier. For fast ones, you'd want to cache most of it anyway. The structures require you to either search through all anyway, or to cache them all, so it's pointless to make random access possible.

One other thing, what about dumping the idea of directory names being in the file name? It makes renaming so much harder and unreliable that I'm not sure it'll help. Also, searching through files will be not much harder. It would be a nice touch if you could require the files to be in subdirectory-order so that if you search through the tree that all files in a given directory are after the entry for that directory. That makes any search still linear time. Appending entries at the end ensures this for the simple ones, more complex ones could look at where the parent dir is and find a place to put it from there.

Kemp · Post by **Kemp** » Sun Jan 15, 2006 9:22 am

It would be a nice touch if you could require the files to be in subdirectory-order

I'd imagine the constant rearranging would be very detrimental to a flash-based disk in this case.

Candy · Post by **Candy** » Sun Jan 15, 2006 9:25 am

Kemp wrote:
It would be a nice touch if you could require the files to be in subdirectory-order
I'd imagine the constant rearranging would be very detrimental to a flash-based disk in this case.

You wouldn't rearrange constantly. You would append to the inode list, when it's full you compact it. Given a sufficiently large inode list, you can mess about all you want. If using a smarter algorithm (as indicated above) you might never have to compact.

Kevin McGuire · Post by **Kevin McGuire** » Sun Jan 15, 2006 9:27 am

The file system could directly support extensions to the headers. This way it could be very simple and space smart for using it on a 1.44MB disk vs a 120GB disk. The problem with adding all those time stamps would be:

1. Keeping it simple for developers.
2. Saving space.

The disk could include flags at the header that signal certain header struct contain additional information at the end.

struct tIndex{
unsigned int blockStart;
unsigned int blockEnd;
unsigned int fileLength;
unsigned int64 timeStamp;
unsigned char entryFlags;
unsigned char entryName[95];
};

This is from the specification Brandon posted. This is great. We don't need 64 bit entries for referencing blocks on a 1.44MB disk. We can use 32 bit entries. To extended this we simply append to the header on disks that support additional features.

struct tIndexLargeCapacity{
unsigned int blockStartHigh;
unsigned int blockEndHigh;
unsigned int fileLengthHigh;
};

If a flag is set we could append another struct for even more additional features.

struct tIndexTS{
unsigned int64 creation;
unsigned int64 accessed;
};

struct tIndexLongerFileNames{
unsigned char entryNameHigh[100];
};

I am no expert, but that is my $.02. To shoot down my own idea I will say this adds a extra layer of complexity. I do not know if this FS was being designed to stay simple?

It also seems that this FS would suffer a performance penalty for having to defragment it self when there is no area of contingious space to use. (blockStart & blockEnd). This could be great for a floppy?

Candy · Post by **Candy** » Sun Jan 15, 2006 11:46 am

Addition to requirements: We could use a GRUB driver so people can use it to start quick kernel development. Anybody got experience with GRUB filesystem driver development?

Brendan · Post by **Brendan** » Sun Jan 15, 2006 11:46 am

Hi,

Candy wrote:
Brendan wrote: You're right - I intend to reduce each "index area entry" to 64 bytes and then allow the file name to stretch over any number of sequential index area entries. I'm thinking of using 2 flags in the "entry flags" field, one to say that the file name is continued in the next entry and the other to say that the entry is a continuation of the previous entry. This means file names that are less than 30 bytes long will consume one entry, names that are less than 93 bytes long will take 2 entries, and a name that takes 63030 bytes will consume 1001 entries.
That's awfully close to the FAT patent. Also, it requires linking the entries. I would vote for making the entry itself variable-sized and just stretching it across one or more "dirent"-blocks.

[Expletives] I didn't think of that. I'd still like some way of detecting if you're in the middle of an entry though. How about something like:

Code: Select all

struct direntry {
  char magicnumber   // Anything that can't be in a filename ('?', '*', etc)
  int blockcount:16; // think 8 would suffice
  struct other_info oi;
  char filename[enough];
};

That way if the first byte in the 64 byte chunk isn't <magicnumber> then you know you must be in the middle of a file name.

[continued]

Brendan · Post by **Brendan** » Sun Jan 15, 2006 11:48 am

[continued]

Candy wrote:One other thing, what about dumping the idea of directory names being in the file name? It makes renaming so much harder and unreliable that I'm not sure it'll help. Also, searching through files will be not much harder. It would be a nice touch if you could require the files to be in subdirectory-order so that if you search through the tree that all files in a given directory are after the entry for that directory. That makes any search still linear time. Appending entries at the end ensures this for the simple ones, more complex ones could look at where the parent dir is and find a place to put it from there.

I'm thinking of the simplest possible code. For example, for the original design (before I started the draft) a very simple implementation could support creating new whole files, deleting whole files and reading whole files (and nothing else), with no caching or optimizations.

For a (dodgy) example:

Code: Select all

void *address;

bool check_for_entry(char *name) {
    for(address = index_area_start to index_area_end; step ENTRY_SIZE) {
       if( strcmp(address->name, name) == MATCH) return TRUE;
    }
    return FALSE;
}

bool delete_file(char *name) {
    if(check_for_entry(name) == FALSE) return FALSE;
    address->type = deleted;
    return TRUE;
}

bool write_file(char *name, void *data, int size) {
    void *temp1, temp2;

    delete_file(name);
    temp1 = data_area_end + size;
    temp2 = index_area_start - ENTRY_SIZE;
    if(temp1 > temp2) {
        if( compact() == FALSE) return FALSE;
        return write_file(name, data, size);
    }
    memcopy(data_area_end, data, size);
    temp2->start = data_area_end;
    temp2->end = temp1;
    temp2->size = size;
    temp2->timestamp = get_time();
    temp2->type = FILE;
    memcopy(temp2->name, name, strlen(name)+1 );
    data_area_end = temp1;
    index_area_start = temp2;
    return TRUE;
}

bool read_file(char *name, void *data, int max_size) {
    if(check_for_entry(name) == FALSE) return FALSE;
    size = address->end - address->start;
    if(size > max_size) return FALSE;
    memcopy(data, address->start, size);
    return 0;
}

For something like a digital camera using (memory mapped) flash memory, this is close to complete except for the "compact()" function (which would be a bit messy, but could also be optional). It's also "uncorruptable" in that a power failure at any point won't matter.

This level of simplicity died when we decided to have seperate/explicit entries for directories. Now you'd need to add some "does the directory entry exist" stuff to it, but it'd still be reasonably simple.

I'm almost tempted to make directories implicit again, so you can do "write_file("a/b/c/d/e/f/g/file.txt", data, size)" without caring if any of these directories exist or not. Still, doing one search to see if "a/b/c/d/e/f/g" is present won't slow things down much - it's only one linear search.

To seperate the file name from it's path, you'd need to search for the directory "g", then search for "f" and hope that "f" is the parent to "g", then search for "e" and hope that "e" is the parent to "f", etc. If during this recursive search you find out that the parent is wrong (e.g. if there's other directories elsewhere with the same names, like "oops/b/c/d/e/f/g") you'd need to start again searching from the first sub-directory (i.e. go back to searching for another "g" and do everything again). In this case the simplest possible implementation becomes slow, so you'd need to optimize it to improve those frequent searches and lose the simplicity.

Cheers,

Brendan

Candy · Post by **Candy** » Sun Jan 15, 2006 12:04 pm

Brendan wrote: For something like a digital camera using (memory mapped) flash memory, this is close to complete except for the "compact()" function (which would be a bit messy, but could also be optional). It's also "uncorruptable" in that a power failure at any point won't matter.

This level of simplicity died when we decided to have seperate/explicit entries for directories. Now you'd need to add some "does the directory entry exist" stuff to it, but it'd still be reasonably simple.

You could in reading just ignore the directory entries altogether.

I'm almost tempted to make directories implicit again, so you can do "write_file("a/b/c/d/e/f/g/file.txt", data, size)" without caring if any of these directories exist or not. Still, doing one search to see if "a/b/c/d/e/f/g" is present won't slow things down much - it's only one linear search.

To seperate the file name from it's path, you'd need to search for the directory "g", then search for "f" and hope that "f" is the parent to "g", then search for "e" and hope that "e" is the parent to "f", etc. If during this recursive search you find out that the parent is wrong (e.g. if there's other directories elsewhere with the same names, like "oops/b/c/d/e/f/g") you'd need to start again searching from the first sub-directory (i.e. go back to searching for another "g" and do everything again). In this case the simplest possible implementation becomes slow, so you'd need to optimize it to improve those frequent searches and lose the simplicity.

Place header at end of entry, invert search order (last to first, children must come before parents), rinse, repeat.

In any case, I'm going to leave the final choices and design up to you, I'm going to conform to whatever you will choose.

dushara · Post by **dushara** » Mon Jan 16, 2006 5:34 am

@ Brenden,

Hi I looked at your draft. Just one suggestion really... Would it be a good idea to keep 2 copies of the data area and index area size fields in your super block? (preferably over 2 sectors) and keep an incrementing counter. At one time, you need only write to one super-block and increment the counter. Therefore, should there be a power failure or something, we have an older version of this data. Would help in robustness at a very low cost?

Brendan · Post by **Brendan** » Mon Jan 16, 2006 6:30 am

Hi,

dushara wrote:Would it be a good idea to keep 2 copies of the data area and index area size fields in your super block? (preferably over 2 sectors) and keep an incrementing counter.

Not really - if the counter itself can't be read then you won't know which set of values to use. We'd need 2 versions of this counter with an "extra something" to determine which version of the counter to use. Then we'd need to worry about what happens if there's a problem reading the "extra something".

However, if software can find the start and end of the "index area", it would still be able to recover all files without using anything in the super-block. The end of the index area is easy to find as it's the end of the media, but there's currently no way to find the start of the index area (without the super-block).

For this purpose, I'll add another "index area entry type" to mark the start of the index area, so that recovery software could search entries from the end of the disk until it finds this special entry to find all index entries (and therefore all file data) when the super-block is unreadable.

This special marker will be optional, so that some implementations can ignore it (i.e. not write the marker and have less chance of recovery) and others can use it (i.e. have a better chance of recovery, but slightly slower index area updates and a little more complexity).

Cheers,

Brendan

Brendan · Post by **Brendan** » Mon Jan 16, 2006 9:51 am

Hi,

I've updated the draft specification to take into account the recent discussions. I still haven't done anything in the "algorithms" part yet though....

It's available from my web site at:

http://bcos.hopto.org/sfs.html

I couldn't attach it like last time (file size too long) - if anyone has trouble accessing it just let me know and I'll email it

.

Cheers,

Brendan

Kevin McGuire · Post by **Kevin McGuire** » Mon Jan 16, 2006 11:31 am

DATA AREA DEFRAGMENTATION ARITHOGRAM

I have reviewed the code, and can find no bugs - BUT - there are some. However, this deserves a bronze star for the arithogram at least.

The code works by scanning the index area for the file that resides closest to the first block in the data area. Once a file is found that is the closest to the first block a calculation is done to figure out if it can be moved down a little so it resides exactly on the first block of the data area. Yes? Move it down. No? Leave it alone. The process is then repeated, but this time it is looking for a file that is closest to the last block of the previous moved file. No? We have done all the files. Yes? Move it down. Each time we keep moving to files together and making them contingious. Its more like taking a buncha of bricks and pushing them all together.

This removes gaps caused by the deleting and creating of file that uses different number of blocks over time. This actually allows the system to reclaim space lost by fragmentation. There is a easier and faster way to do this on-the-fly. Actually when a file is being written to the disk, and it can find no space big enough.. This code is just a guide line to defragmenting the data area.

To keep it simple, the arithogram expects a memory manager to be able to give it two free pages, and it expects routines to read and write blocks to the FS. This does not mean it was written to be plugged in to some code, but only to show how the arithogram works at a high level.

OSDev.org

Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System

Re:Universal File System