SpeedFS


SpeedFS

Post by davidv1992 »

I have tried to design a good file system myself, and this is the result. I would like to hear if anyone has suggestions etc.
Attachments
SpeedFS.zip
The File System design document (new version)
(5.8 KiB) Downloaded 107 times
Last edited by davidv1992 on Fri Jul 20, 2007 4:08 am, edited 1 time in total.

Post by Combuster »

I see a 4GB file limit. For new filesystems, that's unacceptable. By the time it gets popular it will be unusable.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]

Re: SpeedFS

Post by Candy »

davidv1992 wrote:I have tried to design a good file system myself, and this is the result. I would like to hear if anyone has suggestions etc.
-. It's FAT based. That's pretty bad for performance. Your memory usage will be proportional to the total size of the disk, which for a 0.8 TB disk will be huge. How fast is seeking to the 8th gigabyte of a file?
-. The short name SFS has been used at least 5 times already. I would recommend using an Arabic or Hebrew character as the first letter, since those are about the only ones that haven't been taken yet. You might want to not use a short name at all.
-. You're not specific. How big is an int, how big is a short?
-. You have max sector size of 32k, max 128 sectors per cluster and max 2G clusters per disk. That's a max of 8PB total, which is pretty fair. I doubt you'll get it up to that because sectors aren't 32k and clusters of 4 megabyte each leave a huge slack space.
-. What character encoding are you using or assuming?
-. I see only literal declarations, no rationales.

This looks like a pretty easy ripoff of FAT. Do you have a list of "why"'s?

Post by davidv1992 »

Combuster wrote:I see a 4GB file limit. For new filesystems, that's unacceptable. By the time it gets popular it will be unusable.
I've done some calculations and my max file size is equal to the max disk size (thus 256 TB when using 512-byte sectors).
Calculation: sector size (assumed 512 bytes) * max sectors per cluster * max clusters per disk = 512 * 128 * 4294967295 ≈ 256 TB (a little under this, actually)
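For anyone who wants to check that arithmetic, here is a minimal sketch in plain C, with the 512-byte sector size and 128 sectors per cluster taken as assumptions from the calculation above. Note this is the maximum volume size; whether a single file can actually grow that large also depends on the width of the file-size field, which comes up below.

Code: Select all

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Assumptions taken from the calculation above: 512-byte sectors,
       128 sectors per cluster, 32-bit cluster numbers (max 0xFFFFFFFF). */
    uint64_t sector_size         = 512;
    uint64_t sectors_per_cluster = 128;
    uint64_t max_clusters        = 0xFFFFFFFFull;

    uint64_t max_bytes = sector_size * sectors_per_cluster * max_clusters;

    /* Prints a value just under 256 TiB (max_clusters is 2^32 - 1). */
    printf("max volume size: %llu bytes (~%llu TiB)\n",
           (unsigned long long)max_bytes,
           (unsigned long long)(max_bytes >> 40));
    return 0;
}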
Candy wrote:-. It's FAT based. That's pretty bad for performance. Your memory usage will be proportional to the total size of the disk, which for a 0.8 TB disk will be huge. How fast is seeking to the 8th gigabyte of a file?
Much faster than in FAT. In FAT you had to walk through a File Allocation Table, which wasn't sorted by file but by cluster. So when a file was very fragmented you had to load a huge part of the FAT while only using a small fraction of it. When loading a part of a file in SpeedFS you just load (part of) the cluster list located in the directory file under which the file is stored. This is much faster than walking through a FAT that is 0.5 GB by itself and loading nearly all of it. My estimate is that when you have the directory file's first cluster and its cluster list already loaded, you actually only need to load about 3 extra clusters on top of your data clusters, depending on the size of the piece you want to load.
Candy wrote:You're not specific. How big is an int, how big is a short?
See the second paragraph or so; it is there.
Candy wrote:You have max sector size of 32k, max 128 sectors per cluster and max 2G clusters per disk. That's a max of 8PB total, which is pretty fair. I doubt you'll get it up to that because sectors aren't 32k and clusters of 4 megabyte each leave a huge slack space.
Yes, it is possible to do that, but as you already mentioned you end up with big pieces of unused space, and 32k sectors aren't common, to say the least. One correction however: max 4G clusters per disk (0xFFFFFFFF). This huge cluster-size slack is exactly why I calculated the max disk size with the assumption that sectors are 512 bytes, thus 64 KB clusters. Yes, they are big, but on a 256 TB volume this is probably quite acceptable (I don't think anyone will fill it with 1- or 2-cluster files).
Candy wrote:What character encoding are you using or assuming?
Good point, I should've included that in the default definition. An updated version will be uploaded very soon (15 minutes or so).
Candy wrote:I see only literal declarations, no rationales.
What do you mean by this sentence?

And about the SFS "short name": its only use is to serve as an extra check that the volume really is a SpeedFS one. It's not meant as a shorter version of the name SpeedFS; you should just call it SpeedFS.

Post by Candy »

davidv1992 wrote:
Combuster wrote:I see a 4GB file limit. For new filesystems, that's unacceptable. By the time it gets popular it will be unusable.
I've done some calculations and my max file size is equal to the max disk size (thus 256 TB when using 512-byte sectors).
Calculation: sector size (assumed 512 bytes) * max sectors per cluster * max clusters per disk = 512 * 128 * 4294967295 ≈ 256 TB (a little under this, actually)
How big is your int into which you put the file size?
Candy wrote:-. It's FAT based. That's pretty bad for performance. Your memory usage will be proportional to the total size of the disk, which for a 0.8 TB disk will be huge. How fast is seeking to the 8th gigabyte of a file?
Much faster than in FAT. In FAT you had to walk through a File Allocation Table, which wasn't sorted by file but by cluster. So when a file was very fragmented you had to load a huge part of the FAT while only using a small fraction of it. When loading a part of a file in SpeedFS you just load (part of) the cluster list located in the directory file under which the file is stored. This is much faster than walking through a FAT that is 0.5 GB by itself and loading nearly all of it. My estimate is that when you have the directory file's first cluster and its cluster list already loaded, you actually only need to load about 3 extra clusters on top of your data clusters, depending on the size of the piece you want to load.
Why did you choose a cluster-linked-list approach instead of an extent-based approach, or a direct/indirect block approach?
Candy wrote:You're not specific. How big is an int, how big is a short?
See the second paragraph or so; it is there.
My bad.
Candy wrote:I see only literal declarations, no rationales.
What do you mean by this sentence?
Why did you choose to do things the way you chose to do them?
And about the SFS "short name": its only use is to serve as an extra check that the volume really is a SpeedFS one. It's not meant as a shorter version of the name SpeedFS; you should just call it SpeedFS.
You'll also hit StarFS volumes, SmallFS, SimpleFS. Those three have passed through here at some point in the past.


Why did you call it SpeedFS? What's speed about it?

Post by davidv1992 »

Candy wrote:
davidv1992 wrote:
Combuster wrote:I see a 4GB file limit. For new filesystems, that's unacceptable. By the time it gets popular it will be unusable.
I've done some calculations and my max file size is equal to the max disk size (thus 256 TB when using 512-byte sectors).
Calculation: sector size (assumed 512 bytes) * max sectors per cluster * max clusters per disk = 512 * 128 * 4294967295 ≈ 256 TB (a little under this, actually)
How big is your int into which you put the file size?
This int is as big as any int in the definition, thus 4 bytes, giving a max value of 0xFFFFFFFF, or the magic number 4294967295 from my calculation.
Candy wrote:
Candy wrote:-. It's FAT based. That's pretty bad for performance. Your memory usage will be proportional to the total size of the disk, which for a 0.8 TB disk will be huge. How fast is seeking to the 8th gigabyte of a file?
Much faster than in FAT. In FAT you had to walk through a File Allocation Table, which wasn't sorted by file but by cluster. So when a file was very fragmented you had to load a huge part of the FAT while only using a small fraction of it. When loading a part of a file in SpeedFS you just load (part of) the cluster list located in the directory file under which the file is stored. This is much faster than walking through a FAT that is 0.5 GB by itself and loading nearly all of it. My estimate is that when you have the directory file's first cluster and its cluster list already loaded, you actually only need to load about 3 extra clusters on top of your data clusters, depending on the size of the piece you want to load.
Why did you choose a cluster-linked-list approach instead of an extent-based approach, or a direct/indirect block approach?
Throughout the whole process of creating SpeedFS I tried to keep the number of sectors you need to load from the hard disk to an absolute minimum. As you naturally cannot shrink the number of sectors you need to load for a gigabyte of real data in a file, I tried to keep the overhead the file system adds to that number to a minimum. Part of this is that the array through which you determine which clusters are part of a file is file-specific. It is in fact an array of pointers. This allows you to load only the part of the array that you actually need, and even if you need the full file (thus the full array) it will still be much faster, with much less loading of clusters, compared to FAT. Also, in FAT, when you try to read (following your 8th GB example) the 8th GB of a file, you need to walk over every single cluster used by that file.

This makes things (hopefully) clearer.
For FAT, reading the 6th cluster of a file:

Code: Select all

cluster = File_First_Cluster;
for (int i = 0; i < 6; i++)
{
    // Magic code to load the part of the FAT we need
    ....

    // What is the next cluster?
    cluster = FAT[cluster];
}

// Phew, we can finally read our cluster
read_cluster(cluster);
For SpeedFS:

Code: Select all

cluster_list = file_cluster_list;

// Some magic code to load the part of the cluster list we need
// (we can work out which part right away at this stage)
......

// Wow, we can already read our cluster
read_cluster(cluster_list[6]);
For 6 clusters the difference won't be noticed, but try to do this over a range of, say, 1000 clusters and SpeedFS will be MUCH faster.

Candy wrote:
Candy wrote:I see only literal declarations, no rationales.
What do you mean by this sentence?
Why did you choose to do things the way you chose to do them?
Well, basically every choice I made was so that to do something in my FS you have to read/write a minimum number of sectors, as I believe this is the biggest hit to performance.
Candy wrote:
And about the SFS "short name": its only use is to serve as an extra check that the volume really is a SpeedFS one. It's not meant as a shorter version of the name SpeedFS; you should just call it SpeedFS.
You'll also hit StarFS volumes, SmallFS, SimpleFS. Those three have passed through here at some point in the past.
Well, I've checked SimpleFS while creating SpeedFS, and it won't have a higher chance of being misrecognized as a SpeedFS volume. Because of its partition-table independence, I doubt it would ever be misrecognized.
StarFS and SmallFS I don't know; I couldn't find documentation on either via Google. And you should always bear in mind that a volume can be misrecognized coincidentally. The use of a recognition string will most likely decrease this chance, but will never take it away completely.
Candy wrote:Why did you call it SpeedFS? What's speed about it?
Exactly the fact described two quotes above: I've at least tried to minimize the number of sectors that need to be loaded for any given action.

Post by Candy »

davidv1992 wrote:
Candy wrote:How big is your int into which you put the file size?
This int is as big as any int in the definition, thus 4 bytes, giving a max value of 0xFFFFFFFF, or the magic number 4294967295 from my calculation.
Doesn't that limit files to 4294967295 bytes?
Candy wrote:Why did you choose a cluster-linked-list approach instead of an extent-based approach, or a direct/indirect block approach?
Throughout the whole process of creating SpeedFS I tried to keep the number of sectors you need to load from the hard disk to an absolute minimum. As you naturally cannot shrink the number of sectors you need to load for a gigabyte of real data in a file, I tried to keep the overhead the file system adds to that number to a minimum. Part of this is that the array through which you determine which clusters are part of a file is file-specific. It is in fact an array of pointers. This allows you to load only the part of the array that you actually need, and even if you need the full file (thus the full array) it will still be much faster, with much less loading of clusters, compared to FAT. Also, in FAT, when you try to read (following your 8th GB example) the 8th GB of a file, you need to walk over every single cluster used by that file.
Ok, point taken. Why do you store each pointer consecutively when most filesystems tend toward extents, which is basically taking this list and performing basic run-length encoding? An extent lists "from this sector the next X sectors" - yours would be a degenerate form that says "from this sector the next sector" in each entry. I have to admit, that idea is a large improvement over FAT whilst not raising the complexity by much.
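To make the difference concrete, here is a minimal sketch (hypothetical structures and names, not anything from the SpeedFS document) of looking up cluster number 6 of a file in a flat per-file cluster list versus in an extent list, i.e. the run-length-encoded form described above:

Code: Select all

#include <stdint.h>
#include <stddef.h>

/* Flat per-file cluster list: one 32-bit cluster number per entry. */
static uint32_t lookup_flat(const uint32_t *cluster_list, uint32_t index)
{
    return cluster_list[index];               /* one array access           */
}

/* Extent list: each entry covers a run of consecutive clusters. */
struct extent {
    uint32_t start;                           /* first cluster of the run   */
    uint32_t count;                           /* number of clusters in run  */
};

static uint32_t lookup_extent(const struct extent *extents, size_t n_extents,
                              uint32_t index)
{
    for (size_t i = 0; i < n_extents; i++) {
        if (index < extents[i].count)
            return extents[i].start + index;  /* found the run, offset in   */
        index -= extents[i].count;            /* skip this run              */
    }
    return 0;                                 /* past end of file (error)   */
}

For an unfragmented file the extent list collapses to a single entry, so the metadata to load is a few bytes instead of one pointer per cluster; the flat list's only advantage is that its lookup is a constant-time array index.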
For 6 clusters the difference won't be noticed, but try to do this over a range of, say, 1000 clusters and SpeedFS will be MUCH faster.
I'm not that sure. If you keep fragmentation under control, you'll be a lot better off.
Candy wrote:Why did you choose to do things the way you chose to do them?
Well, basically every choice I made was so that to do something in my FS you have to read/write a minimum number of sectors, as I believe this is the biggest hit to performance.
I dare claim that most common documentation on FS design (the BeFS design book, the FS-design parts of OS design books) disagrees, as do the metrics. Writing a full track on a disk is on the order of 60/5400 of a second on a 5400 rpm disk (laptop), or 60/7200 on a 7200 rpm disk (desktop). That comes down to about a hundredth of a second, roughly 10 milliseconds. Seek times are on average still 8 ms and top out at 14/15 ms for 3.5" disks, slightly less for 2.5" disks (and if you have a Seagate Bigfoot 5.25" disk, lots more). NCQ, a new technique in SATA, allows the drive to reorder commands, and just that gains it about 10% in performance.

I would try to reduce seeks and seek lengths. Did you read up on BSD's FFS (Fast File System) design? Their concept is to split the allocation tables etc. into chunks spread out evenly across the disk, so that modifying them does not cause a long seek (seek 15 ms, wait for sector 5 ms, write sector 0.01 ms, seek back 15 ms, wait for sector 5 ms: total time taken 40.01 ms).
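A tiny sketch of that layout idea (the names and sizes are invented for the example, not FFS's actual on-disk format): the disk is divided into groups, each carrying its own copy of the allocation metadata, so an allocation only touches the group near the data it belongs to:

Code: Select all

#include <stdint.h>

#define GROUP_CLUSTERS  8192   /* clusters managed per group (example value) */

/* Per-group metadata, stored at the start of each group on disk. */
struct alloc_group {
    uint32_t first_cluster;                /* first cluster of this group    */
    uint32_t free_clusters;                /* quick check before scanning    */
    uint8_t  bitmap[GROUP_CLUSTERS / 8];   /* 1 bit per cluster in the group */
};

/* Pick the group nearest to 'hint_cluster' (e.g. the file's existing data),
   so the bitmap update and the data write stay close together on disk. */
static uint32_t pick_group(uint32_t hint_cluster)
{
    return hint_cluster / GROUP_CLUSTERS;
}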
Candy wrote:You'll also hit StarFS volumes, SmallFS, SimpleFS. Those three have passed through here at some point in the past.
Well, I've checked SimpleFS while creating SpeedFS, and it won't have a higher chance of being misrecognized as a SpeedFS volume. Because of its partition-table independence, I doubt it would ever be misrecognized.

StarFS and SmallFS I don't know; I couldn't find documentation on either via Google. And you should always bear in mind that a volume can be misrecognized coincidentally. The use of a recognition string will most likely decrease this chance, but will never take it away completely.
StarFS is a creation of Pype and myself, although it is usually shortened to *FS (where most people misinterpret the star as a wildcard - you can't win them all). The documentation should still be on the wiclicker.

Do you know about journalling filesystems? They try to tackle the problem of filesystem integrity, but the same technique can also be used to tackle speed challenges in filesystem changes. The concept is to bundle all the small writes (which could otherwise fail halfway because the computer could drop out during a seek, completing one write but not the other) into a log that's written in one go - either the CRC is good (the block and its writes are accepted) or it isn't (the blocks are not accepted). It also helps on flash disks by reducing hotspots.
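As a rough illustration of that commit rule, here is a minimal sketch (the on-disk layout is made up for the example; real journals such as ext3's JBD differ in the details). A record that was cut short by a power failure simply fails its checksum and is ignored, so the batch of writes is all-or-nothing:

Code: Select all

#include <stdint.h>
#include <stddef.h>

/* One journal record: a batch of small writes committed as a unit. */
struct journal_record {
    uint32_t sequence;        /* monotonically increasing commit number */
    uint32_t payload_len;     /* bytes of packed (target sector, data)  */
    uint32_t crc;             /* CRC-32 over the payload                */
    uint8_t  payload[4096];
};

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
static uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* On mount/recovery: replay a record only if its checksum is intact. */
static int record_is_valid(const struct journal_record *rec)
{
    if (rec->payload_len > sizeof rec->payload)
        return 0;
    return crc32(rec->payload, rec->payload_len) == rec->crc;
}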

Re: SpeedFS

Post by Brendan »

Hi,
davidv1992 wrote:I have tried to design a good file system myself, and this is the result. I would like to hear if anyone has suggestions etc.
Some notes, based on the version of SpeedFS from the first post in this topic...

Is it possible to make the SpeedFS partition bootable? Typically (for chainloading) the boot loader loads the first sector of the partition and jumps to it, but as far as I can tell it'd be jumping to the "SFS\0" indicator, so the traditional boot sequence can't work. What if you want to use SpeedFS for a bootable floppy?

For the root directory, what is the purpose of the directory cluster list?

For all directories, is the directory information limited to one cluster? If you're using 4 KB clusters, and file names are an average of 16 bytes, would it be possible to have more than 100 files in a directory? Is there any limit on the length of a file name (except the 4 GB "item name length" limit)?

Does the directory information keep track of the directory's parent (the ".." entry you see in some file systems)?

I couldn't find anything that keeps track of how large each file is - there's only the number of clusters it consumes (which isn't the same at all). A 16 byte file is 16 bytes long (even though it may consume an entire 8 KB cluster).

I couldn't find any permission flags (read/write/execute, owner, group, etc) - is there any security?

I couldn't find a "modified" flag or timestamps. Is it possible to do incremental backups of this file system? Will something like "make" work?

If I delete a file does the file system immediately free those clusters; or does it mark the file as deleted somehow and free the clusters later (e.g. when free space runs out) so that the file can be "undeleted" (and/or so that freeing the clusters can be postponed until the disk driver is under less load)?

Have you considered fault tolerance? If I get a bad sector in the root directory information or in the cluster usage map, is all the data in the entire partition lost forever? If there's a power failure while you're deleting a file, will clusters be left marked as used when they aren't? How long will it take to scan the disk to find out if the file system is correct after a power failure (recover lost clusters, etc)? Does anything keep track of faulty sectors (or faulty clusters)?

How hard will it be to defragment? For example, if a file uses 20 clusters and 19 of the clusters are contiguous, how quickly can you find out what's using the other cluster? I'm guessing you'd need to search every directory information structure on the disk until you found it... If you cache all the information in RAM to speed up defragmentation, how much RAM would you need? Can the file system defragment itself when it has nothing else to do (for e.g. do a little bit every now and then during idle time), or do you need to take the file system offline while it's being defragmented (so the computer is unusable for hours)?

If this is meant to be a formal specification, then be careful how you write numbers - "262.144 GB" means two hundred and sixty two (and a bit) gigabytes to me. I'm guessing that where you live a '.' is a thousands separator and ',' is a decimal point, so that one million looks like "1.000.000,00" (instead of "1,000,000.00"). I'd also suggest using standard units. Technically, "1 GB" is 1000000000 bytes (and not 1073741824 bytes). For 1073741824 bytes use "1 GiB".

Lastly, try not to use file formats like "rich text" - they suck for people who aren't using Windows. Instead use plain ASCII text, HTML or PDF (I prefer HTML, and if you're using "Word for Windows" it's easy to do "save as HTML" when you're ready to publish the file). ;)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: SpeedFS

Post by Candy »

Brendan wrote:Lastly, try not to use file formats like "rich text" - they suck for people who aren't using Windows. Instead use plain ASCII text, HTML or PDF (I prefer HTML, and if you're using "Word for Windows" it's easy to do "save as HTML" when you're ready to publish the file). ;)
I was happy he didn't make it an MS Word DOC file of any version. At least RTF can more or less be read without a dedicated reader.

Re: SpeedFS

Post by Brendan »

Hi,

[OFFTOPIC]
Candy wrote:I was happy he didn't make it an MS Word DOC file of any version. At least RTF can more or less be read without a dedicated reader.
Possibly, although I'm not sure what with (I have nothing on my Gentoo machine that can read it)...

For fun, I dare you to open the original RTF file in Kwrite (or any other text editor) and see how many lines of crap there are at the start of the file before you get to the first word of text ("SpeedFS"). Here's a hint - it adds up to 8.7 KB.

Converting the original file to HTML using Word dropped its size from 51.4 KB down to 41.5 KB without losing formatting. By using copy & paste to transfer the data into a clean HTML document with Mozilla, the file dropped down to 32.3 KB without losing any formatting. As a plain ASCII text file it's only 5.9 KB, but the formatting is gone.

This means that for 5.9 KB of actual text:
  • Word's RTF file had 45.5 KB of "formatting".
  • Word's HTML file had 35.6 KB of "formatting".
  • Mozilla's HTML file had 26.4 KB of "formatting".
Of course I did take a look at the HTML generated by Mozilla - it wasn't very good either, and the number of bytes used for formatting could easily be halved.

[/OFFTOPIC]


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: SpeedFS

Post by Candy »

Brendan wrote:Hi,

[OFFTOPIC]
Candy wrote:I was happy he didn't make it an MS Word DOC file of any version. At least RTF can more or less be read without a dedicated reader.
Possibly, although I'm not sure what with (I have nothing on my Gentoo machine that can read it)...
I used AbiWord to read it; I'm pretty sure KWord and OpenOffice.org can also open it. I'll take your comments about the stuffing as true; I don't see why they wouldn't be.

Post by CompDever »

Umm, where is the attachment? It helps for there to be an attachment when one attempts to discuss that which was attached. :D

EDIT: wow... never mind... I forgot to log in, lol :oops: