SpeedFS
Posted: Thu Jul 19, 2007 9:15 am
I have tried to design a good file system myself, and this is the result. I would like to hear if anyone has suggestions etc.
davidv1992 wrote: I have tried to design a good file system myself, and this is the result. I would like to hear if anyone has suggestions etc.

-. It's FAT based. That's pretty bad for performance. Your memory usage will be relative to the total size of the disk, which in the case of a 0.8 TB disk will be huge. How fast is seeking to the 8th gigabyte of a file?
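For a sense of scale (the cluster size and entry width below are illustrative assumptions, not values from the SpeedFS spec): a FAT-style table needs one entry per cluster, so its in-memory size grows with the disk. A rough sketch:

Code:
#include <stdio.h>
#include <stdint.h>

/* Why FAT memory use scales with disk size: one table entry per cluster.
   Cluster size and entry width here are illustrative assumptions. */
int main(void)
{
    uint64_t disk_bytes   = 800ULL * 1000 * 1000 * 1000;  /* ~0.8 TB disk */
    uint64_t cluster_size = 4096;                          /* 4 KB clusters */
    uint64_t entry_size   = 4;                             /* FAT32-style entry */

    uint64_t clusters  = disk_bytes / cluster_size;
    uint64_t fat_bytes = clusters * entry_size;

    printf("clusters : %llu\n", (unsigned long long)clusters);
    printf("FAT size : ~%llu MB\n", (unsigned long long)(fat_bytes >> 20));
    /* roughly 195 million clusters, i.e. a table in the 0.5-1 GB range */
    return 0;
}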
Combuster wrote: I see a 4GB file limit. For new filesystems, that's unacceptable. By the time it gets popular it will be unusable.

I've done some calculations and my max file size is equal to the max disk size (thus 256 TB when using 512 byte sectors).
Candy wrote: -. It's FAT based. That's pretty bad for performance. Your memory usage will be relative to the total size of the disk, which in the case of a 0.8 TB disk will be huge. How fast is seeking to the 8th gigabyte of a file?

Much faster than in FAT. In FAT you had to walk through a File Allocation Table, which is sorted by cluster, not by file. So when a file was very fragmented you had to load a huge part of the FAT while only using a small fraction of it. When loading a part of a file in SpeedFS you just load the (part of the) cluster list located in the directory file under which the file is stored. This is much faster than walking through a FAT which is 0.5 GB on its own and loading nearly all of it. My estimate is that, once you have the directory file's first cluster and its cluster list loaded, you only need to load about 3 extra clusters on top of your data clusters, depending on the size of the piece you want to load.
Candy wrote: You're not specific. How big is an int, how big is a short?

See the second paragraph or so; it is there.
Candy wrote: You have max sector size of 32k, max 128 sectors per cluster and max 2G clusters per disk. That's a max of 8PB total, which is pretty fair. I doubt you'll get it up to that because sectors aren't 32k and clusters of 4 megabyte each leave a huge slack space.

Yes, it is possible to do that, but as you already mentioned you end up with big pieces of unused space, and 32k sectors aren't common, to say the least. One correction, however: it's max 4G clusters per disk (0xFFFFFFFF). This huge cluster-size slack is exactly why I calculated the max disk size with the assumption that sectors are 512 bytes, thus 64 KB clusters. Yes, they are big, but on a 256 TB volume this is probably quite acceptable (I don't think anyone will fill it with 1- or 2-cluster files).
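To make those numbers concrete, here is a small sketch; the 512-byte sector size, 128 sectors per cluster and 0xFFFFFFFF clusters come from the discussion above, and the 1-byte-file slack figure is only an illustration:

Code:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* assumed geometry, taken from the discussion above */
    uint64_t sector_size         = 512;
    uint64_t sectors_per_cluster = 128;
    uint64_t max_clusters        = 0xFFFFFFFFULL;

    uint64_t cluster_size = sector_size * sectors_per_cluster;   /* 64 KB */
    uint64_t max_volume   = cluster_size * max_clusters;         /* ~256 TB */

    printf("cluster size : %llu bytes\n", (unsigned long long)cluster_size);
    printf("max volume   : %llu bytes (~%llu TB)\n",
           (unsigned long long)max_volume,
           (unsigned long long)(max_volume >> 40));

    /* slack: a 1-byte file still occupies a whole 64 KB cluster */
    printf("slack for a 1-byte file: %llu bytes\n",
           (unsigned long long)(cluster_size - 1));
    return 0;
}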
Candy wrote: What character encoding are you using or assuming?

Good point, I should've included that with the default definition. An updated version will be uploaded very soon (15 minutes or so).
Candy wrote: I see only literal declarations, no rationales.

What do you mean by this sentence?
davidv1992 wrote: I've done some calculations and my max file size is equal to the max disk size (thus 256 TB when using 512 byte sectors).
Calculation: sector size (assumed 512 bytes) * max sectors per cluster * max clusters per disk = 512 * 128 * 4294967295 = ~256 TB (a little under that, actually).

How big is your int into which you put the file size?
davidv1992 wrote: Much faster than in FAT. In FAT you had to walk through a File Allocation Table, which is sorted by cluster, not by file. [...]

Why did you choose a cluster-linked-list approach instead of an extent-based approach, or a direct/indirect block approach?
davidv1992 wrote: See the second paragraph or so; it is there.

My bad.
Candy wrote: I see only literal declarations, no rationales.
davidv1992 wrote: What do you mean by this sentence?

Why do you choose to do things the way you chose to do them?
davidv1992 wrote: And about the SFS "shortname": its only use is to function as an extra test of whether the volume is really a SpeedFS one. It's not meant as a shorter version of the name SpeedFS; you should just call it SpeedFS.

You'll also hit StarFS volumes, SmallFS, SimpleFS. Those three have passed through here at some point in the past.
Candy wrote: How big is your int into which you put the file size?

This int is as big as any int in the definition, thus 4 bytes, thus having a max value of 0xFFFFFFFF, or the magical number 4294967295 in my calculation.
Candy wrote: Why did you choose a cluster-linked-list approach instead of an extent-based approach, or a direct/indirect block approach?

Throughout the whole process of creating SpeedFS I tried to keep the number of sectors you need to load from the hard disk to an absolute minimum. Since you naturally cannot shrink the number of sectors you need to load for a gigabyte of real data in a file, I tried to keep the overhead the file system adds on top of that to a minimum. Part of this is that the array through which you determine which clusters are part of a file is file specific. It is essentially an array of pointers, which allows you to load only the part of the array that you actually need; and even if you need the full file (thus the full array) it will still be much faster, with far less cluster loading, than FAT. Also, in FAT, when you try to read (following your 8th GB example) the 8th GB of a file, you need to walk over every single cluster the file uses before that point.
In FAT, reading (say) the 7th cluster of a file looks like this:

Code:
cluster = file_first_cluster;
for (int i = 0; i < 6; i++)
{
    // magic code to load the part of the FAT that contains 'cluster'
    // ...
    // follow the chain to the next cluster
    cluster = FAT[cluster];
}
// phew, we can finally read our cluster
read_cluster(cluster);
In SpeedFS it is simply:

Code:
cluster_list = file_cluster_list;
// some magic code to load the part of the cluster list we need
// (we can work out which part at this stage)
// ...
// wow, we can already read our cluster
read_cluster(cluster_list[6]);
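To generalize the hard-coded index 6 above, here is a sketch of how the needed cluster-list entry can be computed directly from a byte offset. The 64 KB cluster size, 512-byte sectors and 32-bit list entries are assumptions carried over from the discussion, not fixed by the spec itself:

Code:
#include <stdint.h>

#define CLUSTER_SIZE (512u * 128u)      /* 64 KB, assumed */
#define SECTOR_SIZE  512u
#define ENTRY_SIZE   sizeof(uint32_t)   /* one cluster number per entry */

/* For a byte offset into a file, work out which entry of the per-file
   cluster list we need and which sector of that list contains it.
   No chain walking: one division gives the answer directly. */
void locate(uint64_t file_offset,
            uint64_t *list_entry, uint64_t *list_sector)
{
    *list_entry  = file_offset / CLUSTER_SIZE;                 /* Nth cluster of the file */
    *list_sector = (*list_entry * ENTRY_SIZE) / SECTOR_SIZE;   /* sector of the list to load */
}

/* Example: the 8th gigabyte (Candy's example) lands at entry
   8 GiB / 64 KiB = 131072, i.e. sector 1024 of the cluster list,
   whereas FAT would have to follow 131072 chain links to get there. */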
Candy wrote: Why do you choose to do things the way you chose to do them?

Well, every choice I made is basically so that to do something in my FS you have to read/write a minimum number of sectors, as I believe that is the biggest hit to performance.
Candy wrote: You'll also hit StarFS volumes, SmallFS, SimpleFS. Those three have passed through here at some point in the past.

Well, I've checked on SimpleFS while creating SpeedFS, and it won't have a higher chance of being misrecognized as a SpeedFS volume. Because of its partition-table independence, I doubt it would ever be misrecognized.
Candy wrote: Why did you call it SpeedFS? What's speed about it?

Exactly what I described two quotes above: I've at least tried to minimize the number of sectors that need to be loaded for an action.
davidv1992 wrote: This int is as big as any int in the definition, thus 4 bytes, thus having a max value of 0xFFFFFFFF, or the magical number 4294967295 in my calculation.

Doesn't that limit files to 4294967295 bytes?
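For illustration only: the limit follows directly from the width of the size field. A hypothetical directory entry (field names and layout made up for this sketch, not taken from the SpeedFS spec) shows how a 64-bit field would lift it:

Code:
#include <stdint.h>

/* With a 32-bit size field the largest representable file is
   4294967295 bytes (~4 GB); widening it to 64 bits pushes the limit
   far beyond the ~256 TB maximum volume size discussed above. */
struct dir_entry_sketch {
    char     name[256];       /* hypothetical fixed-size name field */
    uint64_t size_bytes;      /* 64-bit: max ~16 EB instead of ~4 GB */
    uint32_t first_cluster;   /* start of the per-file cluster list */
};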
davidv1992 wrote: Throughout the whole process of creating SpeedFS I tried to keep the number of sectors you need to load from the hard disk to an absolute minimum. [...]

Ok, point taken. Why do you store each pointer consecutively when most filesystems tend towards extents - which is basically taking this list and performing basic run-length encoding on it? An extent says "from this sector, the next X sectors"; yours would be a degenerate form that says "from this sector, the next sector" in each entry. I have to admit, that idea is a large improvement over FAT whilst not raising the complexity by much.
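A minimal sketch of the run-length encoding Candy describes, applied to a plain per-file cluster list; the array-of-uint32 representation is an assumption, not the SpeedFS on-disk format:

Code:
#include <stdint.h>
#include <stddef.h>

/* An extent: "starting at this cluster, the next 'length' clusters". */
struct extent {
    uint32_t start;
    uint32_t length;
};

/* Run-length encode a cluster list into extents.  'out' must have room
   for 'count' entries in the worst case.  A fully contiguous file
   collapses to a single extent; a fully fragmented one degenerates to
   one extent per cluster, which is exactly the plain list. */
size_t clusters_to_extents(const uint32_t *clusters, size_t count,
                           struct extent *out)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (n > 0 && clusters[i] == out[n - 1].start + out[n - 1].length) {
            out[n - 1].length++;          /* cluster extends the current run */
        } else {
            out[n].start  = clusters[i];  /* start a new run */
            out[n].length = 1;
            n++;
        }
    }
    return n;                             /* number of extents produced */
}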
davidv1992 wrote: For 6 clusters the difference won't be noticed, but try to do this over a range of, say, 1000 clusters: SpeedFS will be MUCH faster.

I'm not that sure. If you keep fragmentation under control, you'll be a lot better off.
davidv1992 wrote: Well, every choice I made is basically so that to do something in my FS you have to read/write a minimum number of sectors, as I believe that is the biggest hit to performance.

I dare claim that most common documentation on FS design (the BeFS design book, the FS chapters of OS design books) disagrees, as do the metrics. Writing a full track on a disk takes on the order of 60/5400 of a second on a 5400 rpm (laptop) disk, or 60/7200 on a 7200 rpm (desktop) disk. That comes down to roughly a hundredth of a second, about 10 milliseconds. Seek times are still 8 ms on average and top out at 14-15 ms for 3.5" disks, slightly less for 2.5" disks (and if you have a Seagate Bigfoot 5.25" disk, lots more). NCQ, a new technique in SATA, allows the drive to reorder commands, and just that saves it 10% in performance.
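A back-of-the-envelope comparison for the 1000-cluster example mentioned earlier, using roughly those figures; the 8 ms seek, 4 ms rotational latency and ~60 MB/s transfer rate are assumed round numbers, not measurements:

Code:
#include <stdio.h>

int main(void)
{
    double seek_ms = 8.0, rot_ms = 4.0, mb_per_s = 60.0;   /* assumed */
    int    clusters = 1000;
    double cluster_mb = 64.0 / 1024.0;                      /* 64 KB clusters */

    double transfer_ms = clusters * cluster_mb / mb_per_s * 1000.0;

    /* worst case: every cluster needs its own seek */
    double scattered_ms  = clusters * (seek_ms + rot_ms) + transfer_ms;
    /* best case: one seek, then the data streams past the head */
    double contiguous_ms = (seek_ms + rot_ms) + transfer_ms;

    printf("1000 scattered clusters : ~%.0f ms\n", scattered_ms);   /* ~13000 ms */
    printf("1000 contiguous clusters: ~%.0f ms\n", contiguous_ms);  /* ~1050 ms  */
    return 0;
}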
davidv1992 wrote: Well, I've checked on SimpleFS while creating SpeedFS, and it won't have a higher chance of being misrecognized as a SpeedFS volume. Because of its partition-table independence, I doubt it would ever be misrecognized.

StarFS is a creation of Pype and myself, although it is usually shortnamed *FS (where most people misinterpret the star to be a wildcard - you can't win them all). The documentation should still be on the wiclicker.
StarFS and SmallFS I don't know; I couldn't find documentation on either via Google. Also, you should always bear in mind that a volume can be misrecognized coincidentally. The use of a recognition string will most likely decrease this chance, but will never take it away completely.
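As a rough illustration of that point (the magic value, its length and its offset here are made up, not the real SpeedFS recognition string):

Code:
#include <stdint.h>
#include <string.h>

#define MAGIC     "SpeedFS\0"   /* hypothetical 8-byte recognition string */
#define MAGIC_LEN 8

int looks_like_speedfs(const uint8_t *boot_sector)
{
    /* Random data matches an 8-byte magic with probability 2^-64, so
       coincidental misdetection becomes very unlikely but never zero;
       another filesystem storing the same bytes at the same offset
       still gets through, which is why additional sanity checks on the
       header fields are worthwhile. */
    return memcmp(boot_sector, MAGIC, MAGIC_LEN) == 0;
}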
davidv1992 wrote: I have tried to design a good file system myself, and this is the result. I would like to hear if anyone has suggestions etc.

Some notes, based on the version of SpeedFS from the first post in this topic...
Brendan wrote: Lastly, try not to use file formats like "rich text" - they suck for people who aren't using Windows. Instead use plain ASCII text, HTML or PDF (I prefer HTML, and if you're using "Word for Windows" it's easy to do "save as HTML" when you're ready to publish the file).

I was happy he didn't make it an MS Word DOC file of any version. At least RTF can be somewhat read without a dedicated reader.
Candy wrote: I was happy he didn't make it an MS Word DOC file of any version. At least RTF can be somewhat read without a dedicated reader.

Possibly, although I'm not sure what with (I have nothing on my Gentoo machine that can read it)...
Brendan wrote: Hi, [...]

I used AbiWord to read it; I'm pretty sure KWord and OpenOffice.org can also open it. I will take your comments about the stuffing as true; I don't see why they wouldn't be.
[OFFTOPIC]
Brendan wrote: Possibly, although I'm not sure what with (I have nothing on my Gentoo machine that can read it)...