The "linked lists of extents" (or "LLoE" to save me some typing) would be used for storing arbitrary things.pauldinhqd wrote:These are great points however I'm still a bit confused whether linked-listBrendan wrote:If you want to develop a new high performance file system; I'd start by splitting it in half - a lower level part and a higher level part.
The lower level part would mainly manage "linked lists of extents" - allocating new extents, freeing them, resizing them, reading them and writing them. Naturally it'd also manage free space, handle things like bad sectors, etc; and it'd probably be the best place to do things like "sparse lists of extents" (used for sparse files support), encryption, compression, etc.
The higher level part relies on the lower level part. A file's data is just a linked lists of extents. Directories are linked lists of extents that contain directory entries instead of file data. More advanced things like snapshots and forks are just more linked lists of extents.
If you split things like this you should be able to test the lower level part on its own; and could possibly have several "lower level parts" and several "higher level parts" and combine them in various ways. For example, perhaps one "lower level part" that tries harder to avoid moving extents for SSD and USB flash, and another "lower level part" that tries harder to avoid fragmented extents for hard drives; and perhaps one "higher level part" designed to be efficient for lots of small files and one designed to be efficient for lots of large files (plus maybe a stable version of each and a beta/development version of each).
may give good performance, coz, let say when we need to find some file with its
name and its path, it seems that we to have to scan 'linearly' thru' certain linked-lists,
and what if these linked-lists contain lots of items
Possible to allow O(logN) file search on these linked-lists
What may happen when we seek to a byte in a very large file
If it makes it easier, think of it like "malloc()". The "malloc()" function just gives you a pointer you can use to access "n bytes of storage in RAM". It doesn't care what you store in the allocated space, and you can use it to store hash tables, btrees, arrays, structures or anything else. You also don't need to care if the storage space is actually contiguous or not - it might be many separate physical pages that look contiguous due to paging. Of course "malloc()" is just part of a set of functions, and you can "free()", "realloc()", etc too.
In the same way, the lower level gives you an "LLoE ID" (instead of a pointer) that you can use to access "n bytes of storage on disk" (instead of in RAM). It doesn't care what you store in the allocated space, and you can use it to store hash tables, btrees, arrays, structures or anything else. You also don't need to care if the storage space is actually contiguous or not - it might be many separate physical sectors (or extents) that look like contiguous space because the lower level does some translation to make offsets within the LLoE look contiguous even if they actually aren't. Of course the lower level would let you allocate LLoEs, free them, resize them, etc.
Cheers,
Brendan