thewrongchristian wrote:
I've been looking over the LeanFS spec, and while I'm mostly positive about it, and am seriously considering implementing it as the native FS for my OS, I have a few reservations about the directory entries:
Absolutely, and thank you for the comments.
thewrongchristian wrote:
With a single-byte length, we're limited to directory records of just under 4096 bytes (255 × 16), but the filename length uses a 16-bit field.
This seems a bit of a waste. Could these fields be split 12/12 bits instead, or could the sizes even be reversed, making recLen a byte count? After all, is a 255-byte filename limit really onerous?
I also see 16-byte alignment as quite wasteful. Why not 8-byte?
Why must nameLen be greater than zero? An unused entry has no name, surely, and so should be zero?
The main goal is simplicity. With 16-byte alignment, it is a simple task to "jump" to the next entry, and every entry stays paragraph-aligned. A later comment explains this a little better.
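To illustrate the "jump", here is a minimal C sketch that walks a directory's contents record by record, advancing by recLen × 16 bytes each time. The byte offsets of the type and recLen fields (DIRENT_TYPE_OFF, DIRENT_RECLEN_OFF) are hypothetical placeholders chosen for this example; consult the LeanFS specification for the actual field order of the 12-byte entry header.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL field offsets within the 12-byte entry header, chosen for this
   example only; check the LeanFS specification for the real order. */
#define DIRENT_TYPE_OFF   0   /* FileType; 0 means the entry is unused */
#define DIRENT_RECLEN_OFF 1   /* recLen, counted in 16-byte paragraphs */
#define DIRENT_PARAGRAPH  16

/* Walk a directory's contents by jumping recLen * 16 bytes at a time.
   Returns the number of in-use entries, or -1 if a record is malformed. */
static int count_entries(const uint8_t *dir, size_t dir_size)
{
    int in_use = 0;
    size_t off = 0;

    while (off < dir_size) {
        if (dir_size - off < DIRENT_PARAGRAPH)
            return -1;                      /* truncated record */
        size_t rec_bytes = (size_t)dir[off + DIRENT_RECLEN_OFF] * DIRENT_PARAGRAPH;
        if (rec_bytes == 0 || rec_bytes > dir_size - off)
            return -1;                      /* zero-length or overrunning record */
        if (dir[off + DIRENT_TYPE_OFF] != 0)
            in_use++;
        off += rec_bytes;                   /* the next entry starts exactly here */
    }
    return in_use;
}
```

Because every record is a whole number of paragraphs, the walk never needs to look at the name or its length to find the next entry.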
nameLen must be greater than zero when the entry is in use. If the entry's FileType field is zero, the now-unused parts of the entry are undefined. However, if a driver is concerned with undeleting an entry, the nameLen field becomes relevant, since it is needed to undelete the file. Undelete capabilities are not part of the specification, though, and are driver specific. Therefore, in theory, all entries, used and unused, will have a filename and need a length for that name. See a later comment for why this is true.
thewrongchristian wrote:
A single directory entry may span across different blocks.
I think this is a mistake. I prefer the ext2 restriction of not crossing block boundaries, with each directory block being self contained.
With a bigger recLen field, directory entries can claim all the space in a directory block up to block sizes of 64K, even if it's not used for the file name. Directory blocks can then be initialised with a single, large, unused directory entry.
Creating a new entry then becomes a simple matter of finding an existing entry with sufficient free space, and splitting it between the existing and the new entry.
Removing an entry involves simply merging the space of the entry being deleted into the recLen of the previous entry.
In both cases, there is no need to separately track the tail of the directory. Directories will be allocated and sized block by block.
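The ext2-style scheme proposed above could look roughly like this minimal C sketch. It assumes a hypothetical entry layout that is not the current LeanFS format: a one-byte nameLen and a 16-bit recLen holding a byte count, as suggested earlier in this post, with 16-byte padding kept. Insertion splits the first record that has enough slack; deletion folds the record into its predecessor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL layout for the proposal above, not the current LeanFS format:
   byte 0     type (0 = unused)
   byte 1     nameLen (so names are limited to 255 bytes)
   bytes 2-3  recLen as a little-endian byte count
   bytes 4-11 inode number (left untouched in this sketch)
   bytes 12+  name, record padded to 16 bytes */
#define HDR_SIZE 12
#define ALIGN    16

static uint16_t get_reclen(const uint8_t *e) { return (uint16_t)(e[2] | (e[3] << 8)); }
static void set_reclen(uint8_t *e, size_t n) { e[2] = (uint8_t)n; e[3] = (uint8_t)(n >> 8); }

/* Bytes an entry really needs: header plus name, rounded up to ALIGN. */
static size_t needed(size_t name_len)
{
    return (HDR_SIZE + name_len + ALIGN - 1) / ALIGN * ALIGN;
}

/* A fresh directory block is a single unused entry claiming all of it. */
static void block_init(uint8_t *blk, size_t block_size)
{
    memset(blk, 0, block_size);
    set_reclen(blk, block_size);
}

/* Add a name by splitting the first record with enough slack.
   Returns the new entry's offset, or -1 if it does not fit. */
static long block_insert(uint8_t *blk, size_t block_size, const char *name, uint8_t type)
{
    size_t name_len = strlen(name);
    if (name_len == 0 || name_len > 255)
        return -1;
    size_t want = needed(name_len);

    for (size_t off = 0; off < block_size; ) {
        uint8_t *e = blk + off;
        size_t rec = get_reclen(e);
        if (rec == 0)
            return -1;                          /* corrupt block */
        size_t used = e[0] ? needed(e[1]) : 0;  /* an unused record is all slack */
        if (rec >= used && rec - used >= want) {
            if (used) {                         /* keep the old entry, give away its tail */
                set_reclen(e, used);
                e += used;
                set_reclen(e, rec - used);
            }
            e[0] = type;
            e[1] = (uint8_t)name_len;
            memcpy(e + HDR_SIZE, name, name_len);
            return (long)(e - blk);
        }
        off += rec;
    }
    return -1;
}

/* Delete by folding the record into its predecessor; the first record of
   the block is simply marked unused. Returns 0, or -1 if del_off is not
   the start of a record. */
static int block_remove(uint8_t *blk, size_t del_off)
{
    if (del_off == 0) {
        blk[0] = 0;
        return 0;
    }
    size_t prev = 0;
    for (;;) {
        size_t rec = get_reclen(blk + prev);
        if (rec == 0 || prev + rec > del_off)
            return -1;                          /* corrupt, or not a record start */
        if (prev + rec == del_off)
            break;
        prev += rec;
    }
    set_reclen(blk + prev, get_reclen(blk + prev) + get_reclen(blk + del_off));
    return 0;
}
```

This mirrors how ext2 manages directory blocks: no separate free-space tracking is needed, because free space is always the slack at the end of some record.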
In theory, there should not be any records after the last used record. A directory is simply a file, nothing more.
For example, if a directory contains only '.', '..', and a single file, the directory's file will be only 16 + 16 + (12 + length of filename, padded to a 16-byte paragraph) bytes long. The fileSize field in the Inode holds this length. Only when another file is added to this directory will the driver add another entry, in turn increasing the size of the file. Therefore, in theory, there will be no empty entries. When a file is deleted, a driver may consolidate the used entries, removing the now-unused entry, so that in theory there is never an unused entry.
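A minimal C sketch of this size arithmetic, assuming only what the text states: a 12-byte fixed header per entry, padding to a 16-byte paragraph, and no empty records. The function names are hypothetical.

```c
#include <stddef.h>

#define DIRENT_HEADER 12  /* fixed part of a directory entry, per the text */
#define PARAGRAPH     16  /* each entry is padded to a 16-byte paragraph */

/* Size of one record: header plus name, rounded up to a paragraph. */
static size_t record_size(size_t name_len)
{
    return (DIRENT_HEADER + name_len + PARAGRAPH - 1) / PARAGRAPH * PARAGRAPH;
}

/* fileSize of a directory holding ".", ".." and the given names,
   assuming no empty records (the ideal case described above). */
static size_t dir_file_size(const size_t *name_lens, size_t count)
{
    size_t total = record_size(1) + record_size(2);   /* "." and ".." */
    for (size_t i = 0; i < count; i++)
        total += record_size(name_lens[i]);
    return total;
}
```

For a single file named "hello.txt" (9 bytes), dir_file_size works out to 16 + 16 + 32 = 64 bytes.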
As for the entries crossing a block boundary, again, a directory is simply a file, and a file should not know about block boundaries.
thewrongchristian wrote:
All in all, though, I like the look of LeanFS.
Using separate blocks for inodes is something I've looked at in the past, with a view to using tail packing to further reduce space usage of small files, but extended attributes are also a noble use of such space. Perhaps a file tail could be an extended attribute itself, best of all worlds.
One problem with inodes as blocks is the loss of indirection, which makes the block location fixed once set in the directory entry and mandates the use of in-place updates (perhaps mitigated by a journal).
The other problem I foresee is that inodes can no longer be found other than via directory entries. So the loss of a directory block or entry to corruption makes the file contents lost as well, with little hope of fsck finding and linking it back into "/lost+found".
The use of extents is also a win, much more compact (and simpler IMO) than direct and indirect block pointers.
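For comparison with block pointers, here is a minimal C sketch of mapping a file-relative block number to a physical block through an extent list. The struct extent type and the use of 0 as a "not mapped" sentinel are assumptions for illustration, not the LeanFS on-disk structures.

```c
#include <stdint.h>

/* HYPOTHETICAL in-memory extent: `count` contiguous blocks starting at
   physical block `start`. Not the exact LeanFS on-disk layout. */
struct extent {
    uint64_t start;
    uint32_t count;
};

/* Map a file-relative block to a physical block by walking the extents in
   file order. Returns 0 (a hypothetical "no block" sentinel) if unmapped. */
static uint64_t map_block(const struct extent *ext, int n_ext, uint64_t file_block)
{
    for (int i = 0; i < n_ext; i++) {
        if (file_block < ext[i].count)
            return ext[i].start + file_block;
        file_block -= ext[i].count;     /* skip past this extent */
    }
    return 0;
}
```

One start/count pair covers an arbitrarily long contiguous run, whereas direct and indirect pointers need one entry per block.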
Again, the goal is simplicity. A directory is simply a file made up of records, ideally (though it is not mandatory) with no empty records, and especially no trailing empty records. With the ability to preallocate extra blocks when the file (directory) is created, in theory no allocation is needed to add a new record to the file, which makes this a quick and simple task. The driver only needs to allocate a new extent when the current preallocated extents are consumed, again preallocating extra extents. See the Superblock's preallocCount entry and the Inode's iaPrealloc attribute.
With a block size of 4096, a preallocCount of 1 (allocating 2 blocks when the directory is created), an average filename length of 20 (a 32-byte record), and the unused remainder of the Inode's block used for file contents, approximately 250 directory entries can be added before a new extent is needed. Since there is no Inode in the first block of the next allocation, it will allow 256 entries to be added before preallocation is again needed. Simply increasing preallocCount to 3 (4 blocks in total) doubles the number of entries that can be added before allocation is needed.
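These figures can be checked with a few lines of C. This is a sketch assuming a 32-byte record (12-byte header plus a 20-byte name, already a multiple of 16) and that each allocation covers preallocCount + 1 blocks. The Inode overhead for the first allocation (176 bytes in the usage note) is a hypothetical value chosen for illustration; check the specification for the real figure.

```c
#include <stddef.h>

/* Entries that fit in one allocation of block_size * (prealloc_count + 1)
   bytes. inode_bytes is the space the Inode takes in the first block
   (0 for later allocations); the value to pass is an assumption. */
static size_t entries_per_alloc(size_t block_size, size_t prealloc_count,
                                size_t inode_bytes, size_t avg_name_len)
{
    size_t entry = (12 + avg_name_len + 15) / 16 * 16;  /* header + name, padded */
    size_t bytes = block_size * (prealloc_count + 1) - inode_bytes;
    return bytes / entry;
}
```

With the hypothetical 176-byte Inode, entries_per_alloc(4096, 1, 176, 20) gives 250; later allocations give entries_per_alloc(4096, 1, 0, 20) = 256, entries_per_alloc(4096, 3, 0, 20) = 512, and entries_per_alloc(512, 7, 0, 20) = 128.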
Therefore, in theory, with preallocation a file (in this case a directory) allocates more space than it needs at creation. When a file is added to the directory, only the Inode's checksum, fileSize, and time fields are modified, along with the relevant extent(s). There is no need to access the Superblock, the bitmap, or any other part of the volume outside of the Inode and its already allocated extents.
So, to (hopefully) answer your questions: a directory should have no empty entries, though this is not mandatory. Allocating large unused entries and splitting them in two when adding an entry is perfectly allowed. However, if no empty entries are included, it is a simple task to add to the end of the directory. You already know the offset within the file (fileSize), and you should already have preallocated extents (it is very simple to check whether you do), so you simply need to append an entry to the file. With the example above, you will only need to allocate more space on every 256th entry added, a very small percentage. A block size of 512 and a preallocCount of 7 (8 total blocks) will need an allocation on every 128th entry added, a similarly small percentage.
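The append path might be sketched as follows. struct dir_state and dir_append are hypothetical names, the allocator is only simulated by a counter, and the Inode's space in the first block is ignored for simplicity. The common case only moves fileSize forward; the allocator runs only when the preallocated space is exhausted.

```c
#include <stdint.h>

/* HYPOTHETICAL bookkeeping for a directory being appended to.
   allocated = bytes covered by the file's extents (preallocated included);
   file_size = the Inode's fileSize. */
struct dir_state {
    uint64_t file_size;
    uint64_t allocated;
    uint64_t block_size;
    uint32_t prealloc_count;
    unsigned allocations;   /* how often the simulated allocator ran */
};

/* Append one entry at offset fileSize and return that offset. Only when the
   preallocated space runs out are preallocCount + 1 more blocks "allocated". */
static uint64_t dir_append(struct dir_state *d, uint64_t name_len)
{
    uint64_t rec = (12 + name_len + 15) / 16 * 16;  /* header + name, padded */
    uint64_t off = d->file_size;                    /* always append at the end */
    while (off + rec > d->allocated) {
        d->allocated += d->block_size * (d->prealloc_count + 1);
        d->allocations++;   /* a real driver would touch the bitmap here */
    }
    d->file_size = off + rec;   /* only fileSize (plus checksum/times) changes */
    return off;
}
```

Starting from an empty directory with 512-byte blocks and a preallocCount of 7, appending 128 entries with 20-byte names triggers one allocation, and the 129th entry triggers the second.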
As for the "/lost+found" aspect, keeping a list of Inodes was once discussed between Salvo and me. However, we came to the conclusion that if it were added, we would simply be reinventing Ext2, which is not what we intended to do. We wish to keep the filesystem simple.
I do appreciate the comments and don't hesitate to continue. If you have more questions while implementing this filesystem, feel free to post.
Thank you,
Ben