Nora's Reliable FileSystem (NRFS) v0.1
Posted: Sun Sep 25, 2022 6:44 am
Hello,
I finished the first version of my custom filesystem. You can find it here. Current features include:
- Error detection (not correction) with a fast 64-bit hash.
- Transparent compression with LZ4 (can be extended with other compression algorithms).
- Transactional updates, i.e. the filesystem will never be in an inconsistent state on disk even if power is cut in the middle of a transaction.
- Sparse files.
- Up to 2^32 entries per directory, indexed with a hashmap and a cryptographic hash.
- File names up to 255 bytes long.
- Extensions that can be enabled per directory. Current extensions include "unix", which adds UID, GID and permissions to each entry, and "mtime", which adds a 64-bit modification time in milliseconds to each entry.
- Small files (up to 2^16 bytes) can be embedded directly inside directories to reduce space usage and improve locality.
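To illustrate the last point, here is a minimal sketch of the inline-vs-external decision. The entry layout, `alloc_object` helper, and dict representation are made up for illustration and are not the actual NRFS on-disk format; only the 2^16-byte threshold and 255-byte name limit come from the feature list.

```python
# Illustrative sketch (not the real NRFS format): files at or below an
# inline threshold are stored inside the directory entry itself, larger
# ones are stored as a separate object referenced by ID.

INLINE_MAX = 1 << 16  # 64 KiB, per the feature list

objects = []  # stand-in for the object store

def alloc_object(data: bytes) -> int:
    """Store data as a separate object and return its ID."""
    objects.append(data)
    return len(objects) - 1

def make_entry(name: bytes, data: bytes) -> dict:
    """Build a directory entry, embedding small files directly."""
    assert len(name) <= 255  # file names up to 255 bytes
    if len(data) <= INLINE_MAX:
        return {"name": name, "inline": data}
    return {"name": name, "object_id": alloc_object(data)}

small = make_entry(b"notes.txt", b"hello")
large = make_entry(b"big.bin", bytes(1 << 20))
assert "inline" in small        # small file embedded in the entry
assert "object_id" in large     # large file stored out-of-line
```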
Why yet another filesystem?
Personally, I'm a fan of filesystems that support transparent compression since you can save a huge amount of space with it. On my Linux desktop I use ZFS, which currently reports a 1.96x compressratio, i.e. the total size of all stored data is roughly halved, which nearly doubles the effective capacity of my main disk.
However, ZFS is very complex and porting it to my own OS would likely be very difficult. I've looked at other filesystems with compression, such as Btrfs, but I'm unable to find any that isn't either very complex or a read-only FS. Hence I decided to roll my own.
Design (briefly)
I ensured that all updates can be performed atomically so as to reduce the odds of the filesystem becoming corrupt. I went for a transaction-based system instead of a journal-based one since a) it makes it easier (IMO) to ensure consistency and b) while there is constant copying between host and device either way, a transaction-based system should need fewer copies, as data doesn't have to be written twice.
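The general idea behind such transactional updates can be sketched as copy-on-write with a single atomic root flip. This toy model is not the actual NRFS commit protocol; it only shows why a power cut mid-transaction cannot leave a mix of old and new data:

```python
# Toy model of transactional (copy-on-write) updates: new data is written
# to fresh blocks, and a transaction commits by atomically replacing the
# root pointer. Until the root is flipped, the old state stays intact.

class Disk:
    def __init__(self):
        self.blocks = {}   # block ID -> bytes
        self.root = {}     # committed state: name -> block ID
        self.next_id = 0

    def write_block(self, data: bytes) -> int:
        bid, self.next_id = self.next_id, self.next_id + 1
        self.blocks[bid] = data
        return bid

class Transaction:
    def __init__(self, disk: Disk):
        self.disk = disk
        self.pending = dict(disk.root)  # shadow copy of the root map

    def put(self, name: str, data: bytes):
        # Out-of-place write: never overwrites a live block.
        self.pending[name] = self.disk.write_block(data)

    def commit(self):
        # The single "atomic" step: swap in the new root.
        self.disk.root = self.pending

disk = Disk()
tx = Transaction(disk)
tx.put("a", b"old")
tx.commit()

tx2 = Transaction(disk)
tx2.put("a", b"new")
# Simulated crash before commit: the committed state is unchanged.
assert disk.blocks[disk.root["a"]] == b"old"
tx2.commit()
assert disk.blocks[disk.root["a"]] == b"new"
```

Contrast with a journal: there the new data is written twice (once to the journal, once in place), whereas here each block is written once and only the root pointer changes at commit.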
All data is grouped into power-of-two sized "records". Each record points to a variable number of blocks. By splitting data into records it is possible to compress and decompress only parts of a large file instead of the entire file.
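A minimal sketch of per-record compression, using zlib as a stand-in for LZ4 (which is not in the Python standard library); the 2^16-byte record size is an assumed value. Reading a byte range only decompresses the records that cover it:

```python
import zlib

# Each record is compressed independently, so random reads only touch
# the records overlapping the requested range, not the whole file.
RECORD_SIZE = 1 << 16  # power-of-two record size (assumed value)

def compress_file(data: bytes) -> list[bytes]:
    """Split data into fixed-size records and compress each one."""
    return [zlib.compress(data[i:i + RECORD_SIZE])
            for i in range(0, len(data), RECORD_SIZE)]

def read_range(records: list[bytes], offset: int, length: int) -> bytes:
    """Read a byte range, decompressing only the overlapping records."""
    first = offset // RECORD_SIZE
    last = (offset + length - 1) // RECORD_SIZE
    buf = b"".join(zlib.decompress(records[i]) for i in range(first, last + 1))
    start = offset - first * RECORD_SIZE
    return buf[start:start + length]

data = bytes(range(256)) * 1024            # 256 KiB -> 4 records
records = compress_file(data)
assert read_range(records, 70000, 16) == data[70000:70016]
```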
While all modern disks have ECC to protect against bitrot, a hash has been added to records anyway. This should help catch errors introduced during transmission, which can occur over a poor link. While the hash only provides detection, not correction, it at least lets the user become aware of issues immediately instead of letting corruption happen silently.
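Per-record verification can be sketched as follows. The feature list mentions a fast non-cryptographic 64-bit hash; BLAKE2b truncated to 8 bytes is used here only because it is available in the standard library, not because NRFS uses it:

```python
import hashlib

def hash64(data: bytes) -> bytes:
    # Stand-in for a fast 64-bit hash (e.g. xxHash-style).
    return hashlib.blake2b(data, digest_size=8).digest()

def store_record(data: bytes) -> dict:
    """Store a record together with its hash."""
    return {"hash": hash64(data), "data": data}

def load_record(rec: dict) -> bytes:
    """Return the record data, failing loudly on a hash mismatch."""
    # Detection only: a mismatch is reported immediately instead of
    # silently returning corrupt data.
    if hash64(rec["data"]) != rec["hash"]:
        raise IOError("record hash mismatch (corruption detected)")
    return rec["data"]

rec = store_record(b"important data")
assert load_record(rec) == b"important data"

rec["data"] = b"important dat4"   # simulate corruption in transit
try:
    load_record(rec)
    raised = False
except IOError:
    raised = True
assert raised                      # corruption is caught, not silent
```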
I use hashmaps for directories instead of e.g. B-trees because the object layer of NRFS acts as a sort of MMU, i.e. it exposes each object as a contiguous address space. Since hashmaps are much simpler to implement and the object layer makes them practical, I went with that.
Specification & Future work
I made a specification of NRFS, which is 430 lines long. I wrote the specification first and then built the implementation from it, so the specification should be complete as of writing.
While the storage format is mostly finished, the implementation is admittedly not great: it doesn't support asynchronous requests, it writes out records redundantly, and it doesn't handle errors such as a hash mismatch gracefully. I will address this once I figure out a good design for other planned features (notably mirroring and error correction).
Feedback is welcome!