Fault Tolerance and Data Integrity in a File System
Posted: Tue Oct 10, 2017 6:55 pm
To a large extent, there is a tradeoff between simplicity and what may be called, for lack of a better phrase, "useful features". I'm trying to create a file system that makes use of some useful features (e.g., keeping track of who "owns" a file) while maintaining simplicity.
I've run into a couple issues; I'm going to use FAT to explain the issues, as I assume we're all familiar with FAT.
Fault Tolerance
Let's say I want to create a file named "MyFile" in the root directory, that it takes 3 clusters, and that clusters x, y, & z are free. I need to:
A) make an entry in the root directory for a file named "MyFile" that starts at cluster x
B1) modify the xth entry in the FAT to point to y
B2) modify the yth entry in the FAT to point to z
B3) modify the zth entry in the FAT to be "End of file"
C) write the contents of the file to the clusters x,y,z
The most intuitive order is the one shown above, but it is not fault tolerant; to cite but one example, if the computer loses power after B3 the file system will appear to be valid but "MyFile" will contain garbage.
The first step should be C; if we lose power after that we "lose" our data (it's there on the disk, but we have no idea it's there), but the file system is in a valid state. The data loss can't be considered that big a deal, as there is literally no way to prevent data loss in the case of abrupt power failure.
If step A is done next, a power failure will result in "MyFile" showing that it starts at cluster x, but cluster x will be marked as free, and we won't know where clusters y and z are. On top of that, we have no way of knowing there is any problem until an attempt is made to access the file. So A cannot be the 2nd step.
If, after step C, step B is done in reverse (B3, then B2, then B1), any interruption in the middle or at the end of step B can be caught, as there will be some or all of a cluster chain that no file points to. However, this can only be caught with a full scan of the file system -- and there's no way of knowing that such a scan is necessary.
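To make the ordering concrete, here is a minimal sketch of the "safe" order (C, then B3..B1 tail-first, then A) against a toy in-memory FAT. All names here are hypothetical illustrations, not a real FAT implementation:

```python
# Toy FAT model: `fat` maps cluster -> next cluster, `clusters` maps
# cluster -> data, `root_dir` maps file name -> first cluster.
EOF = -1  # "End of file" marker in the FAT

def create_file(name, data_blocks, free_clusters, fat, clusters, root_dir):
    """Write a file so a crash at any point leaves the FS consistent."""
    chain = free_clusters[:len(data_blocks)]

    # Step C: write the data before any metadata refers to it.
    for c, block in zip(chain, data_blocks):
        clusters[c] = block

    # Steps B3..B1: link the chain tail-first, so a partial chain is
    # only ever an orphan (unreachable), never a dangling pointer.
    fat[chain[-1]] = EOF
    for c, nxt in reversed(list(zip(chain, chain[1:]))):
        fat[c] = nxt

    # Step A: publish the directory entry last; until this write,
    # nothing refers to the new clusters.
    root_dir[name] = chain[0]

def read_file(name, fat, clusters, root_dir):
    """Follow the cluster chain starting at the directory entry."""
    out, c = [], root_dir[name]
    while c != EOF:
        out.append(clusters[c])
        c = fat[c]
    return out
```

A crash before the final `root_dir[name] = chain[0]` write leaves, at worst, orphaned clusters marked used in the FAT -- exactly the "needs a scan to reclaim" state described above.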
It looks as if my options are:
1. Add a journal
2. Have a file system that is not fault tolerant.
My primary design goal is simplicity. While a journal obviously isn't that complicated, it does add a tad more complexity. So,
QUESTION ONE: Is there a way to ensure the integrity of the file system without a journal?
One idea I have for ensuring the integrity of the data on the disk is to have a checksum or a hash for every file. At creation time this is no big deal: compute the magic number while writing the file. After the fact it presents problems: checking the checksum/hash every time a file is read is no problem for a file that's a couple KB, but what about one that's 50GB? Reading a single byte from the file would incur the enormous cost of reading the entire file to ensure it checks out first. Not performing that check would defeat the purpose of having the checksum/hash. The solution I've come up with is to have a checksum/hash of every cluster. So,
QUESTION TWO Is there a better way of ensuring the integrity of data read from the disk than having a checksum or hash of every cluster (or group of clusters)?
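For reference, the per-cluster idea can be sketched in a few lines: store a CRC32 alongside each cluster and verify only the clusters actually read, so reading one byte of a 50GB file costs one cluster's worth of checking. The `store`/`sums` structures below are a hypothetical stand-in for on-disk layout:

```python
import zlib

def write_cluster(store, sums, index, data):
    """Write a cluster and record its CRC32 alongside it."""
    store[index] = data
    sums[index] = zlib.crc32(data)

def read_cluster(store, sums, index):
    """Read a cluster, failing loudly if it doesn't match its checksum."""
    data = store[index]
    if zlib.crc32(data) != sums[index]:
        raise IOError(f"cluster {index} failed its checksum")
    return data
```

Whether CRC32 is strong enough (versus a cryptographic hash), and where the per-cluster checksums themselves live on disk, are separate design choices.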
I hope I've managed to adequately explain myself.