
Re: About LeanFS

Posted: Fri Dec 09, 2022 5:59 am
by thewrongchristian
BenLunt wrote:After much consideration and thought, I have taken your advice and removed the option of multiple text encodings. UTF-8 is now specified for all text strings.
And there was much rejoicing :)
May I quote something I saw over on hackaday.org today:
"but it better be better...to falling so in love with your idea, that you lose sight of what it really means to be better."
:-)

Thank you all for your comments. The new specs are up, with an addition. This addition makes all aspects of the volume have a checksum (when extended checksums are used). The Superblock, Inodes, Indirects, Extents, and now the bitmap.
The spec doesn't indicate how the block bitmap checksum is calculated.

It can't use the existing checksum definition, because:

[*] The existing checksum reserves the first uint32_t for the checksum itself, and so doesn't include it in the checksum.
[*] The existing checksum is for a contiguous data structure, and the bitmap is not contiguous.

With that in mind, perhaps it's worth redefining the checksum procedure, so that:

[*] All the passed in data is included in the checksum, not skipping the first field (this skipping would have to be done by the caller if required.)
[*] Pass in a partial checksum, that will be used instead of 0 at the start.

With the above, we can run the checksum against each bitmap in turn, keeping track of the partial checksum on the way, and putting the final checksum in the superblock.
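A minimal sketch of the two changes proposed above, to make them concrete. The function names are hypothetical, and the rotate-right-and-add step is only an assumption standing in for whatever per-dword step the spec defines; the point is the seed parameter and leaving the skip to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold `count` dwords into a running checksum.
 * `seed` is 0 for the first chunk, or the value returned for the
 * previous chunk. The rotate-right-by-one-and-add step is an
 * assumption, not taken from the spec text quoted in this thread. */
uint32_t lean_checksum_update(uint32_t seed, const uint32_t *data, size_t count)
{
    uint32_t sum = seed;
    for (size_t i = 0; i < count; i++)
        sum = ((sum << 31) | (sum >> 1)) + data[i];
    return sum;
}

/* Caller-side skipping for structures whose first dword stores the
 * checksum itself: start one dword in, with a seed of 0. */
uint32_t lean_checksum_struct(const uint32_t *data, size_t count)
{
    assert(count >= 1);
    return lean_checksum_update(0, data + 1, count - 1);
}
```

For the bitmap, a driver would call lean_checksum_update() once per band, feeding each result in as the seed for the next band. Because the routine is a plain left-to-right fold, the result is the same as checksumming one contiguous copy of the whole bitmap.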

You'd also need this for the data checksum for the per-extent checksums.

Re: About LeanFS

Posted: Fri Dec 09, 2022 11:05 am
by thewrongchristian
thewrongchristian wrote:
You'd also need this for the data checksum for the per-extent checksums.
Actually, I'm not sure this checksumming of extent data would work practically.

Consider creating a large file. You know how big it is going to be, so you can reserve all the space up front, and the LeanFS driver might find and assign a single extent to it.

You now have to checksum that whole file in one go.

Worse, when you come to read it in later, you have to load the entire extent into memory to checksum it, before it can be passed back to the user buffer. You can perhaps stream it to limit the amount of memory in use doing it, but you will still have to load all the data.
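To illustrate the streaming option, here is a rough sketch that checksums an extent through a small fixed buffer. The read callback, the chunk size, and the rotate-and-add step are all assumptions made for illustration, not anything from the spec. Memory use stays bounded, but every dword of the extent is still read before the result is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_DWORDS 128 /* 512 bytes of scratch space; arbitrary choice */

/* Hypothetical read callback: copy `count` dwords starting at dword
 * offset `pos` of the extent into `dst`. Returns 0 on success. */
typedef int (*extent_read_fn)(void *ctx, uint64_t pos, uint32_t *dst, size_t count);

/* Checksum `total_dwords` of an extent without holding it all in memory.
 * The per-dword step (rotate right by one, then add) is an assumption. */
int extent_checksum_stream(extent_read_fn read, void *ctx,
                           uint64_t total_dwords, uint32_t *out)
{
    uint32_t buf[CHUNK_DWORDS];
    uint32_t sum = 0;
    uint64_t pos = 0;

    while (pos < total_dwords) {
        uint64_t left = total_dwords - pos;
        size_t n = left < CHUNK_DWORDS ? (size_t)left : CHUNK_DWORDS;
        if (read(ctx, pos, buf, n) != 0)
            return -1; /* propagate the I/O failure */
        for (size_t i = 0; i < n; i++)
            sum = ((sum << 31) | (sum >> 1)) + buf[i];
        pos += n;
    }
    *out = sum;
    return 0;
}

/* In-memory reader, used only to exercise the sketch. */
int mem_read(void *ctx, uint64_t pos, uint32_t *dst, size_t count)
{
    memcpy(dst, (const uint32_t *)ctx + pos, count * sizeof(uint32_t));
    return 0;
}
```

Note that the data cannot be handed to the user buffer as verified until the loop has finished, so a read of the first block of a huge extent still costs a pass over the whole extent.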

Re: About LeanFS

Posted: Fri Dec 09, 2022 3:45 pm
by zaval
Dealing with UTF-8 in the UEFI environment is meh. There will hardly be a UEFI driver for the FS, so any OS loader that wants to use LeanFS would have to mess with UTF-8 in the FS, whereas everything else in UEFI is UTF-16. It's up to the developers of the FS, but I'd rather stick with UTF-16. UTF-8 is easy only for basic Latin letters; once something else pops up, it becomes ugly. And endianness... please, every general-purpose CPU architecture nowadays is natively little-endian (x86, ARM, RISC-V).
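For a sense of what "messing with UTF-8" would involve in a loader, here is a rough sketch of a UTF-8 to UTF-16 conversion for file names. The function name and its error convention are made up for illustration; nothing here comes from the LeanFS spec or from the UEFI headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert `len` bytes of UTF-8 into UTF-16 code
 * units. Returns the number of units written, or (size_t)-1 if the
 * input is malformed or `dst` (capacity `cap` units) is too small. */
size_t utf8_to_utf16(const uint8_t *src, size_t len, uint16_t *dst, size_t cap)
{
    /* Smallest code point allowed for each sequence length. */
    static const uint32_t min_cp[4] = {0, 0x80, 0x800, 0x10000};
    size_t i = 0, o = 0;

    while (i < len) {
        uint32_t cp;
        size_t extra;
        uint8_t b = src[i++];

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;          /* stray continuation or invalid lead */

        if (len - i < extra)
            return (size_t)-1;           /* truncated sequence */
        for (size_t k = 0; k < extra; k++) {
            uint8_t c = src[i++];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;       /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* Reject overlong forms, surrogates, and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (o + 1 > cap) return (size_t)-1;
            dst[o++] = (uint16_t)cp;
        } else {
            if (o + 2 > cap) return (size_t)-1;
            cp -= 0x10000;               /* encode as a surrogate pair */
            dst[o++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return o;
}
```

The ASCII path is a single branch; the extra work is all in the multi-byte and validation cases.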

Re: About LeanFS

Posted: Fri Dec 09, 2022 4:04 pm
by BenLunt
thewrongchristian wrote:The spec doesn't indicate how the block bitmap checksum is calculated.

It can't use the existing checksum definition, because:

[*] The existing checksum reserves the first uint32_t for the checksum itself, and so doesn't include it in the checksum.
[*] The existing checksum is for a contiguous data structure, and the bitmap is not contiguous.

With that in mind, perhaps it's worth redefining the checksum procedure, so that:

[*] All the passed in data is included in the checksum, not skipping the first field (this skipping would have to be done by the caller if required.)
[*] Pass in a partial checksum, that will be used instead of 0 at the start.

With the above, we can run the checksum against each bitmap in turn, keeping track of the partial checksum on the way, and putting the final checksum in the superblock.
I can clarify it a little, but the checksum covers every dword of the bitmap. In theory, the bitmap is contiguous. In reality, it is not. How to handle that is up to the implementation. One way would be to read the whole bitmap (all bands) into one contiguous buffer and check it. Most likely, the implementation would already expect to load the whole bitmap anyway. Another would be to read each band's bitmap in turn and, as you suggest, use a partial checksum routine. Either way, each and every dword of the bitmap must be checked as a whole.
thewrongchristian wrote:You'd also need this for the data checksum for the per-extent checksums.
The extents are contiguous, unless you read a smaller block count (not the whole extent) at a time, as you commented. I will also discuss this in my reply to that post.

I can expand on the example checksum code and post to the specs.

Thanks again. It is much appreciated.
Ben

Re: About LeanFS

Posted: Fri Dec 09, 2022 4:14 pm
by BenLunt
thewrongchristian wrote:
thewrongchristian wrote:
You'd also need this for the data checksum for the per-extent checksums.
Actually, I'm not sure this checksumming of extent data would work practically.

Consider creating a large file. You know how big it is going to be, so you can reserve all the space up front, and the LeanFS driver might find and assign a single extent to it.

You now have to checksum that whole file in one go.

Worse, when you come to read it in later, you have to load the entire extent into memory to checksum it, before it can be passed back to the user buffer. You can perhaps stream it to limit the amount of memory in use doing it, but you will still have to load all the data.
I did consider having large extents and having to do as you say. However, I only have two thoughts.

1) You only have to calculate the CRC at close-of-file time. How often does this happen? (Which reminds me, I probably should clarify that as well.) (I am retracting this. I forgot to take file sharing into account. Sorry, it's been a long day.)
2) I thought of putting a flag in the Inode to indicate whether it uses the Extended Extents or a standard extent. If 'capabilities' indicated so *and* the inode indicated so, then use the extended extents.
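As a sketch of what the second thought would mean for a driver: the check is a simple AND of a volume-level bit and an inode-level bit. Both flag names and bit positions below are invented for illustration and are not defined in the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Both bit positions are made up for illustration; not from the spec. */
#define CAP_EXTENDED_EXTENTS   (1u << 0) /* hypothetical superblock 'capabilities' bit */
#define INODE_EXTENDED_EXTENTS (1u << 0) /* hypothetical inode attribute bit */

/* Use extended extents only if the volume allows them AND this
 * particular inode opted in. */
bool inode_uses_extended_extents(uint32_t capabilities, uint32_t inode_attributes)
{
    return (capabilities & CAP_EXTENDED_EXTENTS) != 0 &&
           (inode_attributes & INODE_EXTENDED_EXTENTS) != 0;
}
```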

However, I came to the conclusion that you don't really need to use the Extended Extents very often. Say you are sending a thumb drive to a friend. The implementation could verify the data, then convert the volume to no longer use the Extended Extents. It would take a moment, but would only need to be done once. There are a few other times you would want to verify the data as well.

Anyway, again, I appreciate the comments,
Ben