How to handle read & write failures in a file system
Sectors that cannot be read (or written) seem like the most complicated issue in a file system, on a par with handling removal of the disc device. I don't like the idea of cluttering all of the filesystem code with tests for correct reading & writing of sector data. There is also no straightforward way to propagate a failure to read the disc to user-mode operations like open, read & write, and besides, I don't want this mess in the file API either.
So, I don't want the disc buffering, the metadata decoding, or even the file chunk reading & writing to have to test for bad sectors. If the disc is attached, and the operation is correct, then it should always return success, regardless of problems in the raw data from the disc drive. My idea is that if a sector is bad (cannot be read), then the disc driver will fire an event with information about the operation type, partition, and relative sector. Userspace can record these events and raise an alarm if there are too many, so the disc drive can be replaced. The file system server can also handle these events and fix issues in the filesystem to minimize impact. Bad sectors will by default be filled with zeros in the disc cache, but this can be modified by the server.
How do Linux and Windows handle this?
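The event mechanism described above is not tied to any particular implementation, but a minimal sketch in C (all type and function names here are hypothetical) might look like this:

```c
#include <stdint.h>

/* Hypothetical event record fired by the disc driver when a sector
   cannot be read or written. */
enum disc_op { DISC_OP_READ, DISC_OP_WRITE };

struct disc_error_event {
    enum disc_op op;     /* operation that failed */
    uint32_t partition;  /* partition the sector belongs to */
    uint64_t rel_sector; /* sector number relative to the partition */
};

/* Userspace logger: count events and signal an alarm once too many
   bad sectors have been seen, so the drive can be replaced. */
struct disc_error_log {
    unsigned count;
    unsigned alarm_threshold;
};

/* Records one event; returns 1 once the alarm threshold is reached. */
int disc_log_event(struct disc_error_log *elog,
                   const struct disc_error_event *ev)
{
    (void)ev; /* a real logger would also persist the event details */
    elog->count++;
    return elog->count >= elog->alarm_threshold ? 1 : 0;
}
```

The file system server would subscribe to the same event stream to patch up metadata; this logger only implements the "replace the drive" alarm side.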
Re: How to handle read & write failures in a file system
rdos wrote:
> If the disc is attached, and the operation is correct, then it should always return success, regardless of problems in the raw data from the disc drive. [...] Bad sectors will by default be filled with zeros in the disc cache, but this can be modified by the server.

How does an application tell the difference between a successful read of a sector full of zeroes and a failed read?
Re: How to handle read & write failures in a file system
Octocontrabass wrote:
> How does an application tell the difference between a successful read of a sector full of zeroes and a failed read?

I think the main question is what an application should do if it gets an unexpected read error on a file. A lot of code would either fail the entire operation, possibly aborting some critical application, or ignore the error and reuse the previous buffer content. There really aren't many other alternatives, since it cannot guess the contents. By filling the content with zeros you avoid reusing the previous buffer.
Re: How to handle read & write failures in a file system
Linux and Windows just pass the error through the file system to the user. (It gets more complicated when the error happens after the file is closed; then the error should generally be returned from sync() or similar instead.)
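As a sketch of what this looks like at the POSIX level (a generic illustration, not a description of any specific kernel's internals): read() reports a media error as -1 with errno set to EIO, and a caller can propagate that as a negative error code:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Propagate a read failure as a negative errno value instead of the
   -1/errno convention; the caller just checks the sign. */
ssize_t read_checked(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n < 0)
        return -errno; /* e.g. -EIO for a bad sector, -EBADF for a bad fd */
    return n;
}
```

Write errors that only surface after the data has left the page cache come back the same way from fsync() or close(), which is the sync() case mentioned above.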
Well-designed applications can handle I/O errors (e.g., common databases), others just die when they encounter errors.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Re: How to handle read & write failures in a file system
Korona wrote:
> Linux and Windows just pass the error through the file system and pass it to the user. (It gets more complicated when the error happens after the file is closed; then the error should generally be returned from sync() or similar instead.)

That implies the error is in the file data sectors. What if some part of the directory entry that the file resides in is bad? Or some part of the FAT? The latter errors cannot reasonably be reported to the application as part of open, read or write.
Korona wrote:
> Well-designed applications can handle I/O errors (e.g., common databases), others just die when they encounter errors.

Databases are typically not implemented inside file systems.
Take another example. One sector of an executable file is bad. If somebody tries to load the executable, will the load operation itself prevent it from running? Or will accessing that particular part cause random garbage to execute?
Re: How to handle read & write failures in a file system
rdos wrote:
> What if some part of the directory entry that the file resides in is bad? Or some part of the FAT? The latter errors cannot reasonably be reported to the application as part of open, read or write.

Why not? The application doesn't care why the operation failed, just that it did.
rdos wrote:
> Take another example. One sector of an executable file is bad. If somebody tries to load the executable, will the load operation itself prevent it from running? Or will accessing that particular part cause random garbage to execute?

If you always return success, garbage will be executed.
If you propagate failures up to the requestor, then whatever requested to put that part of the executable into memory will see the failure. If the loader attempts to put the entire executable into memory before running it, the loader will see the error and fail to start the program. If the executable is loaded as needed using demand paging, the page fault handler will see the error and terminate the program.
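The eager-loading path described here can be sketched with plain POSIX calls (load_image is a hypothetical helper, not part of any real loader): the whole image is read before anything runs, so a media error aborts the load instead of executing garbage.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read the entire executable image up front so any media error
   surfaces before the program ever runs. With demand paging, the
   same error would instead arrive as a failed page-in. */
long load_image(const char *path, void *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1; /* cannot even open the executable */
    size_t total = 0;
    for (;;) {
        ssize_t n = read(fd, (char *)buf + total, cap - total);
        if (n < 0) {           /* e.g. EIO: refuse to start the program */
            close(fd);
            return -1;
        }
        if (n == 0)
            break;             /* whole image read successfully */
        total += (size_t)n;
        if (total == cap)
            break;             /* buffer full */
    }
    close(fd);
    return (long)total;
}
```

A demand-paging loader would make the same check inside its page-fault handler and terminate the process on failure, as described above.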
Re: How to handle read & write failures in a file system
Why can't metadata errors be reported? Linux and Windows report them just like data I/O errors: Linux returns EIO, Windows returns ERROR_IO_DEVICE (although Windows also has some more specific error codes).
Re: How to handle read & write failures in a file system
Octocontrabass wrote:
> Why not? The application doesn't care why the operation failed, just that it did.

If the directory entry that contains the file name is bad, then you will get back "file not found". Unless you interpret this as an error and report an IO error instead, but then it will be reported regardless of which file you try to open. If the FAT is corrupt, then the alternative FAT copy might be used, and then the application shouldn't get an error, but you still have a serious error condition on the disc that will essentially go unreported. Basically, both of these are conditions that are better reported through a disc event interface than through normal file IO.
Octocontrabass wrote:
> If you always return success, garbage will be executed.

Zeros are not random garbage, and when executed on x86 will typically result in faults.
Octocontrabass wrote:
> If you propagate failures up to the requestor, then whatever requested to put that part of the executable into memory will see the failure. If the loader attempts to put the entire executable into memory before running it, the loader will see the error and fail to start the program. If the executable is loaded as needed using demand paging, the page fault handler will see the error and terminate the program.

This is very problematic behavior for embedded systems. In that case you don't know why the program terminated, and so you simply try to restart it in a loop. If the program faults instead, or disc errors are reported through another interface, then the supervisor can decide that there is a fatal problem somewhere and report it in a more adequate way, rather than getting into a reboot loop. Putting up error dialogs (which Windows typically does) is even worse, and gives potential customers the idea that they are dealing with poor software.
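The alternative-FAT fallback mentioned above can be shown in miniature. This sketch is purely illustrative: the two arrays stand in for the two on-disc FAT copies, and a flag records when the mirror had to be used so a disc event can be fired.

```c
#include <stddef.h>
#include <stdint.h>

/* fat1/fat2 stand in for the two on-disc FAT copies; fat1_ok[i]
   records whether entry i of the primary copy was readable. On a
   primary read failure we serve the mirror and flag the incident
   so the caller can fire a disc event. */
int read_fat_byte(const uint8_t *fat1, const uint8_t *fat1_ok,
                  const uint8_t *fat2, size_t idx,
                  uint8_t *out, int *used_mirror)
{
    if (fat1_ok[idx]) {
        *out = fat1[idx];
        *used_mirror = 0;
    } else {
        *out = fat2[idx];  /* serve from the mirror copy */
        *used_mirror = 1;  /* caller reports a disc event here */
    }
    return 0;              /* the application never sees an error */
}
```

This is exactly the situation where the error is real and serious but invisible to file IO, which is the argument for a separate event channel.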
Re: How to handle read & write failures in a file system
Korona wrote:
> Why can't metadata errors be reported? Linux and Windows report them just like data I/O errors: Linux returns EIO, Windows returns ERROR_IO_DEVICE (although Windows also has some more specific error codes).

I don't think they should be reported through the normal file IO interface, simply because this will overload app code with error checks that will always lead to termination anyway. Not only that, the kernel interface becomes over-complicated by these error checks & propagation, which leads to slower code. If you instead report these directly from the disc device and distribute them through a specific event interface, then the filesystem code doesn't need to bother, and it becomes less complicated & faster too.
Re: How to handle read & write failures in a file system
rdos wrote:
> I think the main question is what an application should do if it gets an unexpected read error on a file. A lot of code would either fail the entire operation, possibly aborting some critical application, or ignore the error and reuse the previous buffer content. There really aren't many other alternatives, since it cannot guess the contents. By filling the content with zeros you avoid reusing the previous buffer.

The OS should never knowingly pass off known incorrect data as correct.
Critical application? Say your nuclear reactor control program needs to decide what to do next when it gets some dodgy sensor data. Your page full of zeros with no indication of error might just have sent it down the path to a meltdown. At least if your OS returns an error, the application can raise an alert for the operators to manually scram the reactor.
Re: How to handle read & write failures in a file system
thewrongchristian wrote:
> The OS should never knowingly pass off known incorrect data as correct.

It doesn't do it knowingly. The OS decides to report these errors through another interface, and then makes its best effort to continue running despite operating with faulty hardware.
thewrongchristian wrote:
> Critical application? Say your nuclear reactor control program needs to decide what to do next when it gets some dodgy sensor data? Your page full of zeros with no indication of error might just have sent it down the path to a meltdown. At least if your OS returns an error, the application can raise an alert for the operators to manually scram the reactor.

A reboot loop is not going to help much in operating the nuclear plant either. What will actually happen is that either the program refuses to start, or it terminates because of the faulty disc, and then ends up in a reboot loop, which does little to keep the plant safe.
Re: How to handle read & write failures in a file system
In a nuclear plant, you'll have multiple redundant systems anyway, so it's actually preferable if one of them enters a reboot loop instead of behaving in an unexpected way.
Re: How to handle read & write failures in a file system
rdos wrote:
> If the directory entry that contains the file name is bad, then you will get back "file not found". Unless you interpret this as an error and report an IO error instead, but then it will be reported regardless of which file you try to open. If the FAT is corrupt, then the alternative FAT copy might be used, and then the application shouldn't get an error, but you still have a serious error condition on the disc that will essentially go unreported. Basically, both of these are conditions that are better reported through a disc event interface than through normal file IO.

You know you can have both, right? If there's a read or write error, report it through your disk event interface, and if the error causes file I/O to fail, report it that way too.
This is how Windows and Linux both handle disk errors.
rdos wrote:
> Zeros are not random garbage, and when executed on x86 will typically result in faults.

But not always. You need it to always result in a fault. (And what if the zeroes are in the program's data instead of executable code?)
Re: How to handle read & write failures in a file system
An alternative way to handle it is that the event code could examine where the problem is with the help of the filesystem driver, and then signal that a certain part of a file's data couldn't be read, marking this in the user-level file cache. Of course, it could also report this in the event data, so code logging the events will know which parts of the disc have problems, including specific files. This still avoids complicating and slowing down mainstream filesystem code with error checking & propagation. Actually, the error logger could set a flag that there were recent errors, and then wait for the event code to mark up the problems before it hands buffers over to the application doing file IO.
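The mark-up idea could be represented with a per-file table of unreadable ranges in the user-level cache (hypothetical structures, fixed-size for simplicity):

```c
#include <stdint.h>

#define MAX_BAD_RANGES 8

struct bad_range {
    uint64_t start; /* first bad sector within the file */
    uint64_t len;   /* number of consecutive bad sectors */
};

/* Hypothetical cache entry: the event handler records which parts
   of the file could not be read. */
struct file_cache_entry {
    struct bad_range bad[MAX_BAD_RANGES];
    int nbad;
};

/* Called by the event handler; returns -1 if the table is full. */
int cache_mark_bad(struct file_cache_entry *e, uint64_t start, uint64_t len)
{
    if (e->nbad >= MAX_BAD_RANGES)
        return -1;
    e->bad[e->nbad].start = start;
    e->bad[e->nbad].len = len;
    e->nbad++;
    return 0;
}

/* Checked before handing a cached buffer to the application. */
int cache_sector_is_bad(const struct file_cache_entry *e, uint64_t sector)
{
    for (int i = 0; i < e->nbad; i++)
        if (sector >= e->bad[i].start &&
            sector < e->bad[i].start + e->bad[i].len)
            return 1;
    return 0;
}
```

With this table, the "recent errors" flag only has to gate buffer hand-over until the event code has finished calling cache_mark_bad for the affected files.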
Re: How to handle read & write failures in a file system
I don't get the "it slows down fs code" argument. It's just checking a return value and propagating it, right? That's negligible compared to the cost of the syscall anyway. Plus, you probably need to check for other errors in the fs code anyway (e.g., out of disk space).
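The propagation being discussed is essentially this pattern (a sketch with a stubbed driver layer; the "bad" LBA and the use of -EIO are illustrative):

```c
#include <errno.h>
#include <stdint.h>

/* Stub for the driver layer: pretend LBA 666 is a bad sector. */
static int block_read(uint64_t lba, void *buf)
{
    (void)buf;
    return (lba == 666) ? -EIO : 0;
}

/* The filesystem layer just checks and propagates: one highly
   predictable branch per call, negligible next to the I/O itself. */
int fs_read_metadata(uint64_t lba, void *buf)
{
    int rc = block_read(lba, buf);
    if (rc < 0)
        return rc; /* propagate -EIO up to the caller */
    /* ... decode the metadata here ... */
    return 0;
}
```

The same check already exists for conditions like out-of-space, so the error path adds no structural complexity the code didn't need anyway.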