OSDev.org

Posted: **Sat Feb 13, 2016 5:21 am**

I tried googling it a bit but couldn't find anything useful, the man pages for Linux didn't seem to include anything either, though this question isn't specific to Linux.

When doing a normal read from HDD you can return an error code instead of the requested data, and the (userland) application can act accordingly. However, with mmap() you've already promised that the file/HDD contents will be available in RAM, but haven't actually checked whether you can read the HDD. So when the app accesses the mmap()'d RAM it page faults, the OS tries to load the data into RAM and fails. What now?

I can only think of the following options:
1) Kill the process, nothing can be done
2) Kill only the thread causing the #PF and possibly communicate it to the process (if it has threads remaining) - not sure if the complexity is worth it really
3) Generate an exception for the application which will redirect the thread and something like C++ exception handlers can try to recover from the issue

For #3 there's of course then the possibility that the exception handler is not yet loaded into RAM (or has been swapped to HDD); that would also cause a #PF, and the HDD might not be able to read that either. At this point I don't think you can do anything other than kill the thread or process. Of course this could be prevented by requiring that exception handling code is loaded at startup and never swapped to disk.

Can you think of any other options or are there any possible other options?

Couple of other notes, this issue gets a bit more complex if you want to do the "right" thing, for example writes could still be committed to the RAM side of the file system and as such the system would continue to operate, but would be unable to commit to HDD. It might at this point notify user and allow user to add external storage (USB HDD, Flash, etc) and maybe even have the data survive a reboot, again probably pretty complex.

This issue is also related to swapping in general, but with swapping I think it's a bit more reasonable to kill the process as it's something the OS is doing completely behind the back of the process anyway. With mmap() files the expectation would be for the process to "read" disk thru RAM and so not crash the app, but rather give an error so the app can continue to operate appropriately.

If the kernel doesn't rely on mmap() and similar techniques then this issue should be entirely userland, so no need to make it even more complex and try to fix it in kernel as well =)

Posted: **Sun Feb 14, 2016 3:05 pm**

mmap() should run above your VFS (virtual filesystem) layer, so it doesn't matter what happens with the underlying block device. Whatever happens with the underlying block device is the same as when any other file access is handled by the VFS but fails at the block device layer.

read() of course has the option to return an error code but standard memory access doesn't so when the layer responsible for handling memory-mapped files executes read() on the corresponding file descriptor to fill the memory and gets an error it could e.g. fill the memory with zeros. Alternatively, you could have your mmap() function take a callback to call in the event of an error, passing the error code from read(), although that isn't exactly POSIX standard...

Posted: **Sun Feb 14, 2016 4:00 pm**

you could send SIGSEGV or SIGBUS
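
A minimal sketch of what the SIGBUS route might look like from the application's side, assuming a POSIX system where a failed page-in of a file mapping is reported as SIGBUS (Linux does this both for I/O errors and for accesses past the end of a truncated file). The helper names `install_sigbus_handler`, `guarded_read` and `demo_truncated_mapping` are made up for illustration; only `sigaction`, `sigsetjmp`/`siglongjmp`, `mmap` and `ftruncate` are real APIs. The demo fakes the "disk went away" case by shrinking the file under the mapping.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;
static volatile sig_atomic_t fault_armed = 0;

static void on_sigbus(int sig)
{
    (void)sig;
    if (fault_armed)
        siglongjmp(fault_jmp, 1);   /* unwind back into guarded_read() */
    /* Fault outside a guarded access: fall back to the default action. */
    signal(SIGBUS, SIG_DFL);
    raise(SIGBUS);
}

int install_sigbus_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Read one byte from a mapping. Returns 0 on success, -1 if the access
 * raised SIGBUS (i.e. the kernel could not supply the page). */
int guarded_read(const volatile unsigned char *addr, unsigned char *out)
{
    assert(addr != NULL && out != NULL);
    if (sigsetjmp(fault_jmp, 1) != 0) {   /* 1 = also restore the signal mask */
        fault_armed = 0;
        return -1;
    }
    fault_armed = 1;
    *out = *addr;                          /* this load may fault */
    fault_armed = 0;
    return 0;
}

/* Map a one-page temp file, then truncate the file under the mapping.
 * Returns 0 if the first read worked and the second one was reported
 * as a fault, -1 if not, -2 if the setup itself failed. */
int demo_truncated_mapping(void)
{
    char path[] = "/tmp/mmap_sigbus_XXXXXX";
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0)
        return -2;
    unlink(path);
    if (ftruncate(fd, (off_t)page) != 0) { close(fd); return -2; }
    unsigned char *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -2; }

    unsigned char b = 0xff;
    int before = guarded_read(p, &b);      /* page is backed by the file */
    if (ftruncate(fd, 0) != 0) { munmap(p, page); close(fd); return -2; }
    int after = guarded_read(p, &b);       /* backing data is gone now */

    munmap(p, page);
    close(fd);
    return (before == 0 && after == -1) ? 0 : -1;
}
```

This only covers a single thread and a single guarded access at a time; a real program would need per-thread jump buffers, and jumping out of a handler is only safe when the interrupted code holds no locks or half-updated state.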

Posted: **Sun Feb 14, 2016 7:40 pm**

Hi,

LtG wrote:I can only think of the following options:
1) Kill the process, nothing can be done
2) Kill only the thread causing the #PF and possibly communicate it to the process (if it has threads remaining) - not sure if the complexity is worth it really
3) Generate an exception for the application which will redirect the thread and something like C++ exception handlers can try to recover from the issue

LtG wrote:Can you think of any other options or are there any possible other options?

4) Redundancy. If you get errors reading "copy #1" just switch to "copy #2".

5) Only do mmap for things on very reliable disk drives (and not floppy or scratched CDs; and not devices that can be unplugged easily like USB flash). If a process tries to mmap a file from an unreliable device, then load the entire file into VFS cache and pretend it's being mmapped when it isn't (and return an error from "mmap()" if you can't load the entire file for any reason).

6) Don't support mmap.

7) Shift the entire page fault handling mechanism to user space (exo-kernel) and let the library developers figure it out

8) For each byte accessed, fork the process 256 times to cover every possibility and run the forks in "lock step". If any one fork crashes (but not all), assume it's because its guess was wrong and silently discard that fork. If any 2 forks end up with the same data in memory (excluding the dodgy mmapped area) assume the dodgy mmapped area had no consequence for those forks and silently discard one. If you run out of memory, "OOM kill" the process. If the process does any output (that may have depended on "guessed" mmap data), return an "mmap has become dodgy" error from whichever function it used for output.

Cheers,

Brendan

Posted: **Wed Feb 17, 2016 6:42 am**

Brendan wrote: 4) Redundancy. If you get errors reading "copy #1" just switch to "copy #2".

Helps, but doesn't solve the problem. If copy#2 fails, use #3, then #4, eventually you need a fall back.

Brendan wrote: 5) Only do mmap for things on very reliable disk drives (and not floppy or scratched CDs; and not devices that can be unplugged easily like USB flash). If a process tries to mmap a file from an unreliable device, then load the entire file into VFS cache and pretend it's being mmapped when it isn't (and return an error from "mmap()" if you can't load the entire file for any reason).

I think this is quite similar to above #4, and I suppose it might be good idea to support mmap flags allowing mmap to only return once it's completed instead of being lazy, though might somewhat defeat the purpose of mmap..
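
For what it's worth, Linux already has an eager flag, MAP_POPULATE, but its man page says the mmap() call doesn't fail if the mapping cannot be populated, so it doesn't give you the up-front error. A rough user-space sketch of the "load it all and pretend it's mmapped" variant could look like the following. `eager_map` and `demo_eager_map` are hypothetical names, and treating a short file as EIO is my own assumption rather than anything standard.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* "Eager mmap" emulated with an anonymous mapping filled by pread().
 * Any read error surfaces here as NULL + errno instead of as a fault later. */
void *eager_map(int fd, size_t len)
{
    assert(len > 0);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, buf + done, len - done, (off_t)done);
        if (n < 0 && errno == EINTR)
            continue;                       /* interrupted, just retry */
        if (n <= 0) {
            int saved = (n == 0) ? EIO : errno;   /* short file counts as an error */
            munmap(buf, len);
            errno = saved;
            return NULL;
        }
        done += (size_t)n;
    }
    mprotect(buf, len, PROT_READ);          /* behave like a read-only file mapping */
    return buf;
}

/* Returns 0 if a full load succeeds and an over-long request fails up front. */
int demo_eager_map(void)
{
    static const char text[] = "mapped eagerly";
    char path[] = "/tmp/eager_map_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, text, sizeof text) != (ssize_t)sizeof text) { close(fd); return -1; }

    char *p = eager_map(fd, sizeof text);
    int ok = (p != NULL && memcmp(p, text, sizeof text) == 0);
    if (p != NULL)
        munmap(p, sizeof text);

    void *q = eager_map(fd, sizeof text + 1);   /* one byte more than the file holds */
    ok = ok && (q == NULL);
    if (q != NULL)
        munmap(q, sizeof text + 1);

    close(fd);
    return ok ? 0 : -1;
}
```

As noted above this gives up the laziness (and the page sharing) that makes mmap attractive, so it probably only makes sense for small files or unreliable devices.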

Brendan wrote: 6) Don't support mmap.

Avoiding problems is always an option =)

Brendan wrote: 7) Shift the entire page fault handling mechanism to user space (exo-kernel) and let the library developers figure it out

Probably need to think a bit more about this one, though initially I think the same/similar outcome is reached with the exception/signal approach..

Brendan wrote: 8) For each byte accessed, fork the process 256 times to cover every possibility and run the forks in "lock step". If any one fork crashes (but not all), assume it's because its guess was wrong and silently discard that fork. If any 2 forks end up with the same data in memory (excluding the dodgy mmapped area) assume the dodgy mmapped area had no consequence for those forks and silently discard one. If you run out of memory, "OOM kill" the process. If the process does any output (that may have depended on "guessed" mmap data), return an "mmap has become dodgy" error from whichever function it used for output.

Nice outside of the box thinking =). What if we're dealing with 512 byte sectors and most/all of it will be used within relatively short period of time, which I would expect to be the case most often?

I think I'm initially just going to kill the app and later if I have the time I'll just add a possibility of delivering the exception to the app which can then decide best course of action, and if no handler found then kill it.

Also, instead of just killing the app, the exception could be split into a few categories:
1) Removable storage was removed; Sane default might be to freeze the app and inform user that required data is missing, please re-insert USB, then unfreeze
2) Data is corrupted, but origin of data is known; Retrieve data from origin again (HDD issue with file, but file is from Internet and still available, re-download); Pretty similar to above
3) Data is corrupted, origin unknown; Freeze app and let user decide what to do, up to the point that if it's really super critical someone might want to fire up a debugger and manually try to fix it or at least attempt to fix it.

I would expect the issues to be rare enough that the above proposition might be overkill and simply killing the process is easier =)

Posted: **Wed Feb 17, 2016 6:45 am**

onlyonemac wrote:mmap() should run above your VFS (virtual filesystem) layer, so it doesn't matter what happens with the underlying block device. Whatever happens with the underlying block device is the same as when any other file access is handled by the VFS but fails at the block device layer.

read() of course has the option to return an error code but standard memory access doesn't so when the layer responsible for handling memory-mapped files executes read() on the corresponding file descriptor to fill the memory and gets an error it could e.g. fill the memory with zeros. Alternatively, you could have your mmap() function take a callback to call in the event of an error, passing the error code from read(), although that isn't exactly POSIX standard...

I was hoping for some ideas like that, where it could be communicated to the app in some way, but I don't think zeros (or magic values) would work, given that you have no idea what the app is going to do with the data, including possibly trying to execute it if the app loader has used mmap to lazy-load the app itself.. But even if it's just data it would still have the same issues..

Posted: **Wed Feb 17, 2016 9:59 am**

From the suggestions above, throwing a SIGBUS would be the most POSIX-y way, but I've seen enough issues with filesystem lockups (the local favourites being scratched CDs and NFS) that I'll not be implementing that API.

Unfortunately, mmap() is not an API for a language where you can actually throw IOExceptions.

Posted: **Wed Feb 17, 2016 10:36 am**

LtG wrote:[...] given that you have no idea what the app is going to do with the data, including possibly trying to execute it if the app loader has used mmap to lazy-load the app itself.. [...]

I think "loading" executable code with mmap is a bad idea anyway, especially as it's not that difficult to read an entire file into memory all at once. Although as the operating system developer it's one of your jobs to protect the system against badly-written applications, and I agree that using zeros/magic numbers to signal an error condition is a bad idea in a situation like this where they cannot reliably be distinguished from actual data (if the application even bothers to check them).

Posted: **Wed Feb 17, 2016 10:55 am**

onlyonemac wrote:I think "loading" executable code with mmap is a bad idea anyway, especially as it's not that difficult to read an entire file into memory all at once.

Reading the entire file into memory at once precludes, or makes much more difficult, demand paging which can be a great way of making program execution more efficient and less resource hungry.

Posted: **Thu Feb 18, 2016 5:16 am**

Combuster wrote:unfortunately, mmap() is not an API for a language where you can actually throw IOExceptions.

Not sure I understand, is there one "not" too many or one missing? Can you elaborate a bit? I would think a language that supports exceptions would fit easily, it's languages that don't support exceptions that would be the problem..

Posted: **Thu Feb 18, 2016 5:24 am**

iansjack wrote:
onlyonemac wrote:I think "loading" executable code with mmap is a bad idea anyway, especially as it's not that difficult to read an entire file into memory all at once.
Reading the entire file into memory at once precludes, or makes much more difficult, demand paging which can be a great way of making program execution more efficient and less resource hungry.

Agreed. I would like to know though why it's a bad idea anyway? I assume the "anyway" part implies that there's other reasons than the one discussed in this thread as well?

Besides just performance, there might be a pretty rare case where demand loading would allow the app to work even if some sectors can't be read, if those aren't actually needed this time around, or ever for this user. Granted it's probably a pretty rare occurrence, but besides the issues in this thread I can't think of a good reason to read it all at once.

Another side effect is that another core, if the system is mostly idle, could start to preload it eagerly. App starts as fast as possible, but doesn't actually have to #PF the stuff in, as that's done by another core.

Posted: **Thu Feb 18, 2016 5:35 am**

- Application will stop execution momentarily as the next section is read from disk, which could cause problems in a timing-critical section of code
- Loading of application cannot be verified before execution begins as no checksum can be calculated (as data is read on an on-demand basis, so a corrupt application file cannot be detected)
- User may remove disk while application is executing, leading to failure when next section is required

Posted: **Thu Feb 18, 2016 6:13 am**

onlyonemac wrote:
- Application will stop execution momentarily as the next section is read from disk, which could cause problems in a timing-critical section of code
- Loading of application cannot be verified before execution begins as no checksum can be calculated (as data is read on an on-demand basis, so a corrupt application file cannot be detected)
- User may remove disk while application is executing, leading to failure when next section is required

Good points to keep in mind, some thoughts:
1) If the system uses swapping, this can happen at any later time anyway, so it's never guaranteed that a part of the app is in RAM
2) Who is doing the verification? The app or the system? If the app, like some games do, then it needs to read all of its own memory, that will trigger #PF's and it can generate hash, no issue. If the system, then clearly this is a system that cares about integrity, most don't seem to. If the system cares about integrity then I don't think it's necessarily a good idea to only check hash of apps, but all files. If hash is checked for all files then you might want to do it in block size (like file system clusters, maybe 4kB, 16kB, etc), and thus the #PF always loads the mmap'd stuff in said block/cluster sizes and calculates the hash.*
3) I think you need to prepare for this in any case, if the app is installed on the removable device, chances are that some of its data is there as well. The app might at any moment be in the middle of reading said data and if the user removes the device then the app has the same issue now.

*) You could also have a hash per file, but that's problematic. Consider watching a movie, do you really want the whole 2GB - 16GB movie scanned on HDD, hash verified before you can start playback and then, assuming you only have 4 GB RAM and the movie is 16GB you need to read everything again? The solution is to hash at smaller blocks, and then it suits the demand loading just as well. Here I think the smaller the block size the better, but it increases the relative overhead, so I might not want to do it at 512B sector level..

In any case, I think the three points are valid and something to keep in mind while designing everything because I think they can be made into non-issues..

Posted: **Thu Feb 18, 2016 7:18 am**

LtG wrote:1) If the system uses swapping, this can happen at any later time anyway, so it's never guaranteed that a part of the app is in RAM

True, although I believe that some operating systems make it possible to "lock" time-critical parts of the application in RAM.

LtG wrote:2) Who is doing the verification? The app or the system? If the app, like some games do, then it needs to read all of its own memory, that will trigger #PF's and it can generate hash, no issue. If the system, then clearly this is a system that cares about integrity, most don't seem to. If the system cares about integrity then I don't think it's necessarily a good idea to only check hash of apps, but all files. If hash is checked for all files then you might want to do it in block size (like file system clusters, maybe 4kB, 16kB, etc), and thus the #PF always loads the mmap'd stuff in said block/cluster sizes and calculates the hash.*

It didn't occur to me that verification of this form is not common, but if it is done then memory-mapping the application file and then computing the checksum seems a bit of a "sideways" way of doing it; it would be better to just read the application file into memory and then compute the checksum. But I guess that that's ultimately a matter of personal choice, and not much of an issue anyway considering that this isn't common. (I wasn't really thinking of verifying all files, but if that was done then yes it should be done at a filesystem level where the filesystem can return an error if the checksum fails, although that wouldn't catch against malicious altering of the data on disk as the filesystem checksum could be updated accordingly, whereas I was thinking of a verification with another trusted source.)

LtG wrote:3) I think you need to prepare for this in any case, if the app is installed on the removable device, chances are that some of its data is there as well. The app might at any moment be in the middle of reading said data and if the user removes the device then the app has the same issue now.

It's the application's responsibility to make sure that it loads all critical data at the time that it is started, or is able to work without that data if it is unavailable at a later time; an application can't be expected to account for removal of the storage device at any time during its execution, otherwise we're pretty much back to the "memory-map the file and then force the whole thing to load via page-faults so that it's guaranteed to be available" situation where reading the file into memory the proper way would be better.

Posted: **Thu Feb 18, 2016 7:55 am**

onlyonemac wrote:
LtG wrote:1) If the system uses swapping, this can happen at any later time anyway, so it's never guaranteed that a part of the app is in RAM
True, although I believe that some operating systems make it possible to "lock" time-critical parts of the application in RAM.

Would this not be the same? If there's a special time-critical section that must be locked in RAM, then it can make the same requirements with lazy loading too. Also I would think this is also needed for security stuff too, like crypto keys, you don't want them ending up on the HDD. Here it might also be useful to be able to hint at the reason why RAM locked memory is needed, if it's for performance or security/criticality. The latter being a "promise" and causing the process to prefer dying to being swapped to HDD.
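
On POSIX systems the locking part exists as `mlock()`; the "reason" hint does not, as far as I know, so the sketch below layers it on top as a hypothetical `enum lock_reason`. The policy (best-effort for performance, hard failure for secrets) is just my reading of the idea above, and `pin_region` / `unpin_and_wipe` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical hint telling the runtime why the memory must stay in RAM. */
enum lock_reason { LOCK_FOR_SPEED, LOCK_FOR_SECRECY };

/* Returns  1 if the region is now locked in RAM,
 *          0 if locking failed but the caller only wanted it for speed,
 *         -1 if locking failed and the region holds secrets, in which case
 *            the caller should refuse to continue rather than risk swap. */
int pin_region(void *addr, size_t len, enum lock_reason why)
{
    assert(addr != NULL && len > 0);
    if (mlock(addr, len) == 0)
        return 1;
    return (why == LOCK_FOR_SECRECY) ? -1 : 0;
}

/* Clear the secret before letting the pages become swappable again. */
void unpin_and_wipe(void *addr, size_t len)
{
    volatile unsigned char *p = addr;   /* volatile so the wipe is not optimised away */
    for (size_t i = 0; i < len; i++)
        p[i] = 0;
    munlock(addr, len);
}
```

Note that mlock() can fail for ordinary reasons such as the RLIMIT_MEMLOCK limit, which is exactly why the two reasons need different failure behaviour.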

onlyonemac wrote:
LtG wrote:2) Who is doing the verification? The app or the system? If the app, like some games do, then it needs to read all of its own memory, that will trigger #PF's and it can generate hash, no issue. If the system, then clearly this is a system that cares about integrity, most don't seem to. If the system cares about integrity then I don't think it's necessarily a good idea to only check hash of apps, but all files. If hash is checked for all files then you might want to do it in block size (like file system clusters, maybe 4kB, 16kB, etc), and thus the #PF always loads the mmap'd stuff in said block/cluster sizes and calculates the hash.*
It didn't occur to me that verification of this form is not common, but if it is done then memory-mapping the application file and then computing the checksum seems a bit of a "sideways" way of doing it; it would be better to just read the application file into memory and then compute the checksum. But I guess that that's ultimately a matter of personal choice, and not much of an issue anyway considering that this isn't common. (I wasn't really thinking of verifying all files, but if that was done then yes it should be done at a filesystem level where the filesystem can return an error if the checksum fails, although that wouldn't catch against malicious altering of the data on disk as the filesystem checksum could be updated accordingly, whereas I was thinking of a verification with another trusted source.)

I'm actually not 100% sure how common it is, but I think it's mainly/only done on the FS level, as I think it should be. Apps are more or less useless if their data is corrupt. Also, if the FS does it, then the lazy-loading and #PF handling don't need to care about it, so I don't think it's sideways. Assuming you do it for all files, then referring to my previous point of large files, would you really want to have checksums/hashes on a file level or a block (whatever size it may be) level?

I might not be understanding what you mean by checking the checksum of the app..? You now mentioned "another source", which leads me to believe you might mean some type of signature..? There's at least two separate reasons for doing integrity verifications, checking that it hasn't become corrupt on the HDD and checking that it hasn't been tampered with, where arguably restricting it only to apps might make sense.

For the "another source" you mentioned, what source? If the OS can protect the checksum in some special "installed apps" file, why can't the OS protect it in the FS itself? And if it can, then I think it makes most sense to check all files for corruption (caused by anything) and it makes sense to check it at some block size, due to overhead of checking large files.

onlyonemac wrote:
LtG wrote:3) I think you need to prepare for this in any case, if the app is installed on the removable device, chances are that some of its data is there as well. The app might at any moment be in the middle of reading said data and if the user removes the device then the app has the same issue now.
It's the application's responsibility to make sure that it loads all critical data at the time that it is started, or is able to work without that data if it is unavailable at a later time; an application can't be expected to account for removal of the storage device at any time during its execution, otherwise we're pretty much back to the "memory-map the file and then force the whole thing to load via page-faults so that it's guaranteed to be available" situation where reading the file into memory the proper way would be better.

If the file will be needed fully sequentially, for example, it would be useful to hint that to the OS, so it can go and load all of it in preparation, but still I would want another core to do it, if available. If not available, then it might make sense (if the hint is reliable) to do it in one large load to avoid switching between app data (or code) usage and loading of said data/code. For normal apps though, the app usually shows some dialog and won't need all of its own code until the user does something, so I'd much prefer the app to be shown instantly. It might take a few seconds to a few minutes of typing and clicking before the app actually needs the rest of its own code, and the loading would happen in the background during that time. So overall, you should get equal or better responsiveness with little to no extra overhead. Best case scenario the app starts instantly and another core starts loading data/code to RAM while you can use the app normally.
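
POSIX has an advisory call along these lines, `posix_madvise()`. A small sketch of hinting an existing mapping, assuming the OS is free to ignore the advice (whether the read-ahead lands on another core is entirely up to the kernel); `hint_mapping` and `demo_hint` are hypothetical helper names:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Tell the OS how a mapping will be used and that it will be needed soon.
 * Returns 0 on success or an error number (posix_madvise does not set errno). */
int hint_mapping(void *addr, size_t len, int sequential)
{
    assert(addr != NULL && len > 0);
    int rc = posix_madvise(addr, len,
                           sequential ? POSIX_MADV_SEQUENTIAL : POSIX_MADV_NORMAL);
    if (rc != 0)
        return rc;
    /* Ask for the pages to be brought in ahead of the first access. */
    return posix_madvise(addr, len, POSIX_MADV_WILLNEED);
}

/* Map a small temp file and apply the hints; returns what hint_mapping returned,
 * or -1 if the setup failed. */
int demo_hint(void)
{
    char path[] = "/tmp/mmap_hint_XXXXXX";
    size_t len = 4 * (size_t)sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)len) != 0) { close(fd); return -1; }
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    int rc = hint_mapping(p, len, 1);

    munmap(p, len);
    close(fd);
    return rc;
}
```

The advice changes nothing about correctness: a page that cannot be read still faults on first touch, it just (hopefully) faults less often when the disk is healthy.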

Regarding the checksums and lazy-loading:
- FS is responsible for doing checksums, these happen every 4kB
- Paging just does paging, doesn't care about mmap, FS or checksums
- Mmap reserves virtual memory for the entire file, loads the first 4kB page from FS (checksum done by FS) and marks the rest of the pages as NP (or any other way to cause a #PF when accessed)
- App loader mmap's app in memory and jumps to start/main()

The first page is present and already checked for integrity, so no issues. From now on, whenever the app accesses a page that causes a #PF we need to do something. What that something is depends on the reason: if it's due to mmap being lazy we just need to ask the FS for the block we need, place that in RAM, adjust the page table entry and return to the app. Note, it doesn't matter if the mmap'd memory is something the app loader did (the app binary) or mmaps the app itself has requested for data files it needs; all of these are dealt with identically, and all will have their contents verified.
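
To make the layering concrete, here is a toy user-space model of the FS side of that scheme: one stored checksum per 4kB block, verified each time the (imaginary) #PF handler asks for a block. Everything here is hypothetical (`struct toy_fs`, `toy_fs_seal`, `toy_fs_page_in`), the "disk" is just an array, and FNV-1a is used only because it is short; a real FS would more likely use a CRC or a cryptographic hash.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096u

/* 32-bit FNV-1a over one block. */
static uint32_t block_sum(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

struct toy_fs {
    unsigned char *data;   /* backing store: nblocks * BLOCK_SIZE bytes */
    uint32_t *sums;        /* one stored checksum per block */
    size_t nblocks;
};

/* Record checksums for the current contents (what a write path would do). */
void toy_fs_seal(struct toy_fs *fs)
{
    for (size_t b = 0; b < fs->nblocks; b++)
        fs->sums[b] = block_sum(fs->data + b * BLOCK_SIZE, BLOCK_SIZE);
}

/* What the #PF handler would call for a not-present mmap page.
 * Fills `page` and returns 0, or returns -EIO if the block fails
 * verification (the case this whole thread is about), or -EINVAL
 * if the block number is out of range. */
int toy_fs_page_in(const struct toy_fs *fs, size_t block, unsigned char *page)
{
    assert(fs != NULL && page != NULL);
    if (block >= fs->nblocks)
        return -EINVAL;
    const unsigned char *src = fs->data + block * BLOCK_SIZE;
    if (block_sum(src, BLOCK_SIZE) != fs->sums[block])
        return -EIO;
    memcpy(page, src, BLOCK_SIZE);
    return 0;
}
```

The paging code never sees checksums; it only sees a success or an error from the FS and then has to decide what to tell the app, so the verification cost is paid per block actually touched rather than per file.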

The only issue is what to do if the verification fails, if the data can't be read, etc, which is exactly what this topic is about. The easy way out would be to not support mmap, but then all apps will need to do it themselves, and would need to (or at least should) do it a page at a time (or at least an HDD sector at a time); otherwise you end up with multiple pieces of the system/app doing their own inefficient buffering.

All in all, I'm pretty convinced that signals/exceptions/callbacks are the only sane way to handle issues, reporting the issue to the app so it can decide how to proceed, as it would have to if it used a plain read() instead of mmap(). The only difficulty (for app devs) is that at that point the app might be more "committed" to whatever it is doing and, since resolving the issue is next to impossible, now it has to "uncommit" itself.

NOTE: Even with read() the app can easily become "committed" and face the same issues. For example the app starts to process a large video file and half way thru there's an unreadable sector on HDD, now it needs to decide what to do even though it's half way thru.

PS. I think Windows still doesn't use checksums/hashes for FS integrity in general, not sure about ext2/3/4; ZFS on the other hand should have pretty comprehensive hash usage for pretty much everything.

How should you handle mmap issues?
