How should you handle mmap issues?
Posted: Sat Feb 13, 2016 5:21 am
I tried googling this a bit but couldn't find anything useful, and the Linux man pages didn't seem to cover it either, though the question isn't specific to Linux.
When doing a normal read from HDD, the OS can return an error code instead of the requested data, and the (userland) application can act accordingly. With mmap(), however, you've already promised that the file/HDD contents will be available in RAM without actually checking whether the HDD can be read. So when the app accesses the mmap()'d RAM, it page faults, and the OS tries to load the data into RAM and fails. What now?
I can only think of the following options:
1) Kill the process, nothing can be done
2) Kill only the thread causing the #PF and possibly communicate it to the process (if it has threads remaining) - not sure if the complexity is worth it really
3) Generate an exception for the application which will redirect the thread and something like C++ exception handlers can try to recover from the issue
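As a rough illustration of option 3 on a POSIX-style system: as far as I know, Linux reports a failed page-in on a file mapping to the process as SIGBUS, and a handler can redirect the faulting thread with sigsetjmp/siglongjmp. The sketch below can't produce a real disk error, so it assumes that truncating the backing file is a fair stand-in (touching a mapped page past end-of-file also raises SIGBUS). All function names here (install_sigbus_handler, guarded_read, demo_truncated_mapping) are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;           /* where the handler jumps back to */
static volatile sig_atomic_t guarded;  /* nonzero only during a guarded access */

static void on_sigbus(int sig)
{
    (void)sig;
    if (guarded)
        siglongjmp(fault_jmp, 1);      /* redirect the faulting thread */
    _exit(1);                          /* SIGBUS from somewhere else: give up */
}

/* Install the handler once, before any guarded access. */
static int install_sigbus_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Read one byte from a mapping: 0 on success, -1 if the access faulted. */
static int guarded_read(const volatile unsigned char *p, unsigned char *out)
{
    if (sigsetjmp(fault_jmp, 1) != 0) {  /* re-entered from the handler */
        guarded = 0;
        return -1;
    }
    guarded = 1;
    *out = *p;                           /* may raise SIGBUS */
    guarded = 0;
    return 0;
}

/* Map a one-page temp file, then truncate it to simulate lost backing store.
 * Returns 0 if the first read worked and the second was reported as an error,
 * -1 if not, -2 if the setup itself failed. */
static int demo_truncated_mapping(void)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    int fd = mkstemp(path);
    if (fd < 0)
        return -2;
    unlink(path);                        /* file lives on until fd is closed */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -2;
    }

    unsigned char *m = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (m == MAP_FAILED) {
        close(fd);
        return -2;
    }

    unsigned char b = 0;
    int before = guarded_read(m, &b);    /* page is backed by the file */
    int rc = -2;
    if (ftruncate(fd, 0) == 0) {
        int after = guarded_read(m, &b); /* page no longer has backing store */
        rc = (before == 0 && after == -1) ? 0 : -1;
    }
    munmap(m, (size_t)page);
    close(fd);
    return rc;
}
```

Calling install_sigbus_handler() once and then demo_truncated_mapping() should give 0 if the fault was turned into an ordinary error return. Note that this sketch is single-threaded only (one global jump buffer), and jumping out of a signal handler is only safe here because the interrupted code is a plain memory load.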
For #3 there's of course the possibility that the exception handler itself isn't loaded into RAM yet (or has been swapped to HDD). That would cause another #PF, and the HDD might not be able to read that page either; at that point I don't think you can do anything other than kill the thread or process. Of course this could be prevented by requiring exception handling code to be loaded at startup and never swapped to disk.
Can you think of any other options?
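On existing POSIX systems, the closest userland equivalent of "don't let the handler be swapped out" that I'm aware of is mlockall(), which asks the kernel to fault in and pin every page currently mapped by the process, handler code included. A minimal sketch, where the helper name pin_process_memory is made up and the call may well fail for unprivileged processes because of RLIMIT_MEMLOCK:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Try to make every currently mapped page resident and unswappable.
 * Returns 0 on success, -1 with errno set on failure.
 * MCL_FUTURE could be OR'ed in to cover later mappings as well. */
static int pin_process_memory(void)
{
    if (mlockall(MCL_CURRENT) != 0) {
        int saved = errno;               /* fprintf may clobber errno */
        fprintf(stderr, "mlockall failed: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

This pins far more than just the handler, so it's a blunt instrument; locking only the handler's pages with mlock() would be the finer-grained variant of the same idea.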
A couple of other notes: this issue gets a bit more complex if you want to do the "right" thing. For example, writes could still be committed to the RAM side of the file system, so the system would continue to operate but would be unable to commit to HDD. It might at that point notify the user and let them add external storage (USB HDD, flash, etc.), and maybe even have the data survive a reboot. Again, probably pretty complex.
This issue is also related to swapping in general, but with swapping I think it's a bit more reasonable to kill the process, as swapping is something the OS does completely behind the back of the process anyway. With mmap()'d files, the expectation would be that the process "reads" the disk through RAM, so a failure shouldn't crash the app but rather produce an error that lets the app continue to operate appropriately.
If the kernel itself doesn't rely on mmap() and similar techniques, then this issue should be entirely a userland one, so there's no need to make it even more complex by trying to handle it in the kernel as well =)