8infy wrote:
Hi, I recently finished implementing an AHCI driver and am now starting to work on the VFS, with FAT32 as the initial filesystem, and I have a few questions:
1. I already made it so that if anyone tries to kill a thread that is blocked on a disk read/write request, the kill is deferred until that thread gets unblocked.
I now realize that it might not be enough, because if a thread tries to e.g. write some file I might have to fetch some other unrelated parts of the filesystem
like the file allocation table for FAT32, so essentially one userspace request gets potentially broken down into multiple read/write requests, and during those
I would expect the thread to be invulnerable so that it can complete the requested transaction atomically. So essentially my question is how do other kernels
handle the case where someone tries to kill a thread that's currently executing an important syscall like writing a file that can also yield/get blocked multiple
times during the request? Since all of that is asynchronous the kernel/scheduler must somehow recognize that although the thread is technically dead it's still
kinda inside a "critical section" and must still be scheduled until it's out of the "critical section".
By kill, do you mean the equivalent of sending a SIGINT or SIGKILL under UNIX?
The standard method of handling that in a UNIX-like system is to check for pending signals just before returning to user mode, and to have multiple sleep states for sleeping processes.
When sleeping waiting for something, a process is considered either interruptible, or uninterruptible.
An interruptible sleeping process can be woken from its sleep by a signal, which will be detected, and whatever operation the process was doing will be short-circuited. An example of an interruptible process might be a process reading from a network socket. This is considered a 'slow' operation, as we never know when the data will be available, so the sleep waiting for data is an interruptible sleep. How this interruption is handled can become complex, as you might be part way through some operation, so you have to ensure either that you can undo partial operations, or that you have all the resources you need without further sleeps before starting to change state. This is a similar problem to exception safety in languages like C++.
An uninterruptible sleeping process cannot be woken from its sleep by a signal. It will finish waiting for the resource it is sleeping on. An example of an uninterruptible sleep might be waiting for a disk read I/O to complete (such as in your example above), which is considered a 'fast' operation as we know the I/O device will complete or error within a bounded time. So in your case, the operation will do whatever it does in the filesystem code to completion, then once done, it will check for pending signals before returning from the write system call, and act appropriately.
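To make the two sleep states concrete, here is a minimal user-space sketch in C. All the names (thread_post_kill, thread_io_complete, thread_return_to_user) are made up for illustration and are not taken from any real kernel; it only models the state transitions, not a real scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thread states; a real kernel will have more. */
enum thread_state {
    THREAD_RUNNING,
    THREAD_SLEEP_INTERRUPTIBLE,   /* e.g. waiting on a socket */
    THREAD_SLEEP_UNINTERRUPTIBLE, /* e.g. waiting on disk I/O */
    THREAD_DEAD
};

struct thread {
    enum thread_state state;
    bool signal_pending; /* set by kill, acted on at return to user mode */
};

/* Post a kill request. Returns true if the sleeper was woken early. */
bool thread_post_kill(struct thread *t)
{
    t->signal_pending = true;
    if (t->state == THREAD_SLEEP_INTERRUPTIBLE) {
        t->state = THREAD_RUNNING; /* the wait is short-circuited */
        return true;
    }
    /* Running or uninterruptible: the signal simply stays pending. */
    return false;
}

/* Called from the I/O completion path to end an uninterruptible sleep. */
void thread_io_complete(struct thread *t)
{
    if (t->state == THREAD_SLEEP_UNINTERRUPTIBLE)
        t->state = THREAD_RUNNING;
}

/* Called just before every return to user mode.
 * Returns false if the thread was killed instead of returning. */
bool thread_return_to_user(struct thread *t)
{
    assert(t->state == THREAD_RUNNING);
    if (t->signal_pending) {
        t->state = THREAD_DEAD;
        return false;
    }
    return true;
}
```

With this shape, a write system call that sleeps several times in the uninterruptible state is never torn down half way: the kill is only acted on at the single check in thread_return_to_user.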
8infy wrote:
2. I've also been thinking about how to go about implementing the disk cache. So far I'm leaning towards implementing it inside the filesystem, but I'm not completely sure.
Maybe it should be cached on multiple layers, both disk and each filesystem? What do you think is the best way to go about this?
Would really appreciate any information I could get on this one, thanks

In order to provide coherence with demand-paged mmap data, almost any modern OS will cache file data at the page level. So in a UNIX-like system, with files managed using vnodes in a VFS, pages will be cached using the vnode/offset as the key. Then both the read/write system calls and the virtual memory subsystem will reference data using the vnode/offset key and see the same data, thus ensuring coherence between data mapped into address spaces and data read/written using file handles.
This wasn't always the case. Early UNIX cached data at the device block level, and early mmap-based systems essentially duplicated the data cached at the device block level in user page mappings. The result was that data written using write system calls would end up in the device block buffer cache, but not in paged memory mappings of the same file.
So, in answer to your question, data caching is best handled in the VFS layer, where it can be managed on a per-vnode basis, using the vnode/offset as a key.
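As a rough illustration of a cache keyed on vnode/offset, here is a small user-space sketch in C. The names (page_cache_get, file_write, mmap_fault), the 4096-byte page size and the hash function are all assumptions made for the example; a real kernel would also need locking, eviction and a call into the filesystem to fill a missing page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u /* assumed page size */
#define NBUCKETS  64u

struct vnode { int id; }; /* stand-in for a real vnode */

struct cached_page {
    const struct vnode *vn;   /* key part 1 */
    uint64_t offset;          /* key part 2, page aligned */
    struct cached_page *next; /* hash chain */
    unsigned char data[PAGE_SIZE];
};

static struct cached_page *buckets[NBUCKETS];

static unsigned bucket_of(const struct vnode *vn, uint64_t off)
{
    uint64_t h = (uint64_t)(uintptr_t)vn
               ^ ((off / PAGE_SIZE) * 0x9E3779B97F4A7C15ull);
    return (unsigned)(h % NBUCKETS);
}

/* Find the page for (vn, off), creating a zeroed one on a miss. */
struct cached_page *page_cache_get(const struct vnode *vn, uint64_t off)
{
    off -= off % PAGE_SIZE; /* round down to a page boundary */
    unsigned b = bucket_of(vn, off);
    for (struct cached_page *p = buckets[b]; p; p = p->next)
        if (p->vn == vn && p->offset == off)
            return p;
    struct cached_page *p = calloc(1, sizeof *p);
    if (!p)
        return NULL;
    /* A real kernel would ask the filesystem to fill the page here. */
    p->vn = vn;
    p->offset = off;
    p->next = buckets[b];
    buckets[b] = p;
    return p;
}

/* write()-style path. For brevity a write may not cross a page boundary. */
int file_write(const struct vnode *vn, uint64_t off,
               const void *src, size_t len)
{
    size_t in_page = (size_t)(off % PAGE_SIZE);
    if (len > PAGE_SIZE - in_page)
        return -1;
    struct cached_page *p = page_cache_get(vn, off);
    if (!p)
        return -1;
    memcpy(p->data + in_page, src, len);
    return 0;
}

/* Page fault path for an mmap'd file: returns the memory to map. */
unsigned char *mmap_fault(const struct vnode *vn, uint64_t off)
{
    struct cached_page *p = page_cache_get(vn, off);
    return p ? p->data : NULL;
}
```

Because both paths go through page_cache_get with the same key, a write through the file handle is immediately visible through the mapping, which is the coherence property described above.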
For filesystem metadata, you also need a buffer mechanism that operates under this file cache layer. It can also, if you prefer, use the same vnode/offset cache as the data cache, which might be useful to avoid duplicating code, but you have to be careful about double mapping file data, so that it doesn't get cached at both the file vnode level and the device vnode level. Block devices also don't always use page-sized blocks of data, so for that reason as well it might be worth having a device block buffer interface distinct from the page cache interface.
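If you go with a separate device block buffer interface for metadata, it could look something like the following C sketch. Again, the names (blockdev, buf_get, blocks_per_page) are hypothetical, the lookup is a plain linked list to keep it short, and reading the block from the disk is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a block device; block_size need not match the page size. */
struct blockdev { uint32_t block_size; };

struct buf {
    const struct blockdev *dev; /* key part 1 */
    uint64_t blkno;             /* key part 2 */
    struct buf *next;
    unsigned char *data;        /* dev->block_size bytes */
};

static struct buf *buf_list;

/* Find the buffer for (dev, blkno), creating a zeroed one on a miss. */
struct buf *buf_get(const struct blockdev *dev, uint64_t blkno)
{
    for (struct buf *b = buf_list; b; b = b->next)
        if (b->dev == dev && b->blkno == blkno)
            return b;
    struct buf *b = malloc(sizeof *b);
    if (!b)
        return NULL;
    b->data = calloc(1, dev->block_size);
    if (!b->data) {
        free(b);
        return NULL;
    }
    /* A real kernel would issue the disk read for this block here. */
    b->dev = dev;
    b->blkno = blkno;
    b->next = buf_list;
    buf_list = b;
    return b;
}

/* Number of device blocks needed to back one page of file data. */
uint32_t blocks_per_page(const struct blockdev *dev, uint32_t page_size)
{
    assert(dev->block_size != 0 && page_size % dev->block_size == 0);
    return page_size / dev->block_size;
}
```

For example, with 512-byte sectors and 4096-byte pages, blocks_per_page gives 8, so filling one page of the file cache may mean up to eight block reads, while a FAT sector can be cached on its own through buf_get without tying up a whole page.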