Page 1 of 2
Microkernel: How to deal with recursive messages?
Posted: Sun Aug 27, 2017 1:12 pm
by Artlav
I've been contemplating redesigning the filesystem handling in Aprom to use the messaging system (to get away from the broken call-gates/franken-lib-services system i've asked about before -
http://forum.osdev.org/viewtopic.php?f=1&t=32374), and am having troubles coming up with a design that won't be getting stuck.
Here is a problematic scenario.
The VFS service gets woken up, checks the inbox and sees a request to read some data from a file handle of /cfg/run/2/descr.txt
It resolves the handle, finds that the file belongs to a filesystem driver F, sends the read message to it and goes into a wait state.
The filesystem driver F's service gets woken up, checks the inbox and sees a request to read from a file handle on an instance from /dev/rd1.
It computes the necessary offsets, sends a read message to VFS and goes into a wait state.
The VFS service is, however, still waiting for a reply from the FS driver.
How do you resolve recursive messaging like this?
One way i had "success" solving this issue was to make each service spawn a separate thread to handle each message.
This does make things work, but it's also really, really slow.
Thousands of threads are getting spawned and killed all the time.
I contemplated keeping one handle thread idling, but then the same issue arises - how do you tell if it's a case of still-busy thread or a thread that waits for a recursive call?
I suspect that i'm just trying to approach this from a wrong angle, so the wide question is - how are recursive messages typically handled or avoided?
Should i just make thread creation much faster somehow, or are there better designs?
Re: Microkernel: How to deal with recursive messages?
Posted: Sun Aug 27, 2017 3:41 pm
by Brendan
Hi,
Artlav wrote:I've been contemplating redesigning the filesystem handling in Aprom to use the messaging system (to get away from the broken call-gates/franken-lib-services system i've asked about before -
http://forum.osdev.org/viewtopic.php?f=1&t=32374), and am having troubles coming up with a design that won't be getting stuck.
Here is a problematic scenario.
The VFS service gets woken up, checks the inbox and sees a request to read some data from a file handle of /cfg/run/2/descr.txt
It resolves the handle, finds that the file belongs to a filesystem driver F, sends the read message to it and goes into a wait state.
The filesystem driver F's service gets woken up, checks the inbox and sees a request to read from a file handle on an instance from /dev/rd1.
It computes the necessary offsets, sends a read message to VFS and goes into a wait state.
The VFS service is, however, still waiting for a reply from the FS driver.
How do you resolve recursive messaging like this?
I resolve a situation like this with asynchronous messages. The VFS would wait for any message; receive a request from a normal process, create a "request block" (to keep track of it) and figure out which FS is responsible, and send a request to the FS; then the VFS would wait for any message again. Eventually (possibly after handling many other unrelated messages from many other processes and file systems) the VFS would receive a reply for that specific request from the file system, find the corresponding "request block" it was using to keep track of it, and send a reply back to the process.
If you don't have asynchronous messages; then you'd emulate some asynchronicity on top of synchronous messages. In this case it might look something like:
- VFS would receive a request from a normal process, create a "request block" (to keep track of it) and figure out which FS is responsible, send a reply back to the process saying "request accepted" (causing the process' synchronous request to be completed and allowing the process to continue) and send a request to the FS
- FS would receive a request from VFS, create something to keep track of it and send a reply back to the VFS saying "request accepted" (causing the VFS synchronous request to be completed and allowing the VFS to continue), and then FS would process the request (possibly sending a request to storage device driver, or back to VFS, or..).
- Eventually FS would send a request back to VFS saying "I request that you handle this data that you requested earlier", VFS would receive this and send back a reply ("Thanks!") causing the FS's synchronous request to be completed and allowing the FS to continue.
- Then VFS would send a request back to the process saying "I request that you handle this data that you requested earlier" and the process would receive this and send back a reply ("Thanks!") causing the VFS's synchronous request to be completed and allowing the VFS to continue.
Note that (in general), at the lower levels all IO (including file IO) should be asynchronous in some way. For example, when one process does "read()" and that data happens to be in file data caches already (and the read could be almost immediate) you don't want the process to have to wait just because a completely different process is reading data from a completely different file system on a slow storage device - instead you want many requests "in flight" at the same time and many requests happening in parallel. This means VFS, FS and storage device drivers all need to be asynchronous in some way (whether that's via. asynchronous messages, or synchronous messages on top of asynchronous functionality, or direct function calls on top of asynchronous functionality in a monolithic kernel, or...).
Also note that asynchronous IO can (and typically does) extend to processes themselves (e.g. the
POSIX asynchronous I/O interface). In this case higher level synchronous IO (e.g. the old "read()" function in C) may be built on top of asynchronous IO (in that the library function might do an asynchronous read that returns "immediately" and then explicitly block until a reply arrives to emulate a synchronous function that wouldn't have returned until the everything was completed); such that "sychronous IO" only exists in the library and there's no need for other parts of the OS (kernel, VFS, FS, storage device drivers) to support synchronous IO at all.
Cheers,
Brendan
Re: Microkernel: How to deal with recursive messages?
Posted: Sun Aug 27, 2017 3:44 pm
by simeonz
In user-land programming, such problems are sometimes addressed by using work queues and thread pools. Work queues require continuation-style programming or in other words callbacks and external state in dynamically allocated memory. Here I am assuming that whoever uses work queues wants to use them for end-to-end processing. This type of manual external state reloading is not particularly cheaper in itself, but circumvents the costs of thread creation. Since work items essentially yield execution cooperatively (by returning from the work function), this has similar consequences to cooperative multitasking - potentially more precise and hence cheaper scheduling, but with the caveat of deadlocks/stalling due to non-preemption.
Thread pools are in some sense similar, because the threads do not possess task identity. They are only work vessels. However, unlike work queues, the job executed inside such a thread can delay longer, wait on synchronization, and do other things that would block a work queue. The idea here is to reuse the created threads and to keep them around for a while after they have become inactive - a hysteresis of sorts. Work queues can borrow some of the ideas of thread pools, and have their threads grown and be reaped according to the changing system demands, or redistribute the work by work-stealing/requesting/sharing, etc.
I have seen something else done in one particular wacky Python ipc (rpyc). It had the facility to turn waiting threads into work servers. Every waiting thread could receive a wait outcome that asked for work to be executed in its context, thus reusing the thread. The idea was to form a conversational stacked work transfer between the two sides of the ipc channel. However, if the processing forked into multiple threads on either side, various deadlock conditions could arise. It is doable strategy, but needs fine handling of the possible scenarios, including spawning threads if necessary.
All of the above concepts are simplification of the general idea of user-mode scheduling and user-mode threads. Basically, a lot of the costs involved with kernel-mode threads stem from the need for full virtualization of the execution context, maintenance of security information, prioritization, etc. Those facilities are necessary for the level of abstraction that the kernel provides, but are not necessary when the process focuses on one type of task or a set of tasks that have no legitimate reason to compete with each other. There are all kinds of variants of this - the kernel threads may or may not be reused by different user contexts, the user threads may or may not be capable of executing on different available kernel threads, user space preemption or voluntary termination may be chosen, etc.
P.S. I would venture to say that Brendan describes I/O requests implemented with state objects that travel through multiple work queues or worker threads existing throughout the system in different processes. But I could be wrong. (The private state would have to exist separately for each process involved, since this is mircrokernel.)
Re: Microkernel: How to deal with recursive messages?
Posted: Sun Aug 27, 2017 4:54 pm
by Brendan
Hi,
simeonz wrote:P.S. I would venture to say that Brendan describes I/O requests implemented with state objects that travel through multiple work queues or worker threads existing throughout the system in different processes. But I could be wrong. (The private state would have to exist separately for each process involved, since this is mircrokernel.)
For what I described; assume there's a single threaded VFS process, single threaded file system process/es and single threaded storage device driver process/es. Each of these processes has queue/s of pending requests (e.g. VFS might have a request that's waiting for file system, file system might have a request that's waiting for storage device driver, storage device driver might have a request that's waiting for the disk drive hardware).
For asynchronous interfaces (where threads never block until/unless they have nothing to do - no pending requests they can handle), single threaded processes are probably fine for most of it (the bottleneck in most cases is going to be the storage device hardware itself), and the only reason for any of the processes to be multi-threaded is to benefit from multiple CPUs. However, there's typically not much CPU usage involved and not much reason to want multiple CPUs; unless (maybe) if you have some expensive encryption in the storage device driver (for "whole disk encryption") or (maybe) if the VFS struggles to do path lookup fast enough or (maybe) if there's software RAID between FS and storage devices that's doing some relatively expensive form of RAID (e.g. calculating/checking parity for "RAID 5").
For actual data movement, ideally there isn't any (not in the "memcpy()" sense where CPU time is consumed for moving/copying data) - e.g. storage device driver uses bus mastering/DMA to get the data into physical pages, and those physical pages are mapped into VFS file data cache and then mapped into one or more normal processes (as "copy on write" pages) as needed. Of course there will be some cases where CPU time is consumed for copying data around (e.g. small data transfers that are too small to use whole pages); it just hopefully doesn't happen so much that multiple CPUs and the cost of synchronising them would help to improve bandwidth/performance.
Cheers,
Brendan
Re: Microkernel: How to deal with recursive messages?
Posted: Sun Aug 27, 2017 5:14 pm
by Artlav
Brendan wrote:If you don't have asynchronous messages; then you'd emulate some asynchronicity on top of synchronous messages.
Quite to the opposite i don't have synchronous messages, but since the VFS code was deeply written with sync calls in mind i emulate them in there.
Hm, that's actually an interesting idea i haven't considered - basically rewrite the whole VFS to be completely async.
Break everything into basic blocks and have a state machine-containing request elements ticking through actions in response to messages and data availability.
Sounds fun, and is actually long overdue one way or another.
Thanks!
simeonz wrote:Thread pools
That's one of the ideas i contemplated - have N idling processing threads, with each new request going to whichever one is free.
Compared to an async solution above, this way it should only be able to handle limited (by N) recursion, but it would make things work reasonably fast without requiring a massive code rewrite.
That should work for other places, where recursion is not impossible but is rather rare and never deep.
Re: Microkernel: How to deal with recursive messages?
Posted: Sun Aug 27, 2017 5:33 pm
by simeonz
Brendan wrote:For what I described; assume there's a single threaded VFS process, single threaded file system process/es and single threaded storage device driver process/es. Each of these processes has queue/s of pending requests (e.g. VFS might have a request that's waiting for file system, file system might have a request that's waiting for storage device driver, storage device driver might have a request that's waiting for the disk drive hardware).
For asynchronous interfaces (where threads never block until/unless they have nothing to do - no pending requests they can handle), single threaded processes are probably fine for most of it (the bottleneck in most cases is going to be the storage device hardware itself), and the only reason for any of the processes to be multi-threaded is to benefit from multiple CPUs. However, there's typically not much CPU usage involved and not much reason to want multiple CPUs; unless (maybe) if you have some expensive encryption in the storage device driver (for "whole disk encryption") or (maybe) if the VFS struggles to do path lookup fast enough or (maybe) if there's software RAID between FS and storage devices that's doing some relatively expensive form of RAID (e.g. calculating/checking parity for "RAID 5").
For actual data movement, ideally there isn't any (not in the "memcpy()" sense where CPU time is consumed for moving/copying data) - e.g. storage device driver uses bus mastering/DMA to get the data into physical pages, and those physical pages are mapped into VFS file data cache and then mapped into one or more normal processes (as "copy on write" pages) as needed. Of course there will be some cases where CPU time is consumed for copying data around (e.g. small data transfers that are too small to use whole pages); it just hopefully doesn't happen so much that multiple CPUs and the cost of synchronising them would help to improve bandwidth/performance.
I see. Single threaded operation, even one cpu prioritized for driver activity to improve cache locality may be. A single thread will reduce the overhead, while guaranteeing the same bandwidth. There are just two things. This approach may potentially increase the latency if the user requests spike. And second, with the modern tendency for getting performance by scaling horizontally, is using one core future proof enough. Especially with NVMe and such. I understand the main point though - a sensible system should chew on the application code, not the driver cruft which is not CPU limited. The question is, at what ratio and on how many cores.
Re: Microkernel: How to deal with recursive messages?
Posted: Mon Aug 28, 2017 12:36 am
by dozniak
Simplest solution i can think of: do not send messages recursively.
If you need to call exact same function you can just do it directly, because you're already inside that function.
Re: Microkernel: How to deal with recursive messages?
Posted: Mon Aug 28, 2017 4:44 am
by Korona
Asynchronous messages can be part of a solution to this problem but they are not sufficient. Your drivers won't (and shouldn't!) be able to handle arbitrary amounts of concurrent requests. If you allow them to accept arbitrary numbers you open them up to denial of service attacks. OSes typically provide a progress guarantee irregardless of the requests that are issued by unrelated userspace programs. That means that you have to prevent "issue n requests to VFS layer that cause it to call some driver, which in turn calls back to the VFS and blocks" exploits. The only thing that asynchronous messaging (without any extensions like separate message queues for different request types or some priority mechanism) does in this situation is to increase n from 1 to some larger number.
For this reason dozniaks suggestion not to send recursive messages is more valuable than it sounds. In your particular case eliminate the FS driver's need to call the VFS layer. Instead the FS driver can call the device driver directly. The device driver should never need to call the VFS or FS drivers. The call graph is now a tree and there can be no deadlocks.
Re: Microkernel: How to deal with recursive messages?
Posted: Mon Aug 28, 2017 4:47 am
by simeonz
After some consideration, using a single thread for each driver in the stack may incur other penalties. Namely, the interconnect traffic caused by handling requests from a different core. One of the strategies for monolithic kernel work queues is to have one thread per core, which enables the use of local cache transfers for the request parameters themselves. For microkernel design however, these will be complicated, as the threads will not be one per core, but one per core per driver process. This may/will result in suboptimal work queue utilization. Having one thread per process, as Brendan suggests, will help to mitigate this, as multiple requests will be handled during a single time slice. Additionally, the use of one thread in each driver can be used to parallelize the processing, by scheduling the threads on different cores under heavy load. But again, this will increase the interconnect traffic.
How significant a bottleneck the interconnect traffic involved in reading out the request parameters will be to the overall bandwidth and latency, I don't know. It depends on whether these requests arrive in bulk (as in program preloading data), on the read amplification (how much of the read data is actually used), etc. I am still stuck thinking about it, because neither Windows, nor Linux uses the single-threaded approach (to eliminate the synchronization overhead), so it makes me ponder the advantages and disadvantages. The main deterrent in my opinion is future scalability - for the machines that will massively have more than a hundred cores/threads. I am also not sure that the strategy can easily be changed dynamically, because the presence of synchronization may impact the data structure design fundamentally.
Re: Microkernel: How to deal with recursive messages?
Posted: Mon Aug 28, 2017 5:01 am
by simeonz
Korona wrote:Asynchronous messages can be part of a solution to this problem but they are not sufficient. Your drivers won't (and shouldn't!) be able to handle arbitrary amounts of concurrent requests. If you allow them to accept arbitrary numbers you open them up to denial of service attacks. OSes typically provide a progress guarantee irregardless of the requests that are issued by unrelated userspace programs. That means that you have to prevent "issue n requests to VFS layer that cause it to call some driver, which in turn calls back to the VFS and blocks" exploits. The only thing that asynchronous messaging (without any extensions like separate message queues for different request types or some priority mechanism) does in this situation is to increase n from 1 to some larger number.
For this reason dozniaks suggestion not to send recursive messages is more valuable than it sounds. In your particular case eliminate the FS driver's need to call the VFS layer. Instead the FS driver can call the device driver directly. The device driver should never need to call the VFS or FS drivers. The call graph is now a tree and there can be no deadlocks.
What about cases, like loopback devices, or virtualization storage in the form of file images? Also, do you suggest that the drivers have one thread for every user thread, or that each driver in the storage stack is responsible for throttling that particular user's requests at his level according to a QoS/quota setting? If the latter, couldn't the QoS be tracked end-to-end while servicing the request?
Re: Microkernel: How to deal with recursive messages?
Posted: Mon Aug 28, 2017 6:32 am
by Brendan
Hi,
simeonz wrote:Also, do you suggest that the drivers have one thread for every user thread, or that each driver in the storage stack is responsible for throttling that particular user's requests at his level according to a QoS/quota setting? If the latter, couldn't the QoS be tracked end-to-end while servicing the request?
If a group of one or more malicious processes are able to cause "denial of service" by flooding the VFS with requests; then they are probably also able to cause "denial of service" by flooding the network stack, or the GUI, or many other processes/services. You'd want to solve the problem for all cases (and not just for the VFS case); so the solution should probably should be built into the messaging system itself.
My main defence is prioritisation (both thread priorities and IO priorities). The idea is that normal processes aren't able to use higher priorities; so more important things (GUI, system services, drivers, etc) still function under flood conditions because their requests are higher priority and take precedence. A flood from lower priority (normal) processes only starves lower priority/normal processes of IO and/or CPU time.
This is not enough on its own.
For my OS (because of asynchronous messaging) each message costs RAM (the messages sit in message queues in kernel space until the receiver asks to receive them), and if the receiver can't handle the requests fast enough its message queue continues to grow until the kernel runs out of RAM. To guard against this I'm considering "thread penalties", where (if a thread sends a message to a receiver that already has a "too full" message queue) the scheduler is told to reduce the priority of the sending thread for a while, and these penalties would be cumulative; and I'm also considering "priority boosts" (if a thread's message queue is "too full", tell the scheduler to give the thread a priority boost for a while). This is mostly just basic flow control (if receiver can't keep up; drop speed of sender/s and boost speed of receiver).
The other thing I'm considering (because I have an "anyone can send any message to anyone else" model) is some kind of system to report and deal with "spam messages"; where a process can report the sender to the kernel for sending too many unwanted messages (which could include sending far too many otherwise valid/wanted messages); and where kernel would use a weighted point system (e.g. a report from "more trusted" system process counts as more points than a report from some random/unknown process) in conjunction with thresholds (when a process accumulates more than X points send warning to admin if one hasn't been sent recently; when a process accumulates more than Y points send warning to software developers if one hasn't already been sent; and when a process has more than Z points terminate it). Of course this would also involve a "bleed factor" (e.g. every second decrease the number of points a process has accumulated) so that it works more on the rate of spam reports (and so that a process that has been running for 123 years doesn't get terminated for being reported very rarely). However; I still need to do a whole pile of research into this.
Note: What I actually want is a framework for monitoring, tracking and reporting many different indicators of "bad software" (processes that crash for any reason, high priority tasks taking too much CPU time without blocking or receiving messages, tasks with "always increasing" memory consumption that could indicate memory leaks, tasks that the OS's security module flag as having "suspicious activity", etc) where the "spam reports" are just one of many things that feeds data into the "bad software tracking" system.
Cheers,
Brendan
Re: Microkernel: How to deal with recursive messages?
Posted: Wed Aug 30, 2017 8:25 am
by LtG
Brendan wrote:
For asynchronous interfaces (where threads never block until/unless they have nothing to do - no pending requests they can handle), single threaded processes are probably fine for most of it (the bottleneck in most cases is going to be the storage device hardware itself), and the only reason for any of the processes to be multi-threaded is to benefit from multiple CPUs. However, there's typically not much CPU usage involved and not much reason to want multiple CPUs; unless (maybe) if you have some expensive encryption in the storage device driver (for "whole disk encryption") or (maybe) if the VFS struggles to do path lookup fast enough or (maybe) if there's software RAID between FS and storage devices that's doing some relatively expensive form of RAID (e.g. calculating/checking parity for "RAID 5").
Assuming conventional x86 PC, multicore would suggest multiple threads can improve performance.
If you have a single thread then you would either need to check if that thread is sleeping and migrate it from one core to the current core, or alternatively send IPI to the other core and let it handle the work with the associated cache penalties and putting requesting thread and the thread that was running on the "VFS core" to sleep. With NUMA the performance penalty might be even greater, though NUMA isn't that common.
In practice I have no idea how much performance impact there actually is, but intuitively being able to handle the VFS stuff on the core where the requesting thread is should give better performance. Either the data is in cache and can be returned immediately or IO must be performed which shouldn't take long and thus returning to the requesting thread immediately. Returning the IO results might take time, but returning from the VFS shouldn't.
So I guess it would make sense to have at least one VFS thread per core, but given that the general OS design affects this a lot, it's difficult to say one way or the other.
Re: Microkernel: How to deal with recursive messages?
Posted: Wed Aug 30, 2017 8:47 am
by LtG
Brendan wrote:
If a group of one or more malicious processes are able to cause "denial of service" by flooding the VFS with requests; then they are probably also able to cause "denial of service" by flooding the network stack, or the GUI, or many other processes/services. You'd want to solve the problem for all cases (and not just for the VFS case); so the solution should probably should be built into the messaging system itself.
My main defence is prioritisation (both thread priorities and IO priorities). The idea is that normal processes aren't able to use higher priorities; so more important things (GUI, system services, drivers, etc) still function under flood conditions because their requests are higher priority and take precedence. A flood from lower priority (normal) processes only starves lower priority/normal processes of IO and/or CPU time.
Korona suggested that the OS shouldn't allow "arbitrary" numbers of requests, but I'm not really sure what alternatives there are.
If you only allow some relatively low fixed number of requests then isn't it easier to cause DoS? Further, it would have to be dynamically selected number per system, since a 10x system should allow for more requests, how would such an algorithm be devised and how accurate would it be?
I think prioritization is the best solution here, most importantly making sure the user is and remains in control, which means the user can shut down offending processes.
Brendan wrote:
For my OS (because of asynchronous messaging) each message costs RAM (the messages sit in message queues in kernel space until the receiver asks to receive them), and if the receiver can't handle the requests fast enough its message queue continues to grow until the kernel runs out of RAM. To guard against this I'm considering "thread penalties", where (if a thread sends a message to a receiver that already has a "too full" message queue) the scheduler is told to reduce the priority of the sending thread for a while, and these penalties would be cumulative; and I'm also considering "priority boosts" (if a thread's message queue is "too full", tell the scheduler to give the thread a priority boost for a while). This is mostly just basic flow control (if receiver can't keep up; drop speed of sender/s and boost speed of receiver).
Have you considered keeping the message in the senders address space and not in kernel? That way the "cost" is being associated with the problem process and for example a low priority process hogging all RAM can easily be identified and killed. Also helps with accounting, nothing is more annoying than being out of resources (RAM or otherwise) and not being able to tell why.
Brendan wrote:
The other thing I'm considering (because I have an "anyone can send any message to anyone else" model) is some kind of system to report and deal with "spam messages";
I'm still undecided on this, basically I've been thinking of two different options:
- Each process is responsible for their own message filtering. This has the benefit of allowing each process to customize the filtering code for best performance, for example PMM can reject all messages that are not from VMM. Whereas something like an online banking service needs much more complex system.
- Provide filtering in the messaging system. This has the benefit of being able to ensure, to a reasonable degree, that the filtering code itself is flawless, instead of each app rolling their own.
In practice the correct solution is probably something in between, for example a library that has a couple of different filters from simple to complex.
Re: Microkernel: How to deal with recursive messages?
Posted: Wed Aug 30, 2017 10:05 am
by mallard
In my OS (not a microkernel, but does have a native IPC messaging system, used for various purposes including the GUI); I have a couple of simple rules to keep the volume of messages low and prevent "DoS":
1. A userspace process may send only one message to any one recipient process at a time. An attempt to send a second message will block until the first message has been received and acknowledged. This limit does not apply to messages sent by the kernel or modules.
2. Replies to received messages are exempt from the above limit, however, each message may have at most one reply. Replies may themselves be replied to.
I also have a simple message filtering system (most often used to allow processes/threads to a await a reply to a specific message) implemented, however, this is a "ignore messages that don't match this filter in this instance of a wait", not a "block all messages that don't match this filter permanently" system. If an application isn't interested in a message, it just ignores it, worst-case, the sender keeps sending messages (one at a time) that keep getting ignored.
Re: Microkernel: How to deal with recursive messages?
Posted: Wed Aug 30, 2017 2:36 pm
by Korona
LtG wrote:Have you considered keeping the message in the senders address space and not in kernel? That way the "cost" is being associated with the problem process and for example a low priority process hogging all RAM can easily be identified and killed. Also helps with accounting, nothing is more annoying than being out of resources (RAM or otherwise) and not being able to tell why.
Accounting buffers at the sender is not a universal solution. It just reverses the problem: Now, receivers can starve senders. Even if some process is acting as a client it might be sometimes desirable act as a receiver.
The problem is commonly (?) solved by accounting the buffers both at the sender and at the receiver and by enforcing per-user limits (in a UNIX-like system; this might differ among different OS designs).