
Is the Linux Kernel Signal mechanism good design?

Posted: Sat Mar 26, 2022 4:53 am
by OSwhatever
For those of you who don't know about the signal mechanism in Linux, there are several pages on the internet describing it. In short, when a signal arrives, execution of the ongoing task is interrupted; the kernel reads the stack/frame pointer, pushes a signal frame onto the stack "below" the currently running task, and jumps to the signal handler, reusing the current task's context.

My question is whether this is a good design. The reason it works is precisely that every application must follow the stack/frame setup described in the ABI; otherwise it will not work. I see one problem with this: if you want to store temporary data on the stack while unwinding (for example, for exceptions), there is a risk that the signal handler overwrites that data. It also stops you from stepping outside the ABI, since the OS sets the limitations. You cannot assume that the task has the stack to itself.

Re: Is the Linux Kernel Signal mechanism good design?

Posted: Sat Mar 26, 2022 5:54 am
by klange
OSwhatever wrote:For those of you who don't know about the signal mechanism in Linux, there are several pages on the internet describing it. In short, when a signal arrives, execution of the ongoing task is interrupted; the kernel reads the stack/frame pointer, pushes a signal frame onto the stack "below" the currently running task, and jumps to the signal handler, reusing the current task's context.
Keep in mind that this only happens when a process has registered a signal handler.
OSwhatever wrote:The reason it works is precisely that every application must follow the stack/frame setup described in the ABI; otherwise it will not work.
By registering a signal handler without specifying an alternate stack, the process is consenting to this behavior.
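For concreteness, here is a minimal sketch of that opt-out (error handling abbreviated): the standard sigaltstack() and SA_ONSTACK interfaces move signal delivery onto a dedicated stack, so the handler no longer runs below the interrupted task's stack.

Code: Select all

#include <signal.h>
#include <stdlib.h>

static void handler(int sig)
{
    (void)sig;                    /* runs on the alternate stack */
}

int main(void)
{
    /* Allocate the alternate stack on the heap; SIGSTKSZ is no longer
       guaranteed to be a compile-time constant on recent glibc. */
    stack_t ss;
    ss.ss_sp    = malloc(SIGSTKSZ);
    ss.ss_size  = SIGSTKSZ;
    ss.ss_flags = 0;
    if (ss.ss_sp == NULL || sigaltstack(&ss, NULL) < 0)
        return 1;

    struct sigaction sa = { 0 };
    sa.sa_handler = handler;
    sa.sa_flags   = SA_ONSTACK;   /* deliver this signal on the alternate stack */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return 1;

    raise(SIGUSR1);               /* handler runs on ss_sp, not the main stack */
    return 0;
}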
OSwhatever wrote:I see one problem with this: if you want to store temporary data on the stack while unwinding (for example, for exceptions), there is a risk that the signal handler overwrites that data.
This is one of many reasons ABIs like the one for x86-64 specify a "red zone" below the stack pointer that can safely be used: when a signal handler or other usurpation of the userspace stack occurs, the new frame is placed well below the stack pointer, leaving that space untouched and giving userspace processes some leeway in how and when they move the stack pointer.
OSwhatever wrote:It also stops you from stepping outside the ABI, since the OS sets the limitations. You cannot assume that the task has the stack to itself.
The OS always gets to set limitations; that's part of its job.

Re: Is the Linux Kernel Signal mechanism good design?

Posted: Sat Mar 26, 2022 10:03 am
by nullplan
OSwhatever wrote:My question is whether this is a good design.
That is a good question. The limitations are numerous: tasks can't use the part of their stack below the stack pointer (or, on AMD64, more than 128 bytes beyond the stack pointer), since a signal frame may appear there at any time. Tasks are also extremely limited in signal handlers and can only execute async-signal-safe code there (or, alternatively, only execute async-signal-safe code in the main task, in which case the signal handler can do whatever it likes). The most significant limitation here is on memory allocation, which is simply not possible in such a context, but calling functions such as printf() is off-limits too. Also, as I recently found out on the musl mailing list, you must take care not to modify errno, which is easy to do even in async-signal-safe code; so you must save errno at the start of the signal handler and restore it at the end.
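To make the errno point concrete, here is a sketch of that discipline: the handler does only async-signal-safe work (write(2) instead of printf), yet still has to preserve errno around it.

Code: Select all

#include <errno.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal;

static void handler(int sig)
{
    int saved_errno = errno;              /* write() below may clobber errno */

    (void)sig;
    got_signal = 1;
    write(STDERR_FILENO, "signal\n", 7);  /* async-signal-safe, unlike printf */

    errno = saved_errno;                  /* restore before returning */
}

int main(void)
{
    signal(SIGINT, handler);  /* registration simplified for the sketch */
    pause();                  /* interrupted code sees errno unchanged */
    return got_signal ? 0 : 1;
}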

Also, there is the small matter of tasks crashing because you upgraded the CPU. See, the signal frame must contain the FPU registers along with the CPU registers, and Intel has come out with the AVX-512 instruction set, which increases the size of the register file to 2kB. It is entirely possible that a task does not plan for that much extra space on all subthread stacks. In particular, some versions of glibc set the minimum stack size at 2kB, so there are programs out there allocating only 2kB for their smallest threads, which was enough before AVX-512 but is no longer enough after it. And while it is possible to enable AVX-512 only for processes that need it, I have been warned that doing so (which requires reprogramming a register on each task switch) leads to performance degradation.
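This is, incidentally, why Linux now exports the real signal-frame floor through the auxiliary vector: the MINSIGSTKSZ constant in the headers stopped being honest once the register file grew. A sketch of querying it (assumes a kernel and libc recent enough to define AT_MINSIGSTKSZ):

Code: Select all

#include <signal.h>
#include <stdio.h>
#include <sys/auxv.h>

int main(void)
{
    /* getauxval() returns 0 if the kernel does not provide the entry */
    unsigned long kern_min = getauxval(AT_MINSIGSTKSZ);
    unsigned long use = kern_min > (unsigned long)MINSIGSTKSZ
                            ? kern_min : (unsigned long)MINSIGSTKSZ;

    printf("kernel signal-frame minimum: %lu bytes, sizing stacks for %lu\n",
           kern_min, use);
    return 0;
}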

Beyond that, signalling the PID of any process other than one of your own children is extremely error prone and can lead to delivering signals to the wrong process. That is because the parent process controls the lifetime of the PID. If the process exits, and the parent reaps it before you clear the PID out of whatever storage it is in (which will definitely happen if the parent also exits and the child is reparented to PID 1), then you have a stale PID, which can in turn be assigned to a new process, and now you are signalling the wrong one.
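Linux's eventual answer to this race is worth a sketch: a pidfd pins the identity of one process, so signalling through it can never hit a recycled PID. A hypothetical wrapper, using raw syscalls since older libcs lack them (pidfd_send_signal is Linux 5.1+, pidfd_open 5.3+):

Code: Select all

#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

static int signal_via_pidfd(pid_t pid, int sig)
{
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0)
        return -1;          /* the PID is already gone (or we lost the race) */

    /* From here on, the fd names exactly one process: even if the PID is
       reused later, signalling through the fd cannot hit the new process. */
    int ret = (int)syscall(SYS_pidfd_send_signal, pidfd, sig, NULL, 0);
    close(pidfd);
    return ret;
}

int main(void)
{
    /* signal 0: existence/permission check against our own process */
    return signal_via_pidfd(getpid(), 0) == 0 ? 0 : 1;
}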

However, in return, you get an extremely general and versatile mechanism to have any process (and indeed the kernel) tell any other process about any kind of event, from "you just accessed the wrong address" to "you are about to run out of CPU time, better do something about that", to "I just published some new results, so maybe go do something with them". It's one mechanism that does all of that, and I cannot think of any other way to accomplish all of those tasks in a structured manner. In particular for the IPC, being able to quickly signal any process from any process scales better than any event pipe mechanism you can ever come up with (although the DBus people have tried).

Small point of order: that's the POSIX signal mechanism, not Linux's. Linux is merely one of many implementations of POSIX.

Re: Is the Linux Kernel Signal mechanism good design?

Posted: Tue Mar 29, 2022 12:54 am
by rdos
The POSIX signal mechanism also seems to assume that user-space threads won't block in the kernel, and that if they do, a signal must be able to resume them. A good example of this problem is a blocking keyboard API: if you want to send abort to the process, you first must take it out of its wait for a key in the kernel. Handling this creates a big mess in the kernel, so I currently don't support POSIX signals, and there is no way to kill a process while it is blocked in the kernel.

Re: Is the Linux Kernel Signal mechanism good design?

Posted: Tue Mar 29, 2022 1:03 pm
by vvaltchev
rdos wrote:The POSIX signal mechanism also seems to assume that user-space threads won't block in the kernel, and that if they do, a signal must be able to resume them. A good example of this problem is a blocking keyboard API: if you want to send abort to the process, you first must take it out of its wait for a key in the kernel. Handling this creates a big mess in the kernel, so I currently don't support POSIX signals, and there is no way to kill a process while it is blocked in the kernel.
My kernel didn't support signals for a very long time for the same reason, but it turned out that solving that problem is not so complex after all. After every blocking operation in the kernel, you need to check for pending signals: if there are any, return either -EINTR along the whole call chain or something else if you can (e.g. the number of bytes read so far), without resuming the operation. (I use something very similar to POSIX condition variables inside Tilck to wait for events.)

Therefore, if a signal is sent to a thread blocked in the kernel, it's enough to register the pending signal and wake that thread up: after waking, the thread's code will know whether the wait condition was met or it was woken (interrupted) by a signal. It's actually simpler than it looked. Handling nested signal handlers and the full POSIX specification for signals is far more complicated than the base logic needed to support the good old "reliable signals" introduced in the early UNIX days.

Of course, there are cases (critical sections) when you don't want your sleeping task to be interruptible. In that case, just add an extra flag to your struct task, or extend your task state, and check it before deciding to wake a given task because of a signal.
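Putting the pieces above together, a minimal sketch of the pattern, with hypothetical names (signal_pending(), sleep_on(), and so on; a real implementation must also close the lost-wakeup race by setting the task state before these checks):

Code: Select all

#include <errno.h>   /* EINTR */

struct task;         /* hypothetical: one per kernel thread */
struct wait_queue;   /* hypothetical: tasks sleeping on an event */

/* assumed helpers a small kernel would have; the names are made up */
int  signal_pending(struct task *t);        /* unblocked signal queued?   */
int  task_uninterruptible(struct task *t);  /* critical-section flag set? */
void sleep_on(struct wait_queue *wq, struct task *t);  /* block until woken */

/* Sleep until condition_met(arg) holds, or a signal interrupts the wait. */
int wait_for_event(struct wait_queue *wq, struct task *t,
                   int (*condition_met)(void *), void *arg)
{
    for (;;) {
        if (condition_met(arg))
            return 0;         /* real wakeup: the event happened */

        if (!task_uninterruptible(t) && signal_pending(t))
            return -EINTR;    /* woken by a signal: unwind the call chain */

        sleep_on(wq, t);      /* a waker (or signal delivery) readies us */
    }
}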

Re: Is the Linux Kernel Signal mechanism good design?

Posted: Fri Apr 08, 2022 2:27 pm
by rdos
vvaltchev wrote:
rdos wrote:The POSIX signal mechanism also seems to assume that user-space threads won't block in the kernel, and that if they do, a signal must be able to resume them. A good example of this problem is a blocking keyboard API: if you want to send abort to the process, you first must take it out of its wait for a key in the kernel. Handling this creates a big mess in the kernel, so I currently don't support POSIX signals, and there is no way to kill a process while it is blocked in the kernel.
My kernel didn't support signals for a very long time for the same reason, but it turned out that solving that problem is not so complex after all. After every blocking operation in the kernel, you need to check for pending signals: if there are any, return either -EINTR along the whole call chain or something else if you can (e.g. the number of bytes read so far), without resuming the operation. (I use something very similar to POSIX condition variables inside Tilck to wait for events.)

Therefore, if a signal is sent to a thread blocked in the kernel, it's enough to register the pending signal and wake that thread up: after waking, the thread's code will know whether the wait condition was met or it was woken (interrupted) by a signal. It's actually simpler than it looked. Handling nested signal handlers and the full POSIX specification for signals is far more complicated than the base logic needed to support the good old "reliable signals" introduced in the early UNIX days.

Of course, there are cases (critical sections) when you don't want your sleeping task to be interruptible. In that case, just add an extra flag to your struct task, or extend your task state, and check it before deciding to wake a given task because of a signal.
You might be right. Over time, I've come to use essentially a single "wait" mechanism in the kernel (which I call Signal/WaitForSignal). Timeouts can be viewed as an option, although there are also waits that are independent of "signals". My userspace multiwait object is also built on kernel signals, and even IRQs exclusively use Signal to wake up server threads (Signal can never block, so this is safe). So I suppose it would be possible to add checks for POSIX signals to this mechanism.
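Purely as a guess at what that primitive might look like (the names and semantics below are reconstructed from the description, not rdos's actual code): Signal() only sets a flag and readies the waiter, which is why it is safe from IRQ context, and WaitForSignal() consumes the flag in a loop.

Code: Select all

struct kthread {
    volatile int signaled;              /* set by Signal(), cleared by waiter */
};

/* assumed scheduler hooks; atomicity between them is glossed over here */
void ready_thread(struct kthread *t);   /* make the thread runnable again */
void block_current_thread(void);        /* yield until readied */

void Signal(struct kthread *t)
{
    t->signaled = 1;                    /* no locks, no blocking: IRQ-safe */
    ready_thread(t);
}

void WaitForSignal(struct kthread *self)
{
    while (!self->signaled)             /* loop tolerates spurious wakeups */
        block_current_thread();
    self->signaled = 0;
}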

I also have kernel critical sections. A thread blocked on one of those is not interruptible, but this should not be an issue, as these are temporary conditions anyway.