Reading POSIX: Asynchronous fork()
Posted: Sat Mar 09, 2024 3:16 am
For those unaware, unixoid operating systems have a mechanism called signals, which allows a user-defined function to run asynchronously in response to events. The existence of this mechanism is a bit of a headache for userspace to deal with (for kernel space this stuff is simple), and it is meant mostly for doing simple things, like setting a variable, not for anything overly complex. Signal handlers are restricted in what they can do. Well, I tell a lie: it is actually either the signal handler or the main program that is restricted. You can restrict the main program to doing only things that are "async-signal-safe", and then the signal handler can do whatever it likes. This is typically not a good idea though, since the signal handler might itself be interrupted by a different signal, and then that second handler does have to be safe.
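To make the "setting a variable" case concrete, here is a minimal sketch of a handler that does nothing else. The names got_signal, on_signal and flag_demo are made up for illustration, and SIGUSR1 is an arbitrary choice; the sketch assumes a single-threaded program with SIGUSR1 not blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* The only object the handler touches. volatile sig_atomic_t is the type
   ISO C guarantees can be written safely from a signal handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;   /* no locks, no stdio, no malloc */
}

/* Hypothetical helper: install the handler, raise the signal once and
   report whether the flag was set. Returns 1 if it was, -1 on error. */
int flag_demo(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    got_signal = 0;
    raise(SIGUSR1);   /* the handler runs before raise() returns */
    return got_signal;
}
```

The main program then just polls got_signal at a convenient point and does the real work there, outside the handler.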
But still, for most programs, the signal handler is restricted to calling async-signal-safe functions, and POSIX has a list of those functions (defined in XSH chapter 2.4.3). The selection of functions on this list has already garnered some ridicule. I can't find it right now, but I once read a blog post in which someone observed that the functions on it are enough to run a TCP server from the signal handler, if you want to. Not even a bad one; with poll() on the list, you can make it an nginx-style HTTP server. Why you'd want to I don't know, but you can.
There are some interesting side effects to this list. For example, both abort() and sigaction() are on the list. Now, abort() is supposed to just raise SIGABRT, but it is not allowed to return at all. Since SIGABRT is not a special uncatchable signal, it might be blocked, caught, or ignored. So abort() has to raise SIGABRT, then, if that didn't kill the process, unblock SIGABRT, reset the signal handler for it to the default and raise SIGABRT again. But of course signal handlers are process-global state, so another thread might establish a SIGABRT handler between abort() resetting it and the raise taking effect. So in the end, both abort() and sigaction() (and signal()) need to take a lock whenever the handler for SIGABRT is to be changed. But these functions are supposed to be signal-safe, and a handler that interrupts the lock holder and then calls one of them itself would deadlock. So they actually both have to block all signals, then take the lock, do their business, release the lock, and unblock the signals.
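The abort() sequence described above can be sketched roughly like this. It is a single-threaded illustration only: sketch_abort and abort_demo are hypothetical names, the lock is deliberately left out, and no real libc is claimed to look like this. The demo runs the sketch in a child process that ignores SIGABRT and checks that the child still dies of SIGABRT.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for abort(); never returns. */
static void sketch_abort(void)
{
    sigset_t abrt;
    struct sigaction dfl;

    /* First attempt: a caught SIGABRT gets to run its handler. */
    raise(SIGABRT);

    /* Still alive, so SIGABRT was blocked, ignored, or its handler
       returned. A real libc would block all signals and take its
       internal lock around the next steps so that a concurrent
       sigaction() cannot slip in a new handler; omitted here. */
    sigemptyset(&abrt);
    sigaddset(&abrt, SIGABRT);
    sigprocmask(SIG_UNBLOCK, &abrt, NULL);

    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    sigaction(SIGABRT, &dfl, NULL);

    raise(SIGABRT);   /* default action: terminate the process */
    _Exit(127);       /* only reached if everything above failed */
}

/* Hypothetical demo: returns 1 if a child that ignores SIGABRT is
   nevertheless killed by SIGABRT, -1 on error. */
int abort_demo(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGABRT, SIG_IGN);   /* make the first raise a no-op */
        sketch_abort();
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

The window between the sigaction() call and the second raise() is exactly where another thread could install a handler again, which is why the real thing needs the lock.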
But that is not what I wanted to talk about. On the list, there is also fork(). Now fork() has been controversial for decades, and I myself have had a bit of a rocky start with it. But with fork() and _Exit() on the list, there is nothing stopping a signal handler from just calling fork() and then _Exit() in the parent, for example. A signal-triggered backgrounding! The intent was probably to allow signal handlers to spin off some subprocess, but the end result is still what I have just written. And the mere fact that this is possible has major effects on the entire C system design: basically, if signals are not blocked, you can never know your PID. The number getpid() returned might be the PID you had until moments ago, but now it's the PID of your parent process, or maybe the parent has exited and now it's the PID of no process at all, or of some completely unrelated process (because PIDs do get re-used).
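Here is a small sketch of that scenario, using only fork() and _Exit() inside the handler. The names background_me and stale_pid_demo are invented for the example, and raise(SIGUSR1) stands in for a signal arriving asynchronously. The whole experiment runs in a throwaway child so that the caller survives it; the surviving grandchild reports through a pipe whether its cached PID went stale.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Handler that "backgrounds" the program: the parent exits, the child
   returns from the handler and carries on under a new PID. */
static void background_me(int sig)
{
    (void)sig;
    if (fork() > 0)
        _Exit(0);   /* parent: gone */
    /* child (or failed fork): return to the interrupted code */
}

/* Hypothetical demo: returns 1 if a cached getpid() value went stale,
   0 if it did not, -1 on error. */
int stale_pid_demo(void)
{
    int fds[2], status, result = -1;
    char verdict = 0;
    pid_t victim;

    if (pipe(fds) != 0)
        return -1;
    victim = fork();
    if (victim < 0)
        return -1;

    if (victim == 0) {
        struct sigaction sa;
        pid_t cached;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = background_me;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, NULL);

        cached = getpid();   /* "my" PID, as far as this code knows */
        raise(SIGUSR1);      /* stands in for an asynchronous signal */

        /* Whoever gets here is the process fork()ed by the handler. */
        verdict = (cached != getpid());
        if (write(fds[1], &verdict, 1) != 1)
            _Exit(1);
        _Exit(0);
    }

    close(fds[1]);
    if (read(fds[0], &verdict, 1) == 1)
        result = verdict;
    close(fds[0]);
    waitpid(victim, &status, 0);   /* reap the original, exited child */
    return result;
}
```

Nothing in the code between the two getpid() calls looks like it could change the PID, which is the whole problem.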
Practically, this means that you cannot implement raise() as just
```c
int raise(int sig) {
    return kill(getpid(), sig);
}
```
(I mean, you cannot do this exact thing anyway, since raise() is an ISO-C function and kill() is POSIX, but you get the point). No, instead you have to block signals before the operation to keep from sending the signal to some random process instead of yourself.
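A version that blocks signals around the call might look like the sketch below. sketch_raise, note_signal and raise_demo are hypothetical names, and the sketch assumes a single-threaded process; a real libc would target the calling thread rather than the whole process, and this is not claimed to be any particular libc's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical name, to avoid clashing with the libc raise(). */
int sketch_raise(int sig)
{
    sigset_t all, old;
    int ret;

    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);     /* no handler can fork() now */
    ret = kill(getpid(), sig);              /* PID is stable in this window */
    sigprocmask(SIG_SETMASK, &old, NULL);   /* pending signal delivered here */
    return ret;
}

static volatile sig_atomic_t seen = 0;

static void note_signal(int sig)
{
    (void)sig;
    seen = 1;
}

/* Hypothetical demo: returns 1 if the handler ran before sketch_raise()
   returned, -1 on error. */
int raise_demo(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    seen = 0;
    if (sketch_raise(SIGUSR2) != 0)
        return -1;
    return seen;
}
```

The cost is two extra system calls per raise(), paid only to rule out a fork() in a handler between getpid() and kill().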
What's worse is that there's really no point to this rambling. The Austin Group isn't going to change the list because of it. I suspect the usual suspects (like rdos and zaval) will come out of the woodwork to tell me that this is why they prefer a Windows-style approach that avoids both signals and fork(), while the other usual suspects (like Octo and myself) will tell me that yeah, POSIX sucks, but basically it's what we've got to deal with. The next version of POSIX will apparently strike fork() from the list of signal-safe functions, but add a new function _Fork(), so the problem remains.
The moral of the story, however, is that simple ideas can have very complicated consequences.