Reading POSIX: Asynchronous fork()

nullplan · Post by **nullplan** » Sat Mar 09, 2024 3:16 am

For those unaware, unixoid operating systems have a mechanism called signals, which allows a user-defined function to run asynchronously in response to events. The existence of this mechanism is a bit of a headache for userspace to deal with (for kernel space this stuff is simple), and it is meant mostly for doing simple things, like setting a variable, not for doing overly complex things. Signal handlers are restricted in what they can do. Well, I tell a lie: it is actually either the signal handler or the main program that is restricted. You can restrict the main program to doing things that are "async-signal-safe", and then the signal handler can do whatever it likes. This is typically not a good idea, though, since the signal handler might then be interrupted by a different signal, and that second signal handler does have to be safe.

But still, for most programs, the signal handler is restricted to calling async-signal-safe functions. And there is a list of those functions in POSIX (defined in XSH section 2.4.3). The selection of functions on this list has already garnered some ridicule. I can't find it right now, but I once read a blog post in which someone observed that the functions in there are enough to run a TCP server from the signal handler, if you want to. Not even a bad one; with poll() on the list, you can make it an nginx-style HTTP server. Why you'd want to, I don't know, but you can.

There are some interesting side effects to this list. For example, both abort() and sigaction() are on the list. Now, abort() is supposed to just raise SIGABRT, but it is not allowed to return at all. Since SIGABRT is not a special uncatchable signal, it might be blocked, caught, or ignored. So abort() has to raise SIGABRT, then, if that didn't kill the process, unblock SIGABRT, reset the signal handler for it to the default and raise SIGABRT again. But of course signal handlers are process-global state, so another thread might establish a SIGABRT handler between abort() resetting it and the raise taking effect. So in the end, both abort() and sigaction() (and signal()) need to take a lock if a handler for SIGABRT is supposed to be changed. But these functions are supposed to be signal safe. So they actually both have to block all signals, then take the lock, do their business, free the lock, and unblock the signals.
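
The abort() dance described above can be sketched roughly as follows. This is not how any particular libc does it. The functions my_abort and locked_sigaction, and the atomic_flag spinlock, are hypothetical stand-ins for a libc's internal abort(), sigaction() and lock.

The essential order of steps is:

1. Block every signal first, so no handler on this thread can re-enter while the lock is held.
2. Take the lock.
3. Reset SIGABRT to SIG_DFL.
4. Deliver SIGABRT.

A lock-free atomic_flag is used because C11 permits lock-free atomics in signal handlers. A pthread mutex is not async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical lock serialising changes to the SIGABRT disposition. */
static atomic_flag sigabrt_lock = ATOMIC_FLAG_INIT;

/* Block every signal on this thread, then take the lock. */
static void lock_sigabrt(sigset_t *saved)
{
    sigset_t all;

    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, saved);
    while (atomic_flag_test_and_set(&sigabrt_lock))
        ;   /* spin: the holder also has all signals blocked, so it cannot be interrupted */
}

static void unlock_sigabrt(const sigset_t *saved)
{
    atomic_flag_clear(&sigabrt_lock);
    pthread_sigmask(SIG_SETMASK, saved, NULL);
}

/* sigaction() stand-in: changes to SIGABRT go through the lock. */
static int locked_sigaction(int sig, const struct sigaction *act, struct sigaction *old)
{
    sigset_t saved;
    int r;

    if (sig != SIGABRT || act == NULL)
        return sigaction(sig, act, old);
    lock_sigabrt(&saved);
    r = sigaction(sig, act, old);
    unlock_sigabrt(&saved);
    return r;
}

/* abort() stand-in: give a user handler its chance, then force the default action. */
static _Noreturn void my_abort(void)
{
    sigset_t saved, abrt;
    struct sigaction dfl;

    raise(SIGABRT);                /* may be caught, ignored, or left pending if blocked */

    lock_sigabrt(&saved);          /* never released: the process is about to die */
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    dfl.sa_flags = 0;
    sigaction(SIGABRT, &dfl, NULL);
    raise(SIGABRT);                /* all signals are blocked, so this stays pending */

    sigemptyset(&abrt);
    sigaddset(&abrt, SIGABRT);
    pthread_sigmask(SIG_UNBLOCK, &abrt, NULL);  /* pending SIGABRT delivered: default action */

    _Exit(127);                    /* not reached if the default action ran */
}
```

Since my_abort() never releases the lock, any other thread that tries to change the SIGABRT disposition through locked_sigaction() simply spins until the process dies.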

But that is not what I wanted to talk about. On the list, there is also fork(). Now fork() has been controversial for decades, and I myself have had a bit of a rocky start with it. But with fork() and _Exit() on the list, there is nothing stopping a signal handler from just calling fork() and then _Exit() in the parent, for example. A signal-triggered backgrounding! The intent was probably to allow signal handlers to spin off some subprocess, but the end result is still what I have just written. But the mere fact that this may be possible has major effects on the entire C system design: Basically, if signals are not blocked, you can never know your PID. The number getpid() returned might be the PID you had until moments ago, but now it's the PID of your parent process, or maybe it has exited and now it's the PID of no process at all, or some completely unrelated process (because PIDs do get re-used).
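
A sketch of such a signal-triggered backgrounding handler. It assumes fork() is on the async-signal-safe list, as it is in POSIX.1-2017; background_handler and install_background_handler are illustrative names. After the handler has run, the surviving process is the child. Any PID the interrupted code obtained before the signal now belongs to the exited parent. The handler saves and restores errno because fork() may modify it while the interrupted code is still relying on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* "Signal-triggered backgrounding": the parent vanishes, the child carries on. */
static void background_handler(int sig)
{
    int saved_errno = errno;   /* fork() may clobber errno for the interrupted code */
    pid_t pid;

    (void)sig;
    pid = fork();
    if (pid > 0)
        _Exit(0);              /* parent: exit at once; its PID can later be reused */
    /* child (or fork() failure): return and resume the interrupted code */
    errno = saved_errno;
}

/* Install background_handler for sig; returns 0 on success, -1 on error. */
static int install_background_handler(int sig)
{
    struct sigaction sa;

    sa.sa_handler = background_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}
```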

Practically, this means that you cannot implement raise() as just

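```c
#include <signal.h>
#include <unistd.h>

/* Naive version: between getpid() returning and kill() running, a signal
 * handler may fork() and _Exit() the parent, so the PID can be stale. */
int raise(int sig) {
  return kill(getpid(), sig);
}
```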

(I mean, you cannot do this exact thing anyway, since raise() is an ISO-C function and kill() is POSIX, but you get the point). No, instead you have to block signals before the operation to keep from sending the signal to some random process instead of yourself.
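
One way to do that is sketched below for a single-threaded process. The name my_raise is hypothetical, chosen so it does not clash with the C library's own raise().

The function blocks every signal, sends the signal with kill(getpid(), sig), then restores the old mask. While all signals are blocked, no handler can run, so the PID cannot change between getpid() and kill(). POSIX requires a pending unblocked signal to be delivered before sigprocmask() returns, so the handler has already run by the time my_raise() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/* raise() sketch for a single-threaded process. */
int my_raise(int sig)
{
    sigset_t all, saved;
    int r;

    sigfillset(&all);
    if (sigprocmask(SIG_BLOCK, &all, &saved) != 0)
        return -1;
    r = kill(getpid(), sig);                 /* no handler can run, so the PID is still ours */
    sigprocmask(SIG_SETMASK, &saved, NULL);  /* the now-pending signal is delivered here */
    return r;
}
```

In a multi-threaded process, POSIX specifies raise() as equivalent to pthread_kill(pthread_self(), sig). A real implementation would therefore target the calling thread rather than the whole process.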

What's worse is that there's really no point to this rambling. The Austin Group isn't going to change the list because of it. I suspect the usual suspects (like rdos and zaval) will come out of the woodwork to tell me that this is why they prefer a Windows-style approach that avoids both signals and fork(), while the other usual suspects (like Octo and myself) will tell me that yeah, POSIX sucks, but basically it's what we've got to deal with. The next version of POSIX will apparently strike fork() from the list of async-signal-safe functions but add a new function, _Fork(), which is on the list, so the problem remains.

The moral of the story, however, is that simple ideas can have very complicated consequences.

thewrongchristian · Post by **thewrongchristian** » Sat Mar 09, 2024 4:32 am

nullplan wrote: Basically, if signals are not blocked, you can never know your PID. The number getpid() returned might be the PID you had until moments ago, but now it's the PID of your parent process, or maybe it has exited and now it's the PID of no process at all, or some completely unrelated process (because PIDs do get re-used).

Why would getpid() return anything but the PID of the process executing getpid()? Why would a process PID change at all?

Remember, after fork, even though both processes return to the same point, they are in fact different processes, and for each process, getpid() will return the same value (abominations like LinuxThreads notwithstanding) for the lifetime of that process. That the process calling getpid() still exists is self-evident: it is executing right now. "I execute, therefore I am", to paraphrase Descartes.

getppid() returns you the PID of the parent process. That can change, because you can be re-parented if your original parent dies, but that is nothing to do with signals anyway.

nullplan · Post by **nullplan** » Sat Mar 09, 2024 11:59 am

thewrongchristian wrote: Why would getpid() return anything but the PID of the process executing getpid()? Why would a process PID change at all?

I just explained that. You call getpid(), the syscall looks up your PID in the kernel structure and returns it. Before you get to look at the value, a signal hits. Signal handler calls fork() and _Exit() in the parent, and returns from the signal handler in the child. Child is now executing with its erstwhile parent's PID as return value from getpid(). And since the parent is asynchronously exiting, that PID will become a free PID that may be assigned to any new process that comes along.

It is a TOCTOU problem, essentially. The value getpid() returned may not be the PID you actually have by the time you get to do anything with it. And all just because signal handlers can call fork() and _Exit().

thewrongchristian · Post by **thewrongchristian** » Mon Mar 11, 2024 4:07 pm

nullplan wrote:
thewrongchristian wrote: Why would getpid() return anything but the PID of the process executing getpid()? Why would a process PID change at all?
I just explained that. You call getpid(), the syscall looks up your PID in the kernel structure and returns it. Before you get to look at the value, a signal hits. Signal handler calls fork() and _Exit() in the parent, and returns from the signal handler in the child. Child is now executing with its erstwhile parent's PID as return value from getpid(). And since the parent is asynchronously exiting, that PID will become a free PID that may be assigned to any new process that comes along.

It is a TOCTOU problem, essentially. The value getpid() returned may not be the PID you actually have by the time you get to do anything with it. And all just because signal handlers can call fork() and _Exit().

Right, gotcha.

Still, anyone who writes code like this and gets bitten by this specific problem deserves everything they get.

Solar · Post by **Solar** » Tue Mar 12, 2024 12:46 pm

<signal.h>, like <errno.h>, was a primitive way to handle primitive things in primitive times. Most importantly, both date back to when there was no memory model for C, and were encoding existing practice in existing operating systems without aspirations of portability. The C standard -- which has to cater for non-POSIX systems as well -- recognized that and left the specification for <signal.h> an empty husk that doesn't have to mesh with the OS' signalling mechanisms at all. The only signals that C requires the library to handle are those raise()d by the application itself...

For the most part, both <signal.h> and <errno.h> have outlived their usefulness (as they really don't mesh well with multiple threads of control), and remain in existence mostly for backward compatibility. Plauger didn't have many positive things to say about <signal.h> as early as 1992, and pointed out that the whole mechanism is basically unportable even among POSIX operating systems...

Plauger, p. 197 wrote: Adding your own signal handler decreases portability and raises the odds that the program will mishandle the signal.

nullplan · Post by **nullplan** » Tue Mar 12, 2024 9:36 pm

Solar wrote: For the most part, both <signal.h> and <errno.h> have outlived their usefulness

With <errno.h>, I presume you just mean the errno object. And yes, that hasn't been a thing (much to Daniel J. Bernstein's chagrin) for decades now, typically in favor of a hack to get a thread-local variable going. But the concept of error numbers, while primitive, is still the only means we have to communicate failure from the OS to the application.

And signals are highly useful for a variety of things. Actually, with the advent of real-time signals and the standardization of things like sigaction(), things have gotten a lot better since 1992. Of course, yes, in multi-threaded applications you probably want a signal-handler thread (so block all the signals you want to happen and have one thread call sigwait() in a loop). And also yes, you probably don't want to do a hell of a lot more than set a flag in the signal handler. But the general event distribution mechanism is still very effective.
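
Below is a sketch of that signal-handler-thread pattern. Two choices are assumptions made for the example: SIGUSR1 is counted, and SIGTERM stops the thread. The names usr1_count, signal_thread and start_signal_thread are illustrative.

The signals are blocked before any other thread exists. Every thread inherits that mask, so these signals can only be consumed through sigwait(). Because sigwait() returns normally, the loop body is ordinary code and is not limited to async-signal-safe functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical counter; atomic because other threads read it. */
static atomic_int usr1_count = 0;

/* Dedicated signal thread: receives signals synchronously via sigwait(). */
static void *signal_thread(void *arg)
{
    const sigset_t *set = arg;

    for (;;) {
        int sig;
        if (sigwait(set, &sig) != 0)
            continue;
        if (sig == SIGTERM)
            return NULL;                       /* orderly shutdown */
        if (sig == SIGUSR1)
            atomic_fetch_add(&usr1_count, 1);  /* any code is fine here */
    }
}

/* Call from main() before creating other threads; *set must outlive the thread. */
static int start_signal_thread(pthread_t *tid, sigset_t *set)
{
    int err;

    sigemptyset(set);
    sigaddset(set, SIGUSR1);
    sigaddset(set, SIGTERM);
    err = pthread_sigmask(SIG_BLOCK, set, NULL);
    if (err != 0)
        return err;
    return pthread_create(tid, NULL, signal_thread, set);
}
```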

rdos · Post by **rdos** » Thu Mar 14, 2024 7:13 am

nullplan wrote:
Solar wrote: For the most part, both <signal.h> and <errno.h> have outlived their usefulness
With <errno.h>, I presume you just mean the errno object. And yes, that hasn't been a thing (much to Daniel J. Bernstein's chagrin) for decades now, typically in favor of a hack to get a thread-local variable going. But the concept of error numbers, while primitive, is still the only means we have to communicate failure from the OS to the application.

Error numbers are highly unportable, not understood by anybody other than the designer, and so are utterly useless. I do not support error numbers, and no RDOS API function will return an error number, or set errno. It's enough to return whether a function succeeded or failed. If you want to know why, use logging or the debugger, which can trace into the kernel.

nullplan wrote: And signals are highly useful for a variety of things. Actually, with the advent of real-time signals and the standardization of things like sigaction(), things have gotten a lot better since 1992. Of course, yes, in multi-threaded applications you probably want a signal-handler thread (so block all the signals you want to happen and have one thread call sigwait() in a loop). And also yes, you probably don't want to do a hell of a lot more than set a flag in the signal handler. But the general event distribution mechanism is still very effective.

They are not very useful since they were invented before threads. If you want a generic signal function, design it yourself with reasonable multi-wait objects implemented in the kernel. In RDOS, this is pretty simple since there is already a signal object that can be signalled and waited for. Then you typically want some thread to wait for signals, and others to raise them.

Actually, if I wanted to implement real-time signals, I'd do it by starting a signal thread at load time and letting it handle the signal function. Still, it would be rather wasteful if all applications had one thread just to handle signals, regardless of whether they were used or not, so I have not done this. C timers could be handled with the signal thread too.

Solar · Post by **Solar** » Thu Mar 14, 2024 7:49 am

Case in point for errno values being unportable, I quote from man unlink:

EISDIR pathname refers to a directory. (This is the non-POSIX value returned since Linux 2.1.132.)

[...]

EPERM The system does not allow unlinking of directories, or unlinking of directories requires privileges that the calling process doesn't have. (This is the POSIX prescribed error return; as noted above, Linux returns EISDIR for this case.)

eekee · Post by **eekee** » Thu Mar 14, 2024 5:26 pm

Signal numbers are unportable within Linux, being different on different architectures, with some signals being missing on some architectures. See signal(7). I'm pretty sure this used to be worse years ago. It looks quite cleaned up now, but there's still a handful of signals which are only available on some architectures, and one triplet which all have/had the same number on Alpha and SPARC.

Anyway, if no-one's ever used asynchronous poll() and the rest to write a signal-driven asynchronous web server already, I will be Disappointed in the Internet!

It would be an Abomination of Computer Science!

EDIT: I just remembered this quote:
"Unix does not prevent you doing stupid things, because that would also prevent you doing clever things."
How, exactly, fork() could be used to do clever things in a signal handler is not something I can imagine!

nullplan · Post by **nullplan** » Fri Mar 15, 2024 10:57 am

rdos wrote: Error numbers are highly unportable, not understood by anybody other than the designer, and so are utterly useless. I do not support error numbers, and no RDOS API function will return an error number, or set errno. It's enough to return whether a function succeeded or failed. If you want to know why, use logging or the debugger, which can trace into the kernel.

Wow. Talk about driving out the devil with Beelzebub. So no error numbers, but no way to figure out what went wrong either. And that's when you wouldn't even have to care about violating any standard, because RDOS does not claim conformity to any standard. Even Windows has error numbers! So did DOS!

And they are useful. For example, I have written an implementation of realpath() using only the readlink() system call, relying on the fact that readlink() on a file that exists but is not a symlink returns EINVAL. On Linux I can make that assumption. If the code ran elsewhere, I would require the implementation to translate the error code into the standard compliant one, which brings me to:
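
The readlink()-based classification can be sketched as one helper that performs a single lookup step. The names classify_component and component_kind are hypothetical. The EINVAL-means-not-a-symlink convention is the Linux behaviour relied on above.

A single readlink() call both looks the path up and reports its type, so there is no window between a separate type check and the read. Lookup failures such as ENOENT or ENOTDIR come back in errno for the caller to propagate.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

enum component_kind { COMPONENT_ERROR = -1, COMPONENT_PLAIN = 0, COMPONENT_SYMLINK = 1 };

/* One lookup step of a readlink()-only realpath(). size must be at least 2.
 * COMPONENT_SYMLINK: target holds the NUL-terminated link text.
 * COMPONENT_ERROR: errno says why the lookup failed. */
static enum component_kind classify_component(const char *path, char *target, size_t size)
{
    ssize_t n = readlink(path, target, size - 1);

    if (n >= 0) {
        if ((size_t)n == size - 1) {     /* buffer filled: may have been truncated */
            errno = ENAMETOOLONG;
            return COMPONENT_ERROR;
        }
        target[n] = '\0';                /* readlink() does not NUL-terminate */
        return COMPONENT_SYMLINK;
    }
    if (errno == EINVAL)
        return COMPONENT_PLAIN;          /* exists, but is not a symlink */
    return COMPONENT_ERROR;              /* ENOENT, ENOTDIR, EACCES, ...: pass through */
}
```

A full realpath() would call this once per path component, splice link targets into the remaining path, and count hops itself in order to report ELOOP.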

Solar wrote: Case in point for errno values being unportable, I quote from man unlink: [Linux returns EISDIR from unlink() on a directory, while POSIX specifies EPERM]

You make two mistakes here. For one, you do not distinguish between the unlink() function and the SYS_unlink system call. A conforming implementation can actually translate the not strictly conforming error number.

For two, there is XSH 2.3, which specifies

XSH 2.3 wrote: Implementations may support additional errors not included in this list, may generate errors included in this list under circumstances other than those described here, or may contain extensions or limitations that prevent some errors from occurring.

So returning EISDIR from unlink() is perfectly valid.
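
On the caller's side, that latitude can be absorbed by treating both values the same. Below is a minimal sketch using a hypothetical remove_path() helper. If unlink() fails with EISDIR (Linux) or EPERM (the POSIX text), it falls back to rmdir(). POSIX specifies that ISO C's remove() does this unlink()/rmdir() split itself, so real code can often just call remove().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Remove a file or an empty directory; returns 0 on success, -1 with errno set. */
static int remove_path(const char *path)
{
    if (unlink(path) == 0)
        return 0;
    if (errno == EISDIR || errno == EPERM)   /* "it's a directory", either spelling */
        return rmdir(path);                  /* if EPERM meant "no permission", this fails too */
    return -1;
}
```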

eekee wrote:Signal numbers are unportable within Linux, being different on different architectures, with some signals being missing on some architectures.

But the API is portable. If a macro for a signal number exists, that macro expands to the number that means exactly that signal on that platform. Yes, the signals being different on some architectures is unfortunate ballast from a bygone era, in which Linux tried to be binary-compatible with whatever the pre-eminent flavor of UNIX was on that platform at the time Linux was ported. That's why we have architecture dependencies where they really shouldn't be, such as in the error numbers (most architectures define EDEADLOCK and EDEADLK as the same thing, PowerPC begs to differ), in struct termios (most of the time there is a "struct termios2"), ioctl numbers, various structure layouts, and yes, signal numbers.

And now that it is this way, they cannot change it because that would be a breaking change. So we kinda have to live with it.
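
In practice, that means portable code only ever spells signals by their macros and guards the non-POSIX ones with #ifdef. The sketch below builds a name-to-number table that picks up whatever numbers the platform headers define; sig_names and signal_by_name are illustrative names. SIGPWR and SIGSTKFLT are non-POSIX signals and are included only where the headers provide them.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

struct sig_name { const char *name; int num; };

/* Built from the macros, so each entry carries this platform's number. */
static const struct sig_name sig_names[] = {
    { "SIGHUP", SIGHUP },   { "SIGINT", SIGINT },   { "SIGQUIT", SIGQUIT },
    { "SIGABRT", SIGABRT }, { "SIGKILL", SIGKILL }, { "SIGUSR1", SIGUSR1 },
    { "SIGSEGV", SIGSEGV }, { "SIGUSR2", SIGUSR2 }, { "SIGPIPE", SIGPIPE },
    { "SIGALRM", SIGALRM }, { "SIGTERM", SIGTERM }, { "SIGCHLD", SIGCHLD },
#ifdef SIGPWR
    { "SIGPWR", SIGPWR },        /* not POSIX: present only if the headers define it */
#endif
#ifdef SIGSTKFLT
    { "SIGSTKFLT", SIGSTKFLT },  /* Linux-specific; may be absent */
#endif
};

/* Returns the number for a signal name, or -1 if it is unknown here. */
static int signal_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof sig_names / sizeof sig_names[0]; i++)
        if (strcmp(sig_names[i].name, name) == 0)
            return sig_names[i].num;
    return -1;
}
```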

Edit: For the most part, I think the asynchronous fork issue is really a lot simpler than I thought. You see, fork() must create a new thread, since it creates a new process, and processes are containers for threads. So in most places where my implementation would like to use the TID of the current thread, it would be invalid to call fork() from a signal handler and then return from it, because a new thread would be trying to access resources locked by the old thread. They do not transfer! The only places I think I have to take care are raise(), pthread_kill(), and abort().

rdos · Post by **rdos** » Fri Mar 15, 2024 4:42 pm

nullplan wrote:
rdos wrote: Error numbers are highly unportable, not understood by anybody other than the designer, and so are utterly useless. I do not support error numbers, and no RDOS API function will return an error number, or set errno. It's enough to return whether a function succeeded or failed. If you want to know why, use logging or the debugger, which can trace into the kernel.
Wow. Talk about driving out the devil with Beelzebub. So no error numbers, but no way to figure out what went wrong either.

I've seen code that tries to propagate error numbers, and it is ugly. The best approach to error handling is to handle the error where it occurs, and I can see little use in trying to decode which error occurred in code. Most errors are non-fixable, and might be interesting while you debug the code, but not in production code. The main reason you want the error code is because you cannot debug kernel code that caused the error (since you cannot trace into the Linux or Windows kernel), but that's not an issue with RDOS.

nullplan wrote:And that's when you wouldn't even have to care about violating any standard, because RDOS does not claim conformity to any standard. Even Windows has error numbers! So did DOS!

I support error numbers in the C library, but they are reconstructed in the runtime library. This is often very easy to do since there is typically only one reason for errors, and the calling code wouldn't care which error code is returned anyway.

nullplan wrote:And they are useful. For example, I have written an implementation of realpath() using only the readlink() system call, relying on the fact that readlink() on a file that exists but is not a symlink returns EINVAL. On Linux I can make that assumption. If the code ran elsewhere, I would require the implementation to translate the error code into the standard compliant one, which brings me to:

That's only useful because realpath is too complicated. If you keep syscalls simple, and avoid cluttering them with options and stuff, then failure has a definite meaning. If you want to know if a file is a symlink, you simply add a syscall that tells you if it is or not.

eekee · Post by **eekee** » Fri Mar 15, 2024 5:58 pm

nullplan wrote:
eekee wrote:Signal numbers are unportable within Linux, being different on different architectures, with some signals being missing on some architectures.
But the API is portable. If a macro for a signal number exists, that macro expands to the number that means exactly that signal on that platform. Yes, the signals being different on some architectures is unfortunate ballast from a bygone era, in which Linux tried to be binary-compatible with whatever the pre-eminent flavor of UNIX was on that platform at the time Linux was ported. That's why we have architecture dependencies where they really shouldn't be, such as in the error numbers (most architectures define EDEADLOCK and EDEADLK as the same thing, PowerPC begs to differ), in struct termios (most of the time there is a "struct termios2"), ioctl numbers, various structure layouts, and yes, signal numbers.

And now that it is this way, they cannot change it because that would be a breaking change. So we kinda have to live with it.

I just had a look at the old man page which introduced me to the issue, and it's not as bad as I remembered. My problem with it was probably due to my autistic thought patterns which were much stronger back in 2001 when I first read that page. Thanks for the explanation on how it got to be this way.

The error debate here reminds me of the surprisingly old question of whether errors should return a number or a string. The one good argument against strings is that they can't necessarily be recognized by programs trying to handle errors. Plan 9 returns error strings, and this became a problem when Go was ported. The lightweight, lightly used kfs returned slightly different error strings from other filesystems. It had to be patched.

Incidentally, I've always found Plan 9 does a good job of propagating error strings to where they need to be. I'm not sure how, but knowing Plan 9, it could be something which "falls out of" the whole system design.

nullplan · Post by **nullplan** » Fri Mar 15, 2024 8:48 pm

rdos wrote: The main reason you want the error code is because you cannot debug kernel code that caused the error

I believe you and I have a very different definition of the word "bug". If the user tries to open a file they have no right to access, the kernel refusing them does not have a "bug", and does not need debugging. Opening a file is a wonderful example of a complex operation that can fail for all sorts of reasons, and those ought to be displayed to the user. You might have run out of memory, or the file is inaccessible, or doesn't actually exist, or one of the directories on the way isn't actually a directory. Or there was an IO error trying to look up the file. Or the system or process file tables were full.
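
The sketch below shows both uses at once, using a hypothetical open_optional() helper for a file the program can live without. ENOENT is handled programmatically by falling back to defaults. Every other failure is passed to strerror() and shown to the user, who can then tell a permission problem apart from a path component that is not a directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Returns an fd on success, -1 if the file simply does not exist
 * (caller uses defaults), or -2 after reporting any other failure. */
static int open_optional(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd >= 0)
        return fd;
    if (errno == ENOENT)
        return -1;                                       /* handled programmatically */
    fprintf(stderr, "%s: %s\n", path, strerror(errno));  /* EACCES, ENOTDIR, EMFILE, EIO, ... */
    return -2;
}
```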

I will agree that most of the time, you don't really want to do anything programmatic with the error code, only display the error to the user, but there are use cases for programmatic error handling that ought not be discounted. Especially when it comes to the file system that by its very nature is shared with all other processes, and quite a lot of communication has to go on for many usage patterns.

rdos wrote: This is often very easy to do since there is typically only one reason for errors,

Basically all system calls have multiple reasons to error.

rdos wrote: That's only useful because realpath is too complicated. If you keep syscalls simple, and avoid cluttering them with options and stuff, then failure has a definite meaning. If you want to know if a file is a symlink, you simply add a syscall that tells you if it is or not.

Oh, there's a syscall that does that, stat(). But adding that call would complicate the code even more. It also adds another TOCTOU race, as the file I call stat() on is not necessarily the file I call readlink() on, if another process concurrently changes something. No, readlink() must necessarily look up the file and verify its type, and I use that ability. I also need it to give me the correct lookup failure so I can propagate it to the caller, so it can see the difference between ENOENT and ENOTDIR. And ELOOP of course I have to give myself.

rdos · Post by **rdos** » Tue Mar 19, 2024 2:52 am

nullplan wrote:
rdos wrote: The main reason you want the error code is because you cannot debug kernel code that caused the error
I believe you and I have a very different definition of the word "bug".

Probably.

nullplan wrote: If the user tries to open a file they have no right to access, the kernel refusing them does not have a "bug", and does not need debugging.

There are no access rights or users in my OS, so this is a non-issue. That's also why I strive to implement ext and NTFS, so I can access any file on these filesystems even when I cannot from Linux or Windows.

nullplan wrote: Opening a file is a wonderful example of a complex operation that can fail for all sorts of reasons, and those ought to be displayed to the user. You might have run out of memory, or the file is inaccessible, or doesn't actually exist, or one of the directories on the way isn't actually a directory. Or there was an IO error trying to look up the file. Or the system or process file tables were full.

All of these are fatal errors that you cannot fix. My focus is on embedded systems, and I definitely don't want to show these kinds of errors to users.

nullplan wrote: Oh, there's a syscall that does that, stat(). But adding that call would complicate the code even more. It also adds another TOCTOU race, as the file I call stat() on is not necessarily the file I call readlink() on, if another process concurrently changes something. No, readlink() must necessarily look up the file and verify its type, and I use that ability. I also need it to give me the correct lookup failure so I can propagate it to the caller, so it can see the difference between ENOENT and ENOTDIR. And ELOOP of course I have to give myself.

Well, in my design I delegate this to the calling code, not to realpath. If realpath fails, and if this is important to the caller, it's up to the caller to determine why, using stat and similar.

OSDev.org
