proxy wrote:Are you referring to the fact that POSIX doesn't allow a signal to be interrupted by another signal of the same type? If so, can you explain why that is incompatible with a queue? If I understand it correctly, let's say two SIGUSR1 and one SIGUSR2 arrive, so the queue would contain: [SIGUSR1, SIGUSR1, SIGUSR2]. At the next opportunity, the scheduler would interrupt an appropriate thread and run the signal handler associated with SIGUSR1. If we detect the second SIGUSR1 either before or during dispatching the first SIGUSR1, I think we are allowed to simply ignore the second SIGUSR1 since it is allowed to be "merged" with the first one. So now the scheduler is free to interrupt the currently executing SIGUSR1 with a SIGUSR2, which will eventually return to SIGUSR1 and finally return to normal running code. Do I have that right?
Weird. Now that I come to research it, I cannot find where it says that. Maybe POSIX changed it at some point...
Anyway, what I was alluding to earlier was that you have a queue of [SIGUSR1, SIGUSR2], now someone sends another SIGUSR1. There used to be a sentence in POSIX that meant you should now discard the second SIGUSR1, since it is already pending. Now POSIX says it is implementation-defined (XSI §2.4.1). OK.
Next problem is when you have [SIGUSR1, SIGUSR2] in your queue and SIGUSR1 is blocked in all threads of the destination process. SIGUSR2 must now be able to be a front-runner. You cannot just remove the SIGUSR1 from the queue, because it must still be delivered if the block is eventually removed, but you must deliver SIGUSR2 immediately.
Next problem is that POSIX specifies that if multiple signals between SIGRTMIN and SIGRTMAX are queued, they must be delivered in ascending order of signal number. Multiple calls to sigqueue() with the same signal number are to be delivered in FIFO order, though.
All of this is to say that you will need a signal queue for each signal number, both in the thread and in the process. One queue per process is not enough. One set of queues in the thread is not enough either, because the semantics of sending a signal to a process are that the signal is held pending until any thread takes it. The queues for the classic signals can have a max depth of 1, though. Not for the RT signals.
What you were writing above is just implemented with signal masks. By default (i.e. when SA_NODEFER is not specified in the sigaction() call), the signal number itself is part of the signal mask. When a signal handler is invoked, the kernel atomically adds the signals in the mask to the thread's signal mask. When the signal handler returns, restoring the signal mask is part of what sigreturn() does. Indeed, the upper context's signal mask is writable by the signal handler. Musl uses that in its cancel handling. Musl's cancel handler contains roughly:
Code: Select all
__sigaddset(&ctx->uc_sigmask, SIGCANCEL);
__syscall(SYS_tkill, self->tid, SIGCANCEL);
This is in the handler for SIGCANCEL, so the signal is masked. This code adds SIGCANCEL to the parent context's mask and sends another one to itself. The signal is blocked in the signal handler so it cannot be taken there, and it is blocked in the parent context as well, so it also cannot be taken there. The idea is that if the parent context happens to also be a signal handler (for another signal) then at some point it will return, and that will restore a signal mask where SIGCANCEL is unblocked, and then the signal will be taken again.
proxy wrote:I'm not sure I see the benefit of an "altstack" as that would make it more complex (I think) to have signal handlers interrupted by more signal handlers, unless the alt stack is one per signal?
The idea behind an altstack is the same as behind the AMD64 IST: maybe you want to be able to handle a signal like SIGSEGV when the stack pointer is out of bounds. The altstack is generally implemented like this: you have one altstack per thread (POSIX specifies the alternate stack as a per-thread attribute). By default it is disabled. When a signal is to be handled, you check that the signal handler was established with SA_ONSTACK, and that there is an altstack that is currently neither disabled nor in use (the flags word doesn't contain SS_DISABLE, and the stack pointer is not already on the altstack; since the altstack is given as base and length, that is easy enough to check). If all of this works out, you use the altstack, else you use the normal delivery mechanism. On x86_64, that means decreasing the stack pointer by the red zone size.
proxy wrote:Right. I did say that I would plan to "push the current state things, similar to an interrupt and then call the signal handler". I think that would be sufficient to resume without worrying about clobbering... am I missing something else?
The thing you missed is that you cannot simply make the signal handler return to where you were in the main program afterward. You have to make it "return" to a shim that invokes a system call that restores everything from the data put on stack.
proxy wrote:Anyway, you seem to have a very strong understanding of this subject so I'm curious about more of your thoughts.
Well, I do read a lot. But honestly, there is not a hell of a lot more to say about signals that I haven't said already. One queue per thread/process and signal number. You send a signal to a process by adding it to the right queue. I already went over signal handler invocation, so there are no surprises there. And if the processor faults in user space, you force a signal against the thread: you send the signal, but if it is blocked or ignored, you force it to be unblocked and reset to the default action (obviously, you cannot continue the thread normally after a CPU fault hits).