I'm currently designing the system call interface of my OS (though I never seem to get as far as implementing my designs). For various reasons I'm leaning towards call gates. I have an abundance of those, so each system call can have its own call gate. In fact, I should probably reserve a range of system call numbers and have all the unimplemented ones return -ENOSYS, since writing a single function that does that is easier than trying to emulate it from the GPF handler. Unless there is some brilliant way to figure out both the failing opcode and the next instruction boundary of x86 code.
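To make the "one function for all unimplemented calls" idea concrete, here is a minimal sketch in C. It only models the table that the call gates would be filled from; `NR_SYSCALLS`, `sys_ni`, `sys_example`, `gate_target` and slot 3 are all names and numbers I made up for illustration, not anything from a real kernel.

```c
#include <errno.h>   /* ENOSYS */
#include <stddef.h>  /* size_t */

#define NR_SYSCALLS 64   /* hypothetical number of reserved call gates */

typedef long (*syscall_fn)(void);

/* Single shared handler for every reserved but unimplemented gate. */
long sys_ni(void)
{
    return -ENOSYS;
}

/* Placeholder standing in for some implemented call. */
long sys_example(void)
{
    return 1;
}

static syscall_fn gate_target[NR_SYSCALLS];

/* Point every slot at sys_ni first, then override the implemented ones. */
void init_gate_targets(void)
{
    for (size_t i = 0; i < NR_SYSCALLS; i++)
        gate_target[i] = sys_ni;
    gate_target[3] = sys_example;   /* hypothetical slot number */
}

/* Lookup used when installing gates; out-of-range numbers get sys_ni too. */
syscall_fn gate_target_for(size_t nr)
{
    return nr < NR_SYSCALLS ? gate_target[nr] : sys_ni;
}
```

With something like this, the code that writes the gate descriptors never needs to know which calls exist yet.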
So I thought I'd have the call gates all point to assembly stubs that just call the C handler and then do a far return, e.g.
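asm_write:
    call sys_write    # the C handler does the actual work
    lretq             # far return through the call gate; in 64-bit mode a plain lret pops 32-bit values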
But then I wondered: What if a call can block? Yeah, I can suspend the task in kernel mode. In fact, that is what normally happens. But then a signal might arrive for the task. In fact, signals can only arrive while the task is blocked. If it kill()s itself, I have to manually put a break there, and if another core kills a currently running task, that's what IPIs are for.
So what happens if a signal arrives and, horror of horrors, is handled? In that case I would like to alter the user stack: align their RSP down to a 16-byte boundary, lower it by another 128 bytes (to stay clear of the red zone), put a restore image there along with the address of the restorer, then replace the system call's (or interrupt's) return RSP with the new value and the return RIP with the handler. The restore image has to contain the old RSP, the old RIP, and the volatile registers; the nonvolatile ones are saved by the compiler.
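Roughly, the stack manipulation I have in mind would look like the sketch below. The struct layouts and all names (`restore_image`, `user_return`, `push_signal_frame`) are assumptions for illustration. It writes to the user stack with a plain memcpy, where a real kernel would need a checked copy to user space, and it leaves out flags and FPU state. I put the restorer address directly below the image so that the handler starts with the stack alignment the ABI expects at function entry (RSP + 8 is a multiple of 16).

```c
#include <stdint.h>
#include <string.h>

#define RED_ZONE 128u   /* bytes below RSP that user code may still be using */

/* Hypothetical restore image: old RIP/RSP plus the volatile registers. */
struct restore_image {
    uint64_t rip, rsp;
    uint64_t rax, rcx, rdx, rsi, rdi, r8, r9, r10, r11;
};

/* Hypothetical writable view of the return frame (call gate or interrupt). */
struct user_return {
    uint64_t rip;
    uint64_t rsp;
};

void push_signal_frame(struct user_return *ret, struct restore_image *img,
                       uint64_t handler, uint64_t restorer)
{
    uint64_t sp = ret->rsp;

    /* Record where the task has to resume after the handler is done. */
    img->rip = ret->rip;
    img->rsp = ret->rsp;

    sp &= ~(uint64_t)15;                        /* align down to 16 bytes  */
    sp -= RED_ZONE;                             /* skip the red zone       */
    sp -= (sizeof *img + 15) & ~(uint64_t)15;   /* room for the image      */
    memcpy((void *)(uintptr_t)sp, img, sizeof *img);

    sp -= sizeof restorer;                      /* handler's return address */
    memcpy((void *)(uintptr_t)sp, &restorer, sizeof restorer);

    /* Redirect the return to user mode into the handler. */
    ret->rsp = sp;
    ret->rip = handler;
}
```

The point being that `ret` must be a pointer into the actual return frame, otherwise the redirect never takes effect.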
Problem is, all of this requires access to return RIP and return RSP. Writable access, in fact. No matter how I slice it, these are necessary data. On a side note, even though these specific steps are arch-specific, something like this would have to be done on all archs, should I ever go portable.
I could of course make my stubs a bit longer, save all the volatile regs and call the system call with pointers to the reg and return structures (I'd decouple them, since I have both call gates and interrupts, with different return frames). But that would make every syscall arch-specific. Not a nice thought. Also, I'd have to read my arguments out of this structure and hope I'm doing the correct type casts. Sounds error prone.
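To show what I mean by reading arguments out of the structure, here is a sketch of that variant. Everything in it is hypothetical: the field names, the assumption that arguments arrive in the usual C argument registers, and the toy `sys_write` body that only exists so the example compiles on its own. The casts are confined to one small arch-specific shim per call, which is exactly the part I consider error prone.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register image saved by the longer stub. */
struct syscall_regs {
    uint64_t rdi, rsi, rdx, rcx, r8, r9;   /* assumed argument registers */
    uint64_t rax, r10, r11;                /* remaining volatile registers */
};

/* Hypothetical return frame, kept separate from the registers because
 * call gates and interrupts push different frames. */
struct return_frame {
    uint64_t rip;
    uint64_t rsp;
};

/* Toy stand-in for the arch-independent handler. */
long sys_write(int fd, const void *buf, size_t len)
{
    (void)buf;
    return fd == 1 ? (long)len : -EBADF;
}

/* Arch-specific shim: unpacks the registers and does all the casting. */
long arch_sys_write(struct syscall_regs *regs, struct return_frame *ret)
{
    (void)ret;   /* unused here; a blocking call would pass it on to signal delivery */
    return sys_write((int)regs->rdi,
                     (const void *)(uintptr_t)regs->rsi,
                     (size_t)regs->rdx);
}
```

One wrong cast or one swapped register in a shim like this and the call silently gets garbage, which is why I'm not keen on it.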
I could save the regs and create the pointers, but only give them to the syscall function as two additional arguments. That would require knowledge of how many arguments there are in each call, but is theoretically possible. Though at least for some syscalls, that would put me over the edge of 6 arguments, so I'd have to spill to the stack.
Other ideas?