Brendan wrote:
while not returning from "do_a_task_switch()" into a known state may be a little more complex, it's unnecessary and there are (or can be) reasons not to.
Unnecessary to do what? Return to an unknown state?
I'd better define what I mean by "unknown state". I agree that, at run time, in the absence of weird subtle bugs like the EOI problem, things will work OK with your scheme. When you switch from thread A to thread B, thread B returns from do_a_task_switch() seemingly in the same state as when it entered it. However, from the kernel developer's point of view, it is now impossible to statically analyze the control flow inside the kernel just by looking at the code. Every time you see a call to do_a_task_switch(), you, as the developer reading the code, have no idea what will happen next (in terms of time, not in terms of what will happen on that thread at some point in the future when it gets switched to again).
Brendan wrote:
How about the page fault handler? The page fault itself causes you to enter the kernel, but if the page fault handler needs to load a page of data from disk (e.g. from swap space) then you have to do a thread switch before you return from the page fault handler.

IMO that's an odd way to implement any handler. I think of each handler/system call/interrupt as an event that potentially changes the state of a thread. At a high level, the control flow of an OS is a bunch of related concurrent event-driven state machines. I'd prefer to model this in a more controlled manner, rather than relying on the state of some thread's kernel stack to "remember" what was happening when that thread was blocked for whatever reason.
In the page fault example in particular, I'd handle "hard" page faults (those requiring disk access) by putting the thread in a "page-fault-blocked" state and putting it on an appropriate queue. Next I'd set up a message to send to the appropriate file system or disk driver. Then I'd run the scheduler, which would pick the next task to run. The page fault handler would just iret to a new thread, like all the other handlers. Eventually, the in-page I/O request will complete, and the "page-fault-blocked" thread will be awoken. I see no reason to freeze the thread's state within the kernel itself at the moment you decide to block it.
Brendan wrote:
But there are exceptions to this. One example would be a "get a message" system call, where the task blocks if no messages are currently available for the task. In this case "do_a_task_switch" is often called after checking if any messages are available, but before the "important thing" (or getting the message) takes place, and certainly not as the last thing before iret.
That's an example of a case where your scheme magically works because copying a message to the receiving thread's buffer is not really a critical operation the way sending an EOI is (i.e. -- it's something that could happen before iret, or could not, either way it doesn't bring the system down).
In my scheme, the thread makes a "get a message" system call. Immediately, its most essential context (i.e. -- excluding FPU state and the system call parameter registers) is saved. This means that when its context is eventually restored, it will wake up seemingly at the moment that it made the system call, rather than somewhere in the middle of the kernel. This means that whoever delivers the message to that thread, whenever that happens, can copy the message to the thread's buffer and then switch to that thread on the way out of the kernel. If you were to read the code for the "get message" system call, you could tell statically where control would go right up until the iret.
What is the advantage of your scheme...? Conversely, what is the disadvantage of my scheme? For that matter, which is more conventional?
Brendan wrote:
At the time I had (unrelated) design problems with level triggered interrupts, and (more recently) interrupt latency problems that caused IRQ8 to be missed on some computers, which led to the OS locking up/waiting forever (although this could be attributed to setting the RTC periodic timer's frequency too fast).
Did allowing nested interrupts fix this? If so, how? I briefly forgot that the kernel itself ought to deal with the timer in its own ISRs.
