[Solved] Issue sending IPC messages from an ISR (PS2 mouse)

codyd51 · Post by **codyd51** » Sun Jan 03, 2021 3:38 pm

Hello!

I’m encountering an issue in which my PS/2 mouse driver will move erratically or lock to the top edge of the screen, indicating my PS/2 mouse state machine has fallen out-of-sync with the physical mouse. This happens exclusively if I perform too much computation in the mouse ISR.

The issue initially popped up when I had too many printf()’s along the code path that the mouse ISR executes, but in the name of confidence and reproducibility, I’ve added a fibonacci(20) to my spinlock implementation. The problem reproduces and disappears consistently following whether I include the fibonnacci(20) (or printf’s).

I’ve tested a few different hypotheses that I’ve detailed below, but first I’ll provide some background on the system.

Handling mouse events

The handling of mouse events is split into 3 parts:

1. An ISR handles the mouse interrupt, reads a byte from the PS/2 port, and sends an IPC message to the PS/2 mouse driver
2. The PS/2 mouse driver receives an IPC message containing a byte of data and updates the state machine building a full (3-byte) data packet.
3. Once a data packet has been built, the PS/2 mouse driver sends an IPC message to the window manager, containing the 3 data bytes

Inter-process communication

Inter-process-communication works like so:

1. A process registers itself to the IPC subsystem

Sending a message
1. The data structure for the message is dynamically allocated, then populated
2. The data structure for the message is appended to the receiver’s inbox
3. If the receiver was blocked due to awaiting a message, the receiver is unblocked
4. If the receiver is of a higher priority than the current task, the current task is preempted.

Receiving a message
1. A process states its intent to receive a message via a syscall
2. The process is blocked until a message is received (if its inbox is empty)
3. A message is popped from its inbox (in FIFO) and the message is provided as the syscall’s return value

Modification and iteration of a process’s IPC inbox is protected by a spinlock, as is the kernel heap.

There are exactly two priorities at the moment: the priority for the second-stage PS/2 mouse driver (high), and everything else (low).

Okay! Thanks for bearing with me up to now. I’ve had some ideas as to what might be going on. Here’s what the mouse ISR looks like, for more context. I’ve flattened it into a single function to show control flow:

Code: Select all

void interrupt_handler(registers_t* regs) {
	// Save state
	// …

	if (regs->int_no == PS2_MOUSE_IRQ) {
		uint8_t byte = ps2_read(PS2_DATA);
		// The “__from_core” variant ensures that messages originating from ISR’s are sent from the special 
		// “com.axle.core” sender, instead of whatever service is running outside of the ISR context
		amc_message_t* amc_msg = amc_message_construct__from_core(&byte, 1);
		// Forward the data byte to the “stage 2” mouse driver
		amc_message_send("com.axle.mouse_driver", amc_msg);
	}

	pic_signal_end_of_interrupt(regs->int_no);

	// Restore state
	// …
	// sti
	// iretd
}

ISR is preempted when it shouldn’t be

As you can see, `amc_message_send` is called near the end of the mouse ISR. This function contains code like the following:

Code: Select all

bool amc_message_send(const char* destination_service, amc_message_t* msg) {
	// …

	// Unblock the receiver if it was waiting for a message
	if (dest->task->blocked_info.status == AMC_AWAIT_MESSAGE) {
		tasking_unblock_task(dest->task, false);

		// Higher priority tasks that were waiting on a message should preempt a lower priority active task
		if (dest->task->priority > get_current_task_priority()) {
			tasking_goto_task(dest->task);
		}
	}

	// …
}

A low priority task may be running when the ISR begins. This would result in the “tasking_goto_task” call, which would preclude the “pic_signal_end_of_interrupt”. I’m still receiving mouse interrupts, so this doesn’t totally explain the behavior I’m seeing, but it seemed worth a try. I made a variant of “amc_message_send” like the following:

Code: Select all

bool amc_message_send__from_isr(const char* destination_service, amc_message_t* msg) {
    return _amc_message_send_int(destination_service, msg, false);
}

bool _amc_message_send_int(const char* destination_service, amc_message_t* msg, bool allow_preemption_for_high_priority_unblock) {
	// …

	// Higher priority tasks that were waiting on a message should preempt a lower priority active task
	if (dest->task->priority > get_current_task_priority()) {
		// But don't jump away if we're coming from an interrupt handler
		if (allow_preemption_for_high_priority_unblock) {
			tasking_goto_task(dest->task);
		}
	}

	// …
}

Mouse ISR is preempted by PIT scheduler

The behavior I’m seeing happens specifically when the mouse ISR spends too much time computing. Could it be that the mouse ISR is conflicting with the PIT ISR?

To test this, I added a “disable scheduler” flag. The mouse ISR disables scheduling when it begins, and re-enables scheduling when it ends. The mouse ISR now looks like the following:

Code: Select all

	// …
	if (regs->int_no == PS2_MOUSE_IRQ) {
		tasking_disable_scheduling();
	
		uint8_t byte = ps2_read(PS2_DATA);
		amc_message_t* amc_msg = amc_message_construct__from_core(&byte, 1);
		amc_message_send__from_isr(“com.axle.mouse_driver", amc_msg);

		tasking_reenable_scheduling();
	}
	// …

This also does not fix the issue.

Does anyone have a clue what the root cause could be here? Thanks very much!

thewrongchristian · Post by **thewrongchristian** » Sun Jan 03, 2021 6:09 pm

codyd51 wrote:Hello!

I’m encountering an issue in which my PS/2 mouse driver will move erratically or lock to the top edge of the screen, indicating my PS/2 mouse state machine has fallen out-of-sync with the physical mouse. This happens exclusively if I perform too much computation in the mouse ISR.
...
Does anyone have a clue what the root cause could be here? Thanks very much!

I think you've answered your own question. If you don't process the interrupt in a timely manner, I assume data can be lost. I'm not certain whether a typical PS/2 controller has much of a FIFO buffer, but it sounds like you're waiting too long between reading from the PS/2 controller and data is being dropped.

Instead, why not make your ISR much lighter weight? You currently have a very heavy weight ISR, which not only reads the next byte, but does an IPC call (with a malloc) and potential task switch all before you even acknowledge the interrupt.

Octocontrabass · Post by **Octocontrabass** » Sun Jan 03, 2021 8:36 pm

codyd51 wrote:Modification and iteration of a process’s IPC inbox is protected by a spinlock, as is the kernel heap.

What if an IRQ happens while one of those resources is locked?

codyd51 wrote:A low priority task may be running when the ISR begins. This would result in the “tasking_goto_task” call, which would preclude the “pic_signal_end_of_interrupt”.

It sounds like you're placing the remaining ISR code at the same priority as the task it interrupted, which is wrong (the remaining ISR code should always be higher priority than any other running task). You can either make your scheduler more complicated to handle this case, or make your ISR simpler so that it will never do anything that might cause it to yield to another task.

I like userspace drivers, so I'd make the ISR simpler and have it only unblock a driver thread. The driver thread would read the byte from the PS/2 controller, send a message if necessary, and yield to the kernel to send the EOI. (This might add complexity to the scheduler anyway, since it would have to boost priority for any thread holding a lock needed by the driver.)

codyd51 wrote:Could it be that the mouse ISR is conflicting with the PIT ISR?

Do you allow the mouse ISR to be preempted? (This is the difference between an interrupt gate and a trap gate.)

codyd51 wrote:Does anyone have a clue what the root cause could be here?

My guess is something to do with an interrupt while a spinlock is held. (I don't think it's hardware timing out, but I can't rule out that possibility.)

thewrongchristian wrote:I'm not certain whether a typical PS/2 controller has much of a FIFO buffer, but it sounds like you're waiting too long between reading from the PS/2 controller and data is being dropped.

The typical PS/2 keyboard controller can buffer one byte from one device. It's up to the attached devices to buffer any additional bytes while the keyboard controller is waiting for you to read the buffered byte. I know the keyboard has an internal FIFO that can overflow if the keyboard controller doesn't read it faster than you type, but I'm not sure about the mouse.

codyd51 · Post by **codyd51** » Mon Jan 04, 2021 2:58 pm

Thank you both for your replies and insight!

I mistakenly assumed the PS/2 controller had a FIFO buffer of questionable length, but https://wiki.osdev.org/%228042%22_PS/2_ ... erspective agrees that the buffer is 1-byte-wide.

I have a few follow-up questions.

Octocontrabass wrote: What if an IRQ happens while one of those resources is locked?

Right now I print some logs if there's spinlock contention. It doesn't tend to happen at the moment, though I understand I'll eventually have to deal with edge cases around this, such as what you alluded to with a high-priority thread needing to boost other threads that have acquired a lock it needs (and/or other exotic scenarios!)

thewrongchristian wrote: You currently have a very heavy weight ISR, which not only reads the next byte, but does an IPC call (with a malloc) and potential task switch all before you even acknowledge the interrupt.

Octocontrabass wrote: You can either make your scheduler more complicated to handle this case, or make your ISR simpler so that it will never do anything that might cause it to yield to another task.

IPC messages are dynamically allocated instances of the following structure:

Code: Select all

typedef struct amc_message {
    const char* source;
    const char* dest; // May be null if the message is globally broadcast
    char data[64];
    int len;
} amc_message_t;

Q1 If I want to avoid dynamic allocation in the ISR, it seems like I'll need a second allocator only used within ISR's that alloc's from a statically allocated pool. The structure may also need to include a flag indicating whether it's a statically allocated instance, to avoid a later call to free(). Does this pass your smell-test?

Octocontrabass wrote: I'd make the ISR simpler and have it only unblock a driver thread. The driver thread would read the byte from the PS/2 controller, send a message if necessary, and yield to the kernel to send the EOI.

Q2 I must be misunderstanding, because it sounds as though the same amount of work is being done before the EOI is sent. If the ISR doesn't send the EOI until the driver thread yields to it, how would this result in EOI's being sent faster than doing all the work in one go from the ISR?

Octocontrabass wrote: Do you allow the mouse ISR to be preempted? (This is the difference between an interrupt gate and a trap gate.)

Currently, it's an interrupt gate (and therefore, to my understanding, interrupts are disabled on entry Q3), though I did experiment with re-enabling interrupts at the start of the handler while debugging this.

From what you've suggested, I will try:

1. Updating the task priorities like so:

Code: Select all

typedef enum task_priority {
    NONE = 0
    DRIVER = 1
    TASK_RUNNING_ISR = 2 // I am new!
} task_priority_t;

2. Updating the task structure like so:

Code: Select all

{
    // ...
    task_priority_t priority;
    uint32_t priority_context; // Optional, meaning is specific to each task_priority_t
}

3. Updating the scheduler from round-robin to "highest priority runs"
4. Updating ISRs to set the interrupted task to the TASK_RUNNING_ISR priority, then downgrade them at the end of the ISR, like so:

Code: Select all

   // …
   if (regs->int_no == PS2_MOUSE_IRQ) {
       // Save the interrupted task's priority
      this_task->priority_context = this_task->priority;
     // Give ourselves the highest priority while servicing the ISR
      this_task->priority = TASK_RUNNING_ISR; 
   
      amc_message_t* amc_msg = amc_message_construct__from_core(&byte, 1);
      amc_message_send__from_isr(“com.axle.mouse_driver", amc_msg);

      // Reset the interrupted task's initial priority
      this_task->priority = this_task->priority_context;
   }
   // …

5. Removing the fib(20) in the ISR

Q4 Does this make sense and fulfill your expectations for a sensible implementation?

Thanks once again for your thought-time

codyd51 · Post by **codyd51** » Mon Jan 04, 2021 5:36 pm

Indeed, I've implemented the above and am now seeing contention in which the mouse ISR runs while the mouse driver's IPC spinlock is held:

Code: Select all

[PID 3 mouse_driver] Mouse ISR enter (int count: 44)
[PID 3 mouse_driver] Mouse ISR reset priority of interrupted task (int count: 44)

Spinlock [AMC spinlock for com.axle.mouse_driver] held by another consumer while interrupts are disabled
Proc [3] holds the spinlock!

Assertion failed: Spinlock held by another consumer while interrupts are disabled
src/kernel/util/spinlock/spinlock.c:41
Stack trace:
[0] _panic (0x0011c1ae)
[1] spinlock_acquire (0x00124567)
[2] _amc_message_send_int (0x001249fe)
[4] mouse_callback (0x00120089)
[8] _amc_message_free (0x0012497f)

In this sample, the mouse driver is executing and is holding its own IPC spinlock when it's interrupted by a mouse ISR. The current process that the ISR has hijacked is the same process that holds the spinlock, so I can't boost a different process to get a chance to acquire the spinlock. It seems as though I actually need a separate execution context to run ISRs in?

thewrongchristian · Post by **thewrongchristian** » Mon Jan 04, 2021 6:59 pm

codyd51 wrote:
In this sample, the mouse driver is executing and is holding its own IPC spinlock when it's interrupted by a mouse ISR. The current process that the ISR has hijacked is the same process that holds the spinlock, so I can't boost a different process to get a chance to acquire the spinlock. It seems as though I actually need a separate execution context to run ISRs in?

I would (and in fact do).

Interrupts run in an interrupt context, with further interrupts disabled, and needs to be executed as soon as possible.

It sounds like your spinlock doesn't block interrupts. You probably should do so, as you can't reliably use spinlocks to protect data in and from an interrupt context otherwise.

The general pattern I use is to protect a driver data structure with a spinlock, and the spinlock is protected by:

- Atomic memory ops (which protects against locking by other processors)
- Disabling interrupts (which protects against interrupts on this processor)

So, in the normal driver thread, the driver data structure is locked using the spinlock. While locked, interrupts are blocked, and interrupts on this processor cannot be run.

While operating in the interrupt context, we also lock the spinlock, which blocks other processors from modifying the data structure (we don't need to block further interrupts, they're already blocked.)

With the data structure locked, we can quickly read whatever we need from the device, and queue it in the driver data structure to process in a regular thread using more flexible synchronization primitives.

In the case of my keyboard driver, I just read the scan code, and put it in a FIFO. The main keyboard driver thread then reads from the FIFO and interprets the scan codes (I don't yet have a mouse driver, but I would do the same.)

Octocontrabass · Post by **Octocontrabass** » Mon Jan 04, 2021 7:32 pm

codyd51 wrote:Q1 If I want to avoid dynamic allocation in the ISR, it seems like I'll need a second allocator only used within ISR's that alloc's from a statically allocated pool. The structure may also need to include a flag indicating whether it's a statically allocated instance, to avoid a later call to free(). Does this pass your smell-test?

What happens if the ISR sends messages faster than the consumer can read them?

codyd51 wrote:Q2 I must be misunderstanding, because it sounds as though the same amount of work is being done before the EOI is sent. If the ISR doesn't send the EOI until the driver thread yields to it, how would this result in EOI's being sent faster than doing all the work in one go from the ISR?

I don't think you need to send EOIs faster.

codyd51 wrote:Q4 Does this make sense and fulfill your expectations for a sensible implementation?

It looks reasonable to me, but I can't promise I'm not overlooking anything.

codyd51 wrote:It seems as though I actually need a separate execution context to run ISRs in?

That's one possibility; you'll then be able to switch to the interrupted thread's context if necessary. There are other possibilities, such as disabling interrupts while the lock is held, or moving most of the code in the ISR to its own thread.

nexos · Post by **nexos** » Mon Jan 04, 2021 8:41 pm

How I plan on doing IRQ handling in my OS when I get to it is when the IRQ is sent, the initial trap handler blocks the particular IRQ that occurred in the PIC and then a thread of maximum priority gets unblocked. EOI is then sent. This thread contains the IRQ handler, and it runs. When it finishes, the interrupt is re enabled in the PIC. Note that this is all theory

.

rdos · Post by **rdos** » Tue Jan 05, 2021 2:50 am

I think IRQs must have special rules for what they can and cannot do. Doing memory allocation is a definite "not allowed" unless you have done your memory allocator lockless, which I think is very hard to do and also an overkill. Another "not allowed" is to participate in activities that could make it blocked, like taking semaphores or using IPC. IRQs must use spinlocks if they require synchronization, and those must run with interrupts disabled. I don't think that printf in a normal implementation is safe to use in an IRQ either. Finally, the scheduler must make sure that IRQs cannot be scheduled. The thread that the IRQ happens to execute in (most often the null-thread) should never be preeempted regardless of its priority.

The function you want here (to wakeup a server thread from an IRQ) can be implemented with some kind of signaling and putting woken-up threads in a temporary list that is handled by the scheduler when all IRQs have finished running. This same temporary list can also be used when threads are woken-up from another core, and then the IPI IRQ to the target core can do this just as it does for a normal IRQ wakeup.

moonchild · Post by **moonchild** » Tue Jan 05, 2021 12:50 pm

rdos wrote:I think IRQs must have special rules for what they can and cannot do.

I think the word you are looking for is ‘reentrant’.

rdos · Post by **rdos** » Tue Jan 05, 2021 3:27 pm

moonchild wrote:
rdos wrote:I think IRQs must have special rules for what they can and cannot do.
I think the word you are looking for is ‘reentrant’.

Not really. Reentrance can be achieved with semaphores, but due to the possibility of getting blocked on a semaphore, that kind of code can not be used in IRQs.

codyd51 · Post by **codyd51** » Tue Jan 05, 2021 7:17 pm

Thank you everyone! I will mark this post as solved.

thewrongchristian wrote: It sounds like your spinlock doesn't block interrupts. You probably should do so, as you can't reliably use spinlocks to protect data in and from an interrupt context otherwise.

Octocontrabass wrote: There are other possibilities, such as disabling interrupts while the lock is held

D'oh, you're correct. I now see that on a single-core system, while an atomic flag protects against two contexts executing a critical region while the lock is held, there are other cases in which interrupts must be disabled too.

Octocontrabass wrote: What happens if the ISR sends messages faster than the consumer can read them?

rdos wrote: I think IRQs must have special rules for what they can and cannot do...

Okay, I see what you're getting at. Instead of sending interrupt data over IPC and messing around with static message pools, I've made a new driver design:
- Mouse and KB interrupt just unblock a high-priority driver process
- At the end of each interrupt handler, before restoring the interrupted context and running iretd, the scheduler checks whether a higher-priority task was just unblocked and runs it
- Mouse and KB drivers:
a. wait to be unblocked
b. read data from the PS2 port, translate it, forward it to the window manager, etc
c. block themselves until the next interrupt arrives
- After a driver re-blocks itself, EOI is sent

Everything is working swimmingly now. Thanks again! I learned a lot in this thread. My one doubt is whether it's "safe"/appropriate to schedule at the end of the interrupt handler. I'm not sure where else I would put the call, aside from waiting until the active task's timeslice has ended.

Octocontrabass · Post by **Octocontrabass** » Tue Jan 05, 2021 7:53 pm

codyd51 wrote:My one doubt is whether it's "safe"/appropriate to schedule at the end of the interrupt handler.

That's an appropriate place to put it, but whether or not it's safe depends on what kind of state you expect to save and restore when scheduling. You already do something similar in your timer interrupt handler to manage time slices, right?

codyd51 · Post by **codyd51** » Wed Jan 06, 2021 4:11 am

Octocontrabass wrote:You already do something similar in your timer interrupt handler to manage time slices, right?

Yes, but had doubts as to whether it was correct. A lot of code in my kernel was originally hacked up from James Molloy's tutorial years ago, so I'm trying to re-evaluate everything piecewise.

For now, though, it's finally stable: I'm unable to cause the OS to fault regardless of uptime, with keystrokes and mouse movements showing up in 1024x768, without dropping events, with everything communicating through IPC

OSDev.org

[Solved] Issue sending IPC messages from an ISR (PS2 mouse)

[Solved] Issue sending IPC messages from an ISR (PS2 mouse)

Re: Issue sending IPC messages from an ISR (PS2 mouse)

Re: Issue sending IPC messages from an ISR (PS2 mouse)

Re: Issue sending IPC messages from an ISR (PS2 mouse)

Re: Issue sending IPC messages from an ISR (PS2 mouse)

Re: Issue sending IPC messages from an ISR (PS2 mouse)

Re: Issue sending IPC messages from an ISR (PS2 mouse)

Re: Issue sending IPC messages from an ISR (PS2 mouse)

d

Re: d

Re: d

Re: Issue sending IPC messages from an ISR (PS2 mouse)

Re: Issue sending IPC messages from an ISR (PS2 mouse)

Re: Issue sending IPC messages from an ISR (PS2 mouse)