Multithreading causing debugging problems

prasoc
Member
Posts: 30
Joined: Fri Feb 10, 2017 8:25 am

Multithreading causing debugging problems

Post by prasoc »

Hello,
I have recently implemented multithreading in my kernel - there's a thread for the keyboard driver, and one for the graphics driver. It seems that moving to a multithreaded kernel causes you to lose determinism in how your OS behaves, which makes it very difficult to debug.
I've added a few helper functions to "lock" the scheduler (for important, non-interruptible processes) but am having some issues with a few "random" bugs.

Can anyone give me tips on how to debug such a non-deterministic system? How have you all remedied this problem? I think I need an outside perspective on this, kind of banging my head against the wall here!
mallard
Member
Posts: 280
Joined: Tue May 13, 2014 3:02 am
Location: Private, UK

Re: Multithreading causing debugging problems

Post by mallard »

Assuming your thread scheduler works correctly (a fairly big assumption...) writing a multi-threaded kernel is little different to writing any other sort of multithreaded program and the same principles apply.

Firstly, you'll need some thread synchronisation primitives. I've got by with just a simple mutex for kernel use, but you may want more "advanced" primitives like semaphores, events, condition variables, etc.

Every data structure that is used by multiple threads must have some kind of synchronisation. Usually a mutex.
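To make "some kind of synchronisation" concrete, here is a minimal sketch in C, assuming a C11 compiler that provides `<stdatomic.h>` (it is available in freestanding mode). It is a spinlock rather than a blocking mutex (a mutex would put the waiting thread to sleep and call the scheduler instead of spinning), and `spinlock_t`, `spin_lock`, `spin_unlock` and `shared_counter_t` are hypothetical names, not an existing kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal lock: a spinlock built on a C11 atomic_flag. */
typedef struct {
    atomic_flag held;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns true on success. */
static bool spin_trylock(spinlock_t *l) {
    /* test_and_set returns the previous value: false means it was free. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void spin_lock(spinlock_t *l) {
    while (!spin_trylock(l)) {
        /* Busy-wait. A real kernel mutex would block the thread here
           and let the scheduler run something else. */
    }
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Example shared structure: the lock travels with the data it protects. */
typedef struct {
    spinlock_t lock;
    int count;
} shared_counter_t;

static void counter_increment(shared_counter_t *c) {
    spin_lock(&c->lock);
    c->count++;            /* only one thread can be in here at a time */
    spin_unlock(&c->lock);
}
```

On a single CPU this only works if the timer interrupt can still preempt the spinning thread so the holder gets to run and release the lock; a blocking mutex avoids wasting that time slice.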

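Variables that are shared between threads may need to be declared "volatile"; this tells the compiler that the variable may be changed outside the current flow of code (by another thread or an interrupt handler), so it shouldn't apply certain optimisations like holding the value in a register between uses. Note that volatile does not make an operation atomic (a "counter++" is still a separate load, add and store that can be interrupted part-way) and doesn't order memory accesses between CPUs, so it's no substitute for a mutex. Data that is only ever touched while holding a mutex generally doesn't need it, provided your lock/unlock functions act as compiler barriers.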
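A short sketch of where volatile does and does not help, assuming a single CPU on which aligned 32-bit loads and stores are indivisible (true on x86). The variable and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical tick counter: written only by the timer IRQ handler and
   read by ordinary threads. volatile stops the compiler from keeping a
   stale copy in a register, e.g. inside a polling loop. */
static volatile uint32_t timer_ticks = 0;

/* Hypothetical counter bumped by several threads. volatile would NOT be
   enough here: "x++" is a load, an add and a store, and a task switch
   between them loses an update. An atomic read-modify-write avoids that. */
static atomic_uint events_seen = 0;

void timer_irq_handler(void) {
    timer_ticks++;  /* single writer, so no read-modify-write race */
}

uint32_t read_ticks(void) {
    return timer_ticks;  /* fresh load from memory on every call */
}

void record_event(void) {
    atomic_fetch_add_explicit(&events_seen, 1u, memory_order_relaxed);
}

unsigned int events_total(void) {
    return atomic_load_explicit(&events_seen, memory_order_relaxed);
}
```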

Try to avoid having code which requires the use of multiple mutexes simultaneously. Where it is necessary (and it will be), think very carefully about what happens if the second/third/n-th mutex is already held by another thread and if it's possible for a deadlock to emerge. There isn't really a catch-all solution to deadlocks, especially not in a kernel, so you'll need to examine each on a case-by-case basis.
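One widely used rule for the specific case of holding two locks at once is to always acquire them in the same global order. The sketch below orders by address; `lock_t`, `lock_pair` and `unlock_pair` are hypothetical names, and the lock itself is a bare C11 `atomic_flag` spinlock used only for illustration. It does not cover every deadlock (e.g. a thread that blocks on something else while holding a lock).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical minimal lock type used only for this sketch. */
typedef struct { atomic_flag held; } lock_t;

static void lock_acquire(lock_t *l) {
    /* Spin until test_and_set reports the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
    }
}

static void lock_release(lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Acquire two locks in one global order (lowest address first). If every
   caller follows this rule, thread 1 can never hold A while waiting for B
   at the same time as thread 2 holds B while waiting for A. */
static void lock_pair(lock_t *a, lock_t *b) {
    if ((uintptr_t)a > (uintptr_t)b) {
        lock_t *tmp = a;
        a = b;
        b = tmp;
    }
    lock_acquire(a);
    if (b != a) {
        lock_acquire(b);
    }
}

static void unlock_pair(lock_t *a, lock_t *b) {
    /* Release order does not matter for deadlock avoidance. */
    lock_release(a);
    if (b != a) {
        lock_release(b);
    }
}
```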

One thing that is specific to OS kernel development is interrupts (especially IRQs). Interrupt handlers "take over" an existing thread (which thread is pretty much random) and therefore cannot obtain mutexes (what happens if the "host" thread is currently holding that mutex and is only part-way through updating the data that it protects?). You should have a way for an interrupt handler to unblock another thread and do the majority of the work of handling the interrupt on that thread, where it's possible to use mutexes and other thread synchronisation primitives normally. In the rare cases where you need to update data that is used by an interrupt handler, you'll need to disable interrupts temporarily, but be aware that this is another case where your code usually cannot block (it's a bad idea to switch to another thread with interrupts disabled; the other thread may depend on them).
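A sketch of the pattern described above: the IRQ handler only records the event and flags a worker, and the worker briefly disables interrupts to read the shared state. `irq_save`/`irq_restore` stand in for the real "save flags and cli" / "restore flags" wrappers, which need privileged instructions; here they only track a flag so the code runs in user space. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the real interrupt enable/disable wrappers. */
static volatile bool irqs_enabled = true;

static bool irq_save(void) {
    bool was_enabled = irqs_enabled;
    irqs_enabled = false;        /* would be cli */
    return was_enabled;
}

static void irq_restore(bool was_enabled) {
    irqs_enabled = was_enabled;  /* re-enable only if they were on before */
}

/* Data shared between an IRQ handler and a thread. */
static volatile uint32_t pending_irqs = 0;
static volatile bool worker_should_run = false;

/* IRQ handler: no mutexes, no blocking, just record the event and flag
   the worker (a real kernel would unblock the worker thread here). */
void device_irq_handler(void) {
    pending_irqs++;
    worker_should_run = true;
}

/* Worker thread: disable interrupts only long enough to take a
   consistent snapshot, then do the slow work with them enabled. */
uint32_t worker_take_pending(void) {
    bool flags = irq_save();
    uint32_t n = pending_irqs;
    pending_irqs = 0;
    worker_should_run = false;
    irq_restore(flags);
    return n;  /* caller handles n events and is free to take mutexes */
}
```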

In the case of an interrupt that's originated from userspace (e.g. a syscall, page fault, etc.) you don't have to worry so much, since a thread running in userspace cannot, almost by definition, be in the middle of updating kernelspace data.
prasoc

Re: Multithreading causing debugging problems

Post by prasoc »

Thank you mallard, your reply was very insightful. I'm pretty sure the core of my scheduler is working as expected - the GUI debugger for Bochs and the E9 hack are very useful for testing this at a very low level. It switches between multiple threads successfully (and for a long time; no stack overflow or register trashing).

I have a simple lock/unlock system for the scheduler: while it is locked, the scheduler won't switch to the next task until unlock is called. This is currently used when copying the backbuffer to the frontbuffer, as that takes quite a while. I will take a look into mutexes and other synchronisation primitives and see what fits my uses best.
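One possible refinement of a scheduler lock like this, sketched under the assumption of a single CPU and a timer IRQ that calls into the scheduler: count the lock depth so lock/unlock calls can nest, and remember a task switch that was skipped while locked so it can happen at unlock. This only stops task switches; other IRQs (such as the keyboard) keep working. The function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical nestable scheduler lock: a depth counter instead of a
   boolean, so a locked function can call another one that locks too. */
static volatile unsigned int sched_lock_depth = 0;
static volatile bool switch_deferred = false;

void scheduler_lock(void) {
    sched_lock_depth++;
}

/* Returns true if a switch was skipped while locked and should happen
   now (a real kernel would call the scheduler at that point). */
bool scheduler_unlock(void) {
    if (sched_lock_depth > 0 && --sched_lock_depth == 0 && switch_deferred) {
        switch_deferred = false;
        return true;
    }
    return false;  /* still nested, nothing owed, or unbalanced unlock */
}

/* Called from the timer IRQ: only switch if nobody holds the lock;
   otherwise remember that a switch is owed. */
bool timer_tick_should_switch(void) {
    if (sched_lock_depth > 0) {
        switch_deferred = true;
        return false;
    }
    return true;
}
```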

What should happen when, let's say, the keyboard interrupt is triggered? Should I disable the keyboard interrupt for other threads that don't use it, then re-enable it when switching? If so, is there any way to improve this (maybe with some general code for a general IRQ #)? It feels a bit hacky this way. Is this where I would implement an interrupt mask?
JAAman
Member
Posts: 879
Joined: Wed Oct 27, 2004 11:00 pm
Location: WA

Re: Multithreading causing debugging problems

Post by JAAman »

prasoc wrote: What should happen when, let's say, the keyboard interrupt is triggered? Should I disable interrupts for other threads that don't use it, then re-enable it when switching? If so, is there any way to improve this (maybe with some general code for a general IRQ #)? feels a bit hacky this way. is this where I would implement an interrupt mask?
the typical way to do this is:
when an interrupt happens, the interrupt handler is called by the CPU
the interrupt handler does the minimum it has to do to process the interrupt; for a simple keyboard driver, you might:
-read the keyboard controller and place the value into a ring buffer for the keyboard driver to handle
-tell the interrupt controller you are done (send EOI)
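A minimal sketch of those two steps for a PS/2 keyboard on a legacy 8259 PIC: the handler pushes the raw scancode into a single-producer/single-consumer ring buffer and sends the EOI. The port accesses (`inb(0x60)` for the data port, `outb(0x20, 0x20)` for the master PIC EOI) are shown as comments and the scancode is passed in as a parameter so the sketch compiles and runs in user space; all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Single-producer (IRQ handler) / single-consumer (driver) ring buffer.
   KBD_BUF_SIZE must be a power of two so the index mask works. */
#define KBD_BUF_SIZE 64u

static volatile uint8_t kbd_buf[KBD_BUF_SIZE];
static volatile uint32_t kbd_head = 0;  /* written only by the IRQ handler */
static volatile uint32_t kbd_tail = 0;  /* written only by the driver */

static bool kbd_buf_push(uint8_t scancode) {
    if (kbd_head - kbd_tail == KBD_BUF_SIZE) {
        return false;  /* full: drop the byte rather than block in an IRQ */
    }
    kbd_buf[kbd_head & (KBD_BUF_SIZE - 1)] = scancode;
    kbd_head++;        /* publish only after the byte is stored */
    return true;
}

bool kbd_buf_pop(uint8_t *out) {
    if (kbd_tail == kbd_head) {
        return false;  /* empty */
    }
    *out = kbd_buf[kbd_tail & (KBD_BUF_SIZE - 1)];
    kbd_tail++;
    return true;
}

/* IRQ 1 handler doing only the minimum. */
void keyboard_irq_handler(uint8_t scancode /* real code: inb(0x60) */) {
    kbd_buf_push(scancode);
    /* outb(0x20, 0x20);  EOI to the master PIC */
}
```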

what happens next depends on whether your keyboard driver is itself a separate process, or runs in the context of any process:

if your keyboard driver runs in every process (easier to do for a beginner):
-re-enable interrupts and call the keyboard driver code
-iret (end the interrupt handler)

if your keyboard driver is a separate process (requires a more sophisticated scheduler):
-tell the scheduler that the keyboard driver needs to run
--at this point the scheduler might immediately switch to the keyboard driver task, or it might schedule the keyboard driver task to run later (or on another core)
-iret (end the interrupt handler)
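For the "separate process" case, "tell the scheduler that the keyboard driver needs to run" can be as simple as flipping the driver task from blocked to ready. The sketch below assumes a tiny round-robin scheduler with a permanently ready idle task; the task table and function names are hypothetical.

```c
#include <stddef.h>

enum task_state { TASK_BLOCKED, TASK_READY };

struct task {
    const char *name;
    enum task_state state;
};

#define TASK_COUNT 3
static struct task tasks[TASK_COUNT] = {
    { "idle", TASK_READY },        /* always ready, so something can run */
    { "keyboard", TASK_BLOCKED },  /* sleeps until the IRQ handler wakes it */
    { "graphics", TASK_READY },
};
static size_t current = 0;

/* From the IRQ handler: the driver has input waiting. */
void task_wake(struct task *t) { t->state = TASK_READY; }

/* From the driver: buffer drained, don't need to run anymore. */
void task_block(struct task *t) { t->state = TASK_BLOCKED; }

/* Pick the next READY task after the current one (round robin). */
struct task *schedule_next(void) {
    for (size_t i = 1; i <= TASK_COUNT; i++) {
        size_t idx = (current + i) % TASK_COUNT;
        if (tasks[idx].state == TASK_READY) {
            current = idx;
            return &tasks[idx];
        }
    }
    return &tasks[current];  /* not reached while the idle task is ready */
}
```

One thing to watch: the driver should check the buffer and mark itself blocked with interrupts disabled; otherwise a keystroke arriving between the check and the block can leave the driver asleep with input waiting.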


then, in the keyboard driver:
-take the input value from the buffer, and process it according to the current keyboard state
-if that input represents the end of a keystroke sequence, create a keystroke packet with the information and in the format expected by your programs
-send the keystroke packet to whichever task currently has "input focus"
-check the buffer for more input (if there is more input, repeat)
-if the keyboard driver runs directly, then return
-if the keyboard driver is a task, then tell the scheduler you are done and don't need to run anymore
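The driver side, sketched for PS/2 scancode set 1 with only a handful of keys mapped and extended (0xE0-prefixed) codes ignored. To keep it self-contained, scancodes are passed in as an array instead of being popped from the IRQ ring buffer, and the hypothetical `deliver_keystroke` just stores packets instead of sending them to the task with input focus.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical keystroke packet for the focused task. */
struct keystroke {
    uint8_t scancode;  /* make code */
    char ascii;        /* 0 if the key has no printable character */
    bool shift;        /* shift held when the key was pressed */
};

/* Tiny excerpt of a scancode set 1 table (make codes only). */
static char set1_to_ascii(uint8_t sc, bool shift) {
    switch (sc) {
    case 0x1E: return shift ? 'A' : 'a';
    case 0x30: return shift ? 'B' : 'b';
    case 0x2E: return shift ? 'C' : 'c';
    case 0x39: return ' ';
    case 0x1C: return '\n';
    default:   return 0;
    }
}

static bool shift_down = false;  /* current keyboard state */

/* Hypothetical delivery hook: records packets for inspection. */
static struct keystroke delivered[16];
static size_t delivered_count = 0;
static void deliver_keystroke(const struct keystroke *k) {
    if (delivered_count < 16) delivered[delivered_count++] = *k;
}

/* Turn raw scancodes into keystroke packets. */
void keyboard_driver_process(const uint8_t *codes, size_t n) {
    for (size_t i = 0; i < n; i++) {
        bool released = (codes[i] & 0x80) != 0;  /* break code = make | 0x80 */
        uint8_t make = codes[i] & 0x7F;
        if (make == 0x2A || make == 0x36) {      /* left or right shift */
            shift_down = !released;
            continue;
        }
        if (released) continue;                  /* ignore other releases */
        struct keystroke k = { make, set1_to_ascii(make, shift_down), shift_down };
        deliver_keystroke(&k);
    }
}
```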


note: this is a simplified overview; in real OSes, it is a bit more complex
prasoc

Re: Multithreading causing debugging problems

Post by prasoc »

This makes a lot of sense - I was trying to do too much in my IRQ (like processing the keypresses), but all that's really needed is to add the scancode to the ring buffer! My keyboard driver doesn't understand anything about "input focus" yet, but the explanation you provided will be very useful for when I move up to multiple processes and using my very basic Surface class to provide the end-user with windows they can switch between.

Very helpful community here, thank you both
JAAman

Re: Multithreading causing debugging problems

Post by JAAman »

prasoc wrote: My keyboard driver doesn't understand anything about "input focus" yet
I should have realized you don't have a functioning event handling system yet...
in that case, for now, I would follow the instructions for "in every process", then in the keyboard driver:

-after creating a keystroke packet, check to see if that keystroke is a special keystroke (used to control your OS -- very handy in the early testing/debugging stage)
-if it is not, then place the keystroke in a keystroke buffer (if the buffer is full, either overwrite the oldest entry, or drop the keystroke)
-when an application asks for keystrokes (via an input system call or whatever you are currently providing for input), take from the keystroke buffer to return to the application
-if the keystroke buffer is currently empty, and the call requires a keystroke to return, just wait to return to the application until after a keystroke appears, either by calling the scheduler early (this is what mature OSes will do) or loop around checking the buffer (if your scheduler isn't sophisticated enough to call it early)
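A sketch of that keystroke buffer, assuming ESC is the reserved OS hotkey and that a hypothetical `yield` hook asks the scheduler to run something else. On real hardware the submit and read paths touch the same indices from interrupt and thread context, so each access would need a short interrupts-disabled section, which is left out here. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define KEYQ_SIZE 8
static char keyq[KEYQ_SIZE];
static size_t keyq_head = 0;   /* index of the oldest keystroke */
static size_t keyq_count = 0;  /* number of queued keystrokes */

/* Hypothetical special key reserved for controlling the OS. */
#define OS_HOTKEY 0x1B         /* ESC */
static unsigned int hotkey_presses = 0;

/* Called by the keyboard driver for each finished keystroke. */
void keystroke_submit(char c) {
    if (c == OS_HOTKEY) {      /* check OS control keys first */
        hotkey_presses++;      /* e.g. dump scheduler state while debugging */
        return;
    }
    if (keyq_count == KEYQ_SIZE) {
        return;                /* buffer full: drop this keystroke */
    }
    keyq[(keyq_head + keyq_count) % KEYQ_SIZE] = c;
    keyq_count++;
}

/* Non-blocking read: returns false if no keystroke is waiting. */
bool keystroke_try_read(char *out) {
    if (keyq_count == 0) return false;
    *out = keyq[keyq_head];
    keyq_head = (keyq_head + 1) % KEYQ_SIZE;
    keyq_count--;
    return true;
}

/* Blocking read for an input system call: while the buffer is empty,
   give the CPU away instead of returning to the application. */
char keystroke_read_blocking(void (*yield)(void)) {
    char c;
    while (!keystroke_try_read(&c)) {
        yield();
    }
    return c;
}
```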


this is basically what traditional single-tasking OSes do (or OSes that don't use event messages), and should be fairly simple to implement without needing a lot of more advanced features -- but it will behave strangely if multiple processes try to ask for keystrokes
prasoc

Re: Multithreading causing debugging problems

Post by prasoc »

Ok, the event system is a bit far off yet for my single-process system, but it seems like my next step will be implementing a standard way for threads to send data to each other using messages.
At the moment, the way I pass data between the KB driver and the TTY update thread is through a shared memory address for the keyboard buffer. It seems a bit rudimentary, though.

edit: after further research, I think I will implement a "port" system. Each thread can listen on a port, which allows a many-to-one communication pattern (many senders, one listener).
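A possible shape for such a port, sketched with a fixed-size message queue: any thread can send, only the owning thread receives. `struct port`, `port_send` and `port_receive` are hypothetical, and the locking a kernel would need around the queue (a mutex, or disabling interrupts on a single CPU) is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message carried through a port. */
struct message {
    uint32_t sender;  /* id of the sending thread */
    uint32_t type;    /* message kind, e.g. a keystroke */
    uint32_t data;
};

#define PORT_QUEUE_LEN 16

/* Many-to-one port: a FIFO owned by the single listening thread. */
struct port {
    struct message queue[PORT_QUEUE_LEN];
    size_t head;   /* oldest message */
    size_t count;  /* messages waiting */
};

bool port_send(struct port *p, uint32_t sender, uint32_t type, uint32_t data) {
    if (p->count == PORT_QUEUE_LEN) {
        return false;  /* full: caller may retry or block */
    }
    size_t slot = (p->head + p->count) % PORT_QUEUE_LEN;
    p->queue[slot] = (struct message){ sender, type, data };
    p->count++;
    return true;
}

bool port_receive(struct port *p, struct message *out) {
    if (p->count == 0) {
        return false;  /* nothing queued: the owner would block here */
    }
    *out = p->queue[p->head];
    p->head = (p->head + 1) % PORT_QUEUE_LEN;
    p->count--;
    return true;
}
```

Compared with a raw shared memory address, the port gives every sender the same entry point and keeps messages in arrival order for the one receiver.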