Here are a few facts about the issue:
- The GPF is generated at random times.
- The GPF only occurs when running more than one process or thread - one process with one thread works fine.
- The GPF occurs on the IRET instruction. According to the Intel manual, a GPF with error code 0 on IRET means either "the return code or stack segment selector is NULL" or "the return instruction pointer is not within the return code segment limit."
- If I set the timer frequency (which the scheduler uses) much lower (from 200 Hz down to 20 Hz), the GPF no longer seems to occur (at least not within ~30 seconds).
- If I set the timer frequency much higher (from 200 Hz up to 1000 Hz), I start getting page faults at varying EIPs and with varying error codes.
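For reference, the two conditions quoted from the manual can be turned into a check that runs on the saved frame just before IRET. This is only a sketch: `struct iret_frame`, `selector_is_null`, `iret_frame_looks_sane` and the `code_limit` parameter are hypothetical names, not taken from the linked code, and it assumes 32-bit protected mode with the kernel running in ring 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of what IRET pops in 32-bit protected mode.
 * user_esp and ss are only popped when returning to an outer ring. */
struct iret_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t user_esp;
    uint32_t ss;
};

/* A selector is null when its index and TI bits are zero (values 0..3). */
static bool selector_is_null(uint32_t selector)
{
    return (selector & 0xFFFCu) == 0;
}

/* Returns true if the frame avoids the #GP(0) conditions quoted above.
 * code_limit is the assumed byte limit of the target code segment. */
static bool iret_frame_looks_sane(const struct iret_frame *f, uint32_t code_limit)
{
    if (selector_is_null(f->cs))
        return false;

    /* Assumption: the kernel is ring 0, so a non-zero RPL in CS means
     * IRET will also pop ESP and SS. */
    bool to_outer_ring = (f->cs & 3u) != 0;
    if (to_outer_ring && selector_is_null(f->ss))
        return false;

    if (f->eip > code_limit)
        return false;

    return true;
}
```

Calling something like this at the end of the timer handler, on the frame of the task about to be resumed, and halting when it returns false might show whether the saved CS, SS or EIP is already corrupted before IRET executes.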
I have tried debugging by stepping through with GDB attached to QEMU, but I cannot reproduce the bug.
You can find the code for my OS on GitHub here, and the scheduler part is here.
Any insight would be much appreciated, as I am at my wits' end with this bug.