I'm stuck debugging a really weird bug in my task-switching code.
I currently have one task that runs in kernel mode, and I switch back and forth to it using the timer interrupt.
The task runs fine for a couple of seconds, and then I get either a page fault or a general protection fault followed by a page fault.
I don't even know how to start debugging it, because the problem doesn't reproduce the same way on each run, which is something I've never encountered before.
Complete project is hosted here: https://github.com/mellowcandle/epOS
Relevant code snippets:
https://github.com/mellowcandle/epOS/bl ... cheduler.c
https://github.com/mellowcandle/epOS/bl ... cheduler.s
https://github.com/mellowcandle/epOS/bl ... /process.c
Thanks a lot!
Weird bug in preemptive task switching (x86)
“Meaningless! Meaningless!”
says the Teacher.
“Utterly meaningless!
Everything is meaningless.” - Ecclesiastes 1, 2
Educational Purpose Operating System - EPOS
Re: Weird bug in preemptive task switching (x86)
You start debugging by using a debugger; don't be afraid of gdb.
You already said you get a #PF, so read the manual about all the possible causes of a #PF. Check what the stack looks like after a #PF (according to the manual). Now you know where you came from: what code is there? What's in CR2? Then work backwards to figure out why it happened.
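To make that concrete, here is a rough sketch of decoding what the CPU hands you on a #PF. It assumes 32-bit protected mode and a fault taken with no privilege change (kernel code faulting in kernel mode), so the handler finds the error code, EIP, CS and EFLAGS on the stack in that order. The names `pf_frame` and `pf_describe` are made up for illustration and are not from the epOS sources, and `snprintf` only stands in for whatever print routine your kernel has.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bits of the error code the CPU pushes on a #PF. */
#define PF_PRESENT  (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE    (1u << 1)  /* 0: read access, 1: write access */
#define PF_USER     (1u << 2)  /* 0: supervisor mode, 1: user mode */
#define PF_RESERVED (1u << 3)  /* 1: reserved bit set in a paging structure */
#define PF_IFETCH   (1u << 4)  /* 1: fault caused by an instruction fetch */

/* Hypothetical view of the stack on handler entry (lowest address first),
 * assuming no privilege change, so no SS:ESP is pushed above EFLAGS. */
struct pf_frame {
    uint32_t error_code;
    uint32_t eip;     /* address of the faulting instruction */
    uint32_t cs;
    uint32_t eflags;
};

/* Format a one-line summary; cr2 is the faulting linear address, which the
 * handler has to read from CR2 itself before anything else can fault. */
int pf_describe(const struct pf_frame *f, uint32_t cr2, char *buf, size_t len)
{
    uint32_t e = f->error_code;

    return snprintf(buf, len,
                    "#PF eip=0x%08x cr2=0x%08x err=0x%x: %s on %s page, %s mode%s%s",
                    (unsigned)f->eip, (unsigned)cr2, (unsigned)e,
                    (e & PF_WRITE)   ? "write" : "read",
                    (e & PF_PRESENT) ? "protected" : "not-present",
                    (e & PF_USER)    ? "user" : "kernel",
                    (e & PF_RESERVED) ? ", reserved bit set" : "",
                    (e & PF_IFETCH)   ? ", instruction fetch" : "");
}
```

Print this from the handler before halting, then look up the EIP in your kernel's symbol map or disassembly. If EIP or CR2 looks like garbage that changes from run to run, a corrupted stack or a badly restored register in the switch path is a likely suspect.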
Yes, it can be difficult and it can take time. The main alternative is to ensure that you never get into this type of mess again; one way to help with that is a significant amount of automated testing.