Weird Infinite Loop
- Pype.Clicker
Re:Weird Infinite Loop
ESP0 and SS0 are static fields of the TSS (check the Intel manuals for the complete list of static fields). That means the CPU loads registers from the TSS's contents, but it _never_ (and especially not when doing a task switch) stores anything to those fields by itself.
If you're doing software task switching and if the value of ESP0 isn't the same for all your tasks (e.g. because you have multithreading without a Big Kernel Lock or simply because you decided so), you'll have to save the current value of TSS.ESP0 in your Thread Control Block (or on the current stack) and restore the value of ESP0 that was used by the incoming thread.
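A minimal sketch of that save/restore step, assuming one kernel stack per thread. The names `struct thread`, `kernel_stack_top`, `kernel_tss` and `switch_esp0` are made up for illustration, and only the first three TSS fields are shown:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the leading fields of the 32-bit TSS; the real structure is longer. */
struct tss {
    uint32_t prev_task_link;
    uint32_t esp0;  /* stack pointer the CPU loads on a ring 3 -> ring 0 entry */
    uint32_t ss0;   /* stack segment the CPU loads on that same entry */
};

/* Hypothetical Thread Control Block. */
struct thread {
    uint32_t kernel_stack_top;  /* the ESP0 value that belongs to this thread */
};

static struct tss kernel_tss;

/* Called from the software task switcher. The CPU never writes ESP0 back,
 * so the kernel has to keep it in step with the running thread itself. */
static void switch_esp0(struct thread *prev, struct thread *next)
{
    assert(next != NULL);
    if (prev != NULL)
        prev->kernel_stack_top = kernel_tss.esp0;  /* save outgoing value */
    kernel_tss.esp0 = next->kernel_stack_top;      /* restore incoming value */
}
```

If every task shares one kernel stack, ESP0 never changes and this step can be dropped.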
Re:Weird Infinite Loop
I appreciate your help. I just squashed an unrelated bug though. The address of my TSS was zero because I wasn't passing the pointer from my ASM code to my GDT code written in C correctly. Now that I fixed that though, my code is firing an exception.
00007077030e[CPU0 ] exception(): 3rd (14) exception with no resolution, shutdown status is 00h, resetting
I think that the reason it has no resolution is because it's my GDT set up routine. I don't set up the IDT until afterwards.
Is that a Page Fault that's firing? I don't understand the significance of the numbers 3 or 14. 3 could be Int 3 or Non Maskable Interrupt. 14 could be Page Fault. It seems most likely that it's a Page Fault. Argh! Can somebody verify this?
Re:Weird Infinite Loop
TheChuckster wrote: 00007077030e[CPU0 ] exception(): 3rd (14) exception with no resolution, shutdown status is 00h, resetting
...
Is that a Page Fault that's firing? I don't understand the significance of the numbers 3 or 14. 3 could be Int 3 or Non Maskable Interrupt. 14 could be Page Fault. It seems most likely that it's a Page Fault. Argh! Can somebody verify this?
Yes, that is a page fault. "3rd" says that it is the third exception the CPU generated without resolution, and the number in parentheses is the number of the exception that occurred last.
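For reading such log lines, a small lookup table of the x86 exception vectors can help. This is only a sketch: the vector numbers are the architectural ones, while `exception_names` and `exception_name` are names invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Architectural x86 exception vectors 0..14 (vector 9 is left unset). */
static const char *const exception_names[] = {
    [0]  = "#DE Divide Error",
    [1]  = "#DB Debug",
    [2]  = "NMI",
    [3]  = "#BP Breakpoint",
    [4]  = "#OF Overflow",
    [5]  = "#BR BOUND Range Exceeded",
    [6]  = "#UD Invalid Opcode",
    [7]  = "#NM Device Not Available",
    [8]  = "#DF Double Fault",
    [10] = "#TS Invalid TSS",
    [11] = "#NP Segment Not Present",
    [12] = "#SS Stack-Segment Fault",
    [13] = "#GP General Protection",
    [14] = "#PF Page Fault",
};

static_assert(sizeof exception_names / sizeof exception_names[0] == 15,
              "table covers vectors 0..14");

/* Returns a printable name for the vector shown in parentheses by Bochs. */
const char *exception_name(unsigned vector)
{
    size_t count = sizeof exception_names / sizeof exception_names[0];
    if (vector >= count || exception_names[vector] == NULL)
        return "unknown or reserved";
    return exception_names[vector];
}
```

With this table, `exception_name(14)` gives the page fault entry, which matches the "(14)" in the log line.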
Re:Weird Infinite Loop
Great. To complement the mysterious invalid TSS problem, my task switcher is mysteriously causing a page fault. Might be a permissions problem.
If I leave it as is, ORing each page table and directory entry with 2, I get the unresolved exception.
If I OR each page table and directory entry with 7 to try to put each page in user mode, I now get a visible page fault on screen (my exception handler is called) with bogus address 0xFFFFFFFC8. Could my GDT or TSS be causing this?
I'm quite sure my paging is correct, but I am still unsure about the new ring 3 code obviously because it is failing repeatedly.... It worked fine with ring 0.
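For reference, here is what those two constants mean in a 32-bit page table or page directory entry. The macro and function names are invented for this sketch; the bit positions are the architectural ones. Note that 2 on its own sets only the writable bit, so whether the entry counts as present depends on what was in it before the OR.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  0x1u  /* bit 0: entry is valid */
#define PTE_WRITABLE 0x2u  /* bit 1: writes allowed */
#define PTE_USER     0x4u  /* bit 2: ring 3 may access */

static_assert((PTE_PRESENT | PTE_WRITABLE | PTE_USER) == 7u,
              "7 means present + writable + user");

/* Build an entry from a 4 KiB-aligned physical address and flag bits. */
uint32_t make_entry(uint32_t phys_addr, uint32_t flags)
{
    return (phys_addr & 0xFFFFF000u) | (flags & 0xFFFu);
}

/* Ring 3 can reach a page only if BOTH the directory entry and the
 * table entry are present and carry the user bit. */
int user_accessible(uint32_t pde, uint32_t pte)
{
    uint32_t both = pde & pte;
    return (both & PTE_PRESENT) != 0 && (both & PTE_USER) != 0;
}
```

The second function is the usual reason a ring 3 task faults on a page that "looks" mapped: one of the two levels is still supervisor-only.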
Re:Weird Infinite Loop
Found out some more information about this bug. The address is 0xFFFFFFFC8, but SS is set to a bogus 1B000023. 1B is what my code segment SHOULD be. What would cause it to end up there instead? The error code indicates that I am writing to a nonpresent page. Obviously a bogus address like this is nonpresent.
Most likely this is a stack problem. Wouldn't you agree? I will have to investigate.
By the way, can anyone using Bochs test this out? Run your OS with Bochs debugger and do a dump_cpu. There should be a line similar to ldtr:s=0x0000, dl=0x00000000, dh=0x00000000, valid=0. Please tell me what value "valid" is set to.
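The "writing to a nonpresent page" reading comes from the low bits of the page fault error code (the faulting address itself is in CR2). A small decoder, as a sketch; `struct pf_info` and `decode_pf` are names made up here, the bit meanings are the architectural ones:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PF_PRESENT 0x1u  /* 0: page not present, 1: protection violation */
#define PF_WRITE   0x2u  /* 0: read access,      1: write access */
#define PF_USER    0x4u  /* 0: CPU in supervisor mode, 1: CPU in user mode */

struct pf_info {
    int not_present;  /* fault hit a non-present page */
    int write;        /* faulting access was a write */
    int user;         /* fault happened while running at ring 3 */
};

/* Split the error code pushed by the CPU for exception 14. */
void decode_pf(uint32_t error_code, struct pf_info *out)
{
    assert(out != NULL);
    out->not_present = (error_code & PF_PRESENT) == 0;
    out->write       = (error_code & PF_WRITE) != 0;
    out->user        = (error_code & PF_USER) != 0;
}
```

An error code of 2 therefore decodes as a supervisor-mode write to a non-present page, and 6 as the same thing from user mode.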
Re:Weird Infinite Loop
TheChuckster wrote: Found out some more information about this bug. The address is 0xFFFFFFFC8, but SS is set to a bogus 1B000023. 1B is what my code segment SHOULD be. What would cause it to end up there instead? The error code indicates that I am writing to a nonpresent page. Obviously a bogus address like this is nonpresent.
Most likely this is a stack problem. Wouldn't you agree? I will have to investigate.
Well, I think your stack is aligned differently when the code that pops these values runs than it was when they were pushed. That is, IMHO, the only reason for it to end up like that.
Re:Weird Infinite Loop
Now, this stack misalignment could be caused by what?
I checked my stack over and over, number of pushes and pops are equal. I'm not doing anything weird with the esp pointer.
Could it be the fact that I'm pushing two extra items on the stack (esp, ss) in ring transitions as opposed to same-ring interrupts? Could that be misaligning it? If so, should I fix it by pushing two blank items on the stack in their place? Oh wait, I can't, because they're right in the MIDDLE of the stuff pushed and popped by the CPU.
Took out my ring 0 kernel task. Same results. Maybe system calls? Pushes and pops are fine for them too.
Are you POSITIVE it's a stack misalignment?
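On the question of where the two extra items sit: the CPU pushes SS and ESP first, then EFLAGS, CS and EIP, so within the CPU-pushed part of the frame they end up at the high-address end. A sketch of that layout for an interrupt without an error code follows; the struct and function names are invented here, and the check assumes the handler itself runs in ring 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU-pushed frame, lowest address (top of stack) first. */
struct interrupt_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    /* The next two exist only if the interrupt came from an outer ring: */
    uint32_t user_esp;
    uint32_t user_ss;
};

static_assert(offsetof(struct interrupt_frame, user_esp) == 12,
              "ESP sits directly above EFLAGS");
static_assert(sizeof(struct interrupt_frame) == 20,
              "five 32-bit slots in the ring-transition case");

/* The RPL bits of the saved CS tell whether user_esp/user_ss are there. */
int frame_has_user_stack(const struct interrupt_frame *frame)
{
    return (frame->cs & 3u) != 0;
}
```

A task switcher that builds an initial frame for a ring 3 task has to fill in all five slots; for a ring 0 task only the first three are consumed by IRET.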
Re:Weird Infinite Loop
TheChuckster wrote: Now, this stack misalignment could be caused by what?
I checked my stack over and over, number of pushes and pops are equal. I'm not doing anything weird with the esp pointer.
Could it be the fact that I'm pushing two extra items on the stack (esp, ss) in ring transitions as opposed to same-ring interrupts? Could that be misaligning it? If so, should I fix it by pushing two blank items on the stack in their place? Oh wait, I can't, because they're right in the MIDDLE of the stuff pushed and popped by the CPU.
Took out my ring 0 kernel task. Same results. Maybe system calls? Pushes and pops are fine for them too.
Are you POSITIVE it's a stack misalignment?
No, I'm not positive.
However, I can't see another way to make the last byte of an index appear as the first byte of something else you load off the stack. That's a single-byte offset, and in the world of everything-is-kind-of-aligned, that comes down to a stack misalignment OR a push-on-stack-misalignment.
Try to trace the code that puts the value on the stack.
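While tracing, it can help to split the suspicious values into selector fields. The helper below is only a sketch with invented names; treating just the low 16 bits as the selector is an assumption that fits a value read out of a 32-bit stack slot, and may not match how the number in the post was obtained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct selector {
    unsigned index;  /* descriptor index into the GDT or LDT */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level, 0..3 */
};

/* Decode a segment selector stored in a 32-bit slot. */
void decode_selector(uint32_t slot, struct selector *out)
{
    assert(out != NULL);
    uint16_t sel = (uint16_t)(slot & 0xFFFFu);  /* upper half is ignored */
    out->index = sel >> 3;
    out->ti    = (sel >> 2) & 1u;
    out->rpl   = sel & 3u;
}
```

Under that assumption, 0x1B000023 decodes to GDT index 4 with RPL 3, and 0x1B decodes to GDT index 3 with RPL 3.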
Re:Weird Infinite Loop
Well I tried type-casting my initial stack values for each task to unsigned ints. The code still runs perfectly in ring 0 as always. If it were misaligned it wouldn't even do that.
Now I try throwing in a ring 3 task. Page fault as always. But this time SS = 0. Hmmm. Should my timer handler be pushing and popping SS as well? I don't know, but I tried it anyhow and got a CPL != RPL error in the Bochs output log. Yuck. The system needs to temporarily hold a ring 3 SS before IRETing back into ring 3 with everything else. No?
Or maybe my TSS isn't working right? Somebody please check their ldtr value in Bochs for their OS, provided it does protection. Can we safely rule the TSS out as a possibility?
This bug is unique in that you can never rule out any possibilities. I narrowed it down to the task switching code but that's it. It could be anything. The problem with introducing new code is that you get new exceptions. How do you know if they are replacing the current exception or supplementing it? You don't. And on any other platform besides Bochs (Qemu, real hardware), it triple faults.
Regarding the SS value though, am I at least HEADING in the right direction to fixing this? I'm sure nobody here has encountered this error before. At this point I'm ready to ask myself the question: Does my OS really NEED protection? The main problem is that I can't do VM86 mode without a functional TSS. I get a page fault when I try to do that as well.
Re:Weird Infinite Loop
Another question: Is an invalid SS a cause for a memory address to be bogus? Perhaps the stack has nothing to do with this at all...
Re:Weird Infinite Loop
Well, an invalid SS may point to the wrong base address, but if you only have one valid, ring-appropriate data segment, then it would more likely give you a GPF or a Stack Fault.