Weird Infinite Loop

TheChuckster

Weird Infinite Loop

Post by TheChuckster »

Never mind, not fixed after all.
Pype.Clicker

Re:Weird Infinite Loop

Post by Pype.Clicker »

ESP0 and SS0 are static fields of the TSS (check the Intel manuals for the complete list of static fields). That means the CPU loads registers from the TSS's contents, but it _never_ (and especially not when doing a task switch) stores anything to those fields by itself.

If you're doing software task switching and if the value of ESP0 isn't the same for all your tasks (e.g. because you have multithreading without a Big Kernel Lock or simply because you decided so), you'll have to save the current value of TSS.ESP0 in your Thread Control Block (or on the current stack) and restore the value of ESP0 that was used by the incoming thread.
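
A minimal sketch of that save/restore step in C, assuming a software task switcher. The `tss_stub` layout covers only the first three TSS fields, and `struct thread`, `kernel_stack_top`, and `switch_to` are hypothetical names chosen for illustration, not anything from the kernel discussed here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First three fields of the 32-bit TSS; the rest is omitted in this sketch. */
struct tss_stub {
    uint32_t prev_task_link;
    uint32_t esp0;  /* stack pointer the CPU loads on entry to ring 0 */
    uint32_t ss0;   /* stack segment the CPU loads together with esp0 */
};

/* Hypothetical thread control block. */
struct thread {
    uint32_t kernel_stack_top;  /* this thread's own ring-0 stack top */
};

struct tss_stub tss;
struct thread *current;

/* The CPU only reads esp0, so the kernel must keep it up to date itself. */
void switch_to(struct thread *next)
{
    assert(next != NULL);
    if (current != NULL)
        current->kernel_stack_top = tss.esp0;  /* save the outgoing value */
    tss.esp0 = next->kernel_stack_top;         /* restore the incoming value */
    current = next;
}
```

Calling `switch_to(&a)` and then `switch_to(&b)` leaves `tss.esp0` at b's stack top while a's saved value stays intact for the next switch back.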
TheChuckster

Re:Weird Infinite Loop

Post by TheChuckster »

I appreciate your help. I just squashed an unrelated bug though. The address of my TSS was zero because I wasn't passing the pointer from my ASM code to my GDT code written in C correctly. Now that I fixed that though, my code is firing an exception.

00007077030e[CPU0 ] exception(): 3rd (14) exception with no resolution, shutdown status is 00h, resetting

I think that the reason it has no resolution is because it's my GDT set up routine. I don't set up the IDT until afterwards.

Is that a Page Fault that's firing? I don't understand the significance of the numbers 3 or 14. 3 could be Int 3 or Non Maskable Interrupt. 14 could be Page Fault. It seems most likely that it's a Page Fault. Argh! Can somebody verify this?
bluecode

Re:Weird Infinite Loop

Post by bluecode »

TheChuckster wrote: 00007077030e[CPU0 ] exception(): 3rd (14) exception with no resolution, shutdown status is 00h, resetting
...
Is that a Page Fault that's firing? I don't understand the significance of the numbers 3 or 14. 3 could be Int 3 or Non Maskable Interrupt. 14 could be Page Fault. It seems most likely that it's a Page Fault. Argh! Can somebody verify this?
Yes, that is a page fault. The "3rd" says that it is the third exception generated by the CPU (without resolution), and the number in brackets is the number of the exception that occurred last.
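
For reference, the vector number in brackets can be looked up in a small table. The sketch below is illustrative C with a hypothetical helper name (`exception_name`); the names follow the Intel manuals for vectors 0 to 14. Note that NMI is vector 2, while vector 3 is the breakpoint (INT 3):

```c
#include <assert.h>
#include <stddef.h>

/* Map an x86 exception vector to its conventional name (vectors 0..14 only). */
const char *exception_name(unsigned vector)
{
    static const char *const names[] = {
        [0]  = "Divide Error",
        [1]  = "Debug",
        [2]  = "NMI",
        [3]  = "Breakpoint",
        [4]  = "Overflow",
        [5]  = "BOUND Range Exceeded",
        [6]  = "Invalid Opcode",
        [7]  = "Device Not Available",
        [8]  = "Double Fault",
        [9]  = "Coprocessor Segment Overrun",
        [10] = "Invalid TSS",
        [11] = "Segment Not Present",
        [12] = "Stack-Segment Fault",
        [13] = "General Protection",
        [14] = "Page Fault",
    };

    assert(vector <= 255);  /* the IDT has 256 vectors */
    if (vector >= sizeof names / sizeof names[0])
        return "reserved or external";  /* not covered by this sketch */
    return names[vector];
}
```

With this table, `exception_name(14)` yields "Page Fault", which matches the `(14)` in the Bochs log line.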
TheChuckster

Re:Weird Infinite Loop

Post by TheChuckster »

Great. To complement the mysterious invalid TSS problem, my task switcher is mysteriously causing a page fault. Might be a permissions problem.

If I leave it as is, ORing each page table and directory entry with 2, I get the unresolved exception.

If I OR each page table and directory entry with 7 to try to get each page in user mode, I now get a visible page fault on screen (my exception handler is called) with bogus address 0xFFFFFFFC8. Could my GDT or TSS be causing this?

I'm quite sure my paging is correct, but I am still unsure about the new ring 3 code obviously because it is failing repeatedly.... It worked fine with ring 0.
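
As a reference for the two values above: in 32-bit paging, bit 0 of an entry is Present, bit 1 is Read/Write, and bit 2 is User/Supervisor, so 2 sets only the writable bit while 7 sets all three. The sketch below is illustrative C; the macro and function names (`PTE_*`, `make_entry`, `user_accessible`) are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  0x1u  /* bit 0: page is present */
#define PTE_WRITABLE 0x2u  /* bit 1: writes allowed */
#define PTE_USER     0x4u  /* bit 2: ring 3 may access */

/* Combine a 4 KiB-aligned frame address with flag bits. */
uint32_t make_entry(uint32_t frame_addr, uint32_t flags)
{
    assert((frame_addr & 0xFFFu) == 0);  /* low 12 bits belong to the flags */
    assert((flags & ~0xFFFu) == 0);
    return frame_addr | flags;
}

/* Ring 3 can reach a page only if BOTH the directory entry and the
   table entry are present and have the user bit set. */
int user_accessible(uint32_t pde, uint32_t pte)
{
    uint32_t both = pde & pte;
    return (both & PTE_PRESENT) && (both & PTE_USER);
}
```

For example, a directory entry built with flags 3 blocks ring 3 even when the page table entry underneath it uses flags 7.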
TheChuckster

Re:Weird Infinite Loop

Post by TheChuckster »

Found out some more information about this bug. The address is 0xFFFFFFFC8, but SS is set to a bogus 1B000023. 1B is what my code segment SHOULD be. What would cause it to end up there instead? The error code indicates that I am writing to a nonpresent page. Obviously a bogus address like this is nonpresent.
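
For readers decoding the same error code: the low three bits of the page-fault error code say whether the page was present, whether the access was a write, and whether it came from user mode. A minimal sketch in C, with hypothetical names (`pf_info`, `decode_pf_error`):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pf_info {
    int present;  /* bit 0: 1 = protection violation, 0 = page not present */
    int write;    /* bit 1: 1 = write access, 0 = read access */
    int user;     /* bit 2: 1 = fault occurred at CPL 3 */
};

/* Split the error code pushed by the CPU for vector 14 into its low bits. */
void decode_pf_error(uint32_t err, struct pf_info *out)
{
    assert(out != NULL);
    out->present = (err & 0x1u) != 0;
    out->write   = (err & 0x2u) != 0;
    out->user    = (err & 0x4u) != 0;
}
```

An error code of 2 therefore reads as "write to a nonpresent page from supervisor mode", and 6 as the same access from user mode. The faulting address itself is reported in CR2.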

Most likely this is a stack problem. Wouldn't you agree? I will have to investigate.

By the way, can anyone using Bochs test this out? Run your OS with Bochs debugger and do a dump_cpu. There should be a line similar to ldtr:s=0x0000, dl=0x00000000, dh=0x00000000, valid=0. Please tell me what value "valid" is set to.
Candy

Re:Weird Infinite Loop

Post by Candy »

TheChuckster wrote: Found out some more information about this bug. The address is 0xFFFFFFFC8, but SS is set to a bogus 1B000023. 1B is what my code segment SHOULD be. What would cause it to end up there instead? The error code indicates that I am writing to a nonpresent page. Obviously a bogus address like this is nonpresent.

Most likely this is a stack problem. Wouldn't you agree? I will have to investigate.
Well, I think your stack is aligned differently when the code reads the values back than when they were put on the stack. That is imho the only reason for it to end up like that.
TheChuckster

Re:Weird Infinite Loop

Post by TheChuckster »

Now, this stack misalignment could be caused by what?

I checked my stack over and over; the numbers of pushes and pops are equal. I'm not doing anything weird with the esp pointer.

Could it be the fact that I'm pushing two extra items on the stack (esp, ss) in ring transitions as opposed to same ring interrupts? Could that be misaligning it? If so, should I fix it by pushing two blank items on the stack in their place? Oh wait, can't because they're right in the MIDDLE of the stuff pushed and popped by the CPU.

Took out my ring 0 kernel task. Same results. Maybe system calls? Pushes and pops are fine for them too.

Are you POSITIVE it's a stack misalignment?
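
For reference, the two frame shapes in question can be written down as C structs. This is a sketch of what the CPU pushes in 32-bit protected mode for an interrupt without an error code; the struct and function names are hypothetical. SS and ESP are pushed first on a privilege change, so they sit at the highest addresses of the frame, above EFLAGS:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lowest address first, i.e. in the order the handler sees them on the stack. */
struct frame_same_ring {
    uint32_t eip, cs, eflags;
};

struct frame_ring_change {
    uint32_t eip, cs, eflags;
    uint32_t esp, ss;  /* present only when the privilege level changes */
};

/* Bytes pushed by the CPU, given the interrupted CPL and the handler's CPL. */
size_t cpu_frame_bytes(unsigned cpl_before, unsigned cpl_handler)
{
    assert(cpl_before <= 3 && cpl_handler <= cpl_before);
    return cpl_handler < cpl_before ? sizeof(struct frame_ring_change)
                                    : sizeof(struct frame_same_ring);
}
```

A timer interrupt that hits ring 3 code gives the handler 20 bytes to pop with `iret`, while the same interrupt hitting ring 0 code gives it 12, so a task's initial stack has to be built to match the ring it will return to.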
Candy

Re:Weird Infinite Loop

Post by Candy »

TheChuckster wrote: Now, this stack misalignment could be caused by what?

I checked my stack over and over; the numbers of pushes and pops are equal. I'm not doing anything weird with the esp pointer.

Could it be the fact that I'm pushing two extra items on the stack (esp, ss) in ring transitions as opposed to same ring interrupts? Could that be misaligning it? If so, should I fix it by pushing two blank items on the stack in their place? Oh wait, can't because they're right in the MIDDLE of the stuff pushed and popped by the CPU.

Took out my ring 0 kernel task. Same results. Maybe system calls? Pushes and pops are fine for them too.

Are you POSITIVE it's a stack misalignment?
No, I'm not positive.

However, I can't see another way to make the last byte of an index appear as the first byte of something else you load off the stack. That's a single-byte offset, and in the world of everything-is-kind-of-aligned, that comes down to a stack misalignment OR a push-on-stack-misalignment.

Try to trace the code that puts the value on the stack.
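
To make the single-byte offset concrete, here is one hypothetical way the value 0x1B000023 can arise; it is an illustration of the byte arithmetic, not a diagnosis of the actual bug. The sketch is plain C that models a little-endian stack as a byte buffer, and all names in it are made up for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order at buf[off..off+3]. */
void put_le32(uint8_t *buf, size_t off, uint32_t v)
{
    assert(buf != NULL);
    for (size_t i = 0; i < 4; i++)
        buf[off + i] = (uint8_t)(v >> (8 * i));
}

/* Load a 32-bit little-endian value from buf[off..off+3]. */
uint32_t get_le32(const uint8_t *buf, size_t off)
{
    assert(buf != NULL);
    uint32_t v = 0;
    for (size_t i = 0; i < 4; i++)
        v |= (uint32_t)buf[off + i] << (8 * i);
    return v;
}

/* A store that lands one byte early clobbers the top byte of its neighbour. */
uint32_t misplaced_store_demo(void)
{
    uint8_t stack[8] = {0};
    put_le32(stack, 0, 0x23);   /* slot intended to hold the selector 0x23 */
    put_le32(stack, 3, 0x1B);   /* meant for offset 4, written at offset 3 */
    return get_le32(stack, 0);  /* the first slot now reads 0x1B000023 */
}
```

The low byte of the second value ends up as the high byte of the first, which is the pattern seen in the reported SS.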
TheChuckster

Re:Weird Infinite Loop

Post by TheChuckster »

Well I tried type-casting my initial stack values for each task to unsigned ints. The code still runs perfectly in ring 0 as always. If it were misaligned it wouldn't even do that.

Now I try throwing in a ring 3 task. Page fault as always. But this time SS = 0. Hmmm. Should my timer handler be pushing and popping SS as well? I don't know, but I tried it anyhow and I got a CPL != RPL error in the Bochs output log. Yuck. The system needs to temporarily hold a Ring 3 SS before ireting back into Ring 3 everything else. No?

Or maybe my TSS isn't working right? Somebody please check their ldtr value in Bochs for their OS, provided it does protection. Can we safely rule the TSS out of the possibilities?

This bug is unique in that you can never rule out any possibilities. I narrowed it down to the task switching code but that's it. It could be anything. The problem with introducing new code is that you get new exceptions. How do you know if they are replacing the current exception or supplementing it? You don't. And on any other platform besides Bochs (Qemu, real hardware), it triple faults.

Regarding the SS value though, am I at least HEADING in the right direction to fixing this? I'm sure nobody here has encountered this error before. At this point I'm ready to ask myself the question: Does my OS really NEED protection? The main problem is that I can't do VM86 mode without a functional TSS. I get a page fault when I try to do that as well.
TheChuckster

Re:Weird Infinite Loop

Post by TheChuckster »

Another question: Is an invalid SS a cause for a memory address to be bogus? Perhaps the stack has nothing to do with this at all...
JAAman

Re:Weird Infinite Loop

Post by JAAman »

Well, an invalid SS may point to the wrong base address, but if you only have one valid, ring-appropriate data segment, then it would more likely give you a GPF or Stack-Fault.
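
Whether a selector is "ring-appropriate" can be read off its low bits: bits 0 and 1 are the RPL, bit 2 selects GDT or LDT, and the remaining 13 bits are the descriptor index. A small sketch in C with hypothetical helper names; it assumes nothing about the GDT layout beyond the selector values 0x1B and 0x23 that appear in this thread:

```c
#include <assert.h>
#include <stdint.h>

/* Build a selector from descriptor index, table indicator and RPL. */
uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192);  /* 13-bit index */
    assert(ti <= 1);       /* 0 = GDT, 1 = LDT */
    assert(rpl <= 3);      /* requested privilege level */
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

unsigned selector_index(uint16_t sel) { return sel >> 3; }
unsigned selector_ti(uint16_t sel)    { return (sel >> 2) & 1u; }
unsigned selector_rpl(uint16_t sel)   { return sel & 3u; }
```

Here 0x1B decodes to GDT index 3 with RPL 3, and 0x23 to GDT index 4 with RPL 3, so both are ring 3 selectors. Loading SS with a selector whose RPL differs from the current privilege level is rejected by the CPU, which is consistent with the CPL != RPL message reported from Bochs earlier.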