Yet another "my paging is broke" thread

Combuster · Post by **Combuster** » Thu Dec 08, 2011 4:10 pm

rplacd wrote:Done that already - I've said it's the mov x, cr0 that's freaking out. I don't think it's changed. If you want a complete reaccounting of my methodology, by all means ask.

The symptom is a jump to (near) zero that is not an exception. mov creg can't jump. Therefore mov creg cannot possibly be the direct cause of the error. Since none of the control registers appears to be modified, it is also very unlikely to be related at all.

Similarly:

rplacd wrote:Some searching on the forum says you'll want a Bochs log so that's up at http://pastebin.com/79tiUvb7. It indicates a GPF

Bochs log says that's an outright lie. Interrupt 6 is #UD (it even says that it was generated by a misplaced lock prefix).
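As a quick reference for reading vector numbers out of a Bochs log, here is a minimal lookup sketch. The table is only a subset of the architecturally defined x86 exception vectors, and the helper name `describe_vector` is made up for illustration. It covers the vector 6 mentioned here and the 13 that shows up in the panic line quoted later in the thread.

```python
# Subset of the architecturally defined x86 exception vectors.
# Only the entries relevant to this thread (plus a few common ones) are listed.
X86_EXCEPTIONS = {
    0: ("#DE", "Divide Error"),
    6: ("#UD", "Invalid Opcode"),
    8: ("#DF", "Double Fault"),
    13: ("#GP", "General Protection"),
    14: ("#PF", "Page Fault"),
}


def describe_vector(vector: int) -> str:
    """Return a readable label for an exception vector (hypothetical helper)."""
    mnemonic, name = X86_EXCEPTIONS.get(vector, ("?", "not in this table"))
    return f"vector {vector}: {mnemonic} ({name})"


print(describe_vector(6))   # the interrupt named in the Bochs log
print(describe_vector(13))  # the number in the final panic line
```

A log can contain more than one of these, so it is worth noting which exception came first and which one the CPU eventually gave up on.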

And there's more like that, which makes me believe you're making things up (although that pattern is indistinguishable from a complete lack of the required skills).

rplacd · Post by **rplacd** » Fri Dec 09, 2011 3:28 am

Combuster wrote:
rplacd wrote:Done that already - I've said it's the mov x, cr0 that's freaking out. I don't think it's changed. If you want a complete reaccounting of my methodology, by all means ask.
The symptom is a jump to (near) zero that is not an exception. mov creg can't jump. Therefore mov creg cannot possibly be the direct cause of the error. Since none of the control registers appears to be modified, it is also very unlikely to be related at all.

I comment it out and everything's peachy, albeit without paging; I leave it in and things topple over. Obviously I'm missing something huge, so go and tell me what it is already.

Similarly:
Some searching on the forum says you'll want a Bochs log so that's up at http://pastebin.com/79tiUvb7. It indicates a GPF
Bochs log says that's an outright lie. Interrupt 6 is #UD (it even says that it was generated by a misplaced lock prefix).

And there's more like that, which makes me believe you're making things up (although that pattern is indistinguishable from a complete lack of the required skills).

Code: Select all

00165618066p[CPU0 ] >>PANIC<< exception(): 3rd (13) exception with no resolution

Triple-fault and GPF. But of course I've probably skipped over the bit that actually matters because I didn't understand it. In that case, mea culpa.

But accusing me of pure fraud is counterproductive. Why not just school me and get it over with? I'd rather spend time realizing what's wrong - what I came here for in the first place - rather than waste time trying to prove I'm not completely ignorant, but just selectively ignorant, which doesn't get me anywhere in either case.

Combuster · Post by **Combuster** » Fri Dec 09, 2011 3:51 am

rplacd wrote:Why not just school me and get it over with? I'd rather spend time realizing what's wrong.

I can school you, but I can't give you the answer. That's because you post a bunch of self-contradicting information that makes it practically impossible to establish the true cause of the error (besides mere guesswork). Basically, I'm stuck with only the Bochs log, because that's the only thing I believe I can trust without being tainted by distractions, misstatements and wrong interpretations.

Which means I have to teach you how to debug. Which starts with making yourself a build of Bochs with the debugger enabled, letting it run to the start of the kernel, then stepping through it one instruction at a time until you find the instruction that is the direct cause of the symptom (i.e. the jump to zero). Then you go look up what line of source it is, what it's meant to do, and what its actual effect is.

Then you restart the simulation and find the cause for your new symptom, analyse again what it's doing and what it is supposed to do and repeat. If done carefully enough you will at some point end up where the source code does not do what you want it to, and you can fix it.

That said, the Bochs debugger is merely one way of solving such a problem. It happens to work for practically everything, although many problems are diagnosed faster with different debugging methods. But since your choice of debugging methods obviously failed, I suggest you don't try taking shortcuts this time.
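The "step until you find the instruction that directly causes the symptom" routine can also be applied to a saved trace. The following is a rough sketch, not a Bochs feature: it parses single-step lines in the format that appears in the debugger output posted later in this thread, and reports the first non-branch instruction that is not followed by its fall-through address. The names `Step`, `parse_step` and `first_unexpected_transfer` are invented for this example, and the branch test is a crude heuristic.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Matches lines such as:
# (0) [0x0000000000100cb0] 0008:00100cb0 (unk. ctxt): mov cr0, eax   ; 0f22c0
STEP_RE = re.compile(
    r"^\(\d+\) \[0x([0-9a-fA-F]+)\] ([0-9a-fA-F]{4}):([0-9a-fA-F]+) "
    r"\(.*?\): (.*?)\s*; ([0-9a-fA-F]+)$"
)

# Mnemonics that are expected to change control flow (heuristic, not complete).
BRANCHES = {"call", "ret", "retf", "iret", "int", "loop"}


class Step(NamedTuple):
    phys: int     # physical address shown in brackets
    cs: int       # code segment selector
    ip: int       # instruction pointer within the segment
    text: str     # disassembly
    length: int   # instruction length, taken from the opcode bytes


def parse_step(line: str) -> Optional[Step]:
    """Parse one single-step line; return None for anything else."""
    m = STEP_RE.match(line.strip())
    if not m:
        return None
    phys, cs, ip, text, opcode = m.groups()
    return Step(int(phys, 16), int(cs, 16), int(ip, 16), text, len(opcode) // 2)


def is_branch(text: str) -> bool:
    mnemonic = text.split()[0].lower()
    return mnemonic.startswith("j") or mnemonic in BRANCHES


def first_unexpected_transfer(lines: List[str]) -> Optional[Tuple[Step, Step]]:
    """Return (culprit, next_step) for the first non-branch that did not fall through."""
    steps = [s for s in map(parse_step, lines) if s is not None]
    for prev, cur in zip(steps, steps[1:]):
        expected = (prev.cs, prev.ip + prev.length)
        if (cur.cs, cur.ip) != expected and not is_branch(prev.text):
            return prev, cur
    return None


# Two lines copied from the debugger output posted later in this thread.
trace = [
    "(0) [0x0000000000100cb0] 0008:00100cb0 (unk. ctxt): mov cr0, eax              ; 0f22c0",
    "(0) [0x00000000fffffff0] f000:fff0 (unk. ctxt): jmp far f000:e05b         ; ea5be000f0",
]
hit = first_unexpected_transfer(trace)
if hit:
    prev, cur = hit
    print(f"{prev.text!r} at {prev.cs:04x}:{prev.ip:08x} was followed by {cur.cs:04x}:{cur.ip:08x}")
```

A hit only says where execution left the expected path. The underlying cause (an exception, a reset, or bad state set up much earlier) still has to be found by inspecting registers and tables at that point.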

rplacd · Post by **rplacd** » Fri Dec 09, 2011 9:48 am

Thank you for that - genuinely helpful. I would rather you just said "I don't think you're giving me the right information, so here's how to debug reliably" instead of subjecting me to some inscrutable grudge test to get to the point. But I've waited to bring back something concrete, and for that matter I have to grovel at your feet and say you were right - apart from pointing the finger at the C++. It was actually my own idiocy:

Two issues:
- A mangled far jump way back in the GDT code, which I'd accidentally written in Intel syntax - it assembled without complaint as a near jump to address 0x08, rather than a far jump to an address in segment 0x08. How I missed that earlier is my own fault.

- An actual jump, shock horror, after the line in such contention:

Code: Select all

(0) [0x0000000000100cb0] 0008:00100cb0 (unk. ctxt): mov cr0, eax              ; 0f22c0
<bochs:108> s
Next at t=236613115
(0).[236613115] ??? (physical address not available)
<bochs:109> s
(0).[236613115] ??? (physical address not available)
Next at t=236613116
(0) [0x00000000fffffff0] f000:fff0 (unk. ctxt): jmp far f000:e05b         ; ea5be000f0
<bochs:110> creg
CR0=0x60000010: pg CD NW ac wp ne ET ts em mp pe
CR2=page fault laddr=0x00000000
CR3=0x0000000000000000
    PCD=page-level cache disable=0
    PWT=page-level write-through=0
CR4=0x00000000: smep osxsave pcid fsgsbase smx vmx osxmmexcpt osfxsr pce pge mce pae pse de tsd pvi vme
EFER=0x00000000: ffxsr nxe lma lme sce

But one reboot and a strategically placed breakpoint later...

Code: Select all

<bochs:8> creg
CR0=0x60000011: pg CD NW ac wp ne ET ts em mp PE
CR2=page fault laddr=0x00000000
CR3=0x0000000000107000
    PCD=page-level cache disable=0
    PWT=page-level write-through=0
CR4=0x00000008: smep osxsave pcid fsgsbase smx vmx osxmmexcpt osfxsr pce pge mce pae pse DE tsd pvi vme - see this?
EFER=0x00000000: ffxsr nxe lma lme sce
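For anyone reading these dumps without the Bochs flag legend at hand (uppercase means set), the raw values can be decoded by bit position. This is a small sketch using the Intel bit names; `set_flags` is a hypothetical helper, and the tables list only commonly documented bits.

```python
# Bit positions of CR0 and CR4 flags, using the Intel SDM names.
# (Bochs prints the CR0 bit 18 flag as "ac"; Intel calls it AM.)
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
CR4_BITS = {0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
            6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT"}


def set_flags(value: int, layout: dict) -> list:
    """Return the names of the flags set in value, lowest bit first."""
    return [name for bit, name in sorted(layout.items()) if (value >> bit) & 1]


print(set_flags(0x60000010, CR0_BITS))  # CR0 in the first dump
print(set_flags(0x60000011, CR0_BITS))  # CR0 in the second dump
print(set_flags(0x00000008, CR4_BITS))  # CR4 in the second dump
```

Two observations that follow from the values alone: 0x60000010 is the value CR0 has after a CPU reset, which fits the f000:fff0 reset vector in the first trace, and the only CR4 flag set in the second dump is DE (bit 3), whose neighbour PSE is bit 4 (0x10). The thread does not say which CR4 value was intended.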

And thank god for AutoHotkey. Consider this thread closed and my lesson learned - caveman debugging doesn't always work! (And sorry for leading everyone else on a wild-goose chase.)
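On the first issue, the near versus far jump mix-up: the two forms are different instructions with different encodings, which is why the assembler had nothing to complain about. The sketch below builds both encodings by hand. It is illustrative only: the function names and the 0x100000 addresses are assumptions, and the thread does not show what the assembler actually emitted. The 16-bit far jump is checked against the opcode bytes visible in the trace above (ea5be000f0).

```python
import struct


def encode_far_jmp(selector: int, offset: int, operand_bits: int = 32) -> bytes:
    """Direct far jump: opcode EA, then the offset, then the 16-bit selector."""
    if operand_bits == 16:
        fmt = "<BHH"
    elif operand_bits == 32:
        fmt = "<BIH"
    else:
        raise ValueError("operand_bits must be 16 or 32")
    return struct.pack(fmt, 0xEA, offset, selector)


def encode_near_jmp_rel32(from_addr: int, target: int) -> bytes:
    """Near jump: opcode E9, displacement relative to the next instruction."""
    disp = (target - (from_addr + 5)) & 0xFFFFFFFF
    return struct.pack("<BI", 0xE9, disp)


# Matches "jmp far f000:e05b ; ea5be000f0" from the trace.
print(encode_far_jmp(0xF000, 0xE05B, operand_bits=16).hex())

# Hypothetical: far jump to offset 0x100000 through selector 0x08 (reloads CS).
print(encode_far_jmp(0x08, 0x00100000).hex())

# Hypothetical: near jump from 0x100000 to absolute address 0x08 (CS unchanged).
print(encode_near_jmp_rel32(0x00100000, 0x08).hex())
```

Only the far form reloads CS. The near form leaves CS untouched and lands at address 0x08 of the current code segment, which is consistent with the "jump to (near) zero" symptom discussed at the top of the thread.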

OSDev.org
