Re: Paging resources and wrong code
Posted: Mon Apr 12, 2021 1:08 am
by kzinti
Looks like you had another pagefault before that one. When debugging, always start with the first error. What's the very first pagefault you get?
Also what instruction is at 0xffffffff8010203a? What function is it? Is it really code? etc.
Re: Paging resources and wrong code
Posted: Mon Apr 12, 2021 1:35 am
by rpio
...
Re: Paging resources and wrong code
Posted: Mon Apr 12, 2021 6:52 am
by bzt
ngx wrote:The address in CR2 can't be linear, I certainly think it is physical as in my case it is
The address in CR2 is always a linear one. Having a weird address in that register might shed some light on why you're getting a page fault.
ngx wrote:I used the objdump -d on the kernel and got the instruction located there
This means you haven't mapped your code correctly, so as soon as you load the new page table, the next instruction can't be fetched. Make sure that you map your kernel (at least this function and its variables) at the same address as it's mapped in the old table. (In your case the PTE for ffffffff80102000 should point to the same physical page in the new table as in the old one, and the stack must be the same too to get the return address correctly.)
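To find which entry that is, it can help to split the faulting address into its table indices. A minimal sketch, assuming 4-level paging with 4 KiB pages; the struct and function names are made up for this example and are not from anyone's kernel:

```c
#include <stdint.h>

/* Table indices that a 4-level walk (4 KiB pages) uses for one linear address. */
struct pt_indices {
    unsigned pml4;    /* bits 47..39 */
    unsigned pdpt;    /* bits 38..30 */
    unsigned pd;      /* bits 29..21 */
    unsigned pt;      /* bits 20..12 */
    unsigned offset;  /* bits 11..0, byte offset inside the page */
};

/* split_linear is a hypothetical helper name used only in this sketch. */
static struct pt_indices split_linear(uint64_t va)
{
    struct pt_indices ix;
    ix.pml4   = (unsigned)((va >> 39) & 0x1ff);
    ix.pdpt   = (unsigned)((va >> 30) & 0x1ff);
    ix.pd     = (unsigned)((va >> 21) & 0x1ff);
    ix.pt     = (unsigned)((va >> 12) & 0x1ff);
    ix.offset = (unsigned)(va & 0xfff);
    return ix;
}
```

For 0xffffffff8010203a this gives PML4 index 511, PDPT index 510, PD index 0, PT index 258 and offset 0x3a, so that is the chain of entries to compare between the old and the new tables.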
Cheers,
bzt
Re: Paging resources and wrong code
Posted: Mon Apr 12, 2021 11:21 am
by rpio
...
Re: Paging resources and wrong code
Posted: Mon Apr 12, 2021 11:50 am
by nullplan
ngx wrote:The thing you said about stack got me thinking - if the faulty instruction is iretq which stores the return address on top of the stack and the last exception has a physical address in CR2 - then maybe I just haven't mapped the stack and that is why an error occurs?
No, IRETQ only reads from the stack. If the stack were faulty, the CPU would invoke the double fault handler. You would never get to the place you got to with a faulty stack.
ngx wrote: - Does the SP contain virtual/linear or physical address?
SP contains the linear address of the stack. A common solution is to map the stack to a predetermined address and reset SP after enabling paging. You must discard all stack references from before the address space switch anyway, so you might as well do it properly.
Re: Paging resources and wrong code
Posted: Mon Apr 12, 2021 1:02 pm
by kzinti
Just pointing out that the code you disassembled shows that the page fault is occurring on "retq", not on "iretq".
The stack looks fine, it's your code that is not mapped properly. The fact that the page fault happens on the very next instruction after you load CR3 gives it away. It doesn't matter what that instruction actually is.
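One way to confirm this from inside a page fault handler is to decode the error code the CPU pushes, next to the address in CR2. A rough sketch, assuming the handler already receives the raw error code as an integer; the struct and function names are invented for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Meaning of the low bits of the x86 page fault error code. */
struct pf_info {
    bool present;      /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;        /* bit 1: the access was a write */
    bool user;         /* bit 2: the access came from user mode */
    bool reserved_bit; /* bit 3: a reserved bit was set in a paging entry */
    bool instr_fetch;  /* bit 4: instruction fetch (only reported when NX or SMEP is enabled) */
};

/* decode_pf_error is a hypothetical helper name used only in this sketch. */
static struct pf_info decode_pf_error(uint64_t error_code)
{
    struct pf_info info;
    info.present      = (error_code & (1u << 0)) != 0;
    info.write        = (error_code & (1u << 1)) != 0;
    info.user         = (error_code & (1u << 2)) != 0;
    info.reserved_bit = (error_code & (1u << 3)) != 0;
    info.instr_fetch  = (error_code & (1u << 4)) != 0;
    return info;
}
```

An unmapped code page would typically show up with present clear and CR2 equal to the address of the instruction being fetched. Whether the instruction fetch bit is set depends on NX or SMEP being enabled, so don't rely on that bit alone.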
Re: Paging resources and wrong code
Posted: Mon Apr 12, 2021 1:08 pm
by rpio
...
Re: Paging resources and wrong code
Posted: Mon Apr 12, 2021 1:24 pm
by rpio
...
Re: Paging resources and wrong code
Posted: Mon Apr 12, 2021 2:52 pm
by kzinti
The double fault happens because you aren't handling the page fault.
Re: Paging resources and wrong code
Posted: Mon Apr 12, 2021 3:38 pm
by bzt
ngx wrote:Does the SP contain virtual/linear or physical address?
Unless specifically stated otherwise in the Intel manual, once the MMU is turned on, all instructions and all registers use linear addresses.
ngx wrote:How could I look at page mappings - is there any way to dump page tables for debugging?
Read the wiki page on kernel debugging. With qemu, start a monitor and use "info tlb", but that's very uncomfortable to use. There's a good reason why I suggested getting bochs working, because there the "page" command in the built-in debugger will do exactly what you want.
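For reference, what such a command computes for a single address is roughly the walk below. This is only a sketch under some assumptions: 4-level paging, and physical memory available as a flat array of 64-bit words starting at physical address 0 (for example a RAM dump). The function name and parameters are invented for the example and have nothing to do with how qemu or bochs implement it.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT   0x1ULL
#define PTE_HUGE      0x80ULL                /* PS bit: entry maps a large page */
#define PTE_ADDR_MASK 0x000ffffffffff000ULL  /* physical address bits 51..12 */

/*
 * Translate linear address va using the tables rooted at cr3.
 * phys_mem is a hypothetical image of physical memory, "frames" 4 KiB frames long.
 * Returns 1 and stores the physical address in *pa_out if va is mapped, 0 otherwise.
 */
static int walk_linear(const uint64_t *phys_mem, size_t frames,
                       uint64_t cr3, uint64_t va, uint64_t *pa_out)
{
    uint64_t table = cr3 & PTE_ADDR_MASK;

    /* level 3 = PML4, 2 = PDPT, 1 = PD, 0 = PT */
    for (int level = 3; level >= 0; level--) {
        unsigned shift = 12 + 9 * (unsigned)level;
        uint64_t frame = table >> 12;
        if (frame >= frames)
            return 0;                        /* table lies outside the image */

        uint64_t entry = phys_mem[frame * 512 + ((va >> shift) & 0x1ff)];
        if (!(entry & PTE_PRESENT))
            return 0;                        /* not mapped at this level */

        /* A PT entry, or a PDPT/PD entry with PS set, maps the page itself. */
        if (level == 0 || (level < 3 && (entry & PTE_HUGE))) {
            uint64_t page_mask = (1ULL << shift) - 1;
            *pa_out = (entry & PTE_ADDR_MASK & ~page_mask) | (va & page_mask);
            return 1;
        }
        table = entry & PTE_ADDR_MASK;       /* descend to the next table */
    }
    return 0;
}
```

Running it once with the old CR3 and once with the new one for ffffffff80102000 and comparing the two results is a quick way to check that the kernel code is mapped to the same physical page in both tables.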
Cheers,
bzt
Re: Paging resources and wrong code
Posted: Tue Apr 13, 2021 1:34 am
by rpio
...
Re: Paging resources and wrong code
Posted: Tue Apr 13, 2021 9:27 am
by bzt
ngx wrote:The bochs does not support UEFI unfortunately as I heard, so I will not be able to run my OS on it(does it support amd64 or only x86) or does it support UEFI?
Yes, bochs supports all operating modes (real, protected and long mode as well). Officially UEFI BIOS isn't supported, but multiple forum members have reported that they got it working. I'm sure if you open a topic with "using UEFI with bochs", someone will be able to help you with that. (I can't, because it takes more than half a minute to boot TianoCore, while all the other boot methods finish in less than a sec. Huge difference, especially in rapid development-test cycles. So I avoid UEFI as much as I can, and if I have to, I only use qemu+TianoCore, VB+UEFI and real hardware for testing. But most of the time I just don't care, I prefer coreboot or legacy BIOS for everyday OS testing as those are lightning fast. My loader is written in a way that by the time the kernel gains control, it doesn't matter at all what kind of firmware was used to load it.)
ngx wrote:But the good thing is that I managed to fix the page tables and no faults occur now
Glad to hear, well done!
Cheers,
bzt