Paging Problem on some HW: triple fault on enabling

micccy
Posts: 11
Joined: Sat Sep 05, 2015 2:10 am
Libera.chat IRC: micccy
Location: Italy

Post by micccy »

Hi everyone, I'm an Italian student (so excuse me if my English is poor). I'm rewriting my very basic OS, trying to improve it a lot, and I decided to implement x86_64 support for the first time.
Since I'm currently working on the bootloader and writing it in assembly, my environment consists of NASM, Bochs, and other general-purpose tools like a hex editor, running on Linux Mint 18.3 'Sylvia' on an HP P6 Pavilion (2011) laptop. Everything was working very well: I was able to enter long mode, and I identity-mapped the first 16 GB of RAM using PSE (to give me access to all of memory until I write a proper memory manager). I then kept programming and started setting up my OS's environment (IRQ handlers, exception handlers, my system call, a basic I/O layer), when I decided to test it on real hardware (the same laptop I'm working on), booting from USB as a hard disk. It triple faulted. I tried a different laptop of the same series, with the same result. Then I tried different PCs, and everything was fine.
I started trying to work out which instruction was causing the trouble, and found that it was the MOV CR0,EAX that enables paging. So I started debugging my paging code and rewrote it as a very simple identity mapping of the first 2 MB without PSE, as seen in tutorials, but the problem persists. I tried a lot of small changes, even flipping the page global enable (PGE) bit, but I can't find the problem. I also cannot catch and handle the exception: my handlers never seem to be called (to be sure, I put hangs in the exception handlers), and the CPU resets immediately, as it would if I set a CR3 that is not page-aligned.
This is driving me mad, and I'm running out of ideas on how to debug it, so I thought I'd ask for help.
Here's my original paging setup code (now disabled):

Code:

; TABLES CREATION:
;                MOV             EDI,2000h                       ;Set up paging
;                MOV             CR3,EDI
;                MOV             ECX,4000h
;                XOR             EAX,EAX
;PGT_CLR_LP:     MOV             [EDI],AL                        ;Clearing the pages I'm going to use
;                LOOP            PGT_CLR_LP
;                MOV             EDI,2000h
;                MOV     DWORD   [EDI],3003h                     ;Setting PDPT0 in PML4T[0]
;                ADD             EDI,1000h
;                MOV     DWORD   [EDI],4003h                     ;Setting PDT0 in PDPT0[0]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],5003h                     ;Setting PDT1 in PDPT0[1]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],6003h                     ;Setting PDT2 in PDPT0[2]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],101003h                   ;Setting PDT3 in PDPT0[3]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],102003h                   ;Setting PDT4 in PDPT0[4]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],103003h                   ;Setting PDT5 in PDPT0[5]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],104003h                   ;Setting PDT6 in PDPT0[6]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],105003h                   ;Setting PDT7 in PDPT0[7]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],106003h                   ;Setting PDT8 in PDPT0[8]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],107003h                   ;Setting PDT9 in PDPT0[9]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],108003h                   ;Setting PDTA in PDPT0[A]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],109003h                   ;Setting PDTB in PDPT0[B]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],10A003h                   ;Setting PDTC in PDPT0[C]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],10B003h                   ;Setting PDTD in PDPT0[D]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],10C003h                   ;Setting PDTE in PDPT0[E]
;                ADD             EDI,08h
;                MOV     DWORD   [EDI],10D003h                   ;Setting PDTF in PDPT0[F]
;                MOV             EDI,4000h
;                MOV             EBX,00000083h
;                MOV             ECX,00000600h
;                PUSH            DBG32_4
;                CALL            DEBUG
;PGT_IDENTITY:   MOV             [EDI],EBX                       ;Setting up tables to identity map
;                ADD             EBX,200000h
;                ADD             EDI,0008h
;                LOOP            PGT_IDENTITY;

;                MOV             EDI,101000h
;                MOV             EBX,0C0000083h
;                MOV             ECX,1A00h
;                MOV             EDX,00h
;PGT_ID2:        MOV             [EDI],EBX
;                ADD             EDI,04h
;                MOV             [EDI],EDX
;                ADD             EBX,200000h
;                JO              PGT_IDO
;PGT_ID2C:       ADD             EDI,0004h
;                LOOP            PGT_ID2
;                JMP             LONGYS
;PGT_IDO:        ADD             EDX,01h
;                MOV             EBX,00h
;                JMP             PGT_ID2C
; SWITCHING TO LONG MODE:
LONGYS:         PUSH            DBG32_5
                CALL            DEBUG
                PUSH            DBG32_6
                CALL            DEBUG    
                MOV             EAX,CR4                         ;Switch to LONG MODE
                OR              EAX,110000b                     ;Setting PAE (bit 5) and PSE (bit 4)
                MOV             CR4,EAX
                PUSH            DBG32_7
                CALL            DEBUG
                MOV             ECX,0C0000080h                  ;Asking for EFER MSR (0xC0000080h)
                RDMSR                                           
                OR              EAX,100000000b                  ;Setting LM-bit (bit 8).
                WRMSR
                
                PUSH            DBG32_8
                CALL            DEBUG
                PUSH            DBG32_9
                CALL            DEBUG
                MOV             EAX,CR0
                OR              EAX,80000001h                   ;Setting PG (bit 31), keeping PE (bit 0)
                MOV             CR0,EAX                         ;<-- CRASHES HERE
                LGDT            [GDT64]                         ;LOADING GDT
                LIDT            [IDT64]                         ;Loading IDT
                JMP             08h:LONG_MODE                   ;Jumping to a 64 bit segment
                JMP             64_ERR

I know that's dodgy; in fact it was only a temporary solution, but it worked.
This is how I'm currently doing it:

Code:

; Setting up tables:
                MOV             EDI,2000h                       ;PML4 at 0x2000
                MOV             CR3,EDI                         ;Setting CR3
                XOR             EAX,EAX                         
                MOV             ECX,4000h
CLR_PAG:        MOV             [EDI],AL                        ;Wiping pages 2000h,3000h,4000h,5000h
                LOOP            CLR_PAG
                MOV             EDI,CR3
                MOV     DWORD   [EDI],3003h                     ;Setting PDPT0 in PML4[0]
                ADD             EDI,1000h
                MOV     DWORD   [EDI],4003h                     ;Setting PDT0 in PDPT0[0]
                ADD             EDI,1000h
                MOV     DWORD   [EDI],5003h                     ;Setting PT0 in PDT0[0]

                MOV             EBX,0003h                       ;Identity mapping first 512 pages in PT0
                MOV             ECX,0200h
                MOV             EDI,5000h
FILL_PT:        MOV             [EDI],EBX
                ADD             EBX,1000h
                ADD             EDI,08h
                LOOP            FILL_PT
; Switching to long mode:
LONGYS:         PUSH            DBG32_5
                CALL            DEBUG
                PUSH            DBG32_6
                CALL            DEBUG    
                MOV             EAX,CR4                         ;Switch to LONG MODE
                OR              EAX,10100000b                   ;Setting PGE(bit7) and PAE(bit5) 
                MOV             CR4,EAX
                PUSH            DBG32_7
                CALL            DEBUG
                MOV             ECX,0C0000080h                  ;Asking for EFER MSR (0xC0000080h)
                RDMSR                                           
                OR              EAX,100000000b                  ;Setting LM-bit (bit 8).
                WRMSR
                
                PUSH            DBG32_8
                CALL            DEBUG
                PUSH            DBG32_9
                CALL            DEBUG
                MOV             EAX,CR0
                OR              EAX,80000001h                   ;Setting PG (bit 31), keeping PE (bit 0)
                MOV             CR0,EAX                         ;<-- CRASHES HERE
                LGDT            [GDT64]                         ;Loading GDT
                LIDT            [IDT64]                         ;Loading IDT
                JMP             08h:LONG_MODE                   ;Jumping to a 64-bit segment
                JMP             NO_64

It crashes on the highlighted instruction. I should add that I have exception handlers for both 32-bit and 64-bit mode, and they are tested; that the OS checks for every capability it needs (long mode, PAE, PSE, extended CPUID, etc.); that I'm sure the problem is that instruction, because I put a hang before it and then after it; and that the PCs I'm testing on are theoretically capable of all of this. I really don't understand why this problem exists only on some HP laptops. I really hope you can help me sort it out... any kind of suggestion is welcome.
You learn more by a single triple fault than by reading the whole Intel specification...
nullplan
Member
Posts: 1801
Joined: Wed Aug 30, 2017 8:24 am

Re: Paging Problem on some HW: triple fault on enabling

Post by nullplan »

Your CLR_PAG loop is failing to advance EDI. Therefore you are not clearing 0x4000 bytes to 0, merely setting the byte at address 0x2000 to zero 0x4000 times. I would advise setting ECX to a quarter of its current value, replacing that loop with "REP STOSD", and being done with it.

I am not sure how that would cause a triple fault, though. Maybe a PF or a GPF if the processor was unhappy about something, OK, but you claim to catch these faults. And even if not, I would expect a DF exception, not a triple fault. Maybe the IDTR is not set up correctly?

BTW, if you are already in 32-bit mode, why are you writing in assembly and not in C?
iansjack
Member
Posts: 4706
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Paging Problem on some HW: triple fault on enabling

Post by iansjack »

The same mistake is made in the FILL_IDT loop. So the page table will be completely invalid. As soon as paging is enabled there will be a (page?) fault which will try to run an exception handler at a completely random address. Bingo! - triple fault.

This sort of error would be obvious if the code was written in C.
micccy

Re: Paging Problem on some HW: triple fault on enabling

Post by micccy »

OK, thank you. These were silly mistakes; I'll fix them. I'll answer you properly when I have a break.

UPDATE:
Thanks iansjack for replying, but the FILL_PT loop does have an ADD EDI,08h, so I don't think that's the problem.
Last edited by micccy on Thu Apr 26, 2018 7:04 am, edited 1 time in total.
micccy

Re: Paging Problem on some HW: triple fault on enabling

Post by micccy »

OK, so: I'll fix the CLR_PAG loop. That was a silly mistake I didn't notice, and it probably didn't cause problems because in Bochs and on some machines that memory area may already be zeroed. Also, I don't access memory above 2 MB until some lines later, where I read the ACPI tables. The fact that I can handle the page fault that happens there, plus the IDT check I did by entering "info idt" in the Bochs debugger, makes me think I set up the IDT properly for 64-bit long mode. I also tried to cause exceptions artificially (like reading from 0xFFFFFFFFFFFFFFFF, or doing XOR EBX,EBX / DIV BL), and they are handled properly, except for exceptions that happen between setting the LM bit and jumping to a 64-bit code segment (maybe because I don't have compatibility-mode exception handlers). It also seems that a failure while enabling paging locks up all of memory, subsequently causing page faults and a triple fault, but I'm not sure, so I'll check it out.
And to answer your question, there is no particular reason why I'm writing everything in assembly: I just like assembly a lot, and I thought it would be fun to write everything in it (even if it surely isn't the best choice). Thank you, I'll do some tests.
micccy

Re: Paging Problem on some HW: triple fault on enabling

Post by micccy »

nullplan wrote: I am not sure how that would cause a triple fault, though. Maybe a PF or a GPF if the processor was unhappy about something, OK, but you claim to catch these faults. And even if not, I would expect a DF exception, not a triple fault. Maybe the IDTR is not set up correctly?
FIXED:

I don't really understand why, but the processor on these machines doesn't like entering long mode before a proper long-mode GDT and long-mode IDT are loaded: to fix everything, it was enough to move the LGDT and LIDT instructions before (not after) the MOV CR0,EAX that enables paging.

Code:

; THIS WORKS WELL:

                PUSH            DBG32_8
                CALL            DEBUG
                PUSH            DBG32_9
                CALL            DEBUG

                LGDT            [GDT64]                         ;Loading 64bit GDT
                LIDT            [IDT64]                         ;Loading 64bit IDT
                MOV             EAX,CR0
                OR              EAX,80000001h                   ;Enabling Paging(bit31)
                MOV             CR0,EAX

                JMP             08h:LONG_MODE                   ;Jumping to a 64-bit code segment
                JMP             NO_64

; AND THIS TRIPLE FAULTS:

                PUSH            DBG32_8
                CALL            DEBUG
                PUSH            DBG32_9
                CALL            DEBUG

                MOV             EAX,CR0
                OR              EAX,80000001h                   ;Enabling Paging(bit31)
                MOV             CR0,EAX

                LGDT            [GDT64]                         ;Loading 64bit GDT
                LIDT            [IDT64]                         ;Loading 64bit IDT

                JMP             08h:LONG_MODE                   ;Jumping to a 64-bit code segment
                JMP             NO_64

Also, I can now tell that in the first case no exceptions are fired. Can anybody explain why this is happening? It seems very dodgy and overprotective to build a processor that immediately crashes on a mode change if it doesn't already have the proper GDT and IDT loaded. Also, this seems to happen only on HP Pavilion laptops, not on Bochs, other PCs, or even HP Pavilion desktops.
nullplan

Re: Paging Problem on some HW: triple fault on enabling

Post by nullplan »

Well, what is a poor processor to do? For some reason it is checking the validity of the GDT on the switch to long mode. That does go against the architecture manual, but hey, this is Intel.

So, when it sees a broken GDT, what will it do? Cause a GPF. In long mode, while the IDTR still points to the 32-bit GPF handler. So that fails and causes a DF. Again, the IDTR points to the protected-mode IDT, although the processor is in long mode. So that faults again, and there's your triple fault.
Carpe diem!