Double fault after enabling interrupts
Posted: Fri Jul 19, 2024 7:41 pm
by glolichen
I'm working on using interrupts in an x86_64 OS, and it appears that a double fault is raised 3 instructions after enabling interrupts with sti. I have read the FAQ and reprogrammed the PIC using the code found at
https://wiki.osdev.org/8259_PIC. I do not think this is a real double fault because (1) qemulog does not show any other interrupt being raised and (2) a real double fault is supposed to push 0 to the stack as the error code and that is not happening for me.
I am using the GRUB 2 bootloader. I set up paging and the GDT and enable long mode in assembly before jumping to C code, where the IDT (and other interrupt handling code such as PIC remapping, interrupt requests, etc.) and the TSS are set up.
I would greatly appreciate it if someone looked over the code at
https://github.com/glolichen/os and told me what is wrong. Also, maybe it's true that the 8259 PIC or the remapping code used is only for 32 bit protected mode and doesn't work in long mode? I also copied a lot directly from old but previously working protected mode code, so the differences between 32 and 64 bit might be causing issues. Either way, please let me know the issue. Thank you.
Re: Double fault after enabling interrupts
Posted: Fri Jul 19, 2024 8:21 pm
by Octocontrabass
glolichen wrote: ↑Fri Jul 19, 2024 7:41 pm
qemulog does not show any other interrupt being raised
Can you share that log?
glolichen wrote: ↑Fri Jul 19, 2024 7:41 pm
a real double fault is supposed to push 0 to the stack as the error code and that is not happening for me.
How did you check that?
There are too many problems to list them all, but one that sticks out to me is your exception handlers. All of them are broken. For example, your double fault handler will display random garbage instead of the error code.
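For reference, here is a minimal C sketch of what a 64-bit handler finds on the stack when the vector pushes an error code. A handler that reads the wrong slot (or assumes an error code that was never pushed) ends up printing whatever happens to be there. The names exception_frame and vector_pushes_error_code are hypothetical and are not taken from the repository.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stack contents at handler entry in 64-bit mode, lowest address first,
 * for a vector that pushes an error code. For vectors without an error
 * code the frame starts at rip instead. */
struct exception_frame {
    uint64_t error_code;
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

static_assert(offsetof(struct exception_frame, rip) == 8,
              "return RIP sits directly above the error code");
static_assert(sizeof(struct exception_frame) == 48,
              "six 8-byte slots are pushed in 64-bit mode");

/* Vectors for which the CPU itself pushes an error code. */
static bool vector_pushes_error_code(unsigned vector)
{
    switch (vector) {
    case 8:  /* #DF, the error code is always 0 */
    case 10: /* #TS */
    case 11: /* #NP */
    case 12: /* #SS */
    case 13: /* #GP */
    case 14: /* #PF */
    case 17: /* #AC */
    case 21: /* #CP */
    case 29: /* #VC */
    case 30: /* #SX */
        return true;
    default:
        return false;
    }
}
```

A common approach is for the assembly stub of every other vector to push a dummy 0, so that the C handler can always use the same frame layout.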
glolichen wrote: ↑Fri Jul 19, 2024 7:41 pm
Also, maybe it's true that the 8259 PIC or the remapping code used is only for 32 bit protected mode and doesn't work in long mode?
No. Changing the CPU mode doesn't change the behavior of any hardware outside the CPU.
glolichen wrote: ↑Fri Jul 19, 2024 7:41 pm
I also copied a lot directly from old but previously working protected mode code, so the differences between 32 and 64 bit might be causing issues.
You shouldn't copy code you don't understand, even if it's your own code...
Re: Double fault after enabling interrupts
Posted: Fri Jul 19, 2024 9:15 pm
by glolichen
SMM: enter
EAX=000000b5 EBX=00008a80 ECX=00005678 EDX=00000002
ESI=00000000 EDI=000f1a47 EBP=0000fe54 ESP=000ece54
EIP=000f8a7d EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 000f7460 00000037
IDT= 000f749e 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=000000f0 CCD=000ece40 CCO=ADDL
EFER=0000000000000000
SMM: after RSM
EAX=000000b5 EBX=00008a80 ECX=00005678 EDX=00000002
ESI=00000000 EDI=000f1a47 EBP=0000fe54 ESP=000ece54
EIP=00008a80 EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =dd00 000dd000 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009e00
SS =dd00 000dd000 0000ffff 00009300
DS =dd00 000dd000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00008280 00000027
IDT= 00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=00000004 CCD=00000001 CCO=EFLAGS
EFER=0000000000000000
Servicing hardware INT=0x08
0: v=08 e=0000 i=0 cpl=0 IP=0008:ffffffff801010fe pc=ffffffff801010fe SP=0010:000000000090cfc0 env->regs[R_EAX]=ffffffff80915060
RAX=ffffffff80915060 RBX=000000000050d000 RCX=0000000000000000 RDX=ffffffff80914060
RSI=00000000000000ff RDI=0000000000000021 RBP=000000000090cff0 RSP=000000000090cfc0
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff801010fe RFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 0000000000000000 ffffffff 00af9300 DPL=0 DS [-WA]
CS =0008 0000000000000000 ffffffff 00af9a00 DPL=0 CS64 [-R-]
SS =0010 0000000000000000 ffffffff 00af9300 DPL=0 DS [-WA]
DS =0010 0000000000000000 ffffffff 00af9300 DPL=0 DS [-WA]
FS =0010 0000000000000000 ffffffff 00af9300 DPL=0 DS [-WA]
GS =0010 0000000000000000 ffffffff 00af9300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0018 ffffffff80105000 00000068 00008900 DPL=0 TSS64-avl
GDT= 0000000000105072 00000027
IDT= ffffffff80914060 00000fff
CR0=80000011 CR2=0000000000000000 CR3=0000000000106000 CR4=00000020
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000007 CCD=0000000080101eff CCO=LOGICL
EFER=0000000000000500
Last part of QEMU log above (the whole thing is too long and mostly irrelevant)
I added a magic breakpoint when the exception handler is called in bochs and looked at the stack. Nothing is pushed.
Octocontrabass wrote: ↑Fri Jul 19, 2024 8:21 pm
There are too many problems to list them all, but one that sticks out to me is your exception handlers. All of them are broken. For example, your double fault handler will display random garbage instead of the error code.
Yeah, I'm aware. I am fixing that currently but I don't think that is the cause of this issue. I would appreciate if you could tell me some more problems with the code; even just one or two of the most serious ones would be helpful. Thanks for your time looking into this!
Re: Double fault after enabling interrupts
Posted: Sun Jul 21, 2024 6:28 pm
by Octocontrabass
Huh, that is indeed a hardware interrupt using the wrong vector. You did have code to reprogram the interrupt controllers, and I didn't see anything obviously wrong with it earlier, but you've changed it since then. Did you end up fixing the problem?
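To illustrate why a hardware interrupt can show up as vector 0x08: with the legacy BIOS mapping the master 8259 delivers IRQ 0-7 on vectors 0x08-0x0F, so an unremapped timer tick (IRQ 0) arrives on the double fault vector. The sketch below only does the arithmetic. The function name and the remapped bases 0x20/0x28 are assumptions for illustration (they are the values commonly used, not necessarily the ones in the repository).

```c
#include <assert.h>
#include <stdint.h>

enum {
    BIOS_MASTER_BASE     = 0x08, /* legacy default for IRQ 0-7 */
    BIOS_SLAVE_BASE      = 0x70, /* legacy default for IRQ 8-15 */
    REMAPPED_MASTER_BASE = 0x20, /* assumed: first vector after the exceptions */
    REMAPPED_SLAVE_BASE  = 0x28  /* assumed: directly after the master */
};

static_assert(REMAPPED_MASTER_BASE >= 32,
              "vectors 0-31 are reserved for CPU exceptions");

/* Vector on which a given IRQ line (0-15) is delivered. */
static uint8_t irq_to_vector(unsigned irq, uint8_t master_base, uint8_t slave_base)
{
    if (irq < 8)
        return (uint8_t)(master_base + irq);
    return (uint8_t)(slave_base + (irq - 8));
}
```

With the BIOS bases, irq_to_vector(0, BIOS_MASTER_BASE, BIOS_SLAVE_BASE) is 0x08, which matches the "Servicing hardware INT=0x08" line in the log. With the remapped bases the same IRQ lands on 0x20.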
glolichen wrote: ↑Fri Jul 19, 2024 9:15 pm
I would appreciate if you could tell me some more problems with the code; even just one or two of the most serious ones would be helpful.
Lots of things still need to be rewritten because of differences between 32-bit and 64-bit mode, like your code for handling the GDT and paging.
The constraint in this inline asm should be "m" instead of "g".
The "d" operand modifier (like "%d0") doesn't make sense in either of the places you used it here; you probably wanted "w" instead. Additionally, adding the "N" constraint to the port number (like "Nd") would allow the compiler to generate smaller code when the port number is a compile-time constant that fits in 8 bits.
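The two inline asm points above could look roughly like this with GCC-style extended asm on x86. This is a sketch, not the repository's code: the struct and function names are hypothetical, and the functions only make sense in ring 0, so they are meant to be compiled into a kernel rather than called from a normal user-space program.

```c
#include <assert.h>
#include <stdint.h>

/* Operand of lidt in 64-bit mode: 16-bit limit followed by 64-bit base. */
struct idt_pointer {
    uint16_t limit;
    uint64_t base;
} __attribute__((packed));

static_assert(sizeof(struct idt_pointer) == 10,
              "lidt expects a 10-byte operand in long mode");

/* "m" forces a memory operand, which is the only form lidt accepts. */
static inline void load_idt(const struct idt_pointer *idtr)
{
    __asm__ volatile ("lidt %0" : : "m"(*idtr));
}

/* "%w1" prints the 16-bit register name (dx). "Nd" lets the compiler
 * use an 8-bit immediate instead when the port is a small constant. */
static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %b0, %w1" : : "a"(value), "Nd"(port) : "memory");
}

static inline uint8_t inb(uint16_t port)
{
    uint8_t value;
    __asm__ volatile ("inb %w1, %b0" : "=a"(value) : "Nd"(port) : "memory");
    return value;
}
```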
Setting the IOPB offset to 0 does not disable the IOPB. There are several other problems with your TSS, such as using physical addresses instead of virtual addresses, reserving only 8 bytes for each stack in the IST, reserving space for useless ring 1 and ring 2 stacks, and using a static allocation for the ring 0 stack. (Although that last one may not be a problem if you're aiming for some experimental kernel design that doesn't have a separate ring 0 stack for each thread.)
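As a rough sketch of the TSS points: the 64-bit TSS is 104 bytes, and an I/O permission bitmap is considered absent when the IOPB offset points at or beyond the TSS limit, which is why the offset is usually set to the size of the TSS rather than to 0. The names tss64 and tss_init are hypothetical, and the stack tops passed in are assumed to be virtual addresses of properly sized stacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit TSS layout. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* RSP0..RSP2, only RSP0 is normally used */
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7: pointers to stack tops */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iopb_offset;
} __attribute__((packed));

static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 lives at offset 0x24");
static_assert(offsetof(struct tss64, iopb_offset) == 0x66, "IOPB offset field");

/* rsp0_top and ist1_top are virtual addresses one past the end of each stack. */
static void tss_init(struct tss64 *tss, uint64_t rsp0_top, uint64_t ist1_top)
{
    memset(tss, 0, sizeof *tss);  /* unused RSP1/RSP2 and IST2..7 stay 0 */
    tss->rsp[0] = rsp0_top;
    tss->ist[0] = ist1_top;       /* e.g. a dedicated double fault stack */
    tss->iopb_offset = (uint16_t)sizeof *tss; /* past the end: no IOPB */
}
```

Note that each ist[] slot holds only a pointer. The stacks themselves need their own storage, typically a few KiB each.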
Your makefile has a bunch of compiler flags in the linker flags. Your compiler flags should use "-mgeneral-regs-only" instead of trying to individually disable all the different extended instruction sets. You probably should use "-mcmodel=kernel" instead of "-mcmodel=large". You probably shouldn't use "-fPIC".
Your linker script is missing wildcards.
Re: Double fault after enabling interrupts
Posted: Wed Jul 24, 2024 9:27 am
by glolichen
Octocontrabass wrote: ↑Sun Jul 21, 2024 6:28 pm
Did you end up fixing the problem?
Yes, it was likely some issue with the inb/outb C wrapper. I moved the PIC remapping code to the loader and changed the I/O wrappers to inline assembly. Also, I appreciate the advice on inline asm.
Octocontrabass wrote: ↑Sun Jul 21, 2024 6:28 pm
using physical addresses instead of virtual addresses
I thought after enabling paging, passing the label from the data section would give virtual addresses? The TSS start address passed to kmain is 0xFFFFFFFF80104000. Correct me if I am wrong though.
Regarding the IST, I haven't found much information on how to allocate it (on the forum, wiki and the amd/intel manuals), but am I right in saying that it should be allocated by the virtual memory manager/kmalloc? And if I'm loading the gdt in the 32 to 64 bit bootstrap, I do not need to do anything with it in C code? Thanks for your help.
Re: Double fault after enabling interrupts
Posted: Wed Jul 24, 2024 11:13 am
by nullplan
glolichen wrote: ↑Wed Jul 24, 2024 9:27 am
I thought after enabling paging, passing the label from the data section would give virtual addresses? The TSS start address passed to kmain is 0xFFFFFFFF80104000. Correct me if I am wrong though.
The address a symbol is mapped to is determined by the linker, so the exact address you get is the address the linker thought it would be. If you set up the linker correctly, then that is a virtual address, and indeed the address you gave looks like a correct virtual address for the kernel code model (where everything is linked to the -2GB line).
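A small sketch of the arithmetic behind the "-2GB line", assuming the kernel is linked at 0xFFFFFFFF80000000 and that this virtual base is mapped to physical address 0. The second part is an assumption about the page tables, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Start of the top 2 GiB of the 64-bit address space. */
#define KERNEL_VMA_BASE UINT64_C(0xFFFFFFFF80000000)

static_assert(KERNEL_VMA_BASE == (uint64_t)-(INT64_C(2) << 30),
              "the kernel code model links everything at -2 GiB");

/* True if addr lies in the range used by -mcmodel=kernel. */
static bool is_kernel_virtual(uint64_t addr)
{
    return addr >= KERNEL_VMA_BASE;
}

/* Assumes a fixed offset mapping: virtual KERNEL_VMA_BASE <-> physical 0. */
static uint64_t kernel_virt_to_phys(uint64_t virt)
{
    return virt - KERNEL_VMA_BASE;
}

static uint64_t kernel_phys_to_virt(uint64_t phys)
{
    return phys + KERNEL_VMA_BASE;
}
```

Under that assumption the TSS address from the post, 0xFFFFFFFF80104000, corresponds to physical 0x104000. A check like is_kernel_virtual() can also be applied to the values stored inside the TSS (RSP0 and the IST entries), which must be virtual addresses too.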
glolichen wrote: ↑Wed Jul 24, 2024 9:27 am
Regarding the IST, I haven't found much information on how to allocate it (on the forum, wiki and the amd/intel manuals), but am I right in saying that it should be allocated by the virtual memory manager/kmalloc? And if I'm loading the gdt in the 32 to 64 bit bootstrap, I do not need to do anything with it in C code? Thanks for your help.
Personally, I allocate the IST as part of the CPU descriptor. I have a "struct cpu" that contains all the cpu-local variables, and it includes a "struct arch_cpu" with the arch-specific stuff in it, and the latter one contains the GDT, the TSS, and the IST stacks. The purpose of this is to have one such CPU descriptor as part of the .bss section (for the BSP), and everything else simply be allocated with one malloc() call.
I do hope you mean the IST stacks. The IST itself is just part of the TSS.
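One way to read the layout described above, as a hosted C sketch: a per-CPU descriptor that embeds the GDT, the TSS and the IST stacks, with one static instance for the BSP and one allocation per additional CPU. The field names, the GDT entry count, the stack size and the use of calloc() as a stand-in for a kernel allocator are all assumptions for illustration, not nullplan's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define GDT_ENTRIES     7     /* assumed: null, 4 segments, 2 slots for the TSS */
#define IST_STACK_COUNT 7     /* IST1..IST7 */
#define IST_STACK_SIZE  4096  /* assumed size per stack */

struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];
    uint64_t reserved1;
    uint64_t ist[7];
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iopb_offset;
} __attribute__((packed));

static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");

/* Arch-specific per-CPU state: descriptor tables plus the stacks they use. */
struct arch_cpu {
    uint64_t gdt[GDT_ENTRIES];
    struct tss64 tss;
    _Alignas(16) uint8_t ist_stack[IST_STACK_COUNT][IST_STACK_SIZE];
};

struct cpu {
    unsigned id;
    struct arch_cpu arch;
};

/* Point every IST entry at the top of its own embedded stack. */
static void cpu_init(struct cpu *cpu, unsigned id)
{
    cpu->id = id;
    for (int i = 0; i < IST_STACK_COUNT; i++) {
        cpu->arch.tss.ist[i] =
            (uint64_t)(uintptr_t)(cpu->arch.ist_stack[i] + IST_STACK_SIZE);
    }
    cpu->arch.tss.iopb_offset = (uint16_t)sizeof(struct tss64);
}

/* The boot processor's descriptor lives in .bss. */
static struct cpu bsp_cpu;

/* Every other CPU gets its descriptor from a single allocation. */
static struct cpu *cpu_alloc(unsigned id)
{
    struct cpu *cpu = calloc(1, sizeof *cpu);
    if (cpu != NULL)
        cpu_init(cpu, id);
    return cpu;
}
```

The IST itself is just the seven pointers inside the TSS. The storage they point at is the ist_stack array, which is what actually needs allocating.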
Re: Double fault after enabling interrupts
Posted: Wed Jul 24, 2024 8:53 pm
by Octocontrabass
glolichen wrote: ↑Wed Jul 24, 2024 9:27 am
The TSS start address passed to kmain is 0xFFFFFFFF80104000. Correct me if I am wrong though.
Right, but what about the addresses in the TSS, like RSP0?