
[SOLVED] GPF after receiving INT 0x20

Posted: Mon Aug 05, 2024 9:01 am
by avcado
Hello. I've been working on a toy UEFI 64-bit operating system for the past week. I've implemented both a working GDT and IDT (and ISRs). Next on the list were IRQs, which I thought I could set up the same way:

Code: Select all

void init_irqs(){
  for(uint8_t vector = 32; vector < 48; vector++){
    idt_set_descriptor(vector, irq_stub_table[vector], 0x8e);
    vectors[vector] = true;
  }
}
The assembly code that loads the IRQ handler (and also defines irq_stub_table) looks like this:

Code: Select all


%macro IRQ 2
irq_stub_%1:
  cli
  push byte 0
  push byte %2
  jmp irq_cstb
%endmacro

IRQ   0,    32
IRQ   1,    33
IRQ   2,    34
IRQ   3,    35
IRQ   4,    36
IRQ   5,    37
IRQ   6,    38
IRQ   7,    39
IRQ   8,    40
IRQ   9,    41
IRQ  10,    42
IRQ  11,    43
IRQ  12,    44
IRQ  13,    45
IRQ  14,    46
IRQ  15,    47

extern irq_handler
irq_cstb:
  push rdi
  push rsi
  push rbp
  push rsp
  push rbx
  push rdx
  push rcx
  push rax

  mov ax, ds
  push rax

  mov ax, 0x10
  mov ds, ax
  mov es, ax
  mov fs, ax
  mov gs, ax

  call irq_handler

  pop rbx
  mov ds, bx
  mov es, bx
  mov fs, bx
  mov gs, bx

  pop rax
  pop rcx
  pop rdx
  pop rbx
  pop rsp
  pop rbp
  pop rsi
  pop rdi

  add rsp, 16
  iretq


global irq_stub_table
irq_stub_table:
%assign j 0
%rep 16
  dq irq_stub_%+j
%assign j j+1
%endrep
When I boot up my kernel, using -d int -M smm=off, the relevant interrupt logs look like this:

Code: Select all

Servicing hardware INT=0x20
   560: v=20 e=0000 i=0 cpl=0 IP=0008:0000000001000b01 pc=0000000001000b01 SP=0010:0000000007f07258 env->regs[R_EAX]=0000000000000000
RAX=0000000000000000 RBX=0000000006626318 RCX=0000000000000000 RDX=0000000000000010
RSI=40c766f8458b4810 RDI=0000000007f07176 RBP=0000000007f07268 RSP=0000000007f07258
R8 =0000000000000000 R9 =000000000000018d R10=0000000000000001 R11=0000000007f00c38
R12=0000000000000000 R13=0000000006626d18 R14=0000000000000000 R15=000000000659f960
RIP=0000000001000b01 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 0000000000000000 00000fff 00c09300 DPL=0 DS   [-WA]
CS =0008 0000000000000000 00000fff 00a09a00 DPL=0 CS64 [-R-]
SS =0010 0000000000000000 00000fff 00c09300 DPL=0 DS   [-WA]
DS =0010 0000000000000000 00000fff 00c09300 DPL=0 DS   [-WA]
FS =0010 0000000000000000 00000fff 00c09300 DPL=0 DS   [-WA]
GS =0010 0000000000000000 00000fff 00c09300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     0000000001000910 00000017
IDT=     0000000001005920 00000fff
CR0=80010031 CR2=0000000000000000 CR3=0000000007c01000 CR4=00000668
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS
EFER=0000000000000d00
check_exception old: 0xffffffff new 0xd
   561: v=0d e=0000 i=0 cpl=0 IP=0008:81e800000021bf00 pc=81e800000021bf00 SP=0010:0000000007f07228 env->regs[R_EAX]=0000000000000000
RAX=0000000000000000 RBX=0000000006626318 RCX=0000000000000000 RDX=0000000000000010
RSI=40c766f8458b4810 RDI=0000000007f07176 RBP=0000000007f07268 RSP=0000000007f07228
R8 =0000000000000000 R9 =000000000000018d R10=0000000000000001 R11=0000000007f00c38
R12=0000000000000000 R13=0000000006626d18 R14=0000000000000000 R15=000000000659f960
RIP=81e800000021bf00 RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 0000000000000000 00000fff 00c09300 DPL=0 DS   [-WA]
CS =0008 0000000000000000 00000fff 00a09a00 DPL=0 CS64 [-R-]
SS =0010 0000000000000000 00000fff 00c09300 DPL=0 DS   [-WA]
DS =0010 0000000000000000 00000fff 00c09300 DPL=0 DS   [-WA]
FS =0010 0000000000000000 00000fff 00c09300 DPL=0 DS   [-WA]
GS =0010 0000000000000000 00000fff 00c09300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     0000000001000910 00000017
IDT=     0000000001005920 00000fff
CR0=80010031 CR2=81e800000021bf00 CR3=0000000007c01000 CR4=00000668
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS
EFER=0000000000000d00
The code at IP=0008:0000000001000b01 is my idle while loop:

Code: Select all

(qemu) x/2i 0x1000b01
0x01000b00:  f4                       hlt      
0x01000b01:  eb fd                    jmp      0x1000b00
Not entirely sure what the issue is. I do remap the PIC after loading the interrupt table but before setting the interrupt flag with sti.

The source code is https://github.com/mxtlr/mmlv.

Re: GPF after receiving INT 0x20

Posted: Mon Aug 05, 2024 10:33 am
by nullplan
The first block of messages is regarding your hardware interrupt. Since an interrupt happened while processing a hlt instruction, the RIP of the interrupt points to the instruction following the hlt. This is fine.

But in the second block, you get a GPF (exception 0xd) with error code 0 (so it wasn't because of some segment), with the IP being

Code: Select all

RIP=81e800000021bf00
Well, if I was a CPU, I wouldn't want to execute at that address, either, since it is not canonical. I'm guessing the format of your interrupt gate for interrupt 0x20 is not correct. Are you aware that interrupt gates in x86_64 are long descriptors, i.e. take up 16 bytes? 8 bytes mostly following the protected mode format, followed by 4 bytes extension for the code offset, and 4 zero bytes for padding.
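For reference, here is a minimal C sketch of the 16-byte gate layout described above, plus a check for why that RIP is not canonical. The names (idt_gate64_t, make_gate, gate_offset, is_canonical48) are made up for illustration and are not taken from the mmlv repository, and the canonical check assumes 48-bit virtual addresses (no 5-level paging).

```c
#include <stdbool.h>
#include <stdint.h>

/* One long-mode IDT gate. Field names are illustrative. */
typedef struct {
    uint16_t offset_low;   /* handler address bits 0..15             */
    uint16_t selector;     /* code segment selector, e.g. 0x08       */
    uint8_t  ist;          /* bits 0..2: IST index, other bits zero  */
    uint8_t  type_attr;    /* e.g. 0x8e: present, DPL 0, interrupt gate */
    uint16_t offset_mid;   /* handler address bits 16..31            */
    uint32_t offset_high;  /* handler address bits 32..63            */
    uint32_t reserved;     /* zero padding                           */
} idt_gate64_t;

/* The fields are naturally aligned, so no padding is expected. */
_Static_assert(sizeof(idt_gate64_t) == 16, "64-bit IDT gates are 16 bytes");

/* Split a 64-bit handler address across the three offset fields. */
static inline idt_gate64_t make_gate(uint64_t handler, uint16_t selector,
                                     uint8_t type_attr) {
    idt_gate64_t g = {
        .offset_low  = (uint16_t)(handler & 0xffff),
        .selector    = selector,
        .ist         = 0,
        .type_attr   = type_attr,
        .offset_mid  = (uint16_t)((handler >> 16) & 0xffff),
        .offset_high = (uint32_t)(handler >> 32),
        .reserved    = 0,
    };
    return g;
}

/* Reassemble the handler address the CPU would jump to. */
static inline uint64_t gate_offset(const idt_gate64_t *g) {
    return (uint64_t)g->offset_low
         | ((uint64_t)g->offset_mid << 16)
         | ((uint64_t)g->offset_high << 32);
}

/* Canonical with 48-bit addressing: bits 63..47 are all 0 or all 1. */
static inline bool is_canonical48(uint64_t addr) {
    uint64_t top = addr >> 47;          /* the upper 17 bits */
    return top == 0 || top == 0x1ffff;
}
```

With this, is_canonical48(0x81e800000021bf00) is false, while the earlier IP 0x1000b01 passes.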

Re: GPF after receiving INT 0x20

Posted: Mon Aug 05, 2024 11:46 am
by Octocontrabass
Part of the problem is that you're copying code from a buggy tutorial (or from someone who followed a buggy tutorial) without fixing any of the bugs.

Part of the problem is that the tutorial is 32-bit, and the System V i386 psABI is very different from the x64 psABI.

Your interrupt gates are fine, though.

Re: GPF after receiving INT 0x20

Posted: Tue Aug 06, 2024 10:18 am
by avcado
nullplan wrote: Mon Aug 05, 2024 10:33 am Are you aware that interrupt gates in x86_64 are long descriptors, i.e. take up 16 bytes? 8 bytes mostly following the protected mode format, followed by 4 bytes extension for the code offset, and 4 zero bytes for padding.
Yes, which is why I have

Code: Select all

add rsp, 16
in both the IRQ and ISR stubs. It seems that when I set up the IRQ stub table, repeating 15 times instead of 16 does not generate a GPF, i.e.

Code: Select all

global irq_stub_table
irq_stub_table:
%assign j 0
%rep 15
  dq irq_stub_%+j
%assign j j+1
%endrep
I find this weird, and it's probably a bug (as Octocontrabass pointed out). So, since I don't see any v=0xd in my QEMU logs, I set up a PIT handler that just outputs "INT 20" to the console, but nothing appears.

It looks like, for some reason, my IRQ handler is not being called. I'll look into it and see if I can fix it.

Re: GPF after receiving INT 0x20

Posted: Tue Aug 06, 2024 10:36 am
by nullplan
avcado wrote: Tue Aug 06, 2024 10:18 am Yes, which is why I have
That has nothing to do with it. I was talking about the size of the IDT gate descriptor. And the reason I asked was that the high part of the fault IP looked vaguely like the low part of a gate descriptor.

Well, I'd love to help you further but I cannot access your code. Clicking on the link gets me a 404. Thus I cannot check your IDT, which I suspect doesn't quite work right. Or else maybe it's your stubs.

Re: GPF after receiving INT 0x20

Posted: Tue Aug 06, 2024 10:43 am
by avcado
nullplan wrote: Tue Aug 06, 2024 10:36 am Well, I'd love to help you further but I cannot access your code. Clicking on the link gets me a 404. Thus I cannot check your IDT, which I suspect doesn't quite work right. Or else maybe it's your stubs.
Yeah, that's on me. I forgot an "r" in my username. The link is https://github.com/mxtlrr/mmlv
nullplan wrote: Tue Aug 06, 2024 10:36 am That has nothing to do with it. I was talking about the size of the IDT gate descriptor. And the reason I asked was that the high part of the fault IP looked vaguely like the low part of a gate descriptor.
I see. I followed the "Interrupts Tutorial" on the OSDev forum, but my thought was that if the IDT gate descriptor were wrong, then the ISRs shouldn't work either (but they do). Maybe I'm wrong though.

Re: GPF after receiving INT 0x20

Posted: Tue Aug 06, 2024 11:19 am
by nullplan
So I had a look at your code now. And I saw the following issues:
  1. Your idt_entry_t is indeed correct. That was a red herring.
  2. irq_handler() and exception_handler() both take a registers_t by value. I don't know how the ABI for that works. I'd avoid it if possible. Change them to pointers, and set rdi to rsp after pushing. Otherwise the parameter memory belongs to the called function, which is not required to leave that memory unchanged.
  3. Your registers_t is wrong. In 64-bit mode, the pushes always push 64 bits, and the interrupt frame consists of 64-bit integers exclusively. That includes the segment selectors: their high 48 bits may be set to garbage, but they take up 64 bits nonetheless.
  4. Both of your interrupt stubs fail to save and restore r8-r15. Did you forget about them?
  5. isr_cstb is missing the "add rsp, 16" that you mentioned.

Re: GPF after receiving INT 0x20

Posted: Tue Aug 06, 2024 12:02 pm
by avcado
nullplan wrote: Tue Aug 06, 2024 11:19 am So I had a look at your code now. And I saw the following issues:
  1. Your idt_entry_t is indeed correct. That was a red herring.
  2. irq_handler() and exception_handler() both take a registers_t by value. I don't know how the ABI for that works. I'd avoid it if possible. Change them to pointers, and set rdi to rsp after pushing. Otherwise the parameter memory belongs to the called function, which is not required to leave that memory unchanged.
  3. Your registers_t is wrong. In 64-bit mode, the pushes always push 64 bits, and the interrupt frame consists of 64-bit integers exclusively. That includes the segment selectors: their high 48 bits may be set to garbage, but they take up 64 bits nonetheless.
  4. Both of your interrupt stubs fail to save and restore r8-r15. Did you forget about them?
  5. isr_cstb is missing the "add rsp, 16" that you mentioned.
So, I've added "add rsp, 16" to isr_cstb. Not entirely sure what you mean by points 2 & 3. I modified the registers_t struct to look like this:

Code: Select all

typedef struct registers {
	uint64_t rdi, rsi, rbp, rsp, rbx, rdx, rcx, rax;
	uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
	uint64_t ds, int_no, err_code;
	uint64_t rip;
} registers_t;
(I did push r8-r15 to stack for both isr_cstb and irq_cstb). Oddly, I'm getting the exact same "one interrupt" issue.

I removed the "asm('hlt')" and added a print statement to the while(1) loop. For some reason, RIP goes to 0x1bXXXXXXX and seems to be executing random code (which shouldn't happen):

Code: Select all

(qemu) x/1i $eip
0x1bf7b7ffe: unable to read memory
It seems like the code isn't even REACHING my irq handler, as I have a print statement that should output "IRQ recv'd". But it doesn't, and instead goes off to some random memory address.

Re: GPF after receiving INT 0x20

Posted: Tue Aug 06, 2024 3:51 pm
by Octocontrabass
avcado wrote: Tue Aug 06, 2024 12:02 pm Not entirely sure what you mean by points 2
Like this:

Code: Select all

mov rdi, rsp
call irq_handler

...

void irq_handler(registers_t * r)
avcado wrote: Tue Aug 06, 2024 12:02 pm I modified the registers_t struct to look like this:
It's good that everything is 64-bit now, but the struct layout doesn't match the order things were pushed on the stack.
avcado wrote: Tue Aug 06, 2024 12:02 pm (I did push r8-r15 to stack for both isr_cstb and irq_cstb).
Is there any particular reason why you have two nearly-identical common ISR stubs instead of just one?
avcado wrote: Tue Aug 06, 2024 12:02 pm I removed the "asm('hlt')" and added a print statement to the while(1) loop. For some reason, RIP goes to 0x1bXXXXXXX and seems to be executing random code (which shouldn't happen):
Part of the problem might be that you're not using a cross-compiler and not disabling the red zone.

Another QEMU interrupt log would probably be more helpful for debugging it, though.
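To make the pointer-passing and single-stub suggestions concrete, here is a rough C sketch of one handler that a single common stub could call after "mov rdi, rsp". Everything in it is an assumption for illustration: the frame is trimmed to the two fields the dispatcher reads, and register_irq, timer_tick, interrupt_handler, last_exception and ticks are hypothetical names, not code from the repository.

```c
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down frame: only the two fields the dispatcher reads.
   A real registers_t must list every pushed value, in stack order. */
typedef struct registers {
    uint64_t int_no;
    uint64_t err_code;
} registers_t;

typedef void (*irq_fn)(registers_t *r);

static irq_fn irq_handlers[16];        /* hypothetical per-IRQ callbacks */
uint64_t last_exception = UINT64_MAX;  /* last exception vector seen     */
uint64_t ticks;                        /* bumped by the example callback */

/* Hypothetical registration helper; ignores out-of-range IRQ numbers. */
void register_irq(unsigned irq, irq_fn fn) {
    if (irq < 16)
        irq_handlers[irq] = fn;
}

/* Example callback for IRQ 0 (the PIT). */
void timer_tick(registers_t *r) {
    (void)r;
    ticks++;
}

/* Called from the one common stub. The frame is received by pointer,
   so the compiler never owns the stack memory the stub restores from. */
void interrupt_handler(registers_t *r) {
    if (r->int_no < 32) {
        /* CPU exception; a real kernel would report it and halt. */
        last_exception = r->int_no;
    } else if (r->int_no < 48) {
        irq_fn fn = irq_handlers[r->int_no - 32];
        if (fn != NULL)
            fn(r);
        /* A real handler would also acknowledge the interrupt
           controller here before returning. */
    }
}
```

Both the exception stubs and the IRQ stubs can then jump to the same common stub, since the pushed interrupt number is enough to tell them apart.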

Re: GPF after receiving INT 0x20

Posted: Tue Aug 06, 2024 7:48 pm
by MichaelPetch
In idt.asm you have:

Code: Select all

%macro isr_err_stub 1
isr_stub_%+%1:
  jmp isr_cstb
%endmacro

%macro isr_no_err_stub 1
isr_stub_%+%1:
  push byte %1
  jmp isr_cstb
%endmacro
In `isr_cstb` you are adding 16 to rsp at the end. This assumes though that you have the extra 16 bytes of data on the stack. You don't. With `%macro isr_err_stub 1` you don't push the ISR number, and in `isr_no_err_stub 1` you don't push a dummy error code (people prefer setting it to 0 in that case). So they should look like this:

Code: Select all

%macro isr_err_stub 1
isr_stub_%+%1:
  push byte %1
  jmp isr_cstb
%endmacro

%macro isr_no_err_stub 1
isr_stub_%+%1:
  push byte 0
  push byte %1
  jmp isr_cstb
%endmacro
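Which macro a vector needs depends on whether the CPU itself pushes an error code for it. As a hedged reference, the helper below lists the vectors that do, to the best of my reading of the Intel and AMD manuals; the function name is made up, and the list is worth checking against the current manuals before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true for exception vectors where the CPU pushes an error
   code itself, so the stub must NOT push a dummy one. */
static inline bool cpu_pushes_error_code(uint8_t vector) {
    switch (vector) {
    case 8:   /* #DF double fault              */
    case 10:  /* #TS invalid TSS               */
    case 11:  /* #NP segment not present       */
    case 12:  /* #SS stack-segment fault       */
    case 13:  /* #GP general protection fault  */
    case 14:  /* #PF page fault                */
    case 17:  /* #AC alignment check           */
    case 21:  /* #CP control protection        */
    case 29:  /* #VC VMM communication (AMD)   */
    case 30:  /* #SX security exception (AMD)  */
        return true;
    default:  /* everything else, including IRQ vectors 32..47 */
        return false;
    }
}
```

Vectors for which this returns true would use isr_err_stub; all others, including the IRQs, need the dummy error code.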


There is a similar issue for IRQs. This macro:

Code: Select all

%macro IRQ 2
irq_stub_%1:
  push byte %2
  jmp irq_cstb
%endmacro
should be:

Code: Select all

%macro IRQ 2
irq_stub_%1:
  push byte 0
  push byte %2
  jmp irq_cstb
%endmacro
I've also removed the unneeded CLI from the IRQ macro, since with interrupt gates the CPU clears the interrupt flag automatically on entry.

In idtr.c you enable interrupts at the end of `idt_init`, which is bad because you haven't yet filled in the addresses of the interrupt handlers. Move the `STI` instruction from `idt_init` to the end of `init_irqs`.

I think the strange RIP you see in the #GP (0x0d) exception log and the fact your IRQ stubs aren't being called is because of another bug in idtr.c. In init_irqs you have:

Code: Select all

   for(uint8_t vector = 32; vector < 48; vector++){
    idt_set_descriptor(vector, irq_stub_table[vector], 0x8e);
    vectors[vector] = true;
  }
The problem is irq_stub_table has entries in it from 0 to 15 (not 32 to 47). Your code should be:

Code: Select all

  for(uint8_t vector = 32; vector < 48; vector++){
    idt_set_descriptor(vector, irq_stub_table[vector-32], 0x8e);
    vectors[vector] = true;
  }
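The table holds 16 entries of 8 bytes each (128 bytes), so irq_stub_table[32] reads from byte offset 256, well past the end; whatever happens to be stored there is installed as the handler address, which would be consistent with the non-canonical RIP in the log. Below is a small self-contained sketch of the corrected mapping. The table, the handler array and idt_set_descriptor are mock stand-ins written for this example (the real function's signature in the repository may differ), and IRQ_BASE, IRQ_COUNT and fill_mock_stub_table are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define IRQ_BASE  32u   /* first IDT vector used for IRQs */
#define IRQ_COUNT 16u   /* IRQ 0..15                      */

/* Mock stand-ins: in the kernel, irq_stub_table comes from the
   assembly file and idt_set_descriptor writes a real gate. */
void *irq_stub_table[IRQ_COUNT];
void *idt_handlers[256];
static char mock_stubs[IRQ_COUNT];

/* Give every table slot a distinct fake stub address. */
void fill_mock_stub_table(void) {
    for (unsigned i = 0; i < IRQ_COUNT; i++)
        irq_stub_table[i] = &mock_stubs[i];
}

/* Mock: just records which handler each vector would get. */
void idt_set_descriptor(uint8_t vector, void *isr, uint8_t flags) {
    (void)flags;
    idt_handlers[vector] = isr;
}

void init_irqs(void) {
    for (unsigned vector = IRQ_BASE; vector < IRQ_BASE + IRQ_COUNT; vector++) {
        /* table index = IDT vector minus the base, so 32..47 -> 0..15 */
        idt_set_descriptor((uint8_t)vector,
                           irq_stub_table[vector - IRQ_BASE], 0x8e);
    }
}
```

Naming the base and count keeps the loop bounds and the index arithmetic tied to the same two constants.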
A serious issue you really need to rectify is that the 32-bit tutorial you are loosely basing this code on passes `registers_t` by value to `exception_handler` and `irq_handler`. Octocontrabass has pointed out this problem, and if you don't fix it, it may cause you unexpected problems. Structures passed on the stack by value are owned by the function, not the caller of the function. As a result, the compiler may use that area as scratch space and overwrite the `registers_t` structure, corrupting the state data on the stack (when the corrupted state data is restored, unusual things can happen). You should pass a pointer to the `registers_t` structure instead to avoid this problem. This issue is covered on the OS Dev Wiki along with all the other errata for that tutorial: https://wiki.osdev.org/James_Molloy%27s ... r_handlers .

Your `registers_t` structure needs to list all fields in the reverse of the order in which they were pushed on the stack. In idtr.h you need to change

Code: Select all

typedef struct registers {
       uint64_t rdi, rsi, rbp, rsp, rbx, rdx, rcx, rax;
       uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
       uint64_t ds, int_no, err_code;
       uint64_t rip;
} registers_t;
to:

Code: Select all

typedef struct registers {
        uint64_t ds, r15, r14, r13, r12, r11, r10, r9, r8;
        uint64_t rax, rcx, rdx, rbx, useless_rsp, rbp, rsi, rdi;
        uint64_t int_no, err_code;
        uint64_t rip, cs, rflags, rsp, ss;
} registers_t;
I've renamed the `rsp` field to `useless_rsp`. This is the stack pointer within the interrupt, which is generally not very useful. I recommend removing it, along with the `push rsp` and `pop rsp` in all your stub handlers. It doesn't hurt anything by having it, but you won't have much need for it.
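One way to convince yourself of the "reverse of push order" rule is to replay the pushes on a toy stack and read the result back through the struct. The sketch below does that in plain C. The mock stack, push and simulate_irq_frame are hypothetical helpers, the register values are arbitrary tags, and the r8 to r15 push order is an assumption inferred from the struct above (r8 first, r15 last).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct registers {
    uint64_t ds, r15, r14, r13, r12, r11, r10, r9, r8;
    uint64_t rax, rcx, rdx, rbx, useless_rsp, rbp, rsi, rdi;
    uint64_t int_no, err_code;
    uint64_t rip, cs, rflags, rsp, ss;
} registers_t;

_Static_assert(sizeof(registers_t) == 24 * sizeof(uint64_t),
               "24 quadwords, no padding");

/* Toy downward-growing stack standing in for the kernel stack. */
static uint64_t mock_stack[64];
static size_t mock_sp = 64;            /* index of the current top */

static void push(uint64_t value) { mock_stack[--mock_sp] = value; }

/* Replay the pushes for one IRQ and view the result as a frame. */
registers_t simulate_irq_frame(uint64_t vector) {
    mock_sp = 64;
    /* 1. Pushed by the CPU on interrupt entry. */
    push(0x10);        /* ss                          */
    push(0x7f07258);   /* rsp of the interrupted code */
    push(0x246);       /* rflags                      */
    push(0x08);        /* cs                          */
    push(0x1000b01);   /* rip                         */
    /* 2. Pushed by the IRQ macro. */
    push(0);           /* dummy error code            */
    push(vector);      /* interrupt number            */
    /* 3. Pushed by the common stub; the values are just tags. */
    push(1);           /* rdi */
    push(2);           /* rsi */
    push(3);           /* rbp */
    push(4);           /* rsp inside the stub */
    push(5);           /* rbx */
    push(6);           /* rdx */
    push(7);           /* rcx */
    push(8);           /* rax */
    for (uint64_t n = 8; n <= 15; n++)
        push(100 + n); /* r8 .. r15, assumed order */
    push(0x10);        /* ds, pushed last so it sits at the lowest address */

    registers_t frame;
    memcpy(&frame, &mock_stack[mock_sp], sizeof frame);
    return frame;
}
```

The value pushed last (ds) lands in the first field and the value pushed first (ss) in the last one, which is exactly the ordering the struct encodes.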

Re: GPF after receiving INT 0x20

Posted: Wed Aug 07, 2024 1:33 am
by nullplan
MichaelPetch wrote: Tue Aug 06, 2024 7:48 pm I've renamed the `rsp` field to `useless_rsp`. This is the stack pointer in interrupt context which is useless. I recommend removing it and remove the `push rsp` and `pop rsp` from all your stub handlers. It doesn't hurt anything by having it, but you won't have much need for it.
They'll definitely hurt if something does decide to clobber the value. "pop rsp" overwrites rsp with the value on stack, so you will suddenly have a different one. It's just that "push rsp" pushes the old value of rsp, so this works out.