Interrupt on privilege change does not push SS:ESP

CaramelizedSophus
Posts: 4
Joined: Fri Jun 24, 2022 1:39 am

Post by CaramelizedSophus »

Hello everyone, it's been a while...

I wanted to tell you about a problem I have when handling interrupts from user space, or rather, from ring 3.

Let's take it step by step. I'll put it in lists to make it easier to read.

Where we are:
- We are in protected mode only (32 bits).
- GDT and IDT are correctly configured (I will provide more info below).
- We ignore the paging code for now, since it is disabled for testing purposes.
- I do set a new stack for the kernel.
- Kernel interrupts (without a privilege change) are executed correctly, both hardware interrupts and exceptions.
- The jump to ring 3 is done correctly, and everything runs fine.
- The selectors change and are fine, and the CPL is fine (3).

The problem:
- When a hardware (timer) interrupt is received, it jumps to the interrupt handler, but SS:ESP is not pushed onto the stack.
- The user stack is kept; it does not switch to the kernel stack.
- ss0 and esp0 in the TSS are ignored.
- Since IOPL=0, a GPF occurs when the handler tries to acknowledge the interrupt at the PIC.
- All this ends up corrupting the stack.

More info:

The GDT:
  • NULL all zeroed
  • KERNEL DPL(0) CS = 0x08; BASE=0x00000000; LIMIT= 0xFFFFFFFF; ACCESS=0x9A; FLAGS=0xC
  • KERNEL DPL(0) DS = 0x10; BASE=0x00000000; LIMIT= 0xFFFFFFFF; ACCESS=0x92; FLAGS=0xC
  • USER DPL(3) CS = 0x18; BASE=0x00000000; LIMIT= 0xFFFFFFFF; ACCESS=0xFA; FLAGS=0xC
  • USER DPL(3) DS = 0x20; BASE=0x00000000; LIMIT= 0xFFFFFFFF; ACCESS=0xF2; FLAGS=0xC
  • TSS = 0x28; BASE=0x00000000; LIMIT= 0xFFFFFFFF; ACCESS=0x9A; FLAGS=0xC
The IDT:

All entries (hardware interrupts and exceptions) use:
  • Selector: KERNEL CS (0x8)
  • DPL = 0
  • 32-bit Interrupt gate
The TSS:
  • SS0 = 0x10(kernel DS)
  • ESP0 = about 0x0010922c
  • CS = kernel CS | 0x3
  • DS,ES,FS,GS,SS = kernel DS | 0x3
Debug Info:

First of all, I dumped the memory of all the system structures (GDT, TSS and IDT) and checked that they are OK. I also checked the stack pushes and pops, instruction by instruction, in kernel mode and when jumping to user mode (by iret).

These are my segment registers before setting up the jump, while still in supervisor mode:

Code: Select all

<bochs:11> sreg
es:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=1
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
cs:0x0008, dh=0x00cf9d00, dl=0x0000ffff, valid=1
	Code segment, base=0x00000000, limit=0xffffffff, Execute-Only, Conforming, Accessed, 32-bit
ss:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
ds:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
fs:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=1
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
gs:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=1
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
ldtr:0x0000, dh=0x00008200, dl=0x0000ffff, valid=1
tr:0x0028, dh=0x0000eb10, dl=0x896089c8, valid=1
gdtr:base=0x00108900, limit=0x4f
idtr:base=0x00108060, limit=0x7ff
Now, how the stack is filled before the iret:

Code: Select all

<bochs:28> print-stack
Stack address size 4
 | STACK 0x00109214 [0x0010295c] (<unknown>)  << This is my user function
 | STACK 0x00109218 [0x0000001b] (<unknown>)  << My user cs ... 0x18 | 0x3(rpl)
 | STACK 0x0010921c [0x00000202] (<unknown>)  << The eflags
 | STACK 0x00109220 [0x00106400] (<unknown>)  << The new stack (taken into consideration that it grows down, don't worry)
 | STACK 0x00109224 [0x00000023] (<unknown>)  << The user stack selector, ss.... 0x20 | 0x3 (rpl)
 | STACK 0x00109228 [0x00100350] (<unknown>)   << The ret from kernel, just ignore
 | STACK 0x0010922c [0x00000000] (<unknown>)
 | STACK 0x00109230 [0x00000000] (<unknown>)
 | STACK 0x00109234 [0x00000000] (<unknown>)
 | STACK 0x00109238 [0x00000000] (<unknown>)
 | STACK 0x0010923c [0x00000000] (<unknown>)
....
From user mode, I can write a string to the screen framebuffer, so that is OK.

Now, after the switch, the selectors look like this:

Code: Select all

<bochs:32> sreg
es:0x0023, dh=0x00cff300, dl=0x0000ffff, valid=1
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
cs:0x001b, dh=0x00cffb00, dl=0x0000ffff, valid=1
	Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Non-Conforming, Accessed, 32-bit
ss:0x0023, dh=0x00cff300, dl=0x0000ffff, valid=1
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
ds:0x0023, dh=0x00cff300, dl=0x0000ffff, valid=1
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
fs:0x0023, dh=0x00cff300, dl=0x0000ffff, valid=1
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
gs:0x0023, dh=0x00cff300, dl=0x0000ffff, valid=1
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
ldtr:0x0000, dh=0x00008200, dl=0x0000ffff, valid=1
tr:0x0028, dh=0x0000eb10, dl=0x896089c8, valid=1
gdtr:base=0x00108900, limit=0x4f
idtr:base=0x00108060, limit=0x7ff
And then, when it jumps to the handler, the selectors look like this:

Code: Select all


es:0x0023, dh=0x00cff300, dl=0x0000ffff, valid=1
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
cs:0x000b, dh=0x00cf9d00, dl=0x0000ffff, valid=1
	Code segment, base=0x00000000, limit=0xffffffff, Execute-Only, Conforming, Accessed, 32-bit
ss:0x0023, dh=0x00cff300, dl=0x0000ffff, valid=31
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
ds:0x0023, dh=0x00cff300, dl=0x0000ffff, valid=31
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
fs:0x0023, dh=0x00cff300, dl=0x0000ffff, valid=1
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
gs:0x0023, dh=0x00cff300, dl=0x0000ffff, valid=1
	Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
ldtr:0x0000, dh=0x00008200, dl=0x0000ffff, valid=1
tr:0x0028, dh=0x0000eb10, dl=0x896089c8, valid=1
gdtr:base=0x00108900, limit=0x4f
idtr:base=0x00108060, limit=0x7ff

Also, look at the stack:

Code: Select all

 | STACK 0x001063e8 [0x00000000] (<unknown>) << EAX value, not important for us
 | STACK 0x001063ec [0x00000020] (<unknown>) << Interrupt number(32 = timer)
 | STACK 0x001063f0 [0x00000000] (<unknown>) << Dummy error code
 | STACK 0x001063f4 [0x00102971] (<unknown>) << Correct EIP from user function
 | STACK 0x001063f8 [0x0000001b] (<unknown>) << The user CS
 | STACK 0x001063fc [0x00000216] (<unknown>) << The eflags
 | STACK 0x00106400 [0x000a7325] (<unknown>) << Garbage(out of stack boundaries)
 | STACK 0x00106404 [0x616c6f48] (<unknown>) << Garbage(out of stack boundaries)
 | STACK 0x00106408 [0x6e754d20] (<unknown>) << Garbage(out of stack boundaries)
IMPORTANT: Compare the stack frames before and after the interrupt: ESP is not switched by the interrupt.
The kernel stack is at 0x0010922c and the user stack is at 0x001063fc.

So neither SS nor ESP changes, and the user SS:ESP is not pushed onto the kernel stack!


Some code:

Now let's have a look at the code.

First of all, the GDT and TSS:

Code: Select all

void gdt_init()
{
    //null entry
    gdt_set_gate(GDT_NULL_ENTRY,0,0,0,0);
    //kernel code segment
    gdt_set_gate(GDT_KERNEL_CS_ENTRY, 0xFFFFFFFF, 0x0, (GDT_P|GDT_S|GDT_EX|GDT_DC), (GDT_G|GDT_DB));
    //kernel data segment
    gdt_set_gate(GDT_KERNEL_DS_ENTRY, 0xFFFFFFFF, 0x0, (GDT_P|GDT_S|GDT_RW), (GDT_G|GDT_DB));
    //user code segment
    gdt_set_gate(GDT_USER_CS_ENTRY, 0xFFFFFFFF, 0x0, (GDT_P|GDT_DPL(3)|GDT_S|GDT_EX|GDT_RW), (GDT_G|GDT_DB));
    //user data segment
    gdt_set_gate(GDT_USER_DS_ENTRY, 0xFFFFFFFF, 0x0, (GDT_P|GDT_DPL(3)|GDT_S|GDT_RW), (GDT_G|GDT_DB));

    gdt.base = (uint32) &gdt_entries;
    gdt.limit = (sizeof(gdt_entry_t)*MAX_GDT_ENTRIES)-1;

    install_tss(GDT_TSS_ENTRY, KERNEL_DS, 0x00); //I update the stack later
    

    __install_gdt(&gdt);
    __flush_tss(GDT_TSS_ENTRY * sizeof(gdt_entry_t));
}

void update_tss_stack(uint32 kernel_esp)
{
    kernel_tss.esp0 = kernel_esp;
}

void install_tss (uint32 entry, uint16 ss, uint32 esp)
{
    uint32 base = (uint32) &kernel_tss;

    gdt_set_gate(entry, base + sizeof(tss_t), base, 
    (GDT_P | GDT_DPL(3) | TSS_32B_AVAIL), 0);

    memset(&kernel_tss, 0, sizeof(tss_t));

    kernel_tss.ss0 = ss;
    kernel_tss.esp0 = esp;

    kernel_tss.cs = KERNEL_CS | 0x3;
    kernel_tss.ss = KERNEL_DS | 0x3;
    kernel_tss.es = KERNEL_DS | 0x3;
    kernel_tss.ds = KERNEL_DS | 0x3;
    kernel_tss.fs = KERNEL_DS | 0x3;
    kernel_tss.gs = KERNEL_DS | 0x3;
}
The ltr instruction:

Code: Select all

__flush_tss:
    movw 4(%esp), %ax
    ltr %ax
ret
The user mode jump code:

Code: Select all


_user_stack_bottom:
.space 1024, 0
_user_stack:

.section .text
.global __jump_user
__jump_user:
    cli
    mov $0x23, %eax
    movw %ax, %ds
    movw %ax, %es
    movw %ax, %fs
    movw %ax, %gs

    mov $_user_stack, %eax

    pushl $0x23 // SS ESP EFLAGS CS EIP
    pushl %eax

    pushfl
    pop %eax
    or $0x200, %eax
    push %eax

    pushl $0x1b
    push $user_entry
    
iret

In the user_entry function I do nothing unusual; I just print a string and keep the CPU in an endless loop:

Code: Select all

user_entry:
        
    mov $fmt, %ebx
    pushl %ebx
    mov $hola, %ebx
    pushl %ebx
    call kprintf

    add $8, %esp

    w:
    nop
    nop
    nop
    jmp w
This is the IRQ handler:

Code: Select all


_irq_\num:
    xchg %bx,%bx //Bochs breakpoint, just ignore
    pushl $\num
    pushl $\num+HW_OFFSET
    jmp _irq_common
.endm

_irq_common:
    pushal
    
    pushl %ds
    pushl %es
    pushl %fs
    pushl %gs

    push %esp
    call irq_handler

    addl $4, %esp

    popl %gs
    popl %fs
    popl %es
    popl %ds

    popal

    addl $8,%esp

iret
And last, but not least, the structure (passed by reference) to the C handler:

Code: Select all


struct regs
{
	uint32 gs, fs, es, ds;//0-12
	uint32 edi, esi, ebp, esp, ebx, edx, ecx, eax;//16----32-36-40-44
	uint32 int_no, err_code;
	uint32 eip, cs, eflags, useresp, ss;
};

I think that's all. If you have any questions, or need more info or the code itself, just let me know.

Bye!
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Interrupt on privilege change does not push SS:EIP

Post by Octocontrabass »

CaramelizedSophus wrote:First of all, say that I dumped the memory related to all system structures like GDT, TSS and IDT, and checked that are ok.
Check again. Your GDT is definitely wrong, and I suspect your IDT is also wrong.
CaramelizedSophus wrote:

Code: Select all

cs:0x0008, dh=0x00cf9d00, dl=0x0000ffff, valid=1
	Code segment, base=0x00000000, limit=0xffffffff, Execute-Only, Conforming, Accessed, 32-bit
Your kernel code segment should not be conforming.
MichaelPetch
Member
Posts: 797
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: Interrupt on privilege change does not push SS:EIP

Post by MichaelPetch »

Octocontrabass wrote:
CaramelizedSophus wrote:

Code: Select all

cs:0x0008, dh=0x00cf9d00, dl=0x0000ffff, valid=1
	Code segment, base=0x00000000, limit=0xffffffff, Execute-Only, Conforming, Accessed, 32-bit
Your kernel code segment should not be conforming.
I have to agree with Octo: your primary problem is that your kernel code segment is set to conforming. What this means is that when control is transferred to the interrupt handler, the CPL is not changed to the code segment's DPL of 0. The CPL remains whatever it was when the interrupt happened. If the interrupt arrives in user mode, the CPL remains 3 while the interrupt handler runs. Since there is no privilege level change, no stack switch is done, so SS0:ESP0 is never read from the TSS and the user mode stack is used.
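To make that rule concrete, here is a minimal C sketch of the decision the CPU makes when it delivers an interrupt through a 32-bit interrupt gate. The names `int_result` and `deliver_interrupt` are hypothetical and exist only for illustration. The model is deliberately simplified: it assumes the handler segment's DPL is numerically less than or equal to the CPL (otherwise the CPU raises #GP), and it ignores gate DPL checks for software interrupts and virtual-8086 mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Outcome of delivering an interrupt (hypothetical model, not kernel code). */
struct int_result {
    uint8_t new_cpl;     /* CPL while the handler runs */
    bool stack_switch;   /* true: SS:ESP loaded from the TSS, old SS:ESP pushed */
};

/*
 * cpl:                CPL of the interrupted code (0..3)
 * handler_dpl:        DPL of the code segment named in the IDT gate
 * handler_conforming: conforming bit of that code segment
 * Precondition: handler_dpl <= cpl.
 */
static struct int_result deliver_interrupt(uint8_t cpl, uint8_t handler_dpl,
                                           bool handler_conforming)
{
    struct int_result r;

    if (handler_conforming || handler_dpl == cpl) {
        /* Same-privilege delivery: CPL and stack are left alone. */
        r.new_cpl = cpl;
        r.stack_switch = false;
    } else {
        /* Non-conforming, more privileged handler: CPL drops to the
         * segment's DPL and the stack comes from SSx:ESPx in the TSS. */
        r.new_cpl = handler_dpl;
        r.stack_switch = true;
    }
    return r;
}
```

With a conforming kernel code segment, `deliver_interrupt(3, 0, true)` reports CPL 3 and no stack switch, which is consistent with the `cs:0x000b` (RPL 3) shown in the handler's register dump. The same call with `false` reports CPL 0 and a stack switch.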

That seems to track with the information you are providing. You say you set CS this way:

Code: Select all

KERNEL DPL(0) CS = 0x08; BASE=0x00000000; LIMIT= 0xFFFFFFFF; ACCESS=0x9A; FLAGS=0xC
An access byte of 0x9a is non-conforming, but your Bochs output shows the conforming bit set. You don't show the code for `__install_gdt`, but I wonder if `__install_gdt` does an LGDT and sets the DS, ES, FS, GS and SS segment registers without doing a FAR JMP (ljmp) to set CS. If you didn't do a far jump to reload CS, CS will still be using the cached access attributes from whatever the previous GDT held. Can you show us `__install_gdt`?
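As a quick way to compare the two values, here is a small C sketch that decodes a code/data descriptor's access byte. The `access_bits`, `access_from_dh` and `decode_access` names are made up for this example. It relies on the access byte being bits 15..8 of the high descriptor dword that Bochs prints as `dh`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded access byte of a code/data segment descriptor. */
struct access_bits {
    bool present;       /* bit 7: P */
    uint8_t dpl;        /* bits 6..5: DPL */
    bool code_or_data;  /* bit 4: S (1 = code/data, 0 = system) */
    bool executable;    /* bit 3: E */
    bool dc;            /* bit 2: conforming (code) / expand-down (data) */
    bool rw;            /* bit 1: readable (code) / writable (data) */
    bool accessed;      /* bit 0: A, set by the CPU when the segment is loaded */
};

/* Extract the access byte from the "dh" value shown by Bochs' sreg. */
static uint8_t access_from_dh(uint32_t dh)
{
    return (uint8_t)((dh >> 8) & 0xFFu);
}

static struct access_bits decode_access(uint8_t access)
{
    struct access_bits b;
    b.present      = (access & 0x80u) != 0;
    b.dpl          = (uint8_t)((access >> 5) & 0x3u);
    b.code_or_data = (access & 0x10u) != 0;
    b.executable   = (access & 0x08u) != 0;
    b.dc           = (access & 0x04u) != 0;
    b.rw           = (access & 0x02u) != 0;
    b.accessed     = (access & 0x01u) != 0;
    return b;
}
```

Decoding `access_from_dh(0x00cf9d00)` gives 0x9d: present, DPL 0, executable, conforming, not readable, accessed. The intended 0x9a decodes as executable, readable and non-conforming. It may also be worth checking the flags passed in `gdt_init`: the kernel code entry uses `GDT_EX|GDT_DC` without `GDT_RW`, and if `GDT_DC` is the bit-2 flag (an assumption, since the macro definitions are not shown), that would produce 0x9c, which becomes 0x9d once the CPU sets the accessed bit.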

Once you fix the conforming segment problem, you will have to modify your `_irq_common` function so that, after you push the segment registers, you set DS, ES, FS and GS to 0x10 (your kernel data segment). If you don't, the handler will run with the previous values of those registers, which will be your user mode segments if the interrupt occurs while in ring 3.

Edit: I noticed after the fact that your `install_tss` function sets the CS/DS/ES/SS/FS/GS fields of the TSS to kernel selectors ORed with 0x3. When transitioning between privilege levels to handle an interrupt, the processor doesn't load any segment register from the TSS other than SS (from SSx:ESPx) UNLESS the IDT entry for that interrupt is a task gate. Task gates for interrupts are generally not used, as they are slow and have a fair amount of overhead. Tasks are also not re-entrant.

Although it is not the cause of your problem, your TSS limit is calculated incorrectly: the limit is the size of the TSS minus one, not its end address. Also, a DPL of 3 may work, but setting it to 0 makes the TSS descriptor inaccessible from anything but kernel mode. You have:

Code: Select all

    gdt_set_gate(entry, base + sizeof(tss_t), base,
    (GDT_P | GDT_DPL(3) | TSS_32B_AVAIL), 0);
I'd change it to be:

Code: Select all

    gdt_set_gate(entry, sizeof(tss_t)-1, base,
    (GDT_P | GDT_DPL(0) | TSS_32B_AVAIL), 0);
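For reference, a minimal sketch of the limit calculation, assuming the standard 104-byte 32-bit TSS layout. The `struct tss32` and `tss_limit` names are illustrative and are not your `tss_t`; your structure may carry extra fields such as an I/O bitmap, in which case its size differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit TSS layout; selector fields occupy the low 16 bits
 * of their 32-bit slots. */
struct tss32 {
    uint32_t prev_tss;
    uint32_t esp0, ss0;   /* stack loaded on a switch to ring 0 */
    uint32_t esp1, ss1;
    uint32_t esp2, ss2;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;
    uint32_t ldt;
    uint16_t trap;
    uint16_t iomap_base;
};

/* Compile-time layout checks. */
static_assert(sizeof(struct tss32) == 104, "32-bit TSS must be 104 bytes");
static_assert(offsetof(struct tss32, esp0) == 4, "ESP0 lives at offset 4");
static_assert(offsetof(struct tss32, ss0) == 8, "SS0 lives at offset 8");

/* A descriptor limit is the offset of the last valid byte, so it is the
 * size minus one and does not include the base address. */
static uint32_t tss_limit(void)
{
    return (uint32_t)sizeof(struct tss32) - 1u;
}
```

Here `tss_limit()` evaluates to 103 (0x67), whereas `base + sizeof(tss_t)` yields a value that depends on where the TSS happens to be linked.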