Still in Ring 0 after iret task switching

Viacheslav · Post by **Viacheslav** » Tue Jun 06, 2023 10:19 am

I have the following code performing task switch to jump to user mode:

global jump_usermode
extern usermode
jump_usermode:
    mov ax, 0x23
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax

    mov eax, esp
    push dword 0x23
    push eax
    pushfd
;;  make sure NT flag is set  
    or word [esp], 0x4000
    popfd
    pushfd
    push 0x1B
    push usermode
    iret

My TSS is initialized like this (setting back link field to the TSS segment selector):

Code: Select all

tss = (struct tss_entry){0}; 
tss.link = 0x28;
tss.ss0 = 0x10 | 0;
tss.esp0 = (uint32_t)kernel_int_stack_end;

ltr();

where ltr() basically executes ltr instruction with 0x28 segment selector.

However, after landing in the "usermode" function I can still execute privileged instructions like cli, and when I read the cs segment selector it still equals 0x8.

Code: Select all

void usermode(void) {
    asm volatile ("cli");
    asm volatile("int $0x80");

    while(1);
}

I cannot imagine why this happens, but basically iret with values loaded for Ring 3 has no effect at all.
Thanks for any pointers on where I should look.

P.S.:
My GDT descriptor for TSS is defined as:

Code: Select all

gdt_fill_entry(5, (uintptr_t)&tss, sizeof(tss) - 1,
                   (1 << 0)  |  (1 <<  3)  |  (1 <<  7),
                   0x00);

Where the third argument is the limit, the fourth is the access byte, and the fifth is the flags (0x00).
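For readers following along, the access byte in that call can be decoded bit by bit. The sketch below assumes the standard x86 descriptor layout (present bit 7, DPL bits 5-6, type bits 0-3); the macro and helper names are made up for illustration and are not part of the poster's kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the access-byte fields of a system descriptor. */
#define DESC_PRESENT              (1u << 7)
#define DESC_DPL(n)               (((n) & 3u) << 5)
#define DESC_TYPE_TSS32_AVAILABLE 0x9u   /* S bit (bit 4) stays clear */

/* The expression from the post: bits 0, 3 and 7. */
static_assert(((1 << 0) | (1 << 3) | (1 << 7)) == 0x89,
              "access byte used in the post");
static_assert((DESC_PRESENT | DESC_DPL(0) | DESC_TYPE_TSS32_AVAILABLE) == 0x89,
              "present, DPL 0, available 32-bit TSS");

/* Extract the individual fields from an access byte. */
static unsigned access_dpl(uint8_t access)     { return (access >> 5) & 3u; }
static unsigned access_type(uint8_t access)    { return access & 0xFu; }
static unsigned access_present(uint8_t access) { return (access >> 7) & 1u; }
```

So 0x89 describes a present, DPL 0, available 32-bit TSS. The same helpers work on any access byte; for example, 0xE9 decodes to the same type with DPL 3.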

Octocontrabass · Post by **Octocontrabass** » Thu Jun 15, 2023 12:00 pm

Why are you using hardware task switching? You don't need a hardware task switch to jump to user mode.

Gigasoft · Post by **Gigasoft** » Fri Jun 16, 2023 2:04 am

You are switching from a task to itself, which is an invalid operation. Do not set the NT flag and leave the link field as 0, to allow the IRET to proceed normally.
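As a rough illustration of that advice, here is a sketch of the five values IRET pops when returning to a less privileged ring, with NT cleared rather than set. The struct and the helper `make_user_frame` are hypothetical; the selectors 0x1B and 0x23 are the ones from the original post.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_NT (1u << 14)   /* Nested Task flag, 0x4000 */

/* Values popped by IRET on a return to an outer ring, lowest address first. */
struct iret_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;
    uint32_t ss;
};

static_assert(sizeof(struct iret_frame) == 20, "five 32-bit slots");

/* Hypothetical helper: build the frame for a first entry into ring 3. */
static struct iret_frame make_user_frame(uint32_t entry, uint32_t user_esp,
                                         uint32_t current_eflags)
{
    struct iret_frame f;
    f.eip    = entry;
    f.cs     = 0x1B;                          /* ring 3 code selector */
    f.eflags = current_eflags & ~EFLAGS_NT;   /* NT clear: ordinary IRET */
    f.esp    = user_esp;
    f.ss     = 0x23;                          /* ring 3 data selector */
    return f;
}
```

With NT set, IRET ignores this frame and instead attempts a task switch through the back link in the current TSS, which is why the frame appeared to have no effect.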

Apart from that, if your OS is multitasking, your code also has another privilege escalation vulnerability where another thread can change the pushed CS before the IRET happens.

Viacheslav · Post by **Viacheslav** » Fri Jun 16, 2023 5:03 am

Gigasoft wrote:You are switching from a task to itself, which is an invalid operation. Do not set the NT flag and leave the link field as 0, to allow the IRET to proceed normally.

Ok, now I set tss.link to 0 and leave the NT flag unset. I can see now that I'm in Ring 3 after IRET as my CS register is set to 0x18.
However, when I run asm volatile ("int 0x80"), with my syscall handler mapped to index 128 in the IDT, the program just hangs on the "int 0x80" instruction, so I cannot transition back to ring 0.
Is it the problem with my TSS structure ?

The same happens when executing a privileged instruction such as CLI or RDMSR. I was expecting to get a General Protection Fault exception in my interrupt handler, but everything just freezes. I have tried on both Qemu and Bochs with the same result... (

Octocontrabass · Post by **Octocontrabass** » Fri Jun 16, 2023 10:49 am

Viacheslav wrote:I can see now that I'm in Ring 3 after IRET as my CS register is set to 0x18.

CS should be 0x1B for ring 3, not 0x18.
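The difference between 0x18 and 0x1B is only the low two bits, the requested privilege level. A small sketch of the selector encoding (index in bits 3 and up, table indicator in bit 2, RPL in bits 0-1); the macro and helper names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Selector = descriptor index << 3 | table indicator << 2 | RPL. */
#define SELECTOR(index, ti, rpl) \
    ((uint16_t)(((index) << 3) | ((ti) << 2) | (rpl)))

static_assert(SELECTOR(3, 0, 3) == 0x1B, "ring 3 code selector in this thread");
static_assert(SELECTOR(4, 0, 3) == 0x23, "ring 3 data selector in this thread");
static_assert(SELECTOR(5, 0, 0) == 0x28, "TSS selector in this thread");

/* Split a selector back into its privilege level and GDT/LDT index. */
static unsigned selector_rpl(uint16_t sel)   { return sel & 3u; }
static unsigned selector_index(uint16_t sel) { return sel >> 3; }
```

Both 0x18 and 0x1B refer to GDT entry 3, but only 0x1B carries RPL 3.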

Viacheslav wrote:Is it the problem with my TSS structure ?

You need to set the IOPB offset in your TSS, but that won't fix the freezes. What kind of descriptors are in your IDT?

Viacheslav wrote:I have tried on both Qemu and Bochs with the same result... (

Does the Bochs log say anything when it freezes? How about QEMU with "-d int"?

Viacheslav · Post by **Viacheslav** » Fri Jun 23, 2023 7:02 am

I imagine it is better to set the IOPB to sizeof(tss) to indicate that there is no bitmap for now.
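A sketch of what that could look like, assuming the 32-bit TSS layout from the Intel manuals and a GCC/Clang-style packed attribute. The field names and `tss_init` are illustrative and may not match the struct actually used in this kernel; the static assertions are one way to catch a mis-declared struct at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit TSS; 16-bit selector fields are widened to 32 bits, which covers
   the reserved upper halves. */
struct tss_entry {
    uint32_t link;
    uint32_t esp0, ss0;
    uint32_t esp1, ss1;
    uint32_t esp2, ss2;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;
    uint32_t ldt;
    uint16_t trap;
    uint16_t iomap_base;
} __attribute__((packed));

static_assert(sizeof(struct tss_entry) == 104, "TSS must be 104 bytes");
static_assert(offsetof(struct tss_entry, esp0) == 4, "ESP0 at offset 4");
static_assert(offsetof(struct tss_entry, iomap_base) == 102, "IOPB at 102");

/* Hypothetical initializer: zero everything, set the ring 0 stack, and
   place the I/O map base past the segment limit so no bitmap is present. */
static void tss_init(struct tss_entry *tss, uint32_t kernel_stack_top)
{
    memset(tss, 0, sizeof *tss);
    tss->ss0        = 0x10;                 /* kernel data selector */
    tss->esp0       = kernel_stack_top;
    tss->iomap_base = (uint16_t)sizeof *tss;
}
```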

I filled the IDT entries 0-31 with exception handlers, 32-46 with PIC IRQ handlers and everything else with some void handler.
My exceptions are working well when I'm in ring 0.

In fact now I implemented syscalls with SYSENTER and SYSEXIT. And it really works.

However, the problem with interrupts in user mode didn't go away.
The system still freezes when executing a privileged instruction in userspace, so I have no way to handle it.

qemu-system-i386 with -d int reports nothing. Bochs logs the same thing as when I don't enter usermode at all.

On the Osdev tutorial page for Getting_to_Ring_3 in TSS initialization section there is a function:

Code: Select all

 
void set_kernel_stack(uint32_t stack) { // Used when an interrupt occurs
	tss_entry.esp0 = stack;
}

which is never used in the tutorial's code.
I don't get the point of this one, as we already set the kernel stack pointer inside write_tss.
Am I missing something here?


Octocontrabass · Post by **Octocontrabass** » Fri Jun 23, 2023 10:19 am

Viacheslav wrote:qemu-system-i386 with -d int reports nothing.

Try adding "-accel tcg". QEMU's logging functions don't seem to work correctly with hardware acceleration.

Viacheslav wrote:I don't get the point of this one, as we already set the kernel stack pointer inside write_tss.
Am I missing something here?

That function is meant to be used when switching tasks. Switching tasks normally involves switching to a different kernel stack, and that means you'll need the TSS to point to a different stack.
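A minimal sketch of that idea, assuming each task records the top of its own kernel stack. `struct task`, `current_task` and `switch_to` are hypothetical names, and the TSS is reduced to the one field the sketch touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real TSS: only ESP0 matters for this sketch. */
struct tss_stub { uint32_t esp0; };
static struct tss_stub tss_entry;

/* Same role as the wiki function: the CPU loads ESP from here when an
   interrupt arrives while running in ring 3. */
static void set_kernel_stack(uint32_t stack)
{
    tss_entry.esp0 = stack;
}

/* Hypothetical per-task bookkeeping. */
struct task { uint32_t kernel_stack_top; };
static struct task *current_task;

/* Hypothetical scheduler hook: point the TSS at the next task's kernel
   stack before running it. Saving and restoring registers is omitted. */
static void switch_to(struct task *next)
{
    assert(next != NULL);
    set_kernel_stack(next->kernel_stack_top);
    current_task = next;
}
```

With a single task the value written in write_tss never needs to change, which is presumably why the tutorial never calls the function.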

Viacheslav · Post by **Viacheslav** » Thu Jun 29, 2023 11:20 am

Octocontrabass wrote:Try adding "-accel tcg". QEMU's logging functions don't seem to work correctly with hardware acceleration.

Qemu says: "The -accel and "-machine accel=" options are incompatible", although I didn't add -machine option. I'm using qemu version 7.1.0

Anyway, I added "-machine accel=tcg" and got a whole bunch of logs after executing "cli" in user mode:
(This is the last part of the log file, which repeats about 50 times with different register values; I could not include the whole text here because the website died immediately.
The full log file is here, as osdev.org does not support 64kb+ files.)

Code: Select all

SMM: enter
EAX=000000b5 EBX=00007d85 ECX=00005678 EDX=00000003
ESI=3fef1350 EDI=3ffbecc0 EBP=00006958 ESP=00006958
EIP=000f7d84 EFL=00000016 [----AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f6180 00000037
IDT=     000f61be 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000014 CCD=00006944 CCO=EFLAGS
EFER=0000000000000000
SMM: after RSM
EAX=000000b5 EBX=00007d85 ECX=00005678 EDX=00000003
ESI=3fef1350 EDI=3ffbecc0 EBP=00006958 ESP=00006958
EIP=00007d85 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =d980 000d9800 ffffffff 00809300
CS =f000 000f0000 ffffffff 00809b00
SS =0000 00000000 ffffffff 00809300
DS =0000 00000000 ffffffff 00809300
FS =0000 00000000 ffffffff 00809300
GS =ca00 000ca000 ffffffff 00809300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 00000000
IDT=     00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000000 CCD=00000001 CCO=EFLAGS
EFER=0000000000000000
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x09
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x09
Servicing hardware INT=0x08
Servicing hardware INT=0x08

The logs after "int 0x80" look similar (Full log file):

Code: Select all

SMM: enter
EAX=000000b5 EBX=00007d85 ECX=00005678 EDX=00000003
ESI=3fef1350 EDI=3ffbecc0 EBP=00006958 ESP=00006958
EIP=000f7d84 EFL=00000016 [----AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f6180 00000037
IDT=     000f61be 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000014 CCD=00006944 CCO=EFLAGS
EFER=0000000000000000
SMM: after RSM
EAX=000000b5 EBX=00007d85 ECX=00005678 EDX=00000003
ESI=3fef1350 EDI=3ffbecc0 EBP=00006958 ESP=00006958
EIP=00007d85 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =d980 000d9800 ffffffff 00809300
CS =f000 000f0000 ffffffff 00809b00
SS =0000 00000000 ffffffff 00809300
DS =0000 00000000 ffffffff 00809300
FS =0000 00000000 ffffffff 00809300
GS =ca00 000ca000 ffffffff 00809300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 00000000
IDT=     00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000000 CCD=00000001 CCO=EFLAGS
EFER=0000000000000000
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x08
Servicing hardware INT=0x09
Servicing hardware INT=0x08
Servicing hardware INT=0x09
Servicing hardware INT=0x08
Servicing hardware INT=0x08

Not sure what those "servicing hardware" messages mean.
I also can't explain why my segment selectors after the "SMM: after RSM" message are filled with garbage.

Anyway, thank you for the help. I have probably made some dumb mistake somewhere in GDT/TSS initialization.
Sadly I don't know any other way to test whether my TSS is valid.
I would be glad if this helps to find the issue.

Octocontrabass · Post by **Octocontrabass** » Fri Jun 30, 2023 7:14 pm

Code: Select all

check_exception old: 0xffffffff new 0xd
  2523: v=0d e=0000 i=0 cpl=3 IP=001b:c010bb8e pc=c010bb8e SP=0023:c01187c8 env->regs[R_EAX]=00000006

This is a #GP probably caused by trying to execute a privileged instruction in ring 3.

Code: Select all

TR =002b c011a000 00000067 0000e900 DPL=3 TSS32-avl

Why is your TSS DPL set to 3 instead of 0?

Code: Select all

check_exception old: 0xd new 0xe
  2524: v=0e e=0002 i=0 cpl=3 IP=001b:c010bb8e pc=c010bb8e SP=0023:c01187c8 CR2=fffffffc

In the process of handling the #GP, a #PF occurs. Based on the value of CR2, it looks like there might be a problem with ESP0 in your TSS. (That might mean your struct isn't defined correctly.)
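One reading that fits those numbers, offered as an assumption rather than a confirmed diagnosis: if ESP0 were zero, the first push on the ring 0 stack would wrap around to exactly the CR2 value in the log. The arithmetic, plus a decode of the page-fault error code bits, can be sketched as follows (helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* A push decrements ESP by 4 before storing, so this is the first address
   written after the CPU loads ESP from the TSS. */
static uint32_t first_push_address(uint32_t esp0)
{
    return esp0 - 4u;
}

static_assert((uint32_t)(0u - 4u) == 0xFFFFFFFCu,
              "ESP0 = 0 wraps to the CR2 value in the log");

/* Page-fault error code bits: 0 = protection violation (else not present),
   1 = write access, 2 = access made in user mode. */
static unsigned pf_present(uint32_t e) { return e & 1u; }
static unsigned pf_write(uint32_t e)   { return (e >> 1) & 1u; }
static unsigned pf_user(uint32_t e)    { return (e >> 2) & 1u; }
```

The logged e=0002 decodes to a supervisor write to a non-present page, which is what a failed push during the stack switch would look like.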

Code: Select all

check_exception old: 0xe new 0xe
  2525: v=08 e=0000 i=0 cpl=3 IP=001b:c010bb8e pc=c010bb8e SP=0023:c01187c8 env->regs[R_EAX]=00000006

In the process of handling the #PF, another #PF occurs. The CPU turns it into a #DF because this situation is not recoverable.

Code: Select all

check_exception old: 0x8 new 0xe
  2526: v=03 e=0000 i=1 cpl=0 IP=0008:000efb51 pc=000efb51 SP=0010:00000fc8 env->regs[R_EAX]=000f6106

In the process of handling the #DF, another #PF occurs. That should be a triple fault, I'm not sure what QEMU is doing.

nullplan · Post by **nullplan** » Fri Jun 30, 2023 10:15 pm

Viacheslav wrote:Not sure what those "servicing hardware" messages mean.

Those mean that you failed to initialize the PICs correctly. But considering your other problems, that may be OK for now. It needs correcting eventually though. Hardware interrupt 8 is the timer interrupt in BIOS compatible mode. For protected mode, you need to reprogram the PICs to send their IRQs on different interrupts, typically 0x20 and 0x28. Otherwise you cannot distinguish hardware timer interrupts and double faults.
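To make the overlap concrete, here is a sketch of how an IRQ number maps to an interrupt vector under the BIOS default offsets (0x08 for the master PIC, 0x70 for the slave) and under the 0x20/0x28 remapping mentioned above. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Vector base programmed into each 8259 PIC. */
struct pic_offsets {
    uint8_t master;   /* IRQ 0-7  */
    uint8_t slave;    /* IRQ 8-15 */
};

static const struct pic_offsets BIOS_DEFAULT = { 0x08, 0x70 };
static const struct pic_offsets REMAPPED     = { 0x20, 0x28 };

/* Interrupt vector delivered for a given IRQ line. */
static uint8_t irq_to_vector(struct pic_offsets o, unsigned irq)
{
    assert(irq < 16);
    return (uint8_t)(irq < 8 ? o.master + irq : o.slave + (irq - 8));
}
```

Under the BIOS mapping the timer (IRQ 0) arrives on vector 0x08, the same vector as a double fault, and the keyboard (IRQ 1) on 0x09, matching the two vectors in the log. After remapping they land on 0x20 and 0x21.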

Octocontrabass · Post by **Octocontrabass** » Fri Jun 30, 2023 10:33 pm

Oh, I forgot to check the second log. The only part where it's different is the first exception:

Code: Select all

check_exception old: 0xffffffff new 0xd
  5470: v=0d e=0402 i=0 cpl=3 IP=001b:c010bb8e pc=c010bb8e SP=0023:c01187c8 env->regs[R_EAX]=00000006

This is a #GP caused by trying to use INT 0x80 when the IDT doesn't allow it.
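The error code e=0402 points at the culprit directly. A sketch of the decoding, assuming the standard selector-style error code format (bit 0 external, bit 1 IDT, bit 2 table indicator, index above that), together with the gate attribute byte that makes a software INT reachable from ring 3. All names here are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a selector-style exception error code. */
struct selector_error {
    unsigned ext;     /* event was external to the program      */
    unsigned idt;     /* index refers to an IDT gate            */
    unsigned ti;      /* GDT (0) or LDT (1), only when idt == 0 */
    unsigned index;   /* descriptor or vector number            */
};

static struct selector_error decode_selector_error(uint32_t e)
{
    struct selector_error d;
    d.ext   = e & 1u;
    d.idt   = (e >> 1) & 1u;
    d.ti    = (e >> 2) & 1u;
    d.index = (e >> 3) & 0x1FFFu;
    return d;
}

/* Attribute byte of a 32-bit interrupt gate. */
#define GATE_PRESENT (1u << 7)
#define GATE_DPL(n)  (((n) & 3u) << 5)
#define GATE_INT32   0xEu

static_assert((GATE_PRESENT | GATE_DPL(0) | GATE_INT32) == 0x8E,
              "kernel-only gate");
static_assert((GATE_PRESENT | GATE_DPL(3) | GATE_INT32) == 0xEE,
              "gate callable from ring 3");

/* A software INT n raises #GP when the gate's DPL is below the caller's CPL. */
static int int_n_allowed(unsigned gate_dpl, unsigned cpl)
{
    return cpl <= gate_dpl;
}
```

Decoding 0x0402 gives an IDT reference with index 0x80, so the gate for vector 0x80 needs DPL 3 if user code is meant to invoke it.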

Viacheslav · Post by **Viacheslav** » Sat Jul 01, 2023 5:48 pm

Octocontrabass wrote:
Code: Select all
check_exception old: 0xffffffff new 0xd
  5470: v=0d e=0402 i=0 cpl=3 IP=001b:c010bb8e pc=c010bb8e SP=0023:c01187c8 env->regs[R_EAX]=00000006
This is a #GP caused by trying to use INT 0x80 when the IDT doesn't allow it.

Ok, that was a lot of bugs. I really did have a problem with the kernel stack pointer in my TSS. The DPL of the TSS was initially set to 0; I had set it to 3 later while experimenting. And finally, the DPL bits were not set for the 0x80 IDT entry.

Now everything works just fine; however, I would stick with the SYSENTER/SYSEXIT approach for syscalls if it really is more efficient on x86 systems.

Thank you

P.S. Seems that now I learned how to read qemu logs, you are just searching for those "check_exception" messages...

Viacheslav · Post by **Viacheslav** » Sun Jul 02, 2023 5:13 am

nullplan wrote:Those mean that you failed to initialize the PICs correctly. But considering your other problems, that may be OK for now. It needs correcting eventually though. Hardware interrupt 8 is the timer interrupt in BIOS compatible mode. For protected mode, you need to reprogram the PICs to send their IRQs on different interrupts, typically 0x20 and 0x28. Otherwise you cannot distinguish hardware timer interrupts and double faults.

That would be strange, as I did it the way Osdev wiki does. And I'm clearly receiving ISR number 0x21 on a keyboard interrupt.

Octocontrabass · Post by **Octocontrabass** » Sun Jul 02, 2023 2:23 pm

You reprogram the PICs, but your bootloader probably doesn't.

OSDev.org
