Why do you have to trick the CPU into going to ring 3? (x86)

j4cobgarby · Post by **j4cobgarby** » Tue Oct 08, 2019 4:16 am

I'm reading the following article: https://wiki.osdev.org/Getting_to_Ring_3
It also seems fairly okay, except for the last code snippet:

Code: Select all

GLOBAL _jump_usermode ;you may need to remove this _ to work right.. 
EXTERN _test_user_function
_jump_usermode:
     mov ax,0x23
     mov ds,ax
     mov es,ax 
     mov fs,ax 
     mov gs,ax ;we don't need to worry about SS. it's handled by iret
 
     mov eax,esp
     push 0x23 ;user data segment with bottom 2 bits set for ring 3
     push eax ;push our current ss for the iret stack frame
     pushf
     push 0x1B; ;user code segment with bottom 2 bits set for ring 3
     push _test_user_function ;may need to remove the _ for this to work right 
     iret
;end

What's up with this? I read that they're doing it this way because for an unexplained reason they have to trick the CPU into entering ring 3? Why could this possibly be? I really don't understand it, since as far as I can work out, this code is supposed to be run from ring 0, and I thought that ring 0 always had permission to run ring 3 code, so what's stopping me from simply running a CALL instruction to the beginning of the ring 3 userland code?

iansjack · Post by **iansjack** » Tue Oct 08, 2019 4:57 am

What's to stop you is that you will get a GPF exception. The processor doesn't allow a direct call from a higher privilege level to a lower one.

nullplan · Post by **nullplan** » Tue Oct 08, 2019 11:40 am

If it was possible to perform a far call from ring 0 to ring 3, where would the return address be stored? If on the kernel stack, then it is unaccessible to the user code that would execute the return instruction. If on the user stack, then user code could change the return pointer and execute whatever it wants as ring 0.

Let's take a step back: In order to run at ring 3, you need to change CS. How can you do that?

1. Far call
2. Far jump
3. Far return
4. Interrupt return

For the first three, you immediately get #GP if target DPL is different from CPL. Call could use a call gate but you probably did not want to install a new descriptor into your GDT just for a single use. Also, if the target is a nonconforming segment, target DPL must be less than or equal to CPL, so that would be violated here.

If you're using hardware task switching, you could jump to the task gate. But you are probably not using task gates, and shouldn't, since nobody uses that mechanism, so it is ill-tested. That leaves the interrupt return as the only way to set CS to a segment with a larger DPL than CPL. And it allows you to set EFLAGS, SS, ESP, and EIP at the same time.

Of course, you could also use SYSEXIT or SYSRET if you so choose.

Don't feel bad though, since returning from an interrupt that never happened is what you do on most architectures to enter userspace. PPC, for instance, does not allow you to enter Problem State without activating the MMU at the same time, and the only way that works is by executing the RFI instruction. RFI stands for "return from interrupt".

j4cobgarby · Post by **j4cobgarby** » Tue Oct 08, 2019 4:42 pm

iansjack wrote:What's to stop you is that you will get a GPF exception. The processor doesn't allow a direct call from a higher privilege level to a lower one.

Maybe I'm wrong then, I thought ring 0 was a higher privelege level than ring 3

j4cobgarby · Post by **j4cobgarby** » Tue Oct 08, 2019 4:43 pm

nullplan wrote:If it was possible to perform a far call from ring 0 to ring 3, where would the return address be stored? If on the kernel stack, then it is unaccessible to the user code that would execute the return instruction. If on the user stack, then user code could change the return pointer and execute whatever it wants as ring 0.

Let's take a step back: In order to run at ring 3, you need to change CS. How can you do that?

1. Far call
2. Far jump
3. Far return
4. Interrupt return

For the first three, you immediately get #GP if target DPL is different from CPL. Call could use a call gate but you probably did not want to install a new descriptor into your GDT just for a single use. Also, if the target is a nonconforming segment, target DPL must be less than or equal to CPL, so that would be violated here.

If you're using hardware task switching, you could jump to the task gate. But you are probably not using task gates, and shouldn't, since nobody uses that mechanism, so it is ill-tested. That leaves the interrupt return as the only way to set CS to a segment with a larger DPL than CPL. And it allows you to set EFLAGS, SS, ESP, and EIP at the same time.

Of course, you could also use SYSEXIT or SYSRET if you so choose.

Don't feel bad though, since returning from an interrupt that never happened is what you do on most architectures to enter userspace. PPC, for instance, does not allow you to enter Problem State without activating the MMU at the same time, and the only way that works is by executing the RFI instruction. RFI stands for "return from interrupt".

Thanks! That makes a lot of sense. What are SYSEXIT and SYSRET, by the way?

iansjack · Post by **iansjack** » Wed Oct 09, 2019 12:18 am

Might I suggest that you read the Intel Programmer's Manuals. They answer the questions that you have asked here.

linguofreak · Post by **linguofreak** » Wed Oct 09, 2019 11:06 am

To add to what nullplan said, note that except for the first entry into a new process when it is first set up (possibly even just the first entry into the first process started, if you have a Unix-like fork/exec mechanism and implement exec mostly or entirely in userspace), every kernel -> userspace transition will be a return to the point where some bit of previously-running userspace code left off when a userspace -> kernel transition occured, either because that userspace code called the kernel, or because a hardware interrupt interrupted the userspace code. Even if the userspace code in question is not the userspace code that had been running immediately previously, it will have previously have made a kernel -> userspace transition by one of the two methods above, with the scheduler having scheduled another process (or more than one) in the interim.

So since the first entry into a process will only occur once, and but reentries will occur an arbitrarily high number of times, it makes sense to implement initial entry in the same way as reentry, even if it involves tricking the CPU into thinking it's returning to code that had already been running.

OSDev.org

Why do you have to trick the CPU into going to ring 3? (x86)

Why do you have to trick the CPU into going to ring 3? (x86)

Re: Why do you have to trick the CPU into going to ring 3? (

Re: Why do you have to trick the CPU into going to ring 3? (

Re: Why do you have to trick the CPU into going to ring 3? (

Re: Why do you have to trick the CPU into going to ring 3? (

Re: Why do you have to trick the CPU into going to ring 3? (

Re: Why do you have to trick the CPU into going to ring 3? (