FS Base and GS Base got erased on CPL change


Post by songziming »

My kernel (64-bit mode) uses fs.base to store the per-CPU section offset, so per-CPU variables on the current CPU can be referenced like:

Code:

movl %fs:(variable), %eax

But I found that fs.base and gs.base get erased when the CPL changes. I first wrote two magic values to fs.base and gs.base in ring 0, then jumped to ring 3 and performed a syscall to get back to ring 0. In the syscall handler I read fs.base and gs.base, and both were zero.

I searched the Intel and AMD manuals, but they don't mention this behavior. Is it normal?
If it is intended, should I store the per-CPU offset in KernelGS.Base and execute swapgs on every CPL change?
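
To make the swapgs idea concrete, here is a minimal C sketch that only models the instruction in software: swapgs exchanges the active GS base with the value held in the KernelGSbase MSR. The `cpu_model` struct and `model_swapgs` are hypothetical names invented for illustration; a real kernel would execute `swapgs` itself rather than call a function like this.

```c
#include <stdint.h>

/* Hypothetical software model of the two registers that swapgs touches. */
struct cpu_model {
    uint64_t gs_base;        /* active GS base, used by %gs: accesses */
    uint64_t kernel_gs_base; /* value parked in the KernelGSbase MSR  */
};

/* Model of swapgs: exchange the active GS base with KernelGSbase. */
static void model_swapgs(struct cpu_model *cpu)
{
    uint64_t tmp = cpu->gs_base;
    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}
```

In this model, one call on kernel entry makes the per-CPU base active and parks the user value; a second call before returning to ring 3 puts the user value back.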

Thanks in advance
Reinventing the Wheel, code: https://github.com/songziming/wheel

Post by alexfru »

On a return to an outer privilege level, the IRET instruction nulls every data segment register that the code being returned to could not use without a privilege violation.
See the pseudo-code for IRET in the Intel manual:

Code:

RETURN-TO-OUTER-PRIVILEGE-LEVEL:
...
CPL ← RPL of the return code segment selector;
FOR each of segment register (ES, FS, GS, and DS)
    DO
        IF segment register points to data or non-conforming code segment
        and CPL > segment descriptor DPL (* Stored in hidden part of segment register *)
            THEN (* Segment register invalid *)
                SegmentSelector ← 0; (* NULL segment selector *)
        FI;
    OD;
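
The loop in that pseudo-code can be sketched as a small C model. This is only an illustration of the check, not processor-accurate emulation: `seg_model` and `model_iret_to_outer` are hypothetical names, and the model tracks just the selector, the cached DPL, and the segment type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a data segment register and its cached descriptor. */
struct seg_model {
    uint16_t selector;        /* visible selector                     */
    uint8_t  dpl;             /* DPL cached in the hidden part        */
    bool     data_or_nonconf; /* data or non-conforming code segment  */
};

/* Apply the pseudo-code check to each modeled register (ES, FS, GS, DS). */
static void model_iret_to_outer(struct seg_model regs[], int count,
                                uint8_t new_cpl)
{
    for (int i = 0; i < count; i++) {
        if (regs[i].data_or_nonconf && new_cpl > regs[i].dpl)
            regs[i].selector = 0; /* NULL segment selector */
    }
}
```

In this model, a register holding a DPL 0 data segment is nulled on a return to CPL 3, while one holding a DPL 3 segment is left alone.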

Post by songziming »

@alexfru that's very informative. I now use gs to store the per-CPU offset and execute swapgs when necessary, and the code is working. (In the exception/interrupt handlers, I check the low 2 bits of the CS saved on the stack and do swapgs if they are both 1, meaning the handler was entered from ring 3.)
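
The saved-CS check described here might look like the following C helper. It is a sketch under the assumption that user code always runs in ring 3; `came_from_ring3` is a hypothetical name, and the real check would normally sit in the assembly entry stub next to the swapgs instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* The low two bits of a selector are its RPL. For the CS value pushed
 * on interrupt entry, that is the privilege level of the interrupted code. */
static bool came_from_ring3(uint64_t saved_cs)
{
    return (saved_cs & 3) == 3;
}
```

A handler would execute swapgs on entry when this returns true, and again just before iretq, so that an interrupt taken while already in ring 0 does not swap the bases a second time.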

Since there is swapgs, kernel and user software can each have their own gs.base. But what about fs.base? If a user application sets fs.base with wrfsbase, the value will be erased during the next syscall, and there is no swapfs instruction.

Does software even use fs.base? If it does, how do OSes like Linux preserve the value of fs.base across a system call?
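
One possible approach, given here as an assumption and not as a description of any particular kernel: the SYSCALL instruction itself does not modify fs.base, so a kernel that never loads FS can leave the user value in place during a system call and only save and restore it when switching threads. The C sketch below models that bookkeeping; `thread_model`, `cpu_fs_model` and `model_switch_fs` are hypothetical names, and a real kernel would read and write the register with rdfsbase/wrfsbase or the IA32_FS_BASE MSR.

```c
#include <stdint.h>

/* Hypothetical per-thread state: the fs.base the thread last had. */
struct thread_model {
    uint64_t user_fs_base;
};

/* Hypothetical model of the CPU's current fs.base. */
struct cpu_fs_model {
    uint64_t fs_base;
};

/* On a context switch, stash the outgoing thread's fs.base and
 * install the incoming thread's value. */
static void model_switch_fs(struct cpu_fs_model *cpu,
                            struct thread_model *prev,
                            const struct thread_model *next)
{
    prev->user_fs_base = cpu->fs_base; /* real kernel: rdfsbase or rdmsr */
    cpu->fs_base = next->user_fs_base; /* real kernel: wrfsbase or wrmsr */
}
```

With this scheme the kernel keeps its own per-CPU pointer in gs (via swapgs) and treats fs.base purely as user state.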

Post by zhiayang »

Code:

IF segment register points to data or non-conforming code segment

Personally, I load %fs and %gs with null selectors, since they only "participate" in segmentation insofar as their base address is used, and that can be controlled either via wrfsbase/wrgsbase or the appropriate MSR. Since a null selector does not point to a data or code descriptor, the check above does not apply and the register is not zeroed out. (I think so, at least: I just tested this, and my fsbase value appears to carry over to the ring 3 process.)

Also, IRET itself doesn't zero out the base address; it simply writes 0 to the selector. On Intel, this produces the behaviour you describe, because the processor also zeroes the base address and limit (in the hidden part of the segment register) whenever a null value is written to the selector. On AMD processors, the manual (in chapter 2, section 4.5.3) explicitly states:

"When a null selector is loaded into FS or GS, the contents of the corresponding hidden descriptor register are not altered."

EDIT: I appear to have misread (and am mistaken) -- it does appear to be the case that while in Ring 0, FSBase becomes 0...

EDIT 2: this is not the case, I was doubly mistaken -- I had forgotten about some code I placed to zero out the segment selectors (including %fs) on a context switch -- naturally this clears the hidden base register (on QEMU at least, which appears to emulate the Intel behaviour). The stuff above should be correct, I think.
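
The vendor difference described in this post can be summarised in a small C model. It only encodes the behaviour as stated above (Intel, and QEMU emulating it, clears the hidden base on a null selector load; AMD leaves the hidden descriptor unaltered) and is not a substitute for checking the manuals; all names are hypothetical.

```c
#include <stdint.h>

enum vendor_model { MODEL_INTEL, MODEL_AMD };

/* Hypothetical model of FS: visible selector plus hidden base. */
struct fs_model {
    uint16_t selector;
    uint64_t base;
};

/* Model of loading a null selector into FS. */
static void model_load_null_fs(struct fs_model *fs, enum vendor_model vendor)
{
    fs->selector = 0;
    if (vendor == MODEL_INTEL)
        fs->base = 0; /* Intel behaviour as described in the post */
    /* AMD: the hidden descriptor, including the base, is not altered. */
}
```

Code that must behave the same on both vendors should therefore write fs.base explicitly after any selector load instead of relying on either outcome.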