Set 'em and Forget 'em

Posted: Fri Mar 17, 2017 6:27 pm
by CelestialMechanic
In days of old (when knights were bold ...) the Linux kernel GDT had four main selectors (among others): code and data for ring 0, and code and data for ring 3. The user selectors only covered the first 3 gigabytes of address space, so when a transition was made to ring 0, data segment registers such as DS and ES had to be loaded with the ring 0 selectors for the duration of the system call or interrupt and restored on return. (I am referring to 32-bit systems here.)

Since then the preference has been for all four of these selectors to cover all 4 gigabytes of address space, and the paging system provides the protection for the upper 1 gigabyte. It has occurred to me that if DS, ES, FS, and GS are all set to the ring 3 data selector (and not used for special purposes such as thread local storage) that there will be no need to change these selectors again, ever. Just set 'em and forget 'em.
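
For reference, here is a minimal sketch of how four flat 4 GiB descriptors could be encoded, assuming the standard 32-bit segment descriptor layout. The helper names (gdt_entry, flat_segment) and the macros are made up for illustration and are not taken from the Linux kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Access-byte bits of a code/data segment descriptor. */
#define ACC_PRESENT   0x80u                  /* P: segment is present */
#define ACC_DPL(n)    (((n) & 3u) << 5)      /* descriptor privilege level */
#define ACC_CODE_DATA 0x10u                  /* S: code/data, not a system descriptor */
#define ACC_EXEC      0x08u                  /* set for code segments */
#define ACC_RW        0x02u                  /* readable (code) or writable (data) */

/* Flag nibble: G = 1 (4 KiB granularity), D/B = 1 (32-bit segment). */
#define FLAGS_4K_32BIT 0xCu

/* Pack base, 20-bit limit, access byte and flag nibble into one 8-byte descriptor. */
uint64_t gdt_entry(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFFu);                      /* the limit field is 20 bits wide */
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);               /* limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;        /* base 23:0   */
    d |= (uint64_t)access << 40;                    /* access byte */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;    /* limit 19:16 */
    d |= (uint64_t)(flags & 0xFu) << 52;            /* G, D/B, L, AVL */
    d |= (uint64_t)(base >> 24) << 56;              /* base 31:24  */
    return d;
}

/* A flat segment: base 0, limit 0xFFFFF pages of 4 KiB, i.e. all 4 GiB. */
uint64_t flat_segment(int is_code, unsigned dpl)
{
    uint8_t access = (uint8_t)(ACC_PRESENT | ACC_DPL(dpl) | ACC_CODE_DATA | ACC_RW |
                               (is_code ? ACC_EXEC : 0u));
    return gdt_entry(0, 0xFFFFFu, access, FLAGS_4K_32BIT);
}
```

With this layout the ring 0 and ring 3 descriptors differ only in the DPL bits, which is why paging has to do all of the protecting.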

Of course we still need the ring 0 data selector for SS, but otherwise ring 0 code can read and write all memory through the ring 3 data selector, regardless of the U/S and R/W flags in the page tables (writes to read-only pages are only allowed while CR0.WP is clear). In user mode these same flags will keep ring 3 code from accessing protected memory.

Has anyone tried this? Is there something I'm missing here? I'm about to have my microkernel start multithreading Real Soon Now, I will try this and no longer have my interrupts touch the DS, ES, FS, and GS registers.

Please forgive me if this topic was touched on in the past, but I could not find anything quite like this question.

Re: Set 'em and Forget 'em

Posted: Fri Mar 17, 2017 6:49 pm
by alexfru
SS:ESP will come from the TSS on transition into the kernel. However, the user can populate DS, ES, FS and GS with a null selector and you don't want a #GP in the kernel when accessing memory through null selectors.
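
To make the null selector case concrete: a selector holds a table index, a table-indicator bit and a requested privilege level, and any selector with GDT index 0 counts as null whatever its RPL bits are. A small sketch in C follows; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a selector from its three fields: index in bits 15:3,
   table indicator (0 = GDT, 1 = LDT) in bit 2, RPL in bits 1:0. */
uint16_t make_selector(unsigned index, bool ldt, unsigned rpl)
{
    assert(index < 8192u && rpl <= 3u);
    return (uint16_t)((index << 3) | (ldt ? 4u : 0u) | rpl);
}

/* Descriptor table index encoded in the selector. */
unsigned selector_index(uint16_t sel) { return (unsigned)sel >> 3; }

/* Requested privilege level encoded in the selector. */
unsigned selector_rpl(uint16_t sel) { return (unsigned)sel & 3u; }

/* Null selector: GDT index 0 with any RPL, so the values 0 through 3.
   Loading one into DS, ES, FS or GS is allowed; using it for a memory
   access is what raises #GP. */
bool selector_is_null(uint16_t sel) { return (sel & 0xFFFCu) == 0; }
```

Assuming a GDT where entry 4 is the ring 3 data segment, make_selector(4, false, 3) gives 0x23, and selector_is_null() is false for it but true for 0, 1, 2 and 3.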

Re: Set 'em and Forget 'em

Posted: Fri Mar 17, 2017 8:05 pm
by CelestialMechanic
alexfru wrote: SS:ESP will come from the TSS on transition into the kernel.
True, but the user should provide an initial ring 3 SS:ESP on the kernel stack above EFLAGS, CS:EIP at creation time for any thread meant to operate in user mode.
alexfru wrote: However, the user can populate DS, ES, FS and GS with a null selector and you don't want a #GP in the kernel when accessing memory through null selectors.
I've never heard of this. But I think I will limit myself to sensible values for selectors. I can't help thinking that somewhere down the road NULL selectors will cause problems. Indeed, the reason for the NULL selector was to provide a way for software to trap this error and (possibly) correct it.

Re: Set 'em and Forget 'em

Posted: Fri Mar 17, 2017 10:14 pm
by Brendan
Hi,
CelestialMechanic wrote: Has anyone tried this? Is there something I'm missing here? I'm about to have my microkernel start multithreading Real Soon Now, I will try this and no longer have my interrupts touch the DS, ES, FS, and GS registers.
This is one of my old tricks!
CelestialMechanic wrote:
alexfru wrote: SS:ESP will come from the TSS on transition into the kernel.
True, but the user should provide an initial ring 3 SS:ESP on the kernel stack above EFLAGS, CS:EIP at creation time for any thread meant to operate in user mode.
alexfru wrote: However, the user can populate DS, ES, FS and GS with a null selector and you don't want a #GP in the kernel when accessing memory through null selectors.
I've never heard of this. But I think I will limit myself to sensible values for selectors. I can't help thinking that somewhere down the road NULL selectors will cause problems. Indeed, the reason for the NULL selector was to provide a way for software to trap this error and (possibly) correct it.
Typically the kernel also uses FS or GS for "per CPU" data.

This means:
  • Potentially malicious CPL=3 code can load NULL into DS, ES, FS or GS. When the kernel uses the segment register it causes a general protection fault; and the general protection fault handler can restore the correct value for that segment and return from the general protection fault (to re-try the instruction that caused the exception and continue running normally).
  • Potentially malicious CPL=3 code can load its code segment into DS, ES, FS or GS. When the kernel writes through that segment register it causes a general protection fault (code segments are never writable); and the general protection fault handler can restore the correct value for that segment and return from the general protection fault (to re-try the instruction that caused the exception and continue running normally).
  • Potentially malicious CPL=3 code can load its code segment or its data segment into FS or GS. If the kernel makes sure that there's a "not present" area starting at virtual address 0x00000000 in every process, and also makes sure that all offsets in its "per CPU" data areas are smaller than the size of that "not present" area; then whenever CPL=3 code has loaded one of its own segments into FS or GS the kernel will get a page fault when trying to access the "per CPU" data; and the page fault handler can restore the correct value for FS or GS and return from the page fault (to re-try the instruction that caused the exception and continue running normally).
In this way you never need to load kernel data segments during IRQ handlers and the kernel API, and can just let the kernel "auto-correct" if malicious CPL=3 code tried to mess things up. Because segment register loads are slow (and because most user-space software isn't malicious) this can improve performance.
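
The "auto-correct" step described above could look roughly like the sketch below. It is a simulation in plain C rather than real handler code: the selector values, the seg_regs struct, the choice of GS for "per CPU" data and the function names are all assumptions for illustration, and a real handler would reload the actual segment registers with mov instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical selectors: GDT entry 4 is the flat ring 3 data segment,
   GDT entry 5 is the kernel's "per CPU" data segment. */
#define USER_DATA_SEL 0x23u  /* index 4, RPL 3 */
#define PERCPU_SEL    0x28u  /* index 5, RPL 0 */

/* Stand-in for the live data segment registers at the time of the fault. */
struct seg_regs { uint16_t ds, es, fs, gs; };

/* Reload one register if it does not hold the expected selector;
   report whether it had to be changed. */
static bool restore(uint16_t *reg, uint16_t expected)
{
    if (*reg == expected)
        return false;
    *reg = expected;
    return true;
}

/* Meant to be called from the #GP or #PF handler when the fault hit CPL=0 code.
   Returns true if some segment register was repaired, in which case the
   handler can return and let the faulting instruction run again.
   Returns false if every register was already correct, so the fault has
   some other cause and must be handled as a genuine kernel fault. */
bool fixup_kernel_segments(struct seg_regs *r)
{
    assert(r != 0);
    bool fixed = false;
    fixed |= restore(&r->ds, USER_DATA_SEL);
    fixed |= restore(&r->es, USER_DATA_SEL);
    fixed |= restore(&r->fs, USER_DATA_SEL);
    fixed |= restore(&r->gs, PERCPU_SEL);   /* assumption: GS holds "per CPU" data */
    return fixed;
}
```

The false return matters: without it a real kernel bug would make the handler retry the same faulting instruction forever.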

Note that there are a few restrictions:
  • If you use a segment register for "task local storage" then potentially malicious CPL=3 code can also load its "task local storage" segment into DS, ES, FS or GS. I've always used paging to create "thread specific storage" instead, so I've never had to care about this.
  • If you use virtual8086 mode; then it will break everything (if an IRQ handler interrupts a virtual8086 mode task, then the kernel would end up using "nonsense real mode" values in segment registers). This is one of the reasons I started refusing to use Virtual8086 mode originally (back before UEFI and long mode existed). Fortunately (now that UEFI and long mode do exist) there's even less reason to want to use Virtual8086 mode.
For both of these cases; there's no sane way to avoid loading kernel data segments during IRQ handlers.


Cheers,

Brendan

Re: Set 'em and Forget 'em

Posted: Mon Mar 20, 2017 8:02 am
by onlyonemac
It's common these days to avoid using the segment registers for anything other than changing privilege level when required, and to use paging for memory protection. Paging makes the segmentation model redundant, and if you're using paging the only time you should ever have to change the segment registers is to get around a privilege level limitation: even if you don't use it, the segmentation system still exists and is still active.

Of course, you can always design your OS in another way that uses paging and segmentation alongside each other. I can't quite think of a situation where that would be a good design choice, but it's possible that someone might find an interesting way to use both together.