OSDev.org

Posted: **Wed Nov 28, 2018 10:43 am**

Hi all,

Laughing at the misfortune of others is always cathartic, but it doesn't improve the situation. I recently looked at the CPU Bugs wiki page and noticed that most entries list no mitigations at all, so I thought maybe we should add them.

To start, here are the mitigations I know:

1. ESP not cleared

When returning from 64-bit or 32-bit mode to 16-bit mode (that is, IRET to a 16-bit stack segment), bits 31:16 of ESP aren't cleared; they keep whatever the kernel's stack pointer held. Bits 63:32 probably aren't cleared either, but you can't see those from 16-bit mode. Linux's solution is to leak those bits with pride! That is to say, it creates a special tiny stack per CPU ("espfix") and switches to it before returning to anything that might be a 16-bit stack segment. That tiny stack (it needs only 48 bytes in 64-bit mode) is mapped read-only, since the only thing the kernel does with it is write the IRET frame to it (through a separate, writable alias address) and then IRET. If an exception arrives while the kernel is on that stack, pushing the exception frame immediately causes a page fault, and pushing that one causes a double fault. #DF is on an IST stack, so it can still be delivered. The #DF handler recognizes the situation and sets things up so it looks like a general protection fault came in from userspace.

This way, the bits still leak, but they only identify the CPU the code is running on, not the actual kernel stack address. And since the stacks are so small (4096 / 48 ≈ 85 of them fit in one page) and their location is randomized at boot, the leaked bits would only differ between CPUs if the machine has more than 85 CPUs.

I just noticed: that means 16-bit userspace can never use the top 16 bits of ESP. That is bad, because userspace can usually do whatever it wants with the registers, and the kernel needn't care. You could use BX as a stack pointer (and ESP for something else) if you don't use the PUSH or CALL instructions. But here that's not possible.
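To make the leak concrete, here is a small C model of what the 16-bit code ends up seeing. It only models the register arithmetic after IRET; it is not kernel code, and the function names are made up for illustration. Without espfix, the value passed in is the real kernel stack pointer; with espfix it would be the per-CPU espfix stack address instead. For example, 85 espfix stacks of 48 bytes starting at a 64 KiB-aligned address all give the same leaked_bits() value, since 85 × 48 = 4080 < 65536.

```c
#include <stdint.h>

/*
 * Model of the "ESP not cleared" bug: IRET to a 16-bit stack segment only
 * loads SP (bits 15:0) from the IRET frame. Bits 31:16 of ESP keep whatever
 * the kernel's stack pointer held when it executed IRET.
 */
static uint32_t esp_seen_by_16bit_code(uint64_t kernel_rsp_at_iret, uint16_t user_sp)
{
    return (uint32_t)(kernel_rsp_at_iret & 0xFFFF0000u) | user_sp;
}

/* The only information that leaks: bits 31:16 of the stack address
 * the kernel was on (the real kernel stack, or an espfix stack). */
static uint16_t leaked_bits(uint64_t stack_addr)
{
    return (uint16_t)((stack_addr >> 16) & 0xFFFFu);
}
```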

2. SYSRET with non-canonical address

Intel CPUs have a problem with SYSRET to a non-canonical return address. The description in the wiki is a bit vague; can someone expand?

The easiest fix is probably to never allow an executable page to be mapped directly below the canonical address boundary. The only way I can think of for the problem to happen is a SYSCALL instruction at 0x00007ffffffffffe (SYSCALL is 2 bytes long, so the return address would be the non-canonical 0x0000800000000000), which is an error anyway. So you could also recognize the error in the syscall handler and deliver a SIGSEGV or similar.
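A sketch of the "recognize it in the syscall handler" variant: check the user return address (SYSRET takes RIP from RCX) before using SYSRET, and take a different exit path if it isn't canonical. This assumes 48-bit virtual addresses (4-level paging), and the function names are hypothetical. As far as I know, Linux does something similar and falls back to IRET in that case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumes 48-bit virtual addresses (4-level paging): an address is
 * canonical if bits 63:47 are all 0 or all 1. With 5-level paging the
 * boundary moves up to bit 56.
 */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFFu;
}

/*
 * Hypothetical exit-path check. SYSRET loads RIP from RCX, so only use it
 * when that value is canonical. Otherwise, either deliver SIGSEGV right
 * away or leave through IRET, whose #GP on a bad RIP is taken while still
 * on the kernel stack.
 */
static bool can_use_sysret(uint64_t user_rip_in_rcx)
{
    return is_canonical48(user_rip_in_rcx);
}
```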

3. SS selector

Apparently, AMD CPUs don't update the SS descriptor cache correctly on SYSRET, which is a problem if you SYSRET after a task switch out of an interrupt handler. One possible mitigation would be to IRET out of every syscall that was interrupted. Alternatively, explicitly save and reload SS on every task switch, even in 64-bit mode.
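The second option could look roughly like the following sketch. It relies on the fact that in 64-bit mode an interrupt that changes CPL leaves a null selector in SS, which is exactly the state the buggy SYSRET fails to clean up. The selector value and function name are hypothetical, and the actual SS load needs inline assembly, so it only appears in a comment. I believe Linux does essentially this in its context-switch code on affected AMD CPUs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical GDT selector of the kernel data segment; use your own. */
#define KERNEL_DS_SEL 0x10

/*
 * Decide whether the task-switch path must reload SS.
 * In 64-bit mode, an interrupt that changes CPL leaves SS holding a null
 * selector. On affected AMD CPUs, a later SYSRET loads the user SS selector
 * but keeps the stale (unusable) descriptor attributes, so make sure SS
 * holds a real kernel data selector during the task switch.
 * Returns the selector to load, or 0 if no reload is needed.
 */
static uint16_t ss_to_load(bool cpu_has_sysret_ss_bug, uint16_t current_ss)
{
    if (cpu_has_sysret_ss_bug && current_ss != KERNEL_DS_SEL)
        return KERNEL_DS_SEL;
    return 0;
}

/*
 * In the kernel's switch code this would be used roughly like this
 * (GCC inline assembly; read_ss() is a hypothetical "mov %%ss, %0" helper):
 *
 *     uint16_t sel = ss_to_load(bug, read_ss());
 *     if (sel)
 *         __asm__ volatile("mov %0, %%ss" : : "r"(sel));
 */
```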

4. PUSH selector

I actually found that one documented on felixcloutier.com. In 32-bit mode on Intel CPUs, if a segment selector is pushed onto the stack for any reason (be it a PUSH instruction or the implicit push during an interrupt), only a 16-bit move to memory is used, so the high 16 bits of that stack slot are garbage. The obvious mitigation is to treat only the low 16 bits of any such slot as significant, which is good practice anyway.
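As an illustration of why this matters, here is a sketch that compares a saved CS slot against the user code selector. The selector value and function names are hypothetical; the point is that comparing the whole 32-bit slot can fail even though the selector itself matches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user code selector (GDT index 3, RPL 3); use your own. */
#define USER_CS_SEL 0x1B

/* Only bits 15:0 of a stack slot holding a pushed selector are meaningful;
 * bits 31:16 may contain whatever was in memory before the push. */
static uint16_t slot_selector(uint32_t slot)
{
    return (uint16_t)(slot & 0xFFFFu);
}

/* Wrong: compares the whole 32-bit slot, so stale high bits break it. */
static bool from_user_cs_buggy(uint32_t cs_slot)
{
    return cs_slot == USER_CS_SEL;
}

/* Right: compare only the selector itself. */
static bool from_user_cs(uint32_t cs_slot)
{
    return slot_selector(cs_slot) == USER_CS_SEL;
}
```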

5. Nesting of NMI interrupts

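And what fun we had with this one. The obvious mitigation is to not put the NMI handler on an IST stack. That, however, rules out SYSCALL, unless you hope really hard that no NMI arrives between the syscall entry point and the moment the kernel stack is set up: SYSCALL does not switch stacks, so until the entry code loads the kernel stack pointer, an NMI without IST would be delivered on the user stack.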

6. F00F bug

The wording on the wiki page is a bit weird, but I think they mean the mitigation is to map the page containing the IDT entry for #UD as uncacheable or write-through.
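For comparison, a different mitigation, which as far as I know is what Linux uses on affected Pentiums: map the IDT read-only. The locked access to the #UD descriptor then causes a page fault instead of a lockup, and the page fault handler converts a fault on that descriptor into an invalid-opcode exception. The check in the #PF handler could look like this (32-bit IDT with 8-byte gates; the function name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

#define IDT_GATE_SIZE 8u   /* 32-bit protected-mode gate descriptor */
#define VECTOR_UD     6u   /* invalid opcode */

/*
 * Called from the #PF handler with the faulting address (CR2) and the
 * linear base address of the read-only mapped IDT. Returns true if the
 * fault hit the #UD gate, i.e. it was the F00F locked access; the caller
 * should then handle it as an invalid-opcode exception instead.
 */
static bool is_f00f_fault(uint32_t fault_addr, uint32_t idt_base)
{
    if (fault_addr < idt_base)
        return false;
    return (fault_addr - idt_base) / IDT_GATE_SIZE == VECTOR_UD;
}
```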

7. FDIV bug

If you actually still care about this one (all CPUs with >120MHz are unaffected), you could just emulate the coprocessor on the affected machines. Maybe even emulate it with itself: there is no way to get an exception only on FDIV, but with CR0.EM set, every x87 instruction raises #NM (device not available). Unless the trapping instruction is FDIV, you can just execute it in kernel space (with EM cleared for the duration). For FDIV you calculate the result in software.

Or alternatively disable the FPU entirely on the affected machines.

8. Meltdown

As I understand it, the workaround on the affected machines (they will patch this in hardware later, right?) is to move all the kernel entry points (syscall, interrupts, maybe call gates) and the "current process" descriptor into special sections of their own. Every process then has two different CR3 values (and the corresponding page tables): one that contains the entire kernel mapping (as usual), and one in which only these special entry sections and the entire userspace are mapped. On entry, CR3 has to be loaded with the value for the full kernel mapping, and on exit to userspace it has to be loaded with the value for the partial mapping. This way, out-of-order execution in userspace can't access kernel memory at all, since the kernel simply isn't mapped (the entries are marked "not present") in the page tables that are active there.
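The CR3 juggling can be sketched like this. It ignores PCID (which a real implementation would probably use to avoid flushing the TLB on every switch) and leaves out the actual `mov %cr3` in the entry stubs; the function names are hypothetical, and the 8 KiB pair layout is an assumption based on what I understand Linux does.

```c
#include <stdint.h>

/*
 * Assumption (the layout Linux uses, as far as I know): each process has
 * its two top-level page tables allocated as one 8 KiB-aligned pair, the
 * full kernel+user table first and the user-only table 4 KiB after it.
 * Switching between them then only needs bit 12 of CR3 to be toggled.
 */
#define PTI_USER_BIT (1ull << 12)

/* CR3 value to load right before returning to userspace:
 * only the entry sections and userspace are mapped. */
static uint64_t user_cr3(uint64_t full_cr3)
{
    return full_cr3 | PTI_USER_BIT;
}

/* CR3 value to load first thing on kernel entry (syscall,
 * interrupt, exception): the full mapping, kernel included. */
static uint64_t kernel_cr3(uint64_t any_cr3)
{
    return any_cr3 & ~PTI_USER_BIT;
}
```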

All right, that's about all I know about these. For many other bugs, the description is sparse and no mitigation is listed at all. What about you guys?

P.S.: Does anyone have an idea how to format this post so it looks more structured?

Posted: **Wed Nov 28, 2018 11:11 am**

nullplan wrote: 1. ESP not cleared

Most of the OSs developed by users of this forum are either entirely 16-bit real mode or entirely 32/64-bit and don't support 16-bit code at all... That being said, shouldn't it be possible to return to 16-bit usermode via 16-bit kernelmode and clear those upper bits along the way?

nullplan wrote: I just noticed: that means 16-bit userspace can never use the top 16 bits of ESP. That is bad, because userspace can usually do whatever it wants with the registers, and the kernel needn't care. You could use BX as a stack pointer (and ESP for something else) if you don't use the PUSH or CALL instructions. But here that's not possible.

That's fine: in "canonical" 16-bit modes, the stack segment (like all segments) can't be more than 64KB anyway. It's only a problem if you're running 16-bit code with a 32-bit stack segment, which, while some OSs did it in the 1990s, isn't something there's any reason to do nowadays.

nullplan wrote: 6. F00F bug
7. FDIV bug

It's probably easiest to just ignore these, or possibly add some detection code followed by panic("Unsupported buggy CPU") or similar. BTW, Pentium CPUs of any speed manufactured after (approximately) early 1995 are not affected by the FDIV bug, including those rated below 120 MHz... Remember that pre-Pentium and non-Intel (e.g. AMD or Cyrix) CPUs weren't affected at all; buggy CPUs were only widely available for 2-3 years and probably never exceeded 20% of the market.
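For the FDIV bug, detection code along the lines of the well-known check could look like this; has_fdiv_bug() is a hypothetical name. Note that it only exercises FDIV if the compiler emits x87 instructions (e.g. a 32-bit build without SSE), which is what matters on Pentium-era hardware; on x86-64 the division is done with SSE and the check tells you nothing.

```c
#include <stdbool.h>

/*
 * Classic FDIV check: 4195835 / 3145727 is one of the well-known divisions
 * the flawed Pentium gets wrong. On a correct FPU, x - (x / y) * y is
 * (almost exactly) 0; on an affected Pentium it comes out as 256.
 * volatile keeps the compiler from folding the division at compile time,
 * which would test the build machine instead of the CPU we run on.
 */
static bool has_fdiv_bug(void)
{
    volatile double x = 4195835.0;
    volatile double y = 3145727.0;
    double r = x - (x / y) * y;

    if (r < 0)
        r = -r;
    return r >= 1.0;   /* tolerate tiny rounding error, catch the 256 */
}
```

At boot you would then do something like `if (has_fdiv_bug()) panic("Unsupported buggy CPU");`, with panic() being your kernel's own.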

Posted: **Wed Nov 28, 2018 12:59 pm**

mallard wrote: Most of the OSs developed by users of this forum are either entirely 16-bit real mode or entirely 32/64-bit and don't support 16-bit code at all... That being said, shouldn't it be possible to return to 16-bit usermode via 16-bit kernelmode and clear those upper bits along the way?

Well, in that case, why bring up the bug at all? That is the way I'm going to go, after all: only allowing 64-bit code and never looking back. But it is still interesting to think about solutions to this problem.

Switching the kernel into 16-bit mode has its own problems. You would probably have to run that code in ring 1, so that the CPU switches stacks if anything happens, and I couldn't figure out whether that means reserving a page or two in the low 1MB of virtual address space for the kernel, which would be inconvenient for the user. Going from 2 rings to 3 rings also opens its own can of worms. And it offers no further benefit: the best you can do is clear the upper 16 bits of ESP and make them useless that way, but the espfix stack already makes those bits useless to an attacker. As far as I can tell, you can't preserve the old bits that used to be in there either, since only the low 16 bits of ESP are restored.

mallard wrote: That's fine: in "canonical" 16-bit modes, the stack segment (like all segments) can't be more than 64KB anyway. It's only a problem if you're running 16-bit code with a 32-bit stack segment, which, while some OSs did it in the 1990s, isn't something there's any reason to do nowadays.

My point was that userspace might be using ESP for something other than holding the stack pointer. I don't know why you would do that, but making sense has never been a requirement for userspace code. Especially not Win16 code, which seems to be the main reason to run 16-bit code at all.

Posted: **Fri Nov 30, 2018 2:55 am**

Why not link to the official errata documents (Intel's specification updates and AMD's revision guides)? There are a lot of CPU bugs out there, but most of them have workarounds documented by Intel and AMD. The wiki can add its own descriptions for cases where those documents are unclear or incomplete.

CPU Bug mitigations
