Oh boy this thread exploded while I was sleeping.
~ wrote:Wouldn't it be enough to just disable the CPU cache entirely, at least for security-critical machines, since that's easy to configure as the default? Or invalidate the entire cache every time we switch/enter/exit/terminate/create a process or thread?
Disable the cache and you will end up with a computer that runs about as fast as the average PC in the '90s. RAM is slow.
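For anyone wondering what "disabling the cache" would even look like: on x86 it boils down to setting CR0.CD and flushing. A minimal kernel-mode sketch in GNU C (the helper name is mine):
[code]
#include <stdint.h>

#define CR0_CD (1UL << 30)  /* cache disable */
#define CR0_NW (1UL << 29)  /* not write-through */

/* Hypothetical helper: stop new cache fills, then flush what's there.
 * CD=1, NW=0 keeps writes coherent while preventing new fills. */
static inline void disable_caches(void)
{
    unsigned long cr0;
    __asm__ volatile("mov %%cr0, %0" : "=r"(cr0));
    cr0 |= CR0_CD;
    cr0 &= ~CR0_NW;
    __asm__ volatile("mov %0, %%cr0" :: "r"(cr0) : "memory");
    __asm__ volatile("wbinvd" ::: "memory");  /* write back + invalidate all caches */
}
[/code]
After that, every single memory access goes to DRAM - hence the '90s-PC performance.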
Brendan wrote:In theory, managed languages could work (by making it impossible for programmers to generate code that tries to access kernel space); but quite frankly every "managed language" attempt that's ever hit production machines has had so many security problems that it's much safer to assume that a managed language would only make security worse (far more code needs to be trusted than kernel alone), and the performance is likely to be worse than a PTI approach (especially for anything where performance matters).
Considering we have Rowhammer working from JavaScript (a language that is literally compiled at load time, and where the result may vary depending on the particular browser version)? Managed languages are not going to save you when even higher-level ones can be used to exploit low-level bugs. (EDIT: and now that I read more, Spectre apparently works from JavaScript too?)
So yeah, that idea is dead on arrival.
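To make it concrete: the primitive all of these attacks lean on is just "a cached load is fast, an uncached load is slow". Something like this (x86 GNU C sketch; the function names are mine):
[code]
#include <stdint.h>

/* Serialize, then read the timestamp counter. */
static inline uint64_t rdtsc_fenced(void)
{
    uint32_t lo, hi;
    __asm__ volatile("lfence\n\trdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Returns the cycle cost of loading *p: small if cached, large if not.
 * That difference is the covert channel. */
static uint64_t time_access(volatile uint8_t *p)
{
    uint64_t t0 = rdtsc_fenced();
    (void)*p;
    uint64_t t1 = rdtsc_fenced();
    return t1 - t0;
}
[/code]
JavaScript has no rdtsc, but a tight increment loop in a worker reportedly makes a good-enough clock - which is part of why the managed-language barrier doesn't help.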
Brendan wrote:For "make kernel pages inaccessible" it doesn't necessarily need to be all kernel pages. Pages that contain sensitive information (e.g. encryption keys) would need to be made inaccessible, but pages that don't contain sensitive information don't need to be made inaccessible. This gives 2 cases.
If PCID can't be used; then you could separate everything into "sensitive kernel data" and "not sensitive kernel data" and leave all of the "not sensitive kernel data" mapped in all address spaces all the time to minimise the overhead. For a monolithic kernel (especially a pre-existing monolithic kernel) it'd be almost impossible to separate "sensitive" and "not sensitive" (because there's all kinds of drivers, etc. to worry about) and it'd be easy to overlook something; so you'd mostly want a tiny stub where almost everything is treated as "sensitive" to avoid the headaches. For a micro-kernel it wouldn't be too hard to distinguish between "sensitive" and "not sensitive", and it'd be possible to create a micro-kernel where everything is "not sensitive", simply because there's very little in the kernel to begin with. The performance of a micro-kernel would be much less affected, or not affected at all; closing the performance gap between micro-kernel and monolithic, and potentially making micro-kernels faster than monolithic kernels.
Note: For this case, especially for monolithic kernels, if you're paying for the TLB thrashing anyway then it wouldn't take much more to have fully separated virtual address spaces, so that both user-space and kernel-space can be larger (e.g. on a 32-bit CPU, let user-space have almost 4 GiB of space and let the kernel have a separate 4 GiB of space).
If PCID can be used (which excludes 32-bit OSs); then the overhead of making kernel pages inaccessible is significantly less. In this case, if nothing in the kernel is "sensitive" you can do nothing, and if anything in the kernel is "sensitive" you'd probably just use PCID to protect everything (including the "not sensitive" data). In practice this probably means that monolithic kernels and some micro-kernels are affected; but a "100% not sensitive micro-kernel" wouldn't be affected.
In other words; it reduces the performance gap between some micro-kernels and monolithic kernels, but not all micro-kernels, and probably not enough to make any micro-kernels faster than monolithic kernels.
How do you get a non-sensitive kernel? Even in the smallest designs, the kernel is the one in charge of assigning memory to each process, and that's probably the most sensitive part of the whole system, since the kernel is the one granting permissions to everything else. The kernel itself may not hold the sensitive information, but messing with it can open the gates to accessing that sensitive information elsewhere.
Mind you, it's possible I'm misunderstanding or overlooking something in what you're saying.
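For reference on the PCID case: the whole point is that you can switch CR3 without nuking the TLB, because translations are tagged with the process-context ID. Roughly this (a sketch; it assumes CR4.PCIDE is already enabled, and the names are mine):
[code]
#include <stdint.h>

#define CR3_NOFLUSH (1ULL << 63)  /* keep this PCID's TLB entries alive */

/* Hypothetical helper: load a page table tagged with a 12-bit PCID.
 * pml4_phys must be the 4 KiB-aligned physical address of the PML4. */
static inline void switch_address_space(uint64_t pml4_phys, uint16_t pcid)
{
    uint64_t cr3 = pml4_phys | (pcid & 0xFFFu) | CR3_NOFLUSH;
    __asm__ volatile("mov %0, %%cr3" :: "r"(cr3) : "memory");
}
[/code]
So each process keeps two CR3 values (kernel view and user view) and the entry/exit path hops between them cheaply - that's the "significantly less" overhead Brendan is talking about.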
Brendan wrote:The other thing I'd want to mention is that for all approaches and all kernel types (but excluding "kernel given completely separate virtual address space so that both user-space and kernel-space can be larger"), the kernel could distinguish between "more trusted" processes and "less trusted" processes and leave the kernel mapped (and avoid PTI overhead) when "more trusted" processes are running. In practice this means that if the OS supports (e.g.) digitally signed executables (and is therefore able to associate different amounts of trust depending on the existence of a signature and depending on who the signer was) then it may perform far better than an OS that doesn't. This makes me think that various open source groups that shun things like signatures (e.g. GNU) may end up penalised on a lot of OSs (possibly including future versions of Linux).
The problem is that software is buggy (programmers are not perfect). Your executable may come from a trusted source and still contain an exploitable bug that can be used to attack the machine from an outside vector (e.g. incoming data).
Digitally signing the executable is only useful to (supposedly reliably) know that the executable is the one you were meant to get; it says nothing about its actual safety.
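For concreteness, what Brendan describes would presumably look something like this on the return-to-user path (pure sketch; the structure and field names are made up):
[code]
#include <stdint.h>

struct process {
    int trusted;          /* e.g. the executable's signature checked out */
    uint64_t full_cr3;    /* kernel + user mapped (no PTI cost) */
    uint64_t shadow_cr3;  /* user + tiny kernel stub only (PTI) */
};

/* Sketch: only untrusted processes pay for the CR3 switch. */
static inline void prepare_return_to_user(const struct process *p)
{
    uint64_t cr3 = p->trusted ? p->full_cr3 : p->shadow_cr3;
    __asm__ volatile("mov %0, %%cr3" :: "r"(cr3) : "memory");
}
[/code]
And my objection applies exactly there: the moment a "trusted" process gets compromised through one of its bugs, it's running with the kernel fully mapped.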
Korona wrote:There is no absolutely effective software defense against Spectre. We would need ISA updates (e.g. an instruction that invalidates speculative state like the branch prediction buffer). The PoC does not even depend on RDTSC and can read Chrome's address space from JavaScript.
The ISA itself is fine; the problem is the implementation. From what I gather, it affects every CPU with out-of-order execution by definition (it's a timing attack), so pretty much every CPU from the '90s onwards. Yeowch. It's true that a different ISA would help make up for the lack of OoO if we went down that route, but even then it'd be pretty bad.
Honestly with timing attacks it's often better to make them useless than to try to prevent them.
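"Making them useless" mostly means taking away the attacker's clock. A sketch of the kind of coarsening browsers applied to performance.now() after Spectre (the resolution value is just an example):
[code]
#include <stdint.h>
#include <stdlib.h>

/* Degrade a high-resolution timestamp before handing it to untrusted
 * code: round down to a coarse bucket, then add random jitter so an
 * attacker can't sit on a bucket edge to recover precision. */
static uint64_t fuzzed_time_ns(uint64_t raw_ns)
{
    const uint64_t res = 100000;              /* 100 us buckets (example) */
    uint64_t coarse = raw_ns - (raw_ns % res);
    uint64_t jitter = (uint64_t)rand() % res;
    return coarse + jitter;
}
[/code]
It's not airtight (attackers can build their own clocks out of counting loops), but it raises the bar a lot.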
Korona wrote:It seems that the Spectre exploit can be mitigated on Intel by replacing all indirect jumps with the sequence
*snip*
... which is ugly at best and also somewhat inefficient: it introduces two jumps AND prevents branch prediction. It seems that GCC will be patched to use this sequence. People who wrote their OSes in assembly: Have fun fixing your jumps.
For calls it gets even uglier as you need a call to a label to push the current RIP before you jump to the trampoline.
And then an attacker will just supply their own code that performs the exploit, making that moot =P
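For anyone curious about the snipped sequence: it's the trick that got named "retpoline". My reconstruction of the idea, as a top-level asm blob in GNU C (label names mine):
[code]
/* An indirect jump to *%rax, done via a ret so the indirect branch
 * predictor is never consulted. Speculation from the ret is parked
 * in the pause/lfence capture loop instead. */
__asm__(
    ".globl indirect_thunk_rax\n"
    "indirect_thunk_rax:\n"
    "  call 2f\n"            /* pushes the address of the capture loop */
    "1:\n"
    "  pause\n"
    "  lfence\n"
    "  jmp 1b\n"             /* speculative execution spins here, harmlessly */
    "2:\n"
    "  mov %rax, (%rsp)\n"   /* overwrite return address with real target */
    "  ret\n");              /* architecturally jumps to *%rax */
[/code]
For a call site you'd "call" the thunk itself so the real return address gets pushed first - which is the extra ugliness Korona mentions.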
~ wrote:Now that I think about it, this problem could also be greatly mitigated if most of a program's data were kept on disk and only a portion of it loaded at a time. Memory usage would be much lower, and storing most data structures mainly on disk would make hitting enough private data (enough to be usable or recognizable) too difficult and infrequent; so storing all user data on disk could also be an option.
Then you have the same performance problem as disabling caches, but a tad worse. And as a bonus, you put so much extra wear on the drive that it will stop working much sooner (which is a potentially even bigger problem - you've pretty much just DoS'd the hardware!).