Breaking ASLR on systems with memory caches

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
User avatar
xenos
Member
Member
Posts: 1118
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia
Contact:

Breaking ASLR on systems with memory caches

Post by xenos »

I was just pointed to an interesting article that I would like to share:

http://www.cs.vu.nl/~herbertb/download/ ... ndss17.pdf

tl;dr: The article presents a method how address space layout randomization, a security feature used to prevent attacks which are based on knowing the location of a program's code and data, can be broken by probing the memory cache. This probing is done in such as way that an attacker figures out which entries from page tables at all levels of the paging hierarchy have been recently fetched (as a consequence of a page walk, when a TLB miss occurs), and this basically tells the attacker for which virtual address the page walk was performed (and this is the information that ASLR is supposed to disguise). This is demonstrated in JavaScript.
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
onlyonemac
Member
Member
Posts: 1146
Joined: Sat Mar 01, 2014 2:59 pm

Re: Breaking ASLR on systems with memory caches

Post by onlyonemac »

I'm not the most clued-up about these things but, in my opinion, relying on ASLR to prevent an exploit is a bit like hiding a file containing all your passwords in a long tree of subdirectories with a false name. In other words, if information is held in a place that's accessible to an attacker, assume that they can access it, and don't think "oh, they won't find this".
When you start writing an OS you do the minimum possible to get the x86 processor in a usable state, then you try to get as far away from it as possible.

Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Breaking ASLR on systems with memory caches

Post by Brendan »

Hi,
XenOS wrote:I was just pointed to an interesting article that I would like to share:
Once upon a time Intel invented a new instruction - "RDTSC". Intel weren't stupid and knew that access to such precise timing would be a security problem, so they also provided a flag in CR4 that an OS can use to prevent CPL=3 code from being able to use the RDTSC instruction, and in the "Volume 3: System Programming Guide" (starting from the very first version that mentioned the time stamp counter) they wrote (highlighting is mine):
Intel in 1995 wrote:A secure operating system would set the TSD flag during system initialization to disable user access to the time stamp counter. An operating system that disables user access to the timestamp counter should emulate the instruction through a user-accessible programming interface.
This same warning (using the exact same wording) has existed in every single version of the Intel manuals since, and still exists today.

Sadly; incompetent morons ignored Intel then and continued to ignore Intel since. The end result is that every 6 months or so (for a period of over 20 years) there's yet another white-paper showing yet another security problem that relies on very precise timing side-channels that could've and should've been impossible.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
SpyderTL
Member
Member
Posts: 1074
Joined: Sun Sep 19, 2010 10:05 pm

Re: Breaking ASLR on systems with memory caches

Post by SpyderTL »

I recently watched a YouTube video on the CCCGen channel about measuring the timings of memory access to both find memory addresses that have been used by the OS, and also to communicate between processes by having one process either access memory or clear cache entries, and the second process measure how long it takes to access the same memory to determine if the access was slow (meaning the bit would be a 1) or fast (meaning the bit would be a 0).

I think they said that they got up to 75Kbps using this approach... Pretty fascinating stuff. Unfortunately, that channel has been (quite recently) shut down by YouTube for breaking TOS, somehow.

EDIT: Found it. https://www.youtube.com/watch?v=iS8YPDuZbKM

Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott
OSwhatever
Member
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Re: Breaking ASLR on systems with memory caches

Post by OSwhatever »

Is there a good reason cache flush instructions should be available to user space at all?
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Breaking ASLR on systems with memory caches

Post by Brendan »

Hi,
OSwhatever wrote:Is there a good reason cache flush instructions should be available to user space at all?
CLFLUSH (and the non-temporal moves that came later) are important for performance in cases where the data you touched recently won't be data you need again soon (especially when it's a lot of data you won't need again soon). They're an effective work around for pathological cases caused by the cache's "least recently used eviction" algorithm that are needed to avoid potentially severe performance problems.

There are also a few other uses - mostly, in code that benchmarks RAM chip latency and/or code that tests if RAM is faulty; where you don't want to accidentally test the CPUs cache and not test RAM.

If CLFLUSH was only allowed at CPL=0 and you had a kernel API function to do it instead, then the only thing it'd do is make good code much slower (because it's mostly used inside loops in performance critical code that processes a large amount of data, the overhead of an exception of kernel API call would be disastrous); and bad/malicious code would just use the kernel API function. If you didn't allow CLFLUSH (or non-temporal moves) at all (disabled with no alternative provided by kernel) you'd be stuck with the potentially severe performance problems (and it'd be harder to benchmark or test RAM).

Note that there is no actual security problem with CLFLUSH itself. There's a security problem caused by dodgy RAM chips ("rowhammer") where even good/non-malicious software can cause unintended RAM corruption; and there's a problem with "too precise timing" that leads to all kinds of timing side-channels (including data caches, and instruction caches, and TLBs, and shared resources in "attacker and victim sharing core" SMT cases, and ....).

The simple solution is for the kernel to disable RDTSC (and RDTSCP) at CPL=3 and either emulate them (for backward compatibility) or provide kernel API function/s. This allows kernel to nerf precision and/or nerf accuracy (and make a lot of timing side-channels far less practical). It also allows kernel to avoid the "unsynchronised time stamp on different CPUs" problem and/or allows kernel to provide a more convenient calibrated time stamp (that doesn't depend on CPU speed - e.g. always "microseconds since something" for all CPUs in all power saving states).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Sik
Member
Member
Posts: 251
Joined: Wed Aug 17, 2016 4:55 am

Re: Breaking ASLR on systems with memory caches

Post by Sik »

Brendan wrote:Note that there is no actual security problem with CLFLUSH itself. There's a security problem caused by dodgy RAM chips ("rowhammer") where even good/non-malicious software can cause unintended RAM corruption; and there's a problem with "too precise timing" that leads to all kinds of timing side-channels (including data caches, and instruction caches, and TLBs, and shared resources in "attacker and victim sharing core" SMT cases, and ....).
That was my thought when I first heard about it, although rather than blame the DRAM chips themselves I'd blame the DRAM controller instead. Under absolutely no circumstance should any memory write interfere with memory refresh, yet that's exactly what seemed to be happening (hence the RAM corruption).
azblue
Member
Member
Posts: 147
Joined: Sat Feb 27, 2010 8:55 pm

Re: Breaking ASLR on systems with memory caches

Post by azblue »

There are a couple questions I have arising from the OP and some of the comments:

1. Does this mean we shouldn't bother with ASLR? (Isn't it almost "security through obscurity" anyway?)

2. Regarding timing attacks, how precise can timing be before it becomes a security concern? This is something I've wondered for quite a while.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Breaking ASLR on systems with memory caches

Post by Brendan »

Hi,
azblue wrote:There are a couple questions I have arising from the OP and some of the comments:

1. Does this mean we shouldn't bother with ASLR? (Isn't it almost "security through obscurity" anyway?)
First; "security through obscurity" is where you're relying on the attacker not being able to figure out how the security works. If the attacker knows exactly how the security works but doesn't know something else (a password, an encryption key, a seed for a random number generator, etc) then it's not considered "security through obscurity".

It's best to think of security as many layers, where an attacker has to penetrate through multiple layers to succeed. The question is whether or not ASLR is a worthwhile additional layer - do the benefits (for security) justify the disadvantages (for code complexity, performance, etc). There's no right answer - it's more a question of design goals.

Note that for 32-bit 80x86 usually you can't get enough entropy for ASLR to be very effective (the "random" addresses end up with the lowest 12 bits known due to page alignment and several upper bits known due splitting the virtual address space into zones; and that only leaves about 18 "random middle bits"); and because the CPU doesn't support efficient "IP relative addressing" for 32-bit code there's a performance cost involved with relocated code and data. This means that ASLR is closer to "not worthwhile" for 32-bit 80x86. For 64-bit 80x86 it's different - there are more addressing bits than can be random, and there is support for "IP relative addressing" and less performance cost involved; so it's "more worthwhile" for 64-bit 80x86 than it would be for 32-bit 80x86.
azblue wrote:2. Regarding timing attacks, how precise can timing be before it becomes a security concern? This is something I've wondered for quite a while.
Assume you have an "uncertainty value" like this:

Code: Select all

    do {
        last_tsc = rdtsc();
    } while(last_tsc % uncertainty != 0);
    return last_tsc/uncertainty;
I'd expect that an "uncertainty value" of a few hundred cycles would be enough to make half of the attacks (those that rely on cache timing or "shared resource in SMT timing") go from "vaguely plausible" to "impractical". I'd also be tempted to make the amount of uncertainty proportional to how often software asks for the time (e.g. 200 cycles of uncertainty if you ask for the TSC infrequently, and 999999 cycles of uncertainty if you ask for TSC millions of times per second).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Korona
Member
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm
Contact:

Re: Breaking ASLR on systems with memory caches

Post by Korona »

Brendan wrote:I'd expect that an "uncertainty value" of a few hundred cycles would be enough to make half of the attacks (those that rely on cache timing or "shared resource in SMT timing") go from "vaguely plausible" to "impractical". I'd also be tempted to make the amount of uncertainty proportional to how often software asks for the time (e.g. 200 cycles of uncertainty if you ask for the TSC infrequently, and 999999 cycles of uncertainty if you ask for TSC millions of times per second).
Note that there are programs that legitimately ask for TSC 10s - 100s of thousands times per second. The overhead of a syscall would totally kill those applications.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Breaking ASLR on systems with memory caches

Post by Brendan »

Hi,
Korona wrote:
Brendan wrote:I'd expect that an "uncertainty value" of a few hundred cycles would be enough to make half of the attacks (those that rely on cache timing or "shared resource in SMT timing") go from "vaguely plausible" to "impractical". I'd also be tempted to make the amount of uncertainty proportional to how often software asks for the time (e.g. 200 cycles of uncertainty if you ask for the TSC infrequently, and 999999 cycles of uncertainty if you ask for TSC millions of times per second).
Note that there are programs that legitimately ask for TSC 10s - 100s of thousands times per second. The overhead of a syscall would totally kill those applications.
Which applications, and why?


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
willedwards
Member
Member
Posts: 96
Joined: Sat Mar 15, 2014 3:49 pm

Re: Breaking ASLR on systems with memory caches

Post by willedwards »

azblue wrote:There are a couple questions I have arising from the OP and some of the comments:

1. Does this mean we shouldn't bother with ASLR? (Isn't it almost "security through obscurity" anyway?)

2. Regarding timing attacks, how precise can timing be before it becomes a security concern? This is something I've wondered for quite a while.
1. You should absolutely bother with exploit mitigation in general and ASLR in particular.

https://www.openbsd.org/papers/auug04/mgp00001.html is old but still excellent background.

You can harden your exploit mitigation further by W^X, even for JITed code, having an at-install-time or at-each-load-time code order randomizer and so on. I think I heard that apple and google are now doing the latter, e.g. apple for watch apps.

There's no substitute for strong memory protection and carefully audited kernel APIs. But you absolutely want exploit mitigation too.

Re 2 iirc they demonstrated a pure-Javascript timer loop that worked for this attack.
Korona
Member
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm
Contact:

Re: Breaking ASLR on systems with memory caches

Post by Korona »

Brendan wrote:Which applications, and why?
For example if you're polling (IRQs do not make sense at high I/O rates) a low-latency link (e.g. Infiniband) and want to keep track of RTT or wait times. I'm sure there are more examples in real-time applications, this is just one I know of.

EDIT: Another example would be performance measurements in HPC applications like SAT or ILP solvers or physics/weather/fluid/whatever simulation.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Breaking ASLR on systems with memory caches

Post by Brendan »

Hi,
Korona wrote:
Brendan wrote:Which applications, and why?
For example if you're polling (IRQs do not make sense at high I/O rates) a low-latency link (e.g. Infiniband) and want to keep track of RTT or wait times. I'm sure there are more examples in real-time applications, this is just one I know of.

EDIT: Another example would be performance measurements in HPC applications like SAT or ILP solvers or physics/weather/fluid/whatever simulation.
That doesn't really sound like enough to matter; given that NIC drivers are drivers (which tend to be "more trusted" and could or would be exempt) and the rest don't need precision at all (e.g. benchmark how much work they can do in a 5 day period and you can use a sundial instead ;-) ).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Rusky
Member
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: Breaking ASLR on systems with memory caches

Post by Rusky »

Brendan wrote:That doesn't really sound like enough to matter; given that NIC drivers are drivers (which tend to be "more trusted" and could or would be exempt) and the rest don't need precision at all (e.g. benchmark how much work they can do in a 5 day period and you can use a sundial instead ;-) ).
Low-latency networking tends to be done in userspace, not in a driver; performance measurements that only measure overall throughput are useful but insufficient.

Someone's always going to be able to come up with a good reason to want high-precision timing information. The solution is not to get rid of it, but to use a well-designed execution environment with enough mitigations that it becomes impractical to take advantage of timing for malicious purposes.
Post Reply