Breaking ASLR on systems with memory caches
- xenos
- Member
- Posts: 1118
- Joined: Thu Aug 11, 2005 11:00 pm
- Libera.chat IRC: xenos1984
- Location: Tartu, Estonia
- Contact:
Breaking ASLR on systems with memory caches
I was just pointed to an interesting article that I would like to share:
http://www.cs.vu.nl/~herbertb/download/ ... ndss17.pdf
tl;dr: The article presents a method how address space layout randomization, a security feature used to prevent attacks which are based on knowing the location of a program's code and data, can be broken by probing the memory cache. This probing is done in such as way that an attacker figures out which entries from page tables at all levels of the paging hierarchy have been recently fetched (as a consequence of a page walk, when a TLB miss occurs), and this basically tells the attacker for which virtual address the page walk was performed (and this is the information that ASLR is supposed to disguise). This is demonstrated in JavaScript.
http://www.cs.vu.nl/~herbertb/download/ ... ndss17.pdf
tl;dr: The article presents a method how address space layout randomization, a security feature used to prevent attacks which are based on knowing the location of a program's code and data, can be broken by probing the memory cache. This probing is done in such as way that an attacker figures out which entries from page tables at all levels of the paging hierarchy have been recently fetched (as a consequence of a page walk, when a TLB miss occurs), and this basically tells the attacker for which virtual address the page walk was performed (and this is the information that ASLR is supposed to disguise). This is demonstrated in JavaScript.
-
- Member
- Posts: 1146
- Joined: Sat Mar 01, 2014 2:59 pm
Re: Breaking ASLR on systems with memory caches
I'm not the most clued-up about these things but, in my opinion, relying on ASLR to prevent an exploit is a bit like hiding a file containing all your passwords in a long tree of subdirectories with a false name. In other words, if information is held in a place that's accessible to an attacker, assume that they can access it, and don't think "oh, they won't find this".
When you start writing an OS you do the minimum possible to get the x86 processor in a usable state, then you try to get as far away from it as possible.
Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing
Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing
Re: Breaking ASLR on systems with memory caches
Hi,
Sadly; incompetent morons ignored Intel then and continued to ignore Intel since. The end result is that every 6 months or so (for a period of over 20 years) there's yet another white-paper showing yet another security problem that relies on very precise timing side-channels that could've and should've been impossible.
Cheers,
Brendan
Once upon a time Intel invented a new instruction - "RDTSC". Intel weren't stupid and knew that access to such precise timing would be a security problem, so they also provided a flag in CR4 that an OS can use to prevent CPL=3 code from being able to use the RDTSC instruction, and in the "Volume 3: System Programming Guide" (starting from the very first version that mentioned the time stamp counter) they wrote (highlighting is mine):XenOS wrote:I was just pointed to an interesting article that I would like to share:
This same warning (using the exact same wording) has existed in every single version of the Intel manuals since, and still exists today.Intel in 1995 wrote:A secure operating system would set the TSD flag during system initialization to disable user access to the time stamp counter. An operating system that disables user access to the timestamp counter should emulate the instruction through a user-accessible programming interface.
Sadly; incompetent morons ignored Intel then and continued to ignore Intel since. The end result is that every 6 months or so (for a period of over 20 years) there's yet another white-paper showing yet another security problem that relies on very precise timing side-channels that could've and should've been impossible.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Breaking ASLR on systems with memory caches
I recently watched a YouTube video on the CCCGen channel about measuring the timings of memory access to both find memory addresses that have been used by the OS, and also to communicate between processes by having one process either access memory or clear cache entries, and the second process measure how long it takes to access the same memory to determine if the access was slow (meaning the bit would be a 1) or fast (meaning the bit would be a 0).
I think they said that they got up to 75Kbps using this approach... Pretty fascinating stuff. Unfortunately, that channel has been (quite recently) shut down by YouTube for breaking TOS, somehow.
EDIT: Found it. https://www.youtube.com/watch?v=iS8YPDuZbKM
I think they said that they got up to 75Kbps using this approach... Pretty fascinating stuff. Unfortunately, that channel has been (quite recently) shut down by YouTube for breaking TOS, somehow.
EDIT: Found it. https://www.youtube.com/watch?v=iS8YPDuZbKM
Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott
-
- Member
- Posts: 595
- Joined: Mon Jul 05, 2010 4:15 pm
Re: Breaking ASLR on systems with memory caches
Is there a good reason cache flush instructions should be available to user space at all?
Re: Breaking ASLR on systems with memory caches
Hi,
There are also a few other uses - mostly, in code that benchmarks RAM chip latency and/or code that tests if RAM is faulty; where you don't want to accidentally test the CPUs cache and not test RAM.
If CLFLUSH was only allowed at CPL=0 and you had a kernel API function to do it instead, then the only thing it'd do is make good code much slower (because it's mostly used inside loops in performance critical code that processes a large amount of data, the overhead of an exception of kernel API call would be disastrous); and bad/malicious code would just use the kernel API function. If you didn't allow CLFLUSH (or non-temporal moves) at all (disabled with no alternative provided by kernel) you'd be stuck with the potentially severe performance problems (and it'd be harder to benchmark or test RAM).
Note that there is no actual security problem with CLFLUSH itself. There's a security problem caused by dodgy RAM chips ("rowhammer") where even good/non-malicious software can cause unintended RAM corruption; and there's a problem with "too precise timing" that leads to all kinds of timing side-channels (including data caches, and instruction caches, and TLBs, and shared resources in "attacker and victim sharing core" SMT cases, and ....).
The simple solution is for the kernel to disable RDTSC (and RDTSCP) at CPL=3 and either emulate them (for backward compatibility) or provide kernel API function/s. This allows kernel to nerf precision and/or nerf accuracy (and make a lot of timing side-channels far less practical). It also allows kernel to avoid the "unsynchronised time stamp on different CPUs" problem and/or allows kernel to provide a more convenient calibrated time stamp (that doesn't depend on CPU speed - e.g. always "microseconds since something" for all CPUs in all power saving states).
Cheers,
Brendan
CLFLUSH (and the non-temporal moves that came later) are important for performance in cases where the data you touched recently won't be data you need again soon (especially when it's a lot of data you won't need again soon). They're an effective work around for pathological cases caused by the cache's "least recently used eviction" algorithm that are needed to avoid potentially severe performance problems.OSwhatever wrote:Is there a good reason cache flush instructions should be available to user space at all?
There are also a few other uses - mostly, in code that benchmarks RAM chip latency and/or code that tests if RAM is faulty; where you don't want to accidentally test the CPUs cache and not test RAM.
If CLFLUSH was only allowed at CPL=0 and you had a kernel API function to do it instead, then the only thing it'd do is make good code much slower (because it's mostly used inside loops in performance critical code that processes a large amount of data, the overhead of an exception of kernel API call would be disastrous); and bad/malicious code would just use the kernel API function. If you didn't allow CLFLUSH (or non-temporal moves) at all (disabled with no alternative provided by kernel) you'd be stuck with the potentially severe performance problems (and it'd be harder to benchmark or test RAM).
Note that there is no actual security problem with CLFLUSH itself. There's a security problem caused by dodgy RAM chips ("rowhammer") where even good/non-malicious software can cause unintended RAM corruption; and there's a problem with "too precise timing" that leads to all kinds of timing side-channels (including data caches, and instruction caches, and TLBs, and shared resources in "attacker and victim sharing core" SMT cases, and ....).
The simple solution is for the kernel to disable RDTSC (and RDTSCP) at CPL=3 and either emulate them (for backward compatibility) or provide kernel API function/s. This allows kernel to nerf precision and/or nerf accuracy (and make a lot of timing side-channels far less practical). It also allows kernel to avoid the "unsynchronised time stamp on different CPUs" problem and/or allows kernel to provide a more convenient calibrated time stamp (that doesn't depend on CPU speed - e.g. always "microseconds since something" for all CPUs in all power saving states).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Breaking ASLR on systems with memory caches
That was my thought when I first heard about it, although rather than blame the DRAM chips themselves I'd blame the DRAM controller instead. Under absolutely no circumstance should any memory write interfere with memory refresh, yet that's exactly what seemed to be happening (hence the RAM corruption).Brendan wrote:Note that there is no actual security problem with CLFLUSH itself. There's a security problem caused by dodgy RAM chips ("rowhammer") where even good/non-malicious software can cause unintended RAM corruption; and there's a problem with "too precise timing" that leads to all kinds of timing side-channels (including data caches, and instruction caches, and TLBs, and shared resources in "attacker and victim sharing core" SMT cases, and ....).
Re: Breaking ASLR on systems with memory caches
There are a couple questions I have arising from the OP and some of the comments:
1. Does this mean we shouldn't bother with ASLR? (Isn't it almost "security through obscurity" anyway?)
2. Regarding timing attacks, how precise can timing be before it becomes a security concern? This is something I've wondered for quite a while.
1. Does this mean we shouldn't bother with ASLR? (Isn't it almost "security through obscurity" anyway?)
2. Regarding timing attacks, how precise can timing be before it becomes a security concern? This is something I've wondered for quite a while.
Re: Breaking ASLR on systems with memory caches
Hi,
It's best to think of security as many layers, where an attacker has to penetrate through multiple layers to succeed. The question is whether or not ASLR is a worthwhile additional layer - do the benefits (for security) justify the disadvantages (for code complexity, performance, etc). There's no right answer - it's more a question of design goals.
Note that for 32-bit 80x86 usually you can't get enough entropy for ASLR to be very effective (the "random" addresses end up with the lowest 12 bits known due to page alignment and several upper bits known due splitting the virtual address space into zones; and that only leaves about 18 "random middle bits"); and because the CPU doesn't support efficient "IP relative addressing" for 32-bit code there's a performance cost involved with relocated code and data. This means that ASLR is closer to "not worthwhile" for 32-bit 80x86. For 64-bit 80x86 it's different - there are more addressing bits than can be random, and there is support for "IP relative addressing" and less performance cost involved; so it's "more worthwhile" for 64-bit 80x86 than it would be for 32-bit 80x86.
I'd expect that an "uncertainty value" of a few hundred cycles would be enough to make half of the attacks (those that rely on cache timing or "shared resource in SMT timing") go from "vaguely plausible" to "impractical". I'd also be tempted to make the amount of uncertainty proportional to how often software asks for the time (e.g. 200 cycles of uncertainty if you ask for the TSC infrequently, and 999999 cycles of uncertainty if you ask for TSC millions of times per second).
Cheers,
Brendan
First; "security through obscurity" is where you're relying on the attacker not being able to figure out how the security works. If the attacker knows exactly how the security works but doesn't know something else (a password, an encryption key, a seed for a random number generator, etc) then it's not considered "security through obscurity".azblue wrote:There are a couple questions I have arising from the OP and some of the comments:
1. Does this mean we shouldn't bother with ASLR? (Isn't it almost "security through obscurity" anyway?)
It's best to think of security as many layers, where an attacker has to penetrate through multiple layers to succeed. The question is whether or not ASLR is a worthwhile additional layer - do the benefits (for security) justify the disadvantages (for code complexity, performance, etc). There's no right answer - it's more a question of design goals.
Note that for 32-bit 80x86 usually you can't get enough entropy for ASLR to be very effective (the "random" addresses end up with the lowest 12 bits known due to page alignment and several upper bits known due splitting the virtual address space into zones; and that only leaves about 18 "random middle bits"); and because the CPU doesn't support efficient "IP relative addressing" for 32-bit code there's a performance cost involved with relocated code and data. This means that ASLR is closer to "not worthwhile" for 32-bit 80x86. For 64-bit 80x86 it's different - there are more addressing bits than can be random, and there is support for "IP relative addressing" and less performance cost involved; so it's "more worthwhile" for 64-bit 80x86 than it would be for 32-bit 80x86.
Assume you have an "uncertainty value" like this:azblue wrote:2. Regarding timing attacks, how precise can timing be before it becomes a security concern? This is something I've wondered for quite a while.
Code: Select all
do {
last_tsc = rdtsc();
} while(last_tsc % uncertainty != 0);
return last_tsc/uncertainty;
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Breaking ASLR on systems with memory caches
Note that there are programs that legitimately ask for TSC 10s - 100s of thousands times per second. The overhead of a syscall would totally kill those applications.Brendan wrote:I'd expect that an "uncertainty value" of a few hundred cycles would be enough to make half of the attacks (those that rely on cache timing or "shared resource in SMT timing") go from "vaguely plausible" to "impractical". I'd also be tempted to make the amount of uncertainty proportional to how often software asks for the time (e.g. 200 cycles of uncertainty if you ask for the TSC infrequently, and 999999 cycles of uncertainty if you ask for TSC millions of times per second).
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Re: Breaking ASLR on systems with memory caches
Hi,
Cheers,
Brendan
Which applications, and why?Korona wrote:Note that there are programs that legitimately ask for TSC 10s - 100s of thousands times per second. The overhead of a syscall would totally kill those applications.Brendan wrote:I'd expect that an "uncertainty value" of a few hundred cycles would be enough to make half of the attacks (those that rely on cache timing or "shared resource in SMT timing") go from "vaguely plausible" to "impractical". I'd also be tempted to make the amount of uncertainty proportional to how often software asks for the time (e.g. 200 cycles of uncertainty if you ask for the TSC infrequently, and 999999 cycles of uncertainty if you ask for TSC millions of times per second).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
-
- Member
- Posts: 96
- Joined: Sat Mar 15, 2014 3:49 pm
Re: Breaking ASLR on systems with memory caches
1. You should absolutely bother with exploit mitigation in general and ASLR in particular.azblue wrote:There are a couple questions I have arising from the OP and some of the comments:
1. Does this mean we shouldn't bother with ASLR? (Isn't it almost "security through obscurity" anyway?)
2. Regarding timing attacks, how precise can timing be before it becomes a security concern? This is something I've wondered for quite a while.
https://www.openbsd.org/papers/auug04/mgp00001.html is old but still excellent background.
You can harden your exploit mitigation further by W^X, even for JITed code, having an at-install-time or at-each-load-time code order randomizer and so on. I think I heard that apple and google are now doing the latter, e.g. apple for watch apps.
There's no substitute for strong memory protection and carefully audited kernel APIs. But you absolutely want exploit mitigation too.
Re 2 iirc they demonstrated a pure-Javascript timer loop that worked for this attack.
Re: Breaking ASLR on systems with memory caches
For example if you're polling (IRQs do not make sense at high I/O rates) a low-latency link (e.g. Infiniband) and want to keep track of RTT or wait times. I'm sure there are more examples in real-time applications, this is just one I know of.Brendan wrote:Which applications, and why?
EDIT: Another example would be performance measurements in HPC applications like SAT or ILP solvers or physics/weather/fluid/whatever simulation.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Re: Breaking ASLR on systems with memory caches
Hi,
Cheers,
Brendan
That doesn't really sound like enough to matter; given that NIC drivers are drivers (which tend to be "more trusted" and could or would be exempt) and the rest don't need precision at all (e.g. benchmark how much work they can do in a 5 day period and you can use a sundial instead ).Korona wrote:For example if you're polling (IRQs do not make sense at high I/O rates) a low-latency link (e.g. Infiniband) and want to keep track of RTT or wait times. I'm sure there are more examples in real-time applications, this is just one I know of.Brendan wrote:Which applications, and why?
EDIT: Another example would be performance measurements in HPC applications like SAT or ILP solvers or physics/weather/fluid/whatever simulation.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Breaking ASLR on systems with memory caches
Low-latency networking tends to be done in userspace, not in a driver; performance measurements that only measure overall throughput are useful but insufficient.Brendan wrote:That doesn't really sound like enough to matter; given that NIC drivers are drivers (which tend to be "more trusted" and could or would be exempt) and the rest don't need precision at all (e.g. benchmark how much work they can do in a 5 day period and you can use a sundial instead ).
Someone's always going to be able to come up with a good reason to want high-precision timing information. The solution is not to get rid of it, but to use a well-designed execution environment with enough mitigations that it becomes impractical to take advantage of timing for malicious purposes.