bzt wrote:moonchild wrote:If you implement a safe language with the JIT, you can't have buffer overflows :)
As I've said, possible BOFs
unrelated to JIT. BTW I haven't seen any WASM vm that verifies the bytecode and guarantees to generate 100% bullet-proof native code, and even wasmjit runs the compiled code in ring 0 (yeah, what could go wrong, right?). Do you know any WASM JIT compiler that actually verifies the code and places bound-checks in the generated code? Please let me know if there's any. (I'm not sarcastic, I'm genuinely curious)
I think you're conflating language with implementation.
A programming language is a set of rules that describe the behaviour of programs. These are the
semantics. (There are also usually syntactic rules, describing how programs are constructed; but we don't care about those right now.) These rules are generally contained in the specification, if there is one. According to the
rules of the c programming language, c programs can have buffer overflows[1]. On the other hand, the
rules of java state that if you access outside the bounds of an array, an exception will be thrown and you wouldn't be able to read/write bad memory.
IOW c programs can have buffer overflows, and java programs cannot.
Assembly is also a programming language, and its semantics are described in the intel/amd SDM. A
compiler is a tool which understands the rules of two different languages and creates a mapping from one to the other. E.G. from c to assembly. This means: according to the respective rules of c and assembly, the program as written in assembly should have the same semantics as the program written in assembly[2]. This has nothing to do with the fact that it is actually possible to run the assembly code; it has only to do with the rules that
describe the behaviour of c and assembly.
So, there are two potential problems:
- Buffer overflows may be part of the semantics of the source language. So, an arbitrary piece of code written in the source language might contain buffer overflows and there is nothing we can do about that. Example: C.
- Buffer overflows may be part of the semantics of the destination language. Prima facie, this is not necessarily a problem. If the source language doesn't have buffer overflows, then the compiler should be constrained to avoid producing programs which contain buffer overflows. However, if the compiler has a bug, it may produce output which does not correspond to the input, and which does have buffer overflows. Example: V8 (js -> assembly). (This is usually only a problem if you need to run untrusted code. I don't think there are many examples of security bugs introduced into trusted code by compiler bugs, though the recent gcc-memcpy fiasco does come to mind.)
Problem 1 is solved by using a language which is proven to be safe, like java. Problem 2 is solved by using a compiler which has been proven to have no bugs. Note that these problems are actually orthogonal; for example, compcert is a formally verified compiler from c to assembly, and if your code contains buffer overflows, its output will too.
Wasm the language is safe for running in the kernel. (It doesn't guarantee boundschecking for array accesses, so it can corrupt its own data; but it's only allows to read/write from specific regions, so it's unable to corrupt its own generated code.) However the WASM VMs can still have bugs in them. The question is not whether the WASM compiler verifies the bytecode—it does; or, at least, tries to—the question is if it has bugs. (Spoiler: it does.)
Problem 2 applies to any type of programming language implementation.[3] It even applies to CPUs—there have been some pretty bad CPU bugs. The main problem is not executable memory, it's incorrect semantics applied to untrusted code. You would have (most) of the same problems with a compiler (and thinking about it more, I think AOT compilation with a signed cache might actually make more sense than JIT).
Getting back to the point—I don't know of anything specifically for wasm, but there are formally verified compilers, yes. I already mentioned compcert, but maybe more interesting is cakeml, a formally verified compiler for ml, which is a safe language. And the coq jit framework I linked earlier can probably implement wasm, which gives a path to a verified wasm jit; though again, I think aot might ultimately be a better path for a project like this one.
1. This is not
technically true. If you access outside the bounds of an array, you get undefined behaviour. Which means it is possible to implement boundschecking for see (see, for instance asan). This doesn't affect the broader point, however; and most code is not compiled with asan.
2. Wherever the source language has undefined behaviour, the compiler has the option of turning that into defined behaviour in the target language. (If the target language has no undefined behaviour, then it will
have to choose some behaviour.) IOW undefined behaviour in the source does not have to keep being undefined in the destination
3. Though, in the specific case when your userspace language is implemented by an interpreter which is implemented in a safe compiled language, the odds of that kind of a memory bug are pretty much nil. Because the interpreter is trusted code, and unlikely to be hit very badly by compiler bugs. However, most interpreters are quite slow...