A question about binary translation/simulation

Ethin
Member
Posts: 625
Joined: Sun Jun 23, 2019 5:36 pm
Location: North Dakota, United States

A question about binary translation/simulation

Post by Ethin »

So, I've had this absolutely crazy theory/idea in my mind for a while that I was going to add to my OS if it's feasible. But it's not about OS theory or OS development as such, so it didn't make sense to put this there.
The idea is kinda like QEMU user-mode emulation, but a bit different: a framework that could one day become a universal binary execution engine for user-mode (not kernel-mode) programs. To illustrate, let's say you wanted to run a RISC-V program on your x86 machine. On Linux you'd register the binary format with the kernel (binfmt_misc) and have it hand the program off to the user-mode qemu-riscv32 or qemu-riscv64 emulator with the appropriate options to load and run it. With my theoretical framework, here's what might happen instead:
1. The kernel loads and parses the ELF program.
2. The kernel checks the architecture field and determines that it's not the architecture the kernel is running on.
3. Instead of "rejecting" the binary like Linux might if you loaded a program that didn't have a registered binary handler, the kernel sends the code to the framework, which does the following:
a. The framework disassembles the program, converting it into a raw instruction stream.
b. The framework iterates through the instruction stream. For each instruction:
I. The framework reads the opcode and operands.
II. The framework attempts to locate an equivalent opcode for the architecture that the processor knows, in this case x86.
III. Once the opcode is found, the framework performs register allocation and completes the instruction, taking into account any flag bits (e.g. the aq/rl bits on RISC-V atomic instructions) and prepending things like lock prefixes. If there is no equivalent instruction in the native opcode set, the opcode is simulated using an appropriate sequence of instructions.
IV. The newly generated instructions are added to a "new" instruction stream and this process continues.
c. Once the framework has scanned the complete instruction stream, it reassembles the binary in memory, alters any fields and sections as necessary, and returns the new binary to the kernel (a rough sketch of the per-instruction part is below).
4. The kernel runs the binary.
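To make step 3 concrete, here's a toy sketch in C of the kind of per-instruction translation I'm imagining, for exactly one RISC-V instruction. The register mapping and everything else in it is a made-up placeholder, not something my kernel actually does:

Code:
/* Toy illustration of step 3 for a single RISC-V R-type ADD.  The register
 * mapping is a made-up placeholder (x10->rax, x11->rcx, x12->rdx). */
#include <stdint.h>
#include <stdio.h>

static int map_reg(uint32_t riscv_reg) { return (int)(riscv_reg - 10); }

int main(void)
{
    uint32_t insn = 0x00C58533;                /* add a0, a1, a2 */

    /* Step 3.b.I: decode the R-type fields. */
    uint32_t opcode = insn & 0x7F;
    uint32_t rd  = (insn >> 7)  & 0x1F;
    uint32_t rs1 = (insn >> 15) & 0x1F;
    uint32_t rs2 = (insn >> 20) & 0x1F;
    if (opcode != 0x33)
        return 1;                              /* only plain ADD handled here */

    /* Step 3.b.II/III: RISC-V ADD is three-operand, x86 add is two-operand,
     * so the "equivalent" is really two instructions:
     *   mov rd, rs1
     *   add rd, rs2
     * (This is wrong if rd == rs2 -- a real translator must handle that.) */
    uint8_t out[6] = {
        0x48, 0x89, (uint8_t)(0xC0 | (map_reg(rs1) << 3) | map_reg(rd)),
        0x48, 0x01, (uint8_t)(0xC0 | (map_reg(rs2) << 3) | map_reg(rd)),
    };

    /* Step 3.b.IV: append to the new instruction stream (just print here). */
    for (int i = 0; i < 6; i++)
        printf("%02X ", out[i]);
    printf("\n");                              /* 48 89 C8 48 01 D0 */
    return 0;
}

Even in this toy case the mapping isn't 1:1 (three operands vs two), and it silently breaks if rd happens to be the same register as rs2, which is exactly the kind of detail a real framework would have to get right everywhere.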
As you can imagine, there are lots of problems with this idea. For one, I've no idea how to detect functions in an instruction stream, so I'd need to figure that out to account for ABI differences. For another, architectures like ARM/AArch64 have instructions like HVC and SMC that have no equivalent on other architectures. However, my idea was to start with something a lot simpler like RISC-V and then let others help expand it in the future. Do you have any other ideas on how something like this might work, what problems I might encounter, and how to solve them? Is it even a good idea to try something like this, or should I just leave it up to QEMU, assuming I try this at all?
reapersms
Member
Posts: 48
Joined: Fri Oct 04, 2019 10:10 am

Re: A question about binary translation/simulation

Post by reapersms »

The process you're thinking of here has been done to various degrees, but is rather difficult to pull off in the general case. The magic terms to search for, if you want more in-depth information beyond this post, are "static recompilation", "dynamic recompilation", "static binary translation", and "dynamic binary translation". The most noteworthy attempts I can recall have all been Apple-related, with their transition approaches for handling the change from 68k->PPC, PPC->x86, and x86->Arm.

The general case tends towards straight emulation, which is what QEMU is going to do. The program in question would run inside its sandbox, which would be emulating a relatively complete system, and the transition from there to the outside world would not be particularly seamless.

Usually the problem gets constrained a bit by punting on anything beyond userspace, or things that are 'hard'. The former means you ignore most of the nitty gritty details like locks, and expect those to all map to syscalls. The latter would be things like self-modifying code, or code generators. The most successful approaches are all dynamic, and involve generating code snippets at the basic block level, while tracing through the execution path. Any real identification of function boundaries or syscall replacement is going to be exceedingly specific to the environment you are dealing with.
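
To give a rough idea of the shape of that, here's a toy sketch in C of the lazy, block-at-a-time approach with a translation cache. It cheats by "translating" into predecoded micro-ops rather than generating real host code, and every name in it is made up, but the lookup / translate / run loop is where the interesting parts of a real dynamic translator hang off of:

Code:
/* Toy guest ISA and block cache -- all names made up, no real codegen. */
#include <stdio.h>
#include <stdlib.h>

enum { G_ADDI, G_JNZ, G_HALT };               /* add-imm, jump-if-nonzero, halt */
typedef struct { int op, reg, imm, target; } GuestInsn;

/* "Translated" block: here just predecoded ops, keyed by guest pc. */
typedef struct Block {
    int guest_pc, len;
    const GuestInsn *ops;
    struct Block *next;                        /* hash-chain link */
} Block;

#define CACHE_SIZE 64
static Block *cache[CACHE_SIZE];

static Block *lookup(int pc) {
    for (Block *b = cache[pc % CACHE_SIZE]; b; b = b->next)
        if (b->guest_pc == pc) return b;
    return NULL;
}

/* Lazily "translate" the basic block starting at pc: scan until a branch. */
static Block *translate(const GuestInsn *code, int pc) {
    int end = pc;
    while (code[end].op == G_ADDI) end++;      /* block ends at JNZ or HALT */
    Block *b = malloc(sizeof *b);
    b->guest_pc = pc;
    b->len = end - pc + 1;
    b->ops = &code[pc];
    b->next = cache[pc % CACHE_SIZE];
    cache[pc % CACHE_SIZE] = b;
    return b;
}

/* Run one block, return the next guest pc (-1 means halt). */
static int run_block(const Block *b, int *regs) {
    for (int i = 0; i < b->len; i++) {
        const GuestInsn *g = &b->ops[i];
        switch (g->op) {
        case G_ADDI: regs[g->reg] += g->imm; break;
        case G_JNZ:  return regs[g->reg] ? g->target : b->guest_pc + b->len;
        case G_HALT: return -1;
        }
    }
    return b->guest_pc + b->len;
}

int main(void) {
    /* Count r0 down from 3: the loop body block gets translated once,
     * then found in the cache on every later iteration. */
    const GuestInsn prog[] = {
        { G_ADDI, 0,  3, 0 },                  /* pc 0: r0 = 3            */
        { G_ADDI, 0, -1, 0 },                  /* pc 1: r0 -= 1           */
        { G_JNZ,  0,  0, 1 },                  /* pc 2: if r0 != 0 goto 1 */
        { G_HALT, 0,  0, 0 },                  /* pc 3                    */
    };
    int regs[4] = {0}, pc = 0;
    while (pc >= 0) {
        Block *b = lookup(pc);
        if (!b) b = translate(prog, pc);       /* lazy, block at a time */
        pc = run_block(b, regs);
    }
    printf("r0 = %d\n", regs[0]);
    return 0;
}

The real work hides inside translate() and run_block() -- that's where actual host code generation, block chaining, and self-modifying-code detection would have to live.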

Translating everything up front is generally not going to be feasible, as there's no real way to reliably identify what is code and what is data besides tracing through the execution path directly. The various JIT-based languages out there work around this by explicitly defining their bytecode in a way that helps with this, along with mandating a large amount of metadata beyond that to identify classes, types, etc.

Dynamic tracing can work well, but that still requires a rather non-trivial framework underneath to manage the code snippets getting tossed around, and will never quite match up with a real native setup unless the two architectures are particularly well suited to each other. Modern compilers will also generate some rather surprising code at times that will screw up naive tracers -- tracing down both paths of a conditional jump, for instance, will get you into a lot of trouble if the compiler knew ahead of time that one path was never going to be taken and decided to dump random data in there.
Ethin
Member
Posts: 625
Joined: Sun Jun 23, 2019 5:36 pm
Location: North Dakota, United States

Re: A question about binary translation/simulation

Post by Ethin »

Ah, thank you! I was just wondering if something like this would be a good idea to aim for. My kernel does have a built-in disassembler that isn't being used much, so I thought I might as well put it to use. ABI translation is definitely a problem, and you've just given me several more to think about. I imagine I'd also need to pretty rigidly define my syscalls if I were to implement this, unless I also wanted to translate syscall numbers.
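Something as simple as this, maybe -- the guest-side numbers below are the real Linux riscv64 ones, but the MY_SYS_* values are completely made up for the sake of the sketch:

Code:
/* Hypothetical native syscall numbers for my OS -- made up for this sketch. */
enum { MY_SYS_READ = 0, MY_SYS_WRITE = 1, MY_SYS_CLOSE = 2, MY_SYS_EXIT = 3 };

/* Map a guest (Linux riscv64) syscall number to a native one, or -1. */
static long thunk_syscall_number(long guest_nr)
{
    switch (guest_nr) {
    case 57: return MY_SYS_CLOSE;   /* Linux riscv64 close */
    case 63: return MY_SYS_READ;    /* Linux riscv64 read  */
    case 64: return MY_SYS_WRITE;   /* Linux riscv64 write */
    case 93: return MY_SYS_EXIT;    /* Linux riscv64 exit  */
    default: return -1;             /* unknown: fail loudly, don't guess */
    }
}

Though I suspect the number is the easy part and the argument and struct layouts are where it gets ugly.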
reapersms
Member
Posts: 48
Joined: Fri Oct 04, 2019 10:10 am

Re: A question about binary translation/simulation

Post by reapersms »

Syscall translation is probably easiest (for some definition of easy) to do by way of a shim library implementing the original syscalls and such in the context of your OS. The syscall code itself is unlikely to look remotely the same between architectures, though recent history has had things converging on a dedicated SYSCALL instruction. If the OSes on either side of the divide are anything other than identical releases, there will likely be rather important semantic differences between even similarly named and used syscalls, if there are even equivalents. Using POSIX vs Win32 as an example, sure, close() and CloseHandle() might be pretty similar, but fork() is going to cause trouble. If solving that were easy, Cygwin would have been a lot shinier.
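
As a rough sketch of what the shim looks like (every native_* call and the fd table here are made-up placeholders for whatever your OS actually exposes):

Code:
#include <errno.h>

#define MAX_GUEST_FDS 256

long native_handle_close(long handle);      /* hypothetical native API */

static long fd_table[MAX_GUEST_FDS];        /* guest fd -> native handle */

/* The easy-ish case: close() maps almost 1:1 onto a native handle close. */
long shim_close(int guest_fd)
{
    if (guest_fd < 0 || guest_fd >= MAX_GUEST_FDS || fd_table[guest_fd] == 0)
        return -EBADF;
    long err = native_handle_close(fd_table[guest_fd]);
    fd_table[guest_fd] = 0;
    return err ? -EIO : 0;
}

/* The hard case: if the native OS has nothing fork()-shaped, the shim
 * can't fake it cheaply -- that's the Cygwin problem in a nutshell. */
long shim_fork(void)
{
    return -ENOSYS;
}

The easy calls all look like the first one; the project lives or dies on how many of the second kind you run into.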

Even between two nominally POSIX systems you can run into a number of environmental issues.

If you can rely on dynamic linking, just replacing, say, libc would take care of things, but you can't really guarantee that due to statically linked binaries and macro expansion/inlining.

There are certainly a number of traps along the way where an approach works great for some combination of small/trivial/toy binaries, but anything of substance will expose the nasty issues at the worst possible time.

If you do decide to go down this rabbit hole, I strongly suggest formalizing the scope of what sort of program you expect to run seamlessly up front, such as "simple console-only, without much in the way of fancy IO piping" as a starting point.
linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: A question about binary translation/simulation

Post by linguofreak »

reapersms wrote:The process you're thinking of here has been done to various degrees, but is rather difficult to pull off in the general case. The magic terms to search for, if you want more in-depth information beyond this post, are "static recompilation", "dynamic recompilation", "static binary translation", and "dynamic binary translation". The most noteworthy attempts I can recall have all been Apple-related, with their transition approaches for handling the change from 68k->PPC, PPC->x86, and x86->Arm.
The 68k -> PPC transition is especially noteworthy in that they left a good chunk of the legacy 68k code in their firmware and OS in place and interpreted it at runtime (not sure if they were JIT compiling it or just running it through a plain interpreter). So it wasn't just legacy applications that were running non-native, it was an appreciable fraction of kernel mode code.
reapersms wrote:The general case tends towards straight emulation, which is what QEMU is going to do. The program in question would run inside its sandbox, which would be emulating a relatively complete system, and the transition from there to the outside world would not be particularly seamless.
Actually, qemu-user exists, which doesn't do any hardware emulation at all: it just does dynamic recompilation and thunks system calls for endianness / word-width / other ABI stuff.
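The "thunking" part is conceptually just widening and/or byte-swapping arguments at the syscall boundary, something like this (simplified stand-in struct layout, not qemu's actual target_* definitions):

Code:
#include <stdint.h>
#include <time.h>

/* What a 32-bit little-endian guest thinks a timespec looks like. */
struct guest32_timespec {
    int32_t tv_sec;
    int32_t tv_nsec;
};

/* Widen the guest's 32-bit fields before handing them to the 64-bit host
 * (a big-endian guest would additionally need a byte swap here). */
void thunk_timespec(const struct guest32_timespec *g, struct timespec *h)
{
    h->tv_sec  = g->tv_sec;
    h->tv_nsec = g->tv_nsec;
}

Multiply that by every struct-passing syscall and ioctl and you get a sense of why the qemu-user syscall layer is as big as it is.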
Ethin
Member
Posts: 625
Joined: Sun Jun 23, 2019 5:36 pm
Location: North Dakota, United States

Re: A question about binary translation/simulation

Post by Ethin »

I doubt my OS will have a GUI server any time soon or anything like that. When I get to running apps I'll definitely be starting out with console apps -- simple ones in particular.