A question about the PE/COFF and ELF formats
Posted: Thu Apr 07, 2022 9:15 am
So in my compilers class we've entered the code generation phase. I won't get too hung up on it if you guys think I shouldn't do this, but my professor gave the go-ahead and I thought it would be a good exercise to learn the PE/COFF and ELF binary formats.
My professor is having us generate a C/C++ file containing inline assembly using the MSVC __asm block statement (__asm { ... }). But I'm on Linux and don't have access to MSVC, so I'd need to either (1) use weird GCC hacks, (2) generate the assembly in a separate .S/.asm file and assemble it manually with an external assembler, or (3) generate the full binary myself. I'm opting for the third option to, as I noted above, learn the binary formats and how building one actually works under the hood (without all the extra stuff a compiler does, like debug information generation). I've done some digging and have settled on asmjit for instruction generation (there's no way I'm pulling in LLVM, but if anyone knows a better library than asmjit, please do tell) and COFFI and ELFIO for PE/COFF and ELF binary generation. My question is: what makes a legal PE/COFF or ELF binary? What sections do I absolutely require, and what can I leave out?
I'm not looking to add a bunch of fancy stuff to this, just the basics (though the code we're supposed to create is fully relocatable, yay, so I'd love to learn how to add that in). The compiler can't call out to external libraries or across an FFI boundary, so the ABI doesn't really matter (at least, I'm pretty sure I can forget the ABI, since all you can call are functions you've specifically declared and defined). I just want to know what I absolutely need to add (excluding instructions, obviously) to create a fully working binary that won't throw any signals or cause problems unless I encode an instruction incorrectly. I need to learn both PE/COFF and ELF because I need to be able to debug the generated code to ensure that the assembly is correct and doesn't misbehave, and I'm not very skilled with LLDB (I'm more experienced with GDB). If you guys think I shouldn't do it and should just go with the much simpler option of generating inline assembly and letting GCC/Clang do all the heavy lifting, or if you have any other advice, I'm all ears. I'm mainly doing the standalone binary generation for the hell of it and as a major learning opportunity that I thought I might as well grab with both hands, particularly since my professor is encouraging it and thinks it would be a good way of earning extra credit. Idk if he'll actually give me extra credit for it (as far as I know I'm the only one who's considered doing this), but we'll see.
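To make the "what do I absolutely require" question concrete for the ELF side, here's a rough sketch of what I understand the bare minimum to look like, assuming a statically linked x86-64 Linux executable: an ELF header, a single PT_LOAD program header, and the code itself, with no section headers at all (as far as I know the kernel loader only reads program headers). The function name build_minimal_elf, the 0x400000 base address, and the hard-coded exit(42) machine code are my own choices for illustration; this doesn't use ELFIO or asmjit and isn't meant as the definitive layout.

```python
import struct

def build_minimal_elf(code: bytes, base: int = 0x400000) -> bytes:
    """Wrap raw x86-64 machine code in the smallest ELF64 executable layout.

    Hypothetical helper for illustration: one ELF header, one PT_LOAD
    segment covering the whole file, and no section headers.
    """
    ehdr_size, phdr_size = 64, 56              # fixed sizes for ELF64
    entry = base + ehdr_size + phdr_size       # code starts right after the headers
    file_size = ehdr_size + phdr_size + len(code)

    # e_ident: magic, ELFCLASS64, little-endian, version 1, System V ABI, padding
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        e_ident,
        2,            # e_type: ET_EXEC
        62,           # e_machine: EM_X86_64
        1,            # e_version
        entry,        # e_entry
        ehdr_size,    # e_phoff: program header follows the ELF header
        0,            # e_shoff: no section header table
        0,            # e_flags
        ehdr_size,    # e_ehsize
        phdr_size,    # e_phentsize
        1,            # e_phnum
        0, 0, 0,      # e_shentsize, e_shnum, e_shstrndx (unused here)
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,            # p_type: PT_LOAD
        5,            # p_flags: PF_R | PF_X
        0,            # p_offset: map the file from its start
        base,         # p_vaddr
        base,         # p_paddr
        file_size,    # p_filesz
        file_size,    # p_memsz
        0x1000,       # p_align
    )
    return ehdr + phdr + code

# mov edi, 42 ; mov eax, 60 (sys_exit) ; syscall
EXIT_42 = bytes.fromhex("bf2a000000" "b83c000000" "0f05")
image = build_minimal_elf(EXIT_42)
```

Writing image to a file and marking it executable (chmod +x) should be enough to try it on x86-64 Linux. Anything GDB would want beyond this, such as a symbol table, would mean adding section headers on top.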