OSDev.org

Posted: **Thu Oct 01, 2020 9:48 am**

How would my kernel run executables? I want to be binary compatible with Linux.

Posted: **Thu Oct 01, 2020 10:17 am**

PavelCheckov wrote:How would my kernel run executables?

Load the first part (1kB should suffice) of the file from disk, verify the ELF header (ELF magic, class, and machine; the file type must be ET_EXEC or ET_DYN), and set up file mappings according to all the PT_LOAD segments. Set up the stack according to the ELF entry point rules (argc first, then the argv pointers, a NULL, the environment pointers, a NULL, then the auxiliary vector. Of course, those pointers need to point somewhere, so you need to copy the strings into the address space as well). If a PT_INTERP segment is present, interpret the contents as a file name and load that one as well. Then just start at the entry point (the interpreter's, if you loaded one; it finds the program's own entry point through AT_ENTRY in the auxiliary vector).
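As an illustration of the header check described above, here is a minimal sketch in C. It assumes a 64-bit little-endian x86-64 target running on a little-endian host, and it uses the constants from Linux's `<elf.h>`. The function name `elf_header_ok` is made up for this example, and a real loader would go on to bounds-check `e_phoff` before walking the program headers.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if buf (the first len bytes of the file) starts with an
 * ELF header that this hypothetical x86-64 loader is willing to run. */
bool elf_header_ok(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;

    if (len < sizeof eh)
        return false;
    memcpy(&eh, buf, sizeof eh);            /* avoids unaligned access */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return false;                       /* no ELF magic */
    if (eh.e_ident[EI_CLASS] != ELFCLASS64)
        return false;                       /* assumption: 64-bit only */
    if (eh.e_ident[EI_DATA] != ELFDATA2LSB)
        return false;                       /* assumption: little-endian */
    if (eh.e_ident[EI_VERSION] != EV_CURRENT)
        return false;
    if (eh.e_machine != EM_X86_64)
        return false;                       /* assumption: x86-64 only */
    if (eh.e_type != ET_EXEC && eh.e_type != ET_DYN)
        return false;                       /* only these two are runnable */
    if (eh.e_phentsize != sizeof(Elf64_Phdr) || eh.e_phnum == 0)
        return false;                       /* need program headers to map */
    return true;
}
```

Copying the header out with `memcpy` rather than casting the buffer pointer is a deliberate choice here: the buffer coming from the disk cache is not guaranteed to be aligned for `Elf64_Ehdr`.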

PavelCheckov wrote: I want to be binary compatible with Linux.

You need the same syscalls with the same behaviors, and the same VFS, then. And once you have all that, what's the point of your OS?

Posted: **Thu Oct 01, 2020 12:20 pm**

Don't you also need to do some relocation and load any needed dynamic libraries?

Posted: **Thu Oct 01, 2020 12:33 pm**

I would recommend keeping "Linux compatible" as a far-future target and starting simple first. Load flat binaries, and if that works, load ELF and handle relocation, and if that works too, implement Linux syscalls and then Linux libraries. At least that's how I approach it.

Greetings
Peter

Posted: **Thu Oct 01, 2020 12:55 pm**

iansjack wrote:Don't you also need to do some relocation and load any needed dynamic libraries?

That's for the interpreter to do, not the kernel. Separation of concerns. Plus, there are ready-made ELF interpreters out there (e.g. musl, which has one of the simpler ones).

Posted: **Thu Oct 01, 2020 2:14 pm**

I agree with @nullplan, except one thing (on that I'm with @iansjack):

nullplan wrote:That's for the interpreter to do, not the kernel.

Not necessarily. Loading shared libraries by an interpreter could be a huge security risk. People usually don't think about that, just get on with it, but the truth is, by denying userspace code to map executable pages altogether, you can eliminate a big deal of attack vectors at once. (Not to mention how easy it is to inject a keylogger with LD_PRELOAD.)

However, user-mapped executable pages are also required for JIT-compiled code (unless the bytecode compiler is in the kernel too), so the decision is yours: would you prefer security or performance?

Cheers,
bzt

Posted: **Thu Oct 01, 2020 2:29 pm**

bzt wrote:I agree with @nullplan, except one thing (on that I'm with @iansjack):
nullplan wrote:That's for the interpreter to do, not the kernel.
Not necessarily. Loading shared libraries by an interpreter could be a huge security risk. People usually don't think about that, just get on with it, but the truth is, by denying userspace code to map executable pages altogether, you can eliminate a big deal of attack vectors at once. (Not to mention how easy it is to inject a keylogger with LD_PRELOAD.)

Security theater again. So instead of separating the code that parses the dynamic information from the ELF executable out into a different program, you would prefer to run it at ring 0? As I said, musl has one of the simpler interpreters, and it is plenty complicated enough to give you a headache. I've also looked at the one in glibc, and now I know what horror looks like in C. Imagine running that stuff in your kernel! This is really hard to get right, and almost impossible to get secure. With an interpreter, the only thing that a user can achieve with a prepared executable is to exploit himself, so there it doesn't matter, but if the interpreter is running at ring 0, this changes dramatically. In general, you should have as little stuff in your kernel as possible (but no less).

The dynamic interpreter has indeed been shown to be a security risk in the past, but that was with executables running with elevated privileges (i.e. setuid). Your option would give the interpreter such elevated privilege all the time.

And if you think LD_PRELOAD a security risk, then just build your interpreter without support for it. Or just do it like me and mostly link statically. That reduces attack surface like you wouldn't believe.

Posted: **Thu Oct 01, 2020 3:06 pm**

The question was how to run executables. Whether the relocation is done by the kernel itself or a user program is irrelevant - it still needs to be done. The simple explanation is not enough to answer the OP's question.

Posted: **Thu Oct 01, 2020 9:48 pm**

iansjack wrote:The question was how to run executables. Whether the relocation is done by the kernel itself or a user program is irrelevant - it still needs to be done.

Let's read that again, shall we?

PavelCheckov wrote:How would my kernel run executables?

And that is what I have answered. Incidentally, relocations do not need to be processed in static executables, so maybe that is a good place to start.
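A loader that starts with static executables only can reject everything else up front. The sketch below is one way to do that in C, assuming 64-bit program headers that have already been read into memory and the constants from Linux's `<elf.h>`. The name `elf_is_plain_static` is hypothetical. Note that static-PIE files carry a PT_DYNAMIC segment and relocate themselves, so this check would turn those away as well.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if none of the n program headers asks for an interpreter
 * (PT_INTERP) or carries dynamic-linking information (PT_DYNAMIC). */
bool elf_is_plain_static(const Elf64_Phdr *ph, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type == PT_INTERP || ph[i].p_type == PT_DYNAMIC)
            return false;
    }
    return true;
}
```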

iansjack wrote:The simple explanation is not enough to answer the OP's question.

No, of course not. The manner in which the question was asked and the previous posts of the OP did not inspire confidence in me that he appreciates the complexities of what he is asking. A complete answer to his question should probably contain liberal amounts of quotes from the ELF spec, the relevant processor supplements, and the relevant ABI documents. And maybe a link to the wiki page. However, my first answer is enough to get him started, and once he actually has a working static ELF loader, we can continue to more serious detail questions that will undoubtedly come up.

If I had responded to the original question with a litany of detail, this probably would not have helped the OP one bit, except maybe to make them panic at the wall of text and abandon their effort. So I intentionally gave a short answer to match the question.

Posted: **Fri Oct 02, 2020 8:25 am**

nullplan wrote:You need the same syscalls with the same behaviors, and the same VFS, then. And once you have all that, what's the point of your OS?

I am making a 64 bit version of CP/M.

Posted: **Fri Oct 02, 2020 8:28 am**

nullplan wrote:So instead of separating the code that parses the dynamic information from the ELF executable out into a different program, you would prefer to run it at ring 0?

Yes. It's more secure if only the kernel can map executable code, and it maps that code read-only. Some architectures can enforce this in hardware (from user space a page is either executable or writable, but never both, see ARM's WXN feature). Having the dynamic linker in a user space program means you must allow user space to write pages which will be executed later.

nullplan wrote:Or just do it like me and mostly link statically. That reduces attack surface like you wouldn't believe.

No, it won't reduce risk at all. What improves security is that user space code can't write into memory that will be executed later. Static linking won't stop malicious code from injecting code through a buffer overflow.

iansjack wrote:Whether the relocation is done by the kernel itself or a user program is irrelevant - it still needs to be done.

That's absolutely true.

Cheers,
bzt

Posted: **Fri Oct 02, 2020 2:35 pm**

PavelCheckov wrote:I am making a 64 bit version of CP/M.

And you want to be compatible with Linux of all things? Good luck with that.

bzt wrote:Yes. It's more secure if only the kernel can map executable code, and it maps that code read-only.

You're not listening to my concern: I don't think all that complexity belongs in the kernel. If it is there, it is likely to be buggy, and in the best case someone will stumble over one of those bugs and see their program crash. In the worst case, someone will find an exploit and gain ring 0 access.

And what is this "It's more secure" nonsense? Security is not an absolute property you can get a measurement on. What attack do you prevent with this? Buffer overflows are still exactly as devastating (see below), and also there is a thing called non-executable stack that everyone should use, that does what you claim here without the horrendous side effects.

bzt wrote:Having the dynamic linker in a user space program means you must allow user space to write pages which will be executed later.

False. The only executable pages the dynamic linker needs to map come straight from disk files, and are read-only. Except for textrels (an awful hack, that nobody should ever use, and that is unsupported by most dynamic linkers anyway), the relocation targets are always in writable sections. You know, data. In all the ABIs I have read, there was only ever a single one that used writable code sections, and that was the original PowerPC ELF ABI (used an uninitialized PLT section, to be filled out by the dynlinker at load time), and that one was phased out two decades ago.

This is all meaningless anyway, since the OP wants to be compatible with Linux, and Linux allows userspace code to map executable pages, both from files and anonymously, and both writable and write protected.

bzt wrote:No, it won't reduce risk at all. What improves security is that user space code can't write into memory that will be executed later. Static linking won't stop malicious code from injecting code through a buffer overflow.

You keep pivoting. A post ago, LD_PRELOAD was the big problem, now it's suddenly buffer overflows. You are right, static linking does not address those. You failed to bring them up before. But your regime doesn't put a stop to them, either. Have you ever heard of ROP (return-oriented programming)? No additional executable pages are needed. Once a buffer is overflowed, control flow can be subverted through already existing code to do whatever it is the attacker needs to do. Stack canaries can be helpful in mitigating those attacks, but your strict no-exec-mapping regime certainly does not help.

You know what an attacker can do in a dynamically linked process? They can overwrite the GOT entry for printf() to instead point to system(). Or point abort() and exit() to fork() instead. And then suddenly a lot of things will be executed that shouldn't be. This is entirely impossible in a statically-linked program. And your dynlinker-in-kernel regime also fails to prevent that attack.

Posted: **Fri Oct 02, 2020 2:46 pm**

PavelCheckov wrote:How would my kernel run executables? I want to be binary compatible with Linux.

What does your kernel do now? Does it output something to the screen?

Which bootloader do you use?

Does your PC have Legacy BIOS or UEFI?

Greetings
Peter

Posted: **Fri Oct 02, 2020 9:38 pm**

Ok, I've decided to answer in one last post.

nullplan wrote:You're not listening to my concern: I don't think all that complexity belongs in the kernel.

A dynamic linker is not hugely complex. Not simple, but not particularly complex either (typically no more than a few hundred SLoC).

nullplan wrote:You keep pivoting. A post ago, LD_PRELOAD was the big problem

No, I'm not "pivoting". A post ago I wrote, and I quote: "the truth is, by denying userspace code to map executable pages altogether, you can eliminate a big deal of attack vectors at once. (Not to mention how easy it is to inject a keylogger with LD_PRELOAD.)"

Did you read my post at all? Or did you just read the upper-case letters at the end and forget about the rest?

nullplan wrote:And what is this "It's more secure" nonsense?
...
What attack do you prevent with this?

Are you serious? Why do you think ARM implemented the WXN permission in hardware if it's just "nonsense"?

Code:

The architecture also provides control bits in the System Control Register (SCTLR_ELx) to make all write-able addresses non-executable.

nullplan wrote:False. The only executable pages the dynamic linker needs to map come straight from disk files, and are read-only. Except for textrels (an awful hack, that nobody should ever use, and that is unsupported by most dynamic linkers anyway)

Excuse me? Are you saying that text relocations are unsupported by dynamic linkers? And that they shouldn't ever be used?

nullplan wrote:but your strict no-exec-mapping regime certainly does not help.

Yes, it does help, because the malicious code cannot modify the existing code (rendering all buffer-overflow attacks impossible once and for all), and it cannot return to non-executable data. There's simply only valid, unmodified code to return to.

nullplan wrote:They can overwrite the GOT entry for printf()

No, they can't overwrite, that's the point! Having the dynamic linker in the kernel means GOT is only writeable by the kernel, and for user space it's read-only. Any attempt to change the GOT from ring 3 would trigger an exception.

nullplan wrote:And your dynlinker-in-kernel regime also fails to prevent that attack.

It does prevent all attacks you mentioned. I'm sad to inform you that it is you who misunderstands the concept of a "dynlinker-in-kernel".

Think about the things I wrote above. If you still can't see why a user space dynamic linker is bad for security, then I'm sorry, I'm afraid I cannot help you more.

Cheers,
bzt

Posted: **Fri Oct 02, 2020 11:43 pm**

bzt wrote:A dynamic linker is not hugely complex. Not simple, but not particularly complex either (typically no more than a few hundred SLoC).

We're just going to have to agree to disagree on that one. A dynamic linker is a complex piece of software parsing attacker-controlled data, and it is getting nowhere near my kernel.

bzt wrote:Did you read my post at all? Or did you just read the upper-case letters at the end and forget about the rest?

Let's not get into that discussion or we'll be here all week.

bzt wrote:Are you serious? Why do you think ARM implemented the WXN permission in hardware if it's just "nonsense"?

Nice dodge. You didn't actually answer my question. Nice of ARM to prevent data execution in a centralized way, but x86 has had data execution prevention for a while now as well. And that page doesn't say why you shouldn't allow userspace to map executable pages.

By the way, the "nonsense" was to say "It's more secure". There is no dipstick for security, no pressure gauge, no measuring tape. What attacks will your regime prevent? And if it is just good old stack smash classic, then data execution prevention already prevents this, and attackers are working around it with ROP.

bzt wrote:Excuse me? Are you saying that text relocations are unsupported by dynamic linkers? And that they shouldn't ever be used?

YES! Textrels are an awful hack only present in non-PIC code that has been linked dynamically. PIC code actually manages to put all the relocations into the data section, and the actual code section remains unaltered. This also means that the dynamic linker only has to support a limited number of relocation types, compared to the link editor. Textrels mean you are writing into the text section, which means it cannot be shared between processes anymore. So the one thing shared libraries are trying to do will be undermined by them.
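For anyone who wants a loader or dynamic linker to refuse textrels outright, the ELF dynamic section marks them either with a DT_TEXTREL entry or with the DF_TEXTREL bit in DT_FLAGS. A minimal sketch in C, assuming a 64-bit dynamic section that is already mapped and terminated by DT_NULL, with constants from Linux's `<elf.h>` (the function name `elf_has_textrel` is invented for this example):

```c
#include <elf.h>
#include <stdbool.h>

/* dyn points at a DT_NULL-terminated array of dynamic entries.
 * Returns true if the object declares relocations against its text. */
bool elf_has_textrel(const Elf64_Dyn *dyn)
{
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_TEXTREL)
            return true;                 /* older, stand-alone marker */
        if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_TEXTREL))
            return true;                 /* flag-bit form of the same */
    }
    return false;
}
```

A caller could use the result to fail the load with an error instead of making the text segment writable.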

bzt wrote:Yes, it does help, because the malicious code cannot modify the existing code (rendering all buffer-overflow attacks impossible once and for all), and it cannot return to non-executable data. There's simply only valid, unmodified code to return to.

No, no, and no. Malicious code already cannot modify existing code. For instance, here I am looking at all Firefox instances on my system, looking for pages that are writable and executable at the same time, and failing to find any:

Code:

$ for i in $(pidof firefox); do grep wx /proc/$i/maps; done
$

This is on a glibc system, which, I'll remind you, supports lazy relocations. Still no writable and executable page.
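The same check can be done a little more strictly in C. `grep wx` matches those two letters anywhere on the line, including inside a file path, whereas the sketch below looks only at the permissions field. It assumes the usual `start-end perms offset dev inode path` layout of a `/proc/<pid>/maps` line; the function name `maps_line_is_wx` is made up for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Returns true if one line of /proc/<pid>/maps describes a mapping that is
 * both writable and executable, judging by the permissions field only. */
bool maps_line_is_wx(const char *line)
{
    char perms[5];

    /* Skip "start-end", then read the four permission characters. */
    if (sscanf(line, "%*[0-9a-fA-F]-%*[0-9a-fA-F] %4s", perms) != 1)
        return false;
    return perms[1] == 'w' && perms[2] == 'x';
}
```

Feeding it every line read from a process's maps file and counting the hits gives the same answer as the shell loop, minus the false positives from path names.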

This entirely fails to prevent buffer-overflow attacks since those are caused by code writing over the limits of its buffers. Code that is already present in the executable, mind you. Strong limit checks on array access would prevent those, but then programmers would start yammering about performance. In any case, not something a kernel can remedy. And just because there is only already-approved code to return to does not mean a function has to be run from start to end. This is what ROP does: it writes a bunch of addresses to the stack that all point to code, but only to the tails of functions, to achieve some desired effect (e.g. set %rdi to some predetermined value, then jump to system()).

bzt wrote:No, they can't overwrite, that's the point! Having the dynamic linker in the kernel means GOT is only writeable by the kernel, and for user space it's read-only. Any attempt to change the GOT from ring 3 would trigger an exception.

The GOT is in the same segment as the data section. You cannot write-protect one without write-protecting the other. And write-protecting the data section would be kind of counterproductive.

bzt wrote:Think about the things I wrote above. If you still can't see why a user space dynamic linker is bad for security, then I'm sorry, I'm afraid I cannot help you more.

Your arrogance astounds me. You fail to even consider that you might be wrong. This was my last reply on the matter.
