Feedback request: planning on using non-standard ABI

Hellbender · Post by **Hellbender** » Fri May 01, 2015 2:50 am

Hi.

I'm a long-time software developer, starting my first OS project for fun and research.

I'm still in the planning phase and would like to have feedback on the following:

1) I plan to use a non-standard calling convention (arguments and local variables not in the call stack). This is to keep buffer overflows from corrupting return addresses. Is it going to be very difficult to make GCC support my own calling convention? I have some compiler development experience but I'm not familiar with GCC internals (yet).

2) I plan to limit the maximum frame size to one page (4k), maybe allowing larger frames for leaf calls. I'd do this so that I can prepare call frame memory with every other page not present, so that a buffer overflow cannot corrupt other frames. Do you think that I'm going to face huge porting issues with existing software (I'd like to compile at least GCC and related build tools)? If this is a major no-no, I could allocate larger frames from the heap behind the scenes.
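
For illustration, the every-other-page layout can be tried out in user space on a POSIX system before any kernel exists. This is only a sketch under assumptions: `frame_stack_create`, `frame_at` and `frame_stack_destroy` are hypothetical names, and `mmap`/`mprotect` stand in for what the OS's own page tables would do.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve `frames` one-page frames, each followed by
 * an inaccessible guard page. Returns the base address, or NULL on failure. */
void *frame_stack_create(size_t frames)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = frames * 2 * page;
    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Make every odd-numbered page inaccessible. */
    for (size_t i = 0; i < frames; i++) {
        if (mprotect(base + (2 * i + 1) * page, page, PROT_NONE) != 0) {
            munmap(base, total);
            return NULL;
        }
    }
    return base;
}

/* Address of usable frame number `index` (frames sit on even pages). */
void *frame_at(void *base, size_t index)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (unsigned char *)base + 2 * index * page;
}

/* Release the whole region, usable frames and guard pages alike. */
void frame_stack_destroy(void *base, size_t frames)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    munmap(base, frames * 2 * page);
}
```

A write that runs off the end of one frame lands in a `PROT_NONE` page and faults (SIGSEGV on Linux) instead of silently reaching the next frame. Note the cost: half of the reserved address space is guard pages.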

Thanks for any comments.

Combuster · Post by **Combuster** » Fri May 01, 2015 4:48 am

I don't know a thing about GCC internals, but if you can devise for yourself a standard prologue and epilogue you can work it out. In fact, it might be wise to actually generate a fully separate stack for data. That means you can call/ret on the original stack using *SP, and have a separate stack that you address through *BP, and interleave present and non-present pages on that new stack. In other words, you might end up with something like

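Code:

SUB EBP, 8192                ; reserve a data frame on the EBP stack (two 4K pages, presumably one present plus one non-present)
(...)
CALL my_new_abi_function     ; the return address goes on the ordinary ESP stack
(...)
ADD EBP, 8192                ; release the data frame
RET                          ; return through the ESP stack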

One advantage of having a separate stack is that thanks to EBP preservation you can transparently call using the traditional ABI from your version, at the cost of the subsequent stackframes appearing on the callstack.

Only using the page interleaving still gives options for smashes to access the return address because it has to be somewhere - either in the current frame by smashing a local pointer, or in the return-to frame by smashing a called pointer. So I wouldn't call your method necessarily safer than using a canary; it just points you directly at the line of code that does the smash instead of leaving you with a post-mortem diagnostic.

Large local array allocations can happen, but I don't really expect many of them. You might get away without them for busybox-class applications, but you probably won't be able to compile GCC itself under your own ABI without adding support for such. One problem though would be that any such arrays require an additional level of indirection beyond what they are used to.

Candy · Post by **Candy** » Fri May 01, 2015 5:05 am

The latter you can easily check for any program - there's a compiler flag on GCC to warn (error) on a too large stack frame, with a settable limit. Downside: I know this because on the large project I'm working on at work we recently reduced it to 40k, and fixed all offenders (about 20). Reducing it to 10k would imply fixing 600 more - and increasing quickly as you reduce it to 4k or so. This may imply that it's an unrealistic limit.
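
A minimal sketch of such a check, assuming the flag meant here is GCC's `-Wframe-larger-than=`. The two functions and the file name `frame_check.c` are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Compile with:
 *     gcc -Wframe-larger-than=4096 -c frame_check.c
 * GCC should warn about big_frame() and stay silent about small_frame().
 * Adding -fstack-usage additionally writes per-function stack sizes to a
 * .su file, which is one way to collect frame size statistics. */

/* 8 KiB of locals: exceeds a one-page (4 KiB) frame budget. */
size_t big_frame(const char *s)
{
    char buf[8192];
    strncpy(buf, s, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return strlen(buf);
}

/* 256 bytes of locals: fits comfortably within one page. */
size_t small_frame(const char *s)
{
    char buf[256];
    strncpy(buf, s, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return strlen(buf);
}
```

The reported sizes depend on target and optimisation level, so the numbers are a guide rather than an exact measurement.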

With regard to your own ABI: if you target x86_64 (System V ABI), the first 6 integer or pointer parameters are already passed in registers, and more if you use floating point or SSE types. If you use that ABI and limit the maximum parameter counts such that no spilling occurs, you basically get what you want except with a standard ABI.

Hellbender · Post by **Hellbender** » Fri May 01, 2015 5:17 am

Combuster wrote: That means you can call/ret on the original stack using *SP, and have a separate stack that you address through *BP, and interleave present and non-present pages on that new stack.

Yes, this was the thing I had in mind.

Combuster wrote: Only using the page interleaving still gives options for smashes to access the return address because it has to be somewhere - either in the current frame by smashing a local pointer, or in the return-to frame by smashing a called pointer

I was thinking that since there is no user data in the callstack (where the return address is), there are no pointers to the callstack memory in any typical code. Thus, any buffer overflow could not overwrite return address, because callstack is separated from framestack by (large number of) non-present pages. Am I missing something in this line of thought?

Combuster wrote: Large local array allocations can happen, but I don't really expect many of them. You might get away without them for busybox-class applications, but you probably won't be able to compile GCC itself under your own ABI without adding support for such.

This sounds good enough for me. It's quite a few years until I have to worry about the GCC =)

Combuster wrote: One problem though would be that any such arrays require an additional level of indirection beyond what they are used to.

My plan was something like the following pseudo-code:

Code:

oldEBP = EBP;
push(EBP);
EBP = alloc(frame_size);
memcpy(EBP, oldEBP, arguments_size);

...

free(EBP);
pop(EBP);

Hellbender · Post by **Hellbender** » Fri May 01, 2015 5:24 am

Candy wrote: The latter you can easily check for any program - there's a compiler flag on GCC to warn (error) on a too large stack frame, with a settable limit. Downside: I know this because on the large project I'm working on at work we recently reduced it to 40k, and fixed all offenders (about 20). Reducing it to 10k would imply fixing 600 more - and increasing quickly as you reduce it to 4k or so. This may imply that it's an unrealistic limit.

This is great info, thanks! I'm gonna collect some frame size statistics to see what makes sense.

Brendan · Post by **Brendan** » Fri May 01, 2015 7:04 am

Hi,

On 32-bit 80x86, there really aren't enough registers; and there are problems with variadic functions and functions with more arguments than you have registers. Passing arguments in registers can make code faster, but it can also make code slower (e.g. caller saves "in use" values in registers onto the stack before it can store arguments in those registers; then callee pushes arguments from registers onto stack so it can use the registers itself); and this is partly why some registers are "callee preserved" (so the caller knows it won't need to save "in use" values in those callee preserved registers).

Don't forget that for GCC you can already tell it to use the "fastcall" calling convention (which passes the first 2 arguments in registers); and (for performance) this is possibly a good compromise between passing too many arguments in registers (and harming performance) and passing too many arguments on the stack (and harming performance).
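
As a rough sketch of what that looks like in source, GCC spells it `__attribute__((fastcall))` on 32-bit x86: the first two integer arguments go in ECX and EDX and the rest on the stack. The `FASTCALL` macro and `add3` are made-up names for the example; the macro expands to nothing on other targets so the file still builds there.

```c
/* Use fastcall only where it applies (32-bit x86 with GCC or Clang);
 * elsewhere the macro is empty and the platform's default ABI is used. */
#if defined(__i386__) && defined(__GNUC__)
#define FASTCALL __attribute__((fastcall))
#else
#define FASTCALL
#endif

/* On i386 with fastcall: a arrives in ECX, b in EDX, c on the stack. */
FASTCALL int add3(int a, int b, int c)
{
    return a + b + c;
}
```

Compiling this with `gcc -m32 -S` and reading the generated assembly shows which arguments ended up in registers.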

In theory the best approach is "no ABI"; as this allows the compiler to customise/optimise the calling convention used by each function individually to suit the function itself and any callers the compiler knows about; and get the fastest code for each specific case. In practice most modern compilers already support this, but only for static functions or if/when you use whole program optimisation (and not for dynamically linked/shared libraries).

Also; by changing the ABI significantly you'll probably break the compiler's code optimisers, and will need to fix them. You're not talking about minor changes to prologue/epilogue and "function call generation" alone.

Of course you will be breaking more than just GCC (e.g. debuggers and linkers won't understand your calling convention either).

Hellbender wrote: I was thinking that since there is no user data in the callstack (where the return address is), there are no pointers to the callstack memory in any typical code. Thus, any buffer overflow could not overwrite return address, because callstack is separated from framestack by (large number of) non-present pages. Am I missing something in this line of thought?

Instead of having buffer overflows (that can potentially corrupt data and return addresses on the stack and cause security vulnerabilities), you'll have buffer overflows (that can potentially corrupt data on a stack and cause security vulnerabilities). It doesn't prevent or solve the problem and only modifies the symptoms. Are you really sure it's worth the hassle?

Hellbender wrote: My plan was something like the following pseudo-code:
Code:
oldEBP = EBP;
push(EBP);
EBP = alloc(frame_size);
memcpy(EBP, oldEBP, arguments_size);

...

free(EBP);
pop(EBP);

Which calling convention will "memcpy()" use? Will you end up with infinite recursion (because each call to "memcpy()" requires a call to "memcpy()")?

Note: It'd make more sense to use EBP as a "top of data stack" and ESP as a "top of return stack"; so that instead of calling malloc and free you can just add/subtract the size from EBP.
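
To make that concrete, here is a small user-space model in C, purely as a sketch. `data_top` plays the role of EBP, the ordinary C call stack plays the role of the ESP return stack, and `frame_push`, `frame_pop` and `data_stack_used` are hypothetical names.

```c
#include <stddef.h>

enum { DATA_STACK_BYTES = 64 * 1024 };

/* Separate data stack; return addresses never live in this array. */
static unsigned char data_stack[DATA_STACK_BYTES];

/* Models EBP as "top of data stack"; the stack grows downwards. */
static unsigned char *data_top = data_stack + DATA_STACK_BYTES;

/* Round a frame size up to a multiple of 16 bytes. */
static size_t round_up16(size_t size)
{
    return (size + 15) & ~(size_t)15;
}

/* Equivalent of "SUB EBP, size": no allocator call, just arithmetic.
 * Returns NULL if the data stack is exhausted. */
void *frame_push(size_t size)
{
    size = round_up16(size);
    if (size > (size_t)(data_top - data_stack))
        return NULL;
    data_top -= size;
    return data_top;
}

/* Bytes currently in use on the data stack. */
size_t data_stack_used(void)
{
    return (size_t)((data_stack + DATA_STACK_BYTES) - data_top);
}

/* Equivalent of "ADD EBP, size"; must mirror the matching frame_push. */
void frame_pop(size_t size)
{
    size = round_up16(size);
    if (size <= data_stack_used())
        data_top += size;
}
```

Because allocation is plain pointer arithmetic, no `memcpy()` or `malloc()` call is needed in the prologue, which sidesteps the recursion question above.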

Cheers,

Brendan

Combuster · Post by **Combuster** » Fri May 01, 2015 7:46 am

On another note, guard-paging the stack doesn't include guard-paging the heap, which would potentially imply that you lose that category of isolation whenever you overflow a single stack slot. And guarding every malloc call is a bad idea when all your 16-byte linked list items suddenly get blown up to 4K and use up your RAM a factor of 256 faster, so you'd have to be smart with that as well.
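
The factor of 256 is just 4096 / 16. A small sketch of the arithmetic, with hypothetical helper names and counting only the rounding of each allocation up to whole pages (a separate guard page per allocation would cost more again):

```c
#include <stddef.h>

/* Bytes consumed when an allocation is rounded up to whole pages. */
size_t page_rounded_size(size_t size, size_t page)
{
    return (size + page - 1) / page * page;
}

/* How many times more memory the rounded allocation uses (truncated). */
size_t blowup_factor(size_t size, size_t page)
{
    return page_rounded_size(size, page) / size;
}
```

With 16-byte list nodes and 4096-byte pages this gives 256, while allocations that are already close to a page in size lose almost nothing.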

Brendan wrote: Note: It'd make more sense to use EBP as a "top of data stack" and ESP as a "top of return stack"; so that instead of calling malloc and free you can just add/subtract the size from EBP.

I gathered from his follow-up post that was the original plan.

OSDev.org
