Is the enter/leave instruction better?

wyj · Post by **wyj** » Mon Oct 26, 2015 2:53 am

In assembly,
is it better to do something like
foo:
enter N, 0
; ...
leave
ret

or

foo:
push rbp
mov rbp,rsp
sub rsp,N
; ...
mov rsp,rbp
pop rbp
ret

which one is better for us to use?

I also see different versions of
sub rsp,N
and sometimes GCC doesn't touch rsp at all.

Setting up a frame is actually unnecessary, since you can always read parameters at an offset from rsp,
but which one is better for us?

Brendan · Post by **Brendan** » Mon Oct 26, 2015 3:58 am

Hi,

In assembly; it's better to not bother (and just use ESP/RSP to find things on the stack) because it's faster and lets you use EBP/RBP as an extra general purpose register (which helps a lot for 32-bit code where there are fewer other general purpose registers).
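To see what "just use RSP" involves, here is a minimal C sketch, assuming the System V x86-64 layout (8-byte slots, return address at [rsp] on entry, first stack-passed argument at [rsp+8]). The helper names are hypothetical. It shows that an RSP-relative offset changes every time the function pushes or subtracts from RSP, while an RBP-relative offset stays fixed once `push rbp; mov rbp, rsp` has run.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling the System V x86-64 stack (8-byte slots).
 * At function entry [rsp] holds the return address and stack-passed
 * argument k (k = 0 for the first one) sits at [rsp + 8 + 8k]. */

/* Frameless: offset of stack argument k from the CURRENT rsp, given how many
 * bytes the function has pushed or subtracted from rsp since entry. */
static int64_t stack_arg_off_rsp(int k, int64_t bytes_below_entry) {
    return 8 + 8 * (int64_t)k + bytes_below_entry;
}

/* Framed: after "push rbp; mov rbp, rsp" the saved rbp is at [rbp] and the
 * return address at [rbp + 8], so argument k is always at [rbp + 16 + 8k]. */
static int64_t stack_arg_off_rbp(int k) {
    return 16 + 8 * (int64_t)k;
}
```

For example, after `sub rsp, 24` the first stack argument is at `[rsp+32]`, and one more `push` moves it to `[rsp+40]`, whereas in the framed version it stays at `[rbp+16]`.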

Apart from that, the ENTER/LEAVE instructions are better for code size and worse for speed; so you want to use them in things like "only executed once" initialisation code and want to avoid them in code that's executed often.
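The size side of that trade-off can be checked by counting bytes. The C sketch below lists one valid 64-bit encoding for each instruction, for a hypothetical 32-byte frame (an assembler may pick an equivalent encoding of the same length). It says nothing about speed, which depends on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* One valid 64-bit encoding per instruction (Intel syntax in comments). */
static const unsigned char enc_enter_32_0[]  = {0xC8, 0x20, 0x00, 0x00}; /* enter 32, 0  */
static const unsigned char enc_leave[]       = {0xC9};                   /* leave        */

static const unsigned char enc_push_rbp[]    = {0x55};                   /* push rbp     */
static const unsigned char enc_mov_rbp_rsp[] = {0x48, 0x89, 0xE5};       /* mov rbp, rsp */
static const unsigned char enc_sub_rsp_32[]  = {0x48, 0x83, 0xEC, 0x20}; /* sub rsp, 32  */
static const unsigned char enc_mov_rsp_rbp[] = {0x48, 0x89, 0xEC};       /* mov rsp, rbp */
static const unsigned char enc_pop_rbp[]     = {0x5D};                   /* pop rbp      */

/* Prologue + epilogue using ENTER/LEAVE. */
static size_t prologue_epilogue_bytes_enter(void) {
    return sizeof enc_enter_32_0 + sizeof enc_leave;
}

/* Prologue + epilogue spelled out instruction by instruction. */
static size_t prologue_epilogue_bytes_explicit(void) {
    return sizeof enc_push_rbp + sizeof enc_mov_rbp_rsp + sizeof enc_sub_rsp_32
         + sizeof enc_mov_rsp_rbp + sizeof enc_pop_rbp;
}
```

With these encodings, `enter 32, 0` plus `leave` take 5 bytes versus 12 for the explicit sequence, and LEAVE alone replaces a 4-byte epilogue with 1 byte.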

Cheers,

Brendan

Combuster · Post by **Combuster** » Mon Oct 26, 2015 7:02 am

In practice you never see ENTER being used. LEAVE is short and doesn't do much and therefore is much more optimised from the processor's perspective. It can be found in various forms of optimised code. Not using EBP as a copy of the stack pointer is faster, but it will break stacktrace functionality. Considering you seem to be using 64-bit code, you can even consider using the red zone - as long as it's not part of the kernel.
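To illustrate why a frame pointer helps stack traces, here is a toy C model (not a real unwinder). The stack is an array of 8-byte slots, a "pointer" is a slot index, and each function that runs `push rbp; mov rbp, rsp` leaves a record with the caller's RBP at [rbp] and its return address at [rbp+8]. Following the chain recovers the call stack; a function that omits the frame is simply missing from the trace, and one that reuses RBP for other data breaks the chain while it runs. `walk_frames` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: mem[] holds one uint64_t per 8-byte stack slot and a "pointer"
 * is a slot index, so [rbp] is mem[rbp] and [rbp + 8] is mem[rbp + 1].
 * A saved rbp of 0 marks the outermost frame. */
static size_t walk_frames(const uint64_t *mem, uint64_t rbp,
                          uint64_t *ret_addrs, size_t max) {
    size_t n = 0;
    while (rbp != 0 && n < max) {
        ret_addrs[n++] = mem[rbp + 1]; /* return address into the caller */
        rbp = mem[rbp];                /* follow the saved caller rbp    */
    }
    return n;
}
```

Starting from the innermost frame's RBP, the walk yields return addresses from innermost to outermost caller.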

tlf30 · Post by **tlf30** » Mon Oct 26, 2015 1:12 pm

A really good topic on the matter that was posted some time ago: http://forum.osdev.org/viewtopic.php?t=22683

wyj · Post by **wyj** » Fri Oct 30, 2015 8:19 am

Combuster wrote:In practice you never see ENTER being used. LEAVE is short and doesn't do much and therefore is much more optimised from the processor's perspective. It can be found in various forms of optimised code. Not using EBP as a copy of the stack pointer is faster, but it will break stacktrace functionality. Considering you seem to be using 64-bit code, you can even consider using the red zone - as long as it's not part of the kernel.

Yes, it is x64. Currently I only write so-called "leaf functions" with no more than 4 parameters; 4 is usually enough for most occasions.

If not, I'll pass a pointer to a struct to avoid the stack (laugh).

To be honest, I hate counting bytes on the stack.

And I'm sorry, but what do you mean by "red zone"?

SpyderTL · Post by **SpyderTL** » Fri Oct 30, 2015 11:18 am

wyj wrote:And I'm sorry, but what do you mean by "red zone"?

Calling Conventions - System V X86_64

There is a 128-byte area below the stack pointer called the 'red zone', which may be used by leaf functions without adjusting %rsp. This requires the kernel to skip an additional 128 bytes below %rsp when it builds a signal frame on a user-space stack. This is not done by the CPU: if interrupts use the current stack (as with kernel code) and the red zone is enabled (the default), then interrupts will silently corrupt the stack. Always pass -mno-red-zone to kernel code (even support libraries such as libc's embedded in the kernel) if interrupts don't respect the red zone.
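A minimal sketch of the signal-delivery rule described above, assuming a kernel that places the signal frame directly below the red zone and rounds the result down to 16 bytes. The alignment step and the name `signal_frame_base` are assumptions for illustration, not a specific kernel's code.

```c
#include <assert.h>
#include <stdint.h>

#define RED_ZONE_BYTES 128u /* System V x86-64 red zone size */

/* Hypothetical helper: where a signal frame of frame_size bytes could start
 * on a user stack interrupted at user_rsp. */
static uint64_t signal_frame_base(uint64_t user_rsp, uint64_t frame_size) {
    uint64_t sp = user_rsp - RED_ZONE_BYTES; /* leave [rsp-128, rsp) alone      */
    sp -= frame_size;                        /* room for the signal frame       */
    return sp & ~(uint64_t)15;               /* assumed 16-byte alignment       */
}
```

For an interrupted RSP of 0x7fff0000 and a 200-byte frame this gives 0x7ffefeb0, so the 128 bytes just below RSP are left untouched.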

kzinti · Post by **kzinti** » Fri Oct 30, 2015 12:24 pm

Curious... It seems to me it would be better to enable red zones in the kernel and properly fix the stack when entering interrupt gates. Has anyone done any testing on this?

gerryg400 · Post by **gerryg400** » Fri Oct 30, 2015 11:35 pm

kiznit wrote:Curious... It seems to me it would be better to enable red zones in the kernel and properly fix the stack when entering interrupt gates. Has anyone done any testing on this?

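The problem is that if you enable the red zone in your kernel and an interrupt occurs, the CPU pushes the interrupt frame onto your ring 0 stack, just below RSP, before any of your handler code runs. If there is a red zone there, it will be trashed. A user-space red zone is okay because when an interrupt occurs in user mode, the CPU switches to the kernel stack, so nothing is pushed onto the user-space stack and the red zone is undisturbed.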
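The following toy C model sketches that situation. It is a simplification: slots are 8 bytes, the stack grows toward index 0, the pushed SS/CS/RFLAGS/RIP values are dummies, and the 16-byte alignment step and error codes of real 64-bit interrupt entry are ignored. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64

/* Toy stack: one uint64_t per 8-byte slot; rsp is a slot index and the stack
 * grows toward index 0. */
typedef struct {
    uint64_t slot[SLOTS];
    size_t rsp;
} Stack;

static void push(Stack *s, uint64_t v) {
    s->slot[--s->rsp] = v;
}

/* A leaf function using the red zone: stores to [rsp - 8] without moving rsp. */
static void leaf_store_in_red_zone(Stack *s, uint64_t v) {
    s->slot[s->rsp - 1] = v;
}

/* An interrupt delivered on this same stack (ring 0, no stack switch): the
 * CPU pushes SS, RSP, RFLAGS, CS and RIP below the current rsp. */
static void interrupt_same_stack(Stack *s) {
    size_t old_rsp = s->rsp;
    push(s, 0x10);               /* SS (dummy)     */
    push(s, (uint64_t)old_rsp);  /* RSP            */
    push(s, 0x202);              /* RFLAGS (dummy) */
    push(s, 0x08);               /* CS (dummy)     */
    push(s, 0xdead);             /* RIP (dummy)    */
    s->rsp = old_rsp;            /* after iretq, rsp is back where it was */
}
```

Calling `interrupt_same_stack` on the stack that holds the red zone overwrites the red-zone data; calling it on a separate kernel `Stack`, which is what the ring 3 to ring 0 stack switch amounts to, leaves the user red zone intact.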

kzinti · Post by **kzinti** » Fri Oct 30, 2015 11:52 pm

Right... What was I thinking... =)

gerryg400 · Post by **gerryg400** » Sat Oct 31, 2015 12:01 am

kiznit wrote:Right... What was I thinking... =)

Yeah, don't feel too bad. The red-zone has caught plenty. Read this http://forum.osdev.org/viewtopic.php?f= ... t=red+zone

azblue · Post by **azblue** » Fri Dec 18, 2015 7:30 am

Combuster wrote:Considering you seem to be using 64-bit code, you can even consider using the red zone - as long as it's not part of the kernel.

I just learned about the red zone from this thread (and the other one linked), but I don't understand why it needs to be confined to long mode; shouldn't it also work for ring 3 protected-mode leaf functions? If they don't call other functions, and an interrupt switches to another stack, I don't see why long mode is required (or why the red zone would be limited to 128 bytes).

Brendan · Post by **Brendan** » Fri Dec 18, 2015 8:16 am

Hi,

azblue wrote:
Combuster wrote:Considering you seem to be using 64-bit code, you can even consider using the red zone - as long as it's not part of the kernel.
I just learned about the red zone from this thread (and the other one linked), but I don't understand why it needs to be confined to long mode; shouldn't it also work for ring 3 protected-mode leaf functions? If they don't call other functions, and an interrupt switches to another stack, I don't see why long mode is required (or why the red zone would be limited to 128 bytes).

The first thing to understand is that for instructions like "mov rax,[rsp+(-123)]" there are 3 alternatives:

encode the offset (-123) as a sign extended 8-bit immediate
encode the offset (-123) as a sign extended 16-bit immediate and waste 2 extra bytes (one for the extra immediate byte and another for the size override prefix you'd need)
encode the offset (-123) as a sign extended 32-bit immediate and waste 3 extra bytes

The point of the red zone is to make code more efficient by avoiding the need to adjust RSP (e.g. doing "sub rsp,256" to make space, which causes a dependency problem for later instructions that use RSP because they have to wait until the new value of RSP has been calculated); while also increasing the chance that those (shorter, better) "sign extended 8-bit immediate" instructions can be used.

It's this "(shorter, better) sign extended 8-bit immediate" that's responsible for the 128-byte size limit: a signed 8-bit offset reaches down to exactly [rsp-128]. If the red zone were larger, you'd have to use something less efficient (a 16-bit or 32-bit immediate), and it'd probably be better to adjust RSP instead.
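Instruction lengths make the 128-byte figure concrete. The C sketch below (the function name is hypothetical) computes the length of `mov rax, [rsp+disp]` in 64-bit mode, where the ModRM byte offers no displacement, an 8-bit displacement or a 32-bit displacement, and an RSP base always needs a SIB byte.

```c
#include <assert.h>
#include <stdint.h>

/* Length in bytes of "mov rax, [rsp + disp]" in 64-bit mode:
 * REX.W (0x48) + opcode (0x8B) + ModRM + SIB (required with an RSP base),
 * then no displacement, a disp8, or a disp32. */
static int mov_rax_rsp_disp_len(int32_t disp) {
    const int base = 4;                /* 48 8B ModRM SIB        */
    if (disp == 0)
        return base;                   /* mod = 00: [rsp]        */
    if (disp >= -128 && disp <= 127)
        return base + 1;               /* mod = 01: disp8        */
    return base + 4;                   /* mod = 10: disp32       */
}
```

Every offset in the red zone, from -128 up to -1, encodes in 5 bytes; going to -129 or below costs 8.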

Now, calling conventions...

There is no reason the same red zone stuff couldn't be done for 32-bit code (or 16-bit code). In fact, if you're willing to write your own compiler and ensure that all your shared libraries, kernel API, etc. are designed for it, nothing prevents you from implementing any calling convention you like. The problem here is that it'd break compatibility with the calling conventions that everything has used for about 25 years.

Also note that you only need a strictly defined calling convention for cases where the tools can't optimise the calling convention properly (e.g. because the called function is in a completely different object file, or in a shared library or something). For better tools (e.g. where native code generation is done by a link-time optimiser, and where the calling conventions used by most functions can be optimised properly) the "strictly defined calling convention" wouldn't be used anywhere near as much, and would have far less performance impact. Basically; if you're going to replace standard tools just to improve that strictly defined calling convention; then you're probably solving the wrong problem in the first place.

Cheers,

Brendan

TightCoderEx · Post by **TightCoderEx** » Fri Dec 18, 2015 6:51 pm

I generally use ENTER, as most of my procedures take at least 500-750 cycles, so the 0.8% saving is negligible. There is also a two-byte saving for frames greater than 128 bytes, and, though not often, there have been times when nested frames were handy:

enter 18, 1

That being said, I'll probably refrain from using it in interrupt handlers, though I can't really see a need for a procedure frame in a handler anyway.
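For readers unfamiliar with the nesting-level operand, here is a C model of 64-bit `enter alloc, level` and `leave`, written from the pseudocode in Intel's instruction reference and simplified (no faults, a small byte array as memory, hypothetical names). With level 0 it reduces to `push rbp; mov rbp, rsp; sub rsp, alloc`; with level 1, as in `enter 18, 1`, it additionally pushes the new frame pointer, so the frame starts with a one-entry display.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BYTES 512

typedef struct {
    uint8_t mem[MEM_BYTES]; /* toy memory; addresses are offsets into it */
    uint64_t rsp, rbp;
} Cpu;

static void push64(Cpu *c, uint64_t v) {
    c->rsp -= 8;
    memcpy(&c->mem[c->rsp], &v, 8);
}

static uint64_t load64(const Cpu *c, uint64_t addr) {
    uint64_t v;
    memcpy(&v, &c->mem[addr], 8);
    return v;
}

/* Model of 64-bit "enter alloc, level". */
static void enter64(Cpu *c, uint16_t alloc, uint8_t level) {
    level = (uint8_t)(level % 32);     /* nesting level is taken mod 32   */
    push64(c, c->rbp);                 /* save caller's rbp               */
    uint64_t frame_temp = c->rsp;      /* will become the new rbp         */
    if (level > 0) {
        for (unsigned i = 1; i < level; i++) {
            c->rbp -= 8;
            push64(c, load64(c, c->rbp)); /* copy outer frame pointers    */
        }
        push64(c, frame_temp);         /* pointer to this frame           */
    }
    c->rbp = frame_temp;
    c->rsp -= alloc;                   /* space for locals                */
}

/* Model of "leave": mov rsp, rbp; pop rbp. */
static void leave64(Cpu *c) {
    c->rsp = c->rbp;
    c->rbp = load64(c, c->rsp);
    c->rsp += 8;
}
```

Starting from rsp = 256, `enter64(&cpu, 18, 1)` leaves the old RBP at 248, the new frame pointer (248) at 240, and RSP at 222; `leave64` then restores both registers.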

OSDev.org
