enter and leave instruction in asm

wenn32 · Post by **wenn32** » Wed Oct 27, 2010 4:37 am

Hello, I am currently learning asm. Please look at this asm program:


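```nasm
[section .data]
fmt: db 'First = %d',10,'Second = %d',10,0

[section .text]
global _main

extern _printf

_main:

enter 8,0                    ; push ebp / mov ebp,esp / sub esp,8
mov dword [ebp - 4],123      ; first local variable
mov dword [ebp - 8],456      ; second local variable
push dword [ebp - 8]         ; printf arguments are pushed right to left
push dword [ebp - 4]
push fmt
call _printf
add esp,12                   ; caller removes the 3 pushed arguments

leave                        ; mov esp,ebp / pop ebp
mov eax,0                    ; return 0
ret
```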

What do the ENTER and LEAVE instructions do?

Combuster · Post by **Combuster** » Wed Oct 27, 2010 4:45 am

To create (ENTER) and discard (LEAVE) a stack frame. See the Intel manuals (Volume 2A) for a detailed description.

bewing · Post by **bewing** » Wed Oct 27, 2010 2:43 pm

Well, the manuals don't do a very good job of telling you why you might want to create a stack frame.

When a function gets called in C, for example, the arguments get pushed onto the stack, and then the CALL pushes the return address. The stack will be used a lot more in just a second, so many programmers think it is a good idea to save a copy of ESP at this moment. The EBP register was made for doing exactly that. So you do "PUSH EBP; MOV EBP, ESP". That is called "setting up a stack frame pointer", which is EBP. That is what the ENTER instruction does; it's pretty much a replacement for those two instructions. Then, while the function is running, you can move ESP around as much as you want (push, pop, reserve space for locals) and leave it wherever it ends up, since you saved a good copy of the pointer in EBP. You can also use EBP to easily access the arguments that were pushed onto the stack: in 32-bit code [EBP+4] holds the return address, [EBP+8] the first argument, [EBP+12] the second, and locals sit at negative offsets such as [EBP-4]. LEAVE does a "MOV ESP, EBP; POP EBP".

Brendan · Post by **Brendan** » Wed Oct 27, 2010 6:32 pm

Hi,

bewing wrote:
When a function gets called in C, for example, the arguments get pushed onto the stack. Then the function gets called. The stack will be used lots more in just a second, so many programmers think it is a good idea to save a copy of ESP at this moment. The EBP register was made for doing exactly that. So you do "PUSH EBP; MOV EBP, ESP". That is called "setting up a stack frame pointer", which is EBP. That is what the ENTER opcode does -- it's pretty much a replacement for those two opcodes.

Actually, it's up to three instructions ("PUSH EBP; MOV EBP, ESP; SUB ESP, <space_for_local_variables>").

Ironically, on most CPUs ENTER/LEAVE are implemented in micro-code and it's faster to use 2 or 3 smaller/simpler instructions instead, so most compilers don't use ENTER/LEAVE at all.

Also note that if you replace ENTER/LEAVE with the faster/smaller/simpler alternative instructions and then optimise the assembly (e.g. replace "MOV ESP,EBP" with "ADD ESP,<space_for_local_variables>", remove the "MOV EBP,ESP", then use ESP instead of EBP to access local variables and input parameters, which frees up EBP for normal use), you end up with smaller/faster code with no stack frame.

Cheers,

Brendan

JamesM · Post by **JamesM** » Thu Oct 28, 2010 2:17 am

Brendan wrote:
Ironically, on most CPUs ENTER/LEAVE are implemented in micro-code

Every instruction has been microcoded on every CPU since the 1990s.

wenn32 · Post by **wenn32** » Thu Oct 28, 2010 8:42 am

Thanks!

Brendan · Post by **Brendan** » Thu Oct 28, 2010 7:36 pm

Hi,

JamesM wrote:
Ironically, on most CPUs ENTER/LEAVE are implemented in micro-code
Every instruction is microcoded on every CPU since the 1990s.

Simple instructions are decoded directly into a small number of micro-ops (typically 1 micro-op). Complex instructions aren't, and (for the sake of over-simplifying) are a little bit like miniature subroutines stored in microcode ROM (or "microcoded") rather than actual instructions that are executed directly (quickly).

From Intel's Optimisation Reference Manual:
"Assembler/Compiler Coding Rule 40. (ML impact, M generality) Avoid using complex instructions (for example, enter, leave, or loop) that have more than 4 uops and require multiple cycles to decode. Use sequences of simple instructions instead.

Complex instructions may save architectural registers, but incur a penalty of 4 uops to set up parameters for the microcode ROM."

Cheers,

Brendan

Brynet-Inc · Post by **Brynet-Inc** » Fri Oct 29, 2010 5:00 pm

berkus wrote:
Just don't forget to add you're speaking about Intel cpus, not all cpus.

JamesM works for a very successful designer of semiconductors.

Combuster · Post by **Combuster** » Sat Oct 30, 2010 5:24 am

Depending on where you put the distinction between microcoded instructions and others: on AMD's Athlon series there are two distinct decoder systems, DirectPath and VectorPath. The manuals suggest that the decoding operations are hardwired into the DirectPath unit, while the VectorPath unit generates internal opcodes from an internal memory (ROM is technically not the best description).

Thing is, out-of-order execution always requires saving some internal state to be dispatched to the various ALU components; at this point, the distinction between microcode and storage of "simple" control signals gets blurred. An ancient processor like the 6502 simply grabs an instruction, loads the operands when needed, performs the ALU operation, then stores the result where needed. The moment you start pipelining that, you can do the loads (where possible) in one cycle, the operation in the next, and the store in the third. If you do that out of order, you just save some control signals for later use. In all cases, there is no technical difference between having a "discrete" lookup table, which we label a "microcode ROM", that converts an opcode into batches of signals, and having the same conversion done by a more efficient logic network that takes advantage of the similarities in instruction formatting; at this level, the net effect is the same.

Therefore the statement "microcode rom is slow" or "microcoded instructions are slow" is, as a generalisation, wrong.

The difference is how much load is put on the so-called microcode unit. If it always responds with the same number of operations, there's no difference. If it has to respond with a variable number of operations, it can become a bottleneck as soon as the number of operations it dispatches is high compared to the number of instructions coming in. The execution engine then starts seeing bunches of operations belonging to one instruction, and goes idle because there is no other instruction in the queue whose work it could do in parallel. And that is the situation behind the microcode myth: "complex microcoded instructions reduce the amount of independent work available to the processor, so that it can no longer do more than one thing at the same time".

Owen · Post by **Owen** » Sat Oct 30, 2010 6:41 am

Also note that, IIRC, simple LEAVEs are fast (at least as good as the simple instructions), but ENTER and complex LEAVEs are expensive and to be avoided.

(This would appear to be corroborated by GCC generating LEAVEs quite often when it doesn't elide the frame pointer altogether)

JamesM · Post by **JamesM** » Sat Oct 30, 2010 9:03 am

berkus wrote:
Brynet-Inc wrote:
berkus wrote:
Just don't forget to add you're speaking about Intel cpus, not all cpus.
JamesM works for a very successful designer of semiconductors.
This alone still doesn't mean that ALL cpus are microcoded. If you want superscalar and out-of-order, then yes, unit-specific mops make more sense, but for the microcontroller sort of cpus just dumb direct execution may be more efficient.

Prove me wrong, James.

I'd like to point out that I do not work in the processor division of said company, so my statements have as much research behind them as any of yours.

Every CPU with a pipeline requires one operation to be broken down into multiple micro-ops: LOAD, EXECUTE, WRITEBACK for a very simple system (ignoring instruction fetch, because although it is a pipeline stage, it obviously doesn't depend on instruction content).

As Combuster rightly mentions, to split an instruction into micro-ops you need what is functionally equivalent to a lookup table. A request to ROM is functionally equivalent to just a combinational function, and there's no constraint on how that combinational function is implemented. In the x86 it seems some instructions fall through to a ROM (ENTER et al.) and others are special-cased for speed. This is what I would expect.

But they're all microcoded, because they all use a pipeline. Even the Cortex-M3 is pipelined, so yes, it is microcoded too.

How the microcode lookup is implemented is an implementation detail!

James

drudru · Post by **drudru** » Fri Oct 21, 2011 8:42 pm

Sooo.....

Why not just transform the instruction into the equivalent fast preamble??

Is it because of a Transmeta patent?


miker00lz · Post by **miker00lz** » Sat Oct 22, 2011 2:44 am

Yep, as has been mentioned, it sets up a stack frame. This is the code that handles them in my x86 emulator, so you can see how it works:


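```c
case 0xC8: //C8 ENTER imm16,imm8 (80186+)
    stacksize = getmem16(segregs[regcs], ip); StepIP(2);     // bytes of locals to reserve
    nestlev = getmem8(segregs[regcs], ip) & 0x1F; StepIP(1);  // nesting level is taken mod 32
    push(getreg16(regbp));                  // save the caller's BP
    frametemp = getreg16(regsp);            // becomes the new BP
    if (nestlev) {
        // copy the enclosing frames' pointers (the "display") from the old frame
        for (temp16=1; temp16<nestlev; temp16++) {
                putreg16(regbp, getreg16(regbp) - 2);
                push(getmem16(segregs[regss], getreg16(regbp)));  // word stored at SS:BP
        }
        push(frametemp);                    // this frame's own pointer
    }
    putreg16(regbp, frametemp);
    putreg16(regsp, getreg16(regsp) - stacksize);  // reserve locals below everything pushed
    break;

case 0xC9: //C9 LEAVE (80186+)
    putreg16(regsp, getreg16(regbp));       // discard locals
    putreg16(regbp, pop());                 // restore the caller's BP
    break;
```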

OSDev.org
