OSDev.org

Posted: **Sat Dec 06, 2008 7:44 pm**

Sorry if this has been done to death, but how do you push IP to the stack so I can print it out using my hex printing function? I know you can't just use mov ax,ip.

Thanks,
Troy

Posted: **Sat Dec 06, 2008 7:50 pm**

call label
label:

The call will push the IP, just call the next address.
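A minimal sketch of that idea, assuming NASM syntax and 16-bit real mode (the label name `here` is made up for illustration):

```nasm
bits 16

    call here       ; pushes the offset of the next instruction, i.e. of 'here'
here:
    pop ax          ; AX = run-time IP of 'here'
    ; AX can now be passed to a hex printing routine
```

The value obtained this way is the offset within the current code segment only; CS has to be read separately (e.g. `mov dx, cs`).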

Posted: **Sat Dec 06, 2008 8:05 pm**

Can't believe I didn't think of that, thanks! When an interrupt is fired, in what order does CS:IP get pushed?

This is for my exception handling in real mode.

Posted: **Sat Dec 06, 2008 8:05 pm**

If you are using NASM, shouldn't something along the lines of "push $" work? That way you can do a + or - offset as well. IP should be in the area of $, right?

But since I'm currently on a PPC I don't have a way to test it.

Posted: **Sat Dec 06, 2008 8:10 pm**

Heh, you posted right after me.

I'm doing this for an interrupt so CS:IP is already pushed.

Posted: **Sat Dec 06, 2008 8:37 pm**

Stack layout in interrupts is covered in the Intel manuals...
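For the real-mode case asked about, the CPU pushes FLAGS first, then CS, then IP, so IP ends up on top of the stack when the handler starts. A rough NASM sketch under that assumption (the handler name is hypothetical, and the printing step is left to your own routine):

```nasm
bits 16

; Hypothetical real-mode exception handler, for illustration only.
; On entry:  [ss:sp+0] = IP, [ss:sp+2] = CS, [ss:sp+4] = FLAGS
fault_handler:
    push bp
    mov  bp, sp         ; BP-based addressing uses SS by default
    mov  ax, [bp+2]     ; AX = saved IP (the saved BP sits at [bp+0])
    mov  dx, [bp+4]     ; DX = saved CS
    mov  cx, [bp+6]     ; CX = saved FLAGS
    ; ... print DX:AX with your own hex routine here ...
.hang:
    cli
    hlt
    jmp  .hang          ; never return from this sketch
```

Note that which instruction the saved IP points at depends on the CPU: for a divide error, the 8086/8088 saves the address of the instruction after the DIV, while the 80286 and later save the address of the DIV itself. It is worth confirming this in the manual for your target.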

Posted: **Sat Dec 06, 2008 9:21 pm**

Asked on IRC and I got the answer, my div0 and invalid opcode handlers are now working!

Posted: **Sat Dec 06, 2008 9:23 pm**

stephenj wrote:If you are using NASM, shouldn't something along the lines of "push $" work? That way you can do a + or - offset as well. IP should be in the area of $, right?

But since I'm currently on a PPC I don't have a way to test it.

NO, it will not work, because $ is a compile-time symbol while IP is a run-time value. Or, even worse, it "might" work in some situations, but only by mistake.

Use the "call label" followed by a "label: pop register" method.

Posted: **Sat Dec 06, 2008 9:59 pm**

Hi,

bontanu wrote:
stephenj wrote:If you are using NASM, shouldn't something along the lines of "push $" work? That way you can do a + or - offset as well. IP should be in the area of $, right?
NO, it will not work, because $ is a compile-time symbol while IP is a run-time value. Or, even worse, it "might" work in some situations, but only by mistake.

Use the "call label" followed by a "label: pop register" method.

If "push $" doesn't work, then you've got a lot more problems than worrying about whether or not you push the correct value. If "push $" doesn't work then EIP isn't what the assembler/compiler expected, and only relative calls and relative jumps will work - things like indirect calls/jumps (e.g. "call [myTable + eax * 4]" and "call eax") and calls/jumps with a fixed target address won't work.

Also, modern CPUs keep track of return addresses to avoid performance problems when they reach a RET/IRET instruction (for the same reason they use branch prediction). Doing "call label; label:" will mess this up and you end up with the equivalent of branch mispredictions for subsequent RET/IRET instructions, because the CPU can't guess the target of the RET/IRET correctly anymore. For the same reason, it's also bad to use a fake return (e.g. doing "push myCS; push myIP; retf" instead of using an indirect far jump).

If you must get EIP without relying on $, then it'd be better to do something like this (to avoid confusing the CPU and causing performance problems):

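Code:

    ...
    call getEIP         ; on return, EAX = address of the instruction after this call
    ...

getEIP:
    mov eax,[esp]       ; read the return address without popping it
    ret                 ; the RET matches the CALL, so return prediction stays intact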

Modifying the return address on the stack is also bad (e.g. "mov dword [esp], newReturnAddress; ret"), but it's not as bad because you only confuse the CPU once and subsequent returns would still be correctly predicted.

Cheers,

Brendan

Posted: **Sun Dec 07, 2008 2:02 pm**

Brendan wrote:Hi,
If "push $" doesn't work, then you've got a lot more problems than worrying about whether or not you push the correct value.

Not exactly. As I have said before there is an essential / conceptual difference between $ (a compile time symbol) and IP (a run time symbol).

This is the essence; "more problems" is just an aberration of the mind.

Brendan wrote:If "push $" doesn't work then EIP isn't what the assembler/compiler expected, and only relative calls and relative jumps will work - things like indirect calls/jumps (e.g. "call [myTable + eax * 4]" and "call eax") and calls/jumps with a fixed target address won't work.

Wrong.

First of all you miss the whole point of using such an instruction sequence.
"push $ " is of course an aberration because you can simply write "mov eax,$" if you want the runtime address or in fact use a label like in "mov esi,offset my_label" or even: "my_addr dd $".

After all that is why labels have been invented: to give a symbolic name to an address at compile time. The $ special symbol is normally used only in relative calculation or when you do not need a label name like in: "jmp $" or when obtaining the length of some data like in:

Code:

my_string db "A longer string",0
my_string_len equ $-my_string       ; length at assembly time, including the trailing 0

The whole point is that you use such "call delta; delta: pop eax" constructs ONLY when you have to deal with Position Independent Code or "PIC" without relocations. Those are rare occasions but they do exist in practice.

And again: the OS loader can relocate code to a very different address at runtime, and it does so in my OS. Of course using relocations is the most logical option in this case, but sometimes having PIC code without relocations is needed.

In these cases the code can be located anywhere in RAM, and using $ for this purpose (obtaining the run-time address) can be fatal, since the compiler has no way of knowing the run-time location and the result depends on the expression type.
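The "delta" construct mentioned above can be sketched roughly as follows. This assumes NASM, 32-bit code and a flat binary output where labels assemble to plain constants; the names `start`, `delta` and `my_string` are illustrative only:

```nasm
bits 32

start:
    call delta                  ; pushes the run-time address of 'delta'
delta:
    pop  ebp                    ; EBP = where 'delta' really is at run time
    sub  ebp, delta             ; EBP = run-time address minus assemble-time address
    lea  esi, [ebp + my_string] ; ESI = run-time address of my_string
    ; ... use ESI ...
    ret                         ; assumes 'start' was itself reached via a call

my_string db "A longer string", 0
```

If the code runs at the address it was assembled for, the delta in EBP is zero and `[ebp + my_string]` is the same as `my_string`; if it was copied elsewhere without relocations, EBP holds the displacement.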

I have implemented both methods in my OS: both PIC and relocations and I favor relocations over tricky PIC but other authors do prefer PIC.

Brendan wrote:Also, modern CPUs keep track of return addresses to avoid performance problems when they reach a RET/IRET instruction (for the same reason they use branch prediction). Doing "call label; label:" will mess this up and you end up with the equivalent of branch mispredictions for subsequent RET/IRET instructions, because the CPU can't guess the target of the RET/IRET correctly anymore.

That is true, but this code is unlikely to be used more than once per application, and only a very limited number of applications or APIs should ever need such tricks, so it serves no purpose to optimize such a rare occasion. Besides, the CPU will recover fast after one single such misprediction.

This is the whole idea with optimizations: they give you an advantage in "a majority of cases" and limit your freedom of expression in "the rest of the cases". However people change and mental ideas change over "time" and what is "majority" today might become a "minority" tomorrow.

That is why it was said that "premature optimization is the root of all evil". If I might add: non-conceptual optimizations that are based on some specific "hardware trick of the day" are the worst kind of all.

If you use such a tricky construct inside your critical inner loop without understanding its implications THEN you deserve the consequences.

And let me state that other CPUs do not have this kind of CALL/RET optimization, and some very widely used CPUs of today do not even have a CALL/RET pair for more than one level.

Posted: **Sun Dec 07, 2008 2:35 pm**

Uhh, this has been solved. I just POP'd IP off the stack when my exception handler was fired.

Posted: **Sun Dec 07, 2008 10:13 pm**

Hi,

bontanu wrote:The whole point is that you use such "call delta; delta: pop eax" constructs ONLY when you have to deal with Position Independent Code or "PIC" without relocations. Those are rare occasions but they do exist in practice.

Agreed - only when you have to deal with PIC, but the design is so messed up you can't adjust segment bases, or use proper relocation (like DLLs and shared libraries), or use RIP relative addressing. That adds up to never (or at least it should)...

bontanu wrote:
Also, modern CPUs keep track of return addresses to avoid performance problems when they reach a RET/IRET instruction (for the same reason they use branch prediction). Doing "call label; label:" will mess this up and you end up with the equivalent of branch mispredictions for subsequent RET/IRET instructions, because the CPU can't guess the target of the RET/IRET correctly anymore.
That is true, but this code is unlikely to be used more than once per application, and only a very limited number of applications or APIs should ever need such tricks, so it serves no purpose to optimize such a rare occasion. Besides, the CPU will recover fast after one single such misprediction.

It's not one such misprediction - it's every return address that the CPU remembered. For current Intel CPUs there's a 16 entry return address stack so worst case is 16 mispredictions, and for AMD K10 it's 24 entries. Of course in practice it depends on how deep calls are nested, so if the CPU's return address stack is empty (e.g. something like a boot loader's "main" which never returns) then there won't be any mispredictions.

bontanu wrote:This is the whole idea with optimizations: they give you an advantage in "a majority of cases" and limit your freedom of expression in "the rest of the cases". However people change and mental ideas change over "time" and what is "majority" today might become a "minority" tomorrow.

The idea of providing information is that people can think about it and make their own mind up about what is/isn't appropriate for their code.

bontanu wrote:If you use such a tricky construct inside your critical inner loop without understanding its implications THEN you deserve the consequences.

If you're aware of the consequences, then you deserve the consequences. If you're not aware of the consequences, then finding out about the consequences wouldn't be a bad thing.

bontanu wrote:And let me state that other CPUs do not have this kind of CALL/RET optimization, and some very widely used CPUs of today do not even have a CALL/RET pair for more than one level.

According to sandpile.org AMD CPUs have had a return address stack since the K6, Intel CPUs have since P6, NexGen Nx586 and Nx686 have it, all Cyrix CPUs have it, Rise mP6 has it and all VIA/Centaur CPUs have it. In all cases the return address stack is between 4 and 24 entries (inclusive). I'm not sure about Transmeta (sandpile just says "Branch prediction: Yes" without including any details). There are also a few rare embedded systems and Intel's Atom (couldn't find information for these).

However, this information isn't a good representation - you'd need to account for the number of each of these CPUs that's still in use. For example, computers with Intel Pentium 4 are more likely to be in use than computers with Intel 80486 CPUs. I'd estimate that at least 90% of computers that are still in use have a return address stack that has 8 or more entries.

Cheers,

Brendan

Posted: **Mon Dec 08, 2008 8:54 am**

Hi Brendan,
My comments are below...

Brendan wrote: Agreed - only when you have to deal with PIC, but the design is so messed up you can't adjust segment bases, or use proper relocation (like DLLs and shared libraries), or use RIP relative addressing. That adds up to never (or at least it should)...

This statement assumes a lot: "The design is so messed up". I will not go into debates here since I also favor relocations, but please note that your view is not the only view. Others consider your view to be "very messed up".

For one, I have heard Agner state that this delta PIC method is vastly "superior" to the method used by Linux/Unix for shared libraries today (GOT, PLT), and that once assemblers are able to do this delta trick, ONLY "delta PIC" should be used...

So walk carefully here...

Also, segment bases are obsolete on modern hardware, since almost everybody uses flat memory models with paging and the x64 architecture deprecated segments altogether. Again, do not get me wrong: I liked segments and considered them an evolution when compared with old paging, but apparently the industry is going in another direction.

As for "proper relocations" (yes, I use them), they have problems too. For example, a DLL once relocated cannot be shared anymore, because the code is actually changed for that process instance and memory layout. You can get "lucky" with DLLs and relocations, but that is just "luck", not "design", and after all that is why Unix/Linux do not favor DLLs and use GOT/PLT.

As for RIP relative addressing: it is available only in 64-bit mode (x64), not in 32-bit code, and yes, it is a very useful feature. But the issue with PIC code remains, because it is conceptual: it emerges from a need to know your absolute run-time location, and this cannot be known at compile time...
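For the 64-bit case, a small NASM sketch of reading a run-time address with RIP-relative addressing, with no CALL/POP pair involved (the label names are made up for illustration):

```nasm
bits 64

get_here:
    lea rax, [rel here]     ; RAX = run-time address of 'here', encoded as RIP + displacement
here:
    ret
```

Because the displacement is relative to RIP, the same bytes give the correct address wherever the code is loaded. In 32-bit code there is no equivalent addressing mode, which is where the call-based methods discussed in this thread come in.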

Brendan wrote:It's not one such misprediction - it's every return address that the CPU remembered. For current Intel CPUs there's a 16 entry return address stack so worst case is 16 mispredictions, and for AMD K10 it's 24 entries. Of course in practice it depends on how deep calls are nested, so if the CPU's return address stack is empty (e.g. something like a boot loader's "main" which never returns) then there won't be any mispredictions.

Wrong.
The hardware used to perform this is associative, not a simple stack.

Brendan wrote:The idea of providing information is that people can think about it and make their own mind up about what is/isn't appropriate for their code.

True.

But here we should be masters of programming and hardware understanding, and one should be able to find and understand such issues himself. Spoon-feeding people on an OS dev forum is a recipe for later frustrations in your target for "benevolent help".

"The road to hell is paved with good intentions"

Do not try to be nice by giving information. Instead, try to understand what would be better for making your interlocutor understand by himself, and hence what kind of answer will make him think for himself and later on become a creator that is fully independent of you or me and our "knowledge".

Giving information is NOT the best solution. Most of the time the interlocutor is wrong in his original question, and giving him the technically correct answer (as you did) is not helping... not helping at all.

For sure the OP never ever thought of the implications we are discussing here. Apparently he just wanted to get the IP of an exception in a handler and was not aware that in this case it is pushed on the stack of the exception handler.

This is a simple issue and the OP did NOT have the dedication to read the Intel manuals correctly on this issue; instead he went on IRC to get a fast / short answer, probably being intimidated by the technical things that we explored here on his thread. Hence I foretell great frustrations for him in the future if he keeps to this OS dev road.

BTW, for the OP: with your "simple" solution there is a "problem". IF you read the Intel manuals you will find some surprises about some kinds of "exceptions" and the items pushed on the stack.

Of course "only" IF you read...

Brendan wrote:If you're aware of the consequences, then you deserve the consequences. If you're not aware of the consequences, then finding out about the consequences wouldn't be a bad thing.

True, but it is much better to find out yourself if possible. It leaves a long-lasting memory and it improves your intelligence a lot. Just being "told" moves you closer to a "goal", but it reduces your intelligence... pick one of the two options.

Brendan wrote:According to sandpile.org AMD CPUs have had a return address stack since the K6, Intel CPUs have since P6, NexGen Nx586 and Nx686 have it, all Cyrix CPUs have it, Rise mP6 has it and all VIA/Centaur CPUs have it. In all cases the return address stack is between 4 and 24 entries (inclusive). I'm not sure about Transmeta (sandpile just says "Branch prediction: Yes" without including any details). There are also a few rare embedded systems and Intel's Atom (couldn't find information for these).

According to my hands-on experience you live in a dream. More exactly, "dream land x86". ARM and other CPUs like Fujitsu, Motorola, and STMicro dominate the embedded world, with units sold (and of course programmed for) that vastly outnumber any x86 "world". The number of iPods, smart mobile phones, TV set-top boxes, hardware routers and firewalls, MP3 players, GPRS and other handheld devices is huge.

ARM CPUs normally do not have CALL/RET for more than ONE level. More than that you have to do "by hand", by saving the return address yourself.

STMicro has a stack-based architecture that resembles Forth machines more than a normal CPU; they jump around in primitives and hence a return is almost never done... The 8-bit or 12-bit automotive CPUs have small program space and embedded stacks and do not have such predictions... etc. etc.

And "branch prediction" is NOT the same as CALL/RET optimizations and predictions.

Brendan wrote:However, this information isn't a good representation - you'd need to account for the number of each of these CPUs that's still in use. For example, computers with Intel Pentium 4 are more likely to be in use than computers with Intel 80486 CPUs. I'd estimate that at least 90% of computers that are still in use have a return address stack that has 8 or more entries.

For desktop PCs that is surely true.

Nobody really wanted to talk about 386 or 486 CPUs here... only your mind made that trip into the past.

However, some have pointed out that "rules" change, and in the same way as the ARM/RISC CPUs do not have this kind of hardware optimization for CALL/RET today... we might live to see a future when this is dropped from x86.

Of course "dreaming" of the future is in no way better than dreaming of the past... both are dreams.

However, I am starting to make my own CPU here, and there is a big probability that it will NOT have these optimizations for CALL/RET. It all depends on how much space I will have on silicon and how to make the best use of this space... time will tell.

Best regards,
Bogdan

Posted: **Mon Dec 08, 2008 1:10 pm**

Call stack-friendly version:

Code:

    ...
    call mov_eax_eip
    ...

mov_eax_eip:
    pop eax
    push eax
    ret

Posted: **Mon Dec 08, 2008 4:56 pm**

@bontanu: I think bringing alternative architectures into this discussion is pointless as Brendan is discussing from a specifically x86 perspective. The reason for that bias is the OP question being based on x86.

It's kind of pointless to talk about ARM in a discussion about x86 optimisations, unless there was potential for a switch to ARM (which I don't feel Troy is ready for - please don't take that the wrong way!). To be honest, I prefer ARM development too.

Pushing IP solved
