Hi Brendan,
My comments are below...
Brendan wrote:
Agreed - only when you have to deal with PIC, but the design is so messed up you can't use adjust segment bases, or use proper relocation (like DLLs and shared libraries), or use RIP relative addressing. That adds up to never (or at least it should)...
This statement assumes a lot: "The design is so messed up". I will not start a debate here, since I also favor relocations, but please note that your view is not the only view. Others consider your view to be "very messed up".
For one, I have heard Agner state that this delta PIC method is vastly "superior" to the method used by Linux/Unix for shared libraries today (GOT, PLT), and that once assemblers are able to do this delta trick, ONLY "delta PIC" should be used...
So walk carefully here...
Also, segment bases are obsolete on modern hardware, since almost everybody uses flat memory models with paging, and the x64 architecture deprecated segments altogether. Again, do not get me wrong: I liked segments and considered them an evolution compared with old paging, but apparently the industry is going in another direction.
As for "proper relocations" (yes, I use them), they have problems too. For example, once a DLL has been relocated, its code cannot be shared anymore, because the code has actually been changed for that process instance and memory layout. You can get "lucky" with DLLs and relocations, but that is just "luck", not "design", and after all that is why Unix/Linux do not favor DLLs and use GOT/PLT.
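To make the sharing argument concrete, here is a toy sketch in Python. It is not how any real loader works: the "code page" is just a list of words, and `PREFERRED_BASE`, the opcodes, and the addresses are all made-up values for illustration. It only shows that patching addresses into the code makes the page depend on the load address, while indirection through a per-process table leaves the code identical.

```python
# Toy model only: a "code page" is a list of 32-bit words. All names,
# opcodes and addresses below are illustrative assumptions.

PREFERRED_BASE = 0x10000000

def load_with_relocations(code, reloc_offsets, actual_base):
    """Return the code page as it looks after base relocations are applied."""
    delta = actual_base - PREFERRED_BASE
    patched = list(code)
    for off in reloc_offsets:
        # The absolute address embedded in the code itself gets rewritten.
        patched[off] = (patched[off] + delta) & 0xFFFFFFFF
    return patched

def load_with_got(code, got_link_addrs, actual_base):
    """Leave the code untouched; only the per-process table (data) changes."""
    delta = actual_base - PREFERRED_BASE
    got = [(addr + delta) & 0xFFFFFFFF for addr in got_link_addrs]
    return list(code), got

# A page whose words 1 and 3 hold absolute addresses of two variables.
reloc_code = [0xA1, 0x10002000, 0xA3, 0x10002004]
# The same logic written to go through table slots 0 and 1 instead.
got_code = [0x8B, 0, 0x8B, 1]
got_link_addrs = [0x10002000, 0x10002004]

proc_a = load_with_relocations(reloc_code, [1, 3], 0x10000000)
proc_b = load_with_relocations(reloc_code, [1, 3], 0x20000000)
print("relocated code identical:", proc_a == proc_b)

code_a, got_a = load_with_got(got_code, got_link_addrs, 0x10000000)
code_b, got_b = load_with_got(got_code, got_link_addrs, 0x20000000)
print("table-based code identical:", code_a == code_b)
print("per-process tables identical:", got_a == got_b)
```

In this model the "lucky" case is simply both processes loading at `PREFERRED_BASE`, where the delta is zero and nothing gets patched.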
As for RIP-relative addressing: it is available only on x64, not on 32-bit x86. Yes, it is a very useful feature, but the issue with PIC code remains because it is conceptual: it emerges from a need to know your absolute run-time location, and this cannot be known at compile time...
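The arithmetic behind the "delta" idea, as I understand it from this thread, can be sketched in a few lines of Python. The function names and all addresses are hypothetical; the run-time address of the anchor label is whatever the code discovers about itself when it runs (for example with the call/pop trick discussed here).

```python
MASK32 = 0xFFFFFFFF

def compute_delta(runtime_addr_of_anchor, linktime_addr_of_anchor):
    """Difference between where a known label really is and where the linker put it."""
    return (runtime_addr_of_anchor - linktime_addr_of_anchor) & MASK32

def runtime_address(linktime_addr, delta):
    """Translate any link-time address into its run-time address."""
    return (linktime_addr + delta) & MASK32

# Hypothetical numbers: image linked at 0x00400000 but loaded at 0x7F000000.
anchor_link = 0x00401005      # address the assembler assigned to the anchor label
anchor_run = 0x7F001005       # address found at run time
var_link = 0x00403010         # link-time address of some variable

delta = compute_delta(anchor_run, anchor_link)
print(hex(delta))                              # 0x7ec00000
print(hex(runtime_address(var_link, delta)))   # 0x7f003010
```

The only run-time input is the anchor's real address; everything else is known when the code is assembled.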
It's not one such misprediction - it's every return address that the CPU remembered. For current Intel CPUs there's a 16 entry return address stack so worst case is 16 mispredictions, and for AMD K10 it's 24 entries. Of course in practice it depends on how deep calls are nested, so if the CPU's return address stack is empty (e.g. something like a boot loader's "main" which never returns) then there won't be any mispredictions.
Wrong.
The hardware used to perform this is associative, not a simple stack.
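For readers following the argument, here is a sketch in Python of the simple-stack model that Brendan's numbers assume. It is only a model: the class and function names are made up, the depth of 16 is taken from his post, and a plain LIFO predictor is exactly the assumption disputed above, so real hardware may behave differently.

```python
from collections import deque

class ReturnStackModel:
    """Toy model: a fixed-depth LIFO predictor next to the real stack."""

    def __init__(self, depth=16):
        self.ras = deque(maxlen=depth)  # oldest prediction is dropped when full
        self.stack = []                 # architectural stack (return addresses only)
        self.mispredictions = 0

    def call(self, return_addr):
        self.stack.append(return_addr)
        self.ras.append(return_addr)

    def pop_return_addr(self):
        # "call next / pop reg": software removes the address from the real
        # stack, but the modelled predictor is never told about it.
        return self.stack.pop()

    def ret(self):
        actual = self.stack.pop()
        predicted = self.ras.pop() if self.ras else None
        if predicted != actual:
            self.mispredictions += 1
        return actual

def run(nesting, use_call_pop):
    """Nest `nesting` calls, optionally do one call/pop, then return from all."""
    m = ReturnStackModel()
    for level in range(nesting):
        m.call(0x1000 + level)
    if use_call_pop:
        m.call(0x9999)
        m.pop_return_addr()
    for _ in range(nesting):
        m.ret()
    return m.mispredictions

print(run(3, use_call_pop=False))  # 0
print(run(3, use_call_pop=True))   # 3
print(run(0, use_call_pop=True))   # 0
```

Under this model a single call/pop shifts every remembered prediction by one, so the cost grows with the nesting depth, and with no enclosing calls there is nothing to mispredict.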
The idea of providing information is that people can think about it and make their own mind up about what is/isn't appropriate for their code.
True.
But here we should be masters of programming and hardware understanding, and one should be able to find and understand such issues himself. Spoon-feeding people on an OS dev forum is a recipe for later frustrations in your target for "benevolent help".
"The road to hell is paved with good intentions"
Do not try to be nice by giving information. Instead, try to understand what would best help your interlocutor understand by himself, and hence what kind of answer will make him think for himself and later on become a creator who is fully independent of you or me and our "knowledge".
Giving information is NOT the best solution. Most of the time the interlocutor is wrong in his original question, and giving him the technically correct answer (as you did) is not helping... not helping at all.
For sure the OP never thought of the implications we are discussing here. Apparently he just wanted to get the IP of an exception in a handler and was not aware that in this case it is pushed on the stack of the exception handler.
This is a simple issue, and the OP did NOT have the dedication to read the Intel manuals correctly on this issue; instead he went on IRC to get a fast, short answer, probably being intimidated by the technical things that we explored here on his thread. Hence I foretell great frustrations for him in the future if he stays on this OS dev road.
BTW, for the OP: with your "simple" solution there is a "problem". IF you read the Intel manuals you will find some surprises about certain kinds of "exceptions" and the items pushed on the stack.
Of course "only" IF you read...
If you're aware of the consequences, then you deserve the consequences. If you're not aware of the consequences, then finding out about the consequences wouldn't be a bad thing.
True, but it is much better to find out yourself if possible. It leaves a long-lasting memory and it improves your intelligence a lot. Just being "told" moves you closer to a "goal" but reduces your intelligence... pick one of the two options.
According to sandpile.org AMD CPUs have had a return address stack since the K6, Intel CPUs have since P6, NexGen Nx586 and Nx686 have it, all Cyrix CPUs have it, Rise mP6 has it and all VIA/Centaur CPUs have it. In all cases the return address stack is between 4 and 24 entries (inclusive). I'm not sure about Transmeta (sandpile just says "Branch prediction: Yes" without including any details). There's also a few rare embedded systems and Intel's Atom (couldn't find information for these).
According to my hands-on experience, you live in a dream. More exactly, "dream land x86". ARM and other CPUs from Fujitsu, Motorola, and STMicro dominate the embedded world, with units sold (and of course programmed for) that vastly outnumber anything in the x86 "world". The number of iPods, smart mobile phones, TV set-top boxes, hardware routers and firewalls, MP3 players, GPRS and other handheld devices is huge.
ARM CPUs normally do not have CALL/RET for more than ONE level. More than that you have to do "by hand" by saving the return address yourself.
STMicro has a stack-based architecture that resembles Forth machines more than a normal CPU; they jump around in primitives, and hence a return is almost never done... The 8-bit or 12-bit automotive CPUs have small program space and embedded stacks and do not have such predictions... etc. etc.
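A small Python model of the "one level" point, assuming the usual ARM convention that a branch-with-link puts the return address in a single link register. The class, method names, and addresses are invented for the illustration; it models only the register, not any real core.

```python
class LinkRegisterModel:
    """Toy model: one link register plus a software-managed stack."""

    def __init__(self):
        self.lr = None
        self.stack = []

    def bl(self, return_addr):
        # Branch-with-link: the single register is overwritten on every call.
        self.lr = return_addr

    def push_lr(self):
        self.stack.append(self.lr)   # the "by hand" save

    def pop_lr(self):
        self.lr = self.stack.pop()   # the "by hand" restore

    def ret(self):
        return self.lr               # return to whatever the register holds

def nested_call(save_lr):
    """main -> outer -> inner; report where inner and outer return to."""
    cpu = LinkRegisterModel()
    cpu.bl(0x100)            # main calls outer, should return to 0x100
    if save_lr:
        cpu.push_lr()
    cpu.bl(0x200)            # outer calls inner, should return to 0x200
    inner_ret = cpu.ret()
    if save_lr:
        cpu.pop_lr()
    outer_ret = cpu.ret()
    return inner_ret, outer_ret

print([hex(a) for a in nested_call(save_lr=False)])  # ['0x200', '0x200']
print([hex(a) for a in nested_call(save_lr=True)])   # ['0x200', '0x100']
```

Without the manual save, the outer function "returns" to the inner call's return address, which is why anything deeper than one level needs the return address stored by the program itself.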
And "branch prediction" is NOT the same as CALL/RET optimizations and predictions.
However, this information isn't a good representation - you'd need to account for the number of each of these CPUs that's still in use. For example, computers with Intel Pentium 4 are more likely to be in use than computers with Intel 80486 CPUs. I'd estimate that at least 90% of computers that are still in use have a return address stack that has 8 or more entries.
For desktop PCs that is surely true.
Nobody really wanted to talk about 386 or 486 CPUs here... only your mind made that trip into the past.
However, some have pointed out that "rules" change, and in the same way that ARM/RISC CPUs do not have this kind of hardware optimization for CALL/RET today... we might live to see a future when this is dropped from x86.
Of course "dreaming" of the future is in no way better than dreaming of the past... both are dreams.
However, I am starting to make my own CPU here and there is a big probability that it will NOT have these optimizations for CALL/RET. It all depends on how much space I will have on silicon and how to make the best use of this space... time will tell.
Best regards,
Bogdan