Re: Scrolling terminal in software-emulated text mode
Posted: Fri Dec 11, 2020 7:25 pm
bzt wrote:
"CALL" instruction does not use branch predictions as conditional near jumps like "JE", "JNE" etc.

Yes, it does. See Agner Fog's microarchitecture guide, which explains how the Branch Target Buffer (part of the branch predictor) tracks the destinations of both conditional and unconditional jumps and calls. It also describes how some CPUs have an additional branch target predictor for unconditional direct jumps and calls, which minimizes the penalty when the BTB hasn't already cached the destination address.
Or, refer to Intel's optimization manual (which you linked already), section E.2.2.3 on pages E-13 and E-14, which notes that the branch predictor in Sandy Bridge CPUs is used for "Direct calls and jumps" and "Indirect calls and jumps".
bzt wrote:
The distance of the call also matters, see Intel Software Developer Manual Vol 2A page 3-126. See section "Operation" with the microcode.

I don't see anything about the distance of the call in the pseudocode. "Near" refers to CALL forms that don't load CS, and "far" refers to forms that do; neither term says anything about how far away the destination is.
bzt wrote:
Also read about cache handling with and without LFENCE for both near, normal and far calls (hint: there's a difference).

Intel says that speculative execution of an indirect near CALL may mispredict the branch destination as the next sequential instruction instead of the actual target, and that an LFENCE placed after the CALL can keep those instructions from being speculatively executed while the CPU catches up and determines the correct destination. From a performance perspective, a speculation barrier prevents cache pollution when the branch is mispredicted, but it may also hurt performance when the branch is predicted correctly.
eekee wrote:
Note the "prediction" part - this is for conditional branches; unconditional calls must surely have been optimized long before.
bzt wrote:
If that were true, then the compiler wouldn't inline certain functions nor unroll loops for speed optimization. But it does (long before execution, so long that those are done in compile-time).

I suspect the rest of the function call overhead will outweigh the CALL instruction itself, thanks to the branch predictor. Do you have any benchmarks of this?
bzt wrote:
Good read on the topic: http://www.ece.uah.edu/~milenka/docs/mi ... WDDD02.pdf (explains why jump distance matters, and some other things as well)

Perhaps I'm missing something, but I don't see where that paper mentions the distance between a jump instruction and its destination. It talks a lot about the distance between different jump instructions.