That's probably what you get for claiming perfect accuracy for 15 years, so I took your 15-year-old example. Quite interesting that they couldn't beat the AMDs of that time even though they seemed to have more advanced predictors.

Owen wrote:
"Because the Pentium M did it that way, all Intel processors do it that way"?
And because I'm already in the mood of bashing concepts containing absolutes, here's the more up-to-date version.

Owen wrote:
Nehalem and above implement macro-op fusion, in which "dec reg, jnz backwards" and "cmp reg, const, jnz backwards" pairs can be fused into one micro-op. Especially in the former case, tell me why it would be difficult for the processor to correctly predict that every time? It would surprise me entirely if Intel weren't predicting that correctly.

Let's troll the predictor:
Code:
.restart_loop:
(...)
opcode someplace, ecx ; ecx is needed for other work here
mov ecx, [ebp+4]      ; loop counter stashed due to register pressure, reloaded from memory
dec ecx
jnz .restart_loop     ; fused op; the prediction already needed to happen before the exact value of ecx was known
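To make the point concrete, here is a minimal C analogue of that fragment - my sketch, not part of the original example. The volatile qualifier stands in for the register-pressure spill: it forces the counter to be reloaded from memory each iteration, so the loop-closing branch must be predicted before the loaded value is even available.

Code:

#include <stdio.h>

int main(void) {
    volatile unsigned counter = 1000; /* spilled loop counter, like [ebp+4] */
    unsigned work = 0;
    unsigned c;

    do {
        work += 2;        /* stand-in for the loop body that hogs the registers */
        c = counter;      /* mov ecx, [ebp+4] : reload from the spill slot */
        c--;              /* dec ecx */
        counter = c;      /* keep the spill slot current for the next iteration */
    } while (c != 0);     /* jnz .restart_loop : branch on the just-loaded value */

    printf("work = %u\n", work);
    return 0;
}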
My original example showed that by adding delayed jumps for those cases where the condition is known well ahead of the jump, the predictor never gets a chance to guess wrong - especially relevant considering how most architectures force the condition to be evaluated immediately before the jump. That's something static analysis can give you, but not dynamic prediction; a sketch of the idea follows below.

Now: please tell me how you would beat branch-history predictors (a toy one is simulated below as well) while using equal silicon area.
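Since no mainstream ISA at hand exposes delayed jumps of that kind, here is a hedged C-level sketch of the scheduling idea (the variable names and the "delay slot" work are illustrative, not from my original example): the exit condition is fully resolved several instructions before the branch that consumes it. On x86 a compiler will still emit an ordinary conditional jump, so the point is only where the condition becomes known, not what the hardware does with it.

Code:

#include <stdio.h>

int main(void) {
    unsigned counter = 1000;
    unsigned sum = 0, product = 1;

    for (;;) {
        counter--;
        int exit_now = (counter == 0); /* condition resolved early */

        /* independent "delay slot" work, scheduled between the
           evaluation of the condition and the jump that uses it */
        sum += counter;
        product *= 3;

        if (exit_now)                  /* jump consumed long after evaluation */
            break;
    }

    printf("sum = %u, product = %u\n", sum, product);
    return 0;
}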
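And for reference, here is a toy two-bit saturating-counter predictor simulated in C - the textbook scheme, not any particular vendor's design. It shows the baseline you would have to beat: a loop-closing branch like the one above costs exactly one misprediction, at the exit.

Code:

#include <stdio.h>

int main(void) {
    int state = 2;        /* 0-1: predict not-taken, 2-3: predict taken */
    int mispredicts = 0;
    const int N = 1000;   /* iterations of the simulated loop */

    for (int i = 1; i <= N; i++) {
        int actual = (i < N);          /* loop-closing branch: taken until the last pass */
        int predicted = (state >= 2);
        if (predicted != actual)
            mispredicts++;

        /* train the saturating counter on the actual outcome */
        if (actual) { if (state < 3) state++; }
        else        { if (state > 0) state--; }
    }

    printf("%d mispredict(s) out of %d branches\n", mispredicts, N);
    return 0;
}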
But I agree, I should shut up now.