Then explain why the Mill, with its wide-issue, exposed pipeline and good support for compiler-specified speculative execution, still has run-ahead transfer prediction in hardware.

Combuster wrote: By embedding such algorithms into the program when it can't statically predict what will happen?
I mean, if you have an explicit pipeline (as you'd have seen in delay-slot jump architectures), then for pretty much all for-loops you can explicitly inform the CPU where execution will go and have zero mispredictions, where every other CPU with a branch predictor would fail at least at the exit condition. Similarly, you can explicitly encode branch prediction for if statements, or even run a part of both branches until the jump kicks in. Being VLIW easily gives you the option to do such parallelism for free when you're doing typical dependent computation series anyway.
Bottom line: there's no such thing as the CPU being superior to the compiler.
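The zero-misprediction loop claim above can be sketched in C. This is a hedged illustration, not Mill code: the function names are invented, and `__builtin_expect` is a GCC/Clang extension that lets the expected branch direction be stated explicitly in the program rather than guessed by hardware.

```c
/* A counted loop: the compiler can see statically that the backward
 * branch is taken (n - 1) times and falls through exactly once, so an
 * architecture with compiler-controlled prediction never mispredicts it. */
long sum_first_n(long n) {
    long total = 0;
    for (long i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* An if statement with an explicitly encoded prediction: the second
 * argument of __builtin_expect states the expected outcome. */
int branch_with_hint(int x) {
    if (__builtin_expect(x == 0, 1)) {
        return 1;   /* hinted hot path */
    }
    return -1;      /* cold path */
}
```

Whether such static hints can beat a runtime predictor on data-dependent branches is exactly the question the rest of this thread argues about.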
The Mill: a new low-power, high-performance CPU design
Re: The Mill: a new low-power, high-performance CPU design
Because the compiler is not yet as good as it can be. But the direction is clear: future compilers will eliminate many parts of current processors. And even if it is possible to leave some simple algorithms in the processor, the optimization is not as simple as just dumbly removing parts.

Rusky wrote: Then explain why the Mill, ... still has run-ahead transfer prediction in hardware.
Re: The Mill: a new low-power, high-performance CPU design
You offered a schema to show your reasoning. I provided an instance of your schema! The reasoning wasn't mine, it was yours! You can read about schemas here.

Combuster wrote: Substitute optimum for Y? Like I said, your reasoning is hilarious.

bwat wrote: You've got a problem with thinking. By your reasoning, we can state that because there is no better than some optimum there can be no optimum. Just substitute "optimum" for your Y. Nonsense!

Combuster wrote: 4: Therefore there exists no X better than Y (which by induction means there exists no Y, or that the halting problem prevents any form of analysis from being created in the first place)
Again, your response to someone having a different opinion is overly emotional and you end up going on the attack. You're free to behave like this if you want, but you're not free to avoid the consequences of your actions. People are judging you. Do you want to be known for this?

Combuster wrote: 6. You lied.
7. You lied. Again.
Every universe of discourse has its logical structure --- S. K. Langer.
Re: The Mill: a new low-power, high-performance CPU design
embryo wrote: Because the compiler is not yet as good as it can be. But the direction is clear: future compilers will eliminate many parts of current processors. And even if it is possible to leave some simple algorithms in the processor, the optimization is not as simple as just dumbly removing parts.

I think you really should have a read of the book Principles of Program Analysis I quoted from earlier. From the back cover, it says it is aimed at M.Sc. and Ph.D. students. I don't know what sort of CS education you've had, but it's pretty readable. The crux of the matter is the first quote from the book I gave, now with boldface to make it even easier:
One common theme behind all approaches to program analysis is that in order to remain computable one can only provide approximate answers.

You cannot ignore this!
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: The Mill: a new low-power, high-performance CPU design
The whole exercise was that I encoded your statements into a schema.

bwat wrote: You offered a schema to show your reasoning.
But apparently you're always trying to come up with new fake arguments to win whatever debate you accidentally got into.
Re: The Mill: a new low-power, high-performance CPU design
In which case you did it incorrectly. Please reread and try to understand that I have never claimed dynamic analysis is impossible. Never! However, you are stating that I did. I have only made claims about decidability (with references).

Combuster wrote: The whole exercise was that I encoded your statements into a schema.

bwat wrote: You offered a schema to show your reasoning.
Again, please reread and try to understand. If you don't understand, then you can ask questions, but don't put words into other people's mouths.
Re: The Mill: a new low-power, high-performance CPU design
I should've made that "fake or irrelevant".

Combuster wrote: But apparently you're always trying to come up with new fake arguments to win whatever debate you accidentally got into.
Re: The Mill: a new low-power, high-performance CPU design
Please ignore my words and focus on the Turing paper and the quotes from the book I gave. They give the same argument. Are you saying these are fake too?

Combuster wrote: But apparently you're always trying to come up with new fake arguments to win whatever debate you accidentally got into.
Re: The Mill: a new low-power, high-performance CPU design
This is the only forum I know where people start fights over facts rather than opinions.
Could we have a forum rule for that, or should some people just grow up?
Re: The Mill: a new low-power, high-performance CPU design
Does somebody offer an incomputable solution? We are talking about an algorithm to deal with runtime information. What is wrong with the algorithm? What does the generalization above have in common with the algorithm? And after reading the proposed book ten times, I will still ask you the same question: if 1+1=2, then why should we care about approximate answers? Can you show us a connection from 1+1=2 to an approximate answer? Should it be 1+1=3?

bwat wrote: One common theme behind all approaches to program analysis is that in order to remain computable one can only provide approximate answers.
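For what it's worth, the gap between the two positions can be sketched in C (a hedged illustration; the function names are invented): constant folding answers the 1+1 case exactly at compile time, while a value that depends on runtime input forces any terminating static analysis to settle for an approximate answer such as "x may be any int".

```c
/* Fully computable at compile time: any sane compiler folds this
 * to "return 2" with no runtime work at all. */
int folded(void) {
    return 1 + 1;
}

/* Not foldable: the value of x depends on data the compiler never
 * sees, so the branch direction is statically undecidable and a
 * terminating analysis can only approximate it. */
int not_foldable(int input) {
    int x = input * 2;
    return x % 7 == 0;
}
```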
Re: The Mill: a new low-power, high-performance CPU design
Hi,
embryo wrote: Does somebody offer an incomputable solution? We are talking about an algorithm to deal with runtime information. What is wrong with the algorithm? What does the generalization above have in common with the algorithm? And after reading the proposed book ten times, I will still ask you the same question: if 1+1=2, then why should we care about approximate answers? Can you show us a connection from 1+1=2 to an approximate answer? Should it be 1+1=3?

Your example is so trivial that it's entirely idiotic and pointless (any sane compiler has constant folding).
For a different trivial example, consider something like this:
Code: Select all
#include <stdio.h>
#include <stdlib.h>

void foo(void) {
    int c;
    int x = 0;

    while( (c = getchar()) != EOF) {
        switch(c) {
            case 'a':
                x /= 3;
                break;
            case '4':
                x += 4;
                break;
            case 'q':
                exit(0);
            default:
                x++;
                break;
        }
        if(x % 7 == 0) {
            printf("Bork!\n");
        }
    }
}
- How often will the "if(x%7 == 0)" branch be predicted correctly by a traditional "ahead of time" compiler?
- How often will the "if(x%7 == 0)" branch be predicted correctly by a modern CPU's branch predictors?
- How often will the "if(x%7 == 0)" branch be predicted correctly by a modern "just in time" compiler?
- For a "just in time" compiler, would the overhead of attempting to correctly predict the "if(x%7 == 0)" branch cost far more than a misprediction would have?
- What about the "switch/case" branches?
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: The Mill: a new low-power, high-performance CPU design
One example of when the CPU does it better is eliminating false aliasing and false sharing. Only the CPU knows exactly which addresses are being loaded at a given moment. For the compiler to know this in the general case requires solving the equivalent of the halting problem, so instead it has to be conservative.
There are two solutions: change the language, which is a partial solution for false aliasing (admittedly, Fortran and C99 are better at this than older C), or change the CPU.
The Mill changes the semantics of the load instruction to provide the value of memory as of instruction retire rather than instruction issue, allowing loads to be hoisted above even potentially aliasing stores, even on other cores. Retire stations (which data is loaded into, as there are no registers) intercept actually aliasing stores if and when that happens.
This is not to say that the CPU always wins. Indeed, this example is a CPU feature that enables the compiler to win against OoOE with static scheduling. There's absolutely no reason to make this an all-or-nothing contest: the best outcome is often when the CPU and compiler cooperate (as is done by combining compiler-generated branch prediction with runtime branch history).
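The false-aliasing point can be sketched in C99 (a hedged illustration; the function names are invented). In the first function the compiler must conservatively assume `dst` and `src` might overlap, so it cannot hoist loads above the stores; the `restrict` qualifier in the second is the programmer's promise that they never alias, which re-enables that reordering:

```c
/* The compiler cannot prove dst and src don't overlap, so every load
 * of src[i] must stay ordered after the preceding store to dst[i-1]. */
void copy_maybe_alias(int *dst, int *src, int n) {
    for (int i = 0; i < n; i++) {
        dst[i] = src[i] + 1;
    }
}

/* C99 restrict: the caller promises no aliasing, so loads can be
 * hoisted above stores and the loop vectorized freely. */
void copy_no_alias(int *restrict dst, const int *restrict src, int n) {
    for (int i = 0; i < n; i++) {
        dst[i] = src[i] + 1;
    }
}
```

The Mill's retire stations get the same reordering freedom without requiring the promise, by repairing the rare case where aliasing actually happens.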
Re: The Mill: a new low-power, high-performance CPU design
The point was that there are a number of algorithms with relatively low complexity (1+1 is an extreme case), and such complexity has no connection with the limit of computability. More complex algorithms can be closer to the limit, but the result can still be computed. There is no point in referencing a theoretical conclusion about the very general case, because the cases we discuss are not very general.

Brendan wrote: Your example is so trivial that it's entirely idiotic and pointless
Even though this case suits the processor's simple branch prediction algorithm, the compiler can cache the code of the three branches and then order the processor to execute all three branches simultaneously. This lets us forget about prediction altogether. Meanwhile, the processor will count branch usage, and after a small number of loop iterations the branches 'a', '4' and 'q' will never be predicted (assuming a flat distribution of input probabilities). Then we can see that the compiler wins again.

Brendan wrote:

Code: Select all

#include <stdio.h>
#include <stdlib.h>

void foo(void) {
    int c;
    int x = 0;

    while( (c = getchar()) != EOF) {
        switch(c) {
            case 'a':
                x /= 3;
                break;
            case '4':
                x += 4;
                break;
            case 'q':
                exit(0);
            default:
                x++;
                break;
        }
        if(x % 7 == 0) {
            printf("Bork!\n");
        }
    }
}

Questions:
- How often will the "if(x%7 == 0)" branch be predicted correctly by a traditional "ahead of time" compiler?
- How often will the "if(x%7 == 0)" branch be predicted correctly by a modern CPU's branch predictors?
- How often will the "if(x%7 == 0)" branch be predicted correctly by a modern "just in time" compiler?
- For a "just in time" compiler, would the overhead of attempting to correctly predict the "if(x%7 == 0)" branch cost far more than a misprediction would have?
- What about the "switch/case" branches?
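The "execute all branches at once" idea can be roughly sketched even in plain C (a hedged illustration; the function name is invented): all three arms of the switch are computed unconditionally and one result is selected by a data-dependent index, so there is no branch left to predict. Whether this ever beats a hardware predictor is, of course, exactly what is being debated.

```c
/* One iteration of the loop body above, restructured branchlessly:
 * every arm is evaluated, then a small index selects the result.
 * Compilers typically lower such selects to conditional moves. */
int next_x(int x, int c) {
    int arms[3] = { x / 3, x + 4, x + 1 };   /* 'a', '4', default */
    int idx = (c == 'a') ? 0 : (c == '4') ? 1 : 2;
    return arms[idx];
}
```

Note that the 'q' case (exit) still needs a real branch: side effects cannot be speculated away like pure arithmetic can.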
Re: The Mill: a new low-power, high-performance CPU design
The CPU is provided with some hardwired algorithm. Just one simple algorithm. The compiler can provide any number of algorithms, including the simplest. Which is the more flexible solution?

Rusky wrote: Only the CPU knows exactly which addresses are being loaded at a given moment.
It is simpler if we think about one managing entity (the compiler) instead of two (with the processor). Both entities can contain some algorithms, but only one can contain many algorithms, including complex ones. Why should we be bothered with another entity? What is this extra complexity for?

Rusky wrote: There's absolutely no reason to make this an all-or-nothing contest: the best outcome is often when the CPU and compiler cooperate
Re: The Mill: a new low-power, high-performance CPU design
What other algorithm are you possibly going to use for detecting aliasing stores? And for that matter, how would you express the one I described in the compiler? It's a hardware thing.
You know what? Show us the code a compiler would output if it did branch prediction itself. Give us an example that's better than the CPU doing it.