SpyderTL wrote:I think it would make more sense to write the instructions in a separate location, and then "notify" the CPU that the code at that address (and size) should be run using the data at another address (and size), and let the CPU schedule the work to be done during idle time.
Then the whole process will be hidden from the developer. It will be split into distantly located parts, with no easy way to look at it as a whole. Also, the section between the long and start instructions can easily be marked with indentation.
SpyderTL wrote:But, if this approach were to give a significant performance improvement, say 10% or more, then this would have to be brought to Intel or AMD
I suppose it wouldn't give any improvement until smart compilers use such a feature extensively. But first we need those smart compilers. I don't have the big picture of the compiler field, but as far as I know it is not an easy task for contemporary compilers to identify the important parts of a program and implement them in a way that is efficient in terms of memory access or allocation of computation resources. But maybe I am wrong, and somebody can point to a compiler that is able to generate efficient code for simple matrix multiplication, comparable to the best result from here?
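To make the matrix multiplication point concrete, here is a minimal sketch in C of the kind of restructuring in question. It is only an illustration and is not taken from the results linked above: the function names and the TILE value are assumptions, and a tuned implementation would go much further (vector instructions, register blocking, prefetching). The first function is the obvious triple loop; the second computes the same product tile by tile, so that blocks of the matrices are reused while they are likely still in cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; a real value would be tuned to the cache. */
#define TILE 32

/* Naive product C = A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            /* Walks down a column of B: stride of n doubles per step. */
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Returns the end of a tile starting at `start`, clamped to n. */
static size_t tile_end(size_t start, size_t n)
{
    return (start + TILE < n) ? start + TILE : n;
}

/* Same product, accumulated tile by tile (loop tiling / blocking). */
void matmul_tiled(size_t n, const double *a, const double *b, double *c)
{
    assert(a && b && c);
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t kk = 0; kk < n; kk += TILE) {
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = tile_end(ii, n);
                size_t k_end = tile_end(kk, n);
                size_t j_end = tile_end(jj, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Innermost loop runs along rows of B and C. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions produce the same matrix; whether a given compiler performs this transformation on its own, and how much it gains, depends on the compiler, its flags, and the target machine.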
SpyderTL wrote:You may, however, want to talk to the Mill Computing guys, as they are discussing a completely new CPU architecture. Not sure how far along they are, but I think they are still in the design/discussion phase...
http://millcomputing.com/forum/the-mill/
I have read about the Mill, but it is a finished design and most likely won't be changed significantly. I also understand that my proposal is not very mature and needs more detail and discussion.