You couldn't build a 1024-bit adder with appreciably better performance. Never mind multipliers or dividers - heck, division is already slow. At the scaling rate for division that current x86s manage (and most RISC machines are worse!), that division is going to take you around 1032 cycles. Multiplication will probably be around 100.
I guess you never computed !123456 or other similar things (or math distributed-computing projects). You don't have to use such big registers - using 32- or 64-bit registers is normally sufficient.
You cannot build arithmetic structures that big. You're talking about something where the sheer size is such that a signal would take a quarter of a clock cycle to traverse it - assuming it was travelling in the metal interconnect, not through independent transistors.
So? Even if it takes 2 clock cycles, is it really that big a deal? Try doing this kind of calculation with 64-bit or even 256-bit registers - I bet it will take far more cycles than a simple mov and add (not even speaking of mul).
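For the record, here's a minimal C sketch of what the 64-bit-register approach looks like: a 1024-bit addition done as sixteen add-with-carry steps over 64-bit limbs (the limb count, names, and test values are just for illustration). Each limb costs only a couple of instructions, and a decent compiler can often map the carry handling onto the hardware's add-with-carry instruction.

    #include <stdint.h>
    #include <stdio.h>

    #define LIMBS 16  /* 16 x 64-bit limbs = 1024 bits */

    /* Add two 1024-bit numbers stored as little-endian arrays of 64-bit limbs;
       returns the carry out of the top limb. */
    static uint64_t add_1024(uint64_t dst[LIMBS], const uint64_t a[LIMBS],
                             const uint64_t b[LIMBS])
    {
        uint64_t carry = 0;
        for (int i = 0; i < LIMBS; i++) {
            uint64_t sum = a[i] + carry;
            carry = (sum < carry);        /* overflow from adding the old carry */
            dst[i] = sum + b[i];
            carry += (dst[i] < b[i]);     /* overflow from adding b[i] */
        }
        return carry;
    }

    int main(void)
    {
        uint64_t a[LIMBS] = { UINT64_MAX, UINT64_MAX }; /* low 128 bits set */
        uint64_t b[LIMBS] = { 1 };
        uint64_t r[LIMBS];
        uint64_t c = add_1024(r, a, b);   /* (2^128 - 1) + 1 = 2^128 */
        printf("r[2]=%llu r[1]=%llu r[0]=%llu carry=%llu\n",
               (unsigned long long)r[2], (unsigned long long)r[1],
               (unsigned long long)r[0], (unsigned long long)c);
        return 0;
    }

Multiplication is the same idea with 64x64->128 partial products, which is why 64-bit registers are "sufficient" for bignum work even if dedicated wide hardware would be faster per operation.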
But your complaint has nothing to do with x86, and everything to do with money. Gee whiz, I wonder why these processors are so relatively slow? Is it perhaps because I'm comparing them with chips 10x the price?
The fastest Tilera chips you can buy go for ~$400 (without a motherboard); the fastest Xeon is ~$10k. Which one is cheaper? As for speed - for single-threaded apps there's no doubt the Xeon will be faster, but Tilera is mainly targeting web servers, where the workload is relatively easy to parallelize. I don't know the price of Niagara 3.
As for GPUs: their performance is highly code-dependent, like Cell's. Processors can be placed on an axis of generality vs. efficiency, with the normal CPU at one end, Cell SPEs in the middle, and GPUs occupying the other end. GPUs are moving left, sure, but there are some things they're still really poor at - a great example is anything that involves heavy branching.
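To make the branching point concrete, here's a rough C sketch (the function names and toy workload are mine) of what lock-step SIMD/GPU lanes effectively do with a data-dependent branch: when lanes in a group diverge, the hardware executes both sides under a mask, roughly like the second function, so you pay for both paths. With two cheap paths that's harmless; with long, nested branches the cost multiplies.

    /* Branchy version: a scalar CPU with a branch predictor pays only for the
       path actually taken on each element. */
    void scale_branchy(float *x, const float *y, int n, float lo, float hi)
    {
        for (int i = 0; i < n; i++) {
            if (y[i] > 0.0f)
                x[i] = y[i] * hi;
            else
                x[i] = y[i] * lo;
        }
    }

    /* What divergent lanes effectively cost on lock-step hardware: both sides
       are computed for every element and the result is selected by a mask. */
    void scale_predicated(float *x, const float *y, int n, float lo, float hi)
    {
        for (int i = 0; i < n; i++) {
            float if_pos = y[i] * hi;
            float if_neg = y[i] * lo;
            x[i] = (y[i] > 0.0f) ? if_pos : if_neg;
        }
    }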
If Intel doesn't speed up its CPUs by a factor of 4, it will lose market share (actually not exactly - there are still no GPL'd, fully workable CUDA drivers, and proprietary software will take decades to start targeting really parallel environments (multicore CPUs, GPUs)). GPUs lately are more and more like CPUs - I bet that in one or two decades their branching instructions will be good enough to ditch CPUs altogether (if current trends hold).
As for performance... nVIDIA is currently out in the lead if you compare like for like (single-GPU device, single precision). In fact, nVIDIA's best device holds the same performance as ATI's best, despite ATI's being a dual-GPU board.
What is single precision? It is utterly worthless outside graphics. In HPC it is desirable most of the time to use at least double precision (weather prediction, anyone?), and in that nVIDIA is almost worthless. ATI also has much better string operations (cryptography...). But again, writing for ATI is almost hell, while nVIDIA has a nice CUDA compiler with support for C and Fortran. And it is free (as in beer), unlike ATI's compilers. Because of that nVIDIA will have a much bigger market share, and it will continue to grow.
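Just to illustrate the precision point (a toy example, not an HPC kernel): a long running sum in single precision drifts badly once the accumulator dwarfs the increments, while double precision stays essentially exact at this scale. That's the kind of error that makes single precision a non-starter for things like weather models.

    #include <stdio.h>

    int main(void)
    {
        /* Add 0.1 ten million times; the exact answer is 1,000,000. */
        float  fsum = 0.0f;
        double dsum = 0.0;
        for (int i = 0; i < 10000000; i++) {
            fsum += 0.1f;
            dsum += 0.1;
        }
        printf("float : %f\n", fsum);  /* noticeably off from 1000000 */
        printf("double: %f\n", dsum);  /* very close to 1000000 */
        return 0;
    }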