I wish to add another comment on this point:
rdos wrote: Rendering is an interesting aspect of graphics, and IMO, this is better done in main memory with one or more CPU cores. That is a scalable solution, and one that is not dependent on who supplies the video card / accelerator.
Right, because GPUs are not at all scaling in performance significantly better than CPUs... and because my five-year-old GPU doesn't still beat the pants off my brand-new CPU for graphics, whether I render on the CPU's cores or on its IGP, in spite of the fact that my CPU cost nearly twice as much and is built on a process whose features take up roughly an eighth of the area of my GPU's (32nm vs 90nm: the process node is a linear dimension, a width, so area scales with its square, and (32/90)² ≈ 0.126).
And because you can't at all scale GPUs the way you can CPUs (e.g. upgrade them, or scale out onto multiple cards, multi-chip cards, etc.), and scale them out more efficiently than CPUs at that (90% performance improvements from scaling out aren't at all uncommon, though that is, of course, workload dependent). In fact, scaling out GPUs is also more economical, because a motherboard with two PCI-E slots is far cheaper than one with two CPU sockets (plus the price premium on the CPUs themselves).
rdos wrote: Ameise wrote: All that rather apocryphal story proves is that your ASM was faster than your C. That doesn't prove anything about C. I assure you that I could make an asm program perform slower than its C analog.
To be fair, it was the C compiler that couldn't do the 4-level lookup without loading the same selector four times, which was unnecessary in the assembler version because the code knew it was the same selector. There were a couple of other issues where the C compiler produced slower code as well, but its inability to mix flat and segmented pointers was the main reason it lost big time.
Additionally, the GetStringMetrics function, which returns two values in the interface, could not even be implemented in C, as C cannot return more than one value. To avoid two segment register loads in that function, I used a trick so that C could return both values in a register pair, which the assembly stub then decodes.
So, in other words, you're using a poorly optimizing C compiler with an all-but-obsolete memory model. And the idea that you can't return more than one value from a C function is laughable: you just pack the values into a structure, and how that structure is returned then depends on the compiler's ABI (though again, this seems hobbled by an obsolete memory model).