Re: My BoxOn PC

Posted: Sat Sep 13, 2014 2:04 am
by Octocontrabass
embryo wrote:You can read about PADDQ.
CPUID Feature Flag: SSE2

:|

Re: My BoxOn PC

Posted: Mon Sep 15, 2014 1:00 am
by embryo
Octocontrabass wrote:
embryo wrote:You can read about PADDQ.
CPUID Feature Flag: SSE2
Some context from earlier messages:
embryo wrote:Your message was about needless MMX registers ...
embryo wrote:I suppose that many OSs use MMX or even more of the late x86 functionality
And one more thing: using the MMX registers means I don't need to save and restore the 128-bit and/or 256-bit registers on a task switch, yet I can still work with 64-bit integers.

But if we are talking about MMX-only solutions, then yes, PADDQ requires more than MMX.
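
To make the MMX vs. SSE2 distinction concrete, here is a rough sketch of the check involved. It assumes you already have the EDX value returned by CPUID leaf 1 (bit 23 is MMX, bit 26 is SSE2); the macro and function names are made up for illustration and do not come from any particular codebase.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in EDX after executing CPUID with EAX = 1. */
#define CPUID1_EDX_MMX  (1u << 23)
#define CPUID1_EDX_SSE2 (1u << 26)

/* Hypothetical helpers: 'edx' is the raw EDX value from CPUID leaf 1. */
static bool has_mmx(uint32_t edx)
{
    return (edx & CPUID1_EDX_MMX) != 0;
}

static bool has_sse2(uint32_t edx)
{
    return (edx & CPUID1_EDX_SSE2) != 0;
}

/* PADDQ is gated on the SSE2 flag even when it operates on MMX
   registers, so the MMX flag alone is not enough. */
static bool can_use_paddq(uint32_t edx)
{
    return has_sse2(edx);
}
```

A CPU that reports MMX but not SSE2 (anything before the Pentium 4 on the Intel side) would fail `can_use_paddq` even though `has_mmx` is true.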

Re: My BoxOn PC

Posted: Mon Sep 15, 2014 2:03 am
by Combuster
You complained about MMX support. MMX support is not going to fix your code, because the instruction it depends on is actually SSE2.

You never proved there was a compiler that emitted MMX with no option to disable it. Based on the rest of the arguments, that compiler has been created by your hand, possibly for the sole purpose of this argument, and possibly not even implemented. It certainly has not gained any acceptance beyond you, and it certainly is specific to one processor by its design: the Pentium 4.

The first AMD with SSE2 was the Athlon 64, and Intel had to follow soon after, leaving the Pentium 4 in the middle. On those later CPUs, 64-bit mode gives you all 64-bit operations on all general-purpose registers. The Pentium 4 doesn't support quadword multiplies and divides, so the compiler has to emit a series of MACs for the multiply and a full implementation of long division for the divide, and this wouldn't be any different from their 386 implementations. Before SSE2, adds and subs could trivially be generated as add-adc and sub-sbb pairs instead, and before MMX similar pairs did the same for the bitwise operators. shrd and shld provide 64-bit shifts without either MMX or SSE, and I haven't seen any vector replacements. At any rate, full 64-bit math has never been possible on an MMX basis.
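
As a rough illustration of the "386 style" multiply and shift, here is a sketch in C that keeps a 64-bit value as two 32-bit halves. The `u64_pair` type and the helper names are hypothetical, and the `uint64_t` intermediate only stands in for the EDX:EAX result of a 32x32 MUL; it is not meant as what any particular compiler emits.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as on a 32-bit x86. */
typedef struct { uint32_t lo, hi; } u64_pair;

static u64_pair split(uint64_t v)
{
    u64_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t join(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Low 64 bits of a 64x64 multiply from 32-bit partial products. */
static u64_pair mul64(u64_pair a, u64_pair b)
{
    uint64_t low = (uint64_t)a.lo * b.lo;        /* MUL: 32x32 -> 64 */
    uint32_t cross = a.lo * b.hi + a.hi * b.lo;  /* only the low 32 bits survive */
    u64_pair r;
    r.lo = (uint32_t)low;
    r.hi = (uint32_t)(low >> 32) + cross;
    return r;
}

/* Left shift by n (taken modulo 64), in the spirit of SHLD + SHL. */
static u64_pair shl64(u64_pair a, unsigned n)
{
    u64_pair r;
    n &= 63;
    if (n == 0)
        return a;
    if (n >= 32) {
        r.hi = a.lo << (n - 32);
        r.lo = 0;
    } else {
        r.hi = (a.hi << n) | (a.lo >> (32 - n)); /* SHLD hi, lo, n */
        r.lo = a.lo << n;                        /* SHL lo, n */
    }
    return r;
}
```

For example, `join(mul64(split(x), split(y)))` should agree with `x * y` on a compiler that has a native `uint64_t`.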

Summarized:
multiply, divide, shift: have to be implemented 386 style regardless.
add, subtract: can be implemented with SSE2, but might not be optimal when done as MOVQ, PADDQ, MOVQ (3 instructions) rather than ADD, ADC (386 style, 2 instructions)
and, or, xor: can be implemented with MMX, but also have a 1:1 mapping to 386 versions.
All of the above have native x86_64 equivalents, where deferring to the MMX registers only reduces your instruction choices and complicates matters.
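
To illustrate the add/subtract and bitwise rows of that summary, here is a small C sketch of the two-instruction pattern on 32-bit halves. The `u64_pair` type and the function names are hypothetical, and the carry is computed by hand because portable C has no direct access to the carry flag.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as on a 32-bit x86. */
typedef struct { uint32_t lo, hi; } u64_pair;

static u64_pair split(uint64_t v)
{
    u64_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t join(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add as an ADD / ADC pair. */
static u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;               /* ADD: low halves */
    uint32_t carry = r.lo < a.lo;     /* carry out of the low half */
    r.hi = a.hi + b.hi + carry;       /* ADC: high halves plus carry */
    return r;
}

/* 64-bit subtract as a SUB / SBB pair. */
static u64_pair sub64(u64_pair a, u64_pair b)
{
    u64_pair r;
    uint32_t borrow = a.lo < b.lo;    /* borrow out of the low half */
    r.lo = a.lo - b.lo;               /* SUB: low halves */
    r.hi = a.hi - b.hi - borrow;      /* SBB: high halves minus borrow */
    return r;
}

/* Bitwise operators map 1:1: one XOR per half, no carry involved. */
static u64_pair xor64(u64_pair a, u64_pair b)
{
    u64_pair r = { a.lo ^ b.lo, a.hi ^ b.hi };
    return r;
}
```

The same halves-plus-carry shape is what the MOVQ, PADDQ, MOVQ sequence would be competing against.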

Basically, your hypothetical compiler builds code that makes sense exclusively on a Pentium 4, which means you were blaming the OP, and indirectly the codebase it derived from, for doing a bad job of emulating one specific CPU it never intended to emulate. Or more specifically, you blamed the OP for your own "mistakes" in your design choices, pretending them to be valid for everyone. You have obviously pushed yourself into a sufficiently narrow edge case that this is not relevant.
You also gave the OP a sufficiently uninformed request that, even when fulfilled, would not fix your problem.

I consider that bad manners.