
Linear algebra using shows different results on different m

Posted: Sun Sep 18, 2022 5:20 am
by devc1
My simple cubic Bezier drawing function shows different results on different machines.
For example:
- On my SSE Core 2 Duo laptop, QEMU, and (AVX) VirtualBox, it shows the right result.
- On (AVX) VMware and my main computer (AVX), it shows a straight line at the top of the screen.

Is there some SSE or AVX initialization that I am missing, or is it something in my code?
- At startup I set MXCSR to 0x1F80 (the default value). On an AVX machine, do I need to use the AVX instruction (vldmxcsr) instead of ldmxcsr?
This (SSE) function is my own invention:

Code: Select all

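_SSE_ComputeBezier:
 ; Register use as written (consistent with the Microsoft x64 convention):
 ;   rcx = beta, rdx = NumCordinates, xmm2 = percent
 ; for(register UINT k = 1;k < NumCordinates;k++)
     mov r8, 1                ; r8 = k
     mov r9, rdx              ; r9 = inner loop bound, decremented once per outer pass
     cvtsi2ss xmm1, r8        ; xmm1 = 1.0f
     subss xmm1, xmm2         ; xmm1 = 1 - percent
     ; Broadcast (1 - percent) to all four lanes of xmm1
     movd eax, xmm1
     mov r11, rax
     shl r11, 32
     or rax, r11
     movq xmm1, rax

     movlhps xmm1, xmm1

     ; Broadcast percent to all four lanes of xmm2
     movd eax, xmm2
     mov r11, rax
     shl r11, 32
     or rax, r11
     movq xmm2, rax
     movlhps xmm2, xmm2
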
 .loop0: 
     cmp r8, rdx 
     je .Exit 
     ; for(register UINT i = 0;i<NumCordinates - k;i++) 
     inc r8 
     dec r9 
     xor r11, r11 
     mov r10, rcx 
     .loop1: 
         cmp r11, r9 
         jae .loop0 
         ; beta[i]:XMM0 = (1 - percent) * beta[i] + percent * beta[i + 1]; 
  
         ; XMM0 = (1 - percent) * beta[i] 
         ; XMM0 = XMM1 * XMM3 
         movaps xmm0, xmm1 
         movups xmm3, [r10] 
         mulps xmm0, xmm3 
         ; XMM3 = percent * beta[i + 1] 
         ; XMM3 = XMM2 * XMM4 
         movaps xmm3, xmm2 
         movups xmm4, [r10 + 4] 
         mulps xmm3, xmm4 
         ; XMM0 = XMM0 + XMM3 
         addps xmm0, xmm3 
         movups [r10], xmm0 
         add r10, 0x10 
         add r11, 4 
         jmp .loop1 
 .Exit: 
     movups xmm0, [rcx] 
     cvtss2si rax, xmm0 
     ret
Proper result in SSE QEMU, AVX VirtualBox, SSE Laptop :
Image

Wrong result in AVX VMWARE, AVX Host Computer :
Image
Devc1, got banned for a few weeks :(

Re: Linear algebra using shows different results on differe

Posted: Sun Sep 18, 2022 1:07 pm
by nullplan
What steps have you taken to debug the issue? This is calling out for some exploratory printing at different steps. That would also immediately tip you off if your SSE initialization is inadequate. However, I do think you only need to initialize the AVX unit if you enable it.
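
One way to do that exploratory printing is to compare the SSE routine against a plain scalar version of the recurrence quoted in the assembly comments (beta[i] = (1 - percent) * beta[i] + percent * beta[i + 1]). The following is only a sketch: RefComputeBezier is a hypothetical name that does not appear in the posted code, and printf stands in for whatever debug output the kernel has.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical scalar reference for the recurrence in the assembly comments.
   beta is overwritten in place; the final point ends up in beta[0]. */
float RefComputeBezier(float *beta, unsigned NumCordinates, float percent)
{
    assert(NumCordinates >= 1);
    for (unsigned k = 1; k < NumCordinates; k++) {
        for (unsigned i = 0; i < NumCordinates - k; i++) {
            beta[i] = (1.0f - percent) * beta[i] + percent * beta[i + 1];
        }
        /* Exploratory printing: dump the intermediate row after each pass
           so it can be compared with what the SSE loop leaves in memory. */
        printf("k=%u:", k);
        for (unsigned i = 0; i < NumCordinates - k; i++) {
            printf(" %f", beta[i]);
        }
        printf("\n");
    }
    return beta[0];
}
```

Calling it on a copy of XCords and YCords with the same percent values as the drawing loop, and printing beta after _SSE_ComputeBezier returns, would show at which pass (if any) the two diverge.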

Re: Linear algebra using shows different results on differe

Posted: Sun Sep 18, 2022 2:23 pm
by devc1
I discovered that running QEMU with UEFI at a 1920p resolution shows the same problem, and QEMU has no AVX. How do you initialize SSE?
Debugging: I have not taken any steps yet; I just tried the OS on different machines.
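
On the question of how SSE is initialized: as far as the CPU manuals go, it comes down to a few control-register bits, plus XCR0 if AVX is enabled as well. The sketch below only computes the new register values, so it can be checked outside a kernel. The helper names (SseCr0, SseCr4, AvxCr4, AvxXcr0) are made up for illustration, and actually reading and writing CR0, CR4, and XCR0 (mov to/from the control registers, xsetbv) is left to the kernel's own code.

```c
#include <stdint.h>

/* Control-register bits as documented in the Intel and AMD manuals. */
#define CR0_MP         (1ull << 1)   /* monitor coprocessor */
#define CR0_EM         (1ull << 2)   /* x87 emulation; must be clear for SSE */
#define CR0_TS         (1ull << 3)   /* task switched; SSE raises #NM while set */
#define CR4_OSFXSR     (1ull << 9)   /* OS supports FXSAVE/FXRSTOR; enables SSE */
#define CR4_OSXMMEXCPT (1ull << 10)  /* OS handles unmasked SIMD FP exceptions */
#define CR4_OSXSAVE    (1ull << 18)  /* enables XSAVE and XGETBV/XSETBV */
#define XCR0_X87       (1ull << 0)
#define XCR0_SSE       (1ull << 1)
#define XCR0_AVX       (1ull << 2)

/* Value to write back to CR0 so that SSE instructions can execute. */
uint64_t SseCr0(uint64_t cr0)
{
    return (cr0 & ~(CR0_EM | CR0_TS)) | CR0_MP;
}

/* Value to write back to CR4 to enable SSE. */
uint64_t SseCr4(uint64_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}

/* Value to write back to CR4 when AVX is wanted too (needed before xsetbv). */
uint64_t AvxCr4(uint64_t cr4)
{
    return SseCr4(cr4) | CR4_OSXSAVE;
}

/* Value to load into XCR0 with xsetbv to enable x87, SSE, and AVX state. */
uint64_t AvxXcr0(uint64_t xcr0)
{
    return xcr0 | XCR0_X87 | XCR0_SSE | XCR0_AVX;
}
```

If SSE instructions already run without faulting, as they evidently do on the machines that draw the curve correctly, these bits are probably set already, so this is a checklist to rule things out rather than a likely fix.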

Re: Linear algebra using shows different results on differe

Posted: Sun Sep 18, 2022 3:14 pm
by Octocontrabass
You should do some more debugging. What values are you passing to this function?

Re: Linear algebra using shows different results on differe

Posted: Sun Sep 18, 2022 3:32 pm
by devc1
This is the function declaration:

Code: Select all

extern UINT64 __fastcall _SSE_ComputeBezier(float* beta, UINT NumCordinates, float percent);
This is how it is currently called:

Code: Select all

UINT XOff = 200;
UINT YOff = 300;
float XCords[] = {0, 50, 100, 150};
float YCords[] = {0, 50, -50, 0};
float betabuffer[0x10] = {0};
float IncValue = 0.1;
float X0 = GetBezierPoint(XCords, betabuffer, 4, 0.1),
      X1 = GetBezierPoint(XCords, betabuffer, 4, 0.2),
      Y0 = GetBezierPoint(YCords, betabuffer, 4, 0.1),
      Y1 = GetBezierPoint(YCords, betabuffer, 4, 0.2);
double Distance = __sqrt(pow(X1 - X0, 2) + pow(Y1 - Y0, 2));
if(Distance > 2) {
	IncValue /= (Distance - 1);
}

// IncValue /= 10;

SystemDebugPrint(L"Starting...");
for(UINT c = 0;c<0x10000;c++) {
	UINT64 LastX = XOff;
	UINT64 LastY = YOff;

	UINT64 X = 0;
	UINT64 Y = 0;

	for(float t = 0;t<=1;t+=IncValue) {
		X = XOff + GetBezierPoint(XCords, betabuffer, 4, t);
		Y = YOff + GetBezierPoint(YCords, betabuffer, 4, t);
		// if(X > LastX + 1 || X < LastX - 1 || Y > LastY + 1 || Y < LastY - 1) {
		// 	LineTo(LastX, LastY, X, Y, 0xFF0000);
		// } else {
			*(UINT32*)(InitData.fb->FrameBufferBase + ((UINT64)X << 2) + ((UINT64)Y * InitData.fb->Pitch)) = 0xFF0000;
		// }
		LastX = X;
		LastY = Y;
	}
}
SystemDebugPrint(L"All done");
GetBezierPoint():

Code: Select all

UINT64 __fastcall GetBezierPoint(float* cordinates, float* beta, UINT8 NumCordinates, float percent){
	memcpy(beta, cordinates, NumCordinates << 2);
	if(ExtensionLevel == EXTENSION_LEVEL_SSE) {
		return _SSE_ComputeBezier(beta, NumCordinates, percent);
	} else if(ExtensionLevel == EXTENSION_LEVEL_AVX) {
		return _AVX_ComputeBezier(beta, NumCordinates, percent);
	}
	return 0;
	// _SSE_BezierCopyCords(beta, cordinates, NumCordinates);
	// return _SSE_ComputeBezier(beta, NumCordinates, percent);
}
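
One thing worth checking while printing values: the assembly ends with cvtss2si rax, xmm0, which produces a signed 64-bit integer, while both prototypes declare the return type as UINT64, and YCords contains a negative control point. The sketch below models what the C caller would then see. AsDeclaredReturn is a hypothetical helper, and the C cast truncates where cvtss2si rounds according to MXCSR, so it only matches exactly for whole-number inputs. This may well be unrelated to the straight line.

```c
#include <stdint.h>

/* Hypothetical model of the return path: a signed 64-bit result in RAX
   that the caller reads as an unsigned 64-bit integer.
   Note: the cast truncates toward zero, whereas cvtss2si rounds per MXCSR. */
uint64_t AsDeclaredReturn(float value)
{
    return (uint64_t)(int64_t)value;
}

/* What "float X0 = GetBezierPoint(...)" would store for a given result. */
float AsFloatAssignment(float value)
{
    return (float)AsDeclaredReturn(value);
}

/* What "Y = YOff + GetBezierPoint(...)" would store for a given result. */
uint64_t AsOffsetSum(uint64_t offset, float value)
{
    return offset + AsDeclaredReturn(value);
}
```

With these, AsOffsetSum(300, -50.0f) wraps around to 250, so the integer additions in the drawing loop still land where intended, but AsFloatAssignment(-50.0f) is about 1.8e19 rather than -50, which would matter wherever a negative result is assigned to a float.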

I do not initialize MXCSR; I just reset it to 0x1F80. I don't know anything more about it.
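
For reference, 0x1F80 can be taken apart field by field using the MXCSR layout from the CPU manuals. The sketch below is illustrative only: MXCSR_FIELDS and DecodeMxcsr are invented names. It also bears on the earlier vldmxcsr question: as far as I know, vldmxcsr is just the VEX-encoded form of ldmxcsr and loads the same register, so there is no separate AVX copy of MXCSR to set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical structure holding the documented MXCSR fields. */
typedef struct {
    unsigned StickyFlags;      /* bits 0..5: exception flags (IE, DE, ZE, OE, UE, PE) */
    bool     DenormalsAreZero; /* bit 6 */
    unsigned ExceptionMasks;   /* bits 7..12: IM, DM, ZM, OM, UM, PM; 0x3F = all masked */
    unsigned RoundingControl;  /* bits 13..14: 0 = round to nearest */
    bool     FlushToZero;      /* bit 15 */
} MXCSR_FIELDS;

MXCSR_FIELDS DecodeMxcsr(uint32_t mxcsr)
{
    MXCSR_FIELDS f;

    /* Bits 16..31 are reserved; loading a value with any of them set faults. */
    assert((mxcsr & 0xFFFF0000u) == 0);

    f.StickyFlags      = mxcsr & 0x3F;
    f.DenormalsAreZero = ((mxcsr >> 6) & 1) != 0;
    f.ExceptionMasks   = (mxcsr >> 7) & 0x3F;
    f.RoundingControl  = (mxcsr >> 13) & 3;
    f.FlushToZero      = ((mxcsr >> 15) & 1) != 0;
    return f;
}
```

Decoding 0x1F80 this way gives all six exceptions masked, round to nearest, and no flush-to-zero or denormals-are-zero, which is the documented power-on default, so resetting MXCSR to that value should not by itself change the results between machines.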