Linear algebra using shows different results on different m

devc1 · Post by **devc1** » Sun Sep 18, 2022 5:20 am

My simple cubic Bézier drawing function shows different results on different machines.
For example:
- On my SSE Core 2 Duo laptop, QEMU, and (AVX) VirtualBox it shows the right result.
- On (AVX) VMware and my main computer (AVX) it draws a straight line at the top of the screen.

Is there some SSE/AVX initialization that I am missing, or is it something in my code?
- At startup I set MXCSR to 0x1F80 (the default value). Do I need to use the AVX form (vldmxcsr) instead of ldmxcsr on AVX machines?
This (SSE) function is my own invention:

Code:

_SSE_ComputeBezier: 
 ; for(register UINT k = 1;k < NumCordinates;k++) 
     mov r8, 1 
     mov r9, rdx 
     cvtsi2ss xmm1, r8 
     subss xmm1, xmm2 
     movd eax, xmm1
     mov r11, rax 
     shl r11, 32 
     or rax, r11 
     movq xmm1, rax 
  
     movlhps xmm1, xmm1 
  
     movd eax, xmm2 
     mov r11, rax 
     shl r11, 32 
     or rax, r11 
     movq xmm2, rax 
     movlhps xmm2, xmm2 
  
 .loop0: 
     cmp r8, rdx 
     je .Exit 
     ; for(register UINT i = 0;i<NumCordinates - k;i++) 
     inc r8 
     dec r9 
     xor r11, r11 
     mov r10, rcx 
     .loop1: 
         cmp r11, r9 
         jae .loop0 
         ; beta[i]:XMM0 = (1 - percent) * beta[i] + percent * beta[i + 1]; 
  
         ; XMM0 = (1 - percent) * beta[i] 
         ; XMM0 = XMM1 * XMM3 
         movaps xmm0, xmm1 
         movups xmm3, [r10] 
         mulps xmm0, xmm3 
         ; XMM3 = percent * beta[i + 1] 
         ; XMM3 = XMM2 * XMM4 
         movaps xmm3, xmm2 
         movups xmm4, [r10 + 4] 
         mulps xmm3, xmm4 
         ; XMM0 = XMM0 + XMM3 
         addps xmm0, xmm3 
         movups [r10], xmm0 
         add r10, 0x10 
         add r11, 4 
         jmp .loop1 
 .Exit: 
     movups xmm0, [rcx] 
     cvtss2si rax, xmm0 
     ret

Proper result in SSE QEMU, AVX VirtualBox, SSE Laptop :

Wrong result in AVX VMWARE, AVX Host Computer :

Devc1, got banned for a few weeks

nullplan · Post by **nullplan** » Sun Sep 18, 2022 1:07 pm

What steps have you taken to debug the issue? This is calling out for some exploratory printing at different steps. That would also immediately tip you off if your SSE initialization is inadequate. However, I do think you only need to initialize the AVX unit if you enable it.
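As one way to do the exploratory printing suggested here, a scalar C version of the same recurrence can serve as a reference to compare against the assembly output. This is only a sketch: `ref_compute_bezier` and its `trace` flag are hypothetical names that do not appear in the original code, and it returns the unrounded float rather than the integer that the assembly produces with `cvtss2si`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reference: scalar De Casteljau reduction, following the
 * formula in the assembly comments:
 *   beta[i] = (1 - percent) * beta[i] + percent * beta[i + 1]
 * Works in place on beta[0..n-1]; the result ends up in beta[0]. */
static float ref_compute_bezier(float *beta, size_t n, float percent, int trace)
{
    assert(beta != NULL && n >= 1);
    for (size_t k = 1; k < n; k++) {
        for (size_t i = 0; i < n - k; i++) {
            beta[i] = (1.0f - percent) * beta[i] + percent * beta[i + 1];
            if (trace) /* print every intermediate value for comparison */
                printf("k=%zu i=%zu beta=%f\n", k, i, (double)beta[i]);
        }
    }
    return beta[0];
}
```

Calling it with a copy of the same four coordinates and the same `percent`, with `trace` set to 1, gives a list of intermediate values that can be checked against whatever the SSE routine leaves in the buffer.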

devc1 · Post by **devc1** » Sun Sep 18, 2022 2:23 pm

I discovered that running QEMU with UEFI at a 1920p resolution shows the same problem, and QEMU has no AVX. How do you initialize SSE?
Debugging: I have not taken any steps, I just tried the OS on different machines.
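On the question of initializing SSE: per the architecture manuals, the usual sequence is to clear CR0.EM, set CR0.MP, and set CR4.OSFXSR and CR4.OSXMMEXCPT. AVX additionally needs CR4.OSXSAVE and then the x87, SSE and AVX bits in XCR0 (written with `xsetbv`). The sketch below only computes the new register values; the privileged reads and writes of CR0, CR4 and XCR0 are left out, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions in CR0, CR4 and XCR0. */
#define CR0_MP         (1ull << 1)   /* monitor coprocessor */
#define CR0_EM         (1ull << 2)   /* x87 emulation, must be 0 for SSE */
#define CR4_OSFXSR     (1ull << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT (1ull << 10)  /* OS handles SIMD FP exceptions */
#define CR4_OSXSAVE    (1ull << 18)  /* enables XSAVE and XSETBV */
#define XCR0_X87       (1ull << 0)
#define XCR0_SSE       (1ull << 1)
#define XCR0_AVX       (1ull << 2)

/* Value to write back to CR0 so that SSE instructions do not fault. */
static uint64_t sse_cr0(uint64_t cr0)
{
    uint64_t v = (cr0 & ~CR0_EM) | CR0_MP;
    assert((v & CR0_EM) == 0);
    return v;
}

/* Value to write back to CR4 for SSE only. */
static uint64_t sse_cr4(uint64_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}

/* Value to write back to CR4 when AVX is going to be enabled as well. */
static uint64_t avx_cr4(uint64_t cr4)
{
    return sse_cr4(cr4) | CR4_OSXSAVE;
}

/* Value to load into XCR0 once CR4.OSXSAVE is set. */
static uint64_t avx_xcr0(uint64_t xcr0)
{
    return xcr0 | XCR0_X87 | XCR0_SSE | XCR0_AVX;
}
```

In a real kernel each helper would sit between a read and a write of the corresponding register, and the AVX part would only run after CPUID reports XSAVE and AVX support.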

Octocontrabass · Post by **Octocontrabass** » Sun Sep 18, 2022 3:14 pm

You should do some more debugging. What values are you passing to this function?

devc1 · Post by **devc1** » Sun Sep 18, 2022 3:32 pm

This is the function declaration:

Code:

extern UINT64 __fastcall _SSE_ComputeBezier(float* beta, UINT NumCordinates, float percent);

This is how it is currently called:

Code:

UINT XOff = 200;
UINT YOff = 300;
float XCords[] = {0, 50, 100, 150};
float YCords[] = {0, 50, -50, 0};
float betabuffer[0x10] = {0};
float IncValue = 0.1;
float X0 = GetBezierPoint(XCords, betabuffer, 4, 0.1),
      X1 = GetBezierPoint(XCords, betabuffer, 4, 0.2),
      Y0 = GetBezierPoint(YCords, betabuffer, 4, 0.1),
      Y1 = GetBezierPoint(YCords, betabuffer, 4, 0.2);
double Distance = __sqrt(pow(X1 - X0, 2) + pow(Y1 - Y0, 2));
if(Distance > 2) {
	IncValue /= (Distance - 1);
}

// IncValue /= 10;

SystemDebugPrint(L"Starting...");
for(UINT c = 0; c < 0x10000; c++) {
	UINT64 LastX = XOff;
	UINT64 LastY = YOff;

	UINT64 X = 0;
	UINT64 Y = 0;

	for(float t = 0; t <= 1; t += IncValue) {
		X = XOff + GetBezierPoint(XCords, betabuffer, 4, t);
		Y = YOff + GetBezierPoint(YCords, betabuffer, 4, t);
		// if(X > LastX + 1 || X < LastX - 1 || Y > LastY + 1 || Y < LastY - 1) {
		// 	LineTo(LastX, LastY, X, Y, 0xFF0000);
		// } else {
			*(UINT32*)(InitData.fb->FrameBufferBase + ((UINT64)X << 2) + ((UINT64)Y * InitData.fb->Pitch)) = 0xFF0000;
		// }
		LastX = X;
		LastY = Y;
	}
}
SystemDebugPrint(L"All done");
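Since the pixel write above trusts X and Y completely, a bounds-checked version of the same offset calculation may help narrow down whether bad coordinates or a bad pitch are involved. This is a sketch under assumptions: `struct fb_info`, its `width` and `height` fields, and `pixel_offset` are hypothetical; the original code only shows `FrameBufferBase` and `Pitch`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framebuffer description; field names are assumptions. */
struct fb_info {
    uint64_t width;   /* visible pixels per row */
    uint64_t height;  /* visible rows */
    uint64_t pitch;   /* bytes per row, may exceed width * 4 */
};

/* Returns 1 and stores the byte offset of pixel (x, y) for a 32-bit
 * framebuffer, or returns 0 if the pixel lies outside the visible area.
 * The offset uses the same formula as the loop: (x << 2) + y * pitch. */
static int pixel_offset(const struct fb_info *fb, uint64_t x, uint64_t y,
                        uint64_t *out)
{
    assert(fb != NULL && out != NULL);
    if (x >= fb->width || y >= fb->height)
        return 0;
    *out = (x << 2) + y * fb->pitch;
    return 1;
}
```

A coordinate that wrapped around (for example a negative result stored in a UINT64) fails the range check instead of writing far outside the framebuffer, so logging the rejected values would show where the curve goes wrong.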

GetBezierPoint() :

Code:

UINT64 __fastcall GetBezierPoint(float* cordinates, float* beta, UINT8 NumCordinates, float percent){
	memcpy(beta, cordinates, NumCordinates << 2);
	if(ExtensionLevel == EXTENSION_LEVEL_SSE) {
		return _SSE_ComputeBezier(beta, NumCordinates, percent);
	} else if(ExtensionLevel == EXTENSION_LEVEL_AVX) {
		return _AVX_ComputeBezier(beta, NumCordinates, percent);
	}
	return 0;
	// _SSE_BezierCopyCords(beta, cordinates, NumCordinates);
	// return _SSE_ComputeBezier(beta, NumCordinates, percent);
}

I do not initialize MXCSR, I just reset it to 0x1F80. I don't know anything about that.
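For reference, 0x1F80 can be decoded by hand: bits 7 to 12 are the six exception mask bits, bits 13 and 14 select the rounding mode, bit 15 is flush-to-zero and bit 6 is denormals-are-zero. The sketch below does that decoding in C; the macro and function names are hypothetical. Note that `vldmxcsr` is the VEX-encoded form of `ldmxcsr` and loads the same register, and that `cvtss2si` rounds according to the rounding-control field.

```c
#include <assert.h>
#include <stdint.h>

#define MXCSR_DAZ       (1u << 6)     /* denormals are zero */
#define MXCSR_MASK_ALL  (0x3Fu << 7)  /* IM, DM, ZM, OM, UM, PM mask bits */
#define MXCSR_RC_SHIFT  13            /* rounding control, 2 bits */
#define MXCSR_FTZ       (1u << 15)    /* flush to zero */

/* Rounding control field: 0 = nearest, 1 = down, 2 = up, 3 = toward zero. */
static unsigned mxcsr_rounding(uint32_t mxcsr)
{
    unsigned rc = (mxcsr >> MXCSR_RC_SHIFT) & 3u;
    assert(rc <= 3u);
    return rc;
}

/* 1 if all six SIMD floating-point exceptions are masked. */
static int mxcsr_all_masked(uint32_t mxcsr)
{
    return (mxcsr & MXCSR_MASK_ALL) == MXCSR_MASK_ALL;
}

/* 1 if either flush-to-zero or denormals-are-zero is enabled. */
static int mxcsr_flushes_denormals(uint32_t mxcsr)
{
    return (mxcsr & (MXCSR_FTZ | MXCSR_DAZ)) != 0;
}
```

For 0x1F80 this reports all exceptions masked, round-to-nearest, and no denormal flushing, which matches the documented reset value.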

OSDev.org
