GCC/AS bug when compiling for x86-64?

Posted: Thu Apr 17, 2008 8:29 pm
by Zenith
I'm not sure if OS Development is the right forum to put this in, but it is pretty OSdev related...

I've been porting my kernel to x86-64 from x86 (better earlier than later), and I've run into a strange problem with my scrolling text function. Once scrolling is triggered, in the x86-64 kernel only, an invalid opcode occurs. In x86, it scrolls normally.

Here's the non-working part of the scrolling code, which shifts each line to the previous one:

Code:

	if (text_ypos >= TEXT_ROWS)
	{
		// Non-working code:
		for (i = 0; i < TEXT_NUMOFCHARS - TEXT_COLUMNS; i++)
		{
			text_vidmem[i] = text_vidmem[i + TEXT_COLUMNS];
		}
		// (more working code)
Remember that this code does work in the x86 kernel, that text_vidmem is an array of uint16_t values starting at 0xB8000, and that the macros have proper values (TEXT_NUMOFCHARS = TEXT_COLUMNS*TEXT_ROWS, TEXT_COLUMNS = 80, TEXT_ROWS = 25).
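
For reference, a minimal self-contained sketch of the same scrolling step. It assumes the macro values quoted above; the function name text_scroll_up, the blank parameter, and the final loop that clears the last row are hypothetical additions, not part of the original kernel. The pointer is declared volatile, which in practice keeps GCC from merging the 16-bit accesses into wider SSE moves in this loop (it does not stop the compiler from using SSE elsewhere).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEXT_COLUMNS    80u
#define TEXT_ROWS       25u
#define TEXT_NUMOFCHARS (TEXT_COLUMNS * TEXT_ROWS)

/* Shift every row up by one and fill the last row with `blank`.
 * `vidmem` is volatile so each 16-bit access is performed as written. */
static void text_scroll_up(volatile uint16_t *vidmem, uint16_t blank)
{
    size_t i;

    assert(vidmem != NULL);

    /* Copy rows 1..24 onto rows 0..23 (forward copy, destination below source). */
    for (i = 0; i < TEXT_NUMOFCHARS - TEXT_COLUMNS; i++)
        vidmem[i] = vidmem[i + TEXT_COLUMNS];

    /* Clear the freed bottom row (an assumption about the elided code). */
    for (; i < TEXT_NUMOFCHARS; i++)
        vidmem[i] = blank;
}
```

In a kernel this might be called as text_scroll_up((volatile uint16_t *)0xB8000, 0x0720), assuming the text buffer is mapped at that address.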

Stranger still, when I finally gave up and used inline assembly instead, it actually worked (for both kernels)!

Code:

	if (text_ypos >= TEXT_ROWS)
	{
		asm (
				"mov %0, %%esi;"
				"mov %1, %%edi;"
				"mov %2, %%ecx;"
				"rep movsw;"
				:
				: "n"(TEXT_MEMADDR + (TEXT_COLUMNS * 2)), "n"(TEXT_MEMADDR), "n"(TEXT_NUMOFCHARS - TEXT_COLUMNS)
				: "esi", "edi", "ecx"
		);
		// (working fine with this)
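
A note on the snippet above: writing a 32-bit register such as %esi zero-extends into %rsi in 64-bit mode, so it only works while the addresses fit in 32 bits, and it lists no "memory" clobber, so the compiler is not told that memory changes. A hedged sketch of a variant that lets GCC pick the full-width registers through constraints follows; the function name copy_words_forward is hypothetical, and the plain C fallback is there only so the sketch builds on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 16-bit words from src to dst, lowest address first.
 * Safe for overlapping buffers only when dst is below src, as in scrolling up. */
static void copy_words_forward(uint16_t *dst, const uint16_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile (
        "cld\n\t"        /* make sure the copy runs towards higher addresses */
        "rep movsw"
        : "+D"(dst), "+S"(src), "+c"(count)  /* rdi, rsi, rcx are read and updated */
        :
        : "memory", "cc");                   /* memory is written, flags may change */
#else
    while (count--)
        *dst++ = *src++;
#endif
}
```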
I think the weird part is that the exception caused by the original code is an 'invalid opcode' at an address somewhere in my kernel's virtual address range (0xFFFFFFFFC01*****). It does occur within this for statement, but I would have expected a more common exception such as a GPF or a page fault.

Is it my code, or is something in my GNU toolchain messing up?

Posted: Fri Apr 18, 2008 1:15 am
by JamesM
objdump it and have a look?

Posted: Fri Apr 18, 2008 1:37 am
by zaleschiemilgabriel
What do you mean by "invalid opcode"? An invalid instruction opcode in the binary? If so, I think it's obvious: A compiler should NEVER generate an invalid opcode, unless you expect it to (if you use inline assembly and instruct it to generate that opcode).

Posted: Fri Apr 18, 2008 4:36 am
by Combuster
Are you actually using a 64-bit compiler?

Posted: Fri Apr 18, 2008 3:06 pm
by Zenith
Never mind, it doesn't seem like a bug: it's just GCC optimizing a little too much...

Combuster: Yes, I am using an x86_64-pc-elf toolchain (GCC 4.3.0, binutils 2.18).

zaleschiemilgabriel: Well, the opcode itself is valid, it's just that the processor generates a #UD when executing the instruction.

It's interesting though, what x86_64-pc-elf-objdump -dS shows.

This is the line that generates the #UD exception, according to my fault handler:

Code:

ffffffffc0107d00:	66 41 0f 6f 04 11    	movdqa (%r9,%rdx,1),%xmm0
Looking at the Intel Manuals, it says that a #UD is generated for MOVDQA in 64-bit mode when either CR0.EM = 1, CR4.OSFXSR = 0, CPUID.01H:EDX.SSE2 = 0, or if the LOCK prefix is used. Looking into those now...
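
The register conditions listed above can be written down as a small predicate. This is only a sketch: the function and macro names are made up for illustration, the bit positions are the architectural ones (CR0.EM is bit 2, CR4.OSFXSR is bit 9, CPUID.01H:EDX.SSE2 is bit 26), and the LOCK-prefix case is left out because it depends on the instruction encoding rather than on system state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_EM          (1ull << 2)   /* x87 emulation */
#define CR4_OSFXSR      (1ull << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CPUID1_EDX_SSE2 (1u << 26)    /* CPUID.01H:EDX.SSE2 */

/* Returns true if MOVDQA would raise #UD for the given register values. */
static bool movdqa_raises_ud(uint64_t cr0, uint64_t cr4, uint32_t cpuid1_edx)
{
    return (cr0 & CR0_EM) != 0
        || (cr4 & CR4_OSFXSR) == 0
        || (cpuid1_edx & CPUID1_EDX_SSE2) == 0;
}

/* Example: SSE2 is present and CR0.EM is clear, but CR4.OSFXSR was never set. */
static void movdqa_example(void)
{
    assert(movdqa_raises_ud(0, 0, CPUID1_EDX_SSE2));
}
```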

Well, I think GCC/AS simply assumes that the environment is already set up for SSE, so it considers the instruction safe to use as an optimization.

So who should we blame? Myself, for writing such 'horrible' code, or GCC/AS for making such assumptions?

(The question's pretty one-sided :wink:)

Posted: Fri Apr 18, 2008 3:12 pm
by bluecode
karekare0 wrote: So who should we blame? Myself, for writing such 'horrible' code, or GCC/AS for making such assumptions?
Yourself for not using the correct gcc switches: -mno-sse is what you seek. :)
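
One way to catch a forgotten switch at build time is to check the macros GCC predefines when SSE code generation is enabled (__SSE__ and __SSE2__). A small sketch follows; KERNEL_NO_SSE is a hypothetical macro the kernel's build would define itself, and compiler_may_emit_sse is a made-up helper name.

```c
#include <assert.h>

/* Hypothetical guard: refuse to build kernel sources with SSE code generation on. */
#if defined(KERNEL_NO_SSE) && (defined(__SSE__) || defined(__SSE2__))
#error "kernel sources must be compiled with -mno-sse"
#endif

/* Reports whether this translation unit was compiled with SSE enabled. */
static int compiler_may_emit_sse(void)
{
#if defined(__SSE__) || defined(__SSE2__)
    return 1;
#else
    return 0;
#endif
}
```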

Posted: Fri Apr 18, 2008 3:19 pm
by Zenith
Yeah, you're right - but you have to admit, all x86-64 processors should support at least SSE2, so I'd think GCC is safe making this assumption.

Doing CR4.OSFXSR = 1 solved the problem!
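
A sketch of what the enabling step might look like. Only CR4.OSFXSR comes from this thread; clearing CR0.EM, setting CR0.MP, and setting CR4.OSXMMEXCPT are the other bits commonly set alongside it, and they are assumptions here. All names are hypothetical. The bit arithmetic is kept separate from the privileged register access, which can only run in ring 0.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_MP         (1ull << 1)    /* monitor coprocessor */
#define CR0_EM         (1ull << 2)    /* x87 emulation, must be clear for SSE */
#define CR4_OSFXSR     (1ull << 9)    /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT (1ull << 10)   /* OS handles SIMD floating-point exceptions */

static uint64_t cr0_with_sse_enabled(uint64_t cr0)
{
    cr0 &= ~CR0_EM;
    cr0 |= CR0_MP;
    return cr0;
}

static uint64_t cr4_with_sse_enabled(uint64_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Ring 0 only: executing this in user mode faults. */
static inline void enable_sse(void)
{
    uint64_t cr0, cr4;

    __asm__ volatile ("mov %%cr0, %0" : "=r"(cr0));
    __asm__ volatile ("mov %0, %%cr0" : : "r"(cr0_with_sse_enabled(cr0)) : "memory");
    __asm__ volatile ("mov %%cr4, %0" : "=r"(cr4));
    __asm__ volatile ("mov %0, %%cr4" : : "r"(cr4_with_sse_enabled(cr4)) : "memory");
}
#endif

/* Example: starting from all-zero values, only the enabling bits end up set. */
static void enable_sse_example(void)
{
    assert(cr4_with_sse_enabled(0) == (CR4_OSFXSR | CR4_OSXMMEXCPT));
    assert(cr0_with_sse_enabled(CR0_EM) == CR0_MP);
}
```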

Posted: Sat Apr 19, 2008 1:45 am
by bluecode
karekare0 wrote: Doing CR4.OSFXSR = 1 solved the problem!
But remember to save/restore/initialise the state of the SSE registers (which might not be what you want within your kernel), or sometimes things silently go b00m.

Posted: Sat Apr 19, 2008 8:46 am
by Zenith
Already done :). It's FXSAVE and FXRSTOR and a 512-byte memory area, right? I had to add the 'q' suffix so GAS assembles it to use the promoted-operand map.

Thanks for the help!
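
For anyone finding this later, a minimal sketch of the save/restore pair discussed above. The struct and function names are hypothetical. The 512-byte area must be 16-byte aligned. The sketch uses the fxsave64/fxrstor64 spellings, assuming a reasonably recent assembler; these are the REX.W forms that the 'q' suffix mentioned in the post selects.

```c
#include <assert.h>
#include <stdint.h>

/* FXSAVE/FXRSTOR work on a 512-byte, 16-byte-aligned memory area. */
struct fxsave_area {
    uint8_t bytes[512];
} __attribute__((aligned(16)));

#if defined(__x86_64__) && defined(__GNUC__)
/* Store the current x87/MMX/SSE state into `area`. */
static inline void fpu_state_save(struct fxsave_area *area)
{
    assert(((uintptr_t)area & 15u) == 0);
    __asm__ volatile ("fxsave64 %0" : "=m"(*area));
}

/* Reload x87/MMX/SSE state previously stored by fpu_state_save(). */
static inline void fpu_state_restore(const struct fxsave_area *area)
{
    assert(((uintptr_t)area & 15u) == 0);
    __asm__ volatile ("fxrstor64 %0" : : "m"(*area));
}
#endif
```

A task switch would typically save into the outgoing task's area and restore from the incoming task's area; how those areas are allocated per task is left open here.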