Link time optimization with cross compiler?

yr · Post by yr » Sun Mar 20, 2016 9:20 am

Is it possible to use link time optimization when building a kernel with a cross compiler?

I'm using gcc 5.3.0, with target i686-elf. Just adding the -flto option (to both the compile and link flags) results in the following error at link time:

Code: Select all

i686-elf-g++ -flto -ffreestanding -fno-rtti -fno-exceptions -O2 -Wall -std=c++14 -pedantic -nostdlib -T bootstrap/link_bootstrap.ld -o bootstrap/bootstrap.elf bootstrap/kernel_loader.o [...]
/var/folders/qn/fm_xw8gn63z7_713508rt4nw0000gn/T//ccWvoQ3s.s: Assembler messages:
/var/folders/qn/fm_xw8gn63z7_713508rt4nw0000gn/T//ccWvoQ3s.s:249: Error: operand type mismatch for `mov'
lto-wrapper: fatal error: i686-elf-g++ returned 1 exit status
compilation terminated.
/usr/local/cross/lib/gcc/i686-elf/5.3.0/../../../../i686-elf/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status

Without the -flto flag, there's no error.

Nable · Post by **Nable** » Sun Mar 20, 2016 9:57 am

It's definitely possible (LTO helps me a lot with AVR and ARM targets), but when you build a cross-compiler on your own, some things may go wrong. What flags did you pass to ./configure for binutils and gcc?

yr · Post by yr » Sun Mar 20, 2016 10:35 am

Here's the output from gcc -v:

Code: Select all

~/os$ i686-elf-g++ -v
Using built-in specs.
COLLECT_GCC=i686-elf-g++
COLLECT_LTO_WRAPPER=/usr/local/cross/libexec/gcc/i686-elf/5.3.0/lto-wrapper
Target: i686-elf
Configured with: ../configure --prefix=/usr/local/cross --with-gmp=/usr/local --with-mpc=/usr/local --with-mpfr=/usr/local --target=i686-elf --enable-languages=c,c++ --without-headers --disable-nls
Thread model: single
gcc version 5.3.0 (GCC)

For binutils (v2.25.1) I just used the options from the wiki:

Code: Select all

../configure --prefix=/usr/local/cross --target=i686-elf --enable-multilib --disable-nls --disable-werror

Nable · Post by **Nable** » Sun Mar 20, 2016 3:22 pm

I think one should specify --enable-lto or --enable-languages=c,c++,lto (although the documentation states that LTO is enabled by default if you omit both flags).

Here's the -v output of the cross-compiler from the Debian package:

Code: Select all

$ arm-none-eabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-none-eabi/4.9.3/lto-wrapper
Target: arm-none-eabi
Configured with: ../src/configure --build=x86_64-linux-gnu --prefix=/usr --includedir='/usr/lib/include' --mandir='/usr/lib/share/man' --infodir='/usr/lib/share/info' --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir='/usr/lib/lib/x86_64-linux-gnu' --libexecdir='/usr/lib/lib/x86_64-linux-gnu' --disable-maintainer-mode --disable-dependency-tracking --mandir=/usr/share/man --enable-languages=c,c++,lto --enable-multilib --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --build=x86_64-linux-gnu --target=arm-none-eabi --with-system-zlib --with-gnu-as --with-gnu-ld --with-pkgversion=15:4.9.3+svn231177-1 --without-included-gettext --prefix=/usr/lib --infodir=/usr/share/doc/gcc-arm-none-eabi/info --htmldir=/usr/share/doc/gcc-arm-none-eabi/html --pdfdir=/usr/share/doc/gcc-arm-none-eabi/pdf --bindir=/usr/bin --libexecdir=/usr/lib --libdir=/usr/lib --disable-libstdc++-v3 --host=x86_64-linux-gnu --with-headers=no --without-newlib --with-multilib-list=armv6-m,armv7-m,armv7e-m,armv7-r CFLAGS='-g -O2 -fstack-protector-strong' CPPFLAGS=-D_FORTIFY_SOURCE=2 CXXFLAGS='-g -O2 -fstack-protector-strong' FCFLAGS='-g -O2 -fstack-protector-strong' FFLAGS='-g -O2 -fstack-protector-strong' GCJFLAGS='-g -O2 -fstack-protector-strong' LDFLAGS=-Wl,-z,relro OBJCFLAGS='-g -O2 -fstack-protector-strong' OBJCXXFLAGS='-g -O2 -fstack-protector-strong' INHIBIT_LIBC_CFLAGS=-DUSE_TM_CLONE_REGISTRY=0 AR_FOR_TARGET=arm-none-eabi-ar AS_FOR_TARGET=arm-none-eabi-as LD_FOR_TARGET=arm-none-eabi-ld NM_FOR_TARGET=arm-none-eabi-nm OBJDUMP_FOR_TARGET=arm-none-eabi-objdump RANLIB_FOR_TARGET=arm-none-eabi-ranlib READELF_FOR_TARGET=arm-none-eabi-readelf STRIP_FOR_TARGET=arm-none-eabi-strip
Thread model: single
gcc version 4.9.3 20150529 (prerelease) (15:4.9.3+svn231177-1)

yr · Post by yr » Sun Mar 20, 2016 5:59 pm

Thanks. I tried it out, but it doesn't seem to make a difference. However, I've managed to narrow down the issue to one function which uses inline assembly and seems to cause the error:

Code: Select all

void* memsetw( void* ptr, uint16_t val, size_t num )
{
	__asm__ __volatile__ (
			"mov %0, %%edi \n\t"
			"mov %1, %%eax \n\t"
			"mov %2, %%ecx \n\t"
			"rep stosw"
			:
			: "g"( ptr ), "g"( val ), "g"( num )
			: "edi", "eax", "ecx", "memory" );
	return ptr;
}

Either changing val to be of type int, or changing the operand constraint from "g" to "m" seems to resolve the error. Not sure why though...

xenos · Post by **xenos** » Mon Mar 21, 2016 2:19 am

This inline assembly is indeed a problem. With the g constraint, the compiler may choose any general-purpose register (in addition to memory or immediate operands), including the registers you clobber using the mov instructions. Instead of using mov to fill each register, use the register-specific constraints (a, c, D) so that the compiler loads each value into the right register for you.

yr · Post by yr » Mon Mar 21, 2016 8:48 pm

Thanks - that's a good suggestion. So the function now becomes:

Code: Select all

void* memsetw( void* ptr, uint16_t val, size_t num )
{
	__asm__ __volatile__ (
			"rep stosw"
			:
			: "D"( ptr ), "a"( val ), "c"( num )
			: "memory" );
	return ptr;
}

However, I still don't understand how the previous version generated an operand type mismatch, as mov should be able to handle anything allowed by the "g" constraint. Can you please explain?

xenos · Post by **xenos** » Tue Mar 22, 2016 1:22 am

To be sure, one would have to check the temporary assembler file that is generated. My guess is that gcc has chosen one of the three registers to be the same as the one on the other side of the mov instruction (which the constraint permits, since any general-purpose register is allowed), but you cannot do a mov from a register to itself.

Octocontrabass · Post by **Octocontrabass** » Tue Mar 22, 2016 1:36 am

XenOS wrote: but you cannot do a mov from a register to itself.

Are you sure? It sounds to me like the issue is one of operand sizes; you cannot mov from a 16-bit register to a 32-bit register.

jnc100 · Post by **jnc100** » Tue Mar 22, 2016 1:36 am

'g' allows gcc to choose the most appropriate available register (or indeed a memory location). It is probably choosing a 16-bit register for your uint16_t value, so the assembly becomes something like 'mov %bx, %eax', which is invalid.

If you were using an instruction other than a word-length one (stosw is word-length), you should probably also cast val to a uint32_t first. This is because when you bind the 'val' variable to the "a" constraint, gcc does not have to use a 32-bit assignment, and the upper 16 bits of the register may be undefined.

Regards,
John.

edit: crossposted with Octocontrabass
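
A minimal sketch of the widening advice above, assuming a doubleword store (rep stosl) is wanted instead of stosw. The function name memsetd_from_word is hypothetical and not from the thread, and the non-x86 branch is only there so that the sketch also builds on other hosts.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the thread): fills `num` 32-bit slots with a
// zero-extended 16-bit value, using a doubleword store instead of stosw.
void* memsetd_from_word(void* ptr, std::uint16_t val, std::size_t num)
{
    // Widen in C++ first, so that all 32 bits of EAX are well defined
    // before stosl stores them.
    const std::uint32_t wide = val;
#if defined(__i386__) || defined(__x86_64__)
    void* p = ptr;
    __asm__ __volatile__(
        "cld\n\t"
        "rep stosl"
        : "+D"(p), "+c"(num)   // destination pointer and count are modified
        : "a"(wide)            // the full 32-bit value goes into EAX
        : "cc", "memory");
#else
    // Portable fallback (assumption: only needed when not building for x86).
    std::uint32_t* out = static_cast<std::uint32_t*>(ptr);
    for (std::size_t i = 0; i < num; ++i) {
        out[i] = wide;
    }
#endif
    return ptr;
}
```

Because the value passed to the "a" constraint is already a 32-bit object, the compiler has to materialise all 32 bits, so the upper half of each stored doubleword is zero rather than whatever happened to be in the register.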

xenos · Post by **xenos** » Tue Mar 22, 2016 6:50 am

Octocontrabass wrote: Are you sure? It sounds to me like the issue is one of operand sizes; you cannot mov from a 16-bit register to a 32-bit register.

Indeed, you're right. mov %ecx, %ecx would be valid, but with different register sizes it won't work.

Kazinsal · Post by **Kazinsal** » Tue Mar 22, 2016 9:40 am

The more explicit you are when writing inline assembly, the more likely things are to go wrong when operand sizes are mixed, or in similar situations where the simplest instruction is not necessarily the one you want. See jnc100's example of "mov %bx, %eax" being invalid. You explicitly specified the MOV instruction, so GCC substitutes whatever operand it chose, and the assembler fails with an operand type mismatch when the sizes don't fit. The correct instruction in this case would be MOVZX (movzwl in AT&T syntax).

The correct paradigm, however, is to let the compiler generate the appropriate instruction sequences for handling the registers you specify in the constraints section.
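
For illustration only, here is a rough sketch of what the MOVZX variant could look like if one insisted on loading EAX by hand. The name memsetw_movzx is made up for this example. The "rm" constraint is used instead of "g" on the assumption that an immediate operand must be excluded, since movzx does not accept one. The destination and count are still loaded through constraints.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical variant (not from the thread): zero-extends the 16-bit value
// into EAX explicitly with movzwl, then stores it with rep stosw.
void* memsetw_movzx(void* ptr, std::uint16_t val, std::size_t num)
{
#if defined(__i386__) || defined(__x86_64__)
    void* p = ptr;
    __asm__ __volatile__(
        "movzwl %[val], %%eax\n\t"  // 16-bit register or memory -> 32-bit EAX
        "cld\n\t"
        "rep stosw"
        : "+D"(p), "+c"(num)        // modified by rep stosw
        : [val] "rm"(val)           // register or memory, never an immediate
        : "eax", "cc", "memory");   // EAX is written by hand, so clobber it
#else
    // Portable fallback (assumption: only needed when not building for x86).
    std::uint16_t* out = static_cast<std::uint16_t*>(ptr);
    for (std::size_t i = 0; i < num; ++i) {
        out[i] = val;
    }
#endif
    return ptr;
}
```

This works, but it needs an extra instruction and an extra clobber compared with simply binding val to the "a" constraint, which is the point of the advice above.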

yr · Post by yr » Tue Mar 22, 2016 8:10 pm

Thanks for the information. Using constraints to fill the registers sorts out this particular error. Given that the register values are changed by "rep stosw", however, should the constraints also include the "+" modifier (see below)? For example, if the function is inlined, it seems the compiler needs to know about the side effects to generate correct code.

Code: Select all

void* memsetw( void* ptr, uint16_t val, size_t num )
{
	void* p = ptr;
	const unsigned long lval = val;
	__asm__ __volatile__ (
			"cld; rep stosw"
			: "+D"( p ), "+c"( num )
			: "a"( lval ) 
			: "cc", "memory" );
	return ptr;
}
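
One way to see why the "+" modifier matters is to read the operands back after the asm statement. The sketch below is not from the thread: FillResult and memsetw_report are hypothetical names, and the non-x86 branch is a stand-in that mimics the same register updates. It returns the values that the destination and count operands hold once rep stosw has finished.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type: where the destination pointer ended up and how
// many words were left to store.
struct FillResult {
    void* end;
    std::size_t remaining;
};

FillResult memsetw_report(void* ptr, std::uint16_t val, std::size_t num)
{
    void* p = ptr;
    std::size_t n = num;
#if defined(__i386__) || defined(__x86_64__)
    const unsigned long lval = val;
    __asm__ __volatile__(
        "cld; rep stosw"
        : "+D"(p), "+c"(n)   // read-write: the instruction changes both
        : "a"(lval)
        : "cc", "memory");
#else
    // Portable fallback (assumption: mimics what rep stosw does to p and n).
    std::uint16_t* out = static_cast<std::uint16_t*>(ptr);
    while (n != 0) {
        *out++ = val;
        --n;
    }
    p = out;
#endif
    return FillResult{p, n};
}
```

After the call, end points one word past the filled region and remaining is zero. With plain input constraints the compiler would be entitled to assume that both registers still held their original values, which is the hazard raised in the question above.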

OSDev.org
