Link time optimization with cross compiler?

yr
Member
Posts: 31
Joined: Sat Mar 28, 2015 12:50 pm

Link time optimization with cross compiler?

Post by yr »

Is it possible to use link time optimization when building a kernel with a cross compiler?

I'm using gcc 5.3.0, with target i686-elf. Just adding the -flto option (to both the compile and link flags) results in the following error at link time:

Code:

i686-elf-g++ -flto -ffreestanding -fno-rtti -fno-exceptions -O2 -Wall -std=c++14 -pedantic -nostdlib -T bootstrap/link_bootstrap.ld -o bootstrap/bootstrap.elf bootstrap/kernel_loader.o [...]
/var/folders/qn/fm_xw8gn63z7_713508rt4nw0000gn/T//ccWvoQ3s.s: Assembler messages:
/var/folders/qn/fm_xw8gn63z7_713508rt4nw0000gn/T//ccWvoQ3s.s:249: Error: operand type mismatch for `mov'
lto-wrapper: fatal error: i686-elf-g++ returned 1 exit status
compilation terminated.
/usr/local/cross/lib/gcc/i686-elf/5.3.0/../../../../i686-elf/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status

Without the -flto flag, there's no error.
Nable
Member
Posts: 453
Joined: Tue Nov 08, 2011 11:35 am

Re: Link time optimization with cross compiler?

Post by Nable »

It's definitely possible (LTO helps me a lot with AVR and ARM targets), but when you build a cross-compiler on your own, some things may go wrong. What flags did you use when ./configure'ing binutils and gcc?
yr
Member
Posts: 31
Joined: Sat Mar 28, 2015 12:50 pm

Re: Link time optimization with cross compiler?

Post by yr »

Here's the output from gcc -v:

Code:

~/os$ i686-elf-g++ -v
Using built-in specs.
COLLECT_GCC=i686-elf-g++
COLLECT_LTO_WRAPPER=/usr/local/cross/libexec/gcc/i686-elf/5.3.0/lto-wrapper
Target: i686-elf
Configured with: ../configure --prefix=/usr/local/cross --with-gmp=/usr/local --with-mpc=/usr/local --with-mpfr=/usr/local --target=i686-elf --enable-languages=c,c++ --without-headers --disable-nls
Thread model: single
gcc version 5.3.0 (GCC)

For binutils (v2.25.1) I just used the options from the wiki:

Code:

../configure --prefix=/usr/local/cross --target=i686-elf --enable-multilib --disable-nls --disable-werror
Nable
Member
Posts: 453
Joined: Tue Nov 08, 2011 11:35 am

Re: Link time optimization with cross compiler?

Post by Nable »

I think one should specify --enable-lto or --enable-languages=c,c++,lto (although the documentation states that LTO is enabled by default if you omit both flags).

Here's the output of the cross-compiler from the Debian package:

Code:

$ arm-none-eabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-none-eabi/4.9.3/lto-wrapper
Target: arm-none-eabi
Configured with: ../src/configure --build=x86_64-linux-gnu --prefix=/usr --includedir='/usr/lib/include' --mandir='/usr/lib/share/man' --infodir='/usr/lib/share/info' --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir='/usr/lib/lib/x86_64-linux-gnu' --libexecdir='/usr/lib/lib/x86_64-linux-gnu' --disable-maintainer-mode --disable-dependency-tracking --mandir=/usr/share/man --enable-languages=c,c++,lto --enable-multilib --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --build=x86_64-linux-gnu --target=arm-none-eabi --with-system-zlib --with-gnu-as --with-gnu-ld --with-pkgversion=15:4.9.3+svn231177-1 --without-included-gettext --prefix=/usr/lib --infodir=/usr/share/doc/gcc-arm-none-eabi/info --htmldir=/usr/share/doc/gcc-arm-none-eabi/html --pdfdir=/usr/share/doc/gcc-arm-none-eabi/pdf --bindir=/usr/bin --libexecdir=/usr/lib --libdir=/usr/lib --disable-libstdc++-v3 --host=x86_64-linux-gnu --with-headers=no --without-newlib --with-multilib-list=armv6-m,armv7-m,armv7e-m,armv7-r CFLAGS='-g -O2 -fstack-protector-strong' CPPFLAGS=-D_FORTIFY_SOURCE=2 CXXFLAGS='-g -O2 -fstack-protector-strong' FCFLAGS='-g -O2 -fstack-protector-strong' FFLAGS='-g -O2 -fstack-protector-strong' GCJFLAGS='-g -O2 -fstack-protector-strong' LDFLAGS=-Wl,-z,relro OBJCFLAGS='-g -O2 -fstack-protector-strong' OBJCXXFLAGS='-g -O2 -fstack-protector-strong' INHIBIT_LIBC_CFLAGS=-DUSE_TM_CLONE_REGISTRY=0 AR_FOR_TARGET=arm-none-eabi-ar AS_FOR_TARGET=arm-none-eabi-as LD_FOR_TARGET=arm-none-eabi-ld NM_FOR_TARGET=arm-none-eabi-nm OBJDUMP_FOR_TARGET=arm-none-eabi-objdump RANLIB_FOR_TARGET=arm-none-eabi-ranlib READELF_FOR_TARGET=arm-none-eabi-readelf STRIP_FOR_TARGET=arm-none-eabi-strip
Thread model: single
gcc version 4.9.3 20150529 (prerelease) (15:4.9.3+svn231177-1) 
yr
Member
Posts: 31
Joined: Sat Mar 28, 2015 12:50 pm

Re: Link time optimization with cross compiler?

Post by yr »

Thanks. I tried it out, but it doesn't seem to make a difference. However, I've managed to narrow down the issue to one function which uses inline assembly and seems to cause the error:

Code:

void* memsetw( void* ptr, uint16_t val, size_t num )
{
	__asm__ __volatile__ (
			"mov %0, %%edi \n\t"
			"mov %1, %%eax \n\t"
			"mov %2, %%ecx \n\t"
			"rep stosw"
			:
			: "g"( ptr ), "g"( val ), "g"( num )
			: "edi", "eax", "ecx", "memory" );
	return ptr;
}

Either changing val to be of type int, or changing the operand constraint from "g" to "m", seems to resolve the error. Not sure why, though...
xenos
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia

Re: Link time optimization with cross compiler?

Post by xenos »

This inline assembly is indeed a problem. With the g constraint, the compiler may choose any general purpose register (in addition to memory or immediate operands), including the registers you clobber using the mov instructions. Instead of using mov to fill each register, use the correct constraints to fill each register (a, c, D).
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
yr
Member
Posts: 31
Joined: Sat Mar 28, 2015 12:50 pm

Re: Link time optimization with cross compiler?

Post by yr »

Thanks - that's a good suggestion. So the function now becomes:

Code:

void* memsetw( void* ptr, uint16_t val, size_t num )
{
	__asm__ __volatile__ (
			"rep stosw"
			:
			: "D"( ptr ), "a"( val ), "c"( num )
			: "memory" );
	return ptr;
}

However, I still don't understand how the previous version generated an operand type mismatch, as mov should be able to handle anything allowed by the "g" constraint. Can you please explain?
xenos
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia

Re: Link time optimization with cross compiler?

Post by xenos »

To be sure one would have to check the temporary assembler file that is generated, but I would guess that gcc has chosen one of the three registers to be the same as on the other side of the mov instruction (which is perfectly legal by the constraint, since any general purpose register is allowed), but you cannot do a mov from a register to itself.
Octocontrabass
Member
Posts: 5587
Joined: Mon Mar 25, 2013 7:01 pm

Re: Link time optimization with cross compiler?

Post by Octocontrabass »

XenOS wrote: but you cannot do a mov from a register to itself.
Are you sure? It sounds to me like the issue is one of operand sizes; you cannot mov from a 16-bit register to a 32-bit register.
jnc100
Member
Posts: 775
Joined: Mon Apr 09, 2007 12:10 pm
Location: London, UK

Re: Link time optimization with cross compiler?

Post by jnc100 »

'g' allows gcc to choose the most appropriate operand: an available register, a memory location or an immediate. It is probably choosing a 16-bit register for your uint16_t value, so the generated assembly becomes something like 'mov %bx, %eax', which is invalid.

If you were using an instruction that is not word-length like stosw (for example, a 32-bit one), you should probably also cast val to uint32_t first. This is because when you bind the 'val' variable to the "a" constraint, gcc does not have to use a 32-bit load, so the upper 16 bits of the register may be undefined.

Regards,
John.

edit: crossposted with Octocontrabass
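
A minimal sketch of the widening described above, assuming a hosted x86 build for trying it out. The name load_widened is made up for illustration, and the non-x86 branch is only a fallback so the snippet compiles elsewhere. Because the operand is a uint32_t, gcc can only substitute a 32-bit register, a memory operand or an immediate for %1, and the upper 16 bits are well defined:

```c
#include <stdint.h>

/* Hypothetical helper: copies a 16-bit value into a 32-bit result
 * through inline assembly, widening it in C first. */
uint32_t load_widened(uint16_t val)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t wide = val;   /* zero-extended by the compiler, not by the asm */
    uint32_t out;
    /* Both operands are 32 bits wide, so "g" cannot produce a 16-bit
     * register name on the source side of the movl. */
    __asm__ ("movl %1, %0" : "=r"(out) : "g"(wide));
    return out;
#else
    return val;            /* fallback for non-x86 hosts (assumption) */
#endif
}
```

With the original uint16_t operand, the same template would have received a 16-bit register name, which is the mismatch the assembler complained about.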
xenos
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia

Re: Link time optimization with cross compiler?

Post by xenos »

Octocontrabass wrote: Are you sure? It sounds to me like the issue is one of operand sizes; you cannot mov from a 16-bit register to a 32-bit register.
Indeed, you're right. mov %ecx, %ecx would be valid, but with different register sizes it won't work.
Kazinsal
Member
Posts: 559
Joined: Wed Jul 13, 2011 7:38 pm
Libera.chat IRC: Kazinsal
Location: Vancouver

Re: Link time optimization with cross compiler?

Post by Kazinsal »

The more explicit you are when writing inline assembly, the more likely things are to go wrong when you mix operand sizes, or in similar cases where the simplest instruction is not necessarily the one you want. See jnc100's example of "mov %bx, %eax" being invalid. You explicitly specified the MOV instruction, so GCC substitutes whatever operands the constraints allow, and the assembler fails with an operand type mismatch when they don't fit. The correct instruction in this case would be MOVZX (movzwl in AT&T syntax).

The correct paradigm, however, is to let the compiler generate the appropriate instruction sequences for handling the registers you specify in the constraints section.
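
For comparison, a small sketch of the explicit MOVZX route, assuming an x86 host. The helper name zero_extend16 is hypothetical and the non-x86 branch is just a fallback. Here the 16-bit operand is fine because movzwl expects a 16-bit source and a 32-bit destination:

```c
#include <stdint.h>

/* Hypothetical helper: zero-extends a 16-bit operand inside the asm. */
uint32_t zero_extend16(uint16_t val)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t out;
    /* %1 is substituted as a 16-bit register or a memory operand;
     * movzwl accepts either and fills the 32-bit register %0. */
    __asm__ ("movzwl %1, %0" : "=r"(out) : "rm"(val));
    return out;
#else
    return val;            /* fallback for non-x86 hosts (assumption) */
#endif
}
```

In practice the constraint-based approach is still simpler, since a plain C conversion to uint32_t gives the same result without any asm.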
yr
Member
Posts: 31
Joined: Sat Mar 28, 2015 12:50 pm

Re: Link time optimization with cross compiler?

Post by yr »

Thanks for the information. Using constraints to fill the registers sorts out this particular error. Given that the register values are changed by "rep stosw", however, should the constraints also include the "+" modifier (see below)? For example, if the function is inlined, it seems like the compiler needs to know about the side effects to generate correct code.

Code:

void* memsetw( void* ptr, uint16_t val, size_t num )
{
	void* p = ptr;
	const unsigned long lval = val;
	__asm__ __volatile__ (
			"cld; rep stosw"
			: "+D"( p ), "+c"( num )
			: "a"( lval ) 
			: "cc", "memory" );
	return ptr;
}
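
A sketch for checking this on a hosted x86 build. The name memsetw_checked and the plain-C fallback are assumptions added for illustration, not part of the kernel code. The "+" modifiers tell gcc that rep stosw both reads and changes the destination pointer and the count, so after inlining it should not assume those registers still hold ptr and num:

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the version above, written so it can be tried outside
 * the kernel. The fallback loop is only there for non-x86 hosts. */
void *memsetw_checked(void *ptr, uint16_t val, size_t num)
{
#if defined(__i386__) || defined(__x86_64__)
    void *p = ptr;          /* scratch copy; rep stosw advances it */
    size_t n = num;         /* scratch copy; rep stosw counts it down */
    uint32_t wide = val;    /* widened so the whole register is defined */
    __asm__ __volatile__ (
        "cld; rep stosw"
        : "+D"(p), "+c"(n)  /* read and modified by the instruction */
        : "a"(wide)
        : "cc", "memory");
#else
    uint16_t *w = ptr;
    for (size_t i = 0; i < num; i++)
        w[i] = val;
#endif
    return ptr;             /* the caller's pointer is left untouched */
}
```

Filling the first four words of a six-word buffer and confirming the last two are unchanged is a quick way to see that the count is honoured.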