Is it possible to use link time optimization when building a kernel with a cross compiler?
I'm using gcc 5.3.0, with target i686-elf. Just adding the -flto option (to both the compile and link flags) results in the following error at link time:
It's definitely possible (LTO helps me very much with AVR and ARM targets), but when you build a cross-compiler yourself, some things may go wrong. What flags did you use when running ./configure for binutils and gcc?
I think one should specify --enable-lto or --enable-languages=c,c++,lto (although documentation states that if you omit both flags, LTO is enabled by default).
Here's the output of the cross-compiler from the Debian package:
Thanks. I tried it out, but it doesn't seem to make a difference. However, I've managed to narrow down the issue to one function which uses inline assembly and seems to cause the error:
This inline assembly is indeed a problem. With the g constraint, the compiler may choose any general purpose register (in addition to memory or immediate operands), including the registers you clobber using the mov instructions. Instead of using mov to fill each register, use the correct constraints to fill each register (a, c, D).
However, I still don't understand how the previous version generated an operand type mismatch, as mov should be able to handle anything allowed by the "g" constraint. Can you please explain?
To be sure one would have to check the temporary assembler file that is generated, but I would guess that gcc has chosen one of the three registers to be the same as on the other side of the mov instruction (which is perfectly legal by the constraint, since any general purpose register is allowed), but you cannot do a mov from a register to itself.
'g' allows gcc to choose the most appropriate available register (or indeed a memory location). It is probably choosing a 16-bit register for your uint16_t value, then the assembly becomes something like 'mov %%bx, %%eax', which is invalid.
If you were using an instruction other than a word-length one (i.e. other than stosw), you should probably also cast val to a uint32_t first. This is because when you assign the 'val' variable to the "a" constraint, gcc does not have to use a 32-bit assignment, so the upper 16 bits of the register may be undefined.
The more explicit you are when writing inline assembly, the more likely things are going to go wrong when mixing operand sizes and similar concepts where the simplest instruction is not necessarily the one you want. See jnc100's example of "movl %bx, %eax" being invalid. You explicitly specified the MOV instruction, so GCC tries to fit in what it can and fails with an operand size mismatch when it can't. The correct instruction in this case would be MOVZX (movzwl in AT&T syntax).
The correct paradigm, however, is to let the compiler generate the appropriate instruction sequences for handling the registers you specify in the constraints section.
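To illustrate the point about operand sizes, here is a minimal sketch of letting the compiler do the widening. The function name widen_via_asm is hypothetical and not from the original code. Casting the 16-bit value to uint32_t in the operand list means gcc zero-extends it itself and substitutes a 32-bit register name, so a plain movl assembles. The non-x86 fallback is only there so the snippet builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: copies a 16-bit value into a 32-bit result
 * through inline assembly. */
static uint32_t widen_via_asm(uint16_t val)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t out;
    /* The cast makes operand %1 a 32-bit value, so gcc emits the
     * zero-extension before the asm and names a 32-bit register
     * (e.g. %edx) here. Passing val uncast could yield a 16-bit
     * name such as %dx, which movl would reject. */
    __asm__("movl %1, %0" : "=r"(out) : "r"((uint32_t)val));
    return out;
#else
    /* Portable fallback for non-x86 hosts. */
    return (uint32_t)val;
#endif
}
```

Calling widen_via_asm(0xBEEF) should give 0x0000BEEF, with the upper 16 bits guaranteed to be zero because the compiler, not the hand-written instruction, performed the extension.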
Thanks for the information. Using constraints to fill the registers sorts out this particular error. Given that the register values are changed by "rep stosw", however, should the constraints also include the "+" modifier (see below)? For example, if the function is inlined, seems like the compiler needs to know the side effects to generate correct code.
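For anyone finding this thread later, a sketch of how such a function might look with read-write constraints. The name fill_words and its signature are assumptions, not the original poster's code. "+D" and "+c" tell gcc that rep stosw changes the destination pointer and the counter, and the "memory" clobber tells it that memory is written behind its back. This assumes the direction flag is clear, as the usual ABIs guarantee on function entry. The plain C loop is only a fallback for non-x86 hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store `count` copies of `val` starting at `dst`. */
static void fill_words(uint16_t *dst, uint16_t val, size_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("rep stosw"
                     : "+D"(dst), "+c"(count) /* both are read and modified */
                     : "a"(val)               /* AX holds the word to store */
                     : "memory");             /* the asm writes to *dst */
#else
    /* Portable fallback for non-x86 hosts. */
    while (count--)
        *dst++ = val;
#endif
}
```

With input-only constraints, an inlined copy of this function could let the compiler assume the registers still hold their old values after the asm, which is the side-effect problem described above.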