
GCC overhead?

Posted: Fri Jan 16, 2009 9:43 am
by CPLH
Hello all. I have been an assembly programmer for some time, but I want to try out gcc, mainly for its optimizations and understandable code. I recently posted about this and found that apparently no assembler can optimize code the way gcc can.

I am trying to test how small gcc can make code. To do that, I wrote the following sample code:

Code:

void _start()
{
   __asm__("sti");
}

As you can see, this code is extremely simple. "_start" is the symbol the linker uses directly as the entry point, and the __asm__ statement should be emitted as-is, so this code requires no libraries at all.
In assembly, this code would assemble to only 1 byte. To avoid problems with symbolic information, I am trying to compile it into raw binary.
So far, I have tried building it with this:

Code:

gcc -c test.c -o test.o -ansi -pedantic -Wall -Wextra -Werror -nostdlib -nodefaultlibs -Os -fomit-frame-pointer -finline-functions -fdata-sections -ffunction-sections
ld test.o -o test -Ttext 0 -Tdata 0 -Os --gc-sections -s

-nostdlib and -nodefaultlibs remove the libraries that gcc loves to attach.
-Os optimizes for size.
-fomit-frame-pointer removes the frame pointer from any function that does not need it.
-finline-functions tries to inline functions, as if they were macros, where possible.
-fdata-sections puts each data item in its own section so that the linker can later remove unused data.
-ffunction-sections puts each function in its own section so that the linker can later remove unused functions.
-Ttext sets the start address for code; -Tdata sets the start address for data.
--gc-sections removes the unused sections set up by -fdata-sections and -ffunction-sections.
-s strips any remaining symbolic information.

Well, apparently I'm missing something, because the result is a 4304-byte binary file, and I don't know why.
And here is the interesting part:
You can get gcc to invoke ld automatically, and you can pass some options through to the linker with "-Wl,". I tried that:

Code:

gcc -c test.c -o test -ansi -pedantic -Wall -Wextra -Werror -nostdlib -nodefaultlibs -Os -fomit-frame-pointer -finline-functions -fdata-sections -ffunction-sections -Wl,-Os -Wl,--gc-sections -Wl,-s

...and I got 715 bytes! :?

My guess is that either gcc sends extra information to the linker with the second method,
or
I am using the first method incorrectly: gcc already creates an executable, and afterwards I try to "link" that executable.
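
One way to test guesses like these is to look at what kind of file each command actually produced. Since -c tells gcc to stop before linking, the 715-byte "test" may well be a relocatable object rather than a linked executable. Below is a minimal sketch in C, not a definitive tool: the helper name elf_kind is made up for illustration, and it only reads the ELF magic, the byte-order flag, and the 16-bit e_type field at byte offset 16 of the header.

```c
#include <stddef.h>

/* Hypothetical helper: classify a file from its first 18 bytes.
   Returns "not ELF", "relocatable", "executable", "shared" or "other". */
const char *elf_kind(const unsigned char *hdr, size_t len)
{
    unsigned int type;

    /* Every ELF file starts with the magic bytes 0x7F 'E' 'L' 'F'. */
    if (len < 18 || hdr[0] != 0x7F || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F')
        return "not ELF";

    /* Byte 5 (EI_DATA) gives the byte order: 1 = little, 2 = big. */
    if (hdr[5] == 2)
        type = ((unsigned int)hdr[16] << 8) | hdr[17];
    else
        type = ((unsigned int)hdr[17] << 8) | hdr[16];

    switch (type) {
    case 1:  return "relocatable"; /* ET_REL: unlinked object file */
    case 2:  return "executable";  /* ET_EXEC: linked program */
    case 3:  return "shared";      /* ET_DYN: shared object */
    default: return "other";
    }
}
```

Feeding it the first 18 bytes of each output file would show whether a given file was linked at all. A raw binary containing only machine code would come back as "not ELF".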

I would really like someone to tell me what I am doing wrong. I originally left for assembly because I didn't understand what this additional overhead was, and I couldn't get rid of it. If I can get rid of most, if not all, of the overhead, I will probably switch to programming in C.

Thank you,
Veniamin

Re: GCC overhead?

Posted: Fri Jan 16, 2009 12:21 pm
by bewing
Generally, a C program starts with main().
But the OS does not jump directly to main().
There is some initialization code that is required by an OS for a standalone program. Sometimes it is compiled in with a label of __init (in asm). The label changes with the OS. Even the name that people use to refer to this initialization code changes with the OS. By default, the compiler adds this __init code to every C program that does not contain a pre-defined __init entry point.
The code often sets up file descriptors 0, 1, and 2, and does some form of a malloc (or sbrk) to allocate and zero out the bss memory area. It does any "housekeeping" that is required before a typical C program starts execution. It also contains the __exit function code that cleans up malloc spaces and file descriptors after program termination, and reports the exit status back to the OS.
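
As a rough illustration of that housekeeping, here is a minimal sketch in C. It is not the startup code of any real OS or compiler: the names sketch_startup and demo_main are invented for this example, the "bss" is just a caller-supplied buffer, and file descriptor setup and the real exit call are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a program's main(). */
int demo_main(void)
{
    return 7;
}

/* Hypothetical stand-in for the startup code: zero the bss area,
   run the program, and hand its status back to the caller. */
int sketch_startup(unsigned char *bss, size_t bss_len, int (*program_main)(void))
{
    int status;

    memset(bss, 0, bss_len);  /* zero the uninitialised-data area */
    status = program_main();  /* transfer control to the program */
    return status;            /* real startup code would report this to the OS */
}
```

Calling sketch_startup(buffer, sizeof buffer, demo_main) clears the buffer and returns the program's status.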

The optimized code that the compiler generates is buried inside the output file -- starting at _main.
To get the benefits of that optimization, you need to compile your code, then disassemble the output file, and then edit that output file by hand to get rid of any extra stupid stuff (the output may contain things like "pure" and "impure" data areas). That gets you an optimized ASM file. It is not perfect, though. You can often find ways to tighten up the resulting code (either for speed or for size or for readability) if you become very experienced at ASM coding.

Re: GCC overhead?

Posted: Fri Jan 16, 2009 1:10 pm
by CPLH
From what I understand, the linker first refers to _start, which does the housekeeping and then goes to main. I know this because I have linked nasm code before that had to start with "_start". Here I rewrite _start, so there should be no housekeeping.
The code shouldn't be able to do any "malloc"s in the first place because, from what I understand, malloc requires the standard library, which I removed.
Is there really no way to strip everything off without resorting to assembly?

Re: GCC overhead?

Posted: Fri Jan 16, 2009 2:44 pm
by CPLH
I figured it out. Based on the code in http://osdever.net/tutorials/brunmar/tutorial_03.php, I did the following:

Code:

gcc -ffreestanding -fomit-frame-pointer -c main.c -o main.o -Os
ld -Ttext 0 -o kernel.o main.o -Os
ld -i -Ttext 0 -o kernel.o main.o -Os
objcopy -R .note -R .comment -S -O binary kernel.o kernel.bin

Output: 2 bytes. Disassembling it gave me sti and ret. But that's okay: if I change how it "ends", the "ret" goes away. :)
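
For reference, a 2-byte result like this can be checked by hand, because sti and a near ret are each a single-byte x86 opcode (0xFB and 0xC3). A minimal sketch in C, assuming the file holds exactly those two bytes; the function name mnemonic_for is made up for illustration, and this is in no way a real disassembler.

```c
/* Hypothetical lookup for the two one-byte x86 opcodes discussed here.
   Anything else is reported as "?". */
const char *mnemonic_for(unsigned char opcode)
{
    switch (opcode) {
    case 0xFB: return "sti"; /* set the interrupt flag */
    case 0xC3: return "ret"; /* near return */
    default:   return "?";
    }
}
```

Looking up each byte of kernel.bin in turn would give "sti" followed by "ret".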