Getting the intermediate representation in gcc
Posted: Wed Dec 08, 2010 3:21 pm
by Srowen
Is it possible to get an intermediate representation of my code when compiling it with one of the compilers in the GCC suite?
For example, if I have a program written in C and I compile it with gcc, I would like to get the intermediate representation created by the compiler as output, instead of the ELF or object file.
I've read the documentation, but there doesn't seem to be a simple option that does what I want.
Thanks for the replies!
Re: Getting the intermediate representation in gcc
Posted: Wed Dec 08, 2010 3:59 pm
by JamesM
Hi,
AFAIK, there is no easy way to get the GIMPLE IR - this was due to a policy decision by the GCC developers.
Can anyone remember more about that than me?
James
Re: Getting the intermediate representation in gcc
Posted: Wed Dec 08, 2010 11:18 pm
by skyking
There are various debug options that let you produce debugging dumps after various passes, but I'd guess that you have no use for them (otherwise you would already know about them). The intermediate representation is internal to the compiler and, AFAIK, has little practical use outside it (besides debugging the compiler).
Maybe you need something else, but without knowledge about what you are trying to do it's hard to tell...
Re: Getting the intermediate representation in gcc
Posted: Thu Dec 09, 2010 2:38 am
by xenos
By "intermediate" representation, do you mean the assembly code? You can obtain that by using the -S compiler switch when using GCC, or by using objdump -d on an ELF / object file.
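To try both suggestions, a minimal sketch is a single function in a hypothetical file square.c (the file and function names are made up for illustration). The comment lists the commands, assuming a GNU toolchain on an ELF platform such as Linux:

```c
/* square.c: hypothetical example file for inspecting generated code.
 *
 *   gcc -S square.c       writes square.s (textual assembler source)
 *   gcc -c square.c       writes square.o (an ELF object file on Linux)
 *   objdump -d square.o   disassembles the machine code in square.o
 */
int square(int x)
{
    return x * x;
}
```

Adding -O2 to the gcc commands makes the difference between unoptimized and optimized output easy to see in square.s.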
Re: Getting the intermediate representation in gcc
Posted: Thu Dec 09, 2010 3:36 am
by Solar
I remember having delved into this sometime around 2001/2002, when I was considering splitting up the GCC frontend and backend to come up with some kind of bytecode / virtual processor architecture. I queried mailing lists about this, probably even gcc-devel or somesuch.
As I was told back then, there is no command-line option or tool to get at the internal representation after the frontend is done. You would have to patch into the GCC sources themselves, and this was discouraged as this representation was considered internal, not too well documented, and subject to change without further notice.
The assembler source generated by the '-S' switch is a backtranslation of that internal representation. Usually, the backend (assembler) gets passed the internal representation directly, which has already been processed by the compiler frontend beyond the point represented by the '-S' ASM source. I.e., even '-S' does not show you a "true intermediate".
That is what I remember from back then. I might remember wrongly, or things might have changed, so take it with a grain of salt.
Edit / PS: Checking up on the Gimple IR, I realized that my talk back then was about the RTL representation. I don't know if the Gimple stage wasn't implemented, or the people I talked to didn't know about it, or if it's worthless as an intermediate, or whatever. I guess you can scrap this whole post.
Re: Getting the intermediate representation in gcc
Posted: Thu Dec 09, 2010 8:07 am
by JamesM
Hi,
GCC has multiple internal representations - RTL, GIMPLE and another, annotated version of GIMPLE. These are internal to GCC and there is no advertised method of exporting them. The reason for this is politics, as mentioned earlier (stopping others from replacing parts of GCC with non-free software, in a nutshell).
GCC (cc1)'s backend outputs textual assembler. 'as' then takes this and assembles it. The -S switch you see with gcc just avoids the 'as' stage. It is not even slightly an intermediate form - it is very much assembler code only.
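One way to watch these stages yourself, sketched with a hypothetical file pipeline.c: the driver's -v flag prints the cc1 and as command lines it runs, and -save-temps keeps the preprocessed (.i) and assembler (.s) files instead of deleting them. The manual `as` step in the comment assumes GNU as configured for the same target as gcc:

```c
/* pipeline.c: hypothetical file for observing the gcc driver's stages.
 *
 *   gcc -v -c pipeline.c           prints the cc1 and as command lines it runs
 *   gcc -save-temps -c pipeline.c  keeps pipeline.i (preprocessed source)
 *                                  and pipeline.s (assembler source)
 *   gcc -S pipeline.c              stops after cc1, leaving pipeline.s
 *   as pipeline.s -o pipeline.o    roughly the step -S skipped
 */

/* Expanded by the preprocessor: in pipeline.i only the literal 3 remains. */
#define SCALE 3

int scaled(int x)
{
    return x * SCALE;
}
```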
OP: have you looked into LLVM?
James
Re: Getting the intermediate representation in gcc
Posted: Thu Dec 09, 2010 8:42 am
by Solar
JamesM wrote:GCC (cc1)'s backend outputs textual assembler. 'as' then takes this and assembles it. The -S switch you see with gcc just avoids the 'as' stage. It is not even slightly an intermediate form - it is very much assembler code only.
That is about the only part of what I wrote above that I am actually sure of remembering correctly: That the output of "gcc -S" does not represent any state that is used in a "normal" compilation.
Searching...
Ah, I got it.
Here it states that the interface between language frontend and backend is the "tree" structure, and that documentation on it is incomplete. It also states that the RTL representation does not have all information about the program.
On the GIMPLE page it states that "The C and C++ front ends currently convert directly from front end trees to GIMPLE, and hand that off to the back end rather than first converting to GENERIC".
Generally speaking, "9 - Passes and Files of the Compiler" is probably the best starting point. It's full of "TODO" remarks... no, GCC does not cater to those who want to get at its intermediates.
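The debugging dumps mentioned earlier in the thread are the closest thing to looking at those passes from the outside. A sketch, assuming a hypothetical file sum.c: -fdump-tree-all writes one dump per tree/GIMPLE pass and -fdump-rtl-all one dump per RTL pass. The dump file names and pass numbers differ between GCC versions, so treat the comment as a rough guide only:

```c
/* sum.c: hypothetical file for comparing GCC's per-pass dumps.
 *
 *   gcc -O2 -c -fdump-tree-all sum.c   one dump file per tree/GIMPLE pass
 *   gcc -O2 -c -fdump-rtl-all sum.c    one dump file per RTL pass
 *
 * A loop gives the optimization passes something visible to change.
 */
int sum_to(int n)
{
    int total = 0;

    /* Adds 1 + 2 + ... + n; returns 0 when n < 1. */
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```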
Re: Getting the intermediate representation in gcc
Posted: Sat Dec 11, 2010 4:02 pm
by Srowen
JamesM wrote:
OP: have you looked into LLVM?
LLVM seems interesting... I read on their site that there is a front end for Java, but it is incomplete and there is no documentation. Have you tried it?
Re: Getting the intermediate representation in gcc
Posted: Sat Dec 11, 2010 4:33 pm
by fronty
Solar wrote:Ah, I got it.
Here it states that the interface between language frontend and backend is the "tree" structure, and that documentation on it is incomplete. It also states that the RTL representation does not have all information about the program.
It is normal for a front end to generate an intermediate representation, which can be in tree form, and for a back end to generate the target language, which can be assembly language. IMO your quote doesn't prove that assembly isn't used in the normal compilation process.
Re: Getting the intermediate representation in gcc
Posted: Sun Dec 12, 2010 12:21 am
by eddyb
Solar wrote:On the GIMPLE page it states that "The C and C++ front ends currently convert directly from front end trees to GIMPLE, and hand that off to the back end rather than first converting to GENERIC".
This may be a dumb reply, but I saw "You can request to dump a C-like representation of the GIMPLE form with the flag -fdump-tree-gimple." on the GIMPLE page.
This is the output for your everyday fork bomb (I couldn't come up with a better example):
// Original code (a fork bomb: do not run it)
#include <unistd.h>
int main(void) {
    while (1)
        fork();   // every process keeps creating new processes
    return 0;     // never reached
}
// GIMPLE output
main ()
{
int D.3461;
<D.2257>:
fork ();
goto <D.2257>;
D.3461 = 0;
return D.3461;
}
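For experimenting with -fdump-tree-gimple, a harmless input works just as well as a fork bomb. A sketch, assuming a hypothetical file poly.c: because GIMPLE is a three-address form, the nested expression should come out split into one operation per statement, with compiler-generated temporaries like D.3461 above. The temporary names and the dump file name (typically ending in .gimple) depend on the GCC version:

```c
/* poly.c: hypothetical, harmless input for GIMPLE dumps.
 *
 *   gcc -c -fdump-tree-gimple poly.c
 *
 * The single return expression below uses four arithmetic operations,
 * so the GIMPLE dump should show it broken into several statements
 * that store intermediate results in temporaries.
 */
int poly(int a, int b, int c)
{
    return a * b + c * (a - b);
}
```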