
Re: please help AHHHH ld and gcc problem ?

Posted: Tue Jan 20, 2009 10:51 pm
by Sam111
Ya, I meant the 64-bit one.

Anyway, I was wondering if you have Vista or something else that can open .docx files.

I only have Windows XP and Linux machines running, and I was wondering, if it isn't too much trouble, whether
you could convert the .docx into a .doc file that would open in Office 2000, 2003, etc.
I know you mentioned that you had Vista or something along those lines.


Microsoft Portable Executable and Common Object File Format Specification
http://www.microsoft.com/whdc/system/pl ... Fdwn.mspx?
I believe this is the most up-to-date version of the spec for both 32-bit and 64-bit Windows.

I'm looking to see if there are any changes between the 64-bit and 32-bit formats (it must be backwards compatible, I would think).

Anyway, about what you said about using the stack for all the variables: that was clever thinking.
But is there any limit on the stack size that the OS allows you to reserve?
The only thing is that if you have a lot of string data or variables, you are pushing the values of all this stuff, so it is going to take up a **** load of space (e.g. if you wanted to say "Hello I am using the stack!" you would have to push each letter one by one onto the stack). You can't use references because they are not guaranteed to be at a fixed address.
This could get messy, but I guess you could put a '\0' between the different strings and variables to delimit them, and then
keep a pointer to the beginning of all your variables. Either way it is going to take up a lot of stack space.
Cool idea though!
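For what it's worth, on 32-bit x86 you don't strictly have to push one letter at a time: `push imm32` pushes four bytes, so a string can be built on the stack in 4-byte chunks, pushing the last chunk first because the stack grows downward. The C sketch below computes the immediates you would push. `push_plan` is a hypothetical helper made up for illustration, and the sketch assumes a little-endian x86 target and zero-pads the final chunk so the string ends up NUL-terminated with ESP pointing at its first byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills out[] with the imm32 values to push, in push order,
   so that after the last push ESP points at the NUL-terminated string.
   Returns the number of pushes, or 0 if out[] is too small.
   Assumes a little-endian x86 target. */
size_t push_plan(const char *s, uint32_t *out, size_t max_out)
{
    size_t len = strlen(s) + 1;        /* include the terminating NUL */
    size_t chunks = (len + 3) / 4;     /* 4 bytes per push imm32 */
    if (chunks > max_out)
        return 0;
    for (size_t i = 0; i < chunks; i++) {
        /* Chunk i covers string bytes [4*i, 4*i + 4); missing bytes are zero padding. */
        uint32_t v = 0;
        for (size_t b = 0; b < 4; b++) {
            size_t idx = 4 * i + b;
            uint8_t byte = idx < len ? (uint8_t)s[idx] : 0;
            v |= (uint32_t)byte << (8 * b);   /* lowest address = lowest byte */
        }
        /* The stack grows down, so the chunk at the highest address is pushed first. */
        out[chunks - 1 - i] = v;
    }
    return chunks;
}
```

For "Hello" this gives two pushes, `push 0x0000006F` followed by `push 0x6C6C6548`, instead of six single-byte pushes. It still uses stack space for the whole string, so Sam's point about space stands.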

I am curious, since we went in depth into the PE/COFF file format for Windows:
if the .obj files are in the same format as the PE .exe file, maybe the only difference is that the .obj files contain the symbol table or something.

Does the linker take the first .obj file in the list and combine the other .obj files like...
obj1 .text, obj2 .text, obj3 .text, adding entries to the relocation table for whatever each .text depends on, and so on down the list? I guess it has to check the other files for unresolved symbols first?
Unresolved symbols must be given some special value that distinguishes them from resolved symbols?

So it basically just copies the .text sections sequentially, making sure to adjust the relocation table entries for the objects it moved (normal .obj files are compiled as if they were at org 0), and resolves the external symbols, which are probably stored in the symbol table (where it copied the variable or function determines the RVA used to call it). I would say that's all the linker does, apart from setting the entry point and filling in some of the header fields, etc.
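The "copy the .text sections one after another" part of this model can be sketched in C. This is a toy model, not real COFF structures: `struct obj_text`, `layout_text` and `symbol_rva` are hypothetical names, each object is assumed to contribute one .text blob assembled at offset 0, and a fixed alignment is passed in (a real linker takes it from each section header's flags). A symbol's final RVA is then the merged section's RVA, plus where that object's code landed, plus the symbol's offset inside its own .text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one object file's .text contribution (hypothetical, not the COFF structs). */
struct obj_text {
    uint32_t size;        /* bytes of code in this object's .text */
    uint32_t out_offset;  /* filled in: where it lands inside the merged .text */
};

static uint32_t align_up(uint32_t x, uint32_t a) { return (x + a - 1) & ~(a - 1); }

/* Place each object's .text back to back (align must be a power of two).
   Returns the size of the merged section. */
uint32_t layout_text(struct obj_text *objs, size_t n, uint32_t align)
{
    uint32_t cur = 0;
    for (size_t i = 0; i < n; i++) {
        cur = align_up(cur, align);
        objs[i].out_offset = cur;
        cur += objs[i].size;
    }
    return cur;
}

/* Final RVA of a symbol that sat at sym_offset in object i's .text ("org 0"). */
uint32_t symbol_rva(const struct obj_text *objs, size_t i, uint32_t text_rva, uint32_t sym_offset)
{
    return text_rva + objs[i].out_offset + sym_offset;
}
```

With objects of 0x25, 0x10 and 0x7 bytes and 16-byte alignment, the second object's code starts at offset 0x30, so a function 4 bytes into it ends up at RVA 0x1034 if the merged .text is placed at a hypothetical RVA of 0x1000.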

I figured out what the line number field in the PE header is for. It is for debugging: sometimes people want to strip the debugging info into its own file, in a format like stabs or DWARF. But the line numbers are just used for debugging as well.

Either way, since .obj files are ROFF (relocatable object file format) or OMF, I am wondering
if RDOFF is the same thing as COFF/PE on Windows, just named differently to distinguish between the input .obj files to the linker and the output PE .exe from the linker. I think the linker input is in the same format as the linker output when we are talking about the PE/COFF Win32 file format.

Re: please help AHHHH ld and gcc problem ?

Posted: Tue Jan 20, 2009 11:28 pm
by ru2aqare
Sam111 wrote: Ya, I meant the 64-bit one.

Anyway, I was wondering if you have Vista or something else that can open .docx files.

I only have Windows XP and Linux machines running, and I was wondering, if it isn't too much trouble, whether
you could convert the .docx into a .doc file that would open in Office 2000, 2003, etc.
I know you mentioned that you had Vista or something along those lines.
Sorry, no Vista, no Office 2007. I'm sure Google will turn up PDF versions of this document. I used to have a PDF one (or did I convert it myself from the .doc? I don't remember.) Anyway, there is an add-on for Office 2003 and earlier (the Microsoft Office Compatibility Pack) that lets you open the new .docx file format.
Sam111 wrote: Anyway, about what you said about using the stack for all the variables: that was clever thinking.
But is there any limit on the stack size that the OS allows you to reserve?
The PE optional header contains two fields for this: SizeOfStackCommit (how much stack is committed initially) and SizeOfStackReserve (the maximum stack size). The stack is allocated in pages, and the lowest committed page (the one with the smallest virtual address) is a guard page. If the application touches this page, it triggers a guard page violation, which Windows handles by turning that page into an ordinary stack page and making the next page down the new guard page. This continues until the stack reaches the reserve size. That is the maximum stack size Windows lets you have; if you need more than this, you get a stack overflow exception. As for how much virtual memory you can reserve by fiddling with the value of this field, I have no idea.
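The growth mechanism described above can be modelled in a few lines of C. This is a simplified simulation, not Windows code: the 4 KiB page size, `struct stack_model`, `touch` and the result names are assumptions for illustration, and the rule that growth stops when there is no room left in the reserved range for a new guard page is a simplification of what the OS really does.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE 4096u   /* assumed page size */

/* Simplified model, not the real Windows data structures. */
struct stack_model {
    uint32_t base;       /* highest stack address; the stack grows down from here */
    uint32_t reserve;    /* plays the role of SizeOfStackReserve: the hard limit */
    uint32_t committed;  /* starts like SizeOfStackCommit; grows on guard-page hits */
};

enum touch_result { TOUCH_OK, TOUCH_GREW, TOUCH_STACK_OVERFLOW, TOUCH_ACCESS_VIOLATION };

/* Model what happens when the program reads or writes the stack address addr. */
enum touch_result touch(struct stack_model *s, uint32_t addr)
{
    uint32_t low = s->base - s->committed;   /* lowest committed (non-guard) address */
    uint32_t guard = low - PAGE;             /* the guard page sits directly below */

    if (addr >= low && addr < s->base)
        return TOUCH_OK;                     /* already committed: nothing happens */
    if (addr >= guard && addr < low) {
        /* The guard page becomes an ordinary page and the next page down must become
           the new guard page; if that would leave the reserved range, give up. */
        if (s->committed + 2 * PAGE > s->reserve)
            return TOUCH_STACK_OVERFLOW;
        s->committed += PAGE;
        return TOUCH_GREW;
    }
    /* Anything else, e.g. skipping past the guard page into uncommitted memory. */
    return TOUCH_ACCESS_VIOLATION;
}
```

The last case is why touching the stack one page at a time matters: a function with a large frame that jumps straight past the guard page would fault instead of growing the stack, which is what compiler-inserted stack probes (for example MSVC's `__chkstk`) are there to prevent.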
Sam111 wrote: The only thing is that if you have a lot of string data or variables, you are pushing the values of all this stuff, so it is going to take up a **** load of space (e.g. if you wanted to say "Hello I am using the stack!" you would have to push each letter one by one onto the stack). You can't use references because they are not guaranteed to be at a fixed address.
In the case of constants, that's not quite true. You can use the call-pop trick to retrieve the address of anything you can embed into the code:


; some code here
   nop
   nop 
   call skip_over_this_string ; the distance is known at assembly time, so no relocations are required
   db "some random string", 0
skip_over_this_string:
   pop eax ; loads address of instruction (in our case data) following the call instruction
   nop ; do something with it
Be aware that this unbalances the processor's return address predictor (the return stack buffer), because the call is never matched by a ret, and that can probably cause mispredicted returns and some slowdown (I don't know the details, but there is another thread on this issue). It's fine if you use it only once or twice. If you want to use this construct repeatedly, do something like this:


; some code here
   nop
   nop 
   call skip_over_this_string ; the distance is known at assembly time, so no relocations are required
the_string: ; label added so the assembler can compute continue - the_string
   db "some random string", 0
skip_over_this_string:
   pop eax ; loads address of instruction (in our case data) following the call instruction
   lea ecx, [eax + (continue - the_string)] ; = length of string incl. zero byte + (continue - skip_over_this_string)
   ; adjust the return address to point to continue: label
   push ecx
   retn ; pairs with call from above, keeps CPU happy
continue:
   nop
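To see why the second version "keeps the CPU happy", here is a toy return stack buffer in C. It is a simplification under assumed behaviour: a fixed-size circular stack where every call pushes a predicted return address and every ret pops one; real CPUs differ in size and recovery details. `struct rsb`, `rsb_call`, `rsb_ret`, `count_mispredicts` and the addresses are hypothetical. The scenario is main calling g, g calling f, and f using the call/pop trick.

```c
#include <assert.h>
#include <stdint.h>

#define RSB_SIZE 16

/* Toy return stack buffer: a circular stack of predicted return addresses. */
struct rsb {
    uint32_t entry[RSB_SIZE];
    unsigned top;                 /* pushes minus pops, used modulo RSB_SIZE */
};

void rsb_call(struct rsb *r, uint32_t return_addr)
{
    r->entry[r->top % RSB_SIZE] = return_addr;
    r->top++;
}

/* Returns 1 if the predicted address matches where the ret actually goes. */
int rsb_ret(struct rsb *r, uint32_t actual_target)
{
    r->top--;
    return r->entry[r->top % RSB_SIZE] == actual_target;
}

/* Hypothetical addresses for main -> g -> f, where f uses the call/pop trick. */
enum { RET_TO_MAIN = 0x100, RET_TO_G = 0x200, STRING_ADDR = 0x300, CONTINUE_ADDR = 0x320 };

/* Counts mispredicted rets; balanced != 0 selects the call / push ecx / retn variant. */
int count_mispredicts(int balanced)
{
    struct rsb r = {{0}, 0};
    int miss = 0;
    rsb_call(&r, RET_TO_MAIN);            /* main calls g */
    rsb_call(&r, RET_TO_G);               /* g calls f */
    rsb_call(&r, STRING_ADDR);            /* f: call skip_over_this_string */
    if (balanced)                         /* push ecx / retn jumps to continue: */
        miss += !rsb_ret(&r, CONTINUE_ADDR);
    /* In the pop eax variant no ret consumes the STRING_ADDR prediction. */
    miss += !rsb_ret(&r, RET_TO_G);       /* f returns to g */
    miss += !rsb_ret(&r, RET_TO_MAIN);    /* g returns to main */
    return miss;
}
```

In this model the call/pop version mispredicts every ret further up the call chain (2 misses here), while the call/push/retn version mispredicts only its own retn (1 miss) and leaves the outer returns correctly predicted.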
Sam111 wrote: If the .obj files are in the same format as the PE .exe file, maybe the only difference is that the .obj files contain the symbol table or something.
They share the same format in the sense that object files are COFF files, and PE is an extension of COFF. But the similarity ends there.
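A concrete way to see "PE is an extension of COFF": a .obj file starts directly with the 20-byte COFF file header, while an image starts with an MS-DOS stub whose `e_lfanew` field (at offset 0x3C) points to the "PE\0\0" signature, which is followed by the same COFF file header. The C sketch below finds that header in an in-memory buffer. `coff_header_offset` is a hypothetical helper, and it only recognises i386 (0x014C) and AMD64 (0x8664) machine values for bare object files, so other object files are reported as unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the file offset of the COFF file header, or -1 if the buffer is neither
   a PE image nor an i386/AMD64 COFF object (other machine types are ignored here). */
long coff_header_offset(const uint8_t *buf, size_t len)
{
    if (len >= 0x40 && buf[0] == 'M' && buf[1] == 'Z') {
        uint32_t pe = rd32(buf + 0x3C);                   /* e_lfanew */
        /* Need room for the 4-byte signature plus the 20-byte COFF header. */
        if (pe <= len - 24 && memcmp(buf + pe, "PE\0\0", 4) == 0)
            return (long)pe + 4;                          /* image: header follows "PE\0\0" */
        return -1;
    }
    if (len >= 20) {
        uint16_t machine = rd16(buf);
        if (machine == 0x014C || machine == 0x8664)       /* IMAGE_FILE_MACHINE_I386 / AMD64 */
            return 0;                                     /* .obj: header at offset 0 */
    }
    return -1;
}
```

Everything after the COFF file header is where the formats diverge: an image adds the optional header (entry point, image base, stack sizes and so on) that an object file normally lacks.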
Sam111 wrote: Does the linker take the first .obj file in the list and combine the other .obj files like...
obj1 .text, obj2 .text, obj3 .text, adding entries to the relocation table for whatever each .text depends on, and so on down the list? I guess it has to check the other files for unresolved symbols first?
Unresolved symbols must be given some special value that distinguishes them from resolved symbols?
I'm not sure what you mean here. The linker combines all sections together (code with code, data with data, and so on). Each section can reference external symbols (external to that object file, in the sense that the symbol is defined in another object file or in some library file), and the linker tries to resolve these symbols by locating the object file that contains a section defining the symbol. So the "set of sections to include in the image" grows as more and more rounds of symbol resolution are done. Once all symbols are resolved, the linker calculates the address of each section and symbol, and updates the references (relocations) to the symbols with their new addresses. At least my linker works like this, and I think other linkers more or less follow the same approach.
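The iterative resolution described above can be sketched as a small worklist loop in C. This is a toy model, not how any real linker stores its data: `struct object`, `resolve` and the fixed-size symbol arrays are hypothetical. Objects named on the command line are always included, library members are pulled in only when they define a symbol that is still needed, and whatever remains undefined at the end is counted as unresolved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4

/* Toy object file: the names it defines and the names it references. */
struct object {
    const char *defs[MAX_SYMS];   /* unused slots are NULL */
    const char *refs[MAX_SYMS];
    int from_library;             /* 1 = only linked in if something needs it */
    int included;                 /* set by resolve() */
};

static int defines(const struct object *o, const char *name)
{
    for (int i = 0; i < MAX_SYMS && o->defs[i]; i++)
        if (strcmp(o->defs[i], name) == 0)
            return 1;
    return 0;
}

/* Is name defined by any object that is already part of the image? */
static int is_defined(const struct object *objs, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (objs[i].included && defines(&objs[i], name))
            return 1;
    return 0;
}

/* Returns the number of unresolved references; a real linker would stop if nonzero. */
int resolve(struct object *objs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        objs[i].included = !objs[i].from_library;

    int changed = 1;
    while (changed) {                       /* repeat until nothing new is pulled in */
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (!objs[i].included)
                continue;
            for (int r = 0; r < MAX_SYMS && objs[i].refs[r]; r++) {
                const char *need = objs[i].refs[r];
                if (is_defined(objs, n, need))
                    continue;
                for (size_t j = 0; j < n; j++) {
                    if (!objs[j].included && defines(&objs[j], need)) {
                        objs[j].included = 1;   /* the set of sections grows */
                        changed = 1;
                        break;
                    }
                }
            }
        }
    }

    int unresolved = 0;                     /* anything still undefined is an error */
    for (size_t i = 0; i < n; i++)
        if (objs[i].included)
            for (int r = 0; r < MAX_SYMS && objs[i].refs[r]; r++)
                if (!is_defined(objs, n, objs[i].refs[r]))
                    unresolved++;
    return unresolved;
}
```

Only after this loop finishes does the layout and relocation step described above make sense, because until then the linker doesn't know which sections will end up in the image.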
An unresolved symbol is a symbol that one object file references but that no object file defines; hence it's undefined, and becomes unresolved. By default, linkers will stop linking and refuse to output an executable file, because it would probably just crash anyway.
Sam111 wrote: Either way, since .obj files are ROFF (relocatable object file format) or OMF, I am wondering
if RDOFF is the same thing as COFF/PE on Windows, just named differently to distinguish between the input .obj files to the linker and the output PE .exe from the linker. I think the linker input is in the same format as the linker output when we are talking about the PE/COFF Win32 file format.
I don't think anyone uses OMF object files anymore (except when writing programs for DOS or real-mode environments). Even Borland gave up on OMF in favor of COFF. As for RDOFF, I haven't encountered a single example of this format, so I can't tell whether it's the same as COFF or not. But it's probably not the same thing.
This applies to a default setup of gcc/ld under Cygwin (and also to the MS toolchain). However, gcc/ld can be built to support or use other formats, and under Linux they use the ELF file format instead of COFF.

Re: please help AHHHH ld and gcc problem ?

Posted: Wed Jan 21, 2009 10:21 am
by Sam111


; some code here
   nop
   nop
   call skip_over_this_string ; the distance is known at assembly time, so no relocations are required
   db "some random string", 0
skip_over_this_string:
   pop eax ; loads address of instruction (in our case data) following the call instruction
   nop ; do something with it
So when issuing the call instruction, it automatically pushes the address after the call onto the stack and then jumps to the skip_over_this_string: label/function? I ask because I don't get why pop eax will contain the address of "some random string". Does the call instruction push the return address onto the stack, and is the return address always the next line after the call statement?

If I am right, then the return address is the string's address, and when you pop eax, eax contains the address of "some random string". Why the nops?

Either way, the call instruction is then the same as a push instruction if you don't use ret? There must be something I am missing about the call instruction. I will take a look at the Intel specs later.

OK, but


call skip_over_this_string
won't this depend on where the .exe is loaded, so that you would need a relocation entry for the skip_over_this_string label?
Remember that the injected section is at a given RVA, so the addresses in the section are relative to that RVA and can change if the image base differs from the preferred one. If this happens, I would think skip_over_this_string would end up at skip_over_this_string + |actual - base|.

So I don't see how it is position-independent code. The only code that would be independent is code that uses no variables or functions, just raw machine instructions, which doesn't do much? Maybe I am misunderstanding.

I think jmp instructions are address-independent; they only resolve to something like jmp 25 or jmp -13, etc.
But call would depend on the actual address of the function, which is not fixed; it is only known in terms of an RVA?

I am confused about where you get independent code. I would think the only independent code is code that doesn't use jmp or call statements. But if you did that, you wouldn't have any functions?

Re: please help AHHHH ld and gcc problem ?

Posted: Wed Jan 21, 2009 12:28 pm
by ru2aqare
Sam111 wrote: So when issuing the call instruction, it automatically pushes the address after the call onto the stack and then jumps to the skip_over_this_string: label/function? I ask because I don't get why pop eax will contain the address of "some random string". Does the call instruction push the return address onto the stack, and is the return address always the next line after the call statement?

If I am right, then the return address is the string's address, and when you pop eax, eax contains the address of "some random string". Why the nops?
Yes, the call instruction pushes a return address onto the stack, which is the address of the next instruction (or in this case, the address of whatever follows the call instruction). Hence if you put a string constant after the call, you get the address of the string for free, without any need for relocations. If you check the Intel manuals, you can see that the address where the call instruction transfers control is encoded as the distance from the instruction after the call instruction. Since this distance is known at assembly time, there won't be a base relocation entry for the call instruction; that's the point: to get the address of the string without requiring a relocation. So this code is position-independent.
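The displacement arithmetic can be checked with a short C sketch. It assumes the 32-bit `call rel32` form (opcode E8 followed by a little-endian 32-bit displacement, 5 bytes in total), where the target is the address of the next instruction plus the displacement. `encode_call` and `simulate_call` are hypothetical helpers; the sketch encodes the same call at two different hypothetical load addresses and shows that the bytes are identical, while the pushed return address (the string's address) moves with the image.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode "call rel32" located at call_addr and targeting target (E8 + disp32). */
void encode_call(uint8_t out[5], uint32_t call_addr, uint32_t target)
{
    uint32_t next = call_addr + 5;     /* displacement is relative to the next instruction */
    uint32_t disp = target - next;     /* unsigned wraparound also handles backward calls */
    out[0] = 0xE8;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(disp >> (8 * i));   /* little-endian */
}

/* What executing that call does: push the return address, then jump to next + disp.
   Returns the new EIP and stores the pushed value in *pushed. */
uint32_t simulate_call(const uint8_t code[5], uint32_t call_addr, uint32_t *pushed)
{
    uint32_t disp = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
                    ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);
    *pushed = call_addr + 5;           /* this is what "pop eax" later retrieves */
    return call_addr + 5 + disp;
}
```

With the 19-byte "some random string" (18 characters plus the zero byte) placed right after the call, the encoding is always `E8 13 00 00 00`, whether the image is loaded at 0x00400000 or 0x10000000. Only the value popped into eax differs, and that is exactly the run-time address of the string.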
Sam111 wrote: OK, but


call skip_over_this_string
won't this depend on where the .exe is loaded, so that you would need a relocation entry for the skip_over_this_string label?

So I don't see how it is position-independent code. The only code that would be independent is code that uses no variables or functions, just raw machine instructions, which doesn't do much? Maybe I am misunderstanding.
The actual address of the string may vary depending on where the image is loaded, but that's exactly why the call instruction is there: to obtain this address at run time.

Anyway, this is way off-topic from the initial post.