please help AHHHH ld and gcc problem ?

ru2aqare · Post by **ru2aqare** » Fri Jan 16, 2009 2:59 pm

Sam111 wrote:
Assuming I have a PE and search it to find that .data and .text begin at

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00008730  000010d0  000010d0  000008d0  2**4
                  CONTENTS, ALLOC, LOAD, CODE
  1 .data         00000e00  00009800  00009800  00009000  2**4
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00003400  0000a600  0000a600  00000000  2**2
                  ALLOC
  3 .comment      00000010  0000da00  0000da00  00009e00  2**2
                  CONTENTS, DEBUGGING

So .text begins at 000010d0.

No, it doesn't. It starts at file offset 0x8D0 and is 0x8730 bytes long. When the loader loads this PE file, it allocates a memory range large enough to hold the entire image (all sections together). Let's assume this memory range starts at address 1M. The loader places the sections one by one into this range (actually, Windows places the PE headers first at this address, so the first section won't be loaded at 1M but at 1M+256 bytes or whatever; let's ignore this). This address is the base address. The sections will be loaded at base address + LMA, and if paging is used, they will appear at base address + VMA.
If you mean its virtual address (the address the application "sees"), then it starts at load address + 0x10d0.
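
To make the arithmetic concrete, here is a minimal Python sketch using the numbers from the listing above. The base address of 1M is the assumption from the paragraph above, and `SECTIONS` and `placement` are names made up for the illustration; a real loader also has to handle the headers and alignment.

```python
# Section table copied from the objdump listing: name -> (size, VMA, file offset).
SECTIONS = {
    ".text": (0x8730, 0x10D0, 0x08D0),
    ".data": (0x0E00, 0x9800, 0x9000),
    ".bss":  (0x3400, 0xA600, 0x0000),  # no file contents, memory is only allocated
}

def placement(base, name):
    """Return (first address, one past the last address) of a section in memory."""
    size, vma, _file_off = SECTIONS[name]
    start = base + vma
    return start, start + size

BASE = 0x100000  # assumed base address of 1M
for name in SECTIONS:
    start, end = placement(BASE, name)
    print(f"{name:6} {start:#010x} .. {end:#010x}")
```

Note that .text ends exactly where .data begins (0x10d0 + 0x8730 = 0x9800), so the file offset and the in-memory address are two separate things.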

However, if you consider an object file (which is NOT a PE file, contrary to what you wrote), the linker combines the code, data and rdata sections of all object files into one big code, data and rdata section respectively: all code sections end up in one large code section, all data sections in one large data section, and so on. Then the linker calculates the addresses of the symbols and resolves the references to them. But I already wrote that.
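
As a rough illustration of that merging step (not how ld is actually implemented), the sketch below concatenates same-named sections from two made-up object files and then computes each symbol's final address. The `link` function, the dictionary layout and the sample objects are all hypothetical.

```python
def link(objects, base=0):
    """Merge same-named sections and compute final symbol addresses.

    Each object is {section: {"size": int, "symbols": {name: offset}}}.
    """
    merged_size = {}     # section name -> total size so far
    contrib_offset = []  # per object: section name -> offset inside merged section
    for obj in objects:
        offsets = {}
        for sec, info in obj.items():
            offsets[sec] = merged_size.get(sec, 0)
            merged_size[sec] = offsets[sec] + info["size"]
        contrib_offset.append(offsets)

    # Lay the merged sections out one after another, starting at `base`.
    section_start = {}
    addr = base
    for sec, size in merged_size.items():
        section_start[sec] = addr
        addr += size

    # A symbol's address is: merged section start + object's offset + symbol offset.
    symbols = {}
    for obj, offsets in zip(objects, contrib_offset):
        for sec, info in obj.items():
            for name, off in info["symbols"].items():
                symbols[name] = section_start[sec] + offsets[sec] + off
    return section_start, symbols

a = {".text": {"size": 0x20, "symbols": {"start": 0}},
     ".data": {"size": 0x10, "symbols": {"counter": 4}}}
b = {".text": {"size": 0x30, "symbols": {"helper": 8}},
     ".data": {"size": 0x08, "symbols": {}}}
starts, syms = link([a, b], base=0x1000)
print({k: hex(v) for k, v in syms.items()})
```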

Sam111 wrote: Also, if I want to load it at some other memory address, say 00003456, I would first have to find the entry point symbol in the symbol table, then find its size in memory, then update it with 00003456. The next symbol to update would then be at 00003456 + the size of the starting entry.

You set the load address to 0x3456 and recalculate the base addresses of each section, then copy the sections there and perform base relocations (this is actually the step that gathers most of the hate towards PE). If that wasn't the answer to your question, then I didn't understand what you wrote.
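
A minimal sketch of the relocation step in Python, under simplifying assumptions: the list of offsets that need fixing is given as a flat list (the real PE .reloc table is organised in per-page blocks), every fixup is a 32-bit little-endian absolute address, and the function name `apply_base_relocations` is made up for this example.

```python
import struct

def apply_base_relocations(image, reloc_offsets, preferred_base, actual_base):
    """Add (actual_base - preferred_base) to every 32-bit address listed."""
    delta = actual_base - preferred_base
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)
    return image

# Toy image: one absolute pointer to 0x10d0 stored at offset 4.
image = bytearray(16)
struct.pack_into("<I", image, 4, 0x10D0)

# Assume the addresses were computed for base 0 and the image now sits at 0x3456.
apply_base_relocations(image, [4], preferred_base=0, actual_base=0x3456)
print(hex(struct.unpack_from("<I", image, 4)[0]))
```

Only the locations named in the relocation list are touched; everything else in the image is left alone.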

Sam111 wrote: Don't know what all this crap means below
Code:
[731](sec  1)(fl 0x00)(ty   0)(scl   3) (nx 1)... important stuff like start address symbol name...

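I would guess symbol 731 is defined in section 1 (if I remember correctly, 0 means undefined/external and -1 means absolute), has flags 0, type 0 and storage class 3 (static), is followed by one auxiliary entry (nx 1), and is at the specified address with the specified name.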
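
If you want to pick such lines apart mechanically, a small Python sketch like the following would do. The tail of the sample line (the address and the name `start`) is made up here, since the original post elides it, and `parse_symbol` is a hypothetical helper name.

```python
import re

# Fields as printed by objdump for COFF symbols:
# [index](sec N)(fl 0x..)(ty N)(scl N) (nx N) 0xVALUE name
SYMBOL_RE = re.compile(
    r"\[\s*(?P<index>\d+)\]"
    r"\(sec\s+(?P<sec>-?\d+)\)"
    r"\(fl\s+(?P<fl>0x[0-9a-fA-F]+)\)"
    r"\(ty\s+(?P<ty>[0-9a-fA-F]+)\)"
    r"\(scl\s+(?P<scl>\d+)\)\s*"
    r"\(nx\s+(?P<nx>\d+)\)\s+"
    r"(?P<value>0x[0-9a-fA-F]+)\s+(?P<name>\S+)"
)

def parse_symbol(line):
    """Return the fields of one symbol line as a dict, or None if it doesn't match."""
    m = SYMBOL_RE.match(line.strip())
    if m is None:
        return None
    return {
        "index": int(m["index"]),
        "section": int(m["sec"]),        # section number
        "flags": int(m["fl"], 16),
        "type": int(m["ty"], 16),
        "storage_class": int(m["scl"]),  # 2 = external, 3 = static
        "aux_entries": int(m["nx"]),
        "value": int(m["value"], 16),
        "name": m["name"],
    }

# Hypothetical complete line; only the bracketed part comes from the post.
sample = "[731](sec  1)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x000010d0 start"
print(parse_symbol(sample))
```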

Sam111 · Post by **Sam111** » Sat Jan 17, 2009 7:49 pm

OK, I guess I don't fully understand; maybe it's me.
But when I disassemble the code, the top line is 000010d0 <start>, so the address is 000010d0.
So I assume you need to place the <start> code at 000010d0.

It is the same question as having the exact same assembly code twice, once with org 0 and once with org 0x7C00,
and loading the code at 0000:7C00. The one with org 0 won't work, because its addresses are computed as if the first instruction sat at 0000:0000 rather than at 0000:7C00. So if I had
0000:7C00 jmp address
0000:7C02 myvariable db 10
address: mov ax, myvariable
Try creating a COM file with an org other than 100h; you won't get it to work. At least I know that whenever I write a bootloader I have to use org 0x7C00 to get it to work, or else, when I want to display a string through an interrupt that takes the string pointer in ds:dx, I have to load the address by hand with mov dx, 0x7c02 before the call. With org 0x7C00 I don't have this problem and can simply write mov dx, myvariable, because NASM assumes that the addresses of variables and code start at the value of the org directive.
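
The org effect can be checked with a few lines of arithmetic. This Python sketch assumes the code really sits at 0000:7C00 (where the BIOS puts a boot sector), that ds is 0, and that the variable follows a 2-byte short jmp; `assumed_address` is a name made up for the illustration.

```python
def assumed_address(org, offset):
    """Address the assembler writes into the code for a label at `offset`."""
    return org + offset

LOAD_ADDRESS = 0x7C00   # where the code really sits in memory (assumed)
MYVARIABLE_OFFSET = 2   # the variable follows a 2-byte short jmp

for org in (0x0000, 0x7C00):
    assumed = assumed_address(org, MYVARIABLE_OFFSET)
    actual = LOAD_ADDRESS + MYVARIABLE_OFFSET
    verdict = "ok" if assumed == actual else "wrong"
    print(f"org {org:#06x}: assembler uses {assumed:#06x}, data is at {actual:#06x} -> {verdict}")
```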

Now back to changing a PE into a bin. I copy the .text and .data sections out of the PE. But start is at 000010d0 in the disassembly, which I would think means it is like org 000010d0. So I would have to load it at that memory address and then jmp 000010d0.

ru2aqare wrote: It starts at file offset 0x8D0 and is 0x8730 bytes in length.

Yes, I know this, but that is just where it is located in the PE file.
When the loader loads it, I would think it has to be placed exactly at the LMA, or you run into the same org problem I had with my bootloader program. Unless of course there is an easy way to shift all the addresses accordingly?

What I don't get is how you recalculate the addresses from the symbol table. Since the .text section is just numbers, how do you know from the updated symbol table what needs to be updated? Say you have call 0x5678 in the .text section: how do you know whether 0x5678 refers to an entry in the symbol table, or is just a fixed address that is not in the symbol table but still needs to be updated to 0x5678 + (new memory address - old starting memory address)?

I guess I just don't know how to traverse the .text section (the code) and update it with the correct info from the symbol table. I get how to recalculate the symbols in the symbol table; I just don't get how you update the code accordingly.

In theory, if I load the code section followed by the data section at address 000010d0 and jump to it, it would work. I believe so, but I'm not sure. If I wanted to load the extracted code and data sections at an address other than 000010d0,
I would effectively be changing org 000010d0 to a new starting address.

I hope you get what I am getting at. Basically, how do you change code that starts at one address into code that starts at another address, without screwing up the code inside the functions, etc.?
You would somehow have to update every address to the original address + (new address - original address).
I don't get how the symbols can be used to update the .text (code) section. Would you just look for the address of each symbol in the .text section and replace it with the updated value from the symbol table? And I am still unsure about this case: if you had a
jmp <functionstartaddress + 25> inside the function, you would also have to update this instruction, but its target isn't a symbol in the symbol table.

ru2aqare · Post by **ru2aqare** » Sun Jan 18, 2009 1:59 am

Sam111 wrote: OK, I guess I don't fully understand; maybe it's me.
But when I disassemble the code, the top line is 000010d0 <start>, so the address is 000010d0.
So I assume you need to place the <start> code at 000010d0.

Oh, sorry, it seems I misunderstood the question.

Sam111 wrote: Now back to changing a PE into a bin. I copy the .text and .data sections out of the PE. But start is at 000010d0 in the disassembly, which I would think means it is like org 000010d0. So I would have to load it at that memory address and then jmp 000010d0.
file offset 0x8D0 and is 0x8730 bytes in length
Yes, I know this, but that is just where it is located in the PE file.
When the loader loads it, I would think it has to be placed exactly at the LMA, or you run into the same org problem I had with my bootloader program. Unless of course there is an easy way to shift all the addresses accordingly?

Sam111 wrote: I hope you get what I am getting at. Basically, how do you change code that starts at one address into code that starts at another address, without screwing up the code inside the functions, etc.?

There is an easy way to do that. Just load the PE file and perform a base relocation. That is, take the address you loaded the file at (0x10d0 or whatever) and get the preferred load address from the PE file header (let's assume it says 0x1000, although it is almost always 64K aligned). The difference between the two (0xd0) is the amount you have to add to every address listed in the base relocation table to get the executable to run correctly. The thing is, every time you write