And I have new questions: what exactly does linking do? I don't understand it. Where can I read about this?
Let's make a pretend machine language:
A Pretend Instruction Language
ADD 0x0(byte) <reg>(nibble) <reg>(nibble)
SUB 0x1(byte) <reg>(nibble) <reg>(nibble)
DIV 0x2(byte) <reg>(nibble) <reg>(nibble)
MUL 0x3(byte) <reg>(nibble) <reg>(nibble)
SHL 0x4(byte) <reg>(nibble) <reg>(nibble)
SHR 0x5(byte) <reg>(nibble) <reg>(nibble)
MOV 0x6(byte) <reg>(nibble) <reg>(nibble)
CALL 0x7(byte) <reg>(byte)
RET 0x8(byte)
On each instruction, if a reg operand is zero then that operand is a memory address instead of a register. This allows each instruction to work on any of these operand combinations
(using the ADD instruction as the example):
ADD register, register
Bytes: 0x0,0x23
The first is 0x0, which is an ADD instruction. Next come the two nibbles 0x2 and 0x3; a nibble is half a byte, or four bits. Together they look like one single byte. Remember, if a nibble is not zero then it is a register. We can address fifteen registers, since zero is reserved to mean a memory address. This means ADD register 2 with register 3 and store the result in register 2.
ADD register, memory
Bytes: 0x0,0x20,0x11223344
This says ADD (0x0) register 2 (the high nibble of 0x20) with memory (the low nibble of 0x20 is zero) and store the result in register 2. The problem is which memory address, right? Well, we append the memory addresses, in operand order, at the end of the instruction. Since this is a mock-up of a thirty-two bit machine, we will use four bytes (thirty-two bits) for each memory address. Look at the bytes again:
00 20 11 22 33 44
The 00 is ADD, 20 is register and memory, 11 22 33 44 is the memory address.
ADD memory, register
Bytes: 0x0,0x02,0x11223344
Notice the 0x02 is 0x20 with the nibbles swapped, which shows we want to ADD a memory location (the high nibble, 0) with register 2 (the low nibble, 2) and store the result in the memory location. The memory location is once again appended at the end as 11 22 33 44 (0x11223344).
ADD memory, memory
Bytes: 0x0,0x00,0x11223344,0x88772233
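You can see that we have 0x0 for the ADD instruction, and 0x00 for the operands. Both nibbles are zero, so we need two memory addresses, appended in operand order:
first operand (high nibble of 0x00) = 0x11223344
second operand (low nibble of 0x00) = 0x88772233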
This encoding applies to all of the instructions that we can use to make a program.
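The operand rules above can be sketched in C as a function that works out how long an instruction is. This is only a rough sketch: the name insn_length and the enum constants are made up for illustration, and it assumes CALL follows the same "zero means memory" rule with its single byte-sized operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcodes from the pretend instruction table. */
enum { OP_ADD = 0x0, OP_SUB, OP_DIV, OP_MUL, OP_SHL, OP_SHR, OP_MOV, OP_CALL, OP_RET };

/* Length in bytes of the instruction starting at code[0].
 * Hypothetical helper, not part of any real toolchain. */
size_t insn_length(const uint8_t *code)
{
    uint8_t op = code[0];
    assert(op <= OP_RET);               /* only the nine opcodes listed */

    if (op == OP_RET)
        return 1;                       /* opcode byte only */

    uint8_t operands = code[1];
    if (op == OP_CALL)                  /* assumption: the whole byte is one operand */
        return 2 + (operands == 0 ? 4 : 0);

    size_t len = 2;                     /* opcode byte + operand byte */
    if ((operands >> 4) == 0)           /* first operand is memory */
        len += 4;
    if ((operands & 0x0F) == 0)         /* second operand is memory */
        len += 4;
    return len;
}
```

With the ADD examples above, this gives 2 bytes for register/register, 6 bytes when one operand is memory, and 10 bytes for memory/memory.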
A Program Using Our Pretend Instruction Language
06 10 00 00 00 30 = MOV register1, memory(0x00000030) = move 4 bytes of data from memory location 0x30 into register1.
00 10 00 00 00 34 = ADD register1, memory(0x00000034) = add the 4 bytes of data at memory location 0x34 to register1.
07 00 00 FF 00 00 = CALL memory(0x00FF0000) = call a function that has been compiled into machine instructions at memory address 0x00FF0000.
Here we load a value from memory into a register, add the value at another memory location to it, and finally call another function. Let's look at this other function.
04 10 00 00 00 38 = SHL register1, memory(0x00000038) = shift register1 left by the number of bits stored at memory location 0x38.
08 = return (to where the CALL instruction was made)
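To make the byte layout concrete, here is a small sketch in C that pulls the address back out of one of these instructions. The function names are invented for this example, and it assumes the four address bytes are stored most significant byte first, the way 11 22 33 44 stood for 0x11223344 earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 4-byte address stored most significant byte first. */
uint32_t read_addr(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Address of the first memory operand of an instruction.
 * Hypothetical helper: the caller must already know that the
 * instruction has a memory operand. */
uint32_t first_mem_operand(const uint8_t *insn)
{
    assert(insn[0] < 0x8);          /* RET (0x8) has no operands */
    return read_addr(insn + 2);     /* skip the opcode and operand bytes */
}
```

Given the MOV line above this returns 0x30, and given the CALL line it returns 0x00FF0000.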
Here is what these two functions might have looked like in C.
File: other.c
unsigned long gvar3 = SOMETHING;

unsigned long b(unsigned long v)
{
    return v << gvar3; // SHIFT LEFT.. and RET.
}
File: main.c
unsigned long b(unsigned long v); // defined in other.c

unsigned long gvar1 = SOMETHING;
unsigned long gvar2 = SOMETHING;

void a(void)
{
    b(gvar1 + gvar2); // MOV..ADD... and CALL.
    return; // RET (which was not shown above)
}
These functions are compiled and placed into object files such as main.o and other.o. The instructions for a() in main.o might look like this:
06 10 00 00 00 00 = MOV register1, memory(0x00000030) = move 4 bytes of data from memory location 0x30 into register1.
00 10 00 00 00 00 = ADD register1, memory(0x00000034) = add the 4 bytes of data at memory location 0x34 to register1.
07 00 00 00 00 00 = CALL memory(0x00FF0000) = call a function that has been compiled into machine instructions at memory address 0x00FF0000.
Notice that the memory addresses in the instructions are all zeros. Why? It can be a complicated answer, but the simplest one I can give at the moment is that it lets the linker decide where to place the functions and global variables, because:
- You might be linking multiple programs from some of the same source files. This keeps you from having to recompile each source file for each program.
- It allows you to reorder where the instructions and data are placed in memory at link time, instead of doing so inside the code, which gets messy.
- It allows each source file to be compiled separately, so in a large project with hundreds of source files only the files that changed need to be recompiled. What has already been compiled can stay in object form.
But I still have not answered how, and hopefully I have not gotten too deep and confusing for you. Anyway, I will try...
Not only are the instructions stored in each object file, but also a symbol table. The symbols in our example are gvar1, gvar2, gvar3, and b (the function a is a symbol too). Also in the object file is a relocation table, which associates the places in the code that still need an address (relocations) with symbols. The text section and relocation table of main.o might look something like this:
Text Section (section .text)
06 10 00 00 00 00 = MOV register1, memory(0x00000030) = move 4 bytes of data from memory location 0x30 into register1.
00 10 00 00 00 00 = ADD register1, memory(0x00000034) = add the 4 bytes of data at memory location 0x34 to register1.
07 00 00 00 00 00 = CALL memory(0x00FF0000) = call a function that has been compiled into machine instructions at memory address 0x00FF0000.
Relocation Table (section .reloc)
00 02 'gvar1'
00 08 'gvar2'
00 0E 'b'
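The first two bytes of each relocation entry read naturally as a byte offset into the .text section. A quick sketch in C shows where 02, 08, and 0E come from. The struct layout and names here are assumptions made for illustration; real object formats store more per entry.

```c
#include <assert.h>
#include <stdint.h>

/* One relocation entry: where to patch, and which symbol's address goes there.
 * Hypothetical layout for the pretend object file. */
struct reloc {
    uint16_t    offset;   /* byte offset into .text */
    const char *symbol;   /* name to look up in the symbol table */
};

/* The relocation table of main.o from the listing. */
static const struct reloc main_relocs[] = {
    { 0x0002, "gvar1" },
    { 0x0008, "gvar2" },
    { 0x000E, "b"     },
};

/* Offset of the address field of the n-th instruction (counting from 0),
 * assuming every instruction is 6 bytes: opcode, operand byte, 4 address bytes. */
uint16_t addr_field_offset(unsigned n)
{
    assert(n <= (0xFFFFu - 2u) / 6u);   /* result must fit in the 2-byte offset */
    return (uint16_t)(n * 6u + 2u);
}

/* Returns 1 if every entry of main_relocs points at an address field. */
int relocs_match_layout(void)
{
    for (unsigned i = 0; i < 3; i++)
        if (main_relocs[i].offset != addr_field_offset(i))
            return 0;
    return 1;
}
```

Each instruction in a() is 6 bytes and its address starts 2 bytes in, so the three address fields sit at offsets 2, 8, and 14 (0x0E).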
Linking main.o and other.o
ld main.o other.o -o myprogram
The linker will look for gvar1 and gvar2, which will be found in the current object file. It will then use the relocation entries 00 02, 00 08, and 00 0E, which are byte offsets into the .text section, to insert the actual addresses where these symbols will live in memory when the program is executed. It is the linker that chose to place gvar1 at 0x30 and gvar2 at 0x34.
One symbol will not be found in the current object file by the linker: b. It will find this symbol in other.o. Once found, the linker will compute where in memory the symbol b from other.o will be. Going by our example above, it decided that b will be at 0xFF0000. The relocation entry 00 0E tells it where in the .text section to write this address.
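The patching step described above could be sketched in C roughly like this. It is a toy under stated assumptions, not how ld is actually written: the structs and the names lookup and apply_relocs are invented, addresses are written most significant byte first as in the earlier byte listings, and the symbol addresses are simply handed in rather than computed from section layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct reloc  { uint16_t offset; const char *name; };    /* where to patch */
struct symbol { const char *name; uint32_t address; };   /* final address  */

/* Find a symbol's final address. Returns 0 on success, -1 if undefined. */
static int lookup(const struct symbol *syms, size_t nsyms,
                  const char *name, uint32_t *out)
{
    for (size_t i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].name, name) == 0) {
            *out = syms[i].address;
            return 0;
        }
    }
    return -1;
}

/* Write each symbol's address into text at its relocation offset.
 * Returns 0 on success, -1 on the first undefined symbol. */
int apply_relocs(uint8_t *text, size_t text_len,
                 const struct reloc *relocs, size_t nrelocs,
                 const struct symbol *syms, size_t nsyms)
{
    for (size_t i = 0; i < nrelocs; i++) {
        uint32_t addr;
        if (lookup(syms, nsyms, relocs[i].name, &addr) != 0)
            return -1;                        /* "undefined reference" */

        assert((size_t)relocs[i].offset + 4 <= text_len);
        uint8_t *p = text + relocs[i].offset;
        p[0] = (uint8_t)(addr >> 24);         /* most significant byte first */
        p[1] = (uint8_t)(addr >> 16);
        p[2] = (uint8_t)(addr >> 8);
        p[3] = (uint8_t)addr;
    }
    return 0;
}
```

Feeding it the zero-filled .text bytes of main.o, the three relocation entries, and the addresses 0x30, 0x34, and 0x00FF0000 for gvar1, gvar2, and b turns the bytes back into the finished program shown at the start. If b is missing from the symbol list, the function fails, which is the toy version of an undefined reference error.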
I dunno. I most likely made it more confusing than I should have. If someone has a link to a resource that explains this much better, that would be nice, since I feel like it could take quite a few pages to do a thorough explanation of this topic.