Octocontrabass wrote: The stack still contains many addresses that still refer to the original identity-mapped portion of memory, including return addresses, so your EIP does not stay in the higher half for long.
If you want to run C(++) code before you've switched to the higher half mapping, you have maybe two choices.
The sensible choice is to have a separate "startup" section linked to execute at its load address, instead of in the higher half. This sort of design is easy to port to other architectures (e.g. 64-bit) later on, but the "startup" section and the higher-half section can't reference each other's symbols. It works something like this:
1. Your bootloader loads the kernel to 1MB.
2. Your kernel's assembly entry point gets executed.
3. It does some stuff and then jumps to C++ code in the startup section.
4. The C++ code in the startup section creates the initial page tables.
5. The C++ code returns to the assembly entry point code.
6. The assembly entry point code enables paging and jumps to the main higher-half C++ code.
7. The identity mapping gets removed.
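Step 4 above can be sketched roughly as follows. This is only an illustration under assumptions that are not in the post: non-PAE 32-bit paging with 4 KB pages, a single page table covering the first 4 MB, and a fixed 3 GB offset so that the kernel loaded at 1 MB shows up at 0xC0100000. The names build_initial_tables, directory, table and table_phys are hypothetical. Loading CR3 and setting the paging bit is left to the assembly entry point, as in step 6.

```cpp
#include <cstdint>

// Assumed constants for plain 32-bit paging (no PAE).
constexpr std::uint32_t kHigherHalfBase = 0xC0000000; // assumed 3 GB offset
constexpr std::uint32_t kPagePresent    = 1u << 0;
constexpr std::uint32_t kPageWritable   = 1u << 1;
constexpr std::uint32_t kEntries        = 1024;
constexpr std::uint32_t kPageSize       = 4096;

// Fills one page table that maps physical 0..4 MB, then installs it in two
// page-directory slots: slot 0 (identity mapping) and slot 768
// (0xC0000000 >> 22). The same frames are then visible at both ranges.
// table_phys is the physical address of `table`; the startup section runs
// at its load address, so it can take that address directly.
void build_initial_tables(std::uint32_t* directory,
                          std::uint32_t* table,
                          std::uint32_t table_phys)
{
    for (std::uint32_t i = 0; i < kEntries; ++i) {
        directory[i] = 0;                                        // not present
        table[i] = (i * kPageSize) | kPagePresent | kPageWritable;
    }

    const std::uint32_t pde = table_phys | kPagePresent | kPageWritable;
    directory[0] = pde;                       // identity mapping, removed in step 7
    directory[kHigherHalfBase >> 22] = pde;   // higher-half mapping
}
```

Removing the identity mapping later would then amount to clearing directory slot 0 and flushing the TLB.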
The ridiculous choice, suggested in that screenshot you posted, is to use segmentation to wrap addresses around the 4GB mark so that virtual address 0xC0000000 maps to physical address 0x00100000 in your segments. This sort of design is specific to 32-bit x86, and it's strange enough that some CPUs might misbehave. It works something like this:
1. Your bootloader loads the kernel to 1MB.
2. Your kernel's assembly entry point gets executed.
3. It prepares a GDT with ridiculous descriptors, and loads the data and stack segments.
4. It performs a far jump to your C++ code, effectively using segmentation to put you in the higher half. (Virtual addresses are in the higher half, but linear addresses are not.)
5. Paging is initialized and enabled.
6. Some external assembly function is used to reload the segments, including CS, with flat segments. (Both virtual and linear addresses are now in the higher half.)
7. The identity mapping gets removed.
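To make the "ridiculous descriptors" in step 3 concrete, here is a small sketch of the arithmetic. It assumes the layout named above (virtual 0xC0000000 should reach physical 0x00100000) and the standard 8-byte x86 segment descriptor layout. The helper name make_descriptor and the constants are made up for illustration.

```cpp
#include <cstdint>

// Assumed layout: code linked at 0xC0000000, loaded at 0x00100000.
constexpr std::uint32_t kLinkBase = 0xC0000000;
constexpr std::uint32_t kLoadBase = 0x00100000;

// Segment base chosen so that (virtual + base) mod 2^32 == physical.
// Unsigned 32-bit arithmetic wraps around, which is the whole trick.
constexpr std::uint32_t kSegmentBase = kLoadBase - kLinkBase;

// Packs an 8-byte segment descriptor:
// bits 0-15 limit[15:0], 16-39 base[23:0], 40-47 access byte,
// 48-51 limit[19:16], 52-55 flags, 56-63 base[31:24].
constexpr std::uint64_t make_descriptor(std::uint32_t base,
                                        std::uint32_t limit,
                                        std::uint8_t access,
                                        std::uint8_t flags)
{
    return  (std::uint64_t(limit) & 0xFFFF)
          | ((std::uint64_t(base) & 0xFFFFFF) << 16)
          | (std::uint64_t(access) << 40)
          | (((std::uint64_t(limit) >> 16) & 0xF) << 48)
          | ((std::uint64_t(flags) & 0xF) << 52)
          | (((std::uint64_t(base) >> 24) & 0xFF) << 56);
}

// Ring-0 code and data segments, 4 GB limit with 4 KB granularity, 32-bit.
constexpr std::uint64_t kWrappedCode = make_descriptor(kSegmentBase, 0xFFFFF, 0x9A, 0xC);
constexpr std::uint64_t kWrappedData = make_descriptor(kSegmentBase, 0xFFFFF, 0x92, 0xC);
```

With these values the segment base comes out as 0x40100000, so an access to virtual 0xC0000000 produces linear address 0x00100000 after the 32-bit wraparound. The flat segments loaded in step 6 would use a base of 0 instead.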
That is brilliant thinking. I completely forgot about the values on the stack, not just its location.
simeonz wrote: Linux does something similar to what you describe. There are some details that need to be handled differently: You need to compile the "higher-half" and "lower-half" parts separately into different sets of object files and split their sections into different output ranges during your final link. They cannot call each other. If you do have common library code, you must either compile it twice to produce non-conflicting symbol names and use weakref aliases in your header files, or build an intermediate relocatable file for each set of object files, linking against a static library, and use symbol hiding (probably more elegant).
- My bootloader loads my kernel to 1 MB.
- My kernel gets executed.
- It does some stuff and then calls the lower-half C++ code.
- Other systems get initialized.
- Paging gets enabled, the kernel gets simultaneously mapped to both 1 MB and 3 GB.
- You return to the assembly code.
- You jump into the higher-half C++ code.
- Identity mapping gets removed.
If you want to share the code between the lower-half and upper-half mapping, without separating the object files, you have to make it position-independent indeed. You have to use the -fpie option when compiling and "-shared -Bstatic -Bsymbolic -pie" when linking (maybe check this post). Then, you need to offset the GOT entries before jumping to the higher-half mapping. Linux does this, but in a rather convoluted way. There is stub code, which decompresses the actual kernel image. The stub is itself relocatable and the decompressed kernel image is relocated separately. Still, I am going to illustrate how that is done for the stub code, because it is simpler. In the stub's linker script, two markers, namely _got and _egot, are placed around the .got.plt and .got sections (see here). After the stub is moved to its final location, corresponding in your case to the point where the upper-half mapping is already established, but before jumping into it, the stub iterates over the range between _got and _egot and fixes the entries, adding a rebasing offset (see here). This cannot be considered proper ELF relocation handling, but it works if your output contains only R_386_RELATIVE relocations targeting the GOT. Assuming that you are not importing symbols (which is why you use -pie and not -pic), there should either be no relocations or only relocations of this type. Namely, there shouldn't be any R_386_GLOB_DAT or R_386_JMP_SLOT relocations. You will want to assert that in your build script.
Octacone wrote: Could I fix anything by making my code position independent?
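The GOT fix-up loop simeonz describes can be sketched in a few lines. This is not the Linux code, only an illustration of the idea under the same restriction (the only relocations are R_386_RELATIVE ones targeting the GOT). The function name rebase_got is hypothetical. In a kernel the two bounds would come from linker-script markers such as _got and _egot; here they are plain parameters.

```cpp
#include <cstdint>

// Adds `offset` to every 32-bit GOT slot in the half-open range
// [got_begin, got_end). Each slot holds an absolute address computed for
// the original link location, so adding the rebasing offset makes it
// valid at the new location.
void rebase_got(std::uint32_t* got_begin,
                std::uint32_t* got_end,
                std::uint32_t offset)
{
    for (std::uint32_t* slot = got_begin; slot != got_end; ++slot) {
        *slot += offset;
    }
}
```

For a kernel linked at 1 MB and mapped 3 GB higher, the call would look something like rebase_got(got_start, got_finish, 0xC0000000), where got_start and got_finish stand in for the linker-provided bounds. It has to run exactly once, before any code that reads the GOT executes from the higher-half mapping.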
I am really glad that it can be done.
So here is a question for both of you: how do I actually do something like this? I mean the double linking: how can my code be both "lower" and "higher" at the same time?
As far as I can tell you guys have two different approaches:
1. One set of object files, no shared code, different code sections (how?).
2. Two sets of object files, no shared code (without using GOT and complex linking), somehow link it together?
First idea seems more appealing to me. What I am interested in is the technical aspect of this.
How can an executable file (ELF in this case, but doesn't matter) have two different internal locations (1 MB offsets and 3 GB offsets) at the same time?
Is my way of thinking good:
Kernel.asm:
section .lower_half_text
Pre_Kernel_Main_Low:
...Zero BSS...
...Set up stack pointer... (to offset it by 3 GB or not?)
...Call paging initialization code, ID map the kernel + 3 GB map it...
...return to here...
...call Higher_Half_Main...
section .higher_half_text
Pre_Kernel_Main_High:
...push bootinfo pointer... (my custom multiboot header equivalent, physical address, do I need to offset this by 3 GB?)
...call Kernel_Main... (and remove ID mapping in it)
Linker script:
ENTRY(Pre_Kernel_Main_Low)
OUTPUT_FORMAT("elf32-i386")
SECTIONS
{
. = 0x100000;
.lower_half_text :
{
*(.lower_half_text)
}
.higher_half_text : AT(ADDR(.higher_half_text) - 0xC0000000)
{
*(.higher_half_text)
}
/* Do I need to touch other sections? */
.rodata :
{
start_constructors = .;
*(SORT(.ctors*))
end_constructors = .;
*(.rodata)
}
.data :
{
*(.data)
}
.bss :
{
bss_start = .;
*(.bss)
*(COMMON)
bss_end = .;
}
kernel_end = .;
}
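The AT(ADDR(...) - 0xC0000000) expression in the script encodes a fixed offset between where a section is linked to run and where it is loaded. The same arithmetic applies to pointers such as the bootinfo address. A minimal sketch, assuming the whole kernel image is mapped at a constant 3 GB offset (the names kHigherHalfOffset, phys_to_virt and virt_to_phys are made up for illustration):

```cpp
#include <cstdint>

// Assumed constant offset between physical and higher-half virtual addresses.
constexpr std::uint32_t kHigherHalfOffset = 0xC0000000;

// Physical address -> address usable after the identity mapping is gone.
// Only valid for physical memory that the higher-half mapping covers.
constexpr std::uint32_t phys_to_virt(std::uint32_t phys)
{
    return phys + kHigherHalfOffset;
}

// Higher-half virtual address -> physical (load) address, the same
// subtraction the AT() expression performs for a section.
constexpr std::uint32_t virt_to_phys(std::uint32_t virt)
{
    return virt - kHigherHalfOffset;
}
```

Under that assumption, a physical pointer can be used as-is only while the identity mapping still exists. Once it is removed, the pointer has to go through something like phys_to_virt, and the memory it points to must actually be covered by the higher-half mapping.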