How to find entrypoint of ELF loaded to random address
Posted: Wed Nov 25, 2020 5:41 am
by kotovalexarian
I have a Multiboot 2 kernel loaded with GRUB2. I have succeeded in loading raw binary modules, switching to usermode, jumping to those modules and invoking some syscalls from them. Now I want to change the module format to ELF. ELF usually depends on virtual memory because symbol locations are fixed at link time. However, GRUB2 doesn't understand ELF modules or virtual memory; it just puts binary blobs at arbitrary places in memory.
I have paging enabled with identity mapping from virtual to physical addresses, and I don't want to configure a virtual memory mapping for now. I just want to jump to the module's entry point. How can I calculate the real entry point in memory from the known module address and the virtual entry point in the ELF header? Maybe I could fix its virtual position in the linker script? However, I'd prefer a more general solution.
Re: How to find entrypoint of ELF loaded to random address
Posted: Thu Nov 26, 2020 2:24 am
by kotovalexarian
I've found a solution:
Code:
const unsigned long real_entrypoint = kinfo.modules[0].base + elf_header->entrypoint;
tasks_switch_to_user(real_entrypoint);
However, my ELFs are strange: the entry point is always 0x0:
Code:
$ readelf -h procman/procman | grep -i 'Entry point'
Entry point address: 0x0
Surprisingly, the executable's code runs successfully even though execution starts at the ELF header in memory. I would expect it to fail with an invalid opcode or something similar. Why doesn't this happen?
Re: How to find entrypoint of ELF loaded to random address
Posted: Thu Nov 26, 2020 9:49 am
by nullplan
What is the file type? ET_DYN does not have to have an entry point. Are you specifying an entry point when linking? By default, ld will assume the entry point is _start. If that symbol is not present, and the linker is creating a shared object, it might just leave the field set to 0. If your entry point has a different name, specify it with -e.
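To check the file type and entry point from inside the kernel (rather than with readelf), both fields can be read straight from the ELF header at the start of the module image. A minimal sketch, assuming a 32-bit little-endian ELF (as produced by i686-elf-ld), a little-endian host, and that memcpy/memcmp are available; the names elf32_ehdr, ELF32_ET_EXEC and elf32_read_type_and_entry are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF32 file header layout (52 bytes) as defined by the ELF specification. */
typedef struct {
    unsigned char e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;
    uint32_t e_phoff;
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} elf32_ehdr;

_Static_assert(sizeof(elf32_ehdr) == 52, "unexpected padding in elf32_ehdr");

#define ELF32_ET_EXEC 2u /* executable file */
#define ELF32_ET_DYN  3u /* shared object or PIE */

/* Reads e_type and e_entry from a 32-bit little-endian ELF header.
   Returns 0 on success, -1 if the bytes are not such a header. */
int elf32_read_type_and_entry(const void *image, size_t size,
                              uint16_t *type, uint32_t *entry)
{
    elf32_ehdr eh;
    if (size < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh); /* copy: the module base may be unaligned */
    if (memcmp(eh.e_ident, "\x7f" "ELF", 4) != 0)
        return -1;
    if (eh.e_ident[4] != 1 /* ELFCLASS32 */ || eh.e_ident[5] != 1 /* ELFDATA2LSB */)
        return -1;
    *type = eh.e_type;   /* valid as-is only on a little-endian host such as x86 */
    *entry = eh.e_entry;
    return 0;
}
```

For the procman binary in this thread, this would report type 2 (ET_EXEC) and entry 0, matching the readelf output.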
Re: How to find entrypoint of ELF loaded to random address
Posted: Thu Nov 26, 2020 10:55 am
by foliagecanine
At least for me GRUB2 loads the kernel at physical address 0x100000 (1MiB).
It looks like this address is specified in the ELF header:
Code:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x001000 0xc0100000 0x00100000 0x1de7c 0x1de7c R E 0x1000
LOAD 0x01f000 0xc011e000 0x0011e000 0x0069c 0x435f1 RW 0x1000
You'll have to enable paging yourself after GRUB loads though.
If you are planning on identity mapping the kernel, this will be easy since you just have to set both the VirtAddr and PhysAddr to wherever you want your code to start, and won't need to use assembly to set up paging beforehand.
I don't personally know how to set the preferred physical address in the linker, but hopefully this helps.
What compiler/linker and compiler options are you using?
Re: How to find entrypoint of ELF loaded to random address
Posted: Thu Nov 26, 2020 1:43 pm
by kotovalexarian
nullplan wrote:What is the file type?
Executable:
Code:
$ readelf -h procman/procman | grep 'Type'
Type: EXEC (Executable file)
nullplan wrote:Are you specifying an entry point when linking?
Yes:
Code:
$ cat procman/linker.ld
OUTPUT_ARCH("i386")
OUTPUT_FORMAT("elf32-i386")
ENTRY(_start)
SECTIONS
{
    . = 0x0;
    .text BLOCK(4K) : ALIGN(4K)
    {
        *(.text)
    }
    .rodata BLOCK(4K) : ALIGN(4K)
    {
        *(.rodata)
    }
    .data BLOCK(4K) : ALIGN(4K)
    {
        *(.data)
    }
    .bss BLOCK(4K) : ALIGN(4K)
    {
        *(COMMON)
        *(.bss)
    }
}
And the symbol does exist in ELF:
Code:
$ readelf -s procman/procman
Symbol table '.symtab' contains 8 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 SECTION LOCAL DEFAULT 1
2: 0000005c 0 SECTION LOCAL DEFAULT 2
3: 00000000 0 SECTION LOCAL DEFAULT 3
4: 00000000 0 FILE LOCAL DEFAULT ABS start.c
5: 00000000 0 FILE LOCAL DEFAULT ABS
6: 00000094 0 OBJECT LOCAL DEFAULT 2 _GLOBAL_OFFSET_TABLE_
7: 00000000 90 FUNC GLOBAL DEFAULT 1 _start
foliagecanine wrote:At least for me GRUB2 loads the kernel at physical address 0x100000 (1MiB).
It looks like this address is specified in the ELF header:
It seems you're talking about the kernel ELF, while I'm talking about the module ELF.
foliagecanine wrote:What compiler/linker and compiler options are you using?
For the module it's $(LD) -o $@ -T linker.ld $^:
Code:
$ cat procman/Makefile
CCPREFIX = i686-elf-
AS = $(CCPREFIX)as
CC = $(CCPREFIX)gcc
LD = $(CCPREFIX)ld
PROCMAN = procman
CFLAGS = -std=gnu99 -ffreestanding -nostdinc -fno-builtin -fno-stack-protector -Wall -Wextra
all: $(PROCMAN)
clean:
	rm -f $(PROCMAN) start.o
$(PROCMAN): start.o
	$(LD) -o $@ -T linker.ld $^
start.o: start.c
	$(CC) -c $< -o $@ $(CFLAGS)
Re: How to find entrypoint of ELF loaded to random address
Posted: Thu Nov 26, 2020 1:53 pm
by bzt
kotovalexarian wrote:However, my ELFs are strange because entrypoint is always 0x0:
...
The code of the executable is suddenly successfully executed despite it starts executing from ELF header in memory. I expect that it should fail because of invalid opcode or something like this. Why doesn't this happen?
That's perfectly valid. Entry point is an absolute memory address within the text segment in memory and isn't a file offset. If you don't include the ELF header in the text segment, and the text segment's base address is 0, then it's perfectly valid to have an entry point of 0.
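Because e_entry is a virtual address rather than a file offset, turning it into a pointer into a raw module image means going through the program headers: find the PT_LOAD segment whose file-backed bytes contain the address and convert it to a file offset. A sketch under these assumptions: a 32-bit little-endian ELF on a little-endian host, and the whole file sitting verbatim at the module's base address; elf32_entry_file_offset is a made-up name for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned reads in host byte order (assumes a little-endian host such as x86). */
static uint32_t rd32(const unsigned char *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static uint16_t rd16(const unsigned char *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }

#define ELF32_PT_LOAD 1u

/* Converts e_entry of the ELF32 file in `image` into a file offset by finding
   the PT_LOAD segment whose file-backed bytes (p_filesz) contain it.
   Returns 0 on success, -1 if the header is malformed or no segment matches. */
int elf32_entry_file_offset(const unsigned char *image, size_t size, uint32_t *offset)
{
    if (size < 52)
        return -1;
    uint32_t entry = rd32(image + 24);     /* e_entry */
    uint32_t phoff = rd32(image + 28);     /* e_phoff */
    uint16_t phentsize = rd16(image + 42); /* e_phentsize */
    uint16_t phnum = rd16(image + 44);     /* e_phnum */
    if (phentsize != 32)                   /* sizeof(Elf32_Phdr) */
        return -1;

    for (uint16_t i = 0; i < phnum; i++) {
        size_t at = (size_t)phoff + (size_t)i * 32;
        if (at > size || size - at < 32)
            return -1;
        const unsigned char *ph = image + at;
        uint32_t p_type = rd32(ph), p_offset = rd32(ph + 4);
        uint32_t p_vaddr = rd32(ph + 8), p_filesz = rd32(ph + 16);
        if (p_type == ELF32_PT_LOAD && entry >= p_vaddr && entry - p_vaddr < p_filesz) {
            *offset = p_offset + (entry - p_vaddr);
            return 0;
        }
    }
    return -1;
}
```

The jump target would then be module base + offset. Executing in place like this only works if the code does not rely on absolute addresses (or was linked for that address), and .bss is not zeroed. For a module linked with . = 0x0, as above, the offset will be non-zero even though e_entry is 0, because the headers occupy the start of the file.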
To add the ELF header in the text segment, you should use a PHDRS block in the linker script and start the text segment at ELF header's size:
Code:
PHDRS
{
    text PT_LOAD FILEHDR PHDRS;
    ...
}
SECTIONS
{
    .text . + SIZEOF_HEADERS : {
    ...
This way you don't have to copy the segment out of the middle of the ELF file; instead, you can load the file directly at a given address (although you don't notice this, because GRUB copies the segment for you). The entry point will then be virtaddress + file offset.
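For comparison, copying the segments out yourself roughly means: for each PT_LOAD segment, copy p_filesz bytes from p_offset to the segment's virtual address, then zero the rest up to p_memsz (for example .bss). A sketch assuming a 32-bit little-endian ELF and a destination buffer whose first byte corresponds to a hypothetical link_base address; elf32_load_segments is an illustrative name, not an existing API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned reads in host byte order (assumes a little-endian host such as x86). */
static uint32_t rd32(const unsigned char *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static uint16_t rd16(const unsigned char *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }

/* Copies every PT_LOAD segment of the ELF32 file in `image` into `dest`,
   where dest[0] corresponds to virtual address `link_base`.
   Bytes between p_filesz and p_memsz are zeroed.
   Returns 0 on success, -1 on a malformed or out-of-range header. */
int elf32_load_segments(const unsigned char *image, size_t size,
                        unsigned char *dest, size_t dest_size, uint32_t link_base)
{
    if (size < 52)
        return -1;
    uint32_t phoff = rd32(image + 28);     /* e_phoff */
    uint16_t phentsize = rd16(image + 42); /* e_phentsize */
    uint16_t phnum = rd16(image + 44);     /* e_phnum */
    if (phentsize != 32)
        return -1;

    for (uint16_t i = 0; i < phnum; i++) {
        size_t at = (size_t)phoff + (size_t)i * 32;
        if (at > size || size - at < 32)
            return -1;
        const unsigned char *ph = image + at;
        if (rd32(ph) != 1 /* PT_LOAD */)
            continue;
        uint32_t offset = rd32(ph + 4), vaddr = rd32(ph + 8);
        uint32_t filesz = rd32(ph + 16), memsz = rd32(ph + 20);
        if (filesz > memsz || vaddr < link_base)
            return -1;
        size_t dst = (size_t)(vaddr - link_base);
        if (offset > size || filesz > size - offset)
            return -1; /* segment data lies outside the file */
        if (dst > dest_size || memsz > dest_size - dst)
            return -1; /* segment does not fit in the destination */
        memcpy(dest + dst, image + offset, filesz);
        memset(dest + dst + filesz, 0, memsz - filesz);
    }
    return 0;
}
```

After this, the entry point sits at dest + (e_entry - link_base).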
Cheers,
bzt
Re: How to find entrypoint of ELF loaded to random address
Posted: Thu Nov 26, 2020 3:51 pm
by Korona
nullplan wrote:What is the file type? ET_DYN does not have to have an entry point.
Actually, that is not true. For example:
Code:
$ /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
Usage: ld.so [OPTION]... EXECUTABLE-FILE [ARGS-FOR-PROGRAM...]
You have invoked `ld.so', the helper program for shared library executables.
PIEs are also linked as ET_DYN.
EDIT: nevermind, I misunderstood your post, nullplan, and you did not claim that ET_DYN *cannot* have an entry point.
Re: How to find entrypoint of ELF loaded to random address
Posted: Thu Nov 26, 2020 4:09 pm
by kotovalexarian
bzt wrote:That's perfectly valid. Entry point is an absolute memory address within the text segment in memory and isn't a file offset. If you don't include the ELF header in the text segment, and the text segment's base address is 0, then it's perfectly valid to have an entry point of 0.
Thank you for noticing this. I expected it to work this way, but missed it in the spec.
However, I don't jump to the base of the text segment + entry point. I jump to the base of the loaded module + entry point, which should be exactly the ELF header. And I don't understand why it doesn't cause an Invalid Opcode exception or something like that.
Re: How to find entrypoint of ELF loaded to random address
Posted: Thu Nov 26, 2020 11:07 pm
by foliagecanine
Well, look at the ELF header:
Code:
0x7F E L F =
0: 7f 45 jg 0x47
2: 4c dec esp
3: 46 inc esi
I'm pretty sure it's designed to act as an entrypoint.
Re: How to find entrypoint of ELF loaded to random address
Posted: Fri Nov 27, 2020 8:33 am
by bzt
kotovalexarian wrote:This should be exactly ELF header. And I don't understand why it doesn't cause Invalid Opcode interrupt or something like this.
Because there's no ELF header at the loaded module's address (unless you use that special linker script). GRUB does not load the ELF file as-is; instead it iterates the program headers and loads only the segments from the middle of the file (see there's a "seek" in line 146 before the "read"). Go on, dump the memory at the loaded module's address; you'll see only the text segment there and no ELF header at all.
kotovalexarian wrote:I jump to the base of loaded module + entry point.
If your ELF is an executable, then don't add the loaded module's address; you should simply jump to the entry point. For shared libraries that contain position-independent code, where the entry point is therefore a text-segment-relative offset, you might need to add the loaded module's address, but only if you load the ELF yourself. I believe this shouldn't be needed when the module is loaded by GRUB, as it should take care of the relocations too. My advice is to print out the loaded module's address and the entry point's address, and also dump the memory at both the entry point and load address + entry point to see which corresponds to your actual code (you can check the machine code that should be there with objdump).
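The checks suggested above can be done with two small helpers: one tests whether memory at an address starts with the ELF magic (telling you whether the header or the code is there), and one formats bytes as hex for comparison with objdump -d output. A sketch assuming a hosted snprintf; a freestanding kernel would substitute its own formatting routine. Both function names are hypothetical:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if the bytes at p start with the ELF magic 0x7f 'E' 'L' 'F'. */
int starts_with_elf_magic(const void *p)
{
    return memcmp(p, "\x7f" "ELF", 4) == 0;
}

/* Formats n bytes as space-separated lowercase hex ("7f 45 4c 46") into out.
   Stops early when out is full; out is always NUL-terminated if out_size > 0. */
void hex_bytes(const unsigned char *p, size_t n, char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, out_size - used, i ? " %02x" : "%02x",
                         (unsigned)p[i]);
        if (w < 0 || (size_t)w >= out_size - used)
            break; /* truncated: no room for this byte */
        used += (size_t)w;
    }
}
```

In the kernel, these would be called on the module base and on base + entry point, and the hex compared against the first instructions of _start in the objdump listing.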
Cheers,
bzt
Re: How to find entrypoint of ELF loaded to random address
Posted: Fri Nov 27, 2020 9:43 am
by nexos
What is your linker script, if you have one?
Re: How to find entrypoint of ELF loaded to random address
Posted: Fri Nov 27, 2020 9:58 am
by bzt
nexos wrote:What is your linker script, if you have one?
The OP has already posted the linker script here. It does not include the ELF header in the text segment.
Cheers,
bzt
Re: How to find entrypoint of ELF loaded to random address
Posted: Fri Nov 27, 2020 1:35 pm
by Octocontrabass
foliagecanine wrote:Well, look at the ELF header:
[...]
I'm pretty sure it's designed to act as an entrypoint.
How would that work with all of the dozens of CPU architectures ELF supports? (I'm pretty sure it's designed to be a human-readable identifier that's likely to survive being mangled by unintended transformations and unlikely to be mistaken for plain text even after being mangled.)
Either way, x86 machine code is pretty dense, so it's unlikely for the ELF header to contain any invalid opcodes.
bzt wrote:GRUB does not load the ELF file as-is, instead it iterates the program headers and loads only the segments from the middle of the file (see there's a "seek" in line 146 before the "read"). Go on, dump the memory at loaded module's address, you'll see only the text segment there and no ELF header at all.
I think you might be confusing GRUB modules with Multiboot modules. GRUB does not parse Multiboot modules, it just loads them into memory.
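Since GRUB just places the module as a blob, the kernel learns where it is from the Multiboot2 module tag (type 3), which carries mod_start and mod_end. A sketch of walking the boot information structure (whose address GRUB passes in EBX) to find the n-th module, using the Multiboot2 layout of total_size and reserved followed by 8-byte-aligned tags; mb2_find_module and the MB2_ macros are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MB2_TAG_TYPE_END    0u
#define MB2_TAG_TYPE_MODULE 3u

/* Finds the index-th module tag in a Multiboot2 boot information structure
   and returns its start and end addresses.
   Returns 0 on success, -1 if there is no such module. */
int mb2_find_module(const unsigned char *mbi, unsigned index,
                    uint32_t *mod_start, uint32_t *mod_end)
{
    uint32_t total_size;
    memcpy(&total_size, mbi, sizeof total_size);

    size_t at = 8; /* skip total_size and reserved */
    while (at + 8 <= total_size) {
        uint32_t type, size;
        memcpy(&type, mbi + at, 4);
        memcpy(&size, mbi + at + 4, 4);
        if (type == MB2_TAG_TYPE_END || size < 8 || size > total_size - at)
            break;
        /* Module tag: type, size, mod_start, mod_end, then a NUL-terminated cmdline. */
        if (type == MB2_TAG_TYPE_MODULE && size >= 16) {
            if (index == 0) {
                memcpy(mod_start, mbi + at + 8, 4);
                memcpy(mod_end, mbi + at + 12, 4);
                return 0;
            }
            index--;
        }
        at += (size + 7u) & ~7u; /* tags are padded to 8-byte boundaries */
    }
    return -1;
}
```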
It sounds like what's necessary here is a position-independent executable (and maybe some linker magic so the segments in the file are already positioned the way they need to be in memory). That way, you only need to calculate the offset between the virtual address specified in the headers and the actual virtual address where the segments were loaded, then add that offset to the entry point to find where to jump.
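The offset calculation described here could look roughly like this: take the lowest PT_LOAD p_vaddr as the link-time base, subtract it from the address where that segment actually landed, and add the difference (the load bias) to e_entry. This sketch assumes a 32-bit little-endian ELF whose segments are already laid out in memory with the same relative spacing as in the program headers, and it does not process any dynamic relocations (such as R_386_RELATIVE) that a real PIE may also need; elf32_relocated_entry is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned reads in host byte order (assumes a little-endian host such as x86). */
static uint32_t rd32(const unsigned char *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static uint16_t rd16(const unsigned char *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }

/* Given an ELF32 image and the address where its lowest PT_LOAD segment
   actually ended up (actual_base), stores the relocated entry point in *entry.
   Returns 0 on success, -1 on a malformed header or no PT_LOAD segment. */
int elf32_relocated_entry(const unsigned char *image, size_t size,
                          uint32_t actual_base, uint32_t *entry)
{
    if (size < 52)
        return -1;
    uint32_t e_entry = rd32(image + 24), phoff = rd32(image + 28);
    uint16_t phentsize = rd16(image + 42), phnum = rd16(image + 44);
    if (phentsize != 32)
        return -1;

    int found = 0;
    uint32_t lowest = 0;
    for (uint16_t i = 0; i < phnum; i++) {
        size_t at = (size_t)phoff + (size_t)i * 32;
        if (at > size || size - at < 32)
            return -1;
        if (rd32(image + at) != 1 /* PT_LOAD */)
            continue;
        uint32_t vaddr = rd32(image + at + 8);
        if (!found || vaddr < lowest) {
            lowest = vaddr;
            found = 1;
        }
    }
    if (!found)
        return -1;
    /* Load bias: how far the image moved from its link-time address.
       Unsigned wrap-around also handles images moved to a lower address. */
    uint32_t bias = actual_base - lowest;
    *entry = e_entry + bias;
    return 0;
}
```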