GCC cross-compiler mcmodel=large?

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
mariuszp
Member
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

GCC cross-compiler mcmodel=large?

Post by mariuszp »

I have made a cross-compiler targeting my OS, the target string being "x86_64-glidix". I have a few problems with it though:

1) It outputs 2MB pages (i.e. the p_align field of the program headers is 2MB), even though my elf_x86_64_glidix.sh file in the LD source code says:

Code: Select all

MAXPAGESIZE="0x1000"
COMMONPAGESIZE="0x1000"
2) When I link object files compiled by GCC, the linker reports "relocation truncated to fit" errors against .rodata, unless I pass the -mcmodel=large option to GCC. This will cause problems when porting software.

Any pointers to what might be wrong?
User avatar
AndrewAPrice
Member
Member
Posts: 2303
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: GCC cross-compiler mcmodel=large?

Post by AndrewAPrice »

Are you trying to build a higher half or a lower half kernel? What compiler errors are you getting?
My OS is Perception.
mariuszp
Member
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: GCC cross-compiler mcmodel=large?

Post by mariuszp »

It is a lower half kernel, and the kernel compiles without problems. The problem is with compiling userspace applications. The compiler outputs no errors, but when I try to link the '.o' files to create an executable, it says that it has to truncate relocations against '.rodata'.

However, when those files are produced with -mcmodel=large passed to GCC, they link without errors. Is it possible to edit the source of GCC so that it always uses that option? Am I missing something?
Icee
Member
Member
Posts: 100
Joined: Wed Jan 08, 2014 8:41 am
Location: Moscow, Russia

Re: GCC cross-compiler mcmodel=large?

Post by Icee »

What the diagnostic means is that some of your symbols wind up above 2G. Enabling the large model makes it possible to address these symbols, but this has its implications: addresses have to be loaded into registers prior to access, because no instruction (with the exception of one encoding of MOV) can use a 64-bit displacement.
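To illustrate Icee's point, here is a hand-written sketch of the two addressing patterns in AT&T syntax (this is not actual GCC output, and the symbol name msg is made up):

```
        # small code model: the symbol is reachable with a
        # 32-bit RIP-relative displacement
        movl    msg(%rip), %eax

        # large code model: the 64-bit absolute address must be
        # materialised in a register before the access
        movabsq $msg, %rax
        movl    (%rax), %eax
```

The extra MOVABS is exactly the cost described above: one more instruction and one more register tied up per symbol access.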
mariuszp
Member
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: GCC cross-compiler mcmodel=large?

Post by mariuszp »

Oh yes, the TEXT_START_ADDR is at the 523 GB mark. So yes, the mcmodel needs to be large, but can I modify the linker/compiler (e.g. with emulparams) to always use the large model?
Icee
Member
Member
Posts: 100
Joined: Wed Jan 08, 2014 8:41 am
Location: Moscow, Russia

Re: GCC cross-compiler mcmodel=large?

Post by Icee »

Well, I suspect that changing the default in gcc/config/i386/i386.opt should suffice, but I would strongly discourage you from following this design of yours. I fail to see any advantage in the memory layout your system uses, while there are at least two problems with the large cmodel: (1) you will need more instructions in your binaries and, therefore, more instructions going through the CPU frontend, which alone results in measurably larger programs and somewhat slower code; and (2) register pressure will measurably increase, which is a pain on x86-64, where the optimiser has at most 16 GPRs to work with (fewer in practice, since RSP, RBP, and sometimes RBX are special).

TL;DR: don't do it.
mariuszp
Member
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: GCC cross-compiler mcmodel=large?

Post by mariuszp »

Icee wrote:Well, I suspect that changing the default in gcc/config/i386/i386.opt should suffice, but I would strongly discourage you from following this design of yours. I fail to see any advantage in the memory layout your system uses, while there are at least two problems with the large cmodel: (1) you will need more instructions in your binaries and, therefore, more instructions going through the CPU frontend, which alone results in measurably larger programs and somewhat slower code; and (2) register pressure will measurably increase, which is a pain on x86-64, where the optimiser has at most 16 GPRs to work with (fewer in practice, since RSP, RBP, and sometimes RBX are special).

TL;DR: don't do it.
Oh, thanks for the advice. So where should my kernel preferably be? My memory map is currently:

1) The first 512GB reserved for kernel code, data and bss.
2) The next 512GB is for userspace code, data, bss, etc.
3) The next 512GB is for the kernel heap.

Is it good enough to swap the userspace and kernelspace area or would there still be downsides to the layout?
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: GCC cross-compiler mcmodel=large?

Post by Combuster »

Within a 32-bit range, you have 4GB available. You'll want to have all symbols pointing within that region - that means the heap and stack can be anywhere since binaries store no pointers there. You'll also have to assign this 4GB to both the kernel and userspace, otherwise one of the two ends up with the slower code anyway.

The net result is that often 2G is given to the kernel - for everything that's part of its binaries, and the other 2G to userspace. Any allocations by either can then happen in the remainder of userspace. By the time an application becomes large enough that all code and data sections (libraries included, but not the heap) together are larger than 2G, you already have an exceptional case, and only in that case would it make sense to use mcmodel=large.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
mariuszp
Member
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: GCC cross-compiler mcmodel=large?

Post by mariuszp »

Combuster wrote:Within a 32-bit range, you have 4GB available. You'll want to have all symbols pointing within that region - that means the heap and stack can be anywhere since binaries store no pointers there. You'll also have to assign this 4GB to both the kernel and userspace, otherwise one of the two ends up with the slower code anyway.

The net result is that often 2G is given to the kernel - for everything that's part of it's binaries, and the other 2G to userspace. Any allocations by either can then happen in the remainder of userspace. By the time an application becomes large enough that all code and data sections (libraries included, but not the heap) together are larger than 2G, you already have an exceptional case, and in only that case would it make sense to use mcmodel=large.
Forcing everything below 4GB would make the use of 64-bit architecture pointless.
User avatar
Owen
Member
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom
Contact:

Re: GCC cross-compiler mcmodel=large?

Post by Owen »

mariuszp wrote:
Combuster wrote:Within a 32-bit range, you have 4GB available. You'll want to have all symbols pointing within that region - that means the heap and stack can be anywhere since binaries store no pointers there. You'll also have to assign this 4GB to both the kernel and userspace, otherwise one of the two ends up with the slower code anyway.

The net result is that often 2G is given to the kernel - for everything that's part of it's binaries, and the other 2G to userspace. Any allocations by either can then happen in the remainder of userspace. By the time an application becomes large enough that all code and data sections (libraries included, but not the heap) together are larger than 2G, you already have an exceptional case, and in only that case would it make sense to use mcmodel=large.
Forcing everything below 4GB would make the use of 64-bit architecture pointless.
You put the kernel at -2GB (-mcmodel=kernel). You put any statically linked userspace binaries in the bottom 2GB.

You can stick anything else in the bit in between. Traditionally, the kernel stuff all goes in the top half and the userspace stuff in the bottom half. There are exceptions (e.g. Solaris); they tend to be bad ideas (e.g. the fact that Solaris hands out userspace addresses in the upper half is why a 64-bit Firefox build isn't possible on that platform).
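A minimal sketch of the layout Owen describes, as a GNU ld linker script fragment (the file name kernel.ld and the section list are hypothetical; a real script needs more, e.g. an entry point and alignment):

```
/* kernel.ld (hypothetical): place the kernel in the top 2 GiB so that
   code built with -mcmodel=kernel can reach every symbol with a
   sign-extended 32-bit displacement */
SECTIONS
{
    . = 0xFFFFFFFF80000000;

    .text   : { *(.text*) }
    .rodata : { *(.rodata*) }
    .data   : { *(.data*) }
    .bss    : { *(.bss*) *(COMMON) }
}
```

Userspace binaries linked at a low default address (as with a stock x86_64 ELF target) then stay within small-model reach without any special options.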
User avatar
eryjus
Member
Member
Posts: 286
Joined: Fri Oct 21, 2011 9:47 pm
Libera.chat IRC: eryjus
Location: Tustin, CA USA

Re: GCC cross-compiler mcmodel=large?

Post by eryjus »

Owen wrote:You put the kernel at -2GB (-mcmodel=kernel).
I would encourage you to also think about where everything goes and make a memory map for your kernel before you start. I made the mistake of not doing this step and had a couple of false starts. I also had to do a major memory location do-over once I started considering recursive mapping in the paging tables.

I would take the time to make a memory map so you know where everything will live. It might be a living document for a while, but it's better to start with a plan.
Adam

The name is fitting: Century Hobby OS -- At this rate, it's gonna take me that long!
Read about my mistakes and missteps with this iteration: Journal

"Sometimes things just don't make sense until you figure them out." -- Phil Stahlheber
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: GCC cross-compiler mcmodel=large?

Post by Brendan »

Hi,
eryjus wrote:I would take the time to make a memory map so you know where everything will live. It might be a living document for a while, but it's better to start with a plan.
I agree. It would begin something like this:

0x0000000000000000 to 0x000000007FFFFFFF = area for "2 GiB or smaller" executable files
0x0000000080000000 to 0x00007FFFFFFFFFFF = area for dynamically allocated memory (heap, stack, etc)
0x0000800000000000 to 0xFFFF7FFFFFFFFFFF = non-canonical hole (unusable)
0xFFFF800000000000 to 0xFFFFFFFF7FFFFFFF = area for dynamically allocated memory (heap, stack, etc)
0xFFFFFFFF80000000 to 0xFFFFFFFFFFFFFFFF = area for "2 GiB or smaller" kernel files

Of course then you'd split up each area into specific pieces (text, data, bss; shared library area, whatever).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: GCC cross-compiler mcmodel=large?

Post by Combuster »

mariuszp wrote:Forcing everything below 4GB
Before that, I wrote:have all symbols pointing within that region - that means the heap and stack can be anywhere
:wink:
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
xenos
Member
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia
Contact:

Re: GCC cross-compiler mcmodel=large?

Post by xenos »

Brendan wrote:Of course then you'd split up each area into specific pieces (text, data, bss; shared library area, whatever).
I'm quite curious how exactly you would split these things up. Do you have any particular locations for user mode libraries? Or page tables, PL0-stacks, other kernel relevant stuff?
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
mariuszp
Member
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: GCC cross-compiler mcmodel=large?

Post by mariuszp »

OK, I tried moving my kernel to higher memory (0xFFFF800000000000+) and I have a very strange TSS problem. The structure of the TSS segment (in the GDT) is:

Code: Select all

	; The TSS
	.TSS_limitLow: dw 0
	.TSS_baseLow: dw 0
	.TSS_baseMiddle: db 0
	.TSS_Access: db 11101001b
	.TSS_limitHigh: dw 0
	.TSS_baseMiddleHigh: db 0
	.TSS_baseHigh: dd 0
	dd 0
Which I got from the Intel manuals. The base and limit fields are filled in later by some assembly code. After loading the GDT with the TSS segment filled in, I trigger a Bochs magic breakpoint, type "info gdt", and it says:

Code: Select all

GDT[0x00]=??? descriptor hi=0x00000000, lo=0x00000000
GDT[0x01]=Code segment, base=0x00000000, limit=0x00000000, Execute-Only, Non-Conforming, Accessed, 64-bit
GDT[0x02]=Data segment, base=0x00000000, limit=0x00000000, Read-Only, Accessed
GDT[0x03]=Code segment, base=0x0f000000, limit=0x0000ffff, Execute-Only, Non-Conforming, 64-bit
GDT[0x04]=Data segment, base=0x00000000, limit=0x00000000, Read/Write
GDT[0x05]=32-Bit TSS (Available) at 0x001001b0, length 0x000c0
GDT[0x06]=??? descriptor hi=0x000000ff, lo=0xff800000
You can list individual entries with 'info gdt [NUM]' or groups with 'info gdt [NUM] [NUM]'
I don't understand why it thinks the TSS segment is 32-bit: according to the Intel manuals, the type I use (1001) is automatically considered 64-bit in 64-bit mode. However, Bochs shows the TSS segment as 32-bit, and the high 32 bits of my TSS address become part of segment 6, which is undefined. The TSS base therefore becomes invalid, and then it says the TSS base is non-canonical! What am I doing wrong here?