Upper half kernel
Posted: Tue Aug 28, 2007 8:25 pm
by vhg119
Hi everyone. I'm reading this tutorial (http://www.osdev.org/wiki/Higher_Half_With_GDT) and I'm trying to wrap my mind around how it works.
This is the part that I'm confused about:
start:
; here's the trick: we load a GDT with a base address
; of 0x40000000 for the code (0x08) and data (0x10) segments
lgdt [trickgdt]
mov ax, 0x10
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
; jump to the higher half kernel
jmp 0x08:higherhalf
higherhalf:
; from now the CPU will translate automatically every address
; by adding the base 0x40000000
mov esp, sys_stack ; set up a new stack for our kernel
call kmain ; jump to our C kernel ;)
The trickgdt entries place the base of the segment at 0x40000000.
The author loaded the trickgdt and used it before he even set up paging.
How is this possible? Wouldn't the processor try to access the physical address 0x40000000, which would result in some error?
Vince
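The trickgdt itself isn't quoted above. This is a minimal C sketch of how a base such as 0x40000000 is packed into a standard 8-byte x86 GDT descriptor. `gdt_entry` and `gdt_base` are hypothetical helper names. The flat 4 GiB limit, the access bytes (0x9A for code, 0x92 for data) and the flags (0xC) used in the example are the usual choices, assumed here rather than copied from the tutorial.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a legacy 8-byte x86 segment descriptor.
   base:   32-bit base, split into bits 15:0, 23:16 and 31:24
   limit:  20-bit limit, split into bits 15:0 and 19:16
   access: access byte (e.g. 0x9A = present, ring 0, code, readable)
   flags:  high nibble of byte 6 (e.g. 0xC = 4 KiB granularity, 32-bit) */
static uint64_t gdt_entry(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF && flags <= 0xF);
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);            /* bits  0-15: limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;     /* bits 16-39: base 23:0   */
    d |= (uint64_t)access << 40;                /* bits 40-47: access byte */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48; /* bits 48-51: limit 19:16 */
    d |= (uint64_t)(flags & 0xF) << 52;         /* bits 52-55: flags       */
    d |= (uint64_t)(base >> 24) << 56;          /* bits 56-63: base 31:24  */
    return d;
}

/* Reassemble the 32-bit base from a packed descriptor. */
static uint32_t gdt_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xFFFFFF) | ((d >> 56) & 0xFF) << 24);
}
```

For example, `gdt_entry(0x40000000, 0xFFFFF, 0x9A, 0xC)` yields 0x40CF9A000000FFFF. Only the top byte (0x40) carries the base, because the low 24 bits of 0x40000000 are zero.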
Posted: Wed Aug 29, 2007 12:26 am
by jnc100
vhg119 wrote: "Wouldn't the processor try to access the physical address 0x40000000, which would result in some error?"
Yes, if you ever try to access offset 0x0 from your code. A higher half kernel (in this case it's actually the upper 1 GB, so "higher quarter" is probably a more appropriate term) is linked so that its code and data start at 0xC0000000. In a segment with base 0x40000000, offset 0xC0000000 corresponds to linear address 0x0 (which is also the physical address when paging is off), because 0x40000000 + 0xC0000000 = 0x100000000 overflows 32 bits and wraps around to 0x0.
Later on, of course, you accomplish the same thing with paging and can set your segment bases back to 0.
Regards,
John.
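The wraparound John describes is ordinary modulo-2^32 arithmetic, so it can be checked with unsigned 32-bit integers in C. This is a minimal sketch. `linear_address` and `trick_base` are hypothetical names, and the example addresses (kernel linked at 0xC0100000, loaded at 0x00100000) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Segmentation: linear = segment base + offset, truncated to 32 bits.
   With paging disabled, the linear address is the physical address. */
static uint32_t linear_address(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset; /* unsigned 32-bit addition wraps modulo 2^32 */
}

/* Base that makes link address 'link_addr' land on physical 'load_addr':
   load = base + link (mod 2^32), so base = load - link (mod 2^32). */
static uint32_t trick_base(uint32_t load_addr, uint32_t link_addr)
{
    return load_addr - link_addr;
}
```

Here `trick_base(0x00100000, 0xC0100000)` gives 0x40000000, and `linear_address(0x40000000, 0xC0100000)` gives back 0x00100000.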
Posted: Wed Aug 29, 2007 8:37 am
by JAAman
jnc100 is correct, though I have to say I don't like the 'GDT trick' method, as it is a waste (it's unnecessary). The only reason to use it is to reduce the amount of ASM code you have to write, but it generally requires more ASM than skipping it altogether.
vhg119 wrote: "How is this possible? Wouldn't the processor try to access the physical address 0x40000000, which would result in some error?"
You will never get an error from trying to read or write memory that doesn't exist. This is important to understand: you will not get valid results, but you won't get an error either. However, if you miss RAM you may hit hardware, which can (although rarely) permanently destroy your computer. So make sure you know exactly where your RAM is located, and don't allow anything to write to areas when you don't know what is located there. This is why I don't allow my OS to use more than 64 MB (less on older systems) on any system that doesn't support E820: all the other memory detection methods are allowed to return misleading results under certain rare conditions.
Posted: Wed Aug 29, 2007 9:42 am
by vhg119
I guess my question is: how would the code in that tutorial work?
When the code does this:
jmp 0x08:higherhalf
and the descriptor at 0x08 indicates that the base is at 0x40000000, wouldn't he effectively be jumping to 0x40000000:higherhalf?
Looking at the code, it doesn't look like he copied the kernel to 0x40000000, so there isn't anything at that address.
Furthermore, paging isn't enabled yet, so 0x40000000 is mapped directly to the identical physical address.
And since 'higherhalf' is in the same module as start and I don't see any 'org' statements anywhere, I don't think he used a linker trick.
Posted: Wed Aug 29, 2007 11:15 am
by jnc100
Look at the linker script again...
Regards,
John.
Posted: Wed Aug 29, 2007 12:41 pm
by vhg119
I'm looking at it again right now.
If I'm interpreting this correctly, and I probably am not, he sets the VMA of the .text section to 0xC0000000. Then he sets the LMA of the .text section to 0x100000 + sizeOfSetupSection.
I'm reading this to learn more about LD:
http://jamesthornton.com/redhat/linux/E ... tions.html
However, I couldn't find an explanation of what the VMA and the LMA are that was simple enough for me to understand.
Could someone please explain?
Posted: Wed Aug 29, 2007 2:06 pm
by vhg119
I just saw that the output format is ELF. I was under the impression that kernels needed to be in a flat binary format. Otherwise, what is loading the kernel described by the ELF header?
Posted: Wed Aug 29, 2007 2:08 pm
by AJ
Posted: Wed Aug 29, 2007 2:17 pm
by vhg119
Crap. That makes sense. I'm using my own bootloader though.
Is there a way to make a Higher Half kernel using binary output file format?
Posted: Thu Aug 30, 2007 6:09 am
by AJ
I think that if you want to do that, you will need to set up the GDT trick in your boot loader (or why not just set up paging?), then jump to your higher-half kernel.
IIRC, if you try to make a flat binary with bits linked to run at (say) 0x100000 and 0xC0000000, the assembler will attempt to pad the space in between (all 3GB worth) with zeros.
Cheers,
Adam
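To put a number on AJ's estimate: a flat binary has no headers, so byte N of the file is simply loaded at (lowest load address + N), and the file has to cover every address in between. This is a minimal C sketch of that arithmetic. `flat_binary_size` is a hypothetical helper, and the 4 KiB size of the high section in the example is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Size of a flat binary whose first byte is loaded at 'lowest_lma' and whose
   last section starts at 'highest_lma' and is 'highest_size' bytes long.
   Everything in between must be present in the file, even if it is padding. */
static uint64_t flat_binary_size(uint32_t lowest_lma, uint32_t highest_lma, uint32_t highest_size)
{
    assert(highest_lma >= lowest_lma);
    return (uint64_t)highest_lma + highest_size - lowest_lma;
}
```

`flat_binary_size(0x100000, 0xC0000000, 0x1000)` is 0xBFF01000 bytes (3,220,180,992), roughly 3 GiB and almost all of it padding.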
Posted: Thu Aug 30, 2007 9:44 am
by vhg119
AJ wrote: "if you try to make a flat binary with bits linked to run at (say) 0x100000 and 0xC0000000, the assembler will attempt to pad the space in between (all 3GB worth) with zeros."
That sucks.
Posted: Thu Aug 30, 2007 9:47 am
by vhg119
I think I'm gonna give up on the higher half kernel for now and just make it a lower half kernel.
I just don't know enough about LD and the linking process and theories yet.
Posted: Thu Aug 30, 2007 5:57 pm
by AndrewAPrice
vhg119 wrote: "AJ wrote: 'if you try to make a flat binary with bits linked to run at (say) 0x100000 and 0xC0000000, the assembler will attempt to pad the space in between (all 3GB worth) with zeros.'
That sucks."
Use ELF
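If you keep your own bootloader and switch to ELF, the bootloader takes over the job that a Multiboot loader such as GRUB would otherwise do. It reads the program headers and copies each loadable segment to its physical load address (p_paddr, which ld fills in from the LMA). This is a minimal C sketch that assumes the whole kernel file is already in memory at `image`, and uses a plain buffer `phys` as a stand-in for physical memory starting at address 0. `load_elf32`, `elf32_ehdr`, `elf32_phdr` and `ELF_PT_LOAD` are hypothetical names. Only PT_LOAD segments are handled, and validation is limited to the magic number and bounds checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF32 file header and program header, field order as in the ELF spec. */
typedef struct {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} elf32_ehdr;

typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align;
} elf32_phdr;

#define ELF_PT_LOAD 1u

/* Copy each PT_LOAD segment to its physical load address (p_paddr) and zero
   the part not backed by the file (p_memsz > p_filesz, e.g. .bss).
   'phys' stands in for physical memory starting at address 0.
   Returns the entry point (e_entry), or 0 if the image is rejected. */
static uint32_t load_elf32(const uint8_t *image, size_t image_size,
                           uint8_t *phys, size_t phys_size)
{
    assert(image != NULL && phys != NULL);
    elf32_ehdr eh;
    if (image_size < sizeof eh)
        return 0;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, "\x7f" "ELF", 4) != 0 || eh.e_phentsize < sizeof(elf32_phdr))
        return 0;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        uint64_t ph_off = (uint64_t)eh.e_phoff + (uint64_t)i * eh.e_phentsize;
        elf32_phdr ph;
        if (ph_off + sizeof ph > image_size)
            return 0;
        memcpy(&ph, image + ph_off, sizeof ph);
        if (ph.p_type != ELF_PT_LOAD)
            continue;
        if (ph.p_filesz > ph.p_memsz
            || (uint64_t)ph.p_offset + ph.p_filesz > image_size
            || (uint64_t)ph.p_paddr + ph.p_memsz > phys_size)
            return 0;
        memcpy(phys + ph.p_paddr, image + ph.p_offset, ph.p_filesz);
        memset(phys + ph.p_paddr + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }
    return eh.e_entry;
}
```

Loading by p_paddr rather than p_vaddr is what lets a higher-half kernel be linked at 0xC0000000 or above while its bytes still land in low physical memory.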

Posted: Fri Aug 31, 2007 7:28 pm
by frank
AJ wrote: "I think that if you want to do that, you will need to set up the GDT trick in your boot loader (or why not just set up paging?), then jump to your higher-half kernel.
IIRC, if you try to make a flat binary with bits linked to run at (say) 0x100000 and 0xC0000000, the assembler will attempt to pad the space in between (all 3GB worth) with zeros."
Not true. I use a binary kernel. The first 4 KB of my kernel is an assembly stub that sets up protected mode and paging and then jumps to the rest of the kernel. Here is my linker script:
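/* A flat binary has no header, so the entry address is not recorded in the output. */
ENTRY( kernel_start )
OUTPUT_FORMAT( binary )
SECTIONS
{
    /* The stub runs where it is loaded: VMA = LMA = 0x1000. */
    . = 0x1000;
    .start :
    {
        *(.start)
        . = ALIGN( 4096 );   /* pad the stub to a whole page */
    }
    /* VMA (link address) in the higher half; AT() puts the LMA
       (position in the file) directly after .start, so no gap is emitted. */
    .text 0xD0000000 + SIZEOF( .start ) : AT( ADDR( .start ) + SIZEOF( .start ) )
    {
        *(.text)
        . = ALIGN( 4096 );
    }
    /* .data follows .text in both the VMA and the LMA address spaces. */
    .data ADDR( .text ) + SIZEOF( .text ) : AT( LOADADDR( .text ) + SIZEOF( .text ) )
    {
        *(.data)
    }
    . = ALIGN( 4096 );
    /* .bss has no contents in the file; it has to be zeroed at run time. */
    .bss ALIGN( ADDR( .data ) + SIZEOF( .data ), 4096 ) : AT( LOADADDR( .data ) + SIZEOF( .data ) )
    {
        *(.bss .bss.* .gnu.linkonce.b.*)
        *(COMMON)
    }
}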
As I see it, the VMA is the address the code thinks it is going to run at, and the LMA is where it is actually loaded. Most of the time they are the same. You only really need to worry about the LMA when you need to influence where sections are placed in the output, which is useful when you are trying to avoid a 3 GB file: in a flat binary, a section's position in the file follows its LMA, not its VMA.
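frank's description can be checked with a little arithmetic. This is a minimal C sketch that reproduces how his script places .text, assuming .start begins at 0x1000 as in the script. `place_text`, `align_up` and `placement` are hypothetical names, and the 0x234-byte stub size used in the example is made up.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t vma; /* address the code is linked to run at */
    uint32_t lma; /* address it is loaded at (its position in the flat binary) */
} placement;

/* Round x up to a multiple of a (a must be a power of two). */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Mirror of the script: .start sits at start_addr (VMA = LMA) and is padded to a
   page boundary by ALIGN(4096); .text then gets VMA 0xD0000000 + SIZEOF(.start)
   and LMA ADDR(.start) + SIZEOF(.start). */
static placement place_text(uint32_t start_addr, uint32_t start_contents)
{
    uint32_t start_size = align_up(start_addr + start_contents, 4096) - start_addr;
    placement p = { 0xD0000000u + start_size, start_addr + start_size };
    return p;
}
```

With a 0x234-byte stub at 0x1000, .start is padded to 0x1000 bytes, so .text is linked at 0xD0001000 but loaded at 0x2000, at file offset 0x1000 directly after the stub. The difference VMA - LMA is always 0xD0000000 - 0x1000 = 0xCFFFF000, whatever the stub size. The stub's page tables have to undo exactly that difference.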
Posted: Sat Sep 01, 2007 11:57 am
by vhg119
frank wrote: ".text 0xD0000000 + SIZEOF( .start ) : AT( ADDR( .start ) + SIZEOF( .start ) )"
Thanks, Frank. I'm still trying to learn more about LD. Does the .text line above basically say:
Resolve addresses in this section as if it began at 0xD0000000 + SIZEOF(.start), BUT place the .text section in the output right after .start? In other words, don't pad the space between .start and .text?
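That reading matches what AT() does. It sets only the load address (LMA), and symbol addresses inside .text are still resolved against the VMA given before the colon. The other half of the arrangement is the stub. Before jumping into .text, it has to map those virtual addresses onto the physical pages where the bytes were loaded, and it must keep its own low pages identity-mapped while it enables paging. This is a minimal C sketch of the index arithmetic for 32-bit paging without PAE and with 4 KiB pages. `pde_index`, `pte_index`, `make_pte` and `map_range` are hypothetical names, and the example addresses assume frank's layout with a one-page stub at 0x1000.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit paging without PAE, 4 KiB pages:
   bits 31-22 index the page directory, bits 21-12 index a page table,
   bits 11-0 are the offset within the page. */
static uint32_t pde_index(uint32_t vaddr) { return vaddr >> 22; }
static uint32_t pte_index(uint32_t vaddr) { return (vaddr >> 12) & 0x3FFu; }

/* Page-table entry for a 4 KiB-aligned physical frame: present (bit 0) | writable (bit 1). */
static uint32_t make_pte(uint32_t frame) { return (frame & 0xFFFFF000u) | 0x3u; }

/* Map 'pages' consecutive virtual pages starting at 'vaddr' to consecutive
   physical frames starting at 'paddr', all within one page table
   (the range must not cross a 4 MiB boundary). */
static void map_range(uint32_t page_table[1024], uint32_t vaddr, uint32_t paddr, uint32_t pages)
{
    uint32_t first = pte_index(vaddr);
    assert(first + pages <= 1024);
    for (uint32_t i = 0; i < pages; i++)
        page_table[first + i] = make_pte(paddr + i * 4096u);
}
```

For .text linked at 0xD0001000 and loaded at 0x2000, `pde_index(0xD0001000)` is 0x340 and `pte_index(0xD0001000)` is 1. The stub would therefore point page-directory entry 0x340 at a page table filled by `map_range(table, 0xD0001000, 0x2000, n)`, where n is the number of pages the kernel occupies.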