Page 1 of 2

questions on high half kernel

Posted: Thu May 11, 2006 3:37 am
by asmboozer
in the linker script file of http://www.osdev.org/osfaq2/index.php/HigherHalfWithGdt
1,
OUTPUT_FORMAT("elf32-i386")
ENTRY(start)

SECTIONS
{
. = 0x100000;

.setup :
{
*(.setup)
}

. += 0xC0000000;

.text : AT(ADDR(.text) - 0xC0000000)
{
*(.text)
}

.data ALIGN (4096) : AT(ADDR(.data) - 0xC0000000)
{
*(.data)
*(.rodata*)
}

.bss ALIGN (4096) : AT(ADDR(.bss) - 0xC0000000)
{
*(COMMON*)
*(.bss*)
}
}
when will the eip be 0x c0 10 00 00?

may i subsititute

Code: Select all

.text : AT(0x10 00 00)
with

Code: Select all

.text : AT(ADDR(.text) - 0xC0000000)
what's usage of .text here?
tell the loader loads the kernel at 0x10 00 00?



I think the code is loaded before the paging has been enabled.
thus i guess the code is loaded at 0x10 00 00.





2,why the first pagedir pagedir[0] and pagedir[KERNEL_BASE>>22] should be set? i think only the pagedir indexed by kernel_base_addr >> 22 should be set.


3,
when CS:(E)IP adds the base address in the selector and the eip(offset)? when CS:(E)IP generate the logical address as
CS<< 4 + (E)IP?

I see the documention from intel's , there has two case, one in segmet model , another is real address mode model.

thanks for any suggestion.

Re:questions on high half kernel

Posted: Thu May 11, 2006 6:57 am
by YeXo
1. The linker script is not for the loader, but for the linker ;) It tells the linker to link your kernel to be loaded at that address.

3. I thought it generates the logical address by just adding eip to cs. If you're still in 16 bit mode, it generates the logical address by doing cs<<4+ip

Re:questions on high half kernel

Posted: Thu May 11, 2006 8:37 am
by JAAman
you have to load 2 entries in the page tables because the code is executing at the lower address until you jump to the higher address, therefore both must be paged in (at the same physical address) because the code is run at both addresses (after your jump, you can then remove the lower mapping)

Re:questions on high half kernel

Posted: Thu May 11, 2006 8:55 am
by asmboozer
JAAman wrote: you have to load 2 entries in the page tables because the code is executing at the lower address until you jump to the higher address, therefore both must be paged in (at the same physical address) because the code is run at both addresses (after your jump, you can then remove the lower mapping)

Code: Select all

[BITS 32]      ; 32 bit code
[global start] ; make 'start' function global
[extern kmain] ; our C kernel main

; Multiboot constants
MULTIBOOT_PAGE_ALIGN    equ 1<<0
MULTIBOOT_MEMORY_INFO   equ 1<<1
MULTIBOOT_HEADER_MAGIC  equ 0x1BADB002
MULTIBOOT_HEADER_FLAGS  equ MULTIBOOT_PAGE_ALIGN | MULTIBOOT_MEMORY_INFO
MULTIBOOT_CHECKSUM      equ -(MULTIBOOT_HEADER_MAGIC + MULTIBOOT_HEADER_FLAGS)

; Multiboot header (needed to boot from GRUB)
ALIGN 4
multiboot_header:
        dd MULTIBOOT_HEADER_MAGIC
        dd MULTIBOOT_HEADER_FLAGS
        dd MULTIBOOT_CHECKSUM

; the kernel entry point
start:
        ; here's the trick: we load a GDT with a base address
        ; of 0x40000000 for the code (0x08) and data (0x10) segments
        lgdt [trickgdt]
        mov ax, 0x10
        mov ds, ax
        mov es, ax
        mov fs, ax
        mov gs, ax
        mov ss, ax

        ; jump to the higher half kernel
        jmp 0x08:higherhalf           
<<<<<<< before paging enabled ,it jumps into higher half address >>>>>>>>

Code: Select all

higherhalf:
        ; from now the CPU will translate automatically every address
        ; by adding the base 0x40000000

        mov esp, sys_stack ; set up a new stack for our kernel

        call kmain ; jump to our C kernel ;)

        ; just a simple protection...
        jmp $

[global gdt_flush] ; make 'gdt_flush' accessible from C code
[extern gp]        ; tells the assembler to look at C code for 'gp'

; this function does the same thing of the 'start' one, this time with
; the real GDT
gdt_flush:
        lgdt [gp]
        mov ax, 0x10
        mov ds, ax
        mov es, ax
        mov fs, ax
        mov gs, ax
        mov ss, ax
        jmp 0x08:flush2

flush2:
        ret

[section .setup] ; tells the assembler to include this data in the '.setup' section

trickgdt:
        dw gdt_end - gdt - 1 ; size of the GDT
        dd gdt ; linear address of GDT

gdt:
        dd 0, 0                                                 ; null gate
        db 0xFF, 0xFF, 0, 0, 0, 10011010b, 11001111b, 0x40      ; code selector 0x08: base 0x40000000, limit 0xFFFFFFFF, type 0x9A, granularity 0xCF
        db 0xFF, 0xFF, 0, 0, 0, 10010010b, 11001111b, 0x40      ; data selector 0x10: base 0x40000000, limit 0xFFFFFFFF, type 0x92, granularity 0xCF

gdt_end:

[section .bss]

resb 0x1000
sys_stack:
        ; our kernel stack

after,
in the kmain function, it calls init_paging,

Code: Select all

void kmain()
{
        // FIRST enable paging and THEN load the real GDT!
        init_paging();
        gdt_install();

        // We clear the screen and print our welcome message
        cls();
        helloworld();

        // Hang up the computer
        for (;;);
}
so I think the first page directory is not useful.

Re:questions on high half kernel

Posted: Thu May 11, 2006 9:02 am
by asmboozer
YeXo wrote: 1. The linker script is not for the loader, but for the linker ;) It tells the linker to link your kernel to be loaded at that address.
what's the difference between the first . = 0x00 10 00 00 and . += 0x C0 00 00 00?

I could not understand the point of the second statement here.

3. I thought it generates the logical address by just adding eip to cs. If you're still in 16 bit mode, it generates the logical address by doing cs<<4+ip
still not very clear about it. thanks

Re:questions on high half kernel

Posted: Thu May 11, 2006 10:15 am
by YeXo
The first .=0x00100000 makes sure the linker links that code to run at address 1mb (or 0x00100000). The second .=+0xC0000000 makes the linker link the code after there to run at the address before + 3gb. So the meaning of the second .+=0xC0000000 is to let the kernel run at address 3gb+1mb+the length of startup code.

Re:questions on high half kernel

Posted: Thu May 11, 2006 11:02 am
by asmboozer
YeXo wrote: The first .=0x00100000 makes sure the linker links that code to run at address 1mb (or 0x00100000). The second .=+0xC0000000 makes the linker link the code after there to run at the address before + 3gb. So the meaning of the second .+=0xC0000000 is to let the kernel run at address 3gb+1mb+the length of startup code.

may i subsititute

Code: Select all

.text : AT(0x10 00 00)
 


with

Code: Select all

.text : AT(ADDR(.text) - 0xC0000000)
 


what's usage of .text here?
does it tell the loader loads the code section at 0x10 00 00?

if it doest tell the loader loads the text section at 0xc0 00 00 00, before the paging enabled, what will happen if your pc doesn't installed so many amount of memory?

Re:questions on high half kernel

Posted: Thu May 11, 2006 1:20 pm
by bkilgore
It's not really about telling the loader anything. It's the linker. It translates all of the addresses so that the code is suitable to run at the specified address. Basically, the linker script is saying that the code in the startup section is going to run at the 1MB mark, and that the rest of the kernel is going to run at an address above 0xc0000000. Thus the startup code can run before paging is enabled, and the rest of the code can run at a high virtual address.

Re:questions on high half kernel

Posted: Fri May 12, 2006 4:42 am
by asmboozer
bkilgore wrote: It's not really about telling the loader anything. It's the linker. It translates all of the addresses so that the code is suitable to run at the specified address. Basically, the linker script is saying that the code in the startup section is going to run at the 1MB mark, and that the rest of the kernel is going to run at an address above 0xc0000000. Thus the startup code can run before paging is enabled, and the rest of the code can run at a high virtual address.
why the setup section only contains the data structure:
[section .setup] ; tells the assembler to include this data in the '.setup' section

trickgdt:
dw gdt_end - gdt - 1 ; size of the GDT
dd gdt ; linear address of GDT

gdt:
dd 0, 0 ; null gate
db 0xFF, 0xFF, 0, 0, 0, 10011010b, 11001111b, 0x40 ; code selector 0x08: base 0x40000000, limit 0xFFFFFFFF, type 0x9A, granularity 0xCF
db 0xFF, 0xFF, 0, 0, 0, 10010010b, 11001111b, 0x40 ; data selector 0x10: base 0x40000000, limit 0xFFFFFFFF, type 0x92, granularity 0xCF

gdt_end:

[section .bss]
if it is as you said, I think the code

Code: Select all

start:
        ; here's the trick: we load a GDT with a base address
        ; of 0x40000000 for the code (0x08) and data (0x10) segments
        lgdt [trickgdt]
        mov ax, 0x10
        mov ds, ax
        mov es, ax
        mov fs, ax
        mov gs, ax
        mov ss, ax

        ; jump to the higher half kernel
        jmp 0x08:higherhalf
should be in this setup section. not the normal text section.

Re:questions on high half kernel

Posted: Fri May 12, 2006 5:00 am
by Solar
??

Your first code block is in section .setup because you told the assembler so ([tt][section .setup][/tt]). Your second code block is in section .text because that's the default and there's nothing telling the assembler otherwise.

Have you understood that the linker script is not telling the loader anything? The assembler turns your ASM source into object code, in which symbols (like [tt]start[/tt] or [tt]gdt[/tt]) don't have an absolute address yet. The linker joins all the object files you feed to it, and gives the final, absolute addresses to the symbols. What these addresses are is determined by the linker script.

This still has nothing to do with the code that, upon booting, loads the binary into memory. In fact, if the linker didn't link the binary in the way the loader expects, execution will fail because a jump to symbol [tt]start[/tt] will not end up in the intended position.

One more hint: There is a social technique called "mirroring". If someone tells you something you didn't know, repeat what you understood using your own words. This tells the other person whether you understood correctly, and gives the chance to nail down misunderstandings.

Re:questions on high half kernel

Posted: Fri May 12, 2006 6:40 am
by asmboozer
Solar wrote: ??

Your first code block is in section .setup because you told the assembler so ([tt][section .setup][/tt]). Your second code block is in section .text because that's the default and there's nothing telling the assembler otherwise.

Have you understood that the linker script is not telling the loader anything? The assembler turns your ASM source into object code, in which symbols (like [tt]start[/tt] or [tt]gdt[/tt]) don't have an absolute address yet. The linker joins all the object files you feed to it, and gives the final, absolute addresses to the symbols. What these addresses are is determined by the linker script.

This still has nothing to do with the code that, upon booting, loads the binary into memory. In fact, if the linker didn't link the binary in the way the loader expects, execution will fail because a jump to symbol [tt]start[/tt] will not end up in the intended position.

One more hint: There is a social technique called "mirroring". If someone tells you something you didn't know, repeat what you understood using your own words. This tells the other person whether you understood correctly, and gives the chance to nail down misunderstandings.
I haven't read the docs on linker script, so I have tons of question of it.
where could I have the docs on the linker script?

previously I assume the symbol absolute addresses are 0x7c00+offset_of_that_symbol or the 0x10 00 00 + offset_from_0x100000.

if so, i wonder whether the eip begin at 0x100000 or at 0xC0000000 if I dump the registers vaule?

when would the EIP address change to 0xC0000000?




did you mean

Code: Select all

 . += 0xC0000000;

        .text : AT(ADDR(.text) - 0xC0000000)
        {
                *(.text)
        }

the code in the .text in fact begins at ADDR(.text)-0xc0000000? which is same as 0x100000?
but the symbols address would be at 0xC0000000+offset_from_0x100000?

Re:questions on high half kernel

Posted: Fri May 12, 2006 7:21 am
by Solar
asmboozer wrote:
I haven't read the docs on linker script, so I have tons of question of it.
where could I have the docs on the linker script?
Binutils homepage at http://www.gnu.org/software/binutils/ links to the binutils documentation including the ld docs containing a section on linker scripts.

As I was still confused on what you are talking about, I jumped back to your first post. I think I figured out some of the misunderstandings here:
  • "when will the eip be 0x c0 10 00 00" - You are confused about the .text section starting at 0xc0100000 instead of 0xc0000000, right? (I am wondering about this, too.)
  • "may i subsititute..." - I have no idea where you got the "AT(0x10 00 00)" from. You don't have to substitute that with ".text : AT(ADDR(.text) - 0xC0000000)" because the latter is already what is in the original code.
  • "what's usage of .text here?" - are you asking about how to use .text (a question that doesn't make sense to me), or are you asking why .text is used that way (no idea myself, perhaps any of the linker script gurus knowing this)?
  • "when CS:(E)IP adds the base address in the selector and the eip(offset)? when CS:(E)IP generate the logical address as
    CS<< 4 + (E)IP?" - Google for the DOS edition of "The Art of Assembly", which explains this rather nicely. In short, when the CPU starts (at boot) it is in 16bit "Real Mode", which uses CS<<4 + IP. When you switch to 32bit "Protected Mode", this becomes (selector base) + EIP.
I hope this does nail it down a bit.

Re:questions on high half kernel

Posted: Fri May 12, 2006 8:24 am
by JAAman
im surprised no one corrected this misleading statement:
<<<<<<< before paging enabled ,it jumps into higher half address >>>>>>>>
you are at higher half logical but still at lower half linear

you cannot switch halves before paging is enabled -- because paging is where the translation happens

you use an altered GDT to make the addresses appear to be at high addresses (a high logical address, but still a low linear address) you cannot have a high linear address before paging is enabled

then you enable paging -- you are still running in low page, not high page (high logical, but low linear) -- so page in low memory must be mapped

then you change the GDT (reasign it to a 0 base) -- this essentially skips the segmentation, and makes logical==linear
which will require the pages to be valid at the top of linear address space -- only now is the trasition to upper half complete, and the lower half page mapping can be removed


if you are confused about the relationship between logical and linear, check Intel vol3 section 3.1

Re:questions on high half kernel

Posted: Fri May 12, 2006 10:59 am
by asmboozer
first ,sorry for the confusing expression conveyed here.
Solar wrote:
Binutils homepage at http://www.gnu.org/software/binutils/ links to the binutils documentation including the ld docs containing a section on linker scripts.
thanks.
As I was still confused on what you are talking about, I jumped back to your first post. I think I figured out some of the misunderstandings here:
  • "when will the eip be 0x c0 10 00 00" - You are confused about the .text section starting at 0xc0100000 instead of 0xc0000000, right? (I am wondering about this, too.)
no, I think where I would set breakpoint 0xC0000000 or 0x00100000?
or if I set the two breakpoint in bochs at the same time, on which one would it first break?
[*] "may i subsititute..." - I have no idea where you got the "AT(0x10 00 00)" from. You don't have to substitute that with ".text : AT(ADDR(.text) - 0xC0000000)" because the latter is already what is in the original code.
sorry for the wrong usage ,I used,of the word 'substitute',

maybe I want to say, may I substitute AT(ADDR(.text) - 0XC0000000) with AT(0x0010 0000),

I think the .text is loaded at 0x0010 0000 in physical memory.
the logical address 0xC000 0000 after page translate is the same thing as 0x0010 0000.
[*] "what's usage of .text here?" - are you asking about how to use .text (a question that doesn't make sense to me), or are you asking why .text is used that way (no idea myself, perhaps any of the linker script gurus knowing this)?
because in the script, there are one address 0x0010 0000 and the other one 0xc000 0000,

what I understand in your previous post, is the first 0x10 0000 is address of .setup section,
the address of 0xC000 0000 is referenced by other sections.

I think indeed, there are two form address can be used to reference the symbol in the other sections.
1,the logical address started from 0xC000 0000,which would be used when page enabled.
2.the physical address started from 0x0010 0000,since the other sections have to be loaded 0x0010 0000 in fact.



[*] "when CS:(E)IP adds the base address in the selector and the eip(offset)? when CS:(E)IP generate the logical address as
CS<< 4 + (E)IP?" - Google for the DOS edition of "The Art of Assembly", which explains this rather nicely. In short, when the CPU starts (at boot) it is in 16bit "Real Mode", which uses CS<<4 + IP. When you switch to 32bit "Protected Mode", this becomes (selector base) + EIP.
i like the in short the sentence.
[/list]

I hope this does nail it down a bit.

Re:questions on high half kernel

Posted: Fri May 12, 2006 11:11 am
by asmboozer
JAAman wrote: im surprised no one corrected this misleading statement:
<<<<<<< before paging enabled ,it jumps into higher half address >>>>>>>>
you are at higher half logical but still at lower half linear
yes , I understand it.
you cannot switch halves before paging is enabled -- because paging is where the translation happens
if I do not misunderstand your statement.

as i know, the grub doesn't set paging enabled.

and from the source code, I don't know paging is enabled before

Code: Select all

; jump to the higher half kernel
        jmp 0x08:higherhalf
<<<<<<<<<<<<< it jumps here before paging enabled. >>>>>>>>>>>>>>>>>

Code: Select all

higherhalf:
        ; from now the CPU will translate automatically every address
        ; by adding the base 0x40000000

        mov esp, sys_stack ; set up a new stack for our kernel

        call kmain ; jump to our C kernel ;)

it's enabled in the init_paging function called by kmain.
you can read the last line in the init_paging() function. i have quoted it below.

Code: Select all

void init_paging()
{
        // Pointers to the page directory and the page table
        void *kernelpagedirPtr = 0;
        void *lowpagetablePtr = 0;
        int k = 0;

        kernelpagedirPtr = (char *)kernelpagedir + 0x40000000;  // Translate the page directory from
                                                                // virtual address to physical address
        lowpagetablePtr = (char *)lowpagetable + 0x40000000;    // Same for the page table

        // Counts from 0 to 1023 to...
        for (k = 0; k < 1024; k++)
        {
                lowpagetable[k] = (k * 4096) | 0x3;     // ...map the first 4MB of memory into the page table...
                kernelpagedir[k] = 0;                   // ...and clear the page directory entries
        }

        // Fills the addresses 0...4MB and 3072MB...3076MB of the page directory
        // with the same page table

        kernelpagedir[0] = (unsigned long)lowpagetablePtr | 0x3;
        kernelpagedir[768] = (unsigned long)lowpagetablePtr | 0x3;

        // Copies the address of the page directory into the CR3 register and, finally, enables paging!

        asm volatile (  "mov %0, %%eax\n"
                        "mov %%eax, %%cr3\n"
                        "mov %%cr0, %%eax\n"
                        "orl $0x80000000, %%eax\n"
                        "mov %%eax, %%cr0\n" :: "m" (kernelpagedirPtr));
}
you use an altered GDT to make the addresses appear to be at high addresses (a high logical address, but still a low linear address) you cannot have a high linear address before paging is enabled

then you enable paging -- you are still running in low page, not high page (high logical, but low linear) -- so page in low memory must be mapped

then you change the GDT (reasign it to a 0 base) -- this essentially skips the segmentation, and makes logical==linear
which will require the pages to be valid at the top of linear address space -- only now is the trasition to upper half complete, and the lower half page mapping can be removed


if you are confused about the relationship between logical and linear, check Intel vol3 section 3.1