OSDev.org

Posted: **Mon Apr 26, 2021 4:04 am**

Hi everyone,

First, some background info:

What I'm trying to do is write boot loader code that can set up the kernel with either 2 MB pages or 4 KB pages. I was thinking of doing it with some NASM macro conditions.
The problem is that I'm not very fluent in writing assembly code (I have written the boot part and can more or less read it, but I'm still full of doubts when I have to add new stuff or start doing more complex things).
I thought adding that feature could be a good asm exercise, but... (see the point above XD)
The overall idea is that the page size is decided before compiling the OS, so defining a flag (or setting a value for it) will decide which page size to use.
Currently I'm using 2 MB pages (when I wrote the boot code I was following a tutorial that used 2 MB pages; they looked quicker to implement, so at the time I was happy with them).
I have to admit that I'm not expecting much from my OS, so I could probably decide not to implement this feature at all (but I think it would be nice to have).
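In NASM this kind of build-time switch would be `%ifdef SMALL_PAGES` / `%else` / `%endif` around the constants. As a rough sketch of the same idea, here it is modeled with the C preprocessor. The names `PAGE_SIZE`, `LOOP_LIMIT` and `PAGE_TABLE_ENTRY` mirror the ones that appear in the boot code later in the thread, but their values here are assumptions (two page tables in the 4 KB case); the flag bits are the standard x86-64 present, writable and page-size bits.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 paging entry flag bits. */
#define PRESENT_BIT   (1u << 0)
#define WRITE_BIT     (1u << 1)
#define HUGE_PAGE_BIT (1u << 7)   /* PS bit: only meaningful in PD/PDPT entries */

#ifdef SMALL_PAGES
/* Assumed 4 KB setup: fill 2 contiguous page tables of 512 entries each. */
#define PAGE_SIZE        0x1000u
#define LOOP_LIMIT       1024u
#define PAGE_TABLE_ENTRY (PRESENT_BIT | WRITE_BIT)
#else
/* 2 MB setup: fill one page directory with 512 huge-page entries. */
#define PAGE_SIZE        0x200000u
#define LOOP_LIMIT       512u
#define PAGE_TABLE_ENTRY (HUGE_PAGE_BIT | PRESENT_BIT | WRITE_BIT)
#endif

/* Bytes mapped by the boot loop, computed entirely at compile time. */
#define MAPPED_BYTES ((uint64_t)PAGE_SIZE * LOOP_LIMIT)

/* Mapped addresses are page aligned, so the flags must fit in the low offset bits. */
static_assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0, "PAGE_SIZE must be a power of two");
static_assert((PAGE_TABLE_ENTRY & (PAGE_SIZE - 1)) == PAGE_TABLE_ENTRY,
              "flags must fit below the page offset");
```

Building with `-DSMALL_PAGES` switches every constant at once, which is the same effect as passing `-dSMALL_PAGES` to NASM.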

OK, so here's the status:

From what I understand, the PML4 and PDPT tables stay the same, so I don't need to change the code for them.

And technically I think the PD mapping code can be reused for the page table mapping code, with a few changes handled by some macro conditions.

What I need to support 4 KB pages is:

The function that currently maps the page dir will map the page table in the 4 KB pages case.
And I need to add the mapping of the page tables into the page dir.

So what I thought to do was use an outer loop that counts up to X (where X is the number of page tables I want to hook into the page dir; I'm thinking of reserving at least 4 MB for the kernel, so it should be at least 2) and maps the pd_entry, and then an inner loop (the .map_p2_table loop that was used for the PD mapping with 2 MB pages) that fills the page table.
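The nested-loop plan can be sketched in C by modeling the `.bss` tables as arrays. Everything here is hypothetical: the outer loop writes one page directory entry per page table, and the inner loop identity-maps 4 KB pages, so table `pd`, entry `pt` maps physical address `(pd * 512 + pt) * 0x1000`. The physical address of the page tables is passed in as an assumed parameter.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512u
#define SMALL_PAGE_SIZE   0x1000u
#define PT_COUNT          2u          /* "X": page tables hooked into the page dir */
#define PRESENT_BIT       (1u << 0)
#define WRITE_BIT         (1u << 1)

/* Stand-ins for the .bss tables; pt_tables holds PT_COUNT contiguous 4 KB tables. */
static uint64_t p2_table[ENTRIES_PER_TABLE];
static uint64_t pt_tables[PT_COUNT][ENTRIES_PER_TABLE];

/* pt_phys: assumed physical address of pt_tables. */
static void map_small_pages(uint64_t pt_phys)
{
    for (uint32_t pd = 0; pd < PT_COUNT; pd++) {              /* outer loop: one PD entry per table */
        p2_table[pd] = (pt_phys + (uint64_t)pd * SMALL_PAGE_SIZE) | PRESENT_BIT | WRITE_BIT;
        for (uint32_t pt = 0; pt < ENTRIES_PER_TABLE; pt++) { /* inner loop: fill that table */
            uint64_t phys = ((uint64_t)pd * ENTRIES_PER_TABLE + pt) * SMALL_PAGE_SIZE;
            pt_tables[pd][pt] = phys | PRESENT_BIT | WRITE_BIT;
        }
    }
}
```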

And here is where all my doubts and problems start.

I defined two variables to hold the 2 page tables I want to use:

```nasm
pt1_table:
    resb 4096
pt2_table:
    resb 4096
```

But the problem with this approach is that I need a way to select which table to use in the outer loop, so I was thinking it might be better to do something like:

```nasm
pt_tables:
    resb 8192 ; 2 contiguous page tables of 4096 bytes each
```

And I was thinking of using 2 nested loops, with the outer loop's counter used in the mov instruction, something like this:

```nasm
%ifdef SMALL_PAGES
mov ebx, 0
.outer_pd_cycle:
 ;pd loading stuff
%endif

 .inner_pt_cycle:
    ;all the related stuff
    %ifdef SMALL_PAGES
        mov [(pt_table  - KERNEL_VIRTUAL_ADDR) + edx * 0x1000 + ecx * 8], eax
    %else
        mov [(pt_table  - KERNEL_VIRTUAL_ADDR) + ecx * 8], eax
    %endif
    inc ecx
    cmp ecx, 512
    jne .inner_pt_cycle
%ifdef SMALL_PAGES
 inc ebx
 cmp ebx, 2
 jne .outer_pd_cycle
%endif
```

But in this case I get an error saying I can't use two index registers (so I suppose this approach doesn't work either).

The problem is that I need to compute the address, and it is made of two variables; if I can't use two registers to do it, how should I do it?
Would something like this be the right approach?

```nasm
mov eax, ebx      ; assuming ebx contains the outer loop counter
mov edx, 0x1000
mul edx           ; edx:eax = ebx * 0x1000 (mul only accepts a register or memory operand)
push eax          ; keep the low half of the result
mov eax, ecx
mov edx, 8
mul edx           ; edx:eax = ecx * 8
add eax, pt_tables - KERNEL_VIRTUAL_ADDR ; is that possible?
pop edx           ; pop ebx * 0x1000
add eax, edx
```

(It's just a draft, so there could be errors, and as I said I'm not very fluent in asm, so I hope I haven't written anything horrible XD)

Otherwise, what is the easiest/best way to compute the address? (So far that is my blocking problem.)
The current bootloader code is here: https://github.com/dreamos82/Dreamos64/ ... asm/boot.s (it doesn't have any of the changes I'm trying to implement)

So is my idea bad? Is it going to work? Is it worth the hassle of supporting both, or am I just wasting my time? (I've been trying to figure out a solution for nearly a week, but at least I learned some new stuff about assembly.)

One last question: I'm currently mapping only part of the framebuffer (just 2 MB) in the bootloader. I'm wondering whether I should map the whole framebuffer in the bootloader, or whether it's better to map it piece by piece by handling the #PF (I'd just need to compute its size first). BTW, I was expecting to find the framebuffer address in the GRUB MMAP data, but apparently it's missing; is there a reason why?

Thanks for any help guys!

Posted: **Mon Apr 26, 2021 5:27 am**

finarfin wrote:But in this case I get an error saying I can't use two index registers (so I suppose this approach doesn't work either).

Because this line is not valid x86 code:

```nasm
mov [(pt_table  - KERNEL_VIRTUAL_ADDR) + edx * 0x1000 + ecx * 8], eax
```

You're only allowed to scale one of the two registers in the memory operand, and the scale factor may only be 1, 2, 4, or 8. You're trying to scale both EDX and ECX, and you're trying to scale EDX by 0x1000.

If your page tables are contiguous in memory, you can just let ECX count up past 512 and get rid of EDX.
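A small C model of why the single counter works: an x86 memory operand has the form base + index * scale + displacement, with one scaled index and a scale of 1, 2, 4 or 8, so `table * 0x1000 + entry * 8` can't be encoded. When the tables are contiguous, though, it equals `(table * 512 + entry) * 8`, which fits `[pt_tables + ecx * 8]`. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Scales that the x86 SIB byte can encode. */
static bool is_encodable_scale(uint32_t scale)
{
    return scale == 1 || scale == 2 || scale == 4 || scale == 8;
}

/* Byte offset of `entry` in table `table` (two scaled indexes: not encodable). */
static uint32_t nested_offset(uint32_t table, uint32_t entry)
{
    return table * 0x1000u + entry * 8u;
}

/* Same offset with a single counter that keeps running past 512. */
static uint32_t flat_offset(uint32_t ecx)
{
    return ecx * 8u;
}
```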

finarfin wrote:One last question: I'm currently mapping only part of the framebuffer (just 2 MB) in the bootloader. I'm wondering whether I should map the whole framebuffer in the bootloader, or whether it's better to map it piece by piece by handling the #PF (I'd just need to compute its size first)

It'll work the same either way, so choose whatever makes the most sense for your OS.

finarfin wrote:BTW, I was expecting to find the framebuffer address in the GRUB MMAP data, but apparently it's missing; is there a reason why?

The framebuffer is MMIO. Most MMIO will not be listed in the memory map.
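Since the framebuffer won't show up in the memory map, its extent has to come from the framebuffer information itself; the Multiboot2 framebuffer info tag reports the address, the pitch (bytes per scanline) and the height. A hedged sketch of counting how many pages are needed for either page size; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pages needed to cover [fb_addr, fb_addr + pitch * height).
   page_size must be a power of two (0x1000 or 0x200000 here). */
static uint64_t framebuffer_pages(uint64_t fb_addr, uint32_t pitch,
                                  uint32_t height, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uint64_t start = fb_addr & ~(page_size - 1);                  /* round down */
    uint64_t end   = fb_addr + (uint64_t)pitch * height;
    end = (end + page_size - 1) & ~(page_size - 1);               /* round up */
    return (end - start) / page_size;
}
```

For a 1024x768 mode with a 4096-byte pitch (3 MB), that is 768 pages of 4 KB, or 2 pages of 2 MB if the address is 2 MB aligned.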

Posted: **Mon Apr 26, 2021 9:10 am**

Octocontrabass wrote:If your page tables are contiguous in memory, you can just let ECX count up past 512 and get rid of EDX.

Yeah, that could work, and I thought about it too, but I still have a problem if I want maximum flexibility and want to map the page dir entries in a loop too (similarly to how it's done for the page table).

To recap: the page table should be fine if I use the approach you suggested, and the mov can be kept as it is:

```nasm
mov [(pt_tables - KERNEL_VIRTUAL_ADDR) + ecx * 8], eax
```

The only difference will be the counter in ECX (1024 for 2 page tables, instead of 512 for one page dir).
But now I have to map the page dir entries (at least 2), and currently this is done like this:

```nasm
mov eax, p2_table - KERNEL_VIRTUAL_ADDR
or eax, PRESENT_BIT | WRITE_BIT
mov dword[(p3_table_hh - KERNEL_VIRTUAL_ADDR) + 510 * 8], eax
```

And the problem is that I can't do something like:

```nasm
mov eax, p2_table - KERNEL_VIRTUAL_ADDR + ebx * 0x1000 ; that should be the address of the page table if they are contiguous
```

So what would I need in this case? Something like:

```nasm
mov ebx, 0
.map_pdir:
 push ebx               ; save the counter
 mov eax, ebx
 imul eax, eax, 0x1000  ; eax = counter * 0x1000 (mul has no immediate form)
 push eax
 mov eax, pt_tables - KERNEL_VIRTUAL_ADDR
 pop ebx                ; value of counter * 0x1000
 add eax, ebx           ; address of the page table
 pop ebx                ; should be the counter value again
inc ebx
cmp ebx, ITEMS_TO_MAP
jne .map_pdir
```

Or maybe something simpler, like:

```nasm
mov ebx, 0
mov eax, pt_tables - KERNEL_VIRTUAL_ADDR
.map_pdir:
  or eax, PRESENT_BIT | WRITE_BIT
  mov dword[(p3_table_hh - KERNEL_VIRTUAL_ADDR) + ebx * 8], eax
  add eax, 0x1000 * 8
  inc ebx
  cmp ebx, 2 ; number of items I want to map for the page dir
  jne .map_pdir
```

And then do the map_ptables in a separate loop?

And is 0x1000 * 8 allowed in assembly?

(So far the assembler doesn't complain.)

Otherwise, the easiest way is to map the first 2 or 3 entries of the page dir manually! XD

-- EDIT --
I saw that there is a LEA (load effective address) instruction that can do some math with registers; question is: is it useful for my purpose or not?

Posted: **Tue Apr 27, 2021 9:45 pm**

finarfin wrote:And is 0x1000 * 8 allowed in assembly?

Yes, as long as the assembler can simplify the expression into a valid x86 instruction. There's no problem doing that in an immediate value.

finarfin wrote:Otherwise, the easiest way is to map the first 2 or 3 entries of the page dir manually!

Yeah, a couple of MOV instructions with compile-time constants would be a lot easier than a loop. There's not much point in writing the loop unless you're expecting to come back later and increase the number of iterations.
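To illustrate the trade-off, here's a C sketch (hypothetical names) showing that two explicit stores produce the same page directory entries as the loop. It also assumes the page tables are contiguous 4 KB blocks, in which case consecutive entries differ by 0x1000, not 0x1000 * 8.

```c
#include <assert.h>
#include <stdint.h>

#define PRESENT_BIT 1u
#define WRITE_BIT   2u
#define PT_STRIDE   0x1000u   /* contiguous 4 KB page tables sit 0x1000 bytes apart */

/* Loop version: the entry count is a runtime value. */
static void map_pd_loop(uint64_t *pd, uint64_t pt_phys, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++)
        pd[i] = (pt_phys + (uint64_t)i * PT_STRIDE) | PRESENT_BIT | WRITE_BIT;
}

/* Unrolled version: no counter, every offset is a constant. */
static void map_pd_unrolled(uint64_t *pd, uint64_t pt_phys)
{
    pd[0] = (pt_phys + 0 * PT_STRIDE) | PRESENT_BIT | WRITE_BIT;
    pd[1] = (pt_phys + 1 * PT_STRIDE) | PRESENT_BIT | WRITE_BIT;
}
```

The loop only pays off if the count is expected to grow; otherwise the unrolled form is easier to read and check.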

finarfin wrote:I saw that there is a LEA (load effective address) instruction that can do some math with registers; question is: is it useful for my purpose or not?

The LEA instruction performs effective address calculation. It's only useful when the math you want to perform can be represented as an x86 effective address. I don't think it will help you with this.

Posted: **Thu Apr 29, 2021 5:01 am**

Thanks for your answer @Octocontrabass

BTW, before I got a reply in this thread I tried to implement the loop, and hopefully I'm close to a solution.

Now if I enable the 4 KB pages flag, the OS loads more or less correctly, but guess what? I get a #PF, lol.

It's kind of strange, because apparently what triggers it is this statement:

```c
RSDPDescriptor *descriptor = (RSDPDescriptor *)(++tag);
```

If the statement is not present, the kernel boots correctly (at least apparently XD). What's even stranger is that it's not the statement itself that faults: after it I can do other stuff, like printing the content of the descriptor or its address, or doing 10-12 prints, and it still works (the statement is in a switch statement https://github.com/dreamos82/Dreamos64/ ... main.c#L47; I commented out the two statements below it). Things get messed up when leaving the switch and starting a new iteration of the outer loop, with a strange address causing the #PF:

0xFFFFFFFFA063E1E8

But the kernel ends at 0xFFFFFFFF8011805C, and on the previous iteration of the loop the tag pointer was 0xFFFFFFFF80117008.

The error code is 0, which means a supervisor-mode read of a non-present page.
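For reference, the low bits of the #PF error code can be decoded as below; the faulting address itself is in CR2. This is a minimal sketch with made-up helper names, but the bit meanings are the architectural ones, which is why an error code of 0 reads as a supervisor-mode read of a non-present page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 page-fault (#PF) error code. */
#define PF_PRESENT (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE   (1u << 1)  /* 0: read access, 1: write access */
#define PF_USER    (1u << 2)  /* 0: supervisor mode, 1: user mode */
#define PF_RSVD    (1u << 3)  /* 1: reserved bit set in a paging entry */
#define PF_FETCH   (1u << 4)  /* 1: caused by an instruction fetch */

static bool pf_is_not_present(uint32_t err) { return (err & PF_PRESENT) == 0; }
static bool pf_is_write(uint32_t err)       { return (err & PF_WRITE) != 0; }
static bool pf_is_user(uint32_t err)        { return (err & PF_USER) != 0; }
```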

Here's the updated boot loader code:

```nasm
    mov eax, p2_table - KERNEL_VIRTUAL_ADDR
    or eax, PRESENT_BIT | WRITE_BIT
    mov dword[(p3_table_hh - KERNEL_VIRTUAL_ADDR) + 510 * 8], eax
    %ifdef SMALL_PAGES
    mov ebx, 0
    mov eax, pt_tables - KERNEL_VIRTUAL_ADDR
    .map_pd_table:
        or eax, PRESENT_BIT | WRITE_BIT
        mov dword[(p2_table - KERNEL_VIRTUAL_ADDR) + ebx * 8], eax
        add eax, 0x1000 ; I'm assuming that the pt_tables are contiguous
        inc ebx
        cmp ebx, 2
        jne .map_pd_table
    %endif
    ; Now let's prepare a loop...
    mov ecx, 0  ; Loop counter

    .map_p2_table:
        mov eax, PAGE_SIZE  ; Size of the page
        mul ecx             ; Multiply by counter (edx:eax = eax * ecx, edx is overwritten)
        or eax, PAGE_TABLE_ENTRY ; Flags: present + writable (+ huge page bit for 2 MB pages)

        ; Moving the computed value into the table entry defined by ecx * 8
        ; ecx is the counter, 8 is the size of a single entry
        %ifdef SMALL_PAGES
        mov [(pt_tables - KERNEL_VIRTUAL_ADDR) + ecx * 8], eax
        %else
        mov [(p2_table - KERNEL_VIRTUAL_ADDR) + ecx * 8], eax
        %endif

        inc ecx                 ; Let's increase ecx
        cmp ecx, LOOP_LIMIT     ; have we filled every entry?
                                ; each table is 4k in size and each entry is 8 bytes,
                                ; so 512 entries per table (1024 for two contiguous page tables)
        jne .map_p2_table       ; if ecx < LOOP_LIMIT then loop
```

And the bss section:

```nasm
section .bss

align 4096
p4_table: ;PML4
    resb 4096
p3_table: ;PDPT
    resb 4096
p3_table_hh: ;PDPT (higher half)
    resb 4096
p2_table: ;PD
    resb 4096
%ifdef SMALL_PAGES
; if SMALL_PAGES is defined it means we are using 4k pages
; For now the first 4 MB (2 page tables) will be mapped for the kernel.
pt_tables:
    resb 8192
fdd_pt_tables:
    resb 8192
%endif
```
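A quick C check of how much a given amount of `.bss` page-table space maps with 4 KB pages (the helper is hypothetical): each 4096-byte table holds 512 eight-byte entries, and each entry maps 4 KB.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BYTES 4096u
#define ENTRY_BYTES 8u
#define SMALL_PAGE  0x1000u

/* Bytes mapped by `bss_bytes` worth of page tables using 4 KB pages. */
static uint64_t mapped_by_page_tables(uint64_t bss_bytes)
{
    assert(bss_bytes % TABLE_BYTES == 0);        /* whole tables only */
    uint64_t entries = bss_bytes / ENTRY_BYTES;  /* 512 entries per table */
    return entries * SMALL_PAGE;
}
```

By that arithmetic, the 8192 bytes reserved for `pt_tables` cover 4 MB, which matches the two page directory entries written by `.map_pd_table`.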

With this code the kernel is loaded correctly in the higher half: it does the jump and starts executing, and I'm sure exceptions are working, since the #PF handler that gets called is the kernel's. If I remove the descriptor line, the kernel boots to the end, so the new mapping looks like it's working, but for some reason that line causes the #PF. I checked the kernel map output from ld, and the function containing the offending statement is mapped at 0xffffffff80101580.

Any idea what the issue could be?
This is the full bootloader code: https://github.com/dreamos82/Dreamos64/ ... asm/boot.s
And this is the main kernel file: https://github.com/dreamos82/Dreamos64/ ... nel/main.c
I have to admit I'm not sure whether I'll keep this feature, and I'm still thinking about mapping things manually, since I don't think the kernel will grow much bigger in the short term (or maybe I'll just drop the idea and support only one page size, since I'm spending too much time on it, even if I don't like admitting failure XD). But at least I want to try to make it work, as an exercise.

Posted: **Fri Apr 30, 2021 3:21 pm**

finarfin wrote:It's kind of strange, because apparently what triggers it is this statement:
```c
RSDPDescriptor *descriptor = (RSDPDescriptor *)(++tag);
```

It's not strange at all. You're modifying the "tag" variable so it no longer points to a boot information tag, which means your loop will calculate the wrong address when it tries to find the next tag. Perhaps try "tag+1" instead of "++tag".
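A minimal sketch of the difference, assuming a Multiboot2-style tag list (8-byte `type`/`size` header, tags padded to 8-byte boundaries, type 0 ending the list); the struct and function names are hypothetical, not the ones in main.c. With `++tag` inside the switch, `tag` has already moved 8 bytes into the payload when the loop computes the next tag, so the next `size` is read from the wrong place. Using `tag + 1` computes the payload pointer without touching the loop cursor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic boot information tag header; size includes the header. */
struct mb2_tag {
    uint32_t type;
    uint32_t size;
};

/* Next tag starts at the current one plus its size rounded up to 8 bytes. */
static const struct mb2_tag *next_tag(const struct mb2_tag *tag)
{
    return (const struct mb2_tag *)((const uint8_t *)tag + ((tag->size + 7u) & ~7u));
}

/* Counts tags of wanted_type, taking each payload pointer the safe way. */
static int count_tags(const struct mb2_tag *first, uint32_t wanted_type)
{
    int count = 0;
    for (const struct mb2_tag *tag = first; tag->type != 0; tag = next_tag(tag)) {
        if (tag->type == wanted_type) {
            const void *payload = tag + 1;   /* not ++tag: that would move the loop cursor */
            (void)payload;
            count++;
        }
    }
    return count;
}
```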

Posted: **Fri Apr 30, 2021 5:52 pm**

Oh my...

I spent days staring at the asm code thinking it was wrong, looking for what I did wrong and where the error could be, and I wasn't even slightly thinking that it was in the C code, especially since I wasn't getting any kind of error with 2 MB pages (but with 2 MB pages I had identity-mapped the first GB of memory, which obviously was preventing the page fault from happening).

BTW, now it apparently works (some functions still cause a few page faults, but that's just because the addresses they use aren't mapped with 4 KB pages), so it looks like I did it! YAY

Thanks for the help!!!

Trying to support both VM Page size (not mixed)
