Page directory switching while task switching.

ExeTwezz · Post by **ExeTwezz** » Sat Nov 15, 2014 8:19 am

Hi,

I've written a task switching function that saves the stack pointer of the current task, selects the next task, gets its kernel stack pointer and returns it. The IRQ0 handler, which calls the task switching function, loads this value into ESP and pops the registers. In short, I have task switching with only kernel-mode tasks (I don't pay attention to the name yet, but I'd prefer to call them threads).
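A minimal C sketch of a function with this shape, assuming a fixed-size task table and round-robin selection. The names `struct task`, `tasks`, `MAX_TASKS` and `switch_task` are hypothetical, and the assembly IRQ0 stub that pushes and pops the registers is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 4  /* hypothetical table size */

/* Hypothetical task record: only the saved kernel stack pointer matters here. */
struct task {
    uintptr_t kernel_esp;  /* stack pointer saved at the last switch */
    int       in_use;      /* non-zero if this slot holds a task */
};

static struct task tasks[MAX_TASKS];
static size_t current_task;

/* Called by the IRQ0 stub with the interrupted task's stack pointer, taken
 * after the stub has pushed the registers. Returns the stack pointer the
 * stub should load into ESP before popping the registers and returning. */
uintptr_t switch_task(uintptr_t current_esp)
{
    tasks[current_task].kernel_esp = current_esp;

    /* Round-robin: advance to the next slot that holds a task. If no other
     * slot is in use, the search wraps around to the current task. */
    size_t next = current_task;
    do {
        next = (next + 1) % MAX_TASKS;
    } while (!tasks[next].in_use && next != current_task);

    current_task = next;
    return tasks[next].kernel_esp;
}
```

With only kernel-mode tasks, swapping ESP is enough because every task runs in the same address space; a per-task page directory would have to be switched at the same point.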

So, I want my tasks to run in user mode as well. I know that I'll need one TSS in my GDT to switch to userspace, and that each task needs its own address space (page directory). But the problem is that when I tried to switch the page directory in the task switching function, I got a page fault since my kernel is not higher half. Also, I know that I'll need two additional segments, ring 3 code and data segments, and that I'll need to change CS and the data segment registers to the userspace ones while task switching.

I think I can just map the kernel not to high memory, but to the first physical 4 MB or more depending on my kernel size (as I mapped it when I was initializing paging: the first virtual 4 MB mapped to the first physical 4 MB). And have a user program after the kernel (e.g. at 2 or 4 MB, since my kernel is still small). Is it a normal idea?

Combuster · Post by **Combuster** » Sat Nov 15, 2014 12:33 pm

ExeTwezz wrote:I got a page fault since my kernel is not higher half.

Nonsense. I have paging, but not a higher half kernel. You only got a page fault because you don't have a clue what you're doing.

Get yourself pen and paper, and write down where in physical memory and in each virtual address space everything has to go.

ExeTwezz · Post by **ExeTwezz** » Sat Nov 15, 2014 12:48 pm

Combuster wrote:
ExeTwezz wrote:I got a page fault since my kernel is not higher half.
Nonsense. I have paging, but not a higher half kernel. You only got a page fault because you don't have a clue what you're doing.

Get yourself pen and paper, and write down where in physical memory and in each virtual address space everything has to go.

But how did you do this? It is logical to assume that when you switch to a task's page directory, the task switching code will no longer be there, and there may be other code, data, or an unmapped page at the EIP position.

Brendan · Post by **Brendan** » Sat Nov 15, 2014 1:18 pm

Hi,

ExeTwezz wrote:It is logical to assume that when you switch to a task's page directory, the task switching code will no longer be there, and there may be other code, data, or an unmapped page at the EIP position.

It's logical to assume that the task switching code needs to be mapped at the same virtual address in both address spaces. For most kernels, the entire "kernel space" is mapped into every address space.
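One common way to get "kernel space in every address space" on 32-bit x86 is to copy the kernel's page directory entries into each new page directory, so that all address spaces share the same kernel page tables. The C sketch below assumes that approach; `init_address_space` and its parameters are hypothetical names, and the directories are plain arrays so the logic can be tried outside a kernel.

```c
#include <stdint.h>
#include <string.h>

#define PDE_COUNT 1024u  /* entries in a 32-bit, non-PAE page directory */

/* Prepare a fresh page directory for a new task: user entries start out
 * empty, kernel entries are copied from the kernel's own directory so that
 * kernel code and data appear at the same virtual addresses everywhere.
 * first_kernel_pde and kernel_pde_count describe where kernel space lives
 * and are assumptions of this sketch. */
void init_address_space(uint32_t *new_pd, const uint32_t *kernel_pd,
                        unsigned first_kernel_pde, unsigned kernel_pde_count)
{
    memset(new_pd, 0, PDE_COUNT * sizeof(uint32_t));
    memcpy(&new_pd[first_kernel_pde], &kernel_pd[first_kernel_pde],
           kernel_pde_count * sizeof(uint32_t));
}
```

For a kernel occupying the top 1 GiB, the call would use `first_kernel_pde = 768` and `kernel_pde_count = 256`; for a kernel in the first 8 MB it would be `0` and `2`.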

ExeTwezz wrote:I think I can just map the kernel not to high memory, but to the first physical 4 MB or more depending on my kernel size (as I mapped it when I was initializing paging: the first virtual 4 MB mapped to the first physical 4 MB). And have a user program after the kernel (e.g. at 2 or 4 MB, since my kernel is still small). Is it a normal idea?

It's not abnormal to have the idea. The problem is that if you want to change the amount of space the kernel uses it breaks existing applications.

For example, imagine that you've got 3 kernels:

A 32-bit kernel intended for small/embedded/old systems, that only uses 16 MiB of space (from 0x00000000 to 0x00FFFFFF)
A normal 32-bit kernel intended for desktop/server, that uses 512 MiB of space (from 0x00000000 to 0x1FFFFFFF)
A normal 64-bit kernel intended for desktop/server, that uses 512 GiB of space (from 0x0000000000000000 to 0x0000007FFFFFFFFF)

Now imagine you're compiling a 32-bit application that should run on all 3 kernels. Which address do you compile this application for?

Now imagine that you've got 3 kernels:

A 32-bit kernel intended for small/embedded/old systems, that only uses 16 MiB of space (from 0xFF000000 to 0xFFFFFFFF)
A normal 32-bit kernel intended for desktop/server, that uses 512 MiB of space (from 0xE0000000 to 0xFFFFFFFF)
A normal 64-bit kernel intended for desktop/server, that uses 512 GiB of space (from 0xFFFFFF8000000000 to 0xFFFFFFFFFFFFFFFF)

In this case, you can compile the 32-bit application to start at 0x00000000 (or maybe something like 0x00010000 to help catch null pointers). This is no problem at all - the only thing that changes is how much space the application can use for its dynamically allocated heap.
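The arithmetic behind the second layout can be sketched in C for the two 32-bit kernels. `USER_BASE` uses the 0x00010000 value suggested above; the helper names are hypothetical.

```c
#include <stdint.h>

#define USER_BASE 0x00010000u  /* fixed link address for every application */

/* First address that belongs to the kernel when it occupies the top
 * kernel_size bytes of a 4 GiB address space. */
uint32_t user_space_end(uint32_t kernel_size)
{
    return (uint32_t)(UINT64_C(0x100000000) - kernel_size);
}

/* Bytes available to an application linked at USER_BASE. */
uint32_t user_space_size(uint32_t kernel_size)
{
    return user_space_end(kernel_size) - USER_BASE;
}
```

For a 16 MiB kernel the user range ends at 0xFF000000 and for a 512 MiB kernel at 0xE0000000, while the application's start address is the same in both cases.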

Please note that while your kernel may be a small binary, you will need a lot of space for things like buffers and caches (especially if a VFS cache is involved), and various other data structures. Also note that this is space, not necessarily RAM.

Cheers,

Brendan

ExeTwezz · Post by **ExeTwezz** » Sat Nov 15, 2014 1:28 pm

Brendan wrote:It's not abnormal to have the idea. The problem is that if you want to change the amount of space the kernel uses it breaks existing applications.

Why does it break? Say the programs are in the ELF format, so the kernel may load them always after itself. And, in the ELF executable there are symbols, but not addresses of variables, so you can load it anywhere.

Brendan · Post by **Brendan** » Sat Nov 15, 2014 2:27 pm

Hi,

ExeTwezz wrote:
Brendan wrote:It's not abnormal to have the idea. The problem is that if you want to change the amount of space the kernel uses it breaks existing applications.
Why does it break? Say the programs are in the ELF format, so the kernel may load them always after itself. And, in the ELF executable there are symbols, but not addresses of variables, so you can load it anywhere.

Depending on how it's been compiled; there are "position dependent" ELF files and "position independent" ELF files.

For position dependent ELF files, the file must be loaded at the address it was compiled/linked to run at.

For "position independent" ELF files, you can load it at any address, but you pay for this with overhead (it's slow, especially for 32-bit 80x86 code where there is no "RIP relative" addressing).

There is another alternative, which is to use segmentation and have different GDT/LDT descriptors for each process. This is also slow (every kernel API function that takes a pointer from user space ends up diddling with "user space segments", every switch between CPL=0 and CPL=3 involves slow segment register loads, you can't use SYSCALL/SYSENTER, etc). This will also not work for the 64-bit kernel because you can't use a 64-bit segment base for a 32-bit code or data segment.

Basically, you have 4 choices:

A compatibility disaster, where changing kernel space size breaks applications
Slow, with all applications using position independent code
Slow, with segmentation
Putting the kernel at the higher end of the virtual address space, where there are no disadvantages

Cheers,

Brendan

Nable · Post by **Nable** » Sat Nov 15, 2014 2:38 pm

Brendan wrote:Depending on how it's been compiled; there are "position dependent" ELF files and "position independent" ELF files.

It seems to me that there's one more option: relocatable files. It's not an easy way but I think that it should be mentioned for completeness.

Brendan · Post by **Brendan** » Sat Nov 15, 2014 3:59 pm

Hi,

Nable wrote:
Brendan wrote:Depending on how it's been compiled; there are "position dependent" ELF files and "position independent" ELF files.
It seems to me that there's one more option: relocatable files. It's not an easy way but I think that it should be mentioned for completeness.

You're right.

For relocatable files, you fix up the relocations when the application is loaded. This avoids the overhead of position independence.
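A rough C sketch of such a load-time fix-up, assuming a deliberately simplified relocation table: a list of offsets into the image, each naming a 32-bit field that holds an absolute address computed for the preferred base. This is not the real ELF relocation entry format, and `apply_relocations` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adjust every relocated 32-bit address in the image by the difference
 * between the address the image was linked for and the address it was
 * actually loaded at. Offsets that do not fit in the image are skipped. */
void apply_relocations(uint8_t *image, size_t image_size,
                       const uint32_t *reloc_offsets, size_t reloc_count,
                       uint32_t preferred_base, uint32_t actual_base)
{
    uint32_t delta = actual_base - preferred_base;  /* wraps modulo 2^32 */

    for (size_t i = 0; i < reloc_count; i++) {
        uint32_t off = reloc_offsets[i];
        if (image_size < sizeof(uint32_t) ||
            off > image_size - sizeof(uint32_t)) {
            continue;  /* malformed entry: outside the image */
        }

        uint32_t value;
        memcpy(&value, image + off, sizeof value);  /* fields may be unaligned */
        value += delta;
        memcpy(image + off, &value, sizeof value);
    }
}
```

Every field touched by the loop is a write into the code or data pages of the image, so those pages have to be writable while the loader runs.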

However, because you modify the executable you have to load all of the code into "writeable" pages of RAM, then fix up all the relocations, then change the pages to "read-only" and invalidate affected TLB entries. This adds overhead to process startup time. In addition; because you've modified it, you can no longer memory map the executable file's code. This means you can't just load the pieces that are actually used when they're actually needed (to improve process startup time and avoid wasting RAM) and also means that if you're running low on RAM you can't free the pages of code and load them back from the file on disk if/when they're needed again (and would have to write them to swap space before freeing the physical page, and load the data from swap space if the page is needed again).

To minimise these problems; it's probably possible to do "lazy relocation" where the executable's code is memory mapped, and the relocations are fixed in the page fault handler (after the data is loaded from disk but before returning to the code that triggered the page fault). This is messy (the page fault handling in a typical OS is complicated enough without the hassle of handling relocations), and is still not free of additional overhead (which would include the RAM consumed by keeping the information needed to do the relocations somewhere that the page fault handler can access it).

While this might be the "least slow" option; it's also over-complicated, and it's far simpler (and still faster) to just put the kernel where it should be to begin with.

Cheers,

Brendan

Combuster · Post by **Combuster** » Sat Nov 15, 2014 4:23 pm

ExeTwezz wrote:But how did you do this? It is logical to assume that when you switch to a task's page directory, the task switching code will no longer be there, and there may be other code, data, or an unmapped page at the EIP position.

It's a horrible assumption. Intel actually forbids you from changing the currently executing code by means of paging changes. The memory that holds the move to CR3 (or any control register) instruction must be the exact same memory before and after the move. Therefore, in the case of changing page tables the code you use to perform that move must be present in both address spaces.
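A conservative way to check this before loading CR3 can be sketched in C, under the assumption that kernel page tables are shared between page directories: if both directories point at the same present page table for the 4 MiB region containing the switch code, the code is necessarily the same memory in both address spaces. `switch_code_is_shared` is a hypothetical helper and the directories are plain arrays here.

```c
#include <stdbool.h>
#include <stdint.h>

#define PDE_PRESENT    0x1u
#define PDE_FRAME_MASK 0xFFFFF000u

/* True if both page directories map the 4 MiB region containing vaddr
 * through the same present page table. Directories that use different page
 * tables with identical contents are reported as not shared, so the check
 * can give false negatives but not false positives. */
bool switch_code_is_shared(const uint32_t *old_pd, const uint32_t *new_pd,
                           uint32_t vaddr)
{
    uint32_t index = vaddr >> 22;  /* one directory entry per 4 MiB */
    uint32_t a = old_pd[index];
    uint32_t b = new_pd[index];

    return (a & PDE_PRESENT) != 0 &&
           (b & PDE_PRESENT) != 0 &&
           (a & PDE_FRAME_MASK) == (b & PDE_FRAME_MASK);
}
```

A kernel could assert this for the address of its CR3-loading routine whenever it creates a new address space.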

Brendan wrote:While this might be the "least slow" option; it's also over-complicated, and it's far simpler (and still faster) to just put the kernel where it should be to begin with.

And then you want to add security using ASLR, and you end up doing this anyway.

ExeTwezz · Post by **ExeTwezz** » Sat Nov 15, 2014 4:42 pm

Hi,

So, OK, I decided to have a higher half kernel. I'm not very good at accessing and editing memory in assembly, but I coded for an hour and got this code:

; Initialize paging.
init_paging:
    ; Identity map the first 4 MB.
    mov eax, 0                  ; Contains the address of the physical page.
    mov ebx, 0                  ; Needed for the loop.
    .loop1:
        mov ecx, eax
        and ecx, 0xFFFFF000     ; Keep the 4 KiB-aligned frame address.
        or ecx, 011b            ; End with 011 (supervisor, r/w, present).
        mov dword [0x9A000+ebx], ecx ; Store the entry in the page table.

        add eax, 4096           ; Next physical page.
        add ebx, 4              ; Next entry.
        cmp ebx, 4096           ; Stop after 1024 entries (4096 bytes).
        jl .loop1

    ; Identity map the second 4 MB.
    mov eax, 4194304
    mov ebx, 0                  ; Needed for the loop
    .loop2:
        mov ecx, eax
        and ecx, 0xFFFFF000     ; Keep the 4 KiB-aligned frame address.
        or ecx, 011b            ; End with 011 (supervisor, r/w, present).
        mov dword [0x9B000+ebx], ecx ; Store the entry in the page table.

        add eax, 4096           ; Next physical page.
        add ebx, 4              ; Next entry.
        cmp ebx, 4096           ; Stop after 1024 entries (4096 bytes).
        jl .loop2

    ; Map the 4 MB at 3 GB (page directory entry 768) to the first physical 4 MB.
    mov eax, 0                  ; The first 4 MB.
    mov ebx, 0                  ; Needed for the loop.
    .loop3:
        mov ecx, eax
        and ecx, 0xFFFFF000     ; Keep the 4 KiB-aligned frame address.
        or ecx, 011b            ; End with 011 (supervisor, r/w, present).
        mov dword [0x9C000+ebx], ecx ; Store the entry in the page table.

        add eax, 4096           ; Next physical page.
        add ebx, 4              ; Next entry.
        cmp ebx, 4096           ; Stop after 1024 entries (4096 bytes).
        jl .loop3

    ; Zero the page directory.
    mov eax, 0x90000
    mov ebx, 0
    .loop4:
        mov dword [eax], 0

        add eax, 4
        add ebx, 4
        cmp ebx, 4096           ; Stop after 1024 entries (4096 bytes).
        jl .loop4

    ; Fill in the page directory.
    mov eax, 0x90000                ; The address of the page directory.
    mov ebx, 0x9A000
    or ebx, 011b
    mov dword [eax], ebx            ; Entry 0: virtual 0x00000000.
    mov ebx, 0x9B000
    or ebx, 011b
    mov dword [eax+4], ebx          ; Entry 1: virtual 0x00400000.
    mov ebx, 0x9C000
    or ebx, 011b
    mov dword [eax+3072], ebx       ; Entry 768 (3072 / 4): virtual 0xC0000000.

    ; Load the page directory.
    mov eax, 0x90000
    mov cr3, eax

    ; Enable paging by setting bit 31 (PG) in CR0.
    mov eax, cr0
    or eax, 0x80000000
    mov cr0, eax

    ; Jump to the higher half.
    call .get_eip
    add eax, 0xC0000000
    jmp eax

    ret

    ; Get the value of the EIP register.
    .get_eip:
        mov eax, [esp]
        ret

As you can see, I want to jump to the higher half, but QEMU reboots after the jump and reports that the instruction cannot be fetched because the page is not present. I think it is because I set up the page directory incorrectly, but I couldn't find the error. Can you help me, please?

Update: Found. I've changed 0xC0000000 to 0xC0000004 and everything works well.
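The page directory offset in the listing is consistent with the jump target: 0xC0000000 >> 22 is 768, and 768 * 4 is the 3072 used for that entry. A possible explanation for the fault is the jump itself rather than the mapping. `.get_eip` returns the address of the `add` instruction, so `jmp eax` lands on the higher-half alias of that same `add`, which then runs a second time and wraps EAX to an address 2 GiB above the original, where nothing is mapped. The C sketch below only reproduces that 32-bit arithmetic; the function names and any return address passed in are hypothetical.

```c
#include <stdint.h>

#define KERNEL_OFFSET 0xC0000000u

/* Index of the page directory entry that covers a virtual address
 * (with 4 KiB pages, each entry covers 4 MiB). */
uint32_t pde_index(uint32_t vaddr)
{
    return vaddr >> 22;
}

/* EAX after the first `add eax, 0xC0000000`: the higher-half alias of the
 * return address, which is the address of the `add` instruction itself. */
uint32_t first_jump_target(uint32_t return_address)
{
    return return_address + KERNEL_OFFSET;
}

/* EAX after the `add` runs again at the alias: 2 * 0xC0000000 wraps modulo
 * 2^32, leaving an address 0x80000000 above the original. */
uint32_t second_jump_target(uint32_t return_address)
{
    return first_jump_target(return_address) + KERNEL_OFFSET;
}
```

If this reading is right, the +4 variant lands inside the `add` instruction rather than after it, so whether it works depends on how those bytes happen to decode. Taking the address of a label placed after the `jmp`, and adding the offset to that, would avoid re-executing the `add`.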

Combuster · Post by **Combuster** » Sat Nov 15, 2014 4:46 pm

Get bochs, learn to debug.

Brendan · Post by **Brendan** » Sat Nov 15, 2014 5:21 pm

Hi,

Combuster wrote:
Brendan wrote:While this might be the "least slow" option; it's also over-complicated, and it's far simpler (and still faster) to just put the kernel where it should be to begin with.
And then you want to add security using ASLR, and you end up doing this anyway.

For 32-bit; ASLR doesn't work. There isn't enough entropy, so the end result of a brute force attack is either multiple attempts causing undefined behaviour and/or crashes followed by a successful attack (if the process restarts); or zero or more attempts that cause undefined behaviour (but not a crash) followed by either a successful attack (less likely) or a successful denial of service (if the process doesn't restart).

For 64-bit, there's much more entropy, but the end results are still "completely failing to actually fix any problem".

Cheers,

Brendan

Octocontrabass · Post by **Octocontrabass** » Sat Nov 15, 2014 6:47 pm

Combuster wrote:
ExeTwezz wrote:But how did you do this? It is logical to assume that when you switch to a task's page directory, the task switching code will no longer be there, and there may be other code, data, or an unmapped page at the EIP position.
It's a horrible assumption. Intel actually forbids you from changing the currently executing code by means of paging changes. The memory that holds the move to CR3 (or any control register) instruction must be the exact same memory before and after the move. Therefore, in the case of changing page tables the code you use to perform that move must be present in both address spaces.

Intel did allow it at one point, but ExeTwezz's code doesn't meet the old requirements either.

Antti · Post by **Antti** » Sun Nov 16, 2014 12:07 am

Nable wrote:It seems to me that there's one more option: relocatable files.

In general, I would avoid them as much as possible. Usually there is a preferred base address and if it is possible to use that specific address, relocations are not needed. The problem is right here. If relocations are not needed 99 percent of the time, it would make the remaining 1 percent very unstable (untested). If relocations are applied every time, it is slightly better.

OSDev.org
