Page directory switching while task switching.
Hi,
I've written a task switching function that saves the stack pointer of the current task, selects the next task, and returns that task's kernel stack pointer. The IRQ0 handler, which calls the task switching function, loads this value into ESP and pops the registers. In short, I have task switching with only kernel-mode tasks (I'm not worried about naming yet, but I'd prefer to call them threads).
Now I want my tasks to run in user mode as well. I know that I'll need one TSS in my GDT to switch to userspace, and that each task needs its own address space (page directory). The problem is that when I tried to switch the page directory in the task switching function, I got a page fault, since my kernel is not higher half. I also know that I'll need two additional segments, ring 3 code and data segments, and that I'll need to load CS and the data segment registers with the userspace selectors while task switching.
I think I can map the kernel not to high memory but to the first physical 4 MB, or more depending on my kernel size (as I did when initializing paging: I mapped the first virtual 4 MB to the first physical 4 MB), and put the user program after the kernel (e.g. at 2 or 4 MB, since my kernel is still small). Is this a reasonable idea?
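The scheduling step described in this post can be sketched in C roughly as follows. This is a minimal round-robin sketch, not the poster's actual code: `struct task`, `task_add`, `switch_task`, and `MAX_TASKS` are hypothetical names, and it assumes the IRQ0 stub pushes the registers, passes the resulting ESP in, and loads the returned value into ESP before popping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 8u              /* hypothetical fixed limit */

struct task {
    uint32_t kernel_esp;          /* saved kernel stack pointer */
    int      in_use;              /* non-zero if this slot holds a task */
};

static struct task tasks[MAX_TASKS];
static size_t current_task;       /* index of the running task */

/* Register a task with its initial kernel stack pointer.
 * Returns the slot index, or -1 if the table is full. */
int task_add(uint32_t initial_esp)
{
    for (size_t i = 0; i < MAX_TASKS; i++) {
        if (!tasks[i].in_use) {
            tasks[i].kernel_esp = initial_esp;
            tasks[i].in_use = 1;
            return (int)i;
        }
    }
    return -1;
}

/* Called from the (assumed) IRQ0 stub with the interrupted task's ESP.
 * Saves it, picks the next used slot round-robin, and returns the ESP
 * the stub should load. */
uint32_t switch_task(uint32_t esp)
{
    assert(tasks[current_task].in_use);   /* guarantees the loop ends */

    tasks[current_task].kernel_esp = esp;

    size_t next = current_task;
    do {
        next = (next + 1) % MAX_TASKS;
    } while (!tasks[next].in_use);

    current_task = next;
    return tasks[next].kernel_esp;
}
```

With two registered tasks, successive calls alternate between their saved stack pointers. Switching CR3 would be one extra step at the point where the next task is chosen, which is where the question in this thread arises.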
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: Page directory switching while task switching.
ExeTwezz wrote: I got page fault since my kernel is not higher half.
Nonsense. I have paging, but not a higher half kernel. You only got a page fault because you don't have a clue what you're doing.
Get yourself pen and paper, and write down where in physical memory and in each virtual address space everything has to go.
Re: Page directory switching while task switching.
Combuster wrote: Nonsense. I have paging, but not a higher half kernel. You only got a page fault because you don't have a clue what you're doing. Get yourself pen and paper, and write down where in physical memory and each virtual address spaces everything has to go.
But how did you do this? It is logical to assume that when you switch to a task's page directory, the task switching code will no longer be mapped there, and that at the EIP position there may be other code, data, or an unmapped page.
Re: Page directory switching while task switching.
Hi,
ExeTwezz wrote: It is logical to assume, that when you'll switch to a task page directory, you'll not have task switching code, but you may have another code or data or not-mapped page at the EIP position.
It's logical to assume that the task switching code needs to be mapped at the same virtual address in both address spaces. For most kernels, the entire "kernel space" is mapped into every address space.
ExeTwezz wrote: I think I can just map the kernel not to the high memory, but to the first physical 4 MB or more depending on my kernel size (as I mapped it when I was initializing paging: mapped the first virtual 4 MB to the physical first 4 MB). And, have a user program after the kernel (e.g. at 2 or 4 MB since my kernel is small yet). Is it a normal idea?
It's not abnormal to have the idea. The problem is that if you ever want to change the amount of space the kernel uses, it breaks existing applications.
For example, imagine that you've got 3 kernels:
- A 32-bit kernel intended for small/embedded/old systems, that only uses 16 MiB of space (from 0x00000000 to 0x00FFFFFF)
- A normal 32-bit kernel intended for desktop/server, that uses 512 MiB of space (from 0x00000000 to 0x1FFFFFFF)
- A normal 64-bit kernel intended for desktop/server, that uses 512 GiB of space (from 0x0000000000000000 to 0x0000007FFFFFFFFF)
Now imagine that you've got 3 kernels:
- A 32-bit kernel intended for small/embedded/old systems, that only uses 16 MiB of space (from 0xFF000000 to 0xFFFFFFFF)
- A normal 32-bit kernel intended for desktop/server, that uses 512 MiB of space (from 0xE0000000 to 0xFFFFFFFF)
- A normal 64-bit kernel intended for desktop/server, that uses 512 GiB of space (from 0xFFFFFF8000000000 to 0xFFFFFFFFFFFFFFFF)
Please note that while your kernel may be a small binary, you will need a lot of space for things like buffers and caches (especially if a VFS cache is involved), and various other data structures. Also note that this is space, not necessarily RAM.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
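The difference between the two sets of example kernels above can be made concrete with a small calculation. The following is only an illustrative sketch (`struct kernel_layout` and `user_space_start` are hypothetical names, and it assumes the kernel occupies either the very bottom or the very top of the address space): with the kernel at the bottom, the lowest user address moves whenever the kernel's size changes, while with the kernel at the top it is always 0.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel region of a virtual address space; both bounds are inclusive. */
struct kernel_layout {
    uint64_t kernel_start;
    uint64_t kernel_end;
};

/* Lowest virtual address available to user space.
 * Assumption: the kernel sits at the bottom (kernel_start == 0)
 * or at the top of the address space, never in the middle. */
uint64_t user_space_start(const struct kernel_layout *k)
{
    assert(k->kernel_start <= k->kernel_end);
    return (k->kernel_start == 0) ? k->kernel_end + 1 : 0;
}
```

For the three low layouts this gives 0x01000000, 0x20000000, and 0x0000008000000000 respectively, so a program linked to start at one of them does not fit the others. For the three high layouts it gives 0 every time.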
Re: Page directory switching while task switching.
Brendan wrote: It's not abnormal to have the idea. The problem is that if you want to change the amount of space the kernel uses it breaks existing applications.
Why does it break? Say the programs are in the ELF format, so the kernel can always load them after itself. And an ELF executable contains symbols rather than the addresses of variables, so you can load it anywhere.
Re: Page directory switching while task switching.
Hi,
Brendan wrote: It's not abnormal to have the idea. The problem is that if you want to change the amount of space the kernel uses it breaks existing applications.
ExeTwezz wrote: Why does it break? Say the programs are in the ELF format, so the kernel may load them always after itself. And, in the ELF executable there are symbols, but not addresses of variables, so you can load it anywhere.
Depending on how it's been compiled, there are "position dependent" ELF files and "position independent" ELF files.
A position dependent ELF file must be loaded at the address it was compiled/linked to run at.
A "position independent" ELF file can be loaded at any address, but you pay for this with overhead (it's slow, especially for 32-bit 80x86 code where there is no "RIP relative" addressing).
There is another alternative, which is to use segmentation and have different GDT/LDT descriptors for each process. This is also slow (every kernel API function that takes a pointer from user space ends up diddling with "user space segments", every switch between CPL=0 and CPL=3 involves slow segment register loads, you can't use SYSCALL/SYSENTER, etc). This will also not work for the 64-bit kernel because you can't use a 64-bit segment base for a 32-bit code or data segment.
Basically, you have 4 choices:
- A compatibility disaster, where changing kernel space size breaks applications
- Slow, with all applications using position independent code
- Slow, with segmentation
- Putting the kernel at the higher end of the virtual address space, where there are no disadvantages
Cheers,
Brendan
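To illustrate the first choice in the list above, here is a hedged sketch of the check a loader might make before mapping a segment of a position dependent executable. The function name `segment_fits_user_space` and its parameters are hypothetical. The segment's linked address and size would come from the ELF program header, and the user range from wherever the kernel defines it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if [vaddr, vaddr + memsz) lies entirely inside the
 * user range [user_start, user_end] (user_end is inclusive).
 * Written so that the 32-bit arithmetic cannot overflow. */
bool segment_fits_user_space(uint32_t vaddr, uint32_t memsz,
                             uint32_t user_start, uint32_t user_end)
{
    assert(user_start <= user_end);

    if (vaddr < user_start || vaddr > user_end)
        return false;
    if (memsz == 0)
        return true;              /* empty segment, nothing to map */
    return memsz - 1 <= user_end - vaddr;
}
```

A program linked at 0x01000000 passes this check when a low kernel ends at 0x00FFFFFF, but fails once the kernel grows to end at 0x1FFFFFFF. With the kernel at the top of the address space the same program passes in both cases.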
Re: Page directory switching while task switching.
Brendan wrote: Depending on how it's been compiled; there are "position dependent" ELF files and "position independent" ELF files.
It seems to me that there's one more option: relocatable files. It's not the easy way, but I think it should be mentioned for completeness.
Re: Page directory switching while task switching.
Hi,
Brendan wrote: Depending on how it's been compiled; there are "position dependent" ELF files and "position independent" ELF files.
Nable wrote: It seems to me that there's one more option: relocatable files. It's not an easy way but I think that it should be mentioned for completeness.
You're right.
For relocatable files, you fix up the relocations when the application is loaded. This avoids the overhead of position independence.
However, because you modify the executable you have to load all of the code into "writeable" pages of RAM, then fix up all the relocations, then change the pages to "read-only" and invalidate affected TLB entries. This adds overhead to process startup time. In addition, because you've modified it, you can no longer memory map the executable file's code. This means you can't just load the pieces that are actually used when they're actually needed (to improve process startup time and avoid wasting RAM) and also means that if you're running low on RAM you can't free the pages of code and load them back from the file on disk if/when they're needed again (and would have to write them to swap space before freeing the physical page, and load the data from swap space if the page is needed again).
To minimise these problems; it's probably possible to do "lazy relocation" where the executable's code is memory mapped, and the relocations are fixed in the page fault handler (after the data is loaded from disk but before returning to the code that triggered the page fault). This is messy (the page fault handling in a typical OS is complicated enough without the hassle of handling relocations), and is still not free of additional overhead (which would include the RAM consumed by keeping the information needed to do the relocations somewhere that the page fault handler can access it).
While this might be the "least slow" option; it's also over-complicated, and it's far simpler (and still faster) to just put the kernel where it should be to begin with.
Cheers,
Brendan
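The load-time fix-up step described above can be sketched as follows. This is a deliberately simplified, hypothetical relocation format (a flat list of byte offsets, each naming a 32-bit absolute address inside the image), in the spirit of "add the load delta to every recorded word" rather than a real ELF relocation parser. `apply_relative_relocs` and its parameters are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Patch every 32-bit word listed in reloc_offsets by the difference
 * between the address the image was linked for and the address it
 * was actually loaded at. The image must be writable while this runs. */
void apply_relative_relocs(uint8_t *image, size_t image_size,
                           const uint32_t *reloc_offsets, size_t count,
                           uint32_t link_base, uint32_t load_base)
{
    uint32_t delta = load_base - link_base;   /* wraps modulo 2^32 */

    for (size_t i = 0; i < count; i++) {
        uint32_t off = reloc_offsets[i];
        uint32_t value;

        /* Each patched word must lie fully inside the image. */
        assert(image_size >= sizeof value);
        assert(off <= image_size - sizeof value);

        /* memcpy avoids unaligned access and aliasing problems. */
        memcpy(&value, image + off, sizeof value);
        value += delta;
        memcpy(image + off, &value, sizeof value);
    }
}
```

Because every listed word is rewritten in place, the pages holding the code no longer match the file on disk, which is why the post above notes that they can no longer be memory mapped or simply discarded and reloaded.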
- Combuster
Re: Page directory switching while task switching.
ExeTwezz wrote: But how did you do this? It is logical to assume, that when you'll switch to a task page directory, you'll not have task switching code, but you may have another code or data or not-mapped page at the EIP position.
It's a horrible assumption. Intel actually forbids you from changing the currently executing code by means of paging changes. The memory that holds the move to CR3 (or any control register) instruction must be the exact same memory before and after the move. Therefore, when changing page tables, the code you use to perform that move must be present in both address spaces.
Brendan wrote: While this might be the "least slow" option; it's also over-complicated, and it's far simpler (and still faster) to just put the kernel where it should be to begin with.
And then you want to add security using ASLR, and you end up doing this anyway.
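One common way to keep the CR3-switching code present in every address space is to give each new page directory the same kernel entries. The sketch below assumes a 32-bit, non-PAE layout with the kernel occupying page directory entries 768 and up (virtual 0xC0000000 and above). That split, and the names `init_process_page_directory` and `KERNEL_PDE_FIRST`, are assumptions for illustration. The arrays stand in for page-aligned page directory frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PDE_COUNT        1024u   /* entries in a 32-bit page directory */
#define KERNEL_PDE_FIRST 768u    /* assumption: kernel at 0xC0000000+ */

/* Prepare a page directory for a new process: user entries start out
 * empty, kernel entries are copied so that kernel code (including the
 * code that writes CR3) is mapped identically in every address space. */
void init_process_page_directory(uint32_t *new_pd, const uint32_t *kernel_pd)
{
    assert(new_pd != NULL && kernel_pd != NULL);

    /* User part: not present. */
    memset(new_pd, 0, KERNEL_PDE_FIRST * sizeof(uint32_t));

    /* Kernel part: share the kernel's page tables. */
    memcpy(new_pd + KERNEL_PDE_FIRST,
           kernel_pd + KERNEL_PDE_FIRST,
           (PDE_COUNT - KERNEL_PDE_FIRST) * sizeof(uint32_t));
}
```

Because the copied entries point at the same page tables, later changes inside those tables are visible in every process. Adding a brand new kernel page table afterwards would still have to be propagated to each directory.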
Re: Page directory switching while task switching.
Hi,
So, OK, I decided to have a higher half kernel. I'm not very good at accessing and editing memory in assembly, but after an hour of coding I got the code below. I want to jump to the higher half, but QEMU reboots right after the jump and reports that the instruction is not available because the page is not present. I think I set up the page directory incorrectly, but I couldn't find the error. Can you help me, please?
Code:
; Initialize paging.
init_paging:
; Identity map the first 4 MB.
mov eax, 0 ; Contains the address of the physical page.
mov ebx, 0 ; Needed for the loop.
.loop1:
mov ecx, eax
and ecx, 0xFFFFF000 ; Align down to a 4 KiB page boundary.
or ecx, 011b ; End with 011 (supervisor, r/w, present).
mov dword [0x9A000+ebx], ecx ; Store the entry in the page table.
cmp ebx, 4096
jge .loop1_end
add eax, 4096
add ebx, 4
jmp .loop1
.loop1_end:
; Identity map the second 4 MB.
mov eax, 4194304
mov ebx, 0 ; Needed for the loop
.loop2:
mov ecx, eax
and ecx, 0xFFFFF000 ; Align down to a 4 KiB page boundary.
or ecx, 011b ; End with 011 (supervisor, r/w, present).
mov dword [0x9B000+ebx], ecx ; Store the entry in the page table.
cmp ebx, 4096
jge .loop2_end
add eax, 4096
add ebx, 4
jmp .loop2
.loop2_end:
; Map the 4 MB at virtual 0xC0000000 (3 GB, page directory index 768) to the first physical 4 MB.
mov eax, 0 ; The first 4 MB.
mov ebx, 0 ; Needed for the loop.
.loop3:
mov ecx, eax
and ecx, 0xFFFFF000 ; Align down to a 4 KiB page boundary.
or ecx, 011b ; End with 011 (supervisor, r/w, present).
mov dword [0x9C000+ebx], ecx
cmp ebx, 4096
jge .loop3_end
add eax, 4096
add ebx, 4
jmp .loop3
.loop3_end:
; Zero the page directory.
mov eax, 0x90000
mov ebx, 0
.loop4:
mov dword [eax], 0
cmp ebx, 4096
jge .loop4_end
add eax, 4
add ebx, 4
jmp .loop4
.loop4_end:
; Fill in the page directory.
mov eax, 0x90000 ; The address of the page directory.
mov dword ebx, 0x9A000
or ebx, 011b
mov dword [eax], ebx ; The 1st entry.
mov dword ebx, 0x9B000
or ebx, 011b
mov dword [eax+4], ebx ; The 2nd entry.
mov dword ebx, 0x9C000
or ebx, 011b
mov dword [eax+3072], ebx ; Entry 768 (0xC0000000 >> 22).
; Load the page directory.
mov eax, 0x90000
mov cr3, eax
; Enable paging by setting bit 31 (PG) in CR0.
mov eax, cr0
or eax, 0x80000000
mov cr0, eax
; Jump to the higher half.
call .get_eip
add eax, 0xC0000000
jmp eax
ret
; Get the value of the EIP register.
.get_eip:
mov eax, [esp]
ret
Update: Found. I've changed 0xC0000000 to 0xC0000004 and everything works well.
Last edited by ExeTwezz on Sat Nov 15, 2014 4:47 pm, edited 1 time in total.
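For readers following the listing above, the index arithmetic and the table-filling loops can be cross-checked with a short C sketch. The names here (`pde_index`, `pte_index`, `fill_page_table`) are hypothetical, and the sketch is not a translation of the poster's code: it assumes 4 KiB pages without PAE, and it writes exactly 1024 entries per table, whereas each loop in the listing stores an entry before comparing `ebx` with 4096 and therefore writes 1025.

```c
#include <assert.h>
#include <stdint.h>

#define PT_ENTRIES   1024u      /* entries per page table or directory */
#define PAGE_BYTES   4096u      /* 4 KiB pages */
#define PTE_PRESENT  0x1u
#define PTE_WRITABLE 0x2u

/* Index into the page directory: top 10 bits of the address. */
static inline uint32_t pde_index(uint32_t vaddr)
{
    return vaddr >> 22;
}

/* Index into the page table: middle 10 bits of the address. */
static inline uint32_t pte_index(uint32_t vaddr)
{
    return (vaddr >> 12) & 0x3FFu;
}

/* Fill one page table so that it maps 4 MiB of physical memory
 * starting at phys_base, as supervisor, read/write, present. */
void fill_page_table(uint32_t table[PT_ENTRIES], uint32_t phys_base)
{
    assert((phys_base & (PAGE_BYTES - 1)) == 0);   /* page aligned */

    for (uint32_t i = 0; i < PT_ENTRIES; i++)
        table[i] = (phys_base + i * PAGE_BYTES) | PTE_PRESENT | PTE_WRITABLE;
}
```

For 0xC0000000 the directory index is 768, and 768 * 4 = 3072 is the byte offset of that entry, which matches the `[eax+3072]` store in the listing.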
- Combuster
Re: Page directory switching while task switching.
Get Bochs, learn to debug.
Re: Page directory switching while task switching.
Hi,
Brendan wrote: While this might be the "least slow" option; it's also over-complicated, and it's far simpler (and still faster) to just put the kernel where it should be to begin with.
Combuster wrote: And then you want to add security using ASLR, and you end up doing this anyway.
For 32-bit, ASLR doesn't work. There isn't enough entropy, so the end result of a brute force attack is either multiple attempts causing undefined behaviour and/or crashes followed by a successful attack (if the process restarts), or zero or more attempts that cause undefined behaviour (but not a crash) followed by either a successful attack (less likely) or a successful denial of service (if the process doesn't restart).
For 64-bit, there's much more entropy, but the end results are still "completely failing to actually fix any problem".
Cheers,
Brendan
-
- Member
- Posts: 5590
- Joined: Mon Mar 25, 2013 7:01 pm
Re: Page directory switching while task switching.
ExeTwezz wrote: But how did you do this? It is logical to assume, that when you'll switch to a task page directory, you'll not have task switching code, but you may have another code or data or not-mapped page at the EIP position.
Combuster wrote: It's a horrible assumption. Intel actually forbids you from changing the currently executing code by means of paging changes. The memory that holds the move to CR3 (or any control register) instruction must be the exact same memory before and after the move. Therefore, in the case of changing page tables the code you use to perform that move must be present in both address spaces.
Intel did allow it at one point, but ExeTwezz's code doesn't meet the old requirements either.
Re: Page directory switching while task switching.
Nable wrote: It seems to me that there's one more option: relocatable files.
In general, I would avoid them as much as possible. Usually there is a preferred base address, and if that address can be used, relocations are not needed. That is exactly the problem: if relocations are not needed 99 percent of the time, the relocation path only runs in the remaining 1 percent of cases, so it stays poorly tested and unreliable. If relocations are applied every time, the situation is slightly better.