Adding 64-bit support to RDOS
I think I'll start a new thread now that PAE paging is done.
So, the first few steps have already been taken:
1. Physical memory allocation uses 64-bit addresses.
2. A new physical memory interface is implemented that uses bitmaps to manage large amounts of physical memory while staying below 4G in the linear address space.
3. PAE paging can be used instead of 32-bit paging.
4. A method to make syscalls from applications based on the SYSENTER interface rather than call-gates is implemented
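As a rough illustration of items 1 and 2, here is a minimal bitmap page allocator sketched in C: one bit per 4 KiB page, handing out 64-bit physical addresses while the bitmap itself is small enough to sit in 32-bit linear space. The names (`phys_bitmap`, `alloc_phys_page`, `free_phys_page`, `PHYS_NONE`) are hypothetical and not RDOS's actual interface.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define PHYS_NONE UINT64_MAX          /* "no page available" sentinel */

/* Hypothetical bitmap allocator: one bit per 4 KiB physical page,
   a set bit means the page is in use. */
typedef struct {
    uint32_t *bits;                   /* bitmap storage (32-bit linear space) */
    uint64_t  base;                   /* physical address of the first page */
    uint32_t  pages;                  /* number of pages tracked */
} phys_bitmap;

static void phys_bitmap_init(phys_bitmap *bm, uint32_t *storage,
                             uint64_t base, uint32_t pages)
{
    bm->bits = storage;
    bm->base = base;
    bm->pages = pages;
    memset(storage, 0, ((pages + 31) / 32) * sizeof(uint32_t));
}

/* Return the 64-bit physical address of a free page, or PHYS_NONE. */
static uint64_t alloc_phys_page(phys_bitmap *bm)
{
    for (uint32_t i = 0; i < bm->pages; i++) {
        uint32_t mask = 1u << (i % 32);
        if (!(bm->bits[i / 32] & mask)) {
            bm->bits[i / 32] |= mask;
            return bm->base + (uint64_t)i * PAGE_SIZE;
        }
    }
    return PHYS_NONE;
}

/* Mark the page at addr as free again. */
static void free_phys_page(phys_bitmap *bm, uint64_t addr)
{
    uint32_t i = (uint32_t)((addr - bm->base) / PAGE_SIZE);
    bm->bits[i / 32] &= ~(1u << (i % 32));
}
```

A real allocator would scan a word at a time and keep a search hint; the linear bit scan just keeps the sketch short.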
The next step concerns the SYSENTER interface. Supporting this interface also requires device-driver support, and currently only the APIC module supports SYSENTER. To work with the SYSENTER interface, device drivers must not reference the stack as a 16-bit stack, and some device drivers currently do that to save things on the stack. The SYSENTER interface uses a flat kernel stack so that SS does not have to be reloaded.
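For reference, SYSENTER takes its kernel selectors from the IA32_SYSENTER_CS MSR (0x174): CS is that value with RPL cleared and SS is implicitly CS + 8, both given fixed flat 4 GiB attributes without reading the GDT, while ESP comes from IA32_SYSENTER_ESP (0x175). So code entered this way always runs on a flat 32-bit stack. The C sketch below only computes the selectors involved (including those a 32-bit SYSEXIT loads); the struct and function names are made up.

```c
#include <stdint.h>

/* MSR numbers as documented in the Intel SDM. */
#define IA32_SYSENTER_CS  0x174u
#define IA32_SYSENTER_ESP 0x175u
#define IA32_SYSENTER_EIP 0x176u

typedef struct {
    uint16_t cs, ss;
} sel_pair;

/* Selectors loaded by SYSENTER: RPL forced to 0, SS is the next GDT slot. */
static sel_pair sysenter_selectors(uint64_t sysenter_cs_msr)
{
    sel_pair s;
    s.cs = (uint16_t)(sysenter_cs_msr & 0xFFFC);
    s.ss = (uint16_t)(s.cs + 8);
    return s;
}

/* Selectors loaded by a 32-bit SYSEXIT: MSR + 16 and MSR + 24, RPL forced to 3. */
static sel_pair sysexit32_selectors(uint64_t sysenter_cs_msr)
{
    sel_pair s;
    uint16_t base = (uint16_t)sysenter_cs_msr;
    s.cs = (uint16_t)((base + 16) | 3);
    s.ss = (uint16_t)((base + 24) | 3);
    return s;
}
```

This also shows why the GDT layout matters: kernel code, kernel stack, user code and user stack descriptors must sit in four consecutive slots.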
After that is done, the fun begins. I need to choose a proper assembler that supports 64-bit code and can handle my include files without trouble (I suspect NASM fails here). Then I need to write default exception handlers that dump register contents to the screen and halt the system. The crash debugger needs to become 64-bit aware so it can be invoked from long mode and display register contents in the proper formats. It must be able to handle some cores being in long mode and some in protected mode.
Re: Adding 64-bit support to RDOS
Hi,
rdos wrote: 4. A method to make syscalls from applications based on the SYSENTER interface rather than call-gates is implemented
For 64-bit code, SYSENTER won't work on AMD CPUs and you have to use SYSCALL. Also note that if a 64-bit process uses SYSCALL (or SYSENTER on an Intel CPU) the application's stack is likely to be above 4 GiB.
rdos wrote: I need to choose a proper assembler that supports 64-bit code and can handle my include files without trouble (I suspect NASM fails here).
NASM has supported 64-bit code for a long time now (since version 2.0 was released in 2007, I think). YASM is compatible with NASM (same syntax, preprocessor, etc.) and has supported 64-bit for a little longer than NASM.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Adding 64-bit support to RDOS
Brendan wrote: Hi,
rdos wrote: 4. A method to make syscalls from applications based on the SYSENTER interface rather than call-gates is implemented
For 64-bit code, SYSENTER won't work on AMD CPUs and you have to use SYSCALL. Also note that if a 64-bit process uses SYSCALL (or SYSENTER on an Intel CPU) the application's stack is likely to be above 4 GiB.
Yes, I know, but that should not be a problem. 64-bit code would use SYSCALL and end up in a long mode handler. The long mode handler will call the same protected mode procedure as the SYSENTER handler (via a retf out of long mode), which is why I need to port all APIs to SYSENTER. The position of the call stack above 4G makes no difference, as syscalls don't use call frames but are register-based. However, pointers must be converted so they reside below 4G, which would be done with paging. I need to add pointer translations to the syscall definitions as well, and might just as well do this at the same time as I add support for SYSENTER.
Maybe it would be better to do these modifications later, and first see what the SYSCALL procedure would look like and make sure it works properly?
Re: Adding 64-bit support to RDOS
SYSCALL works very similarly to SYSENTER, with the same hidden trouble regarding kernel stack handling for nested interrupts, etc., that we talked about a few months ago.
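For comparison, SYSCALL takes its kernel CS from IA32_STAR (MSR 0xC0000081), with SS implied as CS + 8, and in 64-bit mode jumps to IA32_LSTAR (0xC0000082), clearing the RFLAGS bits set in IA32_FMASK (0xC0000084). Unlike SYSENTER, it loads no kernel stack pointer at all, which is where the stack-handling trouble comes from. The C sketch below only packs and decodes the STAR selector fields; the helper names are made up.

```c
#include <stdint.h>

#define MSR_STAR  0xC0000081u
#define MSR_LSTAR 0xC0000082u
#define MSR_FMASK 0xC0000084u

/* Build IA32_STAR: bits 47:32 = SYSCALL CS, bits 63:48 = SYSRET selector base. */
static uint64_t make_star(uint16_t syscall_cs, uint16_t sysret_base)
{
    return ((uint64_t)sysret_base << 48) | ((uint64_t)syscall_cs << 32);
}

/* Selectors loaded by SYSCALL: CS = STAR[47:32] with RPL 0, SS = STAR[47:32] + 8. */
static uint16_t syscall_cs(uint64_t star) { return (uint16_t)((star >> 32) & 0xFFFC); }
static uint16_t syscall_ss(uint64_t star) { return (uint16_t)((star >> 32) + 8); }

/* Selectors loaded by a 64-bit SYSRET: CS = base + 16, SS = base + 8, RPL 3. */
static uint16_t sysret64_cs(uint64_t star) { return (uint16_t)(((star >> 48) + 16) | 3); }
static uint16_t sysret64_ss(uint64_t star) { return (uint16_t)(((star >> 48) + 8) | 3); }
```

Because the entry point inherits the user RSP, a 64-bit handler typically has to switch to a kernel stack itself before it can safely take nested interrupts.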
Re: Adding 64-bit support to RDOS
Hi,
rdos wrote: Maybe it would be better to do these modifications later, and first see what the SYSCALL procedure would look like and make sure it works properly?
I'd start by writing a dummy 64-bit executable (e.g. an almost empty piece of 64-bit code that does nothing more than "jmp $"). You've probably already got some sort of kernel code to start an executable. I'd extend that "start an executable" code so that it detects if the executable is 64-bit or not; and if the executable is 64-bit it would construct a 64-bit virtual address space for it.
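Detecting whether an executable is 64-bit might look like the C sketch below, assuming ELF images (RDOS may well use another format; a PE loader would check the COFF Machine field instead, 0x8664 for x86-64). The function and enum names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { EXE_UNKNOWN, EXE_32BIT, EXE_64BIT } exe_kind;

/* Classify an ELF image from its identification bytes and e_machine field.
   Only little-endian x86 images are recognised; anything else is unknown. */
static exe_kind classify_elf(const uint8_t *img, size_t len)
{
    if (len < 20 || img[0] != 0x7F || img[1] != 'E' || img[2] != 'L' || img[3] != 'F')
        return EXE_UNKNOWN;
    if (img[5] != 1)                               /* EI_DATA: 1 = little-endian */
        return EXE_UNKNOWN;
    uint16_t machine = (uint16_t)(img[18] | (img[19] << 8));  /* e_machine */
    if (img[4] == 2 && machine == 62)              /* ELFCLASS64, EM_X86_64 */
        return EXE_64BIT;
    if (img[4] == 1 && machine == 3)               /* ELFCLASS32, EM_386 */
        return EXE_32BIT;
    return EXE_UNKNOWN;
}
```

The loader would then pick the 64-bit address-space path only for `EXE_64BIT`.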
Next I'd implement an (almost empty) "64-bit kernel stub" and map that into all 64-bit virtual address spaces when they're created.
After that I'd provide some way for the 32-bit kernel to switch to long mode and pass control to the "64-bit kernel stub". The "64-bit kernel stub" (running at CPL=0) would pass control to the 64-bit executable (running at CPL=3).
The next piece would be adding a 64-bit IDT and "dummy" interrupt handlers; so that the 64-bit IRQ handlers switch back to protected mode and cause the normal IRQ handlers to be executed and then switch back to long mode. For some of these I'd be tempted to do native 64-bit interrupt handlers (e.g. the page fault handler because 32-bit code isn't going to handle a 64-bit CR2 or long mode paging tables; whatever interrupt you use for the "multi-CPU TLB shootdown" IPI; etc).
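For a sense of what the 64-bit IDT involves: each long mode gate is 16 bytes rather than 8, holds a full 64-bit handler offset, and has a 3-bit IST field for switching to a known-good stack. A C sketch of encoding one interrupt gate (type 0xE), with a hypothetical helper name and example values:

```c
#include <stdint.h>

/* One long mode IDT entry (16 bytes). */
typedef struct {
    uint16_t offset_low;   /* handler offset bits 15:0 */
    uint16_t selector;     /* 64-bit code segment selector */
    uint8_t  ist;          /* bits 2:0 = IST index, 0 = no stack switch */
    uint8_t  type_attr;    /* P | DPL | type (0xE = interrupt gate) */
    uint16_t offset_mid;   /* handler offset bits 31:16 */
    uint32_t offset_high;  /* handler offset bits 63:32 */
    uint32_t reserved;     /* must be zero */
} idt64_gate;

_Static_assert(sizeof(idt64_gate) == 16, "long mode IDT gates are 16 bytes");

static idt64_gate make_idt64_gate(uint64_t handler, uint16_t sel,
                                  uint8_t ist, uint8_t dpl)
{
    idt64_gate g;
    g.offset_low  = (uint16_t)handler;
    g.selector    = sel;
    g.ist         = ist & 7;
    g.type_attr   = (uint8_t)(0x80 | ((dpl & 3) << 5) | 0x0E); /* present interrupt gate */
    g.offset_mid  = (uint16_t)(handler >> 16);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}
```

The "dummy" handlers would point these gates at 64-bit stubs that forward to the protected mode IRQ code.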
I wouldn't worry about supporting SYSCALL (for 64-bit applications) until after all of the above is done. Once SYSCALL is working with a "do nothing" pretend kernel function; I'd start adding support for "switch to protected mode and call the legacy/32-bit kernel API function and then switch back to long mode" (which would be sort of similar to the interrupt handling). Of course there will be kernel API functions that don't make any sense as 32-bit code. For example, half the virtual memory management. For all of these I'd handle the kernel API function with native 64-bit code in the "64-bit kernel stub".
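The split described here, where some kernel API functions are handled natively in the 64-bit stub and the rest are forwarded to the legacy kernel, could be pictured as a dispatch table. Everything in this C sketch (`kapi_entry`, `kapi_dispatch`, the sample functions) is hypothetical, and the mode switch is faked with a plain call plus a counter so the sketch can run.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*kapi_fn)(uint64_t arg);

typedef struct {
    int     native64;   /* 1 = implemented in the 64-bit stub */
    kapi_fn fn;
} kapi_entry;

/* Sample functions: one left in the legacy kernel, one that only makes
   sense as native 64-bit code (virtual memory management). */
static uint64_t legacy_get_version(uint64_t arg) { (void)arg; return 9; }
static uint64_t native_round_alloc(uint64_t size) { return (size + 0xFFF) & ~0xFFFull; }

static const kapi_entry kapi_table[] = {
    { 0, legacy_get_version },
    { 1, native_round_alloc },
};

static unsigned legacy_calls;   /* counts forwarded calls, for illustration */

/* Stand-in for "switch to protected mode, call, switch back to long mode". */
static uint64_t switch_to_legacy_and_call(kapi_fn fn, uint64_t arg)
{
    legacy_calls++;
    return fn(arg);
}

static uint64_t kapi_dispatch(size_t nr, uint64_t arg)
{
    if (nr >= sizeof kapi_table / sizeof kapi_table[0])
        return UINT64_MAX;                       /* unknown function */
    const kapi_entry *e = &kapi_table[nr];
    return e->native64 ? e->fn(arg) : switch_to_legacy_and_call(e->fn, arg);
}
```

Moving a function to native code later is then just a matter of flipping its entry, which fits the incremental optimisation plan.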
Once all of that is working; I'd start optimising it by shifting more code from the legacy/32-bit kernel into native 64-bit code in the "64-bit kernel stub"; to avoid the overhead of constantly switching between long mode and protected mode (and completely screwing up all TLB entries in both directions). This would eventually include all interrupt handlers, all kernel API functions, etc.
Once enough has been shifted to 64-bit to allow 64-bit applications to run without any silly switching between long mode and protected mode, I'd port drivers to 64-bit. Then I'd create a completely separate/different "stripped down" version of the OS that only supports 64-bit applications and 64-bit drivers (i.e. the "64-bit stub" with no legacy/protected mode kernel at all).
Finally, I'd start adding support for 32-bit "flat" applications to the "64-bit stripped down version of the OS"; and remove the (now completely unnecessary) support for 64-bit applications from the legacy/protected mode kernel.
The end result would be a completely rewritten OS that is "good" (that supports 64-bit applications and drivers, and 32-bit "flat" applications and 32-bit "flat" drivers); and a completely separate OS that is the same as what you have now.
However; someone once said that the shortest path between 2 points is a straight line; and I have a feeling that there might be a much faster/easier way to get to the same "2 completely different versions of the OS" end result.
Cheers,
Brendan
Re: Adding 64-bit support to RDOS
Brendan wrote: Hi,
rdos wrote: Maybe it would be better to do these modifications later, and first see what the SYSCALL procedure would look like and make sure it works properly?
I'd start by writing a dummy 64-bit executable (e.g. an almost empty piece of 64-bit code that does nothing more than "jmp $"). You've probably already got some sort of kernel code to start an executable. I'd extend that "start an executable" code so that it detects if the executable is 64-bit or not; and if the executable is 64-bit it would construct a 64-bit virtual address space for it.
Next I'd implement an (almost empty) "64-bit kernel stub" and map that into all 64-bit virtual address spaces when they're created.
After that I'd provide some way for the 32-bit kernel to switch to long mode and pass control to the "64-bit kernel stub". The "64-bit kernel stub" (running at CPL=0) would pass control to the 64-bit executable (running at CPL=3).
The next piece would be adding a 64-bit IDT and "dummy" interrupt handlers; so that the 64-bit IRQ handlers switch back to protected mode and cause the normal IRQ handlers to be executed and then switch back to long mode. For some of these I'd be tempted to do native 64-bit interrupt handlers (e.g. the page fault handler because 32-bit code isn't going to handle a 64-bit CR2 or long mode paging tables; whatever interrupt you use for the "multi-CPU TLB shootdown" IPI; etc).
That seems pretty backwards from my point of view. You cannot do anything with your empty 64-bit executable without 64-bit support, since we start with a 32-bit OS. The 32-bit OS cannot create the 64-bit environment (in fact, the 64-bit loader must reside in a long mode driver).
Thus, the first step must be to run ordinary 32-bit applications in long mode using IA32e. This in turn requires functional 64-bit exception handlers and IRQ handlers.
I think I'll start by testing whether NASM can generate an ordinary 32-bit device driver. This device driver can then start a kernel thread, which could make the switch to long mode, set up the default exception handlers and so on, then tear down the environment and return.
Re: Adding 64-bit support to RDOS
I had some trouble with NASM, but I finally got my 32-bit driver built with NASM's binary output to work. First, it was tricky to get NASM to start at offset 0 after the header (this could be achieved with org 0xFFFFFFEE). Second, it appears that NASM cannot handle constants defined with the assignment (=) operator, so I had to generate new header files with gate numbers for NASM. After that, though, it works pretty well.
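The org 0xFFFFFFEE trick relies on 32-bit wraparound: if the header in front of the code is 18 bytes long (0x100000000 - 0xFFFFFFEE = 0x12), the first byte after it lands at offset 0. The 18-byte header size is inferred from the org value, not stated in the post; the C snippet below only checks that arithmetic, with made-up helper names.

```c
#include <stdint.h>

/* 32-bit offset of the byte at position pos in the output, given an org value.
   Unsigned 32-bit arithmetic wraps modulo 2^32. */
static uint32_t assembled_offset(uint32_t org, uint32_t pos)
{
    return (uint32_t)(org + pos);
}

/* org value that puts the first byte after an hdr-byte header at offset 0. */
static uint32_t org_for_header(uint32_t hdr)
{
    return 0u - hdr;
}
```

For a header of a different size, the org value simply becomes the two's complement of that size.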
Next, I'll try to switch to long mode and back to protected mode without generating triple faults.
There is an additional complication. To switch between protected mode and long mode (in either direction), I need to go through a stage with paging disabled. That means the switch must be made in a unity-mapped section of code, preferably at the bottom of physical memory. I think reserving 16 pages at the lower end of physical memory during boot-up, and then copying the NASM-based device driver there unity-mapped, could achieve this.
I'll also need to create a new kernel process, not a kernel thread, so I can manipulate the paging environment without affecting other kernel-mode threads.
Edit: I can now turn off paging and enable long mode, but the processor triple faults when paging is turned on for long mode. I built the new CR3 by adding another level on top of the PAE page translation structures, but this doesn't seem to work.
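When adding the extra level, one constraint is easy to miss: in PAE mode CR3 only requires the page-directory-pointer table to be 32-byte aligned (bits 31:5 hold its address), while a long mode PML4 entry stores the next-level address from bit 12 upward, so that table must be 4 KiB aligned, and any low CR3 bits copied into the entry are read as flags. The C sketch below builds a PML4 entry from a PAE CR3 value with that check; the function and macro names are hypothetical, and this is a constraint worth verifying rather than the diagnosed cause of the fault.

```c
#include <stdint.h>

#define PTE_P  (1ull << 0)   /* present */
#define PTE_RW (1ull << 1)   /* read/write */
#define PTE_US (1ull << 2)   /* user */

/* Build a PML4 entry for the PDPT referenced by a PAE-mode CR3 value.
   A PML4 entry cannot encode address bits 11:0, so a PDPT that is not
   4 KiB aligned is rejected (returns 0, i.e. not present). */
static uint64_t pml4e_from_pae_cr3(uint32_t pae_cr3, uint64_t flags)
{
    uint32_t pdpt = pae_cr3 & 0xFFFFFFE0u;    /* drop CR3 bits 4:0 */
    if (pdpt & 0xFFFu)
        return 0;
    return (uint64_t)pdpt | flags;
}
```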
Test code: (when the inner enable / disable paging is removed, the code works)
Code:
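mov ax,flat_sel
mov es,ax
mov edi,12000h ; build a PML4 at linear and physical address 12000h
mov eax,cr3
or ax,3 ; PML4 entry 0 -> current PAE page table ptr, present + rd/wr
stosd
xor eax,eax
mov ecx,1023 ; clear the rest of the page
rep stosd
;
mov ebp,cr3 ; save PAE CR3
cli
mov eax,cr0
and eax,7FFFFFFFh ; disable paging
mov cr0,eax
;
mov ecx,IA32_EFER
rdmsr
or eax,0x100 ; set EFER.LME
wrmsr
;
mov eax,12000h
mov cr3,eax ; load IA32e CR3
;
mov eax,cr0
or eax,80000000h ; enable paging, activating long mode
mov cr0,eax
;
mov edx,12345h ; arbitrary marker values
mov ecx,98765h
;
mov eax,cr0
and eax,7FFFFFFFh ; disable paging, leaving long mode
mov cr0,eax
;
mov ecx,IA32_EFER
rdmsr
and eax,0xFFFFFEFF ; clear EFER.LME
wrmsr
;
mov cr3,ebp ; restore PAE CR3
;
mov eax,cr0
or eax,80000000h ; re-enable PAE paging
mov cr0,eax
sti
int 3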
Re: Adding 64-bit support to RDOS
Now I know why the code above fails. It is not the switch to long mode that fails (that works perfectly well); the problem is re-entering protected mode with PAE paging. The definitions of the page table ptr entries differ somewhat between IA32e and PAE: in particular, bit 5 is the accessed bit in IA32e, but is reserved and must be 0 in PAE. Since the processor sets the accessed bit in the lowest page table ptr entry (which is always used) while in long mode, reloading the PAE CR3 afterwards triggers a general protection fault. This is not very interesting for practical purposes, as there will be no switches to or from PAE using the same CR3, but in the test the accessed bits in the page table ptr entries need to be cleared before re-enabling PAE paging.
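To make the fix concrete, here is a C sketch of converting page table ptr (PDPT) entries between the two formats. Under PAE paging a present entry may only have P (bit 0), PWT (bit 3) and PCD (bit 4) set among bits 0 to 8; bits 1, 2 and 5 to 8 are reserved, whereas IA32e uses bits 1, 2 and 5 as R/W, U/S and Accessed. The helper names are made up, and the upper reserved bits (above the processor's physical address width) are deliberately ignored.

```c
#include <stdint.h>

/* Low PDPT-entry bits that are reserved (must be 0) under PAE paging:
   bits 1, 2 and 5..8. */
#define PAE_PDPTE_RSVD_LOW 0x1E6ull

/* Would this entry pass the reserved-bit check when CR3 is loaded in
   PAE mode? Only present entries are checked. */
static int pae_pdpte_ok(uint64_t e)
{
    return !(e & 1) || !(e & PAE_PDPTE_RSVD_LOW);
}

/* IA32e entry -> PAE entry: clear R/W, U/S, Accessed and bits 6..8. */
static uint64_t pdpte_to_pae(uint64_t e)
{
    return e & ~PAE_PDPTE_RSVD_LOW;
}

/* PAE entry -> IA32e entry: replace the low flag byte with P | R/W | U/S. */
static uint64_t pdpte_to_ia32e(uint64_t e)
{
    return (e & ~0xFFull) | 0x07;
}
```

Keeping two separate tables, one per format, avoids having to convert back and forth at all.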
- Griwes
Re: Adding 64-bit support to RDOS
That's why you generally determine paging mode at boot and stick to it later... oh, I forgot RDOS is not a "general" OS, sorry.
Reaver Project :: Repository :: Ohloh project page
<klange> This is a horror story about what happens when you need a hammer and all you have is the skulls of the damned.
<drake1> as long as the lock is read and modified by atomic operations
Re: Adding 64-bit support to RDOS
There are some oddities in non-paged mode that I cannot understand (at least on my dual-core AMD): it returns strange values when reading unity-mapped memory. Because of this I changed the code so it uses two separate page table ptrs (the IA32e version is created by copying the PAE version and setting the lowest 3 bits of each entry). After that change everything works very well, and I've now entered and left IA32e mode without any faults.
New code:
Code:
;
mov edx,2000h
xor ebx,ebx
mov eax,cr3
or al,67h ; page flags: present, rd/wr, user, accessed, dirty
OsGate set_page_entry ; map PAE CR3 to linear address 2000h
;
mov ax,flat_sel
mov ds,ax
mov es,ax
;
mov esi,2000h
mov edi,11000h
mov ecx,400h
rep movsd ; copy page table ptr to IA32e version at linear and physical address 11000h
;
mov edi,11000h
mov al,7
stosb ; patch entry 0 to present, rd/wr and user mode
;
add edi,7
stosb ; entry 1
;
add edi,7
stosb ; entry 2
;
add edi,7
stosb ; entry 3
;
mov edi,12000h ; create IA32e CR3 block
mov eax,11007h ; PML4 entry 0 -> IA32e page table ptr at 11000h
stosd
xor eax,eax
mov ecx,1023 ; clear the rest of the page
rep stosd
;
mov edi,cr3 ; save PAE CR3
;
cli
mov eax,cr0
and eax,7FFFFFFFh ; disable paging
mov cr0,eax
;
mov ecx,IA32_EFER
rdmsr
or eax,0x100 ; set EFER.LME
wrmsr
;
mov eax,12000h
mov cr3,eax ; load IA32e CR3
;
mov eax,cr0
or eax,80000000h ; enable paging, activating long mode
mov cr0,eax
jmp longmode_code_sel:flush1 ; reload CS
flush1:
mov eax,2000h
mov ebx,[eax] ; read PAE entries 0 and 1 through the 2000h mapping
mov ebp,[eax+8]
;
mov eax,cr0
and eax,7FFFFFFFh ; disable paging, leaving long mode
mov cr0,eax
;
mov ecx,IA32_EFER
rdmsr
and eax,0xFFFFFEFF ; clear EFER.LME
wrmsr
;
mov cr3,edi ; restore PAE CR3
;
mov eax,cr0
or eax,80000000h ; re-enable PAE paging
mov cr0,eax
sti
int 3
Re: Adding 64-bit support to RDOS
Some further thoughts.
I think I'll add a new mode to CreateProcess, which currently supports protected mode and V86 mode. The new mode would be long mode: it will create a new CR3 for long mode and set a long mode flag in the thread-control block, indicating that the process should run in IA32e mode rather than in protected mode. This flag would be inherited by threads created in the process.
The scheduler will then need to be updated so it can switch mode when it reloads CR3 while switching between processes. It will XOR the long mode flags of the current thread and the new thread; if the result is 0, it can just reload CR3. Otherwise, it will call the long mode driver and make it switch either from long mode to protected mode or the reverse (reloading CR3 at the same time). After that is done, it can just do the ordinary thing.
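The XOR test in the scheduler can be sketched in C like this. The thread-control-block layout, the flag bit and the `load_cr3` / `switch_to_long_mode` / `switch_to_protected_mode` hooks are all hypothetical stand-ins that only record what would happen; they are not RDOS code.

```c
#include <stdint.h>

#define TCB_LONG_MODE 0x1u   /* hypothetical flag bit in the thread-control block */

typedef struct {
    uint32_t flags;
    uint32_t cr3;            /* PAE or IA32e root, depending on the flag */
} tcb;

/* Hooks that only record what a real scheduler would do. */
static int mode_switches;
static uint32_t current_cr3;

static void load_cr3(uint32_t cr3) { current_cr3 = cr3; }
static void switch_to_long_mode(uint32_t cr3) { mode_switches++; current_cr3 = cr3; }
static void switch_to_protected_mode(uint32_t cr3) { mode_switches++; current_cr3 = cr3; }

/* Threads created in a process inherit its mode flag and CR3. */
static tcb new_thread_in(const tcb *parent) { return *parent; }

static void switch_address_space(const tcb *cur, const tcb *next)
{
    if (((cur->flags ^ next->flags) & TCB_LONG_MODE) == 0)
        load_cr3(next->cr3);                 /* same mode: just reload CR3 */
    else if (next->flags & TCB_LONG_MODE)
        switch_to_long_mode(next->cr3);      /* protected -> long, new CR3 */
    else
        switch_to_protected_mode(next->cr3); /* long -> protected, new CR3 */
}
```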
Re: Adding 64-bit support to RDOS
Some more progress: I can now write "11" at the top of the screen from 64-bit mode.
Unfortunately, I'm no longer able to return to compatibility mode, as I cannot find a way to do that. However, I think it is better to implement the exception handlers first and let them write the register contents; then I'll know why the code fails to return to compatibility mode.
Code:
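bits 32
jmp long_kernel_code_sel:test64 ; far jump into a 64-bit code segment
bits 64
test64:
mov rbx,0xB8000 ; VGA text-mode buffer
mov eax,0x7310731 ; two cells: '1' with attribute 07h
mov [rbx],eax
stopl:
jmp stopl ; spin forever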
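The constant in the test code packs two VGA text-mode cells into one 32-bit store: each cell is a character byte followed by an attribute byte, so 0x07310731 is '1' (0x31) with attribute 07h (light grey on black), twice. A tiny C helper that produces the same value (names are illustrative):

```c
#include <stdint.h>

/* One VGA text cell: low byte = character, high byte = attribute. */
static uint16_t vga_cell(char ch, uint8_t attr)
{
    return (uint16_t)((uint8_t)ch | (attr << 8));
}

/* Two adjacent cells as a single little-endian dword store. */
static uint32_t vga_cell_pair(uint16_t first, uint16_t second)
{
    return (uint32_t)first | ((uint32_t)second << 16);
}
```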
- Owen
Re: Adding 64-bit support to RDOS
rdos wrote: Unfortunately, I'm no longer able to return to compatibility mode, as I cannot find a way to do that.
Long mode's compatibility submode is entered by loading a 32-bit or 16-bit code segment (CS.L=0). When this is done, the basic segmentation behavior (i.e. excluding system descriptors) follows protected mode.
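Which submode a far transfer lands in is decided by the target code descriptor: L=1 (with D=0) selects 64-bit mode, while L=0 selects compatibility mode, with D choosing 32-bit or 16-bit defaults. The C sketch below builds flat GDT code descriptors to show the difference; the helper name is hypothetical.

```c
#include <stdint.h>

#define DESC_G (1ull << 55)  /* 4 KiB granularity */
#define DESC_D (1ull << 54)  /* 32-bit default operand size (must be 0 when L=1) */
#define DESC_L (1ull << 53)  /* 64-bit code segment */

/* Flat (base 0, 4 GiB limit) execute/read code descriptor with the given DPL. */
static uint64_t flat_code_desc(unsigned dpl, int long_mode)
{
    uint64_t d = 0xFFFFull                                       /* limit 15:0 */
               | (0xFull << 48)                                  /* limit 19:16 */
               | ((uint64_t)(0x9A | ((dpl & 3) << 5)) << 40)     /* P, DPL, S, type */
               | DESC_G;
    return d | (long_mode ? DESC_L : DESC_D);
}

static int is_64bit_code(uint64_t desc) { return (desc & DESC_L) != 0; }
```

Jumping or returning to a selector whose descriptor was built with `long_mode = 0` is what puts the CPU back in compatibility mode.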
Re: Adding 64-bit support to RDOS
Owen wrote:
rdos wrote: Unfortunately, I'm no longer able to return to compatibility mode, as I cannot find a way to do that.
Long mode's compatibility submode is entered by loading a 32-bit or 16-bit code segment (CS.L=0). When this is done, the basic segmentation behavior (i.e. excluding system descriptors) follows protected mode.
After some thought, I think part of the problem was that the stack is invalid. The thread had a segmented SS:ESP, but long mode ignores the SS base and uses only the offset, which probably caused page faults when I tried retf. I also tried jmp far [rbx], but none of the memory layouts I used gave anything other than triple faults. It's bad design that jmp seg:offset is not supported.
- Owen
Re: Adding 64-bit support to RDOS
rdos wrote: It's bad design that jmp seg:offset is not supported.
JMP FAR indirect is supported, but note that it behaves differently between AMD and Intel:
- On AMD, a REX.W prefix is ignored; there is no jmp far seg16:off64 (it is interpreted as seg16:off32)
- On Intel, a REX.W prefix is honored; there is a jmp far seg16:off64
If memory serves, JMP FAR direct is unsupported.
The solution, of course, is to make sure you switch to a flat SS before entering long mode. If your compatibility/legacy mode stack is non-flat, then it will of course be necessary to adjust it on entry (and preferably reload SS with the null selector, as this makes various bookkeeping tasks simpler).
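For the indirect far jump, the memory operand is the offset followed by the 16-bit selector. Given that AMD ignores REX.W here, the portable choice for getting back to compatibility mode is the m16:32 form (6 bytes) with a target offset below 4 GiB, which a 32-bit compatibility mode target has anyway. The C layout below is an illustrative sketch using `#pragma pack` (supported by GCC, Clang and MSVC) to get the packed memory image; the type and function names are made up.

```c
#include <stdint.h>

#pragma pack(push, 1)
/* m16:32 operand: 4-byte offset followed by the selector (6 bytes). */
typedef struct {
    uint32_t offset;
    uint16_t selector;
} far_ptr32;

/* m16:64 operand (REX.W form, honored only on Intel): 10 bytes. */
typedef struct {
    uint64_t offset;
    uint16_t selector;
} far_ptr64;
#pragma pack(pop)

/* Fill an m16:32 operand; returns 0 if the target cannot be encoded. */
static int make_far_ptr32(far_ptr32 *p, uint16_t sel, uint64_t target)
{
    if (target > 0xFFFFFFFFull)
        return 0;
    p->offset = (uint32_t)target;
    p->selector = sel;
    return 1;
}
```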