Adding 64-bit support to RDOS

Owen · Post by **Owen** » Wed Nov 07, 2012 9:14 am

The upper-half-preserving behavior of the 16-bit registers causes major pipeline stalls. Specifically, the register renamer treats rH, rL, rX, ErX and RrX as the same register (otherwise the logic in there gets very messy and complex, balloons in size, and becomes slower). In particular, in code like the following:

Code:

mov eax, [some_mem_location]
; ... do something with eax ...
; ... perhaps some more code ...
mov ax, [some_value]            ; writes only AX, so the upper half of EAX must be preserved
; ... do something with ax ...

the "mov ax" instruction will stall waiting on the completion of the previous eax-related operations, even if the code has no further dependency upon the higher half of EAX, because it needs to preserve it. You will get better performance by doing "movzx eax, word ptr [some_value]" because that way the false dependency is eliminated.

The "higher half cleared" convention increases the performance of all code. Every architecture designer wants to do things this way; indeed, Intel wanted to do the same with regard to SSE instructions and the AVX YMM/ZMM registers... until they discovered Windows device drivers that used SSE in contexts where the AVX registers weren't preserved (the drivers manually saved the previous values); hence, we have ended up in a mess.

rdos · Post by **rdos** » Wed Nov 07, 2012 10:13 am

I solved the issue for the crash debugger. I had to assign an interrupt (0x84) for the crash gate function. This solves the issue in that the interrupt handler for 64-bit mode will save the full register set before 32-bit code can clobber the higher halves, while the protected mode interface uses a 32-bit interrupt handler which only saves the 32-bit registers.

BTW, how are stack faults handled in long mode? In protected mode I set up the stack fault handler as a TSS, but since hardware task-switching is not supported in long mode, this no longer works to make the code execute in a known-valid environment. Is there a solution for this, or is a triple fault inevitable when the stack is invalid in kernel space?

AJ · Post by AJ » Wed Nov 07, 2012 10:22 am

Hi,

Have a look at the IST entries in the Intel Manuals.

Cheers,
Adam

rdos · Post by **rdos** » Wed Nov 07, 2012 3:49 pm

OK, so I can use up to 7 IST stacks, whose stack pointers (RSP values) are stored in the TSS. But is it really necessary to let each task allocate space for all used ISTs themselves? It would be more convenient to use the same ISTs for all tasks.

I also dislike the idea that the 64-bit TSS needs two consecutive descriptors, which wastes GDT entries, especially since I will not place the TSS above 4G anyway. I suspect that it is possible to load a 32-bit TSS while still in protected mode, switch to long mode, and then use the TSS as a 64-bit TSS. The only thing the processor caches is the base of the current task; it should have no idea that TR was loaded (from a single GDT entry) in protected mode. The same applies to the LDT, which I don't want to extend to 64 bits either, since it will never be above 4G. The ltr and lldt instructions should only validate the descriptors, not the memory referenced by the descriptors, so this should be safe.

This means the scheduler would work like this (some pseudo-code):

Code:


switch_to_long_mode:
    mov ax,long_mode_ldt                        ; LDT and TR are loaded while still in
    lldt ax                                     ; protected mode (single descriptors)
    mov ax,core_tss
    ltr ax
    mov eax,new_cr3
    call switch_to_long_mode_and_load_cr3
    call patch_cpl0_stack                       ; set the CPL0 stack for the new thread

switch_to_protected_mode:
    mov eax,new_cr3
    call switch_to_protected_mode_and_load_cr3
    mov ax,new_ldt                              ; back in protected mode: per-task LDT and TSS
    lldt ax
    mov ax,new_tss
    ltr ax

switch_between_long_mode_processes:
    mov eax,new_cr3
    mov cr3,eax
    call patch_cpl0_stack                       ; TR is not reloaded, only the CPL0 stack changes

switch_between_protected_mode_processes:
    mov eax,new_cr3
    mov cr3,eax
    mov ax,new_ldt
    lldt ax
    mov ax,new_tss
    ltr ax

Note that in the example the LDT and TR are always loaded in protected mode, so they can use single descriptors.

This will not, however, solve the issue of switching between long mode processes / threads. Perhaps a better idea is to use one LDT for all 64-bit processes, as the LDT will not be frequently used in long mode. One might also use one TSS per core, and patch the CPL0 stack on task switches. This will break some protected mode code, but I think it would be possible to solve.
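The one-TSS-per-core idea can be sketched in C roughly as follows. This is a minimal model, not RDOS code: `struct thread`, `struct core`, `MAX_CORES` and this C version of `patch_cpl0_stack` are hypothetical names, and only the RSP0 field of the 64-bit TSS is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CORES 4                 /* hypothetical core count for this sketch */

/* Start of the 64-bit TSS. Packed (GCC/Clang attribute) because RSP0 sits
 * at byte offset 4. The remaining TSS fields are omitted in this sketch. */
struct tss64_head {
    uint32_t reserved0;
    uint64_t rsp0;                  /* stack loaded on a CPL3 -> CPL0 transition */
} __attribute__((packed));

struct thread {
    uint64_t kernel_stack_top;      /* hypothetical per-thread CPL0 stack */
};

struct core {
    struct tss64_head tss;          /* one TSS per core, TR loaded once */
    struct thread *current;
};

struct core cores[MAX_CORES];

/* Called on every task switch instead of reloading TR. */
void patch_cpl0_stack(unsigned core_id, struct thread *next)
{
    assert(core_id < MAX_CORES);
    cores[core_id].tss.rsp0 = next->kernel_stack_top;
    cores[core_id].current = next;
}
```

With this arrangement TR never changes after boot, so nothing that identifies the running thread can be derived from TR any more.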

Edit: Actually, only the futex implementation relies on TR contents being different between threads. Since the futex interface for 64-bit applications is likely to be revised anyway, this is no issue at all. 32-bit applications will run in protected mode, which will keep the one-TSS per task concept, and thus futexes will continue to work as expected.

Additionally, this logic also solves the issue of ISTs being valid before switching to long mode (because they are loaded in protected mode), and being valid after switching to protected mode. IOW, the transitional state of ISTs is ensured so ltr doesn't need to be done with interrupts disabled in the same sequence as the switching procedure.

The issue I have with special IO-permission maps for my V86 process (because this process does video-mode switches) is solved because this process executes in protected mode, and thus can have per-thread IO permissions. There is no need for 64-bit processes to have specific IO-permission maps. The design actually still allows switching the video hardware without tailor-made graphics device drivers, while still supporting 64-bit applications. 64-bit device drivers can run at CPL=1 or 2, and with IOPL=1 or 2, so they can access any IO port.

Owen · Post by **Owen** » Wed Nov 07, 2012 5:32 pm

Even in protected mode, most OSes just patch the TSS and LDT entries in the GDT at process switch time. Long mode is engineered around that assumption; along with the removal of hardware context switching, the use for multiple TSSes disappeared. If it would have reduced the required amount of silicon, I do not doubt that they would have got rid of LTR and replaced it with a WRMSR to a newly invented "TSS_BASE_MSR".

The fact that the system descriptor GDT entries doubled in size (to accommodate the length of the base) shouldn't be an issue, because there is only a small, fixed number of them.
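For reference, the doubled descriptor can be sketched in C like this. It is only an illustration of the 16-byte layout for an available 64-bit TSS (type 9, present, DPL 0); `struct sys_desc64` and `make_tss_descriptor` are made-up names, not taken from any real kernel.

```c
#include <assert.h>
#include <stdint.h>

/* A long mode system descriptor occupies two consecutive 8-byte GDT slots. */
struct sys_desc64 {
    uint64_t low;    /* same bit layout as a legacy 8-byte descriptor */
    uint64_t high;   /* base[63:32] in the low dword, upper dword reserved (0) */
};

_Static_assert(sizeof(struct sys_desc64) == 16, "two GDT slots");

/* Build a descriptor for an available 64-bit TSS.
 * 0x89 = present, DPL 0, system type 9. Byte granularity, so limit < 1 MiB. */
struct sys_desc64 make_tss_descriptor(uint64_t base, uint32_t limit)
{
    struct sys_desc64 d;

    assert(limit <= 0xFFFFF);
    d.low  = (uint64_t)(limit & 0xFFFF)                 /* limit[15:0]  */
           | ((base & 0xFFFFFF) << 16)                  /* base[23:0]   */
           | ((uint64_t)0x89 << 40)                     /* access byte  */
           | ((uint64_t)((limit >> 16) & 0xF) << 48)    /* limit[19:16] */
           | (((base >> 24) & 0xFF) << 56);             /* base[31:24]  */
    d.high = base >> 32;                                /* base[63:32]  */
    return d;
}
```

For a TSS below 4G the second slot is all zeros, which is the case where the extra entry looks wasted; the low half is bit-for-bit what a single legacy descriptor with the same access byte would hold.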

rdos · Post by **rdos** » Thu Nov 08, 2012 9:21 am

I think I need two ISTs, which should have different stacks for different cores so these can happen simultaneously with no bad effects. One IST would be for stack fault, and the other would be for double fault. Other than that, I see no real use for ISTs, primarily since they would not be recursive.

Owen · Post by **Owen** » Thu Nov 08, 2012 9:53 am

The IST exists pretty much exactly for the case of a double fault (and other similar "emergency" scenarios). You'll probably want a TSS per core (many OSes handle this by having a GDT per core).

The IST is a table of 7 stack pointers ("IST 0" is used to refer to the normal TSS rsp0/1/2). Unless you have 7 emergency scenarios, you only need one.
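As a rough picture of where those seven pointers live, here is a C sketch of the 64-bit TSS layout with compile-time offset checks. The struct and the helper `tss_set_ist` are illustrative names only; the `packed` attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS. Packed because the 8-byte fields start at offset 4. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];        /* RSP0..RSP2: the normal privilege-level stacks */
    uint64_t reserved1;
    uint64_t ist[7];        /* IST1..IST7 */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iopb_offset;   /* offset of the IO-permission bitmap */
} __attribute__((packed));

_Static_assert(offsetof(struct tss64, rsp) == 0x04, "RSP0 offset");
_Static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 offset");
_Static_assert(offsetof(struct tss64, iopb_offset) == 0x66, "IOPB offset");
_Static_assert(sizeof(struct tss64) == 0x68, "TSS size");

/* IST indices are 1-based in gate descriptors; 0 means "no IST". */
void tss_set_ist(struct tss64 *tss, unsigned index, uint64_t stack_top)
{
    assert(index >= 1 && index <= 7);
    tss->ist[index - 1] = stack_top;
}
```

With one such TSS per core, each core gets its own emergency stack simply by storing a different `stack_top` in the same IST slot.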

bluemoon · Post by **bluemoon** » Thu Nov 08, 2012 10:02 am

If you use syscall, you need IST for IRQ handlers to avoid the following problem:

1. usermode -> syscall
2. enter ring0 with rsp -> application space (note that syscall does not alter rsp)
3. in the syscall stub you typically do this somewhere:

Code:

    mov     rsp, qword [k_TSS + tss64_rsp0]     ; switch to kernel stack

However, if you are lucky enough, an IRQ can arrive between steps 2 and 3, and your IRQ handler would not get the stack pointed to by the TSS, since a ring0->ring0 transition does not trigger a stack switch.

ps. There may be other solutions to this issue, but IST is convenient for this, and you can have the same IST per IRQ for all processes within the same core.

Brendan · Post by **Brendan** » Thu Nov 08, 2012 11:35 am

Hi,

bluemoon wrote: ps. There may be other solution to solve this issue but IST is convenient for this, and you can have same IST per IRQ for all processes within same core.

For long mode (but not protected mode), there's a "SYSCALL Flag Mask" MSR (MSR 0xC0000084) which allows any flags in EFLAGS to be masked, including IF. If this mask is configured to clear IF, it prevents normal IRQs from interrupting before you've switched to the kernel's stack (and means that you only really need to use IST for NMI and Machine Check).
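A small C model of what the mask does may help. It assumes the documented SYSCALL behaviour (every RFLAGS bit that is set in the mask MSR is cleared on entry); the function names are made up for the example, and clearing DF as well is just one possible choice. A real kernel would write the mask with WRMSR at boot, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_SFMASK  0xC0000084u     /* "SYSCALL Flag Mask" MSR */
#define RFLAGS_IF   (1u << 9)       /* interrupt enable flag */
#define RFLAGS_DF   (1u << 10)      /* direction flag */

/* Mask a kernel might program: enter with IRQs off and DF clear. */
uint32_t kernel_sfmask(void)
{
    return RFLAGS_IF | RFLAGS_DF;
}

/* Model of the flag update done by SYSCALL: bits set in the mask are cleared. */
uint64_t rflags_after_syscall(uint64_t rflags, uint32_t sfmask)
{
    return rflags & ~(uint64_t)sfmask;
}
```

For example, a user-mode RFLAGS value of 0x202 (IF set) becomes 0x002 on entry with this mask, so no maskable IRQ can arrive before the stub has loaded the kernel stack.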

Cheers,

Brendan

Antti · Post by **Antti** » Thu Nov 08, 2012 11:53 am

Brendan wrote:For long mode (but not protected mode), there's a "SYSCALL Flag Mask" MSR (MSR 0xC0000084) which...

Yes, that is very useful. I also use it to clear the Direction Flag. I wonder how the "IST IRQ scenario" handles nested interrupts. Is it so that the "IST interrupt" always jumps to the "beginning" of that stack? Then nested interrupts are not handled very well... I am sorry if I have misunderstood the issue. IRQs are not ISTed for me.

rdos · Post by **rdos** » Thu Nov 08, 2012 4:22 pm

Brendan wrote: Hi,

bluemoon wrote: ps. There may be other solution to solve this issue but IST is convenient for this, and you can have same IST per IRQ for all processes within same core.

For long mode (but not protected mode), there's a "SYSCALL Flag Mask" MSR (MSR 0xC0000084) which allows any flags in EFLAGS to be masked, including IF. If this mask is configured to clear IF, it prevents normal IRQs from interrupting before you've switched to the kernel's stack (and means that you only really need to use IST for NMI and Machine Check).

OK, but when using SYSENTER in protected mode, the kernel entry point is called with interrupts disabled, so this is no issue there either. Additionally, you set SS:ESP in an MSR, so the most effective way is to reprogram the ESP MSR in the scheduler when the current thread changes.

It seems sensible to clear IF if the entry point is called with the application rsp loaded. OTOH, it might not be problematic to let IRQs run with the application stack either. They shouldn't be able to tell the difference, unless they run out of stack space.

rdos · Post by **rdos** » Thu Nov 08, 2012 4:25 pm

Antti wrote: Yes, that is very useful. I also use it to clear Direction Flag. I wonder how "IST IRQ scenario" handles nested interrupts. Is it so that the "IST interrupt" always jumps to the "beginning" of that stack. So nested interrupts are not handled very well... I am sorry if I have misunderstood the issue. IRQs are not ISTed for me.

I've wondered about this too. If an IRQ using IST always triggers a change of stack, then nesting IRQs using the same IST would produce chaotic results. But maybe I've misunderstood this issue as well?
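To make the concern concrete, here is a toy C model of the delivery step. It assumes, as the Intel manuals describe, that a gate with a non-zero IST index reloads RSP from its TSS slot unconditionally; everything else (names, the 64-word stack, storing only RIP) is simplification for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 64
#define FRAME_WORDS 5               /* SS, RSP, RFLAGS, CS, RIP */

struct ist_model {
    uint64_t stack[STACK_WORDS];    /* memory backing one IST stack */
    size_t   top;                   /* value held in the TSS IST slot (as an index) */
    size_t   rsp;                   /* current stack pointer (as an index) */
};

void ist_model_init(struct ist_model *m)
{
    memset(m, 0, sizeof *m);
    m->top = STACK_WORDS;
    m->rsp = STACK_WORDS;
}

/* Deliver an interrupt through an IST gate. RSP is reloaded from the IST
 * slot every time, then the frame is pushed. Returns the index where the
 * interrupted RIP was stored (the lowest word of the frame). */
size_t deliver_via_ist(struct ist_model *m, uint64_t interrupted_rip)
{
    m->rsp = m->top;                /* no check for "already on this stack" */
    m->rsp -= FRAME_WORDS;
    m->stack[m->rsp] = interrupted_rip;
    return m->rsp;
}
```

Delivering twice without returning in between puts both frames at the same index, so the second delivery overwrites the first handler's return frame. That is the chaotic result in question when interrupts sharing one IST slot are allowed to nest.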

Owen · Post by **Owen** » Thu Nov 08, 2012 4:55 pm

rdos wrote:
Antti wrote: Yes, that is very useful. I also use it to clear Direction Flag. I wonder how "IST IRQ scenario" handles nested interrupts. Is it so that the "IST interrupt" always jumps to the "beginning" of that stack. So nested interrupts are not handled very well... I am sorry if I have misunderstood the issue. IRQs are not ISTed for me.
I've wondered about this too. If an IRQ using IST always triggers a change of stack, then nesting IRQs using the same IST would produce chaotic results. But maybe I've misunderstood this issue as well?

The IST exists to serve IRQs/exceptions where you can't be certain of the stack's safety: NMI, #MC and #DF. None of these should be nesting (all three are "end of days" scenarios, except perhaps NMI if you're using it for IPIs).
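Selecting an IST stack is done per vector, in the IDT gate itself. A hedged C sketch of the 16-byte long mode gate follows; `struct idt_gate64` and `make_idt_gate` are illustrative names, and 0x8E (present, DPL 0, interrupt gate) is just the common choice for kernel-only vectors.

```c
#include <assert.h>
#include <stdint.h>

enum { VEC_NMI = 2, VEC_DF = 8, VEC_MC = 18 };   /* architectural vector numbers */

/* 64-bit IDT gate descriptor (16 bytes, no padding with natural alignment). */
struct idt_gate64 {
    uint16_t offset_low;    /* handler address bits 15:0 */
    uint16_t selector;      /* code segment selector */
    uint8_t  ist;           /* bits 2:0 = IST index, 0 = use the normal stack rules */
    uint8_t  type_attr;     /* 0x8E = present, DPL 0, interrupt gate */
    uint16_t offset_mid;    /* handler address bits 31:16 */
    uint32_t offset_high;   /* handler address bits 63:32 */
    uint32_t reserved;
};

_Static_assert(sizeof(struct idt_gate64) == 16, "gate size");

struct idt_gate64 make_idt_gate(uint64_t handler, uint16_t selector, unsigned ist)
{
    struct idt_gate64 g;

    assert(ist <= 7);
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = selector;
    g.ist         = (uint8_t)ist;
    g.type_attr   = 0x8E;
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}
```

A kernel following the advice above would build the gates for `VEC_NMI`, `VEC_MC` and `VEC_DF` with a non-zero `ist` and leave it at 0 for everything else.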

rdos · Post by **rdos** » Fri Nov 09, 2012 1:09 am

Owen wrote: The IST exists to serve IRQs/Exceptions where you can't be certain of the stack's safety - NMI, MC# and #DF. None of these should be nesting (the three of them - except perhaps NMI, if you're using it for IPIs, perhaps - are "end of days" scenarios).

I think it could be pretty good to have an IST for stack fault as well, as it can avoid double faults when kernel stack is corrupt, while still being presented with the correct fault information. This is unlike using a task gate for stack fault in protected mode, which has several problems, one of them being that the register state is in a TSS, and that it cannot be nested. OTOH, if the stack fault handler generates another stack fault, things become really nasty with ISTs as well, so perhaps not?

Owen · Post by **Owen** » Fri Nov 09, 2012 2:44 am

In long/64-bit mode, any stack fault in user mode is going to be reported as a page fault (since paging is the only method of bounding the stack), as with any other addressing fault. The same applies in kernel mode, but since the page fault handler would be unable to push anything onto the stack, it would turn into a double fault.

Erm, just make sure your kernel stacks are big enough?
