OK, so I can use up to 7 IST stacks, whose RSP values are stored in the TSS. But is it really necessary for each task to allocate space for all the ISTs it uses? It would be more convenient to use the same ISTs for all tasks.
I also dislike the idea that the 64-bit TSS needs two consecutive descriptors, which wastes GDT entries, especially since I will not use addresses above 4G for the TSS anyway. I suspect that it is possible to load a 32-bit TSS while still in protected mode, switch to long mode, and then use the TSS as a 64-bit TSS. All the processor caches is the base (and limit) of the current TSS. It should have no idea that TR was loaded (from a single GDT entry) in protected mode. The same applies to the LDT, which I don't want to extend to 64 bits either, since it will never be above 4G. The ltr and lldt instructions should only validate the descriptors, not the memory referenced by the descriptors, so this should be safe.
This means the scheduler would do something like this (pseudo-code):
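switch_to_long_mode:
    ; still in protected mode: single 8-byte descriptors are enough
    mov ax,long_mode_ldt        ; one LDT shared by all 64-bit processes
    lldt ax
    mov ax,core_tss             ; one TSS per core
    ltr ax
    mov eax,new_cr3
    call switch_to_long_mode_and_load_cr3
    call patch_cpl0_stack       ; set the CPL0 stack of the new thread

switch_to_protected_mode:
    mov eax,new_cr3
    call switch_to_protected_mode_and_load_cr3
    ; back in protected mode: per-task LDT and TSS
    mov ax,new_ldt
    lldt ax
    mov ax,new_tss
    ltr ax

switch_between_long_mode_processes:
    ; LDTR and TR stay as they are; only CR3 and the CPL0 stack change
    mov eax,new_cr3
    mov cr3,eax
    call patch_cpl0_stack

switch_between_protected_mode_processes:
    mov eax,new_cr3
    mov cr3,eax
    mov ax,new_ldt
    lldt ax
    mov ax,new_tss
    ltr ax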
Note that in the example LDTR and TR are always loaded in protected mode, so they can use single (8-byte) descriptors.
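To make "single descriptors" concrete, here is a minimal sketch in C of how an 8-byte system descriptor for a TSS or LDT below 4G is encoded. The function and constant names are made up for illustration; only the bit layout comes from the architecture manuals. It says nothing about whether the processor keeps accepting a TR loaded this way after the switch to long mode, which is still only my suspicion.

```c
#include <assert.h>
#include <stdint.h>

/* System descriptor types as used by protected mode (S bit = 0). */
#define DESC_TYPE_LDT          0x2u
#define DESC_TYPE_TSS32_AVAIL  0x9u

/* Hypothetical helper: pack a legacy 8-byte system descriptor.
 * The base must be below 4G, which is why one GDT entry is enough.
 * DPL is left at 0 and the granularity bit is clear (byte limit). */
uint64_t make_system_descriptor(uint32_t base, uint32_t limit, uint8_t type)
{
    uint64_t d = 0;

    d |= (uint64_t)(limit & 0xFFFFu);             /* limit 15..0      */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* base 23..0       */
    d |= (uint64_t)(type & 0xFu) << 40;           /* type, S = 0      */
    d |= (uint64_t)1 << 47;                       /* present          */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* limit 19..16     */
    d |= (uint64_t)(base >> 24) << 56;            /* base 31..24      */
    return d;
}

/* Hypothetical helper: recover the 32-bit base from a descriptor. */
uint32_t descriptor_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xFFFFFFu) | ((d >> 56) << 24));
}

static_assert(sizeof(uint64_t) == 8, "one GDT slot is 8 bytes");
```

A TSS at 0x12345678 with limit 0x67 (the 104 bytes of a 64-bit TSS minus one) would then occupy one GDT slot instead of two.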
This does not by itself solve the issue of switching between long mode processes / threads, though. Perhaps a better idea is to use one LDT for all 64-bit processes, as the LDT will not be frequently used in long mode. One might also use one TSS per core, and patch the CPL0 stack on task switches. This will break some protected mode code, but I think it would be possible to solve.
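The one-TSS-per-core idea could look roughly like the sketch below, written in C rather than assembly. The struct follows the documented 64-bit TSS layout; everything else (the thread record, core_tss_init, and the body of patch_cpl0_stack) is an assumption for illustration and not existing kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS layout as documented in the Intel and AMD manuals. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];       /* RSP0..RSP2, stacks for CPL0..CPL2      */
    uint64_t reserved1;
    uint64_t ist[7];       /* IST1..IST7                             */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;   /* offset of the I/O permission bitmap    */
} __attribute__((packed));

static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");

/* Hypothetical per-thread record; only the field needed here. */
struct thread {
    uint64_t kernel_stack_top;
};

/* Set up the per-core TSS once. The ISTs belong to the core, so every
 * 64-bit task running on this core shares them. */
void core_tss_init(struct tss64 *tss, const uint64_t ist_tops[7])
{
    for (size_t i = 0; i < 7; i++)
        tss->ist[i] = ist_tops[i];
    /* Bitmap offset beyond the TSS limit: no I/O permission map. */
    tss->iomap_base = (uint16_t)sizeof(struct tss64);
}

/* On a switch between long mode threads only RSP0 needs to change. */
void patch_cpl0_stack(struct tss64 *core_tss, const struct thread *next)
{
    core_tss->rsp[0] = next->kernel_stack_top;
}
```

With this, a switch between two long mode threads touches only CR3 and one 8-byte field in the TSS, and nothing has to be allocated per task for the ISTs.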
Edit: Actually, only the futex implementation relies on TR contents being different between threads. Since the futex interface for 64-bit applications is likely to be revised anyway, this is no issue at all. 32-bit applications will run in protected mode, which will keep the one-TSS-per-task concept, and thus futexes will continue to work as expected.
Additionally, this logic solves the issue of the ISTs being valid before switching to long mode (because TR is loaded in protected mode) and still being valid after switching back to protected mode. IOW, the ISTs stay valid throughout the transition, so ltr doesn't need to be done with interrupts disabled in the same sequence as the switching procedure.
The issue I have with special IO-permission maps for my V86 process (this process does video-mode switches) is solved because this process executes in protected mode, and thus can have per-thread IO-permissions. There is no need for 64-bit processes to have specific IO-permission maps. The design actually still allows switching the video hardware without tailor-made graphics device-drivers, while still supporting 64-bit applications. 64-bit device drivers can run at CPL=1 or 2, and with IOPL=1 or 2 (IOPL must be at least as high as the CPL), so they can access any IO-port.
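As a sanity check on the CPL / IOPL reasoning, here is a small C model of the port access check for protected and long mode code. It is only a model of the rule (V86 mode, where the bitmap is always consulted, is left out); the function name and parameters are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the I/O permission check for non-V86 code.
 * bitmap      : I/O permission bitmap from the TSS, or NULL if none
 * bitmap_bits : number of ports covered by the bitmap
 * width       : access size in bytes (1, 2 or 4)                     */
bool io_access_allowed(unsigned cpl, unsigned iopl,
                       const uint8_t *bitmap, size_t bitmap_bits,
                       uint16_t port, unsigned width)
{
    /* Privileged enough: the bitmap is not consulted at all. */
    if (cpl <= iopl)
        return true;

    /* Otherwise every port touched must have a clear bit in the map. */
    for (unsigned i = 0; i < width; i++) {
        size_t p = (size_t)port + i;
        if (bitmap == NULL || p >= bitmap_bits)
            return false;                    /* not covered: denied  */
        if (bitmap[p / 8] & (1u << (p % 8)))
            return false;                    /* bit set: denied      */
    }
    return true;
}
```

So a driver at CPL=1 with IOPL=2 never needs a bitmap, while a CPL=3 thread does, which is the case that only the protected mode V86 process needs.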