[SOLVED] Questions about calling system calls from user mode

Posted: Sun Jan 19, 2014 3:34 pm
by wichtounet
Hi,

Thanks to a lot of forum posts and wiki pages, I've been able to run some code in ring 3, in long mode (my OS is in long mode and all user code will be in long mode).

I've now reached the point where I want to go from user mode -> kernel mode with system calls, with interrupts.

However, I can't get that working.

I have created a TSS structure and a GDT selector pointing to it. Just before I switch to user mode, I set the stack in RSP0 of the TSS. I've created an entry in the IDT for the syscall (the syscall works if called from ring 0 kernel code).

When the interrupt occurs in ring 3 user code, a Page Fault is thrown.

Here is the code making the switch:

Code:

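            // Use the current kernel stack as the ring 0 stack (RSP0) that the
            // CPU switches to when an interrupt arrives in ring 3.
            uint64_t rsp;
            asm volatile("mov %0, rsp;" : "=m" (rsp));
            gdt::tss.rsp0 = rsp;

            // Load the user data selector (RPL 3) into the data segment registers.
            asm volatile("mov ax, %0; mov ds, ax; mov es, ax; mov fs, ax; mov gs, ax;"
                :  //No outputs
                : "i" (gdt::USER_DATA_SELECTOR + 3)
                : "rax");

            // Build the iretq frame (SS, RSP, RFLAGS, CS, RIP) and jump to ring 3.
            asm volatile("push %0; push %1; pushfq; push %2; push %3; iretq"
                :  //No outputs
                : "i" (gdt::USER_DATA_SELECTOR + 3), "i" (0x500000 + paging::PAGE_SIZE * 2 - 64), "i" (gdt::USER_CODE_SELECTOR + 3), "r" (header->e_entry)
                : "rax");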
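
For illustration only, the frame that iretq consumes can be written down as a plain structure. This is a sketch, not the code above: `interrupt_frame`, `with_rpl`, `make_user_frame` and the selector values 0x18/0x20 are hypothetical names and numbers, and RFLAGS is set to 0x202 (reserved bit 1 plus IF) instead of reusing the current flags.

```cpp
#include <cstdint>

// The five quadwords iretq pops, in ascending address order.
// They are pushed in the reverse order: SS first, RIP last.
struct interrupt_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

// Replace the low two bits of a selector with the requested privilege level.
constexpr uint16_t with_rpl(uint16_t selector, unsigned rpl) {
    return static_cast<uint16_t>((selector & ~0x3u) | (rpl & 0x3u));
}

// Build a frame that enters ring 3 at `entry` with the given user stack.
// The selectors are passed in because their values depend on the GDT layout.
constexpr interrupt_frame make_user_frame(uint64_t entry, uint64_t stack_top,
                                          uint16_t user_code, uint16_t user_data) {
    return interrupt_frame{
        entry,
        with_rpl(user_code, 3),
        0x202,  // bit 1 is always set, bit 9 (IF) enables interrupts in user mode
        stack_top,
        with_rpl(user_data, 3)
    };
}
```

With hypothetical selectors 0x18 (user code) and 0x20 (user data), `make_user_frame(entry, stack, 0x18, 0x20)` yields CS = 0x1B and SS = 0x23, which corresponds to the `+ 3` in the code above.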
I have several questions about the switch to kernel mode via interrupts:
  1. Obviously, I'd be glad to know where the PF comes from ;)
  2. In long mode, it seems that there is no SS in the TSS structure, is there another place I should put it ? Or is it something I should do in the interrupt handling code ?
  3. For now, I've set the DPL of my syscall to 3, is that correct ? If so, how will the processor go to ring 0 ? If not, how will I call it from user mode ?
  4. I've seen in many posts that OSes were mapping the kernel into user space, is it related to this problem?
  5. Finally, the page fault handler of my kernel is not run, instead, I see the page fault in Bochs output and it causes a triple fault and reset. Is it directly related to the fact that I cannot run int in user mode code ?
Sorry about the long list of questions, but I'm kind of lost. It seems that every page I find only explains part of the problem, and very few of them talk about long mode.

Thanks

Re: Some questions about calling system calls from user mode

Posted: Sun Jan 19, 2014 4:16 pm
by iansjack
You'll find it a lot easier, and more efficient to use SYSCALL/SYSRET for system calls rather than interrupts.

Re: Some questions about calling system calls from user mode

Posted: Sun Jan 19, 2014 5:19 pm
by Owen
iansjack wrote:You'll find it a lot easier, and more efficient to use SYSCALL/SYSRET for system calls rather than interrupts.
But you need to get interrupts working anyway.

---

The kernel needs to be mapped into the entirety of every application's paging structures. After all, the system doesn't switch which page tables are used on taking an interrupt.

SS is loaded with 0000h when switching from user to kernel mode in long mode. Long mode is generally very "don't care" about SS in kernel mode.

Re: Some questions about calling system calls from user mode

Posted: Mon Jan 20, 2014 10:20 am
by wichtounet
iansjack wrote:You'll find it a lot easier, and more efficient to use SYSCALL/SYSRET for system calls rather than interrupts.
I'd like to start with interrupts before I go to SYSCALL/SYSRET.
Owen wrote:The kernel needs to be mapped into the entirety of every application's paging structures. After all, the system doesn't switch which page tables are used on taking an interrupt.

SS is loaded with 0000h when switching from user to kernel mode in long mode. Long mode is generally very "don't care" about SS in kernel mode.
I didn't change the paging structure for now, so isn't it already mapped to the application paging structure ?

Re: Some questions about calling system calls from user mode

Posted: Mon Jan 20, 2014 10:41 am
by xenos
What do the error code, page fault linear address and faulting instruction tell you about the page fault?
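
For reference, the error code pushed with a page fault is a small bit field, and the faulting linear address is in CR2. A minimal sketch of decoding the error code follows; the names `page_fault_info` and `decode_page_fault` are made up for illustration and only the five low bits are covered.

```cpp
#include <cstdint>

// Meaning of the low bits of the page fault error code.
struct page_fault_info {
    bool present;            // bit 0: 1 = protection violation, 0 = page not present
    bool write;              // bit 1: 1 = write access, 0 = read access
    bool user;               // bit 2: 1 = access made in user mode (CPL 3)
    bool reserved_bit;       // bit 3: a reserved bit was set in a paging entry
    bool instruction_fetch;  // bit 4: the fault happened on an instruction fetch
};

// Split the raw error code into its individual flags.
constexpr page_fault_info decode_page_fault(uint64_t error_code) {
    return page_fault_info{
        (error_code & 0x01) != 0,
        (error_code & 0x02) != 0,
        (error_code & 0x04) != 0,
        (error_code & 0x08) != 0,
        (error_code & 0x10) != 0
    };
}
```

An error code of 0x2, for example, decodes to a supervisor-mode write to a non-present page, which is what pushing an interrupt frame onto an unmapped kernel stack would typically produce.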

Re: Some questions about calling system calls from user mode

Posted: Mon Jan 20, 2014 12:30 pm
by wichtounet
I've been able to go further :)

Now the user mode code can call ints and the system call executes correctly :)

The page fault came from the RSP0 of the TSS, which was not correct: I did not separate the high and low parts of the RSP, so the value that ended up in the TSS was not the same.
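
One common cause of this kind of mismatch (an assumption here, since the thread does not show the TSS definition) is that RSP0 sits at byte offset 4 in the 64-bit TSS, so a naturally aligned `uint64_t` member is padded to offset 8 by the compiler. A sketch of a layout that avoids this by splitting every 64-bit slot into two 32-bit halves; `tss64`, `low32`, `high32` and `set_rsp0` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>

// 64-bit TSS as the CPU reads it (104 bytes). Only 32-bit and 16-bit
// members are used, so no padding is inserted and no packed attribute is needed.
struct tss64 {
    uint32_t reserved0;
    uint32_t rsp0_low;     // offset 4: low half of the ring 0 stack pointer
    uint32_t rsp0_high;    // offset 8: high half
    uint32_t rsp1_low;
    uint32_t rsp1_high;
    uint32_t rsp2_low;
    uint32_t rsp2_high;
    uint32_t reserved1[2];
    uint32_t ist[14];      // IST1..IST7 as low/high pairs
    uint32_t reserved2[2];
    uint16_t reserved3;
    uint16_t io_map_base;
};

static_assert(sizeof(tss64) == 104, "64-bit TSS must be 104 bytes");
static_assert(offsetof(tss64, rsp0_low) == 4, "RSP0 starts at offset 4");

constexpr uint32_t low32(uint64_t value) { return static_cast<uint32_t>(value); }
constexpr uint32_t high32(uint64_t value) { return static_cast<uint32_t>(value >> 32); }

// Store a 64-bit kernel stack pointer into the split RSP0 fields.
inline void set_rsp0(tss64& tss, uint64_t rsp) {
    tss.rsp0_low = low32(rsp);
    tss.rsp0_high = high32(rsp);
}
```

Marking a struct with a single `uint64_t rsp0` member as packed would be another way to get the same layout; the split version just makes the offsets visible.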

I now have to figure out how to return from the user mode function, because ret currently returns to 0x0, which does not contain code, but I'll look into it.

The fact that the kernel should be mapped in user mode paging is still unclear to me. If I understand well, you should have a different CR3 for each process, is that right ? And then, you need the CR3 of the process to map the kernel for the processor to be able to find the IDT/GDT, is that it ? So why map the complete kernel ? And then, can you map the kernel in supervisor mode (flag) ?

Thanks

Re: Some questions about calling system calls from user mode

Posted: Tue Jan 21, 2014 12:15 am
by zhiayang
wichtounet wrote: The fact that the kernel should be mapped in user mode paging is still unclear to me. If I understand well, you should have a different CR3 for each process, is that right ? And then, you need the CR3 of the process to map the kernel for the processor to be able to find the IDT/GDT, is that it ? So why map the complete kernel ? And then, can you map the kernel in supervisor mode (flag) ?

Thanks
Yes. Basically, each process should have its own CR3 with its own set of mappings. The kernel does need to be in every process's address space, at least for a monolithic design. The other option is to switch CR3 on process switch, which as you imagine would be expensive, since it usually invalidates most if not all the mappings. (However IIRC micro kernel do that)

Therefore yes, the kernel needs to be mapped. However, your understanding of the idt/GDT is slightly flawed; the descriptor tables don't need to be in the kernel, but they need to be mapped somewhere.

Also: you can map the kernel as ring 0 (supervisor pages), since the IDT gate for your system call makes the handler run at CPL 0. For a generic tutorial-abiding OS, you'd set the gate's selector to 0x8 (kernel CS) and its flags to 0xEE (present, DPL 3, interrupt gate).
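
As a rough sketch of what such a gate looks like in long mode (the names `idt_gate`, `make_gate` and `gate_dpl` are hypothetical, and the IST field is simply left at 0):

```cpp
#include <cstdint>

// One 16-byte IDT entry in long mode.
struct idt_gate {
    uint16_t offset_low;   // handler address bits 0..15
    uint16_t selector;     // code segment the handler runs in
    uint8_t  ist;          // bits 0..2: IST index, 0 = use RSP0 from the TSS
    uint8_t  flags;        // P, DPL and gate type
    uint16_t offset_mid;   // handler address bits 16..31
    uint32_t offset_high;  // handler address bits 32..63
    uint32_t reserved;
};

static_assert(sizeof(idt_gate) == 16, "long mode IDT entries are 16 bytes");

// Fill a gate for the given handler address, selector and flags.
constexpr idt_gate make_gate(uint64_t handler, uint16_t selector, uint8_t flags) {
    return idt_gate{
        static_cast<uint16_t>(handler & 0xFFFF),
        selector,
        0,
        flags,
        static_cast<uint16_t>((handler >> 16) & 0xFFFF),
        static_cast<uint32_t>(handler >> 32),
        0
    };
}

// DPL of the gate: the least privileged ring allowed to trigger it with `int`.
constexpr unsigned gate_dpl(uint8_t flags) {
    return (flags >> 5) & 0x3;
}
```

With flags 0xEE the gate DPL is 3, so ring 3 code may execute the `int`; the handler still runs at CPL 0 because the selector 0x8 names a ring 0 code segment.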

Finally, you would also need to map the kernel for your process switcher to work (if hooked up to an interrupt -- at this point sortiecat would like to remind you that they are separate things). However, you can choose to map only parts of the kernel. For instance, you might choose to map only the scheduling code and the system calls. That being said, it's often hard to determine which parts of the kernel would be used -- your scheduler might need a linked list library, which in turn might need some kind of heap code -- things get complex quickly, so at this point I'd suggest mapping the entire kernel.
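
A minimal sketch of "map the entire kernel into every process", under the assumption of a higher-half kernel that owns the upper 256 PML4 slots (a hypothetical layout, not the one used in this thread). The names `pml4_table`, `supervisor_only`, `user_accessible` and `share_kernel_mappings` are made up, and `std::array` stands in for the real page-table memory:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using pml4_table = std::array<uint64_t, 512>;

constexpr uint64_t PAGE_PRESENT = 1ULL << 0;
constexpr uint64_t PAGE_USER    = 1ULL << 2;  // U/S bit: set = reachable from ring 3

// Assumption: the kernel lives in the upper half of the address space.
constexpr std::size_t KERNEL_FIRST_SLOT = 256;

// Clear the U/S bit so the mapping is only usable at CPL 0..2.
constexpr uint64_t supervisor_only(uint64_t entry) {
    return entry & ~PAGE_USER;
}

// A top-level entry lets user code through only if it is present and has U/S set.
constexpr bool user_accessible(uint64_t entry) {
    return (entry & PAGE_PRESENT) && (entry & PAGE_USER);
}

// Copy the kernel's top-level entries into a process's PML4. Both tables then
// point at the same lower-level tables, so later kernel mappings made inside
// those tables show up in every process.
inline void share_kernel_mappings(const pml4_table& kernel, pml4_table& process) {
    for (std::size_t i = KERNEL_FIRST_SLOT; i < kernel.size(); ++i) {
        process[i] = supervisor_only(kernel[i]);
    }
}
```

Because user access requires the U/S bit at every level of the paging hierarchy, clearing it in the PML4 entry is enough to keep ring 3 out of the whole kernel range while the interrupt handlers, IDT and GDT stay mapped.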

Re: Some questions about calling system calls from user mode

Posted: Tue Jan 21, 2014 2:34 am
by wichtounet
Thanks for the great answer, it is much clearer now :)

Just one small point:
requimrar wrote:The other option is to switch CR3 on process switch, which as you imagine would be expensive, since it usually invalidates most if not all the mappings. (However IIRC micro kernel do that)
If each process has its own CR3, don't you have to change it on each process switch ? Otherwise, how would you know which one to use ?
Or did you just mean that you don't have to set the CR3 of the kernel again in each system call ?

Re: Some questions about calling system calls from user mode

Posted: Tue Jan 21, 2014 4:24 am
by Combuster
The intended observation was actually that you can keep the minimal amount of pages in the user's address space and then change CR3 on each switch between user and kernel land. But you'll still need some kernel code in each process so that you can actually make that switch.

And even in microkernels, it's just as sane to keep the entire kernel mapped in every process as it is in monolithics - The only observable difference is that you just have a lot less to map, and you only get additional task switches when driver code is invoked for the simple reason it's not part of the kernel any more.

Re: Some questions about calling system calls from user mode

Posted: Tue Jan 21, 2014 4:26 am
by wichtounet
Combuster wrote:The intended observation was actually that you can keep the minimal amount of pages in the user's address space and then change CR3 on each switch between user and kernel land. But you'll still need some kernel code in each process so that you can actually make that switch.

And even in microkernels, it's just as sane to keep the entire kernel mapped in every process as it is in monolithics - The only observable difference is that you just have a lot less to map, and you only get additional task switches when driver code is invoked for the simple reason it's not part of the kernel any more.
Ok, thanks, this time I got it :)

Re: Some questions about calling system calls from user mode

Posted: Tue Jan 21, 2014 5:53 am
by zhiayang
wichtounet wrote:Thanks for the great answer, it is much clearer now :)

Just one small point:
requimrar wrote:The other option is to switch CR3 on process switch, which as you imagine would be expensive, since it usually invalidates most if not all the mappings. (However IIRC micro kernel do that)
If each process has its own CR3, don't you have to change it on each process switch ? Otherwise, how would you know which one to use ?
Or did you just mean that you don't have to set the CR3 of the kernel again in each system call ?
EDIT: I meant you'd have to switch CR3 when doing a system call. Sorry for the confusion.

Re: Some questions about calling system calls from user mode

Posted: Tue Jan 21, 2014 6:27 am
by wichtounet
requimrar wrote:
wichtounet wrote:Thanks for the great answer, it is much clearer now :)

Just one small point:
requimrar wrote:The other option is to switch CR3 on process switch, which as you imagine would be expensive, since it usually invalidates most if not all the mappings. (However IIRC micro kernel do that)
If each process has its own CR3, don't you have to change it on each process switch ? Otherwise, how would you know which one to use ?
Or did you just mean that you don't have to set the CR3 of the kernel again in each system call ?
EDIT: I meant you'd have to switch CR3 when doing a system call. Sorry for the confusion.
No problem, I understood that afterwards ;) Thanks for the correction.