I'm working on system calls now, but I don't really understand how they work.
I prefer AMD's syscall instruction, but I want to return to different privilege levels. After searching, I found there is (probably) an "asymmetrical" way: use syscall to enter the kernel, then a faked iretq (five pushes followed by "iretq") to return. I don't know whether this actually works, so could anybody tell me? Or is there a better way? (Examples would be easier to understand.)
Do you mean you want the privilege level after the system call to be different to the one before it? Why would you want that?
To my mind, a system call is a way for a program with user privilege to access resources with supervisor privilege. But you wouldn't want that escalation to remain after the call.
Well, in my design, processes are divided into services and user tasks. Services run at PL1, and user tasks run at PL3. But both services and user tasks need to call into the kernel.
I guess that's why most OS designs use just two privilege levels. Paging, essentially, forces this design choice. And a design that doesn't use paging is - IMO - a questionable design.
Js2xxx wrote: So after searching, I find there's (probably) an "asymmetrical" way - syscall to call, and fake iretq ("push" 5 times and "iretq") to return - certainly I have no idea about this, so could anybody tell me? Or is there a better way? (Examples are more understandable.)
I can't think of any real reason why this couldn't work. However, depending on various details, the kernel may not be able to easily determine whether the caller was at CPL=3 or CPL=1. For example, if the callers share a virtual address space (which is the only case where using CPL=1 instead of IOPL makes sense to me) and could use similar address ranges, you can't just do "if(return_RIP < ...) { // Assume caller was CPL=3". If the kernel can't figure out the caller's privilege level, it can't figure out how to return from the system call. There would be major security issues too, e.g. determining whether the caller should or shouldn't be able to use a "rdmsr syscall".
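One way around the "which ring did the caller come from" problem is to not infer it at all, but to record it per task and build the iretq frame from that record. The following is only a sketch under stated assumptions: the selector values, `struct task`, and `build_return_frame` are hypothetical names invented for illustration, and it assumes the kernel has already saved the caller's RIP (from RCX), RFLAGS (from R11), and RSP on syscall entry.

```c
#include <stdint.h>

/* Hypothetical GDT selectors (RPL bits clear); a real GDT may be laid out differently. */
enum {
    SEL_SERVICE_CODE = 0x18, /* assumed DPL=1 code descriptor */
    SEL_SERVICE_DATA = 0x20, /* assumed DPL=1 data descriptor */
    SEL_USER_CODE    = 0x28, /* assumed DPL=3 code descriptor */
    SEL_USER_DATA    = 0x30  /* assumed DPL=3 data descriptor */
};

/* The five values iretq pops in 64-bit mode, lowest address first. */
struct iretq_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* Hypothetical per-task record; the ring is fixed when the task is created. */
struct task {
    uint8_t ring; /* 1 for services, 3 for user tasks */
};

/* Fill `out` so that iretq returns to the ring recorded for the task.
 * Returns 0 on success, -1 if the recorded ring is not one we return to. */
static int build_return_frame(const struct task *t, uint64_t rip,
                              uint64_t rflags, uint64_t rsp,
                              struct iretq_frame *out)
{
    uint64_t code, data;

    if (t->ring == 1) {
        code = SEL_SERVICE_CODE;
        data = SEL_SERVICE_DATA;
    } else if (t->ring == 3) {
        code = SEL_USER_CODE;
        data = SEL_USER_DATA;
    } else {
        return -1; /* never hand out ring 0 (or ring 2) from this path */
    }

    out->rip    = rip;
    out->cs     = code | t->ring; /* RPL must match the target ring */
    out->rflags = rflags;
    out->rsp    = rsp;
    out->ss     = data | t->ring;
    return 0;
}
```

The return stub would then point RSP at the filled frame and execute iretq. Because the ring comes from kernel-owned state rather than from the return address, it does not matter whether services and user tasks use overlapping address ranges.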
As for alternatives: call gates and software interrupts were designed for this purpose (including automated privilege level checks at the time of the call/int using the "DPL" field), but both are old and "less fast", and anything more recent is only really designed for a "2 privilege level (user/supervisor)" arrangement.
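To make the DPL check concrete, here is a small C sketch of a 64-bit IDT gate and the rule the CPU applies to a software "int n". The names `make_gate` and `soft_int_allowed` are made up for illustration; the second function only models the hardware check, it is not something a kernel has to implement.

```c
#include <stdint.h>

/* 64-bit IDT gate descriptor (16 bytes). */
struct idt_gate {
    uint16_t offset_low;  /* handler address bits 0..15 */
    uint16_t selector;    /* kernel code segment selector */
    uint8_t  ist;         /* bits 0..2: interrupt stack table index */
    uint8_t  type_attr;   /* P (bit 7), DPL (bits 5..6), type (bits 0..3) */
    uint16_t offset_mid;  /* handler address bits 16..31 */
    uint32_t offset_high; /* handler address bits 32..63 */
    uint32_t reserved;
} __attribute__((packed));

/* Build a present interrupt gate (type 0xE) with the given DPL. */
static struct idt_gate make_gate(uint64_t handler, uint16_t kernel_cs,
                                 unsigned dpl)
{
    struct idt_gate g;

    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = kernel_cs;
    g.ist         = 0;
    g.type_attr   = (uint8_t)(0x80 | ((dpl & 3u) << 5) | 0x0E);
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}

/* Model of the hardware rule for "int n": allowed only if CPL <= gate DPL,
 * otherwise the CPU raises #GP. */
static int soft_int_allowed(unsigned cpl, const struct idt_gate *g)
{
    unsigned dpl = (g->type_attr >> 5) & 3u;
    return cpl <= dpl;
}
```

With a gate at DPL=1, ring-1 services can invoke the vector while ring-3 tasks fault, so a "services only" entry point gets its privilege check from the hardware for free.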
Cheers,
Brendan
Thanks, everyone. I think I need to redesign this from scratch.
EDIT: Would it be possible for processes at CPL = 3 to use AMD's syscall, while those at CPL = 1 use interrupts? EDIT AGAIN: In a microkernel there are only two real system calls, send and receive. So which way is better for implementing these?
I guess you could build a second set of system calls this way (or at least the system calls that your services need). You can't use exactly the same entry code, as the syscall path ends with a sysret and the interrupt path must end with an iret. But a common design is for the system call entry to wrap another routine that does the real work, so it wouldn't be difficult.
But is this extra level of complexity worth it? Current thinking seems to be that two privilege levels are enough. I think the ability to use 4 levels is now only of historic interest, much like hardware task switching. It seems clear that the chip designers now intend only two levels to be used.
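As a rough illustration of the "entry wraps a shared routine" idea, here is a C sketch with two entry paths calling one dispatcher. Everything here is hypothetical: the names, the syscall numbers, and the placeholder bodies of send and receive. In a real kernel the two entry functions would be assembly stubs that save registers and finish with sysretq or iretq; here the exit instruction is only recorded as a value so the structure is visible.

```c
#include <stdint.h>

/* Hypothetical syscall numbers for a microkernel with only send/receive. */
enum { SYS_SEND = 0, SYS_RECEIVE = 1 };

/* Which instruction the entry stub would use to leave the kernel. */
enum exit_path { EXIT_SYSRET, EXIT_IRET };

struct syscall_outcome {
    int64_t        value; /* result handed back to the caller */
    enum exit_path exit;  /* how the stub would return */
};

/* Placeholder bodies; real IPC would live here. */
static int64_t sys_send(uint64_t arg)    { return (int64_t)arg; }
static int64_t sys_receive(uint64_t arg) { (void)arg; return 0; }

/* The shared routine: identical for both entry paths. */
static int64_t do_syscall(uint64_t nr, uint64_t arg)
{
    switch (nr) {
    case SYS_SEND:    return sys_send(arg);
    case SYS_RECEIVE: return sys_receive(arg);
    default:          return -1; /* unknown syscall number */
    }
}

/* Path for CPL=3 tasks: entered via syscall, would leave via sysretq. */
static struct syscall_outcome entry_from_syscall(uint64_t nr, uint64_t arg)
{
    struct syscall_outcome o = { do_syscall(nr, arg), EXIT_SYSRET };
    return o;
}

/* Path for CPL=1 services: entered via int, would leave via iretq. */
static struct syscall_outcome entry_from_int(uint64_t nr, uint64_t arg)
{
    struct syscall_outcome o = { do_syscall(nr, arg), EXIT_IRET };
    return o;
}
```

Only the thin wrappers differ between the two paths, which is why adding an interrupt-based entry for services costs little code, whatever one thinks of the extra design complexity.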
For your original question, here's an example. I raise the privilege level in only one case: when I return to the idle thread, because it uses the hlt instruction, which requires CPL 0. https://github.com/bztsrc/osz/blob/mast ... src.S#L226
I just point the stack at memory where I already have the five values iretq expects. For security reasons I also check that the return address lies within the idle() function, so nothing else can misuse the syscall to raise its privilege level.
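The return-address check described above could look roughly like this in C. This is not the code from the linked file, just a sketch of the idea; `may_return_to_ring0`, `idle_start`, and `idle_end` are hypothetical names, and the bounds are assumed to come from linker symbols or similar.

```c
#include <stdbool.h>
#include <stdint.h>

/* Allow a ring-0 return only when the saved RIP lies inside the idle
 * routine, i.e. in the half-open range [idle_start, idle_end). */
static bool may_return_to_ring0(uint64_t rip, uint64_t idle_start,
                                uint64_t idle_end)
{
    return rip >= idle_start && rip < idle_end;
}
```

The kernel would run this check before switching to the prepared iretq frame, and fall back to the normal unprivileged return path when it fails.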