Setup syscall stack pointer

Posted: Sat Aug 13, 2011 3:34 am
by torshie
I'm using the 64-bit syscall/sysret instructions for system call handling, and I'm not sure how to set up the system call stack.
Should I put the system call stack in the lower half or in the higher half of the address space? Is it necessary to set up a separate system call stack for every user mode thread?

Thanks
-torshie

Re: Setup syscall stack pointer

Posted: Sat Aug 13, 2011 1:47 pm
by Nessphoro
Okay, I do not know a lot about syscall/sysret, but regarding the syscalls -

There are two models by which you can go,

1: One kernel stack per CPU Core - very hard to implement since you need to use tricks like stack continuation.

2: One kernel stack per thread - very easy to implement: you just change ESP0 in the TSS (if you're using interrupts), or the MSR, to point to the thread's kernel stack.
Also, since writing to an MSR is quite slow, you can instead put a fixed memory location in there and load each thread's ESP from that location.
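
As a rough illustration of model 2 for the 64-bit case: the SYSCALL instruction does not load RSP by itself, so the entry stub has to fetch the kernel stack pointer from memory, typically from a per-CPU block reached through GS after swapgs. The sketch below is plain C standing in for the scheduler side only; `struct percpu`, `set_kernel_stack` and all field names are invented for this example. It shows the scheduler updating both TSS.RSP0 and the per-CPU slot with ordinary stores on each thread switch, so no MSR write is needed per switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the TSS field that matters here; a real 64-bit TSS has more fields. */
struct tss64 {
    uint64_t rsp0;             /* stack loaded on a ring 3 -> ring 0 interrupt */
};

/* Hypothetical per-CPU block that a SYSCALL entry stub would read via GS. */
struct percpu {
    uint64_t kernel_rsp;       /* stack the SYSCALL entry stub switches to */
    uint64_t user_rsp_scratch; /* where the stub would park the user RSP */
    struct tss64 tss;
};

struct thread {
    uint8_t *kstack_base;      /* lowest address of this thread's kernel stack */
    size_t   kstack_size;
};

/* Top of the stack, because x86 stacks grow downward. */
static uint64_t kstack_top(const struct thread *t)
{
    assert(t->kstack_base != NULL && t->kstack_size > 0);
    return (uint64_t)(uintptr_t)(t->kstack_base + t->kstack_size);
}

/* Called by the scheduler when it picks `next` to run on `cpu`. */
static void set_kernel_stack(struct percpu *cpu, const struct thread *next)
{
    uint64_t top = kstack_top(next);
    cpu->tss.rsp0   = top;     /* interrupt and exception path */
    cpu->kernel_rsp = top;     /* SYSCALL path: a plain store, no wrmsr */
}
```

The entry stub itself (swapgs, save the user RSP, load `kernel_rsp`) has to be written in assembly and is not shown here.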

Cheers,

Paul

Re: Setup syscall stack pointer

Posted: Sat Aug 13, 2011 7:23 pm
by gerryg400
3. A combination of both. You can have a small kernel stack per thread (about 96 bytes for i386, 160 bytes for x86_64) that's just big enough to hold the ring 3 context and then switch to a bigger kernel stack per core that's used for performing syscalls, interrupts and faults. There are some tricks needed to pre-empt system calls and nest interrupts but it's all very doable (at least in a microkernel).
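
To make the 160-byte figure for x86_64 concrete, here is one possible layout that comes out at exactly that size: 15 general-purpose registers plus the 5-word frame the CPU pushes on an interrupt from ring 3. This is only a sketch under that assumption, not necessarily the layout used in the kernel described above; `struct user_context`, `thread_rsp0` and `core_stack_top` are names made up for the example. The save area doubles as the thread's tiny kernel stack: TSS.RSP0 points one past its end, and after the context has been pushed the entry code switches to the per-core stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields in ascending address order, i.e. the reverse of push order. */
struct user_context {
    uint64_t r15, r14, r13, r12, r11, r10, r9, r8;
    uint64_t rbp, rdi, rsi, rdx, rcx, rbx, rax;   /* pushed by the entry stub */
    uint64_t rip, cs, rflags, rsp, ss;            /* pushed by the CPU */
};
static_assert(sizeof(struct user_context) == 160, "20 registers * 8 bytes");

struct thread {
    int tid;
    /* 16-byte aligned: in 64-bit mode the CPU aligns RSP before pushing. */
    _Alignas(16) struct user_context ctx;
};

/* Value for TSS.RSP0 while `t` runs: one past the end of its save area. */
static uint8_t *thread_rsp0(struct thread *t)
{
    uint8_t *top = (uint8_t *)(&t->ctx + 1);
    assert(((uintptr_t)top & 15) == 0);
    return top;
}

/* The bigger stack, shared by whichever thread is in the kernel on this core. */
struct core {
    _Alignas(16) uint8_t stack[8192];
};

static uint8_t *core_stack_top(struct core *c)
{
    return c->stack + sizeof c->stack;
}
```

On the SYSCALL path nothing is pushed by hardware, so the stub would have to fill the same area by hand before switching stacks.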

Re: Setup syscall stack pointer

Posted: Sat Aug 13, 2011 9:33 pm
by torshie
Nessphoro wrote:Okay, I do not know a lot about syscall/sysret, but regarding the syscalls -

There are two models by which you can go,

1: One kernel stack per CPU Core - very hard to implement since you need to use tricks like stack continuation.

2: One kernel stack per thread - very easy to implement: you just change ESP0 in the TSS (if you're using interrupts), or the MSR, to point to the thread's kernel stack.
Also, since writing to an MSR is quite slow, you can instead put a fixed memory location in there and load each thread's ESP from that location.

Cheers,

Paul
I'm very interested in the "One kernel stack per CPU core" model. Do you have any documents or links about it?
I tried this model, but couldn't work out how to handle long system calls. During a long system call, another system call (from a different process) can arrive in the middle of the current one. I have no idea how to handle this kind of mess :(

Re: Setup syscall stack pointer

Posted: Sat Aug 13, 2011 10:14 pm
by torshie
It's not only long system calls that have this problem. If the kernel is fully preemptible, another system call could happen anytime, anywhere :shock:

Re: Setup syscall stack pointer

Posted: Sat Aug 13, 2011 11:51 pm
by Nessphoro
The only reference I can point you to right now is
http://i30www.ira.uka.de/~neider/edu/mk ... /ch02.html

But as it points out: "All threads executing on one CPU can use the same kernel stack. As a consequence, either only one thread can execute in kernel mode at any time (i.e., threads executing in kernel mode cannot be preempted), or unusual approaches such as continuations must be used." I would advise against it, but suit yourself, sir.

Re: Setup syscall stack pointer

Posted: Sun Aug 14, 2011 1:02 am
by gerryg400
You're assuming that (pre-emptable == good). This is not an automatic decision.

I always wonder what the advantages of a pre-emptable kernel are. As far as I can see it's about scheduling latency.

Writing a fully pre-emptable kernel is not easy and it doesn't guarantee that maximum interrupt and/or scheduling latency is shorter than the non-pre-empting kernel. The reason is that you cannot usually pre-empt or sleep a thread that is holding an exclusive lock or a deadlock will occur. This in turn means that you need to disable pre-emption, and perhaps interrupts, frequently in your code. It's actually only pre-emptable when pre-emption isn't disabled.
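
The "only pre-emptable when pre-emption isn't disabled" point can be sketched with a nesting counter. This is a user-space simulation under simplifying assumptions (one CPU, no real locks or interrupts); `struct cpu_state` and the function names are illustrative, not taken from any particular kernel. A preemption request that arrives while the counter is non-zero is only recorded, and the switch happens when the counter drops back to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state. */
struct cpu_state {
    int  preempt_count;   /* > 0 while preemption is disabled */
    bool need_resched;    /* a preemption request arrived and is pending */
    int  switches;        /* how many times schedule() actually ran */
};

/* Stand-in for the real context switch. */
static void schedule(struct cpu_state *c)
{
    c->need_resched = false;
    c->switches++;
}

/* Taken, for example, when acquiring an exclusive lock. */
static void preempt_disable(struct cpu_state *c)
{
    c->preempt_count++;
}

/* Dropped when the lock is released; runs any deferred reschedule. */
static void preempt_enable(struct cpu_state *c)
{
    assert(c->preempt_count > 0);
    if (--c->preempt_count == 0 && c->need_resched)
        schedule(c);
}

/* Timer interrupt wants to preempt the running thread. */
static void timer_tick(struct cpu_state *c)
{
    if (c->preempt_count > 0)
        c->need_resched = true;   /* defer: the thread may hold a lock */
    else
        schedule(c);
}
```

The latency cost is visible here: the time between `timer_tick` and the deferred `schedule` is bounded only by the longest region that runs with the counter raised.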

I think it's better to decide what sort of latencies you can live with and try to keep your system calls shorter than that. If a system call is longer it can be split up into pieces and allow a co-operative pre-emption. It may even be possible for the system call that is split up to be completed by more than one core.

You may judge that this is okay and it seems the Linux guys did after quite some debate. Personally I feel that, for a microkernel at least, co-operative pre-emption provides better control over the latencies of the kernel.

Re: Setup syscall stack pointer

Posted: Sun Aug 14, 2011 8:25 am
by Owen
...This is all fine if you design your kernel to never run with interrupts disabled (or only disable them during some critical scheduler operations)

Re: Setup syscall stack pointer

Posted: Mon Aug 15, 2011 12:31 am
by gerryg400
Owen wrote:...This is all fine if you design your kernel to never run with interrupts disabled (or only disable them during some critical scheduler operations)
Is there any other way? What are you thinking?

Re: Setup syscall stack pointer

Posted: Mon Aug 15, 2011 1:05 am
by Nessphoro
Here, this should give you an idea:
http://www.disy.cse.unsw.edu.au/theses_ ... warton.pdf

Single-stack kernels are only possible if (I think):
1: A thread's data is discarded on a context switch
2: You use blocking context switches

Re: Setup syscall stack pointer

Posted: Mon Aug 15, 2011 1:46 am
by gerryg400
Nessphoro wrote:Here this should give you an idea:
http://www.disy.cse.unsw.edu.au/theses_ ... warton.pdf

Single-stack kernels are only possible if (I think):
1: A thread's data is discarded on a context switch
2: You use blocking context switches
I'll have a proper read of that article tonight. I do see now what continuations are. I do use a continuation function (just one function, not a stack of them) in my kernel. When a thread wakes from a blocked state (let's say it went to sleep waiting for a thread to join) the scheduler calls a post-processing function on its behalf to complete the system call. In this case it grabs the exit status and returns it to the joiner.
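
A single continuation per thread, as in the join case, might look roughly like the following. It is a simplified simulation with invented names (`on_wake`, `finish_join`, `wake`) and no real scheduler or locking; here `wake` plays the part of the scheduler running the post-processing function on the woken thread's behalf.

```c
#include <assert.h>
#include <stddef.h>

struct thread;
typedef void (*continuation_fn)(struct thread *self);

struct thread {
    int exited;                 /* non-zero once the thread has exited */
    int exit_status;            /* valid once exited is set */
    int blocked;
    struct thread *join_target; /* thread we are joining, if any */
    struct thread *joiner;      /* thread waiting for us, if any */
    continuation_fn on_wake;    /* one function, not a stack of them */
    long syscall_result;        /* value handed back to user mode */
};

/* Post-processing step: fetch the exit status for the joiner. */
static void finish_join(struct thread *self)
{
    assert(self->join_target != NULL && self->join_target->exited);
    self->syscall_result = self->join_target->exit_status;
    self->join_target = NULL;
}

/* Returns 1 if the caller blocked, 0 if the join completed at once. */
static int sys_join(struct thread *self, struct thread *target)
{
    self->join_target = target;
    if (target->exited) {
        finish_join(self);
        return 0;
    }
    target->joiner = self;
    self->on_wake = finish_join;   /* nothing is kept on a kernel stack */
    self->blocked = 1;
    return 1;
}

/* Unblock a thread and run its continuation, if it left one. */
static void wake(struct thread *t)
{
    t->blocked = 0;
    if (t->on_wake != NULL) {
        continuation_fn k = t->on_wake;
        t->on_wake = NULL;
        k(t);
    }
}

static void sys_exit(struct thread *self, int status)
{
    self->exit_status = status;
    self->exited = 1;
    if (self->joiner != NULL)
        wake(self->joiner);
}
```

Everything the blocked thread needs to finish the call lives in its own structure, which is what lets the kernel stack be reused by other threads in the meantime.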

I really don't believe that it's much more difficult to design a single-stack microkernel than a multi-stack one, especially in the multi-core case. It's probably different for a monolithic kernel. I don't really understand point 2, but in answer to your point 1:

1. I assume you mean that the thread's syscall data is discarded on a pre-emption. That would be true, but if you use co-operative pre-emption the thread can store enough info in its TCB to complete the syscall in a continuation function. The trick is to interrupt the system call at a convenient point. Take, for example, a long system call like deleting a process and all its resources. You might begin by removing the process from the process table to make sure no-one accesses it any more. Then, as soon as any reference counts reach zero, you can start to dismantle the process and delete its resources. If that takes too long, the thread running in the kernel might need to yield to another thread. Later on, when it resumes, it can try to delete the process again. It will make more progress each time, and eventually the system call will complete.
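
The process-deletion example can be sketched as a system call that does a bounded amount of work per invocation and reports whether it needs to be called again. This is a toy model: `WORK_BUDGET`, `struct process` and `sys_destroy_process` are hypothetical, and decrementing a counter stands in for releasing a real resource. The important property is that all progress is recorded in the process structure, so nothing has to survive on the kernel stack across the yield.

```c
#include <assert.h>
#include <stdbool.h>

enum { WORK_BUDGET = 4 };     /* arbitrary: resources freed per invocation */

struct process {
    bool in_table;            /* still visible to lookups? */
    int  nres;                /* resources still to be freed */
};

enum sys_status { SYS_DONE, SYS_AGAIN };

/* Frees at most WORK_BUDGET resources, then asks to be restarted. */
static enum sys_status sys_destroy_process(struct process *p)
{
    assert(p->nres >= 0);
    p->in_table = false;      /* unpublish first; harmless to repeat */
    for (int done = 0; p->nres > 0; done++) {
        if (done == WORK_BUDGET)
            return SYS_AGAIN; /* convenient point to yield */
        p->nres--;            /* stand-in for releasing one resource */
    }
    return SYS_DONE;
}
```

A caller, or the kernel's return path, would simply re-issue the call while it returns `SYS_AGAIN`, letting other threads run in between.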

Re: Setup syscall stack pointer

Posted: Mon Aug 15, 2011 2:23 am
by Nessphoro
What I meant by 2 is that during a syscall no interrupt will occur, which enforces the one-thread-in-kernel-mode rule.

Re: Setup syscall stack pointer

Posted: Mon Aug 15, 2011 2:41 am
by gerryg400
Nessphoro wrote: What I meant by 2 is that during a syscall no interrupt will occur, which enforces the one-thread-in-kernel-mode rule.
That's certainly not true. Interrupts can execute below the syscall on the kernel stack (on Intel the stack grows downward). Interrupts can nest as well. No problem there at all.

Re: Setup syscall stack pointer

Posted: Mon Aug 15, 2011 4:24 am
by Combuster
The point was: you can't freely have interrupts in kernel land when you have a global kernel stack rather than a per-process one. You might end up calling the scheduler from random points within unrelated system calls, which is probably not what you want.

Re: Setup syscall stack pointer

Posted: Mon Aug 15, 2011 5:03 am
by gerryg400
It's true, you can't call the scheduler from just anywhere. You can only call it when you're about to return to userspace. There are a few cases to consider.
1. An interrupt occurs during userspace. -- execute ISR and then call scheduler.
2. An interrupt occurs during a syscall -- execute ISR and queue the result, resume syscall, process queued result then call scheduler.
3. An interrupt occurs during another interrupt -- execute (or defer if lower priority) ISR and queue result. When all interrupts are complete process the queued results then call the scheduler.
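
The three cases can be modelled with a nesting counter and a small queue of deferred results. This is a user-space simulation with invented names (`struct cpu`, `isr_post`, `process_queue_and_schedule`), not real interrupt code, and it ignores interrupt priorities. ISRs only queue their results; the queue is drained and the scheduler called only when the core is about to return to userspace.

```c
#include <assert.h>
#include <stdbool.h>

enum { QUEUE_MAX = 8 };

struct cpu {
    bool in_syscall;          /* a system call is running on the core stack */
    int  irq_depth;           /* > 0 while inside one or more ISRs */
    int  queued[QUEUE_MAX];   /* results posted by ISRs */
    int  nqueued;
    int  processed;           /* results handed to the scheduler so far */
    int  sched_calls;
};

/* Only legal when nothing else is live on the kernel stack. */
static void process_queue_and_schedule(struct cpu *c)
{
    assert(c->irq_depth == 0 && !c->in_syscall);
    c->processed += c->nqueued;
    c->nqueued = 0;
    c->sched_calls++;
}

static void irq_enter(struct cpu *c) { c->irq_depth++; }

/* The ISR body never schedules; it only queues its result. */
static void isr_post(struct cpu *c, int result)
{
    assert(c->irq_depth > 0 && c->nqueued < QUEUE_MAX);
    c->queued[c->nqueued++] = result;
}

static void irq_exit(struct cpu *c)
{
    assert(c->irq_depth > 0);
    if (--c->irq_depth > 0)
        return;                       /* case 3: back to the outer ISR */
    if (c->in_syscall)
        return;                       /* case 2: resume the syscall first */
    process_queue_and_schedule(c);    /* case 1: interrupted userspace */
}

static void syscall_enter(struct cpu *c) { c->in_syscall = true; }

static void syscall_exit(struct cpu *c)
{
    c->in_syscall = false;
    process_queue_and_schedule(c);    /* about to return to userspace */
}
```

Because the scheduler only ever runs with an empty kernel stack, no thread's kernel-side state needs to be preserved across the switch.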

Add to that some co-operative yielding inside long system calls and you can achieve latencies that are more than acceptable, even comparable with a pre-emptable kernel.

Note, Combuster, that I'm talking about a microkernel here, where system calls are short and there's no waiting on I/O etc.