Interruptible kernel syscalls
Posted: Sat Jul 07, 2007 9:05 pm
by mutex
I have implemented a kernel with a syscall interface (interrupt based). A ring3 task can call this and the syscall code is executed in ring0.
I also have a basic software task switcher implemented. Most tasks are ring3, but the idle task and some drivers doing IO are ring0.
Until now I have just had uninterruptible syscalls, disabling interrupts on entry to the call and enabling them again on exit.
I am thinking about implementing interruptible syscalls. E.g. I have a function that takes 1 sec to return. Then I want this syscall to be interrupted, so that other tasks get scheduled without waiting for that call to finish. There are probably other ways to get around something like this, but I think this would be a nice approach.
Critical sections in the kernel part of the syscall can be handled by enabling/disabling interrupts for now, since I just have one CPU.
Has anyone implemented something like this? How did you do it? Any design considerations? Does anyone have a reference for how this could be done?
It has been a few years since I last did something on my code, so I'm a little bit "rusty".
cheers
Thomas
Posted: Sat Jul 07, 2007 9:23 pm
by Colonel Kernel
Here are some random hastily-worded pieces of advice:
- Have a dedicated kernel stack for each thread. That way, pre-empting a long-running system call has almost the same mechanics as pre-empting a user-mode thread.
- At least think about how you would handle multiple CPUs. You may want to do it eventually anyway, and it might give you good ideas in the meantime.
- Be careful to disallow pre-emption during critical operations (like in the scheduler, interrupt handlers, etc.) I guess for you this is really easy (just disable interrupts), but see what I said about multiple CPUs above.
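On a single CPU, one way to keep such critical operations safe once syscalls become interruptible is to save the interrupt state on entry and restore it on exit, instead of unconditionally re-enabling interrupts. The following is a minimal sketch under stated assumptions: `interrupts_enabled` is a simulated flag so the code runs in user space, and the names `critical_enter` and `critical_leave` are hypothetical. A real kernel would read and restore EFLAGS (pushf/cli and popf) instead.

```c
#include <stdbool.h>

/* Simulated interrupt flag (stands in for EFLAGS.IF in this sketch). */
static bool interrupts_enabled = true;

/* Saved interrupt state handed back to critical_leave(). */
typedef bool irq_state_t;

/* Disable interrupts and report whether they were enabled before. */
static irq_state_t critical_enter(void)
{
    irq_state_t previous = interrupts_enabled;
    interrupts_enabled = false;          /* kernel equivalent: cli */
    return previous;
}

/* Restore the state seen on entry; only re-enables if it was on before. */
static void critical_leave(irq_state_t previous)
{
    interrupts_enabled = previous;       /* kernel equivalent: popf */
}
```

Because the previous state is restored rather than forced on, a critical section nested inside another one does not switch interrupts back on too early.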
Sorry I can't be more specific... I'm sort of posting on the run.
How long was adam in paradise? Until asm("sti");
Posted: Mon Jul 09, 2007 7:52 am
by mutex
Thanks for your answer! It always helps me to talk about things with someone who understands me
I have been thinking about MP, but it is easier to leave all the loopholes out until I have the most basic things working..
I have a separate kernel stack now for the task switch. All tasks have a kernel-context-switch stack that is patched into the TSS on every return to ring3.
The problem is.. When I do a syscall it enters ring0, saves its context and starts doing the syscall function. If I enable ints then it totally screws up. Triple faults in the end.
This is how things are done, in short:
Isr:
timer_isr:
    SAVECONTEXT
    call _timer   ; This runs the scheduler, changes the pointer to the current task, etc.
    LOADCONTEXT
syscall_isr:
    SAVECONTEXT
    call _syscall
    LOADCONTEXT

SAVECONTEXT:
    ; push all regs / seg-regs
    ; load kernel cs/ds etc
    ; save esp into the process_struct for this process
LOADCONTEXT:
    ; load esp
    ; patch tss with the beginning of this stack
    ; pic eoi
    ; load user cs/ds etc
    ; pop all regs / seg-regs
    ; iretd
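The scheduler step between SAVECONTEXT and LOADCONTEXT can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the `task` struct, the `tasks` array, `MAX_TASKS` and the round-robin policy are hypothetical stand-ins for whatever `_timer` and the process struct really look like.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 4   /* arbitrary size for this sketch */

/* Hypothetical per-task record (the "process struct" of the listing). */
struct task {
    uint32_t saved_esp;   /* stack pointer stored by SAVECONTEXT */
    int      runnable;    /* non-zero if the task may be scheduled */
};

static struct task tasks[MAX_TASKS];
static size_t current_task;

/* Record where the interrupted task's context lives, pick the next
   runnable task round-robin, and return the stack pointer that
   LOADCONTEXT should load. */
static uint32_t schedule(uint32_t interrupted_esp)
{
    tasks[current_task].saved_esp = interrupted_esp;

    for (size_t step = 1; step <= MAX_TASKS; step++) {
        size_t candidate = (current_task + step) % MAX_TASKS;
        if (tasks[candidate].runnable) {
            current_task = candidate;
            break;
        }
    }
    /* If nothing else is runnable, the current task simply continues. */
    return tasks[current_task].saved_esp;
}
```

The saved value is whatever esp was at the moment of the interrupt, so the same mechanism can resume a task that was stopped in ring3 or in the middle of a syscall, provided each task's saved context sits on a stack that no other task writes to.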
I can do syscalls without problems. Task switch and everything else works,
but if I enable ints again in the syscall I get into trouble. It looks like the SS is not set (0x00) and some other things **** up. Stack problems anyway, and it crashes and burns.. I can't find any good way to debug it without bochs+gdb and linux. Installing it now
Anyway.. What happens when I get a timer interrupt when I'm in ring0 and in the syscall? The syscall will stop, the context is pushed on its running stack (since we are in ring0) and then the TSS for the next task is loaded??.. This is where I have trouble.. The TSS should be patched to match the stack where it stopped, right? This means that I should have another stack and not use the kernel context stack for the syscall???
Anyone follow?
cheers
Thomas
Re: How long was adam in paradise? Until asm("sti"
Posted: Mon Jul 09, 2007 8:24 am
by Colonel Kernel
thomasnilsen wrote:I have a separate kernel stack now for the task switch. All tasks have a kernel-context-switch stack that is patched into the TSS on every return to ring3.
Are you talking about two different things here? What is your "separate kernel stack for the task switch"? This may be what's giving you trouble.
In my kernel, there is one kernel stack per thread, and no other special kernel stacks.
thomasnilsen wrote:Anyway.. What happens when I get a timer interrupt when I'm in ring0 and in the syscall? The syscall will stop, the context is pushed on its running stack (since we are in ring0) and then the TSS for the next task is loaded??.. This is where I have trouble.. The TSS should be patched to match the stack where it stopped, right? This means that I should have another stack and not use the kernel context stack for the syscall???
Anyone follow?
Sort of... Here is how it works in my kernel: If an interrupt occurs while running in a syscall, the context will be saved on the current thread's kernel stack. When the ISR is done, before it returns it checks the saved CS register to see if the interrupted thread was in ring 3. Since it wasn't, it takes no further action (i.e. -- it doesn't patch anything) and just does the iret. Later, when the syscall is done, it goes through the same routine, but this time it notices that the "interrupted" code was in ring 3, so it patches the TSS to point to the kernel stack of the new thread to run (if a new thread has been selected to run).
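The check described above might look something like this in C. It is a sketch, not the actual kernel code: the struct layouts and the names `saved_context`, `thread`, `tss` and `prepare_return` are assumptions made for illustration, and only the fields needed for the decision are shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the context an ISR stub saved on the stack. */
struct saved_context {
    uint32_t eip;
    uint32_t cs;       /* code segment selector of the interrupted code */
    uint32_t eflags;
};

/* Hypothetical per-thread bookkeeping. */
struct thread {
    uint32_t kernel_stack_top;   /* empty-stack address of its kernel stack */
};

/* Stand-in for the real TSS; only esp0 matters here. */
struct tss {
    uint32_t esp0;
};

/* The low two bits of a selector hold the privilege level. */
static bool interrupted_user_mode(const struct saved_context *ctx)
{
    return (ctx->cs & 3u) == 3u;
}

/* Run at the end of every ISR, just before the iret. */
static void prepare_return(struct tss *tss, const struct saved_context *ctx,
                           const struct thread *next)
{
    if (interrupted_user_mode(ctx)) {
        /* Going back to ring 3: the next ring3->ring0 entry must land on
           the empty kernel stack of the thread that is about to run. */
        tss->esp0 = next->kernel_stack_top;
    }
    /* Going back to ring 0: the interrupted syscall is still using its
       kernel stack, so esp0 is left untouched. */
}
```

With a flat GDT where the kernel code selector is 0x08 and the user code selector is 0x1B (an assumption, any selectors with RPL 0 and 3 behave the same), a saved CS of 0x08 leaves `esp0` alone and a saved CS of 0x1B updates it.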
I hope that makes sense...
Hmmm
Posted: Mon Jul 09, 2007 8:44 am
by mutex
Hmm. I think I follow you. I might have explained myself badly before.
I have a kernel stack, which is the stack that the TSS knows about so that the ring3->ring0 call will work. This is the only kernel stack I have. Like you, I have one per thread.
Scenario:
1. ring3 app is running
2. the app calls Consume10Seconds(); which is a syscall
3. the code enters __isr_syscall.
4. the cpu used the esp0 from TSS and we are in ring0.
5. we now have a fresh kernel stack for this thread (since it's not used yet).
6. we call the function in the syscall code. This function is marked as interruptible and enables ints.
7. The syscall executes a looooong while loop before disabling ints and returning with a "RET" to where we were in the syscall isr before the call.
but, at 7.1 the timer fires and we have a new context switch...
then the TSS puts the cpu in ring0 again (with a new stack from TSS.esp0), which is actually what we were using before.. and it starts using that for the timer and scheduler work. This works fine, but overwrites the data for the previous thread/stack. The timer exits, loading the next thread, and patches the TSS with it.
On return we are on the next thread. This works.. but when we are back in the thread with the syscall we don't have anything on the stack and something is wrong.. it starts execution over again or something....
I think I maybe have an idea where it messes up now
This sounds correct right?
cheers
Thomas
Posted: Mon Jul 09, 2007 9:19 am
by JAAman
no, this shouldn't ever happen:
1. if your task-switch-while-processing-ring0 switches you to a different task, then you will be in a different address-space (normally, if not, see below), and the same virtual address (as referred to by TSS.SS0:TSS.ESP0) will point to a different physical address
2. if your task-switch-while-processing-ring0 switches you to a different thread in the same task, then one of two things will occur
A. the task-switch code will alter the TSS.ESP0 before returning to the thread, and when the int occurs you have a different space for the kernel stack to use (this is the most common method)
B. the task-switch code will alter the page-tables to remap certain thread-specific structures (including the kernel-stack) to point to a different physical memory, at the same virtual memory location, so the same stack is at the same place using different memory (just like a task switch does)
hope this helps
Posted: Mon Jul 09, 2007 9:39 am
by mutex
As for now I have paging, but in my scenario all threads are actually running in the same address space.. vm=pm+0xc0000000..
So all tasks have a different virtual/physical address in ss:esp0..
I think I need to do some sketching on paper to convince myself that my theory is correct in the design....
Anyway.. anyone using BFE + bochs to debug code with inline source statements??
I'm installing it now and hoping to get rid of this bug / design flaw
cheers
Thomas
Re: Hmmm
Posted: Mon Jul 09, 2007 11:16 am
by Colonel Kernel
thomasnilsen wrote:but, at 7.1 the timer fires and we have a new context switch...
then the TSS puts the cpu in ring0 again (with a new stack from TSS.esp0), which is actually what we were using before.. and it starts using that for the timer and scheduler work. This works fine, but overwrites the data for the previous thread/stack. The timer exits, loading the next thread, and patches the TSS with it.
The CPU shouldn't switch stacks when an interrupt occurs if you're using an interrupt gate and the CPU was already running in ring0 at the time (assuming the interrupt gate's PL is 0 as well).
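The practical consequence is that the frame the CPU pushes has two different shapes on 32-bit x86. With a privilege change it contains the old SS and ESP; without one it does not. The sketch below shows both layouts in memory order (lowest address first); the struct and function names are made up for illustration, and exceptions that also push an error code are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame pushed when the interrupt arrives while already in ring 0:
   no stack switch, so no SS:ESP is saved. */
struct iret_frame_same_ring {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
};

/* Frame pushed when the interrupt arrives in ring 3: the CPU first
   switches to SS0:ESP0 from the TSS, then pushes the user SS:ESP too. */
struct iret_frame_from_user {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t user_esp;
    uint32_t user_ss;
};

/* Number of bytes iretd will pop, judged by the privilege bits of the
   saved CS selector. */
static size_t iret_frame_size(uint32_t saved_cs)
{
    return (saved_cs & 3u) != 0
        ? sizeof(struct iret_frame_from_user)
        : sizeof(struct iret_frame_same_ring);
}
```

Context save/restore code that always expects the five-field layout will read whatever happens to sit above the frame as ESP and SS when the interrupted code was a syscall running in ring 0.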
Are you using hardware or software task switching? Everything I mentioned before is true for software task switching, but I haven't looked much at hardware task switching. I do know that if you're using task gates in your IDT instead of interrupt gates, things might not be working the way you intend...
Well....
Posted: Mon Jul 09, 2007 8:37 pm
by mutex
After installing bochs with the iodebug util I could do some real debugging and stop the code where I wanted to.
Now it seems that everything works well from the beginning anyway.. as long as the syscall thread doesn't return from its ring0 call.
I also assumed that the syscall thread did not work/run at all since I only got one update on screen.. But that I figured out.. a typo
So after fixing that I could see that the thread is actually running and doing its thing until it is done and should return from the int/syscall... On return it screws up.. Also, my bluescreen handler did not take kernel tasks into account and loaded context data from the wrong position. Hence the strange cpu_regs...
So now I just have to figure out why & what really goes wrong at the end of the LOADCONTEXT function..
bochs and its iodebug util is really handy... Couldn't live without it
-
Thomas