Page 4 of 4

Re: Issues moving to the "one kernel stack per CPU" approach

Posted: Tue Apr 10, 2012 4:02 am
by Combuster
turdus wrote:
Brendan wrote:You're using "one (small) kernel stack per thread plus a bunch more kernel stacks for various interrupt handlers" and not using "one kernel stack per CPU"; and because you're having trouble seeing the difference you're not comprehending what everyone else here is talking about.
Sorry to disappoint you, but you are wrong. I do not have per-thread IST values. The IST pointers are in the TSS, and you have only one TSS per CPU (unless you use hardware task switching, which I don't). Whenever a handler is called, this pointer is decremented by KERNELSTACKSIZE, and it is incremented back before the handler leaves. This way the "beginning of stack" is always arranged so that the next handler does not interfere.
In other words, you wrote a buggy software implementation of something the hardware is already capable of. Try an NMI between the start of an interrupt and the IST update. If you just have a sufficiently sized stack without an IST, everything gets pushed onto it without wasting space on manual gaps.

Re: Issues moving to the "one kernel stack per CPU" approach

Posted: Tue Apr 10, 2012 4:52 am
by turdus
Try an NMI between the start of an interrupt and the IST update
Theoretically yes, if I were using the same IST, but the IST update is the first instruction of every handler, so the window is extremely small. Second, the NMI uses a separate stack (ordinary handlers use IST0, the NMI uses IST1). No bug here. If anything, this problem affects a software stack switch more, because more instructions run in the ISR before the right stack is selected. Also, a conditional jump (checking the CPL) will flush the CPU pipeline, slowing things down even more (and widening the window for an NMI in the critical section), and you would have to use the same stack for the NMI. So mine is the better solution to this problem. Q.E.D.
you wrote a buggy software implementation of things hardware is already capable of
No. The IST is a hardware solution. That's the point. I don't have to manage stacks from code. I already have a known-good stack when an ISR is called (an invariant that each handler restores before another handler can be called).

Re: Issues moving to the "one kernel stack per CPU" approach

Posted: Tue Apr 10, 2012 5:28 am
by Brendan
Hi,
turdus wrote:
Brendan wrote:You're using "one (small) kernel stack per thread plus a bunch more kernel stacks for various interrupt handlers" and not using "one kernel stack per CPU"; and because you're having trouble seeing the difference you're not comprehending what everyone else here is talking about.
Sorry to disappoint you, but you are wrong. I do not have per thread IST values.
I never said you have per thread IST values. I said you're using one (small) kernel stack per thread. I also said you're using more kernel stacks for various interrupt handlers, but I didn't say they were per thread or per CPU or whatever because it really makes no difference. The "one (small) kernel stack per thread" part is enough to make it clear that it isn't "one kernel stack per CPU".
turdus wrote:
Note: Ignoring space used for things like thread name, amount of CPU time used by the thread, etc; we'd be looking at about 512 bytes for FPU/MMX/SSE state alone; plus another 32 bytes (for protected mode) or 128 bytes (for long mode) to store the thread's general registers. There are no stacks in the TCB at all.

The TCB contains a stack to store the general purpose registers and the return data for iret.
I was obviously talking about "one kernel stack per CPU". Create a topic called "one (small) kernel stack per thread" if you'd like to discuss your method - I'd be interested to hear how you handle SYSCALL/SYSRET.


Cheers,

Brendan

Re: Issues moving to the "one kernel stack per CPU" approach

Posted: Tue Apr 10, 2012 5:31 am
by Combuster
turdus wrote:
you wrote a buggy software implementation of things hardware is already capable of
No.
Yes. I demonstrated the bug and you even confirmed it.

Not using an IST automatically gives you a good piece of stack without you having to prepare it in software, so your method is demonstrably doing extra work the hardware is already capable of.

And remember our last chat? You'll lose this one because you keep insisting you're 100% right, which you aren't.

Re: Issues moving to the "one kernel stack per CPU" approach

Posted: Wed Apr 11, 2012 2:13 am
by turdus
@Brendan: I'm not interested in discussing my method. I really don't care what you would say; I'll write my OS my way. I showed you a hardware-supported way to implement a per-CPU stack in long mode. Whether you use it or not is up to you.
In general, task gates (in protected mode) and "IST" (in long mode) just complicate things more without solving anything...
For you. For me IST simplifies a lot.

@Combuster: you have serious problems with reading.
Try an NMI between the start of an interrupt and the IST update
I demonstrated the bug and you even confirmed it.
Where? You wrote about an NMI problem in the ISR, which does not affect IST-switched ISRs, for the two reasons I gave. I did not confirm anything; read my post again.

I don't get you guys. If you are so stuck on old proven methods and Multics roots, what's the point of this site? You should be open to non-ordinary, new solutions. Just because you cannot think of a use for something at first does not mean it cannot be used that way. Think about it.

Re: Issues moving to the "one kernel stack per CPU" approach

Posted: Wed Apr 11, 2012 6:26 am
by Owen
What is really needed for reliable NMI/#MC handling is a mechanism that switches to a dedicated stack on NMI/#MC (to avoid the potential damage to the TCB) but then allows those interrupts to nest.

It occurs to me that the processor's privilege level switching implements exactly this; the hard part is, if you're handling the exception at CPL=1 (because your kernel pretty much has to be at CPL=0), how do you safely get back to CPL=0...

Some thoughts:
  • If you came from CPL=0, then when you RETF/IRET/etc. from CPL=1, a #GP will be triggered. Before you IRET, you need to adjust TSS.RSP0 to match the RSP in your stack frame (else you will trash the data saved by an earlier entry), or to point at the per-CPU kernel stack if it currently points inside a TCB, and adjust TSS.RSP1 to point past the stack frame (else an NMI/#MC during these gymnastics will trash your present stack frame before the #GP handler can rescue it). The #GP handler detects that the return CPL is 1, uses this to establish that an NMI/#MC has been triggered, and copies over the data from the CPL=1 stack frame. RSP0/RSP1 can now be returned to normal, and the #MC/NMI handler is now safely running like a normal interrupt.
  • If you came from CPL=3, then the return would normally be fully permitted. Copy the return CS from your stack (to save it for return time), adjust the on-stack CS to be 0 (so that a #GP will be triggered), and adjust TSS.RSP1 as in the CPL=0 case. When you IRET/RETF, a #GP will be triggered (because you are returning to a null CS), and the #GP handler will be invoked almost as if coming from user mode. The #GP code can detect the CPL=1 return value and use that to ascertain that it is again running because an NMI/#MC has occurred. Again, the data should be copied off the CPL=1 stack.
I swear, if someone had the sanity to make SMI not cause NMI nesting, this would be far simpler...

Re: Issues moving to the "one kernel stack per CPU" approach

Posted: Fri Apr 13, 2012 3:26 pm
by xenos
After quite some time of coding, debugging and fixing minor problems in my code I finally managed to get my 32 bit x86 kernel running using the "one kernel stack per CPU" approach. I still need to change my 64 bit kernel code as well, and to clean up everything. And of course I'll spend a lot of time testing everything...

Just to give you a broad overview of how I finally solved the issues from the beginning, let me show you some pseudocode:

Code: Select all

KernelEntry:
	(perform kernel setup)
1:
	sti
	hlt
	jmp 1b

HardwareInterrupt:
	(interrupts are disabled)
	(if we come from ring 0, go to 1f)
	(load segment registers with sane values)
	(get address of current thread's register save area)
	(save register contents)
	(call actual handler - returns pointer to new thread's register save area)
	jmp ReturnToUser
1:
	(no need to save registers - only point where interrupts are enabled is "sti; hlt;" wait loop)
	(call actual handler - returns pointer to new thread's register save area)
	jmp ReturnToUser

SomeFault:
	(interrupts are disabled)
	(if we come from ring 0, go to 1f)
	(load segment registers with sane values)
	(get address of current thread's register save area)
	(save register contents)
	(call actual handler - returns pointer to new thread's register save area)
	jmp ReturnToUser
1:
	(save registers on the stack)
	(call actual handler - dumps register contents, this is a kernel bug or hardware failure)
	cli
	hlt

ReturnToUser:
	(if we get a NULL pointer, go to 1f)
	(restore register contents)
	(load user segments into segment registers)
	iret
1:
	sti
	hlt
	jmp 1b
I still need to solve issues such as nested NMI handling (which would currently crash my kernel) and probably many other things as well... But anyway, at least the first step is done. :)

Re: Issues moving to the "one kernel stack per CPU" approach

Posted: Fri Apr 13, 2012 5:56 pm
by gerryg400

Code: Select all

   (get address of current thread's register save area)
   (save register contents)
Xenos, I noticed this. Does this mean that you don't put the "thread's register save area" in the TSS? And that you put a pointer to the core's kernel stack there?

If so then that is quite different from what I'm doing.

Re: Issues moving to the "one kernel stack per CPU" approach

Posted: Sat Apr 14, 2012 1:12 am
by xenos
gerryg400 wrote:Xenos, I noticed this. Does this mean that you don't put the "thread's register save area" in the TSS ? And that you put a pointer to the core's kernel stack there ?

If so then that is quite different from what I'm doing.
You're right; I finally decided against the "esp0 points to register save area" idea, although I still think it is quite elegant. However, the reason I decided against it is the potential problems that may arise from situations like nested NMIs, which must be handled carefully. While it may very well be possible to deal with such situations, it would probably require more effort to make sure everything works out.

For example, consider the case that an NMI occurs within an interrupt stub, when the thread's state has just been saved into the TCB and the stack pointer points to the lower end of the register save area. The NMI will push the return address (and stack pointer, if we are in long mode) onto the stack and thus may trash some part of the TCB which is just below the register save area.

Maybe I'll rewrite my interrupt stub some time later, but that will probably take me a lot more time thinking about these issues. At the moment there are many other fixes to be done which have higher priority.