Page 1 of 2

Unable to set the stack pointer

Posted: Tue Apr 25, 2017 5:22 am
by prasoc
Hello
I am trying to create a local stack for each thread. I have set up my scheduler on the timer interrupt and save+restore the registers for each thread like this:

Code: Select all

			// save the previous threads state
			memcpy(&thread_list[last_thread]->state_reg, r, sizeof(registers));

			// set the registers from the current thread's saved state
			memcpy(r, &thread_list[task_idx]->state_reg, sizeof(registers));
"r" is the first parameter of the scheduler's interrupt (a pointer to the registers in memory). Every register gets stored and set correctly, apart from "esp"! Is there a way to force the stack pointer to point at a new address in C/C++?

Following up from this, is "one stack per thread" a good idea? I've heard that some kernels use one thread per *core*, but I am having a hard time understanding how multiple threads can push to the stack and not interfere with each other. Can anyone shed a little light on this?

Thanks

Re: Unable to set the stack pointer

Posted: Tue Apr 25, 2017 5:26 am
by iansjack
I'm a little confused. Your code seems to be dealing with "a pointer to the registers in memory". But the register set isn't part of the memory address space. So this doesn't change the registers themselves, just some saved version of them.

Changing the stack pointer directly in C is probably not a good idea. I would use an assembler routine to actually set, or read, the registers.

Re: Unable to set the stack pointer

Posted: Tue Apr 25, 2017 5:29 am
by prasoc
That's strange. It seems to work quite well in Bochs :? ok so I will implement this switching in ASM (with push and pop)

Re: Unable to set the stack pointer

Posted: Tue Apr 25, 2017 12:59 pm
by LtG
As for your second question regarding one stack per core, that's likely referring to having one _kernel_ thread per core as opposed to having one kernel thread for each process (or thread, which would mean even more kernel threads).

Ignoring pathological cases each thread must have its own stack, not sure but there might also be some non-pathological (though unconventional) cases where each thread didn't need its own stack but in practice they always have dedicated stack. Remember the CPU (CALL) uses the stack implicitly, and compilers also use it, so it has to stay consistent..

Note also that there's at least two obvious ways to handle each thread having their own stack:
- Stack starts always at fixed virtual address and when you switch threads you adjust the page tables
- Each stack has unique address, main problem with this is in 32-bit mode if you want to have large number of stacks as it starts to eat the address space

The first one has its issues also, threads can never send data to other threads that's stored in their stack as the recipient thread could not access it and will access something random in its own stack instead, in addition if you run threads in the same process on multiple cores simultaneously you need multiple page directories (one for each thread that is running simultaneously), though majority of the page tables can be the same, just the stack tables and the directory itself must be dedicated for each thread..

Deciding which route to take might have some impact on long term development so probably a good idea to think about which route to take.

Re: Unable to set the stack pointer

Posted: Tue Apr 25, 2017 1:05 pm
by iansjack
LtG wrote:The first one has its issues also, threads can never send data to other threads that's stored in their stack as the recipient thread could not access it and will access something random in its own stack instead
Not quite true. If you need to communicate in this way you can always provide a mechanism for a thread to map another thread's stack to some linear address in it's address space.

Re: Unable to set the stack pointer

Posted: Tue Apr 25, 2017 5:16 pm
by zaval
to the author, you only don't listen to those who says you a theard has its own address space. It doesn't. "Non-pathological" threads live in their process' address space. The owner of address space is always a process. Thread is a scheduling unit, unit of execution. This is the only sane way to do stuff, to not mess up everything upside down.

You need to use assembly for doing interrupt dispatching, context saving/switching, stack switching etc. It's an architecture specific thing. On an interrupt, you might need to save registers which your handler chain may use, switch to the kernel stack, again it depends heavily on the architecture, not only on your design.
But your threads need their personal stack for each. Moreover, every thread should have its kernel stack too. This is the most powerful variant. No need to grant "kernel threads" a special meaning. Threads in kernel mode are ordinary threads except they run in kernel mode. there might be a lot of threads running in kernel mode simultaneously, on each core. One thread running in user mode might preempt other one running in kernel mode and vice versa.
Following up from this, is "one stack per thread" a good idea? I've heard that some kernels use one thread per *core*, but I am having a hard time understanding how multiple threads can push to the stack and not interfere with each other. Can anyone shed a little light on this?
Yes, it's a good idea. and this
some kernels use one thread per *core*
is a bad idea. it's even not an idea. it's just some unixoid dinosaurs can not learn themselves what threads are. xD don't tie threads either to the core or to the kernel. you'll get nothing useful. Having every thread easily migrate from user mode to kernel mode retaining its "identity", not bothering with address space (it's a process attribute), being easily preemptible, giving a great deal of kernel reentrancy, having their own stacks in both spaces - this is the thing. ;)

Re: Unable to set the stack pointer

Posted: Tue Apr 25, 2017 5:50 pm
by OSwhatever
iansjack wrote:Changing the stack pointer directly in C is probably not a good idea. I would use an assembler routine to actually set, or read, the registers.
When changing the stack pointer in general you always have to do a "change stack pointer and jump/return" otherwise changing the stack pointer inside a C function might lead to unexpected behaviour.

I have never heard of "one kernel stack per core" unless you have a design that only allow one process to visit the kernel at once and that would be a very limited design without any benefits. The standard is 1:1 stack model, one user/kernel stack per thread and this is what most production kernels are using. I'm using a N:M user/kernel stack model which is more complex to implement, benefit is less kernel stacks needs to be allocated typically 4-6 of them per CPU but is completely dynamic.

Re: Unable to set the stack pointer

Posted: Tue Apr 25, 2017 11:42 pm
by Velko
Take a look at this old forum thread - Brendan explains One stack per core, it's pros and cons.

Re: Unable to set the stack pointer

Posted: Tue Apr 25, 2017 11:54 pm
by Brendan
Hi,
prasoc wrote:Following up from this, is "one stack per thread" a good idea? I've heard that some kernels use one thread per *core*, but I am having a hard time understanding how multiple threads can push to the stack and not interfere with each other. Can anyone shed a little light on this?
First; let me explain "one kernel stack per CPU" (because it isn't as common and isn't as obvious).

Imagine if the kernel was like a multi-threaded process with one thread per CPU (but still mapped into all virtual address spaces and still running at CPL=0); and every time you enter the kernel (IRQ, exception, system call) you do a thread switch from "user-space thread" to "kernel thread for this CPU", and every time you leave the kernel (iret, etc) you do a thread switch from "kernel thread for this CPU" to "user-space thread scheduler decided to return to".

In this case, every time you enter the kernel you'd save a user-space thread's state and load a kernel thread's state; and
every time you leave the kernel you'd save a kernel thread's state and load a user-space thread's state.

Now imagine if we took this a little further; and every time you enter the kernel (IRQ, exception, system call) you create a new kernel thread and switch from "user-space thread" to "new kernel thread for this CPU that was just created", and every time you leave the kernel (iret, etc) you terminate the "new kernel thread for this CPU that was just created".

In this case, every time you enter the kernel you'd save a user-space thread's state and create new kernel thread; and every time you leave the kernel you'd terminate the kernel thread and load a user-space thread's state.

Now imagine if we optimised this a little. Because there can only be one kernel thread per CPU (and not 2 or more kernel thread's per CPU) you could have one kernel stack per CPU that is recycled every time you create the "new kernel thread for this CPU" and forgotten every time you terminate the kernel thread. You could also say that when a new kernel thread is created all general purpose registers are undefined and not set them to anything. Because the kernel is mapped into all virtual address spaces you also never need to worry about the virtual address space when creating a kernel thread (or changing virtual address spaces when switching from a user-space thread to a kernel thread). Essentially, "create new kernel thread and switch to it" means changing ESP, setting EIP and setting segment registers to default; and "terminate new kernel thread" means doing literally nothing. However, for most cases (IRQs, exceptions, etc - everything except SYSCALL) changing ESP and setting EIP is done by the CPU for you, so you don't actually have to do it yourself.

In this case, every time you enter the kernel you'd save a user-space thread's state and create new kernel thread (which is almost nothing); and every time you leave the kernel you'd terminate the kernel thread (which is nothing) and load a user-space thread's state.

In other words, every time you enter the kernel you'd save a user-space thread's state, and every time you leave the kernel you'd load a user-space thread's state.

The disadvantages are:
  • You can't have other kernel threads.
  • If the user-space thread that you switched from (when entering kernel) is the same user-space thread that you switch to (when leaving the kernel); then you've done a little extra work saving and loading that thread's state that could've been avoided.
  • Because you can't preempt kernel threads, expensive operations can cause latency problems if you don't split them into multiple faster operations and check for "kernel should be doing something else" between the smaller/faster operations. For example; you might split "spawn a new process" into 3 pieces ("create new virtual address space", "create process data structure", "create initial thread") and check if anything more important happened between these pieces.
The advantages are:
  • Less memory consumed by kernel stacks
  • Far better cache locality of kernel stacks (much more likely that a kernel stack will still be in that CPU's caches)
  • Slightly simpler scheduler design. You'd just have a "which user-space thread this CPU should return to" variable and change it to whatever you like as many times as you like while the kernel is running.
  • Simpler locking in kernel space (for multi-CPU). Because a kernel thread can't be pre-empted you don't need to care about disabling task switches when acquiring a lock (and only ever need to worry about disabling IRQs for some locks).
In general, the disadvantages make it unsuitable for a monolithic kernel, and the advantages can make it a faster (but more complex) alternative for a micro-kernel.


Cheers,

Brendan

Re: Unable to set the stack pointer

Posted: Wed Apr 26, 2017 5:44 am
by prasoc
Thanks all for the enlightening discussion!
At the moment, I haven't implemented any userspace, so everything is based in the kernel (monolithic)
One (kernel) stack per core does seem a lot simpler, but I would like to have multiple kernel driver threads to keep the screen refreshed, and other updating mechanisms, all kept away from the user.

It is currently a lot more of a monolithic kernel than I would like, but maybe in the future I will work on a microkernel, it definitely seems a lot cleaner in theory, but I don't have the necessary knowledge to implement one yet.

In practical code, I now save the registers in the IRQ1 scheduling function (with C, nothing fancy - just a memcpy to the thread's state) but now I have created a function which restores the register from a struct pointer in ASM. However, I am having difficulties with the fact that it is being called inside an interrupt, and I'm trashing registers that the function which calls it needs (ebp, esp)

I guess my question is, *where* do I set these registers? They are necessary for my scheduler to perform it's task! Seems like a chicken-and-egg problem. Especially with eip, surely it would instantly jump out of the interrupt and straight into the thread's execution? Do I need to do any cleanup?

Re: Unable to set the stack pointer

Posted: Wed Apr 26, 2017 6:03 am
by iansjack
prasoc wrote:Especially with eip, surely it would instantly jump out of the interrupt and straight into the thread's execution? Do I need to do any cleanup?
But where was EIP when the registers were saved?

Re: Unable to set the stack pointer

Posted: Wed Apr 26, 2017 6:21 am
by prasoc
iansjack wrote:
prasoc wrote:Especially with eip, surely it would instantly jump out of the interrupt and straight into the thread's execution? Do I need to do any cleanup?
But where was EIP when the registers were saved?
It sets the EIP for each thread firstly to the entry point of the function that needs to be ran. From there, every schedule tick the EIP gets saved when I save all of the registers.

Drat. This is not what I want, is it? I need to save the EIP of the previous thread BEFORE the interrupt gets called! Just as a theoretical, my interrupt gives a struct pointer to the "registers" as it's first parameter, like this:

Code: Select all

extern "C" struct registers * isr_handler(registers_t * registers) {
  struct registers * new_registers = registers;

  if ((registers->isr_num >= IRQ0) && (registers->isr_num <= IRQ15)) { // IRQ
    if (registers->isr_num >= IRQ8) { // IRQ originated from the slave PIC
      outportb(PIC2_COMMAND, PIC_EOI);
    }

    outportb(PIC1_COMMAND, PIC_EOI);
  }

  if (isr_handlers[registers->isr_num] != 0) {
    isr_handlers[registers->isr_num](registers);
  }

  return new_registers;
}

my "memcpy" method of saving and restoring the registers should work correctly, which brings me back to my first question of "how to set the stack pointer"? Should I add some ASM to the end of where the isr_handler gets called to call a stack pointer change?

Re: Unable to set the stack pointer

Posted: Wed Apr 26, 2017 6:31 am
by Brendan
Hi,
prasoc wrote:At the moment, I haven't implemented any userspace, so everything is based in the kernel (monolithic)
One (kernel) stack per core does seem a lot simpler, but I would like to have multiple kernel driver threads to keep the screen refreshed, and other updating mechanisms, all kept away from the user.

It is currently a lot more of a monolithic kernel than I would like, but maybe in the future I will work on a microkernel, it definitely seems a lot cleaner in theory, but I don't have the necessary knowledge to implement one yet.
Ok; in that case I have to assume "one kernel stack per thread".
prasoc wrote:In practical code, I now save the registers in the IRQ1 scheduling function (with C, nothing fancy - just a memcpy to the thread's state) but now I have created a function which restores the register from a struct pointer in ASM. However, I am having difficulties with the fact that it is being called inside an interrupt, and I'm trashing registers that the function which calls it needs (ebp, esp)
The words "IRQ1 scheduling function" don't make any sense.

At the start of the IRQ handler you need to save registers that you're going to use (to avoid trashing the state of whatever you interrupted), and at the end of the IRQ handler you need to restore the registers you saved; but (for "one kernel stack per thread") this has nothing to do with task switching or scheduling at all in any way whatsoever.
prasoc wrote:I guess my question is, *where* do I set these registers?
You save the register in the same place you'd save them for every function call - on the stack.
prasoc wrote:They are necessary for my scheduler to perform it's task! Seems like a chicken-and-egg problem. Especially with eip, surely it would instantly jump out of the interrupt and straight into the thread's execution? Do I need to do any cleanup?
No, it has nothing to do with your scheduler.

To write a scheduler (for "one kernel stack per thread"):

a) Write a "switch to task" kernel function that saves any registers that the calling convention you're using says need to be "preserved by callee"; then saves the old task's stack pointer somewhere, then loads the new task's stack pointer from somewhere, then loads any registers that the calling convention you're using says need to be "preserved by callee" and returns like a normal function normally would. This should be in 100% assembly (because you're doing "strange" things with the stack).

b) Write some code to create any management data for the thread that the kernel has been using since the computer booted (e.g. a "thread data structure" or something, which is where the tasks stack pointer would be saved during a task switch).

c) Write some code to create a new task, which includes allocating a stack and making sure that the new stack contains whatever the "switch to task" function pops off of the stack in the correct order (this is also why you want that "switch to task" function to be in 100% assembly - so you can guarantee the stack layout is known and won't change).

d) Test everything above by spawning a second task; where the first task does "switch to task #2" and the second task does "switch to task #1", so that you end up with a huge number of task switches.

e) Implement a "find task to switch to, then switch to it" function. This will involve having some kind of structure to keep track of tasks (if this is your first scheduler, I'd suggest using a simple linked list for "round robin" scheduler initially, just to get some experience with it - that initial "round robin" scheduler can be replaced by something that doesn't suck later). In any case; this function would find a task to switch to and then call the "switch to task" kernel function you wrote earlier.

f) Test your "find task to switch to, then switch to it" function with the same pair of kernel threads you used for testing earlier; just by making both tasks call your "find task to switch to, then switch to it" function (instead of switching directly to each other). When that works, you should be able to spawn 123 more threads and test that too.

g) Write some kind of "block_task(reason)" and "unblock_task(taskID)" functions. Whenever a task has to wait for something the kernel calls "block_task(reason)", which removes the task from whatever data structure/s the scheduler is using to keep track of tasks and then calls your "find task to switch to, then switch to it" function; which means that the task doesn't get any CPU time anymore. Whenever something happens that a task was waiting for the kernel calls "unblock_task(taskID)", which adds the task back into whatever data structure/s the scheduler is using to keep track of tasks, which means the task starts running again. Note that (for more advanced schedulers) the "unblock_task(taskID)" might also check if the task that was unblocked is higher priority than the currently running task and call your "switch to task" kernel function - this allows higher priority tasks to be able to respond "immediately" when (e.g.) the user presses a key, or a network packet arrives, or whatever. Also note that (for almost all OSs under almost all conditions) the majority of task switches are caused by tasks blocking and unblocking.

h) Test the "block_task(reason)" and "unblock_task(taskID)" functions by having a kernel thread call "block_task()" and then getting a different kernel thread to wake it back up using "unblock_task(taskID)".

i) Write code to do "sleep()", which puts a task onto some kind of "list of tasks to wake up when" and then calls "block_task(SLEEPING);". You will need some kind of timer IRQ that checks that "list of tasks to wake up when" and calls "unblock_task(taskID)" to wake them up. Test this too.

j) Consider writing some kind of IPC (messages, pipes, whatever). This will end up using the "block_task(reason)" and "unblock_task(taskID)" functions too.

k) Modify the code a little to set some kind of time-out during task switches; and modify your timer IRQ so that when that time-out expires it calls the "find task to switch to, then switch to it" function. This is mostly just a little hack to prevent CPU hogs.


Cheers,

Brendan

Re: Unable to set the stack pointer

Posted: Wed Apr 26, 2017 6:42 am
by iansjack
prasoc wrote:Drat. This is not what I want, is it? I need to save the EIP of the previous thread BEFORE the interrupt gets called!
I don't think you do. (or, rather, you have already done so by saving it on the stack when the interrupt was called.) I would say that if you are going to restore the registers within an interrupt then that is the point at which you wanted to have saved them. (I think Brendan misunderstood what you meant by "where" - he uses the stack, I use the task structure.)

I have an interrupt routine that switches tasks; it saves the current registers (in my case to the structure in memory) then restores the previously saved registers from the structure for the task to be switched to. As these were saved from within the task-switching interrupt routine, the stack of the new task will contain the correct entries to return from the interrupt. (It returns to a different place than the task switched from would, but that's what you want - you return to the point in the task's code where the interrupt to switch tasks was called from as if the task had been running all along.)

You do have to be careful that when you create a task you put the correct entries on its stack to pretend that it is in the middle of the interrupt routine, and "returns" to the start code for your task.

Re: Unable to set the stack pointer

Posted: Wed Apr 26, 2017 6:46 am
by prasoc
Brendan wrote: The words "IRQ1 scheduling function" don't make any sense.
What about "my scheduler function attached to the timer IRQ (1)"? still quite new to this, need to get my terminology correct!
Brendan wrote: At the start of the IRQ handler you need to save registers that you're going to use (to avoid trashing the state of whatever you interrupted), and at the end of the IRQ handler you need to restore the registers you saved; but (for "one kernel stack per thread") this has nothing to do with task switching or scheduling at all in any way whatsoever.
Yes, which is why I was confused about the kernel stacks appearing in the thread. The crux of my problem was that I couldn't set ESP from within my C code! Do I need a "jmp" or something to force the stack change?

Brendan wrote: To write a scheduler (for "one kernel stack per thread"):
Wow, what an in-depth explanation!
Brendan wrote: c) Write some code to create a new task, which includes allocating a stack and making sure that the new stack contains whatever the "switch to task" function pops off of the stack in the correct order (this is also why you want that "switch to task" function to be in 100% assembly - so you can guarantee the stack layout is known and won't change).
At the moment, the stack it switches to is zero-initialised. I will need to push some values to that memory address so that when the stack gets swapped out, the C code running doesn't detect the change?
Brendan wrote: e) Implement a "find task to switch to, then switch to it" function. This will involve having some kind of structure to keep track of tasks (if this is your first scheduler, I'd suggest using a simple linked list for "round robin" scheduler initially, just to get some experience with it - that initial "round robin" scheduler can be replaced by something that doesn't suck later). In any case; this function would find a task to switch to and then call the "switch to task" kernel function you wrote earlier.
I use my scheduler class (keeping track of a static variable) to manage the threads, stored inside a basic implemented version of std::vector, which I can search threads, etc. with (using some C++ iterators/magic)
Brendan wrote: j) Consider writing some kind of IPC (messages, pipes, whatever). This will end up using the "block_task(reason)" and "unblock_task(taskID)" functions too.
This is the reason why I have the issue in the first place - I'm implementing a port-based message queue, which doesn't seem to work when called in multiple threads at the same time. I can only assume that stack mixes both threads up