ITchimp wrote:My question to you (Dear nullplan) is that what you would do differently on getting the kernel task its own stack without breaking the kernel execution path?
Not use fork in kernel code, for one thing. I'm creating a 64-bit kernel, and in that kernel, all physical memory is always mapped. The kernel half of the address space is the same in all address spaces: it starts with the memory mirror and ends with the kernel image mapping, with dynamic allocations in the middle. It is advantageous to map the memory the kernel needs twice, so that fragmentation does not become as much of a problem, and so that I can use guard pages to prevent a stack overflow from going unnoticed.
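As an aside, the memory mirror makes physical-to-virtual translation pure arithmetic. Here is a minimal sketch; the base address and the function names are hypothetical values of mine, not taken from the kernel described above:

```c
#include <stdint.h>

/* Sketch of the physical-memory mirror. PHYS_MIRROR_BASE is a
 * hypothetical constant, not the author's actual value. Because all
 * of physical memory is mapped at one fixed offset, converting
 * between physical and kernel-virtual addresses needs no lookup. */
#define PHYS_MIRROR_BASE 0xffff800000000000ull

static inline void *phys_to_virt(uint64_t pa)
{
    return (void *)(uintptr_t)(PHYS_MIRROR_BASE + pa);
}

static inline uint64_t virt_to_phys(const void *va)
{
    return (uint64_t)(uintptr_t)va - PHYS_MIRROR_BASE;
}
```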
Before you can understand task creation, you first must understand my task switching code:
Code: Select all
void switch_to_task(uintptr_t *old_stack, uintptr_t new_stack);
Code: Select all
switch_to_task:
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbp
pushq %rbx
movq %rsp, (%rdi)
movq %rsi,%rsp
popq %rbx
popq %rbp
popq %r12
popq %r13
popq %r14
popq %r15
retq
The task structure is located at the top of the kernel stack. I'm allocating 2 pages for the kernel stack (plus another guard page), and I reckon I probably won't need that much. The task structure is only a few hundred bytes, so that's why.
Code: Select all
int new_kthread(const char *name, void (*func)(void*), void* arg)
{
int r = -EAGAIN;
int tid = alloc_tid();
if (tid < 0) goto out;
r = -ENOMEM;
void *p = do_mmap(0, KSTACK_SIZE + KSTACK_GUARD_SIZE, PROT_NONE, MAP_KERNEL | MAP_ANONYMOUS, -1, 0);
if (!p) goto out;
if (do_mprotect((char*)p + KSTACK_GUARD_SIZE, KSTACK_SIZE, PROT_READ | PROT_WRITE)) goto out_unmap;
struct task *t = (struct task *)((char*)p + KSTACK_SIZE + KSTACK_GUARD_SIZE - sizeof (struct task));
strlcpy(t->name, name, sizeof t->name);
t->tid = tid;
task_list_add(t);
t->flags = TIF_KERNEL;
uint64_t *stk = (void*)t;
stk -= 7;
stk[0] = (uint64_t)t;
stk[1] = (uint64_t)func;
stk[2] = (uint64_t)arg;
stk[6] = (uint64_t)start_kernel_thread;
t->kstack = (uintptr_t)stk;
t->kstackbot = (uintptr_t)p + KSTACK_GUARD_SIZE;
mark_task_runnable(t);
return 0;
out_unmap:
do_munmap(p, KSTACK_SIZE + KSTACK_GUARD_SIZE);
out:
return r;
}
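The seven 8-byte slots that new_kthread carves out below the task structure can be sketched as a struct (the slot names are mine, not the author's). switch_to_task pops the first six slots into RBX, RBP, R12, R13, R14, and R15 in that order, then retq jumps to the address in the last slot; slots 3 through 5 are left uninitialized because a fresh thread never reads those registers:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the initial kernel-stack frame built by new_kthread.
 * Field names are illustrative; each field is the register that
 * switch_to_task pops that slot into. */
struct initial_frame {
    uint64_t rbx;   /* stk[0]: struct task *                    */
    uint64_t rbp;   /* stk[1]: func                             */
    uint64_t r12;   /* stk[2]: arg                              */
    uint64_t r13;   /* stk[3..5]: don't-care for a fresh thread */
    uint64_t r14;
    uint64_t r15;
    uint64_t ret;   /* stk[6]: start_kernel_thread              */
};
```

This is exactly why `stk -= 7` is followed by writes to indices 0, 1, 2, and 6: those are the only slots whose contents matter on the first switch.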
The synchronization happens inside the subroutines there. mark_task_runnable() will queue the task up in the run queue. When the scheduler gets to it, it will pop the task pointer, the function, and the argument into RBX, RBP, and R12, and will then "return" to a label called "start_kernel_thread". What happens there?
Code: Select all
start_kernel_thread:
sti
movq %rbx, %rdi
movq %rbp, %rsi
movq %r12, %rdx
xorl %ebp, %ebp
btrq $3, %rsp
call start_kernel_thread_c
ud2
OK, nothing major. Shuffle the arguments around for a C function, re-establish 16-byte stack alignment (the btrq clears bit 3 of RSP; the lower bits are already zero), call the function according to the ABI, and expect it not to return. And then that function does what?
Code: Select all
noreturn void start_kernel_thread_c(struct task *t, void (*fn)(void*), void* arg) {
this_cpu_write(current_task, t);
fn(arg);
exit_task(t);
for (;;)
schedule();
}
Unsurprisingly, this sets the "current_task" CPU variable to the new task, calls the function, then condemns the thread and calls the scheduler forever. In theory the scheduler should never return, but since the function is "noreturn", I must also convince the compiler that it doesn't, lest I get a warning. Setting "current_task" must be done from C, because the per-CPU access is just too dynamic to spell out comfortably in assembly.
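For illustration, here is a hypothetical sketch of per-CPU "current task" storage. The this_cpu_write() above presumably goes through a GS-relative per-CPU area; a plain array indexed by CPU id is enough to show the idea, and every name here is mine:

```c
#include <stddef.h>

/* Hypothetical per-CPU "current task" storage, standing in for a
 * GS-based per-CPU area. One slot per CPU; each CPU only ever
 * touches its own slot, so no locking is needed. */
#define MAX_CPUS 8

struct task { int tid; };

static struct task *current_task[MAX_CPUS];

static void set_current(int cpu, struct task *t)
{
    current_task[cpu] = t;   /* what this_cpu_write() would do */
}

static struct task *get_current(int cpu)
{
    return current_task[cpu];
}
```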
For a userspace task I do basically the same, except the new task also gets a CR3 (preconfigured from outside) and sets the TSS RSP0 (kernel threads don't need that, since they already run at CPL 0, so no privilege change will take place), and then performs an IRET. The frame for that is preconfigured when the stack is first allocated.
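The layout of that preconfigured frame is dictated by the architecture, not by the kernel: iretq pops these five quadwords, lowest address first. A sketch (the struct and field names are mine; the order and the example RFLAGS value come from the x86-64 architecture):

```c
#include <stdint.h>
#include <stddef.h>

/* The interrupt-return frame as iretq consumes it on x86-64, lowest
 * address first. Preconfiguring this at the top of a new task's
 * kernel stack lets the first switch "return" straight to CPL 3. */
struct iret_frame {
    uint64_t rip;    /* user-mode entry point                    */
    uint64_t cs;     /* user code segment selector, RPL 3        */
    uint64_t rflags; /* e.g. 0x202: IF set, reserved bit 1 set   */
    uint64_t rsp;    /* initial user stack pointer               */
    uint64_t ss;     /* user stack segment selector, RPL 3       */
};
```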
See, this way all tasks have their own kernel stack, but the kernel half of the address space is also always consistent. If a kernel task shares a stack-based pointer with another task, they will actually see the same memory.