yet another james molloy kernel tutorial question

ITchimp · Post by **ITchimp** » Thu Aug 13, 2020 2:46 am

I am just wondering, in his tutorial, the fork() code seems to have created the kernel stack for the
wrong task, it is new task should receive that kernel stack, not the current_task... can someone please
help me on this? I high lighted the code below...

Code: Select all

int fork()
{
    // We are modifying kernel structures, and so cannot be interrupted.
    asm volatile("cli");

    // Take a pointer to this process' task struct for later reference.
    task_t *parent_task = (task_t*)current_task;

    // Clone the address space.
    page_directory_t *directory = clone_directory(current_directory);

    // Create a new process.
    task_t *new_task = (task_t*)kmalloc(sizeof(task_t));
    new_task->id = next_pid++;
    new_task->esp = new_task->ebp = 0;
    new_task->eip = 0;
    new_task->page_directory = directory;
  [b]  current_task->kernel_stack = kmalloc_a(KERNEL_STACK_SIZE);[/b]
    new_task->next = 0;

    // Add it to the end of the ready queue.
    // Find the end of the ready queue...
    task_t *tmp_task = (task_t*)ready_queue;
    while (tmp_task->next)
        tmp_task = tmp_task->next;
    // ...And extend it.
    tmp_task->next = new_task;

    // This will be the entry point for the new process.
    u32int eip = read_eip();

    // We could be the parent or the child here - check.
    if (current_task == parent_task)
    {
        // We are the parent, so set up the esp/ebp/eip for our child.
        u32int esp; asm volatile("mov %%esp, %0" : "=r"(esp));
        u32int ebp; asm volatile("mov %%ebp, %0" : "=r"(ebp));
        new_task->esp = esp;
        new_task->ebp = ebp;
        new_task->eip = eip;
        // All finished: Reenable interrupts.
        asm volatile("sti");

        // And by convention return the PID of the child.
        return new_task->id;
    }
    else
    {
        // We are the child - by convention return 0.
        return 0;
    }

}

Octocontrabass · Post by **Octocontrabass** » Thu Aug 13, 2020 3:31 am

Why do you keep asking questions about code that is not sane?

nullplan · Post by **nullplan** » Thu Aug 13, 2020 8:00 am

There are a lot of things that code should do, but fails to. For example, it clears the interrupt flag for some arbitrary reason. Apparently as a replacement for a spinlock. It is not a terribly sane thing to do, to protect code instead of data. What is it that is in danger if an interrupt arrives? Identify that and add a spinlock for it. Because if an interrupt would be inconvenient, another concurrent CPU would probably also be. Write your code properly at the start, then you won't have to do a major rewrite later.

The line you highlighted is another example. He's probably losing a live pointer there, as well. So that is an unreachable page he can never get back. Also, the new task has no kernel stack, that variable just goes uninitialized. As for the rest, copying the EIP, and copying ESP and EBP with inline assembler has already been commented on (utterly insane!). This way, multiple tasks end up executing on the same stack which is horrendous! Kernel code should not be using fork(), because it cannot use the same copy-on-write semantics userspace can.

Also, why is he using a singly linked list of tasks, and then is appending new tasks to it? That means, fork() takes longer the more tasks you already have. In today's time, linked lists are horrible for cache performance, but having to iterate over the entire thing to add a new element just adds another layer of awful. Just make it a doubly-linked circular list! That way, appending a new element is constant time and you don't have to deal with reaching the end. And since you have an idle task, you don't even need to deal with the list being empty.

Schol-R-LEA · Post by **Schol-R-LEA** » Thu Aug 13, 2020 1:02 pm

nullplan wrote:There are a lot of things that code should do, but fails to. For example, it clears the interrupt flag for some arbitrary reason. Apparently as a replacement for a spinlock.

It is more that the tutorial is outdated - the approach used was typical of hobby OSes prior to 2010 or so. Spinlocks only came into regular use when multicore CPUs became common, and even then were slow to be adopted by hobbyists, as most were using older hardware and didn't realize the trouble it would cause them once they had multiple processors sharing memory and peripherals.

The usual rule was to minimize the time spend in the mutual exclusion zone - generally just long enough to set a mutex, perform a rendezvous, or transfer a message into a mailbox - precisely because it risked dropping interrupts; but it was all too easy to hold it too long if it was being done directly rather than through a minimal API, and if the kernel process in question hung, the whole system would too.

This was actually the method recommended by several textbooks at the time, as well, or at least discussed as the simplest method.

Clearing interrupts to avoid re-entrancy and enforce mutual exclusion was never a good method or mutual exclusion, but on a uniprocessor it could be made to work. Not so on a multiprocessor.

I should add that spinlocks themselves aren't necessarily the best solution; they are simply an easy one. Other means of synchronizing multiprocessor systems were already well established when the 8086 was first designed, but they all required a greater design effort and integration into the scheduler and synchronization primitives - spinlocks were easier to retrofit on to existing kernels.

I would argue that, for a new kernel spinlocks aren't the best approach - but they at least do the job, which clearing interrupts on just one of the processors won't.

I agree with you on the linked lists; other data structures such as a tree or hash table (or more likely, a hybrid tree of hash tables, or something similar to a B-tree) would generally fit this use much better, IMO. Again, this is largely a matter of being dated, as well as being something of a toy design - it assumes it will never have more than a handful of running processes, so it sticks to the simplest design possible (short of a fixed array, which actually was a common - and badly flawed - approach in even earlier tutorials and textbooks, and even in some older production OSes in the 1970s - there was a time when OSes with a fixed limit of 32 or 64 processes were the norm on minis, including in some early versions of Unix). I would say that at minimum, a priority queue would be the starting point for a sensible process table, and even that is probably too simplistic.

(OTOH, for a sleep queue, where the order in which the processes get revived is all that counts, a delta list is pretty reasonable, though again for locality's sake today you might use a dynamic array/list-of-arrays rather than a singly-linked list).

The other design flaws are exactly that - terrible design choices from someone who didn't understand what he was doing as well as he thought he did. The use of a single stack for multiple threads was and remains common in libraries such as the (quite bad) Pthreads, and it is possible Molloy couldn't see how kernel level protected-memory processes differ from userland shared-memory threads - he wouldn't be the first, nor the last, to make that mistake.

I'll be honest, I have been planning to go through Molloy's tutorial myself (as well as some of the others floating around), not to learn from it but to get a sense of just where its flaws are. While I know about the wiki page on known bugs in the tutorial, I don't know enough about the actual codebase to comment on it sensibly.

ITchimp · Post by **ITchimp** » Thu Aug 13, 2020 8:24 pm

I deeply appreciate all your inputs and comments on my question.

James Molloy's context switching in chapter 10 establishes a kernel stack for each task..so I don't really
see part of your critical comments that multiple thread sharing the same stack...

also on single CPU cli, sti does the job, I think for a beginner like me knowing the 80 percent of the key concepts in 20% of the time is where I am right now...

nullplan · Post by **nullplan** » Thu Aug 13, 2020 9:59 pm

ITchimp wrote:James Molloy's context switching in chapter 10 establishes a kernel stack for each task..so I don't really
see part of your critical comments that multiple thread sharing the same stack...

I just read chapter 9 of the tutorial. I had thought, since he is copying ESP and EBP from the parent stack, he is also reusing the stack. But no, he is using paging to copy the kernel stack and have the same kernel address point to different physical addresses. So there's another point of insanity. This also means that different kernel tasks can never share a stack pointer, since one won't be able to see the stack of the other. Well, it is awfully clever, I'll give him that. Or just awful, depending on your point of view.

ITchimp wrote:also on single CPU cli, sti does the job

It's more of a design problem. CLI and STI are protecting code, when it has long been established that you should be protecting data from concurrent accesses. So you actually find the points of inconsistency and protect those.

ITchimp · Post by **ITchimp** » Fri Aug 14, 2020 1:13 am

Yes, that was my reading too. A kernel task acquires independence only when it has its own kernel stack and pg_dir struct (its own virtual to physical mapping).
but since the kernel code/data are shared among all kernel tasks... the content on the stack is from all those initialization functions before kernel transitions from single thread to multithread...

My question to you (Dear nullplan) is that what you would do differently on getting the kernel task its own stack without breaking the kernel execution path?

nullplan · Post by **nullplan** » Fri Aug 14, 2020 11:24 am

ITchimp wrote:My question to you (Dear nullplan) is that what you would do differently on getting the kernel task its own stack without breaking the kernel execution path?

Not use fork in kernel code, for one thing. I'm creating a 64-bit kernel, and in that kernel, all physical memory is always mapped. The kernel half of the address space is the same in all address spaces, starting with the memory mirror and ending in the kernel image mapping. With dynamic allocations in the middle. It is advantageous to map the memory needed in the kernel twice, so that fragmentation does not becomes as much of a problem. Also, so that I can use guard pages to prevent a stack overflow from going unnoticed.

Before you can understand task creation, you first must understand my task switching code:

Code: Select all

void switch_to_task(uintptr_t *old_stack, uintptr_t new_stack);

Code: Select all

switch_to_task:
  pushq %r15
  pushq %r14
  pushq %13
  pushq %r12
  pushq %rbp
  pushq %rbx
  movq %rsp, (%rdi)
  movq %rsi,%rsp
  popq %rbx
  popq %rbp
  popq %r12
  popq %r13
  popq %r14
  popq %r15
  retq

The task structure is located at the top of the kernel stack. I'm allocating 2 pages for kernel stack (and another guard page), and I recon I probably won't need that much. And the task structure is a few hundred bytes. So that's why.

Code: Select all

int new_kthread(const char *name, void (*func)(void*), void* arg)
{
  int r = -EAGAIN;
  int tid = alloc_tid();
  if (tid < 0) goto out;
  r = -ENOMEM;
  void *p = do_mmap(0, KSTACK_SIZE + KSTACK_GUARD_SIZE, PROT_NONE, MAP_KERNEL | MAP_ANONYMOUS, -1, 0);
  if (!p) goto out;
  if (do_mprotect((char*)p + KSTACK_GUARD_SIZE, KSTACK_SIZE, MAP_READ | MAP_WRITE)) goto out_unmap;
  struct task *t = (struct task *)((char*)p + KSTACK_SIZE + KSTACK_GUARD_SIZE - sizeof (struct task));
  strlcpy(t->name, name, sizeof t->name);
  t->tid = tid;
  task_list_add(t);
  t->flags = TIF_KERNEL;
  uint64_t *stk = (void*)t;
  stk -= 7;
  stk[0] = (uint64_t)t;
  stk[1] = (uint64_t)func;
  stk[2] = (uint64_t)arg;
  stk[6] = (uint64_t)start_kernel_thread; 
  t->kstack = (uintptr_t)stk;
  t->kstackbot = (uintptr_t)p + KSTACK_GUARD_SIZE;
  mark_task_runnable(t);
  return 0;
out_unmap:
  do_munmap(p, 3 << 12);
out:
  return r;
}

The synchronization happens inside the subroutines there. mark_task_runnable() will queue the task up in the run queue. When the scheduler gets to it, it will pop the task pointer, the function, and the argument into RBX, RBP, and R12, and will then "return" to a label called "start_kernel_thread". What happens there?

Code: Select all

start_kernel_thread:
  sti
  movq %rbx, %rdi
  movq %rbp, %rsi
  movq %r12, %rdx
  xorl %ebp, %ebp
  btrq $3, %rsp
  call start_kernel_thread_c
  ud2

OK, nothing major. Shuffle the arguments around for a C function, call it according to ABI, and expect it not to return. And then that function does what?

Code: Select all

noreturn void start_kernel_thread_c(struct task *t, void (*fn)(void*), void* arg) {
  this_cpu_write(current_task, t);
  fn(arg);
  exit_task(t);
  for (;;)
    schedule();
}

Unsurprisingly, this sets the "current_task" CPU variable to the new task, calls the function, then condemns the thread and calls the scheduler forever. In theory, the scheduler shouldn't return, but since the function is "noreturn", I must also convince the compiler that it doesn't, lest I get a warning. Setting "current_task" must be done from C, because that is just so dynamic.

For a userspace task I basically do the same, except the new task also gets a CR3 (preconfigured from outside), and sets the TSS RSP0 (kernel threads also don't have that because they are already running at CPL 0, so no change of privilege will take place), and then performs an IRET. The frame for that is preconfigured when the stack is first allocated.

See, this way all tasks have their own kernel stack, but also, the kernel half of address space is always consistent. If a kernel tasks shares a stack-based pointer with another task, they will actually see the same memory.

nexos · Post by **nexos** » Fri Aug 14, 2020 12:47 pm

Forking kernel mode is a bad idea. I know Linux 1.0 did it, but it still isn't good. Instead, you should have a function that spawns the first task in user mode. Then process creation from that point forward is done via forking. I tried basing my fork off of James Molloy, and found that he is over complicating it, and doing it in an insane manner.

sj95126 · Post by **sj95126** » Fri Aug 14, 2020 2:25 pm

nexos wrote:Instead, you should have a function that spawns the first task in user mode.

This is the point where I am right now. I've successfully implemented software-based task switching, but in a fairly hackish manner. Still, it works.

I'm thinking what I should do is have the entry point for the first user process be in a separate page-aligned section of the kernel image (e.g. sections like .usertext, .userdata, .userrodata, etc.), so that I can reassign those pages to the user process page map and disassociate them from the kernel.

OSDev.org

yet another james molloy kernel tutorial question

yet another james molloy kernel tutorial question

Re: yet another james molloy kernel tutorial question

Re: yet another james molloy kernel tutorial question

Re: yet another james molloy kernel tutorial question

Re: yet another james molloy kernel tutorial question

Re: yet another james molloy kernel tutorial question

Re: yet another james molloy kernel tutorial question

Re: yet another james molloy kernel tutorial question

Re: yet another james molloy kernel tutorial question

Re: yet another james molloy kernel tutorial question