yet another james molloy kernel tutorial question

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
ITchimp
Member
Member
Posts: 134
Joined: Sat Aug 18, 2018 8:44 pm

yet another james molloy kernel tutorial question

Post by ITchimp »

I am just wondering, in his tutorial, the fork() code seems to have created the kernel stack for the
wrong task, it is new task should receive that kernel stack, not the current_task... can someone please
help me on this? I high lighted the code below...

Code: Select all

int fork()
{
    // We are modifying kernel structures, and so cannot be interrupted.
    asm volatile("cli");

    // Take a pointer to this process' task struct for later reference.
    task_t *parent_task = (task_t*)current_task;

    // Clone the address space.
    page_directory_t *directory = clone_directory(current_directory);

    // Create a new process.
    task_t *new_task = (task_t*)kmalloc(sizeof(task_t));
    new_task->id = next_pid++;
    new_task->esp = new_task->ebp = 0;
    new_task->eip = 0;
    new_task->page_directory = directory;
  [b]  current_task->kernel_stack = kmalloc_a(KERNEL_STACK_SIZE);[/b]
    new_task->next = 0;

    // Add it to the end of the ready queue.
    // Find the end of the ready queue...
    task_t *tmp_task = (task_t*)ready_queue;
    while (tmp_task->next)
        tmp_task = tmp_task->next;
    // ...And extend it.
    tmp_task->next = new_task;

    // This will be the entry point for the new process.
    u32int eip = read_eip();

    // We could be the parent or the child here - check.
    if (current_task == parent_task)
    {
        // We are the parent, so set up the esp/ebp/eip for our child.
        u32int esp; asm volatile("mov %%esp, %0" : "=r"(esp));
        u32int ebp; asm volatile("mov %%ebp, %0" : "=r"(ebp));
        new_task->esp = esp;
        new_task->ebp = ebp;
        new_task->eip = eip;
        // All finished: Reenable interrupts.
        asm volatile("sti");

        // And by convention return the PID of the child.
        return new_task->id;
    }
    else
    {
        // We are the child - by convention return 0.
        return 0;
    }

}
Octocontrabass
Member
Member
Posts: 5572
Joined: Mon Mar 25, 2013 7:01 pm

Re: yet another james molloy kernel tutorial question

Post by Octocontrabass »

Why do you keep asking questions about code that is not sane?
nullplan
Member
Member
Posts: 1790
Joined: Wed Aug 30, 2017 8:24 am

Re: yet another james molloy kernel tutorial question

Post by nullplan »

There are a lot of things that code should do, but fails to. For example, it clears the interrupt flag for some arbitrary reason. Apparently as a replacement for a spinlock. It is not a terribly sane thing to do, to protect code instead of data. What is it that is in danger if an interrupt arrives? Identify that and add a spinlock for it. Because if an interrupt would be inconvenient, another concurrent CPU would probably also be. Write your code properly at the start, then you won't have to do a major rewrite later.

The line you highlighted is another example. He's probably losing a live pointer there, as well. So that is an unreachable page he can never get back. Also, the new task has no kernel stack, that variable just goes uninitialized. As for the rest, copying the EIP, and copying ESP and EBP with inline assembler has already been commented on (utterly insane!). This way, multiple tasks end up executing on the same stack which is horrendous! Kernel code should not be using fork(), because it cannot use the same copy-on-write semantics userspace can.

Also, why is he using a singly linked list of tasks, and then is appending new tasks to it? That means, fork() takes longer the more tasks you already have. In today's time, linked lists are horrible for cache performance, but having to iterate over the entire thing to add a new element just adds another layer of awful. Just make it a doubly-linked circular list! That way, appending a new element is constant time and you don't have to deal with reaching the end. And since you have an idle task, you don't even need to deal with the list being empty.
Carpe diem!
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: yet another james molloy kernel tutorial question

Post by Schol-R-LEA »

nullplan wrote:There are a lot of things that code should do, but fails to. For example, it clears the interrupt flag for some arbitrary reason. Apparently as a replacement for a spinlock.
It is more that the tutorial is outdated - the approach used was typical of hobby OSes prior to 2010 or so. Spinlocks only came into regular use when multicore CPUs became common, and even then were slow to be adopted by hobbyists, as most were using older hardware and didn't realize the trouble it would cause them once they had multiple processors sharing memory and peripherals.

The usual rule was to minimize the time spend in the mutual exclusion zone - generally just long enough to set a mutex, perform a rendezvous, or transfer a message into a mailbox - precisely because it risked dropping interrupts; but it was all too easy to hold it too long if it was being done directly rather than through a minimal API, and if the kernel process in question hung, the whole system would too.

This was actually the method recommended by several textbooks at the time, as well, or at least discussed as the simplest method.

Clearing interrupts to avoid re-entrancy and enforce mutual exclusion was never a good method or mutual exclusion, but on a uniprocessor it could be made to work. Not so on a multiprocessor.

I should add that spinlocks themselves aren't necessarily the best solution; they are simply an easy one. Other means of synchronizing multiprocessor systems were already well established when the 8086 was first designed, but they all required a greater design effort and integration into the scheduler and synchronization primitives - spinlocks were easier to retrofit on to existing kernels.

I would argue that, for a new kernel spinlocks aren't the best approach - but they at least do the job, which clearing interrupts on just one of the processors won't.

I agree with you on the linked lists; other data structures such as a tree or hash table (or more likely, a hybrid tree of hash tables, or something similar to a B-tree) would generally fit this use much better, IMO. Again, this is largely a matter of being dated, as well as being something of a toy design - it assumes it will never have more than a handful of running processes, so it sticks to the simplest design possible (short of a fixed array, which actually was a common - and badly flawed - approach in even earlier tutorials and textbooks, and even in some older production OSes in the 1970s - there was a time when OSes with a fixed limit of 32 or 64 processes were the norm on minis, including in some early versions of Unix). I would say that at minimum, a priority queue would be the starting point for a sensible process table, and even that is probably too simplistic.

(OTOH, for a sleep queue, where the order in which the processes get revived is all that counts, a delta list is pretty reasonable, though again for locality's sake today you might use a dynamic array/list-of-arrays rather than a singly-linked list).

The other design flaws are exactly that - terrible design choices from someone who didn't understand what he was doing as well as he thought he did. The use of a single stack for multiple threads was and remains common in libraries such as the (quite bad) Pthreads, and it is possible Molloy couldn't see how kernel level protected-memory processes differ from userland shared-memory threads - he wouldn't be the first, nor the last, to make that mistake.

I'll be honest, I have been planning to go through Molloy's tutorial myself (as well as some of the others floating around), not to learn from it but to get a sense of just where its flaws are. While I know about the wiki page on known bugs in the tutorial, I don't know enough about the actual codebase to comment on it sensibly.
Last edited by Schol-R-LEA on Fri Aug 14, 2020 6:20 am, edited 1 time in total.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
ITchimp
Member
Member
Posts: 134
Joined: Sat Aug 18, 2018 8:44 pm

Re: yet another james molloy kernel tutorial question

Post by ITchimp »

I deeply appreciate all your inputs and comments on my question.

James Molloy's context switching in chapter 10 establishes a kernel stack for each task..so I don't really
see part of your critical comments that multiple thread sharing the same stack...

also on single CPU cli, sti does the job, I think for a beginner like me knowing the 80 percent of the key concepts in 20% of the time is where I am right now...
nullplan
Member
Member
Posts: 1790
Joined: Wed Aug 30, 2017 8:24 am

Re: yet another james molloy kernel tutorial question

Post by nullplan »

ITchimp wrote:James Molloy's context switching in chapter 10 establishes a kernel stack for each task..so I don't really
see part of your critical comments that multiple thread sharing the same stack...
I just read chapter 9 of the tutorial. I had thought, since he is copying ESP and EBP from the parent stack, he is also reusing the stack. But no, he is using paging to copy the kernel stack and have the same kernel address point to different physical addresses. So there's another point of insanity. This also means that different kernel tasks can never share a stack pointer, since one won't be able to see the stack of the other. Well, it is awfully clever, I'll give him that. Or just awful, depending on your point of view.
ITchimp wrote:also on single CPU cli, sti does the job
It's more of a design problem. CLI and STI are protecting code, when it has long been established that you should be protecting data from concurrent accesses. So you actually find the points of inconsistency and protect those.
Carpe diem!
ITchimp
Member
Member
Posts: 134
Joined: Sat Aug 18, 2018 8:44 pm

Re: yet another james molloy kernel tutorial question

Post by ITchimp »

Yes, that was my reading too. A kernel task acquires independence only when it has its own kernel stack and pg_dir struct (its own virtual to physical mapping).
but since the kernel code/data are shared among all kernel tasks... the content on the stack is from all those initialization functions before kernel transitions from single thread to multithread...

My question to you (Dear nullplan) is that what you would do differently on getting the kernel task its own stack without breaking the kernel execution path?
nullplan
Member
Member
Posts: 1790
Joined: Wed Aug 30, 2017 8:24 am

Re: yet another james molloy kernel tutorial question

Post by nullplan »

ITchimp wrote:My question to you (Dear nullplan) is that what you would do differently on getting the kernel task its own stack without breaking the kernel execution path?
Not use fork in kernel code, for one thing. I'm creating a 64-bit kernel, and in that kernel, all physical memory is always mapped. The kernel half of the address space is the same in all address spaces, starting with the memory mirror and ending in the kernel image mapping. With dynamic allocations in the middle. It is advantageous to map the memory needed in the kernel twice, so that fragmentation does not becomes as much of a problem. Also, so that I can use guard pages to prevent a stack overflow from going unnoticed.

Before you can understand task creation, you first must understand my task switching code:

Code: Select all

void switch_to_task(uintptr_t *old_stack, uintptr_t new_stack);

Code: Select all

switch_to_task:
  pushq %r15
  pushq %r14
  pushq %13
  pushq %r12
  pushq %rbp
  pushq %rbx
  movq %rsp, (%rdi)
  movq %rsi,%rsp
  popq %rbx
  popq %rbp
  popq %r12
  popq %r13
  popq %r14
  popq %r15
  retq
The task structure is located at the top of the kernel stack. I'm allocating 2 pages for kernel stack (and another guard page), and I recon I probably won't need that much. And the task structure is a few hundred bytes. So that's why.

Code: Select all

int new_kthread(const char *name, void (*func)(void*), void* arg)
{
  int r = -EAGAIN;
  int tid = alloc_tid();
  if (tid < 0) goto out;
  r = -ENOMEM;
  void *p = do_mmap(0, KSTACK_SIZE + KSTACK_GUARD_SIZE, PROT_NONE, MAP_KERNEL | MAP_ANONYMOUS, -1, 0);
  if (!p) goto out;
  if (do_mprotect((char*)p + KSTACK_GUARD_SIZE, KSTACK_SIZE, MAP_READ | MAP_WRITE)) goto out_unmap;
  struct task *t = (struct task *)((char*)p + KSTACK_SIZE + KSTACK_GUARD_SIZE - sizeof (struct task));
  strlcpy(t->name, name, sizeof t->name);
  t->tid = tid;
  task_list_add(t);
  t->flags = TIF_KERNEL;
  uint64_t *stk = (void*)t;
  stk -= 7;
  stk[0] = (uint64_t)t;
  stk[1] = (uint64_t)func;
  stk[2] = (uint64_t)arg;
  stk[6] = (uint64_t)start_kernel_thread; 
  t->kstack = (uintptr_t)stk;
  t->kstackbot = (uintptr_t)p + KSTACK_GUARD_SIZE;
  mark_task_runnable(t);
  return 0;
out_unmap:
  do_munmap(p, 3 << 12);
out:
  return r;
}
The synchronization happens inside the subroutines there. mark_task_runnable() will queue the task up in the run queue. When the scheduler gets to it, it will pop the task pointer, the function, and the argument into RBX, RBP, and R12, and will then "return" to a label called "start_kernel_thread". What happens there?

Code: Select all

start_kernel_thread:
  sti
  movq %rbx, %rdi
  movq %rbp, %rsi
  movq %r12, %rdx
  xorl %ebp, %ebp
  btrq $3, %rsp
  call start_kernel_thread_c
  ud2
OK, nothing major. Shuffle the arguments around for a C function, call it according to ABI, and expect it not to return. And then that function does what?

Code: Select all

noreturn void start_kernel_thread_c(struct task *t, void (*fn)(void*), void* arg) {
  this_cpu_write(current_task, t);
  fn(arg);
  exit_task(t);
  for (;;)
    schedule();
}
Unsurprisingly, this sets the "current_task" CPU variable to the new task, calls the function, then condemns the thread and calls the scheduler forever. In theory, the scheduler shouldn't return, but since the function is "noreturn", I must also convince the compiler that it doesn't, lest I get a warning. Setting "current_task" must be done from C, because that is just so dynamic.

For a userspace task I basically do the same, except the new task also gets a CR3 (preconfigured from outside), and sets the TSS RSP0 (kernel threads also don't have that because they are already running at CPL 0, so no change of privilege will take place), and then performs an IRET. The frame for that is preconfigured when the stack is first allocated.

See, this way all tasks have their own kernel stack, but also, the kernel half of address space is always consistent. If a kernel tasks shares a stack-based pointer with another task, they will actually see the same memory.
Carpe diem!
nexos
Member
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: yet another james molloy kernel tutorial question

Post by nexos »

Forking kernel mode is a bad idea. I know Linux 1.0 did it, but it still isn't good. Instead, you should have a function that spawns the first task in user mode. Then process creation from that point forward is done via forking. I tried basing my fork off of James Molloy, and found that he is over complicating it, and doing it in an insane manner.
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
sj95126
Member
Member
Posts: 151
Joined: Tue Aug 11, 2020 12:14 pm

Re: yet another james molloy kernel tutorial question

Post by sj95126 »

nexos wrote:Instead, you should have a function that spawns the first task in user mode.
This is the point where I am right now. I've successfully implemented software-based task switching, but in a fairly hackish manner. Still, it works.

I'm thinking what I should do is have the entry point for the first user process be in a separate page-aligned section of the kernel image (e.g. sections like .usertext, .userdata, .userrodata, etc.), so that I can reassign those pages to the user process page map and disassociate them from the kernel.
Post Reply