Re: Round Robin Priority Implementation

Posted: Fri Aug 03, 2018 5:02 pm
by Octacone
Brendan wrote:reply to your last post, snipped for readability
Well here is an awesome update.
I managed to understand most of what you've said, except that the Hardware::Enable_Interrupts(); thing still confuses me, but that is okay because I found a workaround that works just as well.
This is what my timer handler looks like currently:

Code:

void Programmable_Interval_Timer::Handler(multitasking_registers_t* multitasking_registers)
{
    tick_count++; //wrapping tick counter
    ticks_since_boot++; //total ticks since boot
    auto current_thread = Multitasking.Get_Current_Thread(); //null until the first switch
    if(multitasking_status == true)
    {
        if(current_thread != null) //avoids a page fault on a null thread, this is true 99.99% of the time
        {
            if(--current_thread->current_quantum == 0) //time slice used up, time for a switch
            {
                Hardware::Disable_Interrupts();
                Multitasking.Schedule(multitasking_registers);
            }
        }
        else //only reached when we do the very first switch
        {
            Hardware::Disable_Interrupts();
            Multitasking.Schedule(multitasking_registers);
        }
    }
}
What about enabling the interrupts?
When saving the current state I just modify eflags and that's it.
I managed to implement both the priority and time slices abstraction and it all works well.
I'm really amazed that I managed to pull this off. It ran for like 30 minutes without a problem.
The higher the priority, the less CPU time a thread gets per time slice.

Do my quantums seem reasonable btw?

Code:

idle = 20,
normal = 15,
medium = 10,
high = 5 //this is a nice PUN
I know I have to implement some sort of Block_Current_Task(reason); and Unblock_Task(specific thread); but I can't do that right now since I don't have any system calls, or I could do a temp fake block system call function just to test it. Any suggestion on that?

Also I will need to implement a thread removal function and some actual processes (not just threads) (multiple threads per process, custom address space, you know what it is anyway). I guess sleep(ms) too? Apart from stuff related to blocking, are there any other things to consider implementing?

How do I update my time slices? Once a thread's quantum reaches 0 and it gets preempted by the timer interrupt, do I just restore its time slice to what it was before?

Since each of the threads has a separate stack, where should I map it (currently I'm just identity mapping it) once I have a working process implementation?
Also, when allocating space for my threads, where should I map it? Kernel threads to 3 GB? User threads to???

When it comes to my kernel (talking about the Kernel_Main execution flow): what would I need to do in order to continue execution after Multitasking_Initialize();? Or should I just initialize everything beforehand and leave multitasking for last?

I also have some other questions, but let's not get ahead of ourselves (myself lol).
And yes my kernel is monolithic, if you didn't know because you've had signatures disabled.

Re: Round Robin Priority Implementation

Posted: Fri Aug 03, 2018 6:53 pm
by Brendan
Hi,
Octacone wrote:This is what my timer handler looks like currently: (code snipped)
That looks OK for now.
Octacone wrote:What about enabling the interrupts?
When saving the current state I just modify eflags and that's it.
That will probably cause race conditions later (and there shouldn't be any reason to save and load EFLAGS).

In various places you will want to block a task, and then do something as soon as the task is unblocked (without anything being able to interrupt after the task is unblocked but before you do something).

For a simple example; imagine if a process wants to allocate more memory but the OS doesn't have any memory left. In this case you'll want to block the task and start freeing up some memory (e.g. sending pages of data to swap space, flushing any cached writes to disk, etc). Then when there's enough memory free you'd want to unblock the task and immediately allocate the memory before anything else can interrupt and steal all the free memory. Otherwise you'll have to do a horrible loop where you repeatedly do "tell kernel to free some more memory and block while memory is being freed, unblock when memory is free, get interrupted, then try to allocate the memory and fail because it was stolen and retry".

Mostly (for maximum flexibility) you want something like:
  • Disable IRQs
  • (Optional) Do things that need to be done immediately before blocking while IRQs are disabled
  • Block the task (allowing some other task to run and allowing some other task to enable IRQs)
    Wait for task to be unblocked (while IRQs are enabled because some other task enabled them)
  • Disable IRQs and then do a task switch back to this task
  • (Optional) Do things that need to be done immediately after unblocking while IRQs are still disabled
  • Enable IRQs
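The ordering in the list above can be sketched as a host-side simulation. This is only an illustration under stated assumptions: Fake_CPU, Wait_Then_Act and the callback standing in for "other tasks" are hypothetical names, and a bool stands in for the CPU's interrupt flag; nothing here touches real hardware.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Simulation only: a bool stands in for the CPU's interrupt flag.
struct Fake_CPU {
    bool irqs_enabled = true;
    std::vector<std::string> trace;  // records the order of the steps

    void Disable_IRQs() { irqs_enabled = false; trace.push_back("cli"); }
    void Enable_IRQs()  { irqs_enabled = true;  trace.push_back("sti"); }
};

// Hypothetical blocking primitive. `other_tasks` stands in for whatever runs
// while the caller is blocked; it is free to enable IRQs.
inline void Block_Current_Task(Fake_CPU& cpu, const std::function<void()>& other_tasks) {
    assert(!cpu.irqs_enabled);       // the caller must have disabled IRQs first
    cpu.trace.push_back("block");
    other_tasks();                   // some other task runs here
    cpu.Disable_IRQs();              // the scheduler disables IRQs again ...
    cpu.trace.push_back("resume");   // ... and switches back to the blocked task
}

// The full sequence: returns true if the "after unblocking" step ran with IRQs off.
inline bool Wait_Then_Act(Fake_CPU& cpu, const std::function<void()>& other_tasks) {
    cpu.Disable_IRQs();
    // (optional) work that must happen immediately before blocking
    Block_Current_Task(cpu, other_tasks);
    // (optional) work that must happen immediately after unblocking;
    // nothing can interrupt here because IRQs are still disabled
    bool acted_with_irqs_off = !cpu.irqs_enabled;
    cpu.Enable_IRQs();
    return acted_with_irqs_off;
}
```

The point of the sketch is that the task resumes with IRQs already disabled, so the "after unblocking" work cannot be interrupted even though other tasks enabled IRQs in the meantime.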
Octacone wrote:I know I have to implement some sort of Block_Current_Task(reason); and Unblock_Task(specific thread); but I can't do that right now since I don't have any system calls, or I could do a temp fake block system call function just to test it. Any suggestion on that?
Implement a "pause until some other task un-pauses" feature; then have two kernel tasks where the first kernel task "un-pauses" the second kernel task and then pauses itself; and where the second kernel task "un-pauses" the first kernel task and then pauses itself. If you do it right (and if you avoid the "task was un-paused before it paused itself" problem); you'd end up with two kernel tasks that are constantly blocking/unblocking.

Note that this feature may be useful later (for one example, when the user tells a debugger to let the program being debugged continue running); and would be the easiest thing to implement because you'd only need the "Block_Current_Task(reason);" and "Unblock_Task(specific thread);" and nothing else.
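One possible way to avoid the "task was un-paused before it paused itself" problem is to remember an early un-pause in a flag. The sketch below is a simulation with hypothetical names (Task, Pause_Self, Unpause, unpause_pending); in a real kernel both functions would have to run with IRQs disabled (or otherwise atomically) and would do an actual task switch where the comments say so.

```cpp
#include <cassert>

enum class Task_State { running, paused };

struct Task {
    Task_State state = Task_State::running;
    bool unpause_pending = false;   // remembers an un-pause that arrived early
};

// Called by a task on itself. Returns true if the task actually blocked.
inline bool Pause_Self(Task& self) {
    assert(self.state == Task_State::running);  // only a running task can pause itself
    if (self.unpause_pending) {                 // the un-pause already happened: do not block
        self.unpause_pending = false;
        return false;
    }
    self.state = Task_State::paused;            // a real kernel would switch tasks here
    return true;
}

// Called by any other task.
inline void Unpause(Task& target) {
    if (target.state == Task_State::paused) {
        target.state = Task_State::running;     // a real kernel would re-queue the task
    } else {
        target.unpause_pending = true;          // the target has not paused yet
    }
}
```

With this, two kernel tasks can each call Unpause() on the other and then Pause_Self() in a loop without either of them sleeping forever.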
Octacone wrote:How do I update my time slices? Once a thread's quantum reaches 0 and it gets preempted by the timer interrupt, do I just restore its time slice to what it was before?
Yes.
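As a rough sketch of "restore the time slice", assuming the quantum values posted earlier in the thread are stored directly in a priority enum (the Thread struct and Tick function are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>

// Quantum lengths in timer ticks, taken from the values posted in this thread.
enum class Priority : std::uint32_t { idle = 20, normal = 15, medium = 10, high = 5 };

struct Thread {
    Priority priority;
    std::uint32_t current_quantum;
};

// Called once per timer tick for the running thread.
// Returns true when the slice is used up and the scheduler should switch.
inline bool Tick(Thread& t) {
    assert(t.current_quantum > 0);   // a zero quantum would wrap around on decrement
    if (--t.current_quantum != 0) {
        return false;
    }
    // Slice used up: reload it so the thread starts with a full slice next time.
    t.current_quantum = static_cast<std::uint32_t>(t.priority);
    return true;
}
```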
Octacone wrote:Since each of the threads has a separate stack, where should I map it (currently I'm just identity mapping it) once I have a working process implementation?
You'd probably just use "new_thread_stack_address = kmalloc(size);" and let the kernel's heap figure out where it is.
Octacone wrote:Also, when allocating space for my threads, where should I map it? Kernel threads to 3 GB? User threads to???
For user-space; when a new process is started you'd have some kind of executable loader in the kernel (that uses kernel stack) which loads the executable into user-space and figures out where the initial thread's user-space stack is (most likely controlled by information in the executable file's header). Note: For some operating systems there's a very minimal executable loader in the kernel that starts a full-blown second executable loader, where the second executable loader runs in user-space.

When a process has already been started and is spawning more thread/s, the process (e.g. maybe a pthreads library or something) will allocate space itself (e.g. using whatever kind of memory management the process uses) and tell the kernel where the new thread's stack should be.
Octacone wrote:When it comes to my kernel (talking about the Kernel_Main execution flow): what would I need to do in order to continue execution after Multitasking_Initialize();? Or should I just initialize everything beforehand and leave multitasking for last?
You'd go from "one task running, without meta-data for that task" (before "Multitasking_Initialize();" has been called) to "one task running, with meta-data for that task" (just after "Multitasking_Initialize();" has been called). I'd do this relatively early (e.g. as soon as memory management is initialised enough to allow you to allocate memory for the task's meta-data) because you can't create more kernel tasks or do anything that causes a task to block (e.g. "nano_sleep()" needed for small delays in lots of different device drivers) until multi-tasking has been initialised.
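A minimal sketch of that transition, assuming a hypothetical Task_Info structure: initialisation does not start anything new, it only creates meta-data describing the code that is already running, so Kernel_Main simply continues after the call.

```cpp
#include <cassert>
#include <cstdint>

enum class Task_State { running, ready, blocked };

// Hypothetical per-task meta-data.
struct Task_Info {
    std::uint32_t id;
    Task_State state;
    void* saved_stack_pointer;   // only filled in when the task is switched out
};

static Task_Info* current_task = nullptr;   // nullptr means "no meta-data yet"

// Adopts the currently executing flow (e.g. Kernel_Main) as the first task.
inline void Multitasking_Initialize() {
    assert(current_task == nullptr);        // must only run once
    // A real kernel would allocate this from its heap; a static is used here
    // to keep the sketch self-contained.
    static Task_Info boot_task{0, Task_State::running, nullptr};
    current_task = &boot_task;
}
```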


Cheers,

Brendan

Re: Round Robin Priority Implementation

Posted: Sun Aug 05, 2018 6:36 am
by Octacone
Brendan wrote:(full quote of the previous reply, snipped for readability)
I need to save EFlags because of virtual 8086 mode and some other flags such as carry and auxiliary that the CPU needs.
That's a good point about fairness, I guess I'll have to look at that and try to fix it somehow.
As far as interrupts go, I will read your replies over and over until a light-bulb turns on.
I also wanted to ask you about spinlocks, semaphores, mutex stuff. What is the best thing to use for a single-core monolithic kernel?
They are used to lock the actual resource not the thread, right?
Since we're talking about thread blocking, is it as simple as having an enum with different "blocked reasons" and not letting the thread run until it gets unblocked?
What about user mode threads, how hard is it to implement that? Any significant changes required?
What about stack alignment? Currently my stacks are page aligned, is that okay? Though my main kernel stack isn't; could that cause a significant performance hit?
Once I actually implement processes, do you think that mapping all the stacks to the same location (same virtual addresses, different physical addresses) would be okay?
Also you didn't comment anything on my quantums, they're okay then?

Sorry for asking this many questions, I just want something as crucial as this to be coded properly.

Re: Round Robin Priority Implementation

Posted: Sun Aug 05, 2018 10:11 pm
by Brendan
Hi,
Octacone wrote:I need to save EFlags because of virtual 8086 mode and some other flags such as carry and auxiliary that the CPU needs.
No, you don't.

Sometimes EFLAGS needs to be saved when the CPU switches from user-space to kernel (and loaded again when the CPU switches from kernel back to user-space), and for all of the cases where it's needed (IRQs and exceptions) the CPU automatically saves and loads EFLAGS for you. However; switching from user-space to kernel has nothing to do with task switches at all, and switching from kernel to user-space has nothing to do with task switches at all; and the code that does task switches never needs to save or load EFLAGS because (in cases where EFLAGS does need to be saved and loaded somewhere) it has already been taken care of by code that has nothing to do with task switching (as part of the "user-space <-> kernel" privilege level changes).
Octacone wrote:I also wanted to ask you about spinlocks, semaphores, mutex stuff. What is the best thing to use for a single-core monolithic kernel?
For Kernel

For multi-CPU I'd have 2 types of spinlocks where one disables IRQs and the other doesn't, but where both postpone task switches. For single-CPU I'd do it exactly the same except that I wouldn't actually have any lock - e.g. just disable IRQs or not, and the "postpone task switches" logic. This makes it easier to support multi-CPU later because you can mostly just define a few macros and use conditional code (like "#ifdef SMP") to skip the lock itself (but I'd recommend adding support for multi-CPU as soon as you can because it's easier to find mistakes one at a time as you create them than it is to deal with many mistakes at once).

For the "postpone task switching" logic; you'd increment a "task switches disabled" counter at the start of a critical section (where you'd acquire a lock if it was multi-CPU); then before doing a task switch the scheduler would check if this counter is zero, and if the counter is not zero the scheduler would set a "task switch/es were postponed" flag instead of doing any task switch. Then at the end of a critical section (where you'd release a lock if it was multi-CPU) you'd (atomically) decrement the "task switches disabled" counter and see if it was decremented to zero; and if it did become zero you'd tell the scheduler to do the task switch it postponed.
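The counter-and-flag logic described above could look roughly like this for a single CPU. The Scheduler struct and its member names are assumptions made for illustration, and switches_done exists only so the simulation can be observed; a real kernel would do the task switch at that point and would protect the counter against IRQs.

```cpp
#include <cassert>

struct Scheduler {
    int switches_disabled = 0;      // "task switches disabled" counter
    bool switch_postponed = false;  // "task switch/es were postponed" flag
    int switches_done = 0;          // simulation only: counts real switches

    // Start of a critical section (a multi-CPU build would also take a lock here).
    void Lock() { ++switches_disabled; }

    // End of a critical section (a multi-CPU build would also release the lock here).
    void Unlock() {
        assert(switches_disabled > 0);
        if (--switches_disabled == 0 && switch_postponed) {
            switch_postponed = false;
            Schedule();             // do the switch that was postponed earlier
        }
    }

    void Schedule() {
        if (switches_disabled != 0) {
            switch_postponed = true;  // remember the request instead of switching
            return;
        }
        ++switches_done;              // a real kernel would switch tasks here
    }
};
```

Because the counter nests, a critical section inside another critical section still postpones the switch until the outermost one ends.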

For mutexes and semaphores; they're mostly just a list of tasks waiting for the mutex/semaphore that is protected by a spinlock. When you acquire the mutex/semaphore you begin by acquiring the spinlock, and then check if the mutex/semaphore can be acquired. If the mutex/semaphore can be acquired you acquire it and release the spinlock. If the mutex/semaphore can't be acquired you put the task on the list of tasks waiting for the mutex/semaphore, then block the task (where the task switch will be postponed) then release the spinlock (causing the postponed task switch to happen).

When you release the mutex/semaphore you begin by acquiring the spinlock again, then release the mutex/semaphore, then check if the list of tasks waiting for the mutex/semaphore is empty and remove and unblock a task if it's not, then release the spinlock.
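As a simplified simulation of a mutex built from a waiter list, with hypothetical names (Task, Mutex, Acquire, Release): the comments mark where the protecting spinlock would be taken and released. One assumption goes beyond the description above: on release the mutex is handed directly to the first waiter, which is only one of several possible policies.

```cpp
#include <cassert>
#include <deque>

struct Task { bool blocked = false; };

struct Mutex {
    Task* owner = nullptr;
    std::deque<Task*> waiters;   // FIFO list of tasks waiting for the mutex

    // Returns true if acquired immediately, false if the task was queued and blocked.
    bool Acquire(Task& t) {
        // (acquire the spinlock protecting this mutex here)
        if (owner == nullptr) {
            owner = &t;
            // (release the spinlock here)
            return true;
        }
        waiters.push_back(&t);
        t.blocked = true;        // the task switch itself stays postponed ...
        // (release the spinlock here, which lets the postponed switch happen)
        return false;
    }

    void Release(Task& t) {
        // (acquire the spinlock here)
        assert(owner == &t);     // only the owner may release
        owner = nullptr;
        if (!waiters.empty()) {
            Task* next = waiters.front();
            waiters.pop_front();
            owner = next;        // assumption: direct hand-off to the first waiter
            next->blocked = false;
        }
        // (release the spinlock here)
    }
};
```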

For User-Space

For user-space; spinlocks don't make sense. For mutexes/semaphores I'd have a kernel API function that does an "atomically check value in memory and block task if value in memory wasn't zero" operation that uses the same "list of tasks waiting for that mutex/semaphore" that is used for kernel's mutexes/semaphores (with some additional code to associate the list in kernel-space with the memory location in user-space, so the kernel can figure out which list corresponds to which mutex/semaphore). Then I'd have another kernel API function that does "unblock a task on the list of tasks associated with this memory location".

The user-space side of things would use these kernel API functions; so that if the mutex/semaphore can't be acquired the "atomically check value in memory and block task if value in memory wasn't zero" kernel API function is called; and when the mutex/semaphore is released it'd call the "unblock a task on the list of tasks associated with this memory location" kernel API function. The main idea here is that most of the time (if there's no contention) neither kernel API function would be used (the whole acquire and release would happen purely in user-space).
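A sketch of the user-space side, with the two kernel API functions replaced by a stub so it can run on a host. Everything named here (Kernel_Stub, Wait_While_Equal, Wake_One, User_Mutex) is hypothetical, and the wait call follows the Linux futex convention ("block while the word still equals an expected value") rather than the exact "block if not zero" call described above. The three-state word (0 free, 1 held, 2 held with possible waiters) is the well-known futex mutex scheme.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Stub for the two kernel API functions; it only counts calls.
struct Kernel_Stub {
    int wait_calls = 0;
    int wake_calls = 0;
    std::function<void()> while_blocked;  // simulation: what happens while we "sleep"

    void Wait_While_Equal(std::atomic<int>& word, int expected) {
        ++wait_calls;
        if (word.load() == expected && while_blocked) while_blocked();
    }
    void Wake_One(std::atomic<int>&) { ++wake_calls; }
};

struct User_Mutex {
    std::atomic<int> word{0};   // 0 = free, 1 = held, 2 = held with possible waiters

    void Lock(Kernel_Stub& k) {
        int seen = 0;
        if (word.compare_exchange_strong(seen, 1)) return;  // fast path, no kernel call
        if (seen != 2) seen = word.exchange(2);             // announce that someone waits
        while (seen != 0) {
            k.Wait_While_Equal(word, 2);                    // slow path: block in the kernel
            seen = word.exchange(2);
        }
    }

    void Unlock(Kernel_Stub& k) {
        int old = word.fetch_sub(1);
        assert(old == 1 || old == 2);                       // must be held
        if (old != 1) {                                     // there may be waiters
            word.store(0);
            k.Wake_One(word);
        }
    }
};
```

In the uncontended case Lock() and Unlock() each complete with a single atomic operation and neither kernel function is called, which is the main idea described above.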
Octacone wrote:They are used to lock the actual resource not the thread, right?
Yes - all kinds of locks (spinlocks, mutexes, semaphores) are used to protect data and not code (unless code is treated as data, in a "self-modifying code" way).
Octacone wrote:Since we're talking about thread blocking, is it as simple as having an enum with different "blocked reasons" and not letting the thread run until it gets unblocked?
That's the basic idea; but there's a whole pile of race conditions. For example, you don't want to check if the lock can be acquired (and then have a different task release the lock and wake up any waiting tasks) and then block until the lock is released (after it's too late and the task won't be unblocked for ages).
Octacone wrote:What about user mode threads, how hard is it to implement that? Any significant changes required?
For user-space threads all of the multi-tasking remains the same; you just add extra/unrelated code to do the "user-space <-> kernel" switching and extra/unrelated code for the kernel API and extra/unrelated code for an executable loader.
Octacone wrote:What about stack alignment? Currently my stacks are page aligned, is that okay? Though my main kernel stack isn't; could that cause a significant performance hit?
That depends on how unaligned it is. For a 32-bit kernel the stack probably only needs to be 4-byte aligned (unless you use SSE in the kernel and need stack to be 16-byte aligned for that, but using SSE in the kernel is a bad idea anyway).
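For checking or fixing the alignment of an initial stack top, a small helper along these lines would do. Align_Down and Is_Aligned are hypothetical names; rounding down is used because the x86 stack grows towards lower addresses.

```cpp
#include <cassert>
#include <cstdint>

constexpr bool Is_Power_Of_Two(std::uintptr_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

// Rounds an address down to the given power-of-two alignment.
inline std::uintptr_t Align_Down(std::uintptr_t address, std::uintptr_t alignment) {
    assert(Is_Power_Of_Two(alignment));
    return address & ~(alignment - 1);
}

inline bool Is_Aligned(std::uintptr_t address, std::uintptr_t alignment) {
    return Align_Down(address, alignment) == address;
}
```

For example, Align_Down(stack_top, 4) is enough for the plain 32-bit case, and Align_Down(stack_top, 16) covers the SSE case mentioned above.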
Octacone wrote:Once I actually implement processes, do you think that mapping all the stacks to the same location (same virtual addresses, different physical addresses) would be okay?
For multiple (single-threaded) processes; putting the stacks at the same virtual address (in different virtual address spaces) is fine.

For multi-threaded processes normally you can't have 2 threads in the same process (in the same virtual address space) using the same address for their stacks; but that depends on your OS (e.g. whether you implement "thread specific storage" or just "thread local storage").
Octacone wrote:Also you didn't comment anything on my quantums, they're okay then?
They're fine for now. Later you'll probably change scheduling algorithms and/or make the quantums depend on how fast the CPU is and/or make other changes; and eventually you'll have enough user-space stuff done to be able to test the OS under various conditions; so you should assume that whatever you use now is a temporary place-holder.


Cheers,

Brendan

Re: Round Robin Priority Implementation

Posted: Tue Aug 07, 2018 5:31 am
by Octacone
Brendan wrote:(full quote of the previous reply, snipped for readability)
That is all I wanted to know, for now.
Thanks for your patience and willingness to answer all my questions.
Also thanks to other people who contributed, now onto coding.


To answer your question from the previous thread:
Why do so many people have wrong ideas about multitasking?
1. People read James' tutorial, which is not written very well. (not bashing anyone, just stating the facts)
2. People don't realize it's not how you do it.
3. They implement it.
4. Even more people do the same and implement it again.
5. A year later 10 genuine noobs come; they don't read the tutorial but the implementations of people who have done it before, thinking "Well if everybody does it this way, why not do the same?"
6. The cycle repeats.

What did I study?
1. OSDev wiki -> mostly generic stuff, no explicit details (I know you've done some improvements recently)
2. OSDev forums -> mostly problems: page faults, general protection faults, SMP, inline assembly issues, more advanced implementations (but only briefly), user mode problems, stack problems. I would recommend reading your replies because they really helped me a lot and are very detailed.
3. GitHub, GitLab -> lots of code to study, similar implementations (mostly tutorial based), or implementations that are just too advanced (for a beginner)
4. Some other miscellaneous tutorials (not James') -> very minimalistic implementations with IRQs and iret