How is multithreading done at the OS level?

nightcrawler · Post by **nightcrawler** » Fri Mar 27, 2020 3:18 pm

Sorry if this is covered before, I wasn't able to find a link to it.

When doing multithreading in C, we take advantage of the pthread library and call functions like:
pthread_create
pthread_join
etc.

How is this implemented in the OS itself? Given a system with multiple cores,
how does the OS select a specific CPU core to run instructions on?

nullplan · Post by **nullplan** » Fri Mar 27, 2020 4:14 pm

Wrong levels of abstraction. To the OS scheduler, there are only ever tasks. Threads are tasks that share a single address space, but this is a mere detail. In fact, most schedulers don't really know, nor do they care to, whether the task they are scheduling in is single-threaded or multi-threaded. It is simply not necessary to know.

As for how the scheduler picks the CPU to run on, there are several factors to weigh against each other. Each CPU can only run a single task at any given time, so it might be prudent to move a task from a busy CPU to one that is less busy, but it is usually advantageous to limit CPU migrations. But, there is no cost associated with executing a new task on another CPU. However, I have to make clear that these days, most of the time, no task is runnable, and if a task becomes runnable it does not remain so for very long. So more than one CPU is almost never used. Beyond this advice I would suggest reading up on multi-CPU scheduling techniques, because that is what I will do when I get there.

nightcrawler · Post by **nightcrawler** » Fri Mar 27, 2020 4:37 pm

nullplan wrote:Wrong levels of abstraction. To the OS scheduler, there are only ever tasks. Threads are tasks that share a single address space, but this is a mere detail. In fact, most schedulers don't really know, nor do they care to, whether the task they are scheduling in is single-threaded or multi-threaded. It is simply not necessary to know.

As for how the scheduler picks the CPU to run on, there are several factors to weigh against each other. Each CPU can only run a single task at any given time, so it might be prudent to move a task from a busy CPU to one that is less busy, but it is usually advantageous to limit CPU migrations. But, there is no cost associated with executing a new task on another CPU. However, I have to make clear that these days, most of the time, no task is runnable, and if a task becomes runnable it does not remain so for very long. So more than one CPU is almost never used. Beyond this advice I would suggest reading up on multi-CPU scheduling techniques, because that is what I will do when I get there.

Thanks but that didn't answer my question. I'm interested to know how a specific cpu is picked to run instructions on.
Let's say i'm writing my own "hello world" kernel using this tutorial: https://wiki.osdev.org/Bare_Bones

Assuming the machine has 1 cpu, all the assembly code will just run on that one CPU. What if there are more CPUs and I decide run this all on another CPU, how do I accomplish this?
Are there specific instructions to do so? or specific procedures?

nullplan · Post by **nullplan** » Fri Mar 27, 2020 5:26 pm

The most useful mental model I have of an OS in a multi-CPU system is that all CPUs are executing the same operating system independently of each other. There are specific instructions you can use to start the other CPUs and make them run whatever code you want, but after that you just have them execute the scheduler, and that will somehow load a task into the CPU when there actually is one.

Getting a bit more concrete, the procedure to start secondary CPUs in an x86 based system is described in Intel's Software Developers' Manual. Once you know the local APIC IDs of an enabled CPU, you can send it an INIT IPI, followed by a brief wait and a startup IPI, followed by a long wait and another startup IPI if the CPU failed to respond. That will start the other CPU, but it will run in 16-bit protected mode. Indeed, the initial address will be xx00:0000, where xx is equal to the vector number sent in the startup IPI. This means that the initial code for the AP has to be loaded below the 1MB mark. In my case, the trampoline code will transition to 32-bit protected mode, followed by 64-bit long mode (the page tables must have been taken care of by the BSP), and then it will jump into the entry point I have specified. That entry point will initialize GDT and IDT to their final versions, free the trampoline memory, and execute the scheduler. At this point, the scheduler will execute the idle task, since the CPU has only just announced that it is usable, so there will be no task in its queue. Once a task has been created that actually could run on the other CPU, the scheduler notices the unused CPU lying around and sends it an IPI after queuing up the task. The IPI wakes the AP from its slumber and makes it execute the new task.

I really can't get much more concrete, because at this point this is all highly OS-specific. If you don't know how to send an IPI, I suggest reading up on the APIC. An IPI is vectored like an external interrupt, so you just install your handler in the appropriate IDT slot.

iansjack · Post by **iansjack** » Sat Mar 28, 2020 2:00 am

https://wiki.osdev.org/Multiprocessor_Scheduling

eekee · Post by **eekee** » Sat Mar 28, 2020 3:05 am

nullplan wrote:That will start the other CPU, but it will run in 16-bit protected mode.

Are "legacy-free" systems different? It'll be interesting if they're not. (Now I'm wondering about running different cores in different modes, but that's another topic.)

Korona · Post by **Korona** » Sat Mar 28, 2020 4:06 am

No, they are not. But nullplan probably meant 16-bit real mode unless I am missing something.

bzt · Post by **bzt** » Sat Mar 28, 2020 4:13 am

nightcrawler wrote:I'm interested to know how a specific cpu is picked to run instructions on.

Maybe try to look it the other way around: all cpu is running in parallel, and you don't pick a cpu to run a task on, rather you pick a task on each and every cpu. When you move a task from one cpu core to another, that's a special function called task migration. So all cpus are running the same scheduler, and they all have their own task queue. If one queue is empty, and another is full, then you'll need to do task migration, moving some tasks from the full queue to the empty one. After that both cpus will have a scheduler queue with tasks to pick from.

Cheers,
bzt

nullplan · Post by **nullplan** » Sat Mar 28, 2020 4:29 am

Korona wrote:No, they are not. But nullplan probably meant 16-bit real mode unless I am missing something.

YES, yes, that is what I meant. Geez, what a bad mistake to make.

eekee wrote:Are "legacy-free" systems different?

"Legacy-free" typically refers to the selection of hardware on the mainboard. At least, I have never heard it used otherwise in the context of PC hardware. As such, a legacy-free system may be lacking a traditional BIOS, and it may lack a PIT and a PIC pair, and an ISA DMA unit (etc. I can do this all day), but it will still boot up in 16-bit real mode. It will be quickly made to transition into another mode, but that's where it will start, as long as Intel don't release new CPUs that lack these modes. In that case we will all have to rewrite our trampolines, but that's a story for another time.

eekee wrote:Now I'm wondering about running different cores in different modes, but that's another topic.

Well, the different CPUs are entirely independent of each other, other than sharing physical memory, so if you want to do such a thing, knock yourself out. I don't know why you would want to, but you probably could.

bzt wrote:Maybe try to look it the other way around: all cpu is running in parallel, and you don't pick a cpu to run a task on, rather you pick a task on each and every cpu.

This is why I'm not a teacher: you explain this so much better than me.

eekee · Post by **eekee** » Sat Mar 28, 2020 6:04 am

Korona wrote:No, they are not. But nullplan probably meant 16-bit real mode unless I am missing something.

Ah, thanks.

nullplan wrote:
eekee wrote:Are "legacy-free" systems different?
"Legacy-free" typically refers to the selection of hardware on the mainboard. At least, I have never heard it used otherwise in the context of PC hardware. As such, a legacy-free system may be lacking a traditional BIOS, and it may lack a PIT and a PIC pair, and an ISA DMA unit (etc. I can do this all day), but it will still boot up in 16-bit real mode. It will be quickly made to transition into another mode, but that's where it will start, as long as Intel don't release new CPUs that lack these modes. In that case we will all have to rewrite our trampolines, but that's a story for another time.

I see. I guess I will have to write that code one day. Not for a long time yet.

nullplan wrote:
eekee wrote:Now I'm wondering about running different cores in different modes, but that's another topic.
Well, the different CPUs are entirely independent of each other, other than sharing physical memory, so if you want to do such a thing, knock yourself out. I don't know why you would want to, but you probably could.

Ultimate dosemu.

Actually, this might seriously be useful for upgrading some ancient DOS-based process control system.

bzt wrote:Maybe try to look it the other way around: all cpu is running in parallel, and you don't pick a cpu to run a task on, rather you pick a task on each and every cpu.

That way of looking at it makes it much clearer to me too.

OSDev.org

How is multithreading done at the OS level?

How is multithreading done at the OS level?

Re: How is multithreading done at the OS level?

Re: How is multithreading done at the OS level?

Re: How is multithreading done at the OS level?

Re: How is multithreading done at the OS level?

Re: How is multithreading done at the OS level?

Re: How is multithreading done at the OS level?

Re: How is multithreading done at the OS level?

Re: How is multithreading done at the OS level?

Re: How is multithreading done at the OS level?