Page 3 of 3

Posted: Tue Mar 18, 2008 4:39 pm
by lukem95
It has already been discussed in this topic.

If you use a shared scheduler, you will need a locking mechanism to stop the CPUs from trampling on each other's memory.

Posted: Wed Mar 19, 2008 12:14 am
by Ready4Dis
bewing wrote:
What's the point of multi-threading if the threads are running on a single CPU only?
Each thread can block independently.
Wouldn't it be better if they could run on separate CPUs so both parts are done simultaneously, rather than on a single CPU where they are sharing cycles?
If that one process was the only one running on a machine (which it NEVER is) then that would be great. If there are 500 tasks sharing cycles on 8 cores, then you gain absolutely nothing by spreading the threads between cores. In fact, you lose a tremendous amount of "affinity".

(And I did oversimplify my load balancing method description. It won't suffer any of the "obvious" errors. But it may have some subtle ones.)
Yes, but even if you have 20 tasks running on 4 CPUs, and someone is playing a multi-threaded game at high priority (like the Quake 3 engine), it would be much more efficient to have its threads on different CPUs; that game has seen nice increases on multi-CPU/multi-core setups. Running all the threads (most of which are non-blocking) on a single CPU would net no speed increase over a single-core CPU, other than the fact that a few of the low-cycle processes could run on another CPU (which would mostly be idle during gameplay).

Also, take a very large web server (I know not everyone writes a gaming OS, so let's go to the other side of things): spawning multiple tasks is common there, so the connections can use blocking I/O. So we have 1000 threads for 1000 connected people, and all 1000 threads are on a single CPU while the other CPUs do very little. This does not seem ideal to me. Sure, in some cases there are reasons to want a child thread on the same CPU as its parent, but those are typically best cases, not worst cases like I have provided. I would rather my OS be efficient in the worst case/under heavy load than be the best at idling :P.

And if I have 500 threads sharing cycles, and 99% of those are idle, then spreading them out among cores wouldn't help, especially if 480 of those threads were on the same CPU because they all spawned from one process, leaving 20 threads for the other 7 CPUs to share. That does not sound balanced at all. And if all 500 threads were separate and active, with only 2 of them being from the same process, would it really make a difference whether those 2 were on the same CPU or not? Not likely.

It's not my OS, I'm just giving my perspective/reasoning for wanting to do things the way I explained. If you get really bored and want to make some benchmarks, I would be willing to write an implementation for your OS (if you've got specs) and do some testing to see what is more efficient, because I really haven't done any testing; it's just the way I view it in my head. Sharing a CPU is fine if the other CPUs aren't starving, but I can see them starving during something intensive, like playing a game, running a server, etc. I would like a few real-world examples of when your method of keeping each sub-thread on the same CPU would really be a bonus to speed. I have provided examples, and there are many more, like video compression: we've all seen the benchmarks between single- and multi-core CPUs doing compression and decompression, and running multiple threads on separate CPUs nets faster speeds. These are things people do every day with their PCs.

For sharing threads among CPUs there are a few methods; I'll discuss the 2 main ones.
1.) Shared thread queue: Each CPU has its own interrupt firing at specific intervals. When it fires to switch tasks, the CPU locks the queue, searches for the next thread/process, and switches to it. Each CPU just pulls the next process off the list. The problem is that the chance of a CPU spinning while waiting for the queue to become unlocked is pretty high, since they are all pulling from one pool. This means wasted cycles; on the plus side, there really isn't any load balancing to do, since each CPU runs whatever is next in the list, so none of them ever stops grabbing tasks.
2.) List per CPU: Just how it sounds: each CPU has its own list. The benefit is that, even though you still lock a list (so tasks can be added/removed), the chance of another CPU spinning on it is much lower, since most of the time each CPU is only reading its own task list, meaning the cycles are used more smartly. The downside is that you now have to balance your processes among processors, keep track of multiple lists, and migrate tasks when one CPU has too much idle time, or whatever. There is a lot of housekeeping going on, so this method is only worth it if the housekeeping takes fewer cycles than the spinning done in the other method.
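A rough sketch of method 2 (one locked list per CPU) might look like the following. All type and function names here are made up for illustration, and the lock uses GCC's atomic builtins rather than inline assembly:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 8

/* hypothetical thread descriptor */
typedef struct thread {
    int id;
    struct thread *next;
} thread_t;

/* one run queue per CPU, each with its own spinlock */
typedef struct runqueue {
    volatile int lock;          /* 0 = free, 1 = held */
    thread_t *head, *tail;
} runqueue_t;

static runqueue_t rq[MAX_CPUS];

static void rq_lock(runqueue_t *q)   { while (__sync_lock_test_and_set(&q->lock, 1)) ; }
static void rq_unlock(runqueue_t *q) { __sync_lock_release(&q->lock); }

/* the BSP (or a load balancer) pushes a thread onto a given CPU's queue */
void rq_push(int cpu, thread_t *t) {
    runqueue_t *q = &rq[cpu];
    rq_lock(q);
    t->next = NULL;
    if (q->tail) q->tail->next = t; else q->head = t;
    q->tail = t;
    rq_unlock(q);
}

/* each CPU's timer handler pops from its OWN queue; contention is rare,
   since only balancing operations ever touch a foreign queue */
thread_t *rq_pop(int cpu) {
    runqueue_t *q = &rq[cpu];
    rq_lock(q);
    thread_t *t = q->head;
    if (t) {
        q->head = t->next;
        if (!q->head) q->tail = NULL;
    }
    rq_unlock(q);
    return t;
}
```

The lock is still needed even on a private queue, because a balancer running on another CPU may add or steal tasks; the point is only that contention on it is far lower than on one global queue.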

Now, there are plenty of sub-methods for each of these, but those are the main overviews. There are also plenty of ways to implement them and work around the downsides, but those are problems each specific OS needs to solve depending on its goals. If I want to build a single-app gaming OS, my goals and implementation would probably be much different from someone shooting for a server, etc.

Posted: Wed Mar 19, 2008 7:10 am
by Jeko
lukem95 wrote:It has already been discussed in this topic.

If you use a shared scheduler, you will need a locking mechanism to stop the CPUs from trampling on each other's memory.
I have already initialized the processors with the same GDT, TSS, IDT and PDBR. And each processor is spinning in a loop:
while(1);

The scheduler is locked with spinlocks.

What must I do now?

Another question:
When an IRQ happens (for example the timer IRQ), is each AP interrupted? If not, how can I switch tasks if the IRQ happens only on the BSP?

Posted: Wed Mar 19, 2008 10:41 am
by cyr1x
This depends on how the I/O APIC is set up. You can route the IRQs to whichever cores you want. If you use the legacy PIC, the IRQs are always routed to the BSP.
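For reference, routing an IRQ with the I/O APIC comes down to writing a 64-bit entry into its redirection table through the IOREGSEL/IOWIN window. A minimal sketch, with the register layout taken from the 82093AA datasheet (the base address should really come from the MP/ACPI tables, and the helper names are made up):

```c
#include <assert.h>
#include <stdint.h>

/* Build a 64-bit I/O APIC redirection-table entry.
   Per the 82093AA datasheet: vector in bits 0-7, destination APIC ID in
   bits 56-63; all other fields left 0 (fixed delivery, physical
   destination mode, edge-triggered, active high, unmasked). */
uint64_t ioapic_redir_entry(uint8_t vector, uint8_t dest_apic_id) {
    return (uint64_t)vector | ((uint64_t)dest_apic_id << 56);
}

/* typical default base; take the real one from the MP/ACPI tables */
#define IOAPIC_BASE 0xFEC00000u

/* indirect register access: write the register index to IOREGSEL
   (offset 0x00), then the data to IOWIN (offset 0x10) */
static void ioapic_write(uint8_t reg, uint32_t val) {
    volatile uint32_t *ioapic = (volatile uint32_t *)IOAPIC_BASE;
    ioapic[0] = reg;    /* IOREGSEL */
    ioapic[4] = val;    /* IOWIN    */
}

/* redirection entries live at registers 0x10 + 2*irq (low dword)
   and 0x11 + 2*irq (high dword) */
void ioapic_route_irq(uint8_t irq, uint8_t vector, uint8_t dest_apic_id) {
    uint64_t e = ioapic_redir_entry(vector, dest_apic_id);
    ioapic_write(0x10 + 2 * irq, (uint32_t)e);
    ioapic_write(0x11 + 2 * irq, (uint32_t)(e >> 32));
}
```

So routing the timer IRQ to an AP is one call, e.g. `ioapic_route_irq(0, 0x20, ap_apic_id)`, once the legacy PIC has been masked.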

Posted: Wed Mar 19, 2008 10:51 am
by Jeko
cyr1x wrote:If you use the legacy PIC, the IRQs are always routed to the BSP.
Then how can I switch tasks in the APs?

Posted: Wed Mar 19, 2008 10:53 am
by Combuster
Use the IOAPIC :wink:

Posted: Wed Mar 19, 2008 12:16 pm
by Jeko
Combuster wrote:Use the IOAPIC :wink:
Could you explain to me how to switch tasks on the APs?

Posted: Wed Mar 19, 2008 12:42 pm
by Combuster
Basically scheduling on an AP is done the same way you do it on the BSP. The only difference is that you might need to route interrupts to a different processor.

Is it that non-obvious? :?

Posted: Wed Mar 19, 2008 1:07 pm
by Jeko
Combuster wrote:Basically scheduling on an AP is done the same way you do it on the BSP. The only difference is that you might need to route interrupts to a different processor.

Is it that non-obvious? :?
Yes, it's obvious... But how can I route interrupts?

Posted: Wed Mar 19, 2008 4:16 pm
by lukem95
If you wanted a tasking model for multiple processors using the legacy PIC, you could (as a theoretical idea) do the following:

When each AP starts, have it check a struct at a shared data address:

Code: Select all

APTaskstruct_t* ap = (APTaskstruct_t*) shared_address;
The number of processors is then read, and incremented to tell the other CPUs that this one has been initialised.

Code: Select all

/* this read-modify-write must be atomic (e.g. lock xadd or a spinlock),
   since several APs may be starting at once */
int i = ap->num_ap;
ap->num_ap++;
Another struct is then read (containing a magic number and a pointer to a task list) for the AP. This is created by the BSP after detecting the number of AP's.

Code: Select all

/* the per-AP entries are assumed to follow the shared struct, one per AP */
APTaskptr_t* ptr = (APTaskptr_t*) (shared_address + sizeof(APTaskstruct_t)
                                   + i * sizeof(APTaskptr_t));
The task struct can then be read from that address. Tasks are added to the list by the BSP.

The timer fires on the BSP (as it's using the legacy PIC), so the BSP decreases each task's timeslice for every processor, instead of just for its own CPU. This is done through the task lists.
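A minimal sketch of that BSP tick handler, assuming a simple linked list per AP (all names here are hypothetical, not from the code above):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_APS 4

/* hypothetical task entry; real lists would need locking against the APs */
typedef struct task {
    int timeslice;
    struct task *next;
} task_t;

/* per-AP list heads, filled in by the BSP */
static task_t *ap_tasks[MAX_APS];

/* BSP timer handler: walk every AP's list and age the tasks */
void bsp_timer_tick(void) {
    for (int i = 0; i < MAX_APS; i++) {
        for (task_t *t = ap_tasks[i]; t; t = t->next) {
            if (t->timeslice > 0)
                t->timeslice--;
            /* when a slice hits 0, the AP still has to be told to
               reschedule, e.g. via an IPI -- the legacy PIC alone
               cannot interrupt an AP */
        }
    }
}
```

That last comment is the weak spot in the scheme: decrementing the counters from the BSP is easy, but without an IPI (or a local APIC timer) there is no way to make the AP act on an expired slice, which is the point Combuster raises below.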

DISCLAIMER: this is PURELY theoretical... I only thought of it today, but it seems logical to me. If anybody spots a weakness, feel free to C&C :)

Posted: Wed Mar 19, 2008 4:54 pm
by Combuster
You'd still need to do some IRQ routing in order to get the APs to trap into scheduling a new task. In this case you'd probably use the Local APIC to send the same messages.

Each processor also has a timer in its local APIC; you can use that to interrupt the processors without needing an IRQ routed to that core.
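A minimal sketch of setting up that local APIC timer on each core, with register offsets per the Intel SDM (the helper names are made up, and calibrating the tick count against the PIT is left out):

```c
#include <assert.h>
#include <stdint.h>

/* Local APIC register offsets (Intel SDM Vol. 3A); the base is usually
   0xFEE00000, but should be read from the IA32_APIC_BASE MSR. */
#define LAPIC_BASE        0xFEE00000u
#define LAPIC_LVT_TIMER   0x320
#define LAPIC_TIMER_INIT  0x380
#define LAPIC_TIMER_DIV   0x3E0

#define LVT_TIMER_PERIODIC (1u << 17)   /* periodic-mode bit in the LVT */

/* compose the LVT timer register value: interrupt vector + mode bit */
uint32_t lapic_lvt_timer_value(uint8_t vector, int periodic) {
    return (uint32_t)vector | (periodic ? LVT_TIMER_PERIODIC : 0);
}

static void lapic_write(uint32_t reg, uint32_t val) {
    *(volatile uint32_t *)(LAPIC_BASE + reg) = val;
}

/* Each core runs this on its OWN local APIC -- no IRQ routing needed.
   `ticks` is in bus-clock/16 units here; calibrate against the PIT
   to convert to real time. */
void lapic_timer_setup(uint8_t vector, uint32_t ticks) {
    lapic_write(LAPIC_TIMER_DIV, 0x3);                   /* divide by 16 */
    lapic_write(LAPIC_LVT_TIMER, lapic_lvt_timer_value(vector, 1));
    lapic_write(LAPIC_TIMER_INIT, ticks);                /* starts counting */
}
```

Since every core has its own timer, each one gets its scheduling interrupt locally, so the shared-task-list gymnastics above become unnecessary.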

Posted: Thu Mar 20, 2008 10:37 am
by Jeko
Combuster wrote:You'd still need to do some IRQ routing in order to get the APs to trap into scheduling a new task. In this case you'd probably use the Local APIC to send the same messages.

Each processor also has a timer in its local APIC; you can use that to interrupt the processors without needing an IRQ routed to that core.
So I can do two things:
1 - IRQ Routing with IOAPIC
2 - Timers setup in each LAPIC

Right?

Do you have any tutorial or source code about this?

Posted: Thu Mar 20, 2008 10:59 am
by lukem95
There's a tutorial on the (L)APIC on osdever.net.

Posted: Thu Mar 20, 2008 2:48 pm
by Jeko
lukem95 wrote:theres a tutorial on (L)APIC on osdever.net
Yes I know it, but I need something else.

So, I can do these two things:
1 - IRQ Routing with IOAPIC
2 - Timers setup in each LAPIC

right?

Posted: Fri Mar 21, 2008 1:35 pm
by cyr1x
MarkOS wrote: 1 - IRQ Routing with IOAPIC
[url=http://www.intel.com/design/chipsets/datashts/290566.htm]82093AA I/O Advanced Programmable Interrupt Controller (IOAPIC)[/url]

and

[url=http://www.intel.com/design/pentium/datashts/242016.htm]MultiProcessor Specification[/url]
MarkOS wrote: 2 - Timers setup in each LAPIC
[url=http://download.intel.com/design/processor/manuals/253668.pdf]Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1[/url]

There's everything you need to know.