ACPI or MP?
Yes, but even if you have 20 tasks running with 4 CPUs, and someone is trying to play a multi-threaded game at high priority (like the Quake 3 engine), it would be much more efficient to have the threads on different CPUs (and that game has seen nice increases on multi-CPU/multi-core setups). Running all the threads (most of which are non-blocking) on a single CPU would not net any increase in speed vs. a single-core CPU, outside of the fact that a few of the low-cycle processes could run on another CPU (which would mostly be idle during gameplay). Also, for something like a very large web server (I know not everyone writes a gaming OS, so let's go to the other side of things), spawning multiple tasks is common so they can use blocking I/O for the connections. So, we have 1000 threads for 1000 people connected, and all 1000 threads are on a single CPU while the other CPUs do very little; this does not seem ideal to me. Sure, in some cases there are reasons to want a child thread on the same CPU as its parent, but those are typically best cases, not worst cases like I have provided. I would rather my OS be efficient in the worst case/heavy load than be the best at idling.

bewing wrote:Each thread can block independently.

What's the point of multi-threading if the threads are running on a single CPU only?
Wouldn't it be better if they could run on separate CPUs so both parts are done simultaneously, rather than on a single CPU where they are sharing cycles?

If that one process was the only one running on a machine (which it NEVER is), then that would be great. If there are 500 tasks sharing cycles on 8 cores, then you gain absolutely nothing by spreading the threads between cores. In fact, you lose a tremendous amount of "affinity".
(And I did oversimplify my load balancing method description. It won't suffer any of the "obvious" errors. But it may have some subtle ones.)
And if I have 500 threads sharing cycles, and 99% of those are idle, then spreading them out among cores would help, especially if 480 of those threads were on the same CPU because they all spawned from one process, and that left 20 threads for the other 7 CPUs to share. That does not sound balanced at all. And if all 500 threads were separate and active, with only 2 of them being from the same process, would it really make a difference whether they were on the same CPU or not? Not likely.

It's not my OS, I'm just giving my perspective/reasoning for wanting to do things the way I explained. If you get really bored and wanted to make some benchmarks, I would be willing to write an implementation for your OS (if you've got specs) and do some testing to see what is more efficient, because I really haven't done any testing; it's just the way I view it in my head. Sharing a CPU is fine if the other CPUs aren't starving, but I can see them starving doing something intensive, like playing a game, running a server, etc. I would like a few real-world examples of when your method of keeping each sub-thread on the same CPU would really be a bonus to speed. I have provided examples (and there are many more, like video compression; we've all seen the benchmarks of single vs. multi-core CPUs doing compression and decompression, where running multiple threads on separate CPUs nets faster speeds, and these are things that people do every day with their PC).
For sharing threads among CPUs there are a few methods; I'll discuss the two main ones.
1.) Shared thread queue: Each CPU has its own interrupts firing at specific intervals; when one fires to switch tasks, the CPU locks the queue, searches for the next thread/process, and switches to it. Each CPU just pulls the next process off the list. The problem is, the chance of a CPU spinning while waiting for the queue to become unlocked is pretty high, since they are all pulling from one pool. This means wasted cycles, but on the plus side there really isn't any load balancing to do, as each CPU runs whatever is next in the list, so none of them ever stop grabbing tasks.
2.) List per CPU: Just how it sounds, each CPU has its own list. The benefit here is that, even though you still lock the list (so tasks can be added/removed), the chance of another CPU spinning is much less, since most of the time each CPU is only reading its own task list, meaning the CPU cycles are used more smartly. The downside is that now you have to balance your processes among processors, keep track of multiple lists, and move tasks when one CPU has too much idle time, or whatever. There is a lot of housekeeping going on, so this method is only worth it if the housekeeping takes fewer cycles than the spinning done with the other method. (A rough sketch contrasting the two approaches follows below.)
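Purely to illustrate the contrast (this isn't from anyone's actual OS in this thread), here is a rough C sketch of both approaches. Every name in it (task_t, runqueue_t, pick_next_shared, pick_next_local, MAX_CPUS) is made up, and the spinlock uses GCC's __sync builtins; a real scheduler would obviously do more than rotate a linked list.

Code: Select all
typedef struct task {
    struct task *next;
    int timeslice;
    /* saved registers, page directory, ... */
} task_t;

typedef struct {
    volatile int lock;   /* 0 = free, 1 = held */
    task_t *head;
} runqueue_t;

static void spin_lock(volatile int *l)   { while (__sync_lock_test_and_set(l, 1)) { } }
static void spin_unlock(volatile int *l) { __sync_lock_release(l); }

/* Rotate the queue: take the task at the front, put it at the back, return it. */
static task_t *rotate(runqueue_t *rq)
{
    task_t *t = rq->head;
    if (t) {
        rq->head = t->next;
        t->next = NULL;
        task_t **p = &rq->head;
        while (*p) p = &(*p)->next;
        *p = t;
    }
    return t;
}

/* Method 1: one queue shared by every CPU.
 * Every task switch on every CPU contends for the same lock. */
runqueue_t global_rq;

task_t *pick_next_shared(void)
{
    spin_lock(&global_rq.lock);
    task_t *t = rotate(&global_rq);
    spin_unlock(&global_rq.lock);
    return t;
}

/* Method 2: one queue per CPU.
 * The lock is still needed (the BSP may add or migrate tasks),
 * but it is almost never contended, so CPUs rarely spin. */
#define MAX_CPUS 8
runqueue_t per_cpu_rq[MAX_CPUS];

task_t *pick_next_local(int cpu)
{
    runqueue_t *rq = &per_cpu_rq[cpu];
    spin_lock(&rq->lock);
    task_t *t = rotate(rq);
    spin_unlock(&rq->lock);
    return t;
}

Everything the sketch leaves out, like migrating tasks between the per-CPU queues when they get unbalanced, is exactly the housekeeping described in the second method.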
Now, there are plenty of sub-methods for each of these, but those are the main overviews. There are also plenty of ways to implement them and to work around the negative sides, but those are the problems each specific OS needs to solve depending on its goals. If I want to build a single-app gaming OS, my goals and implementation would probably be much different from someone shooting for a server, etc.
lukem95 wrote:it has already been discussed in this topic.
you will need to use a locking mechanism to stop the CPUs violating each other's memory, however, if you use a shared scheduler

I have already initialized the processors with the same GDT, TSS, IDT and PDBR, and each processor is sitting in a loop:

Code: Select all
while(1);

The scheduler is locked with spinlocks.
What must I do now?
Another question:
When an IRQ happens (for example the timer IRQ), is each AP interrupted? If not, how can I switch tasks if the IRQ happens only on the BSP?
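(A small aside before the answers: whatever the interrupt routing ends up being, the AP's waiting loop usually isn't a hot while(1);. A common pattern, sketched below with GCC inline assembly assumed, is to halt until the next interrupt arrives, which is exactly the event being asked about.)

Code: Select all
/* Sketch of an AP idle loop: instead of spinning in while(1);,
 * enable interrupts and halt until one arrives, e.g. the
 * rescheduling interrupt discussed in the replies below. */
void ap_idle(void)
{
    for (;;)
        __asm__ __volatile__("sti; hlt");
}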
If you wanted to have a tasking model for multiple processors using the legacy PIC, you could (as a theoretical idea):
When each AP starts, have it check a struct at a shared data address.
The number of processors is then read, and incremented to tell the other CPUs that this one has been initialised.
Another struct is then read (containing a magic number and a pointer to a task list) for the AP. This is created by the BSP after detecting the number of APs.
The task struct can then be read from that address. Tasks are added to the list by the BSP.
The timer fires on the BSP (as it's using the legacy PIC), so it decreases each task's timeslice for every processor, instead of just for its own CPU. This is done through the task lists. (A rough sketch of this tick handler follows below.)
DISCLAIMER: this is PURELY theoretical... I have only thought of it today, but it seems logical to me. If anybody spots a weakness feel free to C&C.
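To make the last step concrete, here is a rough sketch of what that BSP tick handler could look like. All the names (task_t, cpu_list_t, NUM_CPUS, need_resched) are invented for illustration; nothing here is from the actual code in this thread.

Code: Select all
/* The PIT only interrupts the BSP, so the BSP's tick handler walks every
 * CPU's task list and decrements the timeslice of whatever that CPU is
 * currently running. */
#define NUM_CPUS 8

typedef struct task {
    struct task *next;
    int timeslice;
} task_t;

typedef struct {
    task_t *current;            /* task the CPU is running right now */
    volatile int need_resched;  /* flag the AP sees on its next trap */
} cpu_list_t;

cpu_list_t cpu_lists[NUM_CPUS];

void bsp_timer_tick(void)
{
    for (int i = 0; i < NUM_CPUS; i++) {
        task_t *t = cpu_lists[i].current;
        if (t && --t->timeslice <= 0)
            cpu_lists[i].need_resched = 1;   /* the AP still has to be poked somehow */
    }
}

The weak point, as the replies further down get at, is the last line: decrementing a counter doesn't by itself make the AP switch tasks, so you still need an IPI or a per-CPU LAPIC timer to get it to reschedule.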
When each AP starts, have it check a struct at a shared data address:

Code: Select all
APTaskstruct_t* ap = (APTaskstruct_t*) shared_address;  /* shared struct set up by the BSP */

The number of processors is then read, and incremented to tell the other CPUs that this one is up:

Code: Select all
int i = ap->num_ap;   /* this AP's index into the per-AP structs */
ap->num_ap++;         /* (not atomic as written; fine if the BSP starts the APs one at a time) */

The per-AP struct (magic number plus pointer to this AP's task list, created by the BSP) can then be found:

Code: Select all
APTaskptr_t* ptr = (APTaskptr_t*) ((char*) shared_address + sizeof(APTaskptr_t) * i);
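For the snippets above to compile, the two structs need definitions; they aren't given anywhere in the thread, but presumably something along these lines was meant (field names are guesses):

Code: Select all
/* Guessed layouts, not from the original post. */
typedef struct {
    volatile unsigned int num_ap;   /* count of APs that have come up so far */
} APTaskstruct_t;

typedef struct {
    unsigned int magic;             /* sanity value written by the BSP */
    struct task *tasks;             /* this AP's task list, maintained by the BSP */
} APTaskptr_t;

Note that as written the counter struct and the per-AP array both start at shared_address; in practice the array would presumably live just past the counter.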
You'd still need to do some IRQ routing in order to get the AP's to trap into scheduling to a new task. In this case you'd probably use the Local APIC to send the same messages.
Each processor also has a timer in its local APIC, you can use that to interrupt the processors without needing an IRQ routed to that core.
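A minimal sketch of what that per-CPU LAPIC timer setup could look like (register offsets are the standard ones from the Intel SDM Vol. 3A linked further down; the vector number and the initial count are arbitrary placeholders, and the LAPIC is assumed to be identity-mapped at its default base). Each AP would run this on itself:

Code: Select all
#include <stdint.h>

#define LAPIC_BASE        0xFEE00000u
#define LAPIC_SVR         0x0F0   /* spurious interrupt vector register */
#define LAPIC_LVT_TIMER   0x320   /* LVT timer entry */
#define LAPIC_TIMER_INIT  0x380   /* initial count */
#define LAPIC_TIMER_DIV   0x3E0   /* divide configuration */
#define LAPIC_EOI         0x0B0

#define TIMER_PERIODIC    (1u << 17)
#define TIMER_VECTOR      0x40    /* arbitrary free vector, assumption */

static void lapic_write(uint32_t reg, uint32_t val)
{
    *(volatile uint32_t *)(LAPIC_BASE + reg) = val;
}

void lapic_timer_init(void)
{
    lapic_write(LAPIC_SVR, 0x100 | 0xFF);             /* software-enable the LAPIC, spurious vector 0xFF */
    lapic_write(LAPIC_TIMER_DIV, 0x3);                /* divide bus clock by 16 */
    lapic_write(LAPIC_LVT_TIMER, TIMER_PERIODIC | TIMER_VECTOR);
    lapic_write(LAPIC_TIMER_INIT, 1000000);           /* placeholder count */
}

void lapic_timer_eoi(void)                            /* call at the end of the handler */
{
    lapic_write(LAPIC_EOI, 0);
}

In practice you'd calibrate the initial count against the PIT first so each tick corresponds to a known interval, and the handler at TIMER_VECTOR is where that CPU's scheduler gets invoked.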
Combuster wrote:You'd still need to do some IRQ routing in order to get the AP's to trap into scheduling to a new task. In this case you'd probably use the Local APIC to send the same messages.
Each processor also has a timer in its local APIC, you can use that to interrupt the processors without needing an IRQ routed to that core.

So I can do two things:
1 - IRQ Routing with IOAPIC
2 - Timers setup in each LAPIC
Right?
Have you any tutorial or source code about this?
MarkOS wrote: 1 - IRQ Routing with IOAPIC
[url=http://www.intel.com/design/chipsets/datashts/290566.htm]82093AA I/O ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (IOAPIC)[/url]
and
[url=http://www.intel.com/design/pentium/datashts/242016.htm]MultiProcessor Specification[/url]

MarkOS wrote: 2 - Timers setup in each LAPIC
[url=http://download.intel.com/design/processor/manuals/253668.pdf]Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1[/url]
There's everything you need to know.