MarkOS wrote: what???

cyr1x wrote: Why not just
Code: Select all
mp = (mp* /* or how you named it*/) address;
This is just the same as assigning the values like you do it, but is a lot cleaner.
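For completeness, here is what that cast approach looks like in full. The struct layout below follows the field names used later in this thread (signature, config, length, version, checksum, features1-3) and the standard 16-byte MP Floating Pointer Structure offsets; the helper name `mp_at` is this sketch's own. The key point is that the struct must be packed so it matches the in-memory layout byte for byte:

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* 16-byte MP Floating Pointer Structure; field names mirror the snippet
   discussed later in the thread. 'packed' keeps the offsets exact. */
struct mp_floating_ptr {
    char     signature[4];   /* offset 0:  "_MP_" */
    uint32_t config;         /* offset 4:  physical address of the MP config table */
    uint8_t  length;         /* offset 8:  length in 16-byte paragraphs */
    uint8_t  version;        /* offset 9:  spec revision */
    uint8_t  checksum;       /* offset 10: all 16 bytes must sum to 0 mod 256 */
    uint8_t  features1;      /* offset 11: MP feature byte 1 */
    uint8_t  features2;      /* offset 12 */
    uint8_t  features3[3];   /* offsets 13-15 */
} __attribute__((packed));

/* Instead of copying field by field, just reinterpret the address: */
struct mp_floating_ptr *mp_at(void *address)
{
    return (struct mp_floating_ptr *)address;
}
```

After this, `mp_at(addr)->config` reads the table address straight out of memory with no copying at all.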
MarkOS wrote: 1 - When I start an AP processor, must I also go to Protected Mode with it and initialize its descriptors (GDT, IDT and TSS)?

Yes. The CPU is in real mode, so you send the Startup IPI, telling it where to start executing your 'trampoline code'. That code must reside under 1MB in physical RAM (the AP does not have paging yet and is in real mode), and must switch the processor into whatever state you want it to be in.
To do this, I use a sequence of signals between my BSP and AP's (in a shared, locked memory location), which tells the trampoline code where to find my system tables and so on. As I use the same trampoline binary for PMode and Long Mode, it also tells the AP whether to attempt the switch to 64 bit mode.
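That "sequence of signals in a shared, locked memory location" can be sketched as a simple mailbox. All names and fields below are hypothetical; on real hardware the BSP would fill the mailbox before sending the Startup IPI, and the AP's trampoline code would poll it (the `volatile` qualifier stops the compiler caching the flag):

```c
#include <stdint.h>

/* Hypothetical BSP<->AP mailbox at a fixed, shared address. */
struct ap_mailbox {
    volatile uint32_t status;   /* 0 = empty, 1 = params ready, 2 = AP done */
    uint32_t gdt_ptr;           /* physical address of the shared GDT pointer */
    uint32_t pdbr;              /* page directory base for the AP to load */
    uint32_t want_long_mode;    /* nonzero: trampoline attempts the 64-bit switch */
};

enum { MBOX_EMPTY = 0, MBOX_READY = 1, MBOX_DONE = 2 };

void bsp_publish(struct ap_mailbox *mb, uint32_t gdt, uint32_t pdbr, uint32_t lm)
{
    mb->gdt_ptr = gdt;
    mb->pdbr = pdbr;
    mb->want_long_mode = lm;
    mb->status = MBOX_READY;    /* flag written last, so the AP never sees partial data */
}

/* What the trampoline would do once it can run C code: */
int ap_collect(struct ap_mailbox *mb, uint32_t *pdbr_out)
{
    if (mb->status != MBOX_READY)
        return 0;               /* nothing published yet: keep polling */
    *pdbr_out = mb->pdbr;
    mb->status = MBOX_DONE;     /* tell the BSP we picked the parameters up */
    return 1;
}
```

Writing the status flag last is the important part of the design: the AP only ever sees a fully populated mailbox.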
MarkOS wrote: 2 - With ACPI and MP can I do only CPU detection? Or also other things?

See the specs:
http://www.acpi.info/
http://www.intel.com/design/pentium/datashts/242016.htm
Cheers,
Adam
Code: Select all
;look for MP Floating Point Struct first
;------------------------------------------
movzx eax, word [mem_bios] ;EAX = size in KB
shl eax, 10 ;EAX = address
mov esi, 9ffffh ;last byte below a0000h, where base memory ends
sub eax, 16
.find_MP_FPS:
add eax, 16
cmp eax, esi
jae .range_check
cmp dword [rax], "_MP_"
je .found
jmp .find_MP_FPS
.range_check:
cmp esi, 0fffffh
jae .no_MP_FPS
mov eax, 0e0000h-16 ;scan 0e0000h-0fffffh (BIOS ROM area)
mov esi, 0fffffh ;they end here
jmp .find_MP_FPS
.found:
lea ebx, [rax+16]
xor edx, edx
.MP_FPS_checksum: ;verify FPS checksum
dec ebx
movzx edi, byte [rbx]
add edx, edi
cmp ebx, eax
jne .MP_FPS_checksum
test dl, dl
jnz .find_MP_FPS ;checksum != 0, continue search
.ok:
movzx ecx, byte [rax+11] ;CL = MP Feature Byte 1
mov edi, [rax+4] ;RDI = address of MP table
test cl, cl ;see if MP Feature Byte 1 != 0
jnz .F_byte1 ;positive # means configuration # and no MP table
test edi, edi ;see if addr = 0
jz .no_MP_FPS ;it is 0 - no MP table, and no config?
cmp dword [rdi], 'PCMP' ;check signature of MP Table
jne .no_MP_FPS ;some weird stuff - wrong signature
;default to no MP at all
;parse MP table now
;------------------------------------------
movzx esi, word [rdi+22h] ;ESI = # of entries
add edi, 2ch ;EDI = address of 1st entry in MPTable
.entry:
mov rax, [rdi]
cmp al, 4
ja .next_entry
je .4_loc_int
cmp al, 2
ja .3_io_int
je .2_io_apic
jp .1_bus ;after cmp al,2: al=1 gives 0ffh (PF=1), al=0 gives 0feh (PF=0)
.0_cpu:
add edi, 12
jmp .next_entry
.4_loc_int:
jmp .next_entry
.1_bus:
jmp .next_entry
.2_io_apic:
jmp .next_entry
.3_io_int:
.next_entry:
add edi, 8
sub esi, 1
jnz .entry
;only MP Floating Point Structure present
;and default configuration present
.F_byte1:
;no MP of any kind
.no_MP_FPS:
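The same search can be sketched in C: walk the range on 16-byte boundaries looking for the "_MP_" signature, then verify the 16 bytes sum to zero mod 256. The function name is this sketch's own, and on a host we can only demonstrate it on a buffer we build ourselves; in a kernel `base`/`len` would cover e.g. the last KB of base memory or the 0F0000h-0FFFFFh ROM area:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>
#include <assert.h>

/* Scan [base, base+len) on 16-byte boundaries for a valid
   MP Floating Pointer Structure; returns its address or NULL. */
static const uint8_t *find_mp_fps(const uint8_t *base, size_t len)
{
    for (size_t off = 0; off + 16 <= len; off += 16) {
        const uint8_t *p = base + off;
        if (memcmp(p, "_MP_", 4) != 0)
            continue;                   /* signature mismatch: keep scanning */
        uint8_t sum = 0;
        for (int i = 0; i < 16; i++)
            sum += p[i];                /* checksum: all 16 bytes mod 256 */
        if (sum == 0)
            return p;                   /* valid floating pointer structure */
    }
    return NULL;
}
```

This mirrors the assembly's `.find_MP_FPS`/`.MP_FPS_checksum` loops; a bogus signature that happens to checksum correctly is rejected the same way, by continuing the scan.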
MarkOS wrote: Will I have the same descriptors for all APs? Is the PDBR shared between all APs?

The GDT and IDT can be shared - although you do need a separate TSS descriptor for each core. As for the PDBR, that depends on whether you have separate process spaces. My trampoline code loads the same PDBR value for the BSP and APs. After the scheduler is started, it is a bit more chaotic, as each core has the PDBR value of its corresponding process.
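The "shared GDT, one TSS descriptor per core" arrangement can be sketched like this. The descriptor encoding is the standard 32-bit available-TSS descriptor (access byte 0x89); the slot layout and names are this example's own assumptions:

```c
#include <stdint.h>
#include <assert.h>

#define GDT_FIRST_TSS 5     /* hypothetical: slots 0-4 hold null/code/data descriptors */

/* Build a 32-bit available-TSS descriptor (type 9, present, DPL 0)
   for one core's TSS. Each core gets its own slot in the shared GDT. */
static uint64_t make_tss_descriptor(uint32_t base, uint32_t limit)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);             /* limit bits 15:0  */
    d |= (uint64_t)(base  & 0xFFFFFF) << 16;     /* base bits 23:0   */
    d |= (uint64_t)0x89 << 40;                   /* present, 32-bit available TSS */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* limit bits 19:16 */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;  /* base bits 31:24  */
    return d;
}

/* The selector each core loads with LTR: (its slot index) << 3. */
static uint16_t tss_selector_for_cpu(int cpu)
{
    return (uint16_t)((GDT_FIRST_TSS + cpu) << 3);
}
```

Each AP then executes `ltr` with its own selector during the trampoline, while the `lgdt`/`lidt` values stay identical on every core.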
MarkOS wrote: Must I change something in my scheduler or paging code? Must I change something in my interrupts code?

I have a separate scheduler for each core. After the trampoline code runs, each AP in turn runs some code initialising its unique scheduler class (I am working in C++, but if you are using C/ASM/other, I'm sure you can adapt the idea) and is put into an idle state. The BSP then starts cramming the scheduler full of the initial tasks to run, before jumping into its own scheduler.
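The per-core scheduler idea above can be sketched as one run queue per CPU: each AP initialises its own queue and idles, and the BSP then feeds the initial tasks in. Names, sizes, and the ring-buffer shape are all invented for illustration (and a real version would need per-queue locking):

```c
#include <assert.h>

#define MAX_CPUS  8
#define QUEUE_LEN 64

/* One run queue per core: a simple ring buffer of task ids. */
struct run_queue {
    int cpu;
    int head, tail;
    int tasks[QUEUE_LEN];
};

static struct run_queue queues[MAX_CPUS];

void scheduler_init_for(int cpu)        /* run by each AP after the trampoline */
{
    struct run_queue *q = &queues[cpu];
    q->cpu = cpu;
    q->head = q->tail = 0;              /* empty: the core sits in its idle task */
}

int enqueue_task(int cpu, int task)     /* BSP crams the initial tasks in */
{
    struct run_queue *q = &queues[cpu];
    int next = (q->tail + 1) % QUEUE_LEN;
    if (next == q->head)
        return -1;                      /* queue full */
    q->tasks[q->tail] = task;
    q->tail = next;
    return 0;
}

int dequeue_task(int cpu)               /* the core's own scheduler pulls work */
{
    struct run_queue *q = &queues[cpu];
    if (q->head == q->tail)
        return -1;                      /* nothing runnable: go idle */
    int task = q->tasks[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    return task;
}
```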
MarkOS wrote: what???

You're doing it like so:
Code: Select all
mp.signature = *address;
mp.config = *(address+4);
mp.length = *(unsigned char*)(address+8);
mp.version = *(unsigned char*)(address+9);
mp.checksum = *(unsigned char*)(address+10);
mp.features1 = *(unsigned char*)(address+11);
mp.features2 = *(unsigned char*)(address+12);
mp.features3[0] = *(unsigned char*)(address+13);
mp.features3[1] = *(unsigned char*)(address+14);
mp.features3[2] = *(unsigned char*)(address+15);
This is .. ehm ... bad. You can just assign the pointer to "mp" and it will be "filled in" automatically. Like so:
Code: Select all
MultiProcessorFloatingPointer* mp = (MultiProcessorFloatingPointer* /*or how you named it*/ )address;
cyr1x wrote: @AJ How about assigning a thread to a specific core, so cores always run the same threads; if one CPU runs out of threads it calls "balance()" or something, which takes some threads off other CPUs' queues. This is like it's done in Linux. I think Brendan has (again) a great theory for this.

I'm not too sure about the Linux scheduler's details, but the "tickless" direction it's going in (or went recently?) is well worth looking into IMHO. The basic idea is that a periodic timer IRQ is bad because it wakes up CPUs (takes them out of any power saving state, wasting power and causing unnecessary heat and noise). My more recent schedulers have used "one-shot" timers to reduce the number of unnecessary IRQs (CPL=3 -> CPL=0 -> CPL=3 context switches) and to provide more "time slice" precision, but I didn't optimize for power management.
bewing wrote: Brendan's mechanism is impressive, but kinda fails the fourth important concept of KISS.

Hmm... that sounds, well... not right to me. What's the point of multi-threading if the threads are running on a single CPU only? Say I am writing a game with AI in one thread and the graphics engine in another... wouldn't it be better if they could run on separate CPUs so both parts are done simultaneously, rather than on a single CPU where they are sharing cycles? That kind of defeats the main benefit of multithreading in a multi-CPU OS, doesn't it?

Also, transferring a high-CPU-load process is great, but now you are locking 2 buffers (removal and insertion queues), and if that is the only process running and it is high priority, what's to stop your kernel from passing it from CPU 0 -> 1 -> 2 -> 3 and back to 0 again on each check? And what if it's the only process running and it has a lot of sub-threads (web-server type applications spawn many threads to use blocking I/O for clients)? Having all threads on one CPU, always moved with their parent, seems like a bit of a waste to me. While Brendan's method may seem a bit 'overkill', sometimes keeping it simple isn't always the best solution.

I plan to have process balancing of sorts in my OS, where each CPU stores a number indicating how much work it's doing (based on process count and process priorities). On each task switch it checks itself against the other CPUs (so 3-6 compares with 4 CPUs); if it finds one that differs by a certain number of points (I will play with values to find a good compromise, or have a variable in the OS for balancing), it will move a thread (with a specific priority based on the difference between the two).
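The "load number per CPU" balancing described above might look something like this. The CPU count, threshold, and function name are illustrative only; a real scheduler would also need to decide *which* thread to move and lock both queues while doing it:

```c
#include <assert.h>

#define NCPUS     4
#define THRESHOLD 10     /* minimum load gap before we bother migrating */

static int load[NCPUS];  /* per-CPU work score: process count x priorities */

/* On a task switch, compare our load against every other CPU (3 compares
   with 4 CPUs). Returns the CPU to push a thread to, or -1 if the system
   is balanced enough that migration would just churn. */
int pick_migration_target(int self)
{
    int best = -1, best_gap = THRESHOLD;
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        if (cpu == self)
            continue;
        int gap = load[self] - load[cpu];
        if (gap >= best_gap) {          /* we are busier by at least THRESHOLD */
            best = cpu;
            best_gap = gap;             /* prefer the least-loaded target */
        }
    }
    return best;
}
```

The threshold is what prevents the "pass it from CPU 0 -> 1 -> 2 -> 3 and back" ping-pong: a lone high-priority process never creates a gap large enough to trigger a move.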
I am intending to implement an earlier suggestion, myself. Each core has an independent scheduler, with an independent task queue. In my system, independent "jobs" own "threads" as subtasks. When a new job is created, it is assigned to the core with the least load at that time. The job then creates all its child threads on that same core.
If one core's scheduler runs out of above-idle-priority runnable tasks, it requests a load balance from the most loaded core. The most loaded core picks a fairly high priority job, and dumps that entire job's worth of threads all at once to the empty core -- i.e. the "affinity" for the entire job gets modified all at once.
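The per-job migration idea can be sketched very compactly: because every thread in a job shares the job's affinity, retargeting the whole job is a single write. Structures and names below are invented for illustration:

```c
#include <stddef.h>
#include <assert.h>

#define MAX_THREADS 16

/* A "job" owns its threads; they all run on the job's CPU. */
struct job {
    int cpu;                       /* current affinity for every thread in the job */
    int nthreads;
    int thread_ids[MAX_THREADS];
};

/* The idle core requests a balance; the loaded core hands over one whole job.
   One write retargets every thread at once - no per-thread queue shuffling. */
void migrate_job(struct job *j, int idle_cpu)
{
    j->cpu = idle_cpu;
}

int threads_on_cpu(const struct job *jobs, size_t njobs, int cpu)
{
    int n = 0;
    for (size_t i = 0; i < njobs; i++)
        if (jobs[i].cpu == cpu)
            n += jobs[i].nthreads;
    return n;
}
```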
I like the basic idea of one-shot tickless systems. However, the idea scares me, too. What happens if the timer fails to deliver the next one-shot interrupt to the CPU? Nothing in a system is absolutely 100% guaranteed, including interrupt delivery. I would really want to have some sort of backup mechanism to wake up a CPU that missed a one-shot interrupt, and got stuck running the same task forever.
In a tick-based system, at least you can always count on the fact that another timer tick will come along the next time around.
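One possible backup mechanism for a lost one-shot: record when the next interrupt *should* fire, and let any other event that reaches the CPU (an IPI, a device IRQ) check whether the deadline was missed. All names are hypothetical; on real hardware `now` would come from the TSC or HPET and `arm_one_shot` would program the local APIC timer:

```c
#include <stdint.h>
#include <assert.h>

static uint64_t next_deadline;   /* absolute time the armed one-shot should fire */
static int      rearm_count;

static void arm_one_shot(uint64_t deadline)
{
    next_deadline = deadline;
    rearm_count++;               /* stands in for programming the hardware timer */
}

/* Called from any interrupt/IPI path as a watchdog: if the one-shot was lost,
   notice we are past the deadline and re-arm, instead of letting the CPU run
   the same task forever. */
int check_missed_tick(uint64_t now, uint64_t slice)
{
    if (next_deadline != 0 && now > next_deadline) {
        arm_one_shot(now + slice);   /* recover: schedule a fresh deadline */
        return 1;                    /* caller should force a reschedule too */
    }
    return 0;
}
```

This only helps if *some* interrupt eventually arrives, so a truly paranoid design would keep one slow periodic tick (say, once a second) purely as a watchdog.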
What's the point of multi-threading if the threads are running on a single CPU only?
Each thread can block independently.
Wouldn't it be better if they could run on separate CPUs so both parts are done simultaneously, rather than on a single CPU where they are sharing cycles?
If that one process was the only one running on a machine (which it NEVER is) then that would be great. If there are 500 tasks sharing cycles on 8 cores, then you gain absolutely nothing by spreading the threads between cores. In fact, you lose a tremendous amount of "affinity".
cyr1x wrote: This is .. ehm ... bad.
You can just assign the pointer to "mp" and it will be "filled in" (not the right word) automatically. Like so:
Code: Select all
MultiProcessorFloatingPointer* mp = (MultiProcessorFloatingPointer* /*or how you named it*/ )address;

Oh yes, I didn't understand... However, I did it like that because I wanted to see if there was something wrong with the mp structure! Naturally I'm now using mp = (struct mp_floating_ptr*)address;