Multi-CPU stuff
Yesterday I got the AP in a multi-CPU system into pmode and into my kernel. Now I want to make the changes needed to get things like the APIC, the IOAPIC and the scheduler working. The APIC and IOAPIC are not the problem at the moment (though I think they will become one over time), but I don't know how to write my scheduler so that it supports SMP and still works on a single-CPU system!
I use a scheduler like the current Linux scheduler: I have a bitmap from which I get the highest priority, and then I take the thread in that priority's queue from a table. But what do I have to do to support SMP with this scheduler?
Until I have a solution for this (or you give me one), I will write the code for APIC and IOAPIC handling!
Re:Multi-CPU stuff
Hi,
FlashBurn wrote: I use a scheduler like the current Linux scheduler: I have a bitmap from which I get the highest priority, and then I take the thread in that priority's queue from a table. But what do I have to do to support SMP with this scheduler?
At the most generic level, there are 2 ways to implement the scheduler: use the same scheduling data for all CPUs, or have separate scheduling data for each CPU.
For the first way, each time a CPU is deciding what to run next it'd need to lock the shared scheduling data, find the next thread to run, then unlock the scheduling data and do the thread switch.
For the second way, when a thread becomes ready to run you decide which CPU it should be run on and add it to that CPU's scheduling data. When a CPU is deciding what to run next it only looks at its own scheduling data.
The advantage of the second way is that locking is reduced - all CPUs can be doing thread switches at the same time, and only one CPU is affected when a thread becomes ready to run.
The disadvantage of the second way is that it doesn't respond to changes as quickly and you can end up with unbalanced load. For example, imagine there are 2 CPUs where both CPUs have 20 ready-to-run threads. The threads on the first CPU may execute for a short time and become blocked, while all the threads on the second CPU might take as much time as they can. This would leave the first CPU idle while the second CPU is very busy.
This leads to the first compromise - is reduced lock contention more important than unbalanced load? This depends on how many CPUs the computer has, and how well you can predict CPU load in advance.
One possibility would be to use the second method, but add extra code to monitor how balanced the load is. If you determine that CPU load is too unbalanced you can shift threads from one CPU to another. This adds overhead, and shifting threads from one CPU to another can increase lock contention again. You'd need to be careful that threads aren't being shifted around too much (bouncing from CPU to CPU and getting nowhere). You'd also need to consider whether the load balancing is an improvement or not.
There are probably other alternatives too...
In any case, while you're designing the scheduler forget about single-CPU computers - chances are it'll work fine by accident.
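As a rough illustration of the second way (separate scheduling data per CPU) combined with a priority bitmap, here is a minimal C sketch. All names (`struct runqueue`, `sched_make_ready`, `sched_pick_next`, `NUM_PRIOS`, `MAX_CPUS`) are made up for this example and are not taken from FlashBurn's kernel; the "least loaded CPU" placement rule is just one simple assumption, and a real kernel would also disable interrupts while holding the lock.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PRIOS 32  /* assumption: 32 priority levels, 0 is the highest */
#define MAX_CPUS  8   /* assumption: illustrative upper bound */

struct thread {
    struct thread *next;  /* link within one priority queue */
    int prio;             /* 0 .. NUM_PRIOS-1 */
};

struct runqueue {
    atomic_flag lock;               /* per-CPU spinlock */
    uint32_t bitmap;                /* bit p set: queue p is non-empty */
    struct thread *head[NUM_PRIOS];
    struct thread *tail[NUM_PRIOS];
    atomic_uint nr_ready;           /* load hint, read without the lock */
};

static struct runqueue rq[MAX_CPUS];

static void rq_lock(struct runqueue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;  /* spin until the flag was previously clear */
}

static void rq_unlock(struct runqueue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Index of the lowest set bit; the caller guarantees v != 0. */
static int lowest_set_bit(uint32_t v)
{
    int p = 0;
    while (!(v & 1u)) { v >>= 1; p++; }
    return p;
}

/* Reset every run queue; call once before the CPUs start scheduling. */
void rq_init(void)
{
    for (unsigned c = 0; c < MAX_CPUS; c++) {
        atomic_flag_clear(&rq[c].lock);
        rq[c].bitmap = 0;
        atomic_store(&rq[c].nr_ready, 0);
        for (int p = 0; p < NUM_PRIOS; p++)
            rq[c].head[p] = rq[c].tail[p] = NULL;
    }
}

/* A thread became ready: pick the CPU with the fewest ready threads and
 * append the thread to that CPU's queue. Returns the chosen CPU. */
unsigned sched_make_ready(struct thread *t, unsigned ncpus)
{
    unsigned best = 0;
    for (unsigned c = 1; c < ncpus; c++)
        if (atomic_load(&rq[c].nr_ready) < atomic_load(&rq[best].nr_ready))
            best = c;

    struct runqueue *q = &rq[best];
    rq_lock(q);                       /* only this one CPU's data is locked */
    t->next = NULL;
    if (q->tail[t->prio])
        q->tail[t->prio]->next = t;
    else
        q->head[t->prio] = t;
    q->tail[t->prio] = t;
    q->bitmap |= 1u << t->prio;
    atomic_fetch_add(&q->nr_ready, 1);
    rq_unlock(q);
    return best;
}

/* Called by a CPU to choose its next thread; looks only at its own data.
 * Returns NULL when nothing is ready (the CPU would run its idle thread). */
struct thread *sched_pick_next(unsigned cpu)
{
    struct runqueue *q = &rq[cpu];
    struct thread *t = NULL;

    rq_lock(q);
    if (q->bitmap) {
        int p = lowest_set_bit(q->bitmap);   /* highest priority in use */
        t = q->head[p];
        q->head[p] = t->next;
        if (!q->head[p]) {
            q->tail[p] = NULL;
            q->bitmap &= ~(1u << p);
        }
        atomic_fetch_sub(&q->nr_ready, 1);
    }
    rq_unlock(q);
    return t;
}
```

On a single-CPU machine the same code runs with `ncpus` equal to 1, so every thread lands in `rq[0]` and the lock is never contended. A load balancer, if wanted, would be extra code that moves threads between two queues while holding both locks.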
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re:Multi-CPU stuff
OK, I have finished changing the scheduler and the semaphore/mutex code, so I don't need to write a new scheduler. My scheduler should now work on single-CPU systems (tested) and on multi-CPU systems (not yet tested).
I will now try to use the APIC timer for the scheduler. I hope this works under Bochs, because on my real dual-CPU system I have a problem with my loader.
Another Topic::
Why might my scheduler work in Bochs and Qemu, but not properly on my real PC? By "not properly" I mean this: I have 3 threads which each print a number. In the emulators I can see all 3 numbers, but on my PC I only see the 1st thread!?
Edit::
I found one more problem with SMP. At the moment I use the "device not available" exception for saving the FPU environment. Is there a way I can still use this with SMP? Imagine the following situation: thread A does something on the FPU of CPU 1, then it goes back into the ready queue and CPU 2 picks it up. The thread wants to go on working with the FPU, but this time on CPU 2, and CPU 2 does not have the FPU environment the thread needs! Is there a way to solve this problem without saving and loading the FPU environment on every thread switch?
Re:Multi-CPU stuff
Hi,
FlashBurn wrote: OK, I have finished changing the scheduler and the semaphore/mutex code, so I don't need to write a new scheduler. My scheduler should now work on single-CPU systems (tested) and on multi-CPU systems (not yet tested). I will now try to use the APIC timer for the scheduler. I hope this works under Bochs, because on my real dual-CPU system I have a problem with my loader.
Debugging one problem can be difficult, but leaving it can mean debugging 2 or more problems at the same time later, which is much harder. IMHO fixing your loader should be your first priority, and testing your scheduler on the dual system should be your second priority. Adding new features (e.g. APIC timer) can wait until you know all the previous stuff is right...
As for Bochs, the only problem I've found so far is that "send to lowest priority" actually sends the interrupt to all CPUs (instead of the lowest priority CPU). Bochs should be good for testing multi-CPU boot and the local APIC timer (and more).
FlashBurn wrote: Why might my scheduler work in Bochs and Qemu, but not properly on my real PC? By "not properly" I mean this: I have 3 threads which each print a number. In the emulators I can see all 3 numbers, but on my PC I only see the 1st thread!?
Bochs and Qemu fill all memory with zero before your OS gets it, they use different timing ("close enough" timing), they may or may not emulate the TLBs correctly, they don't emulate caches, etc. If you've only tested it on one real computer then it might actually work on some real computers and not others. The possibilities are endless...
Cheers,
Brendan
Re:Multi-CPU stuff
Hi,
FlashBurn wrote: Edit:: I found one more problem with SMP. At the moment I use the "device not available" exception for saving the FPU environment. Is there a way I can still use this with SMP? Imagine the following situation: thread A does something on the FPU of CPU 1, then it goes back into the ready queue and CPU 2 picks it up. The thread wants to go on working with the FPU, but this time on CPU 2, and CPU 2 does not have the FPU environment the thread needs! Is there a way to solve this problem without saving and loading the FPU environment on every thread switch?
Yes - the trick is to use the "device not available" exception differently, only loading the FPU/MMX/SSE state during the "device not available" exception and only saving the state if it's been used. For example:
Code: Select all
device_not_available_exception:
    load_new_FPU/MMX/SSE_state();
    clear_CPU's_TS_flag();
    iretd
thread_switch:
    if(CPU's_TS_flag == clear) save_FPU/MMX/SSE_state();
    set_CPU's_TS_flag();
    ..etc...
Cheers,
Brendan
Re:Multi-CPU stuff
There is a mistake in your sample code: I have to clear the TS flag before executing any FPU instruction!
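To make the ordering concrete, here is a small user-space model in C of the scheme discussed above, with the TS flag cleared before the state is loaded. It is only a simulation under stated assumptions: `struct cpu`, `struct thread`, `fpu_write` and `fpu_read` are invented for this sketch, the `ts` field stands in for CR0.TS, and the struct copies stand in for what a real kernel would do with CLTS and FXSAVE/FXRSTOR.

```c
#include <stdbool.h>

/* Simulated FPU register file (a real save area is much larger). */
struct fpu_state { unsigned char regs[16]; };

struct thread {
    struct fpu_state fpu;     /* saved copy travels with the thread, not the CPU */
};

struct cpu {
    bool ts;                  /* models CR0.TS: true means FPU use will trap */
    struct fpu_state hw;      /* models this CPU's physical FPU registers */
    struct thread *current;   /* thread currently running on this CPU */
};

/* Thread switch: save only if the outgoing thread really used the FPU
 * (TS clear), then set TS so the next thread traps on its first FPU use. */
void thread_switch(struct cpu *c, struct thread *next)
{
    if (c->current && !c->ts)
        c->current->fpu = c->hw;   /* stands in for FXSAVE */
    c->ts = true;
    c->current = next;
}

/* "Device not available" handler: clear TS first, because any FPU
 * instruction (including the restore itself) would trap again otherwise. */
void device_not_available(struct cpu *c)
{
    c->ts = false;                 /* stands in for CLTS */
    c->hw = c->current->fpu;       /* stands in for FXRSTOR */
}

/* Model of a thread executing FPU instructions on the given CPU. */
void fpu_write(struct cpu *c, unsigned char v)
{
    if (c->ts)
        device_not_available(c);   /* hardware would raise the exception */
    c->hw.regs[0] = v;
}

unsigned char fpu_read(struct cpu *c)
{
    if (c->ts)
        device_not_available(c);
    return c->hw.regs[0];
}
```

Because the state is written back to the thread at switch-out time, the thread can later be picked up by any CPU: that CPU's handler loads the saved copy on the first FPU instruction. Threads that never touch the FPU cost neither a save nor a load.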
Re:Multi-CPU stuff
Actually, it might be a good idea to load the FPU state when switching to a thread that has previously used the FPU.
Code that uses the FPU once tends to use it again, and if you have a realtime thread (say, some audio code?) that needs the FPU, then it's quite wasteful to take an extra exception more or less every time you switch to the thread.
Another thing is that it might be better to save the FPU state not "when it has been used" but "when somebody else needs the FPU". This way, if the said audio thread is interrupted by some (say) device driver that doesn't care about the FPU, there's no point in saving and restoring the same FPU data (you can block FPU use before destroying the FPU state, right?).
Re:Multi-CPU stuff
Everything you said is right, but only for single-CPU systems. We are talking about multi-CPU systems, where a thread that ran on CPU 1 may run on CPU 2 the next time, so you need to save the FPU environment!
[OT] loader problem
I hope I have found the cause of my loader problem. I use unreal mode to copy the kernel and the modules above 1 MiB, but my memory copy function seems to hang the whole PC when it copies the kernel above 1 MiB. So I think the A20 gate may not be enabled. My dual system is the first and only system where this error occurs, and this could be because of my code for activating the A20 gate.
This code is very old and I don't know what it was doing (I should comment my old code!):
Code: Select all
;----------------------------
proc enable_a20gate
;----------------------------
begin
push gs
push fs
xor ax,ax
mov bx,0ffffh
mov fs,ax
mov gs,bx
mov di,500h
mov si,510h
mov byte[fs:di],0
mov ax,2401h
int 15h
jnc .test
.kbd_ctrl:
in al,92h
or al,2
out 92h,al
.test:
mov byte[gs:si],1
cmp byte[fs:di],1
jne .end
cli
call wait_kbd
mov al,0d1h
out 64h,al
call wait_kbd
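mov al,0dfh
out 64h,al ;note: after command 0d1h the data byte normally goes to port 60h
call wait_kbd
sti
mov byte[gs:si],1
cmp byte[fs:di],1 ;this could be the problem (fs:di still holds the 1 from the first test)
jne .end_err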
;----------------------------
.end:
pop fs
pop gs
return 1
;----------------------------
align 4
.end_err:
pop fs
pop gs
return 0
endp
;----------------------------
Edit::
Ok, after doing a search on the a20gate I found the code in the os wiki and now my bootloader is also working on my dual system ;D
Re:Multi-CPU stuff
@brendan
I used the code you wrote in the APIC timer thread to get the FSB speed in Hz. It looks correct, but it doesn't work for me.
On my Athlon system I get an FSB of 0xfd984c Hz, and this is too low (0xfd984c Hz = 16,619,596 Hz = 16.62 MHz).
This is my code:
Code: Select all
;----------------------------
; init APIC Timer
mov esi,APIC_BASE_ADDR
mov eax,[esi+apic_regs_t.svr]
call apic_timer_set_vector, eax
call apic_timer_set_flags, dword 0
;----------------------------
; init PIT Timer
call idt_get_base, dword 20h
mov [pit_base],eax
call idt_set_base, dword 20h, dword irq_apic_timer_test
call pit_init, dword 37286, dword PIT_INIT
call pic_unmaskirq, dword 1
;----------------------------
; wait for PIT
mov eax,[cpu_bus_freq]
.wait1:
cmp [cpu_bus_freq],eax
je .wait1
;----------------------------
; start APIC Timer
mov dword[APIC_BASE_ADDR+apic_regs_t.init_count_reg],0ffffffffh
;----------------------------
; wait for PIT
mov eax,[cpu_bus_freq]
.wait2:
cmp [cpu_bus_freq],eax
je .wait2
;----------------------------
; disable APIC Timer
mov [APIC_BASE_ADDR+apic_regs_t.lvt_timer_reg],APIC_MASK_INT
;----------------------------
; stop PIT and set old base
call pic_maskirq, dword 1
call idt_set_base, dword 20h, dword[pit_base]
;----------------------------
; get current APIC Timer count
call apic_timer_get_count
;----------------------------
; calc how much the APIC Timer count has decreased
mov ebx,0ffffffffh
xor edx,edx
xchg eax,ebx
sub eax,ebx
;----------------------------
; calc the bus freq in Hz (div of apic timer / Hz of pit)
mov ebx,(128 / 32)
mul ebx
mov [cpu_bus_freq],eax
return
Re:Multi-CPU stuff
Hi,
Everything here looks right (or it did after I figured out that the first parameter to "call pit_init" was the PIT divider, not the PIT frequency).
Therefore I'm assuming the problem is hidden in one of the called routines.
It can't be a problem in the PIC mask or the PIT IRQ handler or it'd wait forever, so that leaves one of these routines:
- call apic_timer_set_vector, eax
- call apic_timer_set_flags, dword 0
- call pit_init, dword 37286, dword PIT_INIT
To test the PIT frequency, try something like:
Code: Select all
mov ecx, 32 * 60
.delay1:
mov eax,[cpu_bus_freq]
.delay2:
cmp [cpu_bus_freq],eax
je .delay2
loop .delay1
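For a 32 Hz timer this delay loop should wait for 60 seconds - long enough to time it with a wall clock to see if the PIT frequency is accurate.
For the remaining 2 routines, would you mind posting the code?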
BTW you could replace this code:
Code: Select all
;----------------------------
; get current APIC Timer count
call apic_timer_get_count
;----------------------------
; calc how much the APIC Timer count has decreased
mov ebx,0ffffffffh
xor edx,edx
xchg eax,ebx
sub eax,ebx
;----------------------------
; calc the bus freq in Hz (div of apic timer / Hz of pit)
mov ebx,(128 / 32)
mul ebx
mov [cpu_bus_freq],eax
With:
Code: Select all
mov eax,0ffffffffh
mov ebx,(128 / 32)
sub eax,[APIC_BASE_ADDR+apic_regs_t.init_count_reg]
mul ebx
mov [cpu_bus_freq],eax
Cheers,
Brendan
Re:Multi-CPU stuff
Maybe there is a bug in my pit_init function. I will have a look at the specs!
This is the code:
Code: Select all
;----------------------------
proc apic_timer_set_vector, vector
;----------------------------
begin
mov edi,APIC_BASE_ADDR
mov eax,[vector]
mov ebx,[edi+apic_regs_t.lvt_timer_reg]
and eax,0ffh
and ebx,0ffffff00h
or eax,ebx
mov [edi+apic_regs_t.lvt_timer_reg],eax
return
endp
;----------------------------
;----------------------------
proc apic_timer_set_flags, flags
;----------------------------
begin
mov edi,APIC_BASE_ADDR
mov eax,[flags]
mov ebx,[edi+apic_regs_t.lvt_timer_reg]
and eax,APIC_TIMER_PERIODIC
and ebx,not APIC_TIMER_PERIODIC
or eax,ebx
mov [edi+apic_regs_t.lvt_timer_reg],eax
return
endp
;----------------------------
PIT_COUNT0= 40h
PIT_COUNT1= 41h
PIT_COUNT2= 42h
PIT_CONTRL= 43h
PIT_MODUS0= 0000b
;----------------------------
proc pit_init, divider, control
;----------------------------
begin
mov eax,[control]
mov edx,PIT_CONTRL
out dx,al
mov eax,[divider]
and eax,0ffh
mov edx,PIT_COUNT0
out dx,al
mov eax,[divider]
shr eax,8
and eax,0ffh
out dx,al
return
endp
;----------------------------
Edit::
You have to take the current count register of the APIC to get the needed value!
Re:Multi-CPU stuff
Hi,
Ok, I've been messing with it - do you set the local APIC timer divider anywhere ("APIC_BASE_ADDR + 0x3E0")?
Anyway, here's what I changed it into (including setting the timer divider to divide by 128):
Code: Select all
;----------------------------
; init PIT Timer
call idt_get_base, dword 20h
mov [pit_base],eax
call idt_set_base, dword 20h, dword irq_apic_timer_test
call pit_init, dword 37286, dword PIT_INIT
call pic_unmaskirq, dword 1
;----------------------------
; init APIC Timer
;Set the timer divider to divide by 128
mov dword [APIC_BASE_ADDR+apic_regs_t.lvt_timer_divider_reg],1010b
;----------------------------
; wait for PIT
mov eax,[cpu_bus_freq]
.wait1:
cmp [cpu_bus_freq],eax
je .wait1
;----------------------------
; start APIC Timer
;Get the timer interrupt vector (from the local APIC????)
mov eax,[APIC_BASE_ADDR+apic_regs_t.lvt_timer_reg]
and eax,0xFF
;Set the initial count
mov dword [APIC_BASE_ADDR+apic_regs_t.init_count_reg],0ffffffffh
;Set the local APIC timer interrupt vector and unmask it (only the vector bits are set, so the periodic bit is clear and the timer runs in one-shot mode)
mov [APIC_BASE_ADDR+apic_regs_t.lvt_timer_reg],eax
;----------------------------
; wait for PIT
mov eax,[cpu_bus_freq]
.wait2:
cmp [cpu_bus_freq],eax
je .wait2
;----------------------------
; disable APIC Timer
mov dword [APIC_BASE_ADDR+apic_regs_t.lvt_timer_reg],APIC_MASK_INT
;----------------------------
; get the timer count and calc the bus freq in Hz (div of apic timer / Hz of pit)
mov eax,0ffffffffh
mov ebx,(128 / 32)
sub eax,[APIC_BASE_ADDR+apic_regs_t.init_current_count_reg]
mul ebx
mov [cpu_bus_freq],eax
;----------------------------
; stop PIT and set old base
call pic_maskirq, dword 1
call idt_set_base, dword 20h, dword [pit_base]
return
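The value 1010b written to the divide configuration register above is not the divisor itself but an encoding of it. A small C sketch of that encoding, as I understand it from the local APIC documentation (the function name `apic_timer_divide_bits` is made up for this example):

```c
/* Returns the local APIC timer divide configuration value for a divisor of
 * 1, 2, 4, 8, 16, 32, 64 or 128, or -1 for any other divisor. */
int apic_timer_divide_bits(unsigned divisor)
{
    unsigned code;  /* 3-bit code: 0 = /2, 1 = /4, ... 6 = /128, 7 = /1 */

    switch (divisor) {
    case 2:   code = 0; break;
    case 4:   code = 1; break;
    case 8:   code = 2; break;
    case 16:  code = 3; break;
    case 32:  code = 4; break;
    case 64:  code = 5; break;
    case 128: code = 6; break;
    case 1:   code = 7; break;
    default:  return -1;
    }

    /* Code bits 0-1 go to register bits 0-1, code bit 2 goes to register
     * bit 3; register bit 2 is always written as zero. */
    return (int)((code & 3u) | ((code & 4u) << 1));
}
```

With this helper, a divisor of 128 yields 0xA (1010b), matching the constant used in the code above. Note that the register's reset value of 0 selects divide by 2, which matters if the divider is never programmed.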
FlashBurn wrote: Maybe there is a bug in my pit_init function. I will have a look at the specs!
The pit_init code looks fine, as long as PIT_INIT is correct - something like 0x34 (channel 0, lobyte/hibyte, mode 2, binary). This routine can be optimised too (you don't need to do "and eax,0xFF" when you only use AL anyway), e.g.:
Code: Select all
mov eax,[divider]
mov edx,PIT_COUNT0
out dx,al
shr eax,8
out dx,al
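The PIT numbers used in this thread can be checked with a short C sketch. The helper names (`pit_control_word`, `pit_rate_hz`) are invented for illustration, and 1193182 Hz is the nominal PIT input clock; a divider of 0 (which the hardware treats as 65536) is not handled here.

```c
#include <stdint.h>

#define PIT_BASE_HZ 1193182u  /* nominal PIT input clock, about 1.193182 MHz */

/* Build the PIT mode/command byte written to port 43h.
 * channel: 0-2, access: 3 = lobyte/hibyte, mode: 0-5, bcd: 0 = binary. */
uint8_t pit_control_word(unsigned channel, unsigned access,
                         unsigned mode, unsigned bcd)
{
    return (uint8_t)(((channel & 3u) << 6) |
                     ((access  & 3u) << 4) |
                     ((mode    & 7u) << 1) |
                     (bcd & 1u));
}

/* Interrupt rate in Hz for a given non-zero 16-bit divider. */
double pit_rate_hz(unsigned divider)
{
    return (double)PIT_BASE_HZ / (double)divider;
}
```

Here `pit_control_word(0, 3, 2, 0)` gives 0x34, the value suggested for PIT_INIT, and `pit_rate_hz(37286)` comes out very close to 32 Hz, which is why the calibration code treats one PIT tick as 1/32 of a second.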
FlashBurn wrote: You have to take the current count register of the APIC to get the needed value!
Sorry - did you get a bus speed of 0 MHz?
Cheers,
Brendan
Re:Multi-CPU stuff
How could I be so stupid? The function for setting the divider of the APIC timer is the missing piece. I wanted to write it, but somehow I forgot! Now it should work, and I will test the code later.
I know about the optimisation, but this is old code and I was not that good back then! I changed it before I read your post ;D
So thanks again, I will write if I get a new problem.
Re:Multi-CPU stuff
OK, maybe I found the problem: it is the way we are trying to calculate the FSB. I think it should be "amount by which the APIC timer count has decreased" * "APIC timer divider" * "PIT frequency in Hz" = FSB! This gives me 266 MHz on my AMD system, which is a value that could be right!
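Written out in C, the corrected formula looks like the sketch below. The function name and parameters are invented for illustration. The intermediate product is held in 64 bits because ticks * divider * Hz can exceed 32 bits, which is easy to miss when only the low half of a 32-bit multiply is stored.

```c
#include <stdint.h>

/* Bus frequency in Hz from an APIC timer calibration run.
 * ticks_elapsed: how far the APIC timer counted down during one PIT period
 * apic_divider:  the APIC timer divisor in effect (e.g. 128)
 * pit_hz:        PIT periods per second (e.g. 32) */
uint64_t bus_freq_hz(uint32_t ticks_elapsed, uint32_t apic_divider,
                     uint32_t pit_hz)
{
    /* The timer decrements once every apic_divider bus clocks, so
     * ticks * divider is the number of bus clocks in one PIT period;
     * multiplying by the PIT rate scales that to one second. */
    return (uint64_t)ticks_elapsed * apic_divider * pit_hz;
}
```

For example, with a hypothetical measurement of 64941 ticks at divider 128 and a 32 Hz PIT, this returns 265998336, i.e. roughly 266 MHz.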