Concept: "kernel core" (vs "app core"?)
I am referring to this thread (and multi-core concepts as a whole).
I had put my OS design / development ambitions on the back burner for several years, and my knowledge has gotten quite rusty. I am trying to get back into the "conceptual" discussion, so excuse me when I go into "noob camp" with this question.
Would it be conceivable / a good idea to have one core dedicated for "kernel work", with the other cores doing "app work" and delegating kernel stuff to the dedicated core? How efficient would that be in, say, a 32 core system, compared to having each core doing context switches?
Every good solution is obvious once you've found it.
Re: Concept: "kernel core" (vs "app core"?)
berkus wrote: Aren't IPIs more expensive (for both cores)?

I'd be more worried about cache coherency than IPIs. I didn't think IPIs were that expensive though...?
Re: Concept: "kernel core" (vs "app core"?)
JamesM wrote: ...cache coherency...

Yep. I knew I had forgotten about something there.
Thanks...
Every good solution is obvious once you've found it.
Re: Concept: "kernel core" (vs "app core"?)
With each core seeing different memory, is cache coherency still a concern?
The current implementation I am doing is this:
Kernel memory: 0x0 - 0x3FFFFF (4 MiB)
App memory: 0xFFFF800000000000+
Each core has the same kernel mapping in the lower half but the upper half is different for each CPU core.
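In rough C pseudocode (BareMetal itself is written in Assembly, and the helper names alloc_page/phys_addr are invented for illustration), that per-core page-table setup could look like this:

Code:
/* Simplified sketch (not BareMetal's real code): every core's PML4 points
 * at one shared kernel PDPT for the low range, while PML4 entry 256
 * (virtual 0xFFFF800000000000) points at a per-core PDPT. */

#include <stdint.h>

#define PAGE_PRESENT 0x1ULL
#define PAGE_WRITE   0x2ULL

extern uint64_t *alloc_page(void);          /* returns a zeroed 4 KiB page */
extern uint64_t  phys_addr(const void *va); /* virtual -> physical         */

static uint64_t *kernel_pdpt;               /* one PDPT shared by every core */

uint64_t *make_core_pml4(uint64_t *core_private_pdpt)
{
    uint64_t *pml4 = alloc_page();

    /* Entry 0: the 4 MiB kernel mapping, identical on all cores. */
    pml4[0] = phys_addr(kernel_pdpt) | PAGE_PRESENT | PAGE_WRITE;

    /* Entry 256: this core's private upper-half application memory. */
    pml4[256] = phys_addr(core_private_pdpt) | PAGE_PRESENT | PAGE_WRITE;

    return pml4;
}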
BareMetal OS - http://www.returninfinity.com/
Mono-tasking 64-bit OS for x86-64 based computers, written entirely in Assembly
Re: Concept: "kernel core" (vs "app core"?)
ReturnInfinity wrote: With each core seeing different memory, is cache coherency still a concern?

It's more the transfer of data between them during syscalls / ipc.
Re: Concept: "kernel core" (vs "app core"?)
JamesM wrote: It's more the transfer of data between them during syscalls / ipc.

Not really. In an SMP system at least, data doesn't have to be transferred. All cores can have equal access to all memory, so the destination core could access the memory without the need for data transfer. I expect, though, that when there are 32 cores, coherence of caches will no longer be guaranteed and the OS will have to manage this.
The day of the message-passing microkernel is coming.
Re: Concept: "kernel core" (vs "app core"?)
berkus wrote: Aren't IPIs more expensive (for both cores)?
JamesM wrote: I'd be more worried about cache coherency than IPIs. I didn't think IPIs were that expensive though...?

Sending one is quite cheap. However, all interrupts are expensive things.
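For what it's worth, the send side is only a couple of MMIO writes to the local APIC. A sketch, assuming the default xAPIC base is mapped and using invented helper names:

Code:
/* Sketch: sending a fixed-vector IPI through the xAPIC ICR. Assumes the
 * local APIC MMIO page is mapped at its default physical base. */

#include <stdint.h>

#define LAPIC_BASE   0xFEE00000UL  /* default xAPIC base, assumed mapped 1:1      */
#define LAPIC_ICR_LO 0x300         /* vector + delivery mode; write triggers send */
#define LAPIC_ICR_HI 0x310         /* destination APIC ID in bits 24-31           */

static inline void lapic_write(uint32_t reg, uint32_t val)
{
    *(volatile uint32_t *)(LAPIC_BASE + reg) = val;
}

void send_ipi(uint8_t apic_id, uint8_t vector)
{
    lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
    lapic_write(LAPIC_ICR_LO, vector);  /* fixed delivery, edge-triggered */
}

The expense is almost all on the receiving core: the pipeline flush, vector dispatch, the handler itself, the EOI and the IRET.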
Re: Concept: "kernel core" (vs "app core"?)
Hi,

JamesM wrote: It's more the transfer of data between them during syscalls / ipc.
gerryg400 wrote: Not really. In an SMP system at least, data doesn't have to be transferred. All cores can have equal access to all memory, so the destination core could access the memory without the need for data transfer. I expect, though, that when there are 32 cores, coherence of caches will no longer be guaranteed and the OS will have to manage this.

That of course assumes that cache coherency is an invisible layer that affects nothing. On the contrary, as cache lines get shared and updated between cores, those cache lines need to be transferred between caches somehow. That this is managed transparently in hardware on the x86 is beside the point, although it can hide the details more effectively than other architectures such as POWER.
The point is that this data transfer is not free. Cache coherency (as far as I understand the complexities of it on x86) is implemented using a broadcast message from one core to others, telling them to invalidate a cache line (when one core updates it). This invalidates the cache line in *all cache levels*, resulting in a read from main memory. This is extremely slow!
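This cost is visible even from user space. A rough benchmark sketch (POSIX threads; the 64-byte line size is an assumption about the target CPU):

Code:
/* Two threads each increment their own counter. When both counters share
 * one 64-byte cache line, every write invalidates the other core's copy. */

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000L

struct { volatile long a, b; } same_line;                          /* one line  */
struct { volatile long a; char pad[64]; volatile long b; } split;  /* two lines */

static void *bump(void *p)
{
    volatile long *c = p;
    for (long i = 0; i < ITERS; i++)
        (*c)++;
    return NULL;
}

static double run(volatile long *x, volatile long *y)
{
    struct timespec t0, t1;
    pthread_t t;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&t, NULL, bump, (void *)y);
    bump((void *)x);                    /* both threads hammer their counter */
    pthread_join(t, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    printf("same cache line:      %.2f s\n", run(&same_line.a, &same_line.b));
    printf("separate cache lines: %.2f s\n", run(&split.a, &split.b));
    return 0;
}

On typical hardware the same-line run is several times slower than the split-line run, purely from the coherency traffic described above.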
Sometimes you need to delve beneath the abstractions presented to see what harm can be caused by an algorithm.
Re: Concept: "kernel core" (vs "app core"?)
Yes, if you want to move the kernel to just one core, then you also have to modify your design so that apps only call system functions (i.e. the kernel) through a queue, and not directly. Then the queue can be processed on the kernel core exclusively, efficiently, and with no cache issues.
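A minimal sketch of such a queue, as a single-producer/single-consumer ring per application core (all names are illustrative, and a real system would block or yield rather than spin):

Code:
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 256   /* power of two */

struct syscall_req {
    uint64_t num;            /* syscall number                        */
    uint64_t args[4];
    uint64_t result;
    atomic_int done;         /* set by the kernel core when finished  */
};

struct ring {
    _Atomic uint32_t head;   /* only the owning app core writes this  */
    _Atomic uint32_t tail;   /* only the kernel core writes this      */
    struct syscall_req *slot[RING_SLOTS];
};

/* App-core side: enqueue a request and wait for the kernel core. */
uint64_t queued_syscall(struct ring *r, struct syscall_req *req)
{
    uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);

    while (h - atomic_load_explicit(&r->tail, memory_order_acquire) >= RING_SLOTS)
        ;  /* ring full: wait for the kernel core to drain it */

    atomic_store_explicit(&req->done, 0, memory_order_relaxed);
    r->slot[h % RING_SLOTS] = req;
    atomic_store_explicit(&r->head, h + 1, memory_order_release);

    while (!atomic_load_explicit(&req->done, memory_order_acquire))
        ;  /* kernel core dequeues, runs the call, sets result and done */

    return req->result;
}

Because only the one app core writes the head and only the kernel core writes the tail, a request's cache lines move between exactly two cores and no locks are needed.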
Re: Concept: "kernel core" (vs "app core"?)
bewing wrote: Yes, if you want to move the kernel to just one core, then you also have to modify your design so that apps only call system functions (i.e. the kernel) through a queue, and not directly. Then the queue can be processed on the kernel core exclusively, efficiently, and with no cache issues.

I don't think kernel reentrancy has an effect here. A reentrant and a non-reentrant (queued) system would still require the same data flow, so the choice shouldn't have an effect on cache contention.
Re: Concept: "kernel core" (vs "app core"?)
Hi,
Solar wrote: Would it be conceivable / a good idea to have one core dedicated for "kernel work", with the other cores doing "app work" and delegating kernel stuff to the dedicated core? How efficient would that be in, say, a 32 core system, compared to having each core doing context switches?

How?

You'd need code on each "application CPU" to handle TLB invalidation (the "kernel CPU" can't invalidate another CPU's TLBs directly), or to manage segments if you use segmentation instead of paging (the "kernel CPU" can't force another CPU to reload its GDT or segment registers directly). You'd also need code to read/write MSRs and to handle power management (the "kernel CPU" can't directly put another CPU into or out of low-power states), plus exception-handling stubs, a GDT, and an IDT. You'd also have to have some way for the application and the kernel to communicate. Basically, you must have some kernel code running on every "application CPU" (e.g. a "kernel stub"), and the "kernel stub" must run at CPL=0 to be able to access IPIs, MSRs, etc.
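As a concrete illustration, here is roughly what one of those "kernel stub" duties could look like: the handler an application CPU runs when the kernel CPU requests a TLB invalidation (the vector number and lapic_eoi are invented for this sketch):

Code:
#include <stdatomic.h>
#include <stdint.h>

#define TLB_IPI_VECTOR 0xF0          /* invented vector for the shootdown IPI */

struct shootdown {
    void *addr;                      /* page to invalidate; NULL = flush all  */
    atomic_uint pending;             /* CPUs that still have to acknowledge   */
};

static struct shootdown tlb_req;     /* filled in by the kernel CPU           */

extern void lapic_eoi(void);         /* assumed: signal end-of-interrupt      */

static inline void invlpg(void *va)
{
    __asm__ volatile("invlpg (%0)" :: "r"(va) : "memory");
}

/* Called (via an interrupt stub at CPL=0) on each application CPU
 * when the kernel CPU sends TLB_IPI_VECTOR. */
void tlb_ipi_handler(void)
{
    if (tlb_req.addr) {
        invlpg(tlb_req.addr);
    } else {
        uint64_t cr3;                /* full flush: reload CR3 */
        __asm__ volatile("mov %%cr3, %0" : "=r"(cr3));
        __asm__ volatile("mov %0, %%cr3" :: "r"(cr3) : "memory");
    }
    atomic_fetch_sub(&tlb_req.pending, 1);  /* ack so the kernel CPU can proceed */
    lapic_eoi();
}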
Of course the more CPUs there are, the harder it is to ensure that all CPUs are actually doing useful work (and there's no point having more CPUs if they aren't doing useful work); so let's talk about load balancing. Having 31 CPUs all doing nothing while they wait for the kernel's CPU to respond won't be good for performance (and having 31 CPUs going flat out while the one kernel CPU does nothing won't be ideal either). Depending on kernel load, you might want to be able to increase or decrease the number of CPUs the kernel uses. Then there's power management. If the kernel's CPU gets hot and starts running very slowly due to thermal throttling, then it'd probably make sense to migrate the kernel to a different (cooler/faster) CPU. If you want acceptable performance, then it's going to get complicated...
To simplify things (especially the load balancing), you could put the "kernel stub" on all CPUs (not just application CPUs). That way you could have all the load balancing in one place; and as an added bonus you'd be able to reduce the amount of code duplication (the kernel could use pieces of code that needs to be in the "kernel stub" anyway).
For protection, putting the kernel on a separate CPU makes no difference. For example, if you don't use software isolation then you'd have to run applications at CPL=3 (so they can't take over the computer), and you'd still have to switch to CPL=0 (and then back to CPL=3) when the application calls the "kernel stub". If you use software isolation, then you're not avoiding anything there either (e.g. you'd still need to make sure the application can't trash the "kernel stub").
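For example, that CPL=3 to CPL=0 switch could use the ordinary SYSCALL/SYSRET mechanism; a sketch of the MSR setup, where the selector values and the rdmsr/wrmsr helpers are assumptions about the GDT layout rather than anything prescribed:

Code:
#include <stdint.h>

#define MSR_EFER  0xC0000080u        /* EFER; bit 0 (SCE) enables SYSCALL   */
#define MSR_STAR  0xC0000081u        /* kernel/user segment selector bases  */
#define MSR_LSTAR 0xC0000082u        /* 64-bit SYSCALL entry point          */

extern uint64_t rdmsr(uint32_t msr);           /* assumed helpers           */
extern void     wrmsr(uint32_t msr, uint64_t value);
extern void     syscall_entry(void);           /* the stub's CPL=0 entry    */

void init_syscall_msrs(void)
{
    /* Selector bases 0x08 (kernel) and 0x18 (user) assume a particular
     * GDT layout; SYSRET derives user CS/SS from the upper field. */
    wrmsr(MSR_STAR, ((uint64_t)0x08 << 32) | ((uint64_t)0x18 << 48));
    wrmsr(MSR_LSTAR, (uint64_t)(uintptr_t)syscall_entry);
    wrmsr(MSR_EFER, rdmsr(MSR_EFER) | 1);      /* set EFER.SCE */
}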
Now, after you've taken care of all of this and implemented it, why not just change the names? The thing that is running on separate CPU/s from the applications could be renamed to something like "the system process" (instead of "the kernel"), and the thing running on each application CPU could be called "the kernel" (instead of "the kernel stub").
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Concept: "kernel core" (vs "app core"?)
Brendan wrote: Now, after you've taken care of all of this and implemented it, why not just change the names? The thing that is running on separate CPU/s from the applications could be renamed to something like "the system process" (instead of "the kernel"), and the thing running on each application CPU could be called "the kernel" (instead of "the kernel stub").

I think you just invented the microkernel.
If a trainstation is where trains stop, what is a workstation ?