Concept: "kernel core" (vs "app core"?)
I am referring to this thread (and multi-core concepts as a whole).
I had put my OS design / development ambitions on the back burner for several years, and my knowledge has gotten quite rusty. I am trying to get back into the "conceptual" discussion, so excuse me when I go into "noob camp" with this question.
Would it be conceivable / a good idea to have one core dedicated for "kernel work", with the other cores doing "app work" and delegating kernel stuff to the dedicated core? How efficient would that be in, say, a 32 core system, compared to having each core doing context switches?
Every good solution is obvious once you've found it.
Re: Concept: "kernel core" (vs "app core"?)
berkus wrote: Aren't IPIs more expensive (for both cores)?

I'd be more worried about cache coherency than IPIs. I didn't think IPIs were that expensive though...?
Re: Concept: "kernel core" (vs "app core"?)
JamesM wrote: ...cache coherency...

Yep. I knew I had forgotten about something there.
Thanks...
Every good solution is obvious once you've found it.
Re: Concept: "kernel core" (vs "app core"?)
With each core seeing different memory, is cache coherency still a concern?
The current implementation I am doing is this:
Kernel memory: 0x0 - 0x3FFFFF (4 MiB)
App memory: 0xFFFF800000000000+
Each core has the same kernel mapping in the lower half but the upper half is different for each CPU core.
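In rough C pseudocode (BareMetal itself is written in Assembly, and the helper names alloc_page/phys_addr are invented for illustration), that per-core page-table setup could look like this:

Code:
/* Simplified sketch (not BareMetal's real code): every core's PML4 points
 * at one shared kernel PDPT for the low range, while PML4 entry 256
 * (virtual 0xFFFF800000000000) points at a per-core PDPT. */

#include <stdint.h>

#define PAGE_PRESENT 0x1ULL
#define PAGE_WRITE   0x2ULL

extern uint64_t *alloc_page(void);          /* returns a zeroed 4 KiB page */
extern uint64_t  phys_addr(const void *va); /* virtual -> physical         */

static uint64_t *kernel_pdpt;               /* one PDPT shared by every core */

uint64_t *make_core_pml4(uint64_t *core_private_pdpt)
{
    uint64_t *pml4 = alloc_page();

    /* Entry 0: the 4 MiB kernel mapping, identical on all cores. */
    pml4[0] = phys_addr(kernel_pdpt) | PAGE_PRESENT | PAGE_WRITE;

    /* Entry 256: this core's private upper-half application memory. */
    pml4[256] = phys_addr(core_private_pdpt) | PAGE_PRESENT | PAGE_WRITE;

    return pml4;
}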
BareMetal OS - http://www.returninfinity.com/
Mono-tasking 64-bit OS for x86-64 based computers, written entirely in Assembly
Re: Concept: "kernel core" (vs "app core"?)
ReturnInfinity wrote: With each core seeing different memory, is cache coherency still a concern?

It's more the transfer of data between them during syscalls / ipc.
Re: Concept: "kernel core" (vs "app core"?)
JamesM wrote: It's more the transfer of data between them during syscalls / ipc.

Not really. In an SMP system at least, data doesn't have to be transferred. All cores can have equal access to all memory, so the destination core could access the memory without the need for data transfer. I expect, though, that when there are 32 cores, coherence of caches will no longer be guaranteed and the OS will have to manage this.
The day of the message-passing microkernel is coming.
Re: Concept: "kernel core" (vs "app core"?)
berkus wrote: Aren't IPIs more expensive (for both cores)?
JamesM wrote: I'd be more worried about cache coherency than IPIs. I didn't think IPIs were that expensive though...?

Sending one is quite cheap. However, all interrupts are expensive things.
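For what it's worth, the send side is only a couple of MMIO writes to the local APIC. A sketch, assuming the default xAPIC base is mapped and using invented helper names:

Code:
/* Sketch: sending a fixed-vector IPI through the xAPIC ICR. Assumes the
 * local APIC MMIO page is mapped at its default physical base. */

#include <stdint.h>

#define LAPIC_BASE   0xFEE00000UL  /* default xAPIC base, assumed mapped 1:1      */
#define LAPIC_ICR_LO 0x300         /* vector + delivery mode; write triggers send */
#define LAPIC_ICR_HI 0x310         /* destination APIC ID in bits 24-31           */

static inline void lapic_write(uint32_t reg, uint32_t val)
{
    *(volatile uint32_t *)(LAPIC_BASE + reg) = val;
}

void send_ipi(uint8_t apic_id, uint8_t vector)
{
    lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
    lapic_write(LAPIC_ICR_LO, vector);  /* fixed delivery, edge-triggered */
}

The expense is almost all on the receiving core: the pipeline flush, vector dispatch, the handler itself, the EOI and the IRET.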
Re: Concept: "kernel core" (vs "app core"?)
Hi,

JamesM wrote: It's more the transfer of data between them during syscalls / ipc.
gerryg400 wrote: Not really. In an SMP system at least, data doesn't have to be transferred. All cores can have equal access to all memory, so the destination core could access the memory without the need for data transfer. I expect, though, that when there are 32 cores, coherence of caches will no longer be guaranteed and the OS will have to manage this.

That of course assumes that cache coherency is an invisible layer that affects nothing. On the contrary, as cache lines get shared and updated between cores, those cache lines need to be transferred between caches somehow. That this is managed transparently in hardware on the x86 is beside the point, although it can hide the details more effectively than other architectures such as POWER.
The point is that this data transfer is not free. Cache coherency (as far as I understand the complexities of it on x86) is implemented using a broadcast message from one core to others, telling them to invalidate a cache line (when one core updates it). This invalidates the cache line in *all cache levels*, resulting in a read from main memory. This is extremely slow!
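This cost is visible even from user space. A rough benchmark sketch (POSIX threads; the 64-byte line size is an assumption about the target CPU):

Code:
/* Two threads each increment their own counter. When both counters share
 * one 64-byte cache line, every write invalidates the other core's copy. */

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000L

struct { volatile long a, b; } same_line;                          /* one line  */
struct { volatile long a; char pad[64]; volatile long b; } split;  /* two lines */

static void *bump(void *p)
{
    volatile long *c = p;
    for (long i = 0; i < ITERS; i++)
        (*c)++;
    return NULL;
}

static double run(volatile long *x, volatile long *y)
{
    struct timespec t0, t1;
    pthread_t t;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&t, NULL, bump, (void *)y);
    bump((void *)x);                    /* both threads hammer their counter */
    pthread_join(t, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    printf("same cache line:      %.2f s\n", run(&same_line.a, &same_line.b));
    printf("separate cache lines: %.2f s\n", run(&split.a, &split.b));
    return 0;
}

On typical hardware the same-line run is several times slower than the split-line run, purely from the coherency traffic described above.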
Sometimes you need to delve beneath the abstractions presented to see what harm can be caused by an algorithm.
Re: Concept: "kernel core" (vs "app core"?)
Yes, if you want to move the kernel to just one core, then you also have to modify your design so that apps only call system functions (i.e. the kernel) through a queue, and not directly. Then the queue can be processed on the kernel core exclusively, efficiently, and with no cache issues.
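A minimal sketch of such a queue, as a single-producer/single-consumer ring per application core (all names are illustrative, and a real system would block or yield rather than spin):

Code:
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 256   /* power of two */

struct syscall_req {
    uint64_t num;            /* syscall number                        */
    uint64_t args[4];
    uint64_t result;
    atomic_int done;         /* set by the kernel core when finished  */
};

struct ring {
    _Atomic uint32_t head;   /* only the owning app core writes this  */
    _Atomic uint32_t tail;   /* only the kernel core writes this      */
    struct syscall_req *slot[RING_SLOTS];
};

/* App-core side: enqueue a request and wait for the kernel core. */
uint64_t queued_syscall(struct ring *r, struct syscall_req *req)
{
    uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);

    while (h - atomic_load_explicit(&r->tail, memory_order_acquire) >= RING_SLOTS)
        ;  /* ring full: wait for the kernel core to drain it */

    atomic_store_explicit(&req->done, 0, memory_order_relaxed);
    r->slot[h % RING_SLOTS] = req;
    atomic_store_explicit(&r->head, h + 1, memory_order_release);

    while (!atomic_load_explicit(&req->done, memory_order_acquire))
        ;  /* kernel core dequeues, runs the call, sets result and done */

    return req->result;
}

Because only the one app core writes the head and only the kernel core writes the tail, a request's cache lines move between exactly two cores and no locks are needed.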
Re: Concept: "kernel core" (vs "app core"?)
bewing wrote: Yes, if you want to move the kernel to just one core, then you also have to modify your design so that apps only call system functions (i.e. the kernel) through a queue, and not directly. Then the queue can be processed on the kernel core exclusively, efficiently, and with no cache issues.

I don't think kernel reentrancy has an effect here. A reentrant and a non-reentrant (queued) system would still require the same data flow, so the choice shouldn't have an effect on cache contention.
Re: Concept: "kernel core" (vs "app core"?)
Hi,
Solar wrote: Would it be conceivable / a good idea to have one core dedicated for "kernel work", with the other cores doing "app work" and delegating kernel stuff to the dedicated core? How efficient would that be in, say, a 32 core system, compared to having each core doing context switches?

How?

You'd need code on each "application CPU" to handle TLB invalidation (the "kernel CPU" can't invalidate another CPU's TLBs directly), or to manage segments if you use segmentation instead of paging (the "kernel CPU" can't force another CPU to reload its GDT or segment registers directly). You'd also need code to read/write MSRs and to handle power management (the "kernel CPU" can't directly put another CPU into or out of low-power states), plus exception-handling stubs, a GDT, and an IDT. You'd also have to have some way for the application and the kernel to communicate. Basically, you must have some kernel code running on every "application CPU" (e.g. a "kernel stub"), and the "kernel stub" must run at CPL=0 to be able to access IPIs, MSRs, etc.
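As a concrete illustration, here is roughly what one of those "kernel stub" duties could look like: the handler an application CPU runs when the kernel CPU requests a TLB invalidation (the vector number and lapic_eoi are invented for this sketch):

Code:
#include <stdatomic.h>
#include <stdint.h>

#define TLB_IPI_VECTOR 0xF0          /* invented vector for the shootdown IPI */

struct shootdown {
    void *addr;                      /* page to invalidate; NULL = flush all  */
    atomic_uint pending;             /* CPUs that still have to acknowledge   */
};

static struct shootdown tlb_req;     /* filled in by the kernel CPU           */

extern void lapic_eoi(void);         /* assumed: signal end-of-interrupt      */

static inline void invlpg(void *va)
{
    __asm__ volatile("invlpg (%0)" :: "r"(va) : "memory");
}

/* Called (via an interrupt stub at CPL=0) on each application CPU
 * when the kernel CPU sends TLB_IPI_VECTOR. */
void tlb_ipi_handler(void)
{
    if (tlb_req.addr) {
        invlpg(tlb_req.addr);
    } else {
        uint64_t cr3;                /* full flush: reload CR3 */
        __asm__ volatile("mov %%cr3, %0" : "=r"(cr3));
        __asm__ volatile("mov %0, %%cr3" :: "r"(cr3) : "memory");
    }
    atomic_fetch_sub(&tlb_req.pending, 1);  /* ack so the kernel CPU can proceed */
    lapic_eoi();
}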
Of course the more CPUs there are, the harder it is to ensure that all CPUs are actually doing useful work (and there's no point having more CPUs if they aren't doing useful work); so let's talk about load balancing. Having 31 CPUs all doing nothing while they wait for the kernel's CPU to respond won't be good for performance (and having 31 CPUs going flat out while the one kernel CPU does nothing won't be ideal either). Depending on kernel load, you might want to be able to increase or decrease the number of CPUs the kernel uses. Then there's power management. If the kernel's CPU gets hot and starts running very slowly due to thermal throttling, then it'd probably make sense to migrate the kernel to a different (cooler/faster) CPU. If you want acceptable performance, then it's going to get complicated...
To simplify things (especially the load balancing), you could put the "kernel stub" on all CPUs (not just application CPUs). That way you could have all the load balancing in one place; and as an added bonus you'd be able to reduce the amount of code duplication (the kernel could use pieces of code that needs to be in the "kernel stub" anyway).
For protection, putting the kernel on a separate CPU makes no difference. For example, if you don't use software isolation then you'd have to run applications at CPL=3 (so they can't take over the computer), and you'd still have to switch to CPL=0 (and then back to CPL=3) when the application calls the "kernel stub". If you use software isolation, then you're not avoiding anything there either (e.g. you'd still need to make sure the application can't trash the "kernel stub").
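For example, that CPL=3 to CPL=0 switch could use the ordinary SYSCALL/SYSRET mechanism; a sketch of the MSR setup, where the selector values and the rdmsr/wrmsr helpers are assumptions about the GDT layout rather than anything prescribed:

Code:
#include <stdint.h>

#define MSR_EFER  0xC0000080u        /* EFER; bit 0 (SCE) enables SYSCALL   */
#define MSR_STAR  0xC0000081u        /* kernel/user segment selector bases  */
#define MSR_LSTAR 0xC0000082u        /* 64-bit SYSCALL entry point          */

extern uint64_t rdmsr(uint32_t msr);           /* assumed helpers           */
extern void     wrmsr(uint32_t msr, uint64_t value);
extern void     syscall_entry(void);           /* the stub's CPL=0 entry    */

void init_syscall_msrs(void)
{
    /* Selector bases 0x08 (kernel) and 0x18 (user) assume a particular
     * GDT layout; SYSRET derives user CS/SS from the upper field. */
    wrmsr(MSR_STAR, ((uint64_t)0x08 << 32) | ((uint64_t)0x18 << 48));
    wrmsr(MSR_LSTAR, (uint64_t)(uintptr_t)syscall_entry);
    wrmsr(MSR_EFER, rdmsr(MSR_EFER) | 1);      /* set EFER.SCE */
}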
Now, after you've taken care of all of this and implemented it, why not just change the names? The thing that is running on separate CPU/s from the applications could be renamed to something like "the system process" (instead of "the kernel"), and the thing running on each application CPU could be called "the kernel" (instead of "the kernel stub").
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Concept: "kernel core" (vs "app core"?)
Brendan wrote: Now, after you've taken care of all of this and implemented it, why not just change the names? The thing that is running on separate CPU/s from the applications could be renamed to something like "the system process" (instead of "the kernel"), and the thing running on each application CPU could be called "the kernel" (instead of "the kernel stub").

I think you just invented the microkernel.
If a trainstation is where trains stop, what is a workstation ?