Multikernel theory

gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Multikernel theory

Post by gerryg400 »

I'm wondering whether anyone, with an eye on the future (hundreds or thousands of cores), has considered a multikernel or 'kernel per core' design for their OS. If so, what sort of inter-core communication have you considered?
If a trainstation is where trains stop, what is a workstation?
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Multikernel theory

Post by Brendan »

Hi,
gerryg400 wrote:I'm wondering whether anyone, with an eye on the future (hundreds or thousands of cores), has considered a multikernel or 'kernel per core' design for their OS. If so, what sort of inter-core communication have you considered?
When papers for Barrelfish were first published, I thought about "kernel per core" for about 5 seconds before deciding "kernel per NUMA domain" makes more sense. Then I thought about "kernel per NUMA domain" for about 7 seconds before realising it makes more sense to have "kernel per computer" that uses both per CPU data structures and per NUMA domain structures to get the same benefits without the disadvantages. Then I realised that this is what everyone does already, and told myself off for wasting 12 seconds. :lol:
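The "kernel per computer, with both per-CPU and per-NUMA-domain data structures" approach Brendan describes can be sketched roughly as follows. This is only an illustration; every name here is invented, not taken from any real kernel:

```c
#include <stddef.h>

#define MAX_CPUS    256
#define MAX_DOMAINS 16

/* One kernel image for the whole machine, but the hot data is split up:
 * each CPU owns a slot in cpu_data[], each NUMA domain owns a slot in
 * domain_data[]. Owner-only slots need no locks on the hot path. */

struct cpu_local {
    struct thread *run_queue;        /* touched only by the owning CPU  */
    unsigned long context_switches;
};

struct numa_domain {
    void *free_page_list;            /* pages physically in this domain */
    unsigned long free_pages;
};

static struct cpu_local   cpu_data[MAX_CPUS];
static struct numa_domain domain_data[MAX_DOMAINS];

/* Each CPU indexes with its own ID, so no other CPU ever writes its slot. */
static inline struct cpu_local *this_cpu(int cpu_id)
{
    return &cpu_data[cpu_id];
}

static inline struct numa_domain *local_domain(int domain_id)
{
    return &domain_data[domain_id];
}
```

The idea being that a single kernel image gets most of the locality benefits of "kernel per core" or "kernel per NUMA domain" simply by keeping the frequently touched structures CPU-local or domain-local, without the linking and synchronisation headaches of running multiple kernel copies.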

Mostly, it only makes sense when the CPUs are very different (e.g. a mixture of 80x86, Itanium and SPARC) or when you have to support non-cache-coherent NUMA (all 80x86 NUMA is cache coherent).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
nikito
Member
Posts: 42
Joined: Thu Jul 15, 2010 7:16 pm

Re: Multikernel theory

Post by nikito »

gerryg400 wrote:I'm wondering whether anyone, with an eye on the future (hundreds or thousands of cores), has considered a multikernel or 'kernel per core' design for their OS. If so, what sort of inter-core communication have you considered?

I don't see the future as thousands of cores; I see it as quantum computers with a very small number of "registers" but incomparable computational potential. So the future won't be about managing thousands of cores; instead, the challenge will be to write an OS that runs in very few "registers", as in the very first days of computing.

BTW, knowing human nature, somebody will probably try to set up a cluster of thousands of quantum computers :lol:
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: Multikernel theory

Post by gerryg400 »

...and told myself off for wasting 12 seconds
Well, that's not a lot of time to waste in the development of an operating system. :-) But I guess you're right. If the CPUs are all identical, they will all execute the same code. 'Kernel per core' then dissolves down to a single kernel with some core-private data and some global data with locking. Which, really, is what we have now, as you said.
If a trainstation is where trains stop, what is a workstation?
gravaera
Member
Posts: 737
Joined: Tue Jun 02, 2009 4:35 pm
Location: Supporting the cause: Use \tabs to indent code. NOT \x20 spaces.

Re: Multikernel theory

Post by gravaera »

Hi:

Yea, like Brendan said, Barrelfish is a colossal flop, so far. Supposedly they intend to treat CPUs as devices and have drivers for CPUs in the kernel, in the sense that you can have an ARM "driver" and an x86 "driver" in the same kernel build, and somehow have both types of CPU running the same code at once, or something. Also, their source code is great for giggles. They talk about scalability and this and that, and they have lots of nice academic papers on what's wrong with Linux and NT and all the scalability issues those two have, yet in their actual source, all they have are drivers for PCI and ACPI parsing, and no actual code that has anything to do with NUMA.

I visited their site again recently, within the last month or so, and saw a notice: they're looking to hire PhD graduates to come and actually get it somewhere. Yet you look at their front page and there are at least 10 developers already working on it...

Conclusion: while multikernel may be a good idea sometime in the future, it's not feasible right now. In fact it may never be: first, it's difficult to link a kernel such that you can have multiple copies of it in the same coherent NUMA domain, and the headache of designing it isn't worth it at all. Cloning the kernel only makes sense when dealing with non-coherent domains. Otherwise, use shared memory and a single kernel. If anyone wants "zomg I wantz moar scale!", let them balance their domains properly, and have a maximum of 16 CPUs per domain with a sensible amount of per-domain local memory.
17:56 < sortie> Paging is called paging because you need to draw it on pages in your notebook to succeed at it.
Colonel Kernel
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada

Re: Multikernel theory

Post by Colonel Kernel »

Has anybody looked at Helios? They modified a version of Singularity so that they could run it on a CPU and a GPU in the same system. This is where multikernel gets really interesting.

That said, I think you guys are misunderstanding the point of Barrelfish.
Brendan wrote:When papers for Barrelfish were first published, I thought about "kernel per core" for about 5 seconds before deciding "kernel per NUMA domain" makes more sense. Then I thought about "kernel per NUMA domain" for about 7 seconds before realising it makes more sense to have "kernel per computer" that uses both per CPU data structures and per NUMA domain structures to get the same benefits without the disadvantages. Then I realised that this is what everyone does already, and told myself off for wasting 12 seconds.
Ignoring the cases of non-coherent domains and heterogeneous architectures, you're correct that it doesn't matter a whole lot whether there are multiple copies of the kernel code. I think one per NUMA domain would be marginally faster than one per computer, but it doesn't matter because that isn't what Barrelfish is really about.

From a design point of view, Barrelfish is about two things:
  1. How kernel data structures are managed, and
  2. How IPC works.
In other kernel architectures, the default assumption is that the kernel has access to all memory in the machine, so all its data structures are shared by all CPUs. You can limit this for optimization purposes (e.g.: per-CPU scheduler queues, per-NUMA-domain memory management, etc.), but that's the point -- these are just optimizations. At its heart, a system built this way is a uniprocessor kernel with SMP and locks bolted on. As such systems get more complex, they get a lot harder to scale (witness how much work it took to implement scalability improvements in each major release of the NT kernel).

Big deal #1 about Barrelfish is that by default it assumes no shared memory between CPUs. This means that, like a distributed system, there is no "single version of the truth" that has to be maintained. The complexity gets flipped around: Instead of a really complex kernel running on all CPUs managing relatively simple data structures, you get many instances of a really simple uniprocessor kernel managing CPU-local data structures, and keeping them all in sync via IPC. The complexity is in the algorithms used to keep the system as a whole in sync, as anyone working on cloud computing (like me) can tell you. ;) As long as there is shared state in a system (any system, not just an OS), that shared state will be a scalability bottleneck. Barrelfish is trying to design that bottleneck completely out of the system.
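A minimal sketch of the kind of explicit channel such a "no shared state, sync via IPC" design rests on: each kernel instance keeps its own replica of system state, and updates travel as messages over point-to-point rings rather than through locked shared structures. The message types and ring layout are invented for illustration; a real implementation would also need memory barriers on weakly ordered hardware.

```c
/* One-directional channel between two cores. In a real multikernel this
 * region would be touched only by these two cores (or the messages would
 * be carried by IPIs / the interconnect on non-coherent hardware). */

enum msg_type { MSG_CAP_REVOKE, MSG_CORE_UP, MSG_CORE_DOWN };

struct message {
    enum msg_type type;
    int payload;
};

#define QUEUE_LEN 64

struct channel {
    struct message ring[QUEUE_LEN];
    volatile int head;   /* written by the sender only   */
    volatile int tail;   /* written by the receiver only */
};

static int channel_send(struct channel *ch, const struct message *m)
{
    int next = (ch->head + 1) % QUEUE_LEN;
    if (next == ch->tail)
        return -1;                 /* ring full */
    ch->ring[ch->head] = *m;
    ch->head = next;               /* publish only after the payload is written */
    return 0;
}

static int channel_recv(struct channel *ch, struct message *out)
{
    if (ch->tail == ch->head)
        return -1;                 /* nothing pending */
    *out = ch->ring[ch->tail];
    ch->tail = (ch->tail + 1) % QUEUE_LEN;
    return 0;
}
```

Because head is only ever written by the sender and tail only by the receiver, neither side needs a lock; the "shared truth" has been replaced by replicas reconciled through explicit messages, which is exactly the bottleneck-removal being described.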

Big deal #2 about Barrelfish is how it handles IPC. This part I'm not as clear on (it's been a while since I read the paper), but I think the idea is that the system adapts its IPC to the topology of the machine it's running on. In other words, you wouldn't need to write the kernel's IPC code any differently to handle optimizations for NUMA, non-coherent domains, lots of CPUs, not so many CPUs, etc. Given how diverse hardware is, and will continue to get, it seems like a good idea not to have to hard-code assumptions about the hardware topology in the kernel.
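Adapting the IPC mechanism to the machine's topology could look something like the following at its simplest: query where the two endpoints sit and pick a transport accordingly, instead of hard-coding one. The topology fields and both transport names are invented placeholders, not anything from the actual Barrelfish code.

```c
/* Pick an IPC transport from the machine topology at run time. */

enum transport {
    TRANSPORT_SHARED_RING,       /* shared-memory ring buffer           */
    TRANSPORT_INTERCONNECT_MSG   /* explicit message over interconnect  */
};

struct core_info {
    int core_id;
    int coherence_domain;  /* cores in the same domain share coherent memory */
};

static enum transport pick_transport(const struct core_info *a,
                                     const struct core_info *b)
{
    /* Same coherence domain: a shared-memory ring is the cheapest path.
     * Different domains: fall back to an explicit interconnect message. */
    if (a->coherence_domain == b->coherence_domain)
        return TRANSPORT_SHARED_RING;
    return TRANSPORT_INTERCONNECT_MSG;
}
```

The kernel's IPC code above the transport layer stays the same whatever the hardware looks like; only this selection step knows about the topology.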

gravaera wrote:Yea, like Brendan said, Barrelfish is a colossal flop, so far.
How is it a flop? It's an experiment. Even if the ideas turn out to be completely unworkable in practice, at least they learned something. That's what research is for! :D
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
nikito
Member
Posts: 42
Joined: Thu Jul 15, 2010 7:16 pm

Re: Multikernel theory

Post by nikito »

nikito wrote:BTW, knowing human nature, somebody will probably try to set up a cluster of thousands of quantum computers :lol:
Sorry, this was a mistake. If humans achieve quantum computation in the future, then they will surely use quantum entanglement for communication between nodes. So even a geographically divided quantum computer would not be considered a cluster, but just one single computer. In conclusion, when the future arrives, the inefficient practice of distributing jobs between multiple cores will die forever.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Multikernel theory

Post by Brendan »

Hi,
nikito wrote:
nikito wrote:BTW, knowing human nature, somebody will probably try to set up a cluster of thousands of quantum computers :lol:
Sorry, this was a mistake. If humans achieve quantum computation in the future, then they will surely use quantum entanglement for communication between nodes. So even a geographically divided quantum computer would not be considered a cluster, but just one single computer. In conclusion, when the future arrives, the inefficient practice of distributing jobs between multiple cores will die forever.
And someone will write a process that does everything imaginable at the same time, so that nobody will ever need to run anything more than that one process. This will actually be easy due to the fact that fiction doesn't suffer from messy practical problems...


Cheers

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
nikito
Member
Posts: 42
Joined: Thu Jul 15, 2010 7:16 pm

Re: Multikernel theory

Post by nikito »

Brendan wrote:And someone will write a process that does everything imaginable at the same time, so that nobody will ever need to run anything more than that one process. This will actually be easy due to the fact that fiction doesn't suffer from messy practical problems...
My OS is currently multikernel. But in the future, if the hardware permits it, it will be faster to run all the tasks on one single core serially than to distribute them across multiple cores in parallel. Theoretically speaking, replacing all of a GPU's processors with just one that is as fast as the sum of all the GPU's cores would be faster than the GPU itself.

PS: I don't have an entire OS done yet, because I still can't install programs on it.
gravaera
Member
Posts: 737
Joined: Tue Jun 02, 2009 4:35 pm
Location: Supporting the cause: Use \tabs to indent code. NOT \x20 spaces.

Re: Multikernel theory

Post by gravaera »

Hi:
Colonel Kernel wrote:
gravaera wrote:Yea, like Brendan said, Barrelfish is a colossal flop, so far.
How is it a flop? It's an experiment. Even if the ideas turn out to be completely unworkable in practice, at least they learned something. That's what research is for! :D
Heh, it's a flop for the reasons I pointed out in my post ;) Of course, experimentation is a completely valid point, so I'll concede and back off them :P

--All the best
gravaera
17:56 < sortie> Paging is called paging because you need to draw it on pages in your notebook to succeed at it.
lemonyii
Member
Posts: 153
Joined: Thu Mar 25, 2010 11:28 pm
Location: China

Re: Multikernel theory

Post by lemonyii »

I thought about having multiple OS kernels on one computer: one for Windows applications, one for Linux, one for rescue....
But when it came down to it, I found that a computer is designed for processing DATA, not CODE.
So our job is to use less code to process more data, and that's what DLLs and SOs are designed for, and even what OSes are designed for.
Our aim in OS design is resource sharing, especially of code and hardware, so we can do more within a given amount of time and space, that is, process more data.
If we use multiple kernels, that is, more code, we might as well compile every application together with the OS into one big executable binary, so that we have "an OS per core". Isn't that better? It's completely a joke!
Enjoy my life!------A fish with a tattooed retina