Has anybody looked at Helios? They modified a version of Singularity so that it could run on a CPU and a GPU in the same system. This is where the multikernel idea gets really interesting.
That said, I think you guys are misunderstanding the point of Barrelfish.
Brendan wrote: When papers for Barrelfish were first published, I thought about "kernel per core" for about 5 seconds before deciding "kernel per NUMA domain" makes more sense. Then I thought about "kernel per NUMA domain" for about 7 seconds before realising it makes more sense to have "kernel per computer" that uses both per CPU data structures and per NUMA domain structures to get the same benefits without the disadvantages. Then I realised that this is what everyone does already, and told myself off for wasting 12 seconds.
Ignoring the cases of non-coherent domains and heterogeneous architectures, you're correct that it doesn't matter a whole lot whether there are multiple copies of the kernel code. I think one per NUMA domain would be marginally faster than one per computer, but it doesn't matter because that isn't what Barrelfish is really about.
From a design point of view, Barrelfish is about two things:
- How kernel data structures are managed, and
- How IPC works.
In other kernel architectures, the default assumption is that the kernel has access to all memory in the machine, so all its data structures are shared by all CPUs. You can limit this for optimization purposes (e.g.: per-CPU scheduler queues, per-NUMA-domain memory management, etc.), but that's the point -- these are just optimizations. At its heart, a system built this way is a uniprocessor kernel with SMP and locks bolted on. As such systems get more complex, they get a lot harder to scale (witness how much work it took to implement scalability improvements in each major release of the NT kernel).
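To make that concrete, here's a rough sketch in C of the two shapes. The names (shared_sched, percpu_sched, MAX_CPUS) are mine, not anything from NT, Linux or any real kernel; it's only meant to show "shared by default, per-CPU bolted on as an optimization":
[code]
#include <pthread.h>

#define MAX_CPUS 64

struct task;                      /* stands in for a schedulable entity */

/* Traditional design: one kernel image, one run queue, one lock.
 * Every CPU that wants to schedule contends on the same memory. */
struct shared_sched {
    pthread_mutex_t lock;
    struct task    *run_queue;    /* the single "version of the truth"  */
};

/* The usual optimization: per-CPU queues so the hot path avoids the lock.
 * Still the same design underneath -- shared memory everywhere, limited
 * only where profiling shows it hurts. */
struct percpu_sched {
    struct task *local_queue;     /* only the owning CPU touches this   */
};

static struct shared_sched global_sched;
static struct percpu_sched cpu_sched[MAX_CPUS];
[/code]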
Big deal #1 about Barrelfish is that by default it assumes no shared memory between CPUs. This means that, like a distributed system, there is no "single version of the truth" that has to be maintained. The complexity gets flipped around: instead of a really complex kernel running on all CPUs managing relatively simple data structures, you get many instances of a really simple uniprocessor kernel, each managing CPU-local data structures and kept in sync with the others via IPC. The complexity moves into the algorithms that keep the system as a whole consistent, as anyone working on cloud computing (like me) can tell you.
As long as there is shared state in a system (any system, not just an OS), that shared state will be a scalability bottleneck. Barrelfish is trying to design that bottleneck completely out of the system.
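Here's a hedged sketch of the "replicate, don't share" idea. The structures and apply_update below are hypothetical, not Barrelfish's actual API; the point is just that a core only ever writes its own replica, and changes arrive as messages rather than as stores to shared memory:
[code]
#include <stdint.h>

#define N_ENTRIES 1024

/* Each core's kernel owns a private replica of some system-wide table
 * (a mapping database, say).  No locks: only this core ever writes it. */
struct replica {
    uint64_t entries[N_ENTRIES];
    uint64_t version;             /* last agreed-upon update applied here */
};

/* Updates travel between cores as messages, not as shared-memory writes. */
struct update_msg {
    uint64_t version;
    uint32_t index;
    uint64_t value;
};

/* Applying a remote update touches only core-local memory.  The hard part
 * -- agreeing on the order of updates across all replicas -- is a classic
 * distributed-systems problem (two-phase commit, leader election, ...). */
static void apply_update(struct replica *r, const struct update_msg *m)
{
    if (m->version == r->version + 1 && m->index < N_ENTRIES) {
        r->entries[m->index] = m->value;
        r->version = m->version;
    }
    /* else: out of order -- queue it or ask for a retransmit (omitted) */
}
[/code]
The code itself is deliberately the easy part; the agreement protocol hiding behind those version numbers is where the distributed-systems complexity I mentioned actually lives.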
Big deal #2 about Barrelfish is how it handles IPC. This part I'm not as clear on (it's been a while since I read the paper), but I think the idea is that the system adapts its IPC to the topology of the machine it's running on. In other words, you wouldn't need to write the kernel's IPC code any differently to handle optimizations for NUMA, non-coherent domains, lots of CPUs, not so many CPUs, and so on. Given how diverse hardware already is, and how much more diverse it's going to get, it seems like a good idea not to hard-code assumptions about the hardware topology into the kernel.
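Something like the following is how I'd picture the topology-driven choice. To be clear, this structure is my guess at the idea, not Barrelfish's actual channel or interconnect code:
[code]
#include <stdbool.h>

enum channel_kind {
    CHAN_SAME_CORE,         /* in-kernel handoff, LRPC-style               */
    CHAN_SHARED_COHERENT,   /* cache-line-sized messages in shared memory  */
    CHAN_NON_COHERENT,      /* explicit DMA / mailbox, e.g. to a GPU core  */
};

struct core_info {
    int  core_id;
    int  numa_node;         /* could also steer batching across NUMA links */
    bool coherent;          /* cache-coherent with the sending core?       */
};

/* Decide how to talk to a peer from what the hardware looks like at boot,
 * instead of hard-coding one mechanism into every kernel path. */
static enum channel_kind pick_channel(const struct core_info *self,
                                      const struct core_info *peer)
{
    if (self->core_id == peer->core_id)
        return CHAN_SAME_CORE;
    if (self->coherent && peer->coherent)
        return CHAN_SHARED_COHERENT;
    return CHAN_NON_COHERENT;
}
[/code]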
gravaera wrote: Yea, like Brendan said, Barrelfish is a colossal flop, so far.
How is it a flop? It's an experiment. Even if the ideas turn out to be completely unworkable in practice, at least they learned something. That's what research is for!