Hello, I am wondering if there are any operating systems (or OS theory) that are fault tolerant. By fault tolerant I mean: if a kernel process is faulty and is hogging a CPU or has hit some error, is there a way to isolate the problem to that single kernel thread, or to some pool of threads related to the faulty thread, and let the rest of the system work properly?
I know that an error in one part of the kernel can compromise the correctness of the whole kernel, but sometimes it might not affect most of the kernel. For example, when a kernel thread is stuck in a while loop, making no progress and hogging the CPU without holding any locks, we might be able to safely terminate it if some magic oracle says this kernel thread (possibly a driver) is no good. But then having such an oracle is a major problem as well.
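To make the "oracle" idea concrete, here is a minimal user-space sketch in C, assuming a POSIX system. It is not how any real kernel does it: the oracle is just a timeout, which can be wrong (a slow but healthy worker would be killed too), and the worker runs as a separate process so that killing it cannot leave shared data half-updated. The names `run_with_watchdog`, `spin_forever` and `finish_quickly` are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for a driver that is stuck in a loop. */
void spin_forever(void)
{
    volatile unsigned long counter = 0;
    for (;;)
        counter++;
}

/* Hypothetical stand-in for a driver that finishes its work promptly. */
void finish_quickly(void)
{
}

/*
 * Run `worker` in a child process and give it `timeout_ms` to finish.
 * Returns 0 if the child ended before the deadline, 1 if it was
 * killed as hung, and -1 on error. The timeout is the (imperfect) oracle.
 */
int run_with_watchdog(void (*worker)(void), int timeout_ms)
{
    const struct timespec poll_interval = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int status;
    pid_t pid;

    assert(worker != NULL && timeout_ms >= 0);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        worker();
        _exit(0);
    }

    /* Poll the child without blocking until it ends or time runs out. */
    for (int waited_ms = 0; ; waited_ms += 10) {
        pid_t done = waitpid(pid, &status, WNOHANG);
        if (done == pid)
            return 0;
        if (done < 0)
            return -1;
        if (waited_ms >= timeout_ms)
            break;
        nanosleep(&poll_interval, NULL);
    }

    /* Deadline passed: terminate the worker and reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return 1;
}
```

Calling `run_with_watchdog(spin_forever, 100)` kills the spinning worker after roughly 100 ms, while `run_with_watchdog(finish_quickly, 2000)` lets the well-behaved one end on its own. Inside a kernel, where threads share one address space, the termination step is the hard part.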
So are there any OS designs that have fault tolerance (of the kind I described) as a design goal?
Thank you
Fault tolerant OS
Re: Fault tolerant OS
Hi,
bharathm1 wrote: I know that getting an error in one part of the kernel can compromise the correctness of kernel but sometimes it might not affect most of the kernel.
Sometimes it might not affect most of the kernel, but you never know whether it did or not, so that doesn't help. You have to assume that almost all of the kernel might have been ruined regardless.
To fix that, you want to isolate the pieces so you can know that if one piece has a problem it can't ruin other pieces. In other words; a micro-kernel ends up being necessary.
Of course a micro-kernel isn't enough on its own. You'd also need code to monitor, terminate and restart drivers; and (in some cases) ways to recover lost state.
This has been done before (e.g. Minix 3).
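As a rough illustration of the "monitor, terminate and restart" part, here is a minimal user-space sketch in C, assuming a POSIX system. It is not Minix 3's actual code: each "driver" is simply a function run in its own process, and the supervisor restarts it when it dies abnormally. The names `supervise`, `driver_fn` and `flaky_driver` are hypothetical, and recovering lost state is not attempted at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical driver entry point; its return value is the exit status. */
typedef int (*driver_fn)(int attempt);

/*
 * Run `driver` in its own process and restart it if it crashes or exits
 * with a non-zero status, up to `max_restarts` times.
 * Returns the number of restarts that were needed, or -1 if the driver
 * never exited cleanly (or fork/waitpid failed).
 */
int supervise(driver_fn driver, int max_restarts)
{
    assert(driver != NULL && max_restarts >= 0);

    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(driver(attempt));   /* child: become the driver */

        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return attempt;           /* clean exit after `attempt` restarts */

        fprintf(stderr, "driver died on attempt %d\n", attempt);
    }
    return -1;                        /* gave up */
}

/* Example driver: crashes on its first two runs, then works. */
int flaky_driver(int attempt)
{
    if (attempt < 2)
        abort();
    return 0;
}
```

With this sketch, `supervise(flaky_driver, 5)` returns 2 (two restarts were needed), and `supervise(flaky_driver, 1)` returns -1 because the restart budget runs out first. A real system would also have to reconnect the restarted driver to its clients and decide what to do about requests that were in flight.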
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Fault tolerant OS
Do you think there is any hope for fault tolerant monolithic kernels?
I think ideas such as Nooks, where the address space of kernel drivers is isolated, are a good line of research, though they push some burden onto driver programmers.
What are the main principles that present monolithic kernels (e.g. Linux) violate that make them terrible at fault tolerance?
Re: Fault tolerant OS
The Department of Computer Science at the University of Illinois published a paper, "Building a Self-Healing Operating System":
http://choices.cs.illinois.edu/selfhealing.pdf
It goes through a few techniques for "healing" an error.
Re: Fault tolerant OS
Hi,
bharathm1 wrote: Do you think there is any hope for fault tolerant monolithic kernels?
That really depends on what kinds of faults you're trying to tolerate. A driver fails to initialise because it can't allocate enough memory? Easy. A single bit flip (if there's no memory encryption)? Maybe. A CPU failing while holding kernel locks? No.
bharathm1 wrote: I think ideas such as nooks where we isolate the address space of kernel drivers is a good line of research though it pushes some burden to driver programmers.
If drivers are isolated it's either a micro-kernel or a hybrid (and is no longer monolithic); regardless of whether that isolation is implemented with the hardware's virtual memory management or if it's done in software only, and regardless of whether the driver is still in an area that would've been considered "kernel space".
bharathm1 wrote: What are the main principles that the present monolithic kernels (eg. Linux) violating that made kernel terrible at fault tolerance?
The main principle that is missing is isolation (that would prevent it from being called a true monolithic kernel if it existed).
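A small C sketch of what isolation buys, assuming a POSIX system and using processes as a stand-in for hardware-enforced address spaces. The same buggy code is run twice: once in the caller's address space, and once in a forked child that only has its own copy of the data. The names `important_state`, `buggy_component`, `run_unisolated` and `run_isolated` are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for state that the rest of the system depends on. */
static int important_state = 42;

/* Hypothetical buggy component: scribbles over state it does not own. */
void buggy_component(void)
{
    important_state = 0;
}

/* Same address space: the bug damages the caller's state. */
int run_unisolated(void)
{
    important_state = 42;
    buggy_component();
    return important_state;
}

/*
 * Separate address space: the bug only damages the child's private copy.
 * Returns the caller's view of the state afterwards, or -1 on error.
 */
int run_isolated(void)
{
    int status;
    pid_t pid;

    important_state = 42;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        buggy_component();
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return important_state;
}
```

Here `run_unisolated()` returns 0 (the state was overwritten) while `run_isolated()` returns 42, because the child's write never reaches the parent's memory. Code that shares the kernel's address space is in the first situation.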
Linux is a special case - it maps all physical memory into kernel space (so any dodgy pointer anywhere in many millions of lines of code can corrupt anything that's in memory anywhere); so you can get all your hopes for fault tolerance and nail them to all your hopes for security, and glue on a few extra hopes (e.g. for decent NUMA optimisations), and then throw that huge ball of hopes in the trash.
Cheers,
Brendan