OSDev.org

Posted: **Fri Dec 14, 2007 11:59 am**

stevenup7002 wrote: The mistake doesn't have to be a hard disk failure. I didn't intend it for big errors like that; what I meant was warnings raised through a warning() or error() function. It's basically error reporting for the kernel that has a use.

Basically, to learn from its mistakes and develop a strategy of running.

OK, but you're still leaving things a bit vague. Can you give an example? Are you saying that if some driver has a bug in it, and the solution is to upgrade to the latest version of the driver, the OS remembers this solution? How would the OS recognize a bug by itself? And wouldn't it be more efficient to check for the latest driver upon seeing such an error, rather than consulting a list of mistakes (over the network) and then deciding what to do based on that?

Surely you had an example in mind when coming up with this design, one where it would be advantageous? I don't understand why you haven't provided a simple example yet.

Posted: **Fri Dec 14, 2007 12:11 pm**

Ok, let's say the kernel or some software wants to try something, but it crashes while trying. It will record what caused the crash so it knows that it shouldn't try it again, or it will give a warning before retrying, e.g. "The program crashed the last time this function was run, are you sure you want to try it again?"

I hope that explains it better. I'm not great at explaining stuff, sorry.

-Steve
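
To make the "record the crash, warn before retrying" idea concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: the names `crash_record` and `crash_count`, the fixed-size in-memory table, and the size limits are hypothetical, and a real kernel would have to keep the table somewhere that survives a reboot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 32   /* assumed table size */
#define NAME_LEN    32   /* assumed maximum name length, including NUL */

/* One remembered "mistake": an operation name and how often it crashed. */
struct crash_entry {
    char     name[NAME_LEN];
    unsigned crashes;
};

static struct crash_entry crash_log[MAX_ENTRIES];
static size_t crash_log_len;

/* Find the entry for an operation, or NULL if it has no recorded crash. */
static struct crash_entry *crash_find(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < crash_log_len; i++)
        if (strcmp(crash_log[i].name, name) == 0)
            return &crash_log[i];
    return NULL;
}

/* Record one crash for `name`.
 * Returns false if the name does not fit or the table is full. */
bool crash_record(const char *name)
{
    struct crash_entry *e = crash_find(name);
    if (e == NULL) {
        if (crash_log_len == MAX_ENTRIES || strlen(name) >= NAME_LEN)
            return false;
        e = &crash_log[crash_log_len++];
        strcpy(e->name, name);   /* safe: length was checked above */
        e->crashes = 0;
    }
    e->crashes++;
    return true;
}

/* Number of recorded crashes for `name`; 0 means no warning is needed. */
unsigned crash_count(const char *name)
{
    const struct crash_entry *e = crash_find(name);
    return e ? e->crashes : 0;
}
```

A caller would check `crash_count()` before running an operation and, if it is non-zero, show the "are you sure you want to try it again?" prompt instead of refusing outright.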

Posted: **Fri Dec 14, 2007 12:18 pm**

That design is tried, true and existing. In high-availability environments, systems are run two, three or five at a time. Two at a time is for when one fails (knowingly) so the other can take over. Three at a time is for similar systems where either the errors cannot be reliably determined or no human can intervene to remove/replace the last system. The five setup is for paranoid people or really unreachable devices that are somewhat able to break.

There is a variation on the three-setup that consists of implementing the exact same specification three times, each on a platform as different as possible from the others. None may share the base library, OS, processor, disk manufacturer etc., so that a failure in any component can be detected by two systems agreeing and the third disagreeing (and hence being ignored).
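
The "two agree, the third is ignored" rule can be sketched as a small voter in C. This is only an illustration under assumptions: the function name `vote3` and the idea that each system's result fits in an `int` are made up here, and real setups compare much richer outputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Majority vote over the outputs of three replicas.
 * On success, stores the agreed value in *out and returns true;
 * *faulty receives the index (0..2) of the dissenting replica,
 * or -1 if all three agree.
 * Returns false when all three outputs differ, i.e. no majority exists. */
bool vote3(int a, int b, int c, int *out, int *faulty)
{
    assert(out != NULL && faulty != NULL);

    if (a == b) {                 /* replicas 0 and 1 agree */
        *out = a;
        *faulty = (c == a) ? -1 : 2;
        return true;
    }
    if (a == c) {                 /* replica 1 is the odd one out */
        *out = a;
        *faulty = 1;
        return true;
    }
    if (b == c) {                 /* replica 0 is the odd one out */
        *out = b;
        *faulty = 0;
        return true;
    }
    return false;                 /* three different answers */
}
```

Note that the voter can only mask one faulty system; if two systems fail in the same way, the wrong answer wins the vote, which is why the variation above insists on platforms that share as little as possible.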

Doing this over a network is more of a 1:N setup where N is arbitrarily large; similar to what Microsoft is doing with error reports arriving about any program failing.

Posted: **Fri Dec 14, 2007 12:21 pm**

Yes, but this isn't supposed to be a failsafe system; the networking part is just an example of two machines communicating with each other at a very low level.

Posted: **Fri Dec 14, 2007 12:48 pm**

stevenup7002 wrote: Ok, let's say the kernel or some software wants to try something, but it crashes while trying. It will record what caused the crash so it knows that it shouldn't try it again, or it will give a warning before retrying, e.g. "The program crashed the last time this function was run, are you sure you want to try it again?"

I hope that explains it better. I'm not great at explaining stuff, sorry.

Haha, that clears up quite a bit. I still find myself itching for some details, but I have a better sense of what you mean now.

Obviously, since the kernel may crash while tending to a problem, it has to write down what it's going to do before it actually does it. Then if the kernel finds itself initializing (because the computer crashed and restarted), it can add the mistake to the list and make it available to other computers. Is that right?

I don't know that I see this as a fundamentally different kernel design, but it's certainly an interesting idea. It'd be interesting to see an implementation, because I'd expect there are a lot of things that could go wrong (e.g., what if the kernel is working on the correct solution to a problem, and then there's a power failure? The kernel adds that method to the mistakes list and refuses to try it from now on, never fixing the problem).
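
For what it's worth, the "write it down first, check at boot" sequence might look roughly like the C sketch below. It is not anyone's actual implementation: `struct intent_record`, the function names and `STRIKE_LIMIT` are hypothetical, and the struct merely stands in for a record that a real kernel would keep in non-volatile storage. The strike counter is one assumed way to soften the power-failure problem: a single interrupted attempt is counted, but the method is only refused after several in a row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STRIKE_LIMIT 3   /* assumed threshold, not from the thread */

/* Stand-in for a record kept in non-volatile storage (disk, NVRAM). */
struct intent_record {
    bool     pending;   /* set before the action, cleared on success */
    unsigned strikes;   /* boots that found the record still pending */
};

/* Step 1: write the intent down before attempting the action.
 * A real kernel would have to flush this to stable storage here. */
void intent_begin(struct intent_record *r)
{
    assert(r != NULL);
    r->pending = true;
}

/* Step 2: the action completed, so the method is known to work. */
void intent_commit(struct intent_record *r)
{
    assert(r != NULL);
    r->pending = false;
    r->strikes = 0;
}

/* Step 3, at boot: a still-pending record means the last attempt never
 * committed (crash or power cut). Count it and clear the flag. */
void intent_recover(struct intent_record *r)
{
    assert(r != NULL);
    if (r->pending) {
        r->strikes++;
        r->pending = false;
    }
}

/* Refuse the method only after repeated uncommitted attempts, so one
 * power failure does not blacklist a working fix. */
bool intent_allowed(const struct intent_record *r)
{
    assert(r != NULL);
    return r->strikes < STRIKE_LIMIT;
}
```

This still cannot tell a crash from a power cut; it only makes a wrong "permanent" verdict less likely, and a successful run resets the count.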

Posted: **Fri Dec 14, 2007 5:26 pm**

That's exactly what it is.

Posted: **Fri Dec 28, 2007 12:48 pm**

It seems that the idea is very good ...

Posted: **Fri Dec 28, 2007 10:02 pm**

Thanks

Posted: **Fri Mar 14, 2008 12:46 pm**

I think the problem is there's no way to reliably predict whether an error is transient or permanent without a formal proof.

If you ask your kernel to factor a product of two large prime numbers in polynomial time, or a similar question, and it can't do it, that's not to say it can't be done in the future, because we still haven't proven P=NP or P!=NP. If in the future we find out how to solve that problem, all of a sudden your kernel can do it, but if it has decided that this is a permanent error, your kernel won't respond to the user's request.

What if there's a mistake in reporting a mistake?

If you want honest feedback, which I think you do since you posted this, I would focus your kernel design on other issues, unless you want to delve deep into AI and computability theory.

Stereolithic Kernel?