Love4Boobies wrote:For hibernation, you don't need to serialize anything; you just need to gracefully handle things that can't be saved, save the rest (deciding on the two categories can be tricky, too; e.g., consider tasks where timing is relevant---but several solutions come to mind), and then restore them when you're up and running again.
I think we're just using terminology differently - "save" = "serialize and write to disk". Of course, that serialization may be nearly transparent, making it essentially a binary dump, but it need not be. I used the term "serialize" as you might not be "saving" it, i.e., you might not do the "writing to disk" stage.
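For instance (a minimal sketch; the struct and its contents are invented for illustration), the "nearly transparent" case collapses serialization and saving into a single raw dump of a pointer-free structure:

Code:
#include <stdio.h>

/* Hypothetical task state holding no pointers, so a raw dump suffices. */
struct task_state {
    int pc;
    int regs[8];
};

/* "Serialize" and "write to disk" happen in one step here; a richer
   format would translate the struct first and only then write it out. */
void save(FILE *f, const struct task_state *t)
{
    fwrite(t, sizeof *t, 1, f);
}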
Love4Boobies wrote:However, the point about live updates is that you need to run this state through transfer functions that do some translation, otherwise you might end up with new code accessing different and/or buggy data that old code acquired. These transfer functions must have intimate knowledge about the differences between the behaviors of the two versions and can get extremely complex.
Hence I said it would probably only work for minor updates.
madanra wrote:I think we're just using terminology differently
Fair enough.
madanra wrote:Hence I said it would probably only work for minor updates
The problem, however, is that someone has to assess whether the update is minor or not, and that's an error-prone process, especially in a multi-threaded environment. A simple bug fix where a variable is incremented somewhere may result in a buffer overflow, whereas a huge addition of some sort might have no negative consequences. I think that this approach neither scales nor is general enough.
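To make that concrete (a contrived sketch; the buffer and index are invented), a one-line "minor" fix can invalidate state the old code has already produced:

Code:
/* Old version: the mask hides the fact that next can grow past 15. */
static char buf[16];
static int  next;

void store_old(char c) { buf[next & 15] = c; next++; }

/* New version ("minor" one-line fix): the mask is dropped because next
   is now reset properly elsewhere. On a fresh start this is correct,
   but state carried over from a live update may still hold next > 15,
   and then this line writes out of bounds. */
void store_new(char c) { buf[next] = c; next++; }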
"Computers in the future may weigh no more than 1.5 tons.", Popular Mechanics (1949)
[ Project UDI ]
Love4Boobies wrote:The problem, however, is that someone has to assess whether the update is minor or not, and that's an error-prone process, especially in a multi-threaded environment. A simple bug fix where a variable is incremented somewhere may result in a buffer overflow, whereas a huge addition of some sort might have no negative consequences. I think that this approach neither scales nor is general enough.
I think this is where serialisation becomes more relevant - as long as the serialisation format is the same between the two kernels, the new kernel should need no checks beyond those it would perform anyway when deserialising. As for multithreading, everything else should be paused during the kernel update, so I don't think this is an issue.
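A sketch of what that check might look like, assuming a versioned header on the serialised state (the names and constants are invented):

Code:
#include <stdint.h>

#define STATE_MAGIC   0x4B53544Eu  /* arbitrary format identifier */
#define STATE_VERSION 3u           /* bumped whenever the layout changes */

struct state_header {
    uint32_t magic;
    uint32_t version;
};

/* The new kernel refuses anything it doesn't understand; beyond that,
   it runs only the validation it would run on any deserialised state. */
int check_header(const struct state_header *h)
{
    if (h->magic != STATE_MAGIC || h->version != STATE_VERSION)
        return -1;
    return 0;
}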
It's a huge issue because the old code and the new code would most likely not produce the same data given the same circumstances. You somehow need to turn the old data into something the new code would produce given the same history.
I presented a dummy example a few posts ago... Consider that code and the following using the routines from there:
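Something along these lines (a sketch; the bodies are invented here, but the names match the discussion):

Code:
#include <stdio.h>

void task(int n) { printf("run #%d\n", n); }  /* the shared side effect */

static int count;

/* Version 1: init counts the first run up front. */
void init_v1(void) { count = 1; }
void run_v1(void)  { task(count); count++; }

/* Version 2: run does the counting itself. */
void init_v2(void) { count = 0; }
void run_v2(void)  { count++; task(count); }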
Notice how task has the same side effect in both versions. However, if you update between init and run, then count will be larger than it should be by 1. A transfer function would know which state the program is in and decrement count in this particular situation. It wouldn't, however, if updating occurred before init had run, for instance.
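Continuing the sketch above, the transfer function might look like this (the state tracking is invented for illustration):

Code:
enum prog_state { BEFORE_INIT, AFTER_INIT };  /* where the update landed */

/* Rewrites v1 state into what v2 would have produced given the same
   history: once init has run, v1's count is one ahead of v2's. */
void transfer_v1_to_v2(enum prog_state s)
{
    if (s == AFTER_INIT)
        count--;
}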
"Computers in the future may weigh no more than 1.5 tons.", Popular Mechanics (1949)
[ Project UDI ]
Ah, I get what you're saying now. I think I had not realised this before because my OS design (what little of it there is) is a microkernel, and thus would not have any long-running tasks in kernel mode. In this case, the situation you're describing shouldn't happen, or would be easily avoidable. I can see that you could not make the same assumption for a monolithic kernel, as there may be kernel processes which must be resumed rather than just restarted.
The modularity of a microkernel makes it a lot easier, since you have a fixed collection of resources to care about rather than the entire world of drivers and infrastructure; thus you need only a limited amount of migration code to deal with modifications to the .data section layout, rather than forcing driver resets or having state migration code working for everything at once.
Basically, you are likely to get sockets and files remaining open for free for as long as those drivers aren't affected by an upgrade themselves.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
Well, hopefully, even in a monolithic kernel things would be loosely coupled. However, he claimed that the problem doesn't occur or can be avoided in microkernels, not that the damage is constrained. This I can agree to, of course.
That said, all it takes is a few precautions (i.e., design your system with live updates in mind as discussed earlier) and none of this will be an issue.
"Computers in the future may weigh no more than 1.5 tons.", Popular Mechanics (1949)
[ Project UDI ]
Love4Boobies wrote:Well, hopefully, even in a monolithic kernel things would be loosely coupled. However, he claimed that the problem doesn't occur or can be avoided in microkernels, not that the damage is constrained. This I can agree to, of course.
That said, all it takes is a few precautions (i.e., design your system with live updates in mind as discussed earlier) and none of this will be an issue.
As I understand it, to avoid the problem you mentioned, you need to update the kernel when it isn't "in the middle of something". All I was trying to convey is that this condition is easier to satisfy with a microkernel that does almost nothing and has no long-running tasks than with a monolithic kernel that might; in other words, the precautions you need to take are easier in a microkernel! But it's true that it doesn't come for free in a microkernel, and that it can still be done in a (modular, loosely coupled) monolithic kernel. I think we are basically on the same page; it's just taken me a while to realise it.
Apologies to the OP - this thread has taken a turn toward live kernel updates rather than OS updates in general, which has largely been my fault!
As for hibernation images not being binary compatible across updates, you could flag that this update requires all processes to close.
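For example (a hypothetical update manifest; the field names are invented):

Code:
#include <stdbool.h>
#include <stdint.h>

/* Shipped with the update and checked before applying it. */
struct update_manifest {
    uint32_t version;
    bool     breaks_hibernation_image;  /* if set, refuse to resume an
                                           old image: require all
                                           processes to close first */
};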
Another way of avoiding this incompatibility is moving the responsibility of being persistent from the kernel to the applications themselves. For example, send a "hibernate" message to the application and wait for it to save its own state to disk, then start the process with a "resume" message and the state file. This would only work for responsive, event-based applications; a process stuck in a 60-minute processing loop wouldn't respond to this.
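A sketch of the application side, assuming a message loop (the message types and the state-path field are invented):

Code:
#include <stdio.h>

/* MSG_HIBERNATE carries the path the app should save its state to;
   MSG_RESUME carries the path to restore it from. */
enum msg_type { MSG_HIBERNATE, MSG_RESUME, MSG_EVENT };

struct msg {
    enum msg_type type;
    const char   *state_path;
};

static int counter;  /* whatever state this application cares about */

void handle(struct msg m)
{
    FILE *f;
    switch (m.type) {
    case MSG_HIBERNATE:
        /* The app serialises its own state in its own format, so a
           kernel update can't invalidate it. */
        if ((f = fopen(m.state_path, "w"))) {
            fprintf(f, "%d\n", counter);
            fclose(f);
        }
        break;
    case MSG_RESUME:
        if ((f = fopen(m.state_path, "r"))) {
            if (fscanf(f, "%d", &counter) != 1)
                counter = 0;  /* fall back to a fresh start */
            fclose(f);
        }
        break;
    default:
        break;  /* normal event handling */
    }
}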
MessiahAndrw wrote:Another way of avoiding this incompatibility is moving the responsibility of being persistent from the kernel to the applications themselves. For example, send a "hibernate" message to the application and wait for it to save its own state to disk, then start the process with a "resume" message and the state file. This would only work for responsive, event-based applications; a process stuck in a 60-minute processing loop wouldn't respond to this.
When I mentioned that there exist solutions to hibernation interfering with certain applications (e.g., those for which timing is important), I basically had two things in mind:
1. a hibernation signal followed by a customizable timeout, or the user's choice to force hibernation; or
2. having the OS remember which applications used potentially problematic resources (e.g., timers, network connections) and then announcing to the user that they might not function correctly after waking up.
Solution #1 is the preferred one, of course, but #2 works better with legacy programs.
"Computers in the future may weigh no more than 1.5 tons.", Popular Mechanics (1949)
[ Project UDI ]
Depends on your definition of "best." According to mine, I would send a hibernation signal to notify processes that they should prepare for hibernation, since they may need to take special actions (e.g., should connections remain open? will there be timing issues?). If they don't signal back, either kill them or force hibernation and notify the user which processes did not cooperate and that there might be side effects. It's possible that some things (esp. OS services and/or drivers) can safely be killed and restarted later, so these won't need to be saved. Next, save the states that do need saving to NV storage, provided such a thing exists and there is enough space (either make sure there is in advance or kill things according to some policy, possibly asking for user intervention). When the system wakes up, restore the states from where they had been saved and resume all activities.
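Put as a sequence (a simulation sketch; the process names and the ack table are made up for illustration):

Code:
#include <stdbool.h>
#include <stdio.h>

#define NPROC 3
static const char *pname[NPROC] = { "editor", "netd", "batchjob" };
static bool        acked[NPROC] = { true, true, false };

/* Returns true if every process acknowledged within the timeout. */
static bool wait_for_acks(void)
{
    bool all = true;
    for (int i = 0; i < NPROC; i++)
        if (!acked[i]) {
            printf("no ack from %s\n", pname[i]);
            all = false;
        }
    return all;
}

int main(void)
{
    puts("broadcast: prepare for hibernation");
    if (!wait_for_acks())
        puts("kill stragglers or force hibernation; notify the user");
    puts("skip saving services/drivers that can safely be restarted");
    puts("check NV storage space; apply kill policy or ask the user");
    puts("save the remaining states to NV storage; power down");
    puts("on wake-up: restore the saved states and resume activities");
    return 0;
}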
"Computers in the future may weigh no more than 1.5 tons.", Popular Mechanics (1949)
[ Project UDI ]
Love4Boobies wrote:possibly asking for user intervention
The problem is that the user may not be able to see requests for intervention. What if the machine is a laptop and hibernation was triggered by closing the lid?
I think I worded my response carefully. If the lid is closed and one or more processes don't signal back in time, there are plenty of things you could do. First of all, you may use sound to notify the user. If he doesn't take action in time, you could:
- forcefully hibernate and notify later that things may not be reliable;
- start killing;
- act according to some configurable policy (e.g., the user might decide in advance on a per-program basis; a sketch of such a table follows);
- etc.
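A sketch of such a per-program policy table (the entries and names are invented):

Code:
enum hib_policy { FORCE_HIBERNATE, KILL, NOTIFY_AND_WAIT };

struct policy_entry {
    const char     *program;
    enum hib_policy on_timeout;  /* applied if no ack arrives in time */
};

static const struct policy_entry policies[] = {
    { "browser", FORCE_HIBERNATE },  /* stale connections are tolerable */
    { "backupd", KILL            },  /* safe to restart from scratch    */
    { "vm",      NOTIFY_AND_WAIT },  /* too risky to decide silently    */
};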
"Computers in the future may weigh no more than 1.5 tons.", Popular Mechanics (1949)
[ Project UDI ]