OS in a non-volatile RAM system?

Artlav · Post by **Artlav** » Tue Feb 04, 2014 5:20 pm

I envision this as a discussion thread, to get opinions on how one can design an OS for the system described.

Let's say we have a computer with most of its RAM being non-volatile.
However, the CPU registers are volatile, a page of fast RAM (i.e. for core OS code or stack) is volatile, some of the hardware state is volatile.

The goal is to make this computer power loss proof.
You turn it off cold, turn it on, and it keeps on going from the same place.
An old computer designer's dream.

How do you handle this at the OS/software level?
Case A: There is a power loss interrupt, triggered a few hundred cycles before the system dies.
Case B: There is no indication that a power failure is imminent.

In both cases we have a problem of resuming operation after a power loss.
Most of the data will still be in place, but we will be missing critical information, like registers, IP, SP, whatever was put into fast RAM, and the hardware might need to be re-initialized.

The death point could have been during regular operation of some task or the other, during an I/O operation, or even during a kernel operation.

In both cases it makes sense to store task manager state in the main RAM.

In case A, once the interrupt fires we can save the current task state in main RAM, set a flag, and go into a loop awaiting death or all clear.
On resume, we see a flag, and resume operations from saved state.
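A minimal sketch of the case A flow, written as a Python simulation rather than real firmware. The `Machine` class, the `resume_valid` flag and the register names are hypothetical; the only point it illustrates is the ordering: save the context first, set the flag last, and treat a missing flag on boot as a cold start.

```python
from dataclasses import dataclass, field


def _blank_regs():
    return {"ip": 0, "sp": 0, "acc": 0}


@dataclass
class Machine:
    # Simulated non-volatile RAM: survives power_loss().
    nvram: dict = field(default_factory=dict)
    # Simulated CPU registers: wiped by power_loss().
    regs: dict = field(default_factory=_blank_regs)

    def power_fail_irq(self):
        """Hypothetical power-fail handler: save the context, then set the flag."""
        self.nvram["saved_regs"] = dict(self.regs)
        # The flag is written only after the save is complete, so a half-written
        # context is never marked as valid.
        self.nvram["resume_valid"] = True

    def power_loss(self):
        """Model losing all volatile state."""
        self.regs = _blank_regs()

    def boot(self):
        """Return 'resumed' if a valid saved context exists, else 'cold'."""
        if self.nvram.get("resume_valid"):
            self.regs = dict(self.nvram["saved_regs"])
            # Consume the flag so a later, unannounced power loss is not
            # mistaken for this one.
            self.nvram["resume_valid"] = False
            return "resumed"
        return "cold"
```

Calling `power_fail_irq()`, then `power_loss()`, then `boot()` restores the registers; a second `power_loss()` with no interrupt in between makes `boot()` report a cold start.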

Problem 1 - if it was in the middle of an I/O operation, it would have to be restarted. Whose job is it?
Should a driver get a power loss exception of some sort?
Problem 2 - power loss interrupt should have the highest priority, so it can interrupt any int handler or critical section, even in the kernel, or risk being missed. Can that lead to corrupt state?

In case B, any restart is handled as a power loss, unless a shutdown flag is set.
We should know with a certain degree of certainty that we were in the middle of such and such task (task manager info is stored in main RAM, OS entry and exits can be flagged in there too?).
So, we can resume this task from a default point.

Problem 1 - if power fails during the task manager updating the current task state, we will end up in an invalid state. Is there an algorithm to ensure such an update going on without data loss, while under threat of a sudden stop?
Problem 2 - we don't know where exactly in the task/driver/OS the power loss happened, so how do we get back to work? A general power loss exception, and let the part handle it on its own?
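For Problem 1 of case B, one common approach is a two-slot ("ping-pong") record: the new state is written into the slot that is not currently live, and only then is a sequence number bumped to make it live. The sketch below is a Python simulation, not firmware. The `PingPongState` class and the `power_fails_after` parameter are hypothetical, and it assumes that a single word write to non-volatile RAM is atomic and that writes arrive in program order (no write-back cache in between). Sequence number wrap-around is not handled.

```python
class PingPongState:
    """Two slots in simulated non-volatile RAM; the slot with the higher seq is live."""

    def __init__(self, words):
        self.nvram = [
            {"data": [0] * words, "seq": 0},
            {"data": [0] * words, "seq": 0},
        ]

    def _newest(self):
        # Index of the live slot (slot 0 wins the initial tie).
        return max(range(2), key=lambda i: self.nvram[i]["seq"])

    def read(self):
        return list(self.nvram[self._newest()]["data"])

    def write(self, data, power_fails_after=None):
        """Write a new state. If power_fails_after is N, the simulated power
        dies after N payload words and the commit never happens."""
        newest = self._newest()
        slot = self.nvram[1 - newest]          # never touch the live slot
        for i, word in enumerate(data):
            if power_fails_after is not None and i == power_fails_after:
                return False                   # torn write; seq not bumped
            slot["data"][i] = word
        if power_fails_after is not None:
            return False                       # died after payload, before commit
        # Commit point: one (assumed atomic) word write makes the new slot live.
        slot["seq"] = self.nvram[newest]["seq"] + 1
        return True
```

A write interrupted at any point leaves the previously committed state readable, because the live slot is never modified and the sequence number is the last thing written.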

Can case B be sufficient for getting the job done, or is an interrupt from case A necessary?
Can case A be sufficient for application-level transparency?
Are there any more large problems in either case?

Basically, given such a system and goal, how would you design an OS?

Love4Boobies · Post by **Love4Boobies** » Tue Feb 04, 2014 9:32 pm

I don't get it. If the hardware support for such an interrupt is available to you on a system with a battery, why don't you save the whole state, including the registers? I can think of three things you may need to take special action with: real time tasks, network connections, and privacy. I won't go into details because it obviously varies from case to case.

Artlav · Post by **Artlav** » Wed Feb 05, 2014 7:51 am

Love4Boobies wrote:I don't get it. If the hardware support for such an interrupt is available to you on a system with a battery, why don't you save the whole state, including the registers?

Because there might not be hardware support for such an interrupt.

Love4Boobies · Post by **Love4Boobies** » Wed Feb 05, 2014 7:53 am

But the interrupt you speak of already is hardware support for this. You just invented a poorly designed fault-tolerant system where you wish to solve a hardware problem in software. Normal systems don't have such an interrupt and if fault tolerance is desired, other techniques need to be employed (rather than taking special actions on a power outage, you want to take precautions that your data is always saved, as rarely as possible in a volatile state, and that you can recover it).

Artlav · Post by **Artlav** » Wed Feb 05, 2014 8:37 am

Hm. So, not to include such an interrupt would be poor hardware design.
Makes sense, even if it makes things harder for the hardware designer.

Then, you mentioned possible problems with privacy on resume - what were these about?

Owen · Post by **Owen** » Wed Feb 05, 2014 9:53 am

Artlav wrote:Then, you mentioned possible problems with privacy on resume - what were these about?

For one - not saving cleartext cryptographic keys to non-volatile storage

(This probably requires some form of TPM in order to protect said keys)

bluemoon · Post by **bluemoon** » Wed Feb 05, 2014 9:56 am

Instead of talking about an "OS in a non-volatile RAM system", what's the goal?

If you want to solve the power failure issue, there are already many working/economic solutions.
Basically you design a system with a backup battery/UPS; once you detect an outage of the normal power supply, you do a data flush / hibernate.
This idea works from RAM-based hybrid hard-disk to full computer system.

An OS in a non-volatile RAM system, however, is a totally different thing.
Also note that most hardware components require a reset upon power failure; this makes non-volatile RAM less useful, or otherwise very complex to work with.
And since non-volatile RAM usually comes with terrible performance or extremely high cost, most people would just go the battery / hibernate way.

If you really want to test your idea, try configuring your computer with 16MB of RAM and 8G of swap space; it's slow even on an SSD.

Owen · Post by **Owen** » Wed Feb 05, 2014 10:11 am

bluemoon wrote:Instead of talking about an "OS in a non-volatile RAM system", what's the goal?

If you want to solve the power failure issue, there are already many working/economic solutions.
Basically you design a system with a backup battery/UPS; once you detect an outage of the normal power supply, you do a data flush / hibernate.
This idea works from RAM-based hybrid hard-disk to full computer system.

An OS in a non-volatile RAM system, however, is a totally different thing.
Also note that most hardware components require a reset upon power failure; this makes non-volatile RAM less useful, or otherwise very complex to work with.
And since non-volatile RAM usually comes with terrible performance or extremely high cost, most people would just go the battery / hibernate way.

If you really want to test your idea, try configuring your computer with 16MB of RAM and 8G of swap space; it's slow even on an SSD.

Please go and investigate up-and-coming technologies like FRAM and MRAM. If your concern is speed, note that DRAM speed has, in real terms, increased about 10% in the last decade (bandwidth has been increased by ever wider banking and ever higher latencies) and is running out of steam on the shrinkage front (part of this, granted, is because there is no money in DRAM). Flash has physical issues as it gets ever smaller (today flash needs intensive and scary levels of error correction, with every shrink getting more precarious).

bluemoon · Post by **bluemoon** » Wed Feb 05, 2014 10:35 am

Owen wrote:Please go and investigate up-and-coming technologies like FRAM and MRAM.

Interesting. Technology advances every day.

Artlav · Post by **Artlav** » Wed Feb 05, 2014 1:19 pm

bluemoon wrote:what's the goal?

Doing something unusual with a hobby project.

bluemoon wrote:An OS in a non-volatile RAM system, however, is a totally different thing.
Also note that most hardware components require a reset upon power failure; this makes non-volatile RAM less useful, or otherwise very complex to work with.
And since non-volatile RAM usually comes with terrible performance or extremely high cost, most people would just go the battery / hibernate way.

The system in question is sitting on my table.
Not much to brag about - a 12.5 MHz 32-bit CPU with 4Kb of volatile RAM (2 clocks per access), 64Kb of non-volatile FRAM (4 clocks per access, and that only due to my inability to solder bigger chips), 32Kb of slow FRAM (essentially a memory-mapped EEPROM), a MicroSD slot, UART, some GPIO.
It's a hobby project in hardware design, which has no software but some tests at the moment.
So, I wanted to try making something interesting out of it.

The big idea is a computer that can be powered off (no battery tricks except for RTC) and resumed without it losing any data.
Like an old school mechanical adding machine - you stop turning the handle, it stops, you resume turning the handle - it resumes working as if nothing happened.
I tried to achieve something like that when I made an open-source solar-powered ebook reader.
From the user point of view it worked, but on the low level it was just a regular microcontroller saving relevant state at every possible point to a small piece of FRAM memory, giving the illusion.

So, I wanted to try making the real thing.
And that raised the question of how the software on such a system would differ from software on regular computers.
Thus, this thread, posted before I begin doing anything.
Just in case there are some interesting ideas I haven't thought of.

Combuster · Post by **Combuster** » Wed Feb 05, 2014 3:56 pm

Well, what I would imagine is that you keep the stack in non-volatile RAM, and only use the fast volatile RAM (and registers) for temporaries - and declare it all to be caller-saved so it has to be stored on a function call if meaningful. If you make the arguments to a function immutable, then it would be possible to restart execution on a per-function boundary if you end up losing the register state. It would still take some further specification to make it work but the ABI would basically say that the faster you want to make things, the more you have to redo on a power loss event.

At any rate, you will have to do some trickery with the compiler to make it emit code that functions like this.
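A rough illustration of the restart-at-function-boundary idea, as a Python simulation rather than compiler output. Everything named here (`PowerLoss`, `make_flaky_power`, `call_restartable`, `sum_of_squares`) is hypothetical. The frame, meaning the function plus its immutable arguments, is kept in a dict standing in for non-volatile RAM, while the accumulator plays the role of a volatile temporary that is simply recomputed after a power cut.

```python
class PowerLoss(Exception):
    """Models a power cut: everything volatile is gone."""


def make_flaky_power(cut_at_ticks):
    """Hypothetical test helper: raise PowerLoss when the global tick count
    reaches each of the listed values (once each)."""
    pending = set(cut_at_ticks)
    count = 0

    def tick():
        nonlocal count
        count += 1
        if count in pending:
            pending.discard(count)
            raise PowerLoss()

    return tick


def sum_of_squares(values, tick):
    acc = 0                    # volatile temporary (a register, say)
    for v in values:
        tick()                 # a point where power may vanish
        acc += v * v
    return acc


def call_restartable(nvram, func, args, tick):
    # The frame lives in non-volatile RAM and its arguments are never mutated,
    # so re-running the function from the top is always safe.
    nvram["frame"] = (func, tuple(args))
    restarts = 0
    while "result" not in nvram:
        f, a = nvram["frame"]
        try:
            nvram["result"] = f(a, tick)   # publishing the result is the commit
        except PowerLoss:
            restarts += 1                  # temporaries lost; re-enter the function
    return nvram.pop("result"), restarts
```

With power cuts injected at ticks 2 and 5, `call_restartable({}, sum_of_squares, (1, 2, 3), tick)` re-enters the function twice and still returns 14. The cost is the one the post describes: the more state is kept only in volatile temporaries, the more work is redone after each loss.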

Brendan · Post by **Brendan** » Wed Feb 05, 2014 8:17 pm

Hi,

Combuster wrote:Well, what I would imagine is that you keep the stack in non-volatile RAM, and only use the fast volatile RAM (and registers) for temporaries - and declare it all to be caller-saved so it has to be stored on a function call if meaningful. If you make the arguments to a function immutable, then it would be possible to restart execution on a per-function boundary if you end up losing the register state. It would still take some further specification to make it work but the ABI would basically say that the faster you want to make things, the more you have to redo on a power loss event.

At any rate, you will have to do some trickery with the compiler to make it emit code that functions like this.

Register state is the least of your problems. I'd worry more about write-back caching (needed for speed) - if there is no warning that power is about to fail, then all modified data in cache gets lost and your non-volatile RAM ends up being corrupted.

Note: There is no warning signal/interrupt for power loss in most desktop computers (which don't have a UPS). For laptops there is a low battery warning (which won't work if the battery is removed while the OS is running), and for UPSs there may be some communication (which won't work if the computer is simply unplugged).

In general, I'd be more tempted to expect applications to do their own "checkpointing". E.g. every X seconds the application saves the minimum state needed to allow the application to be restored. This solves problems involving network connections and not knowing when a power failure is about to occur in advance; and also avoids the need to care about register and cache state. The interesting thing is that this technique is quite possible now (saving checkpoint data to disk instead), but no OS has bothered. This makes me wonder if most OSs would continue doing the same things in the same ways if/when non-volatile RAM becomes common; with no real difference (and no benefit from non-volatile RAM) other than for "hibernate".
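A minimal sketch of application-level checkpointing to disk, assuming a JSON file and a prime-counting loop as the stand-in workload. The function names and the `crash_at` parameter are hypothetical, and the checkpoint is taken every `every` iterations rather than every X seconds so that the behaviour is deterministic. The checkpoint is written to a temporary file and renamed into place, so a crash should not leave a half-written checkpoint behind.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    # Write to a temp file in the same directory, then rename over the old one.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path, default):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def count_primes(limit, path, every=100, crash_at=None):
    """Count primes below limit, resuming from the checkpoint at path if any."""
    # Minimum state needed to resume: where we are and what we have found.
    state = load_checkpoint(path, {"next": 2, "found": 0})
    n = state["next"]
    while n < limit:
        if crash_at is not None and n == crash_at:
            raise RuntimeError("simulated power loss")
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            state["found"] += 1
        n += 1
        state["next"] = n
        if n % every == 0:
            save_checkpoint(path, state)
    return state["found"]
```

If a run is killed partway through, calling `count_primes` again with the same path picks up from the last saved `next` value instead of starting over, and only the work since that checkpoint is repeated.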

Cheers,

Brendan

Nable · Post by **Nable** » Thu Feb 06, 2014 2:50 am

Brendan wrote:The interesting thing is that this technique is quite possible now (saving checkpoint data to disk instead), but no OS has bothered.

This link may be relevant - http://criu.org/Main_Page (a project to implement checkpoint/restore functionality for Linux).
And here's a link to a page about Checkpoint/Restart for Open MPI applications - http://www.crest.iu.edu/research/ft/ompi-cr/ .
There were other attempts to bring such support. And some applications have such support; for example, when MS Office or Open Office crashes, it restores the document after restart and it seems to me (I have had such crashes many times) that data loss is rather minimal.

Upd: hm, I've found an aggregated article: http://en.wikipedia.org/wiki/Application_checkpointing
