Well, this week has been interesting so far...
So today the power went out (no big deal, I had stuff to do outside anyway), and when it finally came back on (after almost 5 hours...) I started up my computer and walked out of the room to get coffee, knowing Lubuntu would be asking for my password by the time I got back.
Upon sitting back down at my desk, I noticed the boot logo was still on the screen, and (as most of us would do) I pressed Esc to see where it was in the process. Turns out it was recovering orphaned files, at 9000-something of 15000-ish.
So needless to say the boot was slow this time around, and thankfully all the files seem to have been recovered correctly (as far as I can tell thus far; with over 4TB of data spread across 6 internal and 3 external drives [yes, most are small], it's hard to say...)
This is not the first power outage, though they are rare. But it is the first time I have ever seen thousands of orphaned files (sometimes it's one or two, usually none). The stranger thing is, the machine was idle: I was reading wiki pages, not accessing the drives myself. Usually I am accessing a drive when the power goes out, and end up with few to no orphaned files.
And as with most computer issues, this got me thinking. I do not want a reactive approach in my OS (when it actually gets there); I want a proactive approach (if that's even possible).
So, short of reading and writing directly to the drive (as in, no cache), how would one prevent this type of thing from happening (at least at the scale it did this time)? And if prevention isn't possible, what would be the steps to recover from this sort of issue?
Currently (on version 0.0.3) I have only cached the FAT and root directory (yeah... I only have FAT support thus far...), and after every change (or chain of changes) this is flushed back to the HDD. The reading and writing of actual files goes straight to the disk. Obviously I have not yet tested pulling the plug on it while transferring data (plus I hate having to reset the BIOS on these old "dumpster" PCs).
Thanks for your time.
Orphaned Files.
- BASICFreak
- Member
- Posts: 284
- Joined: Fri Jan 16, 2009 8:34 pm
- Location: Louisiana, USA
BOS Source Thanks to GitHub
BOS Expanded Commentary
Both under active development!
Sortie wrote:
- Don't play the role of an operating systems developer, be one.
- Be truly afraid of undefined [behavior].
- Your operating system should be itself, not fight what it is.
Re: Orphaned Files.
No cache plus journaled writes?
Super slow but it should be bulletproof and recovery time should just be a few milliseconds.
That won't help with bad sectors, though. That will require additional infrastructure.
How about simply writing every sector twice? Is it worth losing half of your hard drive for guaranteed data integrity? Sort of a RAID 0.5?
How about double sector writes combined with compression?
Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott
- BASICFreak
- Member
- Posts: 284
- Joined: Fri Jan 16, 2009 8:34 pm
- Location: Louisiana, USA
Re: Orphaned Files.
SpyderTL wrote:
- No cache plus journaled writes?
- Super slow but it should be bulletproof and recovery time should just be a few milliseconds.

Upon supporting an EXTx FS (or finally rolling my own), I'll start doing journaled writes; until then, I'll just hope not to have power issues.
I'm still debating how much caching I want in the OS. Currently the FAT and root directory are cached to make finding files quick (no need to keep re-reading the FAT over and over, and every single access needs the root directory), and soon I may also cache recently used directories.
Though in all honesty, I really do not like the idea of write caches, and the reason is simple: so many times I have copied large amounts of data to a removable drive, and Windows says the copy is complete... but it is anything but. Remove the drive (obviously not safely, as only 10% of the time does Windows EVER let me "safely" eject a device, even if it has NEVER been accessed), place it in another computer, open the file: NO DATA! Then I have to do the whole copy again (and just let the drive sit for 5-10 minutes before trying to remove it).
Just another reason I don't use Windows regularly...
(The main reason I use Windows at all is for games that do not run [well] on Linux, and even then those games BSoD far too often... driver failures, I assume... "IRQL_NOT_LESS_OR_EQUAL" mostly.)
SpyderTL wrote:
- That won't help with bad sectors, though. That will require additional infrastructure.
- How about simply writing every sector twice? Is it worth losing half of your hard drive for guaranteed data integrity? Sort of a RAID 0.5?
- How about double sector writes combined with compression?

Well, the good thing is I'm not too worried about bad sectors. In my 7 (professional) years in IT, I have seen far more drives where the PCB goes out than drives with sector issues. (Plus, if I were that worried, I would have replaced two of my drives years ago when SMART started telling me they were failing - one may be, but the other seems to be a false positive.)
I hope it is needless to say, but I do not store important info (only Windows and games) on the two drives mentioned.
EDIT: HA, just checked my e-mail, and because of this topic's title GMail marked the Topic Reply Notification as spam. (Silly Google.)
Re: Orphaned Files.
BASICFreak wrote:
- Turns out it was recovering orphaned files, at 9000 something of 15000ish.
- So needless to say the boot was slow this time around, and thankfully all the files were recovered correctly

Aren't orphaned inodes things like files that were unlinked, but still open when the crash occurred? In that case the "recovery" is just freeing the space they occupy. Though 15000 temporary files does sound like a lot...
BASICFreak wrote:
- So, shy of directly reading and writing to the drive (as in no cache), how would one prevent this type of thing from happening (at least to the scale it did this time)? And if not possible what would be the steps to recover from this sort of issue?

I think first of all you need to understand the exact scenario we're talking about. If it's indeed unlinked, but open, files, you don't prevent it from happening. You just need to make sure that after a crash you still know which inodes are orphaned, so you can clean them up. You can do that with an fsck-like operation, but that's obviously slow. As an optimisation, you might instead want to write the information to the journal.
BASICFreak wrote:
- Currently (on version 0.0.3) I have only cached FAT and Root Directory (yea... I only have FAT support thus far...) and after every change (or a chain of changes) this is flushed back to the HDD.

The "only" thing you really want for FAT is to order your FAT updates correctly when they touch more than one sector.
Specifically, you would want to avoid marking clusters as allocated, but not actually hooking them up anywhere, or you would leak those clusters. You definitely also want to avoid hooking up a cluster which isn't allocated yet, otherwise it could be allocated again and then you end up with cross-linked files (that is, filesystem corruption).
Unfortunately, FAT isn't made to avoid both problems at the same time when two consecutive clusters are described by different sectors in the FAT (in the same sector, you can update them atomically, so ordering is not a problem there). You get to choose which of the two problems to keep. Obviously, you should keep the leaked clusters rather than the corruption. This means that you need to flush between marking a cluster as allocated and actually hooking it up to the cluster chain of a file.
- BASICFreak
- Member
- Posts: 284
- Joined: Fri Jan 16, 2009 8:34 pm
- Location: Louisiana, USA
Re: Orphaned Files.
Well guys thanks for the input here.
I looked at my boot log, and I was off by 10x (it was only 1500, not 15000), mostly in the /tmp and /var partitions.
I have only found one file I needed that was "not right" (the previous write was restored, but not the most recent). Thankfully Sublime Text saved its changes in a separate file, so I didn't lose all my ELF code.
Hopefully my next post here will be under "What does your OS look like"; if not, I may start a new thread about where I'm stuck now. Until then, I'm trying my hardest to figure out my new API/ABI/IPC designs.
It's Wednesday night and I've only added ~10 lines of code since Sunday... (As the first post said, it has been an interesting week - I have been interrupted every single time I sat down to code.)
Re: Orphaned Files.
BTW, it's "lose"; "loose" is to free, to remove ropes, etc.
"If you don't fail at least 90 percent of the time, you're not aiming high enough."
- Alan Kay