Interesting filesystem features
I'm designing the filesystem for my OS, and I'm collecting ideas for features. I already have a pretty good idea about a robust on-disk structure, so the kind of features I'm looking for are not journaling or B-trees.
I'm more interested in things like undelete, file versioning or file expiry dates. Some kind of hierarchical storage is also interesting: files that look like they are on the file system, but are actually stored as pointers to some other location and are cached locally (or the other way around). One feature I have already decided to have is to overwrite files when they are deleted (called ERASE_ON_DELETE in VMS).
Anyone wanna share and argue for some features?
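For illustration, the erase-on-delete behaviour mentioned above can be approximated in user space: overwrite the file's bytes, flush them to the device, then unlink. This is only a sketch; the name `erase_on_delete` is made up here, a real filesystem would do this at the block level, and on copy-on-write or flash media an in-place overwrite does not guarantee the old data is gone.

```python
import os

def erase_on_delete(path, chunk_size=64 * 1024):
    """Overwrite a file with zeros, then remove it (hypothetical helper)."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(b"\x00" * n)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the zeros to the device before unlinking
    os.remove(path)
    return size  # number of bytes that were overwritten
```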
Re: Interesting filesystem features
VMS ODS FS versioning is a feature that I really admire.
E.g. when you edit a text file and save it, a new version is created with an incremented version number appended to the name. You can re-edit the previous version using EDIT FILE;-1, or directly specify the version number, e.g. EDIT FILE;2
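A rough sketch of how such version specifiers could be resolved, modelled only on the behaviour described in this post (it is not a reimplementation of VMS; `resolve_version` and `next_version` are made-up names):

```python
def resolve_version(existing, spec):
    """Resolve a 'NAME;N' spec against a list of existing version numbers.

    N > 0 picks that exact version, 0 or a missing number picks the newest,
    and N < 0 counts back from the newest (;-1 is the previous version).
    """
    name, _, ver = spec.partition(";")
    n = int(ver) if ver else 0
    versions = sorted(existing)
    if n > 0:
        if n not in versions:
            raise FileNotFoundError(spec)
        return name, n
    index = len(versions) - 1 + n  # n is zero or negative here
    if index < 0:
        raise FileNotFoundError(spec)
    return name, versions[index]

def next_version(existing):
    """Version number a save would create: one more than the highest."""
    return max(existing, default=0) + 1
```

With versions 1, 2 and 3 on disk, `FILE;-1` and `FILE;2` both resolve to version 2, and the next save would create version 4.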
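Meddler wrote: One feature I have already decided to have is to overwrite files when they are deleted (called ERASE_ON_DELETE in VMS).
That is not the default VMS behaviour.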
--Thomas
Re: Interesting filesystem features
Many modern operating systems include their own form of versioning - File History in Windows, Time Machine in OS X. ZFS allows the same on FreeBSD and Solaris. And this was an integral feature of Tivoli Storage Manager that made life so easy for me when I administered a large network.
Re: Interesting filesystem features
File versioning is absolutely an excellent feature, and immutable versioned files is something I'm seriously considering (after reading about the Cedar File System).
It seems to me that if ZFS snapshots (and similar) are meant to be used for file versioning then they can at best be a form of ersatz file versioning. File versioning is widely misunderstood (there is also a widespread belief that revision control systems are equivalent to it).
I have a tendency to believe too much good about a new idea when I first hear it. I've read about an old OS (probably from DEC) where the filesystem really knew about the files it was storing (or had stored). Files would routinely (but not repeatedly) be automatically copied to tape. This system had hierarchical storage, so if space was low, it would reclaim a file's sectors but leave a pointer to the tape. Later if the user wanted to read the file then the tape would be read (I think even prompting the sysadmin to change tapes if necessary). When files were deleted, normally they weren't deleted until they had hit tape. The system knew where all these files were and had been. If the sysadmin wanted to restore a directory from some previous date then the system would tell him which tapes to load. The granularity of this system is much finer than what snapshots can provide.
Imagine my disappointment when I later realised that ZFS snapshots aren't quite like that. I'd really appreciate it if someone knew the name of that system. Do any modern systems work similarly, maybe Tivoli?
Filesystems have a property that is rarely mentioned. If some features are not built-in from the very start then they can't be added later. Some features are equivalent to changing the API. I want to get a good set of features from the start and POSIX compatibility is not going to be in the way.
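To make the tape-backed scheme described earlier in this post concrete, here is a toy model of it. Everything in it (the `HsmStore` class, its method names, the tape labels) is invented for illustration; it only captures the rules as I described them: disk space is reclaimed only after a file has reached tape, and a read of a reclaimed file recalls it.

```python
class HsmStore:
    """Toy hierarchical store: file data lives on disk, on tape, or both."""

    def __init__(self):
        self.disk = {}   # name -> bytes currently resident on disk
        self.tape = {}   # name -> (tape_label, bytes) once backed up

    def write(self, name, data):
        self.disk[name] = data

    def backup(self, name, tape_label):
        # Copy to tape; the disk copy stays until space is needed.
        self.tape[name] = (tape_label, self.disk[name])

    def reclaim(self, name):
        # Free the disk sectors only if a tape copy exists.
        if name not in self.tape:
            raise RuntimeError(f"{name} has not reached tape yet")
        del self.disk[name]

    def read(self, name):
        if name not in self.disk:
            label, data = self.tape[name]  # a real system would ask for tape <label>
            self.disk[name] = data         # recall and cache locally
        return self.disk[name]

    def location(self, name):
        """Return (resident_on_disk, tape_label_or_None)."""
        label = self.tape[name][0] if name in self.tape else None
        return name in self.disk, label
```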
- Owen
- Member
- Posts: 1700
- Joined: Fri Jun 13, 2008 3:21 pm
- Location: Cambridge, United Kingdom
- Contact:
Re: Interesting filesystem features
ZFS Snapshots are designed to allow reliable upgrades: create a new "copy on write" snapshot of the system volume, do the upgrade on that, and, if successful, reboot into it (otherwise delete the snapshot).
Re: Interesting filesystem features
I think it's even more interesting if file versioning is optional on a per-file basis. There are really only a few types of files where versioning is quite useful.
All my commercial OSes fill my HD with crap -- I'd like to know which files have never been accessed during their entire existence (so I can delete them), so I want a "last accessed time" that really means what it says (not just "touched" or "searched" or whatever).
I want every file that is created to be backwards associated with the software package that created it, so I know what software is responsible for filling my drive with crap. WITH a short text description provided by the software, regarding the purpose of the file.
Many types of media degrade over time, so I'd also like a "magnetic recording time" that a daemon can scan to keep files "magnetically fresh" on the disk, and less prone to corruption or having sectors become unreadable.
One of my favorite features in my FS is "file heads". If you have a file that is 10 bytes long, or 522 bytes long, or whatever -- how do you store it efficiently in 512 byte sectors? You peel off the extra bytes from the beginning of the file, and store them in the file's directory entry instead.
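A minimal sketch of the split, assuming 512-byte sectors and assuming the directory entry can hold up to 511 bytes of head (the real on-disk layout is not shown here; `split_file_head` is just an illustrative name):

```python
SECTOR_SIZE = 512

def split_file_head(data, sector_size=SECTOR_SIZE):
    """Split data into (head, body).

    The body is a whole number of sectors; the head is the leftover bytes,
    taken from the start of the file, that would go in the directory entry.
    """
    head_len = len(data) % sector_size
    return data[:head_len], data[head_len:]

def join_file_head(head, body):
    """Reassemble the original file contents."""
    return head + body
```

A 10-byte file ends up entirely in the head, and a 522-byte file becomes a 10-byte head plus exactly one full sector.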
Another feature of mine that I like is dynamic partition allocation. A partition full of unused inodes will be slow to use sometimes, because all the unused inodes need to be scanned (for corruption, or whatever) or cached (uselessly). Also, the empty space in a partition can be used very effectively for swap space, so leaving that space open has other advantages. So I only keep a reasonable number of empties (clusters in my FS) and grow dynamically into the unused space as I need to.
Also, a truly standard optional per-file transparent automatic compression is nice.
But for me, the very most important feature of all is for the FS to have a very dense info structure, so the entire tree of directories and files can be cached in RAM. (This goes along with minimizing the number of trash files in the OS.)
Re: Interesting filesystem features
bewing wrote: All my commercial OSes fill my HD with crap -- I'd like to know which files have never been accessed during their entire existence (so I can delete them), so I want a "last accessed time" that really means what it says (not just "touched" or "searched" or whatever).
How do you define "last accessed"? Does the file browser opening the file (to try and extract metadata) count as "last accessed"?
bewing wrote: I want every file that is created to be backwards associated with the software package that created it, so I know what software is responsible for filling my drive with crap. WITH a short text description provided by the software, regarding the purpose of the file.
I have a simple method of establishing the originator of a file: Applications write access is locked down to special directories (plus read access is also locked down); some examples:
- R/W ~/Library/Local/Cache/Application - Cache (deleted as needed to reclaim space in LRU order)
- R/W ~/Library/Local/Preferences/Application - Machine local preferences (e.g. game graphics options, which are unlikely to be correct on a different machine)
- R/W ~/Library/Preferences/Application - Preference files
- R/W ~/Library/Saved Games/Application - Saved game files
- R/w /Library/Preferences/Application - Global preferences (Writable if user is admin)
- R/w /Library/Preferences/Local/Application - Machine-specific global preferences (Writable if user is admin)
- R/W /Library/Local/Protected/Cache/Application - Protected global cache (i.e. R/W only to that application and admins, same LRU policy to user cache)
- R/W /Library/Protected/Application - Protected global data (i.e. R/W only to that application and admins)
- R/_ /
- _/_ ~
File open/save dialogs would be provided by a service running with the "User" token; applications would then only be able to open files relative to the selected file(s) with certain limitations (e.g. only files in the same directory and deeper) without requesting explicit permission. "Recent documents" lists would be handled by marshalling the file reference with a digitally signed security token; whether the user is asked for permission again when opening the file is dependent upon your local security policy.
~ and /Library are "remoted" (i.e. network mounted, with caching) when the machine is joined to a domain: they mount the directories pointed to by the session initiator's directory server (the session initiator being whoever logged in at the login dialog). ~/Library/Local and /Library/Local are both redirected back to the local machine.
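The directory list earlier in this post can be read as a longest-prefix permission table. A rough sketch of that reading follows; the function `app_access` and the rule for resolving overlaps are assumptions, not part of the design above. In particular it lets other applications read the "Protected" directories via the `R/_ /` rule, which may not be what is intended.

```python
USER_RW = ["~/Library/Local/Cache", "~/Library/Local/Preferences",
           "~/Library/Preferences", "~/Library/Saved Games",
           "/Library/Local/Protected/Cache", "/Library/Protected"]
ADMIN_W = ["/Library/Preferences", "/Library/Preferences/Local"]

def _under(path, prefix):
    """True if path is the prefix directory itself or lies inside it."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")

def app_access(app, path, is_admin=False):
    """Return 'rw', 'r' or '' for an application touching a path."""
    rules = [(f"{base}/{app}", "rw") for base in USER_RW]
    rules += [(f"{base}/{app}", "rw" if is_admin else "r") for base in ADMIN_W]
    rules += [("/", "r"), ("~", "")]  # the two catch-all entries in the list
    matches = [(prefix, mode) for prefix, mode in rules if _under(path, prefix)]
    if not matches:
        return ""
    # The most specific (longest) matching prefix wins.
    return max(matches, key=lambda m: len(m[0]))[1]
```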
Re: Interesting filesystem features
Meddler wrote: Filesystems have a property that is rarely mentioned. If some features are not built-in from the very start then they can't be added later.
reiser4 plugins?
Learn to read.
Re: Interesting filesystem features
Owen wrote: ZFS Snapshots are designed to allow reliable upgrades
Amongst other uses. They also provide a very effective file versioning ability - Solaris uses them for this purpose, as does FreeNAS (just examples).
Re: Interesting filesystem features
bewing wrote: All my commercial OSes fill my HD with crap -- I'd like to know which files have never been accessed during their entire existence (so I can delete them), so I want a "last accessed time" that really means what it says (not just "touched" or "searched" or whatever).
Owen wrote: How do you define "last accessed"? Does the file browser opening the file (to try and extract metadata) count as "last accessed"?
Obviously not. I want to know when the actual contents of a file are used. Not merely scanned or copied. That is my point about "last accessed" -- the way it's usually implemented is not useful to me, or to most users, I think.
Owen wrote: I have a simple method of establishing the originator of a file: Applications write access is locked down to special directories (plus read access is also locked down); ....
Good stuff. I like it. The only quibbles I would have to that setup would be for a more aggressive tempfile setup (that is, specifically aimed at files that become worthless on program termination, not just LRU), and for sandboxing. Also, that's a lot of directories per executable that may end up unused, so I'd hope there would be a way of getting rid of empty app directories.
Re: Interesting filesystem features
Owen wrote: How do you define "last accessed"? Does the file browser opening the file (to try and extract metadata) count as "last accessed"?
bewing wrote: Obviously not. I want to know when the actual contents of a file are used. Not merely scanned or copied. That is my point about "last accessed" -- the way it's usually implemented is not useful to me, or to most users, I think.
That's pretty vague, using a file's contents can include anything. I don't know what you mean by it and thus what I'm interested in is how you determine whether the contents are used or not.
On topic stuff: I'm surprised no one mentioned file checksumming and mirroring (a la ZFS) as a very nice to have feature, unless you include file checksumming under journaling.
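As a sketch of what checksumming plus mirroring buys you: each block is stored twice along with a checksum, a read returns the first copy that verifies, and any copy that fails is rewritten from the good one. The class name `MirroredBlock` is made up, and SHA-256 is used only because it is convenient in Python; this is not a statement about what ZFS uses by default or how it lays things out on disk.

```python
import hashlib

class MirroredBlock:
    """One logical block stored as two copies plus a checksum."""

    def __init__(self, data):
        self.copies = [bytearray(data), bytearray(data)]
        self.checksum = hashlib.sha256(data).digest()

    def _ok(self, copy):
        return hashlib.sha256(copy).digest() == self.checksum

    def read(self):
        for copy in self.copies:
            if self._ok(copy):
                good = bytes(copy)
                # Self-heal: rewrite any copy that fails verification.
                for i, other in enumerate(self.copies):
                    if not self._ok(other):
                        self.copies[i] = bytearray(good)
                return good
        raise IOError("all copies failed checksum verification")
```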
Re: Interesting filesystem features
If I understand correctly, he may want to add another access counter that is increased only by the "Open..." mechanism (i.e. the browse-file dialog, the "Open Recent File" menu, etc.), to provide a protocol for applications to hint the OS and differentiate "actual access by user".
By the way, the way last access time is implemented on most OS is not SSD friendly...and I agree it is not useful and thus I've disabled it on my SSD drive.
Re: Interesting filesystem features
My point would be (and the point of Owen's app directory structure is) that almost all files on any system are application-specific. For application-specific files, only accesses by that one application should update the last-accessed time. (If "last accessed time" is supported by the media, as bluemoon says. WORM drives have the same issue.)
There are also a handful of files that have generic formats (text files, & etc.). For those files, the application that opens it should update the time. Basically, user apps update the time, and system apps do not.
In both cases, the application knows whether or not to update the time. The OS cannot and should not guess, and should not simply always do it.
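One way the rule above could look as an API, sketched with made-up names (`FileEntry`, `open_file`, `mark_used`): the opener has to ask explicitly for the time to be updated, and for application-specific files the request is honoured only for the owning application.

```python
import time

class FileEntry:
    def __init__(self, owner_app=None):
        self.owner_app = owner_app  # None means a generic format (e.g. text)
        self.last_used = None       # set only on a "real" use

def open_file(entry, app, mark_used=False, now=None):
    """Record a use only when the opener asks for it and is entitled to."""
    entitled = entry.owner_app is None or entry.owner_app == app
    if mark_used and entitled:
        entry.last_used = time.time() if now is None else now
    return entry
```

A file browser or indexer would simply open with the default `mark_used=False`, so scanning or copying never touches the timestamp.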
Re: Interesting filesystem features
bewing wrote: My point would be (and the point of Owen's app directory structure is) that almost all files on any system are application-specific. For application-specific files, only accesses by that one application should update the last-accessed time. (If "last accessed time" is supported by the media, as bluemoon says. WORM drives have the same issue.)
Okay, got it. Your option may lead to some behavior inconsistency between applications of course, but reading from your response that is allowed.
IMHO it doesn't sound like a file system feature but an API choice. What your idea needs from the file system is support for metadata (for storing file types) and the ability to store an access timestamp. Of course, you don't need metadata to determine a file type (you could determine it from the filename's suffix), but I think that's hacky. Unfortunately, it's the only portable way to transfer file types between hosts.
Another nice feature to have is being able to store meta data about a file such as type, source (URI, DVD, etc), author of the file, etc.