Hi,
Colonel Kernel wrote:Brendan wrote:Of course I should probably mention that nobody really uses extended attributes, because (for example) users have better things to do than make up thousands of descriptions for thousands of pictures that their wife doesn't know about. Basically extended attributes sound like a great idea until you try to figure out where the data for these extended attributes is meant to come from.
I like your examples!
In practice, people are more patient than you think. Even so, it helps when the source of metadata can be automated somehow. For example, CDDB can be used to automatically download metadata for music. I'm sure a similar database service exists for movies. I believe that a real "killer app" would be an AI that is able to take a few example photos and auto-tag the rest by learning what people and places look like, and what clues in a picture suggest an approximate date (e.g. snow, leaves, holiday decorations, sunny beaches, etc.). It wouldn't have to be exact, just good enough to save people from doing most of the grunt work themselves.
If the metadata can be obtained automatically, then it could be fetched on demand and not stored anywhere. In that case the extended attributes would merely cache the metadata (e.g. to reduce the time it takes to obtain it) rather than adding anything new.
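To make that concrete, here's roughly what a cached lookup could look like (a minimal sketch assuming a Linux-style xattr API; fetch_metadata_from_service() is a made-up stand-in for something like a CDDB lookup):

Code:
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: fetch the metadata from some online service. */
extern ssize_t fetch_metadata_from_service(const char *path,
                                           char *buf, size_t size);

ssize_t get_description(const char *path, char *buf, size_t size)
{
    /* Try the cached copy in the extended attribute first. */
    ssize_t len = getxattr(path, "user.description", buf, size);
    if (len >= 0)
        return len;                     /* cache hit */

    /* Cache miss - fetch the metadata and cache it for next time. */
    len = fetch_metadata_from_service(path, buf, size);
    if (len >= 0)
        setxattr(path, "user.description", buf, (size_t)len, 0);
    return len;
}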
I'm also wondering how you'd avoid problems with context. For example, a friend takes some photos with a phone and sets the metadata to "My family on holiday", and then sends these pictures to you (including the metadata). Now you've got pictures that say "My family" when it's not your family at all. There's also a major internationalization issue here - e.g. metadata in one language isn't going to be useful to someone who doesn't understand that language.
mbluett wrote:Brendan wrote:Usually people who say they want to use a database instead of a file system only really want to change the terminology.
Your assumption that I want a "normal filesystem" would be incorrect. I have a very comprehensive understanding of filesystems and how they work. I also understand how they can be viewed as a database. However, in comparison to a true database there are some significant differences.
I am actually proposing the use of a "real" database and a database engine to drive it. Essentially there would be one file (or potentially several files that make up the various table structures comprising the database). Inside this database would exist all kinds of different information, not necessarily stored in the form of conventional files.
I'm proposing that a plain old boring file system (e.g. FAT) *is* a kind of database, and that the code that supports a plain/boring file system is a kind of database engine.
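To make the comparison concrete, here's a FAT directory entry written as a C struct - a fixed-size 32-byte record with typed fields, which is exactly what a row in a database table is (field layout follows the FAT specification; the packed attribute is GCC-specific):

Code:
#include <stdint.h>

/* A FAT directory entry: a fixed-size 32-byte record, i.e. a "row". */
struct fat_dir_entry {
    uint8_t  name[11];          /* 8.3 file name      - a string field  */
    uint8_t  attr;              /* attribute flags    - a bitmask field */
    uint8_t  nt_reserved;
    uint8_t  create_time_tenth;
    uint16_t create_time;       /* creation time/date - date fields     */
    uint16_t create_date;
    uint16_t last_access_date;
    uint16_t first_cluster_hi;
    uint16_t write_time;
    uint16_t write_date;
    uint16_t first_cluster_lo;  /* "pointer" to the file's data         */
    uint32_t file_size;         /* file size in bytes - an integer field */
} __attribute__((packed));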
I'm looking for things that make your database different to a normal filesystem, and reasons why it's better.
mbluett wrote:Databases are made up of table structures that contain fields of specific types and that typically have specific attributes. The database engines of today typically use SQL to retrieve from and store information to a database.
You could add an "SQL style query" front-end to any normal file system - it changes how people find the files, not the way those files are stored.
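As a rough sketch of what I mean (the query syntax and parser are imaginary - the point is that the "engine" behind something like SELECT path FROM files WHERE name LIKE '%errata%' can be an ordinary directory walk):

Code:
#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <string.h>
#include <ftw.h>

/* The WHERE clause, hard-coded for the sketch; a real front-end
 * would build this predicate by parsing the query string. */
static const char *pattern = "errata";

static int match(const char *path, const struct stat *sb,
                 int type, struct FTW *ftwbuf)
{
    if (type == FTW_F && strstr(path + ftwbuf->base, pattern))
        printf("%s\n", path);   /* a "row" in the result set */
    return 0;                   /* keep walking */
}

int main(void)
{
    /* "SELECT path FROM files WHERE name LIKE '%errata%'" becomes
     * a recursive walk over the existing hierarchical file system. */
    return nftw("/info", match, 16, FTW_PHYS);
}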
mbluett wrote:1. To get rid of the notion of having to understand how to traverse a hierarchical filesystem. Files stored in folders, stored in folders, etc. The concept is usually easy to grasp. The problem is that the general public have great difficulty determining where to find a file. In addition, they typically can't remember what the file was called even if they wanted to do a search. So, they must search within files to find what they are looking for.
I'm the opposite - a hierarchical filesystem helps me find exactly what I'm after. For example, for OS development information I've got a large directory called "info" that contains hundreds of files, but this directory has many subdirectories (e.g. "video", "network", "CPU", etc). Using the hierarchical filesystem, if I want to find some errata for Pentium 4 CPUs I can find it fast because it's in the "/info/CPU/Intel/Pentium4/errata" directory. Without a hierarchical filesystem I'd be forced to use searches, which can be slow but, more importantly, aren't very precise - for example, maybe I use the word "netburst" in my SQL query and find nothing because I forgot that the metadata calls these files "Pentium 4" instead.
Of course adding an "SQL style query" front-end to a normal file system (and using indexing for the metadata to speed it up) would provide the best of both methods - people who aren't smart enough to store their files in an organized directory structure could open the "search" dialog box, try a few different searches until they get the query right, and then try to find the file they actually want in the list of files the query shows them (and smart people can just go directly to the file they want in their organized directory structure).
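The indexing part doesn't need to be exotic either - e.g. a sorted keyword table mapping metadata terms to file paths (a toy sketch with made-up paths; a real indexer would build this by scanning extended attributes and persist it on disk):

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One index entry: a metadata keyword and the file it points at. */
struct entry {
    const char *keyword;
    const char *path;
};

/* A toy index with made-up contents. */
static struct entry index_table[] = {
    { "errata",    "/info/CPU/Intel/Pentium4/errata/p4_errata.txt" },
    { "netburst",  "/info/CPU/Intel/Pentium4/overview.txt" },
    { "pentium 4", "/info/CPU/Intel/Pentium4/overview.txt" },
};

static int cmp(const void *a, const void *b)
{
    return strcmp(((const struct entry *)a)->keyword,
                  ((const struct entry *)b)->keyword);
}

int main(void)
{
    size_t n = sizeof(index_table) / sizeof(index_table[0]);
    struct entry key = { "netburst", NULL };

    /* Sort once when the index is built... */
    qsort(index_table, n, sizeof(struct entry), cmp);

    /* ...then each query is a fast binary search instead of a scan. */
    struct entry *hit = bsearch(&key, index_table, n,
                                sizeof(struct entry), cmp);
    printf("%s\n", hit ? hit->path : "(no match)");
    return 0;
}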
mbluett wrote: A query is passed to the database engine (usually in the form of SQL); the engine then implements the query and returns the result. Granted, as a database grows in size the searches can take longer. However, think of the searches you do via Google: the responses are very fast, and the Google database is probably huge.
Google is very fast because they're using over 450,000 servers in parallel. You enter a query and the front-end sends that query to lots of computers; each of these computers searches a tiny fraction of the database and returns its results, then the results from all of the computers are combined. Unfortunately, most people don't have that many computers (and even if they did, they wouldn't want to keep them running all the time just so they can find files).
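The trick is scatter-gather parallelism. Here's a toy sketch of the idea using threads instead of servers - each worker searches its own fraction of the "database" and the partial results are combined at the end:

Code:
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NWORKERS 4

/* A stand-in "database": each worker searches its own shard. */
static const char *shards[NWORKERS][2] = {
    { "kernel notes",     "video timings" },
    { "pentium 4 errata", "network cards" },
    { "fat layout",       "acpi tables" },
    { "usb controllers",  "netburst pipeline" },
};

static const char *query = "errata";

static void *search_shard(void *arg)
{
    int shard = *(int *)arg;
    for (int i = 0; i < 2; i++)
        if (strstr(shards[shard][i], query))
            return (void *)shards[shard][i];    /* partial result */
    return NULL;
}

int main(void)
{
    pthread_t workers[NWORKERS];
    int ids[NWORKERS];

    /* Scatter: every worker searches a fraction of the data... */
    for (int i = 0; i < NWORKERS; i++) {
        ids[i] = i;
        pthread_create(&workers[i], NULL, search_shard, &ids[i]);
    }

    /* ...gather: combine the partial results. */
    for (int i = 0; i < NWORKERS; i++) {
        void *hit;
        pthread_join(workers[i], &hit);
        if (hit)
            printf("found: %s\n", (const char *)hit);
    }
    return 0;
}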
mbluett wrote:2. To create more flexibility in being able to make simple additions to the descriptive information that can be stored with files in today's OSs. For example, if I wanted to add a descriptive field to indicate which application created a file it would be a trivial matter with a database. However, with a conventional filesystem, it would involve re-compiling and testing the new changes. And what about backward compatibility issues?
This can be done with the extended attributes that most OSs already support. You don't need to recompile anything just to add a new attribute (although on Linux almost everyone would need to enable support for extended attributes and recompile the kernel first, because this sort of thing is so useful that almost nobody bothers enabling it). Windows/NTFS always supports it.
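For example, on Linux (with xattr support enabled) tagging a file with the application that created it is a single call - nothing recompiled, no change to the file system format (the attribute name "user.application" is just something I made up, which is exactly the problem described next):

Code:
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

int main(void)
{
    /* Attach a brand-new attribute to an existing file; nothing
     * needs recompiling and the on-disk format is unchanged.
     * The name "user.application" is made up for this example. */
    const char *value = "my_application_name";
    if (setxattr("report.doc", "user.application",
                 value, strlen(value), 0) != 0) {
        perror("setxattr");
        return 1;
    }
    return 0;
}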
The problem with existing systems is that most applications don't support extended attributes, there are often no tools to index and/or search for files based on information in the extended attributes, and there are no standards to say how extended attributes should be used. For example, I could write an application for Windows or Linux that creates an "application = my_application_name" extended attribute, other applications might create a "program = my_application_name" extended attribute, and some applications might set a "my_application_name = yes" attribute; and nobody will know what to search for. Of course the same can happen with your database - e.g. my application creates a field in the database called "application" and sets this field to "my_application_name" for its files, while other people's applications create different fields for the same purpose, and end users have no idea which database field(s) they need to search in their query.
To avoid that problem you'd need some sort of specification that describes "standard fields" and their intended purpose (so all software uses the database fields in a compatible way); but if you're going to do that, you might as well just add the "standard fields" information to the directory entries (e.g. filename, file size, file permission flags, file owner, file description, keyword list for the file, application that created the file, application that last modified the file, etc).
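To illustrate, a directory entry with those "standard fields" built in might look something like this (a hypothetical layout, not any real file system's format):

Code:
#include <stdint.h>

/* Hypothetical directory entry with the "standard fields" built in. */
struct dir_entry {
    char     name[256];          /* file name                         */
    uint64_t size;               /* file size in bytes                */
    uint32_t permissions;        /* file permission flags             */
    uint32_t owner;              /* file owner                        */
    char     description[128];   /* human-readable file description   */
    char     keywords[128];      /* comma-separated keyword list      */
    char     created_by[64];     /* application that created the file */
    char     modified_by[64];    /* application that last modified it */
};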
Cheers,
Brendan