Hi,
mbluett wrote:I'm the opposite - a hierarchical filesystem helps me find exactly what I'm after.
Again, I'm not intending this O/S for people like yourself. I am intending it for the general public who do not know how to do the things you do.
Ok, so how many people (in your intended market) are like me and how many people aren't? If it helps some people but annoys others, are you helping more people than you annoy? Is there any reason you couldn't provide both methods (SQL style query and hierarchical tree), and annoy nobody?
I'd assume that the majority of company/office/business computers do use a deliberately organised hierarchical tree (e.g. something set up by some sort of administrator, involving shared directories used by many client machines). I'd also assume that about 25% of home users are like me, but it's hard to guess without checking a decent sample of machines (as opposed to checking the misleading sample of machines whose owners needed help).
mbluett wrote:You could be right. I have no knowledge of how they run their servers. How did you come across the information that told you this is the way Google searches work?
I read an article about it a few years ago. The "over 450000 servers" part came from Wikipedia. Also note that not all computers would be used for each query - for example (numbers made up here) they might have 45 sites with 10000 servers per site, where each query goes to 1000 servers (and where this set of 1000 servers is duplicated 450 times to increase the ability to handle many queries at the same time).
mbluett wrote:I am aware of the existing fields where metadata can be added under XP. However, how does one add new fields? I have seen that you can add pre-defined field types to a Word Document, for example. But tell me how you would do this with a URL file. If you right-click on the file I cannot see any place where you can add ANY metadata.
A more interesting question would be: if OSs like Windows and Linux support the underlying functionality, why hasn't anybody bothered to add user controls to it (like a metadata search tool and indexing, and dialog boxes to set/change the metadata)? Regardless of what you think of Microsoft's developers and/or Linux developers, if it was a potentially very useful feature then surely it'd exist everywhere already. Is it possible that all of these Microsoft/Linux developers don't think it's worthwhile (except perhaps in specific situations, like indexing a user's photo or music collection)?
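For what it's worth, the underlying hooks do already exist - on Linux, for example, extended attributes can hold arbitrary per-file metadata, it's just that almost nothing builds search or indexing on top of them. A minimal sketch in Python (Linux-only; the "user.description" and "user.keywords" attribute names are ones I've made up for illustration, there's no standard for them):

Code:
import os

path = "24919966.pdf"  # some previously downloaded file

# Attach free-form metadata as extended attributes (needs a filesystem with xattr support).
# The attribute names are invented for this example - there's no agreed standard.
os.setxattr(path, "user.description", "Pentium 4 specification update".encode())
os.setxattr(path, "user.keywords", "intel,pentium4,errata".encode())

# Any tool could read these back later and build a search index from them.
for attr in os.listxattr(path):
    print(attr, os.getxattr(path, attr).decode())

If per-file metadata was as valuable as you're suggesting, you'd expect tools built around this sort of thing to be everywhere by now.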
mbluett wrote:In a lot of the general public's machines I have seen, they have files scattered all over the place. Sometimes there are copies of copies of copies. Sometimes this has occurred by accident. Once this occurs they have no idea if they can delete any of the files without losing something.
This kind of condition can be prevented when using a database as there is no need to make copies of anything unless it is being saved for backup purposes.
Prevented, or just changed? For your database to remain useful you need to rely on users to make sure the metadata is set correctly (e.g. when they download files without any metadata the user will need to waste a few minutes writing a description, setting keywords, etc). Would a sloppy user just end up with a badly organised database instead of a badly organised directory structure?
mbluett wrote:All applications can easily make use of the same data. Currently, this is not easily possible.
Sure. I'd be able to download an "MS office" spreadsheet and open it with a bitmap editor, because the bitmap editor would be able to understand the spreadsheet and could easily make use of the same data.
Maybe you meant "All applications can easily make use of the same metadata."? In this case for normal file systems everything already understands the metadata, it's just that the metadata is limited (e.g. doesn't include a description of each file in Japanese or a list of keywords in Turkish).
I'm also wondering...
Without any support for normal hierarchical file systems, how are you intending to support removable media (floppies, CD-ROMs, USB flash, removable hard drives, etc) - will it be possible to shift data between computers (e.g. from your OS to another OS, or from another OS to your OS)? What about things like NFS and/or the SMB/CIFS networking protocol; and archives (tar, zip), and standard programming tools (make, compilers, etc)?
Unfortunately, "compatible" beats "technically better" in the marketplace; where "compatible" means compatible with the stuff people have invested their time and money in, including training (for normal users, system administrators, etc), purchased applications, software developed specifically for a company, etc. Most users aren't willing to throw away the time and money they've invested in current technologies. OS's like OS X and Linux are different to Windows so they have a hard time, but they're also very similar to Windows in many ways (same desktop metaphor, similar file system principles).
mbluett wrote:Brendan wrote:To estimate how useful metadata is, see how many attempts it takes you to find a query for Google that finds less than 20 files including at least 5 (of the 7) Pentium 4 errata documents.
I tried about 20 queries before I succeeded, but I also spent 10 minutes looking through Google's documentation (advanced search and advanced operators) to narrow down the list. After you've tried, here's my best attempt:
Google...
It takes me 5 mouse clicks to find "info/cpu/intel/pent4/errata/24919966.pdf" from a blank desktop. Unless mbluett's SQL searches can improve on that, it's not making the OS easier to use for me, and probably not making the OS easier to use for a lot of other people too.
mbluett wrote:There are a number of issues here:
1. The fact that you had to hunt through various documents just means that Google does not search inside those documents for you. That problem is not an issue with an O/S database search.
Google does search inside the documents (more correctly, they index the contents of everything inside the document and then search the resulting index).
mbluett wrote:2. Another problem in making comparisons to Google searches is that your search results greatly depend on the websites ability to organize information so that the public can find what they are looking for. There is a HUGE amount of inconsistency in this department. As well, there is a HUGE amount of duplication of the same articles or articles that make reference to the article you are looking for.
Not really - as long as Google's spider can find it Google can index it, and it doesn't matter much where it is.
Google tracks some specific metadata for each document, including title, contents, keywords, file type, location and date. Google also supports relatively complex queries. The idea was to use this metadata to create a query that finds a specific file, in the same way that a user (and applications) on your OS would need to create a query that finds a specific file from similar metadata.
mbluett wrote: These issues can be overcome in a local database.
That's nice, but how? Are you going to have an "I'm looking for this data" field, so that I only need to click a "find all the data I'm looking for" button and don't need to find ways to construct optimal queries?
mbluett wrote:When you do this from your desktop, what are you searching for? The same thing as you searched for on Google? If so, and you happen to have this document socked away on your HD, then comparing the Google search to a search from your desktop is a little unfair, don't you think?
I normally don't need to search for anything because I keep my files in a logically structured hierarchical file system. I chose Pentium 4 Errata as an example to compare the user friendliness between searches and logically structured hierarchical file systems. Using Google search is unfair, because Google does a lot of work to create and maintain their metadata, while for my desktop there is no metadata (even if the OS supported it properly I'd be too lazy to waste time maintaining the metadata, so the thousands of files I've collected/downloaded would have no metadata at all).
Apart from that, IMHO using Google's metadata and thousands of servers to search a large number of files is a fair comparison to using a small number of computers (one) to search a small number of files. Although I should point out that by restricting my search to files that are within the "intel.com" domain name, I did effectively limit the scope of the search to a few computers (it wasn't a search of the entire internet, only Intel's web site).
mbluett wrote:1. Whole sentences that occur in multiple documents can be reduced to one copy.
You can't do that with the current file structure (at least not easily).
One advantage to this is it would reduce the amount of time to accomplish a search. Another is that it is very fast to present a list of documents this sentence occurs in. The amount of time to do this with the conventional file system is not near as fast. I realize this time difference would be small given the fast HDs these days; however, instant list production is much better than 10 sec (or longer) list production.
How exactly do you plan on implementing that?
For example, are you planning on having some sort of dictionary of words/sentences, where the metadata for every file contains a bitfield to indicate which words/sentences are present in the file? This is the best method I can think of (in terms of per file overhead). In this case, if there's ten thousand words/sentences in the dictionary it'll cost you 1250 bytes for the bitfield per file. According to "find * | wc -l" I've got 398983 files on this computer (but again, I'm unusual because I only use this computer for OS development and another computer contains all my other files - most home users tend to use one computer for everything). That works out to 498728750 bytes (or 476 MiB) of metadata just for the bitfields alone.
Based on this, what made you think all of your metadata (and everything else - the indexes, the dictionary, etc) will actually fit in memory? More specifically, what makes you think you won't need to read several GiB of metadata from disk for every search (every time any software attempts to open any file); and what makes you think this won't take longer than (for e.g.) me searching the contents of every file within my "info/cpu/intel" directory with something like "grep"?
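To put some rough numbers on that, here's the same back-of-the-envelope calculation as a few lines of Python (the dictionary size and file count are the assumptions from above, not measurements of any real implementation):

Code:
# Rough per-file overhead for a "which dictionary entries appear in this file" bitfield.
dictionary_entries = 10000      # assumed number of words/sentences in the dictionary
files = 398983                  # file count from "find * | wc -l" above

bitfield_bytes_per_file = dictionary_entries // 8    # 1250 bytes per file
total_bytes = bitfield_bytes_per_file * files        # 498728750 bytes
print(total_bytes, "bytes =", round(total_bytes / 2**20), "MiB of bitfields alone")

And that's before the dictionary itself, the indexes, and any other per-file metadata are counted.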
mbluett wrote:2. The organizing is done by the database, not so much by the user.
How exactly do you plan on implementing that?
For example, I go on the internet and download a file called "latest.jpg" and your magic elves automatically decide that this data should be associated with the keywords "Ubuntu" and "screenshot"?
mbluett wrote:These are advantages and I am sure, in time, I could come up with more.
There are probably disadvantages too - usability problems and practical/implementation problems. Unfortunately, none of us (including you, I assume) have attempted to implement something like this yet, and therefore none of us are fully aware of those disadvantages yet. It's much easier to spot the disadvantages in normal filesystems because everyone's been using them for years. AFAIK the only people who have attempted to implement something like this are Microsoft (WinFS), but they failed - maybe they didn't have the resources needed to implement it properly, and gave up because they don't want their users to have better features...
With this in mind, have you considered implementing a prototype? For example, you could construct a utility that converts a file system into a database (e.g. implement code to insert a file into your database, as if it was downloaded from the internet or something), and then you could attempt to search for data within this database. It'd give you the chance to see how large the metadata becomes, to see how fast searches can be done, to test ideas, etc; and it'd be much quicker to do than writing an OS first; and it wouldn't be wasted time because things like this tend to be re-implemented several times before they work how you want them to work anyway.
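Just to show how small such a prototype could start out, here's a rough sketch in Python (entirely my own guess at a starting point, not your design) - it walks a directory tree, records some basic per-file metadata in an SQLite table, and then lets you experiment with queries, measure how big the metadata gets, and time searches:

Code:
import os
import sqlite3

def build_index(root, db_path="files.db"):
    """Walk 'root' and record basic per-file metadata in an SQLite database."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS files (path TEXT, name TEXT, ext TEXT, size INTEGER)")
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full)
            except OSError:
                continue    # skip broken symlinks and the like
            ext = os.path.splitext(name)[1].lower()
            con.execute("INSERT INTO files VALUES (?, ?, ?, ?)", (full, name, ext, size))
    con.commit()
    return con

# Example query - every PDF whose name mentions "errata", which is roughly the
# sort of question a metadata search would have to answer.
con = build_index(".")
for (path,) in con.execute("SELECT path FROM files WHERE ext = '.pdf' AND name LIKE '%errata%'"):
    print(path)

From there you could start adding the interesting parts (keywords, descriptions, your sentence dictionary, etc) and see how the size and search times behave with a few hundred thousand files.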
Cheers,
Brendan