
Re: Simplest Possible O/S Design

Posted: Sat Nov 29, 2008 5:45 pm
by mbluett
Colonel Kernel wrote:
mbluett wrote:You could be right. I have no knowledge of how they run their servers. How did you come across the information that told you this is the way Google searches work?
I thought it was common knowledge. I remember hearing about their data center's power requirements on the news. Where exactly have you been for the past 12 years...?
I guess you thought wrong. I am sure I could come up with something in the computing field that you have never heard of either. In other words, I think your comment is a little trite and doesn't accomplish anything positive.
Colonel Kernel wrote:
mbluett wrote:I am aware of the existing fields where metadata can be added under XP. However, how does one add new fields? I have seen that you can add pre-defined field types to a Word Document, for example. But tell me how you would do this with a URL file. If you right-click on the file I cannot see any place where you can add ANY metadata.
It isn't exposed in the GUI. NTFS supports "streams" which allow you to programmatically read & write arbitrary data into a part of the file that is normally hidden from file I/O operations. You have to know that they're there to make use of them.
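For example, on Windows you can poke at a stream from a script. A minimal sketch in Python (Windows/NTFS only; the stream name "user.keywords" is just something I made up for illustration):

path = "report.url"
open(path, "a").close()  # make sure the base file exists

# Write arbitrary metadata into an alternate data stream of the file.
with open(path + ":user.keywords", "w", encoding="utf-8") as stream:
    stream.write("osdev, filesystems, metadata")

# Read it back. Ordinary reads of "report.url" never see this data,
# and Explorer reports only the size of the main stream.
with open(path + ":user.keywords", "r", encoding="utf-8") as stream:
    print(stream.read())  # -> osdev, filesystems, metadata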

BTW, having a database does not magically make adding new fields any easier. Adding a new column to a table is an extremely disruptive change, requiring lots of records to be re-organized on disk and re-indexed.
I am well aware that programming is a solution for this. However, that route is much more tedious and time-consuming.

Actually, adding a new field can be made to happen very easily. I can do this in approximately 10 secs; most of that time is consumed in thinking up an appropriate name. The actual operation might take some time to complete on a large database, but this would not be evident to the user.

You are correct that multiple records need to be changed, but they do not necessarily need to be re-indexed. Even if re-indexing were necessary, where is the problem?
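To be concrete, the whole operation is a single statement. A sketch using Python's built-in sqlite3 module (the table and field names are made up for illustration):

import sqlite3

con = sqlite3.connect("catalog.db")  # hypothetical metadata database
con.execute("CREATE TABLE IF NOT EXISTS files (name TEXT, size INTEGER)")

# Adding the new field is one line. In SQLite this particular ALTER is a
# metadata-only change and returns almost instantly; engines that rewrite
# rows on disk would hold a lock on the table while they work, which is
# the disruption being debated here.
con.execute("ALTER TABLE files ADD COLUMN keywords TEXT")
con.commit()
con.close()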
Colonel Kernel wrote:
mbluett wrote:Databases have to be organized for them to work efficiently. One of the standard design methods in databases is to reduce redundancy as much as possible: It's called "normalizing" a database.
Have you ever tried designing a normalized schema that will take into account every single type of data a user could possibly, ever, be interested in using, including those that haven't even been invented yet?
That is not even an issue from my point of view. The point is that the schema would be laid out by professionals, not novices, and all of the required data types would already be known about and could be accounted for at development time. There is no requirement to worry about normalizing data that hasn't been invented yet.

I'm sorry, this makes no sense to me.

Re: Simplest Possible O/S Design

Posted: Sat Nov 29, 2008 5:53 pm
by mbluett
M-Saunders wrote:
mbluett wrote:Yes, it is a serious reply. Regardless of what I think, I again ask you: why do they spend so much time and money degrading MS in TV ads?
Umm, because the company wants to keep improving its market share instead of standing still? Apple is aware of the widespread dissatisfaction with Vista and is capitalising on that, like any company would in a similar position. You may not like the ads, but they know they're on to something and have an effective hook with this approach.

It doesn't mean they're not doing well, as your original post seemed to imply. They're doing extremely well, and also know how to capitalise on a slip-up by a competitor. Those two things are not mutually exclusive.

M
If they were doing really well, they would be happy with their present market share. Obviously, they are not happy with what they have.

In any case, arguing over semantics gets neither one of us anywhere.

Re: Simplest Possible O/S Design

Posted: Sat Nov 29, 2008 6:03 pm
by M-Saunders
mbluett wrote:If they were doing really well, they would be happy with their present market share.
And just sit still? Stop trying to win new customers? You don't appear to understand how companies work. Companies have to make progress for their shareholders. They can't sit around and say "Hey, everything's great, so let's just chill!"
mbluett wrote:Obviously, they are not happy with what they have.
It has nothing to do with being "happy". You're trying to ascribe human characteristics to a company whose job is to make money for its shareholders. A company needs to grow, otherwise it will become stagnant.

I'm not sure why I'm even debating this point, as the market share stats show that Apple is enjoying incredible growth and is extremely successful. You clearly haven't looked at the stats, and assume that a marketing campaign that capitalises on Vista's shortcomings is some kind of indication that the company is not performing well.

With all respect, you really need to look at the statistics and understand how businesses operate before making assumptions :-)

M

Re: Simplest Possible O/S Design

Posted: Sat Nov 29, 2008 9:41 pm
by Colonel Kernel
mbluett wrote:1. The fact that you had to hunt through various documents just means that Google does not search inside those documents for you. That problem is not an issue with an O/S database search.
Actually, Google and other search engines do index the content of documents, including things like PDF and PostScript files.
mbluett wrote:When you do this from your desktop, what are you searching for? The same thing as you searched for on Google? If so, and you happen to have this document socked away on your HD, then comparing the Google search to a search from your desktop is a little unfair, don't you think?
Agreed. :)
mbluett wrote:1. Whole sentences that occur in multiple documents can be reduced to one copy.

You can't do that with the current file structure (at least not easily).

One advantage of this is that it would reduce the time needed to accomplish a search. Another is that it is very fast to present a list of the documents a sentence occurs in; a conventional file system is nowhere near as fast at this. I realize the difference would be small given the fast HDs these days; however, instant list production is much better than 10 sec (or longer) list production.
There is no speed advantage to sharing partial bits of content this way. If "The quick brown fox jumped over the lazy dog" exists in several files on my HD, then it will be in the index several times, and searching for this sentence will yield multiple results in about the same amount of time as if the sentence were in only one file. You'd be saving some disk space, but these days disk space is cheap, so why bother?
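That's just how an inverted index works: the index maps each term to every file that contains it, so a duplicated sentence only lengthens the result list, it doesn't speed up the lookup. A toy sketch (file contents invented for illustration):

from collections import defaultdict

# Invented sample files.
files = {
    "a.txt": "the quick brown fox jumped over the lazy dog",
    "b.txt": "the quick brown fox jumped over the lazy dog again",
    "c.txt": "completely unrelated text",
}

# Build the inverted index: word -> set of files containing it.
index = defaultdict(set)
for name, text in files.items():
    for word in text.split():
        index[word].add(name)

# One lookup per query word; storing the sentence once or ten times
# makes no difference to how long this takes.
hits = index["quick"] & index["lazy"]
print(sorted(hits))  # ['a.txt', 'b.txt']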
mbluett wrote:2. The organizing is done by the database, not so much by the user.

Today you have to organize your files manually, and most of the time the general public simply never does. This is a significant issue, and it is why a fair number of applications have been written to provide a way of viewing files that are spread all over the place in one concise view. The problem is that the MS O/S's don't do a very good job of this.
Yes, users need help with this. Databases do not magically help though. The contents of the database must still follow some sort of schema. The schema could be implemented just as easily on top of the file system (see my iPhoto example in an earlier post).

You are confusing user-visible abstractions with lower-level system abstractions. Databases are not magic.
mbluett wrote:3. Text within files can be indexed.

Current O/S's don't do this
Oh yes they do! Please read the links in my previous post regarding Spotlight, Beagle, Windows Desktop Search, and Google Desktop before you continue making false claims.
mbluett wrote:If Apple was doing really well, why do they spend so much time and money degrading MS in TV ads?
So that they can do even better by convincing more Windows users to switch. Tim Cook, Apple COO, admitted this in a recent press briefing (skip to 3:36). It's a clever strategy on their part.
mbluett wrote:I agree with you. As a result, we can never expect Apple to supersede MS unless they can convince the marketplace that their hardware is the best. Despite amazing innovation, they still haven't succeeded in doing that. I can only assume it has something to do with the price. Why do people keep buying Windows machines?
I think you're right, price is a big issue for a lot of people. There are a lot of really cheap PCs out there, while Macs seem to never go on sale. There is also a lot of momentum behind Windows -- it's what a lot of people are familiar with, and not everybody is willing to learn to use a new system. I also think the lack of decent games for Mac OS X is holding the platform back.
mbluett wrote:Actually adding a new field can be made to happen very easily. I can do this in approximately 10 secs. Most of that time is consumed in thinking up an appropriate name. For the actual operation to complete on a large database might take some time, but this would not be evident to the user.
Why wouldn't it be evident? In a database, when you alter a table, it's locked until the alteration is complete. The user would be unable to access huge swaths of their data until several gigabytes of it had been shuffled around to make room for new fields!

On the other hand, extending the schema for the user's data isn't a very common use case. I don't see how file systems could be any worse either way.
mbluett wrote:That is not even an issue from my point of view. The point is that the schema would be laid out by professionals, not novices, and all of the required data types would already be known about and could be accounted for at development time. There is no requirement to worry about normalizing data that hasn't been invented yet.

I'm sorry, this makes no sense to me.
If you're assuming a fixed schema, why talk about extending it by adding fields earlier? That's a contradiction.

It's fine to design and deploy one database schema that you think will work for everyone, but what if it doesn't? It doesn't matter how "professional" the people designing the schema are -- they cannot foresee every possible need. This limits the flexibility of the system in the same way that forbidding third party applications does.

The first rule of software requirements is: Requirements change over time. Failing to design for change is a huge source of software defects and maintenance headaches.

Re: Simplest Possible O/S Design

Posted: Sat Nov 29, 2008 10:44 pm
by Brendan
Hi,
mbluett wrote:
Brendan wrote:I'm the opposite - a hierarchical filesystem helps me find exactly what I'm after.
Again, I'm not intending this O/S for people like yourself. I am intending it for the general public who do not know how to do the things you do.
Ok, so how many people (in your intended market) are like me and how many people aren't? If it helps some people but annoys others, are you helping more people than you annoy? Is there any reason you couldn't provide both methods (SQL style query and hierarchical tree), and annoy nobody?

I'd assume that the majority of company/office/business computers do use a deliberately organised hierarchical tree (e.g. something set up by some sort of administrator, involving shared directories used by many client machines). I'd also assume that about 25% of home users are like me, but it's hard to guess without checking a decent sample of machines (as opposed to checking the misleading sample of machines whose owners needed help).
mbluett wrote:You could be right. I have no knowledge of how they run their servers. How did you come across the information that told you this is the way Google searches work?
I read an article about it a few years ago. The "over 450000 servers" part came from Wikipedia. Also note that not all computers would be used for each query - for example (numbers made up here) they might have 45 sites with 10000 servers per site, where each query goes to 1000 servers (and where this set of 1000 servers is duplicated 450 times to increase the ability to handle many queries at the same time).

mbluett wrote:I am aware of the existing fields where metadata can be added under XP. However, how does one add new fields? I have seen that you can add pre-defined field types to a Word Document, for example. But tell me how you would do this with a URL file. If you right-click on the file I cannot see any place where you can add ANY metadata.
A more interesting question would be: if OSs like Windows and Linux support the underlying functionality, why hasn't anybody bothered to add user controls to it (like a metadata search tool and indexing, and dialog boxes to set/change the metadata)? Regardless of what you think of Microsoft's developers and/or Linux developers, if it was a potentially very useful feature then surely it'd exist everywhere already. Is it possible that all of these Microsoft/Linux developers don't think it's worthwhile (except perhaps in specific situations, like indexing a user's photo or music collection)?
mbluett wrote:In a lot of the general public's machines I have seen, they have files scattered all over the place. Sometimes there are copies of copies of copies. Sometimes this has occurred by accident. Once this occurs they have no idea if they can delete any of the files without losing something.

This kind of condition can be prevented when using a database as there is no need to make copies of anything unless it is being saved for backup purposes.
Prevented, or just changed? For your database to remain useful you need to rely on users to make sure the metadata is set correctly (e.g. when they download files without any metadata the user will need to waste a few minutes writing a description, setting keywords, etc). Would a sloppy user just end up with a badly organised database instead of a badly organised directory structure?
mbluett wrote:All applications can easily make use of the same data. Currently, this is not easily possible.
Sure. I'd be able to download an "MS office" spreadsheet and open it with a bitmap editor, because the bitmap editor would be able to understand the spreadsheet and can easily make use of the same data.

Maybe you meant "All applications can easily make use of the same metadata"? In that case, for normal file systems, everything already understands the metadata; it's just that the metadata is limited (e.g. it doesn't include a description of each file in Japanese or a list of keywords in Turkish).


I'm also wondering...

Without any support for normal hierarchical file systems, how are you intending to support removable media (floppies, CD-ROMs, USB flash, removable hard drives, etc.) - will it be possible to shift data between computers (e.g. from your OS to another OS, or from another OS to your OS)? What about things like NFS and/or the SMB/CIFS networking protocol; and archives (tar, zip), and standard programming tools (make, compilers, etc.)?

Unfortunately, "compatible" beats "technically better" in the marketplace; where "compatible" means compatible with the stuff people have invested their time and money in, including training (for normal users, system administrators, etc), purchased applications, software developed specifically for a company, etc. Most users aren't willing to throw away the time and money they've invested in current technologies. OS's like OS X and Linux are different to Windows so they have a hard time, but they're also very similar to Windows in many ways (same desktop metaphor, similar file system principles).
mbluett wrote:
Brendan wrote:To estimate how useful metadata is, see how many attempts it takes you to find a query for Google that finds less than 20 files including at least 5 (of the 7) Pentium 4 errata documents.

I tried about 20 queries before I succeeded, but I also spent 10 minutes looking through Google's documentation (advanced search and advanced operators) to narrow down the list. After you've tried, here's my best attempt: Google...

It takes me 5 mouse clicks to find "info/cpu/intel/pent4/errata/24919966.pdf" from a blank desktop. Unless Mbluett's SQL searches can improve that, then it's not making the OS easier to use for me, and probably not making the OS easier to use for a lot of other people too.
There are a number of issues here:

1. The fact that you had to hunt through various documents just means that Google does not search inside those documents for you. That problem is not an issue with an O/S database search.
Google does search inside the documents (more correctly, they index the contents of everything inside the document and then search the resulting index).
mbluett wrote:2. Another problem in making comparisons to Google searches is that your search results greatly depend on the websites' ability to organize information so that the public can find what they are looking for. There is a HUGE amount of inconsistency in this department. As well, there is a HUGE amount of duplication of the same articles, or of articles that make reference to the article you are looking for.
Not really - as long as Google's spider can find it, Google can index it, and it doesn't matter much where it is.

Google tracks some specific metadata for each document, including title, contents, keywords, file type, location and date. Google also supports relatively complex queries. The idea was to use this metadata to create a query that finds a specific file, in the same way that a user (and applications) on your OS would need to create a query that finds a specific file from similar metadata.
mbluett wrote: These issues can be overcome in a local database.
That's nice, but how? Are you going to have an "I'm looking for this data" field, so that I only need to click a "find all the data I'm looking for" button and don't need to find ways to construct optimal queries?
mbluett wrote:When you do this from your desktop, what are you searching for? The same thing as you searched for on Google? If so, and you happen to have this document socked away on your HD, then comparing the Google search to a search from your desktop is a little unfair, don't you think?
I normally don't need to search for anything because I keep my files in a logically structured hierarchical file system. I chose Pentium 4 Errata as an example to compare the user friendliness of searches against logically structured hierarchical file systems. Using Google search is unfair, because Google does a lot of work to create and maintain their metadata, while for my desktop there is no metadata (even if the OS supported it properly I'd be too lazy to waste time maintaining the metadata, so the thousands of files I've collected/downloaded would have no metadata at all).

Apart from that, IMHO using Google's metadata and thousands of servers to search a large number of files is a fair comparison to using a small number of computers (one) to search a small number of files. Although I should point out that by restricting my search to files that are within the "intel.com" domain name, I did effectively limit the scope of the search to a few computers (it wasn't a search of the entire internet, only Intel's web site).
mbluett wrote:1. Whole sentences that occur in multiple documents can be reduced to one copy.

You can't do that with the current file structure (at least not easily).

One advantage of this is that it would reduce the time needed to accomplish a search. Another is that it is very fast to present a list of the documents a sentence occurs in; a conventional file system is nowhere near as fast at this. I realize the difference would be small given the fast HDs these days; however, instant list production is much better than 10 sec (or longer) list production.
How exactly do you plan on implementing that?

For example, are you planning on having some sort of dictionary of words/sentences, where the metadata for every file contains a bitfield to indicate which words/sentences are present in the file? This is the best method I can think of (in terms of per-file overhead). In this case, if there are ten thousand words/sentences in the dictionary it'll cost you 1250 bytes for the bitfield per file. According to "find * | wc -l" I've got 398983 files on this computer (but again, I'm unusual because I only use this computer for OS development and another computer contains all my other files - most home users tend to use one computer for everything). That works out to 498728750 bytes (or 476 MiB) of metadata just for the bitfields alone.

Based on this, what made you think all of your metadata (and everything else - the indexes, the dictionary, etc.) will actually fit in memory? More specifically, what makes you think you won't need to read several GiB of metadata from disk for every search (every time any software attempts to open any file); and what makes you think this won't take longer than (for example) me searching the contents of every file within my "info/cpu/intel" directory with something like "grep"?
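For the record, here's that arithmetic as a sketch (the dictionary size and file count are the numbers from above):

# One presence bit per dictionary entry, per file.
dictionary_entries = 10000
files_on_disk = 398983  # from "find * | wc -l" above

bitfield_bytes = dictionary_entries // 8       # 1250 bytes per file
total_bytes = bitfield_bytes * files_on_disk   # 498728750 bytes
print(total_bytes / (1024.0 * 1024.0))         # ~475.6 MiB, bitfields alone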
mbluett wrote:2. The organizing is done by the database, not so much by the user.
How exactly do you plan on implementing that?

For example, I go on the internet and download a file called "latest.jpg" and your magic elves automatically decide that this data should be associated with the keywords "Ubuntu" and "screenshot"?
mbluett wrote:These are advantages and I am sure, in time, I could come up with more.
There are probably disadvantages too - usability problems and practical/implementation problems. Unfortunately, none of us (including you, I assume) have attempted to implement something like this yet, and therefore none of us are fully aware of those disadvantages yet. It's much easier to spot the disadvantages in normal filesystems because everyone's been using them for years. AFAIK the only people who have attempted to implement something like this are Microsoft (WinFS), and they failed - maybe they didn't have the resources needed to implement it properly, and gave up because they don't want their users to have better features...

With this in mind, have you considered implementing a prototype? For example, you could construct a utility that converts a file system into a database (e.g. implement code to insert a file into your database, as if it was downloaded from the internet or something), and then you could attempt to search for data within this database. It'd give you the chance to see how large the metadata becomes, to see how fast searches can be done, to test ideas, etc; and it'd be much quicker to do than writing an OS first; and it wouldn't be wasted time because things like this tend to be re-implemented several times before they work how you want them to work anyway.
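As a very rough sketch of what such a prototype could look like, using Python's built-in os and sqlite3 modules (the schema is a minimal guess, not a proposal):

import os
import sqlite3

con = sqlite3.connect("fsproto.db")
con.execute("""CREATE TABLE IF NOT EXISTS files
               (path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)""")

def ingest(root):
    # Insert every file under root, as if it had just arrived in the system.
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            st = os.stat(path)
            con.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                        (path, filename, st.st_size, st.st_mtime))
    con.commit()

ingest("info")  # e.g. the "info/cpu/intel" tree mentioned above

# Then measure how searches behave as the database (and metadata) grows.
for (path,) in con.execute("SELECT path FROM files WHERE name LIKE ?",
                           ("%errata%",)):
    print(path)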


Cheers,

Brendan

Re: Simplest Possible O/S Design

Posted: Mon Dec 01, 2008 10:19 pm
by Love4Boobies
Brendan wrote:For example, I go on the internet and download a file called "latest.jpg" and your magic elves automatically decide that this data should be associated with the keywords "Ubuntu" and "screenshot"?

[...]

There's probably disadvantages too - usability problems and practical/implementation problems. Unfortunately, none of us (including you, I assume) have attempted to implement something like this yet, and therefore none of us are fully aware of those disadvantages yet. It's much easier to spot the disadvantages in normal filesystems because everyone's been using them for years. AFAIK the only people who have attempted to implement something like this is Microsoft (WinFS), but they failed - maybe they didn't have the resources needed to implement it properly, and gave up because they don't want their users to have better features...
WinFS is still a research project. It was supposed to be included in Vista, but was considered not fast enough and in need of more work. It can be downloaded and used, however. The way it comes up with all the metadata is by using things like the Windows Organizer (or Calendar, or whatever it's called), e-mails, etc. For instance, if your organizer says that on a specific date you were on some business trip and you upload some pictures that were taken on that date, WinFS will assume that they were from that specific business trip. Sure, it won't know for sure and you can tell it if they're something else, but that's the main idea. I'm not sure many people will be using this. I mean, I'm a Windows user myself and I'm not even sure what the organizer is called, and I'm too lazy to reach for my mouse and check... Not to mention that many (many) users will have problems just thinking about Microsoft's software going through their e-mails, etc. (even if it's not really a privacy issue, they won't know that).
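The inference it makes is basically a date-range match, something like this sketch (the data and structures here are invented for illustration; this is not WinFS's actual API):

from datetime import date

# Invented sample data.
calendar = [(date(2008, 11, 3), date(2008, 11, 7), "business trip")]
photos = {"IMG_0412.jpg": date(2008, 11, 5),
          "IMG_0488.jpg": date(2008, 11, 22)}

def guess_event(taken):
    # Tag a photo with any calendar event whose date range covers it.
    for start, end, label in calendar:
        if start <= taken <= end:
            return label
    return None  # no match; the user can tag it by hand

for name in sorted(photos):
    print(name, "->", guess_event(photos[name]))
# IMG_0412.jpg -> business trip
# IMG_0488.jpg -> None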

Re: Simplest Possible O/S Design

Posted: Tue Dec 02, 2008 4:10 pm
by Craze Frog
I don't know if the database stuff really is a good idea, but I can say for sure that the best (even if terrible) implementation of a database filesystem for casual end-users has not yet been made.

Try to think about what casual users do with their file systems. They write documents and spreadsheets, they write and receive email, they listen to music, watch movies and download porn. All of these things can be tagged with meaningful tags in less time than it would take to create meaningful folders for them with current graphical interfaces.

I can't see why selecting tags when things are saved (when the document is saved, select or type in the tags "work", "top priority", "hasselhoff" instead of navigating to and creating the correct directories) would be such an enormous hassle. In fact I'd say it'd be easier on the users if they could give their files arbitrary (and also predefined) tags in any order rather than having to cram the files into a hierarchy. For emails, tagging with type=email and sender and such can be done completely automatically. The same with music.

Then you could also access the files in any order. You can get all work hasselhoffs, or all home hasselhoffs, or just all hasselhoffs (work or not). This is possible today, but it's very slow, because file systems are optimized for hierarchical search instead of tag search.
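For example, a tag store could be as simple as this sketch (Python's built-in sqlite3 module; the tags are the ones from above):

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (file_id INTEGER, tag TEXT);
    CREATE INDEX tags_by_tag ON tags (tag);  -- optimized for tag search
""")

def save(name, *tags):
    # "Saving" a document means picking tags instead of a directory.
    fid = con.execute("INSERT INTO files (name) VALUES (?)", (name,)).lastrowid
    con.executemany("INSERT INTO tags VALUES (?, ?)",
                    [(fid, tag) for tag in tags])

save("memo.doc", "work", "top priority", "hasselhoff")
save("wallpaper.jpg", "home", "hasselhoff")

# All work hasselhoffs: files carrying both tags, in any order.
rows = con.execute("""
    SELECT f.name FROM files f
    JOIN tags a ON a.file_id = f.id AND a.tag = 'work'
    JOIN tags b ON b.file_id = f.id AND b.tag = 'hasselhoff'
""")
print([name for (name,) in rows])  # ['memo.doc']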

IMHO, indexing and searching the actual contents of documents has little to do with whether you're using a database filesystem or not.

Re: Simplest Possible O/S Design

Posted: Wed Dec 03, 2008 2:15 am
by Love4Boobies
This concept has been tried before. It was even discussed here, but I do not agree with everyone there. Filesystems are not meant to be slow; they are designed to do just what they are needed for. Unfortunately, I don't really have time right now to go on about it, but I wanted to give you that link. I'll be back, though.

Re: Simplest Possible O/S Design

Posted: Sun Feb 15, 2009 10:00 am
by demirtos
OK, let's take a look at the RTOSes that are used on MCUs.

mbluett wrote: 1. No memory segmentation !!!!
2. No paging ??
Most of them don't even have segment and paging registers or an MMU.
mbluett wrote: 3. No traditional filesystem (instead use a single file which is a database containing various entities). These entities would not resemble the current files we are used to. However, the database would contain all the descriptive information (and more) that these files currently contain.
There are no DMA features; you can forget about buffering and file systems on an MCU.
mbluett wrote: 4. One large function library that all applications and the kernel use.
5. Library functions are loaded on demand and, based on this system design, would not necessarily have to be unloaded. With everything loaded there is a very good chance it would not exhaust all available memory (with the exception of machines that have under 1 Gig of RAM, which this O/S would not support anyway).
6. No windowing GUI
9. Drivers should run at a level so as to never prevent the Kernel or a user application from running. The most that would occur is that access to a particular device (or set of devices) would be compromised. If a failure of this type occurs, the driver would automatically be reloaded. The assumption is that most driver problems would be of an intermittent nature. Hopefully, any bad problems would be caught while the O/S is in development/testing.
There are no binary executable files; you can forget about the ELF and COFF formats, as all functions are linked statically.

Take a look at a simple MCU OS and write your "metadata" system; that is all you need.

Re: Simplest Possible O/S Design

Posted: Sun Feb 15, 2009 7:51 pm
by purage
mbluett wrote: 3. No traditional filesystem (instead use a single file which is a database containing various entities). These entities would not resemble the current files we are used to. However, the database would contain all the descriptive information (and more) that these files currently contain.
You mean like the Windows NTFS Master File Table?