Why I don't like files

Brendan · Post by **Brendan** » Fri Mar 09, 2012 5:42 pm

Hi,

Essentially, what (I think) you're suggesting is to replace files with objects; so that applications don't need to understand the file's format and only need to understand the class' methods.

The problem with this approach is that you can't call a method that doesn't exist. For example, if your kernel's "image" class only has a "resize()" method and your application wants to rotate the image, then you're screwed. The kernel would have to have a class for every type of file, and each of these classes would need a method for every possible operation that anyone could ever want.

This would make applications much simpler. However, the kernel would need to be a massive, bloated mess full of functionality that someone might want but most people never actually use. For example, for image files alone, what methods would your kernel need? The basic things (display, crop, resize, etc.) probably aren't too bad, but all the various transformations (blur, sharpen, emboss, etc.) are stretching it, and then there are things like OCR.
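Here is a minimal Python sketch of the objection. It assumes a hypothetical kernel-provided `Image` class whose only operation is `resize()`; the class and the `try_rotate` helper are illustrative, not taken from any real system. An application that needs an operation the class author didn't anticipate has nothing to call:

```python
class Image:
    """Hypothetical kernel-provided image object with a fixed set of methods."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height

    def resize(self, width: int, height: int) -> None:
        # The only operation this (hypothetical) kernel class offers.
        self.width = width
        self.height = height


def try_rotate(img: Image) -> str:
    # The application wants an operation the class author never provided.
    if hasattr(img, "rotate"):
        img.rotate(90)
        return "rotated"
    return "no rotate() method: the kernel class would have to grow one"


img = Image(640, 480)
img.resize(320, 240)
print(try_rotate(img))
```

Every new operation means changing the shared class, which is the bloat being described.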

Then, on top of this you add single-level store; so now you've a class for every type of file, a method for every possible operation, plus every scrap of code and data mapped into the virtual address space somewhere; all of which is "just in case someone might want it one day" (and 95% of it would be unused 95% of the time). My prediction is that you're going to have to implement it for true 64-bit CPUs, because the 48-bit virtual addresses that 80x86 supports in long mode aren't going to be enough to cope with the overwhelming quantities of bloat.
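For scale, here is the arithmetic behind the 48-bit remark. On x86-64 with 4-level paging, virtual addresses are 48 bits wide and must be canonical. That splits the usable range into two 128 TiB halves, with the lower one conventionally used for user space. A quick Python check:

```python
TIB = 2**40  # tebibyte
EIB = 2**60  # exbibyte

va_48 = 2**48          # bytes addressable with 48-bit virtual addresses
lower_half_48 = 2**47  # canonical lower half, conventionally user space
va_64 = 2**64          # a full 64-bit virtual address space

print(f"48-bit: {va_48 // TIB} TiB total, {lower_half_48 // TIB} TiB per canonical half")
print(f"64-bit: {va_64 // EIB} EiB total, {va_64 // va_48}x the 48-bit space")
```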

Cheers,

Brendan

Gigasoft · Post by **Gigasoft** » Fri Mar 09, 2012 6:36 pm

This is actually somewhat similar to what I have been planning.

The problem with traditional operating systems is that every application is responsible for providing menus for loading and saving, asking the user which file to open, figuring out the file type, dealing with filesystem paths, and handling the contents of files. This has the result that different vendors' applications vary in the conditions under which they will function. For example, one video editing application might use DirectShow filters to load videos, while another might use Video for Windows codecs, and a third can only load a fixed set of formats that the developer chose to implement. Some applications will refuse to load and save files to a deeply nested location, and others will not be able to load a file whose name contains foreign letters.

My aim is to standardize common tasks so that applications will not implement anything unrelated to what these applications actually do. Most importantly, they will not be responsible for passing information between the user and the operating system. Applications will therefore not need to have access to everything the user has access to. Instead, they will access files and other resources only by the explicit direction of the user using well defined operating system interfaces. Various file formats are handled using interfaces registered with specific file extensions or other methods of naming file types. By knowing which format handlers are registered for a named file format, and which applications are registered as accepting that interface type, the system knows which applications can be used to perform an action on a certain file, and can present those applications as menu choices when that file is selected. Other data sources besides files are possible too, of course, as long as they provide a mechanism for presenting the appropriate interface to an application. For example, an image editing application might be a data source for another application that uses images. To the image editing application, it doesn't matter whether it is saving an image file or passing the image to another application, as this is handled with no involvement on the part of the application developer. These data sources are a bit like the OP's "storage nodes".
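The lookup described here fits in a few lines of Python. Everything below is hypothetical: `register_handler`, `register_app`, `menu_choices`, and the format, interface and application names. The only point is that the menu falls out of matching the interfaces a format's handlers provide against the interfaces each application accepts:

```python
from collections import defaultdict

# Hypothetical registries:
#   format name -> interfaces its registered handlers can present
#   application name -> interfaces it accepts
format_handlers: dict[str, set[str]] = defaultdict(set)
app_accepts: dict[str, set[str]] = defaultdict(set)


def register_handler(file_format: str, interface: str) -> None:
    format_handlers[file_format].add(interface)


def register_app(app: str, interface: str) -> None:
    app_accepts[app].add(interface)


def menu_choices(file_format: str) -> list[str]:
    """Applications that can act on a file of this format, for the shell's menu."""
    offered = format_handlers.get(file_format, set())
    return sorted(app for app, wanted in app_accepts.items() if wanted & offered)


register_handler("png", "Image")
register_handler("wav", "Audio")
register_app("PaintProgram", "Image")
register_app("Slideshow", "Image")
register_app("SoundEditor", "Audio")

print(menu_choices("png"))  # ['PaintProgram', 'Slideshow']
```

Adding a handler for a new format immediately widens the menu for every application that accepts the matching interface, without touching those applications.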

For applications that operate on several data sources or outputs at once, a special UI element called a "connector" will be provided. The user drags data sources to input connectors to select the input for the application. Permanent connections to named items can also be made, for applications that use the same data sources repeatedly. These connections are then stored in the user's profile. Distinct data objects can also refer to each other. In this case, they are either collected in a "composite file" and refer to each other or they refer to a path relative to an external location. Those referring to external sources have an associated configuration record storing the locations of external resources on the user's computer in the same manner as for an application that has a permanent data connection.
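Here is a rough Python sketch of the connector idea. It assumes that a connector is a named slot requiring one interface type, and that the user profile is a simple mapping. `Connector`, `Profile` and `connect` are illustrative names, not part of any existing API:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Connector:
    """A named input slot on an application (hypothetical)."""
    name: str
    interface: str                 # interface a connected source must present
    source: Optional[str] = None   # currently connected data source, if any


@dataclass
class Profile:
    """Per-user store of permanent connections: (app, connector) -> source."""
    permanent: dict[tuple[str, str], str] = field(default_factory=dict)


def connect(app: str, conn: Connector, source: str, source_iface: str,
            profile: Optional[Profile] = None) -> None:
    """What the shell might do when the user drops a source on a connector."""
    if source_iface != conn.interface:
        raise TypeError(f"{source} offers {source_iface}; "
                        f"{conn.name} needs {conn.interface}")
    conn.source = source
    if profile is not None:
        # A permanent connection is remembered in the user's profile.
        profile.permanent[(app, conn.name)] = source


profile = Profile()
left = Connector("left", "Image")
connect("Compositor", left, "holiday.png", "Image", profile)
print(profile.permanent)  # {('Compositor', 'left'): 'holiday.png'}
```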

Stored data (apart from that which originates from another operating system) will always have an unambiguously identified format type, using a combination of vendor and product ID, thereby eliminating the name collisions inherent in the standard 3-letter extension scheme. This also helps the operating system to automatically download and install the correct software.
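This small Python sketch shows why a (vendor, product) pair avoids the collisions that plain extensions suffer from. The `FormatId` type and the numeric IDs are invented for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatId:
    """Format type named by (vendor ID, product ID) instead of an extension.
    The numeric IDs used below are made up for illustration."""
    vendor: int
    product: int


# Two unrelated formats that would both be "*.dat" under the extension scheme.
by_extension = {"dat": "GameSaveFormat"}
by_extension["dat"] = "SensorLogFormat"   # silently replaces the first entry

# Keyed by FormatId, the two formats stay distinct.
by_id = {
    FormatId(vendor=0x1234, product=0x0001): "GameSaveFormat",
    FormatId(vendor=0x5678, product=0x0001): "SensorLogFormat",
}

print(len(by_extension), len(by_id))  # 1 2
```

Because the dataclass is frozen, instances are hashable and compare by value, so they work directly as dictionary keys.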

However, no matter what kind of solution one ends up with, although it may be complicated under the hood, it's crucial that one makes it as easy as possible for the application developer to implement things in the correct manner according to one's design. If it's easier to program things in the same way as they've done for the past 10 years, chances are they'll do just that.

Gigasoft · Post by **Gigasoft** » Fri Mar 09, 2012 6:55 pm

Brendan wrote:The problem with this approach is that you can't call a method that doesn't exist. For example, if your kernel's "image" class only has a "resize()" method and your application wants to rotate the image, then you're screwed. The kernel would have to have a class for every type of file, and each of these classes would need a method for every possible operation that anyone could ever want.

This would make applications much simpler. However, the kernel would need to be a massive, bloated mess full of functionality that someone might want but most people never actually use. For example, for image files alone, what methods would your kernel need? The basic things (display, crop, resize, etc.) probably aren't too bad, but all the various transformations (blur, sharpen, emboss, etc.) are stretching it, and then there are things like OCR.

I doubt that this is what the OP intended. An "image" class would typically contain functions to access the image data, whereas the application would be responsible for actually manipulating it. If the system can load any type of image file format for which a handler is installed, the mission is accomplished. For shell script usage, having a few toy commands such as "resize" might not be such a bad idea, though.
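Here is a minimal Python sketch of that split. It assumes a hypothetical system `Image` object that only exposes its dimensions and pixel access. The rotation Brendan asked about then lives in the application and needs no new method on the class:

```python
from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical system image object: it only exposes access to pixels."""
    width: int
    height: int
    pixels: list[list[int]]   # pixels[y][x], one grey value per pixel

    def get(self, x: int, y: int) -> int:
        return self.pixels[y][x]


def rotate_90_clockwise(img: Image) -> Image:
    """Application-side operation built only on the access functions.
    New pixel (nx, ny) comes from old pixel (ny, height - 1 - nx)."""
    new_w, new_h = img.height, img.width
    rows = [[img.get(ny, img.height - 1 - nx) for nx in range(new_w)]
            for ny in range(new_h)]
    return Image(new_w, new_h, rows)


img = Image(3, 2, [[1, 2, 3], [4, 5, 6]])
print(rotate_90_clockwise(img).pixels)  # [[4, 1], [5, 2], [6, 3]]
```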

AnoHito · Post by **AnoHito** » Sat Mar 10, 2012 9:59 am

berkus wrote:Didn't read all the comments, only the OPs post so pardon if I repeat someone else's words.

1) You seem to favor passive objects, which is nice and dandy, but requires very fine grained locking to be able to scale in the SMP case.
2) You seem to favor object inheritance instead of interfaces, which is not a scalable feature either.
3) When you have interfaces implemented by components you don't really bother about file formats and storage anymore. Once your component is able to support some basic interfaces (like Image related ones), applications in general couldn't care less what format it uses in the back end, and whether it's backed by files, storage nodes, hyperwave clusters or proton clouds.

1) What do you mean by SMP? Or passive objects for that matter.
2) I wasn't aware inheritance vs. interfaces was a choice. Or that it wasn't scalable. Care to elaborate?
3) Pretty much, yes.

berkus wrote:This is called Single Address Space. And it has its own limitations, of course. But I also see a lot of benefits in using it.

Ok, good term to know I guess. I think the biggest drawback with using single address space is running out of virtual addresses. On 32-bit systems that's more or less a deal breaker, but on 64-bit systems, even though you can't really use the whole 64-bit virtual address space, it's not really an issue. And on my OS, there are several elements of the design that do demand a single address space architecture. Having a simple mechanism for sharing nodes across memory spaces, for example, is one of my design goals. That's another aspect of my design that proved to be much more difficult to get down than I ever could have anticipated, but I have a pretty good idea of how I'm going to approach it now.

Hey look, I found a paper from 20 years ago suggesting that it might not be such a bad idea to switch to single address space for 64-bit OS's. I might want to read that some time.

Gigasoft wrote:This is actually somewhat similar to what I have been planning.

The problem with traditional operating systems is that every application is responsible for providing menus for loading and saving, asking the user which file to open, figuring out the file type, dealing with filesystem paths, and handling the contents of files. This has the result that different vendors' applications vary in the conditions under which they will function. For example, one video editing application might use DirectShow filters to load videos, while another might use Video for Windows codecs, and a third can only load a fixed set of formats that the developer chose to implement. Some applications will refuse to load and save files to a deeply nested location, and others will not be able to load a file whose name contains foreign letters.

This is what I'm saying. Seriously, this is nuts. Why should things be this inconsistent and convoluted? We pretty obviously need something better than the status quo here.

Gigasoft wrote:However, no matter what kind of solution one ends up with, although it may be complicated under the hood, it's crucial that one makes it as easy as possible for the application developer to implement things in the correct manner according to one's design. If it's easier to program things in the same way as they've done for the past 10 years, chances are they'll do just that.

My thoughts exactly. In order to get people to adopt something, it has to make their lives easier. And not just a little easier either. To get people to give up decades of established software, you have to do a lot better. I strongly think I'm onto something in that respect. I think I'm getting to the point where I could say it would take 1/10 as much code on average to program something in my OS as it would in C. And the resulting code would be simpler, more reusable, and more maintainable by a significant margin.

Gigasoft wrote:I doubt that this is what the OP intended. An "image" class would typically contain functions to access the image data, whereas the application would be responsible for actually manipulating it. If the system can load any type of image file format for which a handler is installed, the mission is accomplished. For shell script usage, having a few toy commands such as "resize" might not be such a bad idea, though.

Basically, yea. Having a resize command baked in is a convenience thing, and I wouldn't go overboard with stuff like that. What I would actually do in practice for this kind of thing is to have an Image::Filter module that contained classes with typical image manipulation capabilities like resize or blur. Only the really commonly used stuff would go into methods of the image class, for the sake of convenience.
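Something like the Image::Filter module could be sketched as a set of small filter classes that share one `apply()` method. The Python below is an assumption about how that might look, not AnoHito's design. It stores greyscale images as lists of rows, uses nearest-neighbour resizing, and has an `Invert` filter standing in for blur and friends:

```python
class Resize:
    """Hypothetical Image::Filter-style class: nearest-neighbour resize
    of a greyscale image stored as a list of rows."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height

    def apply(self, rows: list[list[int]]) -> list[list[int]]:
        src_h, src_w = len(rows), len(rows[0])
        # Each output pixel samples the source pixel it maps back onto.
        return [[rows[y * src_h // self.height][x * src_w // self.width]
                 for x in range(self.width)]
                for y in range(self.height)]


class Invert:
    """Another filter with the same apply() shape; 8-bit grey values assumed."""

    def apply(self, rows: list[list[int]]) -> list[list[int]]:
        return [[255 - v for v in row] for row in rows]


def run_filters(rows: list[list[int]], *filters) -> list[list[int]]:
    """Apply filters left to right, feeding each one the previous result."""
    for f in filters:
        rows = f.apply(rows)
    return rows


print(run_filters([[0, 255], [255, 0]], Resize(4, 2), Invert()))
# [[255, 255, 0, 0], [0, 0, 255, 255]]
```

Because every filter has the same shape, the image class itself only needs the few conveniences that are used everywhere.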

I should also mention that my OS does not really favor implementing functionality in applications. What I mean by that is, when you want to write code that actually does stuff, you have to fit it into the OS's class hierarchy. In my OS an application is nothing but another interface to functional code. In fact, all "applications" in my system must be subclasses of a class in the Interface module. Interfaces are intended to get stuff done by using code that exists elsewhere in the system. In theory, if you can do it in an application, you should be able to do it in the shell, or any other component in the system. You never write code just for one application; you write it for the system itself.
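Here is a toy Python sketch of "an application is nothing but another interface to functional code". The `Interface` base class, `word_count` and `shell_wc` are hypothetical stand-ins. The real work is a system-level function, and both the application subclass and a shell command are thin front ends to it:

```python
def word_count(text: str) -> int:
    """Functional code lives in the system, not in any one application."""
    return len(text.split())


class Interface:
    """Base class every 'application' derives from (name assumed)."""

    def invoke(self, *args):
        raise NotImplementedError


class WordCounterApp(Interface):
    # The application only presents system code to the user.
    def invoke(self, text: str) -> str:
        return f"{word_count(text)} words"


def shell_wc(text: str) -> int:
    # A shell command reaches the same code without going through the app.
    return word_count(text)


print(WordCounterApp().invoke("files are just objects"))  # 4 words
print(shell_wc("files are just objects"))                  # 4
```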

AnoHito · Post by **AnoHito** » Sat Mar 10, 2012 1:29 pm

berkus wrote:The memory spaces that may run zero or more threads in them are called passive objects (in contrast to active objects, which create and maintain their own threads in order to service clients). SMP is short for Symmetric MultiProcessing. I hope you're intelligent enough to realize the need for synchronization.

Don't confuse ignorance with stupidity.

I'm familiar with the concept, I just didn't recognize the acronym. But even then, I'm still not sure what you mean by saying "requires very fine grained locking to be able to scale". I'm not really doing anything that much different from existing OS's, which have mostly the same challenges when it comes to multi-threading. I'm not doing crazy stuff like allowing threads to hop memory spaces. Actually I considered that, but I couldn't even think of a reason it was a good idea to try it. On my system when you create a thread, you have to specify a memory space, and if none is specified, the memory space of the current thread is used. The thread is then locked to that memory space, and there is no way to change it. I think that's mostly the same as in current OS's, except in my case you could launch a thread into a foreign memory space if you had a valid reference to it.
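The thread model described above can be sketched in Python. `MemorySpace`, `Thread` and `CURRENT_SPACE` are hypothetical names, and the sketch captures only the two rules stated in the post. First, the space defaults to the caller's. Second, there is no public way to rebind it afterwards. Passing a different space object plays the role of launching into a foreign memory space through a valid reference:

```python
from typing import Callable, Optional


class MemorySpace:
    """Hypothetical address-space object."""

    def __init__(self, name: str) -> None:
        self.name = name


# Stand-in for "the memory space of the current thread".
CURRENT_SPACE = MemorySpace("shell")


class Thread:
    """A thread bound to exactly one memory space for its whole life."""

    def __init__(self, entry: Callable[[], None],
                 space: Optional[MemorySpace] = None) -> None:
        # No space given: inherit the creating thread's space.
        self._space = space if space is not None else CURRENT_SPACE
        self.entry = entry

    @property
    def space(self) -> MemorySpace:
        # Read-only property with no setter: the binding is fixed at creation.
        return self._space


local = Thread(lambda: None)
foreign = Thread(lambda: None, MemorySpace("editor"))
print(local.space.name, foreign.space.name)  # shell editor
```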

berkus wrote:A component may implement many interfaces without ever being a subclass of any base class, thus no inheritance. In OS design this is a more flexible and scalable choice. Inheritance has many implementation related issues, that take some effort to implement in flexible and language-agnostic way.
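berkus's point about components implementing interfaces without inheritance maps fairly directly onto structural typing. Here is a small Python sketch using `typing.Protocol`; the `Readable`, `HasSize` and `RamBlob` names are made up:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Readable(Protocol):
    def read(self) -> bytes: ...


@runtime_checkable
class HasSize(Protocol):
    def size(self) -> int: ...


class RamBlob:
    """Implements both interfaces structurally; it derives from no base class."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def read(self) -> bytes:
        return self._data

    def size(self) -> int:
        return len(self._data)


blob = RamBlob(b"abc")
print(isinstance(blob, Readable), isinstance(blob, HasSize))  # True True
print(RamBlob.__mro__)  # only RamBlob and object: no inheritance involved
```

Note that the `runtime_checkable` `isinstance` check only verifies that the methods exist, not their signatures.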

Yea, about that... I'm already writing an OS that is heavily object oriented. I can't call it a pure object oriented OS, because it still supports rudimentary data types, but I actually intend to handle those through some compiler magic that allows you to pretend they are objects most of the time. The bottom line is, I'm already writing something I know won't allow me to support any existing software, and as such my interest in supporting a variety of languages is limited. Right now the plan is a Ruby derivative for compiled code (Ruby being the best pure object oriented language available, in my opinion of course), and a Ruby shell. I can worry about supporting other languages if and when the OS actually becomes useful.

berkus wrote:I'm using a hybrid model where SAS is per-node only and cross-node addressing is translated.

Per-node SAS? I'm not exactly sure how you define a node in your OS, so I'm not sure exactly what you mean by this. If you mean having a single memory space that is only consistent across threads that are mapped to it, that wouldn't really be any different from current OS's.

aod · Post by **aod** » Mon Mar 12, 2012 12:30 pm

Is this idea something like the good old datatypes from AmigaOS?
http://en.wikipedia.org/wiki/Amiga_supp ... #Datatypes

Solar · Post by **Solar** » Tue Mar 13, 2012 3:10 am

Somewhat similar, yes.

Essentially, AmigaOS Datatypes enabled applications to implement "open file through datatype" or "save file through datatype", and if you had a GIF datatype, all those applications could open / save GIF images. You add a datatype for PNG, and all applications could open / save those, too.
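Here is a Python sketch in the spirit of that mechanism. It is not the real AmigaOS datatypes API. The registry holds format recognisers keyed on a file's leading bytes, and adding one entry makes every caller of `identify()` understand the new format. The GIF and PNG signatures are the real ones; the registry and function names are assumptions:

```python
from typing import Optional

# Hypothetical registry: (datatype name, accepted leading-byte signatures).
DATATYPES: list[tuple[str, tuple[bytes, ...]]] = [
    ("GIF", (b"GIF87a", b"GIF89a")),
]


def identify(data: bytes) -> Optional[str]:
    """Return the first registered datatype whose signature matches."""
    for name, signatures in DATATYPES:
        if any(data.startswith(sig) for sig in signatures):
            return name
    return None


def add_datatype(name: str, *signatures: bytes) -> None:
    # Installing a datatype extends every application that uses identify().
    DATATYPES.append((name, signatures))


png_header = b"\x89PNG\r\n\x1a\n" + b"rest of file"
print(identify(png_header))  # None: no PNG datatype installed yet
add_datatype("PNG", b"\x89PNG\r\n\x1a\n")
print(identify(png_header))  # PNG
```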

Radian · Post by **Radian** » Tue Mar 13, 2012 4:58 am

AnoHito wrote: It was fairly obvious I needed to have some way to store data from a class instance to a disk that wouldn't also involve storing every other class instance it held a reference to. But no matter how hard I tried, I couldn't think of anything that was actually workable, let alone something that would work better than just handling data stored on a device as a special case.

There is a similar problem in the database world: how to actually store class instances and the like on disk. The dominant model in the database world is the relational database (because it works well). In relational theory, everything is an entity; an entity is then broken down into files (which may be, but need not be, the same as the OS's files), and each file is organized into records and fields. Relational theory fits nicely with the conventional file system.

Then came the OO fans, who tried to treat everything as classes and objects. There is nothing wrong with OO, up until the point where they need to store the classes and objects they created on disk for future use. What a class or object should really look like on disk is their problem. Many of them ended up with just an ORM, i.e. an Object-Relational Mapper.
AFAIK, there is no object-oriented database yet. If one ever comes into existence, maybe you could learn from it.
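For reference, the core of an ORM is just a mapping between an object's fields and a table row. Here is a deliberately tiny Python sketch using the standard `sqlite3` module; the `Note` class and the `save`/`load` helpers are illustrative:

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Note:
    id: int
    title: str
    body: str


def save(conn: sqlite3.Connection, obj) -> None:
    """Map a dataclass instance to one row of a table named after its class."""
    table = type(obj).__name__.lower()
    cols = [f.name for f in fields(obj)]
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", astuple(obj))


def load(conn: sqlite3.Connection, cls, row_id: int):
    """Rebuild an object from its row; only flat fields are handled."""
    table = cls.__name__.lower()
    row = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (row_id,)).fetchone()
    return cls(*row) if row is not None else None


conn = sqlite3.connect(":memory:")
save(conn, Note(1, "todo", "write ORM"))
print(load(conn, Note, 1))  # Note(id=1, title='todo', body='write ORM')
```

This sketch leaves out fields that reference other objects. Storing such references without dragging every referenced instance along is exactly the problem AnoHito describes above, and it is where real ORMs spend most of their complexity.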

Jezze · Post by **Jezze** » Tue Mar 13, 2012 8:33 am

You mean a mainstream one I hope? There are a bunch of oo databases out there.

Anyway I like this thread. I can't really imagine how it would work under the hood but the concept is interesting.

Would a program execute by running something like:

```
$ myProgram.run();
```

as well?

brain · Post by **brain** » Tue Mar 13, 2012 4:49 pm

This would actually make sense if methods were commands and the object to apply the method to was their parameter, obj-C style. So, for example, "run myprogram", not "myprogram.run".

AnoHito · Post by **AnoHito** » Tue Mar 13, 2012 7:54 pm

Jezze wrote: Would a program execute by running something like:
```
$ myProgram.run();
```
as well?

Originally my thinking was to do things that way, but later on I realized that it was a poor design choice. You always want the method you are using to be attached to the object that is doing the thing you are asking it to do. So for example movie.play() is a bad idea, because the movie doesn't play itself. You need a separate player class for that. Likewise, myProgram.run() should be something more like Shell.run(myProgram), since the shell is what's actually doing the launching. This design is really critical for making sure class interdependency moves down a hierarchy. Otherwise you start having issues where you develop insane class dependency structures, which can be problematic in an object oriented OS.
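Here is a minimal Python sketch of that rule, with hypothetical `Shell`, `Player`, `Program` and `Movie` classes. The passive objects carry no methods that depend on their actors, so dependencies point one way only:

```python
from dataclasses import dataclass


@dataclass
class Program:
    """Passive data: a program does not launch itself."""
    name: str


@dataclass
class Movie:
    """Passive data: a movie does not play itself."""
    title: str


class Shell:
    """The actor that launches programs; depends on Program, not vice versa."""

    def __init__(self) -> None:
        self.launched: list[str] = []

    def run(self, program: Program) -> None:
        self.launched.append(program.name)


class Player:
    """The actor that plays movies; Movie never needs to know about Player."""

    def play(self, movie: Movie) -> str:
        return f"playing {movie.title}"


shell = Shell()
shell.run(Program("myProgram"))          # instead of myProgram.run()
print(shell.launched)                    # ['myProgram']
print(Player().play(Movie("Metropolis")))  # playing Metropolis
```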

Solar · Post by **Solar** » Wed Mar 14, 2012 7:42 am

AnoHito wrote:This design is really critical for making sure class interdependency moves down a hierarchy. Otherwise you start having issues where you develop insane class dependency structures, which can be problematic in an object oriented OS.

I assume that you are familiar with writings like Inheritance Tax by Jon Skeet, talking about the problems you can run into when you want to solve everything via OO?

Radian · Post by **Radian** » Wed Mar 14, 2012 9:19 pm

Jezze wrote: You mean a mainstream one I hope? There are a bunch of oo databases out there.

Yep, there are OODBMSs, though their use cases are still limited.

But that's databases. It is interesting to see OO concepts being brought into OS design, storage design in this case.

AnoHito · Post by **AnoHito** » Thu Mar 15, 2012 6:04 pm

So on the whole inheritance vs interface thing, without turning this post into another lengthy design exposition, I'd just like to say that for my OS, inheritance just makes sense. I even went so far as to make a rare break from Ruby's paradigm and allow for multiple inheritance, because I think there are cases where it's the best solution for the problem. That being said, I know there are issues with relying too heavily on inheritance, which is why my OS proposes a general design philosophy of keeping classes extremely simple and modular. Generally speaking, I think that if you do things right, inheritance should usually work. But I'm not precluding the possibility of using interfaces where it makes sense. It's just that for the things that I've designed so far, inheritance has always seemed like a good solution.

AnoHito · Post by **AnoHito** » Fri Mar 16, 2012 6:34 pm

And I like mix-ins. I fully intend to implement them in my OS. I just think inheritance has its place too. I may eventually decide to drop multiple inheritance if it creates too many practical issues, but I'm still not convinced it will if used with restraint. Besides, mix-ins don't solve every problem with multiple inheritance. You can still have two methods named the same thing in two different mix-ins.
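The name clash with mix-ins is easy to demonstrate. Python is used here purely for illustration. Ruby resolves such clashes with its own lookup order (the most recently included module wins), which this sketch does not model. Take two hypothetical mix-ins that both define `describe()`: Python quietly picks the first base in the method resolution order unless the class resolves the conflict itself:

```python
class Serializable:
    def describe(self) -> str:
        return "Serializable.describe"


class Printable:
    def describe(self) -> str:
        return "Printable.describe"


class Document(Serializable, Printable):
    # Both mix-ins define describe(); the first base in the MRO wins silently.
    pass


class ExplicitDocument(Serializable, Printable):
    def describe(self) -> str:
        # Resolving the clash by hand, naming each mix-in explicitly.
        return f"{Serializable.describe(self)} + {Printable.describe(self)}"


print(Document().describe())                   # Serializable.describe
print([c.__name__ for c in Document.__mro__])  # ['Document', 'Serializable', 'Printable', 'object']
print(ExplicitDocument().describe())           # Serializable.describe + Printable.describe
```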

OSDev.org
