
Why I don't like files

Posted: Fri Mar 09, 2012 12:05 pm
by AnoHito
Last year, I made a few posts on this forum under a different name. I switched names because I decided I might want to start using a name people can actually pronounce. :P It also matches the name I'm now using on my blog.

The last time I posted here, I said I was ready to start looking for help developing my new operating system, which would be significantly different from anything anyone had seen before, and make major improvements on existing OS design. That didn't really happen. I was planning on releasing all the details of what I'd done up until that point, so other people could comment and contribute. But then I decided things needed a little more work, and a little later I decided things needed a lot more work. Then I got busy with other projects, had some health issues, and more or less got nothing done for a really long time. I'm still busy on other projects at the moment, at least one of which will be a cool proof of concept for some of my design theories. I'm not really sure when I'll have time to actually implement my OS on a real world platform, so I thought it might be fun just to talk about some ideas I'm working on.

One of those ideas is to get rid of the concept of a file. At a certain level, I guess a file seems like a simple, logical solution to an obvious problem. You have data, you need to store it, you copy it to a stream, and you're done. Simple, elegant, functional... right? Yes and no. Like many simple ideas, files are not a concept that scales well. In a modern system, you will often have thousands of different file formats, and almost as many programs just to deal with them. Those programs may have limitations, bugs, inconsistencies, or insane overhead, or their source code may not be available.

The biggest and most obvious problem with the way we currently handle files is format support. Let's say you develop a cool new image format for people to use that provides better image quality at higher levels of compression. Even if the improvements you make are fairly obvious, you will have a difficult road ahead in terms of getting other applications to support it. The only programs initially capable of dealing with it will be the ones you write yourself, effectively making it nearly useless. This can be a seriously stifling factor for innovation. How often do people have cool ideas, but the barriers to implementing those ideas are just too high? We may never know until we remove those barriers.

The question then becomes just how exactly do we fix things? The problem as I see it is, interpreting and manipulating all stored data at the application level is a really lousy idea. It's just something that we're forced to do because all current operating systems don't do a very good job of dealing with code at a sub-application level, if they do it at all. Code, generally speaking, must exist inside an application to be used. The only exception is when operating systems allow you to load code from external modules. But that code is still ultimately run in a process, which is ultimately attached to a single executable file.

My operating system does not have processes, it has memory spaces. Those memory spaces can contain one or more threads, or none. All memory spaces can run any executable code that is currently loaded into the system, through the magic of fixed memory locations, although page based restrictions may be applied if desired. Given this model, the old rules no longer apply, and we can treat files a lot more like we treat an instance of a class. Let's look at how that works.

Instead of a file, we have a storage node. A storage node is a basic class that all classes that wish to store data on an external device must inherit from. I should point out that I was initially not a fan of even having a special class to deal with data on non-memory devices. I was convinced there was a good way to handle things that would allow you to map data from a class instance in memory to a device without having it inherit from a special class. But no matter what I did, I couldn't find a way to handle nested references to instances of other classes. It was fairly obvious I needed to have some way to store data from a class instance to a disk that wouldn't also involve storing every other class instance it held a reference to. But no matter how hard I tried, I couldn't think of anything that was actually workable, let alone something that would work better than just handling data stored on a device as a special case. And lord knows I tried. I scrapped more ideas on that one problem than everything else about the OS put together. As it stands though, I think what I ended up with is still vastly superior to what existing OS's do.

A storage node is designed to be the singular interface to data stored in a particular format on the system. You can create a storage node with the following command:

node = Storage::Node.new()

The new() method has an optional parameter for a device. If no device is specified, the node will default to storing data in memory.

In order to access data in the node, you can call its read or write method. For example:

Code:

node.write(:test, "data")
node.read(:test) #returns "data"

You can also access data as a stream by calling:

Code:

node.stream(:test) #returns a stream that the data can be read from
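For anyone who wants to play with the interface above, here is a minimal in-memory sketch in plain Ruby. It is an illustration under assumptions, not the author's implementation: the `MemoryDevice` class, its `store`/`fetch` methods, and the use of `StringIO` for streams are all hypothetical stand-ins for the default no-device case.

```ruby
require "stringio"

module Storage
  # Hypothetical default device: named data kept in a Hash in memory.
  class MemoryDevice
    def initialize
      @data = {}
    end

    def store(key, value)
      @data[key] = value
    end

    def fetch(key)
      @data.fetch(key) { raise KeyError, "nothing stored under #{key.inspect}" }
    end
  end

  class Node
    attr_reader :device

    # The device is optional; without one, data stays in memory.
    def initialize(device = nil)
      @device = device || MemoryDevice.new
    end

    def write(key, value)
      @device.store(key, value)
    end

    def read(key)
      @device.fetch(key)
    end

    # Wrap the stored string in a stream that can be read incrementally.
    def stream(key)
      StringIO.new(read(key))
    end
  end
end

node = Storage::Node.new
node.write(:test, "data")
node.read(:test)           # => "data"
node.stream(:test).read(2) # => "da"
```

Passing a different device object to `Storage::Node.new` would swap the backend without changing the `read`/`write`/`stream` calls.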

A storage node exists on only one device at a time, but it may be referenced in a path on other devices. To behave more like you would expect from a traditional OS, however, the default action when assigning a storage node to a path is to move it onto the device of the node you are assigning it to. For example:

Code:

disknode["path/to/node"] = node #node is now stored on the same device as disknode

If that isn't desirable, simply call:

Code:

disknode.link("path/to/node", node) #a new reference to node is created at the specified path, pointing to node on its current device

Should the node's device be unavailable when an attempt is made to load it, an exception will be thrown. Should the device be available but the node not be found on it, the reference will be purged automatically.
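The move-versus-link behaviour and the two failure cases can be modelled in a few lines. This is a self-contained toy sketch, separate from any other example here: `Device`, `DeviceUnavailable`, `move_to`, and the `available` flag are hypothetical names, and a real device would track its nodes persistently rather than in a Hash.

```ruby
module Storage
  class DeviceUnavailable < StandardError; end

  # Hypothetical device that records which nodes currently live on it.
  class Device
    attr_reader :name
    attr_accessor :available

    def initialize(name)
      @name = name
      @available = true
      @nodes = {}.compare_by_identity
    end

    def hold(node)
      @nodes[node] = true
    end

    def release(node)
      @nodes.delete(node)
    end

    def holds?(node)
      @nodes.key?(node)
    end
  end

  class Node
    attr_reader :device

    def initialize(device)
      @device = device
      @device.hold(self)
      @entries = {} # path => [kind, node]
    end

    # Default assignment: move the node onto this node's device.
    def []=(path, node)
      node.move_to(@device)
      @entries[path] = [:owned, node]
    end

    # link: leave the node where it is and only record a reference.
    def link(path, node)
      @entries[path] = [:link, node]
    end

    def [](path)
      kind, node = @entries.fetch(path)
      return node if kind == :owned
      raise DeviceUnavailable, "#{node.device.name} is offline" unless node.device.available
      return node if node.device.holds?(node)

      @entries.delete(path) # the node is gone: purge the stale reference
      nil
    end

    def move_to(device)
      return if device.equal?(@device)

      @device.release(self)
      device.hold(self)
      @device = device
    end
  end
end

disk = Storage::Device.new("disk")
usb = Storage::Device.new("usb")
disknode = Storage::Node.new(disk)

moved = Storage::Node.new(usb)
disknode["path/to/a"] = moved      # moved now lives on disk

linked = Storage::Node.new(usb)
disknode.link("path/to/b", linked) # linked stays on usb
```

Taking `usb` offline makes `disknode["path/to/b"]` raise `Storage::DeviceUnavailable`; if the node later disappears from a reachable device, the first lookup returns `nil` and drops the path.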

That is the interface we provide to data on the disk, but we still haven't really established how we actually handle things at a low level. The answer is, it doesn't really matter. At the device level, the device gets to decide how it stores data. It should, of course, avoid overhead to the greatest degree possible. You could, for example, figure out a scheme where storing individual strings of data in named locations wouldn't be such a grossly inefficient waste of space. I am of course aware that it is, but the idea is to provide an interface for the developer that will be as simple as possible to just get stuff implemented, and then work in optimizations in the margins. I don't want to get too deep into that particular aspect in this post, but it is something I've put some thought into.
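To put a rough number on the overhead mentioned above, here is a back-of-the-envelope sketch. It assumes a hypothetical device with 4096-byte blocks and compares giving every named entry its own block against packing all entries back to back as length-prefixed records. The record layout is invented for illustration; it is not the author's scheme.

```ruby
BLOCK_SIZE = 4096 # assumed device block size

def round_up_to_blocks(bytes)
  ((bytes + BLOCK_SIZE - 1) / BLOCK_SIZE) * BLOCK_SIZE
end

# Naive layout: each named entry gets its own block(s).
def naive_bytes(entries)
  entries.sum { |name, value| round_up_to_blocks(name.bytesize + value.bytesize) }
end

# Packed layout: 2-byte name length, name, 4-byte value length, value.
def pack_entries(entries)
  entries.map { |name, value| [name.bytesize, name, value.bytesize, value].pack("na*Na*") }.join
end

def packed_bytes(entries)
  round_up_to_blocks(pack_entries(entries).bytesize)
end

# Walk the packed records back into a Hash (names and values assumed UTF-8).
def unpack_entries(blob)
  entries = {}
  offset = 0
  while offset < blob.bytesize
    name_len = blob.byteslice(offset, 2).unpack1("n")
    name = blob.byteslice(offset + 2, name_len).force_encoding(Encoding::UTF_8)
    offset += 2 + name_len
    value_len = blob.byteslice(offset, 4).unpack1("N")
    value = blob.byteslice(offset + 4, value_len).force_encoding(Encoding::UTF_8)
    offset += 4 + value_len
    entries[name] = value
  end
  entries
end

entries = (0...100).to_h { |i| ["key#{i}", "value#{i}"] }
naive_bytes(entries)  # => 409600 (100 blocks)
packed_bytes(entries) # => 4096 (1,780 bytes of records fit in one block)
```

With 100 tiny entries, the naive layout burns 100 blocks while the packed one fits in a single block, which is the kind of margin optimization a device could make without the node interface changing.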

When we go back to the earlier example of the developer with a new image format, we haven't really solved their problem yet. They can develop their format as a storage node instead of an application, which does make it easier to implement in other software, though no more so than a simple extension module would, should the application support such a feature. However, in my OS we can take things a step further. If we can have a storage node class, we can have an image node class.

When you get down to it, the only thing modern OS's are really missing, in terms of not making our lives a living hell when dealing with different file formats, is a standard interface to basic data types. We already more or less know what an image is, and what we need to do in order to deal with it. The specifics of the format are what's in our way. What we need is a lowest-common-denominator class that every class on the system must inherit from to be considered an image. Here is more or less how things would work:

Code:

class MyImageClass < Image::Node #Image::Node itself inherits from Storage::Node

  def format(format)
    #return a string containing the image data in the format specified by "format". ":rgb" must be supported.
  end

  def width
    #return the image's width
  end

  def height
    #return the image's height
  end

end

And now we have an image format that is, at the very least, readable by every other class on the system. You can specifically code other classes that deal with the nuances of your format, but for the 99% of use cases where that stuff isn't needed, you are covered.
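In Ruby a class has exactly one direct superclass, so the `Image::Node < Storage::Node` chain is expressed by `Image::Node` inheriting from `Storage::Node`. The runnable sketch below is a hypothetical illustration of that hierarchy, not the author's code: `Storage::Node` is reduced to an empty base class, and the `PlainGrayImage` format and the `average_brightness` consumer are invented for the example.

```ruby
module Storage
  class Node; end # stand-in for the storage base class
end

module Image
  # Lowest-common-denominator interface every image class must implement.
  class Node < Storage::Node
    def format(_format)
      raise NotImplementedError, "#{self.class} must implement format(:rgb)"
    end

    def width
      raise NotImplementedError, "#{self.class} must implement width"
    end

    def height
      raise NotImplementedError, "#{self.class} must implement height"
    end
  end
end

# Hypothetical 8-bit grayscale format: one byte per pixel, row-major.
class PlainGrayImage < Image::Node
  attr_reader :width, :height

  def initialize(width, height, pixels)
    @width = width
    @height = height
    @pixels = pixels # String of width * height bytes
  end

  def format(format)
    raise ArgumentError, "unsupported format #{format.inspect}" unless format == :rgb

    # Expand each gray byte into an R, G, B triple.
    @pixels.bytes.flat_map { |g| [g, g, g] }.pack("C*")
  end
end

# A consumer that works with any Image::Node without knowing its format.
def average_brightness(image)
  rgb = image.format(:rgb).bytes
  rgb.sum.fdiv(rgb.size)
end

img = PlainGrayImage.new(2, 1, [0, 255].pack("C*"))
average_brightness(img) # => 127.5
```

The consumer only relies on `format(:rgb)`, so any other class that implements the same three methods would work with it unchanged.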

What really baffles me is that this is a really simple idea. I mean, it's so simple, I really can't figure out why the traditional file/application model has held up as long as it has. Sure, you can add support for new formats relatively easily in open source software, but it's still one hell of a lot more work than this is. Any thoughts on this, guys? I'm almost scared to take this approach for fear that if no one has tried it before, there might be a reason why.

Re: Why I don't like files

Posted: Fri Mar 09, 2012 12:31 pm
by Yoda
IMHO, you may get rid of the concept of a file at the application level, although it would be very difficult if we consider compatibility with all existing OSes and media. But at the storage level it is almost impossible, keeping in mind the interchange of information with the world.
I think that in an OS it is realizable as a set of libraries providing common abstractions: picture, sound, etc.

Re: Why I don't like files

Posted: Fri Mar 09, 2012 12:48 pm
by invalid
Could you explain more precisely the solution of "new format" problem?

Let's say I create a new image file format - which means that I write bytes in some order and then use some algorithm to de-code them into rgb/bitplane. How would a user receiving this file know the decoding algorithm? Do you want to attach some binaries or "uncompressed legacy" image to each single file?

Re: Why I don't like files

Posted: Fri Mar 09, 2012 12:51 pm
by FallenAvatar
ydoom wrote:Could you explain more precisely the solution of "new format" problem?

Let's say I create a new image file format - which means that I write bytes in some order and then use some algorithm to de-code them into rgb/bitplane. How would a user receiving this file know the decoding algorithm? Do you want to attach some binaries or "uncompressed legacy" image to each single file?
I think that the idea is you would "install" the format onto your computer once, and all applications would now be able to read it. Or in the example case, all applications dealing with images.

This is definitely an interesting idea, and treating nodes as files and vice versa for communication with "legacy" devices seems trivial. Hmm... I will think on this idea some and post back later.

- Monk

Re: Why I don't like files

Posted: Fri Mar 09, 2012 12:55 pm
by AnoHito
Yoda wrote:IMHO, you may get rid of the concept of a file at the application level,
And that's really the most important place to get rid of it, for all intents and purposes.
Yoda wrote:although it would be very difficult if we consider compatibility with all existing OSes and media.
In cases where backward compatibility is required, you could always write a wrapper class for an existing file format.
Yoda wrote:But at the storage level it is almost impossible, keeping in mind the interchange of information with the world.
Doesn't really matter since nothing exists anywhere in the OS specifications that is called a file. How a device stores data is up to the device. I don't see any major interchangeability issues, because at the most basic level it's still just data stored at named locations. It's just as possible to shoehorn storage nodes into files with a compatibility layer as the other way around. The key question is, is the compatibility barrier justified by what you gain? In this case I strongly think it is.
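To illustrate the "shoehorn storage nodes into files" direction, here is a hedged sketch of a compatibility layer that flattens a node's named entries into one legacy file and reads them back. JSON as the container and the `export_node`/`import_node` helpers are assumptions for illustration; values are Base64-encoded with Ruby's core `pack("m0")` so binary data survives the round trip.

```ruby
require "json"
require "tmpdir"
require "fileutils"

# Hypothetical compatibility layer: flatten a node's named entries into one
# legacy file. Values are Base64-encoded (pack "m0") so binary data survives JSON.
def export_node(entries, path)
  encoded = entries.to_h { |name, value| [name.to_s, [value].pack("m0")] }
  File.write(path, JSON.generate(encoded))
end

# Read such a file back into named entries (names become symbols again).
def import_node(path)
  JSON.parse(File.read(path)).to_h { |name, b64| [name.to_sym, b64.unpack1("m0")] }
end

dir = Dir.mktmpdir
path = File.join(dir, "node.json")
export_node({ test: "data", raw: "\xFF\x00".b }, path)
restored = import_node(path)
FileUtils.remove_entry(dir)
restored[:test] # => "data" (returned as a binary string)
```

Going the other way (presenting an existing file format as a node) is the wrapper case discussed later in the thread.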
Yoda wrote:I think that in an OS it is realizable as a set of libraries providing common abstractions: picture, sound, etc.
A library is the wrong way to think of it. The image class for example, contains no actual code. It just acts as a standard interface for developers.

Re: Why I don't like files

Posted: Fri Mar 09, 2012 1:11 pm
by bluemoon
AnoHito wrote:The biggest and most obvious problem with the way we currently handle files is format support. Let's say you develop a cool new image format for people to use that provides better image quality at higher levels of compression. Even if the improvements you make are fairly obvious, you will have a difficult road ahead in terms of getting other applications to support it. The only programs initially capable of dealing with it will be the ones you write yourself, effectively making it nearly useless.
This is not as big an issue as you may think.
Take movies, for example: the container format is usually generic, like ISO (which includes MP4, 3GP, MOV), AVI, MKV, or RMVB. What people improve is the codec, which can usually be installed on a modern platform, and suddenly every application can support that codec.

And there won't be too many formats, since the container itself has little room (or need) for improvement.
The question then becomes just how exactly do we fix things? The problem as I see it is, interpreting and manipulating all stored data at the application level is a really lousy idea.
What you need is a data handler that can be shared among applications, be that COM on Windows, a .so on *nix, or whatever; there is no problem with files.
It's just something that we're forced to do because all current operating systems don't do a very good
Not very good compared to what? You can't compare things with something that doesn't exist.
My operating system does not have processes, it has memory spaces. Those memory spaces can contain one or more threads, or none. All memory spaces can run any executable code that is currently loaded into the system, through the magic of fixed memory locations, although page based restrictions may be applied if desired. Given this model, the old rules no longer apply, and we can treat files a lot more like we treat an instance of a class. Let's look at how that works.
It may solve some problems, but I'm sure it introduces more issues.
It was fairly obvious I needed to have some way to store data from a class instance to a disk that wouldn't also involve storing every other class instance it held a reference to. But no matter how hard I tried, I couldn't think of anything that was actually workable, let alone something that would work better than just handling data stored on a device as a special case. And lord knows I tried. I scrapped more ideas on that one problem than everything else about the OS put together. As it stands though, I think what I ended up with is still vastly superior to what existing OS's do.
If you are trying to store nested class data, learn serialization techniques. Again, this has nothing to do with the underlying storage design, unless you are doing some kind of optimization.
When we go back to the earlier example of the developer with a new image format, we haven't really solved their problem yet. They can develop their format as a storage node instead of an application, which does make it easier to implement in other software, though no more so than a simple extension module would, should the application support such a feature. However, in my OS we can take things a step further. If we can have a storage node class, we can have an image node class.
So, instead of writing a codec one need to develop a storage driver?

Re: Why I don't like files

Posted: Fri Mar 09, 2012 1:15 pm
by Yoda
AnoHito wrote:Doesn't really matter since nothing exists anywhere in the OS specifications that is called a file. How a device stores data is up to the device.
Hmmm. Let's clear up some details. I have a flash drive with pictures, movies, music, texts... and I plug it into a box running your OS. How will apps running under your OS see and manage all this content?
AnoHito wrote:A library is the wrong way to think of it. The image class for example, contains no actual code. It just acts as a standard interface for developers.
But suppose you, for example, have developed a new image compression method. How are you supposed to share this method with OSes that don't yet support this format?

Re: Why I don't like files

Posted: Fri Mar 09, 2012 1:34 pm
by AnoHito
bluemoon wrote:This is not as big an issue as you may think.
Take movies, for example: the container format is usually generic, like ISO (which includes MP4, 3GP, MOV), AVI, MKV, or RMVB. What people improve is the codec, which can usually be installed on a modern platform, and suddenly every application can support that codec.

And there won't be too many formats, since the container itself has little room (or need) for improvement.
That's all good and fine until you start to question the wisdom of having all these container formats that do exactly the same thing with varying degrees of success. And have you ever tried to implement an mp4 parser? Not a whole lot of fun.
bluemoon wrote:What you need is a data handler that can be shared among applications, be that COM on Windows, a .so on *nix, or whatever; there is no problem with files.
We'd still have the endless mess of redundant file formats we have today. And realistically, my example is ludicrously easy to implement compared to those things.
bluemoon wrote:Not very good compared to what? You can't compare things with something that doesn't exist.
That's only fair I guess. My OS doesn't exist yet, so it is only in theory superior to modern OS's in this respect. But according to the theory, it handles code at a sub-application level very well.
bluemoon wrote:It may solve some problems, but I'm sure it introduces more issues.
For example?
bluemoon wrote:If you are trying to store nested class data, learn serialization techniques. Again, this has nothing to do with the underlying storage design, unless you are doing some kind of optimization.
I think you are misunderstanding the problem I was trying to solve. I was trying to figure out a way to deal with a case in which you ended up trying to store a reference that led to an endless chain of other references that you didn't necessarily want to store. And technically you can, it's just that any solution I could think of that allowed me to do so was also obviously a worse idea than just using storage nodes.
bluemoon wrote:So, instead of writing a codec one need to develop a storage driver?
In what way is implementing files as classes of nodes instead of in applications the same as making every developer write a driver? And furthermore, on Windows you used to have to write codecs as drivers (for audio/video), which just shows you how screwed up OS design can get if you try hard enough.
Yoda wrote: Hmmm. Let's clear up some details. I have a flash drive with pictures, movies, music, texts... and I plug it into a box running your OS. How will apps running under your OS see and manage all this content?
You'd need to write wrappers for the specific file types, or else they would default to streams. Any files in formats for which a wrapper could be found would be treated as nodes by the OS.
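One way to picture "wrappers for specific file types, or else streams" is a registry keyed on magic bytes. This is a hedged sketch: `WrapperRegistry`, `register`, `wrap`, and `PngStub` are hypothetical, and the author doesn't say how a wrapper would be chosen; only the 8-byte PNG signature itself is real. "Installing" a format once, as FallenAvatar described, would amount to one more `register` call.

```ruby
require "stringio"

# Hypothetical registry of wrappers that present legacy files as nodes.
module WrapperRegistry
  Wrapper = Struct.new(:name, :magic, :factory)
  @wrappers = []

  def self.register(name, magic, &factory)
    @wrappers << Wrapper.new(name, magic.b, factory)
  end

  # Use the first wrapper whose magic bytes match; otherwise fall back
  # to a plain stream over the raw bytes.
  def self.wrap(bytes)
    bytes = bytes.b
    wrapper = @wrappers.find { |w| bytes.start_with?(w.magic) }
    wrapper ? wrapper.factory.call(bytes) : StringIO.new(bytes)
  end
end

# Placeholder standing in for a real PNG image node.
PngStub = Struct.new(:bytes)

# Every PNG file starts with this 8-byte signature.
WrapperRegistry.register("png", "\x89PNG\r\n\x1A\n") { |bytes| PngStub.new(bytes) }

png = WrapperRegistry.wrap("\x89PNG\r\n\x1A\n".b + "rest of file")
text = WrapperRegistry.wrap("just some text")
png.class  # => PngStub
text.class # => StringIO
```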
Yoda wrote:But suppose you, for example, have developed a new image compression method. How are you supposed to share this method with OSes that don't yet support this format?
To put the feature into an OS with a traditional design, you'd have to do it the traditional way. I have no control over how other OSs handle things.

Re: Why I don't like files

Posted: Fri Mar 09, 2012 2:06 pm
by Yoda
AnoHito wrote:To put the feature into an OS with a traditional design, you'd have to do it the traditional way. I have no control over how other OSs handle things.
I meant sharing between different installations of your OS, not with other types of OS. What would it be other than libraries?

Re: Why I don't like files

Posted: Fri Mar 09, 2012 2:07 pm
by bluemoon
AnoHito wrote:And have you ever tried to implement an mp4 parser? Not a whole lot of fun.
Yes, I wrote a streaming server and broadcaster. The atom tree is in fact as simple as JSON, and Apple's documentation of the MOV container is quite complete.
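To back up the point that the atom (box) tree is simple, here is a hedged sketch of a box walker following the ISO base media file format header layout: a 32-bit big-endian size that includes the header, a four-character type, an optional 64-bit size when the 32-bit size is 1, and a size of 0 meaning the box runs to the end of the file (here, of the enclosing data). The container list is partial, and full boxes (version/flags), `uuid` types, and `meta` are deliberately not handled.

```ruby
# Box types whose payload is just more boxes (a partial list).
CONTAINERS = %w[moov trak mdia minf stbl dinf edts udta].freeze

# Walk ISO base media (MP4/MOV) boxes into a nested [type, children] tree.
def parse_boxes(data, offset = 0, limit = data.bytesize)
  boxes = []
  while offset + 8 <= limit
    size, type = data.byteslice(offset, 8).unpack("Na4")
    header = 8
    if size == 1
      break if offset + 16 > limit

      size = data.byteslice(offset + 8, 8).unpack1("Q>") # 64-bit largesize
      header = 16
    elsif size.zero?
      size = limit - offset # box runs to the end of the enclosing data
    end
    break if size < header || offset + size > limit # malformed or truncated: stop

    children = CONTAINERS.include?(type) ? parse_boxes(data, offset + header, offset + size) : []
    boxes << [type, children]
    offset += size
  end
  boxes
end

# Helper to build a box: 32-bit size (header included), type, payload.
def box(type, payload = "".b)
  [8 + payload.bytesize, type].pack("Na4") + payload
end

file = box("ftyp", "isom".b) + box("moov", box("trak", box("tkhd", "\x00".b * 4)))
tree = parse_boxes(file)
# tree is [["ftyp", []], ["moov", [["trak", [["tkhd", []]]]]]]
```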
AnoHito wrote:We'd still have the endless mess of redundant file formats we have today.
IMO having an infinite number of formats is not an issue. The user will decide what they need, and in reality only a couple of formats are widely used.
bluemoon wrote:It may solve some problems, but I'm sure it introduces more issues.
AnoHito wrote:For example?
The magic of fixed locations is the obvious concern. Who would organize those locations? Does the system generate them, or is there an organization that handles them, like Ethernet addresses?

Introducing new "process management" to tackle storage problems seems to overcomplicate things, and is usually not a good direction.
Anyway, it may not seem that bad now, since what we're talking about here is too abstract to identify any problems.
AnoHito wrote:I was trying to figure out a way to deal with a case in which you ended up trying to store a reference that led to an endless chain of other references that you didn't necessarily want to store.
Why is there an endless chain of references in the first place? Is that a bug or what?
A well-designed serializer does not try to serialize "back references", and there are techniques to serialize a circular loop of objects.
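The usual technique bluemoon alludes to is an identity table: give each object an id the first time it is visited and emit only that id on later encounters. Here is a minimal hedged sketch; `GraphNode`, `serialize`, and `deserialize` are hypothetical names, and the flat table of hashes stands in for whatever on-disk layout a real system would use.

```ruby
# Hypothetical graph node holding arbitrary references to other nodes.
class GraphNode
  attr_accessor :name, :refs

  def initialize(name)
    @name = name
    @refs = []
  end
end

# Flatten a graph into a table. Each object gets an id the first time it is
# seen; later encounters emit only that id, so cycles and back references
# are written once instead of being followed forever.
def serialize(root)
  ids = {}.compare_by_identity
  table = []
  visit = lambda do |node|
    return ids[node] if ids.key?(node)

    id = ids[node] = table.size
    entry = { name: node.name, refs: [] }
    table << entry
    entry[:refs] = node.refs.map { |child| visit.call(child) }
    id
  end
  visit.call(root)
  table
end

# Rebuild the graph: create every node first, then wire up references by id.
def deserialize(table)
  nodes = table.map { |entry| GraphNode.new(entry[:name]) }
  table.each_with_index do |entry, i|
    nodes[i].refs = entry[:refs].map { |id| nodes[id] }
  end
  nodes.first
end

a = GraphNode.new("a")
b = GraphNode.new("b")
a.refs << b
b.refs << a # cycle: a -> b -> a
table = serialize(a)
copy = deserialize(table)
```

This terminates on cycles, but it still stores everything reachable from the root, which is the part AnoHito's reply below objects to.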
AnoHito wrote:And furthermore, on Windows you used to have to write codecs as drivers (for audio/video), which just shows you how screwed up OS design can get if you try hard enough.
No, on Windows that's usually a DirectShow filter (a COM object) that executes at the same level as the user application. An alternative solution is ffmpeg, GStreamer, or Ogg, which you just pull in statically linked or as DLLs. Whenever you update the codec, you get new features in your application without recompiling.

Re: Why I don't like files

Posted: Fri Mar 09, 2012 2:14 pm
by AnoHito
Yoda wrote:I meant sharing between different installations of your OS, not with other types of OS. What would it be other than libraries?
Oh right, so in terms of the specific implementation of the image class, you would need to have the class installed on whatever system you wanted to use it on. That would be accomplished by a system similar to repositories on *nix.
bluemoon wrote:IMO having an infinite number of formats is not an issue. The user will decide what they need, and in reality only a couple of formats are widely used.
...until you consider the fact that there are already an unjustifiably huge number of formats out there that do the same thing and that are widely used, and often a user doesn't have a choice because they are stuck using an application that only supports a format they don't normally like to use.
bluemoon wrote:The magic of fixed locations is the obvious concern. Who would organize those locations? Does the system generate them, or is there an organization that handles them, like Ethernet addresses?
They aren't fixed across systems. The OS just guarantees that if code is loaded at a certain address in one memory space, it will be at the same address in all the others. On a different computer, or after restarting your computer, or even after the code is garbage collected, the guarantee no longer applies.
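A toy model of the guarantee described above: one system-wide map assigns each loaded module an address, and every memory space resolves code through that same map, so the address agrees everywhere until the module is unloaded. This is a hedged sketch, not the author's design: the page size, the base address, the bump allocation without reuse, and the `CodeMap`/`MemorySpace` names are all assumptions, and a real kernel would map the same physical pages into each space's page tables rather than share a Ruby object.

```ruby
# Hypothetical system-wide code map: each loaded module gets one address,
# and every memory space sees the module at that same address.
class CodeMap
  PAGE_SIZE = 4096

  def initialize(base = 0x4000_0000)
    @next_free = base
    @loaded = {} # module name => address
  end

  # Load a module, or return its address if it is already loaded.
  def load_module(name, size)
    @loaded[name] ||= begin
      address = @next_free
      pages = (size + PAGE_SIZE - 1) / PAGE_SIZE
      @next_free += pages * PAGE_SIZE # simple bump allocation, no reuse
      address
    end
  end

  # Once unloaded (e.g. garbage collected), the old address means nothing.
  def unload(name)
    @loaded.delete(name)
  end

  def address_of(name)
    @loaded.fetch(name)
  end
end

# A memory space resolves code through the one shared map rather than its own.
class MemorySpace
  def initialize(code_map)
    @code_map = code_map
  end

  def resolve(name)
    @code_map.address_of(name)
  end
end

map = CodeMap.new
map.load_module("image_codec", 10_000) # returns 0x4000_0000
map.load_module("shell", 100)          # returns 0x4000_3000 (10,000 bytes took 3 pages)
space_a = MemorySpace.new(map)
space_b = MemorySpace.new(map)
space_a.resolve("image_codec") == space_b.resolve("image_codec") # => true
```

Because the map lives only as long as the running system, nothing like a global registry of addresses is needed, which answers the Ethernet-address comparison.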
bluemoon wrote:Introducing new "process management" to tackle storage problems seems to overcomplicate things, and is usually not a good direction.
Anyway, it may not seem that bad now, since what we're talking about here is too abstract to identify any problems.
I'm not really sure what you mean. I didn't create a new form of process management just to tackle this issue. It is critical for a lot of aspects of the OS design, which I will get into in more detail later.
bluemoon wrote:Why is there an endless chain of references in the first place? Is that a bug or what?
A well-designed serializer does not try to serialize "back references", and there are techniques to serialize a circular loop of objects.
Because I was trying to account for the possibility of allowing any node for anything in the entire OS to be stored. I would have no way of knowing what the node would contain in advance. It could potentially contain any number of references to any amount of other nodes, and I would have no way to tell what needed to be stored and what didn't without the node telling me.
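The "node telling me" part can be sketched as each storable class declaring which fields it wants persisted, with references to other storable nodes recorded as links by id rather than stored inline. This is a hypothetical illustration: `StorableNode`, `persist`, `to_record`, and the `{ link: id }` convention are invented names, and the storage layer that would consume these records is not shown.

```ruby
# Hypothetical base class: each storable class declares the fields it wants
# persisted, so the storage layer never has to guess what to follow.
class StorableNode
  class << self
    def persist(*fields)
      @persistent_fields = fields
    end

    def persistent_fields
      @persistent_fields || []
    end
  end

  attr_reader :node_id

  def initialize(node_id)
    @node_id = node_id
  end

  # Store only declared fields. A field pointing at another storable node is
  # recorded as a link to that node's id instead of being stored inline.
  def to_record
    self.class.persistent_fields.to_h do |field|
      value = instance_variable_get("@#{field}")
      value = { link: value.node_id } if value.is_a?(StorableNode)
      [field, value]
    end
  end
end

class Album < StorableNode
  persist :title, :cover
  attr_accessor :title, :cover, :thumbnail_cache

  def initialize(node_id, title)
    super(node_id)
    @title = title
  end
end

class Picture < StorableNode
  persist :caption, :album
  attr_accessor :caption, :album
end

album = Album.new(1, "Holiday")
picture = Picture.new(2)
picture.caption = "Beach"
picture.album = album               # back reference to the album
album.cover = picture               # forward reference to the picture
album.thumbnail_cache = "x" * 1000  # not declared, so never stored
album.to_record   # title plus a link to node 2; the cache is skipped
picture.to_record # caption plus a link back to node 1
```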
bluemoon wrote:No, on Windows that's usually a DirectShow filter (a COM object) that executes at the same level as the user application. An alternative solution is ffmpeg, GStreamer, or Ogg, which you just pull in statically linked or as DLLs. Whenever you update the codec, you get new features in your application without recompiling.
What I'm talking about predates that stuff. Think Windows 95. There are still a few codecs out there that you have to install as drivers to get them supported in modern Windows.

Re: Why I don't like files

Posted: Fri Mar 09, 2012 2:55 pm
by piranha
How does this solve the underlying problem of files? Your OP says that the solution is to get rid of a concept of a file.
Wikipedia wrote:A computer file is a block of arbitrary information, or resource for storing information.
So how exactly is your method of storing information not a resource for storing information? It seems like you are just shuffling things around instead of actually solving the problem that you set out to solve. Okay, storage nodes and classes. But these still have to be stored somehow, and that is dependent on writing out some kind of data somewhere on disk in some format. So you're already back to where you started.

Oh, and the idea of files themselves is not flawed given your arguments, since file formats (at least, on sane operating systems) are stored as simple data in whatever way the programs that use those files require. A JPEG is just a bunch of bits, just like a text file. So in OS design, the file format is irrelevant. The problem instead lies within the applications that access them. So creating an interface would solve this? I don't really see how. If you create a new format, you'd still have to write code that interfaces with the format in the correct way, and getting every application to support this abstraction system is never going to happen.

To me it just seems to complicate things and not really change much in the end.

-JL

Re: Why I don't like files

Posted: Fri Mar 09, 2012 3:24 pm
by VolTeK
I like to think that if my design won't work because the tool doesn't allow it, then it's my design's fault and I need to go back to the drawing board.

Or build my own tool.


I like to think that if a design is such that the software I have made cannot run on it,

I need to redesign my software.

If I don't like files,

I'd better get over it and make a better design that uses the concept or improves on the idea, because my current one probably isn't working for a reason that is my own fault.

Re: Why I don't like files

Posted: Fri Mar 09, 2012 3:40 pm
by AnoHito
piranha wrote:How does this solve the underlying problem of files? Your OP says that the solution is to get rid of a concept of a file.
Wikipedia wrote:A computer file is a block of arbitrary information, or resource for storing information.
So how exactly is your method of storing information not a resource for storing information? It seems like you are just shuffling things around instead of actually solving the problem that you set out to solve. Okay, storage nodes and classes. But these still have to be stored somehow, and that is dependent on writing out some kind of data somewhere on disk in some format. So you're already back to where you started.
Not really. I did fundamentally eliminate the concept of a "file" from the OS itself. There is nothing in my OS called a file, and generally speaking no one who isn't writing a new storage node ever has to deal with the fact that a storage node is anything other than just another class. You may not be able to eliminate the concept entirely, but you can at least marginalize it to the point where you don't have to deal with it that much.

Keep in mind that there are two things I'm really trying to accomplish here, which are really more important than just not having something called a file. The first is to break down the distinction between data in memory and data on a device. My design does that fairly well, I think. The second is to create a framework that encourages developers to implement stored data in a way that makes it interoperable across the system, without exceptions. My design does that pretty well too. You can argue all you want about how there are tools that make dealing with multiple file formats manageable in other OS's, but you still can't account for new formats, or dealing with proprietary formats and applications. With my design, those are no longer issues. And I should also point out that getting things to the point they are now at, in terms of having useful mainstream applications that support a broad variety of formats, took an insane amount of work. If people had been doing it my way from the beginning, it would have been no work at all. How much more time should we waste on a "good enough" approach before we decide to look for something better?
piranha wrote:Oh, and the idea of files themselves is not flawed given your arguments, since file formats (at least, on sane operating systems) are stored as simple data in whatever way the programs that use those files require. A JPEG is just a bunch of bits, just like a text file. So in OS design, the file format is irrelevant. The problem instead lies within the applications that access them. So creating an interface would solve this? I don't really see how. If you create a new format, you'd still have to write code that interfaces with the format in the correct way, and getting every application to support this abstraction system is never going to happen.

To me it just seems to complicate things and not really change much in the end.
I think it changes quite a bit. Even if in the final analysis, it really doesn't change much beyond putting interfaces to files at the OS level, that one little change could make a lot of difference. How about opening a shell and typing:

Code:

video = ~/somevideo
image = video.frame[255]
image.resize(640,480)
~/mypicture = image

I guess you probably could do that with the support of several top-heavy applications in *nix. But you have to admit, the reality of having it be just that simple, that you could actually type that into the shell and it would work, is pretty sweet. If you can show me a shorter, simpler way to do that in any other OS, I'll be forced to agree that I'm not simplifying anything.
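The shell session above assumes video nodes that expose frames as image nodes, and image nodes that can resize themselves. Here is a hedged Ruby sketch of what `frame` and `resize` might look like: `VideoNode`, `RgbImage`, the nearest-neighbour algorithm, and the synthetic in-memory frames are all assumptions, and decoding a real video is out of scope. `resize` mutates in place to match the shell example.

```ruby
# Hypothetical decoded image: packed 8-bit RGB, row-major.
class RgbImage
  attr_reader :width, :height, :pixels

  def initialize(width, height, pixels)
    @width, @height, @pixels = width, height, pixels
  end

  # Nearest-neighbour resize, done in place as in the shell example.
  def resize(new_width, new_height)
    out = String.new(capacity: new_width * new_height * 3, encoding: Encoding::BINARY)
    new_height.times do |y|
      src_y = y * @height / new_height
      new_width.times do |x|
        src_x = x * @width / new_width
        out << @pixels.byteslice((src_y * @width + src_x) * 3, 3)
      end
    end
    @width, @height, @pixels = new_width, new_height, out
    self
  end
end

# Hypothetical video node: frames are exposed as image nodes.
class VideoNode
  attr_reader :frame # so that video.frame[255] works

  def initialize(frames)
    @frame = frames
  end
end

# A synthetic 300-frame, 4x2 video where frame i is gray level i % 256.
frames = Array.new(300) do |i|
  RgbImage.new(4, 2, ([i % 256] * 3 * 4 * 2).pack("C*"))
end
video = VideoNode.new(frames)
image = video.frame[255]
image.resize(640, 480)
image.width           # => 640
image.pixels.bytesize # => 921600
```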
VolTeK wrote:I like to think that if my design won't work because the tool doesn't allow it, then it's my design's fault and I need to go back to the drawing board.
99% of the time you'd be absolutely right. I have made numerous attempts to reinvent the wheel in my OS design, and most met with spectacular failure. The thing is, if you keep trying new approach after new approach, sooner or later you might find something that actually is a real improvement. Or not. But if you don't try, you'll never know. To be honest, my approach to sidestepping the concept of files is one of the aspects of my OS design I am least confident in. That's why I decided to go with it for my first post. The thing is, it's easy to argue that you should keep doing things a certain way because that's the way it's always been done. Try turning it around. Try starting with the assumption that every computer out there uses my theoretical OS, and you want to improve things by making all stored data in the system into files. Is that an improvement?

Re: Why I don't like files

Posted: Fri Mar 09, 2012 3:51 pm
by bubach
Good thread, nice to see some new and fresh takes on things no matter if this is actually going to work well in practice or not.