Why I don't like files
Posted: Fri Mar 09, 2012 12:05 pm
Last year, I made a few posts on this forum under a different name. I switched names because I decided I might want to start using a name people can actually pronounce, and it matches the name I'm now using on my blog.
The last time I posted here, I said I was ready to start looking for help developing my new operating system, which would be significantly different from anything anyone had seen before and would make major improvements on existing OS design. That didn't really happen. I was planning on releasing all the details of what I'd done up until that point, so other people could comment and contribute. But then I decided things needed a little more work, and a little later I decided things needed a lot more work. Then I got busy with other projects, had some health issues, and more or less got nothing done for a really long time. I'm still busy on other projects at the moment, at least one of which will be a cool proof of concept for some of my design theories. I'm not really sure when I'll have time to actually implement my OS on a real-world platform, so I thought it might be fun just to talk about some ideas I'm working on.
One of those ideas is to get rid of the concept of a file. At a certain level, I guess a file seems like a simple, logical solution to an obvious problem. You have data, you need to store it, you copy it to a stream, and you're done. Simple, elegant, functional... right? Yes and no. Like many simple ideas, files are not a concept that scales well. In a modern system, you will often have thousands of different file formats, and almost as many programs just to deal with them. Said programs may have limitations, bugs, inconsistencies, or insane overhead, and their source code may not be available.
The biggest and most obvious problem with the way we currently handle files is format support. Let's say you develop a cool new image format for people to use that provides better image quality at higher levels of compression. Even if the improvements you make are fairly obvious, you will have a difficult road ahead in terms of getting other applications to support it. The only programs initially capable of dealing with it will be the ones you write yourself, effectively making it nearly useless. This can be a seriously stifling factor for innovation. How often do people have cool ideas, but the barriers to implementing said ideas are just too high? We may never know until we remove those barriers.
The question then becomes: just how exactly do we fix things? The problem as I see it is that interpreting and manipulating all stored data at the application level is a really lousy idea. It's just something that we're forced to do because current operating systems don't do a very good job of dealing with code at a sub-application level, if they do it at all. Code, generally speaking, must exist inside an application to be used. The only exception is when operating systems allow you to load code from external modules. But that code is still ultimately run in a process, which is ultimately attached to a single executable file.
My operating system does not have processes; it has memory spaces. Those memory spaces can contain one or more threads, or none. All memory spaces can run any executable code that is currently loaded into the system, through the magic of fixed memory locations, although page-based restrictions may be applied if desired. Given this model, the old rules no longer apply, and we can treat files a lot more like we treat an instance of a class. Let's look at how that works.
Instead of a file, we have a storage node. A storage node is a basic class that all classes that wish to store data on an external device must inherit from. I should point out, I was initially not a fan of even having a special class to deal with data on non-memory devices. I was convinced there was a good way to handle things that would allow you to map data from a class instance in memory to a device without having it inherit from a special class. But no matter what I did, I couldn't find a way to handle nested references to instances of other classes. It was fairly obvious I needed to have some way to store data from a class instance to a disk that wouldn't also involve storing every other class instance it held a reference to. But no matter how hard I tried, I couldn't think of anything that was actually workable, let alone something that would work better than just handling data stored on a device as a special case. And lord knows I tried. I scrapped more ideas on that one problem than on everything else about the OS put together. As it stands though, I think what I ended up with is still vastly superior to what existing OSes do.
A storage node is designed to be the singular interface to data stored in a particular format on the system. You can create a storage node with the following command:
```ruby
node = Storage::Node.new()
```
The new() method has an optional parameter for a device. If no device is specified, the node will default to storing data in memory.
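To give a rough idea of what that default could look like, here's a minimal sketch. Everything inside it is an assumption made up for illustration: MemoryDevice, its store/fetch methods, and the way Node delegates to its device are not taken from the real OS. Only the optional device argument and the write/read calls come from this post.

```ruby
# Hypothetical sketch: these internals are invented for illustration only.
module Storage
  # Stand-in for "memory" as a device: just a hash living in the program.
  class MemoryDevice
    def initialize
      @slots = {}
    end

    def store(name, data)
      @slots[name] = data
    end

    def fetch(name)
      @slots[name]
    end
  end

  class Node
    attr_reader :device

    # device is optional; with no argument the node keeps its data in memory.
    def initialize(device = nil)
      @device = device || MemoryDevice.new
    end

    # Both calls simply hand the work to whatever device the node lives on.
    def write(name, data)
      @device.store(name, data)
    end

    def read(name)
      @device.fetch(name)
    end
  end
end

node = Storage::Node.new                               # no device given: memory
explicit = Storage::Node.new(Storage::MemoryDevice.new) # device passed in by hand
node.write(:test, "data")
```

The point of routing everything through the device object is that the node itself never needs to know whether it is sitting in memory or on a disk.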
In order to access data in the node, you can call its read or write method. For example:

```ruby
node.write(:test, "data")
node.read(:test) # returns "data"
```

You can also access data as a stream by calling:

```ruby
node.stream(:test) # returns a stream that the data can be read from
```

A storage node exists on only one device at a time, but it may be referenced in a path on other devices. For the sake of behaving more like you would expect from a traditional OS, however, the default action when assigning a storage node to a path is to move it onto the same device as the node to which you are assigning it. For example:

```ruby
disknode["path/to/node"] = node # node is now stored on the same device as disknode
```

If that isn't desirable, simply call:

```ruby
disknode.link("path/to/node", node) # a new reference to node was created in the specified path, which points to node on its current device
```

Should the node's device be unavailable when an attempt is made to load it, an exception will be thrown. Should the device be available but the node not be found on it, the reference will be purged automatically.

That is the interface we provide to data on the disk, but we still haven't really established how we actually handle things at a low level. The answer is, it doesn't really matter. At the device level, the device gets to decide how it stores data. It should, of course, avoid overhead to the greatest degree possible. You could, for example, figure out a scheme where storing individual strings of data in named locations wouldn't be such a grossly inefficient waste of space. I am of course aware that it is one, but the idea is to provide an interface for the developer that is as simple as possible so that stuff just gets implemented, and then work in optimizations in the margins. I don't want to get too deep into that particular aspect in this post, but it is something I've put some thought into.

When we go back to the earlier example of the developer with a new image format, we haven't really solved their problem yet. They can develop their format as a storage node instead of an application, which does make it easier to implement in other software, though no more so than a simple extension module would, should the application support such a feature. However, in my OS we can take things a step further. If we can have a storage node class, we can have an image node class.

When you get down to it, the only thing modern OSes are really missing in terms of not making our lives a living hell when dealing with different file formats is a standard interface to basic data types. We already more or less know what an image is, and what we need to do in order to deal with it. The specifics of the format are what's in our way. What we need is a lowest common denominator class that every class on the system must inherit from to be considered an image. Here is more or less how things would work:

```ruby
# Image::Node itself inherits from Storage::Node
class MyImageClass < Image::Node
  def format(format)
    # return a string containing the image data in the format specified by "format". ":rgb" must be supported.
  end

  def width
    # return the image's width
  end

  def height
    # return the image's height
  end
end
```

And now we have an image format that is, at the very least, readable by every other class on the system. You can specifically code other classes that deal with the nuances of your format, but for the 99% of use cases where that stuff isn't needed, you are covered.
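To make the "readable by everything" part a bit more concrete, here's a rough sketch of some code consuming an image through nothing but the base interface. Treat it as an illustration under assumptions: SolidColorImage, average_brightness, and the stub bodies of Storage::Node and Image::Node are all invented here, and I'm assuming ":rgb" means one byte each for red, green, and blue per pixel. Only the format/width/height interface itself is from the design.

```ruby
# Hypothetical sketch: Storage::Node and Image::Node are empty stand-ins.
module Storage
  class Node; end
end

module Image
  # Lowest common denominator: subclasses must supply format, width, height.
  class Node < Storage::Node
    def format(_format)
      raise NotImplementedError, "#{self.class} must implement format"
    end

    def width
      raise NotImplementedError, "#{self.class} must implement width"
    end

    def height
      raise NotImplementedError, "#{self.class} must implement height"
    end
  end
end

# A made-up "format": a solid-colour image that stores one pixel and a size.
class SolidColorImage < Image::Node
  attr_reader :width, :height

  def initialize(width, height, red, green, blue)
    @width = width
    @height = height
    @pixel = [red, green, blue].pack("C3") # three bytes, one per channel
  end

  # Only :rgb is supported; the single stored pixel is repeated for every position.
  def format(format)
    raise ArgumentError, "unsupported format: #{format}" unless format == :rgb

    @pixel * (width * height)
  end
end

# Generic consumer: knows nothing about SolidColorImage, only the base interface.
def average_brightness(image)
  bytes = image.format(:rgb).bytes
  bytes.sum / bytes.size.to_f
end

img = SolidColorImage.new(4, 2, 10, 20, 30)
puts average_brightness(img) # prints 20.0
```

Note that average_brightness never has to be touched when someone invents a new format; as long as the new class answers format(:rgb), it just works.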
What really baffles me is that this is a really simple idea. I mean, it's so simple, I really can't figure out why the traditional file/application model has held up as long as it has. Sure, you can add support for new formats relatively easily in open source software, but it's still one hell of a lot more work than this is. Any thoughts on this, guys? I'm almost scared to take this approach for fear that if no one has tried it before, there might be a reason why.