OSDev.org

Posted: **Thu Jul 09, 2015 2:59 pm**

File systems put names on things you open, read, write, seek and close.

I originally posted these three articles to http://www.osnews.com eons ago:

http://williamedwardscoder.tumblr.com/p ... ilesystems
http://williamedwardscoder.tumblr.com/p ... ic-streams
http://williamedwardscoder.tumblr.com/p ... -mime-type

Posted: **Thu Jul 09, 2015 3:39 pm**

Hi,

Schol-R-LEA wrote:@SpyderTL: are you familiar with Project Xanadu and the concepts behind xanalogical document storage? One of my primary goals in Kether is to implement xanalogical storage (independent of the current PX code base, but with the idea licensed from Nelson) rather than a file system.

For those unfamiliar with it - which will probably be the majority of those reading this - the idea is simple, but difficult to implement, and often poorly explained (including by Ted Nelson, sadly; while he's good at inspiring enthusiasm, he has a lot of trouble getting the idea across). From a technical perspective, the core of a xanalogical system is a collection of out-of-band internal hyperlinks to fragments of data, each of which holds the information about the source, creation date, size, ownership, visibility and format of the data fragment. Each document consists of several such links.

Your explanation makes perfect sense to me, and it actually sounds like a good idea with multiple benefits.

However; I'd replace the word "document" with "file", replace "data segment" with "extent", and replace "logical address" with "extent ID". Basically; instead of saying this:

Documents consist of one or more data segments, where each data segment has a logical address and the same data segment can be used by multiple documents.

It becomes this:

Files consist of one or more extents, where each extent has an ID and the same extent can be used by multiple files.

This makes it easier to understand for people familiar with existing file systems; and lets them concentrate on the differences in how it works rather than the differences in terminology used to describe it (which I suspect is part of the reason why Ted Nelson has trouble explaining it well).
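To make the "files consist of extents" wording concrete, here is a minimal in-memory sketch. The names `ExtentStore` and `File` are hypothetical and chosen only for illustration; nothing here comes from Xanadu or from any real file system, and a real design would also need persistence, reference counting and the per-extent metadata described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExtentStore:
    """Hypothetical in-memory store: extent ID -> immutable bytes."""
    extents: dict[int, bytes] = field(default_factory=dict)
    next_id: int = 1

    def add(self, data: bytes) -> int:
        """Store a new extent and return its ID."""
        extent_id = self.next_id
        self.extents[extent_id] = data
        self.next_id += 1
        return extent_id


@dataclass
class File:
    """A file is only an ordered list of extent IDs; it owns no data."""
    extent_ids: list[int]

    def read(self, store: ExtentStore) -> bytes:
        # The contents are assembled from the store each time they are read.
        return b"".join(store.extents[i] for i in self.extent_ids)


store = ExtentStore()
greeting = store.add(b"Hello, ")
world = store.add(b"world!")
osdev = store.add(b"OSDev!")

# Both files reference the same "greeting" extent.
file_a = File([greeting, world])
file_b = File([greeting, osdev])
```

Here `file_a` and `file_b` share one extent, so the shared bytes are stored once and each file is just a list of IDs.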

The problems I can see with the idea (the "world wide distributed system" part of it) are that:

- it needs some way for software to generate the "permanent universal IDs" that avoids conflicts; where both DNS names and IP addresses can change and some systems simply don't have either.
- it needs some way to ensure the "permanent universal IDs" actually are permanent (or at least, to ensure the ID doesn't disappear while there's still something in the world that refers to it).
- it needs to be able to withstand malicious participants (e.g. changing an extent's contents without changing its ID or version).

Note that these problems would be easy to solve if it was (e.g.) a distributed file system for one specific OS rather than something intended to be world wide.

Cheers,

Brendan

Posted: **Thu Jul 09, 2015 5:07 pm**

This reminds me of magnet: links for BitTorrent.

The hash of the file is used as a globally unique identifier. So you can request any part of any file by sending a request for a specific hash and block index. If you were to index your locally stored files, and perhaps your local network files by their hash, you could request any file, regardless of where it was located using a single value.

This only works for read-only files, since the hash changes every time you modify it... But you could probably get around this by hashing several other values that do not change.
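A rough sketch of the "request by hash and block index" idea, assuming SHA-256 over the whole content and a purely local in-memory index. The `HashIndex` class, the `BLOCK_SIZE` value and the sample content are all made up for illustration; this is a simplification and not how BitTorrent itself derives its identifiers.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


class HashIndex:
    """Hypothetical index: hex SHA-256 of the whole content -> content bytes."""

    def __init__(self) -> None:
        self._by_hash: dict[str, bytes] = {}

    def add(self, content: bytes) -> str:
        """Index some content and return the hash that now identifies it."""
        digest = hashlib.sha256(content).hexdigest()
        self._by_hash[digest] = content
        return digest

    def read_block(self, digest: str, block_index: int) -> bytes:
        """Return one block of the content identified by the given hash."""
        content = self._by_hash[digest]
        start = block_index * BLOCK_SIZE
        return content[start:start + BLOCK_SIZE]


index = HashIndex()
file_id = index.add(b"bootloader image")

# Changing even one byte produces a different identifier, which is why
# this only works directly for read-only content.
modified_id = index.add(b"bootloader image!")
```

The caller only ever needs `file_id` and a block number; where the bytes actually live is the index's problem.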

Assuming all of this already existed, you could start up a machine with no OS, and NetBoot by requesting a boot loader from the network, which would request the kernel from the network, and so on.

That would be an interesting project.

Posted: **Thu Aug 13, 2015 7:42 pm**

Sorry for the delay in getting back to you on this; I have been occupied with other things, and in any case wanted a chance to collect my thoughts on the matter, since it isn't as simple as it sounds.

Brendan wrote:Your explanation makes perfect sense to me, and it actually sounds like a good idea with multiple benefits.

However; I'd replace the word "document" with "file", replace "data segment" with "extent", and replace "logical address" with "extent ID". Basically; instead of saying this:

Documents consist of one or more data segments, where each data segment has a logical address and the same data segment can be used by multiple documents.

It becomes this:

Files consist of one or more extents, where each extent has an ID and the same extent can be used by multiple files.

This makes it easier to understand for people familiar with existing file systems; and lets them concentrate on the differences in how it works rather than the differences in terminology used to describe it (which I suspect is part of the reason why Ted Nelson has trouble explaining it well).

While I can see your point about using the familiar terms, the problem is that it can be misleading. Indeed, the terms I used here are not the Xanadu team's terminology, but terms I chose because they were more familiar. Nelson et al. deliberately coined a unique set of terms to put some conceptual distance between what they were doing and the conventional systems, not out of hubris or to make simple ideas sound exotic, but to avoid confusion with the existing ideas. Why? Because while 'documents' sound a lot like 'files', they are actually quite different.

I would actually say that xanalogical documents are more like derived relvars (views) on a schema (database) than they are like files, since they only exist as a conceptual structure when they are being used, and are generated at runtime each time they are accessed. To push this analogy further, a data segment is more like a tuple (a record) than a file extent, and a logical link is more like a key than a FAT entry or i-node. However, even this analogy breaks down, as there is no equivalent of relations (tables) or attributes (column fields) in Xanadu, at least not as a fixed aspect of the system.

Note that my juxtaposition of the formal and informal terms for RDBMS elements is deliberate, because it illustrates a related problem: the difference between concept, representation, presentation and transformer (or, if you prefer, model, instantiation, view, and controller). While we usually use the informal terms for describing databases, this can cause problems, because it leads to confusing the representation with what it represents (and a presentation with what it displays), which is why Codd came up with the formal terms in the first place. A relation is not a table; you can display a relation as a table of rows and columns, but the table just happens to be an easy visual presentation for the data relationships. The relation is a higher-order idea, and can be presented in several ways, tables just being the easiest for most people to casually understand; for that matter, 'table' is just as misleading when applied to the data structure representation, as there are several ways of representing a 'table'. Trying to force-fit relations into the structure of tables leads to all kinds of conceptual mismatches, and even though 95% of the time it can be kinda-sorta made to work, it causes serious issues at the edge cases.

To use yet another analogy, one of the problems that comes up in both mathematics and physics is the use of 'familiar' terms for formal structures that only bear a weak resemblance to the informal ideas whose name they overload. In physics, a lot of problems occur when you use terms like 'particle' and 'wave' to describe quantum mechanical fields, because while fields seem from the outside to have properties of a wave at times, and properties of a particle at others, they are neither waves nor particles, and their behavior is actually self-consistent - it only seems confusing when you impose the ideas of 'wave' and 'particle' on them. While physicists still use the term 'particle', their definition of 'particle' is a formalism that is quite different from the usual understanding of what a particle is, and in many ways it would be better if they stopped using the term in this context entirely, because it leads to conceptual confusion. The same occurs when discussing the terms 'line' and 'point' in non-Euclidean geometry; once you stop applying Euclid's fifth postulate, you end up with 'lines' and 'points' that behave very differently from the familiar ideas of what lines and points are.

While it would be possible to use 'file' and 'extent' for these ideas, and simply say that they are formalisms for Xanadu and that 'everybody knows' the difference between Xanadu files and the files seen elsewhere, it would lead to the same kinds of conceptual mismatches that you get when trying to understand a line in elliptic geometry from the perspective of plane geometry.

I'll try to get back to the other half of your post later.

Posted: **Fri Aug 14, 2015 2:35 pm**

bluemoon wrote:The main operations on a file include: open, close, read, write and seek (thanks alexfru). These pretty much cover everything in the world, or do they?

Then you have a special cdrom that acts as a file, but you need to support a function to eject the disc.
Then you have a linear frame buffer that acts as a file, but you need a way to adjust the video mode.
Then you have a special sound card that acts as a file, but you need to provide device control functions like volume and a mechanism for injecting audio filters.

Then you want to have event-based notifications besides the basic read/write (e.g. the user presses the eject button).

All those extra controls are not unified into the file concept (although you can assign functions in ioctl, it just doesn't make sense to eject a disc on a sound card).

On Linux, the OS gives up and the file concept works just like IPC. You then open a data transfer file and a control file and do it yourself for everything other than a real file.

Actually I believe that the way modern UNIX systems do this is wrong. One should be able to eject the disk via a "cdrom device control file", change the video mode by writing to a "video mode setting file" and control the sound card volume by writing to a "sound channel volume file" (one for each sound channel - left, right, etc.). I guess they get halfway there with /proc and /sys, but it could be better.
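As a toy illustration of the "one control file per setting" idea, here is a sketch in which writing text to a path is dispatched to a handler. Everything in it is hypothetical: the `ControlFS` class, the paths and the command strings are invented for the example and do not correspond to any real /proc or /sys entries.

```python
from typing import Callable


class ControlFS:
    """Hypothetical namespace mapping control-file paths to write handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[str], None]] = {}

    def register(self, path: str, handler: Callable[[str], None]) -> None:
        self._handlers[path] = handler

    def write(self, path: str, text: str) -> None:
        # Writing to an unregistered path fails, so a sound card simply
        # has no "eject" file to write to.
        if path not in self._handlers:
            raise FileNotFoundError(path)
        self._handlers[path](text.strip())


# Stand-in for device state that real drivers would hold.
state = {"volume_left": 50, "tray_open": False}


def set_left_volume(text: str) -> None:
    # Clamp the requested volume to the 0..100 range.
    state["volume_left"] = max(0, min(100, int(text)))


def cdrom_control(text: str) -> None:
    if text != "eject":
        raise ValueError(f"unknown cdrom command: {text!r}")
    state["tray_open"] = True


fs = ControlFS()
fs.register("/dev/sound0/volume/left", set_left_volume)
fs.register("/dev/cdrom0/control", cdrom_control)

fs.write("/dev/sound0/volume/left", "80\n")
fs.write("/dev/cdrom0/control", "eject\n")
```

Because each device only exposes the control files that make sense for it, the "eject a disc on a sound card" problem from the ioctl approach does not arise: that path just does not exist.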

SpyderTL wrote:The hash of the file is used as a globally unique identifier. So you can request any part of any file by sending a request for a specific hash and block index. If you were to index your locally stored files, and perhaps your local network files by their hash, you could request any file, regardless of where it was located using a single value.

Clever idea, but a hash is not really globally unique. Get a large enough torrent network and sooner or later you're going to have more than one file with the same hash. Furthermore, one cannot reasonably store a unique identifier for every file in the world: the number of files keeps growing, and an identifier space large enough to distinguish every one of them would not be feasible. You would likely end up identifying each file by something of similar magnitude to its actual contents, which is redundant for obvious reasons.
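To put a rough number on "large enough", the standard birthday-bound approximation can be evaluated directly. This is only a sketch: it assumes hash outputs are uniformly distributed and covers accidental collisions only, not deliberately constructed ones. The function name and the file counts are illustrative choices, not measurements of any real network.

```python
import math


def collision_probability(num_files: int, hash_bits: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = num_files * (num_files - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


# Same formula, two very different hash widths.
p_small = collision_probability(100_000, 32)   # 32-bit hash
p_large = collision_probability(10**12, 160)   # 160-bit hash (SHA-1 sized)
```

Under these assumptions the collision chance grows with the square of the number of files and shrinks exponentially with the hash width, so how soon "sooner or later" arrives depends almost entirely on how many bits the identifier has.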


Sockets as a filesystem?
