OSDev.org

Posted: **Fri Jul 05, 2013 10:43 pm**

Hi,

Combuster wrote:What occurred to me is that the "file" is basically untyped, just as any data access in assembly lacks types. We can do better than that by starting to make the entire filesystem mime-type aware in some fashion.

If files had types (like variables in a high level language), then your VFS could do automatic type conversion (like casting a float to an int in a high level language). The VFS could use some sort of "file format converter plug-ins"; so that (e.g.) if an application wants to open a word processor file as a picture the VFS would convert the file and give the application a picture. Of course you'd want to allow the VFS to use multiple file format converter plug-ins - e.g. if you don't have a plug-in to convert a spreadsheet file directly into a picture file but you do have "spreadsheet file -> word processor file" and "word processor file -> picture" then with a little intelligence built into the VFS it still works.
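One way the "little intelligence" could work is a shortest-path search over the installed converters. The sketch below assumes the VFS keeps a flat table of plug-ins, each converting one type into another, and uses a breadth-first search to find the shortest chain. The type names, `struct converter` and `find_chain` are all hypothetical; nothing here is taken from an existing VFS.

```c
#include <stddef.h>

/* Hypothetical file types; a real VFS would likely use a richer registry. */
enum file_type { FT_SPREADSHEET, FT_WORDPROC, FT_PICTURE, FT_SOUND, FT_COUNT };

/* One installed converter plug-in: converts `from` into `to`. */
struct converter { enum file_type from, to; };

/*
 * Breadth-first search over the installed converters.
 * Writes the indices of the converters to apply, in order, into `chain`
 * and returns how many there are (0 if src == dst), or -1 if no chain exists.
 */
int find_chain(const struct converter *conv, size_t nconv,
               enum file_type src, enum file_type dst,
               size_t chain[FT_COUNT])
{
    int via[FT_COUNT];               /* converter used to reach each type */
    int visited[FT_COUNT] = {0};
    enum file_type queue[FT_COUNT];  /* each type is queued at most once */
    size_t head = 0, tail = 0;

    for (int t = 0; t < FT_COUNT; t++)
        via[t] = -1;

    visited[src] = 1;
    queue[tail++] = src;

    while (head < tail) {
        enum file_type cur = queue[head++];
        if (cur == dst)
            break;
        for (size_t i = 0; i < nconv; i++) {
            if (conv[i].from == cur && !visited[conv[i].to]) {
                visited[conv[i].to] = 1;
                via[conv[i].to] = (int)i;
                queue[tail++] = conv[i].to;
            }
        }
    }
    if (!visited[dst])
        return -1;

    /* Walk back from dst to src to count the steps, then fill in order. */
    int len = 0;
    for (enum file_type t = dst; via[t] != -1; t = conv[via[t]].from)
        len++;
    int pos = len;
    for (enum file_type t = dst; via[t] != -1; t = conv[via[t]].from)
        chain[--pos] = (size_t)via[t];
    return len;
}
```

With only "spreadsheet -> word processor" and "word processor -> picture" installed, asking for spreadsheet to picture yields a chain of two converters. A real implementation would probably also weight the edges, so that a lossy converter is avoided when a lossless chain exists.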

The main problem is that nothing is designed for this. For example, if you download 1234 files with FTP (using a utility like "wget") then you don't want to make the user manually enter the type of each of those files; and it's extremely difficult (and sometimes impossible) to reliably "auto-guess" the file type using heuristics. To do it well you've got 2 choices - invent replacements for things that mess up the file type (e.g. replace or extend protocols like FTP and HTTP, file systems like FAT and ISO9660, archive file formats like zip and tar, etc. so that each file's type is preserved) or make it extremely easy to reliably "auto-guess" what type files are (e.g. define a standard header that includes the file's type to avoid all guess-work). Sadly, if you really want to improve things sometimes you need to break compatibility with something.

The next thing to consider is performance. If a file is often converted to a different type, it'd be really nice if the VFS would automatically cache the converted file in the file system and skip the conversion next time. For example, native file systems could be designed so that each file has an original file type and the original file's data, plus (zero or more) extra types with extra file data. Of course if the original file is modified you'd discard any of the file's extra types with extra data (forcing any conversions to be done again the next time they're requested).
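As a rough sketch of that layout, a file could carry its original type and data plus a small table of cached conversions that is wiped whenever the original is written. The structure names, the fixed `MAX_CACHED` limit and the use of plain integers for types are assumptions made for the example, and memory ownership is ignored entirely.

```c
#include <stddef.h>

#define MAX_CACHED 4   /* arbitrary limit chosen for this sketch */

/* One cached conversion of a file's original data (hypothetical layout). */
struct cached_conv {
    int type;          /* type the original was converted to */
    const void *data;  /* converted data; NULL marks a free slot */
    size_t size;
};

/* A file: its original type and data, plus zero or more cached conversions. */
struct typed_file {
    int orig_type;
    const void *orig_data;
    size_t orig_size;
    struct cached_conv cache[MAX_CACHED];
};

/* Return the cached conversion for `type`, or NULL if it must be (re)done. */
const struct cached_conv *cache_lookup(const struct typed_file *f, int type)
{
    for (size_t i = 0; i < MAX_CACHED; i++)
        if (f->cache[i].data != NULL && f->cache[i].type == type)
            return &f->cache[i];
    return NULL;
}

/* Remember a finished conversion; returns 0 on success, -1 if the table is full. */
int cache_store(struct typed_file *f, int type, const void *data, size_t size)
{
    for (size_t i = 0; i < MAX_CACHED; i++) {
        if (f->cache[i].data == NULL) {
            f->cache[i].type = type;
            f->cache[i].data = data;
            f->cache[i].size = size;
            return 0;
        }
    }
    return -1;
}

/* Replace the original data and discard every cached conversion. */
void file_write(struct typed_file *f, const void *data, size_t size)
{
    f->orig_data = data;
    f->orig_size = size;
    for (size_t i = 0; i < MAX_CACHED; i++)
        f->cache[i].data = NULL;
}
```

Because a miss in `cache_lookup` only means "convert again", the file system stays free to drop entries at any time without losing anything important.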

Now think about "single source file". You could have a networked file system (e.g. something like NFS) running on a server, and a file called "foo" containing the source code for an application. When computer #1 asks the server to read the file "foo" as a "32-bit 80x86 native application" file type, the server would use a "file format converter plug-in" to compile the source file (if it's not already cached) and computer #1 automatically gets a file it can execute natively; and when computer #2 on the network asks the server to read the same file "foo" as a "64-bit ARM native application" file type it also gets a file it can execute natively. An office could have a network of 25 disk-less computers all connected to the same file server and everything would automatically work seamlessly (even if all the disk-less computers have very different types of CPUs in them).

Of course compiling from "plain text" source code has a few problems - it's slower (e.g. tokenising, parsing, grammar checking and some pre-optimisation to convert the source into some form of "intermediate representation" for the back-end to work with) and creates problems for companies that want to publish applications as closed source. What you want is some sort of "portable byte-code" file format; then the VFS can have a "source -> byte-code converter" and "byte-code -> native executable converters" (and chain them together when needed). That way a company can provide "portable byte-code" instead of giving away their source code, and for people that do provide source code you'd avoid a reasonable amount of overhead if/when the source code is compiled for several different targets.

The only other thing we'd need to figure out is what such an amazingly advanced OS should be called. I suggest we call it "a mere sub-set of Brendan's Clustering Operating System" because this is only really part of my plan...

Cheers,

Brendan

Posted: **Sat Jul 06, 2013 2:38 am**

Brendan wrote:If files had types (like variables in a high level language), then your VFS could do automatic type conversion (like casting a float to an int in a high level language). The VFS could use some sort of "file format converter plug-ins"; so that (e.g.) if an application wants to open a word processor file as a picture the VFS would convert the file and give the application a picture. Of course you'd want to allow the VFS to use multiple file format converter plug-ins - e.g. if you don't have a plug-in to convert a spreadsheet file directly into a picture file but you do have "spreadsheet file -> word processor file" and "word processor file -> picture" then with a little intelligence built into the VFS it still works.

We've talked about this before, and I still think this doesn't quite work out because the VFS needs to figure out things it can't easily know. The problem here is however not the typed files (I agree that the information should really be in the file system), nor the converter plugins for the VFS, but that you want to trigger the conversion automagically.

It becomes totally reasonable once you make the conversion explicit, like in programming languages. Let's assume that we have a file test.pdf that we want to convert to sound. In our specific case it's text to be read (something the VFS couldn't figure out by itself). Let's create a kind of extended symlink test.txt that refers to test.pdf converted to text/plain (using the default converter). And then create another test.ogg that refers to test.txt converted to audio/vorbis using the converter "TTS German male" (the language is another thing the VFS would potentially fail to figure out reliably, and the voice I want it can't possibly know).
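To make the "extended symlink" idea a bit more concrete, here is a minimal sketch of what such a link might record and how a chain of them could be followed back to the real file. The `struct conv_link` layout and the function names are invented for illustration; only the file names, MIME types and converter name come from the example above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "extended symlink": a name that stands for another file
 * converted to `mime` by a named converter (NULL = default converter). */
struct conv_link {
    const char *name;       /* e.g. "test.ogg" */
    const char *source;     /* file or link it refers to, e.g. "test.txt" */
    const char *mime;       /* requested type, e.g. "audio/vorbis" */
    const char *converter;  /* explicit converter choice, or NULL */
};

/* Find a link by name; returns NULL for regular files. */
const struct conv_link *find_link(const struct conv_link *links, size_t n,
                                  const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(links[i].name, name) == 0)
            return &links[i];
    return NULL;
}

/*
 * Follow links from `name` down to the regular file at the bottom.
 * Stores that file's name in *base and returns the number of conversion
 * steps, or -1 if more than `n` links are followed (which implies a loop).
 */
int resolve_link(const struct conv_link *links, size_t n,
                 const char *name, const char **base)
{
    int steps = 0;
    const struct conv_link *l;

    while ((l = find_link(links, n, name)) != NULL) {
        if ((size_t)++steps > n)
            return -1;
        name = l->source;
    }
    *base = name;
    return steps;
}
```

Resolving test.ogg with the two links described above would take two steps and end at test.pdf; every step names its target type and converter explicitly, so the VFS never has to guess the language or the voice.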

Directly opening the PDF as a sound file wouldn't work out, but with some explicit steps it does work. What's left is to figure out how to present this nicely. I guess with some good (per-application, not supplied by the OS) defaults you can already do a lot to avoid additional user interaction in the common cases.

The main problem is that nothing is designed for this. For example, if you download 1234 files with FTP (using a utility like "wget") then you don't want to make the user manually enter the type of each of those files; and it's extremely difficult (and sometimes impossible) to reliably "auto-guess" the file type using heuristics.

Auto-guessing works well enough today that it doesn't trouble me. When I click on the file, it generally does the right thing. (Well, a reasonable thing at least. It might open the file not with the exact application I was thinking of, but another one that can handle the same file type.)

So I think it would be perfectly fine to use the file system metadata wherever it's present. If it's not there, use auto-guessing, and provide an option so that the user can manually correct a wrong guess if ever needed. So you don't invent replacements or improve auto-guessing, but you do both. Breaking compatibility is great for experiments, but in real use cases it's not an option. An OS not being able to use HTTP or iso9660 would be ridiculous.
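That priority order (manual correction, then file system metadata, then guessing) could look roughly like the sketch below. The function names are made up, and the guesser only knows three well-known signatures (PNG, PDF and GIF); a usable one would need a far larger table and would still be wrong sometimes.

```c
#include <stddef.h>
#include <string.h>

/* Guess a MIME type from well-known magic numbers; NULL if unrecognised.
 * Only three signatures are checked here; a real guesser needs many more. */
const char *guess_type(const unsigned char *buf, size_t len)
{
    if (len >= 8 && memcmp(buf, "\x89PNG\r\n\x1a\n", 8) == 0)
        return "image/png";
    if (len >= 5 && memcmp(buf, "%PDF-", 5) == 0)
        return "application/pdf";
    if (len >= 6 && (memcmp(buf, "GIF87a", 6) == 0 ||
                     memcmp(buf, "GIF89a", 6) == 0))
        return "image/gif";
    return NULL;
}

/*
 * Decide a file's type. A manual correction by the user wins, then the
 * type stored by the file system, then a guess from the file contents.
 * Pass NULL for a source that is not available.
 */
const char *file_type(const char *user_override, const char *fs_metadata,
                      const unsigned char *buf, size_t len)
{
    if (user_override != NULL)
        return user_override;
    if (fs_metadata != NULL)
        return fs_metadata;

    const char *guess = guess_type(buf, len);
    return guess != NULL ? guess : "application/octet-stream";
}
```

Files that arrive through FAT, ISO9660 or a plain FTP download simply come in with no metadata and fall through to the guess.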

Now think about "single source file". You could have a networked file system (e.g. something like NFS) running on a server, and a file called "foo" containing the source code for an application. When computer #1 asks the server to read the file "foo" as a "32-bit 80x86 native application" file type, the server would use a "file format converter plug-in" to compile the source file (if it's not already cached) and computer #1 automatically gets a file it can execute natively

Why accept the limitation to a single file and not go a step further: Converters for entire directories? (I'm not entirely sure if typed directories that describe what files they can contain would be a good idea, but it's not directly related anyway.)

Posted: **Sat Jul 06, 2013 8:14 am**

Hi,

Kevin wrote:
Brendan wrote:If files had types (like variables in a high level language), then your VFS could do automatic type conversion (like casting a float to an int in a high level language). The VFS could use some sort of "file format converter plug-ins"; so that (e.g.) if an application wants to open a word processor file as a picture the VFS would convert the file and give the application a picture. Of course you'd want to allow the VFS to use multiple file format converter plug-ins - e.g. if you don't have a plug-in to convert a spreadsheet file directly into a picture file but you do have "spreadsheet file -> word processor file" and "word processor file -> picture" then with a little intelligence built into the VFS it still works.
We've talked about this before, and I still think this doesn't quite work out because the VFS needs to figure out things it can't easily know. The problem here is however not the typed files (I agree that the information should really be in the file system), nor the converter plugins for the VFS, but that you want to trigger the conversion automagically.

It becomes totally reasonable once you make the conversion explicit, like in programming languages. Let's assume that we have a file test.pdf that we want to convert to sound. In our specific case it's text to be read (something the VFS couldn't figure out by itself). Let's create a kind of extended symlink test.txt that refers to test.pdf converted to text/plain (using the default converter). And then create another test.ogg that refers to test.txt converted to audio/vorbis using the converter "TTS German male" (the language is another thing the VFS would potentially fail to figure out reliably, and the voice I want it can't possibly know).

Directly opening the PDF as a sound file wouldn't work out, but with some explicit steps it does work. What's left is to figure out how to present this nicely. I guess with some good (per-application, not supplied by the OS) defaults you can already do a lot to avoid additional user interaction in the common cases.

You're right; but the existence of an automatic system to do file format conversion doesn't prevent users from using applications to do explicit conversions. For example (your example), the user can start a speech synthesiser program, ask it to open test.pdf (as text), then explicitly tell it to convert the text to "TTS German male" and save it as mysound (as a sound file); and then open mysound (as a sound file) in some sort of media player. Of course if the user simply doesn't care and/or the default settings happen to be good enough for them; then they can just open test.pdf (as a sound file) in some sort of media player without any unnecessary hassle.

Basically users have the option of avoiding the hassle of explicit conversions. For some cases (e.g. converting PDF into text) the user might never want to bother with explicit conversions; and for other cases (e.g. text to speech) they might never want the automatic conversion.

Of course most file format conversions are conversions from one file format to a similar file format (e.g. between GIF, PNG, TIFF and BMP) where (as long as it's a lossless conversion) there's no reason for the user to care and no reason for the user to want an explicit conversion. For the remainder the default conversion will still be suitable about half the time. This means that in the majority of cases (I'd expect over 75% of cases) automatic conversion will be fine; and the user will be able to avoid the hassle of explicit conversion for the majority of conversions.

Kevin wrote:
The main problem is that nothing is designed for this. For example, if you download 1234 files with FTP (using a utility like "wget") then you don't want to make the user manually enter the type of each of those files; and it's extremely difficult (and sometimes impossible) to reliably "auto-guess" the file type using heuristics.
Auto-guessing works well enough today that it doesn't trouble me. When I click on the file, it generally does the right thing. (Well, a reasonable thing at least. It might open the file not with the exact application I was thinking of, but another one that can handle the same file type.)

So I think it would be perfectly fine to use the file system metadata wherever it's present. If it's not there, use auto-guessing, and provide an option so that the user can manually correct a wrong guess if ever needed. So you don't invent replacements or improve auto-guessing, but you do both. Breaking compatibility is great for experiments, but in real use cases it's not an option. An OS not being able to use HTTP or iso9660 would be ridiculous.

One of the goals of my project is to create a standardised set of native file formats that all use the same generic header; where there's a "file type" field in this generic header that makes it extremely simple to auto-guess the file format 100% reliably. All native applications for the OS will only ever use these native file formats for all files. The problem is that there's a lot of existing files out there; which means that to make this work I need a way to automatically convert "legacy" file formats into my native file formats. Ironically, this is why I started thinking about automatic file format conversions in the first place - so I could shield applications running on my OS from the "legacy mess" that infests every other OS, while still providing a clean way for users to shift their data from existing OSs to my OS.
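To show why a type field in a generic header makes detection trivial, here is a minimal sketch. The actual header layout isn't given in this thread, so the 4-byte "NATV" signature, the 32-bit little-endian type ID and the function name are pure assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Made-up header layout (the thread does not define the real one):
 *   bytes 0..3  signature "NATV"
 *   bytes 4..7  file type ID, 32-bit little-endian
 */
#define NATIVE_MAGIC      "NATV"
#define NATIVE_HEADER_LEN 8

/* Read the type field. Returns 0 and sets *type for native files, or -1
 * for anything else (i.e. a "legacy" file that needs conversion). */
int native_file_type(const unsigned char *buf, size_t len, uint32_t *type)
{
    if (len < NATIVE_HEADER_LEN || memcmp(buf, NATIVE_MAGIC, 4) != 0)
        return -1;

    /* Assemble the field byte by byte so the result does not depend on
     * the host CPU's byte order or alignment rules. */
    *type = (uint32_t)buf[4]
          | (uint32_t)buf[5] << 8
          | (uint32_t)buf[6] << 16
          | (uint32_t)buf[7] << 24;
    return 0;
}
```

Anything that fails the signature check would be handed to the legacy converters, so native applications only ever see files that pass it.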

Kevin wrote:
Now think about "single source file". You could have a networked file system (e.g. something like NFS) running on a server, and a file called "foo" containing the source code for an application. When computer #1 asks the server to read the file "foo" as a "32-bit 80x86 native application" file type, the server would use a "file format converter plug-in" to compile the source file (if it's not already cached) and computer #1 automatically gets a file it can execute natively
Why accept the limitation to a single file and not go a step further: Converters for entire directories? (I'm not entirely sure if typed directories that describe what files they can contain would be a good idea, but it's not directly related anyway.)

Then you'd have something that behaves like a directory in some situations but behaves like a file in other situations. I can't think of an advantage that justifies the confusion that this would cause.

Cheers,

Brendan

Posted: **Sat Jul 06, 2013 10:23 am**

Brendan wrote:Then you'd have something that behaves like a directory in some situations but behaves like a file in other situations. I can't think of an advantage that justifies the confusion that this would cause.

An archive also behaves like a directory in some situations and like a file in others. I like archives.

Furthermore, I personally don't like that idea anyway. Compiling a sufficiently large program requires just too much time to be done automatically – but that's just my personal opinion.

However, I think you should generally consider the idea of converting multiple files into a single output, not just for that “code to executable” conversion, but more importantly for instance for converting an audio track and a video stream (without sound) to a video stream containing that audio track. But maybe you'd also approach that problem in a different way (e.g., “a video is converted into a stream of single images and a separated audio track anyway”).

Posted: **Sat Jul 06, 2013 10:48 am**

Kevin wrote:
Brendan wrote:If files had types (like variables in a high level language), then (...)
We've talked about this before, and I still think this doesn't quite work out because the VFS needs to figure out things it can't easily know. The problem here is however not the typed files (I agree that the information should really be in the file system), nor the converter plugins for the VFS, but that you want to trigger the conversion automagically.

Why put the problem where it shouldn't belong? If you really want the filesystem to be typed, change your entire storage accessors to include typing as well. After all, the app writing the data already knows the filetype. The app reading the data knows what filetypes it can support.

Posted: **Sat Jul 06, 2013 11:57 am**

Combuster wrote:After all, the app writing the data already knows the filetype.

Does it? wget for instance doesn't, if you don't rely on the MIME type (which Brendan seems to do).

Posted: **Sun Jul 07, 2013 1:56 am**

If you want to properly discuss truly new ideas, migrating legacy sources should only be an afterthought.

if you don't rely on the MIME type

That's about as useless as assuming an app is broken to demonstrate it's broken if it is. The web is pretty much completely filetype aware already, so any conversion (and sanity checking since the web is evil) should be done on that level where needed.

Posted: **Sun Jul 07, 2013 2:54 am**

Brendan wrote:I could shield applications running on my OS from the "legacy mess" that infests every other OS

If I wrote an image manipulation program (e.g. like GIMP, Photoshop) that runs natively on BCOS, I would not have to care about any other file formats except the native one? I mean, the actual program in this case. All the "import and export" features are a part of the VFS. In practice, is it the author of the image manipulation program who provides the plug-ins that are needed? In the long run, if the native format proves to be superior (and the BCOS itself), it would become less important to have those plug-ins at all.

Posted: **Mon Jul 08, 2013 12:05 am**

Hi,

Antti wrote:
Brendan wrote:I could shield applications running on my OS from the "legacy mess" that infests every other OS
If I wrote an image manipulation program (e.g. like GIMP, Photoshop) that runs natively on BCOS, I would not have to care about any other file formats except the native one? I mean, the actual program in this case. All the "import and export" features are a part of the VFS.

Yes - you'd only need to care about the native file format.

Antti wrote:In practice, is it the author of the image manipulation program who provides the plug-ins that are needed?

You could also write plug-ins if you wanted to; but they'd be separate things (not part of your application), all applications will take advantage of them (not just your application), and they can be done at any time.

For example, a company might create a proprietary/closed source image manipulation program and sell it for $123 per copy and then go out of business; and 10 years later the application can still work properly on hardware that didn't exist when the application was written (due to new "byte-code to native" converters) and it'll also support file formats that didn't exist when the application was written (due to new image file converters). In addition, existing converters can be improved (e.g. better optimisation in "byte-code to native" converters, better conversions and more options/features added to image file converters, etc) so that even though the company died and there's no source code, their application takes advantage of improvements elsewhere.

Antti wrote:In the long run, if the native format proves to be superior (and the BCOS itself), it would become less important to have those plug-ins at all.

Yes.

There is one problem though. If (for any reason) the native file format/s need to be changed, you'd need file format converters to convert the new native file format into the previous native file format so that you don't break existing/older native applications. Fortunately there's a lot of existing "hindsight" (due to the evolution of existing file formats), so it doesn't take too much research to avoid pitfalls when designing new native file formats.

Cheers,

Brendan

Posted: **Mon Jul 08, 2013 7:25 am**

Brendan wrote:[BCOS Introduction]

Suppose I had an audio file (a song) in the native audio format. After listening to it too much I would like to open the file in a scorewriter. The author of that software (or someone else) has created a plug-in that converts audio to music notation. It does quite a good job. However, I want to do some fine adjustments. After working on it for several hours, I have a perfect notation of that song.

Can I save my "fine adjustments" in the original audio file? After all, both of them refer to the same piece of music. I would like my fine adjustments to be available when someone opens the audio file (my version) with a scorewriter. I am not good but I made better notation than the plug-in converter.

This is not just a technical issue. Do you want to have this kind of ideology? Then the audio file is like a collection of different representations of the content. It would not just rely on the automatic converter. It also has "user-defined hints" to help the conversion. It would make the unit big, of course. However, it could be possible to extract a light version of the file if needed. For example, to share the vanilla audio file (native also) without anything extra.

Everything would probably be transparent to users? One challenge: if the scores and audio are equally important, which one would be the best format? The one that happens to come first?

Posted: **Mon Jul 08, 2013 9:50 am**

Antti wrote:Everything would probably be transparent to users? One challenge: if the scores and audio are equally important, which one would be the best format? The one that happens to come first?

In this case, the conversion is accurate in one direction only. Audio can be generated from the score, cached, and treated as identical, but a score can't be derived losslessly from audio and should result in a separate object.

Posted: **Mon Jul 08, 2013 1:32 pm**

Hi,

Antti wrote:
Brendan wrote:[BCOS Introduction]
Suppose I had an audio file (a song) in the native audio format. After listening to it too much I would like to open the file in a scorewriter. The author of that software (or someone else) has created a plug-in that converts audio to music notation. It does quite a good job. However, I want to do some fine adjustments. After working on it for several hours, I have a perfect notation of that song.

Can I save my "fine adjustments" in the original audio file?

No; but (if file system permissions allow it) you could replace the original file (including any cached conversions of it) with your new version.

Antti wrote:After all, both of them refer to the same piece of music. I would like my fine adjustments to be available when someone opens the audio file (my version) with a scorewriter. I am not good but I made better notation than the plug-in converter.

Each file has one set of permissions (including one owner) and one set of associated metadata (name, creation time, etc); which all apply to any cached conversions. Also, the OS can discard any cached conversions whenever it likes (e.g. when you're running out of disk space, or when a file format converter is updated) as it knows that it can create the conversion again if it needs to and nothing important is lost (and the OS can ignore the cached conversions when backups are created for the same reason). Basically; even though the OS would cache conversions on disk for performance reasons; this "caching" is transparent and (as far as applications and users can tell, excluding performance) it behaves as if conversions are never stored on disk and the conversions are done each time they're requested.

Your new "fine adjustments" will have a different creation time, and may have different permissions and a different owner; and you probably don't want the OS to discard your changes whenever it likes, or ignore it for backups.

Cheers,

Brendan

Posted: **Fri Nov 15, 2013 4:35 am**

I have an idea (I am aware that this topic hasn't been touched for a long time)

Code:

Normal Code Structure (OS):
OS ----> (whatever)
    ---> kernel.c
    ---> kernel (DIR)
    ---> fs (DIR)
    ---> libc (DIR)
    ---> boot (DIR)
    And what not....

Single Source file structure

Code:

OS ---->
    ---> kernel.c
    ---> boot.asm

As people have said, it would make the code look like a piece of junk.
But I have a suggestion:
We can make our sources look like this:

Code:

// MAIN functions: KERNEL/MAIN
void main(void)
{
    // Some code
}

// KEYBOARD functions: KERNEL/KEYBOARD
void keyboard(void)
{
    // Some code
}

// FAT16 driver (or whatever you call it): KERNEL/FAT16
void fat16(void)
{
    // Some code
}

// Command-line interface: KERNEL/CLI
void cli(void)
{
    // Some code
}

With this you can open a text editor and find your code. If you are modifying something in the CLI, simply press Ctrl+F and find KERNEL/CLI, just like a normal directory!
(Although this is useless, and no sane person would ever do it, I just wanted to share this.)


Single source file
