Describe your dream OS

iansjack · Post by **iansjack** » Wed Aug 28, 2019 9:28 am

Qbyte wrote:Another thing I don't like about existing operating systems is that they mandate a program format, such as an ELF/COFF header in the case of *nix or a PE header in the case of Windows. I'm usually a fan of "mechanism, not policy", so I'd prefer programs to have no header at all, but instead simply have the entry point of a program be the base address of the file. The program itself can then make various system calls during initialization in order to achieve what the header would normally do, if desired. This is better because it eliminates the distinction between programs and functions and it also makes programs simpler because most of the time a program won't need to make any system calls and will be content with the default scheme of having the whole program image instantiated as a single WRX segment.

The header of, for example, an elf file contains a lot more information than just the entry point. What about all the entries for dynamic linking? Load address of the code segment (as opposed to entry point), load address of the data segment, details of the amount of space to reserve for uninitialized variables (and where to reserve it), debug information, etc., etc.

It's not a question of the header "doing stuff" that system calls could do (system calls do that stuff anyway), but the information needed for those system calls to do their job.
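To make the point about the header carrying more than an entry point concrete, here is a minimal sketch that packs and then parses a little-endian ELF64 file header. The field layout follows the ELF64 file header; the sample values (entry address, table offsets, counts) and the helper name `parse_elf64_header` are invented for illustration and do not come from any real binary.

```python
import struct
from collections import namedtuple

# ELF64 file header, little-endian: 16 identification bytes followed by
# type, machine, version, entry, phoff, shoff, flags and six 16-bit fields.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

Elf64Header = namedtuple(
    "Elf64Header",
    "ident type machine version entry phoff shoff flags "
    "ehsize phentsize phnum shentsize shnum shstrndx",
)


def parse_elf64_header(data):
    """Parse the first 64 bytes of a little-endian ELF64 image (hypothetical helper)."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("image is shorter than an ELF64 header")
    header = Elf64Header(*ELF64_HEADER.unpack_from(data))
    if header.ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number")
    return header


# Invented sample header: 64-bit, little-endian, ET_EXEC (2), x86-64 (62).
sample_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
sample = ELF64_HEADER.pack(
    sample_ident, 2, 62, 1,
    0x401000,   # entry point
    64,         # offset of the program header table (segments to load)
    0x2000,     # offset of the section header table (symbols, debug info, ...)
    0, 64, 56, 3, 64, 10, 9,
)

hdr = parse_elf64_header(sample)
print("entry:", hex(hdr.entry))
print("program headers:", hdr.phnum, "at offset", hdr.phoff)
print("section headers:", hdr.shnum, "at offset", hdr.shoff)
```

The entry point is a single field. The load addresses, the size of the uninitialized data and the dynamic-linking entries live in the program and section header tables that `phoff` and `shoff` point to.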

eekee · Post by **eekee** » Wed Aug 28, 2019 1:18 pm

Qbyte wrote:NVRAM would greatly simplify things for multiple reasons. The first is that there would never be any need to manage two separate storage mediums and copy data between them. All data, both temporary and persistent, would be stored in a single, unified store. The second reason is that all code and data can be executed or accessed in-place; there is never any need to load programs or map data and there is no need to periodically back up data in case a power outage or other fault occurs.

Ah! Yes, I wasn't thinking.

Qbyte wrote:The third reason is that current non-volatile storage mediums have a number of severe drawbacks which have to be taken into account when designing a kernel. HDD's are notoriously slow and require seek times to be factored in when storing data which requires more complex storage algorithms, whereas with NVRAM, access times are uniform regardless of where the data is located. SSD's have non-linear read/write relations, with reading being much faster than writing and having a limited number of write cycles. NVRAM would just do away with all of that complexity and would be located on the CPU itself instead of being external.

This I'm not so sure about. I'm told addressing huge amounts of memory limits performance. I can't think of a good workaround. A segment:offset scheme might work, but introduces a degree of complexity. A very wide data bus (to load a whole cache line at once) would present synchronization problems.

Anyway, if you're willing to settle for a comparatively small amount of total storage, you could try your idea on anything capable of suspend-to-RAM, so any sufficiently documented tablet or laptop. You could save the entire RAM image to disk or Flash as a sort of backup.

I think the Kindle 4 is suitable. It's quite well-documented, but I can't remember the details. I got the manual for the SoC ages ago, but never got around to doing anything with it. (It's a big manual!) The Kindle 3 is probably decently documented too. They're old Kindles with e-ink screens; the 3 has a built-in keyboard. The 4 is also known as Kindle NT and Kindle WiFi. Those two can also be flashed from a Windows machine even if they seem to be bricked. I need to hunt down the updater program (which can flash any image); I didn't get it when I had the chance.

Qbyte · Post by **Qbyte** » Wed Aug 28, 2019 9:58 pm

iansjack wrote:The header of, for example, an elf file contains a lot more information than just the entry point. What about all the entries for dynamic linking? Load address of the code segment (as opposed to entry point), load address of the data segment, details of the amount of space to reserve for uninitialized variables (and where to reserve it), debug information, etc., etc.

iansjack wrote:It's not a question of the header "doing stuff" that system calls could do (system calls do that stuff anyway), but the information needed for those system calls to do their job.

Much of the information contained in the ELF header is redundant. For example, the load address of the code segment isn't required if a) the architecture supports relative branching (which any self-respecting one should) or b) all programs are compiled to run at address 0 so that the loader can trivially rebase them. The data segment load address also isn't required thanks to relative addressing. Most programs don't need separate code and data segments; they can be treated as a single monolithic blob, which is also better for performance because data can be stored as "literal pools" within the code stream for much faster loading due to better locality.

Uninitialized variables could a) be stored in the heap, b) could be included in the program image (which does take up slightly more space) or c) the program could make a system call to expand its single segment by the amount that is required to store them.

The entry points for dynamic linking can be obtained via system calls. The program would ask the OS "where is this library located" and the OS would return a pointer to its base address. In other words, linking would be performed by the program itself during initialization instead of the OS.

iansjack · Post by **iansjack** » Thu Aug 29, 2019 8:59 am

Qbyte wrote:The entry points for dynamic linking can be obtained via system calls. The program would ask the OS "where is this library located" and the OS would return a pointer to its base address. In other words, linking would be performed by the program itself during initialization instead of the OS.

What library? What function? How does the system call know what it is looking for? The executable code just says "call xxxx".

That is the information that is contained in the headers. Even if you produced some convoluted scheme to embed the information in the function calls it would mean doing a lookup each time the function is called. Now there's efficiency for you - to what purpose when headers listing this information provide a much more elegant solution.

I'm sure that with enough ingenuity you could somehow embed all the information in the raw executable (although all you are doing is hiding the information normally stored in the headers elsewhere), but what is the point? Executable formats such as ELF provide an elegant and efficient means of storing this extra information. They even tell you how they are to be processed, so it is easy for an OS to transparently use different formats.

As for expanding the single segment each time you come across an uninitialized variable - that sort of inefficiency should be dealt with by the compiler/linker, where performance is not of the essence, not in the executable.

Qbyte · Post by **Qbyte** » Thu Aug 29, 2019 11:42 am

iansjack wrote:What library? What function? How does the system call know what it is looking for? The executable code just says "call xxxx".

The library name is a parameter of the system call. The kernel checks to see if a library of that name exists in memory, and if it does, it returns a pointer to its base address. The program can now use offsets from that pointer to call the functions in the library. No look-ups are required each time a function from the library is called, and no code fix-ups are needed at load or run time. Since the ABI of the library is known at compile time, the compiler knows what all the offsets for each of the library function calls should be.
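The lookup described above can be modelled in a few lines. This is only a toy simulation under stated assumptions: `MEMORY`, `kernel_load_library`, `sys_find_library`, the library name `mathlib` and its offsets are all hypothetical, and Python callables stand in for machine code at fixed addresses.

```python
# Toy model: MEMORY maps addresses to callables, and the "kernel" keeps a
# table of library names and the base address each one was placed at.
MEMORY = {}
LOADED_LIBRARIES = {}


def kernel_load_library(name, base, functions):
    """Place each library function at base + offset (hypothetical kernel step)."""
    LOADED_LIBRARIES[name] = base
    for offset, func in functions.items():
        MEMORY[base + offset] = func


def sys_find_library(name):
    """Hypothetical system call: return the library's base address, or None."""
    return LOADED_LIBRARIES.get(name)


# Offsets fixed by the library's ABI, so the compiler can bake them in.
MATHLIB_ADD = 0x00
MATHLIB_MUL = 0x10

kernel_load_library(
    "mathlib",
    0x7000,
    {MATHLIB_ADD: lambda a, b: a + b, MATHLIB_MUL: lambda a, b: a * b},
)


def program_main():
    # The library name is a string literal stored inside the program itself.
    base = sys_find_library("mathlib")
    if base is None:
        return "mathlib missing, carrying on without it"
    add = MEMORY[base + MATHLIB_ADD]   # one lookup at initialization
    mul = MEMORY[base + MATHLIB_MUL]
    return mul(add(2, 3), 4)


print(program_main())
```

The scheme only holds while the library keeps every function at the same offset, since the program has no symbol names to fall back on.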

iansjack wrote:I'm sure that with enough ingenuity you could somehow embed all the information in the raw executable (although all you are doing is hiding the information normally stored in the headers elsewhere), but what is the point?

The main point is simplification. Many programs don't need or want a header at all, so why force that on them? It also makes the interface to programs nicer because they can be passed by reference in the same way as functions, among other benefits.

iansjack wrote:As for expanding the single segment each time you come across an uninitialized variable - that sort of inefficiency should be dealt with by the compiler/linker, where performance is not of the essence, not in the executable.

The number of uninitialized variables (the so-called bss segment) is known at compile time, so the address space only needs to be expanded once during initialization to create space for them. The compiler would simply insert a single system call at the start of your program to grow it by the space needed.
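A minimal sketch of that single grow call, assuming a flat, headerless image held in one segment. The `Process` class, the `sys_grow` call and the sizes are hypothetical and only model the idea; no real kernel interface is implied.

```python
class Process:
    """Toy process whose whole image is one flat, headerless segment."""

    def __init__(self, image):
        self.segment = bytearray(image)

    def sys_grow(self, extra):
        """Hypothetical system call: append `extra` zeroed bytes to the segment.

        Returns the offset at which the new space starts.
        """
        old_end = len(self.segment)
        self.segment.extend(bytes(extra))
        return old_end


# Size of the uninitialized data, known to the compiler (invented value).
BSS_SIZE = 4096


def program_init(proc):
    # The single call the compiler would insert at the start of the program.
    return proc.sys_grow(BSS_SIZE)


proc = Process(b"\x90" * 128)      # 128 bytes standing in for code and data
bss_start = program_init(proc)
print("bss starts at offset", bss_start, "segment size", len(proc.segment))
```

In this model the size that an ELF header would record for the loader becomes an immediate operand inside the program's first instructions.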

iansjack · Post by **iansjack** » Thu Aug 29, 2019 12:11 pm

You've explained how the library name (and function name) is used, but not where it comes from. Where in the executable is this information stored? In the code itself - no, that would be crazy. But it has to be stored somewhere; whatever you choose to call it, it's just the object file header.

Anyway, enough. I can see that you don't understand my point and I'm finding the forums increasingly frustrating to use because of the spam.

Qbyte · Post by **Qbyte** » Thu Aug 29, 2019 8:50 pm

iansjack wrote:You've explained how the library name (and function name) is used, but not where it comes from. Where in the executable is this information stored? In the code itself - no, that would be crazy. But it has to be stored somewhere; whatever you choose to call it, it's just the object file header.

The library name would be stored as a string literal either within the code stream (for performance) or within the data segment. There's nothing crazy about that, it's just a different approach with various trade-offs. Think about programs that are statically linked and therefore entirely self-contained. They have no real need for an ELF-style header at all. ELF headers contain all sorts of redundant information which can be replaced by a minimal number of system calls; the process only has to specify exactly the things it needs to run. It also results in better error handling because the program itself gets to decide what happens if, say, a library is missing, whereas with a header scheme, the program won't even get to run at all if the linker detects a missing library.

iansjack wrote:Anyway, enough. I can see that you don't understand my point and I'm finding the forums increasingly frustrating to use because of the spam.

Yeah, because these discussions totally constitute "spam"

I know what ELF headers are and the reasons for their existence; it's you who's missing my point that there are viable alternatives to them with certain advantages.

iansjack · Post by **iansjack** » Fri Aug 30, 2019 1:13 am

Qbyte wrote:Yeah, because these discussions totally constitute "spam"

There's really no call for rudeness like that.

If you can't get your point across without rudeness then don't bother (as far as I'm concerned).

Solar · Post by **Solar** » Fri Aug 30, 2019 6:43 am

iansjack was referring to the actual spam, which is making the forum a bit of a pain to use at the moment.

If I may add my $0.02 on the "header" debate, the difference is basically this:

- What a header (ELF, PE, whatever) does is provide the necessary information (passively) in a format that can easily be parsed by another process, e.g. the dynamic loader or a debugger.
- What Qbyte is arguing for is to have the same information provided by the executable itself, which (actively) gives it to the system via a series of system calls.

I tend to be on iansjack's side here.

For one, I am not always willing (or able!) to actually run the code (yet) when I want that information. Debugging springs to mind: If the code in question doesn't run, a passive header would still provide useful information, while relying on active collaboration of the binary breaks. Fat binaries, cross-compilation or multilib environments, or scatterloading are similar cases, where you can't just "jump to $0 and run with it".

Second, as for claims to efficiency, information from a passive header can be read by e.g. the dynamic loader in one go. Actively passing the information to the system would either require multiple system calls (and thus context switches), or all the information being in one block (i.e., a header, at which point you've come full circle).

eekee · Post by **eekee** » Fri Aug 30, 2019 10:50 am

Going back a bit for this quote:

Qbyte wrote:Another thing I don't like about existing operating systems is that they mandate a program format, such as an ELF/COFF header in the case of *nix or a PE header in the case of Windows.

I don't think the Linux kernel actually mandates any format, but rather, the kernel can recognize a variety of different formats. The formats include scripts, Windows executables (launched with Wine), and something to do with Java. Even dynamically-linked ELF executables are recognized and passed to the linker program. It's probably relatively easy to add a new format like, say, some sort of .COM for Linux.

A few posts later, Qbyte wrote:It also results in better error handling because the program itself gets to decide what happens if, say, a library is missing, whereas with a header scheme, the program won't even get to run at all if the linker detects a missing library.

If a library is optional, the program should probably dlopen() it rather than link it conventionally. Of course, many things which should be are often not.
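As a rough illustration of the dlopen() approach, the sketch below uses Python's `ctypes.CDLL`, which loads a shared library at run time and raises `OSError` when it cannot be found. The helper name `load_optional` and the library file name are made up for the example.

```python
import ctypes


def load_optional(name):
    """Try to load a shared library at run time; return None if it is unavailable."""
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


# Deliberately nonexistent library name, chosen for the example.
lib = load_optional("libdefinitely_not_installed_example.so")

if lib is None:
    status = "optional library missing, feature disabled"
else:
    status = "optional library loaded"

print(status)
```

The program still starts when the library is absent and decides for itself how to degrade, which is the error-handling benefit claimed for the headerless scheme, obtained here with a conventional header format.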

I think iansjack and Solar have converted me to headers, if I ever get around to making a not-so-simple OS. Interesting page, Solar!

BTW, even Plan 9, which tends to simplification to the point of absurdity, has a header on its executable files.

Qbyte · Post by **Qbyte** » Fri Aug 30, 2019 10:44 pm

eekee wrote:I don't think the Linux kernel actually mandates any format, but rather, the kernel can recognize a variety of different formats. The formats include scripts, Windows executables (launched with Wine), and something to do with Java. Even dynamically-linked ELF executables are recognized and passed to the linker program. It's probably relatively easy to add a new format like, say, some sort of .COM for Linux.

My point was that in order for a program to run, it has to have a header of some description, whether that be ELF, COFF, a.out, etc. I have nothing against kernels being able to handle different program formats, I would just like one of those formats it supports to be a raw, headerless program. A file's type should be kept as metadata in the file table so that the OS knows what type of file all files on the system are. Programs in different formats would have different file types/extensions. This allows programs to be in the format that suits them best, which for some, will be headerless.

eekee wrote:BTW, even Plan 9, which tends to simplification to the point of absurdity, has a header on its executable files.

I do like a.out because it's about as simple as a header format can get while still doing everything it needs to.

eekee · Post by **eekee** » Sat Aug 31, 2019 12:22 am

Qbyte wrote:A file's type should be kept as metadata in the file table so that the OS knows what type of file all files on the system are. Programs in different formats would have different file types/extensions.

Extensions are the only metadata which don't get removed on download, not to mention bad operating systems (Mac OS <= 9) which make it hard to include metadata when you copy the file. (I ran into a note on that in Python documentation just yesterday.) Then there's the question of what to do when a file gets the wrong metadata. Filename extensions aren't too bad because any OS will include ways to rename files, but again bad OSs make it hard to change the extension. I say "bad", but they have reasons for making it hard to change extensions and other metadata. There are reasons not to copy various pieces of metadata too.

It's better to look into the file's data for its type, and indeed most executable formats include a magic number for that purpose, in the header of course. An OS could then cache the type in the filesystem, but is it worth it? Properly invalidating the cache could get quite complex. If one of your formats is flat binary, properly invalidating the cache could be downright impossible!* That would be nasty for sysadmins and users. (I get caught out enough by bash caching program locations.) Also, reading the metadata may introduce unnecessary overhead, depending on filesystem.
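Looking into the file's data for its type can be sketched as a prefix match against known magic numbers. The three signatures used here are the well-known ones for ELF, DOS/PE ("MZ") and "#!" scripts; the function name `sniff_type` and the sample byte strings are assumptions made for the example.

```python
# Known leading byte sequences and the type each one indicates.
MAGIC_NUMBERS = [
    (b"\x7fELF", "elf"),
    (b"MZ", "pe"),        # DOS and PE executables begin with "MZ"
    (b"#!", "script"),    # interpreter line of a script
]


def sniff_type(data):
    """Guess a file's type from its leading bytes; 'unknown' if nothing matches."""
    for magic, kind in MAGIC_NUMBERS:
        if data.startswith(magic):
            return kind
    return "unknown"


print(sniff_type(b"\x7fELF\x02\x01\x01"))   # ELF image
print(sniff_type(b"#!/bin/sh\necho hi\n"))  # shell script
print(sniff_type(b"\x90\x90\xc3"))          # flat binary: nothing to match on
```

A flat binary has no signature of its own, so it can only be identified by exclusion, which is why a cached type for that format is so hard to validate.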

*: I guess I was wrong about ".COM for Linux". It would need some metadata, which .COM doesn't have.

Thinking all this through only strengthens my opinion that keeping type information as filesystem metadata is always more trouble than it's worth. Only file extensions have some worth, and there are some issues with them too.

iansjack · Post by **iansjack** » Sat Aug 31, 2019 1:00 am

Qbyte wrote:Another thing I don't like about existing operating systems is that they mandate a program format, such as an ELF/COFF header in the case of *nix or a PE header in the case of Windows.

I think it's clear now that modern operating systems don't mandate a program format (Linux certainly doesn't), but default to a particular format.

So, if you don't like that, the answer is simple. Write your own handler for the format of your choice. The fact that no-one has been interested enough to write a handler for a raw binary format - with or without additional metadata - tells us something. But if anyone thinks it's a good idea, just do it. Otherwise it's a bit like complaining that Linux - for example - doesn't have a driver for a particular bit of hardware that you have just invented.

Qbyte · Post by **Qbyte** » Sat Aug 31, 2019 3:33 am

eekee wrote:Extensions are the only metadata which don't get removed on download, not to mention bad operating systems (Mac OS <= 9) which make it hard to include metadata when you copy the file.

This is more of an implementation flaw than an inherent flaw with the idea of file types as metadata. I mean, in the end the difference is basically where this information is stored, either in a header within the file itself, or within a file table entry. As usual, both schemes have their pros and cons.

eekee wrote:Then there's the question of what to do when a file gets the wrong metadata. Filename extensions aren't too bad because any OS will include ways to rename files, but again bad OSs make it hard to change the extension. I say "bad", but they have reasons for making it hard to change extensions and other metadata. There are reasons not to copy various pieces of metadata too.

If a file somehow has the wrong metadata such as an incorrect type, it should be as easy to change as a file permission.

eekee wrote:It's better to look into the file's data for its type, and indeed most executable formats include a magic number for that purpose, in the header of course. An OS could then cache the type in the filesystem, but is it worth it? Properly invalidating the cache could get quite complex. If one of your formats is flat binary, properly invalidating the cache could be downright impossible!* That would be nasty for sysadmins and users. (I get caught out enough by bash caching program locations.) Also, reading the metadata may introduce unnecessary overhead, depending on filesystem.

I think separating data and metadata is generally a good thing. For example, if an executable is defined as a file that begins with a certain magic number, that means that all other file types have to conform to this standard as well, because otherwise if they happen to start with the same bit pattern as a magic number, then they will be interpreted incorrectly. If the type is stored as metadata in the file system (either as an explicit field or as a filename extension), the file doesn't need to conform to any standards and can be stored as a flat binary. Files with metadata can also be slightly less efficient to work with and require more code to manipulate because of the variable length header that needs to be stripped.
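The collision described here can be shown with a toy file table. Everything in this sketch is hypothetical: the `FILE_TABLE` layout, the file names and the byte strings are invented, and the only real-world fact relied on is that DOS/PE executables begin with the bytes "MZ".

```python
def sniff_is_pe(data):
    """Content-based check: DOS/PE executables begin with the bytes 'MZ'."""
    return data.startswith(b"MZ")


# Hypothetical file table: the type is stored beside the data, not inside it.
FILE_TABLE = {
    # A flat binary whose first two bytes happen to equal "MZ".
    "tool.bin": {"type": "flat", "data": b"MZ\x90\x90\xc3"},
    "app.exe": {"type": "pe", "data": b"MZ" + bytes(62)},
}


def type_from_metadata(name):
    """Metadata-based check: trust the type field in the file table entry."""
    return FILE_TABLE[name]["type"]


for name, entry in FILE_TABLE.items():
    print(name, "sniffed as PE:", sniff_is_pe(entry["data"]),
          "| table says:", type_from_metadata(name))
```

The sniffing check calls both files PE, while the table entry keeps them apart. The cost is that the table entry has to travel with the file whenever it is copied or downloaded.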

eekee wrote:Thinking all this through only strengthens my opinion that keeping type information as filesystem metadata is always more trouble than it's worth. Only file extensions have some worth, and there are some issues with them too.

I'm still not convinced it's an inherently bad design. I suppose which scheme is better for your OS comes down to how it ties in with other design issues.

Sik · Post by **Sik** » Sat Aug 31, 2019 10:23 pm

Eh, my opinion on this (and mostly based on how it ends up working in practice) is that the file format (header, etc.) describes what's in the file, while the file extension (or wherever you store the file type) describes the intention. I mean, how many file formats are glorified ZIP archives with the actual data in the files inside it? And pretty much all scripts are plain text. And if something has an unknown file extension (assuming you didn't accidentally change it), that usually means you aren't expected to treat the file like its format normally would imply.

And if the intention and the file format end up mismatching, you probably should throw an error to let the user know that something is fishy :P
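One way to read this is as a check that the container format implied by the extension matches what the bytes actually are. The sketch below is an assumption-laden example: the table `INTENTIONS`, the exception `TypeMismatchError` and the function `check_intention` are invented names, and only a few ZIP-based extensions plus ELF are covered.

```python
import os

ZIP_MAGIC = b"PK\x03\x04"

# Extension -> (intention, magic number of the underlying format).
INTENTIONS = {
    ".zip": ("generic archive", ZIP_MAGIC),
    ".jar": ("Java archive", ZIP_MAGIC),
    ".docx": ("word processor document", ZIP_MAGIC),
    ".elf": ("executable", b"\x7fELF"),
}


class TypeMismatchError(Exception):
    """Raised when the extension and the file contents disagree."""


def check_intention(filename, data):
    """Return the intention for a file, or None if the extension is unknown."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in INTENTIONS:
        return None   # unknown extension: don't treat it by its format
    intention, magic = INTENTIONS[ext]
    if not data.startswith(magic):
        raise TypeMismatchError(f"{filename}: contents do not match {ext}")
    return intention


print(check_intention("lib.jar", ZIP_MAGIC + b"rest of archive"))
print(check_intention("notes.xyz", ZIP_MAGIC + b"rest of archive"))
try:
    check_intention("lib.jar", b"\x7fELF\x02\x01\x01")
except TypeMismatchError as err:
    print("error:", err)
```

Three extensions share one container format yet carry different intentions, and a `.jar` whose bytes are really ELF is reported to the user rather than guessed at.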

OSDev.org