Single address space OS design: please review my thoughts


Single address space OS design: please review my thoughts

Post by ababo »

Hi guys,
From time to time I work on my hobby project: Toy OS (https://github.com/ababo/toy). Now I plan to start a new coding iteration, but before that I need you to review my vision (so I don't waste time in case I'm wrong). Here are my previous design points:
* Single 64-bit address space. This gives applications the ability to call the kernel without switching context or serializing data. It also shortens thread context switches and removes the need to flush the TLB.

* Virtual machine for applications. It prevents applications from exceeding their own memory boundaries (memory protection). A virtual machine is also useful for distributed environments spanning several CPU architectures.

* Storage devices are mapped into the 64-bit address space. RAM merely caches access to persistent storage through the address space: in-use pages are loaded into RAM, and not-in-use ones are written back to the device, transparently to applications. This approach makes files redundant: regularly allocated memory is already persistent, so there is no need to save its contents to files (a conventional-OS analogue is sketched just after this list).

* Persistent applications. They survive both a system restart and a move to another machine. This is partially achieved by the previous point, and partially by the absence of resource descriptors (e.g. no file handles). To use any resource, a unique identifier is passed instead (no need to open the resource before actually using it). On the one hand this adds some overhead (though not drastic, thanks to caching); on the other hand it detaches such calls from place and time constraints (code resumed after a system restart, or moved to another machine, can simply continue running).
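For comparison with conventional OSes: POSIX mmap() already gives a taste of the storage-mapping point above. This is only an analogy using real POSIX calls, not my design - in my scheme the whole address space would behave like this, with no file underneath:

```c
/* Illustrative only: the OS pages the mapped file in and out behind the
 * scenes, so an ordinary memory write becomes persistent. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("persistent.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) return 1;
    ftruncate(fd, 4096);                  /* reserve one page of backing store */

    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;

    printf("last run left: %s\n", p);     /* whatever survived the last run */
    strcpy(p, "an ordinary memory write that persists");
    msync(p, 4096, MS_SYNC);              /* flush the dirty page back to disk */

    munmap(p, 4096);
    return close(fd);
}
```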
Domains

Storage devices are mounted into a single 64-bit address space. Each device occupies some contiguous region (let's call it a domain). It is possible to mount a storage device at an arbitrary address (provided there is enough unoccupied space).

[diagram: storage devices mounted as contiguous domains in the single 64-bit address space]

After mounting, the domain's threads resume running. The cheapest function calls are those within a domain: they impose no more overhead than regular C function calls. Inter-domain calls (including kernel calls) are not as cheap (e.g. the possibility that the caller's or callee's domain is unmounted during the call must be handled). Still, there is no need to serialize in/out data (single address space).

This will not work

To me, the scheme above is very elegant. Unfortunately it has a hidden defect which makes it impossible: there is no simple way to make domain code position-independent (PIC). That is needed to enable mounting at an arbitrary address (otherwise there is no guarantee that two given drives can be mounted simultaneously - their address ranges could overlap).

Yes, there are useful techniques and tricks which are widely used in shared libraries. But they don't work for living code (code which is unloaded and later reloaded). Think about allocating memory: you call malloc and save the returned address in some variable. Now the domain gets unmounted and remounted at a different address. Here the code operates on absolute addresses, so techniques based on instruction-pointer-relative offsets don't work.
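To make the defect concrete, here is a hypothetical sketch (there is no real domain mount/unmount API, and the address is made up):

```c
/* Hypothetical sketch of the defect. Absolute pointer values stored in data
 * survive inside the domain image, but their meaning depends on the old base. */
#include <stdlib.h>

struct node { struct node *next; };

struct node *head;                 /* persisted as part of the domain image */

void build(void) {
    head = malloc(sizeof *head);   /* suppose this returns 0x7f0000001000 */
    head->next = NULL;
}

/* The domain is unmounted, then remounted 1 MiB higher. Every byte of the
 * image moved, yet 'head' still holds the old absolute value 0x7f0000001000.
 * RIP-relative tricks used by shared libraries relocate *code* references,
 * not pointer values stored in *data*, so the next dereference is garbage:  */
void after_remount(void) {
    if (head && head->next) { /* undefined behaviour: stale absolute address */ }
}
```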

The x86_64 could help us here with its segmentation capability. Indeed, we could dynamically create a memory segment for the domain we're about to mount. All domain code addresses would be absolute, starting from the domain's beginning, and implicitly offset by the segment base inside the CPU. But:
* Inter-domain calls must be performed using far pointers (segment + offset), while intra-domain calls should use regular near pointers (far pointers are expensive and bloat binary code). Two different types of pointers are not a good idea.
* Far pointers are not part of the C/C++ standards and are not supported by the major compilers (e.g. gcc or clang).
* An OS design based on segmentation is not portable to CPU architectures other than x86.
Conclusion

A multiple-address-space scheme is unavoidable, and the persistence mechanism must be built on top of it. Each domain should occupy its own address space. Am I right?

Re: Single address space OS design: please review my thought

Post by Brendan »

Hi,
ababo wrote:Single 64-bit address space. This gives applications the ability to call the kernel without switching context or serializing data. It also shortens thread context switches and removes the need to flush the TLB.
This won't avoid switching from one context (e.g. the context of an application) to another context (e.g. kernel context). You may or may not reduce the overhead of these context switches depending on how they're done (e.g. no privilege level change) but that has nothing to do with "single 64-bit address space".

Clearing the TLB itself costs almost nothing (it's the TLB misses that clearing the TLB causes that cost something, not clearing it). The TLB miss costs are also not the real problem - the real problem is changing from one working set to another. E.g. the TLBs, instruction caches, data caches, branch prediction buffers, return buffer, etc. are all full of stuff for the previous thread and you switch to a different thread causing TLB misses, instruction cache misses, data cache misses, branch mispredictions, etc. Using a single address space (where you don't change virtual address spaces during thread switches) doesn't avoid any of this; and there's very little difference between "TLB is empty" and "TLB is full of stuff for previous thread that's useless for the current thread".
ababo wrote:Virtual machine for applications. It prevents applications from exceeding their own memory boundaries (memory protection). A virtual machine is also useful for distributed environments spanning several CPU architectures.
Running applications inside a virtual machine will add overhead. It also effectively means that applications are running inside their own virtual address space (it's just that the virtual address space is created by slower software/virtual machine and not faster hardware).
ababo wrote:Storage devices are mapped into the 64-bit address space. RAM merely caches access to persistent storage through the address space: in-use pages are loaded into RAM, and not-in-use ones are written back to the device, transparently to applications. This approach makes files redundant: regularly allocated memory is already persistent, so there is no need to save its contents to files.
For 80x86, the address space is 48-bit (256 TiB). This is probably too small for some servers already; and by the time you "finish" the OS it will probably be worse.

This does not make files redundant. If I download a large picture from the internet then modify it and save the modified version, then open one of those pictures in an image viewer, how does the user tell the image viewer which one they want? You must have some sort of human readable identifier (a file name) that is used to find the requested data (the file).
ababo wrote:Persistent applications. They survive both a system restart and a move to another machine. This is partially achieved by the previous point, and partially by the absence of resource descriptors (e.g. no file handles). To use any resource, a unique identifier is passed instead (no need to open the resource before actually using it). On the one hand this adds some overhead (though not drastic, thanks to caching); on the other hand it detaches such calls from place and time constraints (code resumed after a system restart, or moved to another machine, can simply continue running).
Do you honestly think there's any difference between "unique identifier" and "file handle"?

If there's a power failure or hard reset; do all applications and all disk contents end up corrupted (inconsistent state due to some things being stored on disk while others aren't, at the time of the power loss/reset)?
ababo wrote:The x86_64 could help us here with its segmentation capability. Indeed, we could dynamically create a memory segment for the domain we're about to mount. All domain code addresses would be absolute, starting from the domain's beginning, and implicitly offset by the segment base inside the CPU. But:
For 64-bit code there is no segmentation (it sucked extremely badly due to unavoidable protection check overheads, so nobody used it, so AMD removed it for 64-bit code to improve performance).

In long mode, you can have 32-bit code that uses segments. In this case, all segments have 32-bit base addresses and 32-bit limits. This means that all of your segments will be in the first 4 GiB of the virtual address space (including all of your memory mapped hard disks).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: Single address space OS design: please review my thought

Post by gravaera »

Yo:

Seems like a very interesting design, but the biggest issue is that you have several conceptual mix-ups here. This has the unfortunate side-effect of making most of your reasons for choosing particular design patterns "incorrect". Not that they are incorrect on their own -- what I'm saying is that they're incorrect for accomplishing the goals you want them to accomplish.
  • Having a single addrspace doesn't eliminate the need for context switches in order to call the kernel. There are two things people refer to when talking about "context switches":
    • Thread context switching: saving the register context of the current thread to a thread control block (TCB), and then switching to another thread by loading that new thread's register context. The act of thread context switching may or may not trigger the second type of "context switch", which is:
    • Address space context switching: Loading a new linear address space. If you are switching between threads in the same process, they will share the same addrspace, so there is no need to load a new addrspace. If you are switching to a thread from a different process, then you'll need to load the new process' address space.
    Take note, neither of these has to do with syscalling into the kernel. The kernel lives in the upper half of each process' addrspace for the very reason that this already eliminates the need to context-switch to a separate address space to communicate with the kernel. You need only syscall, and cross the privilege barrier from userspace to kernelspace, and that is all.

    In other words: crossing the privilege barrier to call the kernel is not the same as switching address spaces to call the kernel.
  • Using a managed language doesn't in and of itself reduce an application's memory footprint. If an application needs a large amount of memory, and you use a single address space with some form of virtual machine setup, you aren't going to implicitly somehow mitigate that application's need for memory. If anything, using multiple address spaces will help you deal with large-memory applications more gracefully, because you can give them their own linearized address space, and use tricks to fool them into thinking they have more memory than the computer actually has.
  • "Persistent Applications": This is an interesting idea, but it has a lot of shortcomings -- I'm not discouraging you, and I'm sure you can improve the idea with time and further iterations over the design, but as it is now, it is actually harmful.

Most programs that have critical data that must be stored to disk urgently will /already/ be making use of API calls like POSIX fsync() to ensure that critical data reaches the disk (see the sketch after this list). If an application doesn't consider its data critical, then a kernel that wastes time on some complicated arrangement whereby RAM is treated as a writeback buffer for disks will spend time flushing RAM to disk far in excess of what is needed.

This has the negative effects of thrashing the disk unnecessarily, wearing out sectors, and wasting time - and of course it's redundant, since applications actually do not /want/ a lot of their data to ever be written to disk at all. In fact, there are many security-related applications whose data should never be written to disk unless they explicitly ask for it.

Furthermore, this idea doesn't inherently provide failsafe or automatically resumable applications. For your stated design goals, this idea isn't a solution.
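For reference, the fsync() pattern mentioned above looks roughly like this (a minimal sketch, with error handling trimmed to the essentials):

```c
/* Minimal sketch of the POSIX pattern: the application, not the kernel,
 * decides which data is critical enough to force onto the disk right now. */
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

int save_critical(const char *path, const void *buf, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (write(fd, buf, len) != (ssize_t)len) { close(fd); return -1; }
    if (fsync(fd) < 0) { close(fd); return -1; }  /* block until durably stored */
    return close(fd);
}
```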
I couldn't read the rest because it looked a bit unrefined, and I couldn't bring my brain to pay attention. I hope I was able to give you another point of view, and maybe make your future iterations more viable.

--Hope this helped,
gravaera
17:56 < sortie> Paging is called paging because you need to draw it on pages in your notebook to succeed at it.

Re: Single address space OS design: please review my thought

Post by ababo »

Thank you guys for your replies.

I agree with most of your points. Here are a couple of remarks.

For Brendan:
This does not make files redundant. If I download a large picture from the internet then modify it and save the modified version, then open one of those pictures in an image viewer, how does the user tell the image viewer which one they want? You must have some sort of human readable identifier (a file name) that is used to find the requested data (the file).
I think you're conflating two notions (storage organization and the naming mechanism). Of course we need some system of human-readable identifiers so that the user can choose the data they want. But there is no need to serialize data to make it persistent.
Do you honestly think there's any difference between "unique identifier" and "file handle"?
If you want to read a file you need to "open" it: you pass a unique identifier (the filename) and get back a descriptor (in Windows it's called a "handle"). If your application is suspended and resumed after an OS restart, the descriptor it holds is no longer valid; without reopening the file, the next read request will definitely fail. But if the read call takes a filename instead of a descriptor, we can continue reading the file after a system restart (in this particular case we must add an additional file-offset parameter - the call has to be stateless).
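A sketch of what I mean (pread() is real POSIX; the os_read_at() name and its emulation here are just an illustration):

```c
/* Contrast between a temporary handle and a stateless, identifier-based call.
 * os_read_at() is invented; here it is emulated on POSIX with open+pread.   */
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Stateless: every call carries the permanent identifier and the offset, so
 * it can be replayed after an OS restart or even on a different machine.    */
ssize_t os_read_at(const char *unique_id, uint64_t offset, void *buf, size_t len) {
    int fd = open(unique_id, O_RDONLY);       /* no state survives the call */
    if (fd < 0) return -1;
    ssize_t n = pread(fd, buf, len, (off_t)offset);
    close(fd);
    return n;
}

int main(void) {
    char buf[512];

    /* Conventional, stateful POSIX: 'fd' is a temporary handle that dies with
     * this OS instance, and read() depends on a hidden per-descriptor offset. */
    int fd = open("/data/log.txt", O_RDONLY);
    read(fd, buf, sizeof buf);
    close(fd);

    /* Stateless equivalent: nothing the kernel remembers is needed. */
    os_read_at("/data/log.txt", 1024, buf, sizeof buf);
    return 0;
}
```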
If there's a power failure or hard reset; do all applications and all disk contents end up corrupted (inconsistent state due to some things being stored on disk while others aren't, at the time of the power loss/reset)?
I believe some transactional support must be provided. Pages cannot be saved individually - I mean the system should take periodic consistent snapshots of the whole address space.
For 64-bit code there is no segmentation (it sucked extremely badly due to unavoidable protection check overheads, so nobody used it, so AMD removed it for 64-bit code to improve performance).
I missed that point, thanks for the reminder. It means there's no way to achieve true relocation of persistent code - an additional argument against a single address space.

To gravaera:
In other words: crossing the privilege barrier to call the kernel is not the same as switching address spaces to call the kernel.
Thanks for the clarification. I hadn't clearly realized this difference.
Using a managed language doesn't in and of itself reduce an application's memory footprint.
I didn't write that.
Last edited by ababo on Mon Apr 14, 2014 12:11 am, edited 1 time in total.

Re: Single address space OS design: please review my thought

Post by bluemoon »

ababo wrote:
If there's a power failure or hard reset; do all applications and all disk contents end up corrupted (inconsistent state due to some things being stored on disk while others aren't, at the time of the power loss/reset)?
I believe some transactional support must be provided. Pages cannot be saved individually - I mean the system should take periodic consistent snapshots of the whole address space.
There is a lot of work involved in saving the machine state (which, to some extent, may include external hardware state). The more comfortable way (for both OS developers and application programmers) is a "checkpoint" approach, where the application uses that information to simulate a "resume".

Re: Single address space OS design: please review my thought

Post by Brendan »

Hi,
ababo wrote:
This does not make files redundant. If I download a large picture from the internet then modify it and save the modified version, then open one of those pictures in an image viewer, how does the user tell the image viewer which one they want? You must have some sort of human readable identifier (a file name) that is used to find the requested data (the file).
I think you're conflating two notions (storage organization and the naming mechanism). Of course we need some system of human-readable identifiers so that the user can choose the data they want. But there is no need to serialize data to make it persistent.
You don't need to serialize data used for file IO on any other OS either. However, you do need to serialize data that can be used by different computers - e.g. with different endian-ness or differences caused by languages ("sizeof(int)", structure padding, etc). This applies to all IO (including files and networking) and applies to your OS the same as it does all other OSs.
ababo wrote:
Do you honestly think there's any difference between "unique identifier" and "file handle"?
If you want to read a file you need to "open" it: you pass a unique identifier (the filename) and get back a descriptor (in Windows it's called a "handle"). If your application is suspended and resumed after an OS restart, the descriptor it holds is no longer valid; without reopening the file, the next read request will definitely fail. But if the read call takes a filename instead of a descriptor, we can continue reading the file after a system restart (in this particular case we must add an additional file-offset parameter - the call has to be stateless).
So, you're using fully qualified file names as file handles?

How does this work with removable file systems (USB flash, CD-ROM, NFS, etc)? How does it handle device failure (e.g. if a SATA drive explodes, can you install a new/empty SATA drive and restore the files from a backup)? How about partial failures (e.g. a few bad sectors that can't be read)?
ababo wrote:
If there's a power failure or hard reset; do all applications and all disk contents end up corrupted (inconsistent state due to some things being stored on disk while others aren't, at the time of the power loss/reset)?
I believe some transactional support must be provided. Pages cannot be saved individually - I mean the system should take periodic consistent snapshots of the whole address space.
You mean, applications will have defined "synchronisation points" where they ask the OS to save their state (such that the applications can be restored to the previously saved state)?
ababo wrote:
For 64-bit code there is no segmentation (it sucked extremely badly due to unavoidable protection check overheads, so nobody used it, so AMD removed it for 64-bit code to improve performance).
I missed that point, thanks for the reminder. It means there's no way to achieve true relocation of persistent code - an additional argument against a single address space.
Note that it's still possible for your virtual machine to emulate a "segmentation like" thing - it's just slower (and not impossible).
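A rough sketch of what that software emulation might look like (all names here are invented; a real VM would fold this into code translation rather than a call per access):

```c
/* Software "segmentation": the VM applies a base and limit to every domain
 * memory access, so domain code only ever works with offsets, and remounting
 * at a different base costs nothing. Illustrative only.                      */
#include <stdint.h>

struct domain {
    uint8_t  *base;   /* where the domain happens to be mounted right now */
    uint64_t  limit;  /* its size in bytes */
};

static inline uint8_t *dom_translate(struct domain *d, uint64_t off, uint64_t len) {
    if (off > d->limit || len > d->limit - off)
        return 0;                    /* fault: access escapes the domain */
    return d->base + off;            /* software equivalent of segment:offset */
}

static inline uint64_t dom_load64(struct domain *d, uint64_t off) {
    uint8_t *p = dom_translate(d, off, 8);
    return p ? *(const uint64_t *)p : 0;  /* a real VM would raise a fault */
}
```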


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: Single address space OS design: please review my thought

Post by ababo »

You don't need to serialize data used for file IO on any other OS either. However, you do need to serialize data that can be used by different computers - e.g. with different endian-ness or differences caused by languages ("sizeof(int)", structure padding, etc). This applies to all IO (including files and networking) and applies to your OS the same as it does all other OSs.
Maybe we define the notion of "serialization" differently. All I meant by it is the process of converting arbitrary data structures into a flat byte sequence. For example, you cannot save a binary tree to a file directly: you need to traverse it recursively in order to convert its data into a byte sequence that can be written to the file. But with a persistent address space there is no need for such a traversal - the tree is simply reloaded after a restart as part of the address space.
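For clarity, this is the traversal I mean (a minimal sketch):

```c
/* The serialization I mean: the in-memory tree is full of pointers, which are
 * meaningless on disk, so a recursive walk flattens it into a byte stream.
 * With a persistent address space, the pages holding the nodes already *are*
 * the durable representation and none of this is needed.                     */
#include <stdint.h>
#include <stdio.h>

struct node {
    int32_t value;
    struct node *left, *right;   /* absolute addresses: cannot be written raw */
};

/* Preorder dump: one tag byte (0 = empty, 1 = node), then the payload.
 * The pointers themselves are never written - only the shape they encode.   */
static void serialize(const struct node *n, FILE *out) {
    uint8_t tag = (n != NULL);
    fwrite(&tag, 1, 1, out);
    if (!n) return;
    fwrite(&n->value, sizeof n->value, 1, out);
    serialize(n->left, out);
    serialize(n->right, out);
}
```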
So, you're using fully qualified file names as file handles?

How does this work with removable file systems (USB flash, CD-ROM, NFS, etc)? How does it handle device failure (e.g. if a SATA drive explodes, can you install a new/empty SATA drive and restore the files from a backup)? How about partial failures (e.g. a few bad sectors that can't be read)?
No, I have almost nothing yet: from the previous iterations I have just a single address space and a simple SMP scheduler with some supporting primitives (mutexes, sleep, etc.). So I have nothing concrete to answer with here. But I was talking about the idea of binding not to temporary resource handles but to permanent unique identifiers.
You mean, applications will have defined "synchronisation points" where they ask the OS to save their state (such that the applications can be restored to the previously saved state)?
No, I meant this should be totally transparent to applications. Every so often the system adds a new incremental snapshot of the address space. It must be consistent (I suspect that's not easy to achieve).

Re: Single address space OS design: please review my thought

Post by TylerH »

I've thought about something similar before and did some research on the current state of knowledge about it. Basically, if you want a single address space, you want everything to be provably safe - preferably provably safe at compile time. You might be interested in this paper, which looks at using LLVM to ensure safety at compile time and then to translate from VM code to native code (which could be done AOT): http://llvm.org/devmtg/2008-08/Criswell_SVA.pdf. Of course, this means limiting the languages you can use on the OS. Even with big advancements, most "safe" C would be hard or impossible to prove safe at compile time.

Re: Single address space OS design: please review my thought

Post by Brendan »

Hi,
ababo wrote:
You don't need to serialize data used for file IO on any other OS either. However, you do need to serialize data that can be used by different computers - e.g. with different endian-ness or differences caused by languages ("sizeof(int)", structure padding, etc). This applies to all IO (including files and networking) and applies to your OS the same as it does all other OSs.
Maybe we define the notion of "serialization" differently. All I meant by it is the process of converting arbitrary data structures into a flat byte sequence. For example, you cannot save a binary tree to a file directly: you need to traverse it recursively in order to convert its data into a byte sequence that can be written to the file. But with a persistent address space there is no need for such a traversal - the tree is simply reloaded after a restart as part of the address space.
No, we're using the same definition of "serialization". Data in a process' virtual address space doesn't need to be serialised (on any OS including yours), and data that can be accessed by other computers (e.g. using file IO or networking) does need to be serialised (on any OS including yours).

The main difference is that you're trying to do "single-level store", which avoids (some/most?) file IO.

ababo wrote:
So, you're using fully qualified file names as file handles?
How does this work with removable file systems (USB flash, CD-ROM, NFS, etc)? How does it handle device failure (e.g. if a SATA drive explodes, can you install a new/empty SATA drive and restore the files from a backup)? How about partial failures (e.g. a few bad sectors that can't be read)?
No, I have almost nothing yet: from the previous iterations I have just a single address space and a simple SMP scheduler with some supporting primitives (mutexes, sleep, etc.). So I have nothing concrete to answer with here. But I was talking about the idea of binding not to temporary resource handles but to permanent unique identifiers.
In theory, how do you think your OS design might handle removable file systems, complete device failures and partial failures?
ababo wrote:But I was talking about the idea of binding not to temporary resource handles but to permanent unique identifiers.
So, you're using fully qualified file names (as permanent unique identifiers) instead of file handles (temporal resource handles)?
ababo wrote:
You mean, applications will have defined "synchronisation points" where they ask the OS to save their state (such that the applications can be restored to the previously saved state)?
No, I meant this should be totally transparent to applications. Every so often the system adds a new incremental snapshot of the address space. It must be consistent (I suspect that's not easy to achieve).
In this case the total amount of state that needs to be on disk includes:
  • Each active process' virtual address space
  • Each inactive process' virtual address space
  • Each thread's state (registers, etc)
  • Any buffers used for communication between threads
To get a consistent snapshot you will need to stop the world (otherwise things are changing while you're trying to save to disk). If (due to disk bandwidth limitations) it takes 500 ms to do an incremental snapshot every 10 seconds then the entire computer freezes for 500 ms every 10 seconds. Note that people complain about Java's garbage collection causing small pauses but these pauses don't involve disk IO and aren't for the entire OS. How are you planning to minimise this problem (and make the OS usable)?

Also; what happens when a power failure occurs while the OS is writing a snapshot to disk; and (when a computer is turned on) will the user have to wait for the OS to piece together hundreds of small incremental changes to restore the previous state?


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: Single address space OS design: please review my thought

Post by embryo »

ababo wrote:* Persistent applications. They survive both a system restart and a move to another machine.
Please, can you explain the benefits of this feature? Why is it so important to have an application move between machines? And why doesn't the ordinary PC sleep mode suit you? And how can a full-blown copy of an application be implemented? Here the copy means everything the application needs: the application's code, the application's data, the libraries the application uses, the specific OS version it requires, and the hardware it is using - say the app has just opened the CD-ROM tray and expects the user to insert a disc; what should it expect on some other machine?

Re: Single address space OS design: please review my thought

Post by bwat »

ababo wrote:
You mean, applications will have defined "synchronisation points" where they ask the OS to save their state (such that the applications can be restored to the previously saved state)?
No, I meant this should be totally transparent to applications. Every so often the system adds a new incremental snapshot of the address space. It must be consistent (I suspect that's not easy to achieve).
It's not really necessary in surprisingly many cases. I've worked on telecom systems where applications would register data changes with a central database, so that if the machine running the application went down - planned or not - another machine could start the application and continue where it left off. Once you see your applications as non-terminating state machines, this type of design is quite easy to achieve, as the registering happens at quite natural places.
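Roughly, the pattern looked like this (db_put()/db_get() stand in for whatever replication API the real systems had):

```c
/* Illustrative sketch: the application is a non-terminating state machine and
 * registers its state with a central store at each transition - a natural,
 * self-consistent point - so a peer machine can take over after a failure.   */
#include <stdint.h>

enum call_state { IDLE, RINGING, CONNECTED };

struct call {
    uint64_t id;
    enum call_state state;
};

int db_put(uint64_t key, const void *rec, unsigned len);  /* invented API */
int db_get(uint64_t key, void *rec, unsigned len);        /* invented API */

static void transition(struct call *c, enum call_state next) {
    c->state = next;
    db_put(c->id, c, sizeof *c);   /* register the change centrally */
}

static void resume_on_peer(struct call *c, uint64_t id) {
    db_get(id, c, sizeof *c);      /* reload and continue where it left off */
}
```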

Now, doing this transparently would be very difficult. How does the system know that the application crashed mid-computation and that, say, only two of three vital global variables were updated, so that a restart in this state would lead to an erroneous state? I think we're back to the halting problem. The application will have to signal the OS, as the OS can't read minds.
Every universe of discourse has its logical structure --- S. K. Langer.

Re: Single address space OS design: please review my thought

Post by ababo »

Now I think that multiple address spaces are unavoidable. My current vision is the following:

1. A self-made L4-like microkernel running in kernel mode.
2. Each domain (namespace) supports a number of ports for synchronous IPC messages. It advertises its open ports plus some metadata via a predefined mapped page or an IPC call (no details yet, just a raw idea; see the sketch after this list).
3. Thus each domain exposes its own IPC interface (there could be an additional intra-domain decomposition into subunits, but that is the middleware level, e.g. a virtual machine, not the basic OS level).
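A rough sketch of the port idea in point 2 (every name here is invented; the style is borrowed from L4's synchronous IPC):

```c
/* Synchronous, port-based IPC between domains. Nothing is buffered in the
 * kernel: the caller blocks until the callee has received and replied.      */
#include <stdint.h>

typedef uint64_t port_id_t;        /* globally unique: domain id + port no.  */

struct ipc_msg {
    uint64_t tag;                  /* which operation this port should do    */
    uint64_t words[7];             /* small payload, register-sized words    */
};

long sys_ipc_call(port_id_t dst, struct ipc_msg *in_out);   /* invented      */
long sys_ipc_recv(port_id_t own, struct ipc_msg *out);      /* invented      */

/* The "manifest" from point 2: a predefined mapped page through which a
 * domain advertises its open ports and their interfaces.                     */
struct port_directory {
    uint32_t count;
    struct { port_id_t port; uint64_t interface_id; } entries[32];
};
```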

To Brendan:
In theory, how do you think your OS design might handle removable file systems, complete device failures and partial failures?
The existing file systems must be handled by three driver domains:
1. A device-type driver (e.g. SATA or SCSI). This domain knows how to read and write byte sequences at a given offset and size.
2. A snapshot-database ("meta-filesystem") driver, which helps the microkernel establish the drive->address_space mapping using the domain below.
3. A file-system driver (e.g. ext3 or FAT32), which treats part of its address space as the file-system structure and serves other domains through some well-defined IPC interface. There is a more conventional (and probably more efficient) alternative: work directly with the device-type driver (without the snapshot-database driver), but in that case we lose the snapshot capability.

The new scheme (no files) needs only the first two driver domains. You're asking about handling hardware failures: if we have support for periodic consistent snapshots, all we need is to roll back to the previous correct snapshot (actually, this should happen automatically). Such rollback support implies that part of the drive holds some metadata, including snapshot diffs. It should be a highly configurable "meta-filesystem".
So, you're using fully qualified file names (as permanent unique identifiers) instead of file handles (temporal resource handles)?
Yes, I think it would be good.
To get a consistent snapshot you will need to stop the world (otherwise things are changing while you're trying to save to disk).
I think domains (namespaces) should be treated separately. This means that to make a consistent incremental snapshot we only need to stop the affected domain (not the whole system).
Also; what happens when a power failure occurs while the OS is writing a snapshot to disk
This is the "meta-filesystem" job to do it atomically (as transactions). This is well known stuff which is used in filesystems and databases.

To embryo:
Why is it so important to have an application move between machines?
Let's imagine some long-term calculation that we want to accelerate by moving it to another machine. Or a stateful distributed queuing system with automatic load balancing. There are many use cases.

Re: Single address space OS design: please review my thought

Post by embryo »

ababo wrote:
Why is it so important to have an application move between machines?
Let's imagine some long-term calculation that we want to accelerate by moving it to another machine.
There would have to be some calculation state saved with your persistent application, and such state imposes a constraint on continuing the calculation. Then, until all the computers finish the same calculation, there will be no speed increase, and all of them will spend processor time in a very inefficient manner (calculating the same thing).

But if we design an application without persistence in mind, then we can just start a new process and feed it new data - an ordinary situation in the PC world. So persistence here is not very efficient.
ababo wrote:Or a stateful distributed queuing system with automatic load balancing. There are many use cases.
And again we have the issue of state. Existing queuing solutions just use ordinary application processes and work very well.

In general, persistence is not pervasive magic; it needs some useful area to work in. Modern software uses persistence in a different manner for a good reason: when users need persistent state, the software separates that state and stores it in some appropriate manner, depending on the user's goal. Saving state permanently without a user request, or without need, is just a waste of computer resources. And of course it wastes the programmer's effort, because there are a lot of state-coherency problems to solve. I still can't see the area where the effort spent would bring us some profit - but maybe I'm missing some interesting way of using persistence.

Re: Single address space OS design: please review my thought

Post by Rusky »

There is plenty of established research into why and how you would do those things. Maybe you should check it out sometime.

Re: Single address space OS design: please review my thought

Post by AndrewAPrice »

The main problem with removing hardware isolation of processes (e.g. a single address space) is that you generally don't want people to run 'unsafe' code on your system (code that can scan memory arbitrarily and mess with other running code).

On single-tasking systems (DOS-based PCs, older video game consoles) you generally don't care about 'unsafe' code, because:
a) There's nothing security-critical in memory. For example:
- On a Game Boy, save games are stored on the game's cartridge. If malicious code ruins its own cartridge, that's its own problem - your other cartridges will still be fine.
- In the DOS era, you were encouraged to store most of your important stuff on external media and make lots of backups. If you had the floppy disk containing your financial records inserted while you were playing a game, and the game overwrote everything, it was your own fault.

b) If a program messes up your machine you can always reboot, because there are no other processes running in the background that you care about.
- Early consoles had no built-in writable media, so there was nothing for malicious software to mess up.

But today, we tend to care about:
a) Security - we store our personal information and our operating system on our hard drive. If it gets messed up, our personal files are deleted, compromised, or shared with others, or we waste a day reinstalling and reconfiguring our OS.
b) Isolation - modern operating systems have many services running, take a long time to reboot, and we multitask with documents open in the background. If our memory state gets messed up, we have to reboot (which takes minutes) and potentially lose unsaved data.

For a modern system, allowing the user to write arbitrarily to memory and hardware is generally undesirable, so we tend to use hardware isolation (giving each process its own address space, using ring protection and system calls).

When we eliminate hardware isolation, it's no longer a good idea to let the user execute arbitrary machine code - there is then absolutely no restriction on what that code may do to the system.

So that leaves us with so-called "safe languages" - languages that don't allow access to random memory addresses a process may not own, don't allow arbitrary access to hardware, etc. To prevent the execution of arbitrary "unsafe" machine code, we need to distribute programs in either source or bytecode form, and rely on the kernel to either interpret or JIT-compile the program.
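One classical software-only technique here is software fault isolation (Wahbe et al., 1993), where the compiler or JIT masks every address into the sandbox region instead of relying on page tables. A hand-written illustration of what the generated code does (the region placement is made up):

```c
/* SFI-style sandboxing: two cheap ALU operations force every address the
 * untrusted code computes into a power-of-two region, so a store can never
 * escape the sandbox even if the code is hostile.                            */
#include <stdint.h>

#define SANDBOX_BASE 0x200000000ULL   /* made-up, 256 MiB-aligned placement */
#define SANDBOX_MASK 0x00FFFFFFFULL   /* 256 MiB power-of-two region        */

static inline uint64_t sfi_clamp(uint64_t addr) {
    return SANDBOX_BASE | (addr & SANDBOX_MASK);
}

static inline void sfi_store32(uint64_t addr, uint32_t val) {
    *(volatile uint32_t *)(uintptr_t)sfi_clamp(addr) = val;  /* stays inside */
}
```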

Some high-level language JITs have really smart people working on them and are able to accomplish really amazing performance (LuaJIT, Google V8, Mozilla SpiderMonkey) - sans the overhead of context switching and system calls.

The problem I face is that there are a lot of C libraries out there - for handling file formats, network protocols, math routines, GUI libraries. There are also full C programs out there - command line tools, office suites, web browsers.

With a high-level language, you don't have access to any of that (well, kind of - Emscripten has successfully compiled fully fledged C++ desktop applications to a subset of JavaScript known as asm.js), so you're left to start from scratch and figure it all out on your own.

But it will be fun and earn you a lot of respect - if that's what you want to do!
My OS is Perception.