... And UNIX For All

SoLDMG · Post by **SoLDMG** » Sun Jul 27, 2014 7:50 am

iansjack wrote:I think it will be much easier to do this as a single-person project so that you don't have to worry about people not agreeing with your concept of how things should be.

I meant not having to write everything myself, even though it gives an empowering feeling.

iansjack wrote: Once you have proper specifications, a workable plan of how to implement them, and some well-written code you may be able to persuade people that your ideas are so good that they could profitably work with you.

I'm working on that as I type this, alongside an assembler and a C compiler (assembler first though, that's the building block of it all).

SoLDMG · Post by **SoLDMG** » Sun Jul 27, 2014 10:36 am

linguofreak wrote: (Everything before this as well) If a program needs to call a library that has private data (let's say that the library is a device driver, and the private data has to do with multiplexing access to the device), it makes a far call to the library's code segment. This causes the library's LDT to be loaded as the code LDT. The segment containing the library's private data is mapped in the library's LDT, allowing the library to load that segment to access its private data. The library does its work, then invalidates the segment register that it used to access its private data segment and makes a far return to the original program's code segment. This causes the program's LDT to be loaded. The program's LDT does not have the library's private data segment mapped, which prevents the program from accessing the library's private data.
That's seriously very interesting. I was originally going to implement segmentation+paging, but after reading this I'm seriously considering just going with segmentation. The parts about libraries are also very intriguing. I'm thinking of having a "main" library which implements things like forking (which it will use a system call for) and stuff like fopen() which it uses IPC for to communicate with the VFS server, which then loads it into memory.
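
For anyone trying to picture the far-call scheme in the quote, here is a toy Python model of it. It is only a sketch of the idea, not real x86 behaviour: the names `CodeSegment`, `Cpu`, `call_driver` and the segment names are made up for illustration, and a real design would involve descriptors, privilege checks and more than one data segment register.

```python
from dataclasses import dataclass, field
from typing import Optional


class SegmentFault(Exception):
    """Raised when code tries to load a segment its current LDT does not map."""


@dataclass
class CodeSegment:
    name: str
    ldt: frozenset  # names of the data segments visible while this code runs


@dataclass
class Cpu:
    current: CodeSegment
    data_reg: Optional[str] = None  # a single data segment register
    call_stack: list = field(default_factory=list)

    def far_call(self, target):
        # Entering another code segment switches the active LDT along with it.
        self.call_stack.append(self.current)
        self.current = target

    def far_return(self):
        # Returning restores the caller's code segment, and so its LDT.
        self.current = self.call_stack.pop()

    def load_data_segment(self, name):
        if name not in self.current.ldt:
            raise SegmentFault(f"{self.current.name} cannot load {name}")
        self.data_reg = name

    def invalidate_data_segment(self):
        self.data_reg = None


def call_driver(cpu, driver):
    """Program calls into the library, which touches its private data and returns."""
    cpu.far_call(driver)
    cpu.load_data_segment("driver_private")
    # ... the library would do its work here ...
    cpu.invalidate_data_segment()
    cpu.far_return()
```

After `call_driver` returns, the program is running under its own LDT again, so an attempt to load `driver_private` from program code raises `SegmentFault` in this model.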

iansjack · Post by **iansjack** » Sun Jul 27, 2014 12:05 pm

SoLDMG wrote: I'm seriously considering just going with segmentation.

So it's going to be a 32-bit OS? Hmm.

SoLDMG · Post by **SoLDMG** » Sun Jul 27, 2014 12:21 pm

iansjack wrote:
SoLDMG wrote: I'm seriously considering just going with segmentation.
So it's going to be a 32-bit OS? Hmm.

Maybe later I'll implement paging and create a 64-bit build. It's probably better to create a functional and stable OS first, and only later start porting it to other architectures and make the jump to long mode.

iansjack · Post by **iansjack** » Sun Jul 27, 2014 1:10 pm

I'd target a modern processor from the start. And I believe that the use of paging would make the OS more stable rather than less so. 32-bit OSs have had their day, IMO. The x86_64 is far more fun to work with than the x86.

SoLDMG · Post by **SoLDMG** » Sun Jul 27, 2014 1:38 pm

iansjack wrote:I'd target a modern processor from the start. And I believe that the use of paging would make the OS more stable rather than less so. 32-bit OSs have had their day, IMO. The x86_64 is far more fun to work with than the x86.

Should I? I don't like the idea of only having one fixed mode, in this case long mode. I personally have an old Pentium I that I'd like to install it on when it's more or less done, so I can test it on real hardware without possibly ruining my Windows machine.

iansjack · Post by **iansjack** » Sun Jul 27, 2014 1:43 pm

Of course it's your OS, so your choice. I'm just saying that, personally, I'm not very interested in old technology so it's another reason why your project wouldn't interest me. I'd like to think that a modern replacement of Unix would use modern hardware.

But I know that some people are quite happily working on 16-bit OSs; again, that wouldn't interest me in the least. Horses for courses I guess.

linguofreak · Post by **linguofreak** » Sun Jul 27, 2014 3:04 pm

SoLDMG wrote:
linguofreak wrote: (Everything before this as well) If a program needs to call a library that has private data (let's say that the library is a device driver, and the private data has to do with multiplexing access to the device), it makes a far call to the library's code segment. This causes the library's LDT to be loaded as the code LDT. The segment containing the library's private data is mapped in the library's LDT, allowing the library to load that segment to access its private data. The library does its work, then invalidates the segment register that it used to access its private data segment and makes a far return to the original program's code segment. This causes the program's LDT to be loaded. The program's LDT does not have the library's private data segment mapped, which prevents the program from accessing the library's private data.
That's seriously very interesting. I was originally going to implement segmentation+paging, but after reading this I'm seriously considering just going with segmentation. The parts about libraries are also very intriguing. I'm thinking of having a "main" library which implements things like forking (which it will use a system call for) and stuff like fopen() which it uses IPC for to communicate with the VFS server, which then loads it into memory.

Working on a modern x86 processor, I'd avoid doing more with segmentation than the bare minimum needed to write an OS for an x86 processor (just as NT-kernel Windows and Linux do). And even if I did make my OS heavily dependent on segmentation, I'd still use paging. Paging is essential for any modern OS. Keep in mind that the processor design I suggested in my previous post, while very segmentation heavy, implements segmentation entirely in terms of paging (whereas x86 segmentation involves adding a base to every address in a segment with a non-zero base, which slows down memory access by forcing page table lookups to wait for that addition to be complete).
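
To make the ordering point concrete, here is a minimal Python sketch of 32-bit x86-style translation: the segment base has to be added before the page lookup can begin. It is a simplification under stated assumptions: the page table is a flat dict from page number to frame number (real x86 uses multi-level tables), only byte-granular expand-up segments are modelled, and the function names are made up for illustration.

```python
PAGE_SIZE = 4096


class ProtectionFault(Exception):
    """Raised on a limit violation or a missing page mapping."""


def linear_address(base, limit, offset):
    """Segmentation step: check the offset against the limit, then add the base."""
    if offset > limit:
        raise ProtectionFault("offset beyond segment limit")
    return (base + offset) & 0xFFFFFFFF  # linear addresses wrap at 4 GiB


def physical_address(page_table, linear):
    """Paging step: it needs the linear address, so it waits for the addition."""
    page, page_offset = divmod(linear, PAGE_SIZE)
    if page not in page_table:
        raise ProtectionFault("page not present")
    return page_table[page] * PAGE_SIZE + page_offset


def translate(base, limit, offset, page_table):
    """Full translation: segmentation first, then paging."""
    return physical_address(page_table, linear_address(base, limit, offset))
```

With a flat segment (`base` of 0) the addition is a no-op, which is the "bare minimum" setup: `translate(0, 0xFFFFFFFF, addr, page_table)` depends only on the page table.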

Intel doesn't have a thoroughgoing enough segmentation design for a microkernel to be viable, and no other modern processor I'm aware of has segmentation at all (except, as I understand it, IBM's z/Architecture, though it isn't called "segmentation" there). The design I suggested in my previous post *might* be sufficient for running a microkernel well (it's certainly closer than anything that I know to exist), but has the disadvantage that it doesn't yet exist (and it's unlikely that it ever will). As a result, if you're working on any existing processor, it's best to go for a monolithic kernel with paging. And even should my dream-microkernel-architecture ever become a reality, anyone wanting to write an OS that would be portable to other processors would have to write a monolithic kernel (though if they designed it well, they might be able to make it so that it could run as a monolithic kernel on traditional architectures and a microkernel on my architecture).

Rusky · Post by **Rusky** » Sun Jul 27, 2014 4:21 pm

One interesting processor here is the Mill, which has only a single virtual address space (with paging, because it's useful for more than protection) and separates protection domains with a Protection Lookaside Buffer that lets software assign permissions to arbitrary regions. They have a mechanism called a portal call, similar to old Intel call gates but actually closer to shared library function calls in implementation and performance, that enables IPC with no kernel intervention, no process space switching, and minimal cache thrashing.

The relevant bits are in these talks:
http://millcomputing.com/docs/memory/
http://millcomputing.com/docs/security/

Brendan · Post by **Brendan** » Mon Jul 28, 2014 11:22 am

Hi,

linguofreak wrote:Intel doesn't have a thoroughgoing enough segmentation design for a microkernel to be viable, and no other modern processor I'm aware of has segmentation at all (except, as I understand it, IBM's z/Architecture, though it isn't called "segmentation" there). The design I suggested in my previous post *might* be sufficient for running a microkernel well (it's certainly closer than anything that I know to exist), but has the disadvantage that it doesn't yet exist (and it's unlikely that it ever will). As a result, if you're working on any existing processor, it's best to go for a monolithic kernel with paging. And even should my dream-microkernel-architecture ever become a reality, anyone wanting to write an OS that would be portable to other processors would have to write a monolithic kernel (though if they designed it well, they might be able to make it so that it could run as a monolithic kernel on traditional architectures and a microkernel on my architecture).

Note that the main problem for both monolithic kernels and micro-kernels is the cost of changing between different "working sets". For example, you might have some sound related code that fills the CPU's caches and TLBs with data it needs, then switch to file system related code that fills the CPU's caches and TLBs with data it needs; and this switching between working sets causes cache efficiency problems (cache misses and/or TLB misses). It doesn't matter too much if the code and data are in different virtual address spaces running under a micro-kernel or if they're just in different areas of the same virtual address space running under a monolithic kernel.

There are 2 ways to minimise the overhead of "working set switches" - reduce the number of working set switches and/or reduce the cost of each individual working set switch.

Reducing the number of working set switches mostly means using asynchronous techniques that allow working set switches to be postponed (e.g. rather than doing 10 working set switches, postpone them and do one instead). This is easier to do (in a clean/abstracted way) for micro-kernels. Sadly, most micro-kernels and monolithic kernels fail to do it at all.
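
A rough way to see the effect of postponing is to count switches for the same stream of requests handled immediately versus queued per target and drained in one visit. This is a toy Python model, not a scheduler: the function names and the "sound"/"fs" targets are illustrative, and it ignores latency, ordering constraints and queue limits.

```python
def switches_immediate(requests):
    """Handle each request as it arrives; switch whenever the target changes."""
    switches = 0
    current = None
    for target in requests:
        if target != current:
            switches += 1
            current = target
    return switches


def switches_batched(requests):
    """Queue requests per target, then drain each queue in a single visit."""
    queues = {}
    for target in requests:
        queues.setdefault(target, []).append(target)
    # One working set switch per non-empty queue.
    return len(queues)
```

For ten requests alternating between "sound" and "fs", the immediate version switches ten times and the batched version twice.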

Reducing the cost of each individual working set switch is a little easier for a monolithic kernel. For a well written micro-kernel running on a modern 80x86 CPU (e.g. something that supports Intel's "Process Context Identifiers") the difference should be minor. The cost of isolation can never be zero (for any OS design and any CPU design); however, isolation has many benefits (stability, security, debugging, maintainability, etc) and it can be foolish to sacrifice all of these benefits for the sake of (potentially minor) performance differences.

I can't see anything wrong with 80x86 and paging for a micro-kernel (especially a modern 80x86 CPU that supports Intel's "Process Context Identifiers"). I don't see any valid reason to wish for something that doesn't exist; regardless of whether it's a theoretical CPU that you've made up or a "fantasy" CPU that hasn't died yet because it hasn't been released yet (e.g. Mill).

Cheers,

Brendan

SoLDMG · Post by **SoLDMG** » Mon Jul 28, 2014 12:54 pm

Rusky wrote:One interesting processor here is the Mill, which has only a single virtual address space (with paging, because it's useful for more than protection) and separates protection domains with a Protection Lookaside Buffer that lets software assign permissions to arbitrary regions. They have a mechanism called a portal call, similar to old Intel call gates but actually closer to shared library function calls in implementation and performance, that enables IPC with no kernel intervention, no process space switching, and minimal cache thrashing.

The relevant bits are in these talks:
http://millcomputing.com/docs/memory/
http://millcomputing.com/docs/security/

I've read a lot about the Mill architecture. If they're not bluffing (which I suppose they aren't, they're a real company), I'm definitely buying a Mill machine, porting my OS to it, and using that for dedicated computing, while keeping my current Windows machine for gaming (I couldn't get GRUB to work with Linux, MINIX, OpenSolaris or BSD, so yeah).

SoLDMG · Post by **SoLDMG** » Mon Jul 28, 2014 1:02 pm

linguofreak wrote:
SoLDMG wrote:
linguofreak wrote: (Everything before this as well) If a program needs to call a library that has private data (let's say that the library is a device driver, and the private data has to do with multiplexing access to the device), it makes a far call to the library's code segment. This causes the library's LDT to be loaded as the code LDT. The segment containing the library's private data is mapped in the library's LDT, allowing the library to load that segment to access its private data. The library does its work, then invalidates the segment register that it used to access its private data segment and makes a far return to the original program's code segment. This causes the program's LDT to be loaded. The program's LDT does not have the library's private data segment mapped, which prevents the program from accessing the library's private data.
That's seriously very interesting. I was originally going to implement segmentation+paging, but after reading this I'm seriously considering just going with segmentation. The parts about libraries are also very intriguing. I'm thinking of having a "main" library which implements things like forking (which it will use a system call for) and stuff like fopen() which it uses IPC for to communicate with the VFS server, which then loads it into memory.
Working on a modern x86 processor, I'd avoid doing more with segmentation than the bare minimum needed to write an OS for an x86 processor (just as NT-kernel Windows and Linux do). And even if I did make my OS heavily dependent on segmentation, I'd still use paging. Paging is essential for any modern OS. Keep in mind that the processor design I suggested in my previous post, while very segmentation heavy, implements segmentation entirely in terms of paging (whereas x86 segmentation involves adding a base to every address in a segment with a non-zero base, which slows down memory access by forcing page table lookups to wait for that addition to be complete).

Intel doesn't have a thoroughgoing enough segmentation design for a microkernel to be viable, and no other modern processor I'm aware of has segmentation at all (except, as I understand it, IBM's z/Architecture, though it isn't called "segmentation" there). The design I suggested in my previous post *might* be sufficient for running a microkernel well (it's certainly closer than anything that I know to exist), but has the disadvantage that it doesn't yet exist (and it's unlikely that it ever will). As a result, if you're working on any existing processor, it's best to go for a monolithic kernel with paging. And even should my dream-microkernel-architecture ever become a reality, anyone wanting to write an OS that would be portable to other processors would have to write a monolithic kernel (though if they designed it well, they might be able to make it so that it could run as a monolithic kernel on traditional architectures and a microkernel on my architecture).

I'm confused now. For now I'll just focus on the toolchain, I guess, and think out the internals later.

Rusky · Post by **Rusky** » Tue Jul 29, 2014 5:34 pm

Just adding PCIDs to x86 doesn't really do that great a job at reducing context switching costs. It reduces TLB flushing, but that's not the whole story:

A multi-address space design with physically-addressed caches has to access the TLB in the critical path for every memory reference at every level of the cache hierarchy, forcing it to be smaller and more power-hungry. A single-address space design can use virtually addressed caches and move the TLB out of the critical path, where it can be bigger and use less power without impacting performance, because DRAM access is orders of magnitude slower.

At the micro-optimization level, x86 context switches still require entering ring 0, saving and restoring registers (this can be a LOT of data), and returning to ring 3. The Mill, on the other hand, has no kernel mode- each process hands out portals to itself that can be called directly. For all calls, "registers" are saved and restored lazily and asynchronously by the hardware itself, with no additional performance penalty on the software at all- context switches cost a single cycle for the call, plus the (possibly hidden) time to load the portal/function address itself, plus the possibly missed branch prediction.

On x86, passing any data bigger than the registers requires either setting up and tearing down shared page mappings, which potentially also requires pointer translation, or copying through kernel space. The Mill's PLB can be used to grant arbitrary regions of the single address space for the duration of the call, with no copying, and with permissions checked in parallel with the memory access, rather than in sequence like the TLB's translations.
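
As a sketch of what "granting a region for the duration of the call" could look like from software, here is a small Python model of a region-based permission table. It is not the Mill's actual PLB format or interface: the class `Plb`, its methods and the domain name are assumptions made up for illustration, and the real check happens in hardware in parallel with the memory access.

```python
from contextlib import contextmanager


class Plb:
    """Toy permission table: (domain, start, end, rights) over one address space."""

    def __init__(self):
        self._grants = []

    def allowed(self, domain, address, right):
        """True if some grant gives `domain` this `right` at `address`."""
        return any(
            d == domain and start <= address < end and right in rights
            for d, start, end, rights in self._grants
        )

    @contextmanager
    def granted_for_call(self, domain, start, length, rights):
        """Grant a byte range to `domain` only while the call is in progress."""
        entry = (domain, start, start + length, rights)
        self._grants.append(entry)
        try:
            yield
        finally:
            # Revoke on return, even if the callee raised.
            self._grants.remove(entry)
```

The caller would wrap the call in `with plb.granted_for_call("fs_server", buffer_address, 256, "r"):`, so the callee can read the buffer in place and loses access as soon as the call returns; nothing is copied and no page mappings change.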

The cost of isolation on a Mill is significantly closer to zero than on an x86. That it doesn't exist (yet?) doesn't make it unworthy of attention- if nothing else, some of these ideas can be adapted to existing architectures, so it's worth thinking about what operating system designs they enable.

Brendan · Post by **Brendan** » Wed Jul 30, 2014 3:59 pm

Hi,

Rusky wrote:The cost of isolation on a Mill is significantly closer to zero than on an x86. That it doesn't exist (yet?) doesn't make it unworthy of attention- if nothing else, some of these ideas can be adapted to existing architectures, so it's worth thinking about what operating system designs they enable.

To be perfectly clear; there are quite a few things that Mill have done that make sense to me and may make it a technically superior CPU design. However, technical superiority will not prevent the Mill from being stillborn. The challenges Mill faces are economics and marketing; not technical.

For software (after you've spent ages designing and implementing something) it costs you virtually nothing to make more copies, and the performance of the copy isn't affected by how expensive it was to create the copy. For hardware, creating copies costs money, faster silicon costs more, and the more copies you make the cheaper each copy is. What this means is that a company like Intel (who produces millions of CPUs) has a massive advantage for "performance per dollar" simply due to the volume of processors they manufacture. For something like the Mill, they won't be able to afford cutting edge manufacturing and won't have the volume needed to drive the price per CPU down to an acceptable level; and their CPUs will be slow and expensive because of this.

In addition; a CPU without software is useless. For a "very different" CPU like the Mill, you're going to need very different software to make it shine. The Mill's features look like they're designed for a single address space "micro-kernel-ish" thing; and porting Linux (or running an 80x86 emulator or Java VM) will not do the CPU justice.

Basically; they're going to find a "catch-22" situation (bad "performance per dollar" and no decent software designed for it, because of far too few CPUs produced/sold, because of "performance per dollar" and no decent software designed for it).

What I expect will happen is that they will apply for patents, make a few slow/expensive CPUs and die, then sell their patents to make profit. Despite the fact that it may be technically superior, nobody here will ever actually see one.

Cheers,

Brendan

Rusky · Post by **Rusky** » Wed Jul 30, 2014 5:25 pm

Those challenges have been faced by every company to break into the CPU market, including Intel (vs IBM), AMD (with AMD64), and all the ARM manufacturers. Look at the companies that license and manufacture their own ARM CPUs. They have essentially the same problem (minus creating the architecture) and are doing very well for themselves, even threatening Intel in some areas. It's definitely a possibility.

The Mill has a cheaper and faster design, which will significantly lower the cost. They estimate a 10x performance/power increase over traditional OoO superscalar designs for good, well-analyzed reasons- even early on it has a good chance of matching Intel's power/performance, if not beating it, for the markets they target. That is their business model, after all.

The software really isn't a problem either. The Mill is designed to run regular old off-the-shelf C code, including Unix kernels, so most applications will see the benefits immediately on recompile (using LLVM as the backend, so even that won't take much/any effort). While a Linux port may not take full advantage of the design, it will still benefit from it, with no real performance penalties.

Your expected scenario is still possible, but I don't expect it. I expect it to break into some niche (server farms might be easier?) and grow from there. And like I said before, if their patents get sold off, many aspects of the design could show up in a next-gen Intel or ARM or something. Some of us will likely see at least some aspects of it.

OSDev.org
