RabbieOS

0b1 · Post by **0b1** » Sat Jul 07, 2018 8:32 am

Thanks for your responses. Every one helps, negative or positive.

Korona, I'm not saying I will do it better (I am still learning), but I sure am going to try. A very wise muppet once said there is no try, so I will succeed or fail. If I thought I would fail, I wouldn't start.

If you have ever inherited legacy spaghetti code, you'll know that 95% of the time it's better to start over with no assumptions than to keep building on it. I agree that it's good to have a firm understanding of the technology, but again, the gains are not always what you think they are.

I've spent lots of time inheriting legacy code, and refactoring and simplifying. After a while you develop a feel for when things could be simpler, for when things should simply be thrown out and built up from first principles. A lot of the microarchitecture is the way it is because of legacy reasons. Some is the way it is because of software limitations. Some software is written the way it is because of hardware limitations. A lot of that cancels out, but only if you start from first principles and forget the 'noise'. That is a risk because some of that 'noise' is necessary information. But a lot of it isn't.

An example of inefficiency is page tables in a 64-bit address space. Initialization headaches aside, TLB entries have to be filled (by walking the page tables), looked up, and evicted, all of which takes time. 64-bit processes are meant to coexist in a single linear address space anyway; otherwise segmentation would not have been thrown out. An OS that is architected to use linear addressing has no need for page tables, but it is forced to use them, with unnecessary overhead. Only 'this is the way it is done' makes them necessary (and in this case mandatory, since long mode requires paging to be enabled). Another example: a 64-bit BSP and its APs have no need to spend any time in real or protected mode, yet they have to jump through those hoops. That's not a huge deal at boot time (except as a wasted learning curve for NOOBs like me), but it's still not optimal.

We cache because CPU speeds have grown much faster than RAM speeds. Out-of-order execution (OOOE), caching, and threading are all ways to make a CPU seem more efficient, but the cost is predictability. There is, at least theoretically, an 'optimal' software execution architecture where caching and OOOE add no value. I doubt I can find it, but perhaps I can get close.

When you look at high-level platforms such as .NET and Java, they jump through even more hoops: JIT compilation, object-oriented calls layered on top of bytecode, and bytecode translated into machine code. That's a lot more steps. I'm contending that, when you take the entire pipeline from executable to hardware, 80% or more of what is happening can be optimized out.

For my OS, the ideal CPU would have:
- Direct boot into, or a single-instruction switch to, 64-bit mode
- No (or explicitly disable-able) OOOE.
- Implicit code cache
- Explicitly controllable data cache (via CPU instructions)
- MAX_PHYS_ADDR-wide linear memory access without page tables
- A cached, on-chip-only stack that is ONLY for returns and blocked data pushes.
- Async memory copy and search operations (likely better on the PCI bus, so not really a CPU wish)

As it is, Intel CPUs don't offer those things (or if they do, they are hard to discover). So my goal is to get as close as possible without having to read 3000+ pages of things that (a) I won't remember, because (b) they are probably not relevant.

My OS has a long way to go before it can even be considered an OS, but so far I have:
- Learned x86/NASM assembly basics, to a not-quite-intermediate level.
- A custom bootloader, with contiguous kernel boot code.
- NASM build scripts, with GDB debug and unit test compiler options
- RAW virtual hard disk build scripts, including adding the OS and boot sector without third-party tools.
- Build-to-boot build chain for Bochs, QEMU, VirtualBox and VMware
- Automatic breakpoint parser (NASM comment to GDB script)
- Boot to 64-bit mode; IDT.
- Large-page page table support (all virtual guests except VirtualBox), plus separate VirtualBox support.
- A memory manager that tracks fragmentation (no defragmenter yet)
- About 50% of the unit tests for the memory manager
- i8254 NIC initialization (no read/write yet; waiting on the memory manager unit tests)
- SMP table scanning and AP trampolining (in progress)
- On-paper draft design for a context-based cooperative OS, about 60% complete.
- Partial (< 5%) 'FilthyScript' compiler (tokenization and call hierarchy so far) in .NET.

Big gaps are:
- OS API for compiler to consume.
- Process, thread, and atomic-task scheduler.
- The remainder of the compiler.
- A basic TCP/IP network stack and multihost HTTP server
- An advanced network stack (SSL, TLS, WebSockets, compression)
- A database engine (simplified for the most common uses and data relationships).
- Server-side scripting for HTTP and RDBMS
- A selection of off-the-shelf example web/database applications.
- A bunch of things I don't even know I need yet.

Things I have built/integrated and then thrown out as too inefficient, too complicated, or too inconvenient:
- Linux-based build. I work on Windows by necessity, and a virtual Linux is not an option.
- GCC Windows cross compiler. Too many issues, especially in 64-bit. Lots of bloat.
- Several existing boot loaders. Too large, too complex to configure and script.
- A C kernel stub, linker, PE and COFF loader. Assembly has proven no harder to learn than C was to relearn and, with a good debug setup, far less of a headache.
- A bitmap-based memory manager. Too inconvenient to debug, and it doesn't track fragmentation.
- Open source disk image tools. They didn't work well on Windows, and didn't have the option to manipulate individual bytes.

And that's since January of this year, bearing in mind I have a day job of 40-50 hours a week, and weekends with other commitments. A LOT of credit goes to OSDev for making things easy to find and follow (although the 64-bit material is a bit thin), and for warning me about the insanity of attempting something this ambitious.

The goal of the FilthyScript compiler is to compile an object-oriented, multi-threaded, ECMA-7-ish language to cooperative MT code blocks.
(And yes, 'Script' and 'Compiler' are contradictory, but I have my reasons.)

The design approach is to build most of the code in assembly until the compiler is relatively mature, then port the compiler to itself, then rewrite much of the assembly code in the high-level language. Ideally, the system will:
- Boot from a stub to an initial state
- Secure boot from FilthyScript source in the cloud, compiled-on-boot. Maybe.
- Configuration stored locally or in the cloud (with a local stub only)
- Web based configuration and management.
- Off-the-shelf customizable database applications (e-store, cms, and so on). Including server-side 'scripting' for web and DB.
- Virtual hosts available at low cost on most large hosting providers (EC2, etc.).

And yes, I expect many roadblocks, not least of which is the number of hours it will take.

Octocontrabass · Post by **Octocontrabass** » Sun Jul 08, 2018 11:40 am

davidpi wrote:There is, at least theoretically, an 'optimal' software execution architecture where caching and OOOE add no value.

That's impossible with x86. I suspect there are no existing CPU architectures that would meet this requirement when scaled up to the speed of modern x86.

For x86, you need caching due to the frequency of memory accesses. Even if you could disable caching without enforcing serialization, you'll most likely spend a lot of time accessing the same areas of RAM. Without a cache, those accesses are limited by the relatively low RAM bandwidth.

For most CPU architectures, out-of-order execution allows more efficient resource sharing. Most instructions only require a few of the CPU's execution resources, and forcing in-order execution prevents resource sharing. For example, division is a relatively slow operation, but it usually requires only a single execution unit. The other execution units are free, and best utilized by future (or past!) instructions that have no dependency with the division.

davidpi wrote:So my goal is to get as close as possible without having to read 3000+ pages of things that (a) I won't remember, because (b) they are probably not relevant.

No one is telling you to read the entire architecture manuals, just to search through them for the relevant information. Also, since you're focusing so much on performance, you might want to consult the optimization guides as well: Intel's, AMD's, and Agner Fog's.

Muf · Post by **Muf** » Sat Aug 18, 2018 12:19 pm

From what I can tell, you're trying to make a unikernel. There's already quite a variety of unikernels; maybe it's worth investigating them?

OSDev.org