My OS Design

Post by Marionumber1 »

I've recently gotten the basic features of my operating system finished, namely memory management and multithreading. Now I'm beginning to focus on executable files, module loading, device management, and the virtual filesystem. These areas are much higher-level, and as a result, I've started focusing more on the design of my operating system. In an attempt to avoid making any drastic design errors, I'm posting the design here in the hope that people will pick it apart and help me improve it.

Before I say anything about the design, I should mention that I maintain a wiki that serves as a design document. You can access it here: http://darksideproject.hopto.org/wiki/Main_Page. Also, I know this post is pretty long, but I did my best to organize it into sections. If you don't want to read the whole thing, I would most prefer feedback on the object manager, memory manager, subsystems, and the device manager.

To give a general overview of my design, I plan to use a hybrid kernel in my OS. The definition of "hybrid" I'm using is that all system components run in kernel mode, but they're structured like a microkernel to allow for flexibility and modularity. What would traditionally be a single kernel is divided into two components: the executive and the kernel. The kernel handles only low-level, architecture-specific functionality (just like a microkernel), while the executive performs system resource management. In addition, a module called the Hardware Abstraction Layer (HAL) abstracts away differences in system hardware (things like legacy vs ACPI, 8259 vs APIC, PIT vs APIC timer vs HPET).

Executive
The executive is the central component of my operating system. It provides the basic system functionality that all applications, libraries, and drivers need to interface with the hardware, such as memory management, a file manager, processes and threads, and device management.

Object Manager
The object manager is the central component of the executive. It manages all system resources as objects and is responsible for keeping track of the resources allocated to processes. All resource access goes through the object manager. Each object has a header and a body: the header contains generic object information used by the object manager, while the body contains class-specific data. Generic object information includes the interfaces an object exposes, its access permissions, and a reference count.
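
As a rough sketch, the header/body split might look like this in C (field names are illustrative, not a final layout):

    #include <stdint.h>

    /* Generic header kept in front of every object; the class-specific
       body immediately follows it in memory. */
    typedef struct object_header {
        struct object_class *class;      /* methods and body layout */
        struct interface    *interfaces; /* interfaces this object exposes */
        uint32_t             access;     /* access permissions */
        uint32_t             refcount;   /* outstanding references */
    } object_header_t;

    /* The class-specific body starts right after the header. */
    #define OBJECT_BODY(obj) ((void *)((object_header_t *)(obj) + 1))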

Object Classes
Every executive subsystem implements object classes. An object class is a specific type of resource managed by the object manager. They are similar to classes in OOP languages. Each object class consists of a set of methods and defines the layout of the object body. The executive implements the following object types:
  • Directory - Object used by the object manager to create the object namespace
  • Section - Object that maps a part of a process's virtual address space
  • Inode - VFS structure that contains information about a file
  • File - Instance of an open file, device, pipe, or socket
  • Process - Self-contained task with its own threads, address space, and objects
  • Thread - Part of a process that has its own execution path, registers, and stacks
  • Event - Asynchronous event that can be sent to a thread
  • Pipe - Object that provides a bidirectional data flow between two file handles
  • Socket - Communication endpoint that allows for data exchange between processes
  • Semaphore - Synchronization primitive that can be owned by multiple threads
  • Mutex - Synchronization primitive that can be owned by one thread at a time
  • RWLock - Special mutex that allows multiple threads to read a resource at the same time, but only one to write to it
  • Timer - Object that fires an event after a certain amount of time
  • Module - Dynamically loadable kernel module
  • Device - Hardware that's part of the system

Handles
Objects managed by the object manager are exposed to userspace through handles. Handles are opaque structures that refer to objects. They are created by the object manager whenever an object is opened. A process must own a handle to an object before using it. Each process has its own handle table, which is a table matching handles to objects. Handle table entries contain a pointer to the object and the permissions that the process has to access that object.
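
A minimal sketch of a handle table, assuming a flat per-process array indexed by the handle value (all names here are illustrative):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct handle_entry {
        struct object_header *object; /* object the handle refers to */
        uint32_t              access; /* rights granted at open time */
    } handle_entry_t;

    struct process {
        handle_entry_t *handle_table; /* one entry per open handle */
        size_t          handle_count;
        /* ... */
    };

    /* Translate a handle to an object, checking the granted rights. */
    struct object_header *object_from_handle(struct process *proc,
                                             size_t handle, uint32_t access)
    {
        if (handle >= proc->handle_count)
            return NULL;
        handle_entry_t *e = &proc->handle_table[handle];
        if (!e->object || (e->access & access) != access)
            return NULL;              /* missing rights or stale handle */
        return e->object;
    }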

Object Namespace
Objects managed by the object manager can be given names to identify them. The object manager maintains an internal object namespace that organizes named objects in a hierarchy. This allows objects to be categorized and opened in a uniform manner. In order to implement the object namespace, the object manager defines a directory object type. Directory objects contain a list of directory entries, which are structures that map object names to object pointers. This allows directory objects to contain objects in the namespace, including other directory objects. Each object maintains a link count that keeps track of how many directory entries point to it.

The only way for userspace to gain access to objects is through the object namespace. User applications can open objects in the namespace and get a handle to each object. When userspace code opens an object, it requests a specific interface. An interface is a set of methods that can be called. Each object can provide multiple interfaces. Calling one of an object's methods invokes a syscall that either executes the method in the executive or redirects it to another process or over the network. In this way, the executive becomes a namespace manager and RPC system.

Memory Manager
The memory manager is responsible for managing virtual memory. The memory manager is made up of the physical memory manager, the virtual memory manager, and the kernel heap.

Physical Memory Manager
The physical memory manager hands out physical memory pages. In order to keep track of the system's physical memory, it first needs to know what physical memory is available. This information is collected by the kernel bootloader and passed to the kernel as an array of memory ranges; each range contains its start, size, and flags. Once the physical memory manager has a map of available memory, it uses it to initialize the buddy allocator.
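
The hand-off might look something like this (the flag value and buddy_add_region() are placeholders, not a final interface):

    #include <stddef.h>
    #include <stdint.h>

    #define MEM_FREE (1u << 0)           /* usable RAM */

    typedef struct mem_range {
        uint64_t start;                  /* physical base address */
        uint64_t size;                   /* length in bytes */
        uint32_t flags;                  /* MEM_* flags */
    } mem_range_t;

    void buddy_add_region(uint64_t start, uint64_t size); /* placeholder */

    /* Seed the buddy allocator with every free range from the bootloader. */
    void pmm_init(const mem_range_t *ranges, size_t count)
    {
        for (size_t i = 0; i < count; i++)
            if (ranges[i].flags & MEM_FREE)
                buddy_add_region(ranges[i].start, ranges[i].size);
    }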

Virtual Memory Manager
The virtual memory manager is used to manage a process's address space. One of its major responsibilities is keeping track of the virtual memory used by each process. It uses structures called Virtual Address Descriptors, or VADs, for this purpose. VADs contain information about a specific region of virtual memory in a process's address space, including its start, size, and type. Every process has a set of VADs organized by memory address in an AVL tree, which allows the memory manager to find the VAD that corresponds to a given address.
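
A sketch of a VAD and the lookup the tree ordering makes possible (AVL rebalancing omitted; names illustrative):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct vad {
        uintptr_t   start;        /* first address covered */
        size_t      size;         /* length of the region */
        uint32_t    type;         /* private, section view, stack, ... */
        struct vad *left, *right; /* AVL children, keyed by start */
    } vad_t;

    /* Find the VAD covering addr; O(log n) thanks to the balanced tree. */
    vad_t *vad_find(vad_t *node, uintptr_t addr)
    {
        while (node) {
            if (addr < node->start)
                node = node->left;
            else if (addr >= node->start + node->size)
                node = node->right;
            else
                return node;      /* addr falls inside this region */
        }
        return NULL;              /* address not described by any VAD */
    }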

Section Objects
The virtual memory manager also implements section objects. Section objects map parts of a process's virtual address space, and they are the basis on which memory-mapped files and shared memory are built. A section object maps a portion of the virtual address space to either physical memory or a file. Once a section object is created, it can be brought into an address space by mapping a view of it, where a view is a mapping of a specific portion of the section.

Mapping a view of a section object reserves a portion of the address space but does not commit it. This is used to implement a scheme called demand paging. Demand paging is a mechanism by which pages are only allocated once they're accessed. Since the mapped view of the section object isn't committed, accessing it causes a page fault. The page fault handler consults the VAD tree for the process, and upon learning that the memory faulted on is occupied by a section object, it commits the memory. For physical-memory backed sections, it allocates a physical page and maps it. For file-backed sections, the same process occurs, but the file data is then read into memory.
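
In pseudocode-ish C, that fault path reads roughly like this, assuming the VAD also records its backing section and page protection (every helper here is a placeholder for whatever the memory manager actually exposes):

    /* Called by the architecture-specific trap code on a page fault. */
    int mm_handle_fault(struct process *proc, uintptr_t addr)
    {
        vad_t *vad = vad_find(proc->vad_root, addr);
        if (!vad || vad->type != VAD_SECTION_VIEW)
            return FAULT_UNHANDLED;     /* genuine access violation */

        uintptr_t page = addr & ~(uintptr_t)(PAGE_SIZE - 1);
        uint64_t  phys = pmm_alloc_page();          /* commit one page */
        vmm_map_page(proc, page, phys, vad->protection);

        if (vad->section->file)                     /* file-backed view */
            vfs_read_noncached(vad->section->file,
                               vad->section->offset + (page - vad->start),
                               (void *)page, PAGE_SIZE);
        return FAULT_HANDLED;
    }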

Kernel Heap
The kernel heap is used for allocating kernel data structures. It heavily relies on both the physical and virtual memory managers to get memory for allocations. The functions the kernel heap implements are allocation, freeing, and reallocation.

The kernel heap is made up of two suballocators, a heap allocator and a slab allocator. The heap allocator manages a large area of memory, which is subdivided into chunks; it reuses the buddy allocator that the physical memory manager uses. It is mainly used to allocate strings and buffers which do not have a predetermined size, as well as infrequently allocated objects.

The slab allocator is used for objects that are allocated often, like threads and inodes. It maintains a slab cache for each type of object. Each slab cache contains several slabs, which are blocks of memory of a predefined size. The advantage of the slab allocator is that slabs in a cache can be reused: allocating searches for a free slab and returns it, and freeing just marks the slab as free. That way, freed slabs can be easily reused, with no need to search for more free memory or perform splitting or coalescing of chunks.
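
In skeleton form that logic is tiny; here's a sketch where freed slabs sit on a singly linked free list (slab_grow() is a placeholder that would map fresh memory and carve it up):

    #include <stddef.h>

    struct slot { struct slot *next; };

    typedef struct slab_cache {
        size_t       object_size; /* fixed size of every slab in the cache */
        struct slot *free_list;   /* freed slabs awaiting reuse */
    } slab_cache_t;

    void *slab_grow(slab_cache_t *cache); /* placeholder */

    void *slab_alloc(slab_cache_t *cache)
    {
        struct slot *s = cache->free_list;
        if (!s)
            return slab_grow(cache); /* no free slab: get more memory */
        cache->free_list = s->next;  /* pop a recycled slab: O(1) */
        return s;
    }

    void slab_free(slab_cache_t *cache, void *obj)
    {
        struct slot *s = obj;        /* no splitting or coalescing needed */
        s->next = cache->free_list;
        cache->free_list = s;
    }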

Virtual Filesystem
The executive’s virtual filesystem, or VFS, provides a filesystem abstraction. It allows for multiple different filesystems to be accessed in a uniform manner. The virtual filesystem in my OS is based on the Unix filesystem. The VFS uses a node graph to keep track of the filesystem hierarchy. Volumes can be added to the filesystem tree by mounting them on a directory.

The VFS implements several object types. In order for the VFS to have security and reference counting, its object types are managed by the Object Manager. This allows for files and directories to be secured by access control lists and contain a reference count so that they're removed when they're no longer in use.

Inodes
The most important structure in the VFS is the index node, or inode. Inodes contain important information about files, such as the mountpoint the inode resides on, the file size, the owning user and group, the access, modification, and change times, and the file mode. Each filesystem implements a subclass of inode containing filesystem-specific data.

Directory Entries
Another important structure in the VFS is the directory entry. Directory entries are structures that map filenames to inodes. Directory inodes contain a list of directory entries, which allows them to contain inodes as children. Using the idea of directory entries, the same inode can be referenced multiple times in the filesystem hierarchy if multiple directory entries refer to it. This is known as hard linking. Each inode maintains a link count that keeps track of how many directory entries point to it. Directory entries are crucial in building the filesystem hierarchy.
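
Side by side, the two structures might look like this (fields trimmed down; names illustrative):

    #include <stdint.h>

    typedef struct inode {
        struct mountpoint *mount;    /* volume the inode resides on */
        uint64_t           size;     /* file size in bytes */
        uint32_t           uid, gid; /* owning user and group */
        uint64_t           atime, mtime, ctime;
        uint32_t           mode;     /* file type and permission bits */
        uint32_t           nlink;    /* directory entries pointing here */
        void              *fs_data;  /* filesystem-specific subclass data */
    } inode_t;

    typedef struct dentry {
        char          *name;  /* component name within the directory */
        inode_t       *inode; /* inode the name resolves to */
        struct dentry *next;  /* sibling in the parent's entry list */
    } dentry_t;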

Although directory entries are a major part of the VFS, they aren't actually provided by the VFS; that responsibility belongs to the Object Manager. As described above, the object manager maintains an internal object namespace that organizes named objects in a hierarchy, and the VFS integrates with it. The VFS becomes part of this namespace by creating its own directory object under the path name \VFS, which represents the root of the filesystem. This \VFS directory implements the methods of both an object directory and an inode, allowing it to function as both.

Filesystem Drivers
A major component of the VFS is filesystem drivers. Filesystem drivers are responsible for treating a volume as a filesystem. Each filesystem driver implements a set of filesystem functions and registers them with the VFS. Every function takes a device as an argument, which allows the same driver to be used on any device holding that filesystem. Mountpoints can use a registered filesystem in order to handle filesystem requests.
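
That registration could be little more than a named table of function pointers; a sketch (names illustrative):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct filesystem_ops {
        const char *name;                                /* e.g. "ext2" */
        int (*mount)(struct device *dev, struct inode *root);
        int (*read)(struct device *dev, struct inode *node,
                    uint64_t offset, void *buf, size_t len);
        int (*write)(struct device *dev, struct inode *node,
                     uint64_t offset, const void *buf, size_t len);
        /* lookup, create, unlink, ... */
    } filesystem_ops_t;

    /* Called from the filesystem driver's module entry point. */
    int vfs_register_filesystem(const filesystem_ops_t *ops);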

Mountpoints
Mountpoints are locations in the VFS where volumes can be added to the filesystem. Each mountpoint ties together an inode that serves as the mount location, the device that is mounted, and the filesystem that handles requests. To mount a device at an inode with a specific filesystem, the VFS looks up the filesystem, and if it is found, creates a mountpoint for the device. This mountpoint is added to the mountpoint list, and the inode is then updated to point to the root of the mounted filesystem.

Caching
In order to speed up filesystem access, the VFS caches both file data and directory entries. File caching uses section objects provided by the memory manager to map 256 KB views of files into memory. Each inode contains a pointer to the view that holds its cached data. If cached I/O is allowed on the file, all read and write requests first attempt to go through the file cache. If the requested data is not in the cache, or the cache does not exist, the VFS maps a view of the file and reads or writes the data through it. The memory manager is responsible for actually reading in or writing out the data by sending a non-cached I/O request back to the VFS.

Directory caching keeps the directory entries most likely to be used again in memory. It is implemented by the inode function used to look up a directory entry. When the Object Manager calls this function while traversing the VFS namespace, it first attempts to get the directory entry from the inode's list. If this fails, and the inode does not contain the full directory cache, it calls the filesystem driver to read in the specific directory entry it's looking for. If the directory entry is successfully read in, it is returned; otherwise, the lookup has failed.
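
As control flow, the lookup reads something like this, assuming the inode records whether its directory cache is complete (the cache and mount helpers are placeholders):

    /* Inode method invoked by the Object Manager during path traversal. */
    dentry_t *inode_lookup(inode_t *dir, const char *name)
    {
        dentry_t *d = dentry_cache_find(dir, name);   /* fast path */
        if (d)
            return d;
        if (dir->flags & INODE_DIR_FULLY_CACHED)
            return NULL;       /* cache is complete: the name doesn't exist */

        /* Slow path: ask the filesystem driver for this one entry. */
        d = dir->mount->fs->lookup(dir->mount->device, dir, name);
        if (d)
            dentry_cache_insert(dir, d);
        return d;              /* still NULL if the driver couldn't find it */
    }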

Multitasking
The executive provides multitasking with processes and threads. Processes are self-contained tasks with their own threads, address space, and object handles. Threads are parts of processes that have their own execution path, registers, and stacks, but run in the same address space as every other thread in their parent process. Each process contains at least one executing thread.

Inter-process Communication
The executive exposes several IPC primitives to userspace, such as events, pipes, sockets, shared memory, synchronization primitives, and waitable timers. These IPC primitives are implemented as objects managed by the object manager. They can be shared between processes by naming them in the object namespace and opening them by name.

Events are asynchronous events that can be sent to threads. They're meant to be as generic as possible. Events can hold an arbitrary amount of data, which allows them to be used for asynchronous message passing. They are used for many purposes, such as notifying I/O completion, GUI events, and POSIX signals.

Pipes and sockets are both objects that are accessed through file handles. Pipes provide a bidirectional data flow between two file handles. Sockets are communication endpoints that allow for data exchange between processes on the same or different systems; they are session-layer interfaces that provide an abstraction on top of multiple transport-layer protocols.

Shared memory is memory that is shared between multiple processes. It is implemented using the Memory Manager's section objects.

Synchronization controls access to shared resources from multiple threads and is designed to enforce a mutual exclusion policy. There are three synchronization primitives that the kernel exposes to userspace: the semaphore, the mutex, and the readers/writer lock.

Timers are objects that fire an event after a certain amount of time. They can be used synchronously, meaning that an application blocks on the timer, or asynchronously, where an application is interrupted by an event when the timer finishes. There are two types of timers: manual-reset timers and periodic timers. Manual-reset timers will not fire again until they are reprogrammed. Periodic timers automatically reset each time they fire, allowing for timers to fire at a common interval.

Subsystems
The executive exposes the user-mode API using syscalls. To keep the executive flexible, it allows different subsystems to be loaded as kernel modules. Subsystems are pluggable syscall interfaces that applications run under. With subsystems, applications from any operating system can be run, as long as there's an appropriate subsystem.

Subsystems are written to support a certain API, such as my OS API, POSIX, or the Windows API. Each subsystem implements a set of syscalls exposing that API to userspace. My OS is planned to support several subsystems: the native subsystem, which implements the OS API; the POSIX subsystem, which implements the POSIX API; and the Windows subsystem, which implements the Windows API and emulates Windows features such as its volume management and the registry. I'm well aware that supporting all these subsystems will take a lot of effort, so my only immediate goal is the native subsystem. However, I'm keeping the others in mind.
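
Stripped to its core, a subsystem boils down to a named syscall table the executive dispatches through; a sketch (process_subsystem() and the error value are placeholders):

    #include <stddef.h>

    typedef long (*syscall_fn)(long, long, long, long, long, long);

    typedef struct subsystem {
        const char       *name;      /* "native", "posix", "windows", ... */
        const syscall_fn *table;     /* indexed by syscall number */
        size_t            num_calls;
    } subsystem_t;

    const subsystem_t *process_subsystem(struct process *proc); /* placeholder */

    /* Each process records the subsystem it runs under. */
    long syscall_dispatch(struct process *proc, long num, long a[6])
    {
        const subsystem_t *ss = process_subsystem(proc);
        if (num < 0 || (size_t)num >= ss->num_calls)
            return -1;               /* no such syscall in this subsystem */
        return ss->table[num](a[0], a[1], a[2], a[3], a[4], a[5]);
    }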

Modules
The executive is designed to allow kernel modules to be dynamically loaded. Kernel modules are executable files that the executive loads in order to add functionality to it at runtime. Modules are dynamically linked with the executive. The four main types of modules are device drivers, filesystem drivers, executable formats, and subsystems. Device drivers are the most common type of module; they control devices. Filesystem drivers are a special type of driver, responsible for treating a volume as a filesystem. Executable format and subsystem modules add their respective features to the executive.

To allow for the executive subsystems to find modules they want to load, the executive makes use of the module registry. The module registry is a database of modules that can be loaded. It allows for modules to be identified by the executive. The module registry is a text file that gets parsed by the bootloader and converted into a tree. This tree can be searched in order to find a module.

The way that modules are identified depends on what type of module they are. Device drivers are identified by their device class, bus type, and device ID. The device ID is bus-specific; for example, for a PCI ATA hard drive with a PCI vendor ID of 0x8086 and a PCI device ID of 0x7111, the device ID would be 0x80867111. Executable formats, filesystem drivers, and subsystems are identified by strings: an ELF executable format module would have the string "elf", an EXT2 filesystem driver the string "ext2", and the POSIX subsystem the string "posix".
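
In code, the PCI case is just the two IDs packed together:

    #include <stdint.h>

    /* 0x8086 and 0x7111 combine to 0x80867111, as in the example above. */
    static inline uint32_t pci_device_id(uint16_t vendor, uint16_t device)
    {
        return ((uint32_t)vendor << 16) | device;
    }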

Device Manager
The device manager is responsible for detecting and managing devices, performing power management, and exposing devices to userspace. It contains code for driver loading, I/O requests, and power management.

Drivers
Devices are managed by device drivers. As explained above, device drivers are the most common type of kernel module. Device drivers are layered on top of each other in driver stacks. There are three types of device drivers: low-level, intermediate-level, and high-level drivers.

Low-level drivers are the lowest drivers in the tree. The main low-level drivers are bus drivers, which allow the device manager to detect devices on the system. Bus drivers interface directly with the hardware and provide an interface for other drivers to access it. Examples of bus drivers are PCI, PCI Express, and USB drivers. One special type of low-level driver is the motherboard driver, which the kernel uses for bus detection and power management. When booting the kernel, the bootloader loads a motherboard driver that matches the system configuration; ACPI drivers are one example.

Intermediate-level drivers come in two types: function drivers and filter drivers. Function drivers control devices found on a bus; examples are video card drivers, storage device drivers, network card drivers, and input drivers. Filter drivers modify the behavior of other drivers, sitting below or above them in driver stacks.

High-level drivers sit on top of intermediate-level drivers and control the software protocols that exist above them. Examples of high-level drivers are filesystem drivers and network protocol drivers.

Device Detection
The main role of the device manager is detecting devices on the system. Devices are organized in a tree structure, with devices enumerating their children. Device detection begins with the motherboard driver. The motherboard driver sits at the root of the device tree. It detects the buses present on the system as well as devices directly connected to the motherboard. Each bus is then recursively enumerated, with its children continuing to enumerate their children until the bottom of the device tree is reached.

Each device that is detected contains a list of resources for it to use. Examples of resources are I/O ports, memory, IRQs, DMA channels, and configuration space. Devices are assigned resources by their parent devices, and they simply use whatever resources they're given. This lets the same device driver work on different machines where the resource assignments differ but the programming interface is otherwise the same.

Drivers are loaded for each device that's found. When a device is detected, the device manager finds the device's driver in the module registry. If not loaded already, the device manager loads the driver. It then calls the driver's add-device routine with a pointer to the device object. The add-device routine starts the device and creates a thread for that device to handle requests.
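
A sketch of the driver-side interface this implies (struct layouts and helper names are illustrative):

    typedef struct driver_ops {
        /* Start the device and spawn its request-handling thread. */
        int (*add_device)(struct device *dev);
        /* Handle one queued I/O request packet (see below). */
        int (*dispatch)(struct device *dev, struct irp *irp);
        /* remove_device, set_power_state, ... */
    } driver_ops_t;

    struct driver {
        const driver_ops_t *ops;
        int                 loaded; /* module already in memory? */
    };

    int module_load(struct driver *drv); /* placeholder */

    /* Device manager side, once the module registry names the driver. */
    int devmgr_attach(struct device *dev, struct driver *drv)
    {
        if (!drv->loaded && module_load(drv) != 0)
            return -1;              /* couldn't load the driver module */
        return drv->ops->add_device(dev);
    }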

I/O Request Packets
I/O Request Packets, or IRPs, are data structures used to perform I/O requests. They contain data about the request, such as the I/O function code, buffer pointer, device offset, and buffer length. IRPs can be created by the kernel or drivers, and passed to other drivers. Every device driver has dispatch functions used to handle each I/O function code. Most I/O function codes are driver-specific, but some are generic and shared by all drivers.
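
A sketch of the fields such a packet carries (illustrative, not a final layout):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct irp {
        uint32_t    function;  /* I/O function code, e.g. read or write */
        void       *buffer;    /* caller's data buffer */
        uint64_t    offset;    /* byte offset on the device */
        size_t      length;    /* buffer length in bytes */
        int         status;    /* completion status, filled in later */
        void      (*callback)(struct irp *irp); /* async completion */
        void       *context;   /* for the callback's use */
        struct irp *next;      /* link in a driver's IRP queue */
    } irp_t;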

Each driver has a queue of IRPs for it to handle. Whenever an IRP is sent to a driver, the device manager queues the request, and if the main driver thread is asleep, wakes it up. The main driver thread dequeues IRPs and handles them until the queue is empty. A driver will handle an IRP by either passing the IRP to a lower-level driver in the driver stack or performing the I/O request.

Asynchronous I/O
There are two main types of I/O: synchronous and asynchronous. Synchronous I/O sends an I/O request and then puts the current thread to sleep until the I/O completes. Asynchronous I/O just sends the I/O request and returns; I/O completion is reported asynchronously using a callback. Asynchronous I/O improves the efficiency of the system by allowing program execution to continue while I/O is performed. It also allows multiple I/O requests to be started and then handled in the order they complete, not the order they were issued. However, this comes at the cost of more complex programming than synchronous I/O.

Internally, my OS uses asynchronous I/O for all of its I/O requests. IRPs are sent to drivers, and the function that sent them immediately returns. Eventually, the main driver thread executes and handles the I/O request. Once the I/O request completes, it returns up through the driver stack and finally invokes the specified callback, by queueing an event to the requesting thread. Once that thread runs, the callback executes.

Synchronous I/O is simply implemented as a special case of asynchronous I/O. Just like with asynchronous I/O, an IRP is sent to the driver, but instead of returning, the thread goes to sleep. Once the I/O completion event is queued, the thread wakes up and executes the callback before returning.
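
So a synchronous read can be a thin wrapper over the asynchronous path, reusing the irp_t sketched above (the event primitive, IRP_READ, and io_send_irp() are placeholders):

    #include <stddef.h>
    #include <stdint.h>

    #define IRP_READ 1                            /* placeholder code */

    struct event { volatile int signaled; };      /* placeholder primitive */
    void event_init(struct event *e);
    void event_signal(struct event *e);
    void event_wait(struct event *e);
    void io_send_irp(struct device *dev, irp_t *irp); /* async send */

    struct sync_ctx { struct event done; };

    static void sync_complete(struct irp *irp)
    {
        struct sync_ctx *ctx = irp->context;
        event_signal(&ctx->done);  /* completion event wakes the thread */
    }

    int io_read_sync(struct device *dev, uint64_t off, void *buf, size_t len)
    {
        struct sync_ctx ctx;
        event_init(&ctx.done);

        irp_t irp = {
            .function = IRP_READ, .buffer = buf, .offset = off,
            .length = len, .callback = sync_complete, .context = &ctx,
        };
        io_send_irp(dev, &irp);    /* queue to the driver and return */
        event_wait(&ctx.done);     /* sleep until the IRP completes */
        return irp.status;
    }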

Power Management
The device manager also performs power management. Power management is a hardware feature that allows the power consumption of the system and its devices to be controlled. Each device managed by the device manager provides functions to set its power state. Setting the power state of a device also affects its child devices; for example, putting the PCI bus to sleep puts all of its devices to sleep as well. For power management support, every system requires a power management driver that controls the system power; on x86, this is done through ACPI. Each device also needs to support power management.

The device manager responds to power management events. Power management events can come from two sources: the user or the system. User-generated power management events are created by user mode applications. They are system-wide events for shutting down, rebooting, hibernating, or putting the system to sleep. When the device manager receives a system-wide power management event, it sets the power state of every device on the system.

System-generated power management events are events that come from the system hardware. Examples of system-generated power management events are plugging/unplugging an AC adapter or closing/opening the lid of a laptop. The device manager takes the appropriate action in response to the event.

Userspace Exposure
Devices are exposed to userspace through the device tree on /dev. /dev is actually a link to the \Device directory in the object namespace. The \Device directory contains device objects that represent each device in the system. Devices can be accessed directly in one of two ways: through normal file syscalls or through the object interface that a device provides. Because both of these methods can be complex to program against, several device APIs implemented in userspace provide an abstraction over them.

Kernel
The kernel is responsible for architecture-specific code. It sits underneath the executive and performs I/O, trap dispatching, low-level virtual memory management, and thread scheduling. The kernel also implements synchronization primitives for use by the executive, which are spinlocks, semaphores, mutexes, and readers/writer locks. These services are exposed to the executive. In this way, the kernel pretty much serves as a microkernel, providing basic functionality that allows the executive to implement its services.

Hardware Abstraction Layer
The Hardware Abstraction Layer (HAL) is the lowest-level component in the OS. It implements machine-specific code: code that differs between machines with the same processor architecture, such as IRQ controllers, system timers, real-time clocks, and multiprocessor information. By abstracting different hardware configurations, the HAL provides the kernel with a consistent platform to run on.
Programmer and security enthusiast
DarkSide OS Kernel

Those who do not understand Windows NT are doomed to criticize it, poorly.

Re: My OS Design

Post by mrstobbe »

So, reading over this and glancing over your wiki, I think I have a lot of questions. In order...
What type of kernel design is this? You've indicated a lot of details about what the design will accomplish, but not any specifics about how. I'm just asking because a lot of my questions could probably be answered by indicating explicitly how things interact with each other.
"Executive" is your main kernel (or main "part" of the kernel depending on design) right?
"Object Manager" is your kernel heap manager (basically) right? Allocating and deallocating resource structures at will for the system? Keeping track of handles, what they are, what state they're in, all related information, etc., yes?
Marionumber1 wrote:Object Namespace
Objects managed by the object manager can be given names to identify them. The object manager maintains an internal object namespace that organizes named objects in a hierarchy. This allows objects to be categorized and opened in a uniform manner. In order to implement the object namespace, the object manager defines a directory object type. Directory objects contain a list of directory entries, which are structures that map object names to object pointers. This allows directory objects to contain objects in the namespace, including other directory objects. Each object maintains a link count that keeps track of how many directory entries point to it.
You mean to say that the kernel takes in a uint8_t* reference and maps that to a resource structure/resource namespace? It sounds like you're saying that you're taking the Plan 9 route and all resources are mapped into a virtual filesystem. This is also strongly seconded by all of your focus on the VFS later in your design. Am I reading this correctly?

Most of the rest is really straightforward everyday OS stuff, so I don't have any questions or comments about it. One last thing though...

Both in this post, as well as on your wiki, you're making it absolutely clear that you want to pursue parallel support of multiple OS APIs/standards (just POSIX and Windows in the post, but also OS X is implied in your wiki).
Marionumber1 wrote:Subsystems are written to support a certain API, such as my OS API, POSIX, or the Windows API
While possible, I would warn you that supporting both POSIX and Windows is nearly a contradiction (the same goes for most other major APIs). The basics where they overlap, and there is a lot of overlap, are no problem, but the second you get API calls that are more standard-specific (like fork()), you'll find yourself doing the most ridiculously hacky things imaginable to make them happen. I recommend you reconsider this, or simply refine the goal.

Re: My OS Design

Post by Brendan »

Hi,

This is a large/detailed description, so I'm going to focus on a few specific aspects of it rather than going through piece by piece in order.

The basic intent seems to be a micro-kernel that allows different "executives" that provide different personalities (e.g. an executive that supports a "POSIX personality", another that might support a "BeOS personality", etc). This is mostly fine (and reminds me of some of the ideas behind the design of L4 to be honest). However; if this is the case then I'd strongly recommend (while designing and building the lower level parts of the OS) ignoring the internal details of any specific executive/s and concentrating on providing features that many different executives may build on; in the same way that (e.g.) someone writing a boring *nix clone would ignore the internal details of specific applications and concentrate on providing features that many different applications would build on.

For the "Object Manager"; you'll find that for communication and synchronisation the kernel needs to provide low level primitives that everything else uses. For example, your kernel might provide "asynchronous messaging" that all "Objects" use (where the executives also build support for things like events, pipes and signals on top of the kernel's messaging).

For scheduling; this is deeply intertwined with communication and synchronisation. For example, task switches caused by sending to a higher priority task and task switches caused by blocking (waiting to receive communication or waiting for a semaphore/mutex) are likely to influence scheduling far more than anything else does (including time and CPU load). For this reason whatever is responsible for the lowest level of communication and synchronisation (e.g. the kernel) should also be responsible for scheduling.

For the remaining "Objects", I'm not too sure how they work as objects (in the OOP sense). Typically these things are done more like "services" that communicate (using communication provided by kernel) using agreed upon protocols. For an example; you might have a VFS service that implements a "file IO protocol" (and uses the kernel's communication for transport); sort of similar to how an FTP server would implement the "FTP protocol" and use TCP/IP networking as transport.

For 80x86 (although I doubt ARM is any better), there are a very large number of "machine specific" things to worry about (with/without HPET, with/without NUMA, with/without transactional extensions, with/without monotonic time stamp counters, with/without AVX, with/without hyper-threading, ...). If you have a different HAL for each different permutation, then you're going to need many thousands of HALs. Obviously this isn't practical. To make it practical you'd have to select a small number of "most important" things (e.g. so that there's 8 or less different HALs for the same architecture); but in that case all HALs have to support the majority of all machine specific things. For example, if you have some HALs for "with NUMA" and other HALs for "without NUMA"; then all HALs might have to support "with hyper-threading" and "without hyper-threading".

Basically; for HALs to be practical you end up with most HALs supporting most things. Why not go a small step further and just have one HAL that supports all machines (for one architecture)? If you did that you don't need a HAL at all (it can be merged into the architecture specific micro-kernel). This has several advantages (less code to maintain, more efficient because the kernel has no limitations caused by the "HAL API", etc).

The other thing I'd do is use auto-detection as much as possible. For example; for 80x86 all of the machine specific things that the HAL/kernel cares about can be quickly and easily auto-detected. I'd have boot code that auto-detects and passes the results to the kernel; where the boot code (and auto-detection code) is disposed of after boot. This makes it easy to have a "generic CD" (or "personal OS on USB flash" or whatever) where the exact same HAL/kernel code will take advantage of whatever machine specific features are available (e.g. without needing to reinstall the OS or switch HALs when you boot the same CD on different machines).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: My OS Design

Post by Marionumber1 »

mrstobbe wrote:What type of kernel design is this? You've indicated a lot of details about what the design will accomplish, but not any specifics about how. I'm just asking because a lot of my questions could probably be answered by indicating explicitly how things interact with each other.
The kernel design for this is a hybrid kernel. It basically uses the structure of a microkernel, but with almost all system components running in kernel mode.
mrstobbe wrote:"Executive" is your main kernel (or main "part" of the kernel depending on design) right?
Yes, that's correct. The idea is that the executive does resource management, whether that resource is memory, files, processes and threads, IPC primitives, or devices.
mrstobbe wrote:"Object Manager" is your kernel heap manager (basically) right? Allocating and deallocating resource structures at will for the system? Keeping track of handles, what they are, what state they're in, all related information, etc., yes?
It's kind of like a kernel heap manager, but it's a lot more like a resource manager. The object manager represents all system resources as objects, and is responsible for keeping track of them throughout their lifetime.
mrstobbe wrote:You mean to say that the kernel takes in a uint8_t* reference and maps that to a resource structure/resource namespace? It sounds like you're saying that you're taking the Plan 9 route and all resources are mapped into a virtual filesystem. This is also strongly seconded by all of your focus on the VFS later in your design. Am I reading this correctly?
I do use the idea of one unified namespace, but not through the filesystem namespace. The object manager maintains an object namespace, where resources are represented as objects (with their own methods and data), not files.
mrstobbe wrote:Both in this post, as well as on your wiki, you're making it absolutely clear that you want to pursue parallel support of multiple OS APIs/standards (just POSIX and Windows in the post, but also OS X is implied in your wiki).

While possible, I would warn you that supporting both POSIX and Windows is nearly a contradiction (the same goes for most other major APIs). The basics where they overlap, and there is a lot of overlap, are no problem, but the second you get API calls that are more standard-specific (like fork()), you'll find yourself doing the most ridiculously hacky things imaginable to make them happen. I recommend you reconsider this, or simply refine the goal.
I'm not quite sure what the problem is with supporting multiple APIs. I don't believe that doing so would lead to hackish solutions. For example, my memory manager will fully support cloning an address space with copy-on-write.

Re: My OS Design

Post by Marionumber1 »

Brendan wrote: The basic intent seems to be a micro-kernel that allows different "executives" that provide different personalities (e.g. an executive that supports a "POSIX personality", another that might support a "BeOS personality", etc). This is mostly fine (and reminds me of some of the ideas behind the design of L4 to be honest). However; if this is the case then I'd strongly recommend (while designing and building the lower level parts of the OS) ignoring the internal details of any specific executive/s and concentrating on providing features that many different executives may build on; in the same way that (e.g.) someone writing a boring *nix clone would ignore the internal details of specific applications and concentrate on providing features that many different applications would build on.
Sorry if I wasn't quite clear enough, but this is actually a hybrid kernel. It has the structure of a microkernel, but almost all system components are running in kernel mode. And there aren't multiple executives, but one executive that the different personalities (called subsystems) are built on. This way, I can actually have multiple personalities running side-by-side on top of one executive.

However, what you said gives me an interesting idea. All resource access goes through the object manager, and since the object manager is designed to be able to redirect a method call to another process using RPC, I could have all executive components except the object manager run in either kernel mode or user mode. Services running in user mode could be given permission to define their own object classes and manage their own object directories.

Most of the rest seems to assume I use a microkernel-based design, so I'll skip it; I hope you don't mind. My scheduler actually is in the kernel; I'm not sure why I put it up there in the executive.
For 80x86 (although I doubt ARM is any better), there are a very large number of "machine specific" things to worry about (with/without HPET, with/without NUMA, with/without transactional extensions, with/without monotonic time stamp counters, with/without AVX, with/without hyper-threading, ...). If you have a different HAL for each different permutation, then you're going to need many thousands of HALs. Obviously this isn't practical. To make it practical you'd have to select a small number of "most important" things (e.g. so that there's 8 or less different HALs for the same architecture); but in that case all HALs have to support the majority of all machine specific things. For example, if you have some HALs for "with NUMA" and other HALs for "without NUMA"; then all HALs might have to support "with hyper-threading" and "without hyper-threading".

Basically; for HALs to be practical you end up with most HALs supporting most things. Why not go a small step further and just have one HAL that supports all machines (for one architecture)? If you did that you don't need a HAL at all (it can be merged into the architecture specific micro-kernel). This has several advantages (less code to maintain, more efficient because the kernel has no limitations caused by the "HAL API", etc).

The other thing I'd do is use auto-detection as much as possible. For example; for 80x86 all of the machine specific things that the HAL/kernel cares about can be quickly and easily auto-detected. I'd have boot code that auto-detects and passes the results to the kernel; where the boot code (and auto-detection code) is disposed of after boot. This makes it easy to have a "generic CD" (or "personal OS on USB flash" or whatever) where the exact same HAL/kernel code will take advantage of whatever machine specific features are available (e.g. without needing to reinstall the OS or switch HALs when you boot the same CD on different machines).
In my OS, the HAL is responsible for abstracting system hardware (8259 vs APIC, PIT vs APIC timer vs HPET), as well as detecting the CPU topology (SMP, NUMA, hyperthreading), which can all be done using ACPI. The kernel pretty much uses the topology information it gets from the HAL to initialize the scheduler, and from that point on, the scheduler assumes that you're running on a NUMA, SMP system with hyperthreading and makes the necessary optimizations. The other things you listed seem to be CPU-specific optimizations that mainly concern user-mode code. Now, with that said, your idea about having the kernel autodetect the hardware is an idea that I'll consider.

Re: My OS Design

Post by cyr1x »

This is pretty much how the Windows NT kernel works, and probably where you got your ideas from. My design is also somewhat biased like that. So yes, it should work quite nicely.

Re: My OS Design

Post by Marionumber1 »

cyr1x wrote:This is pretty much how the Windows NT kernel works, and probably where you got your ideas from. My design is also somewhat biased like that. So yes, it should work quite nicely.
Yep, I took almost all of my inspiration from Windows NT. However, there are still some differences, like taking the idea of objects as resources a bit further and having a VFS (something Windows doesn't actually have).

Re: My OS Design

Post by mrstobbe »

Marionumber1 wrote:
mrstobbe wrote: Both in this post, as well as on your wiki, you're making it absolutely clear that you want to pursue parallel support of multiple OS APIs/standards (just POSIX and Windows in the post, but also OS X is implied in your wiki).

While possible, I would warn you that supporting both POSIX and Windows is nearly a contradiction (the same goes for most other major APIs). The basics where they overlap, and there is a lot of overlap, are no problem, but the second you get API calls that are more standard-specific (like fork()), you'll find yourself doing the most ridiculously hacky things imaginable to make them happen. I recommend you reconsider this, or simply refine the goal.
I'm not quite sure what the problem is with supporting multiple APIs. I don't believe that doing so would lead to hackish solutions. For example, my memory manager will fully support cloning an address space with copy-on-write.
I'm pointing out that many projects have tried to create API-level compatibility for one OS within the domain of another. Just look at Cygwin and Wine. It takes many years, is never complete, and even some of the most basic things can't be translated. The reason is that some of the basic paradigms are completely different. Memory managers are really straightforward, so that's not a good example. What about the security model? Are you planning to pass around security tokens like in Windows? If so, how do you plan to support setuid()? What about the console? The AllocConsole() stuff and how it works is far different from any other major OS out there. This is the tip of the iceberg.

Even if you do manage to support the majority of another OS's API, it's a full-time project just trying to keep up. They have teams of developers adding new features with every release.


Re: My OS Design

Post by Marionumber1 »

mrstobbe wrote: I'm pointing out that many projects have tried to create API-level compatibility for one OS within the domain of another. Just look at Cygwin and Wine. It takes many years, is never complete, and even some of the most basic things can't be translated. The reason is that some of the basic paradigms are completely different. Memory managers are really straightforward, so that's not a good example. What about the security model? Are you planning to pass around security tokens like in Windows? If so, how do you plan to support setuid()? What about the console? The AllocConsole() stuff and how it works is far different from any other major OS out there. This is the tip of the iceberg.

Even if you do manage to support the majority of another OS's API, it's a full-time project just trying to keep up. They have teams of developers adding new features with every release.
I understand that there will be a lot of work involved, but I'm still sticking to my goals. As I said, my only immediate goal is the API that I develop specifically for my OS. Once I get to the point where the rest of my OS works reliably, I plan to focus on the other subsystems. I see your point about APIs having different paradigms, but solving that seems to be a matter of making the executive flexible.

Re: My OS Design

Post by neon »

Marionumber1 wrote:having a VFS (something Windows doesn't actually have).
Look up the IFS API for Windows. Technically the IFS acts as a VFS.
OS Development Series | Wiki | os | ncc
char c[2]={"\x90\xC3"};int main(){void(*f)()=(void(__cdecl*)(void))(void*)&c;f();}

Re: My OS Design

Post by Marionumber1 »

neon wrote:Look up the IFS API for Windows. Technically the IFS acts as a VFS.
What I mean is that Windows doesn't have a dedicated VFS component. It leaves the responsibility to filesystem drivers.