Monolithic vs. Microkernel hardware boot drivers

physecfed · Post by **physecfed** » Fri Jul 08, 2016 8:05 pm

I'm planning on beginning a new wave (or bout, perhaps) of operating system development, and I'm planning on challenging myself by writing it in Ada (studying as an aerospace engineer; gonna have to learn it sometime). I'm looking into microkernel development as a potential avenue for this so-far hypothetical operating system.

Where I'm stuck is in relation to how microkernels deal with the issue of hardware "boot" or initialization drivers that must be in place for the computer to function correctly. How do microkernels deal with this issue? To clarify, I'm aware of the (major) differences:

Monolithic kernels, as I've read so far, integrate mostly-everything into the kernel itself, such as filesystems, drivers, paging/memory management, IPC, etc. This is what makes them so large. From what I've read, most monolithics disregard all but two protection/privilege levels; there is kernel mode and user mode and not much in between.
Microkernels, as far as I'm aware, attempt to run drivers, filesystems, and other "non-critical" stuff at a lower privilege level. This has security benefits but my thinking is that it also leads to some IPC conundrums and interesting design choices.

First of all, the OSDev wiki notes that microkernels attempt to run drivers and what-not in user mode; do microkernels by default have to treat drivers/filesystems at the same level (say, Ring 3), or can microkernels allocate these at ring 1/2 (i.e. still less privileged than 0)?

Second, and to the main question - how does the microkernel concept deal with drivers and hardware systems that are absolutely required for functionality (i.e. some (primitive) form of display, interrupt support, basic storage access and memory mapping)? I can understand that networking devices, TCP/IP stacks, USB frameworks and drivers are things that would go into userland, but what does the microkernel do about the drivers that must be functional at boot-time (before the OS has time to load modules) in order for booting to occur?

Roman · Post by **Roman** » Fri Jul 08, 2016 9:45 pm

Microkernels load the necessary drivers from RAM disks supplied by their bootloaders.

physecfed · Post by **physecfed** » Fri Jul 08, 2016 10:44 pm

Roman wrote:Microkernels load the necessary drivers from RAM disks supplied by their bootloaders.

Okay, so that's handled by the bootstrapping processes.

Now, in a microkernel, can drivers occupy the "intermediate" privilege ring levels - that is, can the OS make use of all available hardware security levels?

Brendan · Post by **Brendan** » Fri Jul 08, 2016 11:37 pm

Hi,

physecfed wrote:
Roman wrote:Microkernels load the necessary drivers from RAM disks supplied by their bootloaders.
Okay, so that's handled by the bootstrapping processes.

Yes.

physecfed wrote:Now, in a microkernel, can drivers occupy the "intermediate" privilege ring levels - that is, can the OS make use of all available hardware security levels?

In theory, yes; but it's not that simple.

The main point of a micro-kernel is to ensure things like device drivers, etc can't interfere with the kernel and can't interfere with each other either. Essentially, each piece (driver, file system, VFS, etc) runs in its own isolated space where it can't interfere with anything else, and the only thing that can interfere with it is the kernel.

The problem with using intermediate levels is that for paging on 80x86 there is only "user" and "supervisor". Anything running at CPL=1 or CPL=2 is considered "supervisor" and has the same access as CPL=0. This means that if you only use paging, drivers aren't really isolated from kernel properly and can interfere with kernel; so paging alone fails to satisfy the main point of having a micro-kernel in the first place.
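To make the user/supervisor point concrete, here is a minimal sketch of the check paging performs. It is a simplified model, assuming classic 32-bit paging with CR0.WP set and ignoring NX, SMEP and SMAP; the function name `paging_permits` is hypothetical. The flag bit positions are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural flag bits in an x86 page table entry. */
#define PTE_PRESENT 0x1u
#define PTE_WRITE   0x2u
#define PTE_USER    0x4u   /* U/S bit: set = accessible from CPL 3 */

/* Hypothetical helper: would paging permit this access?
 * Simplified model (assumes CR0.WP=1, ignores NX/SMEP/SMAP). */
bool paging_permits(unsigned cpl, uint32_t pte, bool is_write)
{
    assert(cpl <= 3);
    if (!(pte & PTE_PRESENT))
        return false;
    /* Paging knows only two classes: CPL 3 is "user",
     * CPL 0, 1 and 2 are all "supervisor". */
    bool is_user = (cpl == 3);
    if (is_user && !(pte & PTE_USER))
        return false;
    if (is_write && !(pte & PTE_WRITE))
        return false;
    return true;
}
```

In this model a driver at CPL=1 or CPL=2 gets exactly the same answer as the kernel at CPL=0 for every page, which is why paging alone cannot protect the kernel from it.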

A kernel could use segmentation to isolate CPL=1 code from kernel code (as long as no drivers are 64-bit). The problem is that segmentation alone won't isolate CPL=1 code from other CPL=1 code (device drivers, etc would be able to interfere with each other), so segmentation alone still fails to satisfy the main point of having a micro-kernel.

If you use both paging and segmentation at the same time (where paging is used to isolate things like drivers from each other, and segmentation is used to protect the kernel from drivers); then it would satisfy the main point of having a micro-kernel. However, if you're using paging to isolate drivers from each other then there's no reason to use CPL=1 or CPL=2 - if you used CPL=3 for drivers instead, then it'd be almost exactly the same and the only difference is that paging would protect kernel from driver too (and you wouldn't need to bother with segmentation).

So...

In theory, yes you can use CPL=1 and CPL=2 for things like drivers, file systems, etc; but it's a whole pile of extra work with no benefit (and won't work for 64-bit drivers because segmentation is mostly disabled for 64-bit) and much better and easier to just use CPL=3 and forget about CPL=1 and CPL=2.

Cheers,

Brendan

onlyonemac · Post by **onlyonemac** » Sat Jul 09, 2016 11:07 am

Roman wrote:Microkernels load the necessary drivers from RAM disks supplied by their bootloaders.

Alternatively, a microkernel can have some drivers built into it in a way that offers the same isolation as a driver loaded from disk but which doesn't require the driver to be loaded separately from the kernel. I believe that that usually still qualifies as a microkernel.

onlyonemac · Post by **onlyonemac** » Sat Jul 09, 2016 11:08 am

If you use CPL3 for drivers, how do they perform the necessary hardware IO? Or does the kernel provide routines to do this for them?

iansjack · Post by **iansjack** » Sat Jul 09, 2016 12:02 pm

IOPL.

physecfed · Post by **physecfed** » Sat Jul 09, 2016 7:20 pm

Brendan wrote:In theory, yes; but it's not that simple.

The main point of a micro-kernel is to ensure things like device drivers, etc can't interfere with the kernel and can't interfere with each other either. Essentially, each piece (driver, file system, VFS, etc) runs in its own isolated space where it can't interfere with anything else, and the only thing that can interfere with it is the kernel.

The problem with using intermediate levels is that for paging on 80x86 there is only "user" and "supervisor". Anything running at CPL=1 or CPL=2 is considered "supervisor" and has the same access as CPL=0. This means that if you only use paging, drivers aren't really isolated from kernel properly and can interfere with kernel; so paging alone fails to satisfy the main point of having a micro-kernel in the first place.

A kernel could use segmentation to isolate CPL=1 code from kernel code (as long as no drivers are 64-bit). The problem is that segmentation alone won't isolate CPL=1 code from other CPL=1 code (device drivers, etc would be able to interfere with each other), so segmentation alone still fails to satisfy the main point of having a micro-kernel.

If you use both paging and segmentation at the same time (where paging is used to isolate things like drivers from each other, and segmentation is used to protect the kernel from drivers); then it would satisfy the main point of having a micro-kernel. However, if you're using paging to isolate drivers from each other then there's no reason to use CPL=1 or CPL=2 - if you used CPL=3 for drivers instead, then it'd be almost exactly the same and the only difference is that paging would protect kernel from driver too (and you wouldn't need to bother with segmentation).

So...

In theory, yes you can use CPL=1 and CPL=2 for things like drivers, file systems, etc; but it's a whole pile of extra work with no benefit (and won't work for 64-bit drivers because segmentation is mostly disabled for 64-bit) and much better and easier to just use CPL=3 and forget about CPL=1 and CPL=2.

Cheers,

Brendan

Seems like it's one of those oddities about x86 (segmentation privilege vs. paging privilege). I've considered attempting to build on RISC architectures or moving my OS development concepts to the embedded domain, but in the absence of a prevailing standard like the PC I'm not too sure how much wheel-reinventing I want to have to perform.

onlyonemac wrote:If you use CPL3 for drivers, how do they perform the necessary hardware IO? Or does the kernel provide routines to do this for them?

That's one of the things I'm attempting to figure out right now - how to provide hardware access to drivers which in most examples (monolithics) perform the access themselves. Needless to say, I'm in the process of ordering a lot of literature on the subject.

Rusky · Post by **Rusky** » Sat Jul 09, 2016 8:57 pm

onlyonemac wrote:If you use CPL3 for drivers, how do they perform the necessary hardware IO? Or does the kernel provide routines to do this for them?

Specific x86 IO ports can be made accessible with IOPL and with the TSS's port permissions bitmap. A lot of hardware, however, just uses MMIO, which can be made accessible with paging and IOMMU. Some edge cases may need system calls to preserve isolation.

Brendan · Post by **Brendan** » Sat Jul 09, 2016 11:35 pm

Hi,

onlyonemac wrote:If you use CPL3 for drivers, how do they perform the necessary hardware IO? Or does the kernel provide routines to do this for them?

Bus mastering and DMA is where the big problem is. For ISA devices I provide a "setup DMA transfer" syscall in the kernel (which includes checking if the process should/shouldn't be able to use the DMA channel and the RAM area being transferred), so that isn't a problem. For PCI devices; if you have an IOMMU then it can be used to ensure a driver can't use the device to access something it shouldn't. Otherwise (for PCI devices when there's no IOMMU) there's no good solution - I just rely on "network effects" (if nobody that has an IOMMU has reported security problems with the driver, then assume the driver is safe on systems that don't have an IOMMU).
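A rough sketch of the kind of checking such a "setup DMA transfer" syscall might do, written as plain hosted C so it can be tried outside a kernel. The `dma_grant` structure and the function name are hypothetical. The 16 MiB limit is the ISA DMA limitation; the 64 KiB boundary rule for 8-bit channels is a property of the legacy controller that is added here as an assumption about what the kernel would want to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ISA_DMA_LIMIT 0x1000000u  /* ISA DMA only reaches the first 16 MiB */
#define ISA_DMA_PAGE  0x10000u    /* 8-bit channels cannot cross a 64 KiB boundary */

/* Hypothetical per-process record of what the kernel has granted. */
struct dma_grant {
    uint8_t  channel_mask;  /* bit n set = process may use ISA DMA channel n */
    uint32_t buf_phys;      /* physical start of the buffer it may transfer */
    uint32_t buf_len;       /* length of that buffer in bytes */
};

/* Validate a transfer request for an 8-bit channel (0-3). */
bool isa_dma_request_ok(const struct dma_grant *g, unsigned channel,
                        uint32_t phys, uint32_t len)
{
    assert(g != NULL);
    if (channel > 3 || !(g->channel_mask & (1u << channel)))
        return false;                    /* channel not granted */
    if (len == 0 || len > ISA_DMA_PAGE)
        return false;                    /* empty or too large for one transfer */
    if (phys < g->buf_phys || len > g->buf_len ||
        phys - g->buf_phys > g->buf_len - len)
        return false;                    /* outside the granted buffer */
    if (phys >= ISA_DMA_LIMIT || len > ISA_DMA_LIMIT - phys)
        return false;                    /* beyond the 16 MiB limit */
    if (phys / ISA_DMA_PAGE != (phys + len - 1) / ISA_DMA_PAGE)
        return false;                    /* crosses a 64 KiB boundary */
    return true;
}
```

Only after all of these pass would the kernel program the DMA controller on the driver's behalf.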

For IRQs there's no problem - typically the kernel has IRQ handlers which send "IRQ occurred" messages to device drivers, plus a syscall that drivers use to say "I've handled that IRQ" (so kernel can do EOI, etc). This has the benefit that device drivers don't need to know or care what the interrupt controller is (e.g. PIC, IO APIC) or if it's being shared by other devices, or which IRQ number it is; which also means that kernel can reconfigure IRQs without telling the driver. For e.g. if the device is using "interrupt 0x33" and that interrupt occurs the kernel can send a "Your device's IRQ #1 occurred" message to driver, and then if the kernel reconfigures IRQs/interrupts so that the device ends up using "interrupt 0x44" instead and that interrupt occurs kernel can still send a "Your device's IRQ #1 occurred" message to driver.
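The interrupt 0x33 / 0x44 example can be sketched as a small routing table that the kernel owns. This is only an illustration; the structure and function names (`irq_route`, `irq_bind`, `irq_rebind`, `irq_to_message`) are hypothetical, and the actual message send and EOI handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VECTORS 256

/* Hypothetical routing entry: which driver owns a CPU interrupt vector,
 * and which driver-local IRQ number it is told about. */
struct irq_route {
    int     driver_id;    /* -1 = no driver bound to this vector */
    uint8_t logical_irq;  /* stable number the driver sees, e.g. 1 */
};

/* What the kernel would put in the "IRQ occurred" message. */
struct irq_message {
    int     driver_id;
    uint8_t logical_irq;
};

struct irq_route routes[NUM_VECTORS];

void irq_routes_init(void)
{
    for (size_t v = 0; v < NUM_VECTORS; v++)
        routes[v].driver_id = -1;
}

/* Bind a device's interrupt to a vector. */
void irq_bind(uint8_t vector, int driver_id, uint8_t logical_irq)
{
    routes[vector].driver_id = driver_id;
    routes[vector].logical_irq = logical_irq;
}

/* Move a device to another vector; the driver is never told. */
void irq_rebind(uint8_t old_vector, uint8_t new_vector)
{
    if (old_vector == new_vector)
        return;
    routes[new_vector] = routes[old_vector];
    routes[old_vector].driver_id = -1;
}

/* Called from the interrupt stub: fill in the message for the owning
 * driver. Returns 0 if a driver is bound, -1 otherwise. */
int irq_to_message(uint8_t vector, struct irq_message *out)
{
    assert(out != NULL);
    if (routes[vector].driver_id < 0)
        return -1;
    out->driver_id = routes[vector].driver_id;
    out->logical_irq = routes[vector].logical_irq;
    return 0;
}
```

Binding a driver at vector 0x33 and later calling `irq_rebind(0x33, 0x44)` leaves the message the driver receives unchanged.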

For memory mapped IO areas, you can map the areas that the driver should be able to access (as determined by PCI configuration space BARs, etc) into its virtual address space. Fortunately, for PCI the minimum size of a memory mapped IO area is 4 KiB (page size) and they have to be "power of 2", so there's never 2 or more memory mapped IO areas in the same physical page.
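As a sketch of the bookkeeping behind that mapping: before mapping anything, the kernel needs the size of the area behind a BAR and the number of pages it spans. The sizing step below uses the usual PCI method (write all ones to the BAR, read it back, mask off the flag bits); the function names are hypothetical, only 32-bit memory BARs are handled, and the config space access itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Size of a 32-bit PCI memory BAR, given the value read back after
 * writing 0xFFFFFFFF to it. The low 4 bits of a memory BAR are flag
 * bits, not address bits. Returns 0 if the BAR is not implemented. */
uint32_t pci_mem_bar_size(uint32_t readback)
{
    uint32_t mask = readback & ~0xFu;
    return ~mask + 1u;
}

/* Number of pages to map into the driver's address space for an
 * MMIO area starting at phys_base. */
uint32_t mmio_pages(uint32_t phys_base, uint32_t size)
{
    assert(size != 0);
    uint32_t first = phys_base / PAGE_SIZE;
    uint32_t last  = (phys_base + (size - 1)) / PAGE_SIZE;
    return last - first + 1;
}
```

The kernel would then map exactly those pages (uncached) into the driver's address space and nothing else.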

For IO ports there's 3 options:

Use IO permission bitmap in TSS
Emulate instructions that use IO ports in the general protection fault handler
Provide syscalls for IO port accesses (so a driver asks kernel to access the IO port)

Setting IOPL to 3 doesn't quite work, as you'd be giving a driver access to all IO ports and it'd be able to interfere with things it shouldn't be able to (e.g. disable A20 gate and crash the OS, etc).

The IO permission bitmap is probably fastest; but means you need to juggle IO permission bitmaps during task switching. Both the emulation and syscall methods are slower, but allow you to give "insanely fine grained" permissions (e.g. let a driver access some bits of an IO port but not others, read from an IO port but not write, etc); and allow you to re-arrange IO ports without telling the driver (e.g. the driver always thinks it's using "IO port 0x0000" regardless of which IO port the device actually uses).
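A minimal sketch of what the syscall (or emulation) path could look like on the kernel side, assuming a per-driver table of grants. Everything here (`port_grant`, `find_grant`, `port_write_value`) is hypothetical; the actual `in`/`out` instructions are left out so the logic can be tried in ordinary user-space C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-driver grant for one virtual port. */
struct port_grant {
    uint16_t virt_port;   /* port number the driver uses, e.g. 0x0000 */
    uint16_t real_port;   /* port the device actually sits at */
    bool     can_read;
    bool     can_write;
    uint8_t  write_mask;  /* bits the driver is allowed to change */
};

/* Find the grant covering a virtual port, or NULL if the driver has
 * no access to it at all. */
const struct port_grant *find_grant(const struct port_grant *grants,
                                    size_t count, uint16_t virt_port)
{
    for (size_t i = 0; i < count; i++)
        if (grants[i].virt_port == virt_port)
            return &grants[i];
    return NULL;
}

/* Work out the byte the kernel would really write for an "out" request:
 * bits outside write_mask keep their current hardware value.
 * Returns false if the write is not permitted. */
bool port_write_value(const struct port_grant *g, uint8_t current,
                      uint8_t requested, uint8_t *result)
{
    assert(result != NULL);
    if (g == NULL || !g->can_write)
        return false;
    *result = (uint8_t)((current & ~g->write_mask) |
                        (requested & g->write_mask));
    return true;
}
```

The `virt_port` to `real_port` translation is what lets the driver keep using "IO port 0x0000" while the kernel moves the device around.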

Note that because kernel can control access, kernel could provide some powerful advanced features. For example, kernel could generate a log of everything (e.g. when the device generates an IRQ, when a device's IO port is accessed and what data was involved, etc) to make it easier for device driver developers to debug their device drivers (without resorting to normal debugging tools like single-stepping, which tend to ruin timing). For a more advanced example, kernel could support "virtual devices" - a device driver writes to an IO port and kernel forwards it to "virtual device" to handle (and device driver doesn't know it's not driving a real device).

Cheers,

Brendan

physecfed · Post by **physecfed** » Sun Jul 10, 2016 12:09 am

Rusky wrote:
onlyonemac wrote:If you use CPL3 for drivers, how do they perform the necessary hardware IO? Or does the kernel provide routines to do this for them?
Specific x86 IO ports can be made accessible with IOPL and with the TSS's port permissions bitmap. A lot of hardware, however, just uses MMIO, which can be made accessible with paging and IOMMU. Some edge cases may need system calls to preserve isolation.

From what I read, the IOPB is quite literally a bitfield, with one bit corresponding to allowed/restricted port accesses for the process specified by that GDT entry (took me awhile to get that far, the Intel SDM and docs are a little bit too verbose to make reading them not a chore).

How would that be parsed in code? I don't know of a microprocessor that has granularity below one byte, and the x86 family certainly doesn't. Would a viable route be to use a little math to load the byte/word/doubleword containing port N's bit into a register and then use bit-test instructions?
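For what it's worth, the CPU performs this lookup itself in hardware whenever an IN/OUT instruction runs with CPL greater than IOPL; the kernel only has to set and clear bits. The byte-index-plus-bit-test idea from the question is still the right way to manipulate the bitmap in software. A small sketch with hypothetical function names (note that in the real bitmap a clear bit means "allowed" and a set bit means "denied"):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IOPB_BYTES 8192   /* 65536 ports, one bit each */

/* Deny every port: a SET bit means "denied". */
void iopb_deny_all(uint8_t *bitmap)
{
    assert(bitmap != NULL);
    for (size_t i = 0; i < IOPB_BYTES; i++)
        bitmap[i] = 0xFF;
}

/* Allow one port by clearing its bit. */
void iopb_allow_port(uint8_t *bitmap, uint16_t port)
{
    assert(bitmap != NULL);
    bitmap[port / 8] &= (uint8_t)~(1u << (port % 8));
}

/* Software version of the test: load the byte that holds port N's bit,
 * then test that bit. A CLEAR bit means "allowed". */
bool iopb_port_allowed(const uint8_t *bitmap, uint16_t port)
{
    assert(bitmap != NULL);
    uint8_t byte = bitmap[port / 8];
    return (byte & (1u << (port % 8))) == 0;
}
```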

Brendan wrote:Bus mastering and DMA is where the big problem is. For ISA devices I provide a "setup DMA transfer" syscall in the kernel (which includes checking if the process should/shouldn't be able to use the DMA channel and the RAM area being transferred), so that isn't a problem. For PCI devices; if you have an IOMMU then it can be used to ensure a driver can't use the device to access something it shouldn't. Otherwise (for PCI devices when there's no IOMMU) there's no good solution - I just rely on "network effects" (if nobody that has an IOMMU has reported security problems with the driver, then assume the driver is safe on systems that don't have an IOMMU).

For IRQs there's no problem - typically the kernel has IRQ handlers which send "IRQ occurred" messages to device drivers, plus a syscall that drivers use to say "I've handled that IRQ" (so kernel can do EOI, etc). This has the benefit that device drivers don't need to know or care what the interrupt controller is (e.g. PIC, IO APIC) or if it's being shared by other devices, or which IRQ number it is; which also means that kernel can reconfigure IRQs without telling the driver. For e.g. if the device is using "interrupt 0x33" and that interrupt occurs the kernel can send a "Your device's IRQ #1 occurred" message to driver, and then if the kernel reconfigures IRQs/interrupts so that the device ends up using "interrupt 0x44" instead and that interrupt occurs kernel can still send a "Your device's IRQ #1 occurred" message to driver.

For memory mapped IO areas, you can map the areas that the driver should be able to access (as determined by PCI configuration space BARs, etc) into its virtual address space. Fortunately, for PCI the minimum size of a memory mapped IO area is 4 KiB (page size) and they have to be "power of 2", so there's never 2 or more memory mapped IO areas in the same physical page.

For IO ports there's 3 options:

Use IO permission bitmap in TSS
Emulate instructions that use IO ports in the general protection fault handler
Provide syscalls for IO port accesses (so a driver asks kernel to access the IO port)

Setting IOPL to 3 doesn't quite work, as you'd be giving a driver access to all IO ports and it'd be able to interfere with things it shouldn't be able to (e.g. disable A20 gate and crash the OS, etc).

The IO permission bitmap is probably fastest; but means you need to juggle IO permission bitmaps during task switching. Both the emulation and syscall methods are slower, but allow you to give "insanely fine grained" permissions (e.g. let a driver access some bits of an IO port but not others, read from an IO port but not write, etc); and allows you to re-arrange IO ports without telling the driver (e.g. the driver always thinks it's using "IO port 0x0000" regardless of which IO port the device actually uses when).

Note that because kernel can control access, kernel could provide some powerful advanced features. For example, kernel could generate a log of everything (e.g. when the device generates an IRQ, when a device's IO port is accessed and what data was involved, etc) to make it easier for device driver developers to debug their device drivers (without resorting to normal debugging tools like single-stepping, which tend to ruin timing). For a more advanced example, kernel could support "virtual devices" - a device driver writes to an IO port and kernel forwards it to "virtual device" to handle (and device driver doesn't know it's not driving a real device).

Cheers,

Brendan

I like the idea of the I/O syscalls better than the IOPB route, because that seems like it would enable me to change port access conditions in the kernel/driver infrastructure without having to navigate the myriad of switching and updating TSSs/GDTs while keeping everything in sync. The notion of having the kernel be able to abstract to virtualized devices certainly sounds like an easier solution with which to implement systems like /dev/random and /dev/null (I did say I wanted to implement a Unix!)

Now, when you speak of ISA devices, do you mean via 8237 DMA (hardware/emulated) controllers? If I'm not attempting to develop for ISA devices or bus standards requiring that sort of setup, could I simply implement IOMMU protection to allow the drivers to directly access certain, limited areas of memory?

Brendan · Post by **Brendan** » Sun Jul 10, 2016 1:13 am

Hi,

physecfed wrote:
Rusky wrote:
onlyonemac wrote:If you use CPL3 for drivers, how do they perform the necessary hardware IO? Or does the kernel provide routines to do this for them?
Specific x86 IO ports can be made accessible with IOPL and with the TSS's port permissions bitmap. A lot of hardware, however, just uses MMIO, which can be made accessible with paging and IOMMU. Some edge cases may need system calls to preserve isolation.
From what I read, the IOPB is quite literally a bitfield, with one bit corresponding to allowed/restricted port accesses for the process specified by that GDT entry (took me awhile to get that far, the Intel SDM and docs are a little bit too verbose to make reading them not a chore).

It is a pure bitfield (with one bit per IO port, and where anything larger than a byte access requires permission to access each byte - e.g. reading a dword from IO port 0x1234 requires permission to access the IO ports 0x1234, 0x1235, 0x1236, 0x1237). Also note that the IO permission bitmap doesn't need to be a full 65536 bits (8 KiB), and only needs to be large enough for the IO ports you want to allow rounded up to the next 32 (e.g. if you only want to allow access to IO port 0x0001, then you need an IO permission bitmap with 32 bits).
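A small model of those two rules (every byte of a wider access needs its own clear bit, and ports past the end of a shortened bitmap are denied). The function name is hypothetical, and the model leaves out a detail of the real TSS layout: the bitmap must be followed by a terminating 0xFF byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Would an access of `width` bytes (1, 2 or 4) starting at `port` be
 * allowed by a bitmap that is only `bitmap_bytes` long?
 * A SET bit means "denied"; ports beyond the bitmap are denied too. */
bool iopb_access_allowed(const uint8_t *bitmap, size_t bitmap_bytes,
                         uint16_t port, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);
    for (unsigned i = 0; i < width; i++) {
        uint32_t p = (uint32_t)port + i;   /* each byte is its own port */
        if (p / 8 >= bitmap_bytes)
            return false;                  /* past the end of the bitmap */
        if (bitmap[p / 8] & (1u << (p % 8)))
            return false;                  /* this port is denied */
    }
    return true;
}
```

With ports 0x1234 to 0x1236 allowed but 0x1237 denied, a word access at 0x1234 passes and a dword access at 0x1234 fails.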

The simplest way of doing this is to have one TSS per CPU and copy the IO permission bitmap into it during task switches (and change IO permission bitmap's size to zero to avoid copying if the task isn't allowed to access any IO ports, which is the most common case). There's "clever" ways to avoid that copying, but they're probably not worth the hassle.
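The copy-on-task-switch approach might look roughly like this. It is a sketch with hypothetical structures: `cpu_iopb` stands in for the bitmap area of one CPU's TSS, and `active_bytes` stands in for however the kernel shrinks the bitmap on real hardware (for example by adjusting the TSS limit or the I/O map base).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IOPB_MAX_BYTES 8192

/* Hypothetical stand-in for the bitmap part of one CPU's TSS. */
struct cpu_iopb {
    size_t  active_bytes;            /* 0 = every port denied */
    uint8_t bits[IOPB_MAX_BYTES];
};

/* Hypothetical per-task I/O state. */
struct task_io {
    size_t         iopb_bytes;  /* 0 for tasks with no port access */
    const uint8_t *iopb;        /* the task's own copy of its bitmap */
};

/* Called during a task switch on this CPU. */
void switch_iopb(struct cpu_iopb *cpu, const struct task_io *next)
{
    assert(cpu != NULL && next != NULL);
    assert(next->iopb_bytes <= IOPB_MAX_BYTES);
    /* Common case: the task has no port access, so skip the copy and
     * just mark the bitmap as empty. */
    if (next->iopb_bytes != 0)
        memcpy(cpu->bits, next->iopb, next->iopb_bytes);
    cpu->active_bytes = next->iopb_bytes;
}
```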

Also note that you don't necessarily need to pick one option - you can use multiple different options and switch between them (either dynamically based on performance feedback, or via user config, or ...).

physecfed wrote:Now, when you speak of ISA devices, do you mean via 8237 DMA (hardware/emulated) controllers? If I'm not attempting to develop for ISA devices or bus standards requiring that sort of setup, could I simply implement IOMMU protection to allow the drivers to directly access certain, limited areas of memory?

Yes, I mean the old 8237 DMA. I don't know what hardware you're targeting; but there's 3 cases - "ancient" (where you actually have ISA cards in ISA slots on the motherboard), "less ancient" (where you don't have ISA slots or true ISA devices; but things like serial and parallel ports and floppy controller are built into the chipset and use the old ISA DMA controller), and "modern" (where serial ports, parallel ports and floppy controller have all been replaced by USB so you don't need the old ISA DMA controller for anything).

If you do have IOMMU then you could probably use IOMMU to restrict the old ISA DMA (and enhance it - to break the "first 16 MiB of RAM" limitation).

Note that the main problem/s with IOMMU is that there's still a lot of computers that don't have the necessary hardware (Intel treated it as "only for people that want to pay extra for virtualisation support" for a long time), and Intel and AMD do it differently, and it can end up intertwined with code to support virtual machines.

Cheers,

Brendan

onlyonemac · Post by **onlyonemac** » Sun Jul 10, 2016 3:03 am

I gather that this IOMMU is not the same thing as an MMU, and is part of what Intel call "VT-x"?

physecfed · Post by **physecfed** » Sun Jul 10, 2016 3:11 am

Brendan wrote:Hi,

physecfed wrote:From what I read, the IOPB is quite literally a bitfield, with one bit corresponding to allowed/restricted port accesses for the process specified by that GDT entry (took me awhile to get that far, the Intel SDM and docs are a little bit too verbose to make reading them not a chore).
It is a pure bitfield (with one bit per IO port, and where anything larger than a byte access requires permission to access each byte - e.g. reading a dword from IO port 0x1234 requires permission to access the IO ports 0x1234, 0x1235, 0x1236, 0x1237). Also note that the IO permission bitmap doesn't need to be a full 65536 bits (8 KiB), and only needs to be large enough for the IO ports you want to allow rounded up to the next 32 (e.g. if you only want to allow access to IO port 0x0001, then you need an IO permission bitmap with 32 bits).

The simplest way of doing this is to have one TSS per CPU and copy the IO permission bitmap into it during task switches (and change IO permission bitmap's size to zero to avoid copying if the task isn't allowed to access any IO ports, which is the most common case). There's "clever" ways to avoid that copying, but they're probably not worth the hassle.

Also note that you don't necessarily need to pick one option - you can use multiple different options and switch between them (either dynamically based on performance feedback, or via user config, or ...).

physecfed wrote:Now, when you speak of ISA devices, do you mean via 8237 DMA (hardware/emulated) controllers? If I'm not attempting to develop for ISA devices or bus standards requiring that sort of setup, could I simply implement IOMMU protection to allow the drivers to directly access certain, limited areas of memory?
Yes, I mean the old 8237 DMA. I don't know what hardware you're targeting; but there's 3 cases - "ancient" (where you actually have ISA cards in ISA slots on the motherboard), "less ancient" (where you don't have ISA slots or true ISA devices; but things like serial and parallel ports and floppy controller are built into the chipset and use the old ISA DMA controller), and "modern" (where serial ports, parallel ports and floppy controller have all been replaced by USB so you don't need the old ISA DMA controller for anything).

If you do have IOMMU then you could probably use IOMMU to restrict the old ISA DMA (and enhance it - to break the "first 16 MiB of RAM" limitation).

Note that the main problem/s with IOMMU is that there's still a lot of computers that don't have the necessary hardware (Intel treated it as "only for people that want to pay extra for virtualisation support" for a long time), and Intel and AMD do it differently, and it can end up intertwined with code to support virtual machines.

Cheers,

Brendan

I'm tending to think (without having read more on the subject first, which I should probably do) that I'll likely err on the side of setting up memory management for MMIO and system calls for I/O port use. At least in the case of the I/O ports, it might make it a little more intuitive security- and containment-wise if the kernel gets to mediate I/O port use, and at any rate seeing as the I/O port manner of throughput is slower than memory-mapped things, I doubt the kernel will make its one user very angry, at least at the beginning.

As to the hardware I'm planning on targeting - who knows at this point! Being a realist, by the time my kernel hits the point where it can scratch the surface of modern hardware capability, my current 6-core CAD workstation will probably be mothballed. I like the idea of targeting old hardware for its simplicity, but every time I sit down to find some old cheap floppy drive-carrying junker machine on eBay to use as an OS dev rig, I then think of more and more niceties and "useful" features until I eventually just end up at a modern machine anyway.

At any rate, I want to involve myself pretty thoroughly in the theory at first so I have a good idea of the core components and notions behind a microkernel before I set out. That way, if I complete the quest, I end up with a microkernel rather than some hybrid kernel full of ugliness and expletive-laced comments. Needless to say, I'm looking to acquire a bunch of books by Tanenbaum that discuss MINIX and its architecture.

Do you prefer to plan from theory and "higher-level" architecture first, or do you tend to enjoy jumping into the code first and shaping the work-in-progress?

gerryg400 · Post by **gerryg400** » Sun Jul 10, 2016 5:38 am

Brendan wrote:For IO ports there's 3 options:
Use IO permission bitmap in TSS
Emulate instructions that use IO ports in the general protection fault handler
Provide syscalls for IO port accesses (so a driver asks kernel to access the IO port)

Setting IOPL to 3 doesn't quite work, as you'd be giving a driver access to all IO ports and it'd be able to interfere with things it shouldn't be able to (e.g. disable A20 gate and crash the OS, etc).

Brendan, I think your 4th option is actually quite reasonable and deserves to be considered. Even if it doesn't prevent driver processes from accessing pretty much anything they like, it is possible to limit the number of processes that get IO privilege to a small set of trusted processes. It's still better protection than monolithic kernels provide. Drivers are often easier to review and test than, say, filesystems, so there is still a win.

physecfed wrote:I'm tending to think (without having read more on the subject first, which I should probably do) that I'll likely err on the side of setting up memory management for MMIO and system calls for I/O port use.

Using system calls for IO adds a lot of complexity to the kernel. I presume you intend to limit access to various ports on a process by process basis. I think this means that you need to have a system call that provides that access under certain conditions. How does the kernel decide which processes are allowed to access which ports? How does it know what the conditions for access are? How does the kernel store this information? To be honest I can't see any benefit in using kernel calls to do IO. No complexity is saved and no gain is made by this method that I can see.
