Novel modular kernel design

StudlyCaps · Post by **StudlyCaps** » Sat Sep 25, 2021 10:07 pm

So I've been playing with an idea in my mind for a bit for a novel approach to kernel modules.

Background

As we all know, most modern hardware protection systems present system software with a simple binary option for security. The CPU is in either supervisor or user mode. All code which cannot run in user mode must therefore be given full and complete access to every part of the system, indistinguishable from the kernel.
The theoretical result of this is that your keyboard driver can intercept your network stack, your video driver can overwrite your file system and any device driver can trash the kernel. This may come about in the real world as either a deliberate attack, or merely as poorly written driver software. The results can include system instability, violation of privacy, and theft or loss of data.

The traditional solution to this problem is simply driver signing. Under Windows, kernel modules cannot be loaded unless signed using a public/private key pair issued by a recognized Certificate Authority. This creates some issues: there is a financial barrier to entry, as well as a "political" hazard created by centralizing the power to disallow software from running on any PC. Small companies or individuals may find it difficult to publish kernel modules, and those who wish to "hack" drivers to allow advanced users to access officially unsupported functionality of their hardware cannot do so.
The only workaround is to disable signature validation completely, opening the user up to any number of exploits. This also does not protect against badly behaved drivers published by legitimate companies.
Linux offers no such protection, only supporting signatures which verify no changes have been made to driver binaries between compilation and execution. This does nothing to tell you if drivers are safe, only if they are as safe as when they were compiled.

Proposal

The novel solution I propose is JIT compiled kernel modules, written in a restricted bytecode language which provides superior static and runtime guarantees by design and which is compiled to machine code within the kernel on module initialization.
This system would guarantee that no arbitrary code will ever run with supervisor privileges, all machine code executed while in supervisor mode would either reside in the kernel binary itself, or be generated at run time by the kernel.
By carefully designing the language and the API through which it communicates with true kernel code, it could be possible to expose all of the underlying functionality of the hardware without the possibility of expressing an unsafe operation in a valid program. All access to memory and potentially dangerous operations could be corralled through a capability-based security system, enforced by the JIT compiler.
This system is not without precedent, for many years now graphics drivers have contained compilers which generate machine code from shader and parallel computing languages for uploading and execution on the GPU. These use a restricted language and similarly code which cannot be expressed as operations of the GPU is not recognized as a valid program.
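
To make the capability idea a bit more concrete, here is a minimal sketch in Python as a toy model, not real kernel code. The names `PortRange`, `ModuleContext` and `outb` are made up for illustration, and the port range is just an example grant. The point is only that every port write goes through a check against what the module was granted, which is the check a JIT would have to emit or prove unnecessary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PortRange:
    """Hypothetical capability: ownership of I/O ports [start, end)."""
    start: int
    end: int

    def covers(self, port: int) -> bool:
        return self.start <= port < self.end


@dataclass
class ModuleContext:
    """Toy stand-in for the per-module state the kernel would keep."""
    name: str
    caps: list = field(default_factory=list)
    log: list = field(default_factory=list)  # records the writes that were allowed

    def outb(self, port: int, value: int) -> None:
        # The check the JIT would emit (or prove unnecessary) before each port write.
        if not any(cap.covers(port) for cap in self.caps):
            raise PermissionError(f"{self.name}: no capability for port {port:#x}")
        self.log.append((port, value & 0xFF))


# Example grant (an assumption for this sketch): ports 0x60 through 0x64.
keyboard = ModuleContext("ps2kbd", caps=[PortRange(0x60, 0x65)])
keyboard.outb(0x60, 0xF4)  # allowed: inside the granted range
```

A write to a port outside the grant, for example `keyboard.outb(0x1F0, 0)`, raises `PermissionError` instead of reaching the hardware.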

Pros (of a well designed implementation):
- Module memory isolation, including stack protection
- Individual modules may have granular ownership of IO ports and memory mapped IO regions
- Core system can remain stable in the face of a misbehaving module
- No longer need to trust CAs or driver publishers, the kernel guarantees security
- ABI abstraction - two kernels with the same language spec. and API can run identical module code, regardless of ABI differences
- Sensitive operations like changes to page structures can be restricted

Cons inherent to the design:
- Complexity of implementation (take the three most complex things in CS; language design, compiler design and kernel design and combine them)
- Hardware operations must be supported by the language, new ways of interacting with hardware cannot simply be implemented in modules, the kernel must have some awareness
- Changes to language specification must be managed to avoid invalidating existing well behaved drivers
- Performance penalty at system boot - compiling of modules must be performed
- An Inter-Module Communication system must be in place as kernel memory is not implicitly shared
- A compiler is a large and potentially unsafe piece of software, it must be inside the kernel

Risks that must be managed by design:
- Well written modules must generate performant code, security cannot have significant runtime performance penalties
- Language must be sufficiently flexible and expressive to allow known and unknown hardware to be controlled efficiently
- Static analysis is inherently unreliable, so static and runtime security must be implemented side by side

That's a high level overview of my idea, I welcome any related ideas, comments, criticisms, particularly if you identify factors I may have overlooked. Maybe I've just gone totally mad

Thanks for taking the time to have a look

nullplan · Post by **nullplan** » Sun Sep 26, 2021 1:10 am

StudlyCaps wrote: As we all know, most modern hardware protection systems present system software with a simple binary option for security. The CPU is in either supervisor or user mode. All code which cannot run in user mode must therefore be given full and complete access to every part of the system, indistinguishable from the kernel.

Correct. Partly, this is because most architectures only allow a binary switch (e.g. PowerPC only has System State and Problem State), but partly it is also because with the advent of DMA, any further distinction becomes meaningless. Even if, say, the hard disk driver was forbidden from overwriting the IDT directly, it could still tell the hard disk itself to overwrite the IDT using DMA. If you can overwrite the IDT, you can get any access you want (for other architectures, you can also overwrite the page tables to get any access you want). IOMMU might protect against this, but is not available everywhere, and is another unportable concept. For the most part, drivers are part of the kernel, and any past attempt to treat them differently has failed.

StudlyCaps wrote:The traditional solution to this problem is simply driver signing.

Driver signing is a band-aid solution that has failed very often in the past. The idea that "only authorized code can run" is stupid on a machine where the CPU will run whatever it gets its hands on. Driver signing will exclude malicious drivers, yes, but not insecure ones. If a security hole is present in a driver to allow an attacker to execute arbitrary code, then even the signed drivers will execute the unsigned injected code. Such is the nature of the beast.

StudlyCaps wrote:Linux offers no such protection, only supporting signatures which verify no changes have been made to driver binaries between compilation and execution. This does nothing to tell you if drivers are safe, only if they are as safe as when they were compiled.

Note that Windows driver signing does the exact same thing. See above. The biggest protection against malicious code in kernel space is the requirement for admin privileges to load new modules.

StudlyCaps wrote:By carefully designing the language and the API through which it communicates with true kernel code, it could be possible to expose all of the underlying functionality of the hardware without the possibility of expressing an unsafe operation in a valid program. All access to memory and potentially dangerous operations could be corralled through a capability-based security system, enforced by the JIT compiler.

As long as "tell harddisk to load sector to address x" can still be expressed in that language, and x can be the address of the IDT, this will fail. And note that the kernel does not know how exactly the harddisk is told to load a sector; that's why we are loading this harddisk driver. From the perspective of the kernel, some bytes are written into the command buffer for the hard disk, which the driver is allowed to do, and then suddenly the IDT is different.
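
A toy model of that hole, in Python, assuming a made-up device whose command buffer holds a target address and a length. The addresses, `driver_write` and `disk_process_command` are all hypothetical. Every store the driver makes passes the kernel's check, and the IDT still ends up overwritten, because the last write is done by the device and not by the CPU.

```python
PHYS_MEM = bytearray(0x4000)          # toy physical memory
IDT_BASE, IDT_SIZE = 0x1000, 0x100    # hypothetical location of the IDT
CMD_BUF = range(0x2000, 0x2010)       # the only region the driver may write


def driver_write(addr: int, data: bytes) -> None:
    """Kernel-enforced store: the driver can only touch its command buffer."""
    if addr not in CMD_BUF or addr + len(data) - 1 not in CMD_BUF:
        raise PermissionError(hex(addr))
    PHYS_MEM[addr:addr + len(data)] = data


def disk_process_command() -> None:
    """The device, not the CPU, performs this write, so no CPU-side check applies."""
    target = int.from_bytes(PHYS_MEM[0x2000:0x2008], "little")
    length = int.from_bytes(PHYS_MEM[0x2008:0x2010], "little")
    PHYS_MEM[target:target + length] = b"\xCC" * length  # stands in for sector data


# Every store below passes the kernel's check...
driver_write(0x2000, IDT_BASE.to_bytes(8, "little"))
driver_write(0x2008, IDT_SIZE.to_bytes(8, "little"))
disk_process_command()
# ...and yet the IDT region now holds the "sector data".
```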

StudlyCaps wrote:This system is not without precedent, for many years now graphics drivers have contained compilers which generate machine code from shader and parallel computing languages for uploading and execution on the GPU. These use a restricted language and similarly code which cannot be expressed as operations of the GPU is not recognized as a valid program.

Shader exploits have been observed in the wild, and took some effort to fix. Solving the problem of arbitrary code execution is not trivial. The only solution I see is manual review. Which I know is impractical for a large project, but I seriously see no other way.

StudlyCaps · Post by **StudlyCaps** » Sun Sep 26, 2021 1:41 am

nullplan wrote:Driver signing is a band-aid solution that has failed very often in the past.

Very true.

nullplan wrote:Note that Windows driver signing does the exact same thing.

Absolutely, the only difference is that Windows will at least verify that the publisher has access to the private key and that the key was generated by a CA. So you can be somewhat assured that the code is indeed published by a reputable organization, which as I mentioned provides only minimal safety.

nullplan wrote:As long as "tell harddisk to load sector to address x" can still be expressed in that language, and x can be the address of the IDT, this will fail.

Is DMA really so unrestricted? I will look into that aspect but if it is as you say that is disappointing. It would indeed make any improvement to kernel module isolation as worthless as a software MMU for process isolation.

nullplan wrote:Shader exploits have been observed in the wild, and took some effort to fix. Solving the problem of arbitrary code execution is not trivial. The only solution I see is manual review. Which I know is impractical for a large project, but I seriously see no other way.

Exploits would of course exist in practice, but I would hope they could be a flaw of design or implementation rather than an intrinsic property of the system. Manual review is possible, but subject to many of the problems of code signing, where you need to trust some third party or parties to secure your system. At that point why bother, people make mistakes and organizations can be careless or corrupt.

Edit: Could a VT-d, AMD-Vi or SMMU aware kernel enforce memory protection for DMA? Also, thanks for the feedback, I appreciate it.

nullplan · Post by **nullplan** » Sun Sep 26, 2021 9:11 am

StudlyCaps wrote:Absolutely, the only difference is that Windows will at least verify that the publisher has access to the private key and that the key was generated by a CA. So you can be somewhat assured that the code is indeed published by a reputable organization, which as I mentioned provides only minimal safety.

Well, I guess that depends on your opinion of Microsoft's reputation. Microsoft signing a driver was meant to mean that they checked it for problems, but apparently sometimes they just rubber-stamp them.

StudlyCaps wrote:Is DMA really so unrestricted?

Pretty much, yes. DMA just means the device makes a request to write into memory somewhere, and generally the PCI host bridge is not restricted, and allows it to write anywhere.

StudlyCaps wrote:Edit: Could a VT-d, AMD-Vi or SMMU aware kernel enforce memory protection for DMA? Also, thanks for the feedback, I appreciate it.

As I said, yes, IOMMU would help with that (IOMMU being the umbrella term for the techniques you mentioned). I don't know the specifics, but it would allow you to at least confine DMA accesses to a 64kB window. I'm not sure if it would be specific to the device, though, or if the hard disk could still write into the network buffer, but at least your IDT would be safe.
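
Roughly, and only as a toy model in Python: assume the IOMMU keeps a translation table per device and works in 4 kB pages. Both are assumptions for this sketch, and `ToyIommu` is a made-up name. Any device address without a mapping faults instead of reaching memory.

```python
PAGE = 4096  # assumed page size for this toy model


class IommuFault(Exception):
    pass


class ToyIommu:
    """Hypothetical per-device table: device page number -> physical page number."""

    def __init__(self):
        self.tables = {}  # device id -> {device page: physical page}

    def map(self, dev: str, dev_page: int, phys_page: int) -> None:
        self.tables.setdefault(dev, {})[dev_page] = phys_page

    def translate(self, dev: str, dev_addr: int) -> int:
        page, offset = divmod(dev_addr, PAGE)
        try:
            return self.tables[dev][page] * PAGE + offset
        except KeyError:
            raise IommuFault(f"{dev}: DMA to unmapped address {dev_addr:#x}") from None


iommu = ToyIommu()
iommu.map("disk0", dev_page=0, phys_page=0x80)  # one buffer page for the disk
```

With this setup `iommu.translate("disk0", 0x10)` lands inside the disk's buffer page, while an access past that page, or any access from a device with no table at all, raises `IommuFault`.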

Now that I come to consider the problem more fully, I see that by using technologies such as IOMMU, MMU, and the IO port permission bitmap, it is not even necessary to run a driver in supervisor mode. The driver could just be a more traditional process. That way, the isolation from the main kernel would be even better than with a traditional module. And the result of that is a microkernel. The thing is: You don't even need a bytecode for this. The drivers could be written in completely untrusted native code, and the isolation would still work. Of course, an untrusted hard disk driver can still trash your disk, but it is hard to see how it could break anything else.

The only thing a byte code driver would add at this point is instrumentation of memory accesses. But using an MMU and a normal process abstraction, you no longer need that, as the MMU is doing it for you. So the solution I'm coming to now is still not bytecode.

Of course, the problem only arises if you have anyone working on your OS other than yourself, so I doubt I will implement even that in the foreseeable future.

Korona · Post by **Korona** » Sun Sep 26, 2021 9:12 am

This idea is not novel, it is implemented in Singularity, among other OSes.

Ethin · Post by **Ethin** » Sun Sep 26, 2021 6:41 pm

This is an idea I've been playing around with too, using something like WebAssembly or a scripting language that gives me access to the AST so I can translate it into machine code. WebAssembly would probably be one of the neatest solutions to this problem: restrict the driver to its own memory area and make DMA accesses go through external function calls so that the kernel can verify them. If an access is outside the driver's memory range or the range of the BARs that the driver requests at initialization time (which would partially be determined by the PCI driver), fail the access and return an error. It wouldn't be a perfect solution, of course, and I'm sure there are ways around it, but that's a possible way of doing things. Using process isolation would only strengthen that (but would make DMA accesses slower).
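
Something like this, as a rough Python sketch of the check only. `DriverSandbox` and `host_dma_write` are names I made up, and the address ranges are arbitrary examples. The kernel-side function the driver imports refuses anything that is not wholly inside the driver's own memory or one of its BARs.

```python
from dataclasses import dataclass


@dataclass
class DriverSandbox:
    memory: range   # linear memory granted to the driver
    bars: list      # BAR ranges requested at initialization

    def check_dma(self, addr: int, length: int) -> bool:
        """True only if [addr, addr + length) lies wholly in one allowed region."""
        if length <= 0:
            return False
        last = addr + length - 1
        return any(addr in region and last in region
                   for region in [self.memory, *self.bars])


def host_dma_write(sandbox: DriverSandbox, addr: int, length: int) -> int:
    """Hypothetical imported function: 0 on success, -1 on a rejected access."""
    return 0 if sandbox.check_dma(addr, length) else -1


nic = DriverSandbox(memory=range(0x10000, 0x20000),
                    bars=[range(0xFEB00000, 0xFEB20000)])
```

An access that starts inside the driver's memory but runs past its end is rejected too, since both the first and last byte have to fall in the same region.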

h0bby1 · Post by **h0bby1** » Mon Sep 27, 2021 9:03 am

Additionally, you have things like vx32, or a 64-bit equivalent, which parse and disassemble machine code to check memory accesses and code paths and to filter instructions.

This can be easier if you already have some low-level runtime to manage I/O and the like.

It was originally used, for example, to run untrusted native code in the browser.

But otherwise, yes, it's the microkernel route, with modules in userland and memory protection/isolation with IPC, but it tends to be slower.

I contemplated doing this with some AML-like language, and could even reuse the ACPI stack, but it might not be a 100% fit for all drivers, besides the performance issue.

linguofreak · Post by **linguofreak** » Tue Sep 28, 2021 11:33 pm

nullplan wrote:
StudlyCaps wrote: As we all know, most modern hardware protection systems present system software with a simple binary option for security. The CPU is in either supervisor or user mode. All code which cannot run in user mode must therefore be given full and complete access to every part of the system, indistinguishable from the kernel.
Correct. Partly, this is because most architectures only allow a binary switch (e.g. PowerPC only has System State and Problem State), but partly it is also because with the advent of DMA, any further distinction becomes meaningless. Even if, say, the hard disk driver was forbidden from overwriting the IDT directly, it could still tell the hard disk itself to overwrite the IDT using DMA. If you can overwrite the IDT, you can get any access you want (for other architectures, you can also overwrite the page tables to get any access you want). IOMMU might protect against this, but is not available everywhere, and is another unportable concept. For the most part, drivers are part of the kernel, and any past attempt to treat them differently has failed.

One thing that could be done from the standpoint of a new bus architecture is to not allow devices to write to main memory, only to their own onboard buffers. You then give the DMA controller access to all addresses, so a disk read transaction would look like this:

1) Userspace disk driver signals disk to read a list of sectors into its onboard buffer.
2) Disk finishes reading sectors, sends interrupt.
3) Disk driver tells kernel to initiate a DMA transaction from onboard disk buffer to buffer in driver's address space
4) Kernel translates disk driver VM addresses to physical addresses and initiates DMA transaction.
5) DMA controller moves data from onboard disk buffer to requested RAM locations.
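
The steps above can be sketched as a toy model in Python. The memory layout, the `driver_pages` mapping and the function names are all hypothetical, and a transfer is limited to one page to keep it short. The disk only ever fills its own buffer, and the copy into RAM happens only after the kernel has translated the driver's virtual address.

```python
RAM = bytearray(0x3000)                          # toy physical memory
DISK = {n: bytes([n]) * 512 for n in range(8)}   # sector number -> 512 bytes
onboard = bytearray()                            # the disk's onboard buffer
driver_pages = {0x400000: 0x1000}                # driver virtual page -> physical page


def disk_read(sectors) -> None:
    """Steps 1-2: the device reads sectors into its own buffer only."""
    onboard[:] = b"".join(DISK[s] for s in sectors)


def kernel_dma(virt_addr: int, length: int) -> None:
    """Steps 3-5: validate, translate, then let the DMA controller copy."""
    page, off = virt_addr & ~0xFFF, virt_addr & 0xFFF
    if page not in driver_pages or off + length > 0x1000:
        raise PermissionError("buffer not in driver's address space")
    phys = driver_pages[page] + off              # step 4: virtual -> physical
    RAM[phys:phys + length] = onboard[:length]   # step 5: controller moves the data


disk_read([3, 4])
kernel_dma(0x400000, 1024)
```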

Of course, if you're specifying a new bus architecture, you may as well specify an IOMMU.

rdos · Post by **rdos** » Wed Sep 29, 2021 1:37 am

Actually, you don't even need to be able to install a software driver into the system. Any PCIe device could start writing anywhere in memory, including taking over or taking down the operating system. Driver signing, JIT compilation, and user account models won't help a bit to fix this.

nullplan · Post by **nullplan** » Wed Sep 29, 2021 1:58 pm

linguofreak wrote:One thing that could be done from the standpoint of a new bus architecture is to not allow devices to write to main memory, only to their own onboard buffers. You then give the DMA controller access to all addresses,[...]

Thus turning all memory transactions into two-step processes. And only to solve a rather esoteric problem that does not appear to be the bulk of our security problems these days. I doubt such an architecture would get much buy-in.

Also, we are kind of stuck with the hardware as it exists currently. Running drivers from bytecode is something we can do. Specifying a whole new system bus architecture less so.

rdos wrote:Actually, you don't even need to be able to install a software driver into the system. Any PCIe device could start writing anywhere in memory, including taking over or taking down the operating system. Driver signing, JIT compilation, and user account models won't help a bit to fix this.

OK, evil hardware is a whole different can of worms. The OS is basically powerless against it, so only install trustworthy hardware! That's why Thunderbolt now has an authorization procedure. Thunderbolt is basically exporting PCIe to an external interface.

But I was talking about well behaved hardware and evil drivers. That's what a restriction on the drivers (like running them at lower privilege) is supposed to address.

linguofreak · Post by **linguofreak** » Wed Sep 29, 2021 4:52 pm

nullplan wrote:
linguofreak wrote:One thing that could be done from the standpoint of a new bus architecture is to not allow devices to write to main memory, only to their own onboard buffers. You then give the DMA controller access to all addresses,[...]
Thus turning all memory transactions into two-step processes. And only to solve a rather esoteric problem that does not appear to be the bulk of our security problems these days. I doubt such an architecture would get much buy-in.

As I said, if you're putting together a new bus architecture, you may as well specify an IOMMU. About the one potential advantage to the scheme I described is that OSes might have an easier time porting between different implementations of that scheme than between different IOMMU architectures.

Solar · Post by **Solar** » Thu Sep 30, 2021 6:18 am

nullplan wrote:
rdos wrote:Actually, you don't even need to be able to install a software driver into the system. Any PCIe device could start writing anywhere in memory, including taking over or taking down the operating system. Driver signing, JIT compilation, and user account models won't help a bit to fix this.
OK, evil hardware is a whole different can of worms. The OS is basically powerless against it, so only install trustworthy hardware!

Evil hardware could just short out and fry your machine, drivers be damned. Focus on the problems you can solve. Malicious hardware is not on the OS to protect against.

rdos · Post by **rdos** » Thu Sep 30, 2021 6:40 am

Solar wrote:
nullplan wrote:
rdos wrote:Actually, you don't even need to be able to install a software driver into the system. Any PCIe device could start writing anywhere in memory, including taking over or taking down the operating system. Driver signing, JIT compilation, and user account models won't help a bit to fix this.
OK, evil hardware is a whole different can of worms. The OS is basically powerless against it, so only install trustworthy hardware!
Evil hardware could just short out and fry your machine, drivers be damned. Focus on the problems you can solve. Malicious hardware is not on the OS to protect against.

It can do a lot more than that. The combination of an evil driver & evil hardware is optimal since you can then spy on the operating system and insert malicious code. It's a bit harder for evil hardware alone to find the page directory root entries & IDT, but I'm sure it would be possible, at least on a known OS. With access to the IDT, you can steal CPU cores and from there you can do practically anything you like with the target system.

Solar · Post by **Solar** » Thu Sep 30, 2021 11:06 am

I was intentionally pointing out a "simplest worst case" that doesn't take any knowledge or imagination to understand that there are things you cannot harden your OS against.

rdos · Post by **rdos** » Fri Oct 01, 2021 1:33 am

Solar wrote:I was intentionally pointing out a "simplest worst case" that doesn't take any knowledge or imagination to understand that there are things you cannot harden your OS against.

In the area of viruses, trojans, and indeed evil drivers & hardware too, the biggest risk factor is how widespread an OS is. The creators obviously won't bother to attack a hobby OS that almost nobody uses, while Windows, Linux, and Macs will be primary targets. I also don't think building something superior in this area would make your OS popular, and so this is a poor argument. Generally speaking, all these protective measures create obstacles for users, and so are not popular. If your OS is not popular, it won't become widespread.

Another thing is that filesystem-based protection measures are not very hard to break if you have a mature OS. For instance, I plan to write ext and NTFS drivers that just ignore the ACLs and allow the OS to read & write anything it likes. Would be a fun way to spy on closed systems I cannot log in to.

Besides, you can potentially stop a PCIe device from doing anything it likes on PCI (disable bus mastering), but I fear this is just a software setting that an evil device could ignore too. Then you could selectively enable bus mastering on trusted devices only.
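
For reference, bus mastering is controlled by bit 2 of the 16-bit command register at offset 0x04 in PCI configuration space. A minimal sketch in Python, using plain byte arrays as stand-ins for each device's config space; the device names and the trust policy are made up for illustration:

```python
PCI_COMMAND = 0x04            # offset of the 16-bit command register
PCI_COMMAND_MASTER = 1 << 2   # Bus Master Enable bit


def set_bus_master(config: bytearray, enable: bool) -> None:
    """Set or clear Bus Master Enable in a simulated config space."""
    cmd = int.from_bytes(config[PCI_COMMAND:PCI_COMMAND + 2], "little")
    cmd = cmd | PCI_COMMAND_MASTER if enable else cmd & ~PCI_COMMAND_MASTER
    config[PCI_COMMAND:PCI_COMMAND + 2] = (cmd & 0xFFFF).to_bytes(2, "little")


devices = {"trusted_nic": bytearray(256), "unknown_card": bytearray(256)}
for name, cfg in devices.items():
    cfg[PCI_COMMAND] = 0x07   # pretend firmware enabled I/O, memory and bus mastering
    set_bus_master(cfg, enable=(name == "trusted_nic"))
```

As said above, this only helps against hardware that actually honours the bit.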

OSDev.org
