To POSIX or not to POSIX

linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: To POSIX or not to POSIX

Post by linguofreak »

nyc wrote:
There are also features to consider removing outright, such as fork() as per https://www.microsoft.com/en-us/researc ... otos19.pdf or, perhaps, much of the tty/pty infrastructure in UNIX and POSIX, beyond just devising better-working replacements for POSIX threading and asynchronous IO.
I have a few comments on the article's analysis of fork(), for tonight I'll cover this:
article wrote:Fork conflates the abstraction of a process with the hardware address space that contains it.
tl;dr of my comments below:

What does "process" mean, independent of protection architecture? Is it even a meaningful term on a system where memory protection and/or memory mapping state changes autotmatically as control transfers are made within the same privilege level? In my opinion, "process" is not meaningful in the context of such an architecture. It is primarily meaningful in the context of current architectures, in which case it means "a single memory mapping / protection context", in which case a process is already conflated with a hardware address space before we even decide how we want to spawn processes. In such an environment, I think fork() makes sense.

In more detail:

Really, I think the concept of a "process" is an artifact of the fact that almost all current hardware makes all memory protection and memory mapping state static in user mode. Consider a hypothetical architecture similar to the x86 architecture, with the following changes:

1) Instead of one CR3, you have a CR3 for each segment register: CR3CS, CR3DS, etc.
2) Each segment register has a corresponding Virtual Descriptor Table Register (VDTR) pointing to a Virtual Descriptor Table (VDT) for the segment loaded into that register.
3) When a segment register is loaded, the upper bits of the selector are used to select a currently loaded segment register, and the lower bits are used as an index into the VDT designated by the VDTR for that register.
4) The primary element of a VDT entry is a Real Segment Selector (RSS), which is used as an index into a Real Descriptor Table (which replaces both the GDT and LDT of the real-life x86 architecture).
5) The primary elements of an RDT entry are a CR3 value for the segment (rather than the offset into a single address space, as for a real x86 segment) and a field designating a VDT for the segment. The CR3 value is loaded into the CR3 for the segment register being loaded, and the VDT field is loaded into the VDTR.
6) A program can load any segment that is referenced by the VDT of any currently loaded segment.
7) Loaded segments remain loaded regardless of whether they are in the VDT of any loaded segment (the VDT entry used to load a segment just finds the RDT entry for the segment, it's the RDT entry that's actually loaded into the segment register, CR3, and VDTR). A program can unload any segments it needs to protect before transferring control to untrusted code.
8) It may be necessary to amend 6 and 7 to get adequate security with good performance, but the description provided is a good starting point for discussion about such architectures.
9) The kernel can load segments as described above, or can use special instructions to load segments directly with their Real Selectors. (It might actually even be possible, depending on how I/O was done and protected on such an architecture, to do away with classical privilege levels entirely).
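To make that concrete, the tables might look something like this in C; all names and field widths here are my own invention for illustration, not anything a real CPU defines:

Code:

#include <stdint.h>

/* Hypothetical descriptor-table layout for the architecture sketched
   above; every name and field width is invented for illustration. */

typedef uint64_t phys_addr_t;

/* Real Descriptor Table entry: one per segment in the system.
   Replaces both the GDT and LDT of the real-life x86. */
struct rdt_entry {
    phys_addr_t cr3;       /* root of this segment's own page tables */
    phys_addr_t vdt_base;  /* VDT listing the segments this one may load */
    uint32_t    vdt_limit; /* number of valid VDT entries */
    uint32_t    flags;     /* present, writable, executable, ... */
};

/* Virtual Descriptor Table entry: this segment's view of which other
   segments are reachable; the RSS is what actually gets loaded. */
struct vdt_entry {
    uint32_t rss;          /* Real Segment Selector: index into the RDT */
    uint32_t flags;        /* access rights granted through this entry */
};

/* A selector as used by a segment-register load: the upper bits name a
   currently loaded segment register, the lower bits index its VDT. */
struct selector {
    unsigned seg_reg : 4;
    unsigned vdt_idx : 28;
};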

The association of each segment with its own VDT basically creates a directed accessibility graph of segments, i.e., "Segment A has privileges to see and access segments X, Y, and Z. As long as Segment A is already loaded, segments X, Y, and Z can be loaded". It's basically a coarse-grained capability system (a lot of capability systems I've seen described have seemed too fine-grained to have a chance of being performant).

A microkernel implemented on such an architecture could implement message passing to/from servers in terms of function calls and returns, without needing the kernel as an intermediary: A program makes a far call to the library implementing the file system server, passing as an argument an empty segment waiting to be filled with data read from a file. The VDT for the segment containing the library has an entry for the file system server's global data area, which the library then loads. It services the read call and returns to the program, and if the kernel gets called at all, it's for things the server actually needs the kernel to do for it, not for message passing.
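From the application side, such a read might look something like this; every name here is invented, and the far call would really be a control-transfer instruction rather than a C function:

Code:

#include <stddef.h>

typedef unsigned int selector_t;

/* Allocate an empty segment (no pages present) to receive the data.
   Hypothetical primitive. */
extern selector_t seg_alloc_empty(void);

/* Entry point exported by the file system library. Reaching it is a
   far call: the hardware loads the library's code segment, whose VDT
   lets the library load the server's global data area, no kernel
   involvement needed. */
extern long fs_read(selector_t dst_seg, const char *path,
                    size_t off, size_t len);

long read_file_chunk(const char *path, size_t off, size_t len,
                     selector_t *out_seg)
{
    selector_t buf = seg_alloc_empty();  /* segment to be filled */

    /* Far call; on return the mapping/protection context has been
       switched back and buf contains the file data. */
    long n = fs_read(buf, path, off, len);

    *out_seg = buf;
    return n;  /* bytes read, or negative error */
}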

With such an architecture, you could basically just have "user threads" (stack segments containing a thread's stack and thread-local storage), "kernel threads" (address spaces containing data for a kernel scheduling entity), "executables" (address spaces containing code for executables and libraries, with a VDT entry for each external library needed), and "files" (address spaces containing any other data: memory mapped files, anonymous shared memory, heaps, etc.). In such a setup, threads might very well cooperate on a task and share the same files (including one or more heaps), which would correspond somewhat with the traditional concept of a "process", but it doesn't seem to me, with this kind of architecture, that such cooperation would need to be enshrined in a kernel object / tied to a particular address space, as it is on traditional flat-memory architectures.

For instance, many microkernel servers might not need to have a dedicated kernel thread (or even user thread) at all; they might just function as libraries with system-global data areas not accessible to anything but their own code (after all, that's basically what a monolithic kernel is). Programs would call the server, and the server would execute their request on the calling program's timeslice, and possibly the calling program's own stack, so unlike microkernel servers on a traditional architecture, the server wouldn't much resemble (let alone be) a traditional "process"; a process has at least one thread!

It is also imaginable that something like a word processor might be implemented with a kernel thread servicing a user thread for each open file, with each user thread having a VDT with an entry for the corresponding file, as well as a heap for that user thread. Does this count as one or two processes? There's only one kernel scheduling entity, so we might say one process, but the file and corresponding heap for each editing session are only accessible when the stack segment for the corresponding user thread is loaded, which would require two separate processes on traditional architectures.

In such a case, you wouldn't have fork() and exec(), you'd have primitives to create a new segment copy-on-write from an existing one, map a new data segment from a file, insert a code segment into a VDT (possibly creating a new RDT entry for it if no other program has loaded that executable/library already), create an empty segment with no pages present, create a new thread using a designated segment as the stack, etc. Something roughly like spawning or fork()-exec()-ing a new process on a traditional architecture would look something like this (a rough code sketch follows the steps):

Create a blank segment and call a language-runtime function to initialize it as a stack.
Then, putting the needed segments into the new stack segment's VDT (or the VDT of some segment reachable from the stack segment, or one that will be kept loaded across the thread switch):
cow() the segments containing any resources that need to be brought over to the new thread as-is, but won't be shared
Set up any shared memory with the new thread
Set up new blank segments for any resources that will not be taken from or shared with the existing thread, and call the appropriate runtime functions to initialize them.
mmap() any files required by the new thread that aren't being used by the old thread, including the executable for the new thread.
Call the kernel function newThread() with the new stack segment and a stack pointer (obtained from the runtime function that initialized the stack) as arguments. This will create a new kernel scheduling entity and, like fork(), will return in both the new and old threads, but unlike Unix fork(), it will not copy anything that has not already been copied, and will have the stack set to the SS:SP designated rather than a COW copy of the original thread's stack.

In the new thread:
Unload the segment register that was being used by the existing thread to set up the stack for the new thread (since that segment is now loaded as our stack segment).
Unload any other segment registers designating segments that should not be accessible to the new thread once it jumps into the new program.
Far jump to the code segment for the new program.

In the existing thread:
Unload the segment register that was being used by the existing thread to set up the stack for the new thread.
Unload any other segment registers that were being used by the existing thread to set up segments for the new thread.
If any segments used in setting up the new thread remain in the VDT of a currently loaded segment, and are not used by the existing thread, disown them from the relevant VDT.
Continue with whatever you were doing before you spawned the new thread.
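Putting those steps together in rough C-flavored pseudocode; every primitive named here (seg_create(), seg_cow(), seg_mmap(), vdt_insert(), newThread(), and so on) is hypothetical, standing in for instructions or kernel calls on the imagined architecture:

Code:

typedef unsigned int selector_t;

extern selector_t seg_create(void);            /* blank segment, no pages */
extern selector_t seg_cow(selector_t src);     /* copy-on-write copy */
extern selector_t seg_mmap(const char *path);  /* map a file as a segment */
extern void  vdt_insert(selector_t vdt_of, selector_t seg);
extern void *rt_init_stack(selector_t seg);    /* language-runtime helper */
extern int   newThread(selector_t stack_seg, void *sp);
extern void  seg_unload(int seg_reg);
extern void  far_jump(selector_t code_seg);

#define SETUP_SEG_REG 3  /* arbitrary register used during setup */

void spawn(const char *exe_path, selector_t old_heap)
{
    /* Blank segment, initialized as a stack by the runtime. */
    selector_t stack = seg_create();
    void *sp = rt_init_stack(stack);

    /* Fill the new stack segment's VDT with what the thread needs. */
    selector_t heap = seg_cow(old_heap);   /* brought over as-is, not shared */
    selector_t exe  = seg_mmap(exe_path);  /* executable for the new thread */
    vdt_insert(stack, heap);
    vdt_insert(stack, exe);

    /* Create the kernel scheduling entity. Like fork(), assume this
       returns in both threads; say it returns 0 in the new one. */
    if (newThread(stack, sp) == 0) {
        /* New thread: drop the setup register (the stack segment is
           now loaded as our stack), then enter the new program. */
        seg_unload(SETUP_SEG_REG);
        far_jump(exe);
    }

    /* Existing thread: unload the registers used during setup and
       carry on with whatever we were doing. */
    seg_unload(SETUP_SEG_REG);
}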

There could also be a forkStack() call for instances in which it is desirable to have the new thread have a COW copy of the existing thread's stack. In this case, rather than initializing the new stack with a language runtime function, you would leave it blank, but fill in its VDT as described above. When ready to spawn the new thread, you'd call forkStack(), instead of newThread(), with the blank stack segment for the new thread as a parameter, which would COW the current stack into the blank segment and return in both threads. Like newThread(), and unlike Unix fork(), it would not copy anything (other than the stack) that had not already been copied.

On a traditional architecture, on the other hand, the abstraction of a process, as a monolithic address space with one or more associated threads that will all need that address space to run, is forced upon us (unless we're using a single address space system with protection in software), and I think fork() is one of the better ways of dealing with that abstraction when that's what the hardware allows. Now, there may be a bit of a chicken-and-egg thing going on, with the traditional memory/protection model making fork() optimal and fork() making the traditional memory/protection model optimal, so that other avenues aren't explored, but it's not just an issue of eeeeevil fork() holding us back.
eekee
Member
Posts: 872
Joined: Mon May 22, 2017 5:56 am
Location: Kerbin
Discord: eekee

Re: To POSIX or not to POSIX

Post by eekee »

@bzt: thanks for the info! (Sorry, had to edit that in. I'm getting behind in my replies.)

@linguofreak: Thanks for the tldr, it helped me visualise the problem with fork on modern systems. Or perhaps older systems based on functional programming -- making strings immutable, for example.

It leaves me thinking of Ken Thompson's wish that processes were more lightweight. I don't know if Plan 9 is exactly applicable, I don't think it changes memory protection without changing privilege level, but it does have very lightweight processes. Preemptively-scheduled threads (not coroutines) are processes. A multithreaded program is a process group. I mention all this because fork() in Plan 9 applies to individual processes, not the whole process group making up a typical application. I wonder what they would have done if they wanted multiple memory mappings or protection contexts in a single process? It's tempting to think they would have been all, "That's stupid, don't do that." (They did say that sort of thing rather a lot.) But then the same group went on to create Google Go, which has immutable strings. *shrug*
Kaph — a modular OS intended to be easy and fun to administer and code for.
"May wisdom, fun, and the greater good shine forth in all your work." — Leo Brodie
nyc
Posts: 17
Joined: Sun Dec 29, 2019 10:59 pm
Libera.chat IRC: nyc

Re: To POSIX or not to POSIX

Post by nyc »

nyc wrote:There are also features to consider removing outright, such as fork() as per https://www.microsoft.com/en-us/researc ... otos19.pdf or, perhaps, much of the tty/pty infrastructure in UNIX and POSIX, beyond just devising better-working replacements for POSIX threading and asynchronous IO.
article wrote:Fork conflates the abstraction of a process with the hardware address space that contains it.
linguofreak wrote:tl;dr of my comments below:

What does "process" mean, independent of protection architecture? Is it even a meaningful term on a system where memory protection and/or memory mapping state changes autotmatically as control transfers are made within the same privilege level? In my opinion, "process" is not meaningful in the context of such an architecture. It is primarily meaningful in the context of current architectures, in which case it means "a single memory mapping / protection context", in which case a process is already conflated with a hardware address space before we even decide how we want to spawn processes. In such an environment, I think fork() makes sense.
I'm at a bit of a loss as to what to make of this. I went over the whole discussion of the hypothetical hardware and was vaguely reminded of IBM POWER SLBs. I'm not sure anything follows from that, though. The demands fork() makes still break invariants that would otherwise promote maintainability, scalability, and performance.
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Re: To POSIX or not to POSIX

Post by Korona »

The Microsoft research paper does not suggest that paging should be abandoned.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Qbyte
Member
Posts: 51
Joined: Tue Jan 02, 2018 12:53 am
Location: Australia

Re: To POSIX or not to POSIX

Post by Qbyte »

Korona wrote:The Microsoft research paper does not suggest that paging should be abandoned.
To be fair, the paper was focused on fork(), not paging itself, so it could be that the authors are entirely open to the idea of abandoning paging if they were to design a modern system from scratch. I personally don't really care for paging all that much, and it seems as though its reason for being is diminishing as hardware evolves. When we had a 32-bit address space and limited RAM, it was practically necessary to use paging, but we're reaching the point where it can be considered to add unneeded complexity and overhead while limiting design.
linguofreak wrote:A microkernel implemented on such an architecture could implement message passing to/from servers in terms of function calls and returns, without needing the kernel as an intermediary: A program makes a far call to the library implementing the file system server, passing as an argument an empty segment waiting to be filled with data read from a file. The VDT for the segment containing the library has an entry for the file system server's global data area, which the library then loads. It services the read call and returns to the program, and if the kernel gets called at all, it's for things the server actually needs the kernel to do for it, not for message passing.
That seems pretty convoluted, and it's not even clear what advantages it has over a simple conventional system call. At that point, you might as well implement i/o primitives within the kernel and allow libraries to implement functionality, rather than forcing everything to go through a server.

Also, there's a much simpler way to achieve some of what you want: just allow all processes to execute code from anywhere in memory. There are no security concerns here, because the process is sandboxed and can only write to memory regions the kernel has allowed it to (via the MMU/MPU), and since it is not in kernel mode, it can't actually execute any privileged instructions even if it jumps to code that exists within the kernel. That way, most things can be treated as a simple function call where you just pass arguments/pointers in registers and jump to the code you want, without having to do IPC or system calls.
linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: To POSIX or not to POSIX

Post by linguofreak »

Korona wrote:The Microsoft research paper does not suggest that paging should be abandoned.
Neither do I. Note that each of the segments in my hypothetical architecture is its own paged address space.
linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: To POSIX or not to POSIX

Post by linguofreak »

Qbyte wrote:That seems pretty convoluted, and it's not even clear what advantages it has over a simple conventional system call.
The advantage over a system call to a monolithic kernel on a traditional architecture is that, as with a microkernel on a traditional architecture, the various drivers are isolated from each other and from the kernel.

The advantage over a call to a microkernel server on a traditional architecture is that you're not making a kernel call to send a message to a different scheduling entity on either end of the transaction: you make a far call, the hardware automatically changes your mapping/protection context without needing to go through the kernel, the code on the far end has access to things you don't have access to and uses that to fulfill your request, and then makes a far return to the caller, whereupon the hardware changes the mapping/protection context back.
At that point, you might as well implement i/o primitives within the kernel and allow libraries to implement functionality, rather than forcing everything to go through a server.
I used the word "server" for the components providing the functionality that servers provide with a traditional microkernel, but it's debatable whether that's the right word to use, as, like kernel-mode drivers under monolithic kernels, many "servers" in this case might not have their own separate threads, and requests to "servers" would be via call/return, not message passing. On the other hand, unlike a monolithic kernel, and like a microkernel, the "servers" would be independent of the kernel and would not have access to kernel data, nor to each other's data. I'm not sure quite what to call these "servers"; "server" tends to imply a dedicated scheduling entity and message passing, "driver" tends to imply operation in kernel mode.
nyc
Posts: 17
Joined: Sun Dec 29, 2019 10:59 pm
Libera.chat IRC: nyc

Re: To POSIX or not to POSIX

Post by nyc »

Qbyte wrote:That seems pretty convoluted, and it's not even clear what advantages it has over a simple conventional system call.
linguofreak wrote: The advantage over a system call to a monolithic kernel on a traditional architecture is that, as with a microkernel on a traditional architecture, the various drivers are isolated from each other and from the kernel.

The advantage over a call to a microkernel server on a traditional architecture is that you're not making a kernel call to send a message to a different scheduling entity on either end of the transaction: you make a far call, the hardware automatically changes your mapping/protection context without needing to go through the kernel, the code on the far end has access to things you don't have access to and uses that to fulfill your request, and then makes a far return to the caller, whereupon the hardware changes the mapping/protection context back.
I think VMS did privilege transitions on calls on the VAX as per http://h30266.www3.hpe.com/odl/vax/opsy ... o_078.html, albeit with a different mechanism from segments. When the smoke clears, these things don't really make system calls that much faster; what does is slimming down the amount of state in the CPU. Maybe that's a discussion better had on https://www.opencores.org or similar.
linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: To POSIX or not to POSIX

Post by linguofreak »

nyc wrote: I think VMS did privilege transitions on calls on the VAX as per http://h30266.www3.hpe.com/odl/vax/opsy ... o_078.html, albeit with a different mechanism from segments.
That appears to have been done through the kernel ("However, because a call to a routine in a more privileged mode must be vectored through the system service dispatch routine..."), and without switching address spaces (like x86, the VAX had 4 privilege levels, and VMS seems to have used all four).

The point of my mechanism is to transfer control directly between components running at less than kernel privilege without involving the kernel, and allowing each component to have state that the others don't have access to. Ringed privilege levels allow higher privilege levels to prevent lower privilege levels from accessing their state, but lower privilege levels can't prevent higher privilege levels from accessing theirs, and two components can't keep state from one another at all unless they're operating at different privilege levels. That's why I propose the VDT mechanism: every segment switch then changes what segments are currently visible to the running code. It's not segmentation per se that's key to my proposal, it's the VDT mechanism.
When the smoke clears, these things don't really make system calls that much faster; what does is slimming down the amount of state in the CPU. Maybe that's a discussion better had on https://www.opencores.org or similar.
The point isn't making system calls faster, it's turning as many system calls as possible into userspace <-> userspace calls without sacrificing too much in performance. Things are going to be slower than just nipping into kernel mode and back under a monolithic kernel on a traditional architecture, but hopefully not too much slower, and hopefully much faster than message passing to a separate process under a microkernel on a traditional architecture. The idea is not to increase absolute performance, but to get a better combination of security and performance than is available on traditional architectures.

That said, other capability architectures have failed in the past by sacrificing too much performance. This idea does not attempt to be as fine-grained as other capability architectures, and so would hopefully be faster, but it could still end up failing for the same reason.
Qbyte
Member
Posts: 51
Joined: Tue Jan 02, 2018 12:53 am
Location: Australia

Re: To POSIX or not to POSIX

Post by Qbyte »

linguofreak wrote:The advantage over a system call to a monolithic kernel on a traditional architecture is that, as with a microkernel on a traditional architecture, the various drivers are isolated from each other and from the kernel.
That can be achieved in a much simpler and more efficient way with user space drivers. That is, implementing drivers as library code that applications can directly link to. The kernel provides an i/o primitive like "send_packet()", and the job of the driver is simply to translate generic library functions into command packets that the specific device can understand. The driver would create the packet and then make the "send_packet()" system call, and the kernel would be responsible for actually sending the packet. This completely eliminates the need for servers and message passing and all the overhead and complexity that accompanies them, while still achieving the microkernel goal of isolating drivers and getting them out of the kernel.
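As a rough sketch of that split (the send_packet() signature and the command format here are just invented for illustration):

Code:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* The single kernel i/o primitive: hand a finished command packet to
   the device identified by dev_id. Hypothetical signature. */
extern int send_packet(int dev_id, const void *pkt, size_t len);

/* Invented command format for an imaginary NIC. */
struct nic_cmd {
    uint8_t  opcode;       /* 0x01 = transmit frame */
    uint8_t  reserved[3];
    uint32_t frame_len;
    uint8_t  frame[1500];
};

/* Generic library entry point: translate the request into the
   device's packet format; the kernel does the actual send. */
int nic_transmit(int dev_id, const void *frame, uint32_t len)
{
    struct nic_cmd cmd = { .opcode = 0x01, .frame_len = len };

    if (len > sizeof cmd.frame)
        return -1;
    memcpy(cmd.frame, frame, len);
    return send_packet(dev_id, &cmd, offsetof(struct nic_cmd, frame) + len);
}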

This can be taken a step further with dedicated hardware support, whereby the kernel can set up hardware flags/registers (like an IOMMU) in advance that controls which hardware resources each process can access and in what ways. That way, even fewer system calls are required as a process can have direct access to a network port for example instead of requiring kernel involvement each time it wants to send a packet. In other words, the OS becomes an exokernel (which are essentially a subset of monolithic kernels).
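A sketch of that fast path (all names invented; the kernel is assumed to have mapped the queue and doorbell into the process and programmed the IOMMU beforehand):

Code:

#include <stdint.h>

/* Transmit ring and doorbell register, mapped into this process once
   by the kernel, with the IOMMU set up so the device can only reach
   this process's buffers. All names are invented. */
struct tx_desc { uint64_t buf; uint32_t len; uint32_t flags; };

extern struct tx_desc *tx_ring;          /* shared with the device */
extern volatile uint32_t *nic_doorbell;  /* MMIO register */
static unsigned tx_tail;

void nic_send_direct(uint64_t buf_addr, uint32_t len)
{
    tx_ring[tx_tail] = (struct tx_desc){ .buf = buf_addr, .len = len };
    tx_tail = (tx_tail + 1) % 256;
    *nic_doorbell = tx_tail;  /* device picks up the descriptor; no
                                 system call anywhere on this path */
}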
linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: To POSIX or not to POSIX

Post by linguofreak »

Qbyte wrote:
linguofreak wrote:The advantage over a system call to a monolithic kernel on a traditional architecture is that, as with a microkernel on a traditional architecture, the various drivers are isolated from each other and from the kernel.
That can be achieved in a much simpler and more efficient way with user space drivers. That is, implementing drivers as library code that applications can directly link to. The kernel provides an i/o primitive like "send_packet()", and the job of the driver is simply to translate generic library functions into command packets that the specific device can understand. The driver would create the packet and then make the "send_packet()" system call, and the kernel would be responsible for actually sending the packet. This completely eliminates the need for servers and message passing and all the overhead and complexity that accompanies them, while still achieving the microkernel goal of isolating drivers and getting them out of the kernel.
That isolates drivers from the kernel, but de-isolates them from applications, and from each other. If the drivers (or other components that might be spun out into a server on a traditional microkernel) have no system-global state that applications can't be allowed to see, and if the devices they drive can be securely and efficiently driven by the kind of packet interface you describe without any device-specific code in the kernel, then that idea just might work, but I have doubts on both points, *especially* the first. Your idea also doesn't do anything to prevent a buggy or malicious driver from corrupting application data.

If drivers have system-global state that has to be isolated from applications, then they either need to be part of the kernel (or, at least, operating in a higher privilege ring than applications), part of a userspace process that other processes pass messages to, or the hardware needs to support some means of changing memory mapping/protection on control transfers.
This can be taken a step further with dedicated hardware support, whereby the kernel can set up hardware flags/registers (like an IOMMU) in advance that controls which hardware resources each process can access and in what ways. That way, even fewer system calls are required as a process can have direct access to a network port for example instead of requiring kernel involvement each time it wants to send a packet. In other words, the OS becomes an exokernel (which are essentially a subset of monolithic kernels).
And that's all well and good until you end up with contention between two different processes with direct access to the same device.
Qbyte
Member
Posts: 51
Joined: Tue Jan 02, 2018 12:53 am
Location: Australia

Re: To POSIX or not to POSIX

Post by Qbyte »

linguofreak wrote:That isolates drivers from the kernel, but de-isolates them from applications, and from each other. If the drivers (or other components that might be spun out into a server on a traditional microkernel) have no system-global state that applications can't be allowed to see, and if the devices they drive can be securely and efficiently driven by the kind of packet interface you describe without any device-specific code in the kernel, then that idea just might work, but I have doubts on both points, *especially* the first.

If drivers have system-global state that has to be isolated from applications, then they either need to be part of the kernel (or, at least, operating in a higher privilege ring that applications), part of a userspace process that other processes pass messages to, or the hardware needs to support some means of changing memory mapping/protection on control transfers.
Nothing under this scheme prevents drivers from being implemented as servers if that is desirable for a given device. The important point is that drivers shouldn't be forced to be servers because most don't need to be and there are pretty large penalties for it.
Your idea also doesn't do anything to prevent a buggy or malicious driver from corrupting application data.
User space drivers are known to be far less buggy than kernel space drivers because a hell of a lot less can go wrong and they are much easier to develop. In any case, the situation here is really no different to regular libraries which can potentially contain bugs, but that doesn't stop anyone from using them.
And that's all well and good until you end up with contention between two different processes with direct access to the same device.
That's not a problem. The OS still knows which processes have made a connection to what device and can decide whether or not to give access to a device that already has a connection. Many devices are also perfectly able to communicate with multiple processes concurrently, such as via process IDs in the packet header, etc.
linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: To POSIX or not to POSIX

Post by linguofreak »

Qbyte wrote: Nothing under this scheme prevents drivers from being implemented as servers if that is desirable for a given device. The important point is that drivers shouldn't be forced to be servers because most don't need to be and there are pretty large penalties for it.
As I've said a few posts back, while I used the word "server" in my initial description of the scheme, it's debatable if "server" is the right word for drivers under my scheme. The driver is isolated from data that the calling application does not choose to share with it, and its data is isolated from the calling application, as with a microkernel server on a traditional architecture, but unlike such a server, the driver is called with essentially a direct library call: there is no mode switch to kernel mode to send a message, no process switch and attendant TLB flush* to deliver the message, etc., which has to happen going both ways on a traditional architecture. These elements make up much of the penalty for a call to a server, and hopefully the penalties eliminated would be much larger than the new penalties imposed by the additional complexity of the CPU.

One way of saying what I'm trying to do is that I'm trying to allow a microkernel to be built without having to use the server model (with its attendant penalties) to implement drivers.

*There are CPUs where each TLB entry contains an Address Space ID, in which case the TLB flush is eliminated, but other penalties associated with the kernel call and message passing remain. My model allows a program to use multiple ASIDs at once, with each ASID having access to certain other ASIDs, and allows a direct call to the driver.
That's not a problem. The OS still knows which processes have made a connection to what device and can decide whether or not to give access to a device that already has a connection. Many devices are also perfectly able to communicate with multiple processes concurrently, such as via process IDs in the packet header, etc.
The problem is that the kernel may not be able to decide whether a device can accept a new connection without device-specific information. Now, depending on what the bus protocol looks like on the system in question, there may be enough structure that the device-specific part can be done by a single piece of code using a table to determine device characteristics, rather than needing to have different code for every device, but if that isn't the case then at least some part of the driver has to have access to a private data area where it can keep track of application requests and the state of the device, and that area has to be isolated from applications. On a traditional architecture, this means it has to be either in the kernel (on monolithic kernels), or has to be its own process (a server on a microkernel).
Qbyte
Member
Posts: 51
Joined: Tue Jan 02, 2018 12:53 am
Location: Australia

Re: To POSIX or not to POSIX

Post by Qbyte »

linguofreak wrote:As I've said a few posts back, while I used the word "server" in my initial description of the scheme, it's debatable if "server" is the right word for drivers under my scheme.
I would define a server most generally as called code that isn't part of the kernel or the application (libraries are part of the application because they are directly mapped into it). Since the code under your scheme has a different context to the caller (memory access and capabilities), it still qualifies as a server.
The driver is isolated from data that the calling application does not choose to share with it, and its data is isolated from the calling application, as with a microkernel server on a traditional architecture, but unlike such a server, the driver is called with essentially a direct library call, there is no mode switch to kernel mode to send a message, no process switch and attendant TLB flush* to deliver the message, etc, which has to happen going both ways on a traditional architecture. These elements make up much of the penalty for a call to a server, and hopefully the penalties eliminated would be much larger than the new penalties imposed by the additional complexity of the CPU.
This seems like it would introduce a lot of hardware complexity for no net gain. The exokernel approach of giving applications direct but secure access is just a better and simpler scheme imo.
The problem is that the kernel may not be able to decide whether a device can accept a new connection without device-specific information.
This also isn't a problem because the driver itself knows whether or not the device it is attempting to connect to is capable of multiple connections. The kernel simply keeps track of what devices already have a connection and the driver can query the kernel to see if it is in the clear to connect to the device.
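Something like this, roughly (query_device() and connect_device() are just invented names; checking and connecting would presumably need to be atomic in a real interface):

Code:

/* Hypothetical kernel calls: the kernel only does the bookkeeping. */
extern int query_device(int dev_id);    /* open connection count */
extern int connect_device(int dev_id);  /* register this process */

#define DEV_EXCLUSIVE 1  /* the driver knows its device is single-user */

int nic_open(int dev_id)
{
    /* The device-specific knowledge (can it take more than one
       connection?) lives here in the driver, not in the kernel. */
    if (DEV_EXCLUSIVE && query_device(dev_id) > 0)
        return -1;  /* device already claimed */
    return connect_device(dev_id);
}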
Now, depending on what the bus protocol looks like on the system in question, there may be enough structure that the device-specific part can be done by a single piece of code using a table to determine device characteristics, rather than needing to have different code for every device, but if that isn't the case then at least some part of the driver has to have access to a private data area where it can keep track of application requests and the state of the device, and that area has to be isolated from applications. On a traditional architecture, this means it has to be either in the kernel (on monolithic kernels), or has to be its own process (a server on a microkernel).
Drivers can also make use of shared memory to manage multiple contexts across different processes and avoid message passing to servers, but as you said that's usually unnecessary with well designed protocols.
linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: To POSIX or not to POSIX

Post by linguofreak »

Qbyte wrote:
The problem is that the kernel may not be able to decide whether a device can accept a new connection without device-specific information.
This also isn't a problem because the driver itself knows whether or not the device it is attempting to connect to is capable of multiple connections. The kernel simply keeps track of what devices already have a connection and the driver can query the kernel to see if it is in the clear to connect to the device.
But since the driver in your model is just a userspace library that calls send_packet() to send packets to a particular device, what's to prevent a malicious userspace application (that may or may not have the driver loaded) from waiting until a device that can only accept one connection has one, and then calling send_packet() on it in order to cause mayhem?

Or for that matter, how do you enforce filesystem permissions? Under your model, any process that reads or writes anything to disk has to have permission to send_packet() to the disk, in which case malicious code can bypass the filesystem driver (which userspace code calls without a context switch, and which in turn calls the disk driver in userspace without a context switch) and do raw disk I/O directly with send_packet(). Otherwise you have to put the filesystem and disk driver in kernel space like a monolithic kernel, or put them in their own processes like a microkernel, or have special hardware arrangements like my model, or move the filesystem entirely out of the CPU's purview and into the disk firmware, effectively turning the disk into NAS (except with communication directly over the system bus).
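Concretely, the worry is something along these lines (send_packet() as in your scheme; the disk command bytes are invented):

Code:

#include <stdint.h>
#include <stddef.h>

extern int send_packet(int dev_id, const void *pkt, size_t len);

/* Invented raw-disk command format. */
struct disk_cmd {
    uint8_t  opcode;  /* 0x02 = raw sector write */
    uint64_t lba;
    uint8_t  data[512];
};

/* Nothing stops a process that has send_packet() access to the disk
   from skipping the filesystem library entirely and scribbling on
   any sector, filesystem permissions notwithstanding. */
void scribble_on_sector(int disk_id, uint64_t lba)
{
    struct disk_cmd cmd = { .opcode = 0x02, .lba = lba };
    send_packet(disk_id, &cmd, sizeof cmd);
}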