Brendan wrote:
rdos wrote: 3. Separation with segmentation in kernel is an adequate method of providing protection between modules in a kernel.
4. Separation with paging only in kernel is typically not an adequate method, instead many designs use a microkernel and let driver modules run in isolated user-address spaces. At least similar protection as with segmentation cannot be achieved without a microkernel design.
For monolithic kernels the basic idea is that kernel modules can be trusted and therefore "adequate" means no protection between modules is needed at all (and no overhead). For micro-kernels the basic idea is that these modules can't be trusted and therefore it's necessary to, for example, prevent one module from accessing another module's data and also prevent a module from accessing a normal process' data.
Separating modules with segmentation can work, but if you trust the module then the overhead isn't necessary and if you don't trust the module then you have to make sure one module can't load another module's segments or load a normal process' segments (which means having an LDT for each module and each process, and doing "LLDT" every time control is passed).
Yeah, this is one of my big frustrations with 386 segmentation. If I were designing a segmented paged architecture, I'd have things structured something like this:
Instead of dividing things up into a local and a global descriptor table, I'd put all segment descriptors into the GDT (though I might break it up into several tables so that a contiguous block of memory didn't have to be reserved for it). Then, instead of a local *descriptor* table, I'd have each code segment (and possibly each stack segment) have a pointer to a local *selector* table (LST). There would be two kinds of selectors: global selectors, which would be valid for direct use as indexes into the GDT (though loading a segment register this way would only be permitted in kernel mode), and local selectors, which would be per-process (or rather per-code/stack segment) and would index into the LST. Each entry in the LST would then contain a global selector value and various permission bits (so that a globally read/write segment could be restricted to read-only access for a particular program, for instance).
When a local selector was used in a segment register load (this would be required in user-mode), the hardware would do the following:
1) Use the local selector as an index into the LST to select an LST entry.
2) Use the global selector found in the selected LST entry as an index into the GDT.
3) Load the segment register in question from the selected GDT entry. If any permissions in the LST entry are more restrictive than those in the GDT entry, load those fields of the segment register with the corresponding fields of the LST entry instead.
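To make the three steps concrete, here is a minimal software model of the lookup in C. It is only a sketch of the hypothetical hardware described above: the struct layouts, the `PERM_*` bits, and the name `load_local_selector` are all invented for illustration, and "more restrictive" is modelled as a bitwise AND of two permission masks, which is an assumption rather than part of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits; not taken from any real architecture. */
enum { PERM_READ = 1u << 0, PERM_WRITE = 1u << 1, PERM_EXEC = 1u << 2 };

/* Global descriptor: where the segment lives and the most it ever allows. */
typedef struct {
    uint32_t base;
    uint32_t limit;
    uint8_t  perms;
    bool     present;
} gdt_entry;

/* LST entry: a global selector plus a per-program permission mask. */
typedef struct {
    uint16_t global_sel;
    uint8_t  perms;
    bool     valid;
} lst_entry;

/* What ends up cached in the segment register after a successful load. */
typedef struct {
    uint32_t base;
    uint32_t limit;
    uint8_t  perms;
} seg_reg;

/* Returns false wherever real hardware would raise a fault. */
bool load_local_selector(const gdt_entry *gdt, size_t gdt_len,
                         const lst_entry *lst, size_t lst_len,
                         uint16_t local_sel, seg_reg *out)
{
    assert(gdt != NULL && lst != NULL && out != NULL);

    /* Step 1: the local selector indexes the LST. */
    if (local_sel >= lst_len || !lst[local_sel].valid)
        return false;
    const lst_entry *le = &lst[local_sel];

    /* Step 2: the global selector stored in the LST entry indexes the GDT. */
    if (le->global_sel >= gdt_len || !gdt[le->global_sel].present)
        return false;
    const gdt_entry *ge = &gdt[le->global_sel];

    /* Step 3: load from the GDT entry, keeping only permissions that
     * both entries grant, so the LST can restrict but never widen access. */
    out->base  = ge->base;
    out->limit = ge->limit;
    out->perms = (uint8_t)(ge->perms & le->perms);
    return true;
}
```

With a GDT entry marked read/write and an LST entry that grants only `PERM_READ`, the loaded register ends up read-only, which is the "globally read/write, locally read-only" case mentioned earlier.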
This would allow a user-mode program to call a user-mode driver directly without message-passing through the kernel or a full context switch, while still preventing the program in question from calling just any code on the system it wanted to, and while preventing the driver from accessing private program data.
I can still think of a few bugs that would have to be worked out (largely relating to far returns), but something like this would be necessary for segmentation to be useful on a paged architecture.
I actually used this for one of my earliest protected mode kernels and abandoned the idea because half the kernel's code ended up duplicated (different virtual memory management for normal processes and modules, different IPC for normal processes and modules, different executable file formats for normal processes and modules, different kernel APIs for normal processes and modules, etc). The other problem was space - the 4 GiB virtual address spaces were split into 2 GiB for the process, 1 GiB for the kernel, and 1 GiB shared by all of the "system modules"; which meant that you run out of space for modules very quickly (e.g. two video card drivers with 512 MiB of memory mapped IO each and 2 GiB of "VFS disk caches" and you're screwed). Of course I shifted to a micro-kernel, so a lot of code duplication disappeared and now each "system module" gets 3 GiB of space to use.
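The space problem is easy to check with a few lines of arithmetic. The sketch below uses the figures from the example above (two drivers with 512 MiB of memory-mapped IO each, plus 2 GiB of VFS cache); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIB (UINT64_C(1) << 20)
#define GIB (UINT64_C(1) << 30)

/* Sum of the address space all modules want. */
uint64_t total_demand(const uint64_t *sizes, size_t n)
{
    assert(sizes != NULL || n == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += sizes[i];
    return sum;
}

/* Old design: every module shares one region, so the sum must fit. */
bool fits_shared(const uint64_t *sizes, size_t n, uint64_t region)
{
    return total_demand(sizes, n) <= region;
}

/* Micro-kernel design: each module has its own address space,
 * so each one only has to fit individually. */
bool fits_separately(const uint64_t *sizes, size_t n, uint64_t region)
{
    assert(sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (sizes[i] > region)
            return false;
    return true;
}
```

For `{512 * MIB, 512 * MIB, 2 * GIB}` the total demand is 3 GiB, so `fits_shared(..., 1 * GIB)` is false while `fits_separately(..., 3 * GIB)` is true.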
This is another reason I'd like segment descriptors to reference page directories rather than offsets and limits into a paged address space: it would allow each segment to be up to 4 GiB (on a 32-bit system) without having to overlap with any other segment (assuming the physical address space is more than 32 bits wide, or that swap space is available).
rdos wrote: 5. Long mode addresses can be used in a pseudo-segment related manner by treating the upper 32 bits as a segment, and keeping various pieces at "random" locations. However, typical uses of 64-bit mode use the 2G lower addresses or 2G higher addresses, which provide no protection at all in addition to a 32-bit flat memory model.
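As a rough illustration of the pseudo-segment idea, the C helpers below split a 64-bit address into an upper "segment" half and a lower "offset" half. The helper names are invented for this sketch. `is_canonical48` assumes 48-bit virtual addresses (4-level paging), where bits 63..47 must all be equal, so not every 32-bit "segment" value is usable. `in_low_or_high_2g` tests for the lowest or highest 2 GiB of the address space, the two ranges the quoted point refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit address, treated as a pseudo-segment number. */
static inline uint32_t pseudo_segment(uint64_t addr)
{
    return (uint32_t)(addr >> 32);
}

/* Lower 32 bits, treated as the offset inside that pseudo-segment. */
static inline uint32_t pseudo_offset(uint64_t addr)
{
    return (uint32_t)(addr & 0xFFFFFFFFu);
}

/* Recombine a pseudo-segment and offset into a flat 64-bit address. */
static inline uint64_t make_address(uint32_t seg, uint32_t off)
{
    return ((uint64_t)seg << 32) | off;
}

/* Canonical for 48-bit virtual addresses: bits 63..47 are all 0 or all 1. */
static inline bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;
}

/* True if the address lies in the lowest or the highest 2 GiB. */
static inline bool in_low_or_high_2g(uint64_t addr)
{
    return addr < 0x80000000ULL || addr >= 0xFFFFFFFF80000000ULL;
}

/* Small self-check showing how the helpers fit together. */
void pseudo_segment_selftest(void)
{
    uint64_t a = make_address(0x1234, 0x5678);
    assert(pseudo_segment(a) == 0x1234);
    assert(pseudo_offset(a) == 0x5678);
    assert(is_canonical48(a));
    assert(!in_low_or_high_2g(a));
}
```

Note that nothing here enforces anything: the "segment" is just a naming convention over a flat address, and a stray pointer can still reach any mapped page.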
Typical use of paging in long mode is "several TiB" for each 64-bit process, a full 4 GiB for each 32-bit process and about 512 GiB for kernel.
Note: I don't know why people seem to limit the kernel to 512 GiB (especially for monolithic kernels), but I assume it's so that the kernel is contained in one page directory pointer table.
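The 512 GiB figure does fall out of the x86-64 4-level paging layout: pages are 4 KiB and each table holds 512 entries, so one page directory pointer table maps 2^39 bytes. A small C sketch of that arithmetic follows; the function names and the level numbering are my own, and `0xFFFFFF8000000000` in the usage note is just the base of the topmost 512 GiB, not a claim about any particular kernel.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12u  /* 4 KiB pages */
#define BITS_PER_LEVEL 9u   /* 512 entries per table */

/* Bytes mapped by one complete table at the given level:
 * 1 = page table, 2 = page directory,
 * 3 = page directory pointer table, 4 = PML4. */
static inline uint64_t bytes_per_table(unsigned level)
{
    assert(level >= 1 && level <= 4);
    return UINT64_C(1) << (PAGE_SHIFT + BITS_PER_LEVEL * level);
}

/* Index of the PML4 entry (0..511) that covers a virtual address. */
static inline unsigned pml4_index(uint64_t vaddr)
{
    return (unsigned)((vaddr >> 39) & 0x1FF);
}
```

`bytes_per_table(3)` is 2^39 bytes, i.e. 512 GiB, and every address from `0xFFFFFF8000000000` upward has `pml4_index` 511, so a kernel placed there sits behind a single PML4 entry.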
It probably also has to do with the fact that few systems yet have that much physical memory, let alone need it for kernel data.
In practice I don't think there are many ways to use segmentation where the advantages outweigh the disadvantages.
Not on any existing hardware, as far as I can see. I love the idea of segmentation and microkernels (which, I think, need a really powerful segmentation mechanism to work), but there's no hardware on which anything but a flat address space and monolithic kernel is feasible.