Schol-R-LEA wrote:We've gone over this many, many times before, rdos. Segmentation is not, and has never been, a protection mechanism, any more than paging is. While the protection mechanisms work together with them, it is
not the protection mechanism, nor does it provide any more or any less protection than paging does.
rdos wrote:The same scenario in long mode can lead to corruption of physical memory, vital kernel data, application data in another process, and even PCI BAR data.
That's simply not true, or rather, the claim that segmentation would prevent it is incorrect. Supervisor-mode pages have exactly the same degree of protection as supervisor-mode segments - a wild userland pointer to a supervisor data page is still going to be blocked by the protection mechanisms, because the page is marked as supervisor access only. A wild pointer in the kernel? True, that can access any virtual address currently mapped for the process, but the majority of addresses
won't be mapped at all, meaning that a page fault will be caught by the memory manager, which presumably can determine that the page shouldn't be accessible and raise a protection fault. If it does hit an address that is live, then yes, a kernel bug can have the effect you describe - but the same is just as true with segmentation. A corrupted supervisor-mode pointer is a supervisor-mode pointer, period.
A corrupted supervisor far pointer is a corrupted supervisor far pointer, and, given the small size of the x86 descriptor tables, if you're making heavy use of segmentation, a large proportion of possible selector values are likely allocated (though if the corruption includes the low-order bits of the selector and affects those bits completely randomly, the RPL check will save you 75% of the time if the selector points to a ring-0 descriptor). This also applies to loading a segment before doing a bunch of near pointer work in that segment.
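To put a number on the RPL point, here is a rough Python sketch. It assumes the usual data-segment load rule (the load is allowed only if both CPL and RPL are numerically no greater than the descriptor's DPL), a ring-0 descriptor, code running at CPL 0, and uniformly random corruption of the two low-order selector bits. The function names are mine, purely for illustration.

```python
def data_segment_load_allowed(cpl: int, rpl: int, dpl: int) -> bool:
    """Privilege check for loading DS/ES/FS/GS: allowed only if max(CPL, RPL) <= DPL."""
    return max(cpl, rpl) <= dpl


def fault_fraction(cpl: int, dpl: int) -> float:
    """Fraction of the four possible RPL values that fail the check."""
    faults = sum(
        not data_segment_load_allowed(cpl, rpl, dpl) for rpl in range(4)
    )
    return faults / 4


# Ring-0 code loading a selector that points at a ring-0 descriptor:
# only RPL 0 passes, so three of the four RPL values fault.
print(fault_fraction(cpl=0, dpl=0))  # 0.75
```

If the descriptor is less privileged (DPL 3, say), the RPL bits stop helping at all, which is why the 75% only applies to ring-0 descriptors.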
But, assuming a separate bug did not cause the incorrect segment to be loaded, a corrupted supervisor near pointer can only affect the relevant segment. If it tries to access an address beyond the segment's limit, you'll get a fault, and the fraction of addresses *within that segment* that are valid is going to be less than or equal to the total fraction of the logical address space that is valid (equal to it only if you have a segment covering the whole logical address space). In fact, with far pointers, this will actually stack with the RPL check and whatever proportion of unallocated selectors you do have.
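The limit check on a near pointer can be sketched like this. It is a simplified model only: it assumes an expand-up data segment, a 32-bit offset, byte-granular limits, and uniformly random corruption of the offset. `translate_near` and `surviving_fraction` are hypothetical names, not anything from real hardware documentation.

```python
class SegmentLimitFault(Exception):
    """Stand-in for the fault raised when an offset exceeds the segment limit."""


def translate_near(base: int, limit: int, offset: int) -> int:
    """Translate a near pointer (offset) within an already-loaded segment."""
    if offset > limit:
        raise SegmentLimitFault(f"offset {offset:#x} exceeds limit {limit:#x}")
    return (base + offset) & 0xFFFFFFFF


def surviving_fraction(limit: int, offset_bits: int = 32) -> float:
    """Fraction of random offsets that pass the limit check at all."""
    return (limit + 1) / 2**offset_bits


# A 64 KiB segment: a valid offset translates, a wild one faults.
print(hex(translate_near(base=0x0010_0000, limit=0xFFFF, offset=0x10)))
try:
    translate_near(base=0x0010_0000, limit=0xFFFF, offset=0x0001_0000)
except SegmentLimitFault as fault:
    print("fault:", fault)
```

Under these assumptions a randomly corrupted 32-bit offset gets past the limit check of a 64 KiB segment only one time in 65536, and even then it can only land inside that one segment.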
There are definite protection benefits afforded by non-flat address spaces, but Intel segmentation is a clunky implementation of the non-flat address space concept:
1) The use of base-offset within a global paged address space impacts performance and means that the sum of all simultaneously loaded segments has to fit within the size of the paged address space. It would be better to have each "segment" be a full paged address space with no "global" paged address space (multiple CR3s, one per segment register, and a CR3 value rather than a base-offset as part of each segment descriptor).
2) The limited width of the segment registers, and the use of two bits in the selector for the RPL, makes the pool of segments that can be addressed at any one time far too limited. Wider segment registers would be better. With a 32-bit selector you could probably even keep the RPL if you wanted; with a 64-bit selector (possibly with a narrower width like 48 bits in the implementation, sign-extended for forward compatibility to a full 64-bit selector) you definitely could, though I'm not sure the function of the RPL couldn't be better implemented by other mechanisms.
3) Intel segmentation comes close to being a capability system, but isn't quite there. There are massive potential benefits for microkernels if you have a non-flat addressing scheme that does act as a capability system. This could be implemented by having a "System Descriptor Table" that has descriptors for every segment/address space in the system, containing the actual addressing information for that segment (base+offset if you're doing actual segmentation, or "CR3" if you're doing paged address spaces). You wouldn't be able to directly load an SDT selector into a segment register: every code segment, and every data segment used as a stack segment, would have a "Virtual Descriptor Table" that specifies what segments are loadable when that segment is loaded as CS/SS. The descriptors in the VDT wouldn't contain direct addressing information, but rather would contain a selector pointing into the SDT. (It's tempting to call these "Global" and "Local" descriptor tables, rather than "System" and "Virtual", but as Intel segmentation uses that terminology for a different arrangement, that would only invite confusion.)
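The SDT/VDT idea in 3) might look roughly like the following toy model. Everything in it is hypothetical: the class and function names, the use of a plain integer as a stand-in for the addressing information, and the simplification that only the segment currently loaded as CS (not SS) supplies the VDT.

```python
from dataclasses import dataclass, field
from typing import List


class CapabilityFault(Exception):
    """Raised when a VDT selector names nothing loadable."""


@dataclass
class SdtEntry:
    name: str
    root: int  # stand-in for the addressing info (a "CR3", or base and size)
    # SDT selectors that may be loaded while this segment is the current CS.
    vdt: List[int] = field(default_factory=list)


def load_segment(sdt: List[SdtEntry], current_cs: int, vdt_selector: int) -> SdtEntry:
    """Resolve a VDT selector through the VDT of the segment loaded as CS.

    Code never names an SDT entry directly; it can only reach what its own
    VDT points at, which is what makes the scheme capability-like.
    """
    vdt = sdt[current_cs].vdt
    if not 0 <= vdt_selector < len(vdt):
        raise CapabilityFault(f"VDT selector {vdt_selector} not present")
    return sdt[vdt[vdt_selector]]


sdt = [
    SdtEntry("kernel data", root=0x1000),
    SdtEntry("disk driver code", root=0x2000, vdt=[2]),  # may load only entry 2
    SdtEntry("disk driver data", root=0x3000),
    SdtEntry("net driver data", root=0x4000),
]

print(load_segment(sdt, current_cs=1, vdt_selector=0).name)  # disk driver data
```

In this model the disk driver has no selector value that reaches the kernel's or the network driver's data, however its pointers get corrupted; `load_segment(sdt, 1, 1)` simply faults.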
1) and 2) especially, and to some degree 3), are due to back-compatibility with the 8086 and 286, but there's a scheme, I think, that could work towards alleviating these issues while maintaining back-compatibility (probably not a big issue these days, but this could have been helpful for Intel in developing the 386, or for AMD when developing the x86-64):
If you go with 32- or 64-bit segment selectors, your segment tables are going to need a similar sort of multi-level scheme to what's used in page tables. So you split your VDT selectors into a lower and an upper part. The lower part is 16-bit, and indexes into the lowest level of the VDT. The upper part indexes into the remaining levels. The lowest level of the VDT indexed by the upper part (the second lowest level overall) has a two-bit "legacy type" field in its table entries. This can have values of "none", "protected", "real, megabyte aligned" or "real with offset". If the type is "none", then the table entry points to the lowest level of the VDT, indexed by the low 16 bits of the selector, and the entries in that table are VDT entries each pointing to an SDT selector. If the type is anything *other* than "none", the second-level VDT entry is a selector into a "legacy environment descriptor table" (and there is no lowest level of the VDT). If the type is "protected", then the LEDT entry contains an SDT selector (pointing to a paged address space), and a pointer to a legacy GDT, whose descriptors use the address space designated by the SDT selector as their logical address space. The low 16 bits of the VDT selector, instead of indexing into the lowest level of the VDT, index into the designated GDT. If the type is "real, megabyte aligned", the LEDT entry contains an SDT selector and an offset (at megabyte granularity) into the designated address space. If the type is "real with offset", then the LEDT entry contains an SDT selector and an offset (at 16-byte granularity) into the designated address space.
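As a sketch of how the selector split and the "legacy type" dispatch could fit together, assuming a 32-bit selector (so a single 16-bit upper level), made-up numeric encodings for the four types, and a dictionary standing in for the second-level table:

```python
from enum import Enum


class LegacyType(Enum):
    # The numeric values are an assumption; only the four names come from the scheme.
    NONE = 0
    PROTECTED = 1
    REAL_MB_ALIGNED = 2
    REAL_WITH_OFFSET = 3


def split_selector(selector: int):
    """Split a 32-bit VDT selector into its (upper, lower) 16-bit parts."""
    return (selector >> 16) & 0xFFFF, selector & 0xFFFF


def resolve(selector: int, second_level: dict):
    """Say what the low 16 bits of the selector end up being used for.

    second_level maps the upper part to (LegacyType, target), where target is
    a lowest-level VDT table for NONE and an LEDT selector otherwise.
    """
    upper, lower = split_selector(selector)
    legacy_type, target = second_level[upper]
    if legacy_type is LegacyType.NONE:
        return ("vdt_leaf_index", target, lower)
    if legacy_type is LegacyType.PROTECTED:
        return ("legacy_gdt_selector", target, lower)
    return ("real_mode_segment", target, lower)


table = {
    0x0001: (LegacyType.NONE, 7),       # ordinary entry: leaf VDT table 7
    0x0002: (LegacyType.PROTECTED, 0),  # legacy environment: LEDT entry 0
}
print(resolve(0x0001_0010, table))
print(resolve(0x0002_0008, table))
```

The second call shows the point of the design: the low half, 0x0008, is just an ordinary legacy protected-mode selector, interpreted against whatever GDT the LEDT entry designates.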
In both "real" legacy modes, the low 16 bits of the VDT selector, rather than indexing into the lowest level of the VDT or into a legacy GDT, are simply added to (or in the case of "megabyte aligned", concatenated with) the offset in the LEDT entry. This allows for a single address space to host multiple real-mode environments at different offsets. Megabyte-aligned mode has the advantage of requiring one less addition, but can't handle any real-mode code that depends on the HMA existing; offset mode requires an extra addition but allows for an HMA.
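The difference between the two "real" types comes down to how the segment base is formed. A minimal sketch, assuming the LEDT offset is stored as a megabyte index in one case and as a count of 16-byte paragraphs in the other (the function and parameter names are mine):

```python
def base_megabyte_aligned(megabyte_index: int, segment: int) -> int:
    """Concatenate: the 20-bit value (segment << 4) fits below the megabyte bits."""
    return (megabyte_index << 20) | ((segment & 0xFFFF) << 4)


def base_with_offset(paragraph_offset: int, segment: int) -> int:
    """Add: both values are in 16-byte units, so sum them and then scale."""
    return (paragraph_offset + (segment & 0xFFFF)) << 4


# Real-mode segment 0xB800 in an environment placed at the 3 MiB mark...
print(hex(base_megabyte_aligned(3, 0xB800)))     # 0x3b8000
# ...and in an environment placed 0x100 bytes past the 3 MiB mark.
print(hex(base_with_offset(0x30010, 0xB800)))    # 0x3b8100
```

In the concatenated form, segment 0xFFFF gives a base 16 bytes below the next megabyte boundary, so offsets of 0x10 and up would spill into the neighbouring megabyte; that is the HMA case the megabyte-aligned type can't accommodate.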
If CS contains a selector whose upper part has a legacy type of "none", then the standard segment register manipulation instructions manipulate the whole segment register (though you might have prefixed instructions that manipulate either part individually). If CS contains a selector whose upper part has a legacy type of anything other than "none", then the standard segment register manipulation instructions only deal with the lower 16 bits (though you might have prefixed instructions that manipulate the whole register or the upper part). So to run a program that uses legacy segmentation, you load all the segment registers with selectors whose upper parts have legacy types other than "none" and all point to the same LEDT (probably selectors whose upper parts are, in fact, identical), and then far jump into a code segment with an upper part that uses the same LEDT. Here you don't have a specific "real" or "compatibility" mode, you just have special segment types.
It is, at least, an interesting road-not-taken.
As for drivers, well, either they are running in supervisor mode - whether intrinsic to the kernel as with a monolithic kernel, or loaded as modules, as with most hybrid models - or they are in a separate process, as with a microkernel system. For microkernels, the drivers would be covered by the protection mechanisms the same as the user processes are (even if one were to use the intermediate ring 1 or ring 2 levels). For supervisor-mode drivers - whether loadable or not - then it becomes a matter of trust, again regardless of whether segmentation is used or not.
With a capability-structured segmentation system, this wouldn't necessarily be the case.
The only way what you are describing could work is if the driver segments are run in supervisor mode, but mapped separately from the kernel to their own code, stack, and data segments. As far as I am aware, this isn't possible - supervisor-mode memory will all have the same memory mapping, meaning that the kernel would have the same segmentation as the drivers. I can't see any way you can have separate segments within the supervisor memory space for the drivers distinct from the kernel itself - nor can I see how this differs from doing the same with paging, if so. As iansjack said, you can just as easily use separate page tables as you can separate segments.
For well-intentioned (but possibly buggy) drivers using near pointers for their own data, segmentation does provide a fair bit of added protection against wild pointers. For poorly written drivers that use far pointers everywhere, it will somewhat improve the probability of a wild pointer causing a fault (instead of further memory damage), but not eliminate the danger entirely, and against malicious drivers it does nothing.
I will again ask you a question you dodged previously: aside from x86, what other modern ISAs which support virtual memory (i.e., not a microcontroller) have you worked with? It is no coincidence that none of them use segmentation, because more or less all of them have had 32-bit or 64-bit memory addressing from the outset, and didn't need a hack to make a larger address space out of overlapping 16-bit memory addresses.
ESA/390 and z/Architecture have fully-paged, non-flat addressing. The 360/370/390/z line was never as cramped as a 16-bit address space, and its 24-bit addressing days were behind it when the non-flat features were introduced, so I believe they were introduced for the benefits described above rather than to deal with a cramped address space. Of course, physical hardware isn't available for hobbyists (but Hercules exists), and z/Linux doesn't use the non-flat features (and the OSes that do are heavily proprietary and you can't get a license to run them on Hercules).
The implementation 390 and z/Arch use is largely what I have described above (minus the bits where I talked about how to accomplish back-compatibility with legacy Intel segmentation in such an implementation), though with some really opaque documentation. IBM manuals use a *ton* of non-standard terminology for common concepts, because they were there before everyone else, so they just kept using the terminology they had while the rest of the industry standardized on different terms.
What are you going to do if - or rather, when - Intel drops 32-bit protected mode, the same way they apparently plan to drop real mode? True, it probably won't be any time soon, but it is almost certainly coming - assuming that x86 remains the dominant desktop platform in the first place, which is increasingly unlikely with the growth of ARM platforms with comparable performance to the best x86-64 CPUs. What will you do if there are no more segmented platforms in common use?
Well, he always has the option of porting OpenWatcom to target z/Architecture, porting his OS to run on z/Architecture, and running his code under Hercules on the physical platform of his choice.