Can overlap segments achieve protection?

Posted: Fri Feb 06, 2015 3:28 am
by angwer
Hi,

From everything I have read, segments can be used for protection. Code in segments with different privilege levels usually cannot access each other without going through gates. But in an OS I've studied (not a famous one; it comes from a book written in Chinese and mimics Minix), I can directly call kernel subroutines running at level 0 from processes running at level 1.

In this OS, the kernel runs at level 0 and some system processes run at level 1. The processes are compiled together with the kernel into a single image. There is no paging, only segmentation. The kernel uses segments defined in the GDT, and every process has its own LDT. The kernel uses two segments, one for code and one for data; the code segment is non-conforming, and both range from 0 to 4GB. The processes use the same segments, except that theirs are defined in the LDT and have privilege level 1. So, as I understand it, the kernel and all processes can access the whole of physical memory, and the processes can reach kernel code and data without system calls and without causing a GPF.

Is my understanding right?

And the Intel manual says:
More complexity can be added to this protected flat model to provide more protection. For example, for the paging mechanism to provide isolation between user and supervisor code and data, four segments need to be defined: code and data segments at privilege level 3 for the user, and code and data segments at privilege level 0 for the supervisor. Usually these segments all overlay each other and start at address 0 in the linear address space. This flat segmentation model along with a simple paging structure can protect the operating system from applications, and by adding a separate paging structure for each task or process, it can also protect applications from each other. Similar designs are used by several popular multitasking operating systems.
So it seems overlapping segments can be used to protect one level from another. Or is it the paging that does the work? If it is the paging, what is the reason to use overlapping segments at all? Is it to provide independent virtual memory?

Re: Can overlap segments achieve protection?

Posted: Fri Feb 06, 2015 6:07 am
by alexfru
If page translation is disabled and segments of different privilege levels overlap, there's no protection in the overlapping regions. So, your understanding of this configuration in the OS is correct.

If page translation is enabled and kernel and user pages have different privilege levels, overlapping segments can still be protected, but now at the page translation level.
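To make the page-level check concrete, here is a rough C model of it. It is only a sketch under assumptions: the function name page_access_ok and the PG_* macros are made up for illustration, it covers classic 32-bit two-level paging only, and it ignores later additions such as NX, SMEP and SMAP. Note that the paging unit only distinguishes user (CPL 3) from supervisor (CPL 0 to 2), so level-1 code like the processes in the original question would still count as supervisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low flag bits shared by page-directory and page-table entries. */
#define PG_PRESENT 0x1u   /* bit 0: entry maps something          */
#define PG_WRITE   0x2u   /* bit 1: R/W, writes are allowed       */
#define PG_USER    0x4u   /* bit 2: U/S, user-mode access allowed */

/* Returns true if the access would pass the paging checks,
 * false where the CPU would raise a page fault instead. */
static bool page_access_ok(unsigned cpl, bool is_write, bool cr0_wp,
                           uint32_t pde, uint32_t pte)
{
    assert(cpl <= 3);
    if (!(pde & PG_PRESENT) || !(pte & PG_PRESENT))
        return false;                   /* not mapped at all */

    uint32_t both = pde & pte;          /* a right must be granted at both levels */
    bool user_mode = (cpl == 3);        /* rings 0..2 all count as supervisor */

    if (user_mode && !(both & PG_USER))
        return false;                   /* supervisor-only page */
    if (is_write && !(both & PG_WRITE))
        return !user_mode && !cr0_wp;   /* supervisor may still write when CR0.WP = 0 */
    return true;
}
```

With a kernel page marked PG_PRESENT | PG_WRITE (no PG_USER), ring 3 is refused while rings 0, 1 and 2 all get through, even though every segment involved is flat.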

Another option is to make some of kernel code accessible within a conforming code segment. This code can be called into (using a far call, AFAIR) from both kernel and user code, but it will run with the privilege of the caller.
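The privilege rules behind this can be written down as a small C model. This is a sketch, not an emulator: far_transfer_ok and data_segment_load_ok are hypothetical names, the first covers only direct far CALL/JMP to a code segment (no call gates, no task switches), and the second covers loads of DS/ES/FS/GS but not SS.

```c
#include <assert.h>
#include <stdbool.h>

/* Direct far CALL/JMP to a code segment, without a gate.
 * Privilege levels are numeric: 0 is most privileged, 3 least. */
static bool far_transfer_ok(unsigned cpl, unsigned rpl,
                            unsigned target_dpl, bool conforming)
{
    assert(cpl <= 3 && rpl <= 3 && target_dpl <= 3);
    if (conforming)
        return target_dpl <= cpl;   /* target must be equally or more privileged;
                                       RPL is ignored and CPL stays unchanged */
    return target_dpl == cpl && rpl <= cpl;   /* non-conforming: same level only */
}

/* Loading a data segment selector into DS, ES, FS or GS. */
static bool data_segment_load_ok(unsigned cpl, unsigned rpl, unsigned target_dpl)
{
    assert(cpl <= 3 && rpl <= 3 && target_dpl <= 3);
    unsigned effective = cpl > rpl ? cpl : rpl;   /* the weaker of CPL and RPL */
    return effective <= target_dpl;
}
```

In the OS from the first post none of these checks ever fire: the level-1 process never loads a kernel selector, it makes a near call inside its own flat DPL-1 code segment, which happens to cover the same linear addresses as the kernel's.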

Protecting processes from each other can be done by using either non-overlapping segments or individual address spaces (each would have its own page table (sub)hierarchy, non-overlapping in physical memory).
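For the non-overlapping segment option, the protection comes from the limit check done on every access. A simplified C sketch, assuming expand-up segments with a byte-granular limit (the segment_t type and both function names are invented for this example):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified segment: base plus the offset of the last valid byte. */
typedef struct {
    uint32_t base;
    uint32_t limit;
} segment_t;

/* Translate offset -> linear address for an access of `size` bytes.
 * Returns -1 where the hardware would raise #GP (or #SS for the stack). */
static int64_t seg_to_linear(segment_t seg, uint32_t offset, uint32_t size)
{
    assert(size > 0);
    uint64_t last = (uint64_t)offset + size - 1;    /* last byte touched */
    if (last > seg.limit)
        return -1;
    return (int64_t)(uint32_t)(seg.base + offset);  /* linear addresses wrap at 4 GiB */
}

/* Two segments can interfere only if their linear ranges intersect. */
static bool segments_overlap(segment_t a, segment_t b)
{
    uint64_t a_end = (uint64_t)a.base + a.limit;
    uint64_t b_end = (uint64_t)b.base + b.limit;
    return a.base <= b_end && b.base <= a_end;
}
```

Two 64 KiB process segments at 0x00400000 and 0x00410000 do not overlap, so neither can reach the other; a flat 0 to 4 GiB segment overlaps everything, which is exactly the situation in the first post.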

Re: Can overlap segments achieve protection?

Posted: Fri Feb 06, 2015 7:44 am
by angwer
Thanks alexfru, it's nice to hear a definite answer. What about the second question? Do you have any idea what the Intel manual means by its use of overlapping segments?

Re: Can overlap segments achieve protection?

Posted: Fri Feb 06, 2015 10:14 am
by JAAman
angwer wrote:Thanks alexfru, it's nice to hear a definite answer. What about the second question? Do you have any idea what the Intel manual means by its use of overlapping segments?
what they are referring to here (in the quote from your OP) is the standard way used by all significant OSes:

all segments are exactly the same, all segments base == 0, all segments limit is 4GB (covering the entire virtual address space)

this effectively disables segmentation, since all segments are exactly the same, segmentation does nothing at all, and all isolation/protection is provided only by paging
for this scheme you still need separate ring0 and ring3 code segments, and ring0 and ring3 data segments, so you need at least 4 segments to be defined (but you should set them all to be exactly equal in what they cover), and you will also need GDT entries for a few other things, but those aren't technically segments
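a minimal sketch of those four flat descriptors in C, assuming the usual 8-byte x86 descriptor layout. the names here (gdt_entry, flat_segment, make_selector and the SEG_* macros) are made up for illustration and don't come from any particular OS:

```c
#include <assert.h>
#include <stdint.h>

/* Access-byte bits of a code/data segment descriptor. */
#define SEG_PRESENT   0x80u                 /* P: segment is present        */
#define SEG_DPL(n)    ((unsigned)(n) << 5)  /* DPL: ring 0..3               */
#define SEG_CODEDATA  0x10u                 /* S: code/data, not system     */
#define SEG_CODE_XR   0x0Au                 /* execute/read, non-conforming */
#define SEG_DATA_RW   0x02u                 /* read/write, expand-up        */

/* Flags nibble: G=1 (limit counted in 4 KiB units), D/B=1 (32-bit). */
#define SEG_FLAGS_32  0xCu

/* Pack base, 20-bit limit, access byte and flags into the 8-byte layout. */
static uint64_t gdt_entry(uint32_t base, uint32_t limit20,
                          uint8_t access, uint8_t flags)
{
    assert(limit20 <= 0xFFFFFu && flags <= 0xFu);
    uint64_t d = 0;
    d |= (uint64_t)(limit20 & 0xFFFFu);              /* limit 15:0     */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;         /* base 23:0      */
    d |= (uint64_t)access << 40;                     /* access byte    */
    d |= (uint64_t)((limit20 >> 16) & 0xFu) << 48;   /* limit 19:16    */
    d |= (uint64_t)flags << 52;                      /* G, D/B, L, AVL */
    d |= (uint64_t)(base >> 24) << 56;               /* base 31:24     */
    return d;
}

/* A flat segment: base 0, limit 0xFFFFF pages, i.e. the whole 4 GiB. */
static uint64_t flat_segment(unsigned dpl, unsigned type)
{
    assert(dpl <= 3);
    return gdt_entry(0, 0xFFFFFu,
                     (uint8_t)(SEG_PRESENT | SEG_DPL(dpl) | SEG_CODEDATA | type),
                     SEG_FLAGS_32);
}

/* Selector = index * 8, plus table indicator (0 = GDT, 1 = LDT) and RPL. */
static uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

flat_segment(0, SEG_CODE_XR), flat_segment(0, SEG_DATA_RW), flat_segment(3, SEG_CODE_XR) and flat_segment(3, SEG_DATA_RW) then give the four entries; they differ only in the type and DPL bits of the access byte, not in what memory they cover.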

this is the standard way almost all OSes use, windows has primarily used this method since win3 (for 32-bit applications/drivers, although win3 uses segmentation for 16-bit applications/drivers) or win95(which, iirc, uses paging and not segmentation, even with 16-bit processes, as much as it can while still maintaining compatibility)

Re: Can overlap segments achieve protection?

Posted: Sat Feb 07, 2015 2:45 am
by angwer
Thanks for your reply JAAman. Actually, when I first read your answer I still didn't quite understand why there are four segments rather than two with the same privilege. I thought maybe paging was the reason. After searching the manual I found I was right. Paging does not assign a privilege level, it only checks the current one. So we still need segments with two different privilege levels: one below 3, and one at 3.

Thanks. It seems all questions about os dev can be answered here. :D

Re: Can overlap segments achieve protection?

Posted: Sat Feb 07, 2015 4:57 am
by alexfru
angwer wrote:Actually, when I first read your answer I still didn't quite understand why there are four segments rather than two with the same privilege.
1. kernel code segment
2. kernel data/stack segment
3. user code segment
4. user data/stack segment

That's the minimum for anything useful.

Re: Can overlap segments achieve protection?

Posted: Tue Feb 17, 2015 6:24 am
by rdos
angwer wrote:Thanks alexfru, it's nice to hear a definite answer. What about the second question? Do you have any idea what the Intel manual means by its use of overlapping segments?
It was the initial intention of Intel that developers and OSes should use segmentation for protection, but unfortunately C-compilers are lousy at producing efficient segmented code, and in addition to that Intel made a big mistake when they kept the segment registers 16 bit in their 32-bit environment.

In my OS, I use a simplified version of segment protection in the kernel. Each driver gets its own code and data segment, and some drivers also allocate additional segments at runtime. However, due to Intel's mistake with the width of segment registers, things that need to be allocated in larger amounts use flat addresses, leaving those parts of the kernel vulnerable to the usual problems of accessing things outside of limits.

Re: Can overlap segments achieve protection?

Posted: Tue Feb 17, 2015 7:43 am
by Brendan
Hi,
rdos wrote:
angwer wrote:Thanks alexfru, it's nice to hear a definite answer. What about the second question? Do you have any idea what the Intel manual means by its use of overlapping segments?
It was the initial intention of Intel that developers and OSes should use segmentation for protection, but unfortunately C-compilers are lousy at producing efficient segmented code, and in addition to that Intel made a big mistake when they kept the segment registers 16 bit in their 32-bit environment.
Erm...

8086 was a 16-bit CPU that was "sort of source compatible" with older 8-bit Intel CPUs. The segment registers were a silly hack so that the CPU could use more than 64 KiB (without needing larger registers). It was ugly, but Intel didn't care much - at the time 8086 was just a temporary thing to keep people happy until Intel's iAPX 432 was ready, and then (they hoped) it'd die.

Intel's iAPX 432 was a hideous thing - designed for high level languages, with built in support for things like object oriented programming, garbage collection, etc (it's not like you need a managed environment for these things ;) ). The "temporary" 8086 got popular fast, and the iAPX 432 failed.

Eventually 8086's 1 MiB limit got too, um, limiting. Intel wanted to increase that, but they also hadn't quite given up on some of the failed ideas from the failed iAPX 432 chip either. By this time Intel had also learnt the importance of backward compatibility - they needed to make it work so that old software designed for the 8086's segmentation could at least run on a new protected mode OS. They combined the silly hack (from 8086) with failed ideas (from iAPX 432) and "80286 protected mode" was born.

Next comes 80386. How to make it 32-bit while providing the crucial backward compatibility? They extended the "silly hack combined with failed ideas" so that old software designed for older 80x86 CPUs would still be able to run on a new 32-bit OS. Of course Intel was getting smarter - they also added paging. Most OSs abandoned segmentation (except for where it's necessary to execute old software designed for older CPUs - e.g. DOS and 16-bit windows programs). OS/2 was the only OS that bothered with segmentation for new 32-bit executables, and it paid the price (in terms of additional complexity for a feature nobody bothered to use). It turned out that given the choice between "safety" (from segmentation) and performance (from not having to do the protection checks that segmentation required), every sane programmer chose performance.

Things ticked along nicely for a while, with a few extensions to paging to support systems with more than 4 GiB of RAM. Eventually both Intel and AMD decided it was time for 64-bit. Intel wanted people to shift to a "not so backward compatible" Itanium (where they could lock out competitors). AMD had other plans.

AMD continued the old tradition - they provided enough backward compatibility so that old software designed for older (32-bit) CPUs would still be able to run under a 64-bit OS; which meant keeping segmentation for long mode. For 64-bit code (where backward compatibility wasn't important) they did what everyone had been hoping for - they took the old/deprecated "silly hack combined with failed ideas, now with 32-bit extensions" out behind the back shed and ended its suffering.

"Here lies segmentation, born 1978, died 2003, loved by nobody."


Cheers,

Brendan

Re: Can overlap segments achieve protection?

Posted: Tue Feb 17, 2015 9:25 am
by rdos
Brendan wrote: 8086 was a 16-bit CPU that was "sort of source compatible" with older 8-bit Intel CPUs. The segment registers were a silly hack so that the CPU could use more than 64 KiB (without needing larger registers). It was ugly, but Intel didn't care much - at the time 8086 was just a temporary thing to keep people happy until Intel's iAPX 432 was ready, and then (they hoped) it'd die.
There are still 8086 compatible designs running. This environment is kind of as bad (or good, whatever) as any flat memory model environment as segments are just a way to increase address space. They have no limits and no base (other than a hardcoded one).
Brendan wrote: Intel's iAPX 432 was a hideous thing - designed for high level languages, with built in support for things like object oriented programming, garbage collection, etc (it's not like you need a managed environment for these things ;) ). The "temporary" 8086 got popular fast, and the iAPX 432 failed.
Yes, but not because of Intel. It was the usage of 8086 that made it popular, and the non-usage of the other design.
Brendan wrote: Eventually 8086's 1 MiB limit got too, um, limiting. Intel wanted to increase that, but they also hadn't quite given up on some of the failed ideas from the failed iAPX 432 chip either. By this time Intel had also learnt the importance of backward compatibility - they needed to make it work so that old software designed for the 8086's segmentation could at least run on a new protected mode OS. They combined the silly hack (from 8086) with failed ideas (from iAPX 432) and "80286 protected mode" was born.
Kind of. 80286 protected mode was ok, but it got used too much.
Brendan wrote: Next comes 80386. How to make it 32-bit while providing the crucial backward compatibility? They extended the "silly hack combined with failed ideas" so that old software designed for older 80x86 CPUs would still be able to run on a new 32-bit OS. Of course Intel was getting smarter - they also added paging. Most OSs abandoned segmentation (except for where it's necessary to execute old software designed for older CPUs - e.g. DOS and 16-bit windows programs). OS/2 was the only OS that bothered with segmentation for new 32-bit executables, and it paid the price (in terms of additional complexity for a feature nobody bothered to use). It turned out that given the choice between "safety" (from segmentation) and performance (from not having to do the protection checks that segmentation required), every sane programmer chose performance.
Not really. The 386 defined what was in part a completely new environment. Old real mode code (which 286 protected mode could not run) could be emulated in a new submode. The primary problem was how they extended the GDT and descriptors to be backwards compatible with 286 protected mode, which was not at all necessary since this only affected OS kernels and not applications.
Brendan wrote: Things ticked along nicely for a while, with a few extensions to paging to support systems with more than 4 GiB of RAM. Eventually both Intel and AMD decided it was time for 64-bit. Intel wanted people to shift to a "not so backward compatible" Itanium (where they could lock out competitors). AMD had other plans.
Itanium was the worst piece of junk ever from Intel. It was a really good thing that this design died silently.
Brendan wrote: AMD continued the old tradition - they provided enough backward compatibility so that old software designed for older (32-bit) CPUs would still be able to run under a 64-bit OS; which meant keeping segmentation for long mode. For 64-bit code (where backward compatibility wasn't important) they did what everyone had been hoping for - they took the old/deprecated "silly hack combined with failed ideas, now with 32-bit extensions" out behind the back shed and ended its suffering.
Not really. If you had snooped around a little more while writing an emulator, you'd have noticed that about the only thing that happens when you switch to long mode is that base registers stop working, as do the limit checks (except for FS and GS, but their bases are not loaded with descriptors). It's still valid to load selectors in 64-bit mode, and the whole descriptor cache is maintained between mode-switches.

Worse is that in AMD's design, compatibility mode will clobber the upper halves of 64-bit registers on 32-bit register loads, and this cannot be compensated for in compatibility mode as manipulating 64-bit registers is not available. This is contrary to how real mode can still use 32-bit instructions, and how 16-bit register loads will not clobber the upper parts of 32-bit registers. This seems a lot like a bug, but it still will cause unnecessary 64-bit register pushes and pops in mixed designs.

Another problem is how all implementations and uses of 64-bit mode still are essentially 32-bit implementations with more and wider registers and a larger heap. Also, addressing a random 64-bit address requires loading the address in a general register and then using the general register as a base, which is pretty slow and similar to how segment registers must be loaded in the segmented environment.

Lastly, 64-bit is a misnomer as the paging hardware only supports 48-bit linear addresses, and this cannot be fixed without breaking the paging implementation (similar to how PAE broke 32-bit paging).
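As a rough illustration of what the 48-bit limit means in practice, here is a small C sketch assuming the 4-level paging with 4 KiB pages discussed in this thread. The names is_canonical48, va_parts_t and split_va are made up for the example. Bits 63:47 of a linear address must all be equal (the "canonical" form), and the remaining 48 bits split into four 9-bit table indices plus a 12-bit page offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Canonical form for 48-bit linear addresses: bits 63:47 are all 0 or all 1. */
static bool is_canonical48(uint64_t va)
{
    uint64_t top = va >> 47;            /* the 17 bits 63:47 */
    return top == 0 || top == 0x1FFFFu;
}

/* The pieces the page walk uses, for 4 KiB pages. */
typedef struct {
    unsigned pml4;    /* bits 47:39 */
    unsigned pdpt;    /* bits 38:30 */
    unsigned pd;      /* bits 29:21 */
    unsigned pt;      /* bits 20:12 */
    unsigned offset;  /* bits 11:0  */
} va_parts_t;

static va_parts_t split_va(uint64_t va)
{
    assert(is_canonical48(va));         /* non-canonical addresses fault before any walk */
    va_parts_t p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FFu);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FFu);
    p.pd     = (unsigned)((va >> 21) & 0x1FFu);
    p.pt     = (unsigned)((va >> 12) & 0x1FFu);
    p.offset = (unsigned)(va & 0xFFFu);
    return p;
}
```

For example, 0xFFFF800000000000 is the lowest address of the upper canonical half and lands in PML4 slot 256, while 0x0000800000000000 is not a valid address at all.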

Re: Can overlap segments achieve protection?

Posted: Tue Feb 17, 2015 10:12 am
by Brendan
Hi,
rdos wrote:Itanium was the worst piece of junk ever from Intel. It was a really good thing that this design died silently.
Itanium was probably Intel's second most successful architecture (in that everything except 80x86 failed worse).
rdos wrote:
Brendan wrote:AMD continued the old tradition - they provided enough backward compatibility so that old software designed for older (32-bit) CPUs would still be able to run under a 64-bit OS; which meant keeping segmentation for long mode. For 64-bit code (where backward compatibility wasn't important) they did what everyone had been hoping for - they took the old/deprecated "silly hack combined with failed ideas, now with 32-bit extensions" out behind the back shed and ended its suffering.
Not really. If you had snooped around a little more while writing an emulator, you'd have noticed that about the only thing that happens when you switch to long mode is that base registers stop working, as do the limit checks (except for FS and GS, but their bases are not loaded with descriptors). It's still valid to load selectors in 64-bit mode, and the whole descriptor cache is maintained between mode-switches.
That's the minimum needed to run old software designed for older (32-bit) CPUs. Doing additional things like saving/restoring (or even just clearing) the segment descriptor caches is "slightly less than the minimum". Of course they did recycle the same CPL/RPL stuff, and had to keep FS and GS (as it's used by most OSs for things like CPU local data and/or thread local data); but that doesn't really change the fact that 64-bit code is forced to use a flat virtual address space.
rdos wrote:Worse is that in AMD's design, compatibility mode will clobber the upper halves of 64-bit registers on 32-bit register loads, and this cannot be compensated for in compatibility mode as manipulating 64-bit registers is not available. This is contrary to how real mode can still use 32-bit instructions, and how 16-bit register loads will not clobber the upper parts of 32-bit registers. This seems a lot like a bug, but it still will cause unnecessary 64-bit register pushes and pops in mixed designs.
When you update half of a register (e.g. AX and not EAX; or EAX and not RAX) you end up with a dependency on the register's previous value which hampers things like register renaming and out-of-order execution, and this costs performance for nothing. AMD's "worse" approach avoids this pointless stupidity by zero extending so that there is no unnecessary dependency on the register's previous value.
rdos wrote:Another problem is how all implementations and uses of 64-bit mode still are essentially 32-bit implementations with more and wider registers and a larger heap. Also, addressing a random 64-bit address requires loading the address in a general register and then using the general register as a base, which is pretty slow and similar to how segment registers must be loaded in the segmented environment.
Except that everything that's dynamically allocated has to be accessed via a pointer anyway, so it makes no difference for those cases; and for things that can be allocated at compile/link time (e.g. things in your ".text" or ".data" or ".bss" section) it's extremely likely (assuming the OS developer isn't a fool) that these sections will be in the first 2 GiB of the virtual address space (and/or within reach of RIP relative addressing) and therefore 32-bit addresses are all you need; and for shared libraries you've got RIP relative addressing and things like the GOT/Global Offset Table.

In other words; the impact of not supporting 64-bit immediates for addresses is almost zero (assuming the OS developer isn't a fool, and hasn't done something insane like loading executables in the middle of nowhere).
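To put numbers on "within reach", here is a small C sketch of the two 32-bit cases. The function names are invented for the example, and it assumes the usual encoding rules: a RIP-relative displacement is a signed 32-bit value added to the address of the next instruction, and a 32-bit absolute address in 64-bit code is sign-extended to 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr survives sign extension from 32 bits, i.e. it lies in the
 * lowest 2 GiB or the highest 2 GiB of the 64-bit address space. */
static bool fits_sign_extended32(uint64_t addr)
{
    return addr <= 0x7FFFFFFFull || addr >= 0xFFFFFFFF80000000ull;
}

/* True if target is within -2 GiB .. +2 GiB-1 of the next instruction. */
static bool rip_relative_reachable(uint64_t next_rip, uint64_t target)
{
    return fits_sign_extended32(target - next_rip);   /* difference wraps modulo 2^64 */
}

/* The displacement an assembler would encode for a reachable target. */
static int32_t rel32_for(uint64_t next_rip, uint64_t target)
{
    assert(rip_relative_reachable(next_rip, target));
    uint32_t u = (uint32_t)(target - next_rip);       /* low 32 bits of the difference */
    /* Convert two's complement to a signed value without relying on
     * implementation-defined narrowing conversions. */
    return u <= 0x7FFFFFFFu ? (int32_t)u : -(int32_t)(~u) - 1;
}
```

An executable linked at 0x400000 can therefore reach all of its own sections with 4-byte displacements; only a target 2 GiB or more away needs the full 64-bit address in a register.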
rdos wrote:Lastly, 64-bit is a misnomer as the paging hardware only supports 48-bit linear addresses, and this cannot be fixed without breaking the paging implementation (similar to how PAE broke 32-bit paging).
"64-bit" typically means that general purpose registers are 64 bits wide (and has nothing to do with either physical or virtual address sizes). In the same way 8086 was a 16-bit CPU (even though addresses were 20-bit) and 6502 was considered an 8-bit CPU (even though addresses were 16-bit).


Cheers,

Brendan

Re: Can overlap segments achieve protection?

Posted: Thu Feb 19, 2015 2:17 am
by rdos
Brendan wrote: That's the minimum needed to run old software designed for older (32-bit) CPUs. Doing additional things like saving/restoring (or even just clearing) the segment descriptor caches is "slightly less than the minimum". Of course they did recycle the same CPL/RPL stuff, and had to keep FS and GS (as it's used by most OSs for things like CPU local data and/or thread local data); but that doesn't really change the fact that 64-bit code is forced to use a flat virtual address space.
That's already the case since C compilers cannot cope with segmentation, and few of them even support it (OpenWatcom being a notable exception).

OTOH, this doesn't stop OSes from enforcing segmentation in kernel by running in 32-bit compatibility mode and serving 64-bit applications. Contrary to the claims of AMD, this is perfectly possible to do.
Brendan wrote: When you update half of a register (e.g. AX and not EAX; or EAX and not RAX) you end up with a dependency on the register's previous value which hampers things like register renaming and out-of-order execution, and this costs performance for nothing. AMD's "worse" approach avoids this pointless stupidity by zero extending so that there is no unnecessary dependency on the register's previous value.
That makes no sense as this is only part of 32-bit code where upper halves are NOT available without doing far jumps which essentially will stop out-of-order execution.
Brendan wrote: Except that everything that's dynamically allocated has to be accessed via a pointer anyway, so it makes no difference for those cases; and for things that can be allocated at compile/link time (e.g. things in your ".text" or ".data" or ".bss" section) it's extremely likely (assuming the OS developer isn't a fool) that these sections will be in the first 2 GiB of the virtual address space (and/or within reach of RIP relative addressing) and therefore 32-bit addresses are all you need; and for shared libraries you've got RIP relative addressing and things like the GOT/Global Offset Table.
That makes no sense. This is a worthless 32-bit legacy design.
Brendan wrote: In other words; the impact of not supporting 64-bit immediates for addresses is almost zero (assuming the OS developer isn't a fool, and hasn't done something insane like loading executables in the middle of nowhere).
The 16 upper bits in the address space can be used as segments much the same way as selectors can, and the protection is enforced by RIP addressing not being able to access out-of-bounds.

However, just as C compilers are worthless at supporting segmentation, the same goes for supporting this feature as well. In fact, C compilers are the primary obstacle for introducing better hardware protection methods of all sorts. It's the reason we have 4 level paging, as this can be done transparently to C programs without them even knowing it, but at a horrible cost of needing to construct huge hardware caches and complex, hardware intensive lookup-mechanisms.

Re: Can overlap segments achieve protection?

Posted: Thu Feb 19, 2015 2:53 am
by Brendan
Hi,
rdos wrote:
Brendan wrote:When you update half of a register (e.g. AX and not EAX; or EAX and not RAX) you end up with a dependency on the register's previous value which hampers things like register renaming and out-of-order execution, and this costs performance for nothing. AMD's "worse" approach avoids this pointless stupidity by zero extending so that there is no unnecessary dependency on the register's previous value.
That makes no sense as this is only part of 32-bit code where upper halves are NOT available without doing far jumps which essentially will stop out-of-order execution.
You're making the mistake of assuming most of the CPU cares if you're running 16-bit, 32-bit or 64-bit code. It doesn't. The only part that cares is the part that decodes instructions and generates micro-ops. The majority of the CPU (that executes the micro-ops) doesn't know or care if the original instruction was (e.g.) a 32-bit instruction in 64-bit code or a 32-bit instruction in 32-bit code.
rdos wrote:
Brendan wrote:Except that everything that's dynamically allocated has to be accessed via a pointer anyway, so it makes no difference for those cases; and for things that can be allocated at compile/link time (e.g. things in your ".text" or ".data" or ".bss" section) it's extremely likely (assuming the OS developer isn't a fool) that these sections will be in the first 2 GiB of the virtual address space (and/or within reach of RIP relative addressing) and therefore 32-bit addresses are all you need; and for shared libraries you've got RIP relative addressing and things like the GOT/Global Offset Table.
That makes no sense. This is a worthless 32-bit legacy design.
Um, what? There's no reason you can't use full 64-bit pointers, it's just slower. Fortunately it's extremely rare (I doubt I've ever seen an executable file that's larger than 4 GiB) so no sane people care that it's slower.

Note: I suspect that you're trying to blame AMD because your OS is poorly designed, and I really do think it's unreasonable to blame AMD for your mistake.
rdos wrote:
Brendan wrote:In other words; the impact of not supporting 64-bit immediates for addresses is almost zero (assuming the OS developer isn't a fool, and hasn't done something insane like loading executables in the middle of nowhere).
The 16 upper bits in the address space can be used as segments much the same way as selectors can, and the protection is enforced by RIP addressing not being able to access out-of-bounds.

However, just as C compilers are worthless at supporting segmentation, the same goes for supporting this.
If the OS developer is a fool (and uses the upper 16-bits of addresses for a hideous "pseudo-segment" pile of puke) then the OS developer gets what they deserve, and has no right to blame AMD, and has no right to blame the compiler developers either.

This was your decision. The consequences of your decision are your problem.


Cheers,

Brendan

Re: Can overlap segments achieve protection?

Posted: Thu Feb 19, 2015 3:54 am
by rdos
Brendan wrote:
rdos wrote: That makes no sense as this is only part of 32-bit code where upper halves are NOT available without doing far jumps which essentially will stop out-of-order execution.
You're making the mistake of assuming most of the CPU cares if you're running 16-bit, 32-bit or 64-bit code. It doesn't.
It does because loading a 32-bit register in 64-bit mode does NOT clobber the upper half of it. If it did, there would be no sense in having 32-bit operand overrides.
Brendan wrote: Um, what? There's no reason you can't use full 64-bit pointers, it's just slower. Fortunately it's extremely rare (I doubt I've ever seen an executable file that's larger than 4 GiB) so no sane people care that it's slower.
Which means that basically all 64-bit designs are wasting hardware and performance with 4 level paging when nobody cares for more than 2 levels. We could simply reduce the address space to 32-bit and just introduce 64-bit operations to existing 32-bit mode instead. To me it seems like software developers simply are not using the hardware features, which in this case is due to high-level compilers and linkers that cannot handle the setup.
Brendan wrote: Note: I suspect that you're trying to blame AMD because your OS is poorly designed, and I really do think it's unreasonable to blame AMD for your mistake.
Not at all. I blame the GCC team as they are the ones that haven't implemented this in a way that allows me to exploit it. The hardware is perfectly functional while the software (C compiler and linker) is not.

Re: Can overlap segments achieve protection?

Posted: Thu Feb 19, 2015 4:15 am
by Combuster
rdos wrote:nobody cares for more than 2 levels.
:^o
rdos wrote:It does because loading a 32-bit register in 64-bit mode does NOT clobber the upper half of it.
:^o
rdos wrote:If it did, there would be no sense in having 32-bit operand overrides.
:^o In 64-bit mode, 32-bit operand size is the default, and hence there's no override for that.

Re: Can overlap segments achieve protection?

Posted: Thu Feb 19, 2015 4:40 am
by rdos
Combuster wrote:
rdos wrote:nobody cares for more than 2 levels.
:^o
rdos wrote:It does because loading a 32-bit register in 64-bit mode does NOT clobber the upper half of it.
:^o
rdos wrote:If it did, there would be no sense in having 32-bit operand overrides.
:^o In 64-bit mode, 32-bit operand size is the default, and hence there's no override for that.
Tested it, and it's not sign-extension as previously claimed:

test32:
    mov rax, 123456789ABCDEF0h  ; RAX will be set to 123456789ABCDEF0
    mov eax, -5                 ; upper part will be clobbered and RAX will be 00000000FFFFFFFB

test16:
    mov rax, 123456789ABCDEF0h  ; RAX will be set to 123456789ABCDEF0
    mov ax, -5                  ; upper part will not be clobbered and RAX will be 123456789ABCFFFB

I think this does explain why 32-bit code manipulating 32-bit registers will zero the upper half of the 64-bit register. This seems to be a "feature" (or rather a bug) of long mode.
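The behaviour seen in the two tests can be summed up in a few lines of C. This is just a model of the register update rule, with a made-up function name (write_gpr); the 8-bit case models the low byte registers such as AL only, not AH.

```c
#include <assert.h>
#include <stdint.h>

/* Result of writing the low `bits` bits of a 64-bit general register
 * that previously held `old`. */
static uint64_t write_gpr(uint64_t old, uint64_t value, unsigned bits)
{
    assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
    if (bits == 64)
        return value;                       /* full overwrite */
    if (bits == 32)
        return value & 0xFFFFFFFFull;       /* upper 32 bits are zeroed, not kept */
    uint64_t mask = (1ull << bits) - 1;     /* 8- and 16-bit writes merge with old */
    return (old & ~mask) | (value & mask);
}
```

write_gpr(0x123456789ABCDEF0, (uint64_t)-5, 32) gives 0x00000000FFFFFFFB as in test32, and the same call with 16 gives 0x123456789ABCFFFB as in test16.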

What this essentially means is that general registers that need to be preserved and do not return values must be saved before calling an unknown handler in 32-bit mode (as the handler cannot save the upper half if it uses a 32-bit register), but if a value is returned there is no need to worry about the upper half, as it will automatically be cleared when the 32-bit code loads that particular register.