Keeping thread TSS and SS0 stack in LDT

Posted: Tue May 24, 2011 2:10 pm
by rdos
The reason for not having one TSS (and possibly SS0 selector) per thread would be because of GDT selector shortage, and a need to have thousands of threads.

But wouldn't it be perfectly possible to allocate both the TSS alias and the kernel stack in the LDT? I think it should be possible. Especially since my design uses a "core stack" in the scheduler, and switches away from the kernel stack after the register state is saved. If the LDT is loaded before the target (kernel) stack selector and the TR register, this should be feasible.

This change would also make the use of ring 1/2 more feasible, as the stack for these levels could also be allocated in the LDT rather than in the GDT.

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Wed May 25, 2011 12:15 am
by Brendan
Hi,
rdos wrote:The reason for not having one TSS (and possibly SS0 selector) per thread would be because of GDT selector shortage, and a need to have thousands of threads.

But wouldn't it be perfectly possible to allocate both the TSS alias and the kernel stack in the LDT? I think it should be possible.
You could store the TSS descriptor in the LDT or anywhere else, and copy it to the GDT before using it. You can't use a TSS descriptor in an LDT directly though (general protection fault if you try).

The key to minimising GDT entries is to dynamically modify a small/fixed number of GDT entries. For example, you only really need a "from TSS descriptor" and a "to TSS descriptor" for each CPU; and maybe one "LDT descriptor" per CPU too. At 3 GDT entries per CPU you can have 2730 CPUs (with 8190 GDT entries) and an infinite number of TSSs and LDTs that are copied into those GDT entries during task switches.
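The slot-reuse scheme described above can be sketched in C. This is only an illustration under stated assumptions: the `gdt` array, the slot order (from-TSS, to-TSS, LDT) and the helpers `make_system_descriptor`, `cpu_first_slot`, `install_next_tss` and `max_cpus` are hypothetical names, not code from any poster's kernel. The bit positions follow the standard 8-byte system descriptor format; a real kernel would patch the live GDT and then execute LTR or a task switch, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define GDT_ENTRIES      8192u  /* architectural maximum: 13-bit selector index */
#define SLOTS_PER_CPU    3u     /* "from" TSS, "to" TSS, LDT (slot order is an assumption) */
#define TYPE_TSS32_AVAIL 0x9u   /* system type: available 32-bit TSS */
#define TYPE_LDT         0x2u   /* system type: LDT */

/* Hypothetical in-memory model of the GDT. */
static uint64_t gdt[GDT_ENTRIES];

/* Build an 8-byte system descriptor: present, DPL 0, byte granularity. */
static uint64_t make_system_descriptor(uint32_t base, uint32_t limit, uint8_t type)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);             /* limit 15:0       */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* base 23:0        */
    d |= (uint64_t)(type & 0xFu) << 40;           /* type, S bit = 0  */
    d |= (uint64_t)1 << 47;                       /* present          */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* limit 19:16      */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;  /* base 31:24       */
    return d;
}

/* First GDT index owned by a CPU; entry 0 is the null descriptor. */
static unsigned cpu_first_slot(unsigned cpu)
{
    return 1u + cpu * SLOTS_PER_CPU;
}

/* Copy a thread's TSS descriptor into this CPU's "to TSS" slot and
 * return the selector that would be used for the switch. */
static uint16_t install_next_tss(unsigned cpu, uint32_t tss_base, uint32_t tss_limit)
{
    unsigned slot = cpu_first_slot(cpu) + 1u;
    assert(slot < GDT_ENTRIES);
    gdt[slot] = make_system_descriptor(tss_base, tss_limit, TYPE_TSS32_AVAIL);
    return (uint16_t)(slot << 3);                 /* TI = 0 (GDT), RPL = 0 */
}

/* How many CPUs fit when each one owns SLOTS_PER_CPU entries. */
static unsigned max_cpus(void)
{
    return (GDT_ENTRIES - 1u) / SLOTS_PER_CPU;
}
```

With 8192 entries minus the null descriptor, `max_cpus()` evaluates to 2730, which matches the figure in the post. The number of threads is then bounded only by the memory available for their TSS and descriptor copies, not by the GDT.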

I can't help thinking you're focusing on the wrong problems though.
Bear Grylls wrote:An OS using segmentation in 2011, better drink my own...

Cheers,

Brendan

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Wed May 25, 2011 2:12 am
by rdos
Brendan wrote: You can't use a TSS descriptor in an LDT directly though (general protection fault if you try).
OK. There are other complications as well. The DPMI-server and the interrupt reflection mechanism will need substantial modifications in order to work with SS0 in LDT as well, so I don't think I'll do anything about this. Not that the DPMI-server is used for much, but I like to be able to keep this functionality just in case. Also, in order to be able to mix different executable formats in the future, I cannot break the "app model" that uses one LDT per active program (not per active process). After all, I might continue work on the GCC-toolchain.

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Fri May 27, 2011 4:25 am
by ErikVikinger
Hello,


The TSS descriptor cannot be inside the LDT because the TSS itself selects the LDT; see offset 0x60 of the TSS.

And you need only one TSS per process, not per thread.
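The offset mentioned above can be checked against the 32-bit TSS layout. The following is a minimal sketch: the struct and field names are my own, and the two selector helpers are hypothetical and model only the table-indicator and null checks that LTR performs, not the descriptor type or busy checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit TSS layout (104 bytes); field names are illustrative. */
struct tss32 {
    uint16_t link, rsvd0;
    uint32_t esp0; uint16_t ss0, rsvd1;   /* ring 0 stack (the "SS0" of this thread) */
    uint32_t esp1; uint16_t ss1, rsvd2;   /* ring 1 stack */
    uint32_t esp2; uint16_t ss2, rsvd3;   /* ring 2 stack */
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, rsvd4, cs, rsvd5, ss, rsvd6;
    uint16_t ds, rsvd7, fs, rsvd8, gs, rsvd9;
    uint16_t ldt, rsvd10;                 /* LDT selector used by this task */
    uint16_t trap, iomap_base;
};

static_assert(offsetof(struct tss32, ldt) == 0x60, "LDT selector sits at offset 0x60");
static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");

/* Bit 2 of a selector is the table indicator: 0 = GDT, 1 = LDT. */
static int selector_uses_ldt(uint16_t sel)
{
    return (sel >> 2) & 1;
}

/* LTR raises #GP for a null selector or one that points into the LDT. */
static int ltr_selector_acceptable(uint16_t sel)
{
    return (sel & 0xFFF8u) != 0 && !selector_uses_ldt(sel);
}
```

Since the TSS names the LDT to load, a TSS selector with the table-indicator bit set would have to be resolved through a table that the TSS itself is supposed to select, which is one way to read the restriction.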

rdos wrote:After all, I might continue work on the GCC-toolchain.
If you really want support for segmentation, then GCC seems to be an inadequate foundation. In my opinion it is nearly impossible to teach GCC segmentation. LLVM could be the better choice for this. In LLVM, pointers are a separate base type and not just another integer type. I think it is possible to hide segmentation behind the LLVM pointer type with only minimal changes to the LLVM source code.

Brendan wrote:
Bear Grylls wrote:An OS using segmentation in 2011, better drink my own...
This sounds really negative, why? In the right hands, segmentation can be a powerful and sophisticated tool for a safe operating system.


Greetings
Erik

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Fri May 27, 2011 5:02 am
by rdos
ErikVikinger wrote:The TSS descriptor cannot be inside the LDT because the TSS itself selects the LDT; see offset 0x60 of the TSS.

And you need only one TSS per process, not per thread.
Yes. I outlined another idea in a similar thread (http://forum.osdev.org/viewtopic.php?f= ... 3&start=15). This idea seemed much more feasible to reduce GDT selector usage to a single selector per thread.
ErikVikinger wrote:
rdos wrote:After all, I might continue work on the GCC-toolchain.
If you really want support for segmentation, then GCC seems to be an inadequate foundation. In my opinion it is nearly impossible to teach GCC segmentation. LLVM could be the better choice for this. In LLVM, pointers are a separate base type and not just another integer type. I think it is possible to hide segmentation behind the LLVM pointer type with only minimal changes to the LLVM source code.
I currently use 32-bit flat for applications. I have adapted OpenWatcom to output native 32-bit flat applications for RDOS. I abandoned segmentation for applications several years ago. Therefore, GCC is a feasible toolchain for applications, but not for device drivers. I'm about to finish the RDOS device-driver target in OpenWatcom, which will use the 32-bit compact memory model for device drivers in C/C++.

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Sat May 28, 2011 3:13 am
by Brendan
Hi,
ErikVikinger wrote:
Brendan wrote:
Bear Grylls wrote:An OS using segmentation in 2011, better drink my own...
This sounds really negative, why? In the right hands, segmentation can be a powerful and sophisticated tool for a safe operating system.
Segmentation is:
  • inadequate (on its own) for handling physical memory fragmentation (there's no way to avoid the "worst case, copy lots of RAM" problem).
  • inadequate (on its own) for handling things like swap space, memory mapped files, copy on write, allocation on demand, etc. You're mostly forced to work on entire segments (not small pieces) which defeats the purpose of these common VMM features and prevents any performance gain/s or RAM savings that could've been achieved.
  • bad (on its own) due to physical address space size limitations on 80x86 (you can't access more than 4 GiB of the physical address space, or about 3 GiB of RAM, even though most modern computers have more)
  • entirely pointless if it's used in conjunction with paging (to solve the first 3 problems). As soon as you start using paging you realise paging alone is enough for protection and isolation purposes, and segmentation becomes just an unnecessary layer of overhead.
  • not portable. Very few other architectures support it (including 64-bit 80x86).
  • it's "awkward" for programmers (and a pain in the neck for compilers/toolchains). A nice clean/contiguous space is conceptually easier than "many isolated islands".
  • Slow. The GDT/LDT lookups plus the number of protection checks involved can't be avoided, which makes it suck. Because it sucks nobody uses it and because nobody uses it CPU manufacturers don't bother optimising it (which makes it suck even more).
For all of these reasons no modern OS designed for (or ported to) 80x86 uses segmentation anymore. A few OSs used to use it for backward compatibility purposes (e.g. obsolete versions of Windows used segmentation for running ancient processes designed for "16-bit protected mode" and DOS/DPMI). Windows didn't use segmentation for native 32-bit software from the start, and abandoned segmentation completely as soon as they could drop support for legacy software. OS/2 was more interesting - it did support segmentation for 32-bit software in addition to "flat memory model"; but programmers could use either, and because most programmers don't want anything to do with segmentation they all wrote software for OS/2's flat memory model anyway. I'm not sure if eComStation (newer versions of OS/2) still supports segmentation for 32-bit 80x86 software or not (it's probably still there for backward compatibility, and still unused by everything). That only leaves the "small address spaces" support in some (32-bit, 80x86) versions of the L4 micro-kernel; which in my opinion is an "inventive misuse" of segmentation (as opposed to the way segmentation was intended to be used or the way "segmentation advocates" think of it).

In general, some beginners think segmentation is "good" because they only consider the overhead of TLB misses and don't consider the potential performance benefits of paging (when it's used well). Of course after a beginner makes the mistake of using segmentation, as time progresses more and more code depends on it, and more and more courage is needed for them to decide to fix it, even when they're no longer a beginner... ;)


Cheers,

Brendan

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Sat May 28, 2011 5:23 am
by Combuster
  • entirely pointless if it's used in conjunction with paging (to solve the first 3 problems). As soon as you start using paging you realise paging alone is enough for protection and isolation purposes, and segmentation becomes just an unnecessary layer of overhead.
Segmentation is faster for the same reason paging is faster. Using it in addition to paging can improve performance where paging has its own design problems, especially if you realize that the common case is often not the worst case. "Inventive misuse" is a rather tainted remark as it's still perfectly valid.

That does not excuse you from not being able to deal with the worst case. Use your tools properly and don't religiously exorcise paging (not giving names) or segmentation (not giving names...?) because your own design says so.

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Sat May 28, 2011 9:10 am
by rdos
Brendan wrote:Segmentation is:
  • inadequate (on its own) for handling physical memory fragmentation (there's no way to avoid the "worst case, copy lots of RAM" problem).
  • inadequate (on its own) for handling things like swap space, memory mapped files, copy on write, allocation on demand, etc. You're mostly forced to work on entire segments (not small pieces) which defeats the purpose of these common VMM features and prevents any performance gain/s or RAM savings that could've been achieved.
  • bad (on its own) due to physical address space size limitations on 80x86 (you can't access more than 4 GiB of the physical address space, or about 3 GiB of RAM, even though most modern computers have more)
  • entirely pointless if it's used in conjunction with paging (to solve the first 3 problems). As soon as you start using paging you realise paging alone is enough for protection and isolation purposes, and segmentation becomes just an unnecessary layer of overhead.
Not so. It is natural to use paging for physical memory allocation only (which is what it is good for), and segmentation for protecting parts of the system from each other. Paging is worthless for protecting code from tampering unless you accept the overhead of running code in separate address spaces and using IPC for communication. This latter solution sucks big time, and is easily outperformed by a segmented design, even if segment register loads are expensive and non-optimized.
Brendan wrote:For all of these reasons no modern OS designed for (or ported to) 80x86 uses segmentation anymore. A few OSs used to use it for backward compatibility purposes (e.g. obsolete versions of Windows used segmentation for running ancient processes designed for "16-bit protected mode" and DOS/DPMI).
Right. The 16-bit code in Windows was a monster on its own. Combine that with terrible compilers that couldn't handle segmentation well, and you get a really bad solution that still exists as a really bad example of how segmentation should not be used. There is no reason to limit the design to 16-bit segments which Windows did. Another problem (probably the main one) with the whole design was that it was built on top of a non-reentrant DOS system. There is no reason to compare this fiasco with a modern, 32-bit, segmented design.
Brendan wrote:Windows didn't use segmentation for native 32-bit software from the start, and abandoned segmentation completely as soon as they could drop support for legacy software. OS/2 was more interesting - it did support segmentation for 32-bit software in addition to "flat memory model"; but programmers could use either, and because most programmers don't want anything to do with segmentation they all wrote software for OS/2's flat memory model anyway. I'm not sure if eComStation (newer versions of OS/2) still supports segmentation for 32-bit 80x86 software or not (it's probably still there for backward compatibility, and still unused by everything). That only leaves the "small address spaces" support in some (32-bit, 80x86) versions of the L4 micro-kernel; which in my opinion is an "inventive misuse" of segmentation (as opposed to the way segmentation was intended to be used or the way "segmentation advocates" think of it).
It is no problem if applications use flat memory model. A segmented kernel will work perfectly well with flat memory model applications, because flat memory model is a sub-case of a segmented memory model.
Brendan wrote:In general, some beginners think segmentation is "good" because they only consider the overhead of TLB misses and don't consider the potential performance benefits of paging (when it's used well). Of course after a beginner makes the mistake of using segmentation, as time progresses more and more code depends on it, and more and more courage is needed for them to decide to fix it, even when they're no longer a beginner... ;)
Paging is pretty useless for protection, and has a number of nightmare features when it comes to debugging:

Paging is terrible:
  • When invalid returns occur, CS:EIP points to the invalid destination and there is no way of knowing where the call came from
  • Heap managers for the flat memory model typically do not use guard pages, so any overwrite will trash the heap links
  • Using a flat memory model in a kernel is a recipe for disaster, as any pointer error or write outside of allocated memory will trash some unrelated function, and these errors are close to impossible to track down
Just as an example: the worst problem with our payment terminal currently is not all the segmented operating system code in assembly, but the C/C++ heap manager that uses a flat memory model. At regular intervals the heap becomes corrupt, and the thread that ultimately page faults (or hangs) is not the one that corrupted the heap, but the one that tried to use the corrupt heap. A segmented heap would quickly have exposed these errors with effective base & limit checking.

I think it will eventually be an attractive option to leave the flat memory model for applications, and instead use the 32-bit compact memory model. The normal heap-manager would allocate selectors in the LDT, with precise base and limit checking, which means the heap problems would eventually be resolved. There will also be an alternative heap-manager for things that are allocated in large quantities (like card lists). These will use an alternative allocator that uses a smart trick: It will allocate a linear address, and set the selector to the flat-memory selector, and thus effectively disable base & limit checking (and LDT selector allocation) for these objects.
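The two allocators described above can be modelled in plain C to show where the limit check helps and where it is switched off. This is a toy simulation, not RDOS code: `ldt_model`, `seg_malloc`, `flat_malloc`, `access_ok`, the bump allocator and the choice of slot 0 as the flat selector are all assumptions made for illustration. On real hardware the check in `access_ok` is done by the CPU on every access.

```c
#include <stdbool.h>
#include <stdint.h>

#define LDT_SLOTS   16u      /* toy size; a real LDT holds up to 8192 entries */
#define FLAT_SEL    0u       /* assumption: slot 0 models the flat selector */
#define INVALID_SEL 0xFFFFu

struct segment { uint32_t base, limit; bool used; };  /* limit = last valid offset */
struct far_ptr { uint16_t sel; uint32_t off; };

static struct segment ldt_model[LDT_SLOTS] = {
    [FLAT_SEL] = { 0, 0xFFFFFFFFu, true }             /* base 0, limit 4 GiB - 1 */
};
static uint32_t next_linear = 0x100000u;              /* toy bump allocator */

/* Method one: a new descriptor per allocation, with an exact limit. */
static struct far_ptr seg_malloc(uint32_t size)
{
    struct far_ptr p = { INVALID_SEL, 0 };
    if (size == 0)
        return p;
    for (uint16_t s = 1; s < LDT_SLOTS; s++) {
        if (!ldt_model[s].used) {
            ldt_model[s] = (struct segment){ next_linear, size - 1u, true };
            next_linear += size;
            p.sel = s;
            return p;
        }
    }
    return p;                                         /* out of selectors */
}

/* Method two: a linear address paired with the flat selector. */
static struct far_ptr flat_malloc(uint32_t size)
{
    struct far_ptr p = { FLAT_SEL, next_linear };
    next_linear += size;
    return p;
}

/* Model of the limit check applied to an access of len bytes at p. */
static bool access_ok(struct far_ptr p, uint32_t len)
{
    if (p.sel >= LDT_SLOTS || !ldt_model[p.sel].used || len == 0)
        return false;
    const struct segment *s = &ldt_model[p.sel];
    return p.off <= s->limit && len - 1u <= s->limit - p.off;
}
```

A 17-byte write through a 16-byte `seg_malloc` block is rejected at the faulting access, while the same overrun through a `flat_malloc` block passes the check and silently lands in whatever follows it. That is the trade-off between precise checking and selector consumption the post describes.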

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Sat May 28, 2011 9:46 am
by Brendan
Hi,
Combuster wrote:
  • entirely pointless if it's used in conjunction with paging (to solve the first 3 problems). As soon as you start using paging you realise paging alone is enough for protection and isolation purposes, and segmentation becomes just an unnecessary layer of overhead.
Segmentation is faster for the same reason paging is faster. Using it in addition to paging can improve performance where paging has its own design problems, especially if you realize that the common case is often not the worst case.
What constitutes "using segmentation as intended"? For me, "using segmentation as intended" means a process uses multiple segments (with different base addresses, limits and attributes) to restrict its access to its own code and data. One example of this would be where a process has one segment for ".text", one for ".rodata" and one for ".data and .bss", and a new descriptor is created every time the process calls "malloc()". Basically, segmentation used as a fine-grained protection mechanism within a process.

What constitutes "not using segmentation as intended"? For me, it's not using segments as a fine-grained protection mechanism within a process. This includes setting all segments to "base = virtual_address_start, limit = virtual_address_end" (which is as close as you can get to "not using segmentation at all" on 32-bit 80x86).

Now, can you think of any way of "using segmentation as intended" (as per my definition above) which improves performance? I can't, and I seriously doubt you can either.
Combuster wrote:"Inventive misuse" is a rather tainted remark as it's still perfectly valid.
L4's "small address spaces" fits in the second category (with an inventive definition of "virtual_address_start" and "virtual_address_end"). It uses segmentation, but doesn't use segmentation as it was intended to be used. The word "misuse" is fairly accurate in this context (although perhaps there's a better word for "using something in ways it wasn't intended" - if there is, I can't think of it).


Cheers,

Brendan

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Sat May 28, 2011 9:58 am
by rdos
Brendan wrote:What constitutes "using segmentation as intended"? For me, "using segmentation as intended" means a process uses multiple segments (with different base addresses, limits and attributes) to restrict its access to its own code and data. One example of this would be where a process has one segment for ".text", one for ".rodata" and one for ".data and .bss", and a new descriptor is created every time the process calls "malloc()". Basically, segmentation used as a fine-grained protection mechanism within a process.
Exactly. Or there could be two versions of malloc to save segment descriptors: one that allocates a new selector, and one that allocates a linear address and returns the flat memory selector. This way the problem with the number of available selectors in the LDT becomes minor (just change some frequently used allocations to use method two after debugging them).
Brendan wrote:Now, can you think of any way of "using segmentation as intended" (as per my definition above) which improves performance? I can't, and I seriously doubt you can either.
Performance and protection are always a trade-off. If you want applications that crash for no apparent reason, with no usable trace information, selecting a flat memory model is a good way towards that. Especially if you use the heap frequently.

We should also remember that interpreted languages with built-in garbage collectors are popular because it is so hard to get C/C++ programs to work with the heap and pointers. The interpreted languages practically never crash, but the cost is high. Much higher than that of a sane segmentation approach.

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Fri Jun 03, 2011 7:38 am
by ErikVikinger
Hello,

Brendan wrote:inadequate (on its own) for handling physical memory fragmentation (there's no way to avoid the "worst case, copy lots of RAM" problem).
Yes, you have to do memory defragmentation (with paging you can do that in the background without disturbing the applications), but if you do this job well then you can switch off paging completely and get higher performance (segmentation is cheaper than paging). With a (hypothetical) CPU with better support for segmentation (not x86) you could activate paging on a per-segment basis, so that only the applications with fragmented segments have to pay the paging penalty.
Brendan wrote:inadequate (on its own) for handling things like swap space, memory mapped files, copy on write, allocation on demand, etc. You're mostly forced to work on entire segments (not small pieces) which defeats the purpose of these common VMM features and prevents any performance gain/s or RAM savings that could've been achieved.
These are things that paging is designed for. Good segmentation needs paging for some low-level tasks that are used in special situations but are typically switched off.
Swapping is the ultimate job for paging; swapping complete segments is really not practical. The same goes for memory-mapped files. Copy on write works with segmentation too. Allocation on demand is nice but not always useful: my PC has enough RAM, and segments can have the correct size for now and be expanded as needed later.
Brendan wrote:bad (on its own) due to physical address space size limitations on 80x86 (you can't access more than 4 GiB of the physical address space, or about 3 GiB of RAM, even though most modern computers have more)
Yes, with segmentation you have only one linear address space for all applications and the OS, but on a 64-bit platform this is not a problem today.
With paging you have other limitations. How much stack space can a flat memory application have per stack? Flat memory stacks are typically fixed in size; with segmentation you can use all available memory for stacks (or heap) without a special limit (as long as the sum of both fits into memory). With segmentation the programmer need not be aware that the stack has a significantly smaller limit than the heap; both are fully flexible.
Brendan wrote:entirely pointless if it's used in conjunction with paging (to solve the first 3 problems). As soon as you start using paging you realise paging alone is enough for protection and isolation purposes, and segmentation becomes just an unnecessary layer of overhead.
As long as every simple integer can be interpreted as a pointer to any location (possibly valid but not intended), paging does not provide enough protection.
Brendan wrote:not portable. Very few other architectures support it (including 64-bit 80x86).
Yes, that's right.
Brendan wrote:it's "awkward" for programmers
Why? A typical C programmer cannot see that his program runs with segmentation, assuming a good compiler for segmentation. Apart from the OS kernel and the heap management part of libc, you have nothing to do with the memory model of the target platform.
Brendan wrote:(and a pain in the neck for compilers/toolchains).
Maybe true, but it is not a real problem.
Brendan wrote:A nice clean/contiguous space is conceptually easier than "many isolated islands".
Yes, for beginners.
Brendan wrote:Slow. The GDT/LDT lookups plus the number of protection checks involved can't be avoided
These checks can be done in parallel with the memory access and cost nothing (in the same manner as the protection checks of paging). And the GDT/LDT lookups are only done when loading a segment register; if your CPU has enough segment registers (not true for x86) then this happens very rarely.
Brendan wrote:Because it sucks nobody uses it and because nobody uses it CPU manufacturers don't bother optimising it (which makes it suck even more).
This is the real problem.

The performance losses from paging rise with the amount of memory used by the applications; the costs of segmentation are nearly constant.


Greetings
Erik


Sorry for my terrible English, my native languages are VHDL and German
and sorry for my delay

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Fri Jun 03, 2011 8:25 am
by Combuster
just translate? O(1) for any implementation :D

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Fri Jun 03, 2011 10:18 am
by Owen
VirtualToLinear(Seg, Off) = SegTable[Seg].base + Off
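Written out in C, the one-liner above might look like the sketch below. The limit check is an addition of mine (the formula as posted only shows the base addition), and `seg_table`, its two example entries and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct seg_entry {
    uint32_t base;   /* linear address of the first byte */
    uint32_t limit;  /* last valid offset within the segment */
};

/* Hypothetical descriptor table: entry 0 is flat, entry 1 is a 4 KiB segment. */
static const struct seg_entry seg_table[] = {
    { 0x00000000u, 0xFFFFFFFFu },
    { 0x00400000u, 0x00000FFFu },
};

/* Translate seg:off to a linear address with a single table lookup.
 * Returns false if the segment index or the offset is out of range. */
static bool virtual_to_linear(size_t seg, uint32_t off, uint32_t *linear)
{
    if (seg >= sizeof seg_table / sizeof seg_table[0])
        return false;
    if (off > seg_table[seg].limit)
        return false;
    *linear = seg_table[seg].base + off;
    return true;
}
```

The work is one indexed load, one compare and one add, independent of how many segments exist.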

...

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Fri Jun 03, 2011 12:16 pm
by ErikVikinger
Hello,

berkus wrote:Can be done in O(1) for paging, an adequate equivalent for segmentation, please?
For segmentation this is also O(1) and can be faster (in case of multiple levels of paging directories).
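For comparison, a software walk of a two-level 32-bit page table (4 KiB pages, no PAE) is sketched below. It is a simplified model with assumed names: `page_dir`, `page_table` and `page_walk` are hypothetical, and the directory holds host pointers instead of physical frame addresses so that the example stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRIES 1024u
#define PRESENT 0x1u

struct page_table { uint32_t pte[ENTRIES]; };          /* frame address | flags */
struct page_dir   { struct page_table *pt[ENTRIES]; }; /* NULL = not present */

/* Translate a linear address; returns 1 on success, 0 on a (modelled) page fault. */
static int page_walk(const struct page_dir *pd, uint32_t linear, uint32_t *phys)
{
    const struct page_table *pt = pd->pt[linear >> 22];  /* lookup 1: directory */
    if (pt == NULL)
        return 0;
    uint32_t pte = pt->pte[(linear >> 12) & 0x3FFu];     /* lookup 2: table */
    if (!(pte & PRESENT))
        return 0;
    *phys = (pte & 0xFFFFF000u) | (linear & 0xFFFu);
    return 1;
}
```

Both translations are O(1); the difference is the constant. The walk needs one dependent lookup per paging level, where the segment translation needs a single base addition. On real hardware the TLB hides the walk for addresses that were translated recently, so the gap only shows up on TLB misses.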
berkus wrote:"As a kernel I want to be able to translate clients' virtual addresses to physical addresses easily to be able to manipulate clients memory."
To access user memory the kernel does not need its physical address; the kernel can access it with the same far pointer, provided the correct LDT is loaded/selected.


Greetings
Erik

Re: Keeping thread TSS and SS0 stack in LDT

Posted: Fri Jun 03, 2011 3:33 pm
by ErikVikinger
Hello,

berkus wrote:I'd be happy to see how your kernel explains this to a DMA driver.
Okay, I understand, but we know that with segmentation this is at least as fast as with paging.


Greetings
Erik