Feasibility to keep GDTR and GDT data only into registers

~ · Post by ~ » Mon Oct 24, 2016 2:00 pm

I have been thinking about how, in 64-bit mode, segment registers supposedly aren't really needed because it mainly uses a flat memory model.

Segment registers are loaded only when we write or POP new values into them; in the meantime each one keeps a hidden cache of the descriptor data (8 bytes for 32-bit mode and 16 bytes for 64-bit, besides the writable 16-bit selector value in CS, SS, DS, ES, FS, GS...).

Well, if segment registers are modified only when we do far jumps with an immediate segment value or when we write segment registers, why not just create the GDT once, load all segment registers once with a flat address, free up the GDTR/GDT table memory and don't modify segment registers again, relying on the hidden selector cache which every segment register has?

The same goes for the GDTR: if we use a flat memory model we only need to load the segment registers once, then completely free the memory containing the GDTR and GDT, and never modify those values again unless we want to do something very special like switching to another CPU mode (16, 32, 64).

In case that operations like interrupts or IRET effectively reload segment registers, why not just keep a GDT with at most the NULL selector and a default code segment just to reload the same segment only so that the system isn't left without a GDT if it ever needs that... and we could test whether we can free the GDT completely and still perform far jumps, far calls, interrupts and IRETs but using CS: instead of an immediate value?

As for data and stack, it's supposed that we would never modify those segment values again after startup. Then we would either have to run in Ring 0, or control privileges with paging to assign privilege levels to pages on top of a flat Ring 0 code/data 32 or 64-bit address space.

It seems that it could give us a good memory saving, and a great simplification of the system as we would ensure that we wouldn't rely anymore on segment selectors (unless we need to run V86, change selectors, keep hardware-based tasks, and the like, which could as well be expected, but we could probably free up the memory holding GDTR/GDT table every time, as long as we don't do those things).
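For reference, the "flat" GDT this post talks about is tiny to begin with. The following is a minimal sketch, assuming the legacy 8-byte descriptor format for 32-bit protected mode; `make_descriptor`, `flat_gdt` and `build_flat_gdt` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Pack a legacy 8-byte segment descriptor from its fields.
 * base:   32-bit linear base address
 * limit:  20-bit limit (counted in 4 KiB units when the G flag is set)
 * access: access byte (present bit, DPL, type)
 * flags:  4-bit flags nibble (G, D/B, L, AVL) */
static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);             /* limit 15:0   */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* base 23:0    */
    d |= (uint64_t)access << 40;                  /* access byte  */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* limit 19:16  */
    d |= (uint64_t)(flags & 0xFu) << 52;          /* G, D/B, L, AVL */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;  /* base 31:24   */
    return d;
}

/* Hypothetical minimal flat table: null, ring 0 code, ring 0 data. */
static uint64_t flat_gdt[3];

static void build_flat_gdt(void)
{
    flat_gdt[0] = 0;                                       /* null descriptor    */
    flat_gdt[1] = make_descriptor(0, 0xFFFFF, 0x9A, 0xC);  /* 4 GiB code, ring 0 */
    flat_gdt[2] = make_descriptor(0, 0xFFFFF, 0x92, 0xC);  /* 4 GiB data, ring 0 */
}
```

The whole table in this sketch is 24 bytes, which gives a sense of how much memory freeing it could recover.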

Octocontrabass · Post by **Octocontrabass** » Mon Oct 24, 2016 3:40 pm

~ wrote:Well, if segment registers are modified only when we do far jumps with an immediate segment value or when we write segment registers, why not just create the GDT once, load all segment registers once with a flat address, free up the GDTR/GDT table memory and don't modify segment registers again, relying on the hidden selector cache which every segment register has?

Because you still need to handle interrupts, and you can't do that without a GDT.

~ wrote:In case that operations like interrupts or IRET effectively reload segment registers, why not just keep a GDT with at most the NULL selector and a default code segment just to reload the same segment only so that the system isn't left without a GDT if it ever needs that...

Because then you waste time filling up a new GDT every time you need to load a segment register, which will be very often if you're running code in ring 3.

~ wrote:and we could test whether we can free the GDT completely and still perform far jumps, far calls, interrupts and IRETs but using CS: instead of an immediate value?

How do you put a CS prefix on an interrupt?

~ wrote:It seems that it could give us a good memory saving,

Of how much? Less than a kilobyte? Why should anyone waste their time on such a large amount of work for a useless "optimization" like that?
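The point about interrupts can be made concrete by looking at what an IDT entry holds. This is a sketch of a 32-bit protected-mode interrupt gate; the struct and `make_interrupt_gate` are illustrative names, not taken from any particular kernel. Every gate carries a code segment selector, and the CPU looks that selector up in the GDT (or LDT) each time it delivers the interrupt.

```c
#include <stdint.h>

/* 32-bit protected-mode interrupt gate (8 bytes). */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 15:0 */
    uint16_t selector;     /* code segment selector, resolved via GDT/LDT */
    uint8_t  reserved;     /* must be zero */
    uint8_t  type_attr;    /* present bit, DPL, gate type */
    uint16_t offset_high;  /* handler address bits 31:16 */
};

static struct idt_gate make_interrupt_gate(uint32_t handler, uint16_t selector)
{
    struct idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.reserved    = 0;
    g.type_attr   = 0x8E;  /* present, DPL 0, 32-bit interrupt gate */
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}
```

Because the selector is part of the gate, a freed GDT would leave every gate pointing at memory that no longer holds a valid descriptor.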

~ · Post by ~ » Mon Oct 24, 2016 4:09 pm

Octocontrabass wrote:
~ wrote:Well, if segment registers are modified only when we do far jumps with an immediate segment value or when we write segment registers, why not just create the GDT once, load all segment registers once with a flat address, free up the GDTR/GDT table memory and don't modify segment registers again, relying on the hidden selector cache which every segment register has?
Because you still need to handle interrupts, and you can't do that without a GDT.

In this way we could create a temporary discardable GDT only to load segment registers once at boot time or at CPU mode switch and no more. Technically the GDT would still be contained in every individual segment register loaded (we could load them all with flat memory selectors and then use paging for better management). We could probably use only paging to separate stuff into different privileges, and leave the GDT purely in Ring 0.

Wouldn't that be the same as fully switching to paging-based virtual addressing and access control, leaving the GDT only as the most basic backward-compatible runtime structure the CPU requires? We would still be able to modify the GDT per application when needed, for example for V86 or hardware multitasking. That is probably why Win9x made mistakes with segment selector register values: blue screens occurred, and sometimes it could rebuild the selectors and sometimes not. The same thing could always be done more stably and more cleanly.

Octocontrabass wrote:
~ wrote:In case that operations like interrupts or IRET effectively reload segment registers, why not just keep a GDT with at most the NULL selector and a default code segment just to reload the same segment only so that the system isn't left without a GDT if it ever needs that...
Because then you waste time filling up a new GDT every time you need to load a segment register, which will be very often if you're running code in ring 3.

It would probably be possible to just have a flat memory space in Ring 0, and define memory pages with different privilege ring levels. Then it would look like it would become unnecessary to keep modifying segment registers.

Octocontrabass wrote:
~ wrote:and we could test whether we can free the GDT completely and still perform far jumps, far calls, interrupts and IRETs but using CS: instead of an immediate value?
How do you put a CS prefix on an interrupt?

I had the impression that IRET could cause that segment register reloading, or an exception, or that it only takes place when the 16-bit segment value changes, but if not, much better for 32 and 64-bit modes. Those are the kind of tiny bits of knowledge, of fragile yet raw-hardware-grade optimizations that operating systems development seeks... Another thing to try out thoroughly to an extreme...

Octocontrabass wrote:
~ wrote:It seems that it could give us a good memory saving,
Of how much? Less than a kilobyte? Why should anyone waste their time on such a large amount of work for a useless "optimization" like that?

Depending on the number of things to handle at a time, we could now use only 3, 4 or 5 selectors for the whole system, or even up to eight at the very most (NULL, default code, default data, dynamic selector, second dynamic selector). In this way we could reuse them for each application if ever needed instead of getting into the complication of figuring out how much memory to reserve for selectors at boot time, or how to allocate more memory for the GDT if we needed that. So that way to manage the GDT and recalculate as needed (truly not that often) seems to have real and big advantages over naively defining lots and lots of in-memory selectors instead of exploiting the hidden selector part of the segment registers, as we do with the reuse of the General Purpose Registers throughout the whole system binaries running without running into trouble. We would be missing that optimization.

hgoel · Post by **hgoel** » Mon Oct 24, 2016 4:13 pm

While I agree that there isn't much of a point in worrying about a few bytes when we have huge amounts of memory, I think you might find the idea of using the NULL entry in the GDT to store the GDT Base neat. I can't find the link to the article where I read about it, but it is in my opinion a neat trick. You basically use the fact that a segment register can't actually reference the NULL descriptor and instead of just leaving it blank, use it to keep the GDT base data, saving a few bytes. (The article was from the time when a few bytes did matter)
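A rough sketch of that trick, assuming 32-bit protected mode where the LGDT operand is 6 bytes (a 16-bit limit followed by a 32-bit base) and therefore fits inside the 8-byte null slot. The names `gdt` and `store_gdtr_in_null_slot` are made up for this example; in 64-bit mode the operand grows to 10 bytes and no longer fits in a single slot.

```c
#include <stdint.h>
#include <string.h>

#define GDT_ENTRIES 3

/* gdt[0] is the null slot; its contents are not used as a descriptor. */
static uint64_t gdt[GDT_ENTRIES];

/* Write the 6-byte LGDT operand (16-bit limit, 32-bit base) into gdt[0].
 * linear_base is the linear address where the table will live; a hosted
 * test can pass any value. The last 2 bytes of the slot stay zero. */
static void store_gdtr_in_null_slot(uint32_t linear_base)
{
    uint8_t image[8] = {0};
    uint16_t limit = (uint16_t)(sizeof gdt - 1);

    image[0] = (uint8_t)(limit & 0xFF);
    image[1] = (uint8_t)(limit >> 8);
    image[2] = (uint8_t)(linear_base & 0xFF);
    image[3] = (uint8_t)((linear_base >> 8) & 0xFF);
    image[4] = (uint8_t)((linear_base >> 16) & 0xFF);
    image[5] = (uint8_t)((linear_base >> 24) & 0xFF);

    memcpy(&gdt[0], image, sizeof image);
}
```

A boot sector using this layout would then execute LGDT with the address of the table itself as its memory operand.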

~ · Post by ~ » Mon Oct 24, 2016 4:27 pm

hgoel wrote:While I agree that there isn't much of a point in worrying about a few bytes when we have huge amounts of memory, I think you might find the idea of using the NULL entry in the GDT to store the GDT Base neat. I can't find the link to the article where I read about it, but it is in my opinion a neat trick. You basically use the fact that a segment register can't actually reference the NULL descriptor and instead of just leaving it blank, use it to keep the GDT base data, saving a few bytes. (The article was from the time when a few bytes did matter)

You can read about it in the following topic. Look at Point 2 (Enabling Simple Protected Mode); it's there:
Understanding a Good Generic FAT12 Floppy Boot Sector

The trick to store the GDTR into the NULL selector is used in the Protected Mode capable boot sectors found at osdever.net (for example bootf02 by John Fine):
http://devel.archefire.org/mirrors/osde ... loads.html

gerryg400 · Post by **gerryg400** » Mon Oct 24, 2016 5:53 pm

hgoel wrote:While I agree that there isn't much of a point in worrying about a few bytes when we have huge amounts of memory, I think you might find the idea of using the NULL entry in the GDT to store the GDT Base neat. I can't find the link to the article where I read about it, but it is in my opinion a neat trick. You basically use the fact that a segment register can't actually reference the NULL descriptor and instead of just leaving it blank, use it to keep the GDT base data, saving a few bytes. (The article was from the time when a few bytes did matter)

It's a stupid idea like most 'neat tricks'. It's how system code quickly becomes unmaintainable. To save a few bytes and keep code readable and maintainable, store the GDT base on the stack or heap so that it will be freed when it's no longer needed.

~ · Post by ~ » Mon Oct 24, 2016 6:18 pm

Using the NULL descriptor to store the GDTR is a widely used trick in tiny systems, and it's very easy to understand.

One wonders why there is a NULL selector. If it isn't because of an error like dividing by 0, then it kind of leaves open the supposition that it could have been intended to store some custom data, most likely the GDTR. So it's a good trick that is used at least in several boot sectors and programs that enter Protected or Unreal Mode.

It's a stable feature and it can keep all GDT structures in a single packaged data buffer, and the first 8 or probably 16 bytes (16 bytes for 64-bit) of the GDT are always unused, so it's OK.

gerryg400 · Post by **gerryg400** » Mon Oct 24, 2016 6:51 pm

~ wrote:Using the NULL descriptor to store the GDTR is a widely used trick in tiny systems, and it's very easy to understand.

One wonders why there is a NULL selector. If it isn't because of an error like dividing by 0, then it kind of leaves open the supposition that it could have been intended to store some custom data, most likely the GDTR. So it's a good trick that is used at least in several boot sectors and programs that enter Protected or Unreal Mode.

It's a stable feature and it can keep all GDT structures in a single packaged data buffer, and the first 8 or probably 16 bytes (16 bytes for 64-bit) of the GDT are always unused, so it's OK.

I completely disagree with your logic. The information in the GDTR is only needed very briefly. If you were _really_ trying to save memory you would store the GDTR on the stack or heap where it could be freed immediately after use. You could use the few spare bytes in the GDT for data that is required permanently.

As I said, it is a stupid idea.

~ · Post by ~ » Mon Oct 24, 2016 10:52 pm

What would we do with the unused GDT bytes then?

Why are those bytes unused to begin with?

Using that selector or another packing method provides the same possibility to free up memory. It feels easier to have that structure packed in a single place.

I wonder why the NULL selector really exists, but it feels like this is one of the reasons, so if it's a feature of the CPU itself it cannot be a bad idea to use it. Probably the CPU internally uses the slot for selector 0 in an internal GDT space to store the GDTR, so it really doesn't matter. It isn't something critical that deserves this much attention once it has been implemented stably.

In any case, none of this answers whether it's a stable thing to do to just load flat Ring 0 segments into all segment registers, free up the GDT and forget it (or just keep default GDT segments), and use only paging for assigning application privileges system-wide.

Octocontrabass · Post by **Octocontrabass** » Tue Oct 25, 2016 1:40 am

~ wrote:In this way we could create a temporary discardable GDT only to load segment registers once at boot time or at CPU mode switch and no more.

You need a GDT to handle interrupts. Interrupts will reload the segment registers.

~ wrote:We could probably use only paging to separate stuff into different privileges, and leave the GDT purely in Ring 0.

~ wrote:It would probably be possible to just have a flat memory space in Ring 0, and define memory pages with different privilege ring levels. Then it would look like it would become unnecessary to keep modifying segment registers.

Ring 0 always has permission to bypass privilege checks. You need ring 3 to enforce privilege levels.

~ wrote:I had the impression that IRET could cause that segment register reloading, or an exception, or that it only takes place when the 16-bit segment value changes, but if not, much better for 32 and 64-bit modes. Those are the kind of tiny bits of knowledge, of fragile yet raw-hardware-grade optimizations that operating systems development seeks... Another thing to try out thoroughly to an extreme...

IRET always reloads CS and SS.

~ wrote:Depending on the number of things to handle at a time, we could now use only 3, 4 or 5 selectors for the whole system, or even up to eight at the very most (NULL, default code, default data, dynamic selector, second dynamic selector).

The minimum for a working system is 6: null, ring 0 code, ring 0 data, ring 3 code, ring 3 data, TSS. Most kernels use a few more.

~ wrote:In this way we could reuse them for each application if ever needed instead of getting into the complication of figuring out how much memory to reserve for selectors at boot time, or how to allocate more memory for the GDT if we needed that.

Unless you are doing something extremely complex, your GDT will be a fixed size, and you can statically allocate the memory for it. You don't need to do anything to figure out how big it will be, and you don't need to allocate more memory for it.

~ wrote:So that way to manage the GDT and recalculate as needed (truly not that often) seems to have real and big advantages over naively defining lots and lots of in-memory selectors instead of exploiting the hidden selector part of the segment registers, as we do with the reuse of the General Purpose Registers throughout the whole system binaries running without running into trouble. We would be missing that optimization.

It's not an optimization, it's a waste of time. Modern operating systems use fewer than 10 selectors. There is nothing to gain by trying to reduce it further.

~ wrote:What would we do with the unused GDT bytes then?

Nothing. If you truly need 8 or 16 bytes of memory, you can find it elsewhere.

~ wrote:Why are those bytes unused to begin with?

Intel wanted selector 0 to be the null selector, and leaving an empty space in the GDT was the simplest way to do it.

~ wrote:In any case, none of this answers whether it's a stable thing to do to just load flat Ring 0 segments into all segment registers, free up the GDT and forget it (or just keep default GDT segments), and use only paging for assigning application privileges system-wide.

You still need ring 3 to enforce privilege levels.
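The reason paging alone cannot express "privilege rings" can be sketched as follows. This is a deliberately simplified model of the user/supervisor check only: it ignores write protection, NX, SMEP and SMAP, and `page_access_allowed` is a hypothetical helper, not a real API.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u   /* bit 0: page is present */
#define PTE_USER    0x4u   /* bit 2 (U/S): set = user-accessible page */

/* Simplified model of the paging user/supervisor check. */
static bool page_access_allowed(unsigned cpl, uint32_t pte)
{
    if (!(pte & PTE_PRESENT))
        return false;              /* not-present pages always fault */
    if (cpl < 3)
        return true;               /* supervisor mode: U/S is not enforced */
    return (pte & PTE_USER) != 0;  /* user mode: page must be marked user */
}
```

Paging offers a single user/supervisor bit per page, and it only restricts code whose CPL is 3. Code running at ring 0 passes the check for every present page, so the protection has nothing to act on unless ring 3 segments exist.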

issamabd · Post by **issamabd** » Tue Oct 25, 2016 2:44 am

Hi,

This is what I found in the Intel 80386 manual about the GDT's null first entry:

Because the first entry of the GDT is not used by the processor, a selector
that has an index of zero and a table indicator of zero (i.e., a selector
that points to the first entry of the GDT), can be used as a null selector.
The processor does not cause an exception when a segment register (other
than CS or SS) is loaded with a null selector. It will, however, cause an
exception when the segment register is used to access memory. This feature
is useful for initializing unused segment registers so as to trap accidental
references.
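To illustrate the quoted passage, here is a small sketch of how a 16-bit selector splits into its fields and how the null selector can be recognised. The helper names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 16-bit selector: bits 15:3 index, bit 2 table indicator, bits 1:0 RPL. */
static unsigned selector_index(uint16_t sel) { return sel >> 3; }
static unsigned selector_ti(uint16_t sel)    { return (sel >> 2) & 1u; }
static unsigned selector_rpl(uint16_t sel)   { return sel & 3u; }

/* Null selector: index 0 in the GDT (TI = 0), whatever the RPL bits are. */
static bool is_null_selector(uint16_t sel)
{
    return selector_index(sel) == 0 && selector_ti(sel) == 0;
}
```

For example, `0x0000` and `0x0003` are both null selectors under this test, while `0x0004` (index 0 with the table indicator set) refers to the first LDT entry and `0x0008` refers to GDT entry 1.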

Kevin · Post by **Kevin** » Tue Oct 25, 2016 5:43 am

~ wrote:Depending on the number of things to handle at a time, we could now use only 3, 4 or 5 selectors for the whole system, or even up to eight at the very most (NULL, default code, default data, dynamic selector, second dynamic selector)

As it happens, this is already the number of descriptors that most people have without playing stupid tricks: Null descriptor, kernel code, kernel data, user code, user data, TSS (which you forgot, but you'll still need). I like to have another TSS for double faults, but this is strictly optional. Six descriptors, each eight bytes, that's 48 bytes. Not a whole lot of memory to save anyway.

~ wrote:What would we do with the unused GDT bytes then?

If you really badly want to use them for something else, allocate memory only for the GDT entries after the null descriptor and lgdt at offset -8. But putting something there that isn't used any more than the null descriptor doesn't really save memory (assuming you do a lgdt once on startup and then never again, like most people).
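The "lgdt at offset -8" idea amounts to a small address calculation. The sketch below assumes 8-byte descriptors and a 32-bit base; `gdtr_image` and `gdtr_without_null` are hypothetical names, and the struct is an unpacked illustration rather than the exact 6-byte operand LGDT expects.

```c
#include <stdint.h>

#define DESCRIPTOR_SIZE 8u

struct gdtr_image {
    uint16_t limit;  /* table size in bytes, minus one */
    uint32_t base;   /* linear address the CPU treats as entry 0 */
};

/* first_real: linear address of the first stored descriptor (entry 1).
 * stored:     number of descriptors actually kept in memory. */
static struct gdtr_image gdtr_without_null(uint32_t first_real, unsigned stored)
{
    struct gdtr_image r;
    /* Point the base 8 bytes before the stored entries; entry 0 is not
     * used by the CPU, so that memory can belong to something else. */
    r.base  = first_real - DESCRIPTOR_SIZE;
    r.limit = (uint16_t)((stored + 1) * DESCRIPTOR_SIZE - 1);
    return r;
}
```

With the five non-null descriptors listed above stored at some address, the limit still covers six slots (48 bytes, so a limit of 47) and the selector values stay the same as with a conventional table.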

~ · Post by ~ » Tue Oct 25, 2016 3:23 pm

Probably it looks wrong to you because it isn't using standard C or C++ stack, heap or memory allocation concepts.

But it's about one of the things that is just about as low level as the x86 software can go.

So I like to take advantage of any tricks allowed and supported by the whole x86 family that are actual optimizations for that architecture.

I also like to use the proper standard methods when I deal purely with end-user programs.

So I like to write low-level code that is optimized for the hardware, and if I know how it's actually implemented, mirror that, no matter how much I ignore higher-level standards, as is logical.

And I like to write end-user code that is high-level and completely separated from the details of the machine, optimized for the elements that are really relevant for the implementation.

The end goal is that I can write end-user code that looks easy to read and that is actually easily portable anywhere, while also writing but keeping separate the platform, hardware and CPU-specific code with optimizations of all kinds that better serve the underlying machine (making it portable across modes to use the same code for 16/32/64, compressing instructions by using the result of previous instructions without generating it again by hand). The end-user program would only be an interface, a bag or a shell to use the resources of different types of machines with the same high-level, highly-abstracted script, but the lowest level software functions must mirror the hardware only to present a virtual executable model of it to the executing CPU exactly as is while being as thin and light as possible, not bloat what is already optimum from the electronics.

I don't see what is "stupid" here by doing things this way, when they have been done like this from the start and have proven to be considerably more efficient than just thinking everything in high-level language terms.

FusT · Post by **FusT** » Wed Oct 26, 2016 12:25 am

So basically you're trying to create a microkernel that is able to run platform-independent code (like Java/PHP/<insert your favorite high-level virtualized/interpreted language here>)?
Even then, why optimize code that only saves a couple of bytes while (fairly) modern machines all have huge amounts of RAM available?
Even if you were to develop a system that can run on old (not ancient) machines, you'd still have way more than 128MB of RAM so a few bytes (or kilobytes) that only get created and used once don't matter.

~ wrote:The end-user program would only be an interface, a bag or a shell to use the resources of different types of machines with the same high-level, highly-abstracted script, but the lowest level software functions must mirror the hardware only to present a virtual executable model of it to the executing CPU exactly as is while being as thin and light as possible, not bloat what is already optimum from the electronics.

This is (very) basically what e.g. the Java VM does, so why not just build a tiny kernel that can run such a VM/interpreter and then implement all "end-user code" in that high-level, abstracted language?

Brendan · Post by **Brendan** » Wed Oct 26, 2016 12:47 am

Hi,

Random notes...

It's perfectly safe to have no GDT at all, with the following restrictions:

- No task gates (for protected mode), or no "IST mechanism" for long mode; which means no reliable way to recover from things like double fault exceptions
- No emulation of different environments (e.g. Wine, virtual-8086 mode, etc.)
- No support for 32-bit processes running under a 64-bit version of the same OS
- Only ever use one privilege level (e.g. everything running at CPL=0) with no security at all (including "software based security", which counts as "no security" due to bugs and/or vulnerabilities in compiler and/or VM and/or hardware)

It's potentially possible (with even more severe limitations) to work around the last restriction (and use 2 privilege levels without a GDT); because SYSCALL/SYSRET (and SYSENTER/SYSEXIT) don't need a GDT anyway. For interrupts and exceptions you can:

- Set "IDT.limit" to zero, so that you get a triple fault when any kind of interrupt or exception occurs. This might be fine for something like a game console where the OS is in ROM.
- For protected mode only: have interrupt and exception handlers at CPL=3 (e.g. in a shared library, maybe). These could do nothing more than call the kernel (via SYSCALL or SYSENTER), where the kernel still handles the cause of the interrupt or exception. Note: this is only "in theory", because in practice you'd want an IDT for CPL=0 and an additional IDT for CPL=3, and the additional IDT will cost more than the GDT you're trying to avoid would have.

The NULL descriptor in the GDT doesn't need to exist at all. For example, the GDT can begin 8 bytes from the end of a "not present" page, so that the second GDT descriptor begins on the next page (which would be present).

An OS typically uses a GDT entry per CPU for that CPU's TSS, plus a GDT entry per CPU for that CPU's thread local storage. With 123 CPUs this would add up to 251 GDT entries (or almost 2 KiB). Both of these "per CPU descriptors" are avoidable in multiple ways.
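The arithmetic behind those figures can be sketched like this. The count of five fixed entries is an assumption inferred from the numbers above (null, kernel code, kernel data, user code, user data), and the 8-byte entry size applies to protected mode; in long mode a TSS descriptor takes 16 bytes, so the total would be larger.

```c
#include <stddef.h>

#define DESCRIPTOR_SIZE 8u
#define FIXED_ENTRIES   5u  /* assumption: null + kernel code/data + user code/data */
#define PER_CPU_ENTRIES 2u  /* one TSS descriptor + one TLS descriptor per CPU */

/* Number of GDT entries needed for a given CPU count. */
static size_t gdt_entries(size_t cpus)
{
    return FIXED_ENTRIES + PER_CPU_ENTRIES * cpus;
}

/* Size of that GDT in bytes, assuming 8-byte descriptors throughout. */
static size_t gdt_bytes(size_t cpus)
{
    return gdt_entries(cpus) * DESCRIPTOR_SIZE;
}
```

For 123 CPUs this gives 5 + 2 × 123 = 251 entries and 251 × 8 = 2008 bytes, just under 2 KiB.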

Cheers,

Brendan

OSDev.org
