Code selectors and data selectors

Neob91 · Post by **Neob91** » Wed Sep 08, 2010 3:26 pm

Hello. I am preparing to start writing my own OS (I want to have it all figured out before I actually start writing anything). I wonder: why can't I have a GDT entry that describes a segment in which both data and code can reside (i.e. memory that can be both modified and executed)? And how can I know where an application keeps its data and where it keeps its code, so that I can tell them apart? It's very unclear to me. I would appreciate your help very much.

Combuster · Post by **Combuster** » Wed Sep 08, 2010 5:24 pm

A segment describes a location and a purpose. A segment can be either code or data, but not both (a code segment can at most be marked readable, and a data segment can never be executed); after all, using data as code or vice versa is a security problem in 99% of cases. Alternatively, a segment can also point to a system structure, like the TSS or LDT.
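To make the code/data distinction concrete, here is a minimal C sketch that decodes the access byte (byte 5) of a descriptor. The struct and function names are hypothetical; the bit positions follow the Intel/AMD manuals. 0x9A and 0x92 are the code and data access bytes used in the GDT posted later in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Properties encoded in the access byte (byte 5) of a GDT descriptor.
   Bit 2 (conforming/expand-down) and bit 0 (accessed) are not decoded here. */
typedef struct {
    bool present;    /* bit 7 (P) */
    unsigned dpl;    /* bits 6..5: descriptor privilege level */
    bool system;     /* bit 4 (S) clear: TSS, LDT, gate, ... */
    bool executable; /* bit 3: set for code, clear for data */
    bool readable;
    bool writable;
} seg_access;

static seg_access decode_access(uint8_t a) {
    seg_access s;
    s.present    = (a & 0x80) != 0;
    s.dpl        = (a >> 5) & 3u;
    s.system     = (a & 0x10) == 0;
    s.executable = !s.system && (a & 0x08) != 0;
    if (s.system) {
        /* System descriptors do not describe addressable code or data. */
        s.readable = s.writable = false;
    } else if (s.executable) {
        /* Code: bit 1 = readable; code is never writable. */
        s.readable = (a & 0x02) != 0;
        s.writable = false;
    } else {
        /* Data: always readable; bit 1 = writable; never executable. */
        s.readable = true;
        s.writable = (a & 0x02) != 0;
    }
    return s;
}
```

Note that no combination of bits yields a segment that is both writable and executable, which is exactly the restriction asked about above.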

At the other end, the CPU holds a set of segment registers: CS for code, and DS, ES, FS, GS and SS for data. Since all code fetches are relative to CS, and data accesses are relative to DS, ES or SS depending on context (or FS and GS with an override prefix), the address space is separated by design. You can use this to tighten security by not allowing write access to code and not allowing data to be executed, as long as the area covered by CS does not overlap the areas covered by any of the other segment registers. The CPU enforces the segment types: only code segments can be loaded into CS, only writable data segments into SS, and DS, ES, FS and GS accept data segments or readable code segments (which can then be read, but never written, through them).
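A rough model of what the CPU does for every memory access through a segment register may help here. This is a simplified C sketch, assuming expand-up segments and leaving out the type and privilege checks; `seg_cache` and `translate` are hypothetical names standing in for the descriptor data the CPU caches when a segment register is loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Descriptor data cached when a segment register is loaded (simplified). */
typedef struct {
    uint32_t base;
    uint32_t limit;      /* 20-bit limit field from the descriptor */
    bool     granular;   /* G flag: limit counted in 4 KiB units */
} seg_cache;

static uint32_t effective_limit(const seg_cache *s) {
    uint32_t l = s->limit & 0xFFFFFu;
    return s->granular ? (l << 12) | 0xFFFu : l;
}

/* Translate seg:offset for an access of `size` bytes (size >= 1).
   Returns false where the CPU would raise #GP (or #SS for stack accesses). */
static bool translate(const seg_cache *s, uint32_t offset, uint32_t size,
                      uint32_t *linear) {
    assert(size >= 1);
    uint64_t last = (uint64_t)offset + size - 1;  /* avoid 32-bit overflow */
    if (last > effective_limit(s))
        return false;
    *linear = s->base + offset;  /* unsigned addition wraps modulo 2^32 */
    return true;
}
```

With a flat segment (base 0, limit 0xFFFFF, G set) every offset maps to the same linear address, which is the setup described in the next paragraph.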

The thing is, some people consider segmentation an obsolete relic of the past, and others don't understand its potential at all. Altogether, 99% of people ignore segments and use paging alone (or no protection at all). Since segmentation cannot be disabled, they simply set the code and data segments to the maximum size (4 GiB) with a base of zero, so that every address is the same before and after segmentation and only the effects of paging apply. In fact, many compilers assume just that: they expect a flat address space with no segmentation, and they do expect paging.

You need to mess with linker scripts the moment that assumption does not hold (like a kernel being loaded into an environment without paging), and the cases where the code segment does not map to the same area as the data segments are especially nasty. It is possible, but it's also a lot of work that doesn't add any user-visible features. In any case, you'll have to know exactly what you want if you stray from the "default".
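For the flat setup described above, each descriptor uses base 0, a 20-bit limit of 0xFFFFF and the granularity flag, giving a 4 GiB segment. The following C sketch packs a descriptor into its 8-byte form; `make_descriptor` is a hypothetical helper, and the flags nibble 0xC assumes G=1 (4 KiB granularity) and D/B=1 (32-bit).

```c
#include <assert.h>
#include <stdint.h>

/* Pack a segment descriptor in the layout the CPU expects:
   bytes 0-1 limit 15:0, bytes 2-4 base 23:0, byte 5 access,
   byte 6 = flags (high nibble) | limit 19:16 (low nibble), byte 7 base 31:24. */
static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags) {
    assert(limit <= 0xFFFFF && flags <= 0xF);
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);
    d |= (uint64_t)(base & 0xFFFFFF) << 16;
    d |= (uint64_t)access << 40;
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;
    d |= (uint64_t)(flags & 0xF) << 52;
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;
    return d;
}
```

Byte 6 (counting from 0) holds the flags in its high nibble and limit bits 16..19 in its low nibble, so a flat 32-bit descriptor has 0xCF there. The GDT in the code posted later in this thread has 0xFC in that position, which swaps the two nibbles.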

Neob91 · Post by **Neob91** » Wed Sep 08, 2010 6:01 pm

Thank you for your response. Correct me if I'm wrong, please, because I am still not sure about several things. From what I know, x86-32 starts in 16-bit real mode. To use 32-bit addressing and instructions, it's necessary to enter protected mode. When in protected mode, instead of using physical segment numbers, offsets of GDT entries have to be used. Therefore, if I have binary data and code in the memory, I have to set up 2 GDT entries for this (both describing the same memory space, one being a data segment, the other a code segment and put them in CS and DS), so that the code will be able to access the data correctly. (or is there a better way?)

If what I wrote is true, then I believe I won't have further questions for a while.

Brendan · Post by **Brendan** » Wed Sep 08, 2010 6:44 pm

Hi,

> Neob91 wrote: From what I know, x86-32 starts in 16-bit real mode.

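Modern x86 CPUs actually start executing firmware code at 0xFFFFFFF0. Strictly speaking the CPU comes out of reset in real mode (with a special CS base of 0xFFFF0000, which is what makes that address reachable), but modern firmware switches to protected mode almost immediately. For "PC BIOS", the firmware then configures the chipset, does RAM detection, does a Power On Self Test (POST), etc., and copies part of itself to the legacy area (from 0x000F0000 to 0x000FFFFF). Eventually it switches back to real mode (for backward compatibility) before starting the OS.

Different types of firmware may be different. For example, UEFI firmware can switch to protected mode (or long mode, for 64-bit UEFI) early and stay there the entire time, including while the OS loader runs.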

> Neob91 wrote: To use 32-bit addressing and instructions, it's necessary to enter protected mode.

Technically, you can use both 32-bit addressing and 32-bit instructions in real mode. However, (like protected mode) you can't access anything beyond a segment's limit, and (unlike protected mode) the segment limits are fixed at 64 KiB in real mode. For example, you can do "mov eax,[ebx+0x00001234+ecx*4]" in real mode, but you will get a general protection fault if the values in EBX or ECX are so high that the accessed bytes extend beyond offset 0xFFFF. It would be nice (although not strictly required) to test if the CPU is "80386 or later" before using any 32-bit instructions; that way, if the CPU is ancient, you can display a nice error message rather than crashing.
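Real-mode addressing can be sketched the same way. This C function is a hypothetical illustration that ignores the A20 gate and "unreal mode": it computes segment * 16 + offset and rejects any access whose bytes run past offset 0xFFFF, which is where the CPU would fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real mode: linear = segment * 16 + offset. The offset may be computed with
   32-bit registers (e.g. [ebx+0x1234+ecx*4]), but every byte accessed must
   still lie within the fixed 64 KiB limit, or the CPU raises #GP (#SS for SS). */
static bool real_mode_linear(uint16_t segment, uint32_t offset, uint32_t size,
                             uint32_t *linear) {
    assert(size >= 1);
    if ((uint64_t)offset + size - 1 > 0xFFFF)
        return false;
    *linear = ((uint32_t)segment << 4) + offset;  /* at most 0x10FFEF */
    return true;
}
```

For instance, with segment 0, EBX = 0x100 and ECX = 2, `[ebx+0x00001234+ecx*4]` is offset 0x133C and is accepted, while EBX = 0x10000 pushes the offset past the limit.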

> Neob91 wrote: When in protected mode, instead of using physical segment numbers, offsets of GDT entries have to be used. Therefore, if I have binary data and code in the memory, I have to set up 2 GDT entries for this (both describing the same memory space, one being a data segment, the other a code segment and put them in CS and DS), so that the code will be able to access the data correctly. (or is there a better way?)

If you want to be able to write to the data (i.e. not read-only), then you must have at least 2 GDT entries: one for the code and one for the data. There is no other way.

Notes: The very first GDT entry is the "null" descriptor and can't be used for anything else (so you actually need 3 entries, where the first is ignored by the CPU). A lot of OSs set the base address of each segment to zero and the limit to 4 GiB, so that segmentation is effectively disabled, and then use paging for everything. A lot of OSs also use 2 privilege levels, which requires a minimum of 4 GDT entries in addition to the null entry (CPL=0 code, CPL=0 stack/data, CPL=3 code, CPL=3 stack/data).
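The usual flat layout with two privilege levels might look like the following C sketch. The order of entries is an assumption (a common choice, not something the CPU requires), and `make_selector` is a hypothetical helper: a selector is the entry's byte offset in the table (index * 8), with the table-indicator bit and the requested privilege level (RPL) in the low 3 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical GDT layout for a flat model with two privilege levels. */
enum gdt_index {
    GDT_NULL        = 0,  /* never used by the CPU */
    GDT_KERNEL_CODE = 1,  /* DPL 0, access 0x9A */
    GDT_KERNEL_DATA = 2,  /* DPL 0, access 0x92 (also used for SS) */
    GDT_USER_CODE   = 3,  /* DPL 3, access 0xFA */
    GDT_USER_DATA   = 4,  /* DPL 3, access 0xF2 */
    GDT_ENTRIES     = 5
};

/* selector = index * 8 | TI (0 = GDT, 1 = LDT) << 2 | RPL */
static uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl) {
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

Selectors for ring 3 code and data are usually built with RPL 3, which gives 0x1B and 0x23 in this layout; the kernel selectors are 0x08 and 0x10.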

Cheers,

Brendan

Neob91 · Post by **Neob91** » Wed Sep 08, 2010 9:56 pm

Ok, I set up my GDT entries so that segmentation is effectively disabled and tried to print something on the screen from 32-bit mode. It doesn't work. Could you have a look at my code? I've been fighting with it for over an hour now...

```nasm
[ORG 0x7C00]
[BITS 16]
  jmp start

  gdtr dw 0
       dd 0

  gdt db 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
      db 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x9A, 0xFC, 0x00
      db 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x92, 0xFC, 0x00
[BITS 32]
start:
  cli ; disable interrupts

  mov ebx, dword[gdtr + 2]
  mov edx, gdt

  mov ecx, 6

  fillgdt: ; copy the three GDT entries to 0x7E00 (GDT location)
    mov eax, [edx]
    mov [ebx], eax

    add ebx, 4
    add edx, 4
    loop fillgdt

  mov ecx, 16378

  zerogdt: ; fill the remaining 8189 entries with zeros
    mov dword [eax], 0x0000
    add eax, 4
    add ebx, 4
    loop zerogdt

  xor eax, eax
  mov eax, 0x7E00 ; GDT location
  mov [gdtr + 2], eax
  mov eax, 0xFFFF ; GDT size minus 1 (65535)
  mov [gdtr], ax
  lgdt [gdtr]

  mov byte [0xB8000], 48
  mov byte [0xB8001], 7

  times 510-($-$$) db 0
  dw 0AA55h
```

gerryg400 · Post by **gerryg400** » Wed Sep 08, 2010 10:05 pm

> Neob91 wrote: ... It doesn't work ...

What doesn't work?

Unless you are very lucky, you need to give a lot more information if you expect an answer quickly.

Neob91 · Post by **Neob91** » Wed Sep 08, 2010 10:14 pm

Nothing is being printed on the screen. A '0' (ASCII 48) character is supposed to be printed on the screen, but it's not.

gerryg400 · Post by **gerryg400** » Wed Sep 08, 2010 10:26 pm

> Neob91 wrote: Nothing is being printed on the screen. A '0' (ASCII 48) character is supposed to be printed on the screen, but it's not.

Are you in protected mode? You need to set the protected mode bit (PE, bit 0 of CR0) to execute 32-bit code.

Neob91 · Post by **Neob91** » Wed Sep 08, 2010 10:41 pm

Ok, I added code to enter protected mode, but now MS Virtual PC says "An unrecoverable processor error has been encountered. The virtual machine will reset now."

Here's the new code:

```nasm
[ORG 0x7C00]
[BITS 16]
  jmp start

  gdtr dw 0
       dd 0

  gdt db 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
      db 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x9A, 0xFC, 0x00
      db 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x92, 0xFC, 0x00
[BITS 32]
start:
  cli ; disable interrupts

  mov ebx, dword[gdtr + 2]
  mov edx, gdt

  mov ecx, 6

  fillgdt: ; copy the three GDT entries to 0x7E00 (GDT location)
    mov eax, [edx]
    mov [ebx], eax

    add ebx, 4
    add edx, 4
    loop fillgdt

  mov ecx, 16378

  zerogdt: ; fill the remaining 8189 entries with zeros
    mov dword [eax], 0x0000
    add eax, 4
    add ebx, 4
    loop zerogdt

  xor eax, eax
  mov eax, 0x7E00 ; GDT location
  mov [gdtr + 2], eax
  mov eax, 0xFFFF ; GDT size minus 1 (65535)
  mov [gdtr], ax
  lgdt [gdtr]

  mov eax, cr0
  or al, 1
  mov cr0, eax

  mov ax, 16h
  mov ds, ax

  jmp 08h:7C00h+print

  print:
    mov byte [ds:0xB8000], 48
    mov byte [ds:0xB8001], 7

  times 510-($-$$) db 0
  dw 0AA55h
```

I'm not sure if the jump to print is done correctly.

gerryg400 · Post by **gerryg400** » Wed Sep 08, 2010 10:59 pm

The processor is in 16-bit mode until you set the PM bit and do a far jmp to the 32-bit code segment. Therefore the BITS 32 directive (or the code that sets the PM bit) is in the wrong place.

Brendan · Post by **Brendan** » Wed Sep 08, 2010 11:53 pm

Hi,

For NASM there are two forms for most directives. There's the user-level form (e.g. "org 0x7C00") that's intended for normal code, and there's a primitive form (e.g. "[org 0x7C00]") that's mostly intended for the assembler's internal use. In some cases the different forms are functionally equivalent, but in some cases they aren't, and in almost all cases (excluding tricky macros) you should use the user-level form and not the primitive form. For an example of where this matters, see "6.3 SECTION or SEGMENT: Changing and Defining Sections" in the manual.

For 80x86, there's a "default code size" which is determined by the attributes of the code segment (and is "16-bit" in real mode). The default size can be changed for a single instruction by prefixes: there's an operand size override prefix (0x66) and an address size override prefix (0x67). For example, in 16-bit code an instruction is 16-bit unless there's an override that makes it 32-bit; and in 32-bit code an instruction is 32-bit unless there's an override that makes it 16-bit. The assembler knows when an operand size override and/or address size override is needed (by checking if you've asked for a 32-bit instruction in 16-bit code, or for a 16-bit instruction in 32-bit code) and automatically inserts them for you. For this to work, you must make sure that the assembler knows what the default code size is. That's what the "bits 16" and "bits 32" directives are for. If you tell the assembler that the code is 32-bit when it's not (e.g. when you haven't changed CS to a 32-bit segment yet), then the assembler will fail to insert these overrides when they're needed and will insert them when they're not wanted.
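To see what the override prefix looks like in the output, here is a small C sketch of how an assembler might encode `mov eax, imm32` depending on the default code size. The function is hypothetical; opcode 0xB8 and prefix 0x66 are the real encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "mov eax, imm32" for a given default code size (16 or 32 bits).
   Opcode B8 means "mov (e)ax, imm" at the segment's default operand size,
   so 16-bit code needs the 0x66 operand-size prefix to get the 32-bit form.
   Returns the number of bytes written to out. */
static size_t encode_mov_eax_imm32(unsigned default_bits, uint32_t imm,
                                   uint8_t out[6]) {
    assert(default_bits == 16 || default_bits == 32);
    size_t n = 0;
    if (default_bits == 16)
        out[n++] = 0x66;                        /* operand-size override */
    out[n++] = 0xB8;                            /* B8+rd, rd = 0 for EAX */
    for (int i = 0; i < 4; i++)
        out[n++] = (uint8_t)(imm >> (8 * i));   /* little-endian immediate */
    return n;
}
```

If the 5-byte `bits 32` form runs while CS is still a 16-bit segment, the CPU decodes `B8 xx xx` as `mov ax, imm16` and then executes the two leftover immediate bytes as a separate instruction (for example, `00 00` is `add [bx+si], al`).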

You can load a GDT in real mode, but you must be in protected mode to use the GDT. To enter protected mode you need to set bit 0 (PE) in CR0. Right after setting this bit you'll still be executing 16-bit code, and you'd need to load CS with a 32-bit code segment (e.g. with a far jump).
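Because CS and DS now take selectors rather than real-mode segment numbers, it helps to be able to decode one. A minimal C sketch with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* The three fields packed into a 16-bit segment selector. */
typedef struct {
    unsigned index;  /* entry number in the GDT or LDT */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level */
} selector_fields;

static selector_fields decode_selector(uint16_t sel) {
    selector_fields f = { sel >> 3, (sel >> 2) & 1u, sel & 3u };
    return f;
}
```

Independently of the BITS placement problem, the value in `mov ax, 16h` in the second listing above is worth checking: 0x16 (22 decimal) decodes as index 2 with TI=1 (the LDT) and RPL 2. The GDT data segment at index 2 has selector 16 decimal, i.e. 0x10, while 0x08 is the code segment at index 1. Loading 0x16 into DS with no LDT set up would be expected to raise a general protection fault.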

I doubt you'll ever need all 8192 GDT entries (even a full-blown OS like Linux uses only a few dozen at most). You can reduce the size of the GDT by setting the GDT limit to whatever you need (e.g. "GDT.limit = number_of_entries * 8 - 1") and save some RAM. Also, boot code usually sets up a temporary GDT that's used until the kernel (or something else) replaces it with the real GDT; which means that to start with you probably only need a small GDT (with "null" and 2 actual entries), and there's no need to copy it elsewhere or allow space for any extra entries. You can also save 8 bytes by relying on the fact that the CPU never touches the first entry of the GDT and doesn't care what those first 8 bytes contain.
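The GDTR operand and the limit formula above can be written down as a C sketch. The packed attribute is a GCC/Clang extension, and `gdtr32` and `gdt_limit` are hypothetical names; the layout (a 16-bit limit followed by a 32-bit base) matches the `gdtr dw 0` / `dd 0` pair in the code earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* LGDT operand for 32-bit code: 16-bit limit, then 32-bit linear base.
   __attribute__((packed)) (GCC/Clang) prevents padding between the fields. */
struct __attribute__((packed)) gdtr32 {
    uint16_t limit;   /* table size in bytes minus 1 */
    uint32_t base;    /* linear address of entry 0 */
};

/* Limit for a GDT with `entries` 8-byte descriptors (1..8192). */
static uint16_t gdt_limit(unsigned entries) {
    assert(entries >= 1 && entries <= 8192);
    return (uint16_t)(entries * 8 - 1);
}
```

A 3-entry GDT therefore needs a limit of 23, while the 0xFFFF used in the code above reserves room for all 8192 entries (64 KiB).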

For instructions that reference memory (e.g. "mov ebx, dword[gdtr + 2]") there's an implied segment register involved (e.g. the DS segment register). Most instructions use DS as the implied segment register, but instructions that use EBP/BP or ESP/SP as a base register, like "mov eax,[esp]", use SS, and a few instructions like STOSD use ES. In almost all cases (except the ES operand of string instructions like STOSD) the implied segment register can be overridden with a segment override prefix.

When the BIOS starts your code, you can't assume that any of the segment registers contain any specific value. This means that an instruction like "mov ebx, dword[gdtr + 2]" (which relies on the "unknown" value in DS) may not access the memory location you expect, unless you explicitly set the segment register (e.g. set DS to zero) first. In this specific case, you could also set CS to zero (e.g. with a "jmp far 0x0000:start" at the start of the code) so that CS has a known value, and then use an override prefix (e.g. "mov ebx, [cs: gdtr + 2]") to use CS instead of DS.
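The effect of an unknown DS value can be shown with a little arithmetic. In this C sketch (a hypothetical helper), the assembler resolves a label to `org` plus the label's offset within the file, and the CPU then adds DS * 16. Some BIOSes are known to jump to 0x07C0:0x0000 rather than 0x0000:0x7C00; both reach linear address 0x7C00, but neither choice says anything about DS.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address a real-mode "[label]" access reaches, given DS,
   the ORG value, and the label's offset within the binary. */
static uint32_t label_linear(uint16_t ds, uint16_t org, uint16_t file_offset) {
    uint16_t effective = (uint16_t)(org + file_offset);  /* 16-bit offset wraps */
    return ((uint32_t)ds << 4) + effective;
}
```

With `org 0x7C00` a label such as `gdtr` is only found at the right place when DS is 0; with DS = 0x07C0 the same instruction reads 0x7C00 bytes too far. The same double counting applies to `jmp 08h:7C00h+print` in the second listing: `print` already includes the 0x7C00 from `org`, so with a code segment base of 0 the jump target is 0x7C00 bytes past `print`.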

The BIOS only loads 512 bytes (1 sector) from disk, which isn't enough to do much. To make it worse, for floppy disks there probably should be a "BIOS_Parameter_Block", and for hard disks there probably should be space for a partition table, so you can't even use all of those 512 bytes. About the only thing you should do in the limited space you have is load more stuff from disk. If you switch to protected mode then you won't be able to use the BIOS (because you must be in real mode to use the BIOS), so switching to protected mode in those 512 bytes is a waste of time.
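The space constraints mentioned above can be written down as constants. This C sketch uses hypothetical names; the offsets (partition table at 446 with four 16-byte entries, the 0x55 0xAA signature at 510) are the standard MBR layout. The code space it computes is an upper bound, since a floppy's BPB, for example, takes further bytes from the start of the sector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard MBR layout offsets (hypothetical constant names). */
enum {
    SECTOR_SIZE         = 512,
    PARTITION_TABLE_OFF = 446,  /* 0x1BE: four 16-byte partition entries */
    PARTITION_ENTRIES   = 4,
    PARTITION_ENTRY_LEN = 16,
    SIGNATURE_OFF       = 510   /* bytes 0x55, 0xAA ("dw 0AA55h" in NASM) */
};

/* Upper bound on bytes left for boot code once the partition table
   (if any) and the 2-byte signature are reserved. */
static unsigned boot_code_space(bool has_partition_table) {
    unsigned reserved = 2;
    if (has_partition_table)
        reserved += PARTITION_ENTRIES * PARTITION_ENTRY_LEN;
    return SECTOR_SIZE - reserved;
}

/* True if the sector ends with the boot signature the BIOS checks for. */
static bool has_boot_signature(const uint8_t sector[SECTOR_SIZE]) {
    return sector[SIGNATURE_OFF] == 0x55 && sector[SIGNATURE_OFF + 1] == 0xAA;
}
```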

Cheers,

Brendan

Neob91 · Post by **Neob91** » Thu Sep 09, 2010 1:27 am

I finally made it work!

But I wonder, how will I receive keyboard input if I have to disable interrupts to enter protected mode?

Combuster · Post by **Combuster** » Thu Sep 09, 2010 3:15 am

Did you check the Interrupts page?

Neob91 · Post by **Neob91** » Thu Sep 09, 2010 3:23 am

Yes, which part are you referring to?

Neob91 · Post by **Neob91** » Sat Sep 18, 2010 4:50 pm

Hello again. I thought I shouldn't open a new thread for this, because my question is partially related to this thread. I'm now trying to understand paging. How do I make sure that code running with paging enabled runs in ring 3? From what I know, the ring is specified in the GDT entries, although I think paging is independent of the GDT; is that correct? How does it work? Also, why is it necessary to have a TSS? I believe the code of an interrupt handler residing in a ring 0 segment should already be in ring 0 after an IRQ or a system call, shouldn't it?

OSDev.org
