Long mode task switching

Adan · Post by **Adan** » Fri Apr 13, 2007 9:02 pm

Could anybody please give me a DETAILED example of how to set up a RING3 task for execution in long mode? And when I say detailed, I really mean DETAILED, if it's not too much work, of course.

Which descriptors and segments do I have to initialize and how?
Once these system resources are set up, how do I jump to RING3 in LONG MODE?
Where is the data stored that the long mode TSS lacks compared to the protected mode one? For example, where are the CS, RIP or CR3 of different tasks (including the kernel task) saved, and when exactly? Does the scheduler save them to some memory address? How do you change the current code selector in CS to a DPL3 segment selector, and change RIP, while still in kernel space, if they are not read from a TSS? (I really don't understand software task switching, which is too bad.)

I've already set up paging, so I can modify the kernel memory map in any way I want. But do I have to set up a new PML4, PDPT(s), PDT(s) and PT(s) to switch to user mode, or just change some of the kernel pages' attributes?

I also have a working 64-bit IDT and exception handlers that print the register contents on an exception, just in case something goes wrong.

I hope someone can help me with this, because I'm really stuck and fed up with reading so much of the AMD64 manuals without being able to take another step in OS dev on this architecture.

Thanks in advance.

Combuster · Post by **Combuster** » Sat Apr 14, 2007 6:19 am

The basic steps are:

1: create ring 3 descriptors in the GDT
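2: create a TSS, fill in the appropriate values (at least RSP0, the kernel stack used when an interrupt or exception arrives from ring 3) and load it with LTR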
3: load something to execute into memory
4: make sure it is paged in with the correct flags (userspace bit)
5: push the values IRETQ expects onto the stack (SS, RSP, RFLAGS, CS and RIP), either on the current stack or a new one
6: execute the iret
(which is just one of many alternatives)
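As a rough illustration of steps 1 and 5, here is a C sketch that encodes flat 64-bit GDT descriptors and fills in the five values IRETQ pops when returning to ring 3. The GDT layout (null, kernel code, kernel data, user data, user code) and all function names are assumptions made for this example; the actual LGDT and IRETQ would be done in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Access byte (bits 40..47 of a descriptor) */
#define SEG_PRESENT  0x80u
#define SEG_DPL3     0x60u
#define SEG_CODEDATA 0x10u /* S=1: code/data segment, not a system segment */
#define SEG_EXEC     0x08u
#define SEG_RW       0x02u /* readable for code, writable for data */

/* Flags nibble (bits 52..55) */
#define FLAG_G  0x8u
#define FLAG_DB 0x4u
#define FLAG_L  0x2u /* 64-bit code segment */

/* Flat descriptor: base 0, limit 0xFFFFF (ignored for code/data in 64-bit mode). */
static uint64_t gdt_entry(uint8_t access, uint8_t flags)
{
    return 0xFFFFull                          /* limit[15:0]      */
         | ((uint64_t)access << 40)           /* P, DPL, S, type  */
         | (0xFull << 48)                     /* limit[19:16]     */
         | ((uint64_t)(flags & 0xFu) << 52);  /* G, D/B, L, AVL   */
}

/* Selector = index * 8 | TI (0 = GDT) | RPL */
static uint16_t make_selector(unsigned index, unsigned rpl)
{
    assert(rpl <= 3);
    return (uint16_t)((index << 3) | rpl);
}

/* Hypothetical GDT layout used by this sketch. */
static void build_gdt(uint64_t gdt[5])
{
    gdt[0] = 0; /* mandatory null descriptor */
    gdt[1] = gdt_entry(SEG_PRESENT | SEG_CODEDATA | SEG_EXEC | SEG_RW, FLAG_G | FLAG_L);            /* kernel code */
    gdt[2] = gdt_entry(SEG_PRESENT | SEG_CODEDATA | SEG_RW, FLAG_G | FLAG_DB);                      /* kernel data */
    gdt[3] = gdt_entry(SEG_PRESENT | SEG_DPL3 | SEG_CODEDATA | SEG_RW, FLAG_G | FLAG_DB);           /* user data */
    gdt[4] = gdt_entry(SEG_PRESENT | SEG_DPL3 | SEG_CODEDATA | SEG_EXEC | SEG_RW, FLAG_G | FLAG_L); /* user code */
}

/* The stack as IRETQ expects it, lowest address first (push SS first, RIP last). */
struct iret_frame {
    uint64_t rip, cs, rflags, rsp, ss;
};

static struct iret_frame make_ring3_frame(uint64_t entry, uint64_t user_stack_top)
{
    struct iret_frame f = {
        .rip    = entry,
        .cs     = make_selector(4, 3), /* user code, RPL 3 */
        .rflags = 0x202,               /* IF=1; bit 1 is reserved and always 1 */
        .rsp    = user_stack_top,
        .ss     = make_selector(3, 3), /* user data, RPL 3 */
    };
    return f;
}
```

With such a frame at RSP, entering ring 3 is a single `iretq`: the RPL of 3 in the CS selector is what makes the CPU return to CPL 3, and SS must then be a DPL 3 writable data segment.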

I'm really stuck and fed up with reading so much of the AMD64 manuals without being able to take another step in OS dev on this architecture.

Get used to it.

I suggest you try it again and come back when you have problems with one specific issue.

Adan · Post by **Adan** » Sat Apr 14, 2007 8:54 am

Thank you very much, Combuster. I tried that last night, but I'm getting a page fault exception; I think it's a problem with the memory map. I'm using the kernel's page tables and only changed the PTE (where I loaded some code) to user mode. The PML4E, PDPTE and PDTE are still in supervisor mode. Could this be the mistake? Do I have to change the whole path from the PDPTE to the PTE to user mode? I'll post some code later.

Also, why do I need a DATA segment to access a CPL3 stack or variables, if I don't need a CPL0 DATA segment to write to kernel variables or use the kernel stack as soon as I enter long mode? As far as I know, CODE segments are not writable, but I'm writing to one all the time whenever I set a variable inside the kernel!

Could you explain this to me?

Thanks again.

Brendan · Post by **Brendan** » Sat Apr 14, 2007 9:39 am

Hi,

Adan wrote: I'm using the kernel's page tables and only changed the PTE (where I loaded some code) to user mode. The PML4E, PDPTE and PDTE are still in supervisor mode. Could this be the mistake? Do I have to change the whole path from the PDPTE to the PTE to user mode? I'll post some code later.

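Yes. You'd need to give the CPL=3 code read access (at least) in all levels of the paging structures: the user/supervisor bit has to be set in the PML4E, PDPTE, PDE and PTE, because the CPU combines the protection bits of every level.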
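The rule can be modelled with a few lines of C. This is a sketch assuming 4 KiB pages (no large-page PDPTEs or PDEs) and ignoring NX; `user_can_access` is a hypothetical helper for illustration, not something the CPU or any library provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (1ull << 0)
#define PTE_WRITE   (1ull << 1) /* R/W */
#define PTE_USER    (1ull << 2) /* U/S: 1 = accessible from CPL=3 */

enum { LVL_PML4E, LVL_PDPTE, LVL_PDE, LVL_PTE, LEVELS };

/* Returns true if a CPL=3 access through this chain of entries is allowed. */
static bool user_can_access(const uint64_t entry[LEVELS], bool write)
{
    for (int level = 0; level < LEVELS; level++) {
        uint64_t e = entry[level];
        if (!(e & PTE_PRESENT))
            return false;          /* not-present fault */
        if (!(e & PTE_USER))
            return false;          /* one supervisor-only level is enough to fault */
        if (write && !(e & PTE_WRITE))
            return false;          /* user writes need R/W=1 at every level */
    }
    return true;
}
```

A setup where only the PTE has U/S set fails this check even for reads, which would be consistent with the page fault described above.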

Adan wrote: Also, why do I need a DATA segment to access a CPL3 stack or variables, if I don't need a CPL0 DATA segment to write to kernel variables or use the kernel stack as soon as I enter long mode? As far as I know, CODE segments are not writable, but I'm writing to one all the time whenever I set a variable inside the kernel! Could you explain this to me?

There are "sections" in most executables (e.g. ".text", ".data" and ".bss"). Typically these are mapped by the OS to relevant page types. For example, the ".text" section might be mapped as "read-only, executable", the ".data" section as "read/write, non-executable", and the ".bss" section as "allocate on demand, read/write, non-executable".
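A sketch of that section-to-page-flags mapping in C. The enum and function are hypothetical; the NX bit (bit 63) only exists when EFER.NXE is enabled (otherwise it is a reserved bit and setting it causes a page fault), and .bss pages are assumed to stay not-present until the first access faults them in.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT (1ull << 0)
#define PTE_WRITE   (1ull << 1)
#define PTE_USER    (1ull << 2)
#define PTE_NX      (1ull << 63) /* requires EFER.NXE = 1 */

enum section_kind { SECTION_TEXT, SECTION_DATA, SECTION_BSS };

/* Leaf PTE flags for a user-space page belonging to the given section. */
static uint64_t section_pte_flags(enum section_kind kind)
{
    switch (kind) {
    case SECTION_TEXT:
        return PTE_PRESENT | PTE_USER;                      /* read-only, executable */
    case SECTION_DATA:
        return PTE_PRESENT | PTE_USER | PTE_WRITE | PTE_NX; /* read/write, non-executable */
    case SECTION_BSS:
        /* Allocate on demand: the PTE stays 0 (not present) until the first access
           faults; the page-fault handler then maps a zeroed frame with these flags. */
        return PTE_PRESENT | PTE_USER | PTE_WRITE | PTE_NX;
    }
    assert(!"unknown section kind");
    return 0;
}
```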

It's the OS's responsibility to set up these mappings to prevent incorrect access. For the kernel (or CPL=0 code in general) this gets a little more complicated, as the CPU assumes the code knows what it's doing and (usually) gives it full access regardless of what the paging protection flags say (for example, supervisor writes to read-only pages are only blocked when CR0.WP is set).

Of course none of this has anything to do with the CPU's segment registers, which (except for FS and GS) are not used for addressing in 64-bit code: their base and limit are ignored.

Cheers,

Brendan

Adan · Post by **Adan** » Sat Apr 14, 2007 11:38 am

It's clear to me now, but I have some more doubts:

1. If I want to use a different value for the task's CR3 (at the moment I'm still using the KERNEL's memory map when I switch to CPL3), how do I change that register (to a new user-mode memory map) before IRETing?

2. What about the rest of the machine state (the GPRs and so on, including the PML4 base)? Where do I have to save it if the long mode TSS doesn't take care of it, and how and where (INSIDE THE SCHEDULING ROUTINE?) do I switch machine states when switching from one RING3 task to another, or from a RING3 task to the kernel and vice versa?

3. Does the scheduler routine have to manage a dynamic array of "protected mode TSS-like" structures?

I hope you can help me, and thank you all.

Combuster · Post by **Combuster** » Sun Apr 15, 2007 4:58 am

Adan wrote: 1. If I want to use a different value for the task's CR3 (at the moment I'm still using the KERNEL's memory map when I switch to CPL3), how do I change that register (to a new user-mode memory map) before IRETing?

In practice, all address spaces contain the kernel. The only thing you need to do is to create a new set of paging structures and copy the kernel's pages into it. This allows you to change CR3 at will without crashes, as everything will be in the same place before and after the switch. After that, it's up to you which address space you use for which thread.
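One common way to "copy the kernel's pages" is to copy only the kernel's top-level PML4 entries: the new PML4 then points at the same lower-level tables, so the kernel mappings (including page tables allocated anywhere in physical memory) are shared rather than duplicated. This sketch assumes the kernel lives in the upper half (PML4 entries 256-511); a kernel mapped low would copy its own range of entries instead. The names are hypothetical, and `new_pml4` would come from the page frame allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PML4_ENTRIES      512
#define KERNEL_FIRST_SLOT 256 /* assumption: the kernel occupies the upper half */

typedef struct {
    uint64_t entry[PML4_ENTRIES];
} pml4_t; /* must be 4 KiB aligned when used as a real page table */

/* Which PML4 slot a canonical virtual address falls in (bits 39..47). */
static unsigned pml4_index(uint64_t vaddr)
{
    return (unsigned)((vaddr >> 39) & 0x1FF);
}

/* New address space: empty user half, kernel half shared with the kernel's
   PML4. Lower-level tables are not copied, only pointed to. */
static void clone_kernel_half(pml4_t *new_pml4, const pml4_t *kernel_pml4)
{
    assert(new_pml4 != kernel_pml4);
    memset(new_pml4->entry, 0, KERNEL_FIRST_SLOT * sizeof(uint64_t));
    memcpy(&new_pml4->entry[KERNEL_FIRST_SLOT],
           &kernel_pml4->entry[KERNEL_FIRST_SLOT],
           (PML4_ENTRIES - KERNEL_FIRST_SLOT) * sizeof(uint64_t));
}
```

Because only the top-level entries are copied, a kernel PDPT added later would have to be propagated to every address space; one way to avoid that is to pre-allocate all kernel-half PDPTs at boot.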

2. What about the rest of the machine state (the GPRs and so on, including the PML4 base)? Where do I have to save it if the long mode TSS doesn't take care of it, and how and where (INSIDE THE SCHEDULING ROUTINE?) do I switch machine states when switching from one RING3 task to another, or from a RING3 task to the kernel and vice versa?

The common approach is stack switching:
The thread gets interrupted, and the CPU pushes a stack frame containing SS, RSP, RFLAGS, CS and RIP. You push all GPRs (plus FS/GS if you use them) and remember the value of RSP at that point. Then you store this value somewhere in the thread's information structure, pick a new thread, load its CR3 and RSP, and pop all registers in the reverse order you pushed them. Finally, an IRET restores the flags, the stack and the program location.
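Here is a C model of the scheduler side of that scheme. The assembly stub is only described in comments; the register order, `struct saved_regs`, `schedule` and `thread_init` are assumptions made for this sketch. A new thread gets a fake interrupt frame on its kernel stack, so the first switch to it "returns" into ring 3 through the same pop-and-IRETQ path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Kernel stack contents after an interrupt, lowest address first.
   The stub pushes RAX first and R15 last, so R15 ends up at the lowest address. */
struct saved_regs {
    uint64_t r15, r14, r13, r12, r11, r10, r9, r8;
    uint64_t rbp, rdi, rsi, rdx, rcx, rbx, rax;
    uint64_t rip, cs, rflags, rsp, ss; /* pushed by the CPU */
};

struct thread {
    struct saved_regs *saved_rsp; /* RSP remembered at the last switch */
    uint64_t cr3;                 /* physical address of this thread's PML4 */
};

#define MAX_THREADS 8
static struct thread threads[MAX_THREADS];
static int thread_count, current;

/* Prepare a new thread's kernel stack so that popping the GPRs and executing
   IRETQ starts it at 'entry' in ring 3. */
static int thread_init(uint8_t *kstack_top, uint64_t cr3, uint64_t entry,
                       uint64_t user_rsp, uint16_t user_cs, uint16_t user_ss)
{
    assert(thread_count < MAX_THREADS);
    struct saved_regs *f = (struct saved_regs *)(kstack_top - sizeof *f);
    memset(f, 0, sizeof *f);
    f->rip = entry;
    f->cs = user_cs;
    f->rflags = 0x202; /* IF set */
    f->rsp = user_rsp;
    f->ss = user_ss;
    threads[thread_count] = (struct thread){ .saved_rsp = f, .cr3 = cr3 };
    return thread_count++;
}

/* The (hypothetical) assembly interrupt stub would push the 15 GPRs, call
   schedule() with the resulting RSP, load CR3 if the returned value differs
   from the current one, load RSP from schedule()'s return value, pop the
   GPRs in reverse order and execute IRETQ. */
static struct saved_regs *schedule(struct saved_regs *frame, uint64_t *next_cr3)
{
    assert(thread_count > 0);
    threads[current].saved_rsp = frame;     /* remember where this thread stopped */
    current = (current + 1) % thread_count; /* round robin */
    *next_cr3 = threads[current].cr3;
    return threads[current].saved_rsp;
}
```

A real scheduler would also update the ring 0 stack pointer in the TSS to the new thread's kernel stack top before resuming it.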

3. Does the scheduler routine have to manage a dynamic array of "protected mode TSS-like" structures?

Even in protected mode, there is usually only one TSS per CPU (stack switching, aka "software task switching", is faster than hardware task switching). During a task switch you just load a different value for the ring 0 stack pointer (RSP0 in long mode), so that on an interrupt the correct piece of memory is used as the kernel stack.
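In long mode the TSS is essentially a table of stack pointers. Below is a C sketch of its layout, of the 16-byte GDT descriptor that LTR needs, and of the per-switch RSP0 update. The struct and function names are illustrative, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS (104 bytes). RSP0 sits at offset 4, so the struct must be packed. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];     /* RSP0..RSP2: stack loaded on entry to CPL 0..2 */
    uint64_t reserved1;
    uint64_t ist[7];     /* IST1..IST7: optional stacks selected by IDT gates */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base; /* pointing past the TSS limit means "no I/O bitmap" */
} __attribute__((packed));

_Static_assert(sizeof(struct tss64) == 104, "TSS must be 104 bytes");
_Static_assert(offsetof(struct tss64, rsp) == 4, "RSP0 at offset 4");
_Static_assert(offsetof(struct tss64, ist) == 36, "IST1 at offset 36");
_Static_assert(offsetof(struct tss64, iomap_base) == 102, "I/O map base at offset 102");

static struct tss64 tss = { .iomap_base = sizeof(struct tss64) }; /* one per CPU */

/* 16-byte system descriptor for the GDT; LTR is then given its selector. */
static void tss_descriptor(uint64_t base, uint32_t limit, uint64_t out[2])
{
    out[0] = (limit & 0xFFFFull)                     /* limit[15:0]  */
           | ((base & 0xFFFFFFull) << 16)            /* base[23:0]   */
           | (0x89ull << 40)                         /* P=1, DPL=0, type 9: available 64-bit TSS */
           | ((uint64_t)((limit >> 16) & 0xF) << 48) /* limit[19:16] */
           | (((base >> 24) & 0xFFull) << 56);       /* base[31:24]  */
    out[1] = base >> 32;                             /* base[63:32], upper half reserved */
}

/* Called by the scheduler before resuming a ring 3 thread, so the next
   interrupt from ring 3 lands on that thread's kernel stack. */
static void set_kernel_stack(uint64_t kernel_stack_top)
{
    tss.rsp[0] = kernel_stack_top;
}
```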

Adan · Post by **Adan** » Sun Apr 15, 2007 8:31 pm

Thank you but...

The only thing you need to do is to create a new set of paging structures, and copy the kernel's pages into it.

1. What do you mean by "kernel's pages"? The complete kernel region (the "kernel_bss_end - kernel_start" image)? The problem is that I've created one PML4, one PDPT, one PDT and four initial PTs, all of them allocated with a stack-based page frame allocator. They are part of the "kernel world" too, but they are not inside the contiguous region starting where the kernel was loaded at boot.

2. How do I copy the "kernel image" to a new memory map if part of it consists of non-contiguous page frames? They are hard to keep track of!

On the other hand, if what you mean is to create an exact copy of the kernel's paging structures BUT WITH A DIFFERENT PML4 BASE ADDRESS, that seems easier to do, but I still have to walk the paging structures to recreate an exact copy of them, don't I?

Heh, this throws out about 40% of the paging tutorials I've read, which always have a VERY SIMPLE but, at the same time, VERY USELESS example for a real OS.

Thanks again.