
[Solved] Returning to ring 0 with an interrupt triple faults

Posted: Sun Jun 21, 2020 10:43 pm
by Codepixl
EDIT: See my post below with the register dumps from QEMU.

I'm sure I'm doing something stupid, but I've been trying to fix this for the better part of a day and I cannot figure it out for the life of me.

In my x86 OS, I've successfully set up paging (with a higher-half kernel) and software preemptive multitasking, although at the moment, only ring 0 processes work properly. When the PIT generates an interrupt for my kernel to preempt, it pushes all of the current task's registers onto the stack, loads the page table for the next process, sets the correct stack pointer for the next process, pops the next process's registers off the stack, and then IRETs. This has been working fine, until I tried to begin switching to ring 3.

I'm able to get into ring 3 just fine by pushing the correct segment selectors onto the stack along with the stack pointer and SS. Stepping through it with GDB and QEMU, I can tell that execution within ring 3 is working fine right up until I call an interrupt. As soon as an interrupt is called, execution immediately jumps to 0xe05b (which is weird, because absolutely nothing is mapped there). Calling different interrupts always jumps to 0xe05b.
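
For reference, the frame that IRET consumes when dropping to ring 3 can be sketched roughly as follows. This is only an illustration: `Ring3Frame` and `make_ring3_frame` are hypothetical names, and 0x1B and 0x23 are the user code and data selectors that show up in the register dump further down the thread.

```cpp
#include <cstdint>

// Frame popped by IRET when returning to a less privileged ring,
// lowest address first: EIP, CS, EFLAGS, ESP, SS.
struct Ring3Frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;
    uint32_t ss;
};

constexpr uint32_t USER_CODE_SEL   = 0x1B;     // GDT index 3, RPL 3
constexpr uint32_t USER_DATA_SEL   = 0x23;     // GDT index 4, RPL 3
constexpr uint32_t EFLAGS_RESERVED = 1u << 1;  // bit 1 always reads as 1
constexpr uint32_t EFLAGS_IF       = 1u << 9;  // keep interrupts enabled in user mode

// Build the five words for a task starting at `entry` with its
// user stack pointer at `user_stack_top` (both are assumed inputs).
constexpr Ring3Frame make_ring3_frame(uint32_t entry, uint32_t user_stack_top) {
    return Ring3Frame{entry, USER_CODE_SEL, EFLAGS_RESERVED | EFLAGS_IF,
                      user_stack_top, USER_DATA_SEL};
}

static_assert(sizeof(Ring3Frame) == 20, "five 32-bit slots");
```

The low two bits of both selectors are 3, which is what makes IRET treat this as a return to ring 3 and pop ESP and SS as well.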

Could there be something wrong with my TSS, or am I switching to usermode wrong? Could it be because I'm switching to usermode with an IRET from the PIT interrupt that I use to preempt? Scanning the Intel manual hasn't gotten me anywhere, although I am a bit tired :)

Here's my code for setting up my TSS (It's a bit messy, because I forwent using my convenience methods to set up GDT entries for the TSS so that I could be sure everything was right, and I used https://wiki.osdev.org/Getting_to_Ring_3 for it):

Code: Select all

uint32_t base = (size_t) (&TaskManager::tss) - HIGHER_HALF; //HIGHER_HALF is 0xC0000000, AKA where the kernel starts in vmem
uint32_t limit = base + sizeof(TSS);

// Now, add our TSS descriptor's address to the GDT.
gdt[5].limit_low = limit & 0xFFFFu;
gdt[5].base_low = base & 0xFFFFFFu;
gdt[5].access.bits.accessed = true;
gdt[5].access.bits.read_write = false; 
gdt[5].access.bits.direction = false; 
gdt[5].access.bits.executable = true; 
gdt[5].access.bits.type = false; 
gdt[5].access.bits.ring = 3; 
gdt[5].access.bits.present = true;
gdt[5].flags_and_limit.bits.limit_high = (limit & 0xF0000u) >> 16u; 
gdt[5].flags_and_limit.bits.zero = 0;
gdt[5].flags_and_limit.bits.size = false;
gdt[5].flags_and_limit.bits.granularity = false; 
gdt[5].base_high = ( base & 0xFF000000u) >> 24u;

memset(&TaskManager::tss, 0, sizeof(TSS));

TaskManager::tss.ss0 = 0x10; //Kernel data
TaskManager::tss.esp0 = (size_t) stack; //Kernel stack
(After that code, my GDT is flushed and ltr is used)

And here's the handler for the PIT interrupt that preempts:

Code: Select all

	push eax
	mov eax, 0x20
	out 0x20, al ;EOI
	pop eax
	cmp byte [tasking_enabled], 0 ;tasking_enabled is a variable in my C++ code
	jne preempt_do
	iret
preempt_do:
	call preempt
	iret
And finally, here's preempt():

Code: Select all

//push current_proc process' Registers on to its stack
asm volatile("push %eax");
asm volatile("push %ebx");
asm volatile("push %ecx");
asm volatile("push %edx");
asm volatile("push %esi");
asm volatile("push %edi");
asm volatile("push %ebp");
asm volatile("push %ds");
asm volatile("push %es");
asm volatile("push %fs");
asm volatile("push %gs");
asm volatile("mov %%esp, %%eax":"=a"(current_proc->registers.esp));
//pop all of next process' Registers off of its stack
current_proc = current_proc->next;
asm volatile("movl %0, %%cr3": : "r"(current_proc->page_directory_loc)); //Load page directory for process
asm volatile("mov %0, %%esp" :: "r"(current_proc->registers.esp));
asm volatile("pop %gs");
asm volatile("pop %fs");
asm volatile("pop %es");
asm volatile("pop %ds");
asm volatile("pop %ebp");
asm volatile("pop %edi");
asm volatile("pop %esi");
asm volatile("pop %edx");
asm volatile("pop %ecx");
asm volatile("pop %ebx");
asm volatile("pop %eax");
(I realize most of the inline assembly in preempt() could be moved into my standalone assembly, but it was working before so I'm too scared to touch it ;)

Re: Calling interrupts from ring 3 jumps to weird location

Posted: Mon Jun 22, 2020 5:16 am
by nexos
Hello,
Have you set the DPL field of the IDT entries to 3? If you don't know what this means, look at Chapter 6 of the Intel Manuals.
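
A minimal sketch of what that means, assuming 32-bit interrupt gates: the DPL lives in bits 5 and 6 of the gate's type/attribute byte, and a software `int` executed at a privilege level numerically greater than the gate's DPL raises a general protection fault. The names `gate_attr` and `int_allowed` are made up for illustration.

```cpp
#include <cstdint>

constexpr uint8_t GATE_PRESENT = 0x80;  // bit 7: gate is valid
constexpr uint8_t GATE_INT32   = 0x0E;  // type: 32-bit interrupt gate

// Type/attribute byte for a present 32-bit interrupt gate with the given DPL.
constexpr uint8_t gate_attr(unsigned dpl) {
    return static_cast<uint8_t>(GATE_PRESENT | ((dpl & 3u) << 5) | GATE_INT32);
}

// True when a software `int n` executed at privilege level `cpl`
// passes the gate's DPL check (the gate DPL must be >= CPL).
constexpr bool int_allowed(uint8_t attr, unsigned cpl) {
    return ((attr >> 5) & 3u) >= cpl;
}
```

So a syscall gate meant for ring 3 would use 0xEE instead of 0x8E. The check only applies to software interrupts; hardware IRQs and CPU exceptions ignore the gate DPL.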

Re: Calling interrupts from ring 3 jumps to weird location

Posted: Mon Jun 22, 2020 10:23 am
by Codepixl
nexos wrote:Hello,
Have you set the DPL field of the IDT entries to 3? If you don't know what this means, look at Chapter 6 of the Intel Manuals.
I hadn’t done that, thank you! Sadly, I’m still getting the exact same issue. I’ll keep futzing around with the idt entries and the TSS, but I haven’t gotten any results so far.

Re: Calling interrupts from ring 3 jumps to weird location

Posted: Mon Jun 22, 2020 10:30 am
by nexos
I looked at your code again, and the inline assembly is probably the issue. Inline assembly that changes the stack pointer behind the compiler's back is fragile. It is generally best to put that kind of code in separate assembly files. It is more stable and portable to do it that way.

Re: Calling interrupts from ring 3 jumps to weird location

Posted: Mon Jun 22, 2020 10:35 am
by nexos
I spotted the issue. In preempt_do, you call preempt. Call pushes the address to return to on the stack. The C function uses a ret instruction when finished to pop the address into the eip register. So a call looks like

Code: Select all

; This code is not exact, just an example
push eip
jmp whatever
and a ret looks like

Code: Select all

pop eip
What happened is you changed stacks in the C function. So the stack frame is not preserved, and ret is pulling a bogus address off the stack and jumping there. If you execute an iret at the end of the C function, that might just fix your problem. The C ABI does not permit stack changes in the middle of a function, as it messes up the stack frame.

Re: Calling interrupts from ring 3 jumps to weird location

Posted: Mon Jun 22, 2020 11:03 am
by Codepixl
nexos wrote:I spotted the issue. In preempt_do, you call preempt. [...] If you execute an iret at the end of the C function, that might just fix your problem.
So the story is, I wrote this tasking code ~4 years ago, dropped the project, and came back to it recently (and I realized how big of a hot, steaming pile of garbage it is). Turns out, the only reason my tasking was working before was that I was pushing the correct address to ret to on the stack of each new task I created. I'm now doing what you suggested, but I'm still getting the same issue.

The problem isn't that I can't switch into userspace or between tasks - those are both working fine. GDB shows the execution of my task in ring 3 is working just fine, right up until I call an interrupt. If I step through it, I'll see the int 0x80 instruction for my syscall, and then the next step immediately after that is outside both my kernel and program's space at 0xe05b. At that point, using the x command in QEMU to prod around the virtual memory shows that neither my program nor kernel are even mapped anymore, even though they were mapped properly right before the int instruction. Does this mean something is messing with cr3 or cr0? Execution from there on generates a triple fault. The IDT gate for that interrupt is set up with DPL 3 and everything, and works completely fine when the exact same task is run in ring 0.

The fact that it ends up at some weird address immediately after calling the interrupt tells my gut that either some sort of exception is being fired and it isn't able to look up the handler routine for it properly, or the IDT isn't working properly in ring 3 for some reason. I'm going to keep poking around the memory with gdb and qemu and try to figure out what's going on.

Re: Calling interrupts from ring 3 loads garbage registers

Posted: Mon Jun 22, 2020 11:11 am
by Codepixl
New discovery: dumping my registers in the qemu monitor shows that as soon as an interrupt is called while in ring 3, garbage values (most of which are zero) are loaded into every register (including the segment registers).

Registers in ring 3 before the int 0x80 instruction:

Code: Select all

EAX=00000001 EBX=00000000 ECX=00000000 EDX=080497c0
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000fd0
EIP=08048245 EFL=00000216 [----AP-] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
CS =001b 00000000 ffffffff 00cffa00 DPL=3 CS32 [-R-]
SS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
DS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
FS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
GS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =002b 0000b660 0001b6c8 0001e900 DPL=3 TSS32-avl
GDT=     c011b6e0 0000002f
IDT=     c0179020 000007ff
CR0=80000011 CR2=00000000 CR3=002d1000 CR4=00000010
1 step after:

Code: Select all

EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000e05b EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
Is this a TSS issue, or is it because of something wrong with my stack? All of these garbage values get loaded in with just one single instruction. Right before the interrupt, the IDT is mapped in the correct place in memory and everything in it seems to be the way I set it.

EDIT: It's probably a TSS issue, right? The SS and ESP are probably being loaded with garbage values when I switch back to CPL=0 with the interrupt and that in turn screws everything else up.
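
For reading the selector values in the first dump: a selector packs a table index, a table-indicator bit and a requested privilege level. A small sketch that decodes them (the `Selector` struct and `decode` helper are hypothetical names, not part of my kernel):

```cpp
#include <cstdint>

struct Selector {
    unsigned index;  // entry number in the descriptor table
    bool     ldt;    // false = GDT, true = LDT
    unsigned rpl;    // requested privilege level (0..3)
};

// Split a 16-bit segment selector into its three fields.
constexpr Selector decode(uint16_t sel) {
    return Selector{static_cast<unsigned>(sel >> 3),
                    (sel & 4) != 0,
                    static_cast<unsigned>(sel & 3)};
}
```

With that, CS=001b is GDT entry 3 at RPL 3, SS=0023 is GDT entry 4 at RPL 3, and TR=002b is GDT entry 5, which matches the gdt[5] slot I used for the TSS. The ss0 value 0x10 is GDT entry 2 at RPL 0.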

Re: Returning to ring 0 with an interrupt jmps to weird loca

Posted: Mon Jun 22, 2020 11:54 am
by nexos
Have you executed an ltr instruction to load the TSS into the task register? I had a problem like this a couple of months ago. I ended up almost pulling my hair out and then decided to re-code my scheduler. Is there an online repository where I can see your OS code? Have you set the cs, ds, es, fs, and gs members of the TSS?

Re: Calling interrupts from ring 3 jumps to weird location

Posted: Mon Jun 22, 2020 1:01 pm
by Octocontrabass
Codepixl wrote:If I step through it, I'll see the int 0x80 instruction for my syscall, and then the next step immediately after that is outside both my kernel and program's space at 0xe05b.
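That address is the second instruction the BIOS executes after a reset (the reset vector jumps straight to F000:E05B), and your second dump shows the CPU back in its reset state. Your code is causing a triple fault.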
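
To spell out how the second dump gives that away (a rough sketch; the helper names are invented for illustration): CS=f000 with EIP=e05b is a real-mode address inside the BIOS ROM area just below 1 MiB, and CR0=60000010 has both the paging and protected-mode bits clear, so the CPU is no longer running in your kernel's environment at all.

```cpp
#include <cstdint>

constexpr uint32_t CR0_PE = 1u << 0;   // protected mode enable
constexpr uint32_t CR0_PG = 1u << 31;  // paging enable

// Real-mode address translation: segment * 16 + offset.
constexpr uint32_t real_mode_linear(uint16_t seg, uint16_t off) {
    return (static_cast<uint32_t>(seg) << 4) + off;
}

constexpr bool protected_mode(uint32_t cr0) { return (cr0 & CR0_PE) != 0; }
constexpr bool paging_enabled(uint32_t cr0) { return (cr0 & CR0_PG) != 0; }

// Values taken from the two dumps posted above.
static_assert(real_mode_linear(0xF000, 0xE05B) == 0xFE05Bu, "inside the BIOS ROM area");
static_assert(protected_mode(0x80000011u) && paging_enabled(0x80000011u), "before the int");
static_assert(!protected_mode(0x60000010u) && !paging_enabled(0x60000010u), "after the int");
```

That also explains why nothing of yours looks mapped afterwards: CR3 is zero and paging is off, so the monitor is showing you plain physical memory.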

Since you're using QEMU, you might tell it to log interrupts ("-d int" on the command line) so you can see which exception is occurring and what state the CPU is in when it happens. Adding "-no-reboot" makes QEMU exit at the triple fault instead of rebooting, which keeps the interesting part at the end of the log. And while you're at it, you might want to check your exception handlers, since they should be catching whatever is happening. (You do have exception handlers, right?)

And it's been mentioned already, but that inline assembly in preempt() is all kinds of wrong. There's no guarantee it will keep working, and honestly I'm surprised it ever worked in the first place.

Re: Returning to ring 0 with an interrupt jmps to weird loca

Posted: Mon Jun 22, 2020 1:19 pm
by nullplan
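The base address of the TSS segment must be a linear (i.e. virtual) address, not a physical one. With paging enabled the CPU translates the descriptor base through the page tables like any other address, so subtracting HIGHER_HALF makes it read SS0 and ESP0 from whatever happens to sit at that low address. And the limit is supposed to be one less than the size, not an end address. So your snippet should properly start with: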

Code: Select all

uint32_t base = (uint32_t)&TaskManager::tss;
uint32_t limit = sizeof (TaskManager::tss) - 1;
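
For what it's worth, here is a rough sketch of how those two values end up in the 8-byte descriptor, written as a plain function because the bitfield struct from the first post isn't shown in the thread. The name `make_tss_descriptor` and the example address are made up, the size assumes a 104-byte TSS without an I/O bitmap, and 0x89 (present, DPL 0, available 32-bit TSS) is one common choice for the access byte.

```cpp
#include <cstdint>

constexpr uint8_t TSS32_AVAILABLE = 0x89;  // present, DPL 0, type 9

// Pack base and limit into a GDT system descriptor.
// The flags nibble (bits 52-55) stays 0: byte granularity.
constexpr uint64_t make_tss_descriptor(uint32_t base, uint32_t limit) {
    return  static_cast<uint64_t>(limit & 0xFFFFu)                  // limit 15:0
         | (static_cast<uint64_t>(base & 0xFFFFFFu) << 16)          // base 23:0
         | (static_cast<uint64_t>(TSS32_AVAILABLE) << 40)           // access byte
         | (static_cast<uint64_t>((limit >> 16) & 0xFu) << 48)      // limit 19:16
         | (static_cast<uint64_t>(base >> 24) << 56);               // base 31:24
}

// Read the fields back out, the way the CPU would see them.
constexpr uint32_t descriptor_base(uint64_t d) {
    return static_cast<uint32_t>((d >> 16) & 0xFFFFFFu)
         | (static_cast<uint32_t>(d >> 56) << 24);
}
constexpr uint32_t descriptor_limit(uint64_t d) {
    return static_cast<uint32_t>(d & 0xFFFFu)
         | (static_cast<uint32_t>((d >> 48) & 0xFu) << 16);
}

// Hypothetical higher-half TSS address, limit = 104 - 1.
static_assert(descriptor_base(make_tss_descriptor(0xC011B660u, 0x67u)) == 0xC011B660u,
              "base survives the round trip");
```

After ltr, the TR line in the QEMU register dump should show the same higher-half base and a limit of 0x67 if the descriptor was encoded this way.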
Next, never modify the stack pointer in inline assembly. The compiler assumes it owns the stack inside a function, so this will break things. Write it in a separate assembly file. Something like this:

Code: Select all

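// Implemented in assembly; extern "C" keeps the symbol name unmangled in C++.
extern "C" void preempt_asm(void **oldstack, void *newstack);
void preempt(void) {
  auto proc = current_proc;    // task being switched out
  current_proc = proc->next;   // task being switched in
  preempt_asm(&proc->registers.esp, current_proc->registers.esp);
}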

Code: Select all

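preempt_asm:
  mov eax, [esp+4]   ; eax = oldstack: where to store the outgoing ESP
  mov ecx, [esp+8]   ; ecx = newstack: saved ESP of the incoming task
  push ebp           ; save the callee-saved registers...
  push ebx
  push edi
  push esi
  push fs            ; ...and the segment registers the kernel doesn't use
  push gs
  mov [eax], esp     ; remember where the outgoing task stopped
  mov esp, ecx       ; switch to the incoming task's stack
  pop gs             ; restore in reverse order
  pop fs
  pop esi
  pop edi
  pop ebx
  pop ebp
  ret                ; return into the incoming task's caller of preempt_asm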
DS and ES have to be saved and restored on a userspace to kernel space transition (and back) anyway, so there is no need to save them here. FS and GS are usually not used in-kernel, so saving them on process switch is alright here. And EAX, ECX, and EDX are caller-saved (volatile), so the caller cannot rely on their values after the call anyway.
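
One consequence of that layout, sketched under assumptions: a brand-new task has never called preempt_asm, so its saved stack has to be pre-filled with the six values the routine pops plus a return address for the final ret. The `Stack` type and the `seed_stack` and `pop` helpers below are hypothetical, and the stack is modelled as an array of 32-bit words instead of real memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t STACK_WORDS = 64;

struct Stack {
    uint32_t    words[STACK_WORDS];
    std::size_t sp;  // index of the top-of-stack word; STACK_WORDS means empty
};

// Pre-fill a fresh stack so that preempt_asm can "return" into `entry`.
// Push order mirrors preempt_asm: return address first, then
// ebp, ebx, edi, esi, fs, gs, so that gs is the first value popped.
inline Stack seed_stack(uint32_t entry, uint32_t kernel_data_sel) {
    Stack s{};
    s.sp = STACK_WORDS;
    const uint32_t initial[] = {entry, 0, 0, 0, 0, kernel_data_sel, kernel_data_sel};
    for (uint32_t v : initial) {
        s.words[--s.sp] = v;  // the stack grows towards lower indices
    }
    return s;
}

// Pop one word, the way `pop reg` or `ret` would.
inline uint32_t pop(Stack& s) {
    assert(s.sp < STACK_WORDS);
    return s.words[s.sp++];
}
```

The value stored as the task's saved ESP would be the address of the word at `sp`. The entry function must never return, since nothing valid sits above the seeded return address.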

Re: Returning to ring 0 with an interrupt jmps to weird loca

Posted: Mon Jun 22, 2020 3:30 pm
by Codepixl
nullplan wrote:The base address of the TSS segment must be a linear (i.e. virtual) address, not a physical one. And the limit is supposed to be one less than the size. [...] Next, never modify stack in inline assembly. This will break things. Write it in assembly.
My tasking code ultimately wasn't the problem; it was the TSS being set up improperly. Thank you! I did refactor the tasking code anyway like you suggested. I also ended up pushing/popping the general purpose registers around the call to preempt under preempt_do and added an extra argument to preempt_asm for a new cr3 register value so I can switch page directories if needed.