GPF/Reset on IRET while doing multitasking

Posted: Thu Jun 13, 2013 7:34 am
by zhiayang
As the title says, I'm getting a reset on iret. I'm following a multitasking tutorial from
http://code.google.com/p/onyxkernel/wik ... ltitasking, which I know worked before.

Either way, some background --
It's a long mode OS -- however, the ELF64 is objcopy'd to an ELF32 to be loaded by GRUB (legacy).


Switch.S:

Code: Select all



.global Switch
.type Switch, @function

Switch:
	push %r15
	push %r14
	push %r13
	push %r12
	push %r11
	push %r10
	push %r9
	push %r8
	push %rdx
	push %rcx
	push %rbx
	push %rax
	push %rbp
	push %rsi
	push %rdi

	mov %rsp, %rdi
	call puthex

	movq %ds, %rbx
	pushq %rbx

	movq %es, %rbx
	pushq %rbx


	pushq %fs
	pushq %gs



	// Load the kernel data segment.
	movw $0x10, %bp
	movl %ebp, %ds
	movl %ebp, %es
	movl %ebp, %fs
	movl %ebp, %gs






	// The System V AMD64 calling convention passes the first integer parameter in %rdi
	movq %rsp, %rdi			// stack => rdi
	call SwitchProcess
	movq %rax, %rsp


	mov $0x20, %al
	outb %al, $0x20


	mov $('I'), %rdi
	call VT_PutChar


	popq %gs
	popq %fs

	popq %rbx
	movq %rbx, %es

	popq %rbx
	movq %rbx, %ds



	pop %rdi
	pop %rsi
	pop %rbp
	pop %rax
	pop %rbx
	pop %rcx
	pop %rdx
	pop %r8
	pop %r9
	pop %r10
	pop %r11
	pop %r12
	pop %r13
	pop %r14
	pop %r15

	// Return to where we came from.

	mov %rsp, %rdi
	call puthex

	iretq

SwitchProcess:

Code: Select all


uint64_t SwitchProcess(uint64_t context)
{
	TaskList[CurrentPID].StackPointer = context;      // save the old context into current task
	switch(CurrentPID)
	{
		case 0:
			CurrentPID = 1;
			break;

		case 1:
			CurrentPID = 0;
			break;
	}
	return TaskList[CurrentPID].StackPointer;         // Return new task's context.
}

CreateTask:

Code: Select all


void CreateTask(int id, char name[64], void (*thread)())
{
		uint64_t* stack;
		Task_type *Task = (Task_type*)HAL_DMemAllocateChunk(sizeof(Task_type));

		Task->StackPointer = HAL_AllocatePage() + 0x800;       // Allocate 4 kilobytes of space

		memset((void*)((uint64_t)Task->StackPointer - 0x800), 0x00, 0x1000);

		stack = (uint64_t*)Task->StackPointer;

		// Expand down stack
		// processor data
		*--stack = 0x10;				// SS
		*--stack = 0;					// UserESP
		*--stack = 0x202;				// RFLAGS
		*--stack = 0x08;				// CS
		*--stack = (uint64_t)thread;	// RIP



		*--stack = 15;					// R15
		*--stack = 14;					// R14
		*--stack = 13;					// R13
		*--stack = 12;					// R12
		*--stack = 11;					// R11
		*--stack = 10;					// R10
		*--stack = 9;					// R9
		*--stack = 8;					// R8

		*--stack = 0xD;					// RDX
		*--stack = 0xC;					// RCX
		*--stack = 0xB;					// RBX
		*--stack = 0xA;					// RAX


		*--stack = 0xA0;				// RSP
		*--stack = 0xB0;				// RBP
		*--stack = 0xC0;				// RSI
		*--stack = 0xD0;				// RDI



		// data segments
		*--stack = 0x10;				// DS
		*--stack = 0x10;				// ES
		*--stack = 0x10;				// FS
		*--stack = 0x10;				// GS



		Task->State = 1;
		Task->StackPointer = (uint64_t)stack;
		Task->Thread = thread;
		// strcpy((char*)Task->Name, (char*)name);
		printk("<%x>\n", (uint64_t)Task->StackPointer);


		SetTaskInList(id, Task);
		SetCurrentProcessID(id);
}
Main:

Code: Select all



	void Idle();
	void Idle2();
	CreateTask(0, (char*)"Idle", Idle);
	CreateTask(1, (char*)"Idle2", Idle2);

	SetCurrentProcessID(0);

	void Switch();

	HAL_IDTSetGate(32, (uint64_t)Switch, 0x08, 0x8E);

	asm("sti");
(In main.c, Idle() and Idle2() are simply functions with a while(true) loop printing 'a' and 'b')


Any help?

I've used a bunch of hlt's and determined that the problem happens *at* the IRET.

Thanks for the help!

EDIT: Edited title and code to reflect changes


PS: You may be thinking -- where have I seen this question before..? Didn't this guy already ask these?
Yes! But that was a 32-bit OS, full of messy code C/P from tutorials -- this one is mostly from scratch, aside from this multitasking thing.

Re: GPF on IRET while doing multitasking

Posted: Thu Jun 13, 2013 8:03 am
by AJ
Hi,

I've just had a quick look, but a couple of points:

1. Shouldn't you explicitly be using iretq rather than iret?
2. Have you pedantically checked the stack?

Cheers,
Adam

Re: GPF on IRET while doing multitasking

Posted: Thu Jun 13, 2013 8:44 am
by zhiayang
1. Aha! That turned it into what appears to be a triple fault... It doesn't make sense though, I most definitely have installed ISR handlers, I suspect it's one of those weird resets...
2. How do you pedantically check the stack?


Also: I managed to get bochs to work via serial-out (Who would've known, Bochs's BGA doesn't work. Bochs Graphics Adapter.)
A different thing occurs, and not where I'd expected -- It's a page fault way before the actual task code; somewhere when I'm printing GRUB's memory map. It gets interrupted midway, which is odd because interrupts are disabled at that point. Bochs and I don't have the best relationship.


QEMU and VBox don't have problems there, only later on.
As for QEMU, it just resets constantly -- using -d int,cpu_reset doesn't help, because all the registers are 0x00 when it gets printed.
Virtualbox just hangs at the point where it'd reset in QEMU.

Re: Reset on IRET while doing multitasking

Posted: Thu Jun 13, 2013 10:13 am
by AJ
Hi,
requimrar wrote:2. How do you pedantically check the stack?
You do it very thoroughly! Sorry - with pedantic being used as a GCC flag, this was perhaps not the best word to use! If you are getting an exception on the iret instruction, always check your stack alignment carefully first.

It sounds like this is not a single bug, but a collection of them. Are you switching privilege levels? If so, see what happens if you simply try an iret to ring 0. If that works, then chances are that some of your pages accessed by ring 3 code are supervisor pages. Also, check that the u/s bit in your page table entries works the way you think it does!
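
For the u/s check, a minimal sketch in C, assuming 4-level long-mode paging with 4 KiB pages (large pages, which end the walk early, are ignored here). A page is reachable from ring 3 only when the present bit (bit 0) and the user/supervisor bit (bit 2) are set in the entry at every level of the walk, not just in the final page table entry. The function name and the idea of passing the four entries as an array are made up for illustration; how the entries are obtained depends on your own paging code.

```c
#include <stdint.h>

#define PAGE_PRESENT (1ULL << 0)  /* P bit */
#define PAGE_USER    (1ULL << 2)  /* U/S bit: 1 = user, 0 = supervisor only */

/* entries[] holds the PML4E, PDPTE, PDE and PTE that map one address
 * (hypothetical calling convention for this sketch). */
static int IsUserAccessible(const uint64_t entries[4])
{
    const uint64_t required = PAGE_PRESENT | PAGE_USER;

    for (int level = 0; level < 4; level++) {
        /* One supervisor-only or non-present level blocks ring 3 access. */
        if ((entries[level] & required) != required)
            return 0;
    }
    return 1;
}
```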

Finally, if all registers (including RIP?) are zero when you print your debug info, have you checked that 'void (*thread)()' is not a null pointer? If not and RIP is zero, this further suggests stack trashing.

Cheers,
Adam

Re: Reset on IRET while doing multitasking

Posted: Thu Jun 13, 2013 10:21 am
by zhiayang
Hi!

Stack alignment just involves making sure %sp points to a 16-byte aligned memory region... Right?!

Also, no ring switching for now. And by all registers being zero, I meant when qemu prints it to stderr, likely *after* the actual crash.

Thanks for your help AJ (adam)!

Re: Reset on IRET while doing multitasking

Posted: Thu Jun 13, 2013 10:44 am
by AJ
Hi,

Again, my fault with the terminology - that's what comes of doing this between other bits of work! When I said "stack alignment", I meant more that when you IRETQ, you are actually at the stack location that you should be, so, for example, RIP is not actually receiving the stack entry you had intended for RAX.

You may like to load different values (non-zero, perhaps just 1,2,3 etc...) in CreateTask so that you can test whether the registers contain the expected values if you then stick a hlt instruction before IRETQ.
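
A minimal sketch of that check done in C rather than with a hlt. It assumes the frame layout implied by the pop sequence in Switch (segment registers first, then the general-purpose registers, then the five slots iretq consumes); the enum and function names are invented for illustration.

```c
#include <stdint.h>

/* Slot indices counted upward from the saved stack pointer, in the order
 * Switch pops them. Hypothetical names for this sketch. */
enum FrameSlot {
    SLOT_GS, SLOT_FS, SLOT_ES, SLOT_DS,
    SLOT_RDI, SLOT_RSI, SLOT_RBP, SLOT_RAX, SLOT_RBX, SLOT_RCX, SLOT_RDX,
    SLOT_R8, SLOT_R9, SLOT_R10, SLOT_R11, SLOT_R12, SLOT_R13, SLOT_R14, SLOT_R15,
    /* Consumed by iretq itself. */
    SLOT_RIP, SLOT_CS, SLOT_RFLAGS, SLOT_RSP, SLOT_SS,
    SLOT_COUNT
};

/* Returns 1 when the slots iretq will read hold the intended values. */
static int FrameLooksSane(const uint64_t *saved_sp, uint64_t expected_rip,
                          uint64_t expected_cs, uint64_t expected_ss)
{
    return saved_sp[SLOT_RIP] == expected_rip
        && saved_sp[SLOT_CS]  == expected_cs
        && saved_sp[SLOT_SS]  == expected_ss;
}
```

Calling this on Task->StackPointer at the end of CreateTask would flag a frame whose slot count differs from what Switch pops, since RIP would then land in a neighbouring slot.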

Cheers,
Adam

Re: Reset on IRET while doing multitasking

Posted: Thu Jun 13, 2013 5:51 pm
by zhiayang
Ah yes, that makes sense -- why didn't I think of that?!
I'll fiddle with it when I get home today, thanks again for your help!

Re: Reset on IRET while doing multitasking

Posted: Thu Jun 13, 2013 6:40 pm
by gravaera
Yo:

But isn't that just...one of the most common sense ways to confirm your assumptions, or test the register popping code you wrote? Assuming you tested it at all? And shouldn't this be one of the most obvious ways to debug problems when dealing with register popping code?

--Peace out,
gravaera

Re: Reset on IRET while doing multitasking

Posted: Thu Jun 13, 2013 6:50 pm
by zhiayang
The register popping code is taken from my interrupt stubs, it works -- I didn't actually think that was the problem. In hindsight yes, it is quite obvious that that piece of code may be faulty. But it remains to be seen if that's the actual problem.

I may be an assembly programmer, but I'm not exactly the most competent one you'll find.

Re: Reset on IRET while doing multitasking

Posted: Fri Jun 14, 2013 7:28 am
by zhiayang
Right. So I've done my tests, the stack values are correct -- the IRQ/INT pushes SS, U-ESP, RFLAGS, CS then RIP, everything should be fine stack-wise.


So. It seems adding any kind of significant function call to the thread causes a f-up;
These are the 'threads':

Code: Select all

void Idle1()
{
	while(true);
}

void Idle2()
{
	while(true);
}
Nothing works, no matter what I put in the two functions.

Here's the ASM output of the functions Idle and Idle2:

ELF64 (original):

Code: Select all


0000000000105630 <Idle1>:
  105630:	e9 00 00 00 00       	jmpq   105635 <Idle1+0x5>
  105635:	e9 fb ff ff ff       	jmpq   105635 <Idle1+0x5>
  10563a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

0000000000105640 <Idle2>:
  105640:	e9 00 00 00 00       	jmpq   105645 <Idle2+0x5>
  105645:	e9 fb ff ff ff       	jmpq   105645 <Idle2+0x5>
  10564a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

ELF32 (objcopied):

Code: Select all

00105630 <Idle1>:
  105630:	e9 00 00 00 00       	jmp    105635 <Idle1+0x5>
  105635:	e9 fb ff ff ff       	jmp    105635 <Idle1+0x5>
  10563a:	66 0f 1f 44 00 00    	nopw   0x0(%eax,%eax,1)

00105640 <Idle2>:
  105640:	e9 00 00 00 00       	jmp    105645 <Idle2+0x5>
  105645:	e9 fb ff ff ff       	jmp    105645 <Idle2+0x5>
  10564a:	66 0f 1f 44 00 00    	nopw   0x0(%eax,%eax,1)

When QEMU resets, it's a triple fault (-d int,cpu_reset) -- at that point, RIP is 0x105640


Any help?

I'm really stumped on this...

Re: Reset on IRET while doing multitasking

Posted: Fri Jun 14, 2013 10:20 am
by AJ
Hi,

I'm clutching at straws a bit here, but another thought occurs to do with how the stack is paged. Your stack could be overrunning the area that you have created for it, thus causing a PFE. To debug this, try to:

1) Create the new process stack somewhere in the middle of the page you assign it instead of at the lowest possible address. I assume that you have created guard pages - maybe you are hitting one of those? If you are, then because you are in ring 0, the kernel may be trying to use the same stack for the exception handler. The stack is invalid already, therefore you are getting PF->DF->TF.
2) On the same note, you may like to set up the IST, so that when the PFE handler and/or double fault handlers are called, you switch stacks. This should mean that at least you can dump some meaningful register values to assist with your debugging.
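
As a rough illustration of point 2, here is a sketch of the two structures involved, assuming GCC-style packed attributes. The long-mode TSS carries seven IST stack pointers, and the low three bits of byte 4 of each IDT gate select which one the CPU switches to (0 means no switch). The struct and function names are hypothetical; loading the TSS (a GDT descriptor plus ltr) is not shown.

```c
#include <stdint.h>

/* 64-bit TSS layout, 104 bytes. */
struct Tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* RSP0..RSP2 */
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7 */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;
} __attribute__((packed));

/* 64-bit IDT gate, 16 bytes. */
struct IdtGate64 {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  ist;         /* bits 0..2: IST index, 0 = stay on current stack */
    uint8_t  type_attr;   /* e.g. 0x8E = present, ring 0, interrupt gate */
    uint16_t offset_mid;
    uint32_t offset_high;
    uint32_t reserved;
} __attribute__((packed));

/* Record the top of a dedicated stack for IST slot 1..7. */
static void SetIstStack(struct Tss64 *tss, unsigned ist_index, uint64_t stack_top)
{
    if (ist_index >= 1 && ist_index <= 7)
        tss->ist[ist_index - 1] = stack_top;
}

/* Build a gate that makes the CPU switch to the given IST stack on entry. */
static struct IdtGate64 MakeGate(uint64_t handler, uint16_t selector,
                                 uint8_t ist_index, uint8_t type_attr)
{
    struct IdtGate64 gate;
    gate.offset_low  = (uint16_t)(handler & 0xFFFF);
    gate.selector    = selector;
    gate.ist         = (uint8_t)(ist_index & 0x7);
    gate.type_attr   = type_attr;
    gate.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    gate.offset_high = (uint32_t)(handler >> 32);
    gate.reserved    = 0;
    return gate;
}
```

With something like this, the gates for vector 14 (page fault) and vector 8 (double fault) would get a non-zero IST index, so those handlers run on a known-good stack even when the interrupted stack is broken.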

If this debugging approach fails, then you'll have to post a link to your repo and hope that some kindly soul will have a look at it (it won't be me :) ).

Cheers,
Adam

Re: Reset on IRET while doing multitasking

Posted: Fri Jun 14, 2013 8:29 pm
by zhiayang
1. I'm as much clutching at straws as you are... No I don't have guard pages -- in fact, the lower 8mb is identity mapped; my page allocator handed out something at 4+mb, so that shouldn't be the problem. But either way, I've tried moving the stack pointer lower -- 0x800 thereabouts... Nothing.

2. That's a good idea -- It's a good futureproof idea, but I think for now QEMU's register dump works.


Also. I should have mentioned this first, but cutting&pasting on the iPhone is not the best. But I solved the issue of IRQs not being sent --

Code: Select all

mov 0x20, %al
outb %al, $0x20
See the problem? So interrupts are getting sent again, but that doesn't actually help much. In fact, now *any* code in either thread crashes/resets QEMU.
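
For anyone who doesn't spot it: in AT&T syntax, mov 0x20, %al without the $ is a load from memory address 0x20, so %al ends up holding whatever byte lives there instead of the EOI command, and the PIC is never acknowledged. Below is a hedged sketch of the same acknowledgement in C, assuming the usual 8259 setup with the PIC remapped so that IRQ0 arrives on vector 32. The port write goes through a caller-supplied function so the logic can be exercised outside a kernel; in the kernel it would wrap outb. All names are made up for illustration.

```c
#include <stdint.h>

enum {
    PIC1_COMMAND = 0x20,  /* master PIC command port */
    PIC2_COMMAND = 0xA0,  /* slave PIC command port */
    PIC_EOI      = 0x20   /* end-of-interrupt command byte */
};

/* Port writer supplied by the caller (hypothetical abstraction). */
typedef void (*PortWrite8)(uint16_t port, uint8_t value);

/* Acknowledge an IRQ: the slave first for IRQ 8..15, then always the master. */
static void PicSendEoi(unsigned irq, PortWrite8 out8)
{
    if (irq >= 8)
        out8(PIC2_COMMAND, PIC_EOI);
    out8(PIC1_COMMAND, PIC_EOI);
}

/* Host-side stand-in that records writes instead of touching hardware. */
static uint16_t g_last_port;
static uint8_t  g_last_value;
static unsigned g_write_count;

static void RecordingOut8(uint16_t port, uint8_t value)
{
    g_last_port = port;
    g_last_value = value;
    g_write_count++;
}
```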

Re: GPF/Reset on IRET while doing multitasking

Posted: Sat Jun 15, 2013 6:45 am
by zhiayang
SOLVED!

Credits, in no order of any kind:

Code: Select all

sortiecat
shikhin
Griwes
reavengrey
thePowersGang
All the guys at #osdev!


In the spirit of this community, I shall share what made it happen:

1. Don't push/pop %SP on the stack. (credits to sortiecat)
2. Make sure that %SS, when popped off the stack, is the same as %DS (thanks thePowersGang, shikhin and Griwes)
3. You don't need to push/pop the segment registers. (thanks, sortiecat)
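
Putting the three points together, here is a hedged sketch of how the initial frame could be built. It assumes Switch has been changed to push/pop only the fifteen general-purpose registers (no %rsp slot, no segment registers) and that 0x08/0x10 are the kernel code/data selectors as in the code above. BuildInitialFrame is a hypothetical name, not the function that actually went into the kernel. Because iretq in long mode always pops RSP and SS, the sketch stores the task's own stack top in the RSP slot instead of 0.

```c
#include <stdint.h>

#define KERNEL_CS     0x08
#define KERNEL_DS     0x10
#define INITIAL_FLAGS 0x202   /* IF set, plus the always-one reserved bit 1 */

/* Builds the frame a task's first iretq will consume and returns the
 * resulting stack pointer. entry_rip is the address of the thread function. */
static uint64_t *BuildInitialFrame(uint64_t *stack_top, uint64_t entry_rip)
{
    uint64_t *stack = stack_top;

    /* The five slots iretq pops, in the order the CPU would have pushed them. */
    *--stack = KERNEL_DS;                        /* SS: same selector as DS */
    *--stack = (uint64_t)(uintptr_t)stack_top;   /* RSP the task starts with */
    *--stack = INITIAL_FLAGS;                    /* RFLAGS */
    *--stack = KERNEL_CS;                        /* CS */
    *--stack = entry_rip;                        /* RIP */

    /* r15..r8, rdx, rcx, rbx, rax, rbp, rsi, rdi: fifteen slots, no RSP. */
    for (int i = 0; i < 15; i++)
        *--stack = 0;

    return stack;
}
```

The returned pointer is what would go into Task->StackPointer, so that the fifteen pops in Switch leave %rsp pointing exactly at the RIP slot when iretq executes.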