General protection fault in 64 bit long mode

SpooK
Member
Member
Posts: 260
Joined: Sun Jun 18, 2006 7:21 pm

Post by SpooK »

untwisted wrote: Well, the first line of that code chunk is the offending line, but it doesn't matter WHAT that line is (we originally thought it was a SYSCALL, then someone moved code around and it appeared to be a SYSRET. I padded the function with some NOPs and noticed it's just that particular address).

After more testing it HAS to be a problem with the way we're trying to get syscall / sysret working. I *assumed* that we were making it back to supervisor mode because that's where the page fault is happening, but I'm not so sure anymore. Basically we're in limbo between our SYSCALL / syscall handler.

Try hardcoding a byte with the hex value 0x48 (the REX.W prefix) right before the sysret instruction and see what happens.
untwisted
Posts: 19
Joined: Wed Feb 13, 2008 1:36 pm
Location: Pittsburgh

Post by untwisted »

Ok, I've set the byte right before the sysret to 0x48 using a hex editor, and ran it. Nothing different that I can tell.

Am I missing something with this idea?
--untwisted
http://www.pittgeeks.org
XOmB exokernel project: http://xomb.org

Post by SpooK »

untwisted wrote: Ok, I've set the byte right before the sysret to 0x48 using a hex editor, and ran it. Nothing different that I can tell.

Am I missing something with this idea?

Just seeing if it was an issue with ensuring a 64-bit sysret.

PS: You shouldn't be changing anything with a hex editor; this will throw off address calculations. *ALWAYS* re-assemble/compile your code after any change.
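
For reference, the 0x48 byte discussed here is the REX.W prefix. SYSRET itself encodes as 0F 07; with REX.W in front it returns to 64-bit mode, and without it the CPU returns to compatibility mode. An assembler emits the prefix for you when you write `sysretq`, which is why re-assembling is the safer route. The following is only a minimal sketch in C for inspecting a disassembly dump by hand; `classify_sysret` and the enum names are made up for illustration and are not part of the project's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: only the byte values (48, 0F 07) come from the
 * instruction encoding; every name here is invented for this sketch. */
enum sysret_kind { NOT_SYSRET, SYSRET_32, SYSRET_64 };

enum sysret_kind classify_sysret(const uint8_t *code, size_t len)
{
    /* REX.W (0x48) followed by 0F 07: SYSRET back to 64-bit mode. */
    if (len >= 3 && code[0] == 0x48 && code[1] == 0x0F && code[2] == 0x07)
        return SYSRET_64;

    /* Bare 0F 07: SYSRET back to compatibility (32-bit) mode. */
    if (len >= 2 && code[0] == 0x0F && code[1] == 0x07)
        return SYSRET_32;

    return NOT_SYSRET;
}
```

Pointing the helper at the bytes where the return instruction is expected shows at a glance whether the prefix actually made it into the binary.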

Post by untwisted »

My bad, I wasn't thinking hehe (and my asm is poor). I actually coded the value in properly this time and got the same page fault / GPF as before.
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Post by Combuster »

To be honest, I get the impression that the actual cause of the problem hasn't been properly traced. For every page fault you know which address is involved and which code caused it. From the faulting code you can determine the corresponding line of source and the arguments it uses, and from those you can check whether the values are as expected. So far it looks to me like you are trying to make the error shut up without figuring out its true source.

So, could you please provide more info:
1: how the syscall handler is being set up
2: what the (virtual) memory layout of the OS is
3: what that first page fault is about
4: a compiled image that we can test ourselves
5: and, if possible, some way to view all the source code
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]

Post by untwisted »

We're not trying to make it just shut up, and we are tracing it (to the best of my knowledge) properly. We're all new to this and there's a very good possibility that we're doing things just plain wrong. Thanks for the help though :)

1: Right now there isn't much in the syscall handler. Here is the code. All we intend it to do for now is loop between syscall and sysret, just to make sure everything is working.
This code is toward the end of our main function.

Code:

	if(!(cpuid(0x8000_0001) & 0b1000_0000_0000))
	{
		kprintfln("Your computer is not cool enough, we need SYSCALL and SYSRET.");
		asm { cli; hlt; }
	}

	const ulong STAR = 0x003b_0010_0000_0000;
	const uint STARHI = STAR >> 32;
	const uint STARLO = STAR & 0xFFFFFFFF;

	lstar.setHandler(&sysCallHandler);

	asm
	{
		// Set the STAR register.
		"movl $0xC0000081, %%ecx" ::: "ecx";
		"movl %0, %%edx" :: "i" STARHI : "edx";
		"movl %0, %%eax" :: "i" STARLO : "eax";
		"wrmsr";

		// Set the SF_MASK register.  Top should be 0, bottom is our mask,
		// but we're not masking anything (yet).
		"xorl %%eax, %%eax" ::: "eax";
		"xorl %%edx, %%edx" ::: "edx";
		"movl $0xC0000084, %%ecx" ::: "ecx";
		"wrmsr";

		// Jump to user mode.
		"movq $testUser, %%rcx" ::: "rcx";
		"movq $0, %%r11" ::: "r11";
		"sysretq";
	}

	
}

void sysCallHandler()
{
	asm
	{
		naked;
		"nop";
		"nop";
		"nop";
		"sysretq";
	}
}

extern(C) void testUser()
{
	asm
	{
		naked;
		"nop";
		"nop";
		"nop";
		"syscall";
	}
}
2: We're running a flat 1:1 (identity-mapped) memory model right now using 2 MB pages. We have yet to really do anything with virtual memory.
3: The first fault is due to a page not present; it occurs on a write (during a print statement) and in supervisor mode.
4: http://www.pittgeeks.org/projects/files/paganos.iso
5: http://xomb.googlecode.com/svn/trunk/
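
For anyone tracing a similar fault: the three facts in item 3 (not present, write, supervisor mode) are exactly what the low bits of the page-fault error code report, and the faulting address itself is left in CR2. The sketch below decodes that error code in C. The bit positions are the architectural ones; the struct and function names are made up for illustration. Assuming no other bits were set, the fault described above would correspond to an error code of 0x2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the bit layout is that of the x86 #PF error code. */
struct pf_info {
    bool protection_violation; /* bit 0: 0 = page not present, 1 = protection */
    bool write;                /* bit 1: 0 = read access, 1 = write access    */
    bool user;                 /* bit 2: 0 = supervisor mode, 1 = user mode   */
    bool reserved_bit;         /* bit 3: reserved bit set in a paging entry   */
    bool instruction_fetch;    /* bit 4: fault came from an instruction fetch */
};

struct pf_info decode_pf_error(uint64_t err)
{
    struct pf_info info;
    info.protection_violation = (err & (1u << 0)) != 0;
    info.write                = (err & (1u << 1)) != 0;
    info.user                 = (err & (1u << 2)) != 0;
    info.reserved_bit         = (err & (1u << 3)) != 0;
    info.instruction_fetch    = (err & (1u << 4)) != 0;
    return info;
}
```

Printing these fields together with CR2 and the saved instruction pointer from the fault handler usually narrows down which access went wrong.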

Post by Combuster »

I find it very strange that you 'recover' from the page fault. The code I've found simply prints a message on a page fault and then returns, which would re-execute the faulting instruction and cause the same fault to happen again. :?

I'll try to find some time to debug the CD image later today.

Post by untwisted »

I really appreciate the help. It was my understanding that if a page is in memory but doesn't have its present bit set, it would fault (minor fault) and the bit would be set automagically. After writing that last sentence I'm second-guessing myself though. If this sounds incredibly noobish, I'm sorry, we're all new at this :P

One of the members of the group set the SS to null at the top of our main function, and managed to avoid the GPE, but then got a different error because the return address for iretq was not in canonical form. He then ANDed the top 32 bits with FFFFFF to see if that would fix it, and now our code manages to loop infinitely, jumping between user mode and supervisor mode.

I'm not sure what that means really, as I also thought that return addresses were pushed onto the stack by the CPU, and I would have assumed that the CPU would ensure that the address was in canonical form.
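
On the canonical-form question: with 48-bit virtual addresses (an assumption here; it is the common case, but the width is implementation-dependent), an address is canonical when bits 63 down to 47 are all equal, either all zeros or all ones. The CPU checks this when the address is used but does not repair it, so a bad value that software placed in a return slot or register stays bad. A minimal sketch of the check in C, with `is_canonical48` as a made-up name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, assuming a 48-bit virtual address space.
 * Bits 63..47 (17 bits) must be a copy of bit 47. */
bool is_canonical48(uint64_t addr)
{
    uint64_t upper = addr >> 47;          /* keep only bits 63..47 */
    return upper == 0 || upper == 0x1FFFF; /* all clear or all set  */
}
```

Running a check like this over a saved return address before executing iretq or sysretq turns a confusing fault into a readable diagnostic.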

This is all so frustrating! I wish the documentation was better.

Post by Combuster »

Does that mean the problem's solved?

Post by untwisted »

Well, that's a good question. We aren't getting the error anymore, but we still aren't sure why the error even happened, so I'm a bit worried that we're just sort of sweeping it under the rug and it'll come back to bite us later.
os64dev
Member
Member
Posts: 553
Joined: Sat Jan 27, 2007 3:21 pm
Location: Best, Netherlands

Post by os64dev »

Am I the only one who finds the isr_common handling inside a C function weird or very dangerous? You depend on the compiler not inserting any code before your asm block, which you should never do. Write isr_common in asm; that probably helps already.
Author of COBOS

Post by Combuster »

Ehm, the kernel is written in D. I only figured that out when I browsed the sources :? That makes it a bit dangerous to say that that's a compiler-dependent thing... for all I know the "naked" directive could be part of the language specification.

Post by untwisted »

Well, I think we've got this one licked. We got a new set of eyes on the project and he sat down and went line by line through the asm dump. He found a few bugs that we've since cleaned up (one being that we cast our longs to ints by accident in our atoi function). The GPE / page fault were coming from lstar being set incorrectly, causing an invalid rIP to be pushed onto the stack.

Thanks for all of your help guys. Hopefully I'll be able to offer some help to the next noob that comes along like myself ;)

Edit: Just as an aside, the problem came with our inline asm. For some reason, the assembler wasn't paying attention to our clobber list and then wrote over registers that we needed to preserve.

It has since been fixed and the code committed to svn.
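
For later readers: the thread does not show the corrected lstar code, so the following is not the project's fix, only a sketch of the general pitfall. WRMSR takes a 64-bit value split across EDX (high half) and EAX (low half), so a handler address that is truncated to 32 bits on the way, as in the long-to-int cast mentioned above, silently loses its upper half. The helper names and the example handler address are made up for illustration; the MSR numbers are the architectural ones for STAR and LSTAR.

```c
#include <stdint.h>

#define MSR_STAR  0xC0000081u  /* segment selectors for SYSCALL/SYSRET */
#define MSR_LSTAR 0xC0000082u  /* 64-bit SYSCALL entry point           */

/* Hypothetical helper type: the two 32-bit halves WRMSR expects. */
struct msr_halves {
    uint32_t lo;  /* goes in EAX */
    uint32_t hi;  /* goes in EDX */
};

/* Split a 64-bit MSR value without losing the upper 32 bits. */
struct msr_halves split_msr_value(uint64_t value)
{
    struct msr_halves h;
    h.lo = (uint32_t)(value & 0xFFFFFFFFu);
    h.hi = (uint32_t)(value >> 32);
    return h;
}

/* Reassemble the halves, e.g. to verify a value read back with RDMSR. */
uint64_t join_msr_value(struct msr_halves h)
{
    return ((uint64_t)h.hi << 32) | h.lo;
}
```

Splitting the STAR constant from the earlier code, 0x003b_0010_0000_0000, this way gives a high half of 0x003B0010 and a low half of 0, which matches the STARHI and STARLO values computed there.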