Page Fault on calling a (garbage) function pointer

Posted: Fri Mar 07, 2014 7:59 pm
by Lionel
Preface: I'm not sure this forum is exactly the right place for this question, but I'll try anyway.

My kernel's VFS code page faults when it tries to call the handler for reading the /dev/null device. The handler is a function pointer that device_null_create() sets to the read_null function. The page fault occurs as soon as the read handler is called through the vfs_node_t object held in the filesystem tree. The cause is an invalid pointer, 0xF000FF53, which is a garbage address (at least, I believe it is; at this point my kernel had only allocated up to about 0x2F4000 of usable memory, as it is a lower-half kernel).

My question is: What's causing my kernel to call that address instead of the proper address of my function?
Single-stepping in gdb has revealed little (or maybe that's because I don't use gdb often), and everything seems ("seems") okay until read_null is called. I have posted the relevant sources and output below, narrowed down to the related and suspected functions.

main.c

Code:

vfs_mount("/dev/null", device_null_create());
vfs_print_tree_node(vfs_tree->root,0);
printf("Reading from /dev/null\n");
vfs_node_t *null = kopen("/dev/null", 0);
printf("null returns:%d\n",read_vfs(null,0,0,0));
vfs.c

Code:

uint32_t read_vfs(vfs_node_t *node, uint32_t offset, uint32_t size, uint8_t *buffer) 
{
	if (node->read) {
		printf("read_vfs: node->read = 0x%X\n",node->read);
		uint32_t ret = node->read(node, offset, size, buffer);
		return ret;
	} else {
		return 0;
	}
}
device_null.c

Code:

uint32_t read_null(vfs_node_t *node, uint32_t offset, uint32_t size, uint8_t *buffer) {
	return 0;
}
...
vfs_node_t *device_null_create()
{
	vfs_node_t * fnode = malloc(sizeof(vfs_node_t));
	memset(fnode, 0x00, sizeof(vfs_node_t));
	fnode->inode = 0;
	strcpy(fnode->name, "null");
	fnode->uid = 0;
	fnode->gid = 0;
	fnode->flags   = VFS_CHARDEVICE;
	fnode->read    = read_null;
	fnode->write   = write_null;
	fnode->open    = open_null;
	fnode->close   = close_null;
	fnode->readdir = NULL;
	fnode->finddir = NULL;
	return fnode;
}
Kernel log (thank god I had the foresight to print all output to the serial console (stdio), so it was an easy copy & paste):

Code:

[x86]:Loading x86 components...
[CPU]:GDT Setup
[CPU]:IDT Setup
[IRQ]:Registered irq handler for 32 (IRQ0) at 0x10BD10
[IRQ]:Registered irq handler for 14 at 0x109EF0
[MEM]:Initialising and populating memory...
[MEM]:Marked 0x2F4000 (3096576) frames as dirty
[MEM]:Allocating Kernel Reserved Area...
[MEM]:Creating heap...!
[MEM]:Done allocating initial memory!
[x86]:Done starting hardware!
[KERN]:CoreLibs initialising...
[KERN]:Running Debug Kernel! Some things might not work properly!
[IO]:Verifiying timer / interrupts (waiting 10 ticks)
[KERN]:Finished initialising CoreLibs!
[VIDEO]:BGA unsupported, setting terminal as output
[VFS]:Starting VFS...
sbrk: allocating 1 pages to cover 0x1000 bytes
[root] -> (empty)
[VFS]:Mounting devices...
[VFS]:Searching for dev
[VFS]:Did not find dev, making it.
[VFS]:Searching for null
[VFS]:Did not find null, making it.
[root] -> (empty)
    dev -> (empty)
        null -> 0x2f4091 (null)
Reading from /dev/null
read_vfs: node->read = 0xF000FF53
[PF]:Page fault!
[PF]:Addr:0xF000FF53. present 1 rw 0 us 0 res 0 id 0
[SYS]:Encountered interrupt 14 (Page Fault)!
[KERN]:Halting!
Registers:
| eax 0x22; ebx 0x0; ecx 0xF000FF53; edx 0x0
| ??? 0x104AB8; ebp 0x104B08; err 0x0; efl 0x202
| usp 0x105E31; eip 0x104B08; esi 0x0; edi 0x0
| cs 0x8; ds 0x10; es 0x10; fs 0x10
| gs  0x10
I've tried to work this bug out, but it is either beyond my capabilities (I am still learning, but I do have at least half a head on my shoulders), or it's so simple that it is flying over my head. If you need any more information just ask; the full, up-to-date source code is in the dev branch on my GitHub for convenience. I've also tried to ask this question intelligently, so if I made any errors in asking, could you tell me? I wish to improve.

Thanks,
Lionel

Re: Page Fault on calling a (garbage) function pointer

Posted: Fri Mar 07, 2014 8:20 pm
by sortie
I can't spot the error as such. It looks like it happens between the vfs_mount() and read_vfs() calls. I bet your malloc is bad, your mount code is bad, something else is broken, or one of your core assumptions is wrong:

You followed James Molloy's tutorial. This is okay, but it contains errors and design flaws. You'll want to deprogram yourself using the (partial!) list of issues with that tutorial: http://wiki.osdev.org/James_Molloy%27s_ ... Known_Bugs - This might contain the solution to your problem if it is caused by one of your assumptions being wrong.

I couldn't find this code on your github, so I couldn't look closely at the mount code.

Re: Page Fault on calling a (garbage) function pointer

Posted: Fri Mar 07, 2014 8:47 pm
by Lionel
Sortie:
Yeah, I once spent a full day fixing the bugs and dumb implementation choices listed on that wiki page. I personally think it should be more visible. And I didn't really "follow" the tutorial, more like "decimated" it, but yeah, the core ideas and such live on.
A direct link to the src/ directory (which lives in the dev branch on GitHub) can be accessed here, and the mount code specifically: here.
Getting back to the problem at runtime (Get it? 'Cause CPUs don't have hands? No?), I had suspected as much. The read_vfs call looks sound, as all it does is call the node's handler. The function pointer holds a "garbage" memory address which shouldn't exist (so either my page fault handler is reporting incorrectly or that page really is present). My kernel's malloc function is extremely simple, but fully working, and it is NOT a placement malloc. The VFS seems to be mounting the null device into the tree correctly (my VFS actually has a mountpoint tree instead of... nothing).
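On the question of whether the handler is "reporting incorrectly": in the x86 page-fault error code, bit 0 is set when the fault was a protection violation on a present page and clear when the page was not present. Assuming the `err 0x0` in the register dump above is that error code, bit 0 is clear, which would suggest the page was not present and that a handler printing `present 1` for it may be inverting the bit. Below is a minimal sketch of the decoding; `pf_info_t` and `decode_pf_error` are hypothetical names for illustration, not anything from my kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the x86 page-fault error code (hypothetical struct). */
typedef struct {
    bool present;           /* bit 0: 1 = protection violation on a present page,
                                      0 = page was not present */
    bool write;             /* bit 1: 1 = the access was a write, 0 = a read */
    bool user;              /* bit 2: 1 = fault happened in user mode */
    bool reserved_bit;      /* bit 3: 1 = a reserved bit was set in a paging entry */
    bool instruction_fetch; /* bit 4: 1 = fault on an instruction fetch
                                      (only reported when the CPU supports it) */
} pf_info_t;

/* Split the raw error code pushed by the CPU into its flag bits. */
static pf_info_t decode_pf_error(uint32_t err)
{
    pf_info_t info;
    info.present           = (err & 0x01u) != 0;
    info.write             = (err & 0x02u) != 0;
    info.user              = (err & 0x04u) != 0;
    info.reserved_bit      = (err & 0x08u) != 0;
    info.instruction_fetch = (err & 0x10u) != 0;
    return info;
}
```

With an error code of 0x0 every field comes out false: a kernel-mode read of a page that is not present.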
Entire vfs tree:

Code:

before mount: 
[root] -> (empty)
after mount: 
[root] -> (empty)
    dev -> (empty)
        null -> 0x2f4091 (null)
Note that the file is called null, not that it is a null pointer.
The function pointer is somehow being assigned the wrong address instead of read_null's, or the address is being read back incorrectly. Or it could be something else.
My mount code is possibly at fault; I'll have a closer look to see if I can find anything. I don't think memory allocation is to blame (I have done extensive tests of malloc-ing and freeing large page-spanning sizes and it has worked perfectly), but that might not hold true for the other three layers of memory allocation. I'll look into that as well, sortie.
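One way to tell "assigned wrongly" apart from "read back wrongly" is to keep the pointer that was handed to the mount call and compare it with what the open call returns later. The following is only a sketch under assumptions: the struct is a cut-down stand-in for vfs_node_t, and `node_check_t` and `check_opened_node` are hypothetical helpers, not part of my VFS.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct vfs_node vfs_node_t;
typedef uint32_t (*read_fn_t)(vfs_node_t *, uint32_t, uint32_t, uint8_t *);

/* Cut-down stand-in for the real vfs_node_t (assumption: only the read
 * handler matters for this check). */
struct vfs_node {
    read_fn_t read;
};

/* Same behaviour as the read_null handler shown earlier: always reads 0 bytes. */
static uint32_t read_null(vfs_node_t *node, uint32_t offset, uint32_t size, uint8_t *buffer)
{
    (void)node; (void)offset; (void)size; (void)buffer;
    return 0;
}

typedef enum {
    NODE_OK,              /* same node, handler intact */
    NODE_IS_NULL,         /* the lookup returned a null pointer */
    NODE_IS_OTHER_OBJECT, /* the lookup returned some other node */
    NODE_HANDLER_CHANGED  /* right node, but its handler is not the expected one */
} node_check_t;

/* Compare the node that was created and mounted with the node that the
 * lookup returned, and report the first difference found. */
static node_check_t check_opened_node(const vfs_node_t *created,
                                      const vfs_node_t *opened,
                                      read_fn_t expected)
{
    if (opened == NULL)
        return NODE_IS_NULL;
    if (opened != created)
        return NODE_IS_OTHER_OBJECT;
    if (opened->read != expected)
        return NODE_HANDLER_CHANGED;
    return NODE_OK;
}
```

In main.c this would amount to storing the result of device_null_create() before mounting it, then passing that pointer, the result of kopen("/dev/null", 0), and read_null to the check.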

Thanks,
Lionel

EDIT:
Analysing the node passed as the parameter to vfs_mount gives:

Code:

[VFS]:Reading vfs node 'null'
[VFS]:Type: 
[VFS]:perm: 
[VFS]:uid:0, gid:0
[VFS]:inode:0
[VFS]:size:0 bytes
[VFS]:From device:unkn
Which doesn't make sense: no type is shown at all, even though it is a character device!
EDIT 2:
It might be kopen(). The returned vfs_node_t is garbage:

Code:

[VFS]:Reading vfs node '' at 0x0
[VFS]:Type: file blockdev pipe mountpoint
[VFS]:perm: rw
[VFS]:uid:-1073732290, gid:-268370093
[VFS]:inode:-268370093
[VFS]:size:-268370093 bytes
[VFS]:From device:vfs
Address of read:0xF000FF53
Address of open:0xF000FF53
read_vfs: node->read = 0xF000FF53
The problem is that I don't know exactly what's broken. I am attempting gdb again, but I fear it will be just as useless as last time.

EDIT #3:
Fixed! It was a problem with kopen(): I had forgotten to update the code that checks whether there is a filesystem so that it points to the tree's filesystem root node, not the root node I had kept for legacy reasons. I knew those values looked like what you get when you follow a null pointer! I guess it was a simple problem after all! It now reads properly and returns 0. Thanks for getting me on the right track, sortie!
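For anyone finding this later: the dump in EDIT 2 shows the node "at 0x0", so read_vfs was reading its handler through a null pointer, and with low memory mapped that read succeeds and returns whatever happens to live there. A guard in the read path would have turned the fault into an error return. This is a minimal sketch, not my actual code: the struct is a simplified stand-in for vfs_node_t, and `lookup`, `read_vfs_checked` and `VFS_READ_ERROR` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct vfs_node vfs_node_t;
typedef uint32_t (*read_fn_t)(vfs_node_t *, uint32_t, uint32_t, uint8_t *);

/* Simplified stand-in for vfs_node_t; only the fields used here. */
struct vfs_node {
    char name[32];
    read_fn_t read;
};

/* Hypothetical sentinel so a failed open can be told apart from reading 0 bytes. */
#define VFS_READ_ERROR 0xFFFFFFFFu

static uint32_t read_null(vfs_node_t *node, uint32_t offset, uint32_t size, uint8_t *buffer)
{
    (void)node; (void)offset; (void)size; (void)buffer;
    return 0; /* reading /dev/null always yields 0 bytes */
}

/* Hypothetical flat lookup standing in for kopen(): returns NULL when no
 * node has the requested name. */
static vfs_node_t *lookup(vfs_node_t *nodes, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(nodes[i].name, name) == 0)
            return &nodes[i];
    }
    return NULL;
}

/* Like read_vfs, but refuses to touch a null node instead of reading
 * the handler from address 0. */
static uint32_t read_vfs_checked(vfs_node_t *node, uint32_t offset, uint32_t size, uint8_t *buffer)
{
    if (node == NULL)
        return VFS_READ_ERROR;
    if (node->read == NULL)
        return 0;
    return node->read(node, offset, size, buffer);
}
```

The sentinel is a design choice for the sketch only; returning 0 for a null node, as the original read_vfs does for a missing handler, would hide a failed lookup.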

Re: Page Fault on calling a (garbage) function pointer

Posted: Fri Mar 07, 2014 9:39 pm
by sortie
:)