Out of Range Pointer Error

jvc · Post by **jvc** » Wed Dec 31, 2014 9:41 pm

Hello,

I am working on a 32-bit kernel loaded by GRUB 2. I am writing a little driver that gets the time from CMOS and stores it in a structure that is placed on my heap. All of a sudden, I am getting an error on real hardware that states: "452: out of range pointer: 0x7." It then prompts me to press any key to exit, after which the computer moves on to the next entry in the boot list. This leads me to believe the error is occurring in GRUB for some reason.

Note: I do not get this error in VirtualBox.

This is the code that I added which seems to result in the error:

Code: Select all

static cmos_datetime* cmos_obtaintime()
{
    uint8_t statusB = 0; uint8_t bcd = 1; uint8_t hour24 = 0;
    cmos_datetime* datetime = kmalloc(sizeof(cmos_datetime));


    //Status B
    outportb(CMOS_CMD, CMOS_STATUS_B);
    statusB = inportb(CMOS_DATA);
    if (statusB & 0x02) hour24 = 1;
    if (statusB & 0x04) bcd = 0;

    //Time/Date
    outportb(CMOS_CMD, CMOS_YEAR);
    datetime->year = inportb(CMOS_DATA);
    if (bcd) datetime->year = BCDtoBINARY(datetime->year);
    datetime->year += cmos_century;

    outportb(CMOS_CMD, CMOS_MONTH);
    datetime->month = inportb(CMOS_DATA);
    if (bcd) datetime->month = BCDtoBINARY(datetime->month);

    outportb(CMOS_CMD, CMOS_DAY);
    datetime->day = inportb(CMOS_DATA);
    if (bcd) datetime->day = BCDtoBINARY(datetime->day);

    outportb(CMOS_CMD, CMOS_HOURS);
    datetime->hour = inportb(CMOS_DATA);
    if (!hour24) datetime->hour = cmos_convert12to24(datetime->hour, bcd); // DOES BCD conversion too
    else if (bcd) datetime->hour = BCDtoBINARY(datetime->hour);

    outportb(CMOS_CMD, CMOS_MINUTES);
    datetime->minute = inportb(CMOS_DATA);
    if (bcd) datetime->minute = BCDtoBINARY(datetime->minute);

    outportb(CMOS_CMD, CMOS_SECONDS);
    datetime->second = inportb(CMOS_DATA);
    if (bcd) datetime->second = BCDtoBINARY(datetime->second);
 
    return datetime;
}

void cmos_acpi_install()
{
    cmos_datetime* datetime = cmos_obtaintime();   
    kprintf("Current Time: %u.%u.%u %u:%u:%u\n", datetime->day, datetime->month, datetime->year, datetime->hour, datetime->minute, datetime->second); 


    kfree(datetime);
}

Brendan · Post by **Brendan** » Wed Dec 31, 2014 11:15 pm

Hi,

The first thing I'd do is put a "while(true) {}" in your code as close to the entry point as possible. If it still crashes, then it's very likely the problem is GRUB.

However, maybe GRUB sets up exception handlers, and your code crashes and triggers GRUB's exception handler. In that case the problem isn't GRUB; and I'd probably start by inserting code in various places (e.g. "printf("AA\n");") to determine how far it gets before it crashes.
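A minimal sketch of the two techniques described above (parking the CPU near the entry point, and leaving progress markers), for the case where printf is not usable that early. It assumes standard VGA text mode, whose buffer sits at physical address 0xB8000; if GRUB left the machine in a graphics mode, or the buffer is not mapped yet in a higher-half kernel, the markers will not show up. The names vga_cell, checkpoint and halt_here are made up for illustration and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One VGA text-mode cell: low byte is the character, high byte the attribute. */
uint16_t vga_cell(char c, uint8_t attr)
{
    return (uint16_t)((uint16_t)attr << 8 | (uint8_t)c);
}

/* Writes a one-character progress marker into cell `slot` of a text buffer.
 * In a kernel, text_buffer would point at 0xB8000 (or wherever that is mapped). */
void checkpoint(volatile uint16_t *text_buffer, size_t slot, char tag)
{
    assert(text_buffer != NULL);   /* hosted check; a kernel would use its own */
    text_buffer[slot] = vga_cell(tag, 0x0F);   /* 0x0F = white on black */
}

/* Parks the CPU forever, so "did execution get this far?" has a yes/no answer. */
void halt_here(void)
{
    for (;;) { }
}
```

Calling checkpoint(buffer, 0, 'A'), checkpoint(buffer, 1, 'B') and so on at successive points shows the last marker reached before the crash; moving halt_here() earlier or later narrows it down further.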

The other thing I'd consider is changing it so that "cmos_obtaintime()" takes a pointer to a pre-allocated buffer (e.g. "static void cmos_obtaintime(cmos_datetime *outputBuffer)"), so that you can do this:

Code: Select all

void cmos_acpi_install()
{
    cmos_datetime datetime;

    cmos_obtaintime(&datetime);
    kprintf("Current Time: %u.%u.%u %u:%u:%u\n", datetime.day, datetime.month, datetime.year, datetime.hour, datetime.minute, datetime.second); 
}

If that fixes the problem, then the problem was probably your "kmalloc()".
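For illustration, here is a rough sketch of what the callee side of that caller-supplied buffer could look like. The struct layout, the cmos_read_fn callback and the names bcd_to_binary, rtc_hour_to_24 and cmos_obtaintime_into are assumptions (the thread never shows the real cmos_datetime, BCDtoBINARY or cmos_convert12to24); the register indices and status B bits are the standard PC RTC ones. Passing the register reader as a callback is only there so the logic can be exercised outside a kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field layout; the thread never shows the real struct. */
typedef struct {
    uint16_t year;
    uint8_t month, day, hour, minute, second;
} cmos_datetime;

/* Reads one RTC register; a kernel would wrap outportb()/inportb() here. */
typedef uint8_t (*cmos_read_fn)(uint8_t reg);

/* Packed BCD (0x59) to binary (59). */
uint8_t bcd_to_binary(uint8_t v)
{
    return (uint8_t)((v >> 4) * 10 + (v & 0x0F));
}

/* 12-hour RTC value (bit 7 set means PM) to the range 0..23. */
uint8_t rtc_hour_to_24(uint8_t raw, int bcd)
{
    int pm = (raw & 0x80) != 0;
    uint8_t hour = (uint8_t)(raw & 0x7F);
    if (bcd) hour = bcd_to_binary(hour);
    hour = (uint8_t)(hour % 12);          /* 12 o'clock becomes 0 before the PM offset */
    return (uint8_t)(pm ? hour + 12 : hour);
}

/* Fills a caller-owned struct, so no heap allocation is involved. */
void cmos_obtaintime_into(cmos_datetime *out, cmos_read_fn read_reg, uint16_t century)
{
    assert(out != NULL && read_reg != NULL);   /* hosted check only */

    uint8_t status_b = read_reg(0x0B);
    int hour24 = (status_b & 0x02) != 0;       /* bit 1: 24-hour mode */
    int bcd = (status_b & 0x04) == 0;          /* bit 2 clear: values are BCD */

    uint8_t year = read_reg(0x09), month = read_reg(0x08), day = read_reg(0x07);
    uint8_t hour = read_reg(0x04), minute = read_reg(0x02), second = read_reg(0x00);

    if (!hour24) hour = rtc_hour_to_24(hour, bcd);
    else if (bcd) hour = bcd_to_binary(hour);

    out->year = (uint16_t)(century + (bcd ? bcd_to_binary(year) : year));
    out->month = bcd ? bcd_to_binary(month) : month;
    out->day = bcd ? bcd_to_binary(day) : day;
    out->hour = hour;
    out->minute = bcd ? bcd_to_binary(minute) : minute;
    out->second = bcd ? bcd_to_binary(second) : second;
}
```

With this shape, cmos_acpi_install() only needs a local cmos_datetime and a call such as cmos_obtaintime_into(&datetime, reader, 2000), which takes kmalloc()/kfree() out of the picture entirely.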

Cheers,

Brendan

jvc · Post by **jvc** » Thu Jan 01, 2015 11:24 am

Hi Brendan,

Thanks for the reply. I carried out your suggestions and found that they did stop the error. It gets weird though: printing the address out to the console also removes the error, with the heap functions still being called. The output of this print shows that the heap is functioning as expected.

In terms of grub performing exception handling, I don't really see how it could because my OS takes over the IRQs/ISRs much earlier in its initialization process.

By adding a function in a completely different source file, the error stopped, and I haven't been able to reproduce it since. I am definitely unnerved by this disappearing bug, though my hunch is that it has something to do with grub not liking the image.

Jacob

xenos · Post by **xenos** » Thu Jan 01, 2015 1:12 pm

Looks like the error message comes from this line of the GRUB source:
http://anonscm.debian.org/cgit/pkg-grub ... tor.c#n452
This part of the code should not be executed after control has been passed to your kernel image. So either your kernel jumps to that point for whatever strange reason (which I would expect to cause a triple fault, or at least some exception, rather than just printing, since you have probably already messed with the screen), or it happens in GRUB before it passes control to your kernel. This is probably hard to debug, but with the Bochs debugger it should be possible. I would take a version of your image that reproduces the bug and check in Bochs, or in QEMU with GDB, where execution ends up and what the call stack looks like.

KemyLand · Post by **KemyLand** » Thu Jan 01, 2015 4:52 pm

jvc wrote:Hi Brendan,

Thanks for the reply. I carried out your suggestions and found that they did stop the error. It gets weird though: printing the address out to the console also removes the error, with the heap functions still being called. The output of this print shows that the heap is functioning as expected.

In terms of grub performing exception handling, I don't really see how it could because my OS takes over the IRQs/ISRs much earlier in its initialization process.

By adding a function in a completely different source file, the error stopped, and I haven't been able to reproduce it since. I am definitely unnerved by this disappearing bug, though my hunch is that it has something to do with grub not liking the image.

Jacob

You may think that this is a solution, but almost always, this kind of "nothing to do with it" repair tends to hide important details about a bug that can still be there and can come back later. I'd undo that "repair" and do what XenOS suggests, so that we and you can know where the bug comes from...

When I find a bug in my kernel's harsh early stages, before the ISRs are set up, I tend to use GDB's next and step commands. If I reach a point where EIP becomes something like 0xe000, and GDB says I'm in function ??, I have probably reached a GRUB exception handler. The solution? Once there, dump the stack frames and you'll get the offending procedure.

jvc · Post by **jvc** » Thu Jan 01, 2015 8:51 pm

Hi,

Those are good suggestions, I will take a look.

Jacob

jvc · Post by **jvc** » Sun Jan 04, 2015 12:10 pm

Hi,

It looks like I found the bug, though I am a little confused about it. It had to do with my linker script. The original one is as follows:

Code: Select all

ENTRY(krnlstart)
OUTPUT_FORMAT(elf32-i386)

SECTIONS 
{
	. = 0x100000;

	. += 0xC0000000;


	.text ALIGN (4096) : AT(ADDR(.text) - 0xC0000000)
	{
            *(.multiboot)
			*(.text)
	}

	.data ALIGN (4096) : AT(ADDR(.data) - 0xC0000000)
	{
			*(.data)
			*(.rodata*)
	}

	.bss ALIGN(4096) :AT(ADDR(.bss) - 0xC0000000)
	{
			*(COMMON)
			*(.bss)
	}

	kernel_end = .;
}

And here is the slightly modified one:

Code: Select all

ENTRY(krnlstart)
OUTPUT_FORMAT(elf32-i386)

SECTIONS 
{
	. = 0x100000;

	. += 0xC0000000;


	.text ALIGN (4096) : AT(ADDR(.text) - 0xC0000000)
	{
            *(.multiboot)
			*(.text)
	}

	.data ALIGN (4096) : AT(ADDR(.data) - 0xC0000000)
	{
			*(.data)
			*(.rodata)
	}

	.bss ALIGN(4096) :AT(ADDR(.bss) - 0xC0000000)
	{
			*(COMMON)
			*(.bss)
	}

	kernel_end = .;
}

Note how the entry for *(.rodata) no longer has the second '*'. I'm not entirely sure why this solved it, though. Looking at the generated linker maps, the .rodata.str1.4 (etc.) sections are now placed before the .data section, instead of after it with the other .rodata stuff.
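As a side note on why the map changed: input section names in a linker script are matched as shell-style glob patterns, so *(.rodata) only catches a section named exactly .rodata, while *(.rodata*) also catches names such as .rodata.str1.4. Sections that no pattern catches become orphan sections, which the linker places by its own rules. The sketch below uses POSIX fnmatch() as a stand-in for that matching (an assumption made for demonstration; it is not the linker's own code path), and section_matches is a made-up helper name.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdio.h>

/* Returns 1 if an input-section name matches a linker-script style glob. */
int section_matches(const char *pattern, const char *section)
{
    assert(pattern != NULL && section != NULL);
    return fnmatch(pattern, section, 0) == 0;
}

/* Prints which typical read-only section names each pattern would catch. */
void show_matches(void)
{
    static const char *const patterns[] = { ".rodata", ".rodata*" };
    static const char *const sections[] = { ".rodata", ".rodata.str1.4", ".rodata.cst4" };

    for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        for (size_t s = 0; s < sizeof sections / sizeof sections[0]; s++) {
            printf("%-9s vs %-15s : %s\n", patterns[p], sections[s],
                   section_matches(patterns[p], sections[s]) ? "match" : "orphan");
        }
    }
}
```

So the modified script does not fix anything about the string sections themselves; it only moves them, which shifts the layout of the whole image.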

Jacob

xenos · Post by **xenos** » Sun Jan 04, 2015 1:09 pm

It doesn't look like you found the bug, but rather another cure for the symptoms...

jvc · Post by **jvc** » Sun Jan 04, 2015 1:24 pm

It's weird though, adding a printf statement before I install my idt "solves" the problem, while adding it after does not.

KemyLand · Post by **KemyLand** » Sun Jan 04, 2015 1:31 pm

jvc wrote:It's weird though, adding a printf statement before I install my idt "solves" the problem, while adding it after does not.

As always, this type of "solution" will come back as a bad bug later on. Maybe not today, maybe not tomorrow, but someday.

jvc · Post by **jvc** » Sun Jan 04, 2015 1:39 pm

Still searching for it. It's just difficult when any debugging code stops the bug from occurring. I am also not able to reproduce this bug in an emulator, only on real hardware.

jvc · Post by **jvc** » Sun Jan 04, 2015 2:39 pm

So I have concluded that my kernel's entry point never even gets called; this error is happening before my kernel runs.

Brendan · Post by **Brendan** » Mon Jan 05, 2015 7:28 am

Hi,

jvc wrote:So I have concluded that my kernel's entry point never even gets called, this error is happening before my kernel runs.

In that case, there are about 6 possibilities:

1. The hardware is faulty, and something about the computer causes GRUB to crash. E.g. maybe there's faulty RAM.
2. The hardware/firmware is buggy, and something about the computer causes GRUB to crash. E.g. maybe there's a bug in the way the BIOS implemented one of the BIOS functions that GRUB uses.
3. GRUB is buggy, and something about the computer causes GRUB to crash. E.g. maybe GRUB expects to be able to use 636 KiB of RAM at 0x00000000, but on that computer the EBDA is larger.
4. GRUB is buggy, and something about your files causes GRUB to crash. E.g. maybe your kernel has a section that's supposed to be loaded at 0x10000000 and GRUB fails to check if there's RAM at 0x10000000 before attempting to load that section.
5. Your files are buggy. E.g. maybe your kernel violates the multi-boot spec in some way, and it's unreasonable to expect GRUB's sanity checks to detect it. Note: I can't actually think of a good example of this that would work on some computers but not others.
6. Something about the way you've created the boot image is wrong. For example, maybe you're booting from USB flash and that specific device tells the BIOS there's 32 sectors per track, but you've assumed 63 sectors per track in your partition table, causing GRUB to load the wrong code/data using the wrong "CHS" values.

Mostly, you're going to need more information (and more testing to obtain more information). For example, maybe try a very minimal kernel (e.g. a 32 byte flat binary) and see if GRUB will start it correctly, or a different type of boot device (e.g. boot from CD-ROM instead of USB flash), or a different version of GRUB, or...
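One cheap test for the "your files are buggy" possibility is to check the Multiboot header in the built image. Under the Multiboot 1 specification the header starts with the magic value 0x1BADB002, must be 4-byte aligned, must lie within the first 8192 bytes of the file, and its magic, flags and checksum fields must sum to zero modulo 2^32. The sketch below checks only those three fields; the function names are made up, and it assumes the image has already been read into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MULTIBOOT_SEARCH_LIMIT 8192u

/* Checksum that makes magic + flags + checksum wrap around to zero. */
uint32_t multiboot_checksum(uint32_t flags)
{
    return (uint32_t)(0u - (MULTIBOOT_HEADER_MAGIC + flags));
}

/* Reads a little-endian 32-bit value byte by byte (no alignment needed). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the file offset of the first valid header, or -1 if none is found
 * on a 4-byte boundary within the first 8192 bytes of the image. */
long find_multiboot_header(const uint8_t *image, size_t len)
{
    assert(image != NULL);
    size_t limit = len < MULTIBOOT_SEARCH_LIMIT ? len : MULTIBOOT_SEARCH_LIMIT;

    for (size_t off = 0; off + 12 <= limit; off += 4) {
        uint32_t magic = read_le32(image + off);
        uint32_t flags = read_le32(image + off + 4);
        uint32_t checksum = read_le32(image + off + 8);
        if (magic == MULTIBOOT_HEADER_MAGIC &&
            (uint32_t)(magic + flags + checksum) == 0u)
            return (long)off;
    }
    return -1;
}
```

A result of -1 for the kernel file would point at the image rather than at GRUB. A valid header does not prove the opposite, since it says nothing about where the sections are loaded.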

Cheers,

Brendan

jvc · Post by **jvc** » Mon Jan 05, 2015 6:53 pm

Thanks for the reply, Brendan. I tested on a few different computers, and the bug was reproduced on all of them, though it still does not appear in an emulator. I also tried three different versions of GRUB, and the error appeared with all of them. I tried a couple of different boot media as well, with the same result.

I found that the bug is highly dependent on a very specific layout of the binary: adding a single function call, or even a single parameter to an existing function call, causes the bug to disappear. Once again, the kernel does not execute, which I found by placing an hlt instruction right at the entry point (after a cli instruction that was already there). I did have to remove a later instruction (a mov two lines down) to get the bug to persist.

Furthermore, the bug only appears when the kernel is loaded to virtual address 0xC0100000 (3GB and 1MB) and physical address 0x100000 (1MB). If I use the linker script to offset the kernel by a single page, the bug disappears.
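For reference, the relationship between those two addresses follows directly from the AT(ADDR(...) - 0xC0000000) expressions in the linker script posted earlier. The helper names and KERNEL_PAGE_SIZE below are illustrative only; the 0xC0000000 base and the 4096-byte alignment are taken from that script.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_VIRTUAL_BASE 0xC0000000u   /* offset used in the linker script */
#define KERNEL_PAGE_SIZE    4096u         /* matches ALIGN (4096) */

/* Physical load address the script assigns to a given virtual address. */
uint32_t load_address(uint32_t virtual_address)
{
    assert(virtual_address >= KERNEL_VIRTUAL_BASE);
    return virtual_address - KERNEL_VIRTUAL_BASE;
}

/* Rounds an address up to the next page boundary, as ALIGN (4096) does. */
uint32_t align_up_to_page(uint32_t address)
{
    return (address + KERNEL_PAGE_SIZE - 1u) & ~(KERNEL_PAGE_SIZE - 1u);
}
```

The failing layout is therefore virtual 0xC0100000 loaded at physical 0x00100000, and shifting by one page gives virtual 0xC0101000 loaded at physical 0x00101000.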

I really feel like this might be some really finicky GRUB thing, where it just does not like this very specific set of circumstances. Maybe I'm wrong, so I shall try a different boot loader; maybe I will get somewhere that way.

Jacob

jvc · Post by **jvc** » Mon Jan 05, 2015 7:21 pm

So I tested with GRUB Legacy and it works just fine.
It is GRUB 2 that has the bug.

OSDev.org
