Debugging Assistance Request

Posted: Fri Mar 16, 2007 7:55 am
by Kenny
Hey all.

I have my partial OS up and running. It's an extension of the tutorials of MuOS by Gregor Brunmar at http://hem.passagen.se/gregge/index_hires.html

My OS is loaded with the assembly kernel at 0x7C00 loading and calling a C kernel at 0x7E00, with a stack set up at 0x90000.

The kernel runs in segmented protected mode (as per the tutorials), it scrolls the screen nicely, buffers scan-codes and ascii codes, handles a few commands, etc. No multi-tasking or anything, everything is linear and in one binary. I have an interrupt table which calls my kprintf equivalent for any exceptions then hangs the system.

My problem arose when I created a new command to buffer some data. My keyboard routine called my new procedure, the procedure kprintf'd a debug string, then I declared a variable for a buffer.

Code:

	unsigned char lBuffer[1500];
	short lLength = 0;

This is where things started to go ... odd.

With the buffer defined as 1500 bytes, the machine (and Bochs) would triple-fault and reset as soon as I called the new procedure. It didn't even seem to call my interrupt routines, which should have hung the system. If I change the definition to 1000 bytes, the system seems ok, it's also fine with a buffer of 2000 bytes. With any length between 1000 and 1500, the system will become extremely unstable, sometimes triple-faulting, sometimes calling my exception handlers for exceptions 6 (Invalid OpCode), 8 (Double-fault) or 13 (GPF) sometimes after a long pause.

I concluded that this behaviour must have something to do with the stack overwriting something (or other corruption), then returning to some invalid address or just out "into the weeds", but the kernel is only about 25k. I also tried moving the initial stack down to start at 0x7000, with the same result.

I've poked around with Bochs, and the triple-fault seems to occur toward the end of my buffer-defining procedure, possibly as it's returning and deallocating the local variables from the stack, but I am unsure how to proceed any further with debugging on either my test machine or Bochs.

I fear that it might be some mistake in my interrupt handler. I've taken a copy of the kernel and tried to reduce it to just the code that causes the problem, but it's still too big to post here, the compiled size is now 4k.

One bizarre (yes, I'd say bizarre) observation I made when reducing the code, I have a section of code similar to the following:

Code:

unsigned char mShiftKeyDown = 0;
unsigned char mCapsLock = 0;
unsigned char mKeymapGB[128] = {
	   0, 0, '1', '2', '3', ...
};

void KeyboardCommandProcessing()
{

	...
	
	if (mShiftKeyDown ^ mCapsLock == 1)
	{
	}
	else
	{
		lASCIICode = mKeymapGB[lScanCode];
	}

	...

}
This checks the status of the Shift and Caps Lock keys before converting the keyboard scan code into an ASCII character. As I have modified it, the ELSE path is the only one that will ever be followed. With the code like this, the system triple-faults and reboots, ignoring the interrupt handlers. If I remove the IF and ELSE, leaving just the assignment of lASCIICode from mKeymapGB[], the system does not crash, and carries on through the code to allocate my 1500-byte buffer properly.
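A side note on the condition in the snippet above, unrelated to the crash as far as the thread shows: in C, `==` binds more tightly than `^`, so `mShiftKeyDown ^ mCapsLock == 1` parses as `mShiftKeyDown ^ (mCapsLock == 1)`. The two readings agree while both flags are strictly 0 or 1, but they can differ if a flag ever holds another non-zero value. A minimal sketch, using hypothetical function names that are not part of the original kernel:

```c
/* How the condition as written is parsed: == is evaluated before ^. */
int ShiftXorCapsAsParsed(unsigned char shift, unsigned char caps)
{
    return shift ^ (caps == 1);
}

/* Probably the intended test (an assumption): exactly one flag is set,
   whatever non-zero value is used to mean "set". */
int ShiftXorCapsIntended(unsigned char shift, unsigned char caps)
{
    return (shift != 0) != (caps != 0);
}
```

Writing the parentheses explicitly also avoids the precedence warning that some compilers emit for the original form.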

I've been poking this for a few days now, and I have no idea how to proceed.

Any suggestions will be appreciated.

Thanks

Posted: Fri Mar 16, 2007 12:19 pm
by salil_bhagurkar
Get a linker map from ld and you'll understand which variable and function is going where. (ld -Map <file>)

I suggest using dynamic allocation instead of static allocation. Write a memory manager if you don't have one.

Are you declaring the buffer inside a function or outside? If it's inside, it's on the stack, and hence there's some stack problem. If you are declaring it outside, then it's in the data segment. It may be interfering with the IDT or some vital area. Check the map.

Posted: Fri Mar 16, 2007 2:10 pm
by Brynet-Inc
salil_bhagurkar wrote:Get a linker map from ld and you'll understand which variable and function is going where. (ld -Map <file>)

I suggest using dynamic allocation instead of static allocation. Write a memory manager if you don't have one.

Are you declaring the buffer inside a function or outside? If it's inside, it's on the stack, and hence there's some stack problem. If you are declaring it outside, then it's in the data segment. It may be interfering with the IDT or some vital area. Check the map.
It's ld -M or ld --print-map with GNU ld 2.15 though..

Posted: Sat Mar 17, 2007 4:47 am
by Kenny
Ok, thanks, I shall try that.

As an aside, when using the LIDT instruction to load the IDT, does the processor create a copy of the interrupt table, or does it just take a reference to the one as it is in memory?

In the original tutorial, the IDT is created in a local variable and then LIDT'd from there. Since encountering this problem and looking more into how C handles variables, I'm concerned that the function allocates the IDT on the stack, the processor keeps a reference to it, and then the space is deallocated and reused for something else after the Initialise_Interrrupts function has finished.

If this were the case, however, I would have expected the IDT to have been overwritten before now with all the other local variables I'm defining in between times, but it still could be a contributing factor.

Thanks

Posted: Sat Mar 17, 2007 5:28 am
by Combuster
The most likely culprit here is a stack overflow:

Declaring a variable inside a function will cause it to be allocated on the stack. The stack is however of limited size, so if you declare large arrays you might end up writing outside the stack area and inside code or data parts. (The area that gets overwritten depends on the size of the defined array)

Which is why declaring char variable[big_number] inside kernel functions is a Bad Thing. You should either declare them globally or use (k)malloc or something similar.
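A minimal sketch of the "declare it globally" option, assuming the 1500-byte size from the original post. The names `gCommandBuffer`, `BufferCommandData`, `CommandBufferData` and `CommandBufferLength` are hypothetical and not taken from the kernel under discussion. The copy uses a plain loop because a freestanding kernel may not have a C library available.

```c
#include <stddef.h>

#define COMMAND_BUFFER_SIZE 1500   /* size taken from the original post */

/* File-scope storage: placed in .bss by the linker, not on the stack. */
static unsigned char gCommandBuffer[COMMAND_BUFFER_SIZE];
static size_t gCommandLength = 0;

/* Hypothetical helper: copies at most COMMAND_BUFFER_SIZE bytes into the
   global buffer and returns how many bytes were stored. */
size_t BufferCommandData(const unsigned char *src, size_t length)
{
    size_t i;

    if (length > COMMAND_BUFFER_SIZE)
        length = COMMAND_BUFFER_SIZE;   /* clamp instead of overrunning */

    for (i = 0; i < length; i++)
        gCommandBuffer[i] = src[i];

    gCommandLength = length;
    return length;
}

/* Read-only accessors for callers (hypothetical). */
const unsigned char *CommandBufferData(void) { return gCommandBuffer; }
size_t CommandBufferLength(void) { return gCommandLength; }
```

The trade-off is that a single global buffer is not re-entrant, which may matter once interrupts or multitasking can call the routine concurrently.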

Posted: Sat Mar 17, 2007 6:01 am
by Kenny
@Combuster

Stack overflow was my first thought too, but the stack starts at 90000 (around 580k through the first 640k), and the lowest I've ever seen it is around 8FF00.

The kernel is upward of 7C00 (32k) and is only 25k or so of code and about 66k of uninitialised data, which should leave me huge amounts of space (over 450k) to allocate a 1.5k buffer.

As I said in the original post, I also moved the stack top to just below the kernel, to start at 7000, and to even higher, up to 9FFFF, and they both gave exactly the same result.

I haven't yet created a memory manager, I've been working to get a good functional codebase within 640k before worrying about mallocing the larger buffers.

@salil_bhagurkar & Brynet-Inc

Many thanks for that, the switch is very useful and also confirms what I was saying above, the top of the kernel data (code, data and bss sections) is 0x1EF30 (124k through the memory).
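The headroom figure can be checked with a little arithmetic on the two addresses quoted in this thread: the kernel top from the linker map (0x1EF30) and the initial stack top (0x90000). The macro and function names below are illustrative only.

```c
#include <stdint.h>

#define KERNEL_TOP 0x1EF30u  /* end of code, data and bss per the linker map */
#define STACK_TOP  0x90000u  /* initial stack pointer from the first post */

/* Bytes between the end of the kernel image and the stack top.
   The stack grows downward into this gap. */
uint32_t FreeBytesBelowStack(void)
{
    return STACK_TOP - KERNEL_TOP;   /* 0x710D0 = 463056 bytes, about 452 KiB */
}
```

That is consistent with the "over 450k" estimate above, so a 1.5k local buffer alone should not exhaust the space between the kernel and the stack.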

Going back to my post above: if the problem is with allocating the IDT on the stack, would I need to convert my IDT and the IDT descriptor that gets LIDT'd into global variables so that their contents persist correctly? That is, assuming they are referenced by the processor and not copied (none of the references I've found explains which is the case).

Thanks to all

Posted: Sat Mar 17, 2007 9:54 am
by frank
Kenny wrote: Going back to my post above: if the problem is with allocating the IDT on the stack, would I need to convert my IDT and the IDT descriptor that gets LIDT'd into global variables so that their contents persist correctly? That is, assuming they are referenced by the processor and not copied (none of the references I've found explains which is the case).
You cannot create the IDT on the stack. The processor does not create a copy of the IDT; it simply uses a pointer to the IDT that you have given it. Since you are allocating the IDT on the stack, it gets overwritten by other functions as soon as the function that it was allocated in returns. Try putting the IDT in global memory.
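A minimal sketch of what "putting the IDT in global memory" might look like for a 32-bit protected-mode kernel built with gcc. All names here (`IdtEntry`, `IdtPointer`, `gIdt`, `SetIdtGate`, `PrepareIdtPointer`, `LoadIdt`) and the `KERNEL_BUILD` macro are assumptions for illustration, not the tutorial's code. LIDT reads only the 6-byte limit/base operand into the IDTR; the table that the base points to is then read from memory whenever an interrupt or exception is delivered, so the table must stay valid.

```c
#include <stdint.h>

/* One 8-byte gate descriptor in a 32-bit protected-mode IDT. */
struct IdtEntry {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* gate type, DPL and present bit */
    uint16_t offset_high;  /* handler address bits 16..31 */
} __attribute__((packed));

/* The 6-byte operand of LIDT: table limit and linear base address. */
struct IdtPointer {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

/* File-scope objects: they persist after the init function returns. */
struct IdtEntry   gIdt[256];
struct IdtPointer gIdtPointer;

/* Fill in one gate; out-of-range indices are ignored. */
void SetIdtGate(unsigned index, uint32_t handler,
                uint16_t selector, uint8_t type_attr)
{
    if (index >= 256)
        return;
    gIdt[index].offset_low  = (uint16_t)(handler & 0xFFFFu);
    gIdt[index].selector    = selector;
    gIdt[index].zero        = 0;
    gIdt[index].type_attr   = type_attr;
    gIdt[index].offset_high = (uint16_t)(handler >> 16);
}

/* Point the LIDT operand at the global table. */
void PrepareIdtPointer(void)
{
    gIdtPointer.limit = (uint16_t)(sizeof(gIdt) - 1);
    gIdtPointer.base  = (uint32_t)(uintptr_t)gIdt;
}

#ifdef KERNEL_BUILD   /* hypothetical macro: LIDT is privileged, kernel only */
void LoadIdt(void)
{
    __asm__ volatile ("lidt %0" : : "m"(gIdtPointer));
}
#endif
```

In a kernel, an init routine would typically call `SetIdtGate` for each vector (for example with type 0x8E for a present, ring-0, 32-bit interrupt gate), then `PrepareIdtPointer()` and `LoadIdt()`. The selector value passed in depends on the kernel's own GDT layout.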

Posted: Mon Mar 19, 2007 7:37 am
by Kenny
Ok, some progress on this.

I've moved the IDT and its descriptor off the stack and into global variables, and this has fixed some of the problem. The machine no longer triple-faults.

Bear in mind that this is a segmented memory model at the moment.

I've been poking around in the code and gcc seems to make a design choice. If I declare an array of a certain size and specify an initial value, gcc decides whether to "inline" the population of the array ("Load byte 1 with 0x00, load byte 2 with 0x00 ...") or to store the contents elsewhere, and copy it into the stack byte-by-byte using movsb commands. This explains the seemingly random behaviour, the inline version works, the movsb version doesn't.

Has anyone else encountered anything similar? Are there gcc options to affect this behaviour so that I can force it and see if this is the source of the problem?
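One way to see the two code-generation strategies described above is to compile a small test file to assembly (for example with `gcc -m32 -S`) and compare the output for a short and a longer initialised local array. The sketch below uses hypothetical function names; the size at which gcc switches strategy, and whether it emits a string copy or a call to `memcpy`, depends on the compiler version and flags, so treat the 40-element size as an arbitrary choice.

```c
/* Small initialised local array: gcc may store the bytes with
   individual move instructions. */
unsigned int SumSmallArray(void)
{
    unsigned char lTest[3] = {0x01, 0x02, 0x03};
    unsigned int total = 0;
    unsigned int i;

    for (i = 0; i < sizeof lTest; i++)
        total += lTest[i];
    return total;
}

/* Larger initialised local array: gcc may keep the initialiser in a
   read-only section and copy it onto the stack as a block. */
unsigned int SumLargeArray(void)
{
    unsigned char lTest[40] = {
         1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
        11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
        21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
        31, 32, 33, 34, 35, 36, 37, 38, 39, 40
    };
    unsigned int total = 0;
    unsigned int i;

    for (i = 0; i < sizeof lTest; i++)
        total += lTest[i];
    return total;
}
```

Both functions behave identically in C terms; only the generated assembly differs, which is why the failure looked random from the source level.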

Allocating large arrays in C causes an Interrupt 11

Posted: Mon Mar 19, 2007 12:40 pm
by Kenny
All solved :D

The problem was (I believe, feel free to confirm / deny) as follows:

1. The IDT was being allocated on the stack as per the tutorial on which I based my work. This wasn't causing the problem, but was hindering its resolution. The IDT was being overwritten, and when the real exception occurred, the machine was triple-faulting.

2. Array handling in GCC. If you allocate a small array in C, such as:

Code:

unsigned char lTest[3] = {0x01, 0x02, 0x03};
gcc compiles this into pseudo-code as:

Code:

Allocate 3 bytes of space on the stack
Set byte 1 to 0x01
Set byte 2 to 0x02
Set byte 3 to 0x03
However ... as the array size increases, it will reach a point where gcc will decide to compile it in a different way, as follows:

Code:

<<Contents of array initialiser stored in code>>
Allocate n bytes of space on the stack.
Loop n times, copying a byte from the Contents above into the stack space.
The point at which GCC decides to change from one to the other is decided by the compiler, and that is why (in my later tests) a 36-byte array was fine, and a 37-byte array crashed.

This second method uses the MOVSB assembly command which copies from DS:SI to ES:DI, and it seems it's very important that you have correctly configured your segment registers beforehand.

I hadn't set ES to a valid segment value, and whilst everything else in the whole kernel worked fine with this value missing, allocating and populating large arrays just crashed and burned. :(

(As it happens, my data and code segments are overlaid on top of one another, so reading from an address in the data segment actually reads from the same address in the code segment.)
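A hedged sketch of the missing step: loading the data segment registers, ES included, with a valid selector before any compiler-generated string copy runs. The function names and the `KERNEL_BUILD` macro are hypothetical, and the correct selector depends on the kernel's own GDT; index 2 (selector 0x10) is used in the usage note only as an assumed example.

```c
#include <stdint.h>

/* Build a segment selector from a descriptor index, the table
   indicator (0 = GDT, 1 = LDT) and the requested privilege level. */
uint16_t MakeSelector(uint16_t index, uint16_t use_ldt, uint16_t rpl)
{
    return (uint16_t)((index << 3) | ((use_ldt & 1u) << 2) | (rpl & 3u));
}

#ifdef KERNEL_BUILD   /* hypothetical macro: only meaningful inside the kernel */
/* Load DS, ES, FS and GS with the same data selector so that string
   instructions such as MOVSB (DS:SI to ES:DI) use valid segments. */
void LoadDataSegments(uint16_t selector)
{
    __asm__ volatile (
        "movw %0, %%ds\n\t"
        "movw %0, %%es\n\t"
        "movw %0, %%fs\n\t"
        "movw %0, %%gs\n\t"
        :
        : "r"(selector)
        : "memory");
}
#endif
```

Assuming the data descriptor sits at GDT index 2, a kernel would call `LoadDataSegments(MakeSelector(2, 0, 0))` once, right after entering protected mode and before calling any C code.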

Many thanks to all who posted on this.

**edit** DS:SI and ES:DI corrected as per Candy's response below.

Posted: Mon Mar 19, 2007 12:58 pm
by salil_bhagurkar
Great debugging Kenny!

Re: Allocating large arrays in C causes an Interrupt 11

Posted: Mon Mar 19, 2007 2:16 pm
by Candy
Kenny wrote:This second method uses the MOVSB assembly command which copies from ES:SI to DS:DI, and it seems it's very important that you have correctly configured your segment registers beforehand.
That's from DS:SI to ES:DI actually.

Posted: Mon Mar 19, 2007 7:21 pm
by Kenny
:oops: Yes, you're 100% correct, Candy. I have edited the above post so that if anyone else encounters this problem, I'm not feeding them misinformation. Thanks for the catch.