iret error

suthers · Post by **suthers** » Sat Jun 07, 2008 6:10 am

kmcguire wrote:
suthers wrote: Thanks, I've managed to fix it.
@Candy: Thanks for the post, it allowed me to figure out what was wrong. I was still adding 4 to esp on top of pushing, because I add stuff to the stack in the ISR (as you probably figured out, that was the common part of the ISR that...). I was pushing a byte and for some reason adding 4 (yeah, I'm pretty stupid sometimes...), so changing it to 1 fixed it.
Thanks,

Jules

edit: Oh, and fixing this uncovered a series of other bugs...
I am still lost as to how you were pushing a byte. You pushed sixty-four bits of data onto the stack inside the ISR stub, as two zeros.

The pop eax and add esp, 4 looked correct. I cannot find an instruction in the 80386 instruction set that pushes an immediate eight-bit value while only moving the stack pointer by one. It decrements the stack pointer by four even though you only push one byte.

Didn't know that. Even so, it's the only config that works, though it might explain the hundreds of errors that occur afterwards...
I'll try and debug it and post if I find anything interesting...
Thanks,

Jules
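
As a rough illustration of kmcguire's point, here is a minimal C model of a 32-bit stack. The `cpu_model` struct and the `cpu_model_init`, `push_imm8` and `pop32` helpers are made up for this sketch and are not part of anyone's kernel; they only mimic how `push byte N` in 32-bit code sign-extends the byte and moves ESP down by four.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Hypothetical toy model: a small stack plus an ESP that is an offset into it. */
struct cpu_model {
	uint8_t  stack[STACK_BYTES];
	uint32_t esp;
};

static void cpu_model_init(struct cpu_model *cpu)
{
	memset(cpu->stack, 0, sizeof cpu->stack);
	cpu->esp = STACK_BYTES;	/* empty stack: ESP sits just past the top */
}

/* Models "push byte imm" in 32-bit code: the byte is sign-extended to
   32 bits and ESP drops by 4, not by 1. No overflow checks (toy model). */
static void push_imm8(struct cpu_model *cpu, int8_t imm)
{
	uint32_t value = (uint32_t)(int32_t)imm;
	cpu->esp -= 4;
	memcpy(&cpu->stack[cpu->esp], &value, sizeof value);
}

/* Models "pop r32": reads 4 bytes and moves ESP up by 4. */
static uint32_t pop32(struct cpu_model *cpu)
{
	uint32_t value;
	memcpy(&value, &cpu->stack[cpu->esp], sizeof value);
	cpu->esp += 4;
	return value;
}
```

In this model, two `push_imm8` calls lower ESP by 8, so undoing them takes one `pop32` plus the equivalent of `add esp, 4`, which is the combination kmcguire described as looking correct.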

Dex · Post by **Dex** » Sat Jun 07, 2008 9:20 am

I think kmcguire's second comment is right. I think your PROBLEM could be that some exceptions push an error code onto the stack and others do not, and you seem to be pushing a dummy error code onto the stack for ALL of them.

Also, would you not be best off using a structure, registers_t, which is a representation of all the registers you pushed?

NOTE: This is from a non-C coder's point of view.
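
To make Dex's point concrete, here is a small sketch of a lookup that says whether the CPU itself pushes an error code for a given exception vector. The function name `cpu_pushes_error_code` is invented for this example, and the list only covers the classic vectors 0 to 19 as documented in the Intel manuals; newer processors define further exceptions that are not handled here.

```c
#include <stdbool.h>

/* Returns true if the CPU pushes an error code for this exception vector.
   Covers the classic x86 vectors only; anything else is reported as false. */
static bool cpu_pushes_error_code(unsigned vector)
{
	switch (vector) {
	case 8:		/* #DF double fault */
	case 10:	/* #TS invalid TSS */
	case 11:	/* #NP segment not present */
	case 12:	/* #SS stack-segment fault */
	case 13:	/* #GP general protection fault */
	case 14:	/* #PF page fault */
	case 17:	/* #AC alignment check */
		return true;
	default:
		return false;
	}
}
```

A stub for a vector where this returns false would push a dummy 0 so that every handler sees the same stack layout; a stub for a vector where it returns true would skip the dummy push.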

suthers · Post by **suthers** » Sat Jun 07, 2008 11:17 am

I have only put dummy error codes where they are necessary...
I don't really see the point in passing all my registers to the function...
Still can't find any thing that would cause the error, though I haven't been searching for long (Just got back from school).
Thanks,

Jules

suthers · Post by **suthers** » Mon Jun 09, 2008 7:17 am

Ok, so I'm still trying to trace this error (I don't have much time because of exams, so it's taking ages...). Firstly, it turns out that adding 4 to esp works, but it still causes loads of errors, one after the other, directly after an interrupt. I think it's because it returns to the wrong address after the iretd.
So I wanted to ask, what's the best way to trace where the interrupt returns to?
Thanks in advance,

Jules

suthers · Post by **suthers** » Mon Jun 09, 2008 6:19 pm

It would be useful to know in what order an interrupt pushes SS, EIP, ESP, CS and the return address onto the stack...
It would allow me to see what the return address is and would help me to debug...
Does anybody know in what order this is done?
Thanks in advance,

Jules

inx · Post by **inx** » Mon Jun 09, 2008 11:13 pm

I could tell you the order, but I think it would be better if I just told you it's in the Intel manuals.

suthers · Post by **suthers** » Tue Jun 10, 2008 3:53 am

Fair enough, I'll find it (I should have done that in the first place anyway, sorry for breaking forum rules...).
Thanks,

Jules

AJ · Post by AJ » Tue Jun 10, 2008 4:26 am

Hi,

You could look at the Intel Manuals, but for something like this, I always find it quicker to use Sandpile (look up x86->structures->stack frame).

Cheers,
Adam
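
For reference, the layout in question can be sketched as a C struct. This is a sketch for 32-bit protected mode only; the names `interrupt_frame` and `saved_eip` are made up for this example. On entry to a handler the CPU has pushed EFLAGS, CS and EIP (preceded by SS and ESP if the privilege level changed), and for some exceptions an error code on top of that, so EIP ends up at the lowest address unless an error code is present.

```c
#include <stddef.h>
#include <stdint.h>

/* What the CPU leaves on the stack for a 32-bit interrupt, lowest address
   first, once any error code has been skipped. */
struct interrupt_frame {
	uint32_t eip;		/* return address used by iret */
	uint32_t cs;
	uint32_t eflags;
	uint32_t esp;		/* only present if the privilege level changed */
	uint32_t ss;		/* only present if the privilege level changed */
};

/* Reads the saved EIP given the value ESP had on entry to the handler,
   before the stub pushed anything of its own. */
static uint32_t saved_eip(const uint32_t *esp_at_entry, int has_error_code)
{
	return esp_at_entry[has_error_code ? 1 : 0];
}
```

Printing the value returned by `saved_eip` from inside the handler is one simple way to see where the iret is going to return to.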

suthers · Post by **suthers** » Tue Jun 10, 2008 6:10 am

@AJ: Thanks, sandpile is incredibly useful, I've bookmarked it.
I found that the IRET was returning to the exact same instruction that caused the div 0 exception, hence explaining the div 0 error loop I was getting with an eventual GPF....
Am I supposed to increment the saved EIP on the stack before the iret, or what should I do?
Thanks in advance,

Jules

suthers · Post by **suthers** » Tue Jun 10, 2008 6:56 am

Program flow continued normally once I added some code to increment the return address, but how would you deal with this otherwise in the kernel thread?
Thanks in advance,

Jules

AJ · Post by AJ » Tue Jun 10, 2008 7:01 am

Hi,

Personally, I would terminate a program on a Div 0 Exception. Why? Because the program has obviously got some data it's manipulating where something has gone wrong and you can't possibly know what the program was supposed to do. If you do anything other than terminating the program, you leave the system in an indeterminate state which is a Bad Thing(tm).

Cheers,
Adam

[edit]Going back to sandpile, the exception list there gives you some idea whether the exception is a fault or a trap and so whether EIP points to the erroneous instruction or the following instruction.[/edit]
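
The fault/trap distinction can be sketched as a small table in C. The enum and function names are invented for this example, and the classification covers only the classic vectors as listed in the Intel manuals; vectors whose behaviour depends on the cause (such as #DB) or that are reserved are deliberately lumped into `EXC_OTHER` rather than guessed at.

```c
#include <stdbool.h>

enum exc_class { EXC_FAULT, EXC_TRAP, EXC_ABORT, EXC_OTHER };

/* Rough classification of the classic x86 exception vectors. */
static enum exc_class classify_exception(unsigned vector)
{
	switch (vector) {
	case 0:		/* #DE divide error */
	case 5:		/* #BR bound range exceeded */
	case 6:		/* #UD invalid opcode */
	case 7:		/* #NM device not available */
	case 10:	/* #TS invalid TSS */
	case 11:	/* #NP segment not present */
	case 12:	/* #SS stack-segment fault */
	case 13:	/* #GP general protection fault */
	case 14:	/* #PF page fault */
	case 16:	/* #MF x87 floating-point error */
	case 17:	/* #AC alignment check */
	case 19:	/* #XM SIMD floating-point exception */
		return EXC_FAULT;
	case 3:		/* #BP breakpoint */
	case 4:		/* #OF overflow */
		return EXC_TRAP;
	case 8:		/* #DF double fault */
	case 18:	/* #MC machine check */
		return EXC_ABORT;
	default:	/* #DB, NMI, legacy and reserved vectors */
		return EXC_OTHER;
	}
}

/* For faults the saved EIP points at the instruction that caused the
   exception, so a plain iret re-executes it. */
static bool saved_eip_is_faulting_instruction(unsigned vector)
{
	return classify_exception(vector) == EXC_FAULT;
}
```

Divide error is vector 0 and a fault, which is consistent with the loop described above: the iret goes straight back to the same division.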

suthers · Post by **suthers** » Tue Jun 10, 2008 7:37 am

AJ wrote:Hi,

Personally, I would terminate a program on a Div 0 Exception. Why? Because the program has obviously got some data it's manipulating where something has gone wrong and you can't possibly know what the program was supposed to do. If you do anything other than terminating the program, you leave the system in an indeterminate state which is a Bad Thing(tm).

Cheers,
Adam

[edit]Going back to sandpile, the exception list there gives you some idea whether the exception is a fault or a trap and so whether EIP points to the erroneous instruction or the following instruction.[/edit]

Thanks, that can be really useful (I assumed that none of them incremented the EIP and that they all pointed to the instruction that caused the error....)
Thanks,

Jules

P.S. Yay, I've finally fixed it

Dex · Post by **Dex** » Tue Jun 10, 2008 10:56 am

Could you post the working code, so it may help others? I am also interested in seeing the working code, to see the fault.

suthers · Post by **suthers** » Tue Jun 10, 2008 11:39 am

No problem, I should have done so in the first place:

Code:

_isr0:
	cli
	push byte 0
	push byte 0
	jmp _isr

_isr1:
	cli
	push byte 0
	push byte 1
	jmp _isr

_isr2:
	cli
	push byte 0
	push byte 2
	jmp _isr
...

_isr31:
	cli
	push byte 0
	push byte 31
	jmp _isr

These are the stubs that I use to handle the interrupts, and they all jump to:

Code:

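_isr:
	mov [store1], eax	; save eax (globals make this stub non-reentrant)
	pop eax	; interrupt number pushed by the stub
	mov [store2], eax
	mov eax, [store1]	; restore eax
	pusha
	push es
	push ds
	push fs
	push gs
	mov ax, 0x10	; load the kernel data segment selector
	mov ds, ax
	mov es, ax
	mov fs, ax
	mov gs, ax
	mov eax, _int_handler
	push dword [store2]	; argument: interrupt number
	call eax
	pop eax	; discard the argument
	pop gs
	pop fs
	pop ds
	pop es
	popa
	add esp, 4	; skip the error code (or the dummy 0)
	inc dword [esp]	; bump the saved EIP by one without clobbering eax (a hack, see below)
	sti	; redundant: iret restores EFLAGS
	iret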

...
SECTION .data

...

store1 dd 0
store2 dd 0

int_handler:

Code:

/* Called from the common ISR stub; err_code receives the interrupt
   number that the stub pushed. */
void int_handler(int err_code)
{
	k_printf("\nKERNEL PANIC:", 0x04);
	k_printf(exception_messages[err_code], 0x04);
}

(exception_messages is a char* array)

In the code I used:

Code:

volatile int test = 10;

....

test /= 0;

to cause a div 0 exception.

The first error was caused by the amount I added to esp at the end. Because I used "push byte", I assumed it only subtracted 1 from the stack pointer, whereas I now know it always subtracts 4, so adding 4 to esp at the end to skip over the error code (or the dummy 0 pushed to keep the stack frame balanced) repairs this error...
The second was because, when the CPU calls the interrupt handler, the value of EIP that was pushed onto the stack was the address of the instruction that caused the error, so when I did an iret, it returned to that instruction, causing an exception loop...
So to work around this, I added code to increment the value of EIP that is on the stack by one (though of course this won't necessarily work, as instructions have different lengths).
So for now I've decided to halt the CPU when an exception occurs in the kernel... I'll change that when I get to implementing multithreading and user mode threads....
So the only reason that increasing the value of EIP used by iret by 1 worked was because the compiler converted:

Code:

test /= 0;

into (something like):

Code:

mov ax, 10

....

div [reg]

So, the only reason this works is because 'div [reg]' happens to be 2 bytes long, and the second byte was probably decoded as a 1-byte opcode that didn't do anything much...
(So actually it's a better idea to add 2 to the EIP used by iret....)
But doing this is a bit pointless, because if you skip an instruction, this might interfere with the working of your kernel, so it's better to hlt the CPU (which is what I now do after displaying the error message).
Thanks for all the help.

Jules
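
To show why a fixed increment is fragile, here is a rough sketch that computes the length of a 32-bit DIV or IDIV instruction from its ModRM byte. The function name `div_length` is made up for this example, and it makes several assumptions: 32-bit code with 32-bit addressing, no prefixes, and only the `F7 /6` (DIV) and `F7 /7` (IDIV) encodings. It is nowhere near a full instruction decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of "div r/m32" or "idiv r/m32" (opcode F7 /6 or /7),
   assuming 32-bit addressing and no prefixes. Returns 0 for anything else. */
static size_t div_length(const uint8_t *code)
{
	if (code[0] != 0xF7)
		return 0;

	uint8_t modrm = code[1];
	uint8_t mod = modrm >> 6;
	uint8_t reg = (modrm >> 3) & 7;
	uint8_t rm  = modrm & 7;

	if (reg != 6 && reg != 7)
		return 0;	/* some other F7-group instruction */

	size_t len = 2;	/* opcode + ModRM */

	if (mod != 3 && rm == 4) {
		len += 1;	/* SIB byte follows */
		if (mod == 0 && (code[2] & 7) == 5)
			len += 4;	/* SIB with no base: disp32 */
	}

	if (mod == 0 && rm == 5)
		len += 4;	/* absolute address: disp32 */
	else if (mod == 1)
		len += 1;	/* disp8 */
	else if (mod == 2)
		len += 4;	/* disp32 */

	return len;
}
```

Under these assumptions `div ecx` (F7 F1) is 2 bytes, but `idiv dword [ebp-4]` (F7 7D FC) is 3 and a division by a value at an absolute address (F7 35 plus a 32-bit address) is 6, so neither adding 1 nor adding 2 to the saved EIP is safe in general.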

Dex · Post by **Dex** » Tue Jun 10, 2008 12:32 pm

Thanks for the update