An exception damages data
Hi.
I'm working on multitasking.
I have a few simple tasks (task01, task02, task03).
I add these tasks to the list of threads, which are executed in turn.
Since a plain "return" cannot properly end a task, I have the function scheduler_thread_exit_current(); it works as intended and terminates the task successfully.
When this function is not called, "return" passes control to the wrong place and a #UD or #GP exception occurs.
I want the process to be forcibly terminated if either of these exceptions occurs.
But I found that my process table gets corrupted after an exception is raised.
Let me describe how the exception handling works.
When an exception occurs, an assembly function (general_protection_fault) is called, which calls the handler in C and passes parameters to it.
Control is then passed to the main handler (general_protection_fault_exception).
This is where the error is detected.
The logs show that when the first thread (id = 1) ends by voluntary termination, the process table looks like this.
Then control is passed to thread 3 (id = 3). There is no voluntary termination, so #GP occurs, and the handler just prints the process table: the address of element [2] (thread 3) is incorrect, and its id is displayed incorrectly.
I don't know what could have damaged it.
I also tried inspecting memory using QEMU. Here is what it showed me (tested on task02 (thread 2), which is why #UD appears here):
You can see that by the time task02() completes, the data at the address is correct.
But as soon as control is passed to the handler in C (I couldn't debug the assembly stub; my debugger can't do it there), the data in this cell is damaged.
I believe there are errors in the assembly handler, but I don't see anything problematic there.
What might be the problem?
I would appreciate your help!
- max
- Member
- Posts: 616
- Joined: Mon Mar 05, 2012 11:23 am
- Libera.chat IRC: maxdev
- Location: Germany
Re: An exception damages data
Hey,
it's really hard to tell what exactly causes your memory to get faulty. We don't know about your memory layout etc after all.
I guess your process is running at user privilege level? Then you should make sure that the process itself does not overwrite kernel memory or your process table.
If yes, then it would be likely that there is something off with the way you handle exceptions. Did you make sure that the stack is set correctly when the exception occurs? Right when the handler is entered, is your structure already corrupt?
Greets
Re: An exception damages data
Memory suddenly getting faulty is usually caused by bad interrupt code. So, let's have a look, shall we?
Code: Select all
invalid_opcode:
    cli
    ;save all 32bit registers
    pushad
    push dword [esp + 32] ;push eip
    push dword [esp + 64] ;push cs
    call invalid_opcode_handler
    pop dword [esp + 64] ;pop cs
    pop dword [esp + 32] ;pop eip
    ;return all 32bit registers
    popad
    ;delete eip and cs from stack
    add esp, 8
    sti
    iretd
The "cli" is unnecessary if you'd just register the invalid opcode exception as an interrupt gate (type 14, IIRC). Similarly, the "sti" just before "iretd" is useless, since as part of its operation, iretd pops flags.
Invalid Opcode has no error code, so the EIP and CS are the top of stack on entry. "pushad" adds 32 bytes to the stack, meaning [esp+32] really is EIP. That push decreases ESP by another 4 bytes, and CS is one dword (4 bytes) higher than EIP, which means that CS is then at [ESP+40], not [ESP+64]. So your "push dword [esp + 64]" is pushing invalid data. At that point, anything more than "ESP + 44" is invalid without first looking at CS, and anything more than "ESP + 52" is invalid in all cases. After that push, the limits increase by another 4 bytes.
Your "pop dword [esp + 64]" is therefore writing somewhere into the parent stack frame, and the "pop dword [esp + 32]" is overwriting some register you saved with "pushad". Both of these are corrupting state. Then you popad, writing the changed value back to the registers, and then "add esp, 8". Your stack pointer was pointing at the iret frame before that; now it's pointing to the EFLAGS in the iret frame. I have no idea how the "iretd" afterwards didn't crash.
So, putting it all together:
Code: Select all
invalid_opcode:
    ;save all 32bit registers
    pushad
    push dword [esp + 32] ;push eip
    push dword [esp + 40] ;push cs
    call invalid_opcode_handler
    ;delete eip and cs from stack
    add esp, 8
    ;return all 32bit registers
    popad
    iretd
Looking at the other exception handlers, you can apply similar changes there. Actually, general_protection_fault and page_fault almost work out, except for where you are overwriting part of the "pusha" image with the "pop dword [esp + 32]", which ends up corrupting a register, with unpredictable effects. Just use "add esp, 4" to discard the value from stack.
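The offsets discussed in this post can be double-checked by modelling the stack as a C struct. This is only a sketch: the struct and helper names are made up for illustration, and it assumes an exception without an error code and without a privilege change (as with #UD raised in ring 0), so the CPU pushes only EIP, CS and EFLAGS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of what ESP points to right after "pushad".
 * pushad pushes EAX first and EDI last, so EDI ends up at the lowest address. */
struct pushad_frame {
    uint32_t edi, esi, ebp, esp_dummy, ebx, edx, ecx, eax; /* 32 bytes */
    uint32_t eip;     /* pushed by the CPU on exception entry */
    uint32_t cs;
    uint32_t eflags;
};

/* Offset of a frame member relative to ESP after `extra_pushes`
 * additional dword pushes have moved ESP further down. */
static size_t offset_after_pushes(size_t member_offset, size_t extra_pushes)
{
    assert(member_offset < sizeof(struct pushad_frame));
    return member_offset + 4 * extra_pushes;
}
```

With this model, `offsetof(struct pushad_frame, eip)` is 32, and `offset_after_pushes(offsetof(struct pushad_frame, cs), 1)` gives the 40 used for CS once EIP has been pushed.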
Carpe diem!
Re: An exception damages data
If I may add one remark: If I were you, I would try to separate the PIC logic from the interrupt logic. That will make it easier to support APIC, once the need arises.
For example, my IRQ code looks like this: All IRQ entry points push 127 minus interrupt number (this gives a value between -128 and +127 for all possible values of the interrupt number, thus enabling the use of the short push encoding) and then jump to a common handler. This means all IRQ entry points fit inside of 8 bytes (since the heavy lifting is done in the common handler), so with a few "align" directives, I can guarantee that they are all exactly 8 bytes apart from one another, thereby allowing me to register them in a loop from C.
The common handler will then subtract another 127 from the top of stack (yielding the negative interrupt number), push all registers, and call a C function. The C function will call a function pointer from a table, where all function pointers are initialized to point to a dummy function. The IRQ code will register its handlers (there are only 2) with the relevant interrupts. The IRQ code itself will also only call an interrupt handler from a table.
This way, the PIC stuff is entirely isolated from the interrupt stuff. The PIC code will reserve 16 interrupts en bloc, but if it wasn't called, then only the APIC would reserve anything.
With APIC, you have no way to know at design time how many IRQs you may need. One IOAPIC may support 24 or 32 IRQ lines, but you may have many IOAPICs in your system. Usually you don't, but that means we need the interrupt number as data, so pushing it before transferring to a common handler is the only way to go. And then there's MSI, which means, essentially, that every PCI card can register another batch of interrupts. (With MSI it is only one, but then there's also MSI-X or what it was called). So yeah, separate the PIC code from the interrupt code while things are still simple.
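The "127 minus interrupt number" trick can be checked with a few lines of C. This is just an illustration of the arithmetic; the function names are invented here and are not taken from the poster's kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Value an entry stub would push for a given vector: 127 - vector.
 * For vectors 0..255 this lies in 127..-128, so it fits a signed byte
 * and the short "push imm8" encoding can be used. */
static int8_t stub_push_value(unsigned vector)
{
    assert(vector < 256);
    return (int8_t)(127 - (int)vector);
}

/* What the common handler does: subtract 127 again, which yields the
 * negative vector number; negating it recovers the vector. */
static unsigned vector_from_pushed(int8_t pushed)
{
    int negated_vector = (int)pushed - 127;
    return (unsigned)(-negated_vector);
}
```

Vector 0 is pushed as 127 and vector 255 as -128, and the round trip gives back the original vector in every case.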
Carpe diem!
- bellezzasolo
- Member
- Posts: 110
- Joined: Sun Feb 20, 2011 2:01 pm
Re: An exception damages data
A few observations (to OP):
How are you dealing with the error code? You need to pop it before iretd. But, on exception handlers that don't pass an error code, you need to avoid that.
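As a rough sketch of that distinction, the lookup below mirrors the vectors that are marked as pushing an error code in the INTERRUPT_HANDLER list later in this post (8, 10 to 14, 17 and 30). The function name is made up, and newer CPUs define further exceptions with error codes (for example #CP, vector 21) that are not covered here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: does the CPU push an error code for this exception?
 * Covers only the vectors flagged in this post's handler table. */
static bool exception_pushes_error_code(unsigned vector)
{
    assert(vector < 32);
    switch (vector) {
    case 8:   /* #DF */
    case 10:  /* #TS */
    case 11:  /* #NP */
    case 12:  /* #SS */
    case 13:  /* #GP */
    case 14:  /* #PF */
    case 17:  /* #AC */
    case 30:  /* #SX */
        return true;
    default:
        return false;
    }
}
```

For #UD (vector 6) this returns false, so the stub has nothing to discard before iretd; for #GP (13) and #PF (14) it returns true.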
Maintaining 256 different interrupt entry routines is difficult.
PUSHAD is OK, but it's a bit wasteful of stack space, and you don't have that option on x64.
Here's my entire 64 bit interrupt code:
Code: Select all
BITS 64
section .text
save_interrupt_registers:
pop rax
push rcx
push rdx
push r8
push r9
push r10
push r11
jmp rax
restore_interrupt_registers:
pop rax
pop r11
pop r10
pop r9
pop r8
pop rdx
pop rcx
jmp rax
swap_gs_ifpriv:
mov rax, QWORD[rbp+8+0x10]
and rax, 0x3 ;CPL
cmp rax, 0
je .next
swapgs
.next:
ret
%macro SAVE_VOLATILE_REGISTERS 0
push rax
call save_interrupt_registers
%endmacro
%macro RESTORE_VOLATILE_REGISTERS 0
call restore_interrupt_registers
pop rax
%endmacro
extern x64_interrupt_dispatcher
extern x64_save_fpu
extern x64_restore_fpu
extern xsavearea_size
extern memset
extern kprintf
;Stack layout:
;Old stack
;____________________
;PADDING
;____________________
;XSAVE
;____________________
;Old RSP
;____________________
save_fpu_interrupt:
pop r9 ;Return address
mov rax, xsavearea_size
mov rdx, [rax] ;Size of stack area
mov r8, rdx ;Length parameter for memset
mov rcx, rsp
sub rcx, rdx
and cl, 0xC0 ;Align stack
mov rdx, rsp ;Old stack pointer
mov rsp, rcx ;Load new stack pointer
push rdx ;Save old RSP on new stack
push r9 ;R9 is volatile, save our return address
;Set to 0
sub rsp, 32
push r8
mov rdx, 0
push rdx
push rcx
call memset
add rsp, 32+24
;Now save the FPU state
mov rax, [qword x64_save_fpu]
call rax
pop r9
jmp r9
restore_fpu_interrupt:
pop r8 ;Return address
pop r9 ;Old stack pointer
mov rcx, rsp ;XSAVE area
push r8 ;R8 is volatile
push r9 ;R9 is volatile
mov rax, [qword x64_restore_fpu]
call rax ;Restore the state!
pop r9
pop r8 ;Restore R8
mov rsp, r9 ;Load old stack
jmp r8
;First parameter: 1 if the CPU pushes an error code for this vector, 0 otherwise
;Second parameter: interrupt vector number
%macro INTERRUPT_HANDLER 2
global x64_interrupt_handler_%2
x64_interrupt_handler_%2:
%if %1 == 0
push 0 ;Dummy error code
%endif
;Stack frame
push rbp
mov rbp, rsp
SAVE_VOLATILE_REGISTERS
;Per CPU information
call swap_gs_ifpriv
call save_fpu_interrupt
;Now we pass the stack interrupt stack and vector
mov rcx, %2
mov rdx, rbp
add rdx, 8
sub rsp, 32
call x64_interrupt_dispatcher
add rsp, 32
call restore_fpu_interrupt
call swap_gs_ifpriv
RESTORE_VOLATILE_REGISTERS
pop rbp
add rsp, 8 ;Get rid of error code
iretq
%endmacro
%macro INTERRUPT_HANDLER_BLOCK 3
%assign i %2
%rep %3-%2
INTERRUPT_HANDLER %1, i
%assign i i+1
%endrep
%endmacro
INTERRUPT_HANDLER 0, 0
INTERRUPT_HANDLER 0, 1
INTERRUPT_HANDLER 0, 2
INTERRUPT_HANDLER 0, 3
INTERRUPT_HANDLER 0, 4
INTERRUPT_HANDLER 0, 5
INTERRUPT_HANDLER 0, 6
INTERRUPT_HANDLER 0, 7
INTERRUPT_HANDLER 1, 8
INTERRUPT_HANDLER 0, 9
INTERRUPT_HANDLER 1, 10
INTERRUPT_HANDLER 1, 11
INTERRUPT_HANDLER 1, 12
INTERRUPT_HANDLER 1, 13
INTERRUPT_HANDLER 1, 14
INTERRUPT_HANDLER 0, 15
INTERRUPT_HANDLER 0, 16
INTERRUPT_HANDLER 1, 17
INTERRUPT_HANDLER 0, 18
INTERRUPT_HANDLER 0, 19
INTERRUPT_HANDLER 0, 20
INTERRUPT_HANDLER_BLOCK 0, 21, 30
INTERRUPT_HANDLER 1, 30
INTERRUPT_HANDLER 0, 31
INTERRUPT_HANDLER_BLOCK 0, 32, 256
%macro dq_concat 2
dq %1%2
%endmacro
section .data
global default_irq_handlers
default_irq_handlers:
%assign i 0
%rep 256
dq_concat x64_interrupt_handler_,i
%assign i i+1
%endrep
stack_str: dw __utf16__(`Interrupt frame:\n Error code %x\n Return RIP: %x\n Return CS: %x\n Return RFLAGS: %x\n Return RSP: %x\n Return SS: %x\n`), 0
Whoever said you can't do OS development on Windows?
https://github.com/ChaiSoft/ChaiOS
Re: An exception damages data
I tried to use the code you suggested, but the memory is still corrupted... The suggestions about the parameters are correct: I noticed that I was getting incorrect cs and eip values, and now they are correct.
Now it is important for me to find out why my table gets corrupted after an interrupt is triggered.
Do you have any suggestions? I don't even know what might be causing this behavior.
- bellezzasolo
- Member
- Posts: 110
- Joined: Sun Feb 20, 2011 2:01 pm
Re: An exception damages data
mrjbom wrote: Do you have any suggestions? I don't even know what might be causing this behavior.
It's possible, and pretty simple, to set a memory watchpoint on the address in question.
Write DR0 with the relevant address.
Write (3<<18) | (1<<16) | 1 to DR7 for a 4 byte memory write watchpoint.
And disable it (write 0 to DR7) before the writes you mean to do!
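The DR7 value above can be built from its fields. The following is a minimal sketch; the enum and function names are invented for illustration, and only the bit positions (Ln enable bits at 0, 2, 4, 6; R/Wn at bits 16+4n; LENn at bits 18+4n) come from the x86 debug register layout.

```c
#include <assert.h>
#include <stdint.h>

enum { DR7_RW_EXEC = 0, DR7_RW_WRITE = 1, DR7_RW_READWRITE = 3 };
enum { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_4 = 3 };

/* Compose the DR7 bits that enable one hardware breakpoint slot (0..3)
 * locally, with the given access type and length encoding. */
static uint32_t dr7_enable_bits(unsigned slot, unsigned rw, unsigned len)
{
    assert(slot < 4 && rw < 4 && len < 4);
    uint32_t local_enable = 1u << (slot * 2);               /* L0..L3 */
    uint32_t rw_field     = (uint32_t)rw  << (16 + slot * 4);
    uint32_t len_field    = (uint32_t)len << (18 + slot * 4);
    return local_enable | rw_field | len_field;
}
```

`dr7_enable_bits(0, DR7_RW_WRITE, DR7_LEN_4)` yields the same value as `(3<<18) | (1<<16) | 1`, that is 0xD0001. The watched address in DR0 has to be aligned to the chosen length.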
Whoever said you can't do OS development on Windows?
https://github.com/ChaiSoft/ChaiOS
Re: An exception damages data
Something interesting: in the code of the interrupt handler, I just added a process table printout.
Code: Select all
invalid_opcode:
;save all 32bit registers
call scheduler_thread_show_list
pushad
push dword [esp + 32] ;push eip
push dword [esp + 40] ;push cs
call invalid_opcode_handler
;delete eip and cs from stack
add esp, 8
;return all 32bit registers
popad
iretd
log:
Code: Select all
I'm thread #1 <--- task01
THREAD LIST: <--- start of invalid_opcode: header
thread_list[0] addr = 0x114e000, id = 0
thread_list[1] addr = 0x282, id = 4283691008
thread_list[2] addr = 0x1154000, id = 2
thread_list[3] addr = 0x1158000, id = 3
/--------------------------------------------------\ <--- main C handler invalid_opcode_exception()
invalid_opcode_exception!
cs = 8(0x8), eip = 18153694(0x11500de)
problematic_thread addr = 0x1150000, id = 1
THREAD LIST:
thread_list[0] addr = 0x114e000, id = 0
thread_list[1] addr = 0x282, id = 4283691008
thread_list[2] addr = 0x1154000, id = 2
thread_list[3] addr = 0x1158000, id = 3
\--------------------------------------------------/
I have no idea what can damage the data table.
Re: An exception damages data
Well, this change is invalid as well, since scheduler_thread_show_list is a C function, so it is allowed to clobber EAX, ECX, and EDX. Another thing I forgot: you are in 32-bit mode, so segmentation still matters. Therefore you need to save DS and ES, set them both to 0x10, and restore them afterwards. Also, when the invalid opcode hits, DF might be set, so it has to be cleared before calling a C function. So:
Code: Select all
invalid_opcode:
cld ;DF is already saved in interrupt frame
;save all 32bit registers
pushad
; save and set segments
push ds
push es
mov ax,0x10
mov ds,ax
mov es,ax
push dword [esp + 40] ;push eip
push dword [esp + 48] ;push cs
call invalid_opcode_handler
;delete eip and cs from stack
add esp, 8
; restore segments
pop es
pop ds
;return all 32bit registers
popad
iretd
Carpe diem!
Re: An exception damages data
I realized this is really harder than expected...
Unfortunately, this does not solve it either; the table is still damaged...
I still don't have even a rough guess as to why this is happening...
It seems that this problem is extremely difficult to solve. Perhaps the return, by passing control to a random area, damages the data.
I started thinking about whether I can write the return address to the stack when creating a thread, so that return works as it should.
Is it possible to do this?
Re: An exception damages data
mrjbom wrote: I still don't have even a rough guess as to why this is happening...
Looks like it is debug time. Add a watch point to the table entry once it is initialized. Add a handler for the debug exception that prints where you are. And then wait for the corruption to occur. To add a watchpoint, you write the base address into a debug register, then set the corresponding R/W field in DR7 to 01 and the corresponding LEN field to an encoding of the length. Although you may need three watch points for your structure (apparently, length 4 is the maximum).
You have a list with external storage, so it is possible to corrupt the list by bending the "data" pointer. It is also possible (though unlikely) that the previous node's "next" pointer was corrupted, but in your case, it would have to have been exceptionally lucky that the pointer happened to land somewhere where the "next" pointer went back to the correct node. So either something is overwriting the data pointer or something is corrupting where it is pointing to.
I'm a bit disappointed that x86 doesn't appear to have arbitrary length watch points.
Carpe diem!
Re: An exception damages data
I tried debugging the code.
For the test I have this code:
Code: Select all
uint32_t a = 123;
uint32_t dr7 = 0;
uint32_t dr0 = 0;
//write addr of 'a' to dr0
__asm__ volatile ("mov %%dr0, %0" :: "r" (&a));
__asm__ volatile ("mov %0, %%dr0" : "=r" (dr0));
//read dr7
__asm__ volatile ("mov %0, %%dr7" : "=r" (dr7));
//set 16-17 01b - write to dr7
set_n_bit(dr7, 16, 0);
set_n_bit(dr7, 17, 1);
//set 18-19 lenght 4 bytes - 11b
set_n_bit(dr7, 18, 1);
set_n_bit(dr7, 19, 1);
//write new dr7 value
__asm__ volatile ("mov %%dr7, %0" :: "r" (dr7));
//try...
a = 321;
I expect a #DB exception to be thrown when writing a new value, but this does not happen.
But this is not so important.
I have shown that when trying to print the thread_list values at the very beginning of the Assembly handler they are corrupted.
Also, new problems started to occur during code modification, and now new elements seem to be added to the table.
The only logical guess is that ret passes control to some area of memory where commands are located that miraculously damage the data.
As far as I know, ret takes the return address from the stack, I decided to try filling the stack with some values, but it didn't bring any results.
I tried to investigate the operation of call/ret in emu8086, I realized that if I can properly configure the thread stack, I can force "return" to return the control to the correct place, just like I set the entry point. But I couldn't figure out where in the stack I should write my return address.
Re: An exception damages data
mrjbom wrote: I expect a #DB exception to be thrown when writing a new value, but this does not happen.
Well, maybe because that is a dead store. Or maybe the compiler allocates "a" into some register and will only spill it later. For things like that, I have set and read functions in my io.S that work exactly like the in and out functions, but for memory space instead of I/O space:
Code: Select all
void setl(uint32_t*, uint32_t);
Code: Select all
.global setl
.type setl, @function
setl:
    movl %esi, (%rdi)
    retq
.size setl, .-setl
Then you can force the write to occur with "setl(&a, 123)". And that really should trap with #DB in your case. Of course, this is for 64-bit mode, you would need something like:
Code: Select all
.global setl
.type setl, @function
setl:
    movl 4(%esp), %eax
    movl 8(%esp), %ecx
    movl %ecx, (%eax)
    retl
.size setl,.-setl
mrjbom wrote: The only logical guess is that ret passes control to some area of memory where commands are located that miraculously damage the data.
In all cases of "weird memory corruption" in the past that I've seen, the error has been far away from the place where it was noticed. That is why I suggest you get that #DB working, so you can have the processor test for you. That is going to be easier than fishing in the fog and hoping to get lucky.
mrjbom wrote: As far as I know, ret takes the return address from the stack, I decided to try filling the stack with some values, but it didn't bring any results.
That is correct, that is the operation of ret. It is, essentially, "pop eip". And call is "push next; jmp target; next:". Only the "next" is calculated at runtime from EIP. And yes, stack buffer overflow can result in diverting control flow such that you jump to arbitrary places. But you would have to be really unlucky to find the one value to write to the stack such that ret corrupts your process list. Usually it would just crash.
mrjbom wrote: But I couldn't figure out where in the stack I should write my return address.
Since ret is just "pop EIP", you must write your return address to that memory location where ESP ends up pointing when the "ret" is run.
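For a freshly created thread, that location is normally the top word of its stack. Here is a minimal sketch under the assumption that the scheduler enters the thread function by jumping to it (or via iret) with ESP set to the value this helper returns. The helper name and the stand-in exit function are hypothetical; in the kernel discussed here the exit routine would be something like scheduler_thread_exit_current.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real exit routine, only here so the sketch is self-contained. */
static void demo_thread_exit(void) { }

/* Prepare a fresh stack so that a plain `return` from the thread function
 * "returns" into exit_func. stack_base is the lowest address of the stack
 * area; the return value is the initial stack pointer for the new thread. */
static uintptr_t *thread_stack_prepare(uintptr_t *stack_base, size_t words,
                                       void (*exit_func)(void))
{
    assert(stack_base != NULL && words >= 1);
    uintptr_t *sp = stack_base + words;  /* one past the top; stacks grow down */
    *--sp = (uintptr_t)exit_func;        /* the word `ret` will pop into EIP */
    return sp;
}
```

When the thread function executes its final ret, ESP points at that word again, so control lands in the exit routine. That routine must never return itself, since there is nothing valid above it on the stack.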
Carpe diem!
Re: An exception damages data
I tried using the code you suggested, but the exception still doesn't work.
In addition, I found that DR0 is reset to zero.
Code: Select all
//write addr of 'a' to dr0
__asm__ volatile ("mov %%dr0, %0" :: "r" (&a));
//read dr0
__asm__ volatile ("mov %0, %%dr0" : "=r" (dr0));
serial_printf("dr0 = 0x%x\n", dr0); //0
I run qemu without gdb and nothing should overwrite debug registers.
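One thing that may be worth checking here, assuming the kernel is built with GCC's default AT&T syntax (no -masm=intel): in AT&T syntax the source operand comes first and the destination second. So "mov %%dr0, %0" reads DR0 into the operand, and "mov %0, %%dr0" writes the operand into DR0, which is the opposite of what the comments in the snippet above intend. Independently of that, a watchpoint only fires if its enable bit (L0, bit 0 of DR7) is set, and R/W0 = 01b means bit 16 set and bit 17 clear. The sketch below shows the operand order with a harmless register copy that can run in user mode; the helper names are made up, and the DR0 accessors only work in ring 0.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* AT&T order is "mov source, destination": %1 is the input, %0 the output. */
static uintptr_t copy_through_register(uintptr_t value)
{
    uintptr_t result;
    __asm__ volatile ("mov %1, %0" : "=r" (result) : "r" (value));
    return result;
}

/* Ring 0 only: the same pattern with DR0 as destination ... */
static inline void write_dr0(uintptr_t address)
{
    __asm__ volatile ("mov %0, %%dr0" :: "r" (address));
}

/* ... and with DR0 as source. */
static inline uintptr_t read_dr0(void)
{
    uintptr_t value;
    __asm__ volatile ("mov %%dr0, %0" : "=r" (value));
    return value;
}
#else
/* Fallback so the sketch still compiles on other targets. */
static uintptr_t copy_through_register(uintptr_t value) { return value; }
#endif

/* User-mode sanity check of the operand order. */
static void operand_order_self_check(void)
{
    assert(copy_through_register(0x1234u) == 0x1234u);
}
```

In kernel code, `write_dr0((uintptr_t)&a)` followed by `read_dr0()` should then give back the address of `a` rather than 0.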
Re: An exception damages data
nullplan wrote: Since ret is just "pop EIP", you must write your return address to that memory location where ESP ends up pointing when the "ret" is run.
I tried doing this in the thread function:
Code: Select all
void task01()
{
serial_printf("I'm thread #1\n");
//scheduler_thread_exit_current();
__asm__ volatile ("mov %0, %%ecx"::"a"(&task_switch));
return;
}