An exception damages data

Post by mrjbom »

Hi.
I'm working on multitasking.
I have a few simple tasks (task01, task02, task03).
I add these tasks to a list of threads that are executed in turn.
Since a plain "return" cannot properly end a task, I have the scheduler_thread_exit_current() function; it works as intended and terminates the task cleanly.
When a task does not call this function, "return" passes control to the wrong place and a #UD or #GP exception occurs.

I want the process to be forcibly terminated if either of these exceptions occurs.
But I found that my process table gets corrupted once an exception has been raised.

Here is how the exception handling works.
When an exception occurs, an assembly function (general_protection_fault) is called; it calls the handler in C and passes parameters to it.
Control is then passed to the main handler (general_protection_fault_exception).
This is where the error is detected.

The logs show that when the first thread (id = 1) ends by voluntary termination, the process table looks like this.
Then control is passed to thread 3 (id = 3); it does not terminate voluntarily, so #GP occurs. The handler just prints the process table: the address of element [2] (thread 3) is incorrect, and its id is displayed incorrectly.
I don't know what could have damaged it.


I also tried inspecting memory with qemu. Here's what it showed me (tested on task02 (thread 2), which is why #UD appears here):
You can see that by the time task02() completes, the data at that address is still correct.

Image

But as soon as control is passed to the handler in C (I couldn't debug the assembly handler; my debugger can't step through it), the data in this cell is damaged.

Image

I believe there is an error in the assembly handler, but I don't see anything problematic there.

What might be the problem?
I would appreciate your help!

Post by max »

Hey,
it's really hard to tell what exactly corrupts your memory. After all, we don't know your memory layout etc.
I guess your process is running at user privilege level? Then you should make sure that the process itself cannot overwrite kernel memory or your process table.
If that is ruled out, it is likely that something is off with the way you handle exceptions. Did you make sure that the stack is set up correctly when the exception occurs? Is your structure already corrupt right when the handler is entered?
Greets

Post by nullplan »

Memory suddenly getting faulty is usually caused by bad interrupt code. So, let's have a look, shall we?

Code:

invalid_opcode:
  cli
  ;save all 32bit registers
  pushad
  push dword [esp + 32] ;push eip
  push dword [esp + 64] ;push cs
  call invalid_opcode_handler
  pop dword [esp + 64] ;pop cs
  pop dword [esp + 32] ;pop eip
  ;return all 32bit registers
  popad
  ;delete eip and cs from stack
  add esp, 8
  sti
  iretd
The "cli" is unnecessary if you just register the invalid opcode exception as an interrupt gate (type 14, IIRC). Similarly, the "sti" just before "iretd" is useless, since iretd pops the flags as part of its operation.

Invalid Opcode has no error code, so EIP and CS are at the top of the stack on entry. "pushad" pushes 32 bytes, meaning [esp + 32] really is EIP. That push decreases ESP by another 4 bytes, and CS is one dword (4 bytes) above EIP, so CS is then at [esp + 40], not [esp + 64]. So your "push dword [esp + 64]" is pushing invalid data. At that point, anything above "ESP + 44" is invalid without first looking at CS, and anything above "ESP + 52" is invalid in all cases. After that push, the limits increase by another 4 bytes.

Your "pop dword [esp + 64]" is therefore writing somewhere into the parent stack frame, and the "pop dword [esp + 32]" is overwriting some register you saved with "pushad". Both of these are corrupting state. Then you popad, writing the changed value back to the registers, and then "add esp, 8". Your stack pointer was pointing at the iret frame before that; now it's pointing to the EFLAGS in the iret frame. I have no idea how the "iretd" afterwards didn't crash.

So, putting it all together:

Code:

invalid_opcode:
  ;save all 32bit registers
  pushad
  push dword [esp + 32] ;push eip
  push dword [esp + 40] ;push cs
  call invalid_opcode_handler
  ;delete eip and cs from stack
  add esp, 8
  ;return all 32bit registers
  popad
  iretd
Looking at the other exception handlers, you can apply similar changes there. Actually, general_protection_fault and page_fault almost work out, except for where you are overwriting part of the "pusha" image with the "pop dword [esp + 32]", which ends up corrupting a register, with unpredictable effects. Just use "add esp, 4" to discard the value from stack.
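The offset arithmetic in this post can be checked mechanically. Below is a minimal C sketch, assuming a same-privilege exception without an error code (so the CPU has pushed only EFLAGS, CS and EIP) and a stub that has already executed pushad. The function names are hypothetical and not part of anyone's kernel in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* pushad stores eight 32-bit registers. */
enum { PUSHAD_BYTES = 8 * 4 };

/* Offset from ESP to the saved EIP after pushad plus `extra_dwords`
 * further 4-byte pushes (segment registers, call arguments, ...). */
uint32_t eip_offset(uint32_t extra_dwords)
{
    return PUSHAD_BYTES + 4 * extra_dwords;
}

/* CS sits one dword above EIP in the interrupt frame. */
uint32_t cs_offset(uint32_t extra_dwords)
{
    return eip_offset(extra_dwords) + 4;
}
```

With no extra pushes, EIP is at offset 32; once the EIP argument has been pushed, CS is at offset 40, which matches the corrected handler. Every additional push before the arguments shifts both offsets by 4.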
Carpe diem!

Post by nullplan »

If I may add one remark: If I were you, I would try to separate the PIC logic from the interrupt logic. That will make it easier to support APIC, once the need arises. For example, my IRQ code looks like this: All IRQ entry points push 127 minus interrupt number (this gives a value between -128 and +127 for all possible values of the interrupt number, thus enabling the use of the short push encoding) and then jump to a common handler. This means, all IRQ entry points fit inside of 8 bytes (since the heavy lifting is done in the common handler), so with a few "align" directives, I can guarantee that they are all exactly 8 bytes apart from one another. Thereby allowing me to register them in a loop from C. The common handler will then subtract another 127 from the top of stack (yielding the negative interrupt number), push all registers, and call a C function. The C function will call a function pointer from a table, where all function pointers are initialized to point to a dummy function. The IRQ code will register its handlers (there are only 2) with the relevant interrupts. The IRQ code itself will also only call an interrupt handler from a table. This way, the PIC stuff is entirely isolated from the interrupt stuff. The PIC code will reserve 16 interrupts en bloc, but if it wasn't called, then only the APIC would reserve anything.

With APIC, you have no way to know at design time how many IRQs you may need. One IOAPIC may support 24 or 32 IRQ lines, but you may have many IOAPICs in your system. Usually you don't, but that means we need the interrupt number as data, so pushing it before transferring to a common handler is the only way to go. And then there's MSI, which means, essentially, that every PCI card can register another batch of interrupts. (With MSI it is only one, but then there's also MSI-X or what it was called). So yeah, separate the PIC code from the interrupt code while things are still simple.
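The entry-stub scheme described in this post can be modelled in plain C. This is only a sketch of the idea, not nullplan's actual code: the stubs themselves would be assembly, and every name below (stub_push_value, dispatch, timer_handler and so on) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VECTORS 256

typedef void (*irq_handler_t)(unsigned vector);

/* Value each entry stub pushes: 127 - vector fits in a signed byte,
 * so the short "push imm8" encoding works for all 256 vectors. */
int8_t stub_push_value(unsigned vector)
{
    assert(vector < NUM_VECTORS);
    return (int8_t)(127 - (int)vector);
}

/* Common handler: subtracting 127 yields the negated vector number. */
unsigned decode_vector(int8_t pushed)
{
    return (unsigned)(-((int)pushed - 127));
}

/* Every table slot starts out pointing at a dummy handler. */
static unsigned spurious_count;
static void dummy_handler(unsigned vector) { (void)vector; spurious_count++; }

static irq_handler_t handlers[NUM_VECTORS];

void handlers_init(void)
{
    for (size_t i = 0; i < NUM_VECTORS; i++)
        handlers[i] = dummy_handler;
}

void register_handler(unsigned vector, irq_handler_t fn)
{
    assert(vector < NUM_VECTORS && fn != NULL);
    handlers[vector] = fn;
}

/* What the C side of the common handler does with the pushed byte. */
void dispatch(int8_t pushed)
{
    unsigned vector = decode_vector(pushed);
    handlers[vector](vector);
}

/* Hypothetical example handler. */
unsigned timer_ticks;
void timer_handler(unsigned vector) { (void)vector; timer_ticks++; }
```

PIC or APIC code would then only call register_handler() for the vectors it owns, and nothing else in the interrupt path needs to know which controller is in use.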
Carpe diem!

Post by bellezzasolo »

A few observations (to OP):
How are you dealing with the error code? You need to pop it before iretd. But, on exception handlers that don't pass an error code, you need to avoid that.

Maintaining 256 different interrupt entry routines is difficult.

PUSHAD is OK, but it's a bit wasteful of stack space, and you don't have that option on x64.

Here's my entire 64 bit interrupt code:

Code:

BITS 64

section .text

save_interrupt_registers:
pop rax
push rcx
push rdx
push r8
push r9
push r10
push r11
jmp rax

restore_interrupt_registers:
pop rax
pop r11
pop r10
pop r9
pop r8
pop rdx
pop rcx
jmp rax

swap_gs_ifpriv:
mov rax, QWORD[rbp+8+0x10]
and rax, 0x3	;CPL
cmp rax, 0
je .next
swapgs
.next:
ret


%macro SAVE_VOLATILE_REGISTERS 0
push rax
call save_interrupt_registers
%endmacro

%macro RESTORE_VOLATILE_REGISTERS 0
call restore_interrupt_registers
pop rax
%endmacro

extern x64_interrupt_dispatcher
extern x64_save_fpu
extern x64_restore_fpu
extern xsavearea_size
extern memset
extern kprintf

;Stack layout:
;Old stack
;____________________
;PADDING
;____________________
;XSAVE
;____________________
;Old RSP
;____________________
save_fpu_interrupt:
pop r9			;Return address
mov rax, xsavearea_size
mov rdx, [rax]	;Size of stack area
mov r8, rdx		;Length parameter for memset
mov rcx, rsp
sub rcx, rdx
and cl, 0xC0	;Align stack
mov rdx, rsp	;Old stack pointer
mov rsp, rcx	;Load new stack pointer
push rdx		;Save old RSP on new stack
push r9			;R9 is volatile, save our return address
;Set to 0
sub rsp, 32
push r8
mov rdx, 0
push rdx
push rcx
call memset
add rsp, 32+24
;Now save the FPU state
mov rax, [qword x64_save_fpu]
call rax
pop r9
jmp r9

restore_fpu_interrupt:
pop r8		;Return address
pop r9		;Old stack pointer
mov rcx, rsp	;XSAVE area
push r8		;R8 is volatile
push r9		;R9 is volatile
mov rax, [qword x64_restore_fpu]
call rax	;Restore the state!
pop r9
pop r8		;Restore R8
mov rsp, r9	;Load old stack
jmp r8


;First parameter: error code passed
;Second parameter: interrupt vector number
%macro INTERRUPT_HANDLER 2
global x64_interrupt_handler_%2
x64_interrupt_handler_%2:
%if %1 == 0
push 0			;Dummy error code
%endif
;Stack frame
push rbp
mov rbp, rsp
SAVE_VOLATILE_REGISTERS
;Per CPU information
call swap_gs_ifpriv
call save_fpu_interrupt

;Now we pass the interrupt stack frame and the vector
mov rcx, %2
mov rdx, rbp
add rdx, 8

sub rsp, 32
call x64_interrupt_dispatcher
add rsp, 32

call restore_fpu_interrupt
call swap_gs_ifpriv

RESTORE_VOLATILE_REGISTERS

pop rbp
add rsp, 8		;Get rid of error code
iretq
%endmacro

%macro INTERRUPT_HANDLER_BLOCK 3
%assign i %2
%rep %3-%2
INTERRUPT_HANDLER %1, i
%assign i i+1
%endrep
%endmacro

INTERRUPT_HANDLER 0, 0
INTERRUPT_HANDLER 0, 1
INTERRUPT_HANDLER 0, 2
INTERRUPT_HANDLER 0, 3
INTERRUPT_HANDLER 0, 4
INTERRUPT_HANDLER 0, 5
INTERRUPT_HANDLER 0, 6
INTERRUPT_HANDLER 0, 7
INTERRUPT_HANDLER 1, 8
INTERRUPT_HANDLER 0, 9
INTERRUPT_HANDLER 1, 10
INTERRUPT_HANDLER 1, 11
INTERRUPT_HANDLER 1, 12
INTERRUPT_HANDLER 1, 13
INTERRUPT_HANDLER 1, 14
INTERRUPT_HANDLER 0, 15
INTERRUPT_HANDLER 0, 16
INTERRUPT_HANDLER 1, 17
INTERRUPT_HANDLER 0, 18
INTERRUPT_HANDLER 0, 19
INTERRUPT_HANDLER 0, 20
INTERRUPT_HANDLER_BLOCK 0, 21, 30
INTERRUPT_HANDLER 1, 30
INTERRUPT_HANDLER 0, 31
INTERRUPT_HANDLER_BLOCK 0, 32, 256

%macro dq_concat 2
dq %1%2
%endmacro

section .data
global default_irq_handlers
default_irq_handlers:
%assign i 0
%rep 256
dq_concat x64_interrupt_handler_,i
%assign i i+1
%endrep
stack_str: dw __utf16__(`Interrupt frame:\n Error code %x\n Return RIP: %x\n Return CS: %x\n Return RFLAGS: %x\n Return RSP: %x\n Return SS: %x\n`), 0
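The error-code question raised at the top of this post comes down to a per-vector lookup. The sketch below simply mirrors the vectors that the listing passes to INTERRUPT_HANDLER with a first argument of 1; the function name is hypothetical, and newer CPUs define further exceptions with an error code that this list does not cover.

```c
#include <assert.h>
#include <stdbool.h>

/* True for the vectors the listing above marks as pushing an error code. */
bool vector_pushes_error_code(unsigned vector)
{
    switch (vector) {
    case 8:  /* #DF double fault */
    case 10: /* #TS invalid TSS */
    case 11: /* #NP segment not present */
    case 12: /* #SS stack-segment fault */
    case 13: /* #GP general protection */
    case 14: /* #PF page fault */
    case 17: /* #AC alignment check */
    case 30: /* #SX security exception */
        return true;
    default:
        return false;
    }
}
```

A stub for a vector where this returns false has to push a dummy value (or skip the final "add esp, 4" / "add rsp, 8"), so that the frame layout is the same for every vector. Note that #UD (vector 6) is among those without an error code.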
Whoever said you can't do OS development on Windows?
https://github.com/ChaiSoft/ChaiOS

Post by mrjbom »

I tried the code you suggested, but the memory still gets corrupted... Your remarks about the parameters were right, though: I had noticed that I was getting incorrect cs and eip values, and now they are correct.

Now I need to figure out why my table gets corrupted after an interrupt is triggered.

Do you have any suggestions? I don't even know what might be causing this behavior.

Post by bellezzasolo »

It's possible, and pretty simple, to set a memory watchpoint on the address in question.

Write DR0 with the relevant address.
Write (3<<18) | (1<<16) | 1 to DR7 for a 4 byte memory write watchpoint.
And disable it (write 0 to DR7) before any writes you intend to make yourself!
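The DR7 value quoted above can be derived field by field. Here is a minimal C sketch, assuming the usual DR7 layout (local enable Ln at bit 2n, R/Wn at bits 16+4n, LENn at bits 18+4n); the enum and function names are made up, and the sketch only computes the value, it does not touch the debug registers.

```c
#include <assert.h>
#include <stdint.h>

/* R/Wn field: which kind of access triggers the breakpoint. */
enum dr7_rw  { DR7_RW_EXEC = 0, DR7_RW_WRITE = 1, DR7_RW_IO = 2, DR7_RW_READWRITE = 3 };
/* LENn field: note that 4 bytes is encoded as 11b, not 10b. */
enum dr7_len { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_8 = 2, DR7_LEN_4 = 3 };

/* Returns `dr7` with breakpoint slot n (0..3) configured and locally enabled. */
uint32_t dr7_enable(uint32_t dr7, unsigned n, enum dr7_rw rw, enum dr7_len len)
{
    assert(n < 4);
    unsigned field = 16 + 4 * n;            /* R/Wn at field, LENn at field + 2 */
    dr7 &= ~(0xFu << field);                /* clear the old R/Wn and LENn bits */
    dr7 |= (uint32_t)rw  << field;
    dr7 |= (uint32_t)len << (field + 2);
    dr7 |= 1u << (2 * n);                   /* Ln: local enable */
    return dr7;
}
```

For slot 0, a write watchpoint of length 4 gives exactly (3<<18) | (1<<16) | 1. Without the enable bit the R/W and LEN fields have no effect.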
Whoever said you can't do OS development on Windows?
https://github.com/ChaiSoft/ChaiOS

Post by mrjbom »

Something interesting: I added a process table dump at the start of the interrupt handler,

Code:

invalid_opcode:
  ;save all 32bit registers
  call scheduler_thread_show_list
  pushad
  push dword [esp + 32] ;push eip
  push dword [esp + 40] ;push cs
  call invalid_opcode_handler
  ;delete eip and cs from stack
  add esp, 8
  ;return all 32bit registers
  popad
  iretd
and the table was already damaged at the very start of the handler.
Log:

Code:

I'm thread #1 <--- task01

THREAD LIST: <--- start of invalid_opcode: header
thread_list[0] addr = 0x114e000, id = 0
thread_list[1] addr = 0x282, id = 4283691008
thread_list[2] addr = 0x1154000, id = 2
thread_list[3] addr = 0x1158000, id = 3


/--------------------------------------------------\ <--- main C handler invalid_opcode_exception()
invalid_opcode_exception!
cs = 8(0x8), eip = 18153694(0x11500de)
problematic_thread addr = 0x1150000, id = 1

THREAD LIST:
thread_list[0] addr = 0x114e000, id = 0
thread_list[1] addr = 0x282, id = 4283691008
thread_list[2] addr = 0x1154000, id = 2
thread_list[3] addr = 0x1158000, id = 3

\--------------------------------------------------/
It turns out that the handler's code does not damage the table. Yet the process itself (task01) prints the table correctly, so the table gets corrupted at the moment the exception is triggered.

I have no idea what could damage the table.
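A side observation on the log, offered as a guess rather than a diagnosis: the corrupted addr value 0x282 has the shape of an EFLAGS image (reserved bit 1 set, IF set). On a same-privilege exception (the log shows cs = 8) the CPU pushes EFLAGS, CS and EIP onto whatever ESP currently points at, so if ESP has run past the top of the thread's stack after the stray return, those pushes could land in neighbouring memory. The C sketch below only tests the bit pattern; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_FIXED_ONE  (1u << 1)   /* bit 1 always reads as 1 */
/* Bits 3, 5, 15 and 22..31 are reserved and read as 0. */
#define EFLAGS_FIXED_ZERO ((1u << 3) | (1u << 5) | (1u << 15) | 0xFFC00000u)
#define EFLAGS_IF         (1u << 9)   /* interrupt enable flag */

/* True if `v` is consistent with the fixed bits of a 32-bit EFLAGS image. */
bool looks_like_eflags(uint32_t v)
{
    return (v & EFLAGS_FIXED_ONE) != 0 && (v & EFLAGS_FIXED_ZERO) == 0;
}

bool interrupts_enabled(uint32_t eflags)
{
    return (eflags & EFLAGS_IF) != 0;
}
```

The value 0x282 passes this check, while the intact entries such as 0x114e000 do not. That proves nothing by itself, but it suggests comparing the address of the damaged list node with the top of the thread's stack.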

Post by nullplan »

Well, this change is invalid as well: scheduler_thread_show_list is a C function, so it is allowed to clobber EAX, ECX, and EDX. Another thing I forgot: you are in 32-bit mode, so segmentation still matters. You therefore need to save DS and ES, set them both to 0x10, and restore them afterwards. Also, DF might be set when the invalid opcode hits, so it has to be cleared before calling a C function. So:

Code:

invalid_opcode:
  cld ;DF is already saved in interrupt frame
  ;save all 32bit registers
  pushad
  ; save and set segments
  push ds
  push es
  mov ax,0x10
  mov ds,ax
  mov es,ax

  push dword [esp + 40] ;push eip
  push dword [esp + 48] ;push cs
  call invalid_opcode_handler
  ;delete eip and cs from stack
  add esp, 8
  ; restore segments
  pop es
  pop ds
  ;return all 32bit registers
  popad
  iretd
Carpe diem!

Post by mrjbom »

I realized this is really harder than I expected...
Unfortunately, this does not solve it either; the table is still damaged...
I still don't have even a rough guess as to why this is happening...
This problem seems extremely difficult to solve. Perhaps "return" passing control to a random area is what damages the data.

I started wondering whether I can write a return address to the stack when creating a thread, so that "return" works as it should.
Is it possible to do this?

Post by nullplan »

mrjbom wrote:I still don't have even a rough guess as to why this is happening...
Looks like it is debug time. Add a watch point to the table entry once it is initialized. Add a handler for the debug exception that prints where you are. And then wait for the corruption to occur. To add a watchpoint, you write the base address into a debug register, then set the corresponding R/W field in DR7 to 01 and the corresponding LEN field to an encoding of the length. Although you may need three watch points for your structure (apparently, length 4 is the maximum).
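Covering a structure with several watchpoints is a small planning problem, because each watched range must be 1, 2 or 4 bytes long and aligned to its own length, and there are only four address registers (DR0 to DR3). A minimal C sketch of that planning step follows; the 12-byte structure size used in the note below is an assumption, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct watch { uint32_t addr; unsigned len; };

/* Splits [base, base + size) into naturally aligned chunks of 4, 2 or 1 bytes.
 * Returns the number of chunks written to `out`, or 0 if more than `max`
 * watchpoints would be needed. */
size_t plan_watchpoints(uint32_t base, uint32_t size, struct watch *out, size_t max)
{
    size_t n = 0;
    uint32_t addr = base, end = base + size;

    while (addr < end) {
        unsigned len = 4;
        /* Shrink until the chunk is aligned to its length and fits the range. */
        while (len > 1 && ((addr % len) != 0 || end - addr < len))
            len /= 2;
        if (n == max)
            return 0;
        out[n].addr = addr;
        out[n].len = len;
        n++;
        addr += len;
    }
    return n;
}
```

A dword-aligned 12-byte structure needs three watchpoints of length 4, which fits into the four available slots; a 20-byte one does not.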

You have a list with external storage, so it is possible to corrupt the list by bending the "data" pointer. It is also possible (though unlikely) that the previous node's "next" pointer was corrupted, but in your case, it would have to have been exceptionally lucky that the pointer happened to land somewhere where the "next" pointer went back to the correct node. So either something is overwriting the data pointer or something is corrupting where it is pointing to.

I'm a bit disappointed that x86 doesn't appear to have arbitrary length watch points.
Carpe diem!

Post by mrjbom »

I tried debugging the code.
For the test I have this code:

Code:

uint32_t a = 123;
uint32_t dr7 = 0;
uint32_t dr0 = 0;
//write addr of 'a' to dr0
__asm__ volatile ("mov %%dr0, %0" :: "r" (&a));
__asm__ volatile ("mov %0, %%dr0" : "=r" (dr0));
//read dr7
__asm__ volatile ("mov %0, %%dr7" : "=r" (dr7));
//set 16-17 01b - write to dr7
set_n_bit(dr7, 16, 0);
set_n_bit(dr7, 17, 1);
//set 18-19 length 4 bytes - 11b
set_n_bit(dr7, 18, 1);
set_n_bit(dr7, 19, 1);
//write new dr7 value
__asm__ volatile ("mov %%dr7, %0" :: "r" (dr7));
//try...
a = 321;
I expect a #DB exception to be raised when the new value is written, but this does not happen.
But this is not so important.
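It may help to decode the DR7 value that the test code above builds. The sketch below assumes that set_n_bit(v, bit, value) sets the given bit to `value` (the helper is not shown in the thread, so set_bit here is a stand-in) and that DR7 started out as 0. Separately, GCC inline assembly uses AT&T operand order by default (source first), so unless the kernel is built with -masm=intel, "mov %%dr0, %0" reads DR0 instead of writing it.

```c
#include <assert.h>
#include <stdint.h>

struct dr7_slot {
    unsigned enabled;  /* Ln in bit 0, Gn in bit 1 */
    unsigned rw;       /* 0 = execute, 1 = write, 2 = I/O, 3 = read/write */
    unsigned len;      /* 0 = 1 byte, 1 = 2 bytes, 3 = 4 bytes */
};

/* Extracts the fields of breakpoint slot n (0..3) from a DR7 value. */
struct dr7_slot dr7_decode(uint32_t dr7, unsigned n)
{
    assert(n < 4);
    struct dr7_slot s;
    s.enabled = (dr7 >> (2 * n)) & 3u;
    s.rw      = (dr7 >> (16 + 4 * n)) & 3u;
    s.len     = (dr7 >> (18 + 4 * n)) & 3u;
    return s;
}

/* Stand-in for the thread's set_n_bit(): sets `bit` of v to `on`. */
uint32_t set_bit(uint32_t v, unsigned bit, unsigned on)
{
    return on ? (v | (1u << bit)) : (v & ~(1u << bit));
}
```

Replaying the four set_n_bit calls from the post yields R/W0 = 10b (bit 17 set, bit 16 clear) rather than the 01b the comment asks for, and the enable bits L0/G0 are never set. Either of these would be enough to keep #DB from firing.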

I have shown that the thread_list values are already corrupted when printed at the very beginning of the assembly handler.
Also, new problems appeared while I was modifying the code: new elements now seem to be added to the table.

The only logical guess is that ret passes control to some area of memory containing instructions that happen to damage the data.

As far as I know, ret takes the return address from the stack, so I tried filling the stack with some values, but that did not change anything.
I tried to study how call/ret work in emu8086 and realized that if I set up the thread stack properly, I can make "return" transfer control to the right place, just as I set the entry point. But I couldn't figure out where in the stack I should write my return address.

Post by nullplan »

mrjbom wrote:I expect a #DB exception to be thrown when writing a new value, but this does not happen.
Well, maybe because that is a dead store. Or maybe the compiler allocates "a" into some register and will only spill it later. For things like that, I have set and read functions in my io.S that work exactly like the in and out functions, but for memory space instead of I/O space:

Code:

void setl(uint32_t*, uint32_t);

Code:

.global setl
.type setl, @function
setl:
    movl %esi, (%rdi)
    retq
.size setl, .-setl
Then you can force the write to occur with "setl(&a, 123)". And that really should trap with #DB in your case. Of course, this is for 64-bit mode, you would need something like

Code:

.global setl
.type setl, @function
setl:
  movl 4(%esp), %eax
  movl 8(%esp), %ecx
  movl %ecx, (%eax)
  retl
.size setl,.-setl
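For comparison, a C-only variant of the same idea is to perform the access through a volatile-qualified lvalue, which GCC and Clang treat as a side effect that must not be dropped or merged. This is an alternative sketch, not the helper from the post; the _volatile names are made up to keep the two apart.

```c
#include <assert.h>
#include <stdint.h>

/* Store through a volatile lvalue so the compiler has to emit the write. */
void setl_volatile(uint32_t *addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Matching load that the compiler may not satisfy from a cached register. */
uint32_t getl_volatile(const uint32_t *addr)
{
    return *(const volatile uint32_t *)addr;
}
```

Calling setl_volatile(&a, 321) in the watchpoint test forces a real store to the address of a, because taking the address also keeps the variable in memory.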
mrjbom wrote:The only logical guess is that ret passes control to some area of memory where commands are located that miraculously damage the data.
In all cases of "weird memory corruption" that I've seen in the past, the error has been far away from the place where it was noticed. That is why I suggest you get that #DB working, so you can have the processor test for you. That is going to be easier than fishing in the fog and hoping to get lucky.
mrjbom wrote:As far as I know, ret takes the return address from the stack, I decided to try filling the stack with some values, but it didn't bring any results.
That is correct, that is the operation of ret. It is, essentially, "pop eip". And call is "push next; jmp target; next:". Only the "next" is calculated at runtime from EIP. And yes, a stack buffer overflow can result in diverting control flow such that you jump to arbitrary places. But you would have to be really unlucky to find the one value to write to the stack such that ret corrupts your process list. Usually it would just crash.
mrjbom wrote:But I couldn't figure out where in the stack I should write my return address.
Since ret is just "pop EIP", you must write your return address to that memory location where ESP ends up pointing when the "ret" is run.
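As a rough model of that rule, the sketch below prepares a fresh thread stack so that the entry function's ret pops a chosen address. It assumes the thread is entered by a jump or iret with ESP pointing at the prepared slot and that the entry function takes no arguments; in this thread the address would presumably be that of scheduler_thread_exit_current. The function names and the word-indexed stack are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes `exit_addr` where the entry function's "ret" will look for it.
 * Returns the index of the slot the initial ESP should point at. */
size_t prepare_thread_stack(uint32_t *stack, size_t words, uint32_t exit_addr)
{
    assert(words >= 1);
    size_t top = words - 1;   /* x86 stacks grow down: use the highest slot */
    stack[top] = exit_addr;   /* plays the role of the caller's return address */
    return top;
}

/* Models "ret": pop the dword at ESP and use it as the new EIP. */
uint32_t simulate_ret(const uint32_t *stack, size_t *esp_index)
{
    uint32_t eip = stack[*esp_index];
    (*esp_index)++;
    return eip;
}
```

The compiler-generated prologue and epilogue of the task function leave ESP where they found it, so by the time ret runs, ESP is back at the prepared slot and control goes to exit_addr. After that ret, ESP points one slot past the end of the stack, so the exit function runs with an empty stack directly below whatever memory follows the allocation.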
Carpe diem!

Post by mrjbom »

I tried using the code you suggested, but the exception still doesn't work.
In addition, I found that DR0 is reset to zero.

Code:

//write addr of 'a' to dr0
__asm__ volatile ("mov %%dr0, %0" :: "r" (&a));
//read dr0
__asm__ volatile ("mov %0, %%dr0" : "=r" (dr0));
serial_printf("dr0 = 0x%x\n", dr0); //0
Why is this happening?
I run qemu without gdb and nothing should overwrite debug registers.

Post by mrjbom »

nullplan wrote:Since ret is just "pop EIP", you must write your return address to that memory location where ESP ends up pointing when the "ret" is run.

I tried doing this in the thread function:

Code:

void task01()
{
    serial_printf("I'm thread #1\n");
    //scheduler_thread_exit_current();
    __asm__ volatile ("mov %0, %%ecx"::"a"(&task_switch));
    return;
}
However, return still returns control to an unknown destination.