Inconsistent faults when switching to and from user task
Posted: Wed Mar 20, 2019 1:30 pm
I've been working on (a.k.a. banging my head against a brick wall over) adding x86 user mode to my kernel for the past couple of weeks. I've designed the task switching protocol and stack layouts, and everything makes sense, but I keep getting inconsistent faults some time after switching to a user task. Switching between kernel tasks works perfectly, so I'm baffled as to why switching to user tasks causes faults.
My task switching protocol is as follows:
1. On a timer interrupt, get the next process as per some scheduling policy.
2. If it's a kernel task, then call arch_switch_to_kernel_task (source attached), else call arch_switch_to_user_task
arch_switch_to_kernel_task:
Save current cpu state into process' saved kernel state
Restore next process' CPU state from process' saved kernel state
Return to the address stored at the top of the kernel stack, effectively continuing on from where the task was halted.
arch_switch_to_user_task:
Save current cpu state into process' saved kernel state
Restore next process' CPU state from process' saved kernel state
Return to the address stored at the top of the kernel stack, which eventually ends up back in the IRQ handler. The handler pops off the saved user state and enters user mode with an iret.
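A simplified sketch of steps 1 and 2 in C, to make the dispatch concrete. This is not the actual source: `process_t`, `is_kernel`, `scheduler_pick_next`, `on_timer_tick` and the round-robin policy are made up for illustration, and the two switch functions are stubs standing in for the real assembly routines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical process descriptor; a real one would hold the saved CPU states. */
typedef struct process {
    int id;
    bool is_kernel;
} process_t;

/* Records which path was taken so the sketch can be checked on a host. */
typedef enum { SWITCH_NONE, SWITCH_KERNEL, SWITCH_USER } switch_kind_t;
static switch_kind_t last_switch = SWITCH_NONE;

/* Stubs standing in for the assembly routines named in the post. */
static void arch_switch_to_kernel_task(process_t *cur, process_t *next) {
    (void)cur; (void)next;
    last_switch = SWITCH_KERNEL;
}

static void arch_switch_to_user_task(process_t *cur, process_t *next) {
    (void)cur; (void)next;
    last_switch = SWITCH_USER;
}

/* ASSUMPTION: round-robin over a fixed table; any policy could go here. */
static process_t *scheduler_pick_next(process_t *table, size_t count,
                                      const process_t *cur) {
    assert(count > 0);
    size_t idx = (size_t)(cur - table);
    return &table[(idx + 1) % count];
}

/* Called from the timer interrupt; returns the new current process. */
static process_t *on_timer_tick(process_t *table, size_t count, process_t *cur) {
    process_t *next = scheduler_pick_next(table, count, cur);
    if (next->is_kernel) {
        arch_switch_to_kernel_task(cur, next);
    } else {
        arch_switch_to_user_task(cur, next);
    }
    return next;
}
```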
To facilitate this protocol, each task is initialised:
kernel task:
the registers in the saved kernel state are set to 0
stack:
top: exit function address
top - 4: entry function address <-- kernel_state.esp
user task:
the registers in both the kernel state and user state are set to 0
kernel stack:
top: exit function address
top - 4: initial user cpu state
top - 4 - sizeof(arch_cpu_state_t): Unused eax value
top - 8 - sizeof(arch_cpu_state_t): Return address <-- kernel_state.esp
user stack:
top: exit function address <-- user_state.useresp
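The two kernel stack layouts above can be sketched in C as follows. The stack is modelled as an array of 32-bit words so the sketch runs on a host. `STATE_WORDS` (the size of arch_cpu_state_t in words) is an assumption, and the function names and addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: sizeof(arch_cpu_state_t) / 4; the real size may differ. */
#define STATE_WORDS 19u

/* Kernel task: exit address at top, entry address one word below.
 * Returns the word index that kernel_state.esp should point at. */
static size_t init_kernel_task_stack(uint32_t *stack, size_t words,
                                     uint32_t entry, uint32_t exit_fn) {
    assert(words >= 2);
    size_t top = words - 1;
    stack[top] = exit_fn;      /* top */
    stack[top - 1] = entry;    /* top - 4 */
    return top - 1;
}

/* User task kernel stack: exit address, user CPU state, unused eax slot,
 * then the return address that kernel_state.esp points at. */
static size_t init_user_task_kernel_stack(uint32_t *stack, size_t words,
                                          const uint32_t *user_state,
                                          uint32_t return_addr,
                                          uint32_t exit_fn) {
    assert(words >= STATE_WORDS + 3);
    size_t top = words - 1;
    stack[top] = exit_fn;                       /* top */
    size_t state = top - STATE_WORDS;           /* state's highest word is at top - 4 */
    memcpy(&stack[state], user_state, STATE_WORDS * sizeof(uint32_t));
    stack[state - 1] = 0;                       /* top - 4 - sizeof(state): unused eax */
    stack[state - 2] = return_addr;             /* top - 8 - sizeof(state) */
    return state - 2;
}
```

The returned index plays the role of kernel_state.esp: the first `ret` after the switch pops the word stored there.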
The faults always happen after a seemingly arbitrary number of task switches, and everything works perfectly if I make the "cleaner" task a kernel task rather than a user one.
Sometimes I get a page fault at c0104450 in:
c010444a <linkedlist_size>:
c010444a: 55 push %ebp
c010444b: 89 e5 mov %esp,%ebp
c010444d: 8b 45 08 mov 0x8(%ebp),%eax
c0104450: 8b 40 08 mov 0x8(%eax),%eax
c0104453: 5d pop %ebp
c0104454: c3 ret
with the CPU state being:
[ERROR] PANIC @ src/arch/x86/idt/exceptions.c:38: Page fault
cs=0x1B ss=0x23 gs=0x23 fs=0x23 es=0x23 ds=0x23
ebp=0xC052D127 esp=0xC052D26F
edi=0x0 esi=0x0 ebx=0x0 edx=0x0 ecx=0x0 eax=0x13FEAC4
int=0xE err=0x4
eip=0xC0104450 ef=0x286 uesp=0xC052D127
cr0=0x80000011 cr2=0x13FEACC cr3=0x400000
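For reading the segment values in the dump above: an x86 segment selector holds the requested privilege level in bits 0-1, the table indicator in bit 2 and the descriptor index in the remaining bits. A small sketch of a decoder in C (the `selector_t` type and function names are made up for illustration); by this decoding, cs=0x1B and ss=0x23 are GDT entries 3 and 4 with RPL 3.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned index; /* descriptor table index (bits 3..15) */
    unsigned ti;    /* table indicator (bit 2): 0 = GDT, 1 = LDT */
    unsigned rpl;   /* requested privilege level (bits 0..1) */
} selector_t;

static selector_t decode_selector(uint16_t sel) {
    selector_t s;
    s.index = sel >> 3;
    s.ti = (sel >> 2) & 1u;
    s.rpl = sel & 3u;
    return s;
}

/* Usage example with the cs and ss values from the dump. */
static void check_dump_selectors(void) {
    selector_t cs = decode_selector(0x1B);
    selector_t ss = decode_selector(0x23);
    assert(cs.index == 3 && cs.ti == 0 && cs.rpl == 3);
    assert(ss.index == 4 && ss.ti == 0 && ss.rpl == 3);
}
```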
Then sometimes it fails at c01095c4 in:
c01095a1 <arch_switch_to_kernel_task>:
c01095a1: fa cli
c01095a2: 57 push %edi
c01095a3: 50 push %eax
c01095a4: 8b 7c 24 0c mov 0xc(%esp),%edi
c01095a8: 8b 44 24 10 mov 0x10(%esp),%eax
c01095ac: 8f 47 2c popl 0x2c(%edi)
c01095af: 8f 47 10 popl 0x10(%edi)
c01095b2: 89 77 14 mov %esi,0x14(%edi)
c01095b5: 89 6f 18 mov %ebp,0x18(%edi)
c01095b8: 89 67 1c mov %esp,0x1c(%edi)
c01095bb: 89 5f 20 mov %ebx,0x20(%edi)
c01095be: 89 57 24 mov %edx,0x24(%edi)
c01095c1: 89 4f 28 mov %ecx,0x28(%edi)
c01095c4: 8b 78 10 mov 0x10(%eax),%edi
c01095c7: 8b 70 14 mov 0x14(%eax),%esi
c01095ca: 8b 68 18 mov 0x18(%eax),%ebp
c01095cd: 8b 60 1c mov 0x1c(%eax),%esp
c01095d0: 8b 58 20 mov 0x20(%eax),%ebx
c01095d3: 8b 50 24 mov 0x24(%eax),%edx
c01095d6: 8b 48 28 mov 0x28(%eax),%ecx
c01095d9: 8b 40 2c mov 0x2c(%eax),%eax
c01095dc: 89 25 04 dc 10 c0 mov %esp,0xc010dc04
c01095e2: fb sti
c01095e3: c3 ret
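Going by the offsets in the disassembly above, the routine treats the saved kernel state as a structure with edi at 0x10 through eax at 0x2c. A sketch of that part of the layout in C follows. The struct name is made up, and the first four words are never touched by this routine, so what they hold is only a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t unknown[4]; /* 0x00..0x0c: not accessed here (GUESS: gs, fs, es, ds) */
    uint32_t edi;        /* 0x10 */
    uint32_t esi;        /* 0x14 */
    uint32_t ebp;        /* 0x18 */
    uint32_t esp;        /* 0x1c */
    uint32_t ebx;        /* 0x20 */
    uint32_t edx;        /* 0x24 */
    uint32_t ecx;        /* 0x28 */
    uint32_t eax;        /* 0x2c */
} saved_regs_sketch_t;

/* Compile-time checks that the sketch matches the offsets in the listing. */
static_assert(offsetof(saved_regs_sketch_t, edi) == 0x10, "edi offset");
static_assert(offsetof(saved_regs_sketch_t, esi) == 0x14, "esi offset");
static_assert(offsetof(saved_regs_sketch_t, ebp) == 0x18, "ebp offset");
static_assert(offsetof(saved_regs_sketch_t, esp) == 0x1c, "esp offset");
static_assert(offsetof(saved_regs_sketch_t, ebx) == 0x20, "ebx offset");
static_assert(offsetof(saved_regs_sketch_t, edx) == 0x24, "edx offset");
static_assert(offsetof(saved_regs_sketch_t, ecx) == 0x28, "ecx offset");
static_assert(offsetof(saved_regs_sketch_t, eax) == 0x2c, "eax offset");
```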
with the CPU state being:
[ERROR] PANIC @ src/arch/x86/idt/exceptions.c:38: Page fault
cs=0x8 ss=0xC052D603 gs=0x10 fs=0x10 es=0x10 ds=0x10
ebp=0xC052CE53 esp=0xC052CE03
edi=0xC052D603 esi=0x0 ebx=0x0 edx=0x3FDC0 ecx=0x1 eax=0x3FDC0
int=0xE err=0x0
eip=0xC01095C4 ef=0x82 uesp=0xC0104F7B
cr0=0x80000011 cr2=0x3FDD0 cr3=0x400000
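To help read the err and cr2 fields in the two page fault dumps above, here is a small sketch that decodes the low bits of the x86 page fault error code. The type and function names are made up for illustration. In both dumps cr2 equals eax plus the displacement of the faulting instruction (0x8 for `mov 0x8(%eax),%eax`, 0x10 for `mov 0x10(%eax),%edi`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool present;  /* bit 0: set = protection violation, clear = page not present */
    bool write;    /* bit 1: set = write access, clear = read access */
    bool user;     /* bit 2: set = fault raised while in user mode */
    bool reserved; /* bit 3: set = reserved bit set in a paging structure */
    bool fetch;    /* bit 4: set = instruction fetch */
} pf_error_t;

static pf_error_t decode_pf_error(uint32_t err) {
    pf_error_t e;
    e.present  = (err & 0x01u) != 0;
    e.write    = (err & 0x02u) != 0;
    e.user     = (err & 0x04u) != 0;
    e.reserved = (err & 0x08u) != 0;
    e.fetch    = (err & 0x10u) != 0;
    return e;
}

/* Address touched by an operand of the form disp(%base). */
static uint32_t effective_address(uint32_t base, uint32_t disp) {
    return base + disp;
}

/* Usage example with the values from the two dumps. */
static void check_dumps(void) {
    pf_error_t first = decode_pf_error(0x4);   /* err=0x4 */
    assert(!first.present && !first.write && first.user);
    assert(effective_address(0x13FEAC4u, 0x8u) == 0x13FEACCu);

    pf_error_t second = decode_pf_error(0x0);  /* err=0x0 */
    assert(!second.present && !second.write && !second.user);
    assert(effective_address(0x3FDC0u, 0x10u) == 0x3FDD0u);
}
```

So the first fault is a user mode read of a non-present page, and the second is a supervisor mode read of a non-present page.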
I have a feeling it has to do with how I set ebp/esp, or that I'm missing part of the switching protocol, but I can't for the life of me figure it out. Does anybody have any idea?
Source for reference:
* Boot file
* Where the switching happens
* Where a process is initialised
* IRQ and ISR handling
* The scheduler
* CPU state definition