Brendan wrote: My approach is for the micro-kernel to ask a motherboard driver what to do during boot ("always ignore", "always kernel panic", "ask motherboard driver what to do with each NMI"); where if the motherboard driver selects the last option and an NMI happens the motherboard driver can do anything it likes (including nothing) before telling the kernel to ignore the NMI, or the motherboard driver can just tell the kernel to panic for that NMI (and can provide a more specific reason).
Ok. Thanks. For an enthusiast-level OS without third-party support, I suppose the options for what to do (log, panic, ignore) can be given to the user and set for each known NMI type separately in configuration somewhere, like what Linux does. Also, the OS could print a message on the screen and ask the user whether they are willing to take chances and continue working.
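The per-type configuration idea (log, panic, ignore, set separately for each known NMI type) can be sketched as a small policy table. This is only an illustration under assumptions: the NMI source names, the `nmi_policy` table, `nmi_set_policy()` and `nmi_decide()` are all made up for the example and are not taken from Linux or from any real kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical NMI sources; a real kernel would derive these from
 * chipset or firmware information. */
enum nmi_source {
    NMI_PARITY, NMI_IO_CHECK, NMI_WATCHDOG, NMI_UNKNOWN, NMI_SOURCE_COUNT
};

/* The three options mentioned in the post. */
enum nmi_action { NMI_IGNORE, NMI_LOG, NMI_PANIC };

/* One configurable action per known source; these defaults are arbitrary. */
enum nmi_action nmi_policy[NMI_SOURCE_COUNT] = {
    [NMI_PARITY]   = NMI_PANIC,
    [NMI_IO_CHECK] = NMI_PANIC,
    [NMI_WATCHDOG] = NMI_LOG,
    [NMI_UNKNOWN]  = NMI_LOG,
};

/* Would be called while parsing the user's configuration. */
void nmi_set_policy(enum nmi_source src, enum nmi_action act)
{
    assert(src < NMI_SOURCE_COUNT);
    nmi_policy[src] = act;
}

/* Would be called from the NMI handler: look up what to do.
 * Anything out of range is treated as an unknown source. */
enum nmi_action nmi_decide(enum nmi_source src)
{
    if ((size_t)src >= NMI_SOURCE_COUNT)
        src = NMI_UNKNOWN;
    return nmi_policy[src];
}
```

Asking the user interactively would then be just one more action value next to the three above.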
Context Switch and Paging - Does IRET Care About Paging?
One (the only, if MCE is present?) important use case of NMIs is watchdog timers - using NMIs your system is able to report progress even if it has to block IRQs for extended periods of time. That is a very useful feature once you start debugging SMP lockups - it has saved me numerous long debugging sessions.
In this context, it would be interesting to see how CLI performs compared to MOV CR8. If the MOV is sufficiently fast to replace most uses of CLI in the kernel, this feature could be implemented, with less hassle, using a "soft NMI" mechanism similar to what Brendan suggested.
Re: Context Switch and Paging - Does IRET Care About Paging?
Any ideas why this is getting messed up? Some type of page fault is happening. Memory constants prefixed with SCM_ are globally mapped in the kernel's page directory and in every process's page directory.
All emulators crash and Bochs gives me this:
Code:
00049502853i[CPU0 ] | EAX=83e58955 EBX=00004004 ECX=00000000 EDX=00000000
00049502853i[CPU0 ] | ESP=ffc01f88 EBP=00d92000 ESI=00000000 EDI=00000000
00049502853i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df if tf sf ZF af PF cf
00049502853i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
00049502853i[CPU0 ] | CS:0008( 0001| 0| 0) 00000000 ffffffff 1 1
00049502853i[CPU0 ] | DS:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00049502853i[CPU0 ] | SS:0010( 0002| 0| 0) 00000000 ffffffff 1 1
00049502853i[CPU0 ] | ES:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00049502853i[CPU0 ] | FS:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00049502853i[CPU0 ] | GS:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00049502853i[CPU0 ] | EIP=ffc04068 (ffc04068)
00049502853i[CPU0 ] | CR0=0xe0000031 CR2=0x00507848
00049502853i[CPU0 ] | CR3=0x00d86000 CR4=0x00000000
00049502853i[CPU0 ] 0x00000000ffc04068>> iret : CF
00049502853p[CPU0 ] >>PANIC<< exception(): 3rd (14) exception with no resolution
Code:
; This code is globally mapped to the address SCM_TASK_START
align 0x1000
.startNewUserTask:
mov ax, USER_DATASEG
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
; Copy stack to this task's kernel stack
push KPROCESS_KSTACK
push DWORD [ebx+KTask.owner]
call DWORD [KernelGetInfoForProcess]
mov edi, eax
add edi, 0x1000 - StateInfo_size
mov eax, esp
push StateInfo_size
push eax
push edi
call DWORD [_MemCopy]
; Switch CR3 and switch stacks
push KPROCESS_CR3
push DWORD [ebx+KTask.owner]
call DWORD [KernelGetInfoForProcess]
mov edi, eax
mov cr3, eax
mov esp, SCM_KERNEL_STACK + 0x1000 - StateInfo_size
mov eax, esp
push USER_DATASEG
push DWORD [eax+StateInfo.esp]
push 0x200 ; No flags just interrupt enable
push USER_CODESEG
push DWORD [eax+StateInfo.eip]
mov ebp, DWORD [eax+StateInfo.ebp]
mov eax, DWORD [eax+StateInfo.eip]
mov eax, DWORD [eax] ; Just to see value references correct data in Bochs, which it does
xor ecx, ecx
xor edx, edx
xor esi, esi
xor edi, edi
iret
Re: Context Switch and Paging - Does IRET Care About Paging?
I rewrote the whole mapping system, but iret here now only seems to work when everything is mapped in.
Why is that???
I can't just have the EXE mapped in and a user stack and all the globally mapped things and a bunch of non-present spaces between pages, there has to be a 1:1 map between at least 0 - and a bit after the end of the EXE image or it doesn't work.
E.g. with the 0x6000 byte EXE loaded at 0xD7F000
Code:
mov edx, DWORD [esi+PEOptionalHeader.ImageBase]
add edx, DWORD [esi+PEOptionalHeader.SizeOfImage]
add edx, 0x16000 ; Must be this or higher, or won't work. Why?
push 1 ; User
push edx ; Size
push 0 ; Virt
push 0 ; Phys
push ebx ; Page Directory
call MapAddressSpace
Re: Context Switch and Paging - Does IRET Care About Paging?
rwosdev wrote: Why is that???
Are you setting the user stack pointer at the end of the stack block or the start... I see that you are setting the kernel stack pointer correctly, but you may have messed it up for the user one.
Re: Context Switch and Paging - Does IRET Care About Paging?
I didn't include it in that code above but even when I link in the user stack (from the bottom) it still crashes.
So I map kernel stack, user stack, interrupt handler stubs (which work), PE executable, anything I'm missing?
Re: Context Switch and Paging - Does IRET Care About Paging?
rwosdev wrote: I didn't include it in that code above but even when I link in the user stack (from the bottom) it still crashes.
So I map kernel stack, user stack, interrupt handler stubs (which work), PE executable, anything I'm missing?
Well, I suppose, you have mapped the kernel code as well. Aside from that, nothing that I can think of at the moment.
But note that since you get a triple fault, the problem is not only in your user mapping. If it were, your page fault handler would have been called instead. But you get a double and then a triple fault, which means that your kernel mapping is problematic. You may consider keeping your entire kernel address space mapped for the time being, until you resolve the user mode issue.
In that regard, you could try to use the Bochs debugger. Check out the wiki here and see if it helps. There is an option for debugging triple faults there.
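The escalation from page fault to double fault to triple fault can be modelled in a few lines. The sketch below is a toy model, not emulator or kernel code: each `*_reachable` flag stands in for "everything needed to deliver that exception (IDT, GDT, TSS, kernel stack, handler code) is mapped", and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum outcome {
    HANDLED_BY_PF_HANDLER,  /* normal case: the #PF handler runs        */
    HANDLED_BY_DF_HANDLER,  /* delivering #PF faulted, #DF handler runs */
    TRIPLE_FAULT            /* delivering #DF faulted too: CPU shutdown */
};

struct address_space {
    bool pf_handler_reachable;  /* can the CPU deliver vector 14 (#PF)? */
    bool df_handler_reachable;  /* can the CPU deliver vector 8 (#DF)?  */
};

enum outcome deliver_page_fault(const struct address_space *as)
{
    assert(as != NULL);
    if (as->pf_handler_reachable)
        return HANDLED_BY_PF_HANDLER;  /* only the faulting access was bad */
    /* A page fault while delivering a page fault becomes a double fault. */
    if (as->df_handler_reachable)
        return HANDLED_BY_DF_HANDLER;
    /* A fault while delivering the double fault is the third exception. */
    return TRIPLE_FAULT;
}
```

In this model a bad user mapping alone always ends in `HANDLED_BY_PF_HANDLER`; reaching `TRIPLE_FAULT` requires that the kernel-side structures are unreachable as well.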
Re: Context Switch and Paging - Does IRET Care About Paging?
I checked my mapping code. Doing this (see the code below) should have the same effect as mapping 0x0-0xFFFFFFFF directly, yet it causes the user program to crash on launch, so as you suggested, there is at least a problem there. I'm going to write the mapping code in C instead of assembly and see if I can spot the actual issue. Usually the longer I spend on a problem, the more pathetic it turns out to be.
Code:
push 1
push 0x1000
push 0
push 0
push eax
call MapAddressSpace
push 1
push 0xFFFFFFFF - 0x1000
push 0x1000
push 0x1000
push eax
call MapAddressSpace
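Whether two such calls cover the same pages as one full-range call depends on how the mapper turns a byte size into a page count. The sketch below only does that arithmetic. `pages_covered()`, `end_page()` and the rounding flag are hypothetical; the thread does not show how the real `MapAddressSpace` rounds, and its argument order is only inferred from the comments in the earlier listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Number of 4 KiB pages a mapper would touch for `size` bytes starting at a
 * page-aligned address, depending on whether it rounds the size up or down.
 * 64-bit math avoids overflow near the 4 GiB boundary. */
uint32_t pages_covered(uint32_t size, bool round_up)
{
    uint64_t bytes = size;
    if (round_up)
        bytes += PAGE_SIZE - 1;
    return (uint32_t)(bytes / PAGE_SIZE);
}

/* Index of the first page NOT covered by mapping `size` bytes at `virt`. */
uint32_t end_page(uint32_t virt, uint32_t size, bool round_up)
{
    assert(virt % PAGE_SIZE == 0);
    return virt / PAGE_SIZE + pages_covered(size, round_up);
}
```

With these definitions, `end_page(0x1000, 0xFFFFFFFF - 0x1000, false)` is 0xFFFFF, so a mapper that rounds down leaves the last page (0xFFFFF000) unmapped, while rounding up gives 0x100000 and covers everything. A single call with size 0xFFFFFFFF behaves the same way under either rounding, so under this model the split itself should not change which pages are mapped.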
Re: Context Switch and Paging - Does IRET Care About Paging?
So I wrote it all in C to see where I was going wrong, made some changes and put them back in assembly and the mapping seems to work better now.
The thread I'm trying to run is just jmp $, all the segments are now usermode, and it makes no kernel accesses, so why am I still getting a fault?
If I also identity map my GDT to every process, Bochs gives a different error:
Code:
00059503701i[CPU0 ] | EAX=fbfbfbfb EBX=00000000 ECX=00000000 EDX=00000000
00059503701i[CPU0 ] | ESP=00d9b000 EBP=00d9b000 ESI=00000000 EDI=00000000
00059503701i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df IF tf sf zf af pf cf
00059503701i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
00059503701i[CPU0 ] | CS:001b( 0003| 0| 3) 00000000 ffffffff 1 1
00059503701i[CPU0 ] | DS:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00059503701i[CPU0 ] | SS:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00059503701i[CPU0 ] | ES:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00059503701i[CPU0 ] | FS:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00059503701i[CPU0 ] | GS:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00059503701i[CPU0 ] | EIP=00d814b6 (00d814b6)
00059503701i[CPU0 ] | CR0=0xe0000031 CR2=0x00509040
00059503701i[CPU0 ] | CR3=0x00d8d000 CR4=0x00000000
00059503701i[CPU0 ] 0x0000000000d814b6>> jmp .-2 (0x00d814b6) : EBFE
00059503701p[CPU0 ] >>PANIC<< exception(): 3rd (14) exception with no resolution
Re: Context Switch and Paging - Does IRET Care About Paging?
rwosdev wrote: If I also identity map my GDT to every process, Bochs gives a different error
Right. The GDT, IDT, and LDT must be mapped. The TSS must also be mapped.
Re: Context Switch and Paging - Does IRET Care About Paging?
Okay, I've now mapped the IDT and TSS (don't have an LDT). Still crashes but CR2 is different on fault and the address is much closer to the executable.
Anything else I'm missing? So far I've mapped interrupt stubs, user task start & resume, GDT, IDT, TSS, process's kernel stack and user stack
Code:
00059813603i[CPU0 ] | EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
00059813603i[CPU0 ] | ESP=00000000 EBP=00000000 ESI=00000000 EDI=00000000
00059813603i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df IF tf sf zf af pf cf
00059813603i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
00059813603i[CPU0 ] | CS:001b( 0003| 0| 3) 00000000 ffffffff 1 1
00059813603i[CPU0 ] | DS:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00059813603i[CPU0 ] | SS:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00059813603i[CPU0 ] | ES:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00059813603i[CPU0 ] | FS:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00059813603i[CPU0 ] | GS:0023( 0004| 0| 3) 00000000 ffffffff 1 1
00059813603i[CPU0 ] | EIP=00000000 (00000000)
00059813603i[CPU0 ] | CR0=0xe0000031 CR2=0x00d80008
00059813603i[CPU0 ] | CR3=0x00d8f000 CR4=0x00000000
00059813603i[CPU0 ] 0x0000000000000000: (instruction unavailable) page not present
00059813603p[CPU0 ] >>PANIC<< exception(): 3rd (14) exception with no resolution
Re: Context Switch and Paging - Does IRET Care About Paging?
Hi,
rwosdev wrote: Okay, I've now mapped the IDT and TSS (don't have an LDT). Still crashes but CR2 is different on fault and the address is much closer to the executable.
Anything else I'm missing? So far I've mapped interrupt stubs, user task start & resume, GDT, IDT, TSS, process's kernel stack and user stack
Code:
CR0=0xe0000031 CR2=0x00d80008
00059867040i[CPU0 ] | CR3=0x00d8f000 CR4=0x00000000
00059867040i[CPU0 ] 0x0000000000d834b6>> jmp .-2 (0x00d834b6) : EBFE
If you're doing a "jmp $" in user-space, then something must be interrupting it (an IRQ), and the page faults must be triggered by the CPU trying to access the things it needs to start the interrupt handler. Those things are the TSS, the IDT, the GDT, and then the kernel stack (pointed to by the "SS0:ESP0" fields of the TSS).
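The list of things the CPU needs (TSS, IDT, GDT, kernel stack) can be turned into a simple checklist. The sketch below is a user-space model, not kernel code: `is_mapped()` stands in for a real page-table walk, the `cpu_tables` structure is hypothetical, and any addresses passed in are placeholders supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A mapped virtual range [start, start + len) in one address space. */
struct range { uint32_t start; uint32_t len; };

/* Stand-in for a page-table walk: is `addr` inside any mapped range? */
bool is_mapped(const struct range *map, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr - map[i].start < map[i].len)  /* unsigned wrap rejects addr < start */
            return true;
    return false;
}

/* Virtual addresses of what the CPU touches to enter a ring 0 handler
 * from ring 3, in the order given in the post. */
struct cpu_tables { uint32_t tss, idt, gdt, kstack_top; };

enum missing { ALL_MAPPED, MISSING_TSS, MISSING_IDT, MISSING_GDT, MISSING_KSTACK };

/* Reports the first structure that would page fault, if any. */
enum missing first_unmapped(const struct range *map, size_t n,
                            const struct cpu_tables *t)
{
    assert(map != NULL && t != NULL);
    if (!is_mapped(map, n, t->tss)) return MISSING_TSS;
    if (!is_mapped(map, n, t->idt)) return MISSING_IDT;
    if (!is_mapped(map, n, t->gdt)) return MISSING_GDT;
    /* The CPU pushes below ESP0, so check the byte just under the top. */
    if (!is_mapped(map, n, t->kstack_top - 1)) return MISSING_KSTACK;
    return ALL_MAPPED;
}
```

A real check would walk the page directory loaded in CR3 and test every page each structure spans, not just one address per structure.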
It shouldn't be too hard to put a breakpoint (e.g. the "xchg ebx,ebx" magic breakpoint) just before the "jmp $" and use the debugger to inspect the TSS, IDT, GDT (e.g. make sure they're mapped into the virtual address space properly, etc); and should be easy for you to figure out what the problem is.
EDIT: I think you changed your post while I was typing..
From the latest Bochs log; you should be able to put a breakpoint just before the return to user-space and single step (while inspecting the stack) to determine why it ended up with "EIP=0x00000000".
Cheers,
Brendan
Re: Context Switch and Paging - Does IRET Care About Paging?
Just got it! Issue was I wasn't mapping in the TSS per process, only once on kernel initialization.
Many thanks simeonz and Brendan. Though I have a feeling something similar is going to come up again, hopefully not too badly
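One way to avoid this class of bug (a structure mapped once at kernel initialization but not into each process) is to keep a single list of "global" ranges and apply it every time a page directory is created. The sketch below models that in plain C. The `global_ranges` registry and both functions are hypothetical; this is not rwosdev's code, and a real version would write page-table entries instead of copying range records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GLOBALS 16

struct range { uint32_t start; uint32_t len; };

/* Registry filled once at kernel initialization (GDT, IDT, TSS, stubs...). */
struct range global_ranges[MAX_GLOBALS];
size_t global_count;

/* Toy address space: just remembers which ranges were mapped into it. */
struct address_space {
    struct range mapped[MAX_GLOBALS];
    size_t count;
};

/* Record a range that must exist in every address space. */
bool register_global(uint32_t start, uint32_t len)
{
    if (global_count == MAX_GLOBALS)
        return false;
    global_ranges[global_count++] = (struct range){ start, len };
    return true;
}

/* Called for every new process, so nothing registered can be forgotten. */
void init_address_space(struct address_space *as)
{
    assert(as != NULL);
    as->count = 0;
    for (size_t i = 0; i < global_count; i++)
        as->mapped[as->count++] = global_ranges[i];
}
```

The design choice is simply that process creation never names the TSS or GDT directly; it only replays the registry.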
Re: Context Switch and Paging - Does IRET Care About Paging?
I rewrote all this work in a cleaner fashion and improved interrupt stubs. I've got user mode programs loaded in a totally virtual address space now, and can relocate them anywhere. Tested rigorously, seems solid with no signs of the problem re-occurring.
I've dabbled with OS development and thinking about concepts on and off for about 10 years but never made a serious effort until the past year. This to me is a real milestone, thanks so much guys, esp. simeonz and Brendan for your help!