[SOLVED] Interrupt Occurring During User-Mode Causing Crash

Posted: Mon May 28, 2018 2:37 pm
by rwosdev
Spent a few hours on this; I've just started on usermode tasks in my OS (32-bit).

I'm fairly sure my structures are OK, as I've been doing ring 0 task switching using the PIT for about 2 months. The TSS structure matches the one on the Wiki.

I can jump into user-mode just fine; it's only when an interrupt occurs during user-mode (PIT, exception, etc.) that the OS crashes.

Bochs reports the following error:

Code: Select all

00072527377e[CPU0  ] interrupt(): SS selector null
00072527377e[CPU0  ] interrupt(): SS selector null
00072527377i[CPU0  ] CPU is in protected mode (active)
00072527377i[CPU0  ] CS.mode = 32 bit
00072527377i[CPU0  ] SS.mode = 32 bit
00072527377i[CPU0  ] EFER   = 0x00000000
00072527377i[CPU0  ] | EAX=0062bf80  EBX=00004108  ECX=00015000  EDX=00000000
00072527377i[CPU0  ] | ESP=00004a10  EBP=00004a10  ESI=00e2f000  EDI=00e318ae
00072527377i[CPU0  ] | IOPL=0 id vip vif ac vm RF nt of df IF tf sf ZF af PF cf
00072527377i[CPU0  ] | SEG sltr(index|ti|rpl)     base    limit G D
00072527377i[CPU0  ] |  CS:0023( 0004| 0|  3) 00000000 ffffffff 1 1
00072527377i[CPU0  ] |  DS:002b( 0005| 0|  3) 00000000 ffffffff 1 1
00072527377i[CPU0  ] |  SS:002b( 0005| 0|  3) 00000000 ffffffff 1 1
00072527377i[CPU0  ] |  ES:002b( 0005| 0|  3) 00000000 ffffffff 1 1
00072527377i[CPU0  ] |  FS:002b( 0005| 0|  3) 00000000 ffffffff 1 1
00072527377i[CPU0  ] |  GS:002b( 0005| 0|  3) 00000000 ffffffff 1 1
00072527377i[CPU0  ] | EIP=0005b196 (0005b196)
00072527377i[CPU0  ] | CR0=0xe0000031 CR2=0x00000000
00072527377i[CPU0  ] | CR3=0x0062d000 CR4=0x00000000
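
For reading the segment lines in a dump like this one, here is a minimal sketch in C of how a selector splits into the index, TI and RPL fields that Bochs prints in parentheses. The names decode_selector and selector_is_null are made up for illustration; they are not from the poster's OS or from Bochs.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 segment selector. */
typedef struct {
    uint16_t index; /* descriptor index within the GDT or LDT */
    uint8_t  ti;    /* table indicator: 0 = GDT, 1 = LDT */
    uint8_t  rpl;   /* requested privilege level, 0..3 */
} selector_fields;

/* Split a 16-bit selector into index, TI and RPL (hypothetical helper). */
static selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.ti    = (uint8_t)((sel >> 2) & 1u);
    f.rpl   = (uint8_t)(sel & 3u);
    return f;
}

/* A selector is null when it refers to GDT entry 0, whatever its RPL. */
static int selector_is_null(uint16_t sel)
{
    return (sel & 0xFFFCu) == 0;
}
```

With the values from the dump, 0x23 decodes to index 4, TI 0, RPL 3 and 0x2b to index 5, TI 0, RPL 3, which matches the CS and DS/SS lines above.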
Here's my code for jumping to user mode (which runs perfectly with interrupts disabled):

Code: Select all

.startNewUserTask:
	mov ax, USER_DATASEG ; USER_DATASEG = 0x28 | 11b
	mov ds, ax
	mov es, ax
	mov fs, ax
	mov gs, ax
		
	mov eax, esp
	push USER_DATASEG
	push DWORD [eax+StateInfo.esp] ; eax holds the original esp; esp itself moved with the push above
	pushf
	or DWORD [esp], 0x200 ; Enable interrupts
	push USER_CODESEG ; USER_CODESEG = 0x20 | 11b
	push DWORD [eax+StateInfo.eip]
    
	mov ebp, DWORD [eax+StateInfo.ebp]
	mov eax, 0
	
	iret
	
TSS code that is executed right after loading Kernel and drivers (only sticking to one core for now):

Code: Select all

.setupTSS:
	mov DWORD [TSS0+X86TSS.ebp], ebp
	mov DWORD [TSS0+X86TSS.esp0], esp
	mov eax, ss
	mov WORD [TSS0+X86TSS.ss0], DATASEG
	mov WORD [TSS0+X86TSS.ss], USER_DATASEG
	mov eax, cr3
	mov DWORD [TSS0+X86TSS.cr3], eax
	
	mov ax, GDT_TSS0_ID | 11b
	ltr ax

	; Done, go idle and wait for interrupts.
	sti
	
.idle:
	hlt
	jmp .idle
	
Relevant portions of GDT:

Code: Select all

GDT_TSS0_ID			EQU 0x18
gdt_tss0:
	.segLimitLow 	dw 0
	.baseAddrLow 	dw 0
	.baseAddrHiLow	db 0
	.atts			db 11101001b
	.segLimitAtts2	db 10010000b ; Must have 0s for OR in setup
	.baseAddrHiHi	db 0
	
USER_CODESEG		EQU 0x20 | 11b ; RPL of 3 for usermode and higher
gdt_userCodeSegment:
	.segLimitLow 	dw 0xFFFF
	.baseAddrLow 	dw 0x0000
	.baseAddrHiLow	db 0x00
	.atts			db 11111010b
	.segLimitAtts2	db 11011111b
	.baseAddrHiHi	db 0x00
	
USER_DATASEG		EQU 0x28 | 11b ; RPL of 3 for usermode and higher
gdt_userDataSegment:
	.segLimitLow 	dw 0xFFFF
	.baseAddrLow 	dw 0x0000
	.baseAddrHiLow	db 0x00
	.atts			db 11110010b
	.segLimitAtts2	db 11001111b
	.baseAddrHiHi	db 0x00

Re: Interrupt Occurring During User-Mode Causing Crash

Posted: Mon May 28, 2018 2:49 pm
by alexfru
I'd double check TSS and TR. You may have screwed up SS0:ESP0, which would explain your

Code: Select all

00072527377e[CPU0  ] interrupt(): SS selector null
Check the TSS size, member alignment and padding.

Re: Interrupt Occurring During User-Mode Causing Crash

Posted: Mon May 28, 2018 4:56 pm
by rwosdev
I've tried 2 different TSS structures which match the one on the wiki and the problem still occurs (see below).

GDT and TSS GDT entry fixup code, executed straight after booting the kernel (the limit is the size in bytes minus 1, correct?):

Code: Select all

.setupGDT:
	lgdt [gdt_info]
	mov eax, 0x10
	mov ds, eax
	mov es, eax
	mov fs, eax
	mov gs, eax
	mov ss, eax
	
.fixTSS0:
	mov eax, TSS0_INFO.base
	mov WORD [gdt_tss0.baseAddrLow], ax
	shr eax, 16
	mov BYTE [gdt_tss0.baseAddrHiLow], al
	mov BYTE [gdt_tss0.baseAddrHiHi], ah
	
	mov eax, TSS0_INFO.limit
	mov WORD [gdt_tss0.segLimitLow], ax
	shr eax, 16
	and al, 00001111b
	or BYTE [gdt_tss0.segLimitAtts2], al
	
	jmp 0x08:Setup

...
align 8
TSS0:
	times (X86TSS_size) db 0
	
TSS0_INFO:
	.base		EQU TSS0
	.limit		EQU TSS0_INFO - TSS0 - 1
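
As a cross-check for fixup code like .fixTSS0, here is a small C sketch that packs a base and limit into the same 8-byte layout as gdt_tss0 and reads them back. The function names and the example values are assumptions for illustration only. With the granularity bit clear, the limit is indeed the size in bytes minus 1; with it set, the limit is counted in 4 KiB units instead.

```c
#include <assert.h>
#include <stdint.h>

/* Pack base, limit, access byte and flags nibble into an 8-byte descriptor,
   using the same field order as the gdt_tss0 entry (hypothetical helper). */
static void pack_descriptor(uint8_t out[8], uint32_t base, uint32_t limit,
                            uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFFu); /* the limit field is only 20 bits wide */
    assert(flags <= 0xFu);     /* G, D/B, L, AVL */
    out[0] = (uint8_t)(limit & 0xFFu);                         /* segLimitLow   */
    out[1] = (uint8_t)((limit >> 8) & 0xFFu);
    out[2] = (uint8_t)(base & 0xFFu);                          /* baseAddrLow   */
    out[3] = (uint8_t)((base >> 8) & 0xFFu);
    out[4] = (uint8_t)((base >> 16) & 0xFFu);                  /* baseAddrHiLow */
    out[5] = access;                                           /* atts          */
    out[6] = (uint8_t)((flags << 4) | ((limit >> 16) & 0x0Fu)); /* segLimitAtts2 */
    out[7] = (uint8_t)((base >> 24) & 0xFFu);                  /* baseAddrHiHi  */
}

/* Reassemble the 32-bit base from its three pieces. */
static uint32_t descriptor_base(const uint8_t d[8])
{
    return (uint32_t)d[2] | ((uint32_t)d[3] << 8) |
           ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
}

/* Reassemble the 20-bit limit from its two pieces. */
static uint32_t descriptor_limit(const uint8_t d[8])
{
    return (uint32_t)d[0] | ((uint32_t)d[1] << 8) |
           ((uint32_t)(d[6] & 0x0Fu) << 16);
}
```

Packing a made-up base with limit 103 (a 104-byte TSS) and reading both back is a quick way to confirm that no piece of the base or limit lands in the wrong byte.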
	
	
TSS code straight after completing Kernel load and dynamically loading necessary drivers:

Code: Select all

.setupTSS:
	mov DWORD [TSS0+X86TSS.ebp], ebp
	mov DWORD [TSS0+X86TSS.esp0], esp
	mov WORD [TSS0+X86TSS.ss0], DATASEG
	mov WORD [TSS0+X86TSS.ss], USER_DATASEG
	mov eax, cr3
	mov DWORD [TSS0+X86TSS.cr3], eax
	
	mov ax, GDT_TSS0_ID | 11b
	ltr ax
My original TSS structure, which I think I wrote myself:

Code: Select all

struc X86TSS
	.prevTaskLink	resw 1
	.reserved1		resw 1
	
	.esp0			resd 1
	.ss0			resw 1
	.reserved2		resw 1
	
	.esp1			resd 1
	.ss1			resw 1
	.reserved3		resw 1
	
	.esp2			resd 1
	.ss2			resw 1
	.reserved4		resw 1
	
	.cr3			resd 1
	.eip			resd 1
	.efl			resd 1
	.eax			resd 1
	.ecx			resd 1
	.edx			resd 1
	.ebx			resd 1
	.esp			resd 1
	.ebp			resd 1
	.esi			resd 1
	.edi			resd 1
	
	.es				resw 1
		.reserved5				resw 1
	.cs				resw 1
		.reserved6				resw 1
	.ss				resw 1
		.reserved7				resw 1
	.ds				resw 1
		.reserved8				resw 1
	.fs				resw 1
		.reserved9				resw 1
	.gs				resw 1
		.reserved10				resw 1
	                     
	.ldtSegSelector	resw 1
	.reserved11		resw 1
	                     
	.debugTrapFlag	resw 1
	.ioMapBaseAddr	resw 1
endstruc
Alternative one I just copied from here: http://skelix.net/skelixos/tutorial06_en.html

Code: Select all

struc X86TSS
	.back_link resd 1

	.esp0 resd 1
	.ss0 resd 1
	.esp1 resd 1
	.ss1 resd 1
	.esp2 resd 1
	.ss2 resd 1
	.cr3 resd 1
	.eip resd 1
	.efl resd 1

	.eax resd 1
	.ecx resd 1
	.edx resd 1
	.ebx resd 1

	.esp resd 1
	.ebp resd 1

	.esi resd 1
	.edi resd 1

	.es resd 1
	.cs resd 1
	.ss resd 1
	.ds resd 1
	.fs resd 1
	.gs resd 1

	.ldt resd 1
	.trace_bitmap resd 1
endstruc
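
Both layouts above come out at 104 bytes with esp0 at offset 4 and ss0 at offset 8. A minimal sketch in C of the same 32-bit TSS layout, with compile-time checks on size and offsets, is shown below; the type name tss32 and the helper tss_fields_are_distinct are invented for illustration and mirror the marker test used later in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit TSS, field for field like the first X86TSS struc above. */
typedef struct {
    uint16_t prev_task_link, reserved0;
    uint32_t esp0;
    uint16_t ss0, reserved1;
    uint32_t esp1;
    uint16_t ss1, reserved2;
    uint32_t esp2;
    uint16_t ss2, reserved3;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, reserved4;
    uint16_t cs, reserved5;
    uint16_t ss, reserved6;
    uint16_t ds, reserved7;
    uint16_t fs, reserved8;
    uint16_t gs, reserved9;
    uint16_t ldt_selector, reserved10;
    uint16_t debug_trap;
    uint16_t io_map_base;
} tss32;

/* Fail the build if size, alignment or padding drifts. */
_Static_assert(sizeof(tss32) == 104, "32-bit TSS must be 104 bytes");
_Static_assert(offsetof(tss32, esp0) == 4, "esp0 must be at offset 4");
_Static_assert(offsetof(tss32, ss0) == 8, "ss0 must be at offset 8");
_Static_assert(offsetof(tss32, io_map_base) == 102, "I/O map base at 102");

/* Write distinct markers and confirm that no field overwrote another. */
static int tss_fields_are_distinct(tss32 *t)
{
    t->esp0 = 0x1234ABCDu;
    t->ss0  = 0x10;
    t->ss   = 0x2B;
    return t->esp0 == 0x1234ABCDu && t->ss0 == 0x10 && t->ss == 0x2B;
}
```

If the structure itself passes checks like these and the fields still alias at run time, the layout is probably not the culprit and the addresses being generated are worth a look.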

Re: Interrupt Occurring During User-Mode Causing Crash

Posted: Mon May 28, 2018 6:55 pm
by alexfru
Can you disable ring3 code and print out the TSS after a few seconds of execution? I wonder if you corrupt it somehow.

Re: Interrupt Occurring During User-Mode Causing Crash

Posted: Mon May 28, 2018 8:02 pm
by rwosdev
Yeah, there does appear to be some corruption happening. I disabled the usermode task so that usermode-related code is never branched to, and discovered that this modified (incomplete) TSS code, without even an LTR instruction, is enough to show the corruption:

Code: Select all

	mov DWORD [TSS0+X86TSS.esp0], 0x1234ABCD ; marker value, matches the expected output below

	mov WORD [TSS0+X86TSS.ss0], DATASEG
	mov WORD [TSS0+X86TSS.ss], USER_DATASEG

	mov eax, 0
	mov ax, WORD [TSS0+X86TSS.ss0]
	PRINT_HEX eax, 0x07
	PRINT_CHAR ' '
	mov eax, DWORD [TSS0+X86TSS.esp0]
	PRINT_HEX eax, 0x07
	call PrintNewLine
	
So I print ss0 and esp0 with a space. It should show:

Code: Select all

00000010 1234ABCD
But actually shows:

Code: Select all

0000002B 0000002B
Which of course is USER_DATASEG printed twice. The only thing running once interrupts are enabled is an idle kernel thread in ring 0.

Makes no sense, this has got to be some stupid oversight lol, the ones I spend all day on usually are :roll:

I'll look into it further

Edit: I printed the addresses of these two separate pieces of data

Code: Select all

	mov eax, 0
	mov eax, TSS0+X86TSS.ss0
	PRINT_UINT_HEX eax, 0x07
	PRINT_CHAR ' '
	mov eax, TSS0+X86TSS.esp0
	PRINT_UINT_HEX eax, 0x07
	call PrintNewLine
	
They both come up the same. This is very odd, as I rely on similar code for other functions in my OS, which have worked fine for months. The problem might have something to do with my linker and/or loader, both of which I wrote.

Re: Interrupt Occurring During User-Mode Causing Crash

Posted: Tue May 29, 2018 2:02 am
by nullplan
Hi,

Two things: Unless you are using hardware task switching, nothing about your TSS matters except SS0 and ESP0 (and the I/O port permission bitmap, if you are using it). And are you sure your TSS segment descriptor in the GDT is set up correctly? Because you show it to be set to base 0, length 0, and that's too short for a TSS, and probably not the right address for it either. In fact, if you value your sanity, you will keep address 0 unmapped for every process ever. Else null-pointer bugs will eat your soul.
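
To make the role of SS0 and ESP0 concrete, here is a rough C model of what happens on an interrupt from ring 3 to ring 0 when no error code is pushed: the CPU fetches SS0:ESP0 from the TSS, rejects a null SS0, and pushes the interrupted SS, ESP, EFLAGS, CS and EIP onto the new stack. This is a simplified sketch, not an emulator; all type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The only two TSS fields used on a ring 3 -> ring 0 interrupt. */
typedef struct {
    uint32_t esp0;
    uint16_t ss0;
} tss_ring0;

/* The five dwords on the kernel stack after entry, lowest address first. */
typedef struct {
    uint32_t eip, cs, eflags, esp, ss;
} interrupt_frame;

enum { ENTRY_OK = 0, ENTRY_NULL_SS = -1 };

/* Model the stack switch: take the ring 0 stack from the TSS, refuse a
   null SS0, then store the interrupted context just below ESP0. */
static int simulate_interrupt_entry(const tss_ring0 *tss,
                                    const interrupt_frame *interrupted,
                                    interrupt_frame *pushed,
                                    uint32_t *new_esp)
{
    if ((tss->ss0 & 0xFFFCu) == 0)
        return ENTRY_NULL_SS;       /* the "SS selector null" case */
    *pushed  = *interrupted;        /* SS, ESP, EFLAGS, CS, EIP are saved */
    *new_esp = tss->esp0 - (uint32_t)sizeof *pushed;
    return ENTRY_OK;
}
```

In this model a TSS whose ss0 reads back as 0 fails immediately, which is the situation the Bochs message describes, whether the field was never written or the CPU is looking at the wrong memory.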

Ciao,
Markus

Re: Interrupt Occurring During User-Mode Causing Crash

Posted: Tue May 29, 2018 9:39 am
by rwosdev
Hi Markus, I didn't include the TSS fixup code in my first post, but I did put it in the second post.

To me, this definitely seems (at least partly) like an issue with my PE linker. I'll post back when I've fixed it.

Cheers

Re: Interrupt Occurring During User-Mode Causing Crash

Posted: Tue May 29, 2018 10:43 am
by rwosdev
The problem seems to be fixed. It was a problem with my linker; nothing was wrong with the original TSS.

Btw, another issue here, this one causing a race condition, was that I was using a Task Gate for my PIT/RTC clock etc. interrupt handlers. Use an Interrupt Gate instead, so the processor disables interrupts immediately before entering your handler; they then get re-enabled on the iret.
See the answer here for more info: https://stackoverflow.com/questions/255 ... 8#25543528
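
For reference, the gate kind is selected by the low four bits of the type/attribute byte in the IDT entry. Below is a minimal C sketch of building a 32-bit gate descriptor; make_gate and gate_is_interrupt32 are hypothetical helpers, and the handler address in the usage note is made up. For a task gate the offset fields are not used by the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Gate types in the low nibble of the type/attribute byte. */
enum gate_type {
    GATE_TASK        = 0x5, /* hardware task switch through a TSS */
    GATE_INTERRUPT32 = 0xE, /* CPU clears IF on entry */
    GATE_TRAP32      = 0xF  /* CPU leaves IF unchanged on entry */
};

/* Build an 8-byte IDT entry marked present (hypothetical helper). */
static void make_gate(uint8_t out[8], uint32_t handler, uint16_t selector,
                      enum gate_type type, uint8_t dpl)
{
    assert(dpl <= 3);
    out[0] = (uint8_t)(handler & 0xFFu);         /* offset bits 0..7   */
    out[1] = (uint8_t)((handler >> 8) & 0xFFu);  /* offset bits 8..15  */
    out[2] = (uint8_t)(selector & 0xFFu);        /* code segment selector */
    out[3] = (uint8_t)(selector >> 8);
    out[4] = 0;                                  /* reserved */
    out[5] = (uint8_t)(0x80u | (dpl << 5) | (uint8_t)type); /* P, DPL, type */
    out[6] = (uint8_t)((handler >> 16) & 0xFFu); /* offset bits 16..23 */
    out[7] = (uint8_t)((handler >> 24) & 0xFFu); /* offset bits 24..31 */
}

/* True only for a 32-bit interrupt gate, the kind that makes the CPU
   clear IF before the handler's first instruction runs. */
static int gate_is_interrupt32(const uint8_t gate[8])
{
    return (gate[5] & 0x0Fu) == GATE_INTERRUPT32;
}
```

A ring 0 interrupt gate built this way has the attribute byte 0x8E, while the equivalent trap gate has 0x8F.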

Cheers guys