Support SSE extension in my OS
Hi.
I have a minimal 32-bit OS that runs completely in kernel mode (no user mode implemented).
I recently started looking into using SSE in assembly for my OS. I have used SIMD operations a lot on the same physical machine
under Windows and Linux, so I know that my CPU supports the SSE extension (but not SSE2, SSE3, SSE4, AVX, etc.).
I have a piece of NASM code that generates a "General Fault" as soon as it reaches the first movaps xmm0, [SOME_16_BYTE_ALIGNED_MEMORY].
Apparently, as expected, my OS does not support SSE by default (is this right?) and it has to be enabled. The OSDev wiki (https://wiki.osdev.org/SSE) has a whole explanation of it, but since I am new to this, I would like to ask some questions.
- My OS does not support user mode, as I said, so there is no task switching. Then what is the deal with FXSAVE and FXRSTOR?
- Could someone explain (or possibly list) the bare minimum needed to add SSE support when the OS runs entirely in kernel mode?
I really appreciate any explanations.
Best regards.
Iman.
Re: Support SSE extension in my OS
If you have a single-threaded OS, you don't need to bother with FXSAVE and FXRSTOR.
If you have threads (it doesn't matter whether they are user-mode threads or not), you have to save and restore the FPU/SSE state on every thread switch, just like any other registers. That is where the FXSAVE/FXRSTOR pair is used. If you don't care much about performance, save and restore the FPU/SSE state on every switch. Done.
If you want something more intelligent, you need working interrupt/exception handling in place. With that you can simply set the CR0.TS bit on every task switch and keep track of the last thread that used the FPU/SSE unit. When a thread then executes an FPU/SSE instruction, the CPU raises a #NM exception. In the handler you clear the CR0.TS bit, save the FPU/SSE state (using FXSAVE) of the last thread that used FPU/SSE (if any), restore the FPU/SSE state of the current thread (using FXRSTOR), and mark the current thread as the last one that used FPU/SSE. This way you only save and restore the FPU/SSE state when absolutely needed. And it makes a difference, since the FXSAVE/FXRSTOR buffer is 512 bytes in size.
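The lazy scheme described above can be sketched as a small user-space model in C. This is only an illustration under assumptions: `struct thread`, `task_switch`, `nm_handler`, `use_fpu` and `read_fpu` are hypothetical names, CR0.TS and the register file are simulated with ordinary variables, and the "same owner" shortcut is an addition not spelled out in the post. In a real kernel the two `sim_` helpers would be the FXSAVE and FXRSTOR instructions on 16-byte-aligned buffers, and a fresh thread would need a valid initial state image rather than zeros.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FXSAVE_SIZE 512 /* size of the FXSAVE/FXRSTOR area */

/* Hypothetical thread record; name and layout are illustrative only. */
struct thread {
    unsigned char fpu_state[FXSAVE_SIZE];
};

/* Simulated hardware state so the logic can run outside a kernel. */
static bool cr0_ts;                         /* stands in for CR0.TS */
static unsigned char fpu_regs[FXSAVE_SIZE]; /* stands in for the live FPU/SSE registers */

static struct thread *current;   /* thread that is running now */
static struct thread *fpu_owner; /* thread whose state is live in the registers */
static int save_count;           /* how many state saves were actually performed */

static void sim_fxsave(unsigned char *buf)
{
    memcpy(buf, fpu_regs, FXSAVE_SIZE);
    save_count++;
}

static void sim_fxrstor(const unsigned char *buf)
{
    memcpy(fpu_regs, buf, FXSAVE_SIZE);
}

/* On every task switch: only set TS, do not touch the FPU/SSE state. */
void task_switch(struct thread *next)
{
    current = next;
    cr0_ts = true;
}

/* Model of the #NM handler. */
void nm_handler(void)
{
    cr0_ts = false;                 /* what CLTS does on real hardware */
    if (fpu_owner == current)
        return;                     /* assumption: state is already live, nothing to swap */
    if (fpu_owner != NULL)
        sim_fxsave(fpu_owner->fpu_state);
    sim_fxrstor(current->fpu_state);
    fpu_owner = current;
}

/* Model of the current thread executing FPU/SSE instructions. */
void use_fpu(unsigned char value)
{
    if (cr0_ts)
        nm_handler();               /* the CPU would raise #NM here */
    fpu_regs[0] = value;
}

unsigned char read_fpu(void)
{
    if (cr0_ts)
        nm_handler();
    return fpu_regs[0];
}
```

In this model a thread that never touches the FPU/SSE unit never causes a save or restore, which is the whole point of the scheme.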
Here is how I initialize the FPU and enable SSE (the fcw word goes somewhere in the data segment):
Code: Select all
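; init FPU
fninit              ; reset the x87 FPU to its default state
fldcw [fcw]         ; load the FPU control word
; enable SSE
mov eax, cr0
and al, ~0x04       ; clear CR0.EM (bit 2): no x87 emulation
or al, 0x22         ; set CR0.MP (bit 1) and CR0.NE (bit 5)
mov cr0, eax
mov eax, cr4
or ax, 0x600        ; set CR4.OSFXSR (bit 9) and CR4.OSXMMEXCPT (bit 10)
mov cr4, eax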
Code: Select all
fcw: dw 0x037F
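For readers wondering what the magic numbers 0x04, 0x22 and 0x600 stand for, the same bit arithmetic can be written out in C with named constants. This is just a sketch of the computation: the macro and function names are made up for illustration, and the functions only calculate the new register values; actually reading and writing CR0/CR4 still has to be done with privileged `mov` instructions as in the assembly above.

```c
#include <stdint.h>

/* Architectural bit positions in CR0 and CR4. */
#define CR0_MP          (1u << 1)   /* monitor coprocessor */
#define CR0_EM          (1u << 2)   /* x87 emulation; must be 0 for SSE */
#define CR0_NE          (1u << 5)   /* native x87 error reporting */
#define CR4_OSFXSR      (1u << 9)   /* OS supports FXSAVE/FXRSTOR, enables SSE */
#define CR4_OSXMMEXCPT  (1u << 10)  /* OS handles SIMD floating-point exceptions */

/* Hypothetical helper: value to write back to CR0. */
uint32_t cr0_for_sse(uint32_t cr0)
{
    return (cr0 & ~CR0_EM) | CR0_MP | CR0_NE;
}

/* Hypothetical helper: value to write back to CR4. */
uint32_t cr4_for_sse(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

If CR0.EM is left set or CR4.OSFXSR is left clear, SSE instructions raise an invalid-opcode exception (#UD), so dumping both registers after the setup code is a quick sanity check.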
Re: Support SSE extension in my OS
pvc wrote: If you have a single-threaded OS, you don't need to bother with FXSAVE and FXRSTOR.
Dear pvc.
Thanks a lot for your answer.
It is now getting clearer how to proceed.
Since I run my OS entirely single-threaded, all in kernel mode, I won't bother with FXSAVE and FXRSTOR.
I have a question about your code.
What if I only write this:
Code: Select all
mov eax, cr0
and al, ~0x04
or al, 0x22
mov cr0, eax
mov eax, cr4
or ax, 0x600
mov cr4, eax
and leave out this part?
Code: Select all
fninit
fldcw [fcw]
Thanks.
Iman.
Re: Support SSE extension in my OS
Some BIOSes/loaders do not always initialize the FPU properly. The FPU and SSE units share some of the hardware inside the CPU, so it won't hurt to initialize both at the same time (just to be safe).
Re: Support SSE extension in my OS
pvc wrote: Some BIOSes/loaders do not always initialize FPU properly. FPU and SSE units share some of the hardware inside a CPU, so it won't hurt to initialize both at the same time (just to be safe).
Dear pvc.
I tried to enable SSE by setting the bits in control registers CR0 and CR4.
But now, as soon as execution reaches an SSE instruction, the whole OS restarts (triple fault?).
What might be the problem?
Re: Support SSE extension in my OS
pvc wrote:
Code: Select all
fldcw [fcw]
fcw: dw 0x037F
There's no need for this, FNINIT already sets FCW to 0x037F. Also, it looks like you're not initializing MXCSR.
iman wrote: Since I run my OS entirely single-thread, all in kernel mode, I won't bother with FXSAVE and FXRSTOR.
According to Intel and AMD, you still need FXSAVE (or XSAVE/XSAVEOPT/XSAVEC/XSAVES) to read MXCSR_MASK so you can write MXCSR. I suspect it's safe to just write 0x1F80 without checking MXCSR_MASK, but there's no guarantee.
iman wrote: But now as soon as it reaches a sse instruction, the whole OS restarts (triple fault?)
Write some exception handlers for your OS. When an unexpected exception occurs, your handler can display the CPU registers on the screen (or output them to a serial port, or however you want to debug things). You can use that information to see if you've written the correct values to CR0 and CR4, and see which instruction caused the exception.
Re: Support SSE extension in my OS
I thought fninit initializes MXCSR too. Well, I was wrong.
Re: Support SSE extension in my OS
Octocontrabass wrote: According to Intel and AMD, you still need FXSAVE (or XSAVE/XSAVEOPT/XSAVEC/XSAVES) to read MXCSR_MASK so you can write MXCSR. I suspect it's safe to just write 0x1F80 without checking MXCSR_MASK, but there's no guarantee.
Is this the right way to initialize MXCSR?
Code: Select all
segment .code
fxsave [fxsave_region]
STMXCSR [mem]
segment .data
mem: dd 0x1F80
align 16
fxsave_region: TIMES 512 db 0
Re: Support SSE extension in my OS
Octocontrabass wrote: Write some exception handlers for your OS. When an unexpected exception occurs, your handler can display the CPU registers on the screen (or output them to a serial port, or however you want to debug things). You can use that information to see if you've written the correct values to CR0 and CR4, and see which instruction caused the exception.
I have handlers for exceptions such as Divide-by-zero Error, Debug, Non-maskable Interrupt, Invalid Opcode, Double Fault, Segment Not Present, Stack-Segment Fault, General Protection Fault, SIMD Floating-Point Exception, etc., but none of them fired and, as I said, the CPU restarts.
Re: Support SSE extension in my OS
iman wrote: Is this the right way to initialize MXCSR?
No. You need to mask your desired value of MXCSR with MXCSR_MASK, then load it with LDMXCSR.
Something like this:
Code: Select all
segment .code
fxsave [fxsave_region]
mov eax, [fxsave_region + 28]
or eax, eax
jnz .next
mov eax, 0xffbf ; the default MXCSR_MASK for CPUs that don't support MXCSR_MASK
.next:
mov [mxcsr_mask], eax ; optional: save MXCSR_MASK for later changes to MXCSR
and eax, 0x1f80 ; your desired value of MXCSR goes here
mov [mem], eax
ldmxcsr [mem]
segment .data
mem: dd 0
mxcsr_mask: dd 0
align 16
fxsave_region: times 512 db 0
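The masking step in the listing above can also be expressed as two small C functions, which may make the byte offsets easier to follow. This is a sketch under assumptions: the function names are invented for illustration, and the 512-byte buffer is assumed to have been filled by FXSAVE beforehand (that part cannot be done from plain C). MXCSR_MASK sits at byte offset 28 of the FXSAVE area, and a stored value of zero means the default mask 0xFFBF applies.

```c
#include <stdint.h>

#define MXCSR_MASK_OFFSET   28u      /* byte offset of MXCSR_MASK in the FXSAVE area */
#define MXCSR_MASK_DEFAULT  0xFFBFu  /* used when the CPU reports a mask of 0 */

/* Hypothetical helper: read MXCSR_MASK out of a buffer filled by FXSAVE. */
uint32_t mxcsr_mask_from_fxsave(const uint8_t *area)
{
    /* Assemble the little-endian 32-bit field byte by byte. */
    uint32_t mask = (uint32_t)area[MXCSR_MASK_OFFSET]
                  | ((uint32_t)area[MXCSR_MASK_OFFSET + 1] << 8)
                  | ((uint32_t)area[MXCSR_MASK_OFFSET + 2] << 16)
                  | ((uint32_t)area[MXCSR_MASK_OFFSET + 3] << 24);
    return mask != 0 ? mask : MXCSR_MASK_DEFAULT;
}

/* Hypothetical helper: drop any bits the CPU does not allow in MXCSR. */
uint32_t safe_mxcsr(uint32_t desired, uint32_t mask)
{
    return desired & mask;
}
```

The reason for the masking is that loading MXCSR with a reserved bit set raises #GP, so a value such as 0x1FC0 (which sets bit 6) has to be reduced to 0x1F80 on a CPU whose mask is 0xFFBF.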
iman wrote: I have handlers for exceptions such as Divide-by-zero Error, Debug, Non-maskable Interrupt, Invalid Opcode, Double Fault, Segment Not Present, Stack-Segment Fault, General Protection Fault, SIMD Floating-Point Exception, etc., but none of them happened and, as I said, the CPU restarts.
It sounds like one of them happened, but your exception handler didn't work.
Re: Support SSE extension in my OS
Octocontrabass wrote: It sounds like one of them happened, but your exception handler didn't work.
I found the source of the problem, but I have no clue why it happens.
After setting up FXSAVE and MXCSR as you suggested, I tested my OS with a small test subroutine.
I got a #GP error.
Analyzing further, I used only 128-bit registers instead of reading from memory into xmm0, and it worked perfectly. In the end I found out that in my NASM code, in the data segment, EVEN IF there was an "align 16" directive at the top of the whole data area, only the first 16 bytes were aligned, and I have to repeat the "align 16" directive as in the 2nd listing below to get it to work. To me it looks ugly, and I have no clue why this is needed. Perhaps you can help me figure this out.
listing_1:
Code: Select all
section .data
align 16
_0 : dd 0.0, 0.0, 0.0, 0.0
_1 : dd 1.0, 1.0, 1.0, 1.0
_2 : dd 2.0, 2.0, 2.0, 2.0
section .text
test_sse:
movaps xmm0, [_0] ; it goes well
movaps xmm0, [_1] ; gives #GP
ret
listing_2:
Code: Select all
section .data
align 16
_0 : dd 0.0, 0.0, 0.0, 0.0
align 16
_1 : dd 1.0, 1.0, 1.0, 1.0
align 16
_2 : dd 2.0, 2.0, 2.0, 2.0
section .text
test_sse:
movaps xmm0, [_0] ; it goes well
movaps xmm0, [_1] ; it goes well
ret
Iman.
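MOVAPS raises #GP whenever its memory operand is not on a 16-byte boundary, so one way to narrow this down is to look at the actual addresses the labels end up at. The check itself is a single mask operation; below is a small C sketch of it, with hypothetical function names. It does not explain why listing_1 misbehaves (each entry there is exactly 16 bytes, so `_1` would be expected at `_0` + 16); it only shows how an address can be tested or rounded up.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of 16, as MOVAPS requires. */
bool is_aligned16(uintptr_t addr)
{
    return (addr & (uintptr_t)15) == 0;
}

/* Round addr up to the next multiple of 16 (unchanged if already aligned). */
uintptr_t align_up16(uintptr_t addr)
{
    return (addr + (uintptr_t)15) & ~(uintptr_t)15;
}
```

Printing the addresses of `_0`, `_1` and `_2` from the running kernel and passing them through a check like this would show whether the labels, or the load address of the data section itself, are off by a few bytes.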