Support SSE extension in my OS

Posted: Wed Nov 20, 2019 1:34 am
by iman
Hi.

I have a minimal 32-bit OS that runs entirely in kernel mode (no user mode is implemented).
I recently started using SSE in the assembly code for my OS. I have used SIMD operations a lot on the same physical machine
under Windows and Linux, so I know that my CPU supports the SSE extension (but not SSE2, SSE3, SSE4, AVX, etc.).

I have a piece of NASM code that raises a "General Fault" as soon as it reaches the first movaps xmm0, [SOME_16_BYTE_ALIGNED_MEMORY].

Apparently SSE is not enabled by default (is this right?) and has to be activated by the OS. The OSDev wiki (https://wiki.osdev.org/SSE) has a whole explanation of it, but since I am new to this, I would like to ask some questions.

- As I said, my OS does not support user mode, so there is no task switching. What, then, are FXSAVE and FXRSTOR needed for?
- Could someone explain (or list) the bare minimum required to add SSE support when the OS runs entirely in kernel mode?

I would really appreciate any explanation.

Best regards.
Iman.

Re: Support SSE extension in my OS

Posted: Wed Nov 20, 2019 5:15 am
by pvc
If you have a single-threaded OS, you don't need to bother with FXSAVE and FXRSTOR.

If you have threads (it doesn't matter whether they are user-mode threads or not), you have to save and restore the FPU/SSE state on every thread switch (just like any other registers). That is where the FXSAVE/FXRSTOR pair is used. If you don't care much about performance, save and restore the FPU/SSE state on every switch. Done.

If you want something more intelligent, you need working interrupt/exception handling in place. With that, you can simply set the CR0.TS bit on every task switch and keep track of the last thread that used the FPU/SSE. Then, if some thread executes an FPU/SSE instruction while TS is set, it will raise an #NM exception. In the handler you clear the CR0.TS bit, save the FPU/SSE state (using FXSAVE) of the last thread that used FPU/SSE (if any), restore the FPU/SSE state of the current thread (using FXRSTOR), and mark the current thread as the last thread that used FPU/SSE. This way you only save and restore FPU/SSE state when absolutely needed. And it makes a difference, since the FXSAVE/FXRSTOR buffer is 512 bytes in size.
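
The lazy scheme described above can be modelled off-target in C to check the bookkeeping. This is only a sketch under assumptions: `struct thread`, `struct fpu_tracker` and the three functions are hypothetical names, the hardware operations (CLTS, FXSAVE, FXRSTOR) are replaced by a flag and counters, each thread is assumed to own a pre-initialised save buffer, and the early return when the current thread already owns the registers is an added shortcut rather than part of the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread record. The real 512-byte FXSAVE buffer is not
 * stored here; the counters only record how often it would be touched. */
struct thread {
    unsigned saves;    /* times FXSAVE would have written this thread's buffer */
    unsigned restores; /* times FXRSTOR would have read this thread's buffer */
};

struct fpu_tracker {
    bool ts;              /* models the CR0.TS bit */
    struct thread *owner; /* last thread that used FPU/SSE, NULL if none */
};

/* Called on every task switch: only TS is set, nothing is saved yet. */
static void on_task_switch(struct fpu_tracker *t)
{
    t->ts = true;
}

/* Models the #NM handler for the thread that is currently running. */
static void on_nm_exception(struct fpu_tracker *t, struct thread *current)
{
    assert(current != NULL);
    t->ts = false;             /* CLTS on real hardware */
    if (t->owner == current)
        return;                /* registers still hold this thread's state */
    if (t->owner != NULL)
        t->owner->saves++;     /* FXSAVE into the previous owner's buffer */
    current->restores++;       /* FXRSTOR from the current thread's buffer */
    t->owner = current;
}

/* Models a thread executing an FPU/SSE instruction. */
static void use_fpu(struct fpu_tracker *t, struct thread *current)
{
    if (t->ts)
        on_nm_exception(t, current);
}
```

In this model a thread that never touches FPU/SSE costs nothing on a switch, and a thread that is switched away and back without anyone else using the FPU in between triggers neither a save nor a restore.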

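Here is how I initialize the FPU and enable SSE.

Code:

    # init FPU
    fninit
    fldcw [fcw]

    # enable SSE
    mov eax, cr0
    and al, ~0x04       # clear CR0.EM (bit 2): no x87 emulation
    or al, 0x22         # set CR0.MP (bit 1) and CR0.NE (bit 5)
    mov cr0, eax
    mov eax, cr4
    or ax, 0x600        # set CR4.OSFXSR (bit 9) and CR4.OSXMMEXCPT (bit 10)
    mov cr4, eax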
And somewhere in data segment:

Code:

    fcw: dw 0x037F
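
For reference, the magic numbers in the CR0/CR4 snippet can be spelled out with named bits. The following is a small C sketch of the same arithmetic, not kernel code: it does not touch the real control registers, and the macro and function names (`cr0_for_sse`, `cr4_for_sse`) are hypothetical. The bit positions are the architectural ones for CR0.MP, CR0.EM, CR0.NE, CR4.OSFXSR and CR4.OSXMMEXCPT.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_MP         (1u << 1)  /* monitor coprocessor */
#define CR0_EM         (1u << 2)  /* x87 emulation; must be 0 for SSE */
#define CR0_NE         (1u << 5)  /* native x87 error reporting */
#define CR4_OSFXSR     (1u << 9)  /* OS supports FXSAVE/FXRSTOR, enables SSE */
#define CR4_OSXMMEXCPT (1u << 10) /* OS handles SIMD floating-point exceptions */

/* Compile-time check that the named bits match the constants in the snippet. */
static_assert(CR0_EM == 0x04, "and al, ~0x04 clears EM");
static_assert((CR0_MP | CR0_NE) == 0x22, "or al, 0x22 sets MP and NE");
static_assert((CR4_OSFXSR | CR4_OSXMMEXCPT) == 0x600, "or ax, 0x600 sets both CR4 bits");

/* New CR0 value: EM cleared, MP and NE set, all other bits unchanged. */
static uint32_t cr0_for_sse(uint32_t cr0)
{
    return (cr0 & ~CR0_EM) | CR0_MP | CR0_NE;
}

/* New CR4 value: OSFXSR and OSXMMEXCPT set, all other bits unchanged. */
static uint32_t cr4_for_sse(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

Comparing these results with the CR0 and CR4 values dumped by an exception handler is one way to confirm that the enable sequence actually ran.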

Re: Support SSE extension in my OS

Posted: Wed Nov 20, 2019 5:29 am
by iman
Dear pvc.

Thanks a lot for your answer.
It is now getting clearer how to proceed.
Since I run my OS entirely single-threaded, all in kernel mode, I won't bother with FXSAVE and FXRSTOR.

I have a question about your code.
What if I only write this:

Code:

    mov eax, cr0
    and al, ~0x04
    or al, 0x22
    mov cr0, eax
    mov eax, cr4
    or ax, 0x600
    mov cr4, eax
Do I have to do this as well?

Code:

    fninit
    fldcw [fcw]
In my source code, I only use the SSE registers xmm0 to xmm7 for single-precision floating-point calculations.

Thanks.
Iman.

Re: Support SSE extension in my OS

Posted: Wed Nov 20, 2019 5:42 am
by pvc
Some BIOSes/loaders do not always initialize FPU properly. FPU and SSE units share some of the hardware inside a CPU, so it won't hurt to initialize both at the same time (just to be safe).

Re: Support SSE extension in my OS

Posted: Wed Nov 20, 2019 11:07 am
by iman
pvc wrote:Some BIOSes/loaders do not always initialize FPU properly. FPU and SSE units share some of the hardware inside a CPU, so it won't hurt to initialize both at the same time (just to be safe).
Dear pvc.
I tried to enable SSE by setting the bits in control registers CR0 and CR4.

But now, as soon as it reaches an SSE instruction, the whole OS restarts (triple fault?)

What might be the problem?

Re: Support SSE extension in my OS

Posted: Wed Nov 20, 2019 11:42 am
by Octocontrabass
pvc wrote:

Code:

    fldcw [fcw]

    fcw: dw 0x037F
There's no need for this, FNINIT already sets FCW to 0x037F. Also, it looks like you're not initializing MXCSR.
iman wrote:Since I run my OS entirely single-threaded, all in kernel mode, I won't bother with FXSAVE and FXRSTOR.
According to Intel and AMD, you still need FXSAVE (or XSAVE/XSAVEOPT/XSAVEC/XSAVES) to read MXCSR_MASK so you can write MXCSR. I suspect it's safe to just write 0x1F80 without checking MXCSR_MASK, but there's no guarantee.
iman wrote:But now, as soon as it reaches an SSE instruction, the whole OS restarts (triple fault?)
Write some exception handlers for your OS. When an unexpected exception occurs, your handler can display the CPU registers on the screen (or output them to a serial port, or however you want to debug things). You can use that information to see if you've written the correct values to CR0 and CR4, and see which instruction caused the exception.

Re: Support SSE extension in my OS

Posted: Thu Nov 21, 2019 2:58 am
by pvc
I thought fninit initializes MXCSR too. Well, I was wrong.

Re: Support SSE extension in my OS

Posted: Thu Nov 21, 2019 3:47 am
by iman
Octocontrabass wrote:
pvc wrote:

Code:

    fldcw [fcw]

    fcw: dw 0x037F
There's no need for this, FNINIT already sets FCW to 0x037F. Also, it looks like you're not initializing MXCSR.
iman wrote:Since I run my OS entirely single-threaded, all in kernel mode, I won't bother with FXSAVE and FXRSTOR.
Octocontrabass wrote:According to Intel and AMD, you still need FXSAVE (or XSAVE/XSAVEOPT/XSAVEC/XSAVES) to read MXCSR_MASK so you can write MXCSR. I suspect it's safe to just write 0x1F80 without checking MXCSR_MASK, but there's no guarantee.
Is this the right way to initialize MXCSR?

Code:

segment .code
fxsave [fxsave_region]
STMXCSR [mem]

segment .data
mem: dd 0x1F80
align 16
fxsave_region: TIMES 512 db 0

Re: Support SSE extension in my OS

Posted: Thu Nov 21, 2019 3:52 am
by iman
Octocontrabass wrote: Write some exception handlers for your OS. When an unexpected exception occurs, your handler can display the CPU registers on the screen (or output them to a serial port, or however you want to debug things). You can use that information to see if you've written the correct values to CR0 and CR4, and see which instruction caused the exception.
I have handlers for exceptions such as Divide-by-zero Error, Debug, Non-maskable Interrupt, Invalid Opcode, Double Fault, Segment Not Present, Stack-Segment Fault, General Protection Fault, SIMD Floating-Point Exception, etc., but none of them fired and, as I said, the CPU restarts.

Re: Support SSE extension in my OS

Posted: Thu Nov 21, 2019 5:05 am
by Octocontrabass
iman wrote:Is this the right way to initialize MXCSR?
No. You need to mask your desired value of MXCSR with MXCSR_MASK, then load it with LDMXCSR.

Something like this:

Code:

segment .code
    fxsave [fxsave_region]
    mov eax, [fxsave_region + 28]
    or eax, eax
    jnz .next
    mov eax, 0xffbf        ; the default MXCSR_MASK for CPUs that don't support MXCSR_MASK
.next:
    mov [mxcsr_mask], eax  ; optional: save MXCSR_MASK for later changes to MXCSR
    and eax, 0x1f80        ; your desired value of MXCSR goes here
    mov [mem], eax
    ldmxcsr [mem]

segment .data
mem: dd 0
mxcsr_mask: dd 0
align 16
fxsave_region: times 512 db 0
Note that this code is not tested. You should read the Intel and/or AMD manuals for yourself to better understand what's going on.
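
Since the assembly above is untested, the masking arithmetic can at least be checked off-target. Here is a small C sketch of the same logic, assuming a 512-byte FXSAVE image that was zero-filled before FXSAVE ran; the function names `mxcsr_mask_from_image` and `mxcsr_value` are hypothetical, and nothing here executes FXSAVE or LDMXCSR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FXSAVE_AREA_SIZE   512
#define MXCSR_MASK_OFFSET  28       /* byte offset of MXCSR_MASK in the image */
#define MXCSR_DEFAULT_MASK 0xFFBFu  /* used when the stored mask reads as 0 */
#define MXCSR_DEFAULT      0x1F80u  /* all SIMD FP exceptions masked */

/* Read the little-endian MXCSR_MASK dword from an FXSAVE image and fall
 * back to the default mask when the CPU left the field as zero. */
static uint32_t mxcsr_mask_from_image(const uint8_t image[FXSAVE_AREA_SIZE])
{
    assert(image != NULL);
    const uint8_t *p = image + MXCSR_MASK_OFFSET;
    uint32_t mask = (uint32_t)p[0]
                  | ((uint32_t)p[1] << 8)
                  | ((uint32_t)p[2] << 16)
                  | ((uint32_t)p[3] << 24);
    return mask != 0 ? mask : MXCSR_DEFAULT_MASK;
}

/* Drop every bit the CPU does not allow, so that LDMXCSR with the result
 * cannot fault because of a reserved bit. */
static uint32_t mxcsr_value(uint32_t desired, uint32_t mask)
{
    return desired & mask;
}
```

With the default mask, 0x1F80 passes through unchanged, while a value that also requests bit 6 (DAZ) has that bit stripped.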
iman wrote:I have handlers for exceptions such as Divide-by-zero Error, Debug, Non-maskable Interrupt, Invalid Opcode, Double Fault, Segment Not Present, Stack-Segment Fault, General Protection Fault, SIMD Floating-Point Exception, etc., but none of them fired and, as I said, the CPU restarts.
It sounds like one of them happened, but your exception handler didn't work.

Re: Support SSE extension in my OS

Posted: Thu Nov 21, 2019 7:57 am
by iman
Octocontrabass wrote:It sounds like one of them happened, but your exception handler didn't work.
I found the source of the problem, but I have no clue why it happens.

After saving with FXSAVE and loading MXCSR as you suggested, I tested my OS with a small test subroutine.
I got a #GP error.
Analyzing further, I replaced the load from memory into xmm0 with operations that use only the 128-bit registers, and that worked perfectly. In the end I found that in the data segment of my NASM code, even though there was an "align 16" directive at the top of the data area, only the first 16 bytes were aligned, and I have to repeat the "align 16" directive before each item, as in the second listing below, to get it to work. To me it looks ugly, and I have no clue why it is needed. Perhaps you can help me figure this out.

listing_1:

Code:

section .data
align 16
_0 : dd  0.0,    0.0,    0.0,    0.0
_1 : dd  1.0,    1.0,    1.0,    1.0
_2 : dd  2.0,    2.0,    2.0,    2.0
section .text
test_sse:
movaps xmm0, [_0] ; it goes well
movaps xmm0, [_1] ; gives #GP
ret
listing_2:

Code:

section .data
align 16
_0 : dd  0.0,    0.0,    0.0,    0.0
align 16
_1 : dd  1.0,    1.0,    1.0,    1.0
align 16
_2 : dd  2.0,    2.0,    2.0,    2.0
section .text
test_sse:
movaps xmm0, [_0] ; it goes well
movaps xmm0, [_1] ; it goes well
ret
Best regards.
Iman.
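
For reference, movaps raises #GP when its memory operand is not on a 16-byte boundary, and "align 16" pads the current position up to the next multiple of 16 relative to the start of the section, so the final address also depends on where the section itself is placed. The check and the padding rule can be sketched in C as follows; `is_aligned` and `align_up` are hypothetical helper names, intended only for verifying addresses or offsets (for example, label addresses printed from a debug handler or taken from a map file).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when addr is a multiple of alignment (alignment must be a power of two). */
static bool is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (uintptr_t)(alignment - 1)) == 0;
}

/* Round offset up to the next multiple of alignment, which is the padding
 * rule an "align" directive applies to the current position. */
static size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, `is_aligned(0x1004, 16)` is false, and `align_up(20, 16)` gives 32.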