CPUID Processor Count, and SSE...

pcmattman

Post by pcmattman »

I'm writing my CPUID interface (so that programs can get the CPU information) and it works properly apart from one small detail: when I use Bochs compiled with SMP support and enable 2 processors, the 'CPU count' field is 0, which seems wrong to me. Any ideas why? I'm not planning on using both processors; I would just like to be able to tell whether there are 2 (or more) processors.

Also, how do I use SSE instructions? I've heard that they can help me (128-bit variables?) but I don't know how to use them in my kernel. Any pointers, tips or code samples? (I have Googled, so don't point me towards Google :D)
Combuster

Post by Combuster »

Normally, the multimedia registers are not used in the kernel, because that would slow down every kernel entry and exit: the FPU and SSE state would have to be saved on kernel entry and restored on return, even when the running program never touches these registers (a case where lazy FPU switching can be used instead).

Enabling the multimedia extensions requires some bits in the control registers to be set correctly:
cr0.em should be cleared
cr0.mp should be set
cr0.ts should be cleared

for SSE you need to set these as well (don't set them if the system does not support it):
cr4.osfxsr should be set
cr4.osxmmexcpt should be set
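
A minimal sketch of the bit settings listed above, assuming a 32-bit kernel. The helper names are made up for illustration, and the privileged reads and writes of cr0 and cr4 are left out so that the logic can be checked in user mode:

```c
#include <stdint.h>

/* Bit positions as documented in the Intel and AMD manuals. */
#define CR0_MP          (1u << 1)   /* monitor coprocessor */
#define CR0_EM          (1u << 2)   /* emulate FPU; must be clear to use FPU/SSE */
#define CR0_TS          (1u << 3)   /* task switched */
#define CR4_OSFXSR      (1u << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT  (1u << 10)  /* OS handles SIMD floating-point exceptions */

/* Hypothetical helper: new cr0 value with em and ts cleared and mp set. */
uint32_t cr0_enable_fpu(uint32_t cr0)
{
    cr0 &= ~(CR0_EM | CR0_TS);
    cr0 |= CR0_MP;
    return cr0;
}

/* Hypothetical helper: new cr4 value. The SSE bits are only set when the
 * caller has already confirmed SSE support (for example through CPUID). */
uint32_t cr4_enable_sse(uint32_t cr4, int cpu_has_sse)
{
    if (cpu_has_sse)
        cr4 |= CR4_OSFXSR | CR4_OSXMMEXCPT;
    return cr4;
}
```

In a real kernel the old values would be read from cr0 and cr4 in ring 0, passed through these helpers, and written back.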

Note that on a hardware task switch cr0.ts will be set, and any multimedia instruction executed after that will raise a device-not-available exception (#NM, vector 7), which gives the kernel a chance to switch the FPU context.

My kernel uses that handler to save/restore the fpu context, and it sets cr0.ts on software task switches.
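
As a rough illustration of that lazy scheme (not Combuster's actual code), here is a user-mode model. The structures and function names are hypothetical, the `ts` field stands in for cr0.ts, and counters stand in for the real fxsave/fxrstor instructions:

```c
#include <stddef.h>

/* Hypothetical task record; a real one would hold a 512-byte FXSAVE area. */
struct task {
    int fpu_saves;     /* how often this task's FPU state was saved */
    int fpu_restores;  /* how often it was loaded back */
};

struct cpu {
    int ts;                  /* models cr0.ts */
    struct task *current;    /* task that is running now */
    struct task *fpu_owner;  /* task whose state sits in the FPU registers */
};

/* Software task switch: save nothing, just arm the trap by setting ts. */
void switch_task(struct cpu *c, struct task *next)
{
    c->current = next;
    c->ts = 1;
}

/* #NM handler: runs on the first FPU/SSE instruction while ts is set. */
void handle_nm(struct cpu *c)
{
    c->ts = 0;                          /* what clts does on real hardware */
    if (c->fpu_owner == c->current)
        return;                         /* registers already belong to us */
    if (c->fpu_owner != NULL)
        c->fpu_owner->fpu_saves++;      /* stands in for fxsave */
    c->current->fpu_restores++;         /* stands in for fxrstor (or a fresh init) */
    c->fpu_owner = c->current;
}
```

The point of the design is that a task which never executes an FPU or SSE instruction never triggers the handler, so its switches cost no state saving at all.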
Brendan

Re: CPUID Processor Count, and SSE...

Post by Brendan »

Hi,
pcmattman wrote:I'm writing my CPUID interface (so that programs can get the CPU information) and it works properly apart from one small detail: when I use Bochs compiled with SMP support and enable 2 processors, the 'CPU count' field is 0, which seems wrong to me. Any ideas why? I'm not planning on using both processors; I would just like to be able to tell whether there are 2 (or more) processors.
CPUID only reports information about the CPU it is executed on. This means that in your case (Bochs emulating 2 separate chips) the information you're getting is correct - there are "0 + 1" logical CPUs in the chip.

However, CPUID will tell you how many cores are in the same chip, and (for hyperthreading) will also tell you how many logical CPUs are in each core.
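
For illustration only, a sketch of decoding one such per-chip count. It assumes the field in question is CPUID leaf 1, EBX bits 23:16 (the logical processor count for the package, which is only meaningful when the HTT bit, EDX bit 28, is set); the thread does not say which field the original code reads. The function names are hypothetical, and `__get_cpuid` comes from GCC's `<cpuid.h>`:

```c
#include <stdint.h>

/* CPUID leaf 1 feature bits in EDX. */
#define CPUID1_EDX_SSE  (1u << 25)
#define CPUID1_EDX_HTT  (1u << 28)

/* Logical CPUs in this package according to leaf 1.
 * EBX[23:16] is only valid when HTT is set; otherwise report 1. */
unsigned logical_cpus_in_package(uint32_t ebx, uint32_t edx)
{
    if (!(edx & CPUID1_EDX_HTT))
        return 1;
    return (ebx >> 16) & 0xFF;
}

/* Nonzero when leaf 1 reports SSE support. */
int cpu_has_sse(uint32_t edx)
{
    return (edx & CPUID1_EDX_SSE) != 0;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
/* Query the CPU this code is running on; returns 0 if leaf 1 is unavailable. */
unsigned query_logical_cpus(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return logical_cpus_in_package(ebx, edx);
}
#endif
```

Either way this only describes the package the code runs on; the number of separate chips has to come from somewhere else, such as the MP or ACPI tables.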

I can't remember the exact details, but IIRC Bochs is buggy when it comes to hyper-threading, and reports logical CPUs as multiple cores (or something like that). For my own purposes I rewrote the code in Bochs that emulates CPUID a while ago to avoid this problem (and others).
pcmattman wrote:Also, how do I use SSE instructions? I've heard that they can help me (128-bit variables?) but I don't know how to use them in my kernel. Any pointers, tips or code samples? (I have Googled, so don't point me towards Google :D)
Apart from Combuster's comments, I've got no idea how you'd use MMX or SSE in GCC. I'd assume you'd need to use inline assembly....


Cheers,

Brendan
jnc100

Post by jnc100 »

Output of cc1 --help from gcc 4.1.2:

Target specific options:
  -m128bit-long-double        sizeof(long double) is 16
  -m32                        Generate 32bit i386 code
  -m3dnow                     Support 3DNow! built-in functions
  -m64                        Generate 64bit x86-64 code
  -m80387                     Use hardware fp
  -m96bit-long-double         sizeof(long double) is 12
  -maccumulate-outgoing-args  Reserve space for outgoing arguments in the
                              function prologue
  -malign-double              Align some doubles on dword boundary
  -malign-functions=          Function starts are aligned to this power of 2
  -malign-jumps=              Jump targets are aligned to this power of 2
  -malign-loops=              Loop code aligned to this power of 2
  -malign-stringops           Align destination of the string operations
  -march=                     Generate code for given CPU
  -masm=                      Use given assembler dialect
  -mbranch-cost=              Branches are this expensive (1-5, arbitrary units)
  -mcmodel=                   Use given x86-64 code model
  -mfancy-math-387            Generate sin, cos, sqrt for FPU
  -mfp-ret-in-387             Return values of functions in FPU registers
  -mfpmath=                   Generate floating point mathematics using given
                              instruction set
  -mhard-float                Use hardware fp
  -mieee-fp                   Use IEEE math for fp comparisons
  -minline-all-stringops      Inline all known string operations
  -mlarge-data-threshold=     Data greater than given threshold will go into
                              .ldata section in x86-64 medium model
  -mmmx                       Support MMX built-in functions
  -mms-bitfields              Use native (MS) bitfield layout
  -momit-leaf-frame-pointer   Omit the frame pointer in leaf functions
  -mpreferred-stack-boundary= Attempt to keep stack aligned to this power of 2
  -mpush-args                 Use push instructions to save outgoing arguments
  -mred-zone                  Use red-zone in the x86-64 code
  -mregparm=                  Number of registers used to pass integer arguments
  -mrtd                       Alternate calling convention
  -msoft-float                Do not use hardware fp
  -msse                       Support MMX and SSE built-in functions and code
                              generation
  -msse2                      Support MMX, SSE and SSE2 built-in functions and
                              code generation
  -msse3                      Support MMX, SSE, SSE2 and SSE3 built-in
                              functions and code generation
  -msseregparm                Use SSE register passing conventions for SF and
                              DF mode
  -mstack-arg-probe           Enable stack probing
  -msvr3-shlib                Uninitialized locals in .bss
  -mtls-dialect=              Use given thread-local storage dialect
  -mtls-direct-seg-refs       Use direct references against %gs when accessing
                              tls data
  -mtune=                     Schedule code for given CPU
Regards,
John.
niteice

Post by niteice »

The compiler includes built-in functions (intrinsics) that are usually better than inline assembly; google "sse intrinsics"...
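
A small sketch of what that looks like, assuming GCC with -msse (the flag from the list above) and the `<xmmintrin.h>` header. The function name is made up, and a plain scalar loop is used when SSE is not enabled at compile time:

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* dst[i] = a[i] + b[i] for n floats, four at a time where SSE is available. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; i++)                     /* leftover elements, or no SSE */
        dst[i] = a[i] + b[i];
}
```

The unaligned load/store variants are used so the example does not depend on 16-byte aligned buffers. Inside a kernel this still needs the control register setup Combuster described before any of these instructions will execute without faulting.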