MTRR Enabling

Post by ~ »

I have seen that model-specific registers (MSRs) called MTRRs are used to configure the memory access type for an address range. I have also seen it said that the "Write Combine" setting is good for speeding up screen painting (as seen in Menuet).

I have also read that it can speed up PCI/AGP bus access in general. Now, isn't the processor itself connected through PCI?

And if so, wouldn't it speed up the whole memory if I set an MTRR to cover the whole memory with "Write Combine"?

Post by Combuster »

I suggest for starters you read:
http://en.wikipedia.org/wiki/Northbridg ... mputing%29
http://en.wikipedia.org/wiki/Southbridg ... mputing%29

Next, be aware of what caching and bus protocols do:
On a PCI bus, there are several control cycles around the data cycles. If the memory is uncacheable, each write or read generates one transaction. If you use some form of caching such as write-back or write-combining, the CPU can buffer data and send far more of it in one transaction.

Using write-back for video memory is not good: if there is no need to evict lines from the cache, the data will not get written to the video card and you will not see any updates. (If you have a small kernel, there need not be any cache line evictions at all.)
Using write-combining for main memory is not good: every write to memory will force a bus cycle in the near future, so all reads and writes will go to main memory, which is slower than the onboard caches.

You should be aware that PCI devices do not enforce cache coherency on their own, so you should be very careful about using any caching method on devices whose state can change independently of the CPU.

Post by ~ »

In that case it would probably be better to limit it to video memory only. There don't seem to be many other uses for MTRRs with "Write Combine", so I will have to restrict the following code by somehow detecting the base address and size of the video memory:

Code:

       ;;;INIT: Enable Processor MTRRs (mainly useful for faster PCI/AGP and video write):
         ;Find a free MTRR register pair in the range 0x200 to 0x20F:
         ;;
          xor ecx,ecx
          mov cx,0x1FF          ;Point 2 addresses before the first PhysMask MTRR register
           .MTRRloop:
             add cx,2         ;First PhysMask MTRR is at 0x201
             cmp cx,0x211     ;Last MTRR pair ends at 0x20F
             jz .noMTRR       ;If surpassed we must end
             rdmsr
             test ah,0x08     ;See if pair is used
             jnz .MTRRloop
             dec ecx          ;Point to PhysBase MTRR register of the found pair

         ;Set Base Address in the register of the found pair:
         ;;
          xor edx,edx
          xor eax,eax
          inc eax
           ;Bits 0-1: Write Combine Memory Mode by MTRR (01b)
           ;;
            wrmsr         ;{EDX:EAX} == 0x0000000000000001

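         ;Set the PhysMask register of this pair to 0x000000FFFFFFF800:
         ;valid bit (11) set and all mask bits 12-39 set, which assumes a 40-bit
         ;physical address width and, as written, matches only one 4 KB page: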
         ;;
          inc ecx    ;Point to the PhysMask register of this MTRR pair
          dec dl     ;{EDX} == 0x000000FF
          mov eax,11111111111111111111100000000000b  ;{EAX} == 0xFFFFF800
           wrmsr

         ;We Use MTRR DefType Register to Enable MTRR's:
         ;;
          mov cx,0x2FF   ;{ECX}==0x000002FF, DefType Reg.
          rdmsr           ;Read
          or ah,00001000b    ;Set bit 11 in {EAX} to enable all MTRRs unconditionally
           wrmsr           ;Write and end!

         .noMTRR:

       ;;;END:  Enable Processor MTRRs (mainly useful for faster PCI/AGP and video write)
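
The search loop at the top of that code can also be written out in C, which makes the register numbering easier to follow. This is only a sketch: it assumes the PhysMask values have already been read with rdmsr into an array, and the array and function names are hypothetical, not part of the post.

```c
#include <stdint.h>

#define MTRR_PHYSBASE0  0x200u        /* MSR index of the first PhysBase register */
#define MTRR_MASK_VALID (1ull << 11)  /* "valid" bit in each PhysMask register    */

/* physmask[i] holds the value read from MSR 0x201 + 2*i (hypothetical input).
 * Returns the MSR index of the PhysBase register of the first unused pair,
 * or 0 if all `count` pairs are already in use. */
uint32_t find_free_mtrr_pair(const uint64_t *physmask, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        if ((physmask[i] & MTRR_MASK_VALID) == 0)
            return MTRR_PHYSBASE0 + 2u * i;
    }
    return 0;
}
```

The matching PhysMask register is always the returned index plus one.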

Post by mystran »

What one typically does is have the memory be write-back cacheable, and then use MTRRs specifically to set video memory to write-combine.

If you need write-through or non-cacheable, you can get those by setting the relevant bits in page tables, so you only ever need MTRRs for write-combine.
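
As a rough illustration of the page-table side, here is a C sketch of the two flag combinations. It assumes 32-bit page table entries and the default PAT setup (or no PAT at all); the helper names are made up for this example.

```c
#include <stdint.h>

#define PTE_PWT (1u << 3)  /* page-level write-through */
#define PTE_PCD (1u << 4)  /* page-level cache disable */

/* Write-through: PWT set, PCD clear. */
uint32_t pte_make_write_through(uint32_t pte)
{
    return (pte | PTE_PWT) & ~PTE_PCD;
}

/* Non-cacheable: PCD and PWT both set. */
uint32_t pte_make_uncached(uint32_t pte)
{
    return pte | PTE_PCD | PTE_PWT;
}
```

The TLB entry for the page still has to be invalidated after changing the flags.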

Post by Jeko »

mystran wrote:What one typically does is have the memory be write-back cacheable, and then use MTRRs specifically to set video memory to write-combine.

If you need write-through or non-cacheable, you can get those by setting the relevant bits in page tables, so you only ever need MTRRs for write-combine.

How can I enable write-back for all of memory and write-combining for only part of it (the video memory)?

Post by ~ »

For video memory, you could scan the PCI base address for the (S)VGA display adapters and map that memory to be write-combine. The code above shows how to do it (set in a 3-bit field), you just need to adjust the base address. Now it would only be a matter of finding the size of the video buffer.

In the same fashion, you can set a range to write-back with one MTRR: find a free MTRR (you have to adjust that code so it can run several times and scan all 8 of the MTRRs used here) and set its type field to write-back, using the value 110b (6).

Note that the comment in the code says the type is in bits 0-1, but the type field is actually bits 0-7, of which only bits 0-2 are used (bits 3-7 are said to be reserved, at least in the AMD manual I have).
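
To make the type field concrete, here is a small C sketch of the type encodings and of how a PhysBase register value is put together. The enum and function names are only illustrative.

```c
#include <stdint.h>

/* Memory type encodings used in the low byte of a PhysBase register. */
enum mtrr_type {
    MTRR_UC = 0,  /* uncacheable     */
    MTRR_WC = 1,  /* write-combining */
    MTRR_WT = 4,  /* write-through   */
    MTRR_WP = 5,  /* write-protect   */
    MTRR_WB = 6   /* write-back      */
};

/* Combine a 4 KB aligned physical base address with a memory type.
 * The low 12 address bits are dropped; bits 8-11 stay zero. */
uint64_t mtrr_physbase_value(uint64_t base, enum mtrr_type type)
{
    return (base & ~0xFFFull) | (uint64_t)type;
}
```

The upper half of the result goes in EDX and the lower half in EAX before the wrmsr.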

Post by Jeko »

~ wrote:For video memory, you could scan the PCI base address for the (S)VGA display adapters and map that memory to be write-combine. The code above shows how to do it (set in a 3-bit field), you just need to adjust the base address. Now it would only be a matter of finding the size of the video buffer.

For example, if I want to enable write-combining for the address range from 0xC00000000 to 0xC001D4C00, how can I do that?

Post by Brendan »

Hi,

Some random MTRR notes...

The BIOS configures the MTRRs during boot, so all usable RAM should already be set to the write-back caching type.

There are two different types of MTRRs, plus a "default". The default determines which caching type is used for anything that isn't covered by anything else (and Intel recommends setting it to "uncacheable"). Then there are the fixed-range registers, which cover everything from 0x00000000 to 0x000FFFFF.

Lastly, there may or may not be any variable range registers. These can be used to specify the caching type for arbitrary address ranges. To determine how many variable range registers there are you need to check the VCNT field (lowest 8 bits) of the MTRRCAP MSR.

For Intel P6 to Pentium 4 there are always 8 variable range registers, but I don't know about more modern Intel CPUs (e.g. Core) or CPUs from other manufacturers. In any case it's better to use the VCNT field so that if anyone makes a CPU with 4 or 16 variable range registers your OS will already support it.
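
Decoding the MTRRCAP value is simple bit extraction. The C sketch below assumes the 64-bit value has already been read from MSR 0xFE with rdmsr; the function names are invented for the example, and the FIX and WC bit positions come from Intel's documentation rather than from this post.

```c
#include <stdint.h>

#define MSR_MTRRCAP 0xFEu  /* MSR index of the MTRR capability register */

/* Number of variable-range register pairs (VCNT, bits 0-7). */
unsigned mtrrcap_vcnt(uint64_t cap)
{
    return (unsigned)(cap & 0xFF);
}

/* Non-zero if fixed-range MTRRs are supported (FIX, bit 8). */
int mtrrcap_has_fixed(uint64_t cap)
{
    return (int)((cap >> 8) & 1);
}

/* Non-zero if the write-combining type is supported (WC, bit 10). */
int mtrrcap_has_wc(uint64_t cap)
{
    return (int)((cap >> 10) & 1);
}
```

A value of 0x508, for instance, decodes as 8 variable ranges with fixed ranges and write-combining both available.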

For the variable range registers there are some specific requirements for alignment and size masks, conditions for overlapping regions, and rules that determine the effective caching type once the CPU considers things like the cache control flags in the page table entry.

Also don't forget that you shouldn't change any MTRR without first disabling the CPU's caches (and clearing the "global" flag in CR4) and flushing the caches (even on single-CPU systems). For multi-CPU systems this can be a bit harder, as all CPUs must disable caching and flush their caches, then all CPUs must load the new MTRRs at the same time before re-enabling caches. Failure to flush caches correctly can result in "undefined" behaviour (e.g. stale data left in caches).
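
The control bits involved can be sketched in C as below. This shows only the bit manipulation; reading and writing CR0, CR4 and the MSRs, and the WBINVD flushes, need ring-0 instructions that are not shown. The bit positions are taken from Intel's manuals, not from this post, and the helper names are hypothetical.

```c
#include <stdint.h>

#define CR0_NW          (1u << 29)    /* not write-through                    */
#define CR0_CD          (1u << 30)    /* cache disable                        */
#define CR4_PGE         (1u << 7)     /* global pages enable                  */
#define MTRR_DEFTYPE_E  (1ull << 11)  /* MTRR enable in the DefType MSR 0x2FF */

/* Rough order of operations, with interrupts disabled throughout:
 *   1. CR0 = cr0_caches_off(CR0), then WBINVD
 *   2. CR4 = cr4_globals_off(CR4)            (flushes the TLBs)
 *   3. DefType = deftype_mtrrs_off(DefType)
 *   4. write the new PhysBase/PhysMask values
 *   5. DefType = deftype_mtrrs_on(DefType), then WBINVD
 *   6. restore the saved CR0 and CR4 values
 */
uint32_t cr0_caches_off(uint32_t cr0)   { return (cr0 | CR0_CD) & ~CR0_NW; }
uint32_t cr4_globals_off(uint32_t cr4)  { return cr4 & ~CR4_PGE; }
uint64_t deftype_mtrrs_off(uint64_t v)  { return v & ~MTRR_DEFTYPE_E; }
uint64_t deftype_mtrrs_on(uint64_t v)   { return v | MTRR_DEFTYPE_E; }
```

On a multi-CPU system every CPU would have to go through the same steps in lockstep.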


Cheers,

Brendan

Post by ~ »

I haven't made "complex" calculations, but I guess the following should work (that's what I do):

Code:

         ;Set Base Address in the register of the found pair: 
         ;; 
          mov edx,0x0000000C   ;[0000000C:
          xor eax,eax          ;          00000000]
          inc eax
           ;Bits 0-1: Write Combine Memory Mode by MTRR (01b)
           ;;
            wrmsr         ;{EDX:EAX} == 0x0000000C00000001

         ;Set up to 52 bits of PhysMask MTRR register pair for this full range:
         ;;
          inc ecx    ;Point to the PhysMask register of this MTRR pair
          mov edx,0x000FFFFF
          mov eax,0xFFE2B800   ;Mask bits plus bit 11 (valid), without which the pair stays disabled
           wrmsr 

Note that I have only set the base address and a mask, so that any address in the range starting at the base address is affected.

This is from the AMD64 Architecture Programmer's Manual Volume 2, System Programming, page 224:

PhysMask and PhysBase are used together to determine whether a target physical-address falls within the specified address range. PhysMask is logically ANDed with PhysBase and separately ANDed with the upper 40 bits of the target physical address. If the result of the two operations are identical, the target physical address falls within the specified memory range. The pseudo-code for the operation is:

Code:

MaskBase = PhysMask AND PhysBase
MaskTarget = PhysMask AND Target_Address[51:12]
if MaskBase = MaskTarget
   then Target_Address_In_Range
   else Target_Address_Not_In_Range
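
The same comparison as a C function, as a sketch for experimenting with base and mask values. It assumes both fields are passed already shifted right by 12 bits, as they sit in the PhysBase and PhysMask fields; the function name is made up.

```c
#include <stdint.h>

/* base_field and mask_field hold address bits 12 and up.
 * Returns 1 if addr falls inside the range they describe, else 0. */
int mtrr_in_range(uint64_t base_field, uint64_t mask_field, uint64_t addr)
{
    uint64_t mask_base   = mask_field & base_field;
    uint64_t mask_target = mask_field & (addr >> 12);
    return mask_base == mask_target;
}
```

With the manual's 32-Mbyte example (base field 0x2000, mask field 0xFFFFFFE000), addresses 0x2000000 through 0x3FFFFFF match and 0x4000000 does not.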


PhysMask and PhysBase Values. Software can calculate the PhysMask value using the following procedure:

1. Subtract the memory-range physical base-address from the upper physical-address of the memory range.

2. Subtract the value calculated in Step 1 from the physical memory size.

3. Truncate the lower 12 bits of the result in Step 2 to create the PhysMask value to be loaded into the MTRRphysMaskn register. Truncation is performed by right-shifting the value 12 bits.

For example, assume a 32-Mbyte memory range is specified within the 52-bit physical address space, starting at address 200_0000h. The upper address of the range is 3FF_FFFFh. Following the process outlined above yields:

1. 3FF_FFFFh-200_0000h = 1FF_FFFFh
2. F_FFFF_FFFF_FFFFh-1FF_FFFFh = F_FFFF_FE00_0000h
3. Right shift (F_FFFF_FE00_0000h) by 12 = FF_FFFF_E000h

In this example, the 40-bit value loaded into the PhysMask field is FF_FFFF_E000h.

Software must also truncate the lower 12 bits of the physical base address before loading it into the PhysBase field. In the example above, the 40-bit PhysBase field is 00_0000_2000h.
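
The three steps translate directly into C. The sketch below follows the quoted procedure literally; the function names and the phys_bits parameter (52 in the manual's example) are assumptions added for illustration.

```c
#include <stdint.h>

/* base: first address of the range, top: upper address of the range,
 * phys_bits: physical address width in bits (must be below 64). */
uint64_t mtrr_physmask_field(uint64_t base, uint64_t top, unsigned phys_bits)
{
    uint64_t span      = top - base;                 /* step 1 */
    uint64_t phys_size = (1ull << phys_bits) - 1;    /* all-ones physical address */
    return (phys_size - span) >> 12;                 /* steps 2 and 3 */
}

/* The PhysBase field is the base address with its lower 12 bits dropped. */
uint64_t mtrr_physbase_field(uint64_t base)
{
    return base >> 12;
}
```

For the manual's range, mtrr_physmask_field(0x2000000, 0x3FFFFFF, 52) gives FF_FFFF_E000h and mtrr_physbase_field(0x2000000) gives 2000h. The result is only a usable mask when the range is a power of two in size and aligned to that size.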

For our needs:
1. 0xC001D4C00-0xC00000000 = 0x1D4C00
2. 0xFFFFFFFFFFFFF-0x1D4C00 = 0xFFFFFFFE2B3FF
3. 0xFFFFFFFE2B3FF >> 12 = 0xFFFFFFFE2B

I hope the explanation above is accurate and error-free; it seems so from the calculation I've tried. I suspect that all values must be aligned to 4 KB, since that's what the manual says, but I think that is taken care of by the 12-bit right shift before the value is loaded into the field.

Post by Combuster »

I think this algorithm makes two assumptions:
1) the size of the area is a power of two
2) the base is a multiple of the size (i.e. the area is aligned to a boundary equal to its size)
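
Those two assumptions are easy to check in code before a range is programmed. A C sketch, with hypothetical helper names and a 4 KB minimum size assumed:

```c
#include <stdint.h>

int is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* 1 if [base, base + size) can be described by one variable-range MTRR. */
int mtrr_range_ok(uint64_t base, uint64_t size)
{
    return size >= 0x1000 && is_pow2(size) && (base & (size - 1)) == 0;
}

/* Smallest power of two that is >= size and >= 4 KB, or 0 if none fits. */
uint64_t round_up_pow2(uint64_t size)
{
    uint64_t p = 0x1000;
    if (size > (1ull << 63))
        return 0;
    while (p < size)
        p <<= 1;
    return p;
}
```

For the range asked about earlier, the size 0x1D4C00 fails the check, while rounding it up gives 0x200000 (2 MB), which does fit at base 0xC00000000. Rounding up is only safe if the extra space up to the 2 MB boundary also belongs to the video card; otherwise the range has to be split across several MTRRs.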