MTRR Enabling
Posted: Mon Apr 02, 2007 9:40 am
by ~
I have seen that model-specific registers (MSRs) called MTRRs are used to configure the memory access type for an address range. I have also read that the "Write Combine" setting is good for speeding up screen painting (as seen in Menuet).
I have read that it can speed up PCI/AGP bus access in general. Now, isn't the processor itself connected through PCI?
And if so, wouldn't it speed up the whole memory if I set an MTRR to cover the whole memory with "Write Combine"?
Posted: Mon Apr 02, 2007 10:33 am
by Combuster
I suggest for starters you read:
http://en.wikipedia.org/wiki/Northbridg ... mputing%29
http://en.wikipedia.org/wiki/Southbridg ... mputing%29
Next, be aware of what caching and bus protocols do:
On a PCI bus, there are several control cycles around the data cycles. If the memory is uncacheable, each read or write generates one transaction. If you use some form of caching, such as write-back or write-combining, the CPU can buffer data and send far more of it in one transaction.
Using Write-Back for video memory is not good: if there is no need to evict lines from the cache, the data will not get written to the video card and you will not see any updates. (If you have a small kernel, there need not be any cache line evictions at all.)
Using Write-Combining for main memory is not good: every write to memory will force a bus cycle in the near future, so all reads and writes will go to main memory, which is slower than the onboard caches.
You should be aware that PCI devices do not enforce cache coherency of their own, so you should be very careful with using any caching method on devices whose state can change independent of the CPU.
Posted: Mon Apr 02, 2007 10:53 am
by ~
In that case maybe it would be better to limit it to video memory only. There don't seem to be many other uses for MTRRs with "Write Combine", so I will have to restrict the following code by somehow detecting the base address and size of the video memory:
Code:
;;;INIT: Enable Processor MTRRs (mainly useful for faster PCI/AGP and video write):
;Find a free MTRR register pair in the range 0x200 to 0x20F:
;;
xor ecx,ecx
mov cx,0x1FF ;Point 2 addresses before the first PhysMask MTRR register
.MTRRloop:
add cx,2 ;First PhysMask MTRR is at 0x201
cmp cx,0x211 ;Last MTRR pair ends at 0x20F
jz .noMTRR ;If surpassed we must end
rdmsr
test ah,0x08 ;See if pair is used
jnz .MTRRloop
dec ecx ;Point to PhysBase MTRR register of the found pair
;Set Base Address in the register of the found pair:
;;
xor edx,edx
xor eax,eax
inc eax
;Bits 0-1: Write Combine Memory Mode by MTRR (01b)
;;
wrmsr ;{EDX:EAX} == 0x0000000000000001
;Set PhysMask of this pair: mask bits 12-39 all ones plus Valid bit 11 (0x000000FFFFFFF800); an all-ones mask matches only the 4 KB page at PhysBase:
;;
inc ecx ;Point to the PhysMask register of this MTRR pair
dec dl ;{EDX} == 0x000000FF
mov eax,11111111111111111111100000000000b ;{EAX} == 0xFFFFF800
wrmsr
;We Use MTRR DefType Register to Enable MTRR's:
;;
mov cx,0x2FF ;{ECX}==0x000002FF, DefType Reg.
rdmsr ;Read
or ah,00001000b ;Set bit 11 in {EAX} to enable all MTRRs unconditionally
wrmsr ;Write and end!
.noMTRR:
;;;END: Enable Processor MTRRs (mainly useful for faster PCI/AGP and video write)
Posted: Mon Apr 02, 2007 1:17 pm
by mystran
What one typically does is have the memory be write-back cacheable, and then use MTRRs specifically to set video memory to write-combine.
If you need write-through or non-cacheable, you can get those by setting the relevant bits in page tables, so you only ever need MTRRs for write-combine.
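As a rough sketch of the page-table side, assuming 32-bit non-PAE page table entries for 4 KB pages and the default PAT layout: bit 3 (PWT) selects write-through and bit 4 (PCD) disables caching for that page. The helper names here are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Cache-control bits in an x86 page table entry (4 KB pages). */
#define PTE_PRESENT (1u << 0)
#define PTE_PWT     (1u << 3)  /* Page-level Write-Through */
#define PTE_PCD     (1u << 4)  /* Page-level Cache Disable */

/* Hypothetical helper: return the PTE changed to write-through caching. */
static inline uint32_t pte_make_write_through(uint32_t pte)
{
    assert(pte & PTE_PRESENT);          /* only meaningful for mapped pages */
    return (pte | PTE_PWT) & ~PTE_PCD;  /* PWT = 1, PCD = 0 */
}

/* Hypothetical helper: return the PTE changed to uncacheable. */
static inline uint32_t pte_make_uncacheable(uint32_t pte)
{
    assert(pte & PTE_PRESENT);
    return pte | PTE_PCD | PTE_PWT;     /* PCD = 1, PWT = 1 */
}
```

After changing an entry that is already in use, the old translation still has to be flushed from the TLB (for example with invlpg) before the new type takes effect.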
Posted: Thu Apr 05, 2007 3:28 am
by Jeko
mystran wrote:What one typically does is have the memory be write-back cacheable, and then use MTRRs specifically to set video memory to write-combine.
If you need write-through or non-cacheable, you can get those by setting the relevant bits in page tables, so you only ever need MTRRs for write-combine.
How can I enable write-back for all of memory and write-combining for only part of it (video memory)?
Posted: Thu Apr 05, 2007 8:15 am
by ~
For video memory, you could read the PCI base address registers of the (S)VGA display adapter and map that memory as write-combine. The code above shows how to do it (the type is set in a 3-bit field); you just need to adjust the base address. Then it is only a matter of finding the size of the video buffer.
In the same fashion, you can set a range to write-back in another MTRR (you have to adapt that code to run several times, scanning all 8 of the MTRRs used here) by putting the value 110b (6) in the type field of a free MTRR.
Note that the code says bits 0-1 for the type, but the field is actually bits 0-7, of which only bits 0-2 are used (bits 3-7 are said to be reserved, at least in the AMD manual I have).
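A minimal C sketch of how the PhysBase value is put together, assuming the usual type encodings (0 = uncacheable, 1 = write-combining, 4 = write-through, 5 = write-protect, 6 = write-back). The enum and function names are only illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Memory type encodings for the low byte of an MTRRphysBase register. */
enum mtrr_type {
    MTRR_UC = 0, /* uncacheable */
    MTRR_WC = 1, /* write-combining */
    MTRR_WT = 4, /* write-through */
    MTRR_WP = 5, /* write-protect */
    MTRR_WB = 6  /* write-back */
};

/* Build the 64-bit MTRRphysBase value: base address in bits 12 and up,
 * memory type in bits 0-7 (bits 8-11 stay zero). */
static inline uint64_t mtrr_phys_base(uint64_t base, enum mtrr_type type)
{
    assert((base & 0xFFFu) == 0); /* the base must be 4 KB aligned */
    return base | (uint64_t)type;
}
```

The low 32 bits of the result go in EAX and the high 32 bits in EDX before the wrmsr.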
Posted: Thu Apr 05, 2007 8:44 am
by Jeko
~ wrote:For video memory, you could read the PCI base address registers of the (S)VGA display adapter and map that memory as write-combine. The code above shows how to do it (the type is set in a 3-bit field); you just need to adjust the base address. Then it is only a matter of finding the size of the video buffer.
For example, if I want to enable write-combining for the addresses from 0xC00000000 to 0xC001D4C00, how can I do it?
Posted: Thu Apr 05, 2007 9:30 am
by Brendan
Hi,
Some random MTRR notes...
The BIOS configures the MTRRs during boot, so all usable RAM should already be set to the write-back caching type.
There are 2 different types of MTRRs and a "default". The default determines what caching type is used for anything that isn't covered by anything else (and Intel recommends it's set to "uncacheable"). Then there are the fixed range registers that cover everything from 0x00000000 to 0x000FFFFF.
Lastly, there may or may not be any variable range registers. These can be used to specify the caching type for arbitrary address ranges. To determine how many variable range registers there are you need to check the VCNT field (lowest 8 bits) of the MTRRCAP MSR.
For Intel P6 to Pentium 4 there are always 8 variable range registers, but I don't know about more modern Intel CPUs (e.g. Core) or CPUs from other manufacturers. In any case it's better to use the VCNT field so that if anyone makes a CPU with 4 or 16 variable range registers your OS will already support it.
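A small C sketch of decoding MTRRCAP, assuming the value has already been read with rdmsr from MSR 0xFE and that variable range pair n lives at MSRs 0x200 + 2n (PhysBase) and 0x201 + 2n (PhysMask). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_MTRRCAP 0xFEu /* MTRR capability register */

/* Number of variable range register pairs: VCNT, bits 0-7 of MTRRCAP. */
static inline unsigned mtrr_vcnt(uint64_t mtrrcap)
{
    return (unsigned)(mtrrcap & 0xFFu);
}

/* Nonzero if the write-combining type is supported (bit 10 of MTRRCAP). */
static inline int mtrr_has_wc(uint64_t mtrrcap)
{
    return (int)((mtrrcap >> 10) & 1u);
}

/* MSR address of the PhysBase register of variable pair n. */
static inline uint32_t mtrr_physbase_msr(unsigned n, unsigned vcnt)
{
    assert(n < vcnt); /* never touch pairs the CPU does not report */
    return 0x200u + 2u * n;
}

/* MSR address of the PhysMask register of variable pair n. */
static inline uint32_t mtrr_physmask_msr(unsigned n, unsigned vcnt)
{
    assert(n < vcnt);
    return 0x201u + 2u * n;
}
```

Looping n from 0 to mtrr_vcnt() - 1 replaces a hard-coded upper bound such as 0x20F.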
For the variable range registers there's some specific requirements for alignment and size masks, conditions for overlapping regions, and rules that determine the effective caching type once the CPU considers things like the cache control flags in the page table entry.
Also don't forget that you shouldn't change any MTRR without disabling the CPU's caches (including the "global" flag in CR4) and flushing the caches beforehand (even on single-CPU systems). For multi-CPU systems this can be a bit harder, as all CPUs must disable caching and flush their caches, then all CPUs must load the new MTRRs at the same time before re-enabling caches. Failure to flush caches correctly can result in "undefined" behaviour (e.g. stale data left in caches).
Cheers,
Brendan
Posted: Thu Apr 05, 2007 10:21 am
by ~
I haven't made "complex" calculations, but I guess the following should work (that's what I do):
Code:
;Set Base Address in the register of the found pair:
;;
mov edx,0x00000000C ;[0000000C:
xor eax,eax ; 00000000]
inc eax
;Bits 0-1: Write Combine Memory Mode by MTRR (01b)
;;
wrmsr ;{EDX:EAX} == 0x0000000C00000001
;Set up to 48 bits of PhysMask MTRR register pair for this full range:
;;
inc ecx ;Point to the PhysMask register of this MTRR pair
mov edx,0x000FFFFF
mov eax,0xFFE2B000
wrmsr
Note that I have only set the base address plus a mask, so that any address in the range after the base address is affected.
This is from the AMD64 Architecture Programmer's Manual Volume 2, System Programming, page 224:
PhysMask and PhysBase are used together to determine whether a target physical-address falls within the specified address range. PhysMask is logically ANDed with PhysBase and separately ANDed with the upper 40 bits of the target physical address. If the result of the two operations are identical, the target physical address falls within the specified memory range. The pseudo-code for the operation is:
Code:
MaskBase = PhysMask AND PhysBase
MaskTarget = PhysMask AND Target_Address[51:12]
if MaskBase = MaskTarget
then Target_Address_In_Range
else Target_Address_Not_In_Range
PhysMask and PhysBase Values. Software can calculate the PhysMask value using the following procedure:
1. Subtract the memory-range physical base-address from the upper physical-address of the memory range.
2. Subtract the value calculated in Step 1 from the physical memory size.
3. Truncate the lower 12 bits of the result in Step 2 to create the PhysMask value to be loaded into the MTRRphysMaskn register. Truncation is performed by right-shifting the value 12 bits.
For example, assume a 32-Mbyte memory range is specified within the 52-bit physical address space, starting at address 200_0000h. The upper address of the range is 3FF_FFFFh. Following the process outlined above yields:
1. 3FF_FFFFh-200_0000h = 1FF_FFFFh
2. F_FFFF_FFFF_FFFFh-1FF_FFFFh = F_FFFF_FE00_0000h
3. Right shift (F_FFFF_FE00_0000h) by 12 = FF_FFFF_E000h
In this example, the 40-bit value loaded into the PhysMask field is FF_FFFF_E000h.
Software must also truncate the lower 12 bits of the physical base address before loading it into the PhysBase field. In the example above, the 40-bit PhysBase field is 00_0000_2000h.
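The procedure and the range test quoted above can be sketched in C as follows, assuming the 52-bit physical address space used in the manual's example. The function names are made up, and the values are the PhysMask and PhysBase fields (address bits 51:12), not the full register contents.

```c
#include <assert.h>
#include <stdint.h>

/* Highest address of a 52-bit physical address space, as in the example. */
#define PHYS_ADDR_MAX 0xFFFFFFFFFFFFFull

/* Steps 1-3 of the quoted procedure: returns the PhysMask field. */
static inline uint64_t physmask_field(uint64_t base, uint64_t upper)
{
    assert(upper >= base);
    uint64_t span = upper - base;         /* step 1 */
    uint64_t mask = PHYS_ADDR_MAX - span; /* step 2 */
    return mask >> 12;                    /* step 3: drop the low 12 bits */
}

/* The quoted range test: compare base and target under the mask. */
static inline int mtrr_in_range(uint64_t mask_field, uint64_t base_field,
                                uint64_t target)
{
    return (mask_field & base_field) == (mask_field & (target >> 12));
}
```

For the 32-Mbyte example this gives a mask field of FF_FFFF_E000h; with a base field of 2000h, address 2AB_CDEFh tests as inside the range and 400_0000h as outside.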
For our needs:
1. 0xC001D4C00-0xC00000000 = 0x1D4C00
2. 0xFFFFFFFFFFFFF-0x1D4C00 = 0xFFFFFFFE2B3FF
3. 0xFFFFFFFE2B3FF >> 12 = 0xFFFFFFFE2B
I hope the preceding explanation is accurate and error-free. It seems so from the calculation I've tried to do. I suspect that all values should be forced to be aligned to 4 KB, since that's what the manual says, but I think that happens when right-shifting by 12 bits before loading the value into the proper field.
Posted: Thu Apr 05, 2007 10:34 am
by Combuster
I think there are two assumptions to this algorithm:
1) the area is a power of two in size
2) the base is a multiple of its size (i.e. the area is aligned to a boundary equal to its size)
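A minimal C sketch of checking those two assumptions and of building a mask for a range that meets them. The names are illustrative, phys_bits is assumed to be the CPU's physical address width (as reported by CPUID leaf 0x80000008), and bit 11 of the mask register is the Valid bit.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if size is a power of two (at least 4 KB) and base is a
 * multiple of size. */
static inline int mtrr_range_ok(uint64_t base, uint64_t size)
{
    return size >= 0x1000u
        && (size & (size - 1)) == 0
        && (base & (size - 1)) == 0;
}

/* Smallest power of two that is >= size and >= 4 KB. */
static inline uint64_t round_up_pow2(uint64_t size)
{
    assert(size <= (1ull << 52)); /* keeps the loop from overflowing */
    uint64_t p = 0x1000u;
    while (p < size)
        p <<= 1;
    return p;
}

/* Full MTRRphysMask register value for a power-of-two size. */
static inline uint64_t mtrr_phys_mask(uint64_t size, unsigned phys_bits)
{
    assert(size >= 0x1000u && (size & (size - 1)) == 0);
    assert(phys_bits >= 32 && phys_bits <= 52);
    uint64_t addr_mask = (1ull << phys_bits) - 1; /* implemented address bits */
    return (~(size - 1) & addr_mask) | (1ull << 11); /* mask + Valid bit */
}
```

With these helpers, the range 0xC00000000 to 0xC001D4C00 (size 0x1D4C00) fails the check. Rounding the size up to 0x200000 (2 MB) gives a range that passes, since 0xC00000000 is 2 MB aligned, at the cost of also covering the addresses up to 0xC001FFFFF.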