Unable to mark a memory region as WC using MTRRs
Posted: Sat Sep 01, 2018 8:47 am
Hi everyone,
Some time ago I added framebuffer support to my kernel. I noticed back then that its performance was not good enough and, after some research, it turned out that the problem was that the framebuffer's memory region was not marked as write-combining (WC).
Now I have finally decided to solve this problem using x86's MTRRs or PAT. After studying how to do that, I added a set of functions to control the variable MTRRs and made the framebuffer's memory region WC by adding a new MTRR in the first available slot.
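For reference, here is a minimal sketch of how the two MSR values for a variable MTRR could be computed. It is not my actual kernel code: the names mtrr_pair and mtrr_encode_range are made up for illustration, and the physical address width (phys_bits) is assumed to come from CPUID leaf 0x80000008. Writing the values to the IA32_MTRR_PHYSBASEn / IA32_MTRR_PHYSMASKn pair with wrmsr is left out.

```c
#include <stdint.h>
#include <stdbool.h>

#define MTRR_TYPE_WC     0x01u          /* write-combining memory type encoding */
#define MTRR_MASK_VALID  (1ull << 11)   /* V (valid) bit in IA32_MTRR_PHYSMASKn */

/* Hypothetical container for the two MSR values of one variable MTRR. */
struct mtrr_pair {
    uint64_t physbase;  /* value for IA32_MTRR_PHYSBASEn */
    uint64_t physmask;  /* value for IA32_MTRR_PHYSMASKn */
};

/*
 * Compute the MSR values for a range [base, base + size).
 * A single variable MTRR can only describe a power-of-two sized range
 * (at least 4 KiB) whose base is aligned to its size.
 * phys_bits is the CPU's physical address width (assumed known).
 * Returns false if the range cannot be described by one MTRR.
 */
static bool mtrr_encode_range(uint64_t base, uint64_t size, uint8_t type,
                              unsigned phys_bits, struct mtrr_pair *out)
{
    if (size < 4096 || (size & (size - 1)) != 0)
        return false;                       /* not a power of two */
    if (base & (size - 1))
        return false;                       /* base not aligned to size */
    if (phys_bits < 32 || phys_bits > 52)
        return false;                       /* implausible address width */

    /* Address bits 12 .. phys_bits-1 are the ones the MSRs can hold. */
    uint64_t addr_mask = ((1ull << phys_bits) - 1) & ~0xFFFull;
    if (base & ~addr_mask)
        return false;                       /* base beyond addressable range */

    out->physbase = base | type;
    out->physmask = (~(size - 1) & addr_mask) | MTRR_MASK_VALID;
    return true;
}
```

As an example, a 16 MiB WC range at 0x90000000 on a CPU with 39 physical address bits (both numbers are assumptions, not measured values) would give PHYSBASE = 0x90000001 and PHYSMASK = 0x7FFF000800.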
On my UDOO x86 (https://www.udoo.org/udoo-x86/) it worked great and I achieved a performance of about 1.7 cycles / pixel, which is about 10x faster than what I was able to achieve before using MTRRs.
Note: I had actually cheated a little before working with MTRRs, because I used SIMD instructions (SSE, AVX2) when available in order to increase the performance [the larger the register, the better]. I mention this only as evidence that, at least on some hardware, the new MTRR-related code has a real and tangible effect.
Now, my (apparently unsolvable) problem is with my Dell XPS 13" 9360: the kernel runs "well" but the new MTRR entry has no effect. The performance is terrible: about 250 cycles/pixel. After some debugging, it turned out that the new MTRR entry is overridden by MTRR entry 0, which marks the memory from +2 GB to +4 GB as UC (uncacheable). According to Intel's manual, when MTRR ranges overlap, UC takes precedence over every other memory type.
Note: the framebuffer's address on that machine is: 0x90000000 [+2.25 GB].
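To make the overlap concrete, here is a rough sketch of how the effective memory type for an address could be resolved from the variable MTRRs, following the precedence rules as I understand them from Intel's manual (UC wins; WT wins over WB; any other mix of different types is undefined). The names var_mtrr, mtrr_matches and mtrr_effective_type are hypothetical, and the sketch ignores the fixed-range MTRRs and the enable flags in IA32_MTRR_DEF_TYPE.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* MTRR memory type encodings, plus two sentinel values used only here. */
enum {
    MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6,
    MT_UNDEFINED = -1,  /* overlapping types with no defined result */
    MT_NONE = -2        /* no variable MTRR matched yet */
};

/* Raw MSR values of one variable MTRR (hypothetical struct). */
struct var_mtrr {
    uint64_t physbase;  /* IA32_MTRR_PHYSBASEn */
    uint64_t physmask;  /* IA32_MTRR_PHYSMASKn */
};

/* An address matches when (addr & mask) == (base & mask) and V is set. */
static bool mtrr_matches(const struct var_mtrr *m, uint64_t addr)
{
    if (!(m->physmask & (1ull << 11)))
        return false;                        /* entry not valid */
    uint64_t mask = m->physmask & ~0xFFFull; /* drop V bit and reserved bits */
    return (addr & mask) == (m->physbase & mask);
}

/* Resolve the type for addr; default_type applies when nothing matches. */
static int mtrr_effective_type(const struct var_mtrr *v, size_t n,
                               uint64_t addr, int default_type)
{
    int result = MT_NONE;
    for (size_t i = 0; i < n; i++) {
        if (!mtrr_matches(&v[i], addr))
            continue;
        int t = (int)(v[i].physbase & 0xFF);
        if (t == MT_UC)
            return MT_UC;                    /* UC overrides everything */
        if (result == MT_NONE || result == t)
            result = t;
        else if ((result == MT_WT && t == MT_WB) ||
                 (result == MT_WB && t == MT_WT))
            result = MT_WT;                  /* WT wins over WB */
        else
            result = MT_UNDEFINED;           /* e.g. WC overlapping WB */
    }
    return result == MT_NONE ? default_type : result;
}
```

With entry 0 covering 0x80000000-0xFFFFFFFF as UC and a new WC entry at 0x90000000 (the 16 MiB size and 39-bit mask values in my test data are assumptions), the function returns UC for 0x90000000, which matches what I observe on the XPS.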
My first question is:
What's the point of having such a big region of the physical memory marked as uncacheable?
My machine has 16 GB of physical memory, so I'd expect that region to be a perfectly regular (WB) part of the RAM. Usually the memory regions used for memory-mapped I/O are much smaller, a few MBs at most (like other MTRRs, on the same machine). There must be something I'm missing in the big picture.
My second question is:
So what could I do to make the framebuffer's memory WC? I tried to just invalidate MTRR entry 0, hopefully in the proper way, as described in Intel's System Programming Guide, Section 11.11.7.2, but it did not work. The screen just froze while trying to re-enable the MTRRs after the change [I implemented pre_mtrr_change() and post_mtrr_change() as described in Intel's documentation].
If necessary, I can copy-paste my code here, but the theoretical question is: am I allowed to do that [remove an MTRR set by the firmware] in general? If not, what am I supposed to do? I believe there must be some kind of solution to this issue, since Linux's framebuffer on the same machine is as fast as expected.
Please tell me that there are solutions other than "surrendering" and dealing with the IOMMU in order to remap the framebuffer.
Thanks a lot for the help, guys,
Vlad