Hi folks,
I have read everywhere that MMIO regions should be uncachable (UC), but I don't know exactly why. From the memory cache control chapter of the Intel IA-32 Software Developer's Manual, Volume 3A, I understand that the cache coherence protocol (MESI) works by each processor snooping the other processors' memory accesses, and that this snooping keeps the processors' internal caches consistent with system memory.
From that description, the processor should not care about the source of the content it is caching; whether it comes from DRAM or from MMIO should make no difference to the snooping operation. In other words, if processor A writes to an MMIO region, processor B is aware of this and the MESI protocol should still be effective, for example by setting B's cache line to the Invalid state.
I found some discussions saying that the reason is ordering[1]: MMIO is mostly used as the configuration space of a device, so ordering must be enforced. But what if I want to treat the MMIO region as memory storage and I don't care about out-of-order delivery? For example, a video frame buffer is usually set to UC or WC, but definitely not write-back.
Given that MESI works fine among processors, is it because the cache controller can only read or write a cache line from/to DRAM, and not from/to a device's memory[2]? Is the cache controller capable of flushing a cache line into a device's MMIO region, rather than flushing it to DRAM as usual?
Thanks!
William
Reference:
1. http://stackoverflow.com/questions/4698 ... ction-stil
2. http://lkml.indiana.edu/hypermail/linux ... /0762.html
Why should MMIO region be uncachable (UC)?
Re: Why should MMIO region be uncachable (UC)?
When you read the contents of an MMIO register, the value might have been set either by the device or by one of the processors writing to it. So the processors talking to each other to ensure cache coherency is not sufficient, because events external to the processors (i.e. data coming from the device) won't be accounted for.
Looking at it the other way round, if caching were enabled and you wrote to an MMIO register, the write might stay in the cache and not reach the device until some undefined point in the future when the cache line is flushed.
[Disclaimer: I haven't actually personally programmed MMIO yet...]
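To make the stale-read problem concrete, here is a minimal toy simulation in Python. It is only a sketch under stated assumptions: `Device`, `CachedCpu`, `UncachedCpu` and `poll_before_and_after` are hypothetical names invented for illustration, and the "cache" is a single remembered value rather than a model of real hardware.

```python
class Device:
    """Toy device with a status register that the device itself updates."""

    def __init__(self):
        self.status = 0
        self.bus_reads = 0  # how many reads actually reached the device

    def read_status(self):
        self.bus_reads += 1
        return self.status

    def finish_job(self):
        # The device changes its own register. No processor wrote anything,
        # so there is no processor access for another CPU to snoop.
        self.status = 1


class CachedCpu:
    """CPU that keeps a cached copy of the register after the first read."""

    def __init__(self, device):
        self.device = device
        self.cached = None

    def read_status(self):
        if self.cached is None:
            self.cached = self.device.read_status()
        return self.cached


class UncachedCpu:
    """CPU for which every read goes out to the device (UC behaviour)."""

    def __init__(self, device):
        self.device = device

    def read_status(self):
        return self.device.read_status()


def poll_before_and_after(cpu_type):
    device = Device()
    cpu = cpu_type(device)
    before = cpu.read_status()
    device.finish_job()
    after = cpu.read_status()
    return before, after, device.bus_reads


cached_result = poll_before_and_after(CachedCpu)
uncached_result = poll_before_and_after(UncachedCpu)
print("cached:  ", cached_result)
print("uncached:", uncached_result)
```

In this toy model the cached CPU never sees the status change, because its second read never reaches the device.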
Re: Why should MMIO region be uncachable (UC)?
Hi,
u9012063 wrote: I knew from EVERYWHERE that the MMIO region should be uncachable (UC), but I don't exactly know the reason.
Imagine a PCI device that waters your garden. It might have one memory mapped register, where writing to the register causes the garden to be watered (e.g. if you write 123 it will turn the water on for 123 minutes), and if you read from the register it might tell the device to drown your chickens.
Now imagine that you enable write-back caching and write the value 123 to the device's register. The CPU gets a cache miss, so it reads the entire cache line (and the device sees the read and drowns your chickens), then the CPU writes the value 123 into the cache (but doesn't write to the device itself, so the device doesn't water your garden). Basically; caching breaks everything and prevents the device from working as designed.
Cheers,
Brendan
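The garden example can be sketched as a toy simulation in Python. Everything here is an assumption made for illustration: `GardenDevice`, `WriteBackCache`, `uncached_store` and the 64-byte line size are hypothetical, and the cache holds exactly one line with a write-allocate policy.

```python
LINE_SIZE = 64  # bytes; assumed cache line size


class GardenDevice:
    """Toy device: any read drowns the chickens, any write waters the garden."""

    def __init__(self):
        self.log = []

    def bus_read(self, size):
        self.log.append(f"read of {size} bytes: chickens drowned")
        return bytearray(size)

    def bus_write(self, size, value):
        self.log.append(f"write of {size} bytes: watering for {value} minutes")


def uncached_store(device, value):
    # UC mapping: the 4-byte store goes straight to the device.
    device.bus_write(4, value)


class WriteBackCache:
    """One-line write-back, write-allocate cache in front of the device."""

    def __init__(self, device):
        self.device = device
        self.line = None
        self.dirty = False

    def store(self, value):
        if self.line is None:
            # Cache miss: the whole line is read from the device first.
            self.line = self.device.bus_read(LINE_SIZE)
        # The store only modifies the cached copy.
        self.line[0:4] = value.to_bytes(4, "little")
        self.dirty = True

    def flush(self):
        # Only now does the data travel to the device, as a whole line.
        if self.dirty:
            value = int.from_bytes(self.line[0:4], "little")
            self.device.bus_write(LINE_SIZE, value)
            self.dirty = False


uc_device = GardenDevice()
uncached_store(uc_device, 123)

wb_device = GardenDevice()
cache = WriteBackCache(wb_device)
cache.store(123)
log_before_flush = list(wb_device.log)
cache.flush()

print(uc_device.log)
print(log_before_flush)
print(wb_device.log)
```

With the uncached store the device sees exactly one 4-byte write. With the toy write-back cache it sees a 64-byte read first, and no write at all until the line is eventually flushed.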
Re: Why should MMIO region be uncachable (UC)?
Dear all,
Thanks for the reply!
Let's assume the MMIO region is meant for storage, not for device configuration, and that only the processors are allowed to access it, so neither the device itself nor any other device will touch it. We can think of it as DRAM that sits on a PCIe device instead of on the memory bus and is meant only for the processors to access.
Like Brendan said, I assume that with write-back enabled, when I write the value 123 to the device's register, the CPU gets a cache miss, so it reads the entire cache line, and then the CPU writes the value 123 into the cache. --> This is exactly how I want the system to behave.
I did the following experiment to test it:
On an Intel Xeon system, I picked an unused device, unloaded everything related to it, disabled it, and set its BAR0 (a 128K range) to be cacheable using /proc/mtrr. The system reboots immediately when a program writes anything to that 128K cacheable range, whereas with the range set as non-cacheable the system works fine.
So I'm guessing that the reason for UC MMIO is not data inconsistency, contention, or device confusion, but that the architecture is incapable of doing this, because the cache controller lacks support for flushing to an MMIO region. However, I have not found any document that describes this limitation.
Regards,
William
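For readers who want to see what such a /proc/mtrr request looks like, here is a small Python sketch that only builds the text line. The `base=... size=... type=...` format is the one I understand the Linux MTRR documentation to describe for writes to /proc/mtrr. The BAR base address `0xf7e00000` is a made-up placeholder, not a value from the experiment above, and `mtrr_request` is a hypothetical helper.

```python
# Memory type names that, as far as I know, /proc/mtrr accepts.
VALID_TYPES = {
    "uncachable",
    "write-combining",
    "write-through",
    "write-protect",
    "write-back",
}


def mtrr_request(base, size, mem_type):
    """Build the text line for a new MTRR region (does not write anything)."""
    if mem_type not in VALID_TYPES:
        raise ValueError(f"unknown MTRR type: {mem_type}")
    return f"base=0x{base:x} size=0x{size:x} type={mem_type}\n"


# Hypothetical BAR0 address; 128K region as in the experiment.
request = mtrr_request(0xF7E00000, 128 * 1024, "write-back")
print(request, end="")

# Actually applying it would need root and would look roughly like:
#     with open("/proc/mtrr", "w") as f:
#         f.write(request)
# That step is left out here on purpose, given the reboot described above.
```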
Re: Why should MMIO region be uncachable (UC)?
The main reason is as Brendan elucidated: if the region is mapped cacheable, you have no idea when your writes are going to arrive at the device, or how (for all you know, the device locks up when a 16-byte read reaches it).
For buffers in device memory, such as the video buffer, the main reasons Write Combining is used are:
- To ensure that the data reaches the device promptly without needing to flush the caches. Flushing the caches is very expensive (tens of cycles per cache line if using CLFLUSH; thousands of cycles minimum if you're doing a WBINVD, more likely hundreds of thousands if there is much data in the cache).
- For performance reasons. If your screen buffer is mapped WB/WT (write-back/write-through) and you copy a full screen buffer to it, on a system with a 1080p display that is 8MB. My just-over-a-year-old machine has an Intel Sandy Bridge i5-2500K with 6MB of cache in total. In other words, copying that screen buffer completely blows away everything else in your cache (possibly twice: once for the source, once for the destination).
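The framebuffer figure above is easy to check with a few lines of Python. The 4 bytes per pixel (32-bit colour) is an assumption; the post only gives the 1080p resolution, the roughly 8MB total, and the 6MB cache.

```python
WIDTH, HEIGHT = 1920, 1080          # 1080p
BYTES_PER_PIXEL = 4                 # assumption: 32-bit colour
CACHE_BYTES = 6 * 1024 * 1024       # 6MB of cache, as quoted for the i5-2500K

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
frame_mib = frame_bytes / 2**20
fits_in_cache = frame_bytes <= CACHE_BYTES

print(f"frame buffer: {frame_bytes} bytes ({frame_mib:.2f} MiB)")
print(f"fits in cache: {fits_in_cache}")
```

Under that assumption one frame is larger than the whole cache, so a cacheable copy of a single frame would already evict everything else.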
As for your experiment, what most likely happened is:
- You wrote into the BAR
- The processor realised it didn't have that cache line in its cache
- The processor did a burst read (i.e. a whole cache line, ~64 bytes) from the BAR
- The device, not supporting burst reads (a UC access will never trigger a burst read/write), entered an unsupported state-machine state and locked up or started doing other nonconformant things
- The controller detected that the PCI Express lane had experienced a major error, and raised the system RESET line
- The system rebooted
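The difference in access size between a UC store and a cacheable store can be sketched in Python as follows. This is an illustration only: the store address, the 64-byte line size, and the idea of a device that accepts at most 4-byte accesses are all assumptions, and `line_fill_range` and `device_accepts` are hypothetical helpers.

```python
LINE_SIZE = 64  # bytes; assumed cache line size


def line_fill_range(addr, line_size=LINE_SIZE):
    """Return (base, length) of the line the CPU fetches on a cache miss."""
    base = addr & ~(line_size - 1)  # round down to the line boundary
    return base, line_size


def device_accepts(access_size, max_access=4):
    """Toy rule: the device only handles accesses up to max_access bytes."""
    return access_size <= max_access


store_addr = 0xF7E00024  # hypothetical address inside the BAR

# UC mapping: the bus sees exactly the 4 bytes that were stored.
uc_access = (store_addr, 4)

# Cacheable mapping: the miss triggers a read of the whole surrounding line.
wb_access = line_fill_range(store_addr)

print(f"UC: 0x{uc_access[0]:x}, {uc_access[1]} bytes, "
      f"accepted={device_accepts(uc_access[1])}")
print(f"WB: 0x{wb_access[0]:x}, {wb_access[1]} bytes, "
      f"accepted={device_accepts(wb_access[1])}")
```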
Re: Why should MMIO region be uncachable (UC)?
Hi Owen,
Thanks a lot for the explanation.
You mean that the processors do not issue ordinary x86 instructions such as store/load/move when the target address is cacheable? So there are special burst read/write instructions designed specifically for cache line access, and they work only on the DRAM address range and not on a device?
I'm googling the burst read/write instruction. Do you mean something like SIMD (http://en.wikipedia.org/wiki/SIMD) or Intel's SSE?
Regards,
William
Re: Why should MMIO region be uncachable (UC)?
u9012063 wrote: I'm googling about the burst read/write instruction, do you mean something like SIMD (http://en.wikipedia.org/wiki/SIMD) or Intel's SSE?
No, it's the low-level protocol that the CPU uses for bus operations. It has nothing to do with the CPU instruction set.
Re: Why should MMIO region be uncachable (UC)?
Thanks for everyone's explanations! Much appreciated!
William
btw, I found a blog post about cached MMIO.
http://blogs.utexas.edu/jdm4372/2013/05/29/
Basically, the author claims that it is possible under some limitations.