MMIO read/write latency

u9012063
Member
Posts: 26
Joined: Mon Jan 23, 2012 5:00 am
Location: Stony Brook University | ITRI

Post by u9012063 »

Hi,

I found my MMIO read/write latency is unreasonably high. I hope someone could give me some suggestions.

In Linux kernel space, I wrote a simple program that reads a 4-byte value from a PCIe device's BAR0 region. The device is an Intel 10G PCIe NIC plugged into a PCIe x16 slot on my Xeon E5 server. I use rdtsc to measure the time between the beginning and the end of the MMIO read. The code looks like this:

Code:

vaddr = ioremap_nocache(0xf8000000, 128); /* 0xf8000000 is the device's BAR0 address */
rdtscl(init);        /* start timestamp (low 32 bits of the TSC) */
ret = readl(vaddr);  /* 4-byte uncached MMIO read */
rmb();               /* read barrier before taking the end timestamp */
rdtscl(end);         /* end timestamp */

I expected the elapsed time (end - init) to be less than 1 us; after all, traversing the PCIe data link should take only a few hundred nanoseconds. However, my tests show at least 5.5 us for one MMIO read from the PCIe device. I'm wondering whether this is reasonable. I changed my code to remove the memory barrier (rmb), but I still get around 5 us of latency.

Code:

rdtscl(init);
ret = readl(vaddr);  /* keep the value read separate from the end timestamp */
rdtscl(end);

This paper discusses PCIe latency measurement; the latency is usually less than 1 us:
http://www.cl.cam.ac.uk/~awm22/.../mill ... ating.pdf

Do I need any special kernel or device configuration to get lower MMIO access latency? Or does anyone have experience with this?

Thanks!
William
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: MMIO read/write latency

Post by Owen »

rdtscp causes a pipeline stall and adds significant overhead. Repeat the read thousands of times between measurements in order to average it out.

Otherwise, yes: uncached reads are slow.

Re: MMIO read/write latency

Post by u9012063 »

I did more experiments, as below:

1 MMIO PCIe read: 4.12 us
10 MMIO PCIe reads: 9.72 us
100 MMIO PCIe reads: 69 us
1000 MMIO PCIe reads: 674 us --> about 0.67 us per read

Does this mean MMIO reads can be batched and flushed to the device? Or is it the overhead of rdtsc that makes a single MMIO PCIe read take so long?

Regards,
William
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: MMIO read/write latency

Post by Brendan »

Hi,
u9012063 wrote: I did more experiments, as below:

1 MMIO PCIe read: 4.12 us
10 MMIO PCIe reads: 9.72 us
100 MMIO PCIe reads: 69 us
1000 MMIO PCIe reads: 674 us --> about 0.67 us per read

If ten reads cost 9.72 us, then you can assume that "overhead plus one read" costs 4.12 us and the remaining nine reads cost a total of 5.6 us, or about 0.622 us each. If "overhead plus one read" costs 4.12 us and one read costs 0.622 us, then the overhead is 4.12 - 0.622 = 3.498 us.

Based on this you'd expect 1000 reads to cost 3.498 + 1000*0.622 us, which works out to a total of 625.498 us. This is close enough to what you actually measured.

So, where does the 3.5 us of overhead come from (given that it has nothing to do with the read itself)? My guess is that the first read causes a TLB miss, "rdtscl()" does more than you imagine (e.g. maybe there's function call overheads, the raw result is scaled, etc), there's loop setup costs, etc.

In the same way, I wouldn't assume that "readl();" doesn't add overhead of its own. For example; the actual read might take 0.5 us and you might be spending 0.122 us on function call overhead, parameter sanity checks, etc.
u9012063 wrote: Does this mean MMIO can be batched and flushed to the device?

No. It just means you have no idea what you're actually measuring (until/unless you examine the assembly and prevent external factors like TLB misses).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: MMIO read/write latency

Post by u9012063 »

Oh, I got it! Thank you.