Questions on memory access time test result

Posted: Wed Apr 29, 2009 1:51 am
by crasher
I ran two tests: one on PCI config space access and one on normal PC memory access. The memory block size is 0x3000 bytes in both tests, and the sample count is 1000.

The PCI memory read time is around 4.5 milliseconds (jitter about 200 microseconds); the write time is around 0.4 milliseconds (jitter about 20 microseconds).
Q1. Why is there such a huge difference between the read and write times?
Is it because of a device sleep state or some other fancy power management?

Normal PC memory access takes 7.3 microseconds per memcpy. The jitter is very small (0.04 microseconds), except for the first iteration, which takes around 8.6 microseconds. The result is the same if I execute an "invlpg" instruction in each iteration.
The problem seems to be only in the memory copy itself: if I call memcpy(buf2, buf, 0x3000) once before the "for" loop, all the recorded values are around 7.3 us.
Q2. What causes the first iteration to take so much longer?


Below is my code.

buf and buf2 are allocated from the heap.
v_bar0 is the address of my PCI card.

Test 1:

Code:

for ( i = 0; i < 1000; i++ )
{
	timer_start();
	memcpy(buf, v_bar0, 0x3000); //read from pci config space
	timer_stop();
	diff[i][0] = timer_diff();
}
for ( i = 0; i < 1000; i++ )
{
	timer_start();
	memcpy(v_bar0, buf, 0x3000); //write to pci config space
	timer_stop();
	diff[i][1] = timer_diff();
}
Test 2:

Code:

for ( i = 0; i < 1000; i++ )
{
	timer_start();
	memcpy(buf2, buf, 0x3000); //normal memory copy
	timer_stop();
	diff[i][2] = timer_diff();
}

Re: Questions on memory access time test result

Posted: Wed Apr 29, 2009 6:16 am
by johnsa
Q2: Probably related to priming the TLB and the cache. The first iteration is not cached; thereafter it is.
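
One way to check the warm-up explanation is to time each copy separately and compare iteration 0 with the average of the rest. The following is only a minimal sketch: it assumes a POSIX system and uses clock_gettime() as a stand-in for the timer_start()/timer_stop() helpers from the first post, which are not shown in the thread. The names now_us, time_copies and mean_from are made up for this example.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BLOCK_SIZE 0x3000  /* same block size as in the tests above */

/* Monotonic timestamp in microseconds.
 * Assumption: this replaces the poster's own timer_* helpers. */
static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Time n memcpy calls one by one; us[i] receives the duration of call i. */
static void time_copies(void *dst, const void *src, size_t len,
                        double *us, int n)
{
    assert(dst && src && us && n > 0);
    for (int i = 0; i < n; i++) {
        double t0 = now_us();
        memcpy(dst, src, len);
        us[i] = now_us() - t0;
    }
}

/* Average of us[from..n-1], e.g. from = 1 to skip the cold first iteration. */
static double mean_from(const double *us, int from, int n)
{
    assert(from >= 0 && from < n);
    double sum = 0.0;
    for (int i = from; i < n; i++)
        sum += us[i];
    return sum / (n - from);
}
```

Calling time_copies(buf2, buf, BLOCK_SIZE, us, 1000) and then comparing us[0] against mean_from(us, 1, 1000) should show whether only the first copy is slow. If one untimed memcpy before the loop makes us[0] drop to the same level as the rest, that is consistent with the cold cache/TLB explanation, although it does not prove which of the two dominates.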

Q1: The PCI config space probably has to ensure that the data is available and valid before a read can complete (coherency and so on). In addition, the mapped space (assuming you're using the MCFG enhanced configuration mechanism) will have a different caching policy from normal memory. The faster writes make sense: once a write has been delivered, the CPU can carry on while the bus spends the time making sure the data is queued for the update.
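
To put the read/write gap into per-access terms, the totals from the first post can be divided by the number of bus accesses. This is a rough sketch only: the access width is an assumption (the thread does not say how memcpy splits the 0x3000-byte block on the uncached mapping), and the helper names access_count and us_per_access are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bus accesses needed to move `bytes` in chunks of `width` bytes.
 * Assumption: every access has the same width. */
static size_t access_count(size_t bytes, size_t width)
{
    assert(width > 0);
    return (bytes + width - 1) / width;
}

/* Average cost of one access in microseconds, given the total copy time. */
static double us_per_access(double total_us, size_t bytes, size_t width)
{
    return total_us / (double)access_count(bytes, width);
}
```

With the measured 4.5 ms and 0.4 ms and an assumed 4-byte access width, us_per_access(4500.0, 0x3000, 4) comes out at roughly 1.46 microseconds per read and us_per_access(400.0, 0x3000, 4) at roughly 0.13 microseconds per write, over 3072 accesses each. A read that has to wait for the device to answer, against a write that the CPU can hand off and forget, would fit a gap of that size.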