memcpy large number of 4KB chunks

Posted: Tue Oct 21, 2008 1:44 pm
by sawdust
Hello Gurus,
What is a good way to do an efficient memcpy of a large number of 4KB chunks? The source is near the 1GB address. If my CPU spends a lot of time doing this memcpy, it is going to crawl. I'm doing a lot of this memory copying in my little kernel. There is no paging.
TIA :?: :idea:

Re: memcpy large number of 4KB chunks

Posted: Tue Oct 21, 2008 2:54 pm
by Combuster
Grab the AMD optimisation reference (includes optimizing memcpy as an example) and try to minimize the amount of copying you need to do.

The real guru would probably do paging and CoW, but since you don't want to hear that... :-#

Re: memcpy large number of 4KB chunks

Posted: Wed Oct 22, 2008 10:35 am
by i586coder
First of all, I'm not a guru, I'm just a KERNEL coder :mrgreen:
Anyway,
the solution for your memcpy, which copies huge chunks of data, is
SSE.
This instruction set is present on both Intel and AMD CPUs.

TO READ MORE ABOUT SSE, USE THIS LINK
http://softpixel.com/~cwright/programming/simd/sse.php
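
For readers who have not used SSE before, here is a minimal sketch of what an SSE2 copy of one 4KB chunk might look like in C. The names `CHUNK_SIZE` and `copy_chunk_sse2` are made up for illustration, and the code assumes both buffers are 16-byte aligned and do not overlap. In a kernel, SSE has to be enabled first (CR0.EM cleared, CR4.OSFXSR set), otherwise these instructions fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

#define CHUNK_SIZE 4096u   /* hypothetical name for one 4KB chunk */

/* Copy one 4KB chunk with 16-byte SSE2 loads and stores.
 * Assumption: dst and src are both 16-byte aligned and do not overlap. */
static void copy_chunk_sse2(void *dst, const void *src)
{
    assert(((uintptr_t)dst & 15u) == 0 && ((uintptr_t)src & 15u) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    /* 4096 / 16 = 256 vectors, moved four at a time (64 bytes per pass). */
    for (size_t i = 0; i < CHUNK_SIZE / sizeof(__m128i); i += 4) {
        __m128i a = _mm_load_si128(s + i);
        __m128i b = _mm_load_si128(s + i + 1);
        __m128i c = _mm_load_si128(s + i + 2);
        __m128i e = _mm_load_si128(s + i + 3);
        _mm_store_si128(d + i, a);
        _mm_store_si128(d + i + 1, b);
        _mm_store_si128(d + i + 2, c);
        _mm_store_si128(d + i + 3, e);
    }
#else
    memcpy(dst, src, CHUNK_SIZE);   /* fallback when SSE2 is unavailable */
#endif
}
```

Whether this beats a plain `rep movsd` or the compiler's own `memcpy` depends on the CPU, so it is worth measuring before committing to it.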

GOOD LUCK

Re: memcpy large number of 4KB chunks

Posted: Wed Oct 22, 2008 11:03 am
by sawdust
AhmadTayseerDajani wrote: the solution for your memcpy, which copies huge chunks of data, is
SSE.
This instruction set is present on both Intel and AMD CPUs.

TO READ MORE ABOUT SSE, USE THIS LINK
http://softpixel.com/~cwright/programming/simd/sse.php
Thanks. I haven't used SIMD so far. I'll look now.

Re: memcpy large number of 4KB chunks

Posted: Wed Oct 22, 2008 11:27 am
by Brendan
Hi,
sawdust wrote: What is a good way to do an efficient memcpy of a large number of 4KB chunks? The source is near the 1GB address. If my CPU spends a lot of time doing this memcpy, it is going to crawl. I'm doing a lot of this memory copying in my little kernel. There is no paging.
These clues lead me to several possibilities, but none of these possibilities lead to anything sane.

It's like saying that hitting your head with a hammer hurts, and asking if there's something you can do to make it hurt less.... ;)

What exactly are you copying, and why?

Note: Using SSE won't increase RAM bandwidth, and RAM bandwidth is probably the biggest bottleneck with whatever you're already using. Prefetching won't help because the CPU's own hardware prefetcher will start prefetching for you after the first few cache lines. Flushing cache lines (and/or using non-temporal stores) won't improve the time spent copying (but would reduce the number of cache misses you get after copying).
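
To make the non-temporal store option concrete, here is a minimal sketch in C using SSE2 intrinsics. The names `CHUNK_SIZE` and `copy_chunks_nt` are hypothetical, and the code assumes 16-byte aligned, non-overlapping buffers. As noted above, this is not expected to make the copy itself faster; the stores simply bypass the cache so the destination data does not evict whatever was cached before the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics, including _mm_stream_si128 */
#endif

#define CHUNK_SIZE 4096u   /* hypothetical name for one 4KB chunk */

/* Copy nchunks consecutive 4KB chunks using non-temporal stores.
 * Assumption: dst and src are 16-byte aligned and do not overlap. */
static void copy_chunks_nt(void *dst, const void *src, size_t nchunks)
{
    assert(((uintptr_t)dst & 15u) == 0 && ((uintptr_t)src & 15u) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t nvec = nchunks * (CHUNK_SIZE / sizeof(__m128i));
    for (size_t i = 0; i < nvec; i++) {
        /* Normal aligned load, store that bypasses the cache. */
        _mm_stream_si128(d + i, _mm_load_si128(s + i));
    }
    /* Make the weakly ordered streaming stores globally visible
     * before any later stores. */
    _mm_sfence();
#else
    memcpy(dst, src, nchunks * CHUNK_SIZE);   /* fallback without SSE2 */
#endif
}
```

If the destination is going to be read again right away, ordinary stores are probably the better choice, since the data would then already be in the cache.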


Cheers,

Brendan