Hi,
sawdust wrote:
What is a good way to do an efficient memcpy of a large number of 4KB chunks? The source is near the 1GB address. If my CPU spends a lot of time doing this memcpy, it is going to crawl. I'm doing a lot of this memory copying in my little kernel. There is no paging.
These clues lead me to several possibilities, but none of them leads to anything sane.
It's like saying that hitting your head with a hammer hurts, and asking if there's something you can do to make it hurt less...
What exactly are you copying, and why?
Note: Using SSE won't increase RAM bandwidth, and RAM bandwidth is probably the biggest bottleneck with whatever you're already using. Explicit prefetching won't help because the CPU's own hardware prefetcher will start prefetching for you after the first few cache lines. Flushing cache lines (and/or using non-temporal stores) won't reduce the time spent copying, but it would reduce the number of cache misses you get after copying, because the copied data no longer pushes your other data out of the cache.
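
To make the last point concrete, here's a rough sketch of the two options in C. It isn't taken from your kernel: the names copy_chunk_plain, copy_chunk_nt and copy_chunks are made up for illustration, I'm assuming the chunks are 16-byte aligned, and the SSE2 path assumes the kernel has already enabled SSE (CR0/CR4 set up) and is compiled with SSE2 support. Expect both versions to take about the same time on a large copy; the non-temporal one just leaves the cache less polluted afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

#define CHUNK_SIZE 4096u  /* 4KB chunks, as in the question */

/* Baseline: copy one chunk as 64-bit words.
 * Assumption: dst and src are at least 8-byte aligned and don't overlap. */
void copy_chunk_plain(void *dst, const void *src)
{
    uint64_t *d = dst;
    const uint64_t *s = src;

    for (size_t i = 0; i < CHUNK_SIZE / sizeof(uint64_t); i++)
        d[i] = s[i];
}

/* Non-temporal variant: the stores bypass the cache, so the destination
 * doesn't evict data you still need. This is not expected to make the copy
 * itself faster when RAM bandwidth is the limit.
 * Assumption: dst and src are 16-byte aligned and don't overlap. */
void copy_chunk_nt(void *dst, const void *src)
{
#ifdef __SSE2__
    __m128i *d = dst;
    const __m128i *s = src;

    for (size_t i = 0; i < CHUNK_SIZE / sizeof(__m128i); i++)
        _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));

    _mm_sfence();  /* make the streamed stores visible before continuing */
#else
    copy_chunk_plain(dst, src);  /* fallback when SSE2 isn't available */
#endif
}

/* Copy n consecutive chunks from src to dst. */
void copy_chunks(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        copy_chunk_nt(d + i * CHUNK_SIZE, s + i * CHUNK_SIZE);
}
```

Either way, every byte still has to be read from RAM and written back to RAM, which is why the real fix is usually to avoid the copy rather than to tune it.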
Cheers,
Brendan