Hello Gurus,
What is a good way to do an efficient memcpy of a large number of 4KB chunks? The source is near the 1GB address. If my CPU spends a lot of time doing this memcpy, it is going to crawl. I'm doing a lot of these memory copies in my little kernel. There is no paging.
TIA
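For reference, a baseline for the copy described above might look like the sketch below. It copies back-to-back 4 KiB chunks 8 bytes at a time and assumes 8-byte-aligned, non-overlapping buffers; `copy_chunk_u64` and `copy_chunks` are hypothetical names, not from the thread. In a kernel without libc a loop like this can stand in for memcpy, but note that GCC may recognise such a loop and emit a call to `memcpy` anyway, so a freestanding build still has to provide one.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 4096u  /* size of one chunk, as in the question */

/* Copy one 4 KiB chunk, 8 bytes at a time.
 * Assumes both pointers are 8-byte aligned and the regions do not overlap. */
static void copy_chunk_u64(void *dst, const void *src)
{
    uint64_t *d = dst;
    const uint64_t *s = src;
    for (size_t i = 0; i < CHUNK_SIZE / sizeof(uint64_t); i++)
        d[i] = s[i];
}

/* Copy `count` chunks that are laid out back to back in memory. */
static void copy_chunks(void *dst, const void *src, size_t count)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    for (size_t c = 0; c < count; c++)
        copy_chunk_u64(d + c * CHUNK_SIZE, s + c * CHUNK_SIZE);
}
```

This is the "whatever you're already using" that the later replies compare against: any faster variant still has to move the same number of bytes through RAM.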
memcpy large number of 4KB chunks
- Combuster
Re: memcpy large number of 4KB chunks
Grab AMD's optimisation reference (it includes an optimised memcpy as an example) and try to minimise the amount of copying you need to do.
The real guru would probably do paging and CoW, but since you don't want to hear that...
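To make the copy-on-write (CoW) idea concrete, here is a user-space simulation sketch. In a real kernel, "sharing" means mapping the same physical frame read-only in both address spaces, and the deferred copy happens in the page-fault handler on the first write. Here `struct frame`, `struct mapping`, `cow_share`, `cow_write_fault` and `cow_store` are hypothetical names, a `writable` flag stands in for the page-table bit, and `malloc` stands in for a frame allocator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Hypothetical physical frame with a reference count. */
struct frame {
    unsigned refcount;
    uint8_t data[PAGE_SIZE];
};

/* Hypothetical mapping of one virtual page; in a real kernel this would be
 * a page-table entry whose writable bit is cleared while the frame is shared. */
struct mapping {
    struct frame *frame;
    int writable;
};

/* "Copy" a page by sharing its frame: O(1), no 4 KiB memcpy yet.
 * Assumes dst does not currently map a frame. */
static void cow_share(struct mapping *dst, struct mapping *src)
{
    src->frame->refcount++;
    src->writable = 0;          /* both sides become read-only */
    dst->frame = src->frame;
    dst->writable = 0;
}

/* Stands in for the page-fault handler on a write to a read-only page.
 * Returns 0 on success, -1 if no memory is available for a private copy. */
static int cow_write_fault(struct mapping *m)
{
    struct frame *f = m->frame;
    if (f->refcount > 1) {
        struct frame *copy = malloc(sizeof *copy);
        if (copy == NULL)
            return -1;
        memcpy(copy->data, f->data, PAGE_SIZE);  /* the deferred copy */
        copy->refcount = 1;
        f->refcount--;
        m->frame = copy;
    }
    m->writable = 1;  /* sole owner now: write in place, no copy needed */
    return 0;
}

/* Store one byte through a mapping, "faulting" first if it is read-only. */
static int cow_store(struct mapping *m, size_t offset, uint8_t value)
{
    if (!m->writable && cow_write_fault(m) != 0)
        return -1;
    m->frame->data[offset] = value;
    return 0;
}
```

Pages that are shared but never written are never copied at all, which is where the saving comes from.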
Re: memcpy large number of 4KB chunks
First of all, I'm not a guru, I'm just a KERNEL coder.
Anyway,
the solution for your memcpy, which copies huge chunks of data, is
SSE.
These instructions are present in both INTEL & AMD CPUs.
TO READ MORE ABOUT SSE, USE THIS LINK
http://softpixel.com/~cwright/programming/simd/sse.php
GOOD LUCK
Distance doesn't make you any smaller,
but it does make you part of a larger picture.
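The SSE suggestion can be sketched with SSE2 intrinsics from `<emmintrin.h>`, moving 16 bytes per load/store and 64 bytes per loop iteration. This assumes 16-byte-aligned, non-overlapping buffers (4 KiB chunks at page-aligned addresses satisfy that); `copy_chunk_sse2` is a hypothetical name, and the sketch falls back to `memcpy` when SSE2 is not enabled at compile time. In a kernel, SSE also has to be enabled before use (CR0.EM cleared, CR4.OSFXSR set), and the XMM registers must be saved and restored if other code uses them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: _mm_load_si128, _mm_store_si128 */
#endif

#define CHUNK_SIZE 4096u

/* Copy one 4 KiB chunk. Assumes 16-byte-aligned, non-overlapping buffers. */
static void copy_chunk_sse2(void *dst, const void *src)
{
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    /* 4 aligned loads, then 4 aligned stores: 64 bytes per iteration */
    for (size_t i = 0; i < CHUNK_SIZE / sizeof(__m128i); i += 4) {
        __m128i x0 = _mm_load_si128(s + i);
        __m128i x1 = _mm_load_si128(s + i + 1);
        __m128i x2 = _mm_load_si128(s + i + 2);
        __m128i x3 = _mm_load_si128(s + i + 3);
        _mm_store_si128(d + i, x0);
        _mm_store_si128(d + i + 1, x1);
        _mm_store_si128(d + i + 2, x2);
        _mm_store_si128(d + i + 3, x3);
    }
#else
    memcpy(dst, src, CHUNK_SIZE);  /* SSE2 not available: plain copy */
#endif
}
```

Wider registers mean fewer instructions per chunk, but, as pointed out later in the thread, they do not raise RAM bandwidth, so the gain for large copies may be small.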
Re: memcpy large number of 4KB chunks
AhmadTayseerDajani wrote: the solution for your memcpy, which copies huge chunks of data, is
SSE.
These instructions are present in both INTEL & AMD CPUs.
TO READ MORE ABOUT SSE, USE THIS LINK
http://softpixel.com/~cwright/programming/simd/sse.php
Thanks. I haven't used SIMD so far. I'll look now.
Re: memcpy large number of 4KB chunks
Hi,
sawdust wrote: What is a good way to do an efficient memcpy of a large number of 4KB chunks? The source is near the 1GB address. If my CPU spends a lot of time doing this memcpy, it is going to crawl. I'm doing a lot of these memory copies in my little kernel. There is no paging.
These clues lead me to several possibilities, but none of these possibilities lead to anything sane.
It's like saying that hitting your head with a hammer hurts, and asking if there's something you can do to make it hurt less...
What exactly are you copying, and why?
Note: Using SSE won't increase RAM bandwidth, and RAM bandwidth is probably the biggest bottleneck with whatever you're already using. Prefetching won't help because the CPU's own hardware prefetcher will start prefetching for you after the first few cache lines. Flushing cache lines (and/or using non-temporal stores) won't reduce the time spent copying (but would reduce the number of cache misses you get after copying).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
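Brendan's point about non-temporal stores can be sketched with SSE2's streaming store `_mm_stream_si128`, which writes around the cache instead of filling it with destination data. Per his note, this is not expected to make the copy itself faster; the possible benefit is fewer cache misses afterwards, and only when the copied data is not read again soon. `copy_chunks_nt` is a hypothetical name; the sketch assumes 16-byte-aligned, non-overlapping buffers and falls back to `memcpy` without SSE2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_load_si128, _mm_stream_si128, _mm_sfence */
#endif

#define CHUNK_SIZE 4096u

/* Copy `count` back-to-back 4 KiB chunks using non-temporal stores.
 * Assumes 16-byte-aligned, non-overlapping buffers. */
static void copy_chunks_nt(void *dst, const void *src, size_t count)
{
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t n = count * (CHUNK_SIZE / sizeof(__m128i));
    for (size_t i = 0; i < n; i++)
        _mm_stream_si128(d + i, _mm_load_si128(s + i));
    /* Streaming stores are weakly ordered: fence before other code
     * (or another CPU) relies on the data in dst. */
    _mm_sfence();
#else
    memcpy(dst, src, count * CHUNK_SIZE);  /* SSE2 not available */
#endif
}
```

Whether this helps depends on what happens after the copy, which is one more reason the "what exactly are you copying, and why?" question matters.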