I found this today:
http://lwn.net/Articles/250967/
It is a seven-part series on how memory and caches work and how programmers can take full advantage of them. I finished reading the first page and am halfway through the comments (I recommend reading the comments as well), and so far it is very good.
I did not know where to post this, so I have a feeling it may get moved.
lwn series on memory and using cache wisely
I just spent the last 12 hours reading it slowly. There is a lot more there than you would expect!
This article demonstrates why I abhor traditional spinlocks. I find it quite sad that the author did all this testing of cache contention (RFO invalidation) on list-traversal code, and never specifically tested any spinlock code! I am contemplating writing him an email about it.
I mean, it is all well and good to reduce the number of accidental cache problems you cause with your code -- but maybe you should look at the *intentional* ones, too?!?
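To make the complaint concrete, here is a rough C11 sketch (mine, not the article's) of the difference between a naive spinlock, where every failed attempt is an atomic read-modify-write that forces an RFO on the lock's cache line, and a test-and-test-and-set lock that spins on a plain read:

```c
#include <stdatomic.h>

/* Naive spinlock: every spin iteration is an atomic RMW, which pulls the
 * cache line into this core in exclusive state (an RFO) even while the
 * lock is held by someone else. N waiters = constant line ping-pong. */
typedef struct { atomic_flag flag; } naive_spinlock;

static void naive_lock(naive_spinlock *l) {
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ;  /* hammers the line with RFOs the whole time it waits */
}

static void naive_unlock(naive_spinlock *l) {
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Test-and-test-and-set: spin on a plain load first. While the lock is
 * held, all waiters read a Shared copy of the line and generate no
 * coherence traffic; only when the lock looks free do they try the RMW. */
typedef struct { atomic_int locked; } tts_spinlock;

static void tts_lock(tts_spinlock *l) {
    for (;;) {
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;  /* read-only spin: no RFOs while the lock is held */
        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return;
    }
}

static void tts_unlock(tts_spinlock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

With the second version the lock's cache line only bounces between cores at handoff time, instead of on every spin iteration of every waiter.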
Much of the info was a good clarification for me. The biggest idea that surprised me was the concept of aligning functions in memory.
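For the curious: with GCC or Clang you can request that alignment yourself, either globally with -falign-functions=N or per function with an attribute, so a hot function starts at the beginning of a cache line instead of straddling two. A hypothetical example (the 64-byte figure assumes current x86 cache lines):

```c
/* Ask the compiler to place this function's first instruction on a
 * 64-byte boundary, so the hot loop body sits in as few cache lines
 * (and instruction-fetch windows) as possible. GCC/Clang extension. */
__attribute__((aligned(64)))
int hot_inner_loop(const int *data, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += data[i];
    return sum;
}
```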
Oh, one more thing. There are some important notes here about hyperthreading. It's a little strange to me that he goes on at length about how a hyperthread may use up half your L1 cache ... and then casually glosses over how it will usually also eat half your TLB. But the TLB is a much smaller cache than L1, and a TLB miss is far more expensive than an L1 miss (according to his very own data)! That one fact alone would almost force you to abandon hyperthreading as useless ... EXCEPT that he then goes on to suggest a really inspired way to always use hyperthreading (when it's available) in a very generic context: as a program-controlled virtual-memory prefetch tool, with very little overhead. It seems like such a good idea that I'm already expecting to implement it.
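To sketch what I have in mind (my own rough reading of the idea, not the author's code): a helper thread runs a fixed distance ahead of the worker over the same data, touching one word per cache line so the worker's loads hit in the cache the two hyperthreads share. For any real benefit the two threads would have to be pinned to sibling hyperthreads (e.g. with pthread_setaffinity_np); pinning is omitted here to keep the sketch portable:

```c
#include <pthread.h>
#include <stdatomic.h>

#define N    (1 << 20)   /* elements in the shared array (4 MiB of ints) */
#define LEAD 512         /* how far ahead of the worker the helper runs */

static int data[N];
static atomic_long worker_pos = 0;  /* worker publishes its position here */
static volatile long sink;          /* keeps the helper's loads from being optimized away */

/* Helper thread: repeatedly touch the LEAD elements just ahead of the
 * worker, one read per 64-byte cache line (16 ints), pulling those lines
 * into the cache shared by the sibling hyperthreads. */
static void *prefetch_helper(void *arg) {
    (void)arg;
    long s = 0;
    for (;;) {
        long p = atomic_load_explicit(&worker_pos, memory_order_relaxed);
        if (p >= N)
            break;                       /* worker is done */
        long end = (p + LEAD < N) ? p + LEAD : N;
        for (long i = p; i < end; i += 16)
            s += data[i];                /* one touch per cache line */
    }
    sink = s;
    return NULL;
}

/* Worker: does the "real" pass over the data, publishing its progress so
 * the helper knows which lines to warm next. */
long worker_sum(void) {
    long sum = 0;
    for (long i = 0; i < N; i++) {
        sum += data[i];
        atomic_store_explicit(&worker_pos, i, memory_order_relaxed);
    }
    atomic_store_explicit(&worker_pos, N, memory_order_relaxed);
    return sum;
}
```

The names, the LEAD distance, and the one-touch-per-line stride are all my assumptions; the point is only that the "prefetcher" is an ordinary thread with almost no logic of its own, which is why the overhead can stay so low.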
It is very specific to the x86 processor family, though. And sections 6 and 7 are very Linux-specific -- not too useful (except as examples) to osdevers who are not writing Linux apps.