
Hardware: How does the CPU know which RAM stick to use?

Posted: Wed May 22, 2019 7:42 am
by StjepanV
Hi!

Maybe I'm asking a really stupid question, but I'm trying to understand how the CPU knows which stick of RAM a given address lives on...
Example: say we have two sticks of RAM, 2 x 1GB. What is responsible for knowing that anything above 1GB should be stored on the second stick? Is it done automagically by the chipset, for example?

Excuse my ignorance :)

Re: Hardware: How does the CPU know which RAM stick to use?

Posted: Wed May 22, 2019 9:15 am
by Schol-R-LEA
Quite all right - you are trying to learn, and it is actually an interesting question.

Unfortunately, I personally don't have an answer beyond "it is done by the chipset". Actually, even that isn't strictly right - it is done by the memory addressing hardware, which is independent of the chipset proper and which the chipset itself is dependent upon - but I really don't know the details. Some of the details I can't know, as they would be specific to the particular motherboard, but in this instance, I don't even have a general outline to give for it. Hopefully, someone else here knows more about this than I do.

I will tell you a few things which may help, however. Comments and corrections welcome.

For modern PC-class systems, the CPU is not generally accessing the memory directly, but through the cache, which tries to predict which memory pages are going to be needed by the CPU core proper, in order to fetch them ahead of time and keep them available in the fast cache memory (with level 1 cache usually being the fastest but smallest; level 2 being larger but slightly slower because of light-speed delay - the signals have to travel farther; L3 slightly slower still but significantly larger; and so on for systems with more elaborate caching). It is possible to disable caching (but it is a really bad idea), and a cache miss at the lowest level cache will of course cause it to fetch from off-chip memory, but for the most part, the CPU deals with the RAM indirectly.
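
If it helps to picture the hierarchy, here is a toy model in C of a load falling through the levels; the sizes and latencies are invented round numbers for illustration, not figures for any real CPU:

Code:

#include <stdio.h>

/* Toy model: a load falls through L1 -> L2 -> L3 -> RAM, paying the
 * latency of every level it has to consult before it finds the data. */
struct level { const char *name; unsigned size_kb; unsigned latency; };

static const struct level hierarchy[] = {
    { "L1",    32,   4 },   /* smallest, fastest */
    { "L2",   256,  12 },
    { "L3",  8192,  40 },   /* largest cache, slowest */
    { "RAM",    0, 200 },   /* backstop: always "hits" */
};

/* Total cost in cycles if the data is first found at hit_level. */
static unsigned load_cost(int hit_level)
{
    unsigned cost = 0;
    for (int i = 0; i <= hit_level; i++)
        cost += hierarchy[i].latency;
    return cost;
}

int main(void)
{
    printf("L1 hit:      %3u cycles\n", load_cost(0));
    printf("miss to RAM: %3u cycles\n", load_cost(3));
    return 0;
}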

The important thing to see here is that the off-chip memory isn't accessed on a byte-per-byte basis, as a rule, but in pages (or blocks of pages, more likely). That is to say, if you are accessing an address in a given page, and it isn't in cache already, then the CPU will fetch that whole page, not just the specific byte or word you needed.

Even with caching off, the memory subsystem always fetches as much memory as the bandwidth allows, since the connection is a parallel one - you can't just fetch one byte; the transfer always has to be the width of the memory bus. In modern systems, I expect that this is either 128 or 256 bits, as the buses are optimized for transporting pages and page blocks, not individual bytes. You, as the software developer, wouldn't see this (as it is all handled in the hardware), but IIUC, that's what is actually going on.

I can also say that, in most modern systems, the individual DIMMs are not mapped to separate address ranges, at least not if the DIMMs are in paired slots to allow dual-channel access. It is my understanding - and I welcome corrections on this - that for motherboards with dual-channel memory support (which is more or less all of them today), if you have two DIMMs in paired slots (for example, slots 0 and 2, or slots 1 and 3), then the memory will be 'striped' across the two DIMMs to allow greater parallel bandwidth. This means that the page you would be fetching is actually spread across both DIMMs, with the actual order in which they are stored (e.g., alternating words, sub-page sections, alternating pages) depending on the specific hardware, I think (again, any corrections or clarifications would be welcome).
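
To make the striping idea concrete, here is a rough sketch in C of one possible policy - consecutive cache lines alternating between the two channels. The 64-byte line size and the alternating-lines order are assumptions for illustration; as I said, the real granularity depends on the hardware:

Code:

#include <stdio.h>
#include <stdint.h>

#define LINE_SIZE 64u   /* assumed interleave granularity (one cache line) */

/* With two channels and line-sized interleave, bit 6 of the physical
 * address picks the channel... */
static unsigned channel_of(uint64_t addr)
{
    return (unsigned)((addr / LINE_SIZE) & 1u);
}

/* ...and the remaining line bits give the offset within that channel's DIMM. */
static uint64_t offset_on_dimm(uint64_t addr)
{
    uint64_t line = addr / LINE_SIZE;
    return (line / 2) * LINE_SIZE + (addr % LINE_SIZE);
}

int main(void)
{
    /* Consecutive lines alternate: 0x0000 -> ch 0, 0x0040 -> ch 1, ... */
    for (uint64_t a = 0; a < 4 * LINE_SIZE; a += LINE_SIZE)
        printf("addr 0x%04llx -> channel %u, DIMM offset 0x%04llx\n",
               (unsigned long long)a, channel_of(a),
               (unsigned long long)offset_on_dimm(a));
    return 0;
}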

Note that it is possible to use a single DIMM with most motherboards, and some motherboards will work using two DIMMs even if the memory isn't paired (as in the infamous Verge build video last September, where the person giving the presentation made this mistake), in which case the memory would be accessed in single channel mode (i.e., roughly half the speed of dual channel). There are also motherboards (mostly for server systems) which support quad-channel, hex-channel, or even octo-channel memory, in which case the memory would again need to be banked appropriately to benefit from it.

Or at least, this is how it works on a stock x86 PC. None of this would apply to a System-on-Chip, in which the memory is in the same package as the CPU, so the details of how the CPU cores and the caching interact with the memory may be wildly different. Even in those single-board computers with separate off-chip memory, if the memory is incorporated directly into the SBC, then the memory access subsystem is likely to be specific to the SBC and have little in common with one for a PC.

Re: Hardware: How does the CPU know which RAM stick to use?

Posted: Wed May 22, 2019 3:32 pm
by zaval
I have only very basic experience with this - I have initialized SDRAM on one SBC, a MIPS one. There are still a lot of gaps in my knowledge, especially regarding DDR itself. But anyway, what you ask about is decided by the memory controller. The firmware, while still running from SRAM, programs it so that it knows what to address. For example, the SoC I mentioned has 1GB of DDR3 attached, single channel, probably as 4 memory array chips. There are two sets of registers in the controller, each responsible for covering some range of memory. That SBC uses just one set, meaning all the lines to the memory arrays are connected to that set. There are MMAP0 and MMAP1 registers, with BASE and MASK subregisters. By programming them, you set up what the controller will accept as a valid request and forward to the DDR chips.

Code:

/* Program the DDR controller's address map. $s0 holds the base
   address of the controller's register block. */
ori $t1, $zero, 0x0080	/* MMAP0 = BASE|MASK: BASE=0x00, MASK=0x80 */
sw $t1, DDRC_MMAP0($s0)	/* in use: accepts addresses with the top bit clear */
ori $t2, $zero, 0xff00	/* MMAP1 = BASE|MASK: BASE=0xff, MASK=0x00 */
sw $t2, DDRC_MMAP1($s0)	/* not used: no address can ever match */

Meaning: the highest byte of the address ADDR in a request to the DDR controller (coming from outside the controller through the system interconnect bus - in this case AXI) is accepted as valid only if ADDR & MASK == BASE. If a set is not used, you program BASE = 0xFF and MASK = 0x00, as the code above does for MMAP1; those values make every request fail the check. Should this SBC have more memory (the SoC supports ~3GB), MMAP1 would be used too, and you would program it accordingly, so that one address range goes through MMAP0 and the other through MMAP1: MMAP0 would read/write one set of memory arrays, MMAP1 the other. Dual-channel controllers are beyond my knowledge, but apparently the main idea is that requests are interleaved across memory arrays in parallel, making them faster. I hope to get an SBC with a dual-channel DDR controller (RockPro64) and figure out how it works. It's all very DDR-controller-specific, though.
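
For illustration, the accept/reject check could be modeled in C roughly like this (the struct and names are my own invention, based only on the BASE/MASK description above - check the controller's manual for the real register layout):

Code:

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of one MMAPn window: a request is accepted when
 * the highest byte of the address, ANDed with MASK, equals BASE. */
struct mmap_window {
    uint8_t base;
    uint8_t mask;
};

static bool request_accepted(const struct mmap_window *w, uint32_t addr)
{
    uint8_t high = (uint8_t)(addr >> 24);  /* highest byte of the address */
    return (high & w->mask) == w->base;
}

int main(void)
{
    struct mmap_window mmap0 = { .base = 0x00, .mask = 0x80 };  /* in use   */
    struct mmap_window mmap1 = { .base = 0xff, .mask = 0x00 };  /* not used */

    bool a = request_accepted(&mmap0, 0x10000000);  /* true: top bit clear  */
    bool b = request_accepted(&mmap1, 0x10000000);  /* false: always fails  */
    return (a && !b) ? 0 : 1;
}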

Re: Hardware: How does the CPU know which RAM stick to use?

Posted: Tue May 28, 2019 12:17 pm
by SpyderTL
As I've mentioned in other threads recently, the CPU "core" (the part that actually runs your code) really only has two interface points with the rest of the system, and the outside world. One is the memory bus, and the other is the I/O bus. It can read from, or write to, either of these ports, but that's about it. It simply sets the appropriate pins to request an "address" on the appropriate port, and waits for someone on the other end to respond.

Everything else is handled externally, although over the years a lot of "external" components have moved off the motherboard and onto the CPU die itself to improve performance. But other than physical location, not much has changed.

As long as the components between the CPU core and the system RAM are working properly, it doesn't really matter how many physical RAM boards or chips are there. Either the RAM boards are smart enough to know what addresses to respond to, or the components between the CPU and the RAM boards do all of the work, and the RAM board will only see signals that it needs to respond to. I'm not sure how modern RAM boards work, but it's one or the other.
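
The "who responds" decision is really just address decoding - the same thing chip-select logic does in hardware. Here's a rough sketch in C, using the original poster's 2 x 1GB example with no interleaving (the ranges are made up):

Code:

#include <stdio.h>
#include <stdint.h>

/* Toy address decoder: each "device" claims a range of the physical
 * address space, the way chip-select logic does on a real bus. */
struct region {
    const char *name;
    uint64_t    base;
    uint64_t    size;
};

static const struct region bus_map[] = {
    { "DIMM 0", 0x00000000ull, 0x40000000ull },  /* first 1GB  */
    { "DIMM 1", 0x40000000ull, 0x40000000ull },  /* second 1GB */
};

static const char *who_responds(uint64_t addr)
{
    for (size_t i = 0; i < sizeof bus_map / sizeof bus_map[0]; i++)
        if (addr >= bus_map[i].base && addr - bus_map[i].base < bus_map[i].size)
            return bus_map[i].name;
    return "nobody (open bus)";
}

int main(void)
{
    printf("0x3FFFFFFF -> %s\n", who_responds(0x3FFFFFFFull));  /* DIMM 0 */
    printf("0x40000000 -> %s\n", who_responds(0x40000000ull));  /* DIMM 1 */
    return 0;
}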

Re: Hardware: How does the CPU know which RAM stick to use?

Posted: Tue May 28, 2019 1:19 pm
by linguofreak
Schol-R-LEA wrote:For modern PC-class systems, the CPU is not generally accessing the memory directly, but through the cache, which tries to predict which memory pages are going to be needed by the CPU core proper, in order to fetch them ahead of time and keep them available in the fast cache memory (with level 1 cache usually being the fastest but smallest; level 2 being larger but slightly slower because of light-speed delay - the signals have to travel farther; L3 slightly slower still but significantly larger; and so on for systems with more elaborate caching). It is possible to disable caching (but it is a really bad idea), and a cache miss at the lowest level cache will of course cause it to fetch from off-chip memory, but for the most part, the CPU deals with the RAM indirectly.

All three levels of cache these days are generally on the CPU die, so light-speed lag really isn't the issue for the differences in cache speeds (it will be more significant in dealing with RAM). The big issue is that memory fast enough to respond to the CPU quickly takes up a lot of die area, and is therefore expensive. So you have a small amount of L1 cache that can respond pretty much immediately, but takes up a lot of room on chip for its capacity; then a larger amount of L2 cache that is more compact, but can't respond immediately; and then several megabytes of L3 cache that are even more compact, with even higher latency.

Schol-R-LEA wrote:The important thing to see here is that the off-chip memory isn't accessed on a byte-per-byte basis, as a rule, but in pages (or blocks of pages, more likely). That is to say, if you are accessing an address in a given page, and it isn't in cache already, then the CPU will fetch that whole page, not just the specific byte or word you needed.

The amount fetched isn't a whole page, it's generally what's called a cache line, whose size varies between CPU microarchitectures. I think 64 bytes is a fairly typical current figure.
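
With 64-byte lines, finding the line an address belongs to is just clearing the low six bits (64 = 2^6); a quick sketch in C, assuming that line size:

Code:

#include <stdio.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed line size; typical for current x86 */

/* Start address of the cache line containing addr. */
static uint64_t line_base(uint64_t addr)
{
    return addr & ~(uint64_t)(CACHE_LINE - 1);
}

int main(void)
{
    /* 0x1003 and 0x103F share a line (one fetch); 0x1040 starts the next. */
    printf("%d\n", line_base(0x1003) == line_base(0x103F));  /* prints 1 */
    printf("%d\n", line_base(0x103F) == line_base(0x1040));  /* prints 0 */
    return 0;
}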