dma vs cpu transfers
Is a DMA transfer from the HDD to memory faster than a CPU transfer? Why?
Re: dma vs cpu transfers
Hi,
Note that "faster" can mean several different things - latency (e.g. time between asking to start reading and the data starting to arrive), throughput (e.g. how many MiB per second for very large reads) and efficiency (overall effect on a system's performance).icealys wrote:is a dma transfer from the HDD to memory faster than a cpu transfer? why?
For latency; for hard drives there are several different DMA modes (all with different speeds) and several different PIO modes (all with different speeds). A fast method of PIO is going to be faster than a slow method of DMA, and a fast method of DMA is going to be faster than a slow method of PIO.
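(For concrete numbers from the ATA specs: PIO mode 4 is rated at 16.7 MB/s, which beats multiword DMA mode 0 at 4.2 MB/s; but Ultra DMA mode 5 at 100 MB/s beats them both.)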
For throughput; (assuming you're not using obsolete PIO modes or obsolete DMA modes) the main bottleneck is how quickly the hard drive can read data from the disk's surface into its internal buffer. How quickly data can be transferred from the hard drive's internal buffer to the computer's RAM is mostly irrelevant.
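(As a rough illustration: if the heads can only pull about 100 MB/s off the platters, a large sustained read runs at about 100 MB/s whether the bus could carry 133 MB/s or 600 MB/s.)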
For efficiency; you waste an entire CPU for however long it takes to do a PIO transfer, and that CPU could be doing something useful instead. This severe lack of efficiency can cripple the performance of an OS (especially if there aren't very many CPUs). The most efficient way is to use DMA (where both CPU and hard disk can be doing useful work at the same time).
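To make "waste an entire CPU" concrete, here's a minimal sketch of a legacy ATA PIO sector read (the polling variant). The traditional primary-bus ports 0x1F0-0x1F7 and the inb()/inw()/outb() helpers are assumptions for illustration, not something from this thread - the point is just that the CPU sits in the status loop and then moves every word of data itself:

```c
#include <stdint.h>

extern uint8_t  inb(uint16_t port);    /* assumed port I/O helpers */
extern uint16_t inw(uint16_t port);
extern void     outb(uint16_t port, uint8_t value);

#define ATA_DATA    0x1F0   /* legacy primary ATA bus ports */
#define ATA_COUNT   0x1F2
#define ATA_LBA_LO  0x1F3
#define ATA_LBA_MID 0x1F4
#define ATA_LBA_HI  0x1F5
#define ATA_DRIVE   0x1F6
#define ATA_CMD     0x1F7   /* write = command, read = status */

void pio_read_sector(uint32_t lba, uint16_t *buffer)
{
    outb(ATA_DRIVE, 0xE0 | ((lba >> 24) & 0x0F));  /* master drive, LBA28 */
    outb(ATA_COUNT, 1);                            /* one sector */
    outb(ATA_LBA_LO,  lba         & 0xFF);
    outb(ATA_LBA_MID, (lba >> 8)  & 0xFF);
    outb(ATA_LBA_HI,  (lba >> 16) & 0xFF);
    outb(ATA_CMD, 0x20);                           /* READ SECTORS */

    /* CPU does nothing useful here - it spins until BSY clears and DRQ sets */
    while ((inb(ATA_CMD) & 0x88) != 0x08) { }

    /* ...and then the CPU itself copies all 256 words of the sector */
    for (int i = 0; i < 256; i++)
        buffer[i] = inw(ATA_DATA);
}
```

Every iteration of both loops is CPU time that could have gone to some other thread; with DMA, the same copy happens while the CPU runs other code.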
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: dma vs cpu transfers
Why don't you ask your professor these questions?
Re: dma vs cpu transfers
In DMA burst mode, I was looking at Stack Overflow and they said that the CPU is put on hold in this mode. What would be the advantage of using this mode? It would still be slower in latency than PIO, right?
Also, how does the DMA controller read and write to memory using this mode? Would it read a 4 KB block from the HDD, store it in its cache, and then write to memory one byte at a time? I say one byte at a time because it has a count register that keeps track of the number of bytes written to memory and it decrements from the total amount of bytes until it reaches 0.
Re: dma vs cpu transfers
Nope. Because you did not read.
Re: dma vs cpu transfers
What are you referring to?
Re: dma vs cpu transfers
Hi,
icealys wrote: In DMA burst mode, I was looking at Stack Overflow and they said that the CPU is put on hold in this mode. What would be the advantage of using this mode?
If they said the CPU is put on hold, they're wrong. The CPU can't access that bus while a burst is happening, but it can still do processing and access its own caches. Also note that (assuming PCI) a large transfer would be broken into many relatively small bursts - if the CPU needs to access the bus then it might not have to wait at all (in between bursts), or might only have to wait for the current burst to end (rather than waiting for the entire transfer to finish). Don't forget that a CPU will prefetch when it can (if a prefetch is delayed a little, the data can still arrive in the CPU's caches before the CPU actually needs it), and most CPUs do "out of order execution" (where, if some instructions can't be executed because they depend on memory accesses that are stalled, the CPU can still execute other instructions).
icealys wrote: It would still be slower in latency than PIO, right?
For reading data from disk; for DMA the disk can start transferring the data as soon as it arrives (this is the fastest/lowest latency). For PIO the disk controller has to send an IRQ to the CPU, then wait for the CPU to acknowledge the IRQ, then wait for the IRQ handler to figure out what's going on and start requesting the data. All that waiting means that it takes longer before the data starts to be transferred for PIO.
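To illustrate that chain of waiting, here's a hedged sketch of the PIO side - nothing has moved when the IRQ arrives, and software has to do the copy itself. The port numbers are the legacy primary-ATA ones, and pic_send_eoi()/wake_requester()/pending_buffer are hypothetical names, not from this thread:

```c
#include <stdint.h>

extern uint8_t  inb(uint16_t port);     /* assumed port I/O helpers */
extern uint16_t inw(uint16_t port);
extern void     pic_send_eoi(int irq);  /* hypothetical: acknowledge the PIC */
extern void     wake_requester(void);   /* hypothetical: unblock the caller */
extern uint16_t *pending_buffer;        /* where the driver wants the sector */

/* When this handler runs, no data has reached RAM yet: the drive raised
   IRQ 14, the CPU took the interrupt, and only now does software start
   pulling the sector out of the drive's buffer one 16-bit word at a time.
   With DMA, the data is already in RAM by the time the completion IRQ fires. */
void ata_pio_irq_handler(void)
{
    uint8_t status = inb(0x1F7);        /* reading status also acknowledges the drive */

    if (status & 0x08) {                /* DRQ set: sector waiting in drive's buffer */
        for (int i = 0; i < 256; i++)   /* 256 words = one 512-byte sector */
            pending_buffer[i] = inw(0x1F0);
    }
    pic_send_eoi(14);                   /* legacy IRQ line for the primary ATA bus */
    wake_requester();
}
```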
icealys wrote: Also, how does the DMA controller read and write to memory using this mode? Would it read a 4 KB block from the HDD, store it in its cache, and then write to memory one byte at a time?
In which way do the words "burst transfer" make you think "one byte at a time"? Surely you'd be expecting a burst of multiple bytes...
icealys wrote: I say one byte at a time because it has a count register that keeps track of the number of bytes written to memory and it decrements from the total amount of bytes until it reaches 0.
What exactly are you talking about? This sounds like the ancient ISA DMA controllers that nobody has used for hard disk transfers for about 30 years. If it is, forget about it - it's completely useless (excruciatingly slow compared to modern CPU and disk speeds, and no longer connected to the disk controller or usable for disk transfers in any way at all). Modern systems use PCI bus mastering and scatter-gather lists for DMA.
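For a rough picture of what "PCI bus mastering with a scatter-gather list" looks like from the driver's side, here's a minimal sketch based on the old Bus Master IDE (SFF-8038i) interface - the register offsets follow that spec, while virt_to_phys() and the two-buffer setup are assumptions for illustration:

```c
#include <stdint.h>

extern void     outb(uint16_t port, uint8_t value);  /* assumed port I/O helpers */
extern void     outl(uint16_t port, uint32_t value);
extern uint32_t virt_to_phys(void *p);               /* hypothetical helper */

/* One Physical Region Descriptor: one piece of the scatter-gather list. */
struct prd {
    uint32_t phys_addr;   /* physical address of this piece of the buffer */
    uint16_t byte_count;  /* bytes to transfer (0 means 64 KiB) */
    uint16_t flags;       /* bit 15 set = last entry in the table */
};

/* The table itself must be dword-aligned and must not cross a 64 KiB boundary. */
static struct prd prdt[2] __attribute__((aligned(4)));

/* bmide_base is the controller's Bus Master base I/O port (from PCI BAR4). */
void start_dma_read(uint16_t bmide_base, void *piece_a, void *piece_b)
{
    /* Describe where in RAM the data should land - the pieces don't
       need to be physically contiguous; that's the whole point. */
    prdt[0].phys_addr  = virt_to_phys(piece_a);
    prdt[0].byte_count = 4096;
    prdt[0].flags      = 0;
    prdt[1].phys_addr  = virt_to_phys(piece_b);
    prdt[1].byte_count = 4096;
    prdt[1].flags      = 0x8000;               /* end of table */

    outl(bmide_base + 4, virt_to_phys(prdt));  /* tell controller where the PRDT is */
    outb(bmide_base + 0, 0x08 | 0x01);         /* direction = into RAM, start engine */

    /* ...send the ATA "READ DMA" command to the drive, then go run other
       code; a completion IRQ arrives when the controller has finished. */
}
```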
For example, the disk drive might read 1 sector from the disk into the drive's cache (where 1 sector might be 512 bytes or 4 KiB), and then start reading the next sector. While it's reading the next sector, it'd transfer data from its cache to the computer's RAM by using the scatter-gather list to figure out where the data needs to go, then breaking it up into pieces (e.g. 64 byte bursts), then sending a series of "sector_size / burst_size" bursts across the PCI bus.
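With those example numbers, that would be 4096 / 64 = 64 bursts to move one 4 KiB sector across the bus.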
Finally; please understand that there are many buses (not just one). You might have a bus that connects CPUs, memory controller and a PCI host controller (a "CPU bus"); then a hierarchy of "PCI sub-buses"; then smaller buses for things like USB and LPC below that. Different buses (and different PCI sub-buses) are connected by some sort of controller that forwards traffic from one bus to another. For example; a disk controller might send bursts on its PCI sub-bus that get forwarded to a different PCI sub-bus by a "PCI to PCI bridge", then get forwarded by a "PCI host controller" to the "CPU bus", and then get forwarded to the RAM chips by a memory controller.
Different buses can run at different speeds, and the bus that connects CPUs, memory controller and PCI host controller is likely to be a lot faster than any PCI sub-bus. What this means is that if the disk controller manages to use all of the bandwidth of a PCI sub-bus, then that traffic might only use a fraction of the "CPU bus" bandwidth (because the "CPU, memory controller and PCI host controller bus" is a lot faster than any PCI sub-bus).
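To put rough numbers on that: conventional 32-bit, 33 MHz PCI peaks at about 4 bytes * 33 MHz = 133 MB/s, so a disk controller saturating that sub-bus would only consume a small fraction of a CPU/memory bus that moves several GB/s.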
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.