Hi,
stones3000 wrote:I wrote an ATA driver based on
http://www.ata-atapi.com/mindrvr.htm. I compared the performance of DMA and PIO, and DMA already outperforms PIO. However, from my understanding, PIO should be faster than DMA because the CPU is faster than the DMA controller. Am I right?
In ancient times, there were no sector buffers built into hard drives - getting data to/from the disk's heads was the bottleneck. The hard disk controller used the ISA DMA controller because it was fast enough to handle the data coming to/from the disk heads and, unlike PIO, didn't consume CPU time.
CPUs got faster and hard disks got faster, but (possibly more importantly) hard disk manufacturers started putting a sector buffer into hard disks. This meant that the data coming to/from the disk itself was no longer the bottleneck - the hard drive could slowly transfer data between the disk and the sector buffer, and quickly transfer data between the CPU/RAM and the sector buffer. This was much better for PIO, because the CPU could quickly transfer data to/from the sector buffer (using some CPU time) rather than slowly transferring data to/from the disk heads (using much more CPU time). The ISA DMA controller couldn't keep up, and everyone started using PIO for all disk transfers.
However, PIO still wasted some CPU time. Eventually someone (Intel, I think) started adding bus mastering to hard disk controllers so that disk transfers didn't waste CPU time. Because bus mastering is built into the hard disk controller, its speed isn't limited (the speed of the ISA DMA controller is limited by the speed of the ISA bus, IIRC) and it can transfer 32 bits of data (or more) in one transaction (the ISA DMA controller does 8-bit or 16-bit transfers only). At that stage (80486?) there probably wasn't much difference between PIO and DMA transfers, but everything keeps getting faster...
For modern systems, probably the biggest problem with PIO is that the CPU is accessing I/O ports. For a relatively slow device (e.g. RTC/CMOS) it can take a while for the device to act on an I/O port access (e.g. adjust its internal state). To allow for this there's an I/O port access delay, where the chipset/CPU gives the device a little time between "back-to-back" I/O port accesses. This delay is typically determined by how slowly the slowest device can act - it's too complicated for chipsets to have separate I/O port delays for each separate device. This means that for a fast device the I/O port delay can be too long, or entirely unnecessary.
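(As an aside: when software has to provide such a delay itself - e.g. between back-to-back accesses to a slow legacy device - the traditional trick is a dummy write to an unused port. A minimal sketch in C with GCC inline assembly; port 0x80 is the POST-code port, the ~1 microsecond figure is approximate, and this is kernel-mode code only:)

```c
#include <stdint.h>

/* Write one byte to an I/O port (x86, GCC inline assembly). */
static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}

/* Traditional "I/O delay": a dummy write to port 0x80 (the POST-code
   port), which takes roughly 1 microsecond on the ISA/LPC bus and has
   no other effect on normal hardware. */
static inline void io_wait(void)
{
    outb(0x80, 0);
}
```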
For bus mastering no I/O port delays are needed - the device can transfer each piece of data whenever it is ready. Also, for (PCI) bus mastering the device can transfer bursts of data to reduce bus bandwidth usage (e.g. bursts of 64 bytes at a time), while I/O port accesses are limited to 4 bytes per transfer, even on 64-bit systems.
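For busmastering ATA, the driver describes the transfer to the controller with a table of Physical Region Descriptors (PRDs) in RAM. A rough sketch of what that looks like, following the standard Bus Master IDE programming interface - the struct/function names here are mine, and it assumes the buffer's physical address is 64 KiB aligned so that no entry crosses a 64 KiB boundary (which the interface forbids):

```c
#include <stdint.h>

/* One Physical Region Descriptor (8 bytes), per the Bus Master IDE
   programming interface. */
struct prd_entry {
    uint32_t phys_base;  /* physical address of the buffer (word aligned) */
    uint16_t byte_count; /* bytes to transfer; 0 means 64 KiB */
    uint16_t flags;      /* bit 15 = EOT (last entry in the table) */
} __attribute__((packed));

#define PRD_EOT 0x8000

/* Build a PRD table for one physically contiguous buffer, splitting it
   into 64 KiB chunks. Assumes phys is 64 KiB aligned (so no chunk
   crosses a 64 KiB boundary) and the table is big enough. Returns the
   number of entries used. */
static int build_prd_table(struct prd_entry *table, uint32_t phys, uint32_t len)
{
    int n = 0;

    while (len > 0) {
        uint32_t chunk = (len > 65536) ? 65536 : len;

        table[n].phys_base  = phys;
        table[n].byte_count = (uint16_t)chunk;  /* 65536 wraps to 0 = 64 KiB */
        table[n].flags      = 0;
        phys += chunk;
        len  -= chunk;
        n++;
    }
    table[n - 1].flags = PRD_EOT;  /* mark the last entry */
    return n;
}
```

The driver then gives the controller the physical address of this table and starts the transfer; the controller fetches each descriptor and bursts the data itself, with no per-byte I/O port accesses (and no I/O port delays).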
BTW the I/O port access delay is worth taking into account when writing any device driver - often you can cache I/O port data in RAM. When you write to the I/O port you do an "OUT" instruction and also write the value to a variable, so that when you need the value again you can read it from the variable in RAM and avoid using an "IN" instruction (and avoid the I/O port delay). For example, instead of doing this:
Code: Select all
setPICmasks:
    out PIC1+1,al          ;al = mask for master PIC
    shr ax,8
    out PIC2+1,al          ;ah was the mask for the slave PIC
    ret

getPICmasks:
    in al,PIC2+1           ;read slave PIC mask
    shl ax,8               ;move it into ah
    in al,PIC1+1           ;read master PIC mask
    ret
You'd do this:
Code: Select all
setPICmasks:
    out PIC1+1,al          ;al = mask for master PIC
    mov [PICmaskShadow],ax ;keep a copy of both masks in RAM
    shr ax,8
    out PIC2+1,al          ;ah was the mask for the slave PIC
    ret

getPICmasks:
    mov ax,[PICmaskShadow] ;no "IN" needed - read the cached copy
    ret
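The same shadow-register pattern, sketched in C (the port write is stubbed out here so the pattern stands on its own - in a real kernel outb would execute an OUT instruction):

```c
#include <stdint.h>

#define PIC1_DATA 0x21   /* master PIC mask register */
#define PIC2_DATA 0xA1   /* slave PIC mask register */

static uint16_t pic_mask_shadow;  /* RAM copy of both PIC mask registers */

/* Stub - in a real kernel this would be an OUT instruction. */
static void outb(uint16_t port, uint8_t val)
{
    (void)port;
    (void)val;
}

/* Writes go to the hardware AND to the shadow copy... */
static void set_pic_masks(uint16_t masks)
{
    outb(PIC1_DATA, (uint8_t)masks);
    outb(PIC2_DATA, (uint8_t)(masks >> 8));
    pic_mask_shadow = masks;
}

/* ...so reads never need an "IN" instruction (or its I/O delay). */
static uint16_t get_pic_masks(void)
{
    return pic_mask_shadow;
}
```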
In this case, caching the value in RAM can make "getPICmasks" around 2 us faster (several thousand cycles for a modern CPU). Of course this doesn't work for things like status registers, etc. that can change their value on their own (but a lot of I/O ports don't change their value unless the CPU/software changes them)...
Cheers,
Brendan