
DMA VS PIO

Posted: Tue Oct 09, 2007 7:43 pm
by stones3000
I wrote an ATA driver based on http://www.ata-atapi.com/mindrvr.htm. I tested the performance of DMA versus PIO, and DMA already outperforms PIO. However, from my understanding, PIO should be faster than DMA because the CPU is faster than the DMA controller. Am I right?

How come DMA is faster than PIO? What's so special about DMA? Can someone give me some hints?

Thanks!!!

Posted: Tue Oct 09, 2007 8:15 pm
by frank
On floppy drives the DMA controller transfers slower than PIO mode does, but on hard drives it is the other way around. IIRC, ATA drives use a form of DMA called bus mastering, where the ATA controller itself performs the DMA transfer. So on ATA drives, Ultra DMA (as it is sometimes called) can actually be faster than PIO.

Also I really don't know all that much about this but do you have to enable the higher PIO modes? Maybe you should look into that.

Posted: Tue Oct 09, 2007 8:49 pm
by stones3000
frank wrote: Also I really don't know all that much about this but do you have to enable the higher PIO modes? Maybe you should look into that.

Yes. The hard drive is set to its highest PIO and DMA modes. The performance tests are based on the highest modes.

So, you are saying that the Bus Master DMA controller is embedded in the ATA controller itself and is optimised for disk I/O transfers. That's why DMA is faster than PIO on ATA drives.
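For reference, "enabling the highest modes" is usually done by reading the drive's IDENTIFY DEVICE data (word 88 reports supported Ultra DMA modes, one bit per mode) and then issuing SET FEATURES (command 0xEF, subcommand 0x03) with the chosen mode. A minimal C sketch of picking the highest supported UDMA mode from that word - `highest_udma_mode` is a hypothetical helper name, not part of any real driver:

```c
#include <stdint.h>

/* Pick the highest Ultra DMA mode a drive reports. ATA IDENTIFY DEVICE
   word 88, bits 0-6, has one bit set per supported UDMA mode
   (bit 0 = UDMA 0 ... bit 6 = UDMA 6). Returns -1 if none supported. */
static int highest_udma_mode(uint16_t identify_word_88)
{
    int mode = -1;
    for (int bit = 0; bit <= 6; bit++) {
        if (identify_word_88 & (1u << bit))
            mode = bit;
    }
    return mode;
}

/* The selected mode would then be programmed with SET FEATURES
   (command 0xEF, subcommand 0x03), sector count register = 0x40 | mode. */
```

This only selects the mode; the PCI IDE controller's timing registers have to agree with it as well.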

Re: DMA VS PIO

Posted: Tue Oct 09, 2007 10:58 pm
by Brendan
Hi,
stones3000 wrote:I wrote an ATA driver based on http://www.ata-atapi.com/mindrvr.htm. I tested the performance of DMA versus PIO, and DMA already outperforms PIO. However, from my understanding, PIO should be faster than DMA because the CPU is faster than the DMA controller. Am I right?
In ancient times, there were no sector buffers built into the hard drive - getting data to/from the disk heads was the bottleneck. The hard disk controller used the ISA DMA controller because the ISA DMA controller was fast enough to handle the data coming to/from the disk heads and didn't consume CPU time like PIO did.

CPUs got faster and hard disks got faster, but possibly more importantly, hard disk manufacturers started putting a sector buffer into hard disks. This meant that the data coming to/from the disk heads was no longer the bottleneck - the hard drive could slowly transfer data between the disk and the sector buffer, and quickly transfer data between the CPU/RAM and the sector buffer. This was much better for PIO because the CPU could quickly transfer data to/from the sector buffer (using some CPU time), rather than slowly transferring data to/from the disk heads (using much more CPU time). The ISA DMA couldn't keep up, and everyone started using PIO for all disk transfers.

However, PIO still wasted some CPU time. Eventually someone (Intel, I think) started adding bus mastering to hard disk controllers so that disk transfers didn't waste CPU time. Because bus mastering is built into the hard disk controller, its speed isn't limited (the speed of the ISA DMA controller is limited to the speed of the ISA bus, IIRC) and it can transfer 32 bits of data (or more) in one transaction (the ISA DMA controller does 8-bit or 16-bit transfers only). At that stage (80486?) there probably wasn't much difference between PIO and DMA transfers, but everything keeps getting faster...

For modern systems, probably the biggest problem with PIO is that the CPU is accessing I/O ports. For a relatively slow device (e.g. RTC/CMOS) it can take a while for the device to act on an I/O port access (e.g. adjust its internal state). To allow for this there's an I/O port access delay, where the chipset/CPU gives the device a little time between "back-to-back" I/O port accesses. This I/O port delay is typically determined by how quickly the slowest device can act - it's too complicated for chipsets to have separate I/O port delays for each separate device. This means that for a fast device the I/O delay can be too long or unnecessary.

For bus mastering no I/O port delays are needed - the device can transfer each piece of data whenever it is ready. Also, for (PCI) bus mastering the device can transfer bursts of data to reduce per-transfer bus overhead (e.g. bursts of 64 bytes at a time). I/O port accesses are limited to 4 bytes per transfer, even on 64-bit systems.
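To make the bus-mastering side concrete: a Bus Master IDE controller is told what to transfer via a table of Physical Region Descriptors (PRDs) - each entry holds a 32-bit physical buffer address, a 16-bit byte count (0 encodes 64 KB), and an end-of-table flag in bit 15 of the last word. Here's a rough C sketch of building such a table; the struct and function names (`prd_entry_t`, `build_prd_table`) are mine, and a real driver would also have to keep each chunk within a 64 KB physical boundary:

```c
#include <stdint.h>
#include <stddef.h>

/* One Physical Region Descriptor as used by Bus Master IDE controllers. */
typedef struct {
    uint32_t phys_addr;    /* physical address of the buffer */
    uint16_t byte_count;   /* bytes to transfer; 0 encodes a full 64 KB */
    uint16_t flags;        /* bit 15 (0x8000) marks the last entry */
} __attribute__((packed)) prd_entry_t;

/* Describe one physically contiguous buffer as a PRD table, splitting
   it into 64 KB chunks. Returns the number of entries used. */
static size_t build_prd_table(prd_entry_t *table, size_t max_entries,
                              uint32_t phys, uint32_t len)
{
    size_t n = 0;
    while (len > 0 && n < max_entries) {
        uint32_t chunk = (len > 0x10000) ? 0x10000 : len;
        table[n].phys_addr  = phys;
        table[n].byte_count = (uint16_t)(chunk & 0xFFFF); /* 0x10000 -> 0 */
        table[n].flags      = 0;
        phys += chunk;
        len  -= chunk;
        n++;
    }
    if (n > 0)
        table[n - 1].flags = 0x8000;  /* mark end of table */
    return n;
}
```

The driver then writes the table's physical address into the controller's PRDT address register and sets the start bit in the Bus Master command register - after that the controller moves the data in bursts with no CPU involvement per byte.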

BTW, the I/O port access delay is worth taking into account when writing any device driver - often you can cache I/O port data in RAM. When you write to the I/O port you do an "OUT" instruction and also write the value to a variable, so that when you need the value again you can read it from the variable in RAM and avoid an "IN" instruction (and the I/O port delay). For example, instead of doing this:

Code:

setPICmasks:
    out PIC1+1,al        ; write low byte (master PIC mask)
    shr ax,8
    out PIC2+1,al        ; write high byte (slave PIC mask)
    ret

getPICmasks:
    in al,PIC2+1         ; read slave PIC mask
    shl ax,8             ; move it into AH
    in al,PIC1+1         ; read master PIC mask
    ret
You'd do this:

Code:

setPICmasks:
    out PIC1+1,al          ; write low byte (master PIC mask)
    mov [PICmaskShadow],ax ; cache the full value in RAM
    shr ax,8
    out PIC2+1,al          ; write high byte (slave PIC mask)
    ret

getPICmasks:
    mov ax,[PICmaskShadow] ; read the cached copy - no "IN" needed
    ret
In this case, caching the value in RAM can make "getPICmasks" a couple of microseconds faster (several thousand cycles for a modern CPU). Of course this doesn't work for things like status registers, etc. that can change their value on their own (but a lot of I/O ports don't change their value unless the CPU/software changes them)...
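The same shadow-register trick in C, with the port write abstracted behind an `outb()` function so the logic can be exercised outside a kernel - the `fake_ports` array is just a stand-in for real I/O; in a driver, `outb()` would be the usual one-line inline-asm port write:

```c
#include <stdint.h>

/* Stand-in for real port I/O so the logic is testable in user space. */
static uint8_t fake_ports[65536];
static void outb(uint16_t port, uint8_t val) { fake_ports[port] = val; }

#define PIC1_DATA 0x21   /* master PIC mask register (OCW1) */
#define PIC2_DATA 0xA1   /* slave PIC mask register (OCW1) */

static uint16_t pic_mask_shadow;   /* cached copy of the masks in RAM */

static void set_pic_masks(uint16_t masks)
{
    outb(PIC1_DATA, masks & 0xFF); /* write low byte to master PIC */
    outb(PIC2_DATA, masks >> 8);   /* write high byte to slave PIC */
    pic_mask_shadow = masks;       /* remember what we wrote */
}

static uint16_t get_pic_masks(void)
{
    return pic_mask_shadow;        /* no IN instruction, no I/O delay */
}
```

The read path never touches the hardware at all, which is exactly where the savings come from.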


Cheers,

Brendan

Posted: Wed Oct 10, 2007 5:31 pm
by stones3000
Thanks Brendan. Your explanation is very detailed and understandable. It's hard to understand how hardware works these days without knowing the history. Thanks :-)

I also found two links explaining why Bus Mastering DMA is faster.
http://www.tweak3d.net/articles/howbusmaster/
http://www.pcguide.com/ref/hdd/if/ide/modesDMA-c.html