SpyderTL wrote:PIO mode is probably going to be slightly faster than DMA, unless the CPU is busy doing other things. The whole point of DMA is to allow the CPU to do other things while DMA copies data from the drive to system memory.
You probably either need to disable IRQs from the controller side and use PIO mode, or make sure your interrupt handler for that particular IRQ is working properly and use DMA mode. Using the CLI instruction disables interrupts from the CPU side, but the controller is still going to send interrupt requests, and it is still going to expect the CPU to notify it when the interrupt has been handled.
What hardware is your OS running on?
I disabled interrupts by setting the nIEN bit in the device control register (port 0x3F6), but it had no effect on either the speed or the interrupts.
How can I disable interrupts fully in DMA mode?
I'm running on Bochs, QEMU and VMware; I don't have a physical computer for testing.
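For reference, a minimal sketch of the nIEN write described above. The port array and the `outb` stub are stand-ins so the snippet runs in user space, and the helper names are hypothetical; in a kernel the real port I/O primitive would be used. It only shows the register write itself and makes no claim about how a given emulator reacts to it.

```c
#include <stdint.h>

/* Simulated I/O port space (assumption: stands in for real port I/O). */
static uint8_t fake_ports[0x10000];

static void outb(uint16_t port, uint8_t value) { fake_ports[port] = value; }

#define ATA_PRIMARY_CONTROL 0x3F6  /* device control register, primary channel */
#define ATA_CTRL_NIEN       0x02   /* bit 1: when set, the drive does not assert INTRQ */

/* Ask the selected drive on the primary channel to stop raising interrupts.
   The register is write-only, so the whole value is written rather than
   read-modify-written. */
static void ata_disable_drive_irq(void)
{
    outb(ATA_PRIMARY_CONTROL, ATA_CTRL_NIEN);
}

/* Allow the drive to raise interrupts again. */
static void ata_enable_drive_irq(void)
{
    outb(ATA_PRIMARY_CONTROL, 0x00);
}
```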
Brendan wrote:Hi,
Agola wrote:One problem I have: DMA is the same speed as PIO, or slower. A 1.4 MB file takes ~3 seconds to read.
Is this normal? Why is DMA as slow as PIO? I hope it is a VM-related problem.
That seems odd to me. For PIO (assuming the disk image is in host OS's caches, which is quite likely given a typical OS developer's "edit, build, test" cycle); a VM has to emulate/execute all the instructions and that should/would be the performance bottleneck. For DMA, the VM can cheat and move all the data at once and (assuming disk image is in host OS's caches) the bottleneck could be host OS's RAM bandwidth.
In other words; I'd be tempted to expect DMA inside a VM to be significantly faster than both PIO and DMA on real hardware.
Agola wrote:Edit: It looks like this is because of a huge IRQ burst / spam. After doing a cli beforehand, a 1.4 MB file takes ~0.7 seconds to read. But that is still far slower than ATA DMA speeds.
Where are these IRQs coming from? Are you only doing "single sector reads" (e.g. reading 2867 individual sectors and getting 2867 IRQs), or are you reading the largest amount possible (e.g. a single 1.4 MiB read that only causes one IRQ when it's finished)?
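To put numbers on that question, here is a small sketch that counts how many commands (and therefore completion IRQs) a read needs for a given per-command size. It assumes the 28-bit READ DMA command, which can request at most 256 sectors per command; the function name is made up for illustration. READ DMA EXT (48-bit) allows far larger counts, which this sketch does not model.

```c
#include <stdint.h>

/* Largest transfer a single 28-bit READ DMA command can request. */
#define ATA_LBA28_MAX_SECTORS 256u

/* Number of commands (and completion IRQs) needed to read total_sectors
   when each command transfers at most sectors_per_cmd sectors.
   Out-of-range values are clamped to the 28-bit maximum. */
static uint32_t ata_commands_needed(uint32_t total_sectors, uint32_t sectors_per_cmd)
{
    if (sectors_per_cmd == 0 || sectors_per_cmd > ATA_LBA28_MAX_SECTORS)
        sectors_per_cmd = ATA_LBA28_MAX_SECTORS;
    return (total_sectors + sectors_per_cmd - 1) / sectors_per_cmd;
}
```

For the 2867 sectors mentioned above, this gives 2867 commands at one sector each, 1434 at two sectors each, and 12 at 256 sectors each.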
Note that the device driver probably should have a queue of pending operations; where the device driver's IRQ handler handles the end of the previous operation (checks if it succeeded or failed and notifies something - e.g. the rest of the device driver) and then immediately starts the next operation in the queue of pending operations (if there is one). The only case where an operation isn't started from within the IRQ handler would be if there is no operation in progress (the queue of pending operations is empty).
More specifically, imagine a sequence like this:
- TaskA asks to read something from disk, the request finds its way to storage device driver and gets added to the driver's queue of pending operations and started (because the queue was empty), and the scheduler blocks the task so it gets no CPU time and switches to a different task
- TaskB gets CPU time and does some stuff; then it also asks to read something from disk. The request finds its way to storage device driver and gets added to the driver's queue of pending operations (but not started because the disk controller is busy still), and the scheduler blocks the task so it gets no CPU time and switches to a different task.
- The disk controller's IRQ occurs indicating that the first operation completed. The IRQ handler notifies "something" which causes TaskA to be unblocked, then notices that there's a pending operation and starts that. The scheduler may or may not switch to TaskA immediately, and when TaskA does get CPU time it sees the data it was waiting for has arrived.
- The disk controller's IRQ occurs indicating that the second operation completed. The IRQ handler notifies "something" which causes TaskB to be unblocked, then notices that there aren't any more pending operations. The scheduler may or may not switch to TaskB immediately, and when TaskB does get CPU time it sees the data it was waiting for has arrived.
The point here is that the disk controller is kept doing as much work as possible as soon as possible; while the CPU is also kept doing as much work as possible (executing tasks that aren't blocked waiting for IO).
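The sequence above can be sketched as a small FIFO of pending requests. This is only an illustration under stated assumptions: `start_hardware` and `unblock_task` are hypothetical stand-ins for programming the controller and waking a task, all names are invented, and a real driver would also need to disable interrupts or take a lock around the queue in `disk_submit`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAPACITY 8

struct disk_request {
    uint32_t lba;
    uint32_t sector_count;
    int      owner_task;   /* hypothetical task id to unblock on completion */
    bool     done;
};

static struct disk_request *queue[QUEUE_CAPACITY];
static size_t queue_head, queue_len;
static int commands_started;          /* counts calls to start_hardware() */

/* Stand-in for programming the controller (PRDT, ATA registers, start bit). */
static void start_hardware(struct disk_request *req)
{
    (void)req;
    commands_started++;
}

/* Stand-in for the scheduler call that wakes the blocked task. */
static void unblock_task(struct disk_request *req)
{
    req->done = true;
}

/* Task context: queue the request and start it only if the controller
   was idle, i.e. the queue was empty before this call. */
static bool disk_submit(struct disk_request *req)
{
    if (queue_len == QUEUE_CAPACITY)
        return false;
    req->done = false;
    queue[(queue_head + queue_len) % QUEUE_CAPACITY] = req;
    queue_len++;
    if (queue_len == 1)
        start_hardware(req);
    return true;
}

/* IRQ context: complete the request at the head of the queue, then
   immediately start the next one if there is one. */
static void disk_irq(void)
{
    if (queue_len == 0)
        return;                        /* nothing in progress: spurious IRQ */
    struct disk_request *finished = queue[queue_head];
    queue_head = (queue_head + 1) % QUEUE_CAPACITY;
    queue_len--;
    unblock_task(finished);
    if (queue_len > 0)
        start_hardware(queue[queue_head]);
}
```

With two requests submitted back to back, the first starts immediately, the second starts from inside the first completion IRQ, and the second IRQ finds the queue empty.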
Cheers,
Brendan
I'm doing multi-sector reads the size of an ext2 block, generally 2 sectors.
Even with all interrupts disabled and processing in single-tasking mode, the read/write speeds are far below the expected speed of about 16.6 MB/s. The fastest I got was about 2.4 MB/s.
And... what should I output to the sector count port: the sector count, or just 1?
outb(ATA_BASE + ATA_REG_SECTOR_COUNT0, 1);
or
outb(ATA_BASE + ATA_REG_SECTOR_COUNT0, sector_count);
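For context on the two alternatives: for the 28-bit READ DMA command, the sector count register holds the number of sectors the command transfers, with 0 standing for 256. A tiny sketch of that encoding follows; the helper name is hypothetical.

```c
#include <stdint.h>

/* Value to write to the sector count register for a 28-bit command.
   sectors is expected to be in the range 1..256; 256 is encoded as 0. */
static uint8_t ata_lba28_count_value(uint32_t sectors)
{
    return (uint8_t)(sectors & 0xFFu);
}
```

Under that encoding, writing 1 requests a single sector no matter how large the PRD region is.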
This is how I initialize PRDT...
Code:
uint32_t read_size = sector_count << sector_size; // sector_size is 9, to get fast multiplication with 512 bytes
prdt->size = read_size;
prdt->address = (uintptr_t) memalign(0xFFFF, read_size);
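A hedged sketch of what one PRD entry looks like to the bus master controller, for comparison with the snippet above. The struct and function names are invented for illustration. It assumes the standard bus master IDE descriptor layout: a 32-bit physical address, a 16-bit byte count where 0 means 64 KiB, and an end-of-table flag in bit 15 of the last word. It also checks that a region does not cross a 64 KiB boundary. Separately, if `memalign` follows the usual libc contract, its alignment argument has to be a power of two, so 0x10000 would be a valid value where 0xFFFF is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One Physical Region Descriptor as read by the bus master controller. */
struct prd_entry {
    uint32_t phys_addr;    /* physical address of the buffer, word aligned */
    uint16_t byte_count;   /* bytes to transfer; 0 encodes 65536 */
    uint16_t flags;        /* bit 15 set marks the last entry in the table */
};

_Static_assert(sizeof(struct prd_entry) == 8, "PRD entries are 8 bytes");

#define PRD_EOT 0x8000u

/* True if [addr, addr + len) is word aligned, at most 64 KiB long, and
   stays inside a single 64 KiB-aligned window. */
static bool prd_region_ok(uint32_t addr, uint32_t len)
{
    if (len == 0 || len > 0x10000u || (addr & 1u) || (len & 1u))
        return false;
    return (addr & 0xFFFF0000u) == ((addr + len - 1) & 0xFFFF0000u);
}

/* Fill a single-entry table describing one region and mark it as last. */
static void prd_fill_single(struct prd_entry *e, uint32_t phys_addr, uint32_t len)
{
    assert(prd_region_ok(phys_addr, len));
    e->phys_addr  = phys_addr;
    e->byte_count = (uint16_t)(len & 0xFFFFu);   /* 65536 wraps to 0 */
    e->flags      = PRD_EOT;
}
```

Note that the address must be a physical one; with paging enabled, a pointer returned by an allocator is not necessarily usable as-is.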
And that is the full code of read:
Code:
uint32_t read_size = sector_count << sector_size; // sector_size is 9, to get fast multiplication with 512 bytes
prdt->size = read_size;
prdt->address = (uintptr_t) memalign(0xFFFF, read_size); //Memory is 64 KB aligned, can use max size without
outb(BUSMASTER_BASE + BUSMASTER_COMMAND, inb(BUSMASTER_BASE + BUSMASTER_COMMAND) & ~1); // Clear start/stop bit
outl(BUSMASTER_BASE + BUSMASTER_PRDT, (uintptr_t) prdt);
outb(BUSMASTER_BASE + BUSMASTER_COMMAND, inb(BUSMASTER_BASE + BUSMASTER_COMMAND) | 8); // Set read bit
outb(BUSMASTER_BASE + BUSMASTER_STATUS, inb(BUSMASTER_BASE + BUSMASTER_STATUS) & ~(0x04 | 0x02)); // Clear interrupt and error flags according to http://wiki.osdev.org/ATA/ATAPI_using_DMA#Standard_Order_of_Sending_Commands
outb(ATA_BASE + ATA_REG_DEV_SELECT, 0xE0 | (lba & 0x0f000000) >> 24);
outb(ATA_BASE + ATA_REG_SECTOR_COUNT0, sector_count);
outb(ATA_BASE + ATA_REG_LBA0, (lba & 0xFF));
outb(ATA_BASE + ATA_REG_LBA1, (lba & 0xFF00) >> 8);
outb(ATA_BASE + ATA_REG_LBA2, (lba & 0xFF0000) >> 16);
ata_wait_bsy();
outb(ATA_BASE + ATA_REG_COMMAND, ATA_CMD_READ_DMA);
outb(BUSMASTER_BASE + BUSMASTER_COMMAND, inb(BUSMASTER_BASE + BUSMASTER_COMMAND) | 1); // Set start/stop bit
busmaster_wait(); // Poll for interrupt bit to clear.
memcpy(buffer, prdt->address, read_size);
outb(BUSMASTER_BASE + BUSMASTER_COMMAND, inb(BUSMASTER_BASE + BUSMASTER_COMMAND) & ~0x1); // Clear start/stop bit
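One detail in the listing above that may be worth checking: in the bus master status register, the error (bit 1) and interrupt (bit 2) flags are cleared by writing 1 to them, so masking them to 0 before writing back leaves them set. The sketch below models that behaviour with a simulated register so it can run in user space. All names are hypothetical, and the model ignores the other bits of the real register. Likewise, completion is signalled by the interrupt bit becoming set, so it may be worth confirming what `busmaster_wait()` actually polls for.

```c
#include <stdint.h>

#define BM_STATUS_ACTIVE 0x01u
#define BM_STATUS_ERROR  0x02u   /* write 1 to clear */
#define BM_STATUS_IRQ    0x04u   /* write 1 to clear */

/* Simulated bus master status register (assumption: simplified model). */
static uint8_t sim_status;

static uint8_t bm_status_read(void) { return sim_status; }

/* Models the write behaviour: writing 1 to the error or interrupt bit
   clears it; writing 0 leaves it unchanged. */
static void bm_status_write(uint8_t value)
{
    sim_status &= (uint8_t)~(value & (BM_STATUS_ERROR | BM_STATUS_IRQ));
}

/* Clears pending error/interrupt flags by writing 1s to them. */
static void bm_clear_flags(void)
{
    bm_status_write((uint8_t)(bm_status_read() | BM_STATUS_ERROR | BM_STATUS_IRQ));
}
```

In this model, writing back the status with bits 1 and 2 masked off leaves both flags pending, while `bm_clear_flags()` clears them.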
Thanks in advance
...
Another interesting thing is that using the BIOS (int 13h) in Virtual 8086 mode gives nearly the same speed as my ATA DMA driver. I heard the BIOS sets the fastest read mode and reads with it.
I'm starting to suspect the VM, because I'm running QEMU inside a Linux guest VM on Windows.