Fixed: ATA DMA bugs

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
User avatar
Octacone
Member
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Fixed: ATA DMA bugs

Post by Octacone »

Hi,

I'm in the process of writing a real deal (no cheap hacks, trying to do everything per spec) ATA(PI) DMA driver.
After fixing endless bugs, there are still 2 bugs left, that I can't seem to tackle on my own.

1."Bug": Issuing 0xEF (set features command) command seems to break VirtualBox:
0xC8 (read sectors DMA) succeeds , interrupt fires, no errors, RDY and SRV bits are set, as soon as the interrupt fires I print the transfer buffer and there is nothing in it -> then my code returns and waits for some time and I read it again and suddenly there is data in it, like whaaat??
If I completely remove the code from below, everything works, as soon as the interrupt fires the buffer gets printed and there is expected data in it, SRV and RDY bits are set.

Code: Select all

...
Select_Device(ATA_devices[i], 0, true);
Hardware::Outportb(ATA_devices[i].base_address + 1, 0x03);
uint8_t DMA_mode = 0;
if(ATA_devices[i].ultra_DMA_mode_supported == true)
{
    DMA_mode = 0x08 << 3;
    DMA_mode |= ATA_devices[i].ultra_DMA_mode;
}
else if(ATA_devices[i].multiword_DMA_mode_supported == true)
{
    DMA_mode = 0x04 << 3;
    DMA_mode |= ATA_devices[i].multiword_DMA_mode;
}
Hardware::Outportb(ATA_devices[i].base_address + 2, DMA_mode);
Hardware::Outportb(ATA_devices[i].base_address + 7, 0xEF);
if(Wait_Specific(ATA_devices[i], 6, 1500) == true)
{
    uint8_t status_byte = Hardware::Inportb(ATA_devices[i].base_address + 1);
    if((status_byte & (1 << 2)) != 0)
    {
        error msg...
    }
}
else
{
    error msg...
}
...
2.Bug: ignoring the aforementioned code, my driver works perfectly on Qemu, Bochs and VirtualBox, but it doesn't work on real hardware -> interrupt never fires, no errors, RDY, SRV and DRQ bits are set
I've read every possible topic on this forum, but couldn't find a solution.

What I've tried:
nIEN -> check
PCI bus mastering bit -> check
PIC -> check
direction bit -> check
start DMA transfer bit -> check
waiting -> check
PRDT and transfer buffer alignment -> check
proper device select -> check
clearing error and interrupt bit -> check
zeroing out memory -> check
proper drive info detection -> check
proper dma info detection -> check
...
What did I miss?
Last edited by Octacone on Sat Feb 02, 2019 12:18 pm, edited 1 time in total.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
nullplan
Member
Member
Posts: 1801
Joined: Wed Aug 30, 2017 8:24 am

Re: ATA DMA bugs

Post by nullplan »

Octacone wrote:0xC8 (read sectors DMA) succeeds , interrupt fires, no errors, RDY and SRV bits are set, as soon as the interrupt fires I print the transfer buffer and there is nothing in it -> then my code returns and waits for some time and I read it again and suddenly there is data in it, like whaaat??
Whenever stuff like this happens, I wonder whether there's a cache problem. Does it start working if you issue a wbinvd instruction in the interrupt handler? If so, you might want to set a region of memory to uncacheable, cache disable, write-combining, or write-combining+, and allocate your DMA buffers out of there.
Carpe diem!
Octocontrabass
Member
Member
Posts: 5586
Joined: Mon Mar 25, 2013 7:01 pm

Re: ATA DMA bugs

Post by Octocontrabass »

If there's a cache problem, you don't have to disable the caches to solve it. Caches are coherent by default on x86 PCs.

Personally, I'd add some tracing code to the driver to track what it's reading from and writing to the various registers. It can be hard to spot logic bugs until you see your code doing something it shouldn't.
User avatar
BenLunt
Member
Member
Posts: 941
Joined: Sat Nov 22, 2014 6:33 pm
Location: USA
Contact:

Re: ATA DMA bugs

Post by BenLunt »

Just a thought, though I think you probably know it already, but check that you are reading the status from the ALT_STATUS and ATA_STATUS correctly. For example, the ALT_STATUS won't clear the INT Pending while the ATA_STATUS will. If you don't read from the ATA_STATUS, the hardware will never see that you have acknowledged the interrupt and it will not fire another. Likewise, if you do read from the ATA_STATUS, clearing the INT Pending, and you are not ready to receive an interrupt yet, this my throw you in the wrong direction.

Check that you are reading from the correct status register in all places you check the status.

Also, some hardware insists that you read from the ERROR register after certain commands, such as the Identify command.

Ben
- http://www.fysnet.net/media_storage_devices.htm
nullplan
Member
Member
Posts: 1801
Joined: Wed Aug 30, 2017 8:24 am

Re: ATA DMA bugs

Post by nullplan »

Octocontrabass wrote:If there's a cache problem, you don't have to disable the caches to solve it. Caches are coherent by default on x86 PCs.
Between processors, yes. But here, the memory is changed by external hardware. To my knowledge the cache coherency protocol does not take updates by external hardware into account, though I'll have a read of the manual again for this one... well that told me all of nothing. There is something in there about the cache coherency protocol, but whether the PCI host bridge takes part in it or not is a matter of interpretation. Awesome. :roll:
Carpe diem!
User avatar
Octacone
Member
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: ATA DMA bugs

Post by Octacone »

I don't think this is cache related, since there exists a topic about that specific question that suggests that everything on x86 (including DMA) should be cache coherent. Also wbinvd didn't do anything.

I think I'm onto something:
Qemu and Bochs fire one interrupt while Virtual Box fires TWO interrupts. So the reason being why I wasn't seeing anything in the buffer is that I was dumping memory at the wrong time (wrong interrupt).
When I read from ATA_STATUS two interrupts occur (in Virtual Box, Qemu and Bochs still fire only one) (also the first interrupt has the DRQ bit set), but if I read from ALT_STATUS only one interrupts occurs (in Virtual Box) and it's not the one I need.

Unrelated:
Are you people having troubles accessing this page, while writing this it timed out two times so I had to write this all over again + it is really slow to load. :|

Edit: still haven't figured it out...
Last edited by Octacone on Sun Jan 27, 2019 1:36 pm, edited 1 time in total.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
nullplan
Member
Member
Posts: 1801
Joined: Wed Aug 30, 2017 8:24 am

Re: ATA DMA bugs

Post by nullplan »

Octacone wrote:I don't think this is cache related, since there exists a topic about that specific question that suggests that everything on x86 (including DMA) should be cache coherent. Also wbinvd didn't do anything.
That last one already disproves my hypothesis. So I guess the external hardware does participate in the cache coherency protocol.
Octacone wrote:Unrelated:
Are you people having troubles accessing this page, while writing this it timed out two times so I had to write this all over again + it is really slow to load. :|
Sometimes, but not in the last few days...
Carpe diem!
User avatar
Octacone
Member
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: ATA DMA bugs

Post by Octacone »

Bump, any ideas?
Still no interrupts on real hardware.
Virtual Box still generates two interrupts instead of one.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
User avatar
BenLunt
Member
Member
Posts: 941
Joined: Sat Nov 22, 2014 6:33 pm
Location: USA
Contact:

Re: ATA DMA bugs

Post by BenLunt »

Maybe just a few more ideas:

Do you read from the ATA_STATUS register after a reset just to make sure the interrupt pending is clear? If not, the drive (controller) might be waiting for this read before it fires another.

Is the drive an ATA or ATAPI device? If it is an ATAPI device, do you send both the ATA *and* ATAPI Identify commands. Some drives expect you to first send the ATA Identify so it can purposely fail so that you can try the ATAPI Identify command. This is used to make sure it is an ATAPI device.

I have found that some devices must be reset after detection routines whether the device itself requires it, or I am missing something and the reset fixes it. Try resetting the device just before you read from it, just to check to see if that is the case. (You should not have to reset the device each time you try to read from it, so this is just a check)

Are you detecting an interrupt after the Packet Command (before actually sending the packet) as well as expecting an interrupt when the data is ready? I will have to check my notes to be sure, but there is no interrupt after the Packet Command and before the sending of the packet data.

Hope this helps,
Ben
- http://www.fysnet.net/media_storage_devices.htm

Edit: For another note, see my errata at: http://www.fysnet.net/docs/vol3_errata.pdf, page 2, "Waiting for DMA transfers"
User avatar
Octacone
Member
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: ATA DMA bugs

Post by Octacone »

BenLunt wrote:Maybe just a few more ideas:

Do you read from the ATA_STATUS register after a reset just to make sure the interrupt pending is clear? If not, the drive (controller) might be waiting for this read before it fires another.

Is the drive an ATA or ATAPI device? If it is an ATAPI device, do you send both the ATA *and* ATAPI Identify commands. Some drives expect you to first send the ATA Identify so it can purposely fail so that you can try the ATAPI Identify command. This is used to make sure it is an ATAPI device.

I have found that some devices must be reset after detection routines whether the device itself requires it, or I am missing something and the reset fixes it. Try resetting the device just before you read from it, just to check to see if that is the case. (You should not have to reset the device each time you try to read from it, so this is just a check)

Are you detecting an interrupt after the Packet Command (before actually sending the packet) as well as expecting an interrupt when the data is ready? I will have to check my notes to be sure, but there is no interrupt after the Packet Command and before the sending of the packet data.

Hope this helps,
Ben
- http://www.fysnet.net/media_storage_devices.htm

Edit: For another note, see my errata at: http://www.fysnet.net/docs/vol3_errata.pdf, page 2, "Waiting for DMA transfers"
Writing this reply again... :cry: why do Linux Wi-Fi drivers suck so badly, this is really frustrating... Having to physically reconnect the adapter every x minutes...

I managed to fix the first bug, now my code works on all three emulators. It was fixed by reading the ATA_STATUS register after sending a SET_FEATURES command.

As per suggestion, I added this code after every command + device reset:

Code: Select all

uint8_t error_register = Hardware::Inportb(ATA_device.base_address + 1);
uint8_t status_register = Hardware::Inportb(ATA_device.base_address + 7);
Just to be sure.

Currently my code only supports detection and initialization of ATAPI devices, doesn't do anything else with them, until this is fixed.

What exactly do you mean by detection routines? Are you talking about the IDENTIFY command?

While reading your errata I noticed something interesting:

Code: Select all

I think the idea of setting the BSY bit and clearing the DRQ bit is backwards
to what should happen, though you have to watch for it anyway since it is in
the specification.
This is what happens on real hardware:
Before issuing 0xC8:
BSY - set, DRQ - clear
After issuing 0xC8:
BSY - clear, DRQ - set
No data read, no interrupts.

Right now I'm checking for BSY to become clear, maybe I should wait for DRQ to become set?

Also what should my interrupt handler look like? Do I need to read the ATA_STATUS or the ALT_STATUS register?
There must be a catch, something small and stupid that prevents my code from firing interrupts on real hardware.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
Octocontrabass
Member
Member
Posts: 5586
Joined: Mon Mar 25, 2013 7:01 pm

Re: ATA DMA bugs

Post by Octocontrabass »

Octacone wrote:Also what should my interrupt handler look like? Do I need to read the ATA_STATUS or the ALT_STATUS register?
You need to read ATA_STATUS to acknowledge the IRQ. You should do that exactly once for each IRQ. Use ALT_STATUS if you're not responding to an IRQ or if you need to read the status register more than once in response to an IRQ.
User avatar
BenLunt
Member
Member
Posts: 941
Joined: Sat Nov 22, 2014 6:33 pm
Location: USA
Contact:

Re: ATA DMA bugs

Post by BenLunt »

As Octocontrabass states, you should read from the ATA_STATUS register only to acknowledge the interrupt and read from the ALT_STATUS each time there after.

As for the other questions, after you have reset the drive, you should read from the error register. Some (maybe most) controllers will expect this.

Also, when I stated detection, yes, I mean detecting whether the drive is ATA or ATAPI. An ATAPI drive will purposely fail an ATA IDENTIFY command and might/may expect this command before the ATAPI IDENTIFY command is sent.

- Ben
User avatar
Octacone
Member
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: ATA DMA bugs

Post by Octacone »

Success! After more than 10 days of bug hunting, it's finally fixed!

After cross-referencing some code I noticed that instead of setting bits 1 and 2 in the bus master status register, I was clearing them.
Because I've read somewhere on the wiki that I should CLEAR them. What I didn't know (didn't say) was that in order to "clear" those bits you have to set them...
Somebody who knows how to edit a wiki should fix that article to prevent future confusion.

The other thing I noticed was that DRQ bit I was talking about all the time. It was fluctuating depending on whether I was testing my code on an emulator or on real hardware.
This time around I tried waiting for the BSY bit to clear and then waiting for the DRQ bit to set, and it magically worked on real hardware. But this broke my emulators so I have to switch it around when running emulated.

Thanks for all the comments.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
User avatar
BenLunt
Member
Member
Posts: 941
Joined: Sat Nov 22, 2014 6:33 pm
Location: USA
Contact:

Re: ATA DMA bugs

Post by BenLunt »

Yes, those bits are called Write Clear (WC) bits. In other words, you write to them to clear them. This is a must since you might need to write to this register *without* acknowledging the interrupt (writing a zero), then later acknowledging the interrupt by writing a 1 to bit 2.

Sorry, I should have remembered that with DMA transfers, you must also write to this register. I completely forgot. My mind isn't as sharp as it use to be :-)

Also, all of this just proves that emulators aren't as accurate as they should be. The emulators should also not fire an interrupt until this bit is written to.

As for the last point you make, it will be difficult for your testing if you have to always change the code, by either a #define or a command line parameter, whether you are running on an emulator or real hardware. What I would do is find out if what you are doing on real hardware is documented as correct and that that same technique is in error for the emulator, or exactly the opposite. If it is documented for hardware, but in error on an emulator, then the maintainers of the emulator might fix it if you can prove it.

Case in point: https://bugs.launchpad.net/qemu/+bug/1608802

- Ben
http://www.fysnet.net/media_storage_devices.htm
User avatar
Octacone
Member
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Fixed: ATA DMA bugs

Post by Octacone »

Don't worry, you can't always remember everything, there is just too much stuff to remember.
As for the setting/clearing issue, I decided to wait for the DRQ bit to set and wait a certain amount of time, if it doesn't set I just continue the execution. I tested it out and it works everywhere.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
Post Reply