AHCI: Have to insert random sleeps for real hardware
Posted: Mon Feb 22, 2021 8:59 am
Hi, today I spent about 6 hours trying to get my AHCI driver to work on all 3 of my laptops (all from the 2012-2016 range).
It ended up working on all 3, but now my code is sprinkled with random sleeps in different places and I'm trying to understand why.
I should also mention that I try to implement it as closely to the spec as possible, without relying on the BIOS for any kind of HBA/port initialization.
1. If I don't reset the HBA and ports, everything works out of the box of course (thanks, BIOS).
2. If I reset the HBA, I have to wait about 1 extra second for the PHY interface to come back online on all ports after the HBA reset bit in GHC is cleared.
(My HBA reset code is written 1:1 according to the spec, and the spec doesn't mention any extra wait on top of the reset bit anywhere. Why???)
3. I do check the staggered spin-up bit, as well as the cold presence detection bit. I have one laptop that supports staggered spin-up, and again I have to wait for X amount of time before communication is established after setting the spin-up bit in the port, and the spec mentions no way to verify that it's back online properly.
4. If I don't reset the ports after resetting the HBA on one specific laptop, ATA IDENTIFY hangs the port (the command issue bit is never cleared, no errors either). Why? Again, nothing in the spec.
(For some reason it also hangs the PS/2 emulation with it (lol), and I do perform the BIOS handoff of course! Again, 100% according to the spec.)
5. I have to wait about 500 ms after enabling the DMA engines for a port. Again, the spec doesn't mention that such a delay is needed at all.
(And again, my DMA engine enabling code is 100% spec compliant: I set the FRE bit first, then verify CR is off, then finally set ST.)
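For point 5, one thing that may be worth checking: as far as I can tell, the spec's section on starting a port says ST should only be set once a functional device is present, meaning PxSSTS.DET reads 3h and PxTFD.STS has both BSY and DRQ clear, on top of FRE being set and CR being 0. Below is a minimal sketch of that check as a pure function over register values. PortSnapshot, StartBlocker, why_not_start and cmd_with_start are made-up names for illustration, and the bit positions are as I read them in the AHCI 1.3.1 register tables.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions as I read them in the AHCI 1.3.1 register tables.
constexpr std::uint32_t PXCMD_ST  = 1u << 0;   // start command list processing
constexpr std::uint32_t PXCMD_FRE = 1u << 4;   // FIS receive enable
constexpr std::uint32_t PXCMD_CR  = 1u << 15;  // command list engine running
constexpr std::uint32_t PXTFD_DRQ = 1u << 3;   // task file status: data request
constexpr std::uint32_t PXTFD_BSY = 1u << 7;   // task file status: busy
constexpr std::uint32_t PXSSTS_DET_MASK = 0xFu;
constexpr std::uint32_t PXSSTS_DET_PRESENT_PHY = 0x3u;

// Hypothetical snapshot of three port registers; a real driver reads them via MMIO.
struct PortSnapshot {
    std::uint32_t cmd;   // PxCMD
    std::uint32_t tfd;   // PxTFD
    std::uint32_t ssts;  // PxSSTS
};

enum class StartBlocker { None, NoPhyLink, FisReceiveDisabled, EngineStillRunning, DeviceBusy };

// Reports the first condition that (on my reading of the spec) forbids setting ST.
StartBlocker why_not_start(const PortSnapshot& p)
{
    if ((p.ssts & PXSSTS_DET_MASK) != PXSSTS_DET_PRESENT_PHY)
        return StartBlocker::NoPhyLink;
    if ((p.cmd & PXCMD_FRE) == 0)
        return StartBlocker::FisReceiveDisabled;
    if ((p.cmd & PXCMD_CR) != 0)
        return StartBlocker::EngineStillRunning;
    if ((p.tfd & (PXTFD_BSY | PXTFD_DRQ)) != 0)
        return StartBlocker::DeviceBusy;
    return StartBlocker::None;
}

// Returns the PxCMD value to write back: ST is added only when nothing blocks it.
std::uint32_t cmd_with_start(const PortSnapshot& p)
{
    return why_not_start(p) == StartBlocker::None ? (p.cmd | PXCMD_ST) : p.cmd;
}

// Usage example: link is up and FRE is set, but the device still reports BSY.
inline void example_usage()
{
    const PortSnapshot busy{PXCMD_FRE, 0x80u, 0x123u};
    assert(why_not_start(busy) == StartBlocker::DeviceBusy);
    assert(cmd_with_start(busy) == PXCMD_FRE);
}
```

Polling a check like this with a timeout, rather than sleeping a fixed 500 ms, would at least show which condition is the slow one on a given controller.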
I feel like inserting sleeps is just a random hack that happened to work, and maybe I missed some important bit that I have to check, or some other initialization I have to perform...
Any ideas as to where I could actually read about proper, 100% safe AHCI initialization code without random sleep() calls sprinkled everywhere? Apparently not the AHCI specification.
I have tried reading the Linux source code, but it's obfuscated to the extent that it's not really readable. It also implements random workarounds for different controllers, which is just too much for me atm.
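On the HBA reset side, my understanding of the spec is that GHC.HR reading 0 again only means the controller has finished its internal reset (and that software may treat the HBA as hung if that takes longer than about 1 second). It says nothing about the links: each port still has to renegotiate with its device afterwards, which could be what the extra wait is covering. A rough sketch of the controller-level part only, with register access and the delay passed in as hypothetical callables (read_ghc, write_ghc, delay_ms) so it can run against a fake; reset_hba_sketch is a made-up name:

```cpp
#include <cassert>
#include <cstdint>

// GHC bit positions as I read them in the AHCI 1.3.1 register tables.
constexpr std::uint32_t GHC_HR = 1u << 0;   // HBA reset, cleared by hardware when done
constexpr std::uint32_t GHC_AE = 1u << 31;  // AHCI enable

// read_ghc / write_ghc / delay_ms are hypothetical hooks, not part of any real API.
// Returns false if HR is still set after timeout_ms (controller presumably hung).
template <typename ReadGhc, typename WriteGhc, typename DelayMs>
bool reset_hba_sketch(ReadGhc read_ghc, WriteGhc write_ghc, DelayMs delay_ms,
                      unsigned timeout_ms = 1000)
{
    assert(timeout_ms > 0);

    // Request the reset.
    write_ghc(read_ghc() | GHC_HR);

    // Poll in 1 ms steps until hardware clears HR or the budget runs out.
    for (unsigned waited_ms = 0; (read_ghc() & GHC_HR) != 0; ++waited_ms) {
        if (waited_ms >= timeout_ms)
            return false;
        delay_ms(1u);
    }

    // Assumption: the reset clears AE where it is writable, so set it again
    // before touching any other AHCI register.
    write_ghc(read_ghc() | GHC_AE);
    return true;
}
```

After this returns true, link readiness would still have to be checked per port (PxSSTS.DET, then PxTFD.STS), ideally with its own timeout instead of a fixed sleep.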
UPD: I managed to get rid of the sleeps in all places but HBA reset and port reset. Those are still a mystery to me, and I don't understand what bit indicates that a reset is fully complete.
UPD2: After a few more hours of trying different delays, 50 ms seems to be the lowest common denominator between all 3 AHCI controllers. I guess I'll leave it at that for now.
As an example, here's my code for resetting a port. (It's done after disabling the DMA engines in a different function.)
Code:
void AHCI::reset_port(size_t index)
{
    // Request interface initialization (COMRESET) through the port's SATA control register.
    auto sctl = port_read<PortSATAControl>(index);
    sctl.device_detection_initialization = PortSATAControl::DeviceDetectionInitialization::PERFORM_INITIALIZATION;
    port_write(index, sctl);

    // Busy-wait 1 ms so the COMRESET is actually delivered.
    static constexpr size_t comreset_delivery_wait = Time::nanoseconds_in_millisecond;
    auto wait_begin = Timer::nanoseconds_since_boot();
    auto wait_end = wait_begin + comreset_delivery_wait;
    while (wait_end > Timer::nanoseconds_since_boot());

    // Release the initialization request.
    sctl = port_read<PortSATAControl>(index);
    sctl.device_detection_initialization = PortSATAControl::DeviceDetectionInitialization::NOT_REQUESTED;
    port_write(index, sctl);

    // Poll the SATA status register for up to 1 ms until the PHY link is back.
    wait_begin = Timer::nanoseconds_since_boot();
    wait_end = wait_begin + comreset_delivery_wait;
    auto ssts = port_read<PortSATAStatus>(index);
    while (ssts.device_detection != PortSATAStatus::DeviceDetection::DEVICE_PRESENT_PHY) {
        ssts = port_read<PortSATAStatus>(index);
        if (wait_end < Timer::nanoseconds_since_boot())
            break;
    }

    if (ssts.device_detection != PortSATAStatus::DeviceDetection::DEVICE_PRESENT_PHY)
        runtime::panic("AHCI: Port physical layer failed to come back online after reset");

    // Removing this will cause the driver to break on real hw
    sleep::for_milliseconds(50);

    // Write all ones to clear the port's accumulated error bits.
    m_hba->ports[index].error = 0xFFFFFFFF;

    log("AHCI") << "successfully reset port " << index;
}
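A possible replacement for the fixed 50 ms sleep at the end (an assumption on my side, not something confirmed on these laptops): the device detection field reading "present with PHY" only says the link is up. The device still has to send its initial register FIS, and until then PxTFD.STS should keep BSY (or DRQ) set, so polling PxTFD against a deadline may be what the sleep is standing in for. In the sketch below, now_ns and read_tfd are hypothetical hooks standing in for Timer::nanoseconds_since_boot() and a PxTFD read, and wait_device_idle is a made-up name.

```cpp
#include <cassert>
#include <cstdint>

// PxTFD.STS bits as I read them in the AHCI 1.3.1 register tables.
constexpr std::uint32_t TFD_DRQ = 1u << 3;
constexpr std::uint32_t TFD_BSY = 1u << 7;

// True when the task file status shows neither BSY nor DRQ.
inline bool tfd_idle(std::uint32_t tfd)
{
    return (tfd & (TFD_BSY | TFD_DRQ)) == 0;
}

// Polls read_tfd() until the device is idle or timeout_ns has elapsed.
// now_ns and read_tfd are hypothetical callables so the sketch runs without hardware.
template <typename NowNs, typename ReadTfd>
bool wait_device_idle(NowNs now_ns, ReadTfd read_tfd, std::uint64_t timeout_ns)
{
    assert(timeout_ns > 0);
    const std::uint64_t deadline = now_ns() + timeout_ns;
    for (;;) {
        if (tfd_idle(read_tfd()))
            return true;
        // On timeout, read once more so a device that became ready
        // right at the deadline is not reported as stuck.
        if (now_ns() >= deadline)
            return tfd_idle(read_tfd());
    }
}
```

In reset_port this would sit where the sleep is now, with a generous timeout, and a timeout could be logged instead of silently continuing, which also shows how long each controller really needs.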