I managed to fix my AHCI driver (I was using the wrong types) and now it can successfully read data (I think, anyway). It at least doesn't hang, though the way I do reading is a bit weird and I'm definitely going to see about reworking that sometime. Right now I'm just trying to make it possible for my AHCI driver to read data and find the boot signature (my repository has a couple of bootable QEMU images). My code is, as always, available at https://github.com/ethindp/kernel (that includes my two bootable images -- pi.vfd and linux-0.2.img -- as well as a blank disk, disk.img). When I run my OS, it prints out the first 512 bytes it reads from the disk. For linux-0.2.img, I get bytes like [35, 0, 0, 0, 0, 0, 0, 0, 99, 72, 0, 0, 99, 72, 0, 0, 0, 0, 0, 0, 0, 0, ...] (yes, that's in base 10). For pi.vfd I get the same thing, so clearly I'm not reading the disk properly. I'm just confused about exactly where I'm going wrong.
I've tried a pointer to 0x1000 and then using an offset from that to read the data, and I've tried reading data from a pointer and then into a slice (array)... Right now I'm not sure what to try next or what's causing the problem.
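For checking whether a 512-byte read really came from the start of a bootable image, something like the sketch below might help. It assumes the images follow the usual PC boot sector convention (0x55 at offset 510, 0xAA at offset 511). The names `has_boot_signature` and `hex_prefix` are made up for illustration, and the hex helper uses `String`, so it is meant for a host-side test or a kernel with `alloc`, not as a drop-in for the driver.

```rust
/// Returns true if `sector` is at least 512 bytes long and ends with the
/// conventional boot signature bytes 0x55, 0xAA at offsets 510 and 511.
pub fn has_boot_signature(sector: &[u8]) -> bool {
    sector.len() >= 512 && sector[510] == 0x55 && sector[511] == 0xAA
}

/// Formats the first `n` bytes as space-separated hex, which is easier to
/// compare against a hex dump of the image file than base-10 output.
pub fn hex_prefix(sector: &[u8], n: usize) -> String {
    sector
        .iter()
        .take(n)
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Comparing `hex_prefix(&buf, 16)` with the first 16 bytes of the image file on the host would show quickly whether the buffer holds disk data at all.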
Would someone mind poking through my (arguably messy) code to help me track down this problem?
AHCI driver failing to read data from disk
Re: AHCI driver failing to read data from disk
I tried building and running your code, but it hangs before it gets to the disk reading code.
There's a lot of code involved and, as you say, it's not exactly clear so I don't think I can spend more time on it.
What I would suggest is that you single-step through the relevant part of your code using gdb. If you examine the variables/buffers it should be obvious at which point the code is going wrong. It should be a good debugging exercise.
Re: AHCI driver failing to read data from disk
Yeah, I was trying to get SMBIOS code to function (sort of, at least) since AHCI didn't work. If GDB will *actually* work this time (my OS -- as well as QEMU -- usually just ignores breakpoints, hardware or software, even with -s and -S) then I'll be able to figure it out. I'll remove SMBIOS for now. Sorry about that.
Update: GDB decided to play nice this time.
First, PI is set to 63 (bits 0-5 are set). Good thus far.
But when the port is retrieved, I think, is where things really start to go wrong. I don't know if this is correct or not, but this is what my first port looks like from GDB's perspective:
Code: Select all
$11 = kernel::drivers::storage::ahci::internal::HbaPort {clb: 0x7fdfc00, clbu: 0x0, fb: 0x7fdfb00, fbu: 0x0, is: 0x0, ie: 0x0, cmd: 0xc017, rsv0: 0x0, tfd: 0x50, sig: 0x101, ssts: 0x113, sctl: 0x0, serr: 0x0, sact: 0x0, ci: 0x0, sntf: 0x0, fbs: 0x0, rsv1: [0x0 <repeats 11 times>], vendor: [0x0, 0x0, 0x0, 0x0]}
It all goes well until I get to ata_read. Within that function I clone the buffer address and turn it into a pointer. According to GDB, it's now 0x10000200dbe. (Strangely, it's back to 0x1000 later on.)
I've just tried reading the memory on 4K boundaries, but that doesn't work (I get differing results compared to what I was getting before, but still no indication that the disk is bootable). I'll keep hacking at it.
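To make the register values in that GDB dump easier to read, here is a small decoding sketch. The bit positions are the ones AHCI 1.3.1 gives for PxSSTS, PxTFD and PxCMD. The struct and function names are invented for this example and are not taken from the repository.

```rust
/// Decoded PxSSTS fields: DET = bits 3:0, SPD = bits 7:4, IPM = bits 11:8.
#[derive(Debug, PartialEq, Eq)]
pub struct Ssts {
    pub det: u8,
    pub spd: u8,
    pub ipm: u8,
}

pub fn decode_ssts(ssts: u32) -> Ssts {
    Ssts {
        det: (ssts & 0xF) as u8,
        spd: ((ssts >> 4) & 0xF) as u8,
        ipm: ((ssts >> 8) & 0xF) as u8,
    }
}

/// Decoded PxTFD: status byte in bits 7:0, error register in bits 15:8.
#[derive(Debug, PartialEq, Eq)]
pub struct Tfd {
    pub busy: bool,    // STS.BSY, bit 7
    pub drq: bool,     // STS.DRQ, bit 3
    pub err: bool,     // STS.ERR, bit 0
    pub error_reg: u8, // task file error register
}

pub fn decode_tfd(tfd: u32) -> Tfd {
    let status = (tfd & 0xFF) as u8;
    Tfd {
        busy: (status & 0x80) != 0,
        drq: (status & 0x08) != 0,
        err: (status & 0x01) != 0,
        error_reg: ((tfd >> 8) & 0xFF) as u8,
    }
}

// PxCMD bits relevant to whether the port's engines are running.
pub const CMD_ST: u32 = 1 << 0; // Start
pub const CMD_FRE: u32 = 1 << 4; // FIS Receive Enable
pub const CMD_FR: u32 = 1 << 14; // FIS Receive Running
pub const CMD_CR: u32 = 1 << 15; // Command List Running

/// True if ST, FRE, FR and CR are all set.
pub fn port_is_running(cmd: u32) -> bool {
    let mask = CMD_ST | CMD_FRE | CMD_FR | CMD_CR;
    (cmd & mask) == mask
}
```

Read this way, ssts 0x113 gives DET = 3 and IPM = 1, tfd 0x50 has neither BSY, DRQ nor ERR set, and cmd 0xc017 has ST, FRE, FR and CR set, so the port registers themselves look plausible for an idle, attached device.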
Re: AHCI driver failing to read data from disk
Update: OK, so no progress at all. Also, for some odd reason my OS keeps page faulting and I can't figure out why. (My VGA output gets spurious and weird too by the time this happens.) This is what it looks like:
Kernel control a comm »@ ole (KC3)
Type 'he for ¦@ of commands
*** WA NG * ¦@ re currently oper ng i ¦@ mode. Remember t wit ¦@ comes
responsibil , an ¦@ e responsible for y da ¦@ at arise out of y
mis ¦@ his console, your shan +@ devices and port your +@ to
follow docume tion ¦@
3> Page fault: (empty)
Caused by read from memory address VirtAddr(0x10)
Stack frame: InterruptStackFrame {
instruction_pointer: VirtAddr(0x20a07d),
code_segment: 8,
cpu_flags: 0x2,
stack_pointer: VirtAddr(0x10000201210),
stack_segment: 0,
}
Registers:
RAX = 100002010A0 RBX = 1
RCX = 1 RDX = 23C313
RSI = 0 RDI = 100002010A0
RSP = 10000200578 RBP = 10
R8 = 0 R9 = 1
R10 = 10000200F38 R11 = 2490B0
R12 = 1 R13 = 24D060
R14 = 10000201238 R15 = 10000201530
RFLAGS = 12 CR0 = 80010613
CR2 = 10 CR3 = 1000
CR4 = 620 CR8 = 0
EFER = 0
Segments:
CS = 8 DS = 10
SS = 0 ES = 10
FS = 0 GS= 0
FSBASE = 0 GSBASE = 0
KERNELGSBASE = 0
This is incredibly confusing since my code does not print any symbols whatsoever (only ASCII characters). At first I thought that the page fault was because I enabled Spectre mitigations via retpoline, but it still happens even after I turn the features off. I've also debugged my kernel multiple times now and get the same output; my ATA read function appears to be working fine. The error, I think, is when the system is trying to access the data that the disk returns. I thought about dumping the memory and uploading it here but I don't think that would be very useful to anyone. I don't have memory analysis tools or skills myself -- at least, not of that kind.
I also thought that the strange output could be because the SMBIOS code was enabled (which might've been causing the page fault), but I've removed that completely and it still happens. The faulting address varies too: I've seen it try to read from addresses 0x10 and 0x50. Though I have Intel HDA code and SB16 code, the SB16 code is not integrated into the build yet and the HDA code does nothing useful right now. I'm not really sure what to try next... am I misunderstanding the way AHCI works? Am I misinterpreting the wiki and/or the specification?
Edit: just tried to map 0x1000-0x1200, but the kernel indicates that the pages for that address range are already mapped. I thought that since CR3 indicated 1000h I should try to map it but it appears that's already been done for me.
Edit 2: OK, so I just moved the HBA write/read buffer to the range 0x1000000-0x100c350. Then I forgot that I was allocating the buffer in paged memory space. I moved it to physical mappings and now I'm getting all zeros (I was getting all zeros before I moved it from paged to physical but after the move to the new address space). I raised the sector/LBA count to 64 to see if I wasn't catching the boot information at the immediate start of the disk but am still getting nothing. (I'm still reading only 512 bytes from the start of the buffer because I don't know my disk geometry and don't know how I'd even go about detecting that.) I also was reading the information on the Read DMA Ext command (INCITS 452-2009: AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS), section 7.25). In case the one on the wiki and this one differ, this one says:
7.25 READ DMA EXT - 25h, DMA
7.25.1 Feature Set
This 48-bit command is mandatory for devices implementing the 48-bit Address feature set.
7.25.2 Description
The READ DMA EXT command allows the host to read data using the DMA data transfer protocol.
7.25.3 Inputs
Name | Description
Feature | Reserved
Count | The number of logical sectors to be transferred. A value of 0000h indicates that 65,536 logical sectors are to be transferred.
LBA | LBA of first logical sector to be transferred
Device |
    Bit | Description
    7 | Obsolete
    6 | Shall be set to one
    5 | Obsolete
    4 | Transport Dependent - See 6.2.11
    3:0 | Reserved
Command | 7:0 25h
7.25.4 Normal Outputs
See table 111.
7.25.5 Error Outputs
If an unrecoverable error occurs while the device is processing this command, then the device shall return command completion with the Error bit set to one and the LBA field set to the LBA of the logical sector where the first unrecoverable error occurred. The validity of the data transferred is indeterminate. See table 131.
I don't think the device is returning an error status (TFD doesn't say so), so...
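As a cross-check of the inputs listed in that excerpt, here is a sketch of how they could map onto a host-to-device Register FIS (FIS type 27h, 20 bytes). The field order follows the Serial ATA Register H2D FIS layout. `FisRegH2D` and `read_dma_ext` are names chosen for this example and may not match the structures in the repository. Since the command addresses sectors by LBA, no disk geometry is needed: the first sector is simply LBA 0.

```rust
/// Host-to-device Register FIS. Every field is a byte, so #[repr(C)]
/// introduces no padding and the struct is exactly 20 bytes.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
pub struct FisRegH2D {
    pub fis_type: u8,   // 0x27 = Register FIS, host to device
    pub flags: u8,      // bit 7 = C (command update), bits 3:0 = PM port
    pub command: u8,    // ATA command code
    pub feature_lo: u8, // Feature 7:0
    pub lba0: u8,       // LBA 7:0
    pub lba1: u8,       // LBA 15:8
    pub lba2: u8,       // LBA 23:16
    pub device: u8,     // Device register
    pub lba3: u8,       // LBA 31:24
    pub lba4: u8,       // LBA 39:32
    pub lba5: u8,       // LBA 47:40
    pub feature_hi: u8, // Feature 15:8
    pub count_lo: u8,   // Count 7:0
    pub count_hi: u8,   // Count 15:8
    pub icc: u8,        // Isochronous command completion
    pub control: u8,    // Control register
    pub reserved: [u8; 4],
}

/// Builds a READ DMA EXT (25h) command FIS for `sectors` logical sectors
/// starting at `lba`. Only the low 48 bits of `lba` are used. Per the ATA
/// text above, a count of 0 means 65,536 sectors.
pub fn read_dma_ext(lba: u64, sectors: u16) -> FisRegH2D {
    FisRegH2D {
        fis_type: 0x27,
        flags: 0x80,    // C bit set: this FIS carries a command
        command: 0x25,  // READ DMA EXT
        device: 1 << 6, // bit 6 "shall be set to one"
        lba0: lba as u8,
        lba1: (lba >> 8) as u8,
        lba2: (lba >> 16) as u8,
        lba3: (lba >> 24) as u8,
        lba4: (lba >> 32) as u8,
        lba5: (lba >> 40) as u8,
        count_lo: sectors as u8,
        count_hi: (sectors >> 8) as u8,
        ..Default::default()
    }
}
```

Dumping the 20 bytes of the command FIS in the command table and comparing them with what `read_dma_ext(0, 1)` produces might show whether the command itself is malformed or whether the problem is in the buffer addressing instead.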
So... I'm kinda stuck here. What should I try next?
Re: AHCI driver failing to read data from disk
OK, so I've done some major code alterations (haven't submitted them yet to github). Now, here's what I do in my AHCI initialization function:
(1) I locate the BAR for AHCI (BAR 5) on the PCI bus.
(2) I enable AHCI by setting GHC.AE (bit 31).
(3) I determine what ports are available and begin setting them up. Here's where things get weird.
(4) In my loop:
(a) I check to see if the given port is enabled in the PI interface. If it is, then:
(i) I calculate the port address. According to the AHCI specification, v. 1.3.1, the algorithm (equation, really) is 0x100 + (bit position * 0x80). My BAR is FEBF5000h, so my port address for port 0, according to this equation, is FEBF5100h. Yet the address of the port in the memory structure is (somehow) FEBF5900h. I know that this isn't reading into vendor-specific registers because those registers are at memory addresses FEBF50A0h-FEBF50FFh.
(ii) I read P0CMD.ICC (bits 31:28). If it's 0h, I request that the port transition to the active state, then wait until it's been cleared, indicating that the port is idling.
(iii) I read the P0SSTS register and retrieve P0SSTS.IPM (bits 11:08) and P0SSTS.DET (bits 03:00) and verify that P0SSTS.DET == 3h and P0SSTS.IPM == 1h. If not, I skip the port.
(iv) I check the signature. I rebase the port and treat it as AHCI if I get a SATA signature.
(b) If the port is not enabled, I skip the port.
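The address arithmetic from step (i) and the PI scan from step (a) can be sketched as below. This is plain integer math with made-up function names, not the driver's actual code. It does no MMIO, so it can be run on the host to check the numbers.

```rust
/// Offset of port 0's register block from ABAR, per AHCI 1.3.1.
pub const PORT_BASE: u64 = 0x100;
/// Size of each port's register block.
pub const PORT_STRIDE: u64 = 0x80;

/// Address of the register block for `port`, given the ABAR base address.
pub fn port_address(abar: u64, port: u32) -> u64 {
    abar + PORT_BASE + u64::from(port) * PORT_STRIDE
}

/// Yields the port numbers whose bit is set in the PI register value.
pub fn implemented_ports(pi: u32) -> impl Iterator<Item = u32> {
    (0..32u32).filter(move |p| (pi & (1u32 << p)) != 0)
}
```

With ABAR = FEBF5000h this gives FEBF5100h for port 0. FEBF5900h is what the formula yields for port 16, i.e. 800h bytes further on, which might point at extra or oversized fields ahead of the port array in the memory structure rather than at the equation.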
The weird thing is that port 0's signature is 11300000101h. Ports 1-5 are disabled. Can someone explain why this is and what I'm doing wrong?
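One possible reading of that value, offered as a guess rather than a diagnosis: PxSIG sits at port offset 24h and PxSSTS at 28h, so a 64-bit little-endian read starting at PxSIG would return both registers glued together. The helper below (a hypothetical name) just splits such a value into its two 32-bit halves.

```rust
/// Signature reported by a plain SATA disk in PxSIG.
pub const SATA_SIG_ATA: u32 = 0x0000_0101;

/// Splits a 64-bit little-endian read into (low 32 bits, high 32 bits).
/// For a read starting at PxSIG that would be (PxSIG, PxSSTS).
pub fn split_u64(raw: u64) -> (u32, u32) {
    (raw as u32, (raw >> 32) as u32)
}
```

Splitting 11300000101h this way gives 101h in the low half and 113h in the high half, which are the same sig and ssts values shown in the earlier GDB dump of the port. If the signature field is read as a 32-bit value, it would compare equal to `SATA_SIG_ATA`.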
Edit: just pushed all changes to github.