AHCI read issue

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
gusc
Member
Member
Posts: 50
Joined: Tue Feb 05, 2013 1:42 pm

AHCI read issue

Post by gusc »

I have a slight problem with understanding AHCI/SATA and all that. I try to do DMA read of the first 512 bytes from the SATA disk (on QEmu and VirtualBox), but I get nothing - no data has been read (all zeros as initialized). Also in VB it just freezes, and if I add a spin-lock to wait for some 100000 incrementations (it takes about 10 seconds) all the status bits show that command list is being processed (FRE, FR, CR, ST, etc. are set to 1)and still there is no data.

Also the AHCI and SATA specs take me nowhere as I was hoping to get some DMA interrupt on completion, but the only interrupts that I can configure and test are error interrupts. Am I missing something?

A little background note of my work here:
0. I'm booting from BIOS with my own bootloader (in which I'm trying to achieve AHCI support)
1. I've set up PAE4 and Long Mode
2. I've set up PIC and IDT and interrupt and IRQ service routines to debug any input
3. I've enumerated PCI bus from which I look up for a AHCI controller
4. I've enumerated all the SATA drives and ports
5. I'm trying the sample code from OSDev Wiki (http://wiki.osdev.org/AHCI#Example_-_Re ... sk_sectors)

On the side note I noticed one more strange thing with VirtualBox - once in my IRQ wrapper function I send EOI command for IRQ0 (and it's the only IRQ I receive for now) it just goes straight to GPF (there's no double fault or anything), so I masket out IRQ0. On the other hand it seems to work well with IRQ 1 (the keyboard).

Any ideas on what direction should I look?
gusc
Member
Member
Posts: 50
Joined: Tue Feb 05, 2013 1:42 pm

Re: AHCI read issue

Post by gusc »

MOTHER&%*#@ S%&# F%#& ... sorry I had to get it of my chest.

I'm sorry about the misleading and such a comprehensive topic, but as it is with OS development some bugs are really misleading and hard to debug.

So to boil it down, kids, remember - when working with interrupts - NEVER forget to push all the registers on the stack and then pop them back. You see, in 32bit mode there is a "pusha" and "popa", but in 64bit mode there isn't, so I made a mistake and instead of manually pushing all the registers, I did not do it at all.

So instead of these interrupt service routines:

Code: Select all

; Macro to create an intterupt service routine for interrupts that do not pass error codes 
%macro INT_NO_ERR 1
[global isr%1]
isr%1:
    cli                                         ; disable interrupts
    push qword 0                                ; set error code to 0
    push qword %1                               ; set interrupt number
    call isr_wrapper                            ; call void isr_wrapper(int_stack_t stack)
    add rsp, 16                                 ; cleanup stack
    sti                                         ; enable interrupts
    iretq                                       ; return from interrupt handler
%endmacro

; Macro to create an interrupt service routine for interrupts that DO pass an error code
%macro INT_HAS_ERR 1
[global isr%1]
isr%1:
    cli                                         ; disable interrupts
    push qword %1                               ; set interrupt number
    call isr_wrapper                            ; call void isr_wrapper(int_stack_t stack)
    add rsp, 16                                 ; cleanup stack
    sti                                         ; enable interrupts
    iretq                                       ; return from interrupt handler
%endmacro

; Macro to create an IRQ interrupt service routine
; First parameter is the IRQ number
; Second parameter is the interrupt number it is remapped to
%macro IRQ 2
[global irq%1]
irq%1:
    cli                                         ; disable interrupts
    push qword %1                               ; set IRQ number in the place of error code (see registers_t in interrupts.h)
    push qword %2                               ; set interrupt number
    call irq_wrapper                            ; calls void irq_wrapper(int_stack_t stack)
    add rsp, 16                                 ; cleanup stack
    sti                                         ; enable interrupts
    iretq                                       ; return from interrupt handler
%endmacro
There must be these:

Code: Select all

; Macro to push all the registers
%macro PUSH_ALL 0
    push r15
    push r14
    push r13
    push r12
    push r11
    push r10
    push r9
    push r8
    push rbp
    push rsi
    push rdi
    push rdx
    push rcx
    push rbx
    push rax
%endmacro

; Macro to pop all the registers
%macro POP_ALL 0
    pop rax
    pop rbx
    pop rcx
    pop rdx
    pop rdi
    pop rsi
    pop rbp
    pop r8
    pop r9
    pop r10
    pop r11
    pop r12
    pop r13
    pop r14
    pop r15
%endmacro

; Macro to create an intterupt service routine for interrupts that do not pass error codes 
%macro INT_NO_ERR 1
[global isr%1]
isr%1:
    cli                                         ; disable interrupts
    push qword 0                                ; set error code to 0
    push qword %1                               ; set interrupt number
    PUSH_ALL                                    ; push all the registers on the stack
    call isr_wrapper                            ; call void isr_wrapper(int_stack_t stack)
    POP_ALL                                     ; pop all the registers back from the stack
    add rsp, 16                                 ; cleanup stack
    sti                                         ; enable interrupts
    iretq                                       ; return from interrupt handler
%endmacro

; Macro to create an interrupt service routine for interrupts that DO pass an error code
%macro INT_HAS_ERR 1
[global isr%1]
isr%1:
    cli                                         ; disable interrupts
    push qword %1                               ; set interrupt number
    PUSH_ALL                                    ; push all the registers on the stack
    call isr_wrapper                            ; call void isr_wrapper(int_stack_t stack)
    POP_ALL                                     ; pop all the registers back from the stack
    add rsp, 16                                 ; cleanup stack
    sti                                         ; enable interrupts
    iretq                                       ; return from interrupt handler
%endmacro

; Macro to create an IRQ interrupt service routine
; First parameter is the IRQ number
; Second parameter is the interrupt number it is remapped to
%macro IRQ 2
[global irq%1]
irq%1:
    cli                                         ; disable interrupts
    push qword %1                               ; set IRQ number in the place of error code (see registers_t in interrupts.h)
    push qword %2                               ; set interrupt number
    PUSH_ALL                                    ; push all the registers on the stack
    call irq_wrapper                            ; calls void irq_wrapper(int_stack_t stack)
    POP_ALL                                     ; pop all the registers back from the stack
    add rsp, 16                                 ; cleanup stack
    sti                                         ; enable interrupts
    iretq                                       ; return from interrupt handler
%endmacro
User avatar
sortie
Member
Member
Posts: 931
Joined: Wed Mar 21, 2012 3:01 pm
Libera.chat IRC: sortie

Re: AHCI read issue

Post by sortie »

Hi, guess what? You're still wrong. :-)

Your assembly breaks the ABI: The parameters passed on the stack like this belongs to the called function and it is allowed to change the values, which is not what you assume here. You'll want to deprogram your bad tutorial habits using James Molloy's Tutorial Known Bugs that covers this particular bug and other bad things you inherited.
Antti
Member
Member
Posts: 923
Joined: Thu Jul 05, 2012 5:12 am
Location: Finland

Re: AHCI read issue

Post by Antti »

Code: Select all

irq%1:
    cli             ; Something is wrong if this is needed here.
    ...
    ...
    sti             ; This instruction does not have any effect.
    iretq
gusc
Member
Member
Posts: 50
Joined: Tue Feb 05, 2013 1:42 pm

Re: AHCI read issue

Post by gusc »

Thank you, thank you, thank you (there's this website where it goes on the loop) :)

I think I might have nailed it this time, by properly pushing RBP on the stack and passing the stack structure as a pointer to C in RDI. It does not seem to fail anymore, and no random glitches (like every 3rd time there is no ATA controller at all or there are no IDE devices, which seems to be caused by interrupts trashng registers).

So this might be the correct one:

Code: Select all

; Macro to push all the registers
%macro PUSH_ALL 0
    push r15
    push r14
    push r13
    push r12
    push r11
    push r10
    push r9
    push r8
    push rsi
    push rdi
    push rdx
    push rcx
    push rbx
    push rax
%endmacro

; Macro to pop all the registers
%macro POP_ALL 0
    pop rax
    pop rbx
    pop rcx
    pop rdx
    pop rdi
    pop rsi
    pop r8
    pop r9
    pop r10
    pop r11
    pop r12
    pop r13
    pop r14
    pop r15
%endmacro

; Macro to create an intterupt service routine for interrupts that do not pass error codes 
%macro INT_NO_ERR 1
[global isr%1]
isr%1:
    cli                                         ; disable interrupts
    push qword 0                                ; set error code to 0
    push qword %1                               ; set interrupt number
    push rbp                                    ; push current rbp
    mov rbp, rsp                                ; store current stack frame in rbp
    PUSH_ALL                                    ; push all the registers on the stack
    mov rdi, rsp                                ; pass our stack as a pointer to isr_wrapper
    call isr_wrapper                            ; call void isr_wrapper(int_stack_t *stack)
    POP_ALL                                     ; pop all the registers back from the stack
    pop rbp                                     ; get our rbp back from the stack
    add rsp, 16                                 ; cleanup stack
    sti                                         ; enable interrupts
    iretq                                       ; return from interrupt handler
%endmacro

; Macro to create an interrupt service routine for interrupts that DO pass an error code
%macro INT_HAS_ERR 1
[global isr%1]
isr%1:
    cli                                         ; disable interrupts
    push qword %1                               ; set interrupt number
    push rbp                                    ; push current rbp
    mov rbp, rsp                                ; store current stack frame in rbp
    PUSH_ALL                                    ; push all the registers on the stack
    mov rdi, rsp                                ; pass our stack as a pointer to isr_wrapper
    call isr_wrapper                            ; call void isr_wrapper(int_stack_t *stack)
    POP_ALL                                     ; pop all the registers back from the stack
    pop rbp                                     ; get our rbp back from the stack
    add rsp, 16                                 ; cleanup stack
    sti                                         ; enable interrupts
    iretq                                       ; return from interrupt handler
%endmacro

; Macro to create an IRQ interrupt service routine
; First parameter is the IRQ number
; Second parameter is the interrupt number it is remapped to
%macro IRQ 2
[global irq%1]
irq%1:
    cli                                         ; disable interrupts
    push qword %1                               ; set IRQ number in the place of error code (see registers_t in interrupts.h)
    push qword %2                               ; set interrupt number
    push rbp                                    ; push current rbp
    mov rbp, rsp                                ; store current stack frame in rbp
    PUSH_ALL                                    ; push all the registers on the stack
    mov rdi, rsp                                ; pass our stack as a pointer to isr_wrapper
    call irq_wrapper                            ; calls void irq_wrapper(irq_stack_t *stack)
    POP_ALL                                     ; pop all the registers back from the stack
    pop rbp                                     ; get our rbp back from the stack
    add rsp, 16                                 ; cleanup stack
    sti                                         ; enable interrupts
    iretq                                       ; return from interrupt handler
%endmacro
Although one thing remains a little bit odd in my code - that I push the RBP after the interrupt number and error code (not like GCC does it at the very beginning of procedure), but It does not seem to mess any thing up (for now).

And this is how my register structure looks like from C:

Code: Select all

/**
* Stack of registers passed from assembly
*/
typedef struct {
    uint64 rax;
    uint64 rbx;
    uint64 rcx;
    uint64 rdx;
    uint64 rdi;
    uint64 rsi;
    uint64 r8;
    uint64 r9;
    uint64 r10;
    uint64 r11;
    uint64 r12;
    uint64 r13;
    uint64 r14;
    uint64 r15;
    uint64 rbp;
    uint64 int_no;				// Interrupt number
    uint64 err_code;			// Error code (or IRQ number for IRQs)
    uint64 rip;					// Return instruction pointer
    uint64 cs;					// Code segment
    uint64 rflags;				// RFLAGS
    uint64 rsp;					// Previous stack pointer
    uint64 ss;					// Stack segment
} isr_stack_t;
typedef struct {
    uint64 rax;
    uint64 rbx;
    uint64 rcx;
    uint64 rdx;
    uint64 rdi;
    uint64 rsi;
    uint64 r8;
    uint64 r9;
    uint64 r10;
    uint64 r11;
    uint64 r12;
    uint64 r13;
    uint64 r14;
    uint64 r15;
    uint64 rbp;
    uint64 int_no;				// Interrupt number
    uint64 irq_no;			    // Error code (or IRQ number for IRQs)
    uint64 rip;					// Return instruction pointer
    uint64 cs;					// Code segment
    uint64 rflags;				// RFLAGS
    uint64 rsp;					// Previous stack pointer
    uint64 ss;					// Stack segment
} irq_stack_t;
User avatar
sortie
Member
Member
Posts: 931
Joined: Wed Mar 21, 2012 3:01 pm
Libera.chat IRC: sortie

Re: AHCI read issue

Post by sortie »

Hi gusc,

Better :)

You'll want to address what Antti said, which was one of the points in the link I gave. You should know whether interrupts are on for your interrupt handlers: You get to decide through an IDT entry bit (You have the choice between interrupts being unaltered or disabled).

You are worrying about the rbp singly linked list, which lets you iterate frame contexts. It is for debugging purposes. The ABI specifies that all functions preserve rbp. Additionally, if this debug information is enabled through -fno-omit-frame-pointer (it isn't by default on x86_64 in favour of a more complicated scheme) all functions save the rbp pointer of the calling function and set rbp to point to that saved value, so you can just keep following rbp until you come to an end. One useful thing is that just prior to the saved rbp values is the saved instruction pointer (saved by the call instruction). This means that you can backtrace the stack by for the first frame (the current function) by knowing where you are and what the rbp value, for the rest of the calling frames you can dereference rbp and get the saved rbp (and next to it the saved rip). You just loop like until you hit a zero rbp or zero rip, or how the base case looks.

This is what you can use from this information: Your interrupt handler (which now looks correct, assuming your kernel doesn't use floating point state or stuff you didn't save in the interrupt handler) preserves all registers, including rbp, so it doesn't corrupt what it interrupts. You don't have to make your interrupt handler participate in the rbp chain unless you want it to, it doesn't break called functions if you don't (unless they attempt to backtrace). If you wish to add backtracing support, you could either 1) add a saved zero rip and a saved zero rbp and point rbp to the zero rbp. This should serve as a marker of the end of the chain. 2) Or, you can make a copy of the interrupted rip value followed by a copy of the interrupted rbp value and point your rbp to that. This means that you continue the rbp chain from whatever you interrupted, so if you backtrace from your interrupt handler it'll even tell you what it execute. This would, of course, be a security vulnerability if you interrupted an untrusted user-space program or if any of the involved code wasn't built with -fno-omit-frame-pointer. This is however quite valuable for debugging purposes, especially if you have a symbol table of the kernel and user-space so you can even name the functions when backtracing.

In my system, my interrupt handlers set up a zero rip and zero rbp because the interrupted state isn't related to the interrupt handler (actually, I dont do this yet, I should now that I think about it); while my system call handler directly continues the rbp chain from user-space (in debug builds, at least, my custom cross-compiler sets -fno-omit-frame-pointer by default). This allows my kernel debugger to see exactly what kernel threads (same as user-space threads, but in kernel mode) are doing and the user-space call chain that did the system call. This is invaluable when fixing regressions in ports, where you are dealing with a large third party piece of software you don't know and somehow it triggered a bug, you can see what it is doing and what code is involved.

The important lesson here is to know, never to doubt. The author of the tutorial you followed was at the time not that experienced and thus the tutorial is filled with stuff that would make an experienced osdever think twice, code where the author doesn't fully know what is meant to happen (see when he does cli in the interrupt handler, not knowing whether they are on), or code that is quite obscure and takes an expert to spot (such as the subtle ABI breakage I pointer out above). You should always verify your doubts and assumptions to be successful at osdev and debugging these weird issues. Thankfully, you can always ask for help if you need it. :)

Oh, and since you are developing for x86_64, be sure to build *all* your code that runs in the kernel with -mno-red-zone. That has caused very subtle and terribly breakage for many, including me, so worth reiterating even if you already know.
Post Reply