Strange Problem of my OS only with Bochs [SOLVED]

AJ
Member
Member
Posts: 2646
Joined: Sun Oct 22, 2006 7:01 am
Location: Devon, UK
Contact:

Re: Strange Problem of my OS only with Bochs

Post by AJ »

It's odd that this is only happening in Bochs. When an emulated environment behaves differently from a real one, the usual first thing you should look at is whether you have initialised all your variables correctly (don't assume that a newly declared variable contains zero).

I am (almost!) willing to bet that something (like an uninitialised variable) has trashed your stack or stack pointer. Your timer interrupt really shouldn't be doing very much - perhaps updating a tick counter and returning. Ensure that your IRQ handlers have symmetrical PUSH / POP counts (or that you add to ESP as appropriate). Set up the keyboard IRQ - does that do the same?

To verify that it really is a corrupted stack, put JMP $ just before your IRQ IRET. If the system hangs at that point without faulting, investigate every minute detail of your IRQ stack manipulation.

Cheers,
Adam
jal
Member
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Re: Strange Problem of my OS only with Bochs

Post by jal »

AJ wrote:It's odd that this is only happening in Bochs.
Not entirely. What Bochs is, and what QEMU and a real machine aren't, is slow. Especially when handling interrupts: if you acknowledge the interrupt first and sti, then go on to do some processing, the next interrupt can fire and fry your stack etc. I had similar problems once when I set the timer frequency too high (i.e. > 100/s) and got loads of spurious interrupts (IRQ7) (obviously I did not sti before returning).


JAL
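
As an aside on the spurious IRQ7s mentioned above, here is a minimal sketch of how a handler might tell a spurious IRQ 7 from a real one. It assumes the caller has already read the master PIC's in-service register (by writing the OCW3 value 0x0B to port 0x20 and then reading port 0x20); `irq7_is_spurious` and `PIC_READ_ISR` are hypothetical names, not something from this thread's kernel.

```c
#include <stdint.h>

#define PIC_READ_ISR 0x0B  /* OCW3: the next read of the command port returns the ISR */

/* Given the master PIC's in-service register, report whether an IRQ 7 that
 * has just arrived is spurious: bit 7 clear means no real IRQ 7 is in service. */
int irq7_is_spurious(uint8_t master_isr)
{
    return (master_isr & 0x80u) == 0;
}
```

If the check says the interrupt is spurious, the handler should simply return without sending an EOI.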

Re: Strange Problem of my OS only with Bochs

Post by AJ »

jal wrote:Not entirely....
Ah - ok. The reason I said what I said is that in my experience it is much more likely for my kernel to work on an emulator or VM and fail on real hardware than the other way round. From what you say, the opposite is obviously true as well. Part of the reason I've never had this problem may be because I've never (yet) written an interrupt handler where I've used STI.

Cheers,
Adam
finarfin
Member
Member
Posts: 106
Joined: Fri Feb 23, 2007 1:41 am
Location: Italy & Ireland
Contact:

Re: Strange Problem of my OS only with Bochs

Post by finarfin »

OK,

I have now found where the problem is: it seems to be my shared IRQ handling that sucks :)
I don't know why yet; I still have to find the cause (and work out whether it is my handler's fault or a Bochs problem), but at least with that feature disabled things seem to be OK.

Thanks for the support.

Now another question: I noticed that in Bochs (again), if I press Enter many times, I receive an IRQ with irqn = 0.

That seems strange, because the lowest IRQ (the timer) has the number 1.

Again, this problem occurs only in Bochs; QEMU and real hardware work fine.
With the shared IRQ feature disabled, my handler now looks like this:

Code: Select all

void _irqinterrupt(){
    int irqn=0;
    irqn = get_current_irq();  
    if(irqn>0) {
	   if(irqn==1) PIT_handler();
	   else if(irqn==2) keyboard_isr();
    }
//     else printf("IRQ N: %d E' arrivato qualcosa che non so gestire ", irqn);

    if(irqn<=8) outportb(0x20, MASTER_PORT);
    else if(irqn<=16)outportb(0x20, SLAVE_PORT);
}
If I uncomment the printf, in Bochs I get that message with irqn = 0.

Why? :D
Elen síla lúmenn' omentielvo
- DreamOS64 - My latest attempt with osdev: https://github.com/dreamos82/Dreamos64
- Osdev Notes - My notes about osdeving! https://github.com/dreamos82/Osdev-Notes
- My old Os Project: https://github.com/dreamos82/DreamOs

Re: Strange Problem of my OS only with Bochs

Post by AJ »

Don't know why you should be getting IRQ #0 (unless it's actually the PIT firing), but that code looks a bit messed up.
finarfin wrote:

Code: Select all

if(irqn==1) PIT_handler();
else if(irqn==2) keyboard_isr();
Shouldn't the PIT be irqn == 0 and the keyboard irqn == 1? There are also some more pedantic things about this, such as using a switch statement (because that if..else if chain will grow) and a function pointer system to allow installable handlers.

Code: Select all

if(irqn<=8) outportb(0x20, MASTER_PORT);
else if(irqn<=16)outportb(0x20, SLAVE_PORT);
If the IRQ is >= 8, you need to send an EOI to both the slave and the master ports.

Cheers,
Adam
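
For illustration, the EOI rule can be sketched as below. This is only a sketch under assumptions: it takes a 0-based IRQ number (timer = 0), the port writer uses (port, value) argument order, which is the reverse of the outportb calls quoted above, and `pic_send_eoi`, `record_write` and the macro names are hypothetical. The recording stub stands in for real port I/O so the logic can be exercised in user space.

```c
#include <stdint.h>

#define PIC1_COMMAND 0x20   /* master PIC command port */
#define PIC2_COMMAND 0xA0   /* slave PIC command port */
#define PIC_EOI      0x20   /* non-specific end-of-interrupt command */

/* Port writer supplied by the kernel, in (port, value) order (an assumption). */
typedef void (*port_write_fn)(uint16_t port, uint8_t value);

/* Acknowledge a 0-based IRQ: slave first for IRQ 8-15, then always the master. */
void pic_send_eoi(unsigned irq, port_write_fn write_port)
{
    if (irq >= 8)
        write_port(PIC2_COMMAND, PIC_EOI);
    write_port(PIC1_COMMAND, PIC_EOI);
}

/* Recording stub used in place of real port I/O for this sketch. */
uint16_t written_ports[4];
int written_count;

void record_write(uint16_t port, uint8_t value)
{
    (void)value;                       /* only the port order matters here */
    if (written_count < 4)
        written_ports[written_count++] = port;
}
```

Calling pic_send_eoi(12, record_write) records a write to 0xA0 followed by one to 0x20, while pic_send_eoi(1, record_write) records only 0x20.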

Re: Strange Problem of my OS only with Bochs

Post by finarfin »

Yeah, I thought about using a switch statement; that code was added yesterday only to try to solve my problem.

I do have an installable handler mechanism, but I found that it is the cause of my Bochs problems. The original code of my irq_handler is:

Code: Select all

void _irqinterrupt(){
    int irqn=0;
    irqn = get_current_irq();  
    IRQ_s* tmpHandler; 
    if(irqn>0) {
        shareHandler[irqn-1]->IRQ_func();
        tmpHandler->IRQ_func();
        while(tmpHandler->next!=NULL) {	    
            tmpHandler = tmpHandler->next;                           
            tmpHandler->IRQ_func();
        }
    }
//     else printf("IRQ N: %d E' arrivato qualcosa che non so gestire ", irqn);
    if(irqn<=8) outportb(0x20, MASTER_PORT);
    else if(irqn<=16)outportb(0x20, SLAVE_PORT);
}
That works fine on real hardware and QEMU, but causes the problems described in this thread in Bochs; I don't know why :D

Obviously I also have a function for adding IRQ handlers:

Code: Select all

void add_IRQ_handler(int irq_number, void (*func)()){
    if(irq_number<16){
        IRQ_s *tmpHandler;
        tmpHandler = shareHandler[irq_number];
        if(shareHandler[irq_number]==NULL){
             shareHandler[irq_number]= (IRQ_s*) request_pages(sizeof(IRQ_s), NOT_ADD_LIST);
             shareHandler[irq_number]->next = NULL;
             shareHandler[irq_number]->IRQ_func = func;
        }
        else {
            while(tmpHandler->next!=NULL) tmpHandler = tmpHandler->next;
            tmpHandler->next = (IRQ_s*) request_pages(sizeof(IRQ_s), NOT_ADD_LIST);
            tmpHandler = tmpHandler->next;
            tmpHandler->next = NULL;
            tmpHandler->IRQ_func = func;
        }
    }
    else return; 
}
Is there perhaps something wrong with these functions?

Thanks.

Re: Strange Problem of my OS only with Bochs

Post by AJ »

Ok - this still needs some debugging and explanation:

Code: Select all

irqn = get_current_irq(); 
Based on your other code, I'm assuming that irqn now contains (irq# + 1).

Code: Select all

IRQ_s* tmpHandler;
Uninitialised so far...

Code: Select all

shareHandler[irqn-1]->IRQ_func();
Ok - that's why I assumed that irqn contains irq# + 1.

Code: Select all

tmpHandler->IRQ_func();
Where has tmpHandler been initialised? On real hardware, it isn't even guaranteed to contain a NULL pointer at this stage. [edit]Looking at your code below, do you intend tmpHandler to be declared extern (as a global variable)?

Code: Select all

else printf("IRQ N: %d E' arrivato qualcosa che non so gestire ", irqn);
Because we have assumed irqn contains irq#+1, this can't happen unless the actual irq number is 0xFF. If irqn contains the actual irq#, the rest of your code (this sample and the example you posted above) needs some major work.

Code: Select all

if(irqn<=8) outportb(0x20, MASTER_PORT);
else if(irqn<=16)outportb(0x20, SLAVE_PORT);
I know you are only handling irq 0 and 1 at present, but this will come back to haunt you when you start handling irq's greater than 7 - send the EOI to the master every time.

..and from your handler adding routine:

Code: Select all

tmpHandler = shareHandler[irq_number];
As you test for shareHandler[] == NULL, presumably shareHandler is a manually zeroed array of pointers...

Code: Select all

tmpHandler->next = (IRQ_s*) request_pages(sizeof(IRQ_s), NOT_ADD_LIST);
After this, you should test whether tmpHandler->next is a NULL pointer (assuming that this is a memory allocation routine).
finarfin wrote:That works fine on real hardware and QEMU, but causes the problems described in this thread in Bochs; I don't know why :D
TBH, in its current state I have no idea why it does work with real HW or QEMU. Sorry to be harsh, but many OSes do get KB handlers and the PIT working very nicely together in Bochs. Looking at the posted code, you should go back and review every step of the IRQ handler mechanism.

Cheers,
Adam

[edit]
Oh - and you don't need this:

Code: Select all

else return;
[/edit]
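
To make the points above concrete, here is a rough sketch of an installable handler chain in which every pointer is initialised before use and the allocation is checked. It is not the poster's code: it uses 0-based IRQ numbers, malloc stands in for the kernel's request_pages, and the names `irq_node`, `irq_chains`, `irq_add_handler`, `irq_dispatch` and `count_tick` are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define IRQ_COUNT 16

typedef struct irq_node {
    void (*func)(void);
    struct irq_node *next;
} irq_node;

/* Static storage: zeroed in hosted C; a kernel must make sure .bss is cleared. */
irq_node *irq_chains[IRQ_COUNT];

/* Append a handler to the chain for a 0-based IRQ. Returns 0 on success, -1 on error. */
int irq_add_handler(unsigned irq, void (*func)(void))
{
    if (irq >= IRQ_COUNT || func == NULL)
        return -1;

    irq_node *node = malloc(sizeof *node);   /* stand-in for request_pages */
    if (node == NULL)
        return -1;                           /* allocation failed */
    node->func = func;
    node->next = NULL;

    irq_node **link = &irq_chains[irq];      /* walk to the end of the chain */
    while (*link != NULL)
        link = &(*link)->next;
    *link = node;
    return 0;
}

/* Call every handler registered for the IRQ; returns how many were called. */
int irq_dispatch(unsigned irq)
{
    int called = 0;
    if (irq >= IRQ_COUNT)
        return 0;
    for (irq_node *n = irq_chains[irq]; n != NULL; n = n->next) {
        n->func();
        called++;
    }
    return called;
}

/* Example handler for the sketch: just counts ticks. */
int tick_count;
void count_tick(void) { tick_count++; }
```

Because the dispatch loop starts from the array slot and tests for NULL before every call, an IRQ with no registered handler is simply ignored instead of jumping through a null pointer.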

Re: Strange Problem of my OS only with Bochs

Post by finarfin »

Oops, I deleted one line by mistake; the correct code for _irqinterrupt is:

Code: Select all

void _irqinterrupt(){
    int irqn;
    irqn = get_current_irq();  
    IRQ_s* tmpHandler; 
    if(irqn>0) {
        tmpHandler = shareHandler[irqn-1];
        tmpHandler->IRQ_func();
        while(tmpHandler->next!=NULL) {
            tmpHandler = tmpHandler->next;                           
            tmpHandler->IRQ_func();
        }
    }
    if(irqn<=8) outportb(0x20, MASTER_PORT);
    else if(irqn<=16){
      outportb(0x20, SLAVE_PORT);
      outportb(0x20, MASTER_PORT);
    }
}
My fault ^_^

Oh, I checked: shareHandler is initialised to NULL when I enable IRQs.

Re: Strange Problem of my OS only with Bochs [SOLVED]

Post by finarfin »

I have probably solved the problem,

and it was very simple and obviously my fault.
The problem was in get_cur_irq() (which returns the current IRQ from the bitmask). When I wrote it, I did something like this:

Code: Select all

outportb(GET_IRR_STATUS, MASTER_PORT);
cur_irq = inportb(MASTER_PORT);
return cur_irq;
Now the handler has code like this:

Code: Select all

irqn = get_cur_irq();
tmpHandler = shareHandler[irqn-1];
tmpHandler->IRQ_func();
Sometimes in Bochs there are two pending IRQs (timer and keyboard), and the bitmask looks like this:

Code: Select all

0b00000011
So get_cur_irq obviously returns 3.
The IRQ handler then tries to execute shareHandler[2]->IRQ_func() (index irqn-1), which is initialised to zero (because I don't use that IRQ).
And so I got that fault in Bochs.
It probably didn't happen on real hardware because real PCs are faster than Bochs, so it is hard to have keyboard and timer pending at the same time (or is it possible?).

D'oh...
It's a very silly error...
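
For anyone hitting the same thing, here is a minimal sketch of decoding the pending bitmask into a single IRQ number instead of using the raw byte as an index. `first_pending_irq` is a hypothetical helper, not part of the kernel discussed here, and it returns a 0-based IRQ number (timer = 0), unlike the 1-based irqn used above. Picking the lowest set bit is an assumption that mirrors the 8259's default priority order, where IRQ 0 is highest.

```c
#include <stdint.h>

/* Hypothetical helper: return the 0-based number of the lowest set bit in an
 * 8-bit pending mask, or -1 when no bit is set. */
int first_pending_irq(uint8_t mask)
{
    for (int bit = 0; bit < 8; bit++) {
        if (mask & (1u << bit))
            return bit;
    }
    return -1;
}
```

With timer and keyboard both pending (mask 0x03), this yields 0 rather than 3, so the lookup stays inside the set of installed handlers. A common alternative is to give each IRQ its own IDT entry, so the number is known without querying the PIC at all.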

Re: Strange Problem of my OS only with Bochs [SOLVED]

Post by jal »

finarfin wrote:It probably didn't happen on real hardware because real PCs are faster than Bochs, so it is hard to have keyboard and timer pending at the same time (or is it possible?).
It is possible, but like I said before, Bochs is slow, which in many cases is a good thing for catching race conditions.
finarfin wrote:It's a very silly error...
I wouldn't call it silly. It's easy to miss such details. What can be silly is blaming Bochs, but iirc you didn't do that :).


JAL