Local APIC and too early EOI'ing an interrupt.

mutex
Member
Posts: 131
Joined: Sat Jul 07, 2007 7:49 pm

Post by mutex »

After implementing full LAPIC and IOAPIC support I ran into a few issues.

First, I had issues with VMware and VirtualBox not behaving like Bochs and real hardware:

After a random amount of time the kernel would panic with "Bad opcode fault", "IDT not present", and so on. A different error every time.

Why should the CPU suddenly go crazy with interrupts disabled?

First I looked for possible race conditions caused by two CPUs mistakenly running in the same context. That was not the case, so I ruled it out. After some searching I found that I was actually in kernel mode, inside an ISR, whenever the error occurred. I noticed that the value 0xfee000b0 was sometimes in EAX, and that led me to ApicEOI. What could be wrong? Interrupts are disabled and all LVTs are disabled except the LAPIC timer, so the CPU should be running "safe" while in an ISR entered through an interrupt gate.

Well, after a LOT of searching, trial and error, and debugging with VMware, I found out that it was actually ApicEOI() that was failing. It seems that my isr_return function, which does the context switch and the iretd, was too long? It is no more than maybe 25 lines of asm. But moving the ApicEOI call almost to the end fixed everything. Of course this was not very satisfying: I found the error and how to solve it, but not really why it happens. It could be a VMware/VirtualBox issue, but it could also be caused by the speed of the VMM code versus Bochs/real hardware. My bet now is that the timer interrupt was delivered again to the LAPIC of the CPU while that CPU was still handling an interrupt from the same IRQ/source. When I then send the EOI, the APIC will try to deliver it multiple times until the CPU ACKs it. Still, the CPU should not start executing a new ISR before interrupts are enabled. So does the APIC generate an error message/interrupt or something when delivery fails for the n-th time? I could not really find anything useful in the Intel developer manual, but I think that might be the case.
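
For reference, the ordering that ended up working (handler work first, EOI as the very last step before the return path) could be sketched roughly like this. It is only a sketch under assumptions: `apic_eoi` and `timer_isr_body` are hypothetical names, and the LAPIC base is passed in as a pointer so the code can be tried against an ordinary buffer instead of the real MMIO page at 0xFEE00000. It illustrates the ordering only and does not explain why the early EOI misbehaved.

```c
#include <stdint.h>

/* Offset of the EOI register inside the local APIC MMIO page
   (0xFEE000B0 when the page sits at the default base 0xFEE00000). */
#define LAPIC_EOI_OFFSET 0xB0u

/* Hypothetical helper: signal end-of-interrupt by writing 0 to the EOI register.
   lapic_base is a parameter (an assumption of this sketch) so a plain buffer
   can stand in for the real MMIO page. */
static inline void apic_eoi(volatile uint32_t *lapic_base)
{
    lapic_base[LAPIC_EOI_OFFSET / sizeof(uint32_t)] = 0;
}

/* Hypothetical timer handler body: do the work first, send the EOI last. */
static void timer_isr_body(volatile uint32_t *lapic_base, uint32_t *ticks)
{
    (*ticks)++;            /* handler work, e.g. bookkeeping for the scheduler */
    apic_eoi(lapic_base);  /* last step before the stub restores context and runs iretd */
}
```

In a real kernel the assembly stub would call something like `timer_isr_body` and then go straight to the register restore and `iretd`, keeping the window between the EOI and the return as short as possible.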

Anyway, has anyone else seen the same problem? This one really bugs me. I am going to find out what is really happening here. The only problem is that only VMware and VirtualBox show it. Grrr!

The next thing I find a little strange is related to the APIC + IOAPIC. I have enabled lint1 (keyboard) to deliver to logical destination 0xFF000000, using lowest priority mode, edge triggered, etc. This works well in VMware, Bochs and VirtualBox. The keyboard is always handled by the first CPU (since it has the lowest arbitration ID). That makes sense, since I have not been able to test with multiple IRQs at the same time to see how lowest priority mode behaves with many IRQs at once. Still, it catches the interrupt just like it did before with the PIC, and the keyboard driver works well. But on real hardware I have two different cases.
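
To make the configuration concrete, here is a minimal sketch of how such a redirection entry could be encoded, assuming the keyboard line is routed through an I/O APIC redirection table entry. The bit positions follow the standard I/O APIC redirection entry layout; the function name and the vector number used in the usage note are hypothetical.

```c
#include <stdint.h>

/* I/O APIC redirection entry fields (64-bit entry). */
#define IOAPIC_DELMODE_LOWEST   (1ull << 8)   /* delivery mode 001: lowest priority */
#define IOAPIC_DESTMODE_LOGICAL (1ull << 11)  /* 0 = physical, 1 = logical */
#define IOAPIC_TRIGGER_LEVEL    (1ull << 15)  /* 0 = edge, 1 = level */
#define IOAPIC_MASKED           (1ull << 16)  /* 1 = interrupt masked */
#define IOAPIC_DEST_SHIFT       56            /* destination field: bits 56..63 */

/* Hypothetical helper: build an unmasked, edge-triggered, active-high entry
   using lowest priority delivery to a logical destination. */
static uint64_t ioapic_redir_lowest_prio(uint8_t vector, uint8_t logical_dest)
{
    uint64_t entry = vector;                 /* bits 0..7: interrupt vector */
    entry |= IOAPIC_DELMODE_LOWEST;
    entry |= IOAPIC_DESTMODE_LOGICAL;
    /* IOAPIC_TRIGGER_LEVEL and IOAPIC_MASKED stay clear: edge triggered, unmasked */
    entry |= (uint64_t)logical_dest << IOAPIC_DEST_SHIFT;
    return entry;
}
```

With a made-up vector of 0x21 and destination 0xFF, the high dword of the result is 0xFF000000, which lines up with the value quoted above, and the low dword is 0x00000921.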

On my dual-core laptop it seems that both CPU cores get interrupted. Probably due to fast hardware and a little time skew in when they enter the (concurrent/reentrant) ISR, one of the cores always grabs the scancode before the second checks the register for pending data, so one of the cores enters, finds no data and exits. So there is no visible problem, but there is something wrong with the logic that decides which LAPIC takes the interrupt. On another PC, with hyperthreading only, it seems that the cores execute at exactly the same time, so both get the scancode and I get double scancodes. In that case it ends up being a problem as well.

I started debugging on the real PCs, and it seems that the LDR in the APICs does not get loaded with 0xFF but with 0x03. The DFR is loaded with (0xF<<28) and reads back correctly. Then I started to wonder. When the logical destination is 0xFF (broadcast) and lowest priority mode is set, with the DFR all ones and the LDR either 0x03 or 0xFF, it should match (logical AND) on at least one of the bits (ref. Intel manual). But this is not the case. Does anyone know why? I have not been able to read through all the Intel manuals on the APIC, xAPIC and x2APIC yet, but I have seen that the newer APICs should be backwards compatible when in "APIC" mode. Have I missed something, or what?
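
The flat-model matching rule referred to above can be written down as a small sketch. It assumes the 0x03 and 0xFF values are the logical ID byte held in bits 24..31 of the LDR; all function names are hypothetical. The last helper shows one common convention (one unique bit per CPU), which is an assumption here and not something taken from the setup described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* The logical APIC ID lives in bits 24..31 of the LDR. */
static uint8_t ldr_logical_id(uint32_t ldr)
{
    return (uint8_t)(ldr >> 24);
}

/* Flat model (DFR model field = 0xF): a local APIC accepts a logical-mode
   message when the bitwise AND of the message destination (MDA) and its
   own logical ID is non-zero. */
static bool flat_model_match(uint8_t mda, uint32_t ldr)
{
    return (mda & ldr_logical_id(ldr)) != 0;
}

/* Hypothetical convention: give each of the first 8 CPUs its own bit.
   The caller must pass cpu_index in the range 0..7. */
static uint32_t ldr_for_cpu(unsigned cpu_index)
{
    return (uint32_t)1 << (24 + cpu_index);
}
```

Under this rule a destination of 0xFF matches a logical ID of 0x03 as well as 0xFF, so if two cores carry the same logical ID, both are candidates for the same message.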

confused regards
Thomas
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: Local APIC and too early EOI'ing an interrupt.

Post by gerryg400 »

Thomas,

Is there any chance that both cores/CPUs are accessing the IOAPIC at the same time? I had this problem and it caused very strange behaviour. I added a spinlock to the functions that access the IOAPIC and suddenly my OS became stable.
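
A minimal sketch of that kind of locking, assuming C11 atomics are available: the IOAPIC is accessed indirectly (write the register index to IOREGSEL, then read or write IOWIN), so two cores interleaving those two steps can end up touching the wrong register. The function names are hypothetical, and the base is passed as a pointer so the sketch can run against a plain buffer.

```c
#include <stdatomic.h>
#include <stdint.h>

#define IOAPIC_IOREGSEL 0x00u  /* register select (index) */
#define IOAPIC_IOWIN    0x10u  /* data window for the selected register */

static atomic_flag ioapic_lock = ATOMIC_FLAG_INIT;

static void ioapic_lock_acquire(void)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&ioapic_lock, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void ioapic_lock_release(void)
{
    atomic_flag_clear_explicit(&ioapic_lock, memory_order_release);
}

/* Select + write must not be interleaved with another core's access. */
static void ioapic_write(volatile uint32_t *base, uint8_t reg, uint32_t value)
{
    ioapic_lock_acquire();
    base[IOAPIC_IOREGSEL / 4] = reg;
    base[IOAPIC_IOWIN / 4] = value;
    ioapic_lock_release();
}

/* Select + read, under the same lock. */
static uint32_t ioapic_read(volatile uint32_t *base, uint8_t reg)
{
    ioapic_lock_acquire();
    base[IOAPIC_IOREGSEL / 4] = reg;
    uint32_t value = base[IOAPIC_IOWIN / 4];
    ioapic_lock_release();
    return value;
}
```

If the IOAPIC is also touched from interrupt handlers, a kernel would likely need to disable interrupts on the local core while holding the lock, otherwise a handler could spin forever on a lock its own core already holds.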

- gerryg400