Interrupt problems
Posted: Thu May 22, 2003 11:00 pm
by jos15
Hi, I'm 15 and I'm building my own OS, called JOS. My OS maps IRQs 0-15 to interrupts 0x20-0x2F. When I enable interrupts, the strangest things happen, most often resulting in a reset. When I flush the TLB (mov %cr3,%eax; mov %eax,%cr3), all goes well until interrupts are enabled.
(I'm Dutch and my English is not that good...)
RE:Interrupt problems
Posted: Fri May 23, 2003 11:00 pm
by mikeleany
What kinds of weird things happen?
A reset is the result of a triple fault, which suggests that something in your interrupt setup isn't right. Based on what you've said I couldn't possibly guess what it is, except that you don't handle the "double fault" exception (at least not properly).
RE:Interrupt problems
Posted: Sun May 25, 2003 11:00 pm
by jos15
I have tested my OS on a Pentium MMX 200 MHz and it works (except writing to cr3), but on my Celeron 466 MHz it just resets at the point where interrupts are enabled. I have also tried my OS under Bochs and all goes well, except that I still can't change or even write to cr3. If I write to cr3 and enable interrupts under Bochs, it crashes with the message: "[cpu] 3rd exception with no resolution". With debug enabled I found out that after interrupts are enabled the timer interrupt is raised, and then, before the timer ISR is called, a page fault occurs and directly after that a double fault.
RE:Interrupt problems
Posted: Sun May 25, 2003 11:00 pm
by jos15
I have tested all exceptions and they work properly until I change cr3...
RE:Interrupt problems
Posted: Sun May 25, 2003 11:00 pm
by Adek336
Why do you put unknown data into cr3? If you've got paging enabled, it will fault.
Adrian
RE:Interrupt problems
Posted: Sun May 25, 2003 11:00 pm
by mikeleany
Sounds like you don't map the pages for your interrupts correctly. When the timer interrupt fires, the processor finds the present flag is 0 in your page directory or page table for your ISR's (interrupt service routine's) page, concludes the handler isn't in physical memory, and generates a page fault. But when it looks for the page containing the page fault ISR, it probably finds that page is also marked not-present, which turns into a double fault, and when that handler can't be reached either you get a "triple fault". So just make sure that you're not accidentally setting cr3 to the wrong thing, that the pages for your ISRs have the present bit set (especially your page fault ISR), and that they have the correct physical address (and the same goes for the pages for all the rest of the memory you are using).
RE:Interrupt problems
Posted: Mon May 26, 2003 11:00 pm
by jos15
I have paging enabled and it works, even within interrupts, until I write to cr3. The value in cr3 points to a valid page directory because I only flush the TLB:
__asm__ __volatile__ ("mov %%cr3,%%eax; mov %%eax,%%cr3" ::: "eax", "memory");
So all pages are correctly mapped and are present.
My kernel is loaded at 1MB physical. The boot loader uses a GDT with a base address of 0x40100000 to map the kernel to 0xC0000000 virtual. Then the kernel changes the GDT to a base of 0 and enables paging, so the kernel is still mapped to 0xC0000000, and then flushes the prefetch queue of the CPU by making a short and a far jump. That all works well.
My concern is that I reprogram the PIC after enabling paging; most OSes, like COSMOS and Linux, first reprogram the PIC and then enable paging. Can this be the cause of my problems? Or maybe my problems are caused by the trick that I use to map the kernel at 0xC0000000 and not, as usual, at 0xC0100000?
RE:Interrupt problems
Posted: Mon May 26, 2003 11:00 pm
by mikeleany
Looks like you're doing just what Adek said: writing random data into cr3. Unless your inline assembler does things backwards, you're first copying eax to cr3, then copying cr3 to eax instead of the other way around. Reprogramming the PIC after enabling paging shouldn't be a problem. As for mapping things here, there and everywhere, well, just be careful that you're really mapping it to where you think you are, and that you link it properly for the segmented address you use. And remember, it first translates the segmented (logical) address to a virtual (paged or linear) address, then from the virtual address to a physical address (which is your load address).
RE:Interrupt problems
Posted: Tue May 27, 2003 11:00 pm
by jos15
I am using gcc, and gcc uses AT&T-style assembly by default, so
__asm__ __volatile__ ("movl %cr3,%eax; movl %eax,%cr3")
first copies cr3 to eax and then puts eax in cr3.
RE:Interrupt problems
Posted: Tue May 27, 2003 11:00 pm
by jos15
I am sure there is no unknown data in cr3, because if I reload it the CPU doesn't reset directly afterwards. It resets when interrupts are enabled.
I have tried to change cr3 with interrupts disabled. While interrupts were disabled I did some tests:
- reading from the page directory using the page directory self-pointer works
- short jumps work
- writing to the page directory using the page directory self-pointer works
But when I try to make a far jump:
__asm__ __volatile__ ("movl 1f,%eax; jmp *%eax; 1f:") (AT&T style)
the CPU resets.
If I don't make a far jump and enable interrupts, the CPU resets too.
RE:Interrupt problems
Posted: Tue May 27, 2003 11:00 pm
by mikeleany
Sorry, I didn't remember that gcc used AT&T-style assembly by default. You can tell how much I use the inline assembler (I hate the thing). So let's see if I understand what's going on.
You have your kernel at 0x100000 physical memory.
It's at address 0xc0000000 in its segment.
You move the GDT (without modifying it) and enable paging.
Here ALL exceptions work fine.
You disable interrupts.
Is this where you reprogram the PIC? Did you modify the IDT?
(From when you turned on paging you didn't modify any page tables or the directory.)
You flush the TLB by reloading cr3 with the same value.
Here everything works fine except far jumps.
You enable interrupts.
A timer interrupt occurs.
A page fault occurs.
A double fault occurs.
The computer triple faults and resets.
Is this correct?
I can easily see how a page fault could occur after reprogramming the PIC (if the page with the timer ISR wasn't setup right). However, to cause a reset, the only thing I can think of is if you modified the GDT (or LDT if applicable), IDT or page tables or directory since you last tested the exceptions.
RE:Interrupt problems
Posted: Thu May 29, 2003 11:00 pm
by jos15
It is almost correct:
boot loader
- loads kernel at 1MB physical by using big real mode
- disables interrupts and masks off all IRQ's
- loads a GDT with code and data base at 0x40100000 (0xC0000000+0x40100000 -> 0x100000)
- makes a far jump to the kernel (nasm: jmp dword 0x08:0xC0000000)
kernel start code:
- loads 0x10 (data selector) in DS,ES,FS and GS
- uses LSS to set up an initial stack of two pages
- creates a pagetable that maps virtual 0-4MB to physical 0-4MB
- creates a pagetable for the kernel text+data+bss
- creates a pagedirectory and maps the two pagetables in it
- enables paging and makes a short and far jump to flush the prefetch queue
> at this moment the kernel still uses the bootloader GDT with a base of 0x40100000 so the temporary page table is used (0xC0000000+0x40100000=0x00100000)
- loads the kernel GDT with base address 0 and makes far jump that reloads CS and EIP to flush the prefetch queue
- reloads ESP and SS (using LSS)
- jump to kernel C code
kernel C code:
- reprograms the PIC
- initializes the memory manager
- initializes the console etc.
> at this point i tried to flush the TLB with interrupts still disabled. it works until interrupts are enabled.
> here i can make a far jump but the following code does not work (causes a reset)
movl %%esp,%%eax //load eax with esp
pushl $0x10 //ss
pushl %%eax //esp
pushfl //eflags
pushl $0x10 //cs
pushl $1f //eip
iret
1f:
- initializes multitasking and sets up the timer IRQ
- enables interrupts
- initializes devices etc.
> at this point within the timer IRQ (interrupts disabled) i have tried to
flush the TLB or change the cr3.
- kernel enters idle loop
RE:Interrupt problems
Posted: Fri May 30, 2003 11:00 pm
by mikeleany
Well, I can see why the iret would cause an exception. You use the same selector for SS as you do for CS. However, if your general protection fault ISR worked, then it shouldn't reset unless you told it to.
As for flushing the TLB by reloading cr3, I don't know why that doesn't work. So if you reload cr3, the computer resets when interrupts are enabled, but otherwise it doesn't? And you tested all interrupts after paging was enabled and after you moved the GDT?
Anyone else have some ideas to help him?
RE:Interrupt problems
Posted: Sun Jun 01, 2003 11:00 pm
by jos15
There must be some logical reason because it doesn't work in Bochs either. I am going to rebuild the bootloader and the kernel start based on COSMOS because in COSMOS the cr3 can be changed.
RE:Interrupt problems
Posted: Sat Jun 14, 2003 11:00 pm
by jos15
I have found the mistake: if I don't unmap the first page table then all goes right. I am sure the kernel doesn't use this page, but it seems that interrupts follow the first page table if I write something to cr3.