OSDev.org

Posted: **Fri Sep 13, 2013 6:05 am**

Hey!

I have a small hobby micro-kernel (surprise!) found at [1]. I'm relatively far already (physical memory, virtual memory, threading, and (the newest one) user mode threads. Now that i have added user mode i'm facing a strange problem with the local APIC timer, and a page-fault which i completely fail to explain.

Some details on the problem, i really hope somebody has a good idea what i could check ...

:

* I have two kernel threads that print out the current time in microseconds. If i let only those two threads run, the whole system is stable (at least for a few hours

). [To be clear]: the local APIC timer code works flawlessly in this scenario.
* Now i added a user mode thread, that writes "hello userspace" using a system call (through int 50). If it writes this only a few times (number varies), all is ok, and even exiting and cleaning up the thread works well. If it writes the same in a loop more than ~5-20 times, things get strange (explanation of output below):

Code: Select all

[176264] info:  3: hello thread 176307 (0xffffffff7ffe1f98)
[181638] trace: dispatch 35
[181752] trace: switching: 1:0 (6) 
[181989] trace: scheduled for 5234
[182459] trace: dispatch 50
hello userspace
[183330] trace: dispatch 50
hello userspace
[184663] trace: dispatch 50
hello userspace
[185732] trace: dispatch 50
hello userspace
[186767] trace: dispatch 50
hello userspace
[188076] trace: dispatch 14
[188489] trace: page-fault at 0x00006fffffff006a <0x00006fffffff0045:uapi_sysc_call + 0x25> while writing to 0xfffffffffffffff8
[189366] trace: CPU registers after fault:
 RAX: 0x0000000000000000, RBX: 0x0000000000000000
 RCX: 0x0000000000000003, RDX: 0xffffffff8013fd8a
 RBP: 0x00007fffffffff98, RSP: 0x00007fffffffff70
[190089] trace: fault occured in thread 0, process 1
[190388] trace: no translation for the given page was available!
[191146] error: don't know anything more to do. page-fault @ 0x00006fffffff006a (code 0x2) without resolution!
[193326] error: thread 0 in process 1 aborted!
    [0xffffffff80127d91] thr_abort + 0x9b
    [0xffffffff8011d9a7] pgflt_handler + 0x7d2
    [0xffffffff8011ff99] intr_dispatch + 0x1a3
    [0xffffffff8013ce1e] _x86_64_isr14 + 0x47
    [------ intr ------]
    [0x00006fffffff006a] uapi_sysc_call + 0x25
    [0x00006fffffff0038] test_thr2 + 0x38
    [0x00006fffffff0097] uapi_thr_trampoline + 0x20
    [0x0000000000000000] <unknown>

from the above I can see:
* the kernel thread 3 is running ("hello thread")
* the user mode thread is running and printing "hello userspace" using int 50 (i added trace output to see that interrupt dispatcher gets triggered).
* this LAPIC timer is scheduled to somewhere close to 181989 (timestamp of log output = /very/ roughly the timestamp the LAPIC uses for calculation) + 5234 (the "scheduled for" output). means it should fire somewhere around 187223 (again, roughly. usually log timestamps are ~200-300 microseconds off).
* after a few short microseconds, i get this not-so-cool page-fault at 0x00006fffffff006a <0x00006fffffff0045:uapi_sysc_call + 0x25> while writing to 0xfffffffffffffff8. looking at the disassembly there:

Code: Select all

uintptr_t SECTION(SECT_USER_CODE) uapi_sysc_call(syscall_t call, volatile uintptr_t p0, volatile uintptr_t p1) {
    6fffffff0045:       55                      push   %rbp
    6fffffff0046:       48 89 e5                mov    %rsp,%rbp
    6fffffff0049:       48 83 ec 28             sub    $0x28,%rsp
    6fffffff004d:       89 7d ec                mov    %edi,-0x14(%rbp)
    6fffffff0050:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
    6fffffff0054:       48 89 55 d8             mov    %rdx,-0x28(%rbp)
    volatile uintptr_t res;

    // TODO: sysenter, etc.?
    
    asm volatile(
    6fffffff0058:       48 8b 4d e0             mov    -0x20(%rbp),%rcx
    6fffffff005c:       48 8b 55 d8             mov    -0x28(%rbp),%rdx
    6fffffff0060:       8b 45 ec                mov    -0x14(%rbp),%eax
    6fffffff0063:       89 c7                   mov    %eax,%edi
    6fffffff0065:       48 89 ce                mov    %rcx,%rsi
    6fffffff0068:       cd 32                   int    $0x32
    6fffffff006a:       48 89 c0                mov    %rax,%rax
    6fffffff006d:       48 89 45 f8             mov    %rax,-0x8(%rbp)
        "\tint %1\n"
        "\tmov %%rax, %0\n"
        : "=a"(res) 
        : "i"(SYSC_INTERRUPT), "D"(call), "S"(p0), "d"(p1));

    return res;
    6fffffff0071:       48 8b 45 f8             mov    -0x8(%rbp),%rax
}
    6fffffff0075:       c9                      leaveq 
    6fffffff0076:       c3                      retq

* as you can see, the page-fault happens at a "mov %rax,%rax". i don't trust this. i suspect something with interrupt handling is wrong...?!

* as an additionaly issue, the local APIC never fires anymore (to be clear: until it stops, it already fired a few times fine, and if i only run kernel threads, it fires fine all the time). I use the local APIC as only scheduler clock source. I use it in one-shot mode with a small timer framework to calculate the next fire time (so i'm more or "tickless"). Currently the scheduler is mostly alone with programming the timer to after the threads current time slice expired. But because the timer interrupt seems to be lost (can the be due to another disabled interrupts while the timer fires? i thought the interrupt will be delivered when i re-enable interrupts?) the timer never gets re-programmed, and thus will never fire again

maybe also useful information:

* the same happens on both qemu and bochs. there is no helpful information from both qemu and bochs (

usually there is at least /something/).
* on qemu with gdb attached, if i let the kernel run on (executing now kernel thread 3 ("hello thread") all the time, as timer does not trigger scheduler anymore), the kernel hangs after another ~50 rounds, and gdb breaks completely when trying to investigate (invalid package reply, ...). This does not happen with bochs. There it just keeps on running that single thread.
* i thought about maybe having some parallelism somewhere in the interrupt handling, but i cannot imagine this, as the first stmt in the ISR is cli and the last sti; iretq;

If you need any more details, please ask! I'd really appreciate any kind of help or hint.

[1] https://github.com/mduft/tachyon3

Cheers, Thanks,
Markus

Posted: **Sun Sep 15, 2013 9:22 am**

In the meantime i found i was right. The local APIC interrupt indeed fires while i'm handling another interrupt (50 - syscall) and the IF is clear. How could i detect this case? I always thought the interrupt will be delivered after im done with the first interrupt...? The question how to handle this, and the question about what this page fault is about remain... still nobody an idea?

Posted: **Mon Sep 16, 2013 1:08 am**

I did even more research on the issue. I discovered that (as i suspected) the issues are related to each other. It seems that a hardware interrupt (44 -> RTC) causes the pagefault, which leads to a missing EOI on the APIC, inhibiting the LAPIC Timer interrupt from there on. I still don't understand how the pagefault could happen though. I have this in the bochs logs:

Code: Select all

00134765386d[CPU0 ] LONG MODE IRET
00134765387d[CPU0 ] interrupt(): vector = 2c, TYPE = 0, EXT = 1
00134765387d[CPU0 ] interrupt(long mode): INTERRUPT TO INNER PRIVILEGE
00134765387d[CPU0 ] page walk for address 0xfffffffffffffff8
00134765387d[CPU0 ] PAE PTE: entry not present
00134765387d[CPU0 ] page fault for address fffffffffffffff8 @ 00006fffffff006a
00134765387d[CPU0 ] exception(0x0e): error_code=0002
00134765387d[CPU0 ] interrupt(): vector = 0e, TYPE = 3, EXT = 1
00134765387d[CPU0 ] interrupt(long mode): INTERRUPT TO INNER PRIVILEGE
00134765387d[CPU0 ] interrupt(long mode): trap to IST, vector = 1

so: first interrupt 2c fires. this immediately causes a pagefault, and i have NO idea why?! from what the cpu tells me (00006fffffff006a) that would be this code:

Code: Select all

    6fffffff0068:       cd 32                   int    $0x32
    6fffffff006a:       48 89 c0                mov    %rax,%rax

as one can see from the above log, yes, the CPU is IRET'ing right before the hardware interrupt is dispatched. that would explain the location. now i suspected a problem with the ISR somewhere, so i put this code in the ISR template (all ISR are generated from this by GAS):

Code: Select all

_x86_64_isr\num:
    cli

    push %rax
    push %rdx
    mov $0xE9, %dx 
    mov $0x5B, %al 
    out %al, %dx 
    mov $\num, %al 
    out %al, %dx 
    mov $0x5D, %al 
    out %al, %dx 
    pop %rdx
    pop %rax

.........

this should output '[<char-with-code-equal-interrupt-number>]', so fex. [#] for interrupt 35, as #=35. now, bochs gives me this: "...[#][,][#][#][2][#][#][,][#][2][#][#][#][]". the empty one at the end is the pagefault (^N not printable, but i verified this by piping to a file

).

this means, that the code that prints the interrupt number to bochs is /NOT/ called for the int 0x2C that causes the pagefault (as otherwise a [,] for 0x2C should have been there before the empty []). this means that the ISR is not even called... right? how could i find out what causes the problem there?! i'm lost

Cheers,
Markus

Posted: **Mon Sep 16, 2013 9:30 am**

i now debugged the bochs debugger (

) to find out, that the page-fault originates from:

Code: Select all

#0  BX_CPU_C::long_mode_int (vector=14 '\016', soft_int=0, push_error=1, error_code=2) at exception.cc:45
#1  0x00000000004d97fc in BX_CPU_C::interrupt (vector=14 '\016', type=3, push_error=1, error_code=2) at exception.cc:737
#2  0x00000000004d9b6e in BX_CPU_C::exception (vector=14, error_code=2) at exception.cc:922
#3  0x00000000004f6b53 in BX_CPU_C::page_fault (fault=0, laddr=18446744073709551608, user=0, rw=1) at paging.cc:562
#4  0x00000000004f6dc9 in BX_CPU_C::translate_linear_long_mode (laddr=18446744073709551608, lpf_mask=@0x7fffffffc810: 4095, combined_access=@0x7fffffffc814: 2, user=0, rw=1) at paging.cc:722
#5  0x00000000004f7da9 in BX_CPU_C::translate_linear (tlbEntry=0x844300 <bx_cpu+68832>, laddr=18446744073709551608, user=0, rw=1) at paging.cc:1160
#6  0x00000000004f863c in BX_CPU_C::access_write_linear (laddr=18446744073709551608, len=8, curr_pl=0, data=0x7fffffffc8d8) at paging.cc:1913
#7  0x00000000004fb756 in BX_CPU_C::write_new_stack_qword_64 (laddr=18446744073709551608, curr_pl=0, data=67) at access64.cc:1020
#8  0x00000000004d8164 in BX_CPU_C::long_mode_int (vector=44 ',', soft_int=0, push_error=0, error_code=0) at exception.cc:163
#9  0x00000000004d97fc in BX_CPU_C::interrupt (vector=44 ',', type=0, push_error=0, error_code=0) at exception.cc:737
#10 0x000000000049d634 in BX_CPU_C::InterruptAcknowledge () at event.cc:144
#11 0x000000000049d7b7 in BX_CPU_C::handleAsyncEvent () at event.cc:316
#12 0x000000000049c620 in BX_CPU_C::cpu_loop () at cpu.cc:89
#13 0x00000000004874ee in bx_dbg_continue_command () at dbg_main.cc:1794

the problem is in frame #8 when handling vector 44 - the 0x2C interrupt that i expect to happen. the pagefault is a problem when mapping the stack. from this code:

Code: Select all

137	  Bit64u RSP_for_cpl_x;
138	
139	  Bit64u old_CS  = BX_CPU_THIS_PTR sregs[BX_SEG_REG_CS].selector.value;
140	  Bit64u old_RIP = RIP;
141	  Bit64u old_SS  = BX_CPU_THIS_PTR sregs[BX_SEG_REG_SS].selector.value;
142	  Bit64u old_RSP = RSP;
(gdb) 
143	
144	  // if code segment is non-conforming and DPL < CPL then
145	  // INTERRUPT TO INNER PRIVILEGE:
146	  if (IS_CODE_SEGMENT_NON_CONFORMING(cs_descriptor.type) && cs_descriptor.dpl < CPL)
147	  {
148	    BX_DEBUG(("interrupt(long mode): INTERRUPT TO INNER PRIVILEGE"));
149	
150	    // check selector and descriptor for new stack in current TSS
151	    if (ist > 0) {
152	      BX_DEBUG(("interrupt(long mode): trap to IST, vector = %d", ist));
(gdb) 
153	      RSP_for_cpl_x = get_RSP_from_TSS(ist+3);
154	    }
155	    else {
156	      RSP_for_cpl_x = get_RSP_from_TSS(cs_descriptor.dpl);
157	    }
158	
159	    // align stack
160	    RSP_for_cpl_x &= BX_CONST64(0xfffffffffffffff0);
161	
162	    // push old stack long pointer onto new stack
(gdb) 
163	    write_new_stack_qword_64(RSP_for_cpl_x -  8, cs_descriptor.dpl, old_SS);

i saw that RSP_for_cpl_x is 0x0, thus 0x0-8 explains the strange pagefault address. it tries to read the new RSP from the TSS for dpl=0x0. argl. this lead me to the conclusion that my IDT misses the IST entry for 0x2C...

so if you /just that moment/ wanted to answer the thread, thanks a lot, but the problem is solved now

OSDev.org

[SOLVED] Strange Pagefault on hardware interrupt.

[SOLVED] Strange Pagefault on hardware interrupt.

Re: Strange PF, LAPIC timer not firing in one-shot.

Re: Strange PF, LAPIC timer not firing in one-shot.

Re: Strange PF, LAPIC timer not firing in one-shot.