Page 1 of 1

Very strange signal being received in qemu

Posted: Sat Jan 01, 2022 12:14 am
by Ethin
So I have absolutely no idea what's going on here. I recently updated my qemu version to qemu 6.2.0. I also changed the compile target for my kernel (Rust recently added an "x86_64-uknown-none" target for bear metal targets like OSes) and now whenever I run my kernel in qemu it receives signal SIGQUIT when its trying to initialize its internal logger. I've no idea why this is happening; the target specification I switched to is no different from the one I was using other than the fact that I no longer have to rely on a JSON file to tell Rustc how to build my kernel. My interrupt handlers aren't even initialized so I have no idea if this is a triple fault or what (it might be, though I've no idea why that'd happen since I'm not to my knowledge doing anything that would cause that -- my kernel has pretty much just executed its first couple instructions, which don't do anything that would generate exceptions). Specifically, the line it runs that causes this to happen is:

Code: Select all

set_logger(&LOGGER).unwrap();
Stepping instruction-by-instruction tells me that the system only executes four instructions before it dies. Specifically, those instructions, I believe, are:

Code: Select all

   0x0000000000019a16 <+22>:	lea    -0xaf2d(%rip),%r14        # 0xeaf0
   0x0000000000019a1d <+29>:	lea    0x59f54(%rip),%rsi        # 0x73978
   0x0000000000019a24 <+36>:	mov    %r14,%rdi
=> 0x0000000000019a27 <+39>:	call   *0x63ee3(%rip)        # 0x7d910
The problem is that I can't confirm that the set_logger function is the problem because it dies as soon as the call instruction is executed. (I don't even know if the instructions I'm getting are correct, though they just might be.) Is this a bug in Qemu or my code? I'm curious because in the past when a triple fault occurs it didn't do a SIGQUIT so...
Edit: so it definitely appears to be a triple fault. Specifically, a page fault. However, I have no idea why its being caused -- none of my code writes to raw memory, and the code for set_logger/set_logger_racy is simple:

Code: Select all

#[cfg(atomic_cas)]
pub fn set_logger(logger: &'static dyn Log) -> Result<(), SetLoggerError> {
    set_logger_inner(|| logger)
}
All set_logger_inner does is atomically set the logger to use.

Re: Very strange signal being received in qemu

Posted: Sat Jan 01, 2022 11:41 am
by Octocontrabass
Ethin wrote:Edit: so it definitely appears to be a triple fault. Specifically, a page fault. However, I have no idea why its being caused -- none of my code writes to raw memory, and the code for set_logger/set_logger_racy is simple:
Any memory access can cause a page fault. What's the error code from the page fault? What sets up the page tables before your kernel runs?

Re: Very strange signal being received in qemu

Posted: Sat Jan 01, 2022 12:02 pm
by iansjack
A combination of the address being accessed, the type of access requested (from the error code), and the information from the qemu monitor "info mem" command should go a long way to tracking down the problem.

My guess is that the address of your "set_logger" function isn't properly mapped - the "info mem" command will let you know if that is so.

Re: Very strange signal being received in qemu

Posted: Sat Jan 01, 2022 3:54 pm
by Ethin
Octocontrabass wrote:
Ethin wrote:Edit: so it definitely appears to be a triple fault. Specifically, a page fault. However, I have no idea why its being caused -- none of my code writes to raw memory, and the code for set_logger/set_logger_racy is simple:
Any memory access can cause a page fault. What's the error code from the page fault? What sets up the page tables before your kernel runs?
If e is the error code in qemu logs (which I think it is) then the error code is 0011. Specifically, Qemu outputs the following exception information:
check_exception old: 0xffffffff new 0xe
756: v=0e e=0011 i=0 cpl=0 IP=0008:0000000000000000 pc=0000000000000000 SP=0010:00000080007fef28 CR2=0000000000000000
RAX=000000000001a1d0 RBX=000000007ff11690 RCX=000000007ff11018 RDX=00000080007ff000
RSI=0000000000073b30 RDI=000000000000e8d0 RBP=000000007ff116b8 RSP=00000080007fef28
R8 =0000000000002000 R9 =0000000000000029 R10=000000000000007b R11=0000000000000000
R12=000000007ff11600 R13=0000000000000320 R14=000000000000e8d0 R15=000000007ff110e0
RIP=0000000000000000 RFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0008 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0010 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0010 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= 0000000001573000 00000017
IDT= 000000007f67e018 00000fff
CR0=80010033 CR2=0000000000000000 CR3=0000000000002000 CR4=00000668
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000004 CCD=00000080007fef30 CCO=EFLAGS
EFER=0000000000000d00
check_exception old: 0xe new 0xe
757: v=08 e=0000 i=0 cpl=0 IP=0008:0000000000000000 pc=0000000000000000 SP=0010:00000080007fef28 env->regs[R_EAX]=000000000001a1d0
RAX=000000000001a1d0 RBX=000000007ff11690 RCX=000000007ff11018 RDX=00000080007ff000
RSI=0000000000073b30 RDI=000000000000e8d0 RBP=000000007ff116b8 RSP=00000080007fef28
R8 =0000000000002000 R9 =0000000000000029 R10=000000000000007b R11=0000000000000000
R12=000000007ff11600 R13=0000000000000320 R14=000000000000e8d0 R15=000000007ff110e0
RIP=0000000000000000 RFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0008 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0010 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0010 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= 0000000001573000 00000017
IDT= 000000007f67e018 00000fff
CR0=80010033 CR2=000000007f67e0f8 CR3=0000000000002000 CR4=00000668
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000004 CCD=00000080007fef30 CCO=EFLAGS
EFER=0000000000000d00
check_exception old: 0x8 new 0xe
My bootloader is setting up the page tables. The problem is that this didn't happen before I upgraded rustc and changed targets; I thought it was the bootloader at first, so I downgraded to a previous version and it still happens. It can't be the target, because the target specification is identical to my old one with the only difference being that it disables more features (e.g. it disables code generation of SSSE3 for example) and it allows me to re-enable them at my discretion (though I won't need to since I'm never going to be using SSE or AVX in my kernel). I've considered the compiler angle and have asked about it on the Rust users forum because it very well might be a compiler bug (the assembly is, to put it blunt, incredibly weird; I've never seen such a huge offset relative to rIP (0x63F7A) ever and have no idea why its generating that because I'm pretty sure that's outside my kernel).

Re: Very strange signal being received in qemu

Posted: Sat Jan 01, 2022 4:21 pm
by Ethin
iansjack wrote:A combination of the address being accessed, the type of access requested (from the error code), and the information from the qemu monitor "info mem" command should go a long way to tracking down the problem.

My guess is that the address of your "set_logger" function isn't properly mapped - the "info mem" command will let you know if that is so.
Yeah, I don't think it is. The output from info mem is:
0000000000000000-0000000000073000 0000000000073000 -r-
0000000000073000-0000000000089000 0000000000016000 -rw
0000000001573000-0000000001574000 0000000000001000 -r-
0000000140006000-0000000140008000 0000000000002000 -r-
0000008000000000-0000008000800000 0000000000800000 -rw
0000010000000000-00000100001d5000 00000000001d5000 -rw
0000018000000000-0000018280000000 0000000280000000 -rw
0000020000000000-0000020000001000 0000000000001000 -rw
And the registers are:
RAX=000000000001a1d0 RBX=000000007ff11690 RCX=000000007ff11018 RDX=00000080007ff000
RSI=0000000000073b30 RDI=000000000000e8d0 RBP=000000007ff116b8 RSP=00000080007fef28
R8 =0000000000002000 R9 =0000000000000029 R10=000000000000007b R11=0000000000000000
R12=000000007ff11600 R13=0000000000000320 R14=000000000000e8d0 R15=000000007ff110e0
RIP=0000000000000000 RFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0008 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0010 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0010 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= 0000000001573000 00000017
IDT= 000000007f67e018 00000fff
CR0=80010033 CR2=000000007f67e098 CR3=0000000000002000 CR4=00000668
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d00
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=0000000000000000 0000000000000000 XMM01=0000000000000000 0000000000000000
XMM02=0000000000000000 0000000000000000 XMM03=0000000000000000 0000000000000000
XMM04=0000000000000000 0000000000000000 XMM05=0000000000000000 0000000000000000
XMM06=0000000000000000 0000000000000000 XMM07=0000000000000000 0000000000000000
XMM08=0000000000000000 0000000000000000 XMM09=0000000000000000 0000000000000000
XMM10=0000000000000000 0000000000000000 XMM11=0000000000000000 0000000000000000
XMM12=0000000000000000 0000000000000000 XMM13=0000000000000000 0000000000000000
XMM14=0000000000000000 0000000000000000 XMM15=0000000000000000 0000000000000000

Re: Very strange signal being received in qemu

Posted: Sat Jan 01, 2022 4:31 pm
by Ethin
Okay, so I just tried to load the IDT and GDT to see if I could at least catch the exception (I wouldn't see anything on the serial port but I would see something in GDB) and it dies as soon as I try to set the GDT up. So this makes me wonder if my kernel is mapped correctly at all to begin with, because it seems like any function I try to call from there causes an exception. I'm just extremely confused on why its happening because it doesn't make any sense (my kernel isn't mapped differently at all from the way it was before, that's never changed). Could it be a bug in QEMU or new CPU features (it happens both with TCG and KVM)?

Re: Very strange signal being received in qemu

Posted: Sat Jan 01, 2022 7:11 pm
by Octocontrabass
Ethin wrote:If e is the error code in qemu logs (which I think it is) then the error code is 0011.
That error means you're trying to execute code from a page that's not executable. CR2 is 0, which suggests you're jumping to a null pointer. It looks like the compiler is using relocations that the bootloader isn't able to resolve.

Re: Very strange signal being received in qemu

Posted: Sat Jan 01, 2022 7:42 pm
by Ethin
Octocontrabass wrote:
Ethin wrote:If e is the error code in qemu logs (which I think it is) then the error code is 0011.
That error means you're trying to execute code from a page that's not executable. CR2 is 0, which suggests you're jumping to a null pointer. It looks like the compiler is using relocations that the bootloader isn't able to resolve.
I saw that, I hadn't thought of the relocation model being the problem. Oops!