OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Including an unused variable corrupts the multiboot kernel
PostPosted: Mon Apr 08, 2024 7:41 am 

Joined: Fri Jan 05, 2024 10:10 am
Posts: 15
Hello,

I've encountered a really strange bug in my nascent kernel development. Whilst PatienceOS is a C# bare metal kernel (nb. nothing close to an OS yet), the simplicity of the codebase and compilation to direct machine code means that it's nothing much more than the C barebones tutorial here.

Bootstrap Assembly: src
Linker template: src
Main function: src
Console struct: src
Build script: src

The checked-in code (above) builds and runs fine in QEMU. However, when I add a single line to the console struct (see below), a variable which is declared but never used/referenced, the kernel no longer boots in QEMU. Rather, the screen flashes as if the multiboot has been corrupted somehow.

Code:
private byte foregroundColor = 0x0F;


I'm guessing it's something to do with the packing of the struct (see here) and/or the memory alignment in the linker template, perhaps.

To be honest, I'm a little out of my depth, but I would really appreciate any suggestions on how to troubleshoot this in practice. I'm more interested in learning how to diagnose and fix problems like this than in a silver bullet.

Frank


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Mon Apr 08, 2024 8:18 am 

Joined: Sat Mar 31, 2012 3:07 am
Posts: 4597
Location: Chichester, UK
I'd suggest that you run the kernel under a debugger. I'm not familiar with Windows debuggers, but gdb can run on Windows and works in cooperation with qemu. Ideally you debug in the high-level language, but I'm not familiar enough with C# to know how you could set that up. But the program is simple enough for you to just debug the assembly code directly.

Here's a link to gdb for Windows: https://rpg.hamsterrepublic.com/ohrrpgce/GDB_on_Windows

and using gdb with qemu: https://qemu-project.gitlab.io/qemu/system/gdb.html

Learning how to use a debugger is a very good discipline for OS development, and this provides an opportunity to gain that knowledge on a simple system.
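
As a rough sketch of that workflow: QEMU's `-s` option opens a gdb server on TCP port 1234, and `-S` holds the CPU until the debugger tells it to continue; gdb then attaches with `target remote`. The script below only assembles and prints the two commands so that each can be pasted into its own terminal. The file name kernel.elf and the `_start` breakpoint are assumptions, and the breakpoint only resolves if the ELF carries symbols.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): builds the two commands and prints them.
KERNEL="kernel.elf"   # assumed name of the kernel image

# -s : shorthand for "-gdb tcp::1234" (start a gdb server)
# -S : do not start the CPU until gdb issues "continue"
QEMU_CMD="qemu-system-i386 -kernel $KERNEL -s -S"

# Attach gdb to the waiting QEMU instance and stop at the entry point.
GDB_CMD="gdb $KERNEL -ex 'target remote localhost:1234' -ex 'break _start' -ex 'continue'"

echo "Terminal 1: $QEMU_CMD"
echo "Terminal 2: $GDB_CMD"
```

Once gdb has stopped at the breakpoint, `stepi` executes one instruction at a time and `x/5i $pc` disassembles the next five instructions, which is usually enough to follow a small bootstrap by hand.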

I could say that all of this would be much easier if you were using C or Rust with a Linux development machine, but I'm guessing you don't want to hear that. ;)


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Mon Apr 08, 2024 8:18 am 

Joined: Fri Aug 26, 2016 1:41 pm
Posts: 694
I don't have an appropriate build environment to build this. Would you be able to make available the kernel.elf file that works (prior to the change) and the kernel.elf that doesn't work? You could put them somewhere in your Github repo.


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Mon Apr 08, 2024 10:24 am 

Joined: Fri Jan 05, 2024 10:10 am
Posts: 15
MichaelPetch wrote:
I don't have an appropriate build environment to build this. Would you be able to make available the kernel.elf file that works (prior to the change) and the kernel.elf that doesn't work? You could put them somewhere in your Github repo.


Thank you. I have placed both of them here: https://github.com/FrankRay78/PatienceOS/tree/main/private/Debugging

I load them in QEMU with the following command
Code:
qemu-system-i386 -kernel <kernel filename>.elf


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Mon Apr 08, 2024 10:28 am 

Joined: Fri Jan 05, 2024 10:10 am
Posts: 15
Thank you for the advice and links to the debugger, I will seriously look into this more.

iansjack wrote:
I could say that all of this would be much easier if you were using C or Rust with a Linux development machine, but I'm guessing you don't want to hear that. ;)


Believe me, I really did try to get the toolchain working end to end on Linux. An explanation of my failed attempts is here: Commentary on the build environment. Something to come back to, in the fullness of time.


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Mon Apr 08, 2024 11:00 am 

Joined: Fri Aug 26, 2016 1:41 pm
Posts: 694
I ran QEMU with these options to see what exceptions and interrupts were occurring:
Code:
qemu-system-i386 -kernel kernel-notworking.elf -d int -no-reboot -no-shutdown
I saw this:
Code:
     0: v=06 e=0000 i=0 cpl=0 IP=0008:00201006 pc=00201006 SP=0010:00207fd4 env->regs[R_EAX]=00000000
EAX=00000000 EBX=00009500 ECX=00207ff0 EDX=00010511
ESI=00000000 EDI=00002000 EBP=00207fe8 ESP=00207fd4
EIP=00201006 EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000cb2b4 00000027
IDT=     00000000 000003ff
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000014 CCD=00207fd4 CCO=SUBL
EFER=0000000000000000
v=06 is exception 0x06 (Invalid opcode). When I look at address 0x00201006 where the exception occurred I see this:
Code:
201006:       0f 57 e4                xorps  %xmm4,%xmm4
This is an SSE instruction. I didn't look at your code, but I suspect the issue is that SSE instructions are not enabled in the processor before this code executes. The options are to build without SSE instructions (I don't know whether you can do that with C#) or to enable SSE instruction support. You can find code to do that here: https://wiki.osdev.org/SSE. The working version of the kernel doesn't use SSE instructions. The change you made seems to have prompted some optimizations that use SSE/SIMD.

Because you don't have an IDT set up with proper exception handlers the processor ends up triple faulting and reboots when it encounters the Invalid Opcode.

Note: I didn't connect a debugger to determine what was at address 0x00201006. I dumped the contents of the ELF file with this command:
Code:
objdump -Dx kernel-notworking.elf
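
The two manual steps above (reading the vector and address out of the `-d int` record, then finding that address in the disassembly) can be sketched as a small script. The log line is copied from the QEMU output in this post; the variable names and the 16-byte disassembly window are arbitrary choices for illustration, and the name table covers only a few common vectors.

```shell
#!/bin/sh
# First line of the exception record, copied from the QEMU log above.
LINE='     0: v=06 e=0000 i=0 cpl=0 IP=0008:00201006 pc=00201006 SP=0010:00207fd4 env->regs[R_EAX]=00000000'

# Extract the exception vector (v=..) and the faulting address (pc=..).
VECTOR=$(printf '%s\n' "$LINE" | sed -n 's/.* v=\([0-9a-f]*\) .*/\1/p')
PC=$(printf '%s\n' "$LINE" | sed -n 's/.* pc=\([0-9a-f]*\) .*/\1/p')

# Map a few common x86 exception vectors to their names.
case "$VECTOR" in
    06) NAME="Invalid Opcode (#UD)" ;;
    08) NAME="Double Fault (#DF)" ;;
    0d) NAME="General Protection Fault (#GP)" ;;
    0e) NAME="Page Fault (#PF)" ;;
    *)  NAME="other (look it up in the exception table)" ;;
esac

# Build an objdump command that disassembles only a short window
# starting at the faulting instruction.
START=$(printf '0x%x' "$((0x$PC))")
STOP=$(printf '0x%x' "$((0x$PC + 16))")
OBJDUMP_CMD="objdump -d --start-address=$START --stop-address=$STOP kernel-notworking.elf"

echo "vector 0x$VECTOR: $NAME at $START"
echo "$OBJDUMP_CMD"
```

Limiting the range with `--start-address` and `--stop-address` avoids scrolling through the full `-Dx` dump when only one faulting instruction is of interest.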


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Mon Apr 08, 2024 3:37 pm 

Joined: Fri Jan 05, 2024 10:10 am
Posts: 15
Thank you, MichaelPetch, that's incredibly helpful and very much appreciated. It's amazing seeing what you've done step by step.

For a moment there, I thought the solution would be trivial.
Code:
ilc --help
indicates a number of instruction sets can be used:

Code:
x86: base, sse, sse2, sse3, ssse3, sse4.1, sse4.2, avx, avx2, aes, etc


and
Code:
qemu-system-i386 -cpu help | more
indicates the actual CPU can be specified:

Code:
Available CPUs:
x86 486                   (alias configured by machine type)
x86 486-v1
x86 Broadwell             (alias configured by machine type)
x86 Broadwell-IBRS        (alias of Broadwell-v3)
x86 Broadwell-noTSX       (alias of Broadwell-v2)
etc


So... I explicitly enabled sse in the compilation, and also set the CPU to pentium3 (which has sse support)

Code:
ilc --targetos windows --targetarch x86 --instruction-set base,sse --verbose kernel.ilexe -g -o kernel.obj --systemmodule kernel --map kernel.map -O
...
qemu-system-i386 -cpu pentium3 -kernel kernel.elf


But alas, the issue still remains.

I'll need to look into this further. I suspect it's either the .NET AOT compiler, ilc, not respecting the command line switch, or my native Windows install of QEMU (which they mark as 'experimental').

Massive progress though, and thanks once again.


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Mon Apr 08, 2024 3:54 pm 

Joined: Fri Aug 26, 2016 1:41 pm
Posts: 694
Your compile process is already emitting SSE (which is what caused the problem when you added the extra member to the structure). That is the problem. You want to be able to turn that off (not on). The issue revolves around the fact that GRUB doesn't guarantee anything about whether the processor's SSE support is enabled when transferring control to your kernel. It is likely not enabled even on processors that support SSE.

If you want to enable SSE in your kernel you have to programmatically turn it on. Adding the appropriate code to loader.asm before your kernel main is called is where that should be done. https://wiki.osdev.org/SSE has code to do that. I haven't tested this (it is based on the Wiki code) but I think the logic is correct:
Code:
_start:
    cli                   ; block interrupts
    mov esp, stack_space  ; set stack pointer

enablesse:
    ; Is SSE supported on this CPU?
    mov eax, 0x1
    cpuid
    test edx, 1<<25
    jnz .sse                   ; If SSE supported enable it.
.nosse:
    ; SSE not supported - do something like print an error and stop
    jmp $

.sse:
    ;now enable SSE and the like
    mov eax, cr0
    and ax, 0xFFFB             ; clear coprocessor emulation CR0.EM
    or ax, 0x2                 ; set coprocessor monitoring  CR0.MP
    mov cr0, eax
    mov eax, cr4
    or ax, 3 << 9              ; set CR4.OSFXSR and CR4.OSXMMEXCPT at the same time
    mov cr4, eax

    ; Call Main
    call __managed__Main

    ; Infinite loop
    hlt
    jmp $
If you choose not to disable SSE from your code generator, you will need your kernel to check for SSE *support* and if there is none do something (print an error) and go into an infinite loop informing the user that you need a CPU with SSE support. If there is SSE support in the processor then you need to enable the SSE instruction set.
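
To make the bit manipulation in the assembly concrete, here is the same arithmetic as a shell sketch. The starting values are the CR0 and CR4 contents from the QEMU register dump earlier in the thread (CR0=00000011, CR4=00000000); the variable names are only for illustration. The assembly works on `ax`, the low 16 bits, which gives the same result because all four bits involved sit below bit 16.

```shell
#!/bin/sh
# Register values as shown in the QEMU "-d int" dump earlier in the thread.
CR0_BEFORE=0x00000011
CR4_BEFORE=0x00000000

# CPUID leaf 1 reports SSE support in EDX bit 25.
SSE_MASK=$(printf '0x%08x' "$((1 << 25))")

# CR0: clear EM (bit 2, x87 emulation) and set MP (bit 1, monitor coprocessor).
CR0_AFTER=$(( ($CR0_BEFORE & ~(1 << 2)) | (1 << 1) ))

# CR4: set OSFXSR (bit 9) and OSXMMEXCPT (bit 10) in one step.
CR4_AFTER=$(( $CR4_BEFORE | (3 << 9) ))

CR0_HEX=$(printf '0x%08x' "$CR0_AFTER")
CR4_HEX=$(printf '0x%08x' "$CR4_AFTER")

echo "CPUID.1:EDX SSE mask = $SSE_MASK"
echo "CR0: $CR0_BEFORE -> $CR0_HEX"
echo "CR4: $CR4_BEFORE -> $CR4_HEX"
```

With those starting values, CR0 becomes 0x00000013 and CR4 becomes 0x00000600. Running the kernel under `-d int` again, or inspecting the registers from a debugger, is one way to confirm the bootstrap actually reached that state.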


Last edited by MichaelPetch on Mon Apr 08, 2024 4:45 pm, edited 1 time in total.

 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Mon Apr 08, 2024 4:09 pm 

Joined: Fri Aug 26, 2016 1:41 pm
Posts: 694
FrankRay78 wrote:
Code:
x86: base, sse, sse2, sse3, ssse3, sse4.1, sse4.2, avx, avx2, aes, etc
I assume (just a guess) "base" would be code without SSE. If you can change to that then you may find the code works. From what you are saying SSE code generation could be disabled using `--instruction-set base` (notice I removed SSE). If you can't turn off code generation with SSE instructions you'll have to enable SSE at run time with code similar to what I have in my previous post.

I don't believe the problem here is with QEMU. Use QEMU as you were originally invoking it.


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Tue Apr 09, 2024 12:03 am 

Joined: Fri Jan 05, 2024 10:10 am
Posts: 15
Apologies MichaelPetch, it was late and my response was poor.

I did try everything with ilc to prevent the SSE code from being emitted. The instruction-set switch with only ‘base’ didn’t work. I trawled and trawled GitHub issues and could not find a single bit of documentation on whether this was intended or not. It was at that point I decided to see if I could force SSE to be always on, but ran afoul of what I thought was QEMU not behaving.

Today I plan to log an issue with Microsoft regarding the ‘base’ switch, to confirm whether that should be allowing sse optimisations, and in the meantime, enable sse support in my bootstrapper, which you’ve kindly pointed out. Requiring that startup assembly was a gap in my understanding, even though I was reading about what cpus supported which versions of sse.

Update - An issue has been logged with the Microsoft runtime/AOT team, here: ilc.exe is emitting the sse instruction, xorps, with --instruction-set base


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Tue Apr 09, 2024 3:33 pm 

Joined: Fri Jan 05, 2024 10:10 am
Posts: 15
The answers given on the above GitHub issue I raised are clear and unambiguous, namely:

Quote:
Firstly, win-x86 is unsupported. Secondly, the baseline is SSE2.

and also

Quote:
The support for pre-SSE2 hardware was removed several years back and there is no interest in adding it back. We consider at least SSE, SSE2, CMOV, and CPUID as part of our baseline requirements.


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Tue Apr 09, 2024 6:29 pm 

Joined: Fri Aug 26, 2016 1:41 pm
Posts: 694
So keep your build as it was before and modify loader.asm with the code I suggested. Hopefully if I haven't screwed anything up that should work. My code changes to loader.asm check if SSE is supported by the CPU. If it isn't supported it just goes into an infinite loop (you could add code to print an error to the display). If SSE is supported then I enable the SSE features. That should allow your kernel code to run even if it uses SSE.

Initializing the x87/FPU to a valid state probably isn't a bad idea either, although that's not currently an issue for you. On some systems, issuing an x87 FPU instruction may also cause an exception if the FPU has not been initialized ahead of time.


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Tue Apr 09, 2024 8:35 pm 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
If your compiler always uses SSE2, that means you'll need to save/restore the SSE registers in every kernel entry/exit point instead of only during a context switch. The same applies to any other registers your compiler might use, but most examples you'll see were written with the assumption that the compiler only uses general-purpose registers.

Most Linux distros require i686+SSE2 at minimum, but the Linux kernel (usually) doesn't use SSE registers.


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Tue Apr 09, 2024 11:37 pm 

Joined: Sat Mar 31, 2012 3:07 am
Posts: 4597
Location: Chichester, UK
What happens if you initialize the variables in the constructor rather than in the structure definition? Perhaps C# is using an inbuilt memory move routine (which often uses SSE instructions) when there are multiple initialized variables in a structure definition.

If that is the case then, IMO, C# isn’t a suitable tool for OS development. It would be interesting to know whether the same problem exists if open-source tools, such as mono, are used.


 Post subject: Re: Including an unused variable corrupts the multiboot kern
PostPosted: Wed Apr 10, 2024 4:26 am 

Joined: Fri Jan 05, 2024 10:10 am
Posts: 15
Dear MichaelPetch, your suggestion worked and I'm very grateful. Here's the commit: Enable cpu support for sse in bootstrap. I'm also very inspired to take my OS learning seriously, given how your support has opened my eyes to this truly fascinating subject.

Dear iansjack, I tried the following:

Code:
        private byte foregroundColor;

        public Console(int width, int height, FrameBuffer frameBuffer, byte foregroundColor = 0x0F)
        {
            this.width = width;
            this.height = height;
            this.frameBuffer = frameBuffer;
            this.foregroundColor = foregroundColor;
        }


and also without the default value specified on the constructor, both still result in the sse instruction being emitted.

I don't understand enough about how SSE works, nor the memory move comments, and given that the 32-bit AOT compiler isn't officially supported yet, I'm not sure what I can deduce from this. I'll read up some more, and probably inspect the generated IL for the builds with and without the SSE instruction to see if that sheds any light.
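
One quick way to compare builds at the machine-code level, rather than at the IL level, is to count the disassembly lines that touch an XMM register. This is only a sketch: `count_sse` is a made-up helper name, the first sample line is the `xorps` instruction quoted earlier in the thread, and the second sample line (a plain `cli` at a hypothetical address) stands in for SSE-free code.

```shell
#!/bin/sh
# count_sse: read objdump-style disassembly on stdin and print how many
# lines reference an XMM register (xmm0..xmm7 in 32-bit code).
count_sse() {
    # grep -c exits non-zero when the count is 0, so swallow that status.
    grep -c '%xmm[0-7]' || true
}

# Sample 1: the faulting instruction reported earlier in the thread.
WITH_SSE='  201006:       0f 57 e4                xorps  %xmm4,%xmm4'
# Sample 2: hypothetical line with no SSE involvement.
WITHOUT_SSE='  201000:       fa                      cli'

SSE_LINES=$(printf '%s\n' "$WITH_SSE" | count_sse)
PLAIN_LINES=$(printf '%s\n' "$WITHOUT_SSE" | count_sse)

echo "sample with xorps: $SSE_LINES line(s) using XMM registers"
echo "sample with cli:   $PLAIN_LINES line(s) using XMM registers"

# Against a real build the input would come from objdump, e.g.:
#   objdump -d kernel.elf | count_sse
```

Running it against the working and non-working ELF files should show whether a given source change is what tips the compiler into emitting SSE instructions.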

