
James Molloy OS; modern infra

Posted: Tue Apr 15, 2025 12:13 pm
by teverett
Hello OS Dev Community.

I've been trying to build the Molloy examples up to chapter 10 with a modern GCC and to get GitHub Actions working for it. I have a repo here, which uses the Molloy code without any changes and compiles on both Linux and OS X if I use the gcc11 standard.

However, when I start it, it seems to crash somewhere in "initialise_paging();" with:

"unhandled interrupt: 0x6", which I think is a QEMU message.

0x6, I think, is an "invalid opcode" exception, and there is inline assembler called from initialise_paging(), for example in paging.c:

https://github.com/teverett/JamesMolloy ... c/paging.c

Code:

void switch_page_directory(page_directory_t *dir)
{
    current_directory = dir;
    asm volatile("mov %0, %%cr3":: "r"(dir->physicalAddr));
    u32int cr0;
    asm volatile("mov %%cr0, %0": "=r"(cr0));
    cr0 |= 0x80000000; // Enable paging!
    asm volatile("mov %0, %%cr0":: "r"(cr0));
}
Also:

Code:

void page_fault(registers_t *regs)
{
    // A page fault has occurred.
    // The faulting address is stored in the CR2 register.
    u32int faulting_address;
    asm volatile("mov %%cr2, %0" : "=r" (faulting_address));
    // ...
}
There are some references on the wiki to GCC generating different code in different versions, and I'm not on GCC 4.8; I'm on 14.2.0:

https://wiki.osdev.org/James_Molloy%27s ... th_GCC_4.8

Has anyone else run into this issue with "invalid opcode"?

Re: James Molloy OS; modern infra

Posted: Tue Apr 15, 2025 12:56 pm
by Octocontrabass
teverett wrote: Tue Apr 15, 2025 12:13 pm
I've been trying to build the Molloy examples up to chapter 10 with a modern GCC and to get GitHub Actions working for it.
Have you seen the list of known bugs? You might be better off starting from scratch.
teverett wrote: Tue Apr 15, 2025 12:13 pm
However, when I start it, it seems to crash somewhere in "initialise_paging();" with:

"unhandled interrupt: 0x6", which I think is a QEMU message.
It isn't.
teverett wrote: Tue Apr 15, 2025 12:13 pm
0x6, I think, is an "invalid opcode" exception, and there is inline assembler called from initialise_paging().
What debugging did you do to determine that the inline asm is causing the exception? Did you check QEMU's interrupt log ("-d int") for the address of the faulting instruction and use objdump or addr2line to find the corresponding line in the source code? Or are you just guessing?
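For reference: "unhandled interrupt: 0x6" is printed by the kernel itself (most likely the tutorial's own ISR code, when no handler is registered for the vector), and vector 0x6 is the x86 invalid-opcode exception (#UD). A minimal sketch of a lookup table such a handler could use to print the exception name alongside the number; `exception_name` and `exception_names` are hypothetical names, and only vectors 0 to 19 are listed.

```c
#include <stdint.h>

/* Hypothetical table: names of the first 20 x86 exception vectors
   (Intel SDM Vol. 3, "Exception and Interrupt Reference"). */
static const char *const exception_names[] = {
    "Divide Error (#DE)",                  /* 0x00 */
    "Debug (#DB)",                         /* 0x01 */
    "Non-Maskable Interrupt",              /* 0x02 */
    "Breakpoint (#BP)",                    /* 0x03 */
    "Overflow (#OF)",                      /* 0x04 */
    "BOUND Range Exceeded (#BR)",          /* 0x05 */
    "Invalid Opcode (#UD)",                /* 0x06 */
    "Device Not Available (#NM)",          /* 0x07 */
    "Double Fault (#DF)",                  /* 0x08 */
    "Coprocessor Segment Overrun",         /* 0x09, not generated by modern CPUs */
    "Invalid TSS (#TS)",                   /* 0x0A */
    "Segment Not Present (#NP)",           /* 0x0B */
    "Stack-Segment Fault (#SS)",           /* 0x0C */
    "General Protection (#GP)",            /* 0x0D */
    "Page Fault (#PF)",                    /* 0x0E */
    "Reserved",                            /* 0x0F */
    "x87 Floating-Point Error (#MF)",      /* 0x10 */
    "Alignment Check (#AC)",               /* 0x11 */
    "Machine Check (#MC)",                 /* 0x12 */
    "SIMD Floating-Point Exception (#XM)", /* 0x13 */
};

/* Returns a printable name for an interrupt vector. Vectors past the
   table (later exceptions, remapped IRQs, etc.) get a generic label. */
const char *exception_name(uint32_t vector)
{
    if (vector < sizeof exception_names / sizeof exception_names[0])
        return exception_names[vector];
    return "Other (not in this table)";
}
```

A handler would then print `exception_name(regs.int_no)` next to the hex vector, so a 0x6 shows up as "Invalid Opcode (#UD)" instead of a bare number.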

Re: James Molloy OS; modern infra

Posted: Tue Apr 15, 2025 4:58 pm
by teverett
Thanks, Octocontrabass.

So it turns out that with the right compiler, things work better. I was using x86_64-elf, which didn't work; things are better with i686-elf. I now get to the last step of https://github.com/teverett/JamesMolloy ... src/main.c

switch_to_user_mode();

and I can confirm that this function is entered but doesn't return. The code is here:

Code:

void switch_to_user_mode()
{
    // Set up our kernel stack.
    set_kernel_stack(current_task->kernel_stack+KERNEL_STACK_SIZE);

    // Set up a stack structure for switching to user mode.
    // %eax holds ESP as it was before SS was pushed; that saved value
    // (not the live %esp) is what belongs in the user ESP slot.
    asm volatile("  \
      cli; \
      mov $0x23, %ax; \
      mov %ax, %ds; \
      mov %ax, %es; \
      mov %ax, %fs; \
      mov %ax, %gs; \
                    \
      mov %esp, %eax; \
      pushl $0x23; \
      pushl %eax; \
      pushf; \
      pushl $0x1B; \
      push $1f; \
      iret; \
    1: \
      ");
}
So it has the fixes recommended here: https://wiki.osdev.org/James_Molloy%27s ... aging_Code
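The inline assembly builds the frame that `iret` pops when returning to a less privileged ring: EIP, CS, EFLAGS, ESP and SS, from the lowest address upwards. The selectors 0x1B and 0x23 are GDT indices 3 and 4 with requested privilege level 3, assuming the tutorial's GDT layout (user code in entry 3, user data in entry 4). A small sketch of that layout and of the selector arithmetic; `make_selector` and `struct iret_frame` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: segment selector from a GDT index and a requested
   privilege level (bit 2, the table indicator, stays 0 = GDT). */
static inline uint16_t make_selector(uint16_t gdt_index, uint16_t rpl)
{
    return (uint16_t)((gdt_index << 3) | (rpl & 0x3));
}

/* Stack layout, lowest address first, that iret expects when returning
   from ring 0 to ring 3 in 32-bit protected mode. The asm pushes these
   in reverse order: SS first, EIP last. */
struct iret_frame {
    uint32_t eip;    /* pushed last: where to resume (label 1f) */
    uint32_t cs;     /* user code selector, 0x1B */
    uint32_t eflags; /* from pushf; bit 9 (0x200) is IF */
    uint32_t esp;    /* user-mode stack pointer */
    uint32_t ss;     /* pushed first: user data/stack selector, 0x23 */
};

_Static_assert(sizeof(struct iret_frame) == 20, "five 32-bit slots");
_Static_assert(offsetof(struct iret_frame, ss) == 16, "SS at highest address");
```

Because `cli` runs before `pushf`, the saved EFLAGS has IF clear, so interrupts stay disabled after the switch unless bit 9 is set in the pushed value.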

The problem is that execution ends up in the page fault handler in paging.c:

Code:

void page_fault(registers_t *regs)
{
    // A page fault has occurred.
    // The faulting address is stored in the CR2 register.
    u32int faulting_address;
    asm volatile("mov %%cr2, %0" : "=r" (faulting_address));
    
    // The error code gives us details of what happened.
    int present   = !(regs->err_code & 0x1); // Page not present
    int rw = regs->err_code & 0x2;           // Write operation?
    int us = regs->err_code & 0x4;           // Processor was in user-mode?
    int reserved = regs->err_code & 0x8;     // Overwritten CPU-reserved bits of page entry?
    int id = regs->err_code & 0x10;          // Caused by an instruction fetch?

    // Output an error message.
    monitor_write("Page fault! ( ");
    if (present) {monitor_write("present ");}
    if (rw) {monitor_write("read-only ");}
    if (us) {monitor_write("user-mode ");}
    if (reserved) {monitor_write("reserved ");}
    monitor_write(") at 0x");
    monitor_write_hex(faulting_address);
    monitor_write(" - EIP: ");
    monitor_write_hex(regs->eip);
    monitor_write("\n");
    PANIC("Page fault");
}
So I'm not sure how I'm ending up with a page fault.

Re: James Molloy OS; modern infra

Posted: Tue Apr 15, 2025 5:33 pm
by Octocontrabass
teverett wrote: Tue Apr 15, 2025 4:58 pm
So I'm not sure how I'm ending up with a page fault.
It looks like your page fault handler tells you enough information to start debugging, but QEMU's interrupt log may also be useful.

If you need help debugging, share that information with us.

Re: James Molloy OS; modern infra

Posted: Tue Apr 15, 2025 5:55 pm
by teverett
Well, honestly, my goal is simply a working build on GitHub that others can use.

Here is the screen cap:
[Screenshot 2025-04-15 at 5.50.18 PM.jpg]

Re: James Molloy OS; modern infra

Posted: Tue Apr 15, 2025 8:53 pm
by Octocontrabass
teverett wrote: Tue Apr 15, 2025 5:55 pm
Here is the screen cap:
The page fault was caused by a protection violation from a user-mode write to address 0x6ebc by the instruction at address 0x1035c3.

What is the instruction at address 0x1035c3? Which line of code does it correspond to? Does it make sense for that code to be writing to address 0x6ebc? Should address 0x6ebc be writable and accessible to user mode?
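This reading follows the bit layout of the x86 page-fault error code. The tutorial handler's labels are easy to misread: it prints "present" when bit 0 is clear (the page was *not* present) and "read-only" when bit 1 is set (the faulting access was a write). A minimal decoding sketch using the architectural meanings; `decode_page_fault` and `struct pf_info` are hypothetical names, and only bits 0 to 4 are covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the error code the CPU pushes for a page fault (vector 14). */
#define PF_P    0x01u /* 0 = page not present, 1 = protection violation */
#define PF_WR   0x02u /* 0 = read access, 1 = write access */
#define PF_US   0x04u /* 0 = supervisor mode, 1 = user mode (CPL 3) */
#define PF_RSVD 0x08u /* a reserved bit was set in a paging-structure entry */
#define PF_ID   0x10u /* instruction fetch; may stay 0 without NX/SMEP */

struct pf_info {
    bool protection_violation;
    bool write;
    bool user;
    bool reserved_bit;
    bool instruction_fetch;
};

/* Splits a page-fault error code into named flags. */
struct pf_info decode_page_fault(uint32_t err_code)
{
    struct pf_info info = {
        .protection_violation = (err_code & PF_P) != 0,
        .write                = (err_code & PF_WR) != 0,
        .user                 = (err_code & PF_US) != 0,
        .reserved_bit         = (err_code & PF_RSVD) != 0,
        .instruction_fetch    = (err_code & PF_ID) != 0,
    };
    return info;
}
```

For example, an error code of 0x7 decodes to a user-mode write that hit a present page without write permission. The faulting address itself is not part of the error code; it comes from CR2, which the handler already reads.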

Re: James Molloy OS; modern infra

Posted: Wed Apr 16, 2025 9:04 pm
by teverett
OK, here are the last four lines of main.c:

Code:

    monitor_write("switching to usermode\n");
    switch_to_user_mode();

    // syscall_monitor_write("Hello, user world!\n");
    return 0;
With that syscall_monitor_write call commented out, there is no crash, so I'm tempted to think the issue is with syscall_monitor_write.

Regarding the question "What is the instruction at address 0x1035c3?": I looked in the kernel.map file, which I hope is the right place to look. I now have this crash:
[Screenshot 2025-04-16 at 8.59.36 PM.png]
The instruction at 0x10219b is:

Code:

 .text          
 		0x0010219a      0x478 bin/monitor.o
                0x001022af                monitor_put
                0x001023d7                monitor_clear
                0x0010242e                monitor_write
                0x00102472                monitor_write_hex
                0x00102538                monitor_write_dec
This seems somewhat reasonable since I would expect "syscall_monitor_write" to eventually enter monitor.o?

Re: James Molloy OS; modern infra

Posted: Wed Apr 16, 2025 9:18 pm
by Octocontrabass
teverett wrote: Wed Apr 16, 2025 9:04 pm
I looked in the kernel.map file, which I hope is the right place to look.
No, you have to look at a disassembly to find the instruction. Try using objdump, or set your debugger to display a disassembly of the code when you're stepping through it.

Re: James Molloy OS; modern infra

Posted: Wed Apr 16, 2025 9:31 pm
by teverett
Like this, from objdump?

Code:

  syscall_monitor_write("Hello, user world!\n");
  102198:	83 ec 0c             	sub    $0xc,%esp
  10219b:	68 4e 42 10 00       	push   $0x10424e
  1021a0:	e8 59 0f 00 00       	call   1030fe <syscall_monitor_write>
  1021a5:	83 c4 10             	add    $0x10,%esp
    return 0;
  1021a8:	b8 00 00 00 00       	mov    $0x0,%eax

Re: James Molloy OS; modern infra

Posted: Wed Apr 16, 2025 9:49 pm
by Octocontrabass
teverett wrote: Wed Apr 16, 2025 9:31 pm
Like this, from objdump?
Yep, that's it.
teverett wrote: Wed Apr 16, 2025 9:31 pm

Code:

  10219b:	68 4e 42 10 00       	push   $0x10424e
Looks like your stack is around 0x6ed0, and it appears to be read-only (at least in user mode; CR0.WP controls whether pages may be read-only in kernel mode).

Is that where your stack should be? When you set up the pages for your stack, did you make them writable?
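The rule referred to here can be modelled roughly as follows for plain 32-bit paging: a user-mode write needs the page to be present, user-accessible (U/S = 1) and writable (R/W = 1), while a supervisor-mode write to a read-only page only faults when CR0.WP (bit 16) is set. This is a simplified sketch: it takes one set of effective flags (on real hardware the page-directory and page-table entries are combined), ignores SMAP/SMEP, and `write_allowed` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u        /* P */
#define PTE_RW      0x2u        /* R/W: 1 = writable */
#define PTE_USER    0x4u        /* U/S: 1 = accessible from user mode */
#define CR0_WP      (1u << 16)  /* write-protect for supervisor writes */

/* Simplified model: does a write to a page with these effective flags
   succeed (true) or raise a page fault (false)? */
bool write_allowed(uint32_t pte_flags, bool user_mode, uint32_t cr0)
{
    if (!(pte_flags & PTE_PRESENT))
        return false; /* not-present fault */
    if (user_mode)
        return (pte_flags & PTE_USER) && (pte_flags & PTE_RW);
    /* Supervisor mode: read-only pages only block writes when CR0.WP = 1. */
    return (pte_flags & PTE_RW) || !(cr0 & CR0_WP);
}
```

Under this model, a page that is present and user-accessible but not writable accepts writes from kernel code while CR0.WP is clear, yet faults on the same write from ring 3, which matches the symptom in this thread.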

Re: James Molloy OS; modern infra

Posted: Thu Apr 17, 2025 8:19 am
by teverett
Firstly, @Octocontrabass, thank you.

Ok, so what you are saying is "The issue is the push instruction, and therefore most likely the stack pointer is in a weird place".

I'll take a further look at the code and hope to understand how the stack got set up. 0x6ed0 does seem to be an odd place to put a stack :)

Re: James Molloy OS; modern infra

Posted: Thu Apr 17, 2025 9:11 pm
by Octocontrabass
teverett wrote: Thu Apr 17, 2025 8:19 am
Ok, so what you are saying is "The issue is the push instruction, and therefore most likely the stack pointer is in a weird place".
Correct.
teverett wrote: Thu Apr 17, 2025 8:19 am
I'll take a further look at the code and hope to understand how the stack got set up. 0x6ed0 does seem to be an odd place to put a stack :)
Was the stack set up at all? That's one of the bugs you'll have to fix.

Re: James Molloy OS; modern infra

Posted: Fri Apr 18, 2025 4:10 pm
by teverett
OK, after some re-learning, I got the debugger up. Just before calling syscall_monitor_write, the processor state looks like this:
[Screenshot 2025-04-18 at 4.08.06 PM.jpg]
So ESP is 0x6ef8. I'm fairly sure that is the initial stack pointer passed to main(), so no, the stack is not set up. That would explain quite a lot.

Re: James Molloy OS; modern infra

Posted: Sat Apr 19, 2025 12:26 pm
by teverett
So, I see this on the wiki page here:

https://wiki.osdev.org/Stack#Setup_the_stack

Code:

SECTION .text

set_up_stack:
    
    MOV  ESP, stack_end  ; Set the stack pointer

SECTION .bss

stack_begin:
    RESB 4096  ; Reserve 4 KiB stack space
stack_end:
I presume this is a typo of some sort? There shouldn't be an assembler instruction in what appears to be loader configuration?
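That wiki snippet is not linker-script syntax: `SECTION`, `RESB` and `MOV ESP, stack_end` are NASM assembly, meant for the kernel's assembly entry point. It reserves 4 KiB in .bss and loads ESP with the *end* of that region before any C code runs, because x86 stacks grow towards lower addresses. The same reservation can be made from C, as in this sketch; `kernel_stack`, `KERNEL_BOOT_STACK_SIZE` and `kernel_stack_top` are hypothetical names, and the entry code would still need to load ESP with the top address before calling main().

```c
#include <stdint.h>

/* Hypothetical size: 16 KiB is a common choice for an early kernel stack. */
#define KERNEL_BOOT_STACK_SIZE 16384u

/* No initialiser, so with current GCC defaults it is placed in .bss,
   like the wiki's RESB reservation. 16-byte alignment matches GCC's
   default preferred stack boundary on x86. */
uint8_t kernel_stack[KERNEL_BOOT_STACK_SIZE] __attribute__((aligned(16)));

/* The value ESP should start with: one past the last byte, because each
   push decrements ESP first and then stores at the new, lower address. */
uintptr_t kernel_stack_top(void)
{
    return (uintptr_t)(kernel_stack + KERNEL_BOOT_STACK_SIZE);
}
```

Loading ESP in the entry code, as the wiki snippet does, avoids changing the stack pointer in the middle of a compiled C function.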

Re: James Molloy OS; modern infra

Posted: Sat Apr 19, 2025 1:39 pm
by teverett
OK, so here is a modified link.ld that sets aside some space for a stack:

Code:

ENTRY(start)

STACK_SIZE = 0x4000; /* 16 KiB */

SECTIONS
{
    .mbheader 0x100000 :
    {
        *(.mbheader)
    }
    .text :
    {
        code = .; _code = .; __code = .;
        *(.text)
        . = ALIGN(4096);
    }

    .data :
    {
        data = .; _data = .; __data = .;
        *(.data)
        *(.rodata)
        . = ALIGN(4096);
    }

    .stack (NOLOAD):
    {
        . = ALIGN(8);
        stack_bottom = .;
        . = . + STACK_SIZE;
        stack_top = .;
        . = ALIGN(8);
    } 

    .bss :
    {
        bss = .; _bss = .; __bss = .;
        *(.bss)
        . = ALIGN(4096);
    }


    end = .; _end = .; __end = .;
}
And here is the modified main.c that sets it up:

Code:

extern u32int stack_bottom;
extern u32int stack_top;
extern u32int placement_address;
u32int initial_esp;

int main(struct multiboot *mboot_ptr, u32int initial_stack)
{
    // find the stack defined in the linker script; x86 stacks grow
    // downwards, so ESP must start at the top (highest address)
    u32int kernel_stack_top = (u32int) &stack_top;

    // save the initial stack pointer (task.c::move_stack needs this)
    initial_esp = kernel_stack_top;

    // set the stack up. Changing ESP in the middle of a C function is
    // fragile (the compiler does not know it happened); loading ESP in the
    // assembly entry point before main() is called is more robust.
    asm volatile("mov %0, %%esp" : : "r" (kernel_stack_top));
    // ...
The OS now boots successfully into user mode.
[Screenshot 2025-04-19 at 1.37.45 PM.jpg]
A giant thank-you to Octocontrabass for guiding me through the fixes.

So, an open question: I see lots of GitHub repos with the James Molloy code from the tutorial in them. Did they ever actually boot?