Adding new code breaks loading the kernel

Posted: Sat Jan 14, 2023 10:39 am
by Elia
Hi, I've been reading around here for the past couple months while learning about OS internals.

Goal: Write a basic, minimal x86 bootloader and 32-bit kernel to satisfy my curiosity about OS internals. (My full-time job is in no way related; it's all Python.)
Code: https://github.com/eliaonceagain/edu-x8 ... der-kernel

Checkpoints
- Real mode 16-bit bootloader
- load gdt
- setup video mode
- enable protected mode
- setup interrupts
- setup tss
- create processes
- scheduler to round robin created processes on every clock interrupt
- enable paging

All the above "works". I'm writing "works" because I'm sure some things are misconfigured but magically working.

Current problem:
Adding any new C code stops the kernel from loading.
The stage 1 bootloader (src/bootloader.asm) makes a far jump to stage 2 (src/kernel_init.asm) and gets stuck there.
The new code can be as simple as:

Code:

echo "void helloworld(){}" > src/filler.c

Suspect
Using the Bochs GUI debugger, I've managed to pinpoint this to the instruction that loads the task register:
src/kernel_init.asm -> setup_task_register -> ltr ax -> hang

Reproduction

Code:

git clone https://github.com/EliaOnceAgain/edu-x86-bootloader-kernel.git
cd edu-x86-bootloader-kernel && echo "void helloworld(){}" > src/filler.c
make clean && make run

Requesting help
It seems that the core code is not stable enough, but I don't know which direction to take.
Some questions, in no particular order:
- Why does adding new code break loading the kernel?
- How do I properly configure the TSS? (The TSS segment is defined in src/gdt.asm.)
- Should I set up the stack in a different way? (Currently it's set in src/bootloader.asm, starting from 0x7C00 and growing downwards.)

Any tips, suggestions, or changes, related or unrelated to the above questions, would be highly appreciated.

Thanks,
Elia

Re: Adding new code breaks loading the kernel

Posted: Tue Jan 17, 2023 8:45 am
by MichaelPetch
How big is your kernel when it stops loading properly? Looking at your bootloader, you read a maximum of 15 sectors (7.5KiB). Any chance that your kernel has grown beyond that? I'd try building your kernel, but there are multiple redefinition errors when linking.
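The size check suggested above can be sketched in a few lines of C. This is only an illustration run on the host, not code from the repository: the names `SECTOR_SIZE`, `sectors_needed`, and `image_fits` are hypothetical, and 512-byte sectors are assumed.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u  /* assumed bytes per sector */

/* Number of whole sectors needed to hold image_bytes (rounds up). */
uint32_t sectors_needed(uint32_t image_bytes)
{
    return (image_bytes + SECTOR_SIZE - 1u) / SECTOR_SIZE;
}

/* Returns 1 if an image of image_bytes fits in sectors_read sectors. */
int image_fits(uint32_t image_bytes, uint32_t sectors_read)
{
    return sectors_needed(image_bytes) <= sectors_read;
}
```

With a 15-sector read, `image_fits(7680, 15)` holds while `image_fits(7681, 15)` does not, so a check like this in the build could flag the kernel outgrowing the loader before it fails at boot.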

I see the multiple redefinitions occur because you define global objects in your header files. If you include such a header in more than one C file, those objects will be redefined in each C file that uses it. Put the definitions of global objects in a .c file and an `extern` declaration in the header.

As an example in process.h you have these definitions:

Code:

process_t *process_table[15];
int processes_count, curr_pid;

Declare them extern instead:

Code:

extern process_t *process_table[15];
extern int processes_count, curr_pid;

Then in process.c you can define them somewhere after you include process.h, like this:

Code:

#include "process.h"
#include "vsa.h"        /* vsa_t, alloc()                                   */

process_t *process_table[15];
int processes_count, curr_pid;

There is a similar problem in paging.c/paging.h and scheduler.c/scheduler.h.

Re: Adding new code breaks loading the kernel

Posted: Tue Jan 17, 2023 9:32 am
by mtbro
A debugger is your friend. I use gdb. While it is a bit of a pain to use in real mode, you can make your life easier with some predefined macros.
Checking the actual state of memory is the best way to see what happened. After that, single-step through the loading process to see what's happening in real time. In my bootloader I put a signature at the end of the loader, in a .signature section defined in the linker script, which let me quickly check in memory whether the whole image had been loaded.
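The signature idea can be sketched as follows. This is a host-side illustration under assumptions, not mtbro's actual code: the marker value `LOAD_SIGNATURE` and the function `image_fully_loaded` are hypothetical, and the signature is assumed to be the last 4 bytes of the image.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOAD_SIGNATURE 0x4B524E4Cu  /* hypothetical marker value */

/* Returns 1 if the last 4 bytes of the image equal LOAD_SIGNATURE,
 * i.e. the loader read at least as far as the end of the image. */
int image_fully_loaded(const uint8_t *image, size_t image_size)
{
    uint32_t tail;

    if (image == NULL || image_size < sizeof tail)
        return 0;

    /* memcpy avoids an unaligned 32-bit read at the end of the buffer. */
    memcpy(&tail, image + image_size - sizeof tail, sizeof tail);
    return tail == LOAD_SIGNATURE;
}
```

In a freestanding kernel there is no libc `memcpy`, so the same comparison would be done byte by byte or by inspecting the address in the debugger.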

The stack should be placed so that, as it grows down, it doesn't overwrite your data or run into a reserved area of memory (such as the BIOS data area at 0:400h).

There are far more qualified people here who could help you; I keep my notes on a more or less standard memory layout in my pmbr code.

Re: Adding new code breaks loading the kernel

Posted: Tue Jan 17, 2023 9:58 am
by iansjack
As Michael says, problems when you add code almost certainly mean that you are not loading the whole kernel.

Re: Adding new code breaks loading the kernel

Posted: Tue Jan 17, 2023 11:43 am
by Elia
Thank you for the input. I will work on sorting out the multiple redefinitions.

Meanwhile, the non-zero area of the kernel binary is under 5 KB, so I expect 15 sectors to cover it.

While debugging further, I encountered this:

Code:

00017824976i[BIOS ] Booting from 0000:7c00
00017923971e[CPU0 ] LTR: doesn't point to an available TSS descriptor!
00017923971e[CPU0 ] interrupt(): vector must be within IDT table limits, IDT.limit = 0x0
00017923971e[CPU0 ] interrupt(): vector must be within IDT table limits, IDT.limit = 0x0
00017923971i[CPU0 ] CPU is in protected mode (active)
00017923971i[CPU0 ] CS.mode = 16 bit
00017923971i[CPU0 ] SS.mode = 16 bit
00017923971i[CPU0 ] EFER   = 0x00000000
00017923971i[CPU0 ] | EAX=60000028  EBX=00001c00  ECX=00092000  EDX=00000018
00017923971i[CPU0 ] | ESP=00007bd8  EBP=00007bfe  ESI=000e7ca9  EDI=0000ffac
00017923971i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df IF tf sf zf af PF cf
00017923971i[CPU0 ] | SEG sltr(index|ti|rpl)     base    limit G D
00017923971i[CPU0 ] |  CS:0900( 0004| 0|  0) 00009000 0000ffff 0 0
00017923971i[CPU0 ] |  DS:0000( 0005| 0|  0) 00000000 0000ffff 0 0
00017923971i[CPU0 ] |  SS:0000( 0005| 0|  0) 00000000 0000ffff 0 0
00017923971i[CPU0 ] |  ES:0900( 0005| 0|  0) 00009000 0000ffff 0 0
00017923971i[CPU0 ] |  FS:0000( 0005| 0|  0) 00000000 0000ffff 0 0
00017923971i[CPU0 ] |  GS:0000( 0005| 0|  0) 00000000 0000ffff 0 0
00017923971i[CPU0 ] | EIP=0000025d (0000025d)
00017923971i[CPU0 ] | CR0=0x60000011 CR2=0x00000000
00017923971i[CPU0 ] | CR3=0x00000000 CR4=0x00000000
00017923971e[CPU0 ] exception(): 3rd (13) exception with no resolution, shutdown status is 00h, resetting
00017923971i[SYS  ] bx_pc_system_c::Reset(HARDWARE) called
00017923971i[CPU0 ] cpu hardware reset

Re: Adding new code breaks loading the kernel

Posted: Tue Jan 17, 2023 11:58 am
by mtbro
I only glanced at your code, but I see gdt_size_in_bytes has to be (6*8)-1. A similar correction is needed for the IDT.
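The limit arithmetic can be sketched in C. The helper names below are hypothetical, and the sketch assumes 8-byte descriptors (as in 32-bit protected mode) and at least one entry. The limit is the offset of the last valid byte, hence the minus one. The selector 0x28 used in the usage note is taken from AX in the Bochs register dump above; that it is the TSS selector is an assumption.

```c
#include <stdint.h>

#define DESCRIPTOR_SIZE 8u  /* bytes per GDT/IDT entry in 32-bit mode */

/* Limit field for a table with `entries` descriptors (entries >= 1):
 * the offset of the last valid byte, not the size in bytes. */
uint16_t table_limit(uint16_t entries)
{
    return (uint16_t)(entries * DESCRIPTOR_SIZE - 1u);
}

/* Descriptor index encoded in a selector (bits 15..3). */
uint16_t selector_index(uint16_t selector)
{
    return (uint16_t)(selector >> 3);
}

/* A selector is inside the table if the last byte of its descriptor
 * lies at or below the limit. */
int selector_within_limit(uint16_t selector, uint16_t limit)
{
    return (uint32_t)(selector & ~7u) + 7u <= limit;
}
```

For a 6-entry GDT, `table_limit(6)` is 47, and selector 0x28 (index 5) is the last one that fits. For a full 256-entry IDT the same formula gives 2047.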
Some exceptions push an error code and require different cleanup on exit; you are probably not iret-ing properly from some of them.
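As a rough illustration of the error-code point, the classic set of x86 exception vectors that push an error code can be written as a small lookup. The function name is hypothetical, and the list covers only the long-standing vectors; newer CPUs define a few additional ones, so treat it as a sketch rather than a complete table.

```c
#include <stdint.h>

/* Returns 1 if the CPU pushes an error code for this exception vector.
 * A handler for these must discard the error code (e.g. add esp, 4)
 * before executing iret. */
int exception_pushes_error_code(uint8_t vector)
{
    switch (vector) {
    case 8:   /* #DF double fault */
    case 10:  /* #TS invalid TSS */
    case 11:  /* #NP segment not present */
    case 12:  /* #SS stack-segment fault */
    case 13:  /* #GP general protection fault */
    case 14:  /* #PF page fault */
    case 17:  /* #AC alignment check */
        return 1;
    default:
        return 0;
    }
}
```

A common design is for stubs of the other vectors to push a dummy value, so that every handler sees the same stack layout and can share one exit path.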

Re: Adding new code breaks loading the kernel

Posted: Tue Jan 17, 2023 1:06 pm
by Octocontrabass
You load your kernel to 0x9000, and your kernel is linked to run at 0x9000, but you jump to 0x0900:0x0000 to run it. You've already added several hacks to try to work around this discrepancy (using "- start" to fix addresses) when you could have instead jumped to 0x0000:0x9000 and avoided the problem entirely.
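The discrepancy comes down to real-mode address arithmetic, which a short C sketch can make concrete. The function name is hypothetical. Both segment:offset pairs name the same physical byte, but the offset the code sees differs, and a kernel linked at 0x9000 expects offsets that start at 0x9000.

```c
#include <stdint.h>

/* Real-mode physical address: segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

`real_mode_linear(0x0900, 0x0000)` and `real_mode_linear(0x0000, 0x9000)` both give 0x9000. After a jump to 0x0900:0x0000, though, IP starts at 0 while every absolute address in the linked image assumes 0x9000, which is what the "- start" adjustments were compensating for.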

In your bootloader, you store DL (the boot drive) into memory using a memory reference relative to DS before you've set DS.

In load_gdt you disable interrupts, then in init_video_mode you call INT 0x10 which may return with interrupts enabled.

In enable_protected_mode you set CR0.PE without immediately loading CS with a new valid code selector. You must not place any instructions between the MOV that sets CR0.PE and the instruction that sets CS. It's also important to load data segment registers with protected mode data segments before using them to access memory in protected mode.

In remap_pic you enable all IRQs, including IRQs you are not yet prepared to handle.

You never set the upper bits of ESP.

Your kernel_main() returns, but start_kernel is not prepared to handle a return.

There may be other issues, I didn't look at everything.

Re: Adding new code breaks loading the kernel

Posted: Tue Jan 17, 2023 1:40 pm
by MichaelPetch
Octocontrabass wrote: There may be other issues, I didn't look at everything.

Entering protected mode, then doing a `ret` and calling other functions in quasi 16-bit protected mode, stood out to me.

Another issue I noticed is that the TSS itself is only a DWORD in size.

Re: Adding new code breaks loading the kernel

Posted: Tue Jan 17, 2023 3:24 pm
by MichaelPetch
I had a few more minutes to look at things. I think I understand why adding a new file could make things fail where they previously worked (by luck), even though the kernel doesn't appear to exceed 7.5KiB yet (that limit is another issue that will bite you later).

The other issue is this. In your `makefile` you have:

Code:

$(LNK) $(LDFLAGS) $(BUILDDIR)/*.$(OBJEXT) -o $(BUILDDIR)/kernel.elf

In general there isn't anything wrong with this, but you are at the mercy of the order of object files returned by the file system. Since you are ultimately generating a BINARY file, the entry point will be the first code in memory loaded at 0x0900:0x0000. The problem is that there isn't a guarantee that `kernel_init.o` is listed first with your linker command line. That is the code you want executed before anything else. If the first object file happens to be `filler.o`, its code will run first and probably `ret` into no man's land.

To fix this you can alter your `linker.ld` file to ensure the `.text` section of obj/kernel_init.o is always first. Something like this should fix that issue:

Code:

.text 0x09000 :
  {
    code = .; _code = .; __code = .;
    obj/kernel_init.o(.text)
    *(.text)
  }

This way, no matter what order the objects are listed in when linking, the .text (code) section of obj/kernel_init.o will be first.

Octocontrabass previously commented on a number of things that should be fixed. Beyond those, once you start getting into ring 3 with interrupts/exceptions you are going to need a proper TSS structure. At the moment you have it defined as a single DWORD, which is a substantial problem for user mode (ring 3). I wrote a Stack Overflow question/answer about the TSS here https://stackoverflow.com/questions/548 ... an-io-bitm that contains information, a structure, and some external links. The TSS has a bit of a sordid history.
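For orientation, a 32-bit TSS laid out as a C structure might look like the sketch below. The type and field names are hypothetical and this is not the structure from the linked answer; the layout follows the 104-byte 32-bit TSS format, with each 16-bit selector stored in a 32-bit slot whose upper half is reserved. On an interrupt from ring 3 to ring 0 the CPU loads the kernel stack from `ss0:esp0`, which is why a single DWORD cannot work.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit Task State Segment (hypothetical field names). */
typedef struct tss32 {
    uint32_t link;                      /* previous task link */
    uint32_t esp0, ss0;                 /* ring 0 stack, used on ring 3 -> 0 */
    uint32_t esp1, ss1;                 /* ring 1 stack */
    uint32_t esp2, ss2;                 /* ring 2 stack */
    uint32_t cr3;
    uint32_t eip, eflags;
    uint32_t eax, ecx, edx, ebx;
    uint32_t esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;    /* selectors in the low 16 bits */
    uint32_t ldtr;
    uint16_t trap;                      /* bit 0: debug trap on task switch */
    uint16_t iomap_base;                /* offset of the I/O permission bitmap */
} tss32_t;

/* Catch accidental padding or a missing field at compile time. */
_Static_assert(sizeof(tss32_t) == 104, "32-bit TSS must be 104 bytes");

/* Access byte for a present, DPL 0, available 32-bit TSS descriptor. */
#define TSS_ACCESS_AVAILABLE_32 0x89u

/* Limit for the TSS descriptor in the GDT: size minus one. */
uint32_t tss_descriptor_limit(void)
{
    return (uint32_t)sizeof(tss32_t) - 1u;
}
```

With software task switching only `ss0`, `esp0`, and `iomap_base` normally need meaningful values. Setting `iomap_base` to `sizeof(tss32_t)` is a common way to indicate that there is no I/O bitmap.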

Re: Adding new code breaks loading the kernel

Posted: Mon Jan 23, 2023 12:32 pm
by Elia
Thank you, everyone. I appreciate your insights and comments; they were a tremendous help.
I fixed most of the mentioned points, and still have to dive deeper into the TSS / IDT to make sure they behave correctly.

MichaelPetch wrote: The problem is that there isn't a guarantee that `kernel_init.o` is listed first with your linker command line

This was it. Until now, kernel_init.o always happened to come first in alphabetical order.

Cheers,
Elia

P.S. It's always a pleasure to read your SO answers, MichaelPetch.