Calling the Kernel on Real Hardware

human00731582
Member
Posts: 38
Joined: Wed Jul 19, 2017 9:46 pm

Calling the Kernel on Real Hardware

Post by human00731582 »

Hello, friends! For about a week now I've gone back and forth over whether posting this is worth possibly getting flamed, but I'm really torn up about not being able to get my bootloader to call the kernel on real hardware.

Basically, I have a custom bootloader for legacy systems that uses the BIOS extended read function (INT 13h, AH=42h). It loads the kernel at address 0x1000, and while it is entirely functional in emulators, I'm having real pains figuring out why it does not work on bare metal. Perhaps one of the wiser members here can guide me, and ultimately anyone else who has this problem, toward the correct solution.

For reference: the machine I am trying to boot is an HP t5000 slim model with a Phoenix AwardBIOS and a VIA Esther C7 (400MHz) processor. It has 64MB of onboard flash memory and 128MB of RAM. When I got it as a gift, I just knew it'd be the perfect legacy machine for testing on real hardware. I have considered that perhaps this PC is the problem and not the code; I don't know. I am booting from a flash drive. All of my code works up until the kernel call, and nothing in it comes anywhere near the top of the small onboard RAM.

I know this is asking a lot, but here is the strange part: the boot code triple-faults once before it even gets to the kernel call, and after the resulting restart it does reach the kernel call. Then, once it finally reaches the kernel call, it never makes the jump. What gives??


[BITS 16]
_bootLoadKernel:
	
	push si
	mov ah, 0x42
	mov dl, [bDrive]
	mov si, DiskAddrPkt

	int 0x13		; Read the disk into memory
	jc .errorRead
	pop si

	ret
 .errorRead:
	mov si, szDiskReadError
	call _Bootloopstr
	jmp $

......

ALIGN 16
DiskAddrPkt:
	db 0x10			; Packet size (16 bytes)
	db 0				; Reserved (0)
	dw 0x0010			; Blocks to transfer (16x512 = 8192 = sizeof kernel)
	dw KERNEL_OFFSET	; OFFSET (KERNEL_OFFSET = 0x1000)
	dw 0x0000			; SEGMENT
	dd 0x00000001		; Starting LBA, low dword (sector 1, right after the boot sector)
	dd 0x00000000		; Starting LBA, high dword
Forgive me if it's a misunderstanding about the DiskAddrPkt (though I doubt it); it's been quite hard to find accurate documentation about the function (or I'm just looking in the wrong places :) ). Next is the other section of my boot code, with quite a bit omitted for readability. This is where the kernel is called. I've checked again and again: the x86 memory map has ~30 KiB free for use between 0x0700 and 0x7BFF, so there is no way my 8 KiB kernel loading at 0x1000 is trashing anything. I've also tried disabling the paging setup altogether, to no avail.
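To double-check the packet, here is a C mirror of the 16-byte Disk Address Packet that INT 13h AH=42h reads from DS:SI, with compile-time checks on the field offsets. The struct and helper names are illustrative only, not from any BIOS header, and the sketch assumes a little-endian x86 target, where this layout matches the db/dw/dd sequence in DiskAddrPkt byte for byte.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Disk Address Packet for INT 13h, AH=42h (extended read). With these
 * field types, natural alignment already produces the 16-byte layout. */
struct disk_address_packet {
    uint8_t  size;        /* 0x00: packet size, 0x10 */
    uint8_t  reserved;    /* 0x01: must be 0 */
    uint16_t count;       /* 0x02: number of sectors to transfer */
    uint16_t buf_offset;  /* 0x04: destination offset */
    uint16_t buf_segment; /* 0x06: destination segment */
    uint64_t lba;         /* 0x08: starting LBA (0-based) */
};

static_assert(sizeof(struct disk_address_packet) == 16, "DAP must be 16 bytes");
static_assert(offsetof(struct disk_address_packet, buf_offset) == 4, "offset field at 4");
static_assert(offsetof(struct disk_address_packet, buf_segment) == 6, "segment field at 6");
static_assert(offsetof(struct disk_address_packet, lba) == 8, "LBA field at 8");

/* Real-mode linear address the BIOS writes to: segment * 16 + offset. */
static uint32_t dap_buffer_linear(const struct disk_address_packet *p)
{
    return ((uint32_t)p->buf_segment << 4) + p->buf_offset;
}

/* Total bytes transferred, assuming 512-byte sectors. */
static uint32_t dap_transfer_bytes(const struct disk_address_packet *p)
{
    return (uint32_t)p->count * 512u;
}
```

Filled with the values from DiskAddrPkt (16 sectors from LBA 1 into 0x0000:0x1000), dap_buffer_linear returns 0x1000 and dap_transfer_bytes returns 8192.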


_globalBootStart:
mov [bDrive], dl			; Capture the drive number before anything else.
; Clear the segments.
xor ax, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
mov ax, 0x7C00
mov sp, ax

call _bootLoadKernel   ; This is the function from the code above. It does work on all machines, no read errors.
jmp near A20_setup

A20_setup:
; Key-press initialization is required on the bare metal machine I am using.
; -- Skipping this for the sake of brevity.
.........
	
_globalBootProtMode:
cli

xor ax, ax			; AH = 00h: set video mode
mov al, 03h			; AL = 03h: 80x25 color text
int 0x10

lgdt [gdt_descriptor]	

mov eax, cr0               ; Protected Mode.
or eax, 0x1
mov cr0, eax

jmp CODE_SELECTOR:_bootInitializeSegments

[BITS 32]
_bootInitializeSegments:

	mov eax, DATA_SELECTOR	; flushing segments
	mov ds, ax				; this completes gdt_init process
	mov es, ax
	mov fs, ax
	mov gs, ax
	mov ss, ax	
	
	mov ebp, 0x90000
	mov esp, ebp
	
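	; --- Paging ---
	pushad
	mov edi, 70000h		; Paging structures start at 0x70000
	push edi
	mov ecx, 800h		; 0x800 dwords = 8 KiB (0x70000-0x71FFF)
	xor eax, eax
	rep stosd		; Zero that region
	pop edi

	lea eax, [edi + 0x1000]
	or eax, 011b
	mov [edi], eax		; PDPT[0] -> page directory at EDI+0x1000, flags 011b

	lea eax, [edi + 0x2000]
	or eax, 011b
	mov [edi + 0x1000], eax	; PD[0] -> page table at EDI+0x2000, flags 011b

	push edi
	mov dword edx, [pageBasePtr]
	add edx, 011b		; Flags: PAGE_RW|PAGE_PR
	lea edi, [edi + 0x2000]	; EDI -> first page-table entry
	mov eax, edx

 ._bootLoopBuildPageTable:
	mov [edi], eax		; Store the low dword of this 8-byte entry
	add eax, 0x1000		; Next 4 KiB page
	add edi, 8		; Next entry
	cmp eax, 0x200000	; Loop while EAX < 0x200000
	jb ._bootLoopBuildPageTable

	; Set flags in control registers.
	pop edi
	mov edx, edi
	mov cr3, edx		; CR3 -> PDPT at 0x70000

	mov eax, cr4		; Get CR4 register values.
	or eax, 10100000b	; Enable bits 5&7, PAE&PGE
	mov cr4, eax

	mov eax, cr0
	or eax, 0x80000001	; Set PG (bit 31) and PE (bit 0)
	mov cr0, eax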

	popad
	jmp near _bootInitKernel
	
	
_bootInitKernel:
	call KERNEL_OFFSET             ; <----- This does NOT work on bare metal. CPU triple-faults. JMP does not work either.
	jmp $	                                ; hang if the kernel returns for any reason (should not happen)
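For comparison with the paging section above: with CR4.PAE set, the PDPT, page directory and page table all hold 8-byte entries. The C sketch below builds the same three-level layout (tables 0x1000 apart, first 2 MiB mapped with 4 KiB pages) in an ordinary buffer standing in for physical memory, clearing all three tables and writing every entry as a full 64-bit value. It follows the Intel SDM format for PAE paging, in which a PDPTE may set only P, PWT and PCD among its low flag bits (bits 1-2 and 5-8 are reserved), so PDPT[0] gets only P rather than 011b. The struct, the function and its parameters are hypothetical, and the sketch runs on a host, not in the bootloader.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAE_P       UINT64_C(0x1) /* present */
#define PAE_RW      UINT64_C(0x2) /* writable; reserved (must be 0) in a PAE PDPTE */
#define PAE_ENTRIES 512           /* 512 x 8-byte entries fill one 4 KiB table */

/* Hypothetical host-side stand-in for the 12 KiB of tables at tables_phys.
 * A PAE PDPT only needs 4 entries, but a full page keeps the 0x1000
 * spacing used in the listing above. */
struct pae_tables {
    uint64_t pdpt[PAE_ENTRIES];
    uint64_t pd[PAE_ENTRIES];
    uint64_t pt[PAE_ENTRIES];
};

/* Map virtual 0x000000-0x1FFFFF to physical phys_base onward in 4 KiB pages
 * (an identity map when phys_base is 0). tables_phys is the physical address
 * the struct would occupy; both must be 4 KiB aligned. */
static void build_pae_tables(struct pae_tables *t, uint32_t tables_phys, uint32_t phys_base)
{
    assert((tables_phys & 0xFFFu) == 0 && (phys_base & 0xFFFu) == 0);
    memset(t, 0, sizeof *t); /* clear all three tables, high dwords included */

    t->pdpt[0] = (uint64_t)(tables_phys + 0x1000u) | PAE_P;          /* -> PD, P only */
    t->pd[0]   = (uint64_t)(tables_phys + 0x2000u) | PAE_P | PAE_RW; /* -> PT */
    for (uint32_t i = 0; i < PAE_ENTRIES; i++)                       /* full 64-bit stores */
        t->pt[i] = (uint64_t)(phys_base + i * 0x1000u) | PAE_P | PAE_RW;
}
```

Called with tables_phys = 0x70000 and phys_base = 0, it yields PDPT[0] = 0x71001, PD[0] = 0x72003 and PT[511] = 0x1FF003.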
Thank you all so much for any input on this situation. I'd like to add, as I've said before, that I'm very level-headed about this process: I understand the dedication it takes and I'm very enthusiastic about being a part of the OSDev community. I'm sure the solution to this will be very helpful to future intermediate OSDevers like myself. 8)
2024-05-07: Returning from a 7-year disappearing act; please be kind.
BenLunt
Member
Posts: 941
Joined: Sat Nov 22, 2014 6:33 pm
Location: USA

Re: Calling the Kernel on Real Hardware

Post by BenLunt »

human00731582 wrote:


[BITS 16]

_globalBootStart:
mov [bDrive], dl			; Capture the drive number before anything else.
; Clear the segments.
xor ax, ax
mov es, ax
mov fs, ax
mov gs, ax

 ..............

_bootLoadKernel:
	
	push si
	mov ah, 0x42
	mov dl, [bDrive]
	mov si, DiskAddrPkt
Without looking further, and without knowing where _bootLoadKernel runs relative to your DS setup, my guess is that you are accessing memory through DS while DS still holds an unknown value.

On some BIOSes DS may be 0x0040; on others, who knows. The emulator(s) may have DS = 0x0000.[1]

If you use the following code before you set the DS register,


	mov dl, [bDrive]
you might be overwriting something the BIOS needs to preserve. For example, if bDrive is at offset 0x0040 relative to the start of your code and DS = 0x0040, you are actually writing DL to address 0x0440, the floppy tick status location.

Don't use a memory access using DS until you have set DS to something you know is not volatile memory.
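Ben's point comes down to real-mode address arithmetic: a memory operand's linear address is DS * 16 + offset, so the same offset lands somewhere different for each value the BIOS may have left in DS. A minimal C sketch of that translation (the function name is illustrative only; it ignores the A20 gate):

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: linear = segment * 16 + offset.
 * The result can reach 0x10FFEF; whether bit 20 survives depends on
 * the A20 gate, which this sketch does not model. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With DS = 0x0040, an offset of 0x0040 gives 0x0440, inside the BIOS Data Area (0x400-0x4FF); with DS = 0 the same offset gives 0x0040, inside the real-mode interrupt vector table.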

Just a guess.
Ben

[1] I think there was a post on alt.os.development at one time that listed the register values at boot time for some emulators, i.e. the values after POST, just as the BIOS relinquishes control to 0x07C00. However, you cannot rely on any of the register values except that the CS:IP register pair will point to 0x07C00, the DL register, on some BIOSes the DH register, and on PnP BIOSes, the ES:DI register pair. That's it.
human00731582
Member
Posts: 38
Joined: Wed Jul 19, 2017 9:46 pm

Re: Calling the Kernel on Real Hardware

Post by human00731582 »

BenLunt wrote: Without looking further, and without knowing where _bootLoadKernel fits into the scheme of the DS segment adjustment, my guess is that you are using an unknown area for the DS segment access.
Ben,

Thank you for your recommendation. I can't believe I missed the DS initialization... #-o
However, the issue still remains, and both problems are exactly the same as before. That is an excellent point about saving DL only after initializing the segments, and I have adjusted my code accordingly!

-human
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Calling the Kernel on Real Hardware

Post by LtG »

Do you set ORG somewhere? Does it match all your memory references (segment + offsets)?

Other than that, you may want to start by testing that your error printing actually works on this specific hardware: immediately after setting the segments to known values, print a message and hang. Then work forward from there, moving the "hang" later and checking the relevant values until you find out where it goes bad.

Once you find that, others here might be in a better position to help you, or you might already know the cause at that point and be able to fix it yourself.

Edit: Forgot to mention, you don't seem to set CS? Without setting it, you won't know what it is...
human00731582
Member
Posts: 38
Joined: Wed Jul 19, 2017 9:46 pm

Re: Calling the Kernel on Real Hardware

Post by human00731582 »

LtG wrote:Do you set ORG somewhere? Does it match all your memory references (segment + offsets)?
...
edit. Forgot to mention, you don't seem to set CS? Without setting it you won't know what it is...
It was part of the section I didn't show, but yes, my ORG is set to 0x7C00, as is standard. I believe CS:IP is set to 0x0000:0x7C00 (or 0x07C0:0x0000, whichever you prefer) when the BIOS hands over control, right? So I thought CS didn't need to be initialized, since the BIOS already takes care of that. I recently tried setting it before LGDT, but it's a guaranteed triple fault no matter which value I put in it.
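The two CS:IP forms do name the same linear address, 0x7C00, but they are not interchangeable once ORG 0x7C00 is in play: the assembler computes every label's offset as 0x7C00 plus its position in the sector, and any absolute (not IP-relative) near jump or call combines that offset with whatever CS holds. A C sketch of the arithmetic, assuming the sector is loaded at linear 0x7C00; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ORG         0x7C00u /* the bootloader's ORG directive */
#define LOAD_LINEAR 0x7C00u /* linear address where the BIOS loads the boot sector */

/* Offset the assembler emits for a label `pos` bytes into the sector. */
static uint16_t assembled_offset(uint16_t pos)
{
    return (uint16_t)(ORG + pos);
}

/* Linear address where that label actually sits in memory. */
static uint32_t label_linear(uint16_t pos)
{
    return LOAD_LINEAR + pos;
}

/* Linear address an absolute near jump to that label reaches under `cs`. */
static uint32_t absolute_jump_linear(uint16_t cs, uint16_t pos)
{
    return ((uint32_t)cs << 4) + assembled_offset(pos);
}

/* True if an absolute jump under this CS actually lands on the label. */
static bool absolute_jump_hits_label(uint16_t cs, uint16_t pos)
{
    return absolute_jump_linear(cs, pos) == label_linear(pos);
}
```

For a label 0x50 bytes into the sector, CS = 0x0000 lands on it, while CS = 0x07C0 sends an absolute jump to 0xF850. Relative jumps and calls are unaffected, because they are encoded relative to IP.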
LtG wrote:Other than that, you may want to start by testing that your error printing actually works on this specific piece of hardware, so immediately after setting segments to known values print an error message and hang. Then work from there moving the "hang" part forward and check relevant values until you find out where it goes bad.

After you find that others here might be in a better position to help you, or might be that at that point you already know the cause and can fix it yourself.
I've already done this plenty of times, back and forth all weekend long. That's how I know the problem is specifically the call to KERNEL_OFFSET: I found it with exactly this sort of rigged debugging.
human00731582
Member
Posts: 38
Joined: Wed Jul 19, 2017 9:46 pm

Re: Calling the Kernel on Real Hardware

Post by human00731582 »

Quick update...

I've moved my kernel away from the lowest memory areas, just in case the BIOS was doing something strange there. It is now loaded at physical address 0x10000, as per "convention", yet the hardware still will not run it.

If it's worth mentioning, paging doesn't work on the PC, but it does in the emulator. On the PC only, the CR4 write causes a triple fault unless I comment it out of the code; it works 100% in the emulator. Perhaps this is related?

Also, on this PC, A20 only gets enabled if I press a key on the keyboard. After the key press I've done the "cmp 0x012345, 0x112345" test to check bit 20, and it says they are not equal, so I'd say A20 works. This issue is really weighing me down; I'd love to see even a hint of my kernel running on real hardware.
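The 0x012345 / 0x112345 check works because a disabled A20 gate holds address line 20 low, so bit 20 of every physical address is forced to 0 and the two addresses alias. A C model of just that address behaviour (the function names are illustrative). On real hardware the test has to compare memory contents rather than address numbers: write a value through one address and read it back through the other, ideally twice with different values, since both locations might hold the same byte by coincidence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the A20 gate: when disabled, address line 20 is held low,
 * so bit 20 of every physical address is cleared (8086-style wrap). */
static uint32_t a20_physical(uint32_t addr, bool a20_enabled)
{
    return a20_enabled ? addr : (addr & ~(UINT32_C(1) << 20));
}

/* The classic test pair: the two addresses alias exactly when A20 is off. */
static bool a20_test_aliases(bool a20_enabled)
{
    return a20_physical(0x112345u, a20_enabled) == a20_physical(0x012345u, a20_enabled);
}
```

With A20 off, 0x112345 maps to 0x012345 and 0x100000 (FFFF:0010) wraps to 0; with A20 on, both stay distinct.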
Octocontrabass
Member
Posts: 5586
Joined: Mon Mar 25, 2013 7:01 pm

Re: Calling the Kernel on Real Hardware

Post by Octocontrabass »

Do you have an online code repository so we can see all of your code? The problem with your bootloader might be in one of the parts you haven't shown us.
human00731582 wrote:Also, A20 on this PC requires that I press a key on the keyboard to activate it.
It sounds like your code to enable A20 doesn't actually work correctly.
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Calling the Kernel on Real Hardware

Post by LtG »

I'm a bit tired, so maybe I missed it, but where is KERNEL_OFFSET defined? Where does its value come from? I saw that it is supposed to be 0x1000, but where do you set it?

IIUC, KERNEL_OFFSET points to the first byte of the kernel that your bootloader loads. Have you verified that it is valid, for instance by making your kernel's first instruction HLT or an infinite jump to itself? You could also try printing the bytes there in hex, just to ensure that what you loaded is actually valid and to "prove" that it's the CALL itself that fails and not what happens after the call.

I assume that if you add an infinite jump right before the CALL, you don't get a triple fault? If you also do the above (both ways) and prove that the CALL never actually reaches the kernel, then you can focus on the possible failure modes of the CALL itself, like stack problems.


Edit: Just to mention the obvious, emulators often zero memory for practical reasons, whereas on real hardware memory often holds more or less random values. I'm not sure whether any of the emulators/simulators have an option for "fuzzed" memory at boot, which you could use to try to replicate the issue and make it easier to see what exactly is going wrong. So that's an alternative avenue to explore: recreate it on emulators/simulators. If none of them support it, it might be relatively easy (or hard, I don't know) to add such functionality to the open-source ones. I just wanted to mention this avenue in case nothing else works for you.
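The "fuzzed memory" idea can also be tried on the host without emulator support: poison a buffer with a non-zero pattern, run a port of the setup code over it, and see whether the result changes. The C sketch below ports the ._bootLoopBuildPageTable loop from the first post, which stores only the low dword of each 8-byte entry; note that in that listing the rep stosd clears 0x800 dwords (0x70000-0x71FFF) while the page table starts at 0x72000. The helper names are hypothetical, and pageBasePtr is assumed to be 0, so the first entry is 0 | 011b = 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_DWORDS 1024u /* 512 PAE entries x 2 dwords = one 4 KiB page table */

/* Fill memory with a non-zero pattern to imitate what real RAM may hold
 * at boot, instead of the zeroed RAM an emulator typically provides. */
static void poison(void *buf, size_t len, uint8_t pattern)
{
    memset(buf, pattern, len);
}

/* Port of ._bootLoopBuildPageTable: store EAX into the low dword of each
 * 8-byte entry, step 8 bytes, never touch the high dword. */
static void build_pt_low_only(uint32_t pt[PT_DWORDS], uint32_t first_entry)
{
    size_t i = 0;
    for (uint32_t eax = first_entry; eax < 0x200000u; eax += 0x1000u, i += 2) {
        assert(i + 1 < PT_DWORDS);
        pt[i] = eax; /* mov [edi], eax */
    }
}
```

With a zeroed buffer every entry's high dword stays 0; with a 0xCC pattern the high dwords keep the garbage, which PAE would read as upper physical-address and reserved bits. This only demonstrates the technique; it does not by itself establish the cause of the fault on the t5000.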
human00731582
Member
Posts: 38
Joined: Wed Jul 19, 2017 9:46 pm

Re: Calling the Kernel on Real Hardware

Post by human00731582 »

Octocontrabass wrote:Do you have an online code repository so we can see all of your code? The problem with your bootloader might be in one of the parts you haven't shown us.
Here is the entire source code for my loader: https://github.com/human00731582/Orchid ... c/BOOT.asm

KERNEL_OFFSET represents the actual physical address of the loading (0x10000).
KERNEL_SEGMENT_OFFSET is for the DiskAddrPacket and is just the same address as KERNEL_OFFSET in terms of segments (0x1000).
I changed this just recently, and though it's a better placement, the jmp or call still doesn't go through.

Please go easy on me with paging, I'm still learning. :mrgreen:

I did not write the A20 section. My original one worked in the emulator, but this one was given to me as a quick fix for all the pesky hardware issues around A20, and so far it has worked (though I have to press a damn key).
Octocontrabass wrote:
human00731582 wrote:Also, A20 on this PC requires that I press a key on the keyboard to activate it.
It sounds like your code to enable A20 doesn't actually work correctly.
I ran the test described in the comments above the A20 function in my code (comparing 0x012345 with 0x112345 to check bit 20). It passes on my machine, so I think the keyboard method is enabling the gate.
LtG wrote:IIUC, the KERNEL_OFFSET points to the first byte of your kernel that is loaded by your bootloader. Have you verified that it is valid? For instance making your kernels first instruction HLT or infinite jump to itself? Also you could try to print the value in hex, just to ensure that what you are loading is actually valid and to "prove" that it's the CALL itself and not what happens after the call.
Yes, I am absolutely 100% positive that the jmp instruction to the physical address where the kernel is loaded is what faults. Please take a look at my source code if you'd like and let me know what you think.
LtG wrote:I assume that if you add an infinite jump right before the CALL that you don't get a triple fault?
Correct.
LtG wrote:edit. Just to mention the obvious, emulators often have memory zeroed for practical reasons where as real hardware often has memory set to more or less random values. I'm not sure if any of the emulators/simulators have an option for "fuzzed" memory at boot, which you could try to replicate the issue, making it easier to see what exactly is going wrong... I just wanted to mention this avenue in case nothing else really works for you.
Thank you for the suggestion! I use QEMU on Windows (gross, I know lol), and I'll take a look at this option! :D
Octocontrabass
Member
Posts: 5586
Joined: Mon Mar 25, 2013 7:01 pm

Re: Calling the Kernel on Real Hardware

Post by Octocontrabass »

human00731582 wrote:Here is the entire source code for my loader:
How are you writing your bootloader and kernel to the disk? If you're not writing them where you think you are, your bootloader may still run but it won't load your kernel.

How are you booting your disk? If it's from USB, some BIOSes will assume your code has space reserved for a BPB and corrupt your bootloader.
human00731582 wrote:I did not write the A20 section, my original one worked with the emulator, but this one was given to me as a quick-fix for all pesky hardware issues regarding the A20, and so far it has worked (but with having to press a damn key).
The A20 code you're using relies on interrupts being disabled so it can poll the keyboard controller without an interrupt handler getting in the way. I suspect strange interactions with the interrupt handlers are causing your problem here.
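Polling the 8042 means repeatedly reading its status port (0x64) and waiting for bit 1, "input buffer full", to clear before each write to port 0x60 or 0x64. A host-runnable C sketch of that wait loop, with a timeout instead of an endless spin; the port read is abstracted behind a callback so it can run without hardware, and read_status_fn, fake_kbc and fake_read are hypothetical names, not part of any real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KBC_STATUS_OBF 0x01u /* bit 0: output buffer full, a byte waits at port 0x60 */
#define KBC_STATUS_IBF 0x02u /* bit 1: input buffer full, controller not ready for input */

/* Stands in for reading port 0x64 (`in al, 0x64` on real hardware). */
typedef uint8_t (*read_status_fn)(void *ctx);

/* Poll until the controller can accept a command/data byte (IBF clear).
 * Returns false after max_polls reads instead of spinning forever. */
static bool kbc_wait_input_clear(read_status_fn read_status, void *ctx, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if ((read_status(ctx) & KBC_STATUS_IBF) == 0)
            return true;
    }
    return false;
}

/* Fake controller for the usage example: replays a fixed sequence of
 * status bytes, repeating the last one once the sequence is exhausted. */
struct fake_kbc {
    const uint8_t *seq;
    unsigned len;
    unsigned pos;
};

static uint8_t fake_read(void *ctx)
{
    struct fake_kbc *k = ctx;
    uint8_t s = k->seq[k->pos < k->len ? k->pos : k->len - 1];
    if (k->pos < k->len)
        k->pos++;
    return s;
}
```

With interrupts enabled, the BIOS keyboard handler may run between these polls and consume or misinterpret bytes at port 0x60, which is why this kind of polling code is normally run with interrupts disabled.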
human00731582
Member
Posts: 38
Joined: Wed Jul 19, 2017 9:46 pm

Re: Calling the Kernel on Real Hardware

Post by human00731582 »

Octocontrabass wrote:How are you writing your bootloader and kernel to the disk? If you're not writing them where you think you are, your bootloader may still run but it won't load your kernel.
I am writing them using DD for Windows. I DD the bootloader into the first sector and then write the kernel binary directly starting at sector 1, so the first 17 sectors on the disk are the bootloader (sector 0) plus the kernel (sectors 1-16).
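The on-disk layout (boot sector at LBA 0, the 8 KiB kernel starting at LBA 1) can be checked with simple sector arithmetic. A small C sketch; the byte offset is what a GNU-style dd with bs=512 and seek=N writes to, and whether the Windows port of dd accepts the same options is an assumption. The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Sectors needed for an image of `bytes` bytes, rounded up. */
static uint32_t sectors_for(uint32_t bytes)
{
    return (bytes + SECTOR_SIZE - 1u) / SECTOR_SIZE;
}

/* Last LBA occupied by an image of `bytes` bytes starting at first_lba. */
static uint32_t last_lba(uint32_t first_lba, uint32_t bytes)
{
    assert(bytes > 0);
    return first_lba + sectors_for(bytes) - 1u;
}

/* Byte offset of an LBA on the raw device; with bs=512 this is the
 * position that dd's seek=N refers to. */
static uint64_t lba_byte_offset(uint32_t lba)
{
    return (uint64_t)lba * SECTOR_SIZE;
}
```

For the 8 KiB kernel that is LBAs 1 through 16 (bytes 512-8703 of the stick), matching the DAP's count of 16 and starting LBA of 1.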
Octocontrabass wrote:How are you booting your disk? If it's from USB, some BIOSes will assume your code has space reserved for a BPB and corrupt your bootloader.
Yes, I am booting from a USB stick via the BIOS. Most of the articles I've seen on booting from a thumb drive talk about UEFI, and I haven't found much about USB booting on legacy systems yet. I'll have to do some research on the BPB. Sorry if I sound ignorant or uninformed, but I thought there wasn't much to USB booting, since I assumed most BIOSes treat USB sticks as hard disks.
Octocontrabass wrote:The A20 code you're using relies on interrupts being disabled so it can poll the keyboard controller without an interrupt handler getting in the way. I suspect strange interactions with the interrupt handlers are causing your problem here.
I don't have an IDT loaded at this point in the loader. Do you mean it's interacting strangely with BIOS interrupts? Thank you very much for reading and helping; I truly appreciate it.
Octocontrabass
Member
Posts: 5586
Joined: Mon Mar 25, 2013 7:01 pm

Re: Calling the Kernel on Real Hardware

Post by Octocontrabass »

human00731582 wrote:I am writing them using DD for Windows.
I'm not familiar with that program. Are you sure you're not writing to a partition on the disk?
human00731582 wrote:I actually thought there wasn't as much to USB loading as I thought most BIOSes treat them as hard disks.
Sadly, it's not that simple. We have a wiki page detailing potential problems booting from USB.
human00731582 wrote:Do you mean it's interacting strangely with BIOS interrupts?
Yes. The BIOS configures the keyboard controller to generate interrupt requests and configures the IVT to handle those interrupts. Your A20 code will cause the keyboard controller to generate at least one interrupt request that the BIOS isn't prepared to handle, so you must disable interrupts before calling it.
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Calling the Kernel on Real Hardware

Post by LtG »

A few notes that may be useful:
- Wrt the USB stick, have you tried attaching the USB stick itself (not some image) to your virtual machine and testing whether it works (to prove that the issue is not with the stick)?
- You said you are 100% sure it's the JMP instruction... that's a bit odd; I thought it was the CALL instruction?
- What happens if you get rid of the A20 code? Since your kernel/OS is so small (I assume), using just the first MiB shouldn't be a problem...

Unfortunately, I don't have the time to compile and run it myself, or to disassemble it to see if some pointers are wrong. Quite often in asm, people get pointers wrong by confusing a value with using that value as a memory pointer.
human00731582
Member
Posts: 38
Joined: Wed Jul 19, 2017 9:46 pm

Re: Calling the Kernel on Real Hardware

Post by human00731582 »

LtG wrote:Wrt to the USB stick, have you tried attaching the USB stick (and not some image) to your virtual machine and tested if it works (to prove that the issue is not with the stick)?
Yes, I boot QEMU from the USB quite often. Unfortunately, the issue still remains.
LtG wrote:You said you are 100% sure it's the JMP instruction.. that's a bit odd, I thought it's the CALL instruction?
Semantics. I am trying both a CALL and a JMP interchangeably (though I know the difference in terms of CALL's effect on the stack).
LtG wrote:What happens if you get rid of the A20 code? I mean since your kernel/OS is so small (I assume), using just the first MiB shouldn't be a problem..
WEW. Taking away that entire A20 nightmare shows that the A20 gate is on by default... I honestly can't believe I didn't try this sanity check long ago, but I guess that's what OSDev is all about, right? :lol: Now I can cut that A20 section down by at least 75% for other system tests down the line.
LtG wrote:Quite often with asm people get pointers wrong, whether they are using some value or using that value as a memory pointer..
I'd like to think it's something else I missed like that and not a hardware issue. I did have a thread not too long ago about memory references in NASM, so I am still a bit noobish when it comes to addressing things properly. But that was with offsets in include files, so eh, probably not the issue.
Octocontrabass wrote:I'm not familiar with that program. Are you sure you're not writing to a partition on the disk?
I'm certain that the DD program is writing to the disk at sector 0 with a block size of 512 bytes, so that's what I'm putting in the BPB I'm writing per the wiki's instructions. By the time you read this, the repository should be updated, if you'd be so kind as to have a look. :mrgreen:
Octocontrabass wrote:Your A20 code will cause the keyboard controller to generate at least one interrupt request that the BIOS isn't prepared to handle, so you must disable interrupts before calling it.
Interesting, my interrupts were disabled when I called it. But nevertheless, that nightmare is avoidable for now, since I only plan on testing this on one system for a long time ahead. See my above response to LtG about the A20 gate. #-o

Thanks so much for your help figuring this mess out. Unfortunately, adding the BPB didn't fix it either, but at least I can still use the USB stick to transfer files between computers while keeping a bootloader at sector 0. :) The source is updated to the newest version.

Also! Here is the command I'm using to run the emulator (QEMU):


qemu-system-i386 -cpu 486 -drive format=raw,index=0,file="..\bin\image.img"
or


qemu-system-x86_64 -drive format=raw,index=0,file="..\bin\image.img"
I replace "..\bin\image.img" with something like "F:" when testing the USB stick. Both commands work properly and as expected.
~
Member
Posts: 1228
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire

Re: Calling the Kernel on Real Hardware

Post by ~ »

Maybe the source code of my current OS can help you get a better A20 and Unreal Mode setup. My code has been debugged on a 386DX. Your current code to set up Unreal Mode could sometimes lock up if it isn't carefully debugged against a 386:
http://devel.archefire.org/downfile.php ... -06-16.zip

Project page:
http://devel.archefire.org/forum/viewto ... 2274&hl=en


You have to deliberately write code for all the ways in which A20 can be enabled, plus code that tests whether memory is fully accessible, so you don't try to enable an A20 line that is already on, as it is in most systems built after 2005.

I have code to enable/disable the A20 line at least via the KBC, Fast A20 and, if I remember correctly, BIOS services. If a method fails, my code simply tries the next one until A20 is detected as functional.
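The strategy described above (test first, then try each enabling method in turn, re-testing after each) can be sketched in C as plain control flow. The hardware methods are replaced by hypothetical function pointers and a simulated gate; this is not ~'s actual code, and the method order in the usage example is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One way of switching A20 on (e.g. KBC, Fast A20, BIOS service). */
typedef void (*a20_method_fn)(void);

/* Test first; if A20 is already on, do nothing. Otherwise try each method
 * in order and re-test after each one.
 * Returns -1 if A20 was already on, the index of the method that worked,
 * or -2 if none did. */
static int enable_a20(bool (*a20_is_on)(void), const a20_method_fn *methods, size_t count)
{
    if (a20_is_on())
        return -1;
    for (size_t i = 0; i < count; i++) {
        methods[i]();
        if (a20_is_on())
            return (int)i;
    }
    return -2;
}

/* --- Simulated hardware for the usage example (hypothetical) --- */
static bool sim_gate;  /* simulated A20 state */
static int  sim_calls; /* number of methods invoked */

static bool sim_is_on(void)     { return sim_gate; }
static void sim_no_effect(void) { sim_calls++; }                  /* a method that fails */
static void sim_enables(void)   { sim_calls++; sim_gate = true; } /* a method that works */
```

With the order { sim_no_effect, sim_enables, sim_no_effect } and the gate initially off, enable_a20 returns 1 after invoking two methods; if the gate is already on, it returns -1 without touching any method.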

All the necessary code doesn't fit in the boot sector; cramming it in results in very low-quality code that will be discarded later. You should load another Real Mode image from your boot sector and make it set up the most basic configuration, like A20, Unreal Mode, memory maps and the initial video mode, and then enter 32 or 64-bit mode with a stable configuration.