Trouble with 2 Stage Bootloader - Calling 2nd Stage

Octocontrabass · Post by **Octocontrabass** » Thu Oct 15, 2015 4:54 pm

Ready4Dis wrote:

	xor		ax,		ax
	mov		ss,		ax
	mov		ds,		ax
	mov		es,		ax
	mov		ax,		0x07c00
	mov		sp,		ax

While we're focusing on this section of code, I'd like to point out that it sets up the stack wrong. You must set SP immediately after setting SS, or an interrupt may cause corruption by using the new SS and the old SP for the stack. SP is an ordinary register, just like AX, so there's no reason to set AX first and then MOV to SP. The correct code looks like this:

Code: Select all

	xor		ax,		ax
	mov		ds,		ax
	mov		es,		ax
	mov		ss,		ax
	mov		sp,		0x07c00
	mov		ax,		sp

(You can remove the last line if you don't really need AX to be 0x7C00.)

Ready4Dis · Post by **Ready4Dis** » Thu Oct 15, 2015 5:55 pm

Good catch. I haven't ever had it fail, but there is always the possibility. I seem to remember disabling interrupts, but I guess at some point it got removed. I originally wrote that boot loader some time around 2005 and I'm sure it could probably use a revisit. Let me know if you find anything else obviously wrong. I don't need AX to be 0x7C00; at the time I didn't realize I could write into SP directly. I know better now, but haven't really gone through it since it's worked fine ever since. Thanks for the quick fix, I'll be sure to put it in (or add the cli next to the cld)!

lesniakbj · Post by **lesniakbj** » Thu Oct 15, 2015 8:59 pm

Ok.. I think I'm closer...

I still can't get the transfer of control to happen correctly. It seems I am now loading at the correct area, but I still am having trouble jumping there.

Boot1:

Code: Select all

; NOTE:
;	This file follows the BPB format of
;	the FAT12 specification, listed at
;	http://wiki.osdev.org/FAT#BPB_.28BIOS_Parameter_Block.29
[BITS 16]
[ORG 0x7C00]

; Canonicalize CS:IP to 0x0000:0x7C00
; 	Note: Some BIOSes start us at
;	0x07C0:0x0000. We fix that.
jmp 0x0000:boot1_start
nop

;====================================;
;		BIOS PARAMETER BLOCK		 ;
;====================================;
oemIdentifier		db 'BrenOS  '
bytesPerSector		dw 512
sectorsPerCluster	db 1
reservedSectors		dw 1
numberOfFATs		db 1
rootEntries			dw 224
totalNumberSectors	dw 0
mediaDescriptor		db 0xF8
sectorsPerFAT		dw 0
sectorsPerTrack		dw 0
numberOfHeads		dw 0
hiddenSectors		dd 0
largeMediaFields	dd 0

;====================================;
;  FAT12 EXTENDED BOOT RECORD (EBPB) ;
;====================================;
driveNumber			db 0x00
reservedField		db 0x00
signatureFAT12		db 0x29
volumeID			dd 0
volumeLabel			db 'BrenOS Sys '
fileSystemID		db 'FAT     '


; ... whew! Now that we defined all the
; things "necessary" to define our FAT
; bootable device, time to start our boot1
; code.

boot1_start:
	; Setup the segments
	cli
	xor ax, ax
	mov ds, ax
	mov es, ax
	
	; Time for the stack
	mov ss, ax
	mov sp, 0x7C00
	sti
	
	; Save the disk that we are
	; booted from.
	mov [diskNumber], dl
	
	; DL contains the drive number,
	; which is conveniently where we
	; put the parameter for the 
	; write_hex(dl) function.
	; call write_hex
	call reset_disk
	call read_from_disk
	
	mov si, READ_TO
	call write_string
	mov dx, [readSegment]
	call write_hex
	mov si, OFFSET_CHAR
	call write_string
	mov dx, [readOffset]
	call write_hex
	mov si, NEW_LINE
	call write_string
	
	mov si, CNTRL_MSG
	call write_string
	mov si, NEW_LINE
	call write_string
	
	jmp 0x7C00 + 512
	cli
	hlt
	
; Note: These can't be included due to the
; fact that they use variables defined here.
; Thus, they are simply included as
; functions here.
; %include "funcs/disk_functions.asm"
; %include "funcs/output_functions.asm"

reset_disk:
	mov ah, 0				; Reset disk function
	mov dl, [diskNumber]	; This will only be run if on Floppy
	int 0x13
	jc reset_disk
	
	ret
	
read_from_disk:
	; Read Sector Function
	mov ah, 0x02
	
	; Setup the function defining where
	; we are reading from...
	mov al, 1				; Number of Sectors to Read
	mov dl, [driveNumber]	; Use the 1st (C:) Drive. HDD.
	mov ch, 1				; Use the 1st Cylinder/Track
	mov dh, 0				; Use the 1st Read/Write Head
	mov cl, 2				; Read the 2nd Sector
	
	; Where to buffer the disk read to...
	; ES:BX -> 0x0000:0x7E00
	mov bx, 0
	mov es, bx
	mov bx, 0x7E00
	
	; ERROR CHECKING [soon]...	
	mov [readSegment], es
	mov [readOffset], bx
	
	int 0x13
	
	jc .disk_read_error
	ret 

.disk_read_error:
	mov si, READ_ERROR
	call write_string
	jmp $
	

write_string:
	push ax
	push si

	
.string_loop:
	lodsb
	cmp al, 0
	je .string_end
		
	mov ah, 0x0E
	int 0x10
	jmp .string_loop
	
.string_end:
	pop si
	pop ax
	ret

	
write_hex:
	push bx
	push si
	
	mov bx, dx
	shr bx, 12
	and bx, 0x0F
	add bx, HEX_CHARS
	mov bl, [bx]
	mov [HEX_OUT + 2], bl
	
	mov bx, dx
	shr bx, 8
	and bx, 0x0F
	add bx, HEX_CHARS
	mov bl, [bx]
	mov [HEX_OUT + 3], bl
	
	mov bx, dx
	shr bx, 4
	and bx, 0x0F
	add bx, HEX_CHARS
	mov bl, [bx]
	mov [HEX_OUT + 4], bl
	
	mov bx, dx
	and bx, 0x0F
	add bx, HEX_CHARS
	mov bl, [bx]
	mov [HEX_OUT + 5], bl
	
	mov si, HEX_OUT
	call write_string
	
	pop si
	pop bx
	ret

;===================;
;	BOOT-1 DATA
;===================;
; String Data
READ_ERROR 	db 'Error reading disk!', 0
CNTRL_MSG	db 'Handing off control...', 0
READ_TO		db 'Reading sector to: ', 0
OFFSET_CHAR	db ':', 0
NEW_LINE	db 0x0A, 0x0D, 0

; Other Data
HEX_CHARS	db '0123456789ABCDEF', 0
HEX_OUT 	db '0x???? ', 0
diskNumber	db 0

; Error Checking DATA
readSegment	dw 0
readOffset	dw 0
	
TIMES 510 - ($ - $$) db 0 
dw 0xAA55

Boot2:

Code: Select all

[BITS 16]
[ORG 0x7E00]

mov si, TEST_STRING
call write_string

mov ax, 0xBE01
cli
hlt

write_string:
	push ax
	push si
	
.string_loop:
	lodsb
	cmp al, 0
	je .string_end
		
	mov ah, 0x0E
	int 0x10
	jmp .string_loop
	
.string_end:
	pop si
	pop ax
	ret

TEST_STRING db 'Are we loaded correctly?!', 0	; explicit terminator instead of relying on the padding below

; NOTE:
; ======================
; Some emulators and disk drives will
; not read a sector unless it is fully 
; padded out, thus we need to pad this
; sector or it will not be read. This is
; true of all sectors we read in some 
; emulators. Thus, the last sector of every
; code segment must be padded.
TIMES 512 db 0

When running this as a 1440k floppy on Bochs, I get the following register values on exit (it hangs after it says it's handing off control):

00366341000i[CPU0 ] CPU is in real mode (active)
00366341000i[CPU0 ] CS.d_b = 16 bit
00366341000i[CPU0 ] SS.d_b = 16 bit
00366341000i[CPU0 ] EFER = 0x00000000
00366341000i[CPU0 ] | RAX=0000000000000000 RBX=0000000000007e00
00366341000i[CPU0 ] | RCX=0000000000090102 RDX=0000000000000000
00366341000i[CPU0 ] | RSP=0000000000007bff RBP=0000000000000000
00366341000i[CPU0 ] | RSI=00000000000e7d60 RDI=000000000000f787
00366341000i[CPU0 ] | R8=0000000000000000 R9=0000000000000000
00366341000i[CPU0 ] | R10=0000000000000000 R11=0000000000000000
00366341000i[CPU0 ] | R12=0000000000000000 R13=0000000000000000
00366341000i[CPU0 ] | R14=0000000000000000 R15=0000000000000000
00366341000i[CPU0 ] | IOPL=0 id vip vif ac vm rf nt of df IF tf sf ZF af PF cf
00366341000i[CPU0 ] | SEG selector base limit G D
00366341000i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
00366341000i[CPU0 ] | CS:0000( 0004| 0| 0) 00000000 0000ffff 0 0
00366341000i[CPU0 ] | DS:0000( 0005| 0| 0) 00000000 0000ffff 0 0
00366341000i[CPU0 ] | SS:0000( 0005| 0| 0) 00000000 0000ffff 0 0
00366341000i[CPU0 ] | ES:0000( 0005| 0| 0) 00000000 0000ffff 0 0
00366341000i[CPU0 ] | FS:0000( 0005| 0| 0) 00000000 0000ffff 0 0
00366341000i[CPU0 ] | GS:0000( 0005| 0| 0) 00000000 0000ffff 0 0
00366341000i[CPU0 ] | MSR_FS_BASE:0000000000000000
00366341000i[CPU0 ] | MSR_GS_BASE:0000000000000000
00366341000i[CPU0 ] | RIP=000000000000f706 (000000000000f706)
00366341000i[CPU0 ] | CR0=0x60000010 CR2=0x0000000000000000
00366341000i[CPU0 ] | CR3=0x00000000 CR4=0x00000000
00366341000i[CPU0 ] 0x000000000000f706>> lock add ah, dl : F000D4
00366341000i[CMOS ] Last time is 1444964365 (Thu Oct 15 21:59:25 2015)
00366341000i[ ] restoring default signal behavior
00366341000i[CTRL ] quit_sim called with exit code 1

Why is my jump not working correctly? I've tried various combinations of 0x7E00, 0x07E0, 0x7C00 + 512, moving to a register then jumping from there... etc... I haven't a clue what's going on.

Edit:
Annndddd I found my mistake. I had set the Cylinder/Track to 1 (thinking it was 1-indexed instead of 0-indexed). The read did not fail, but it loaded the wrong sector, so I executed junk when I jumped.
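For anyone hitting the same thing, here is a minimal sketch of the read with the corrected CHS values. It is not the exact code from Boot1: the name read_stage2 is made up for illustration, and passing the drive in DL is an assumption (Boot1 saves the BIOS-supplied drive in diskNumber, but read_from_disk loads DL from the BPB field driveNumber instead).

```nasm
[BITS 16]

; Hypothetical replacement for read_from_disk.
; In:  DL = BIOS drive number to read from (assumed to be the boot drive)
; Out: CF clear on success, CF set on error (as returned by INT 13h)
read_stage2:
	xor bx, bx
	mov es, bx
	mov bx, 0x7E00		; ES:BX = 0x0000:0x7E00, right after the boot sector

	mov ah, 0x02		; INT 13h function 02h: read sectors
	mov al, 1			; number of sectors to read
	mov ch, 0			; cylinder 0 (cylinders count from 0)
	mov dh, 0			; head 0 (heads count from 0)
	mov cl, 2			; sector 2 (sectors count from 1)
	int 0x13
	ret
```

The caller can then test the carry flag (jc to an error handler) before jumping to 0x7E00.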

sebihepp · Post by **sebihepp** » Fri Oct 16, 2015 12:26 am

I found another bug in your code. You are jumping to 0x0000:0x7E00 - that's correct so far. But in your second stage you do a "ret" - you can't return from a jump.

iansjack · Post by **iansjack** » Fri Oct 16, 2015 12:41 am

To be strictly accurate, you can return from a "jmp" (or from no call at all), by setting up the stack appropriately. It's a technique often used to start new tasks or to switch tasks.

Schol-R-LEA · Post by **Schol-R-LEA** » Fri Oct 16, 2015 1:15 am

I think some additional explanation may be in order, even though the Wiki page referenced earlier covers most of this.

First, to understand segment:offset addressing, you will need to see the historical context a bit. When Intel designed the original 8086 CPU, they wanted to extend the earlier 8080 design from 8 bits to 16 bits, but since the 8080 already had a 16-bit address space (the IP and SP registers were 16-bit; the others were all 8-bit, but the B and C, D and E, and H and L registers could be accessed together to act as three 16-bit registers), they wanted to extend the address space as well. However, a 32-bit address space was deemed impractical, as it would need double the number of address pins on the chip package, and in any case it was never anticipated that memories larger than 1 MiB would become available so soon afterwards (Intel still saw their CPUs mainly as microcontrollers, and didn't take the home computer market very seriously yet). So they devised a compromise, similar to one used by IBM twelve years earlier on the System/360: they set up the memory addressing as a series of overlapping segments, in which two 16-bit registers were combined to form an address space a little larger than 1 MiB (an effective 20-bit address range - there actually was a small amount of additional addressing capacity, but they only gave the chips 20 address pins, so that was effectively lost).

The memory segmentation works like this: the segment register holds a value that determines where in memory a 64K segment begins. Each segment begins at a physical address that is a multiple of 16, so each potential segment starts 16 bytes after the previous one. So, if you have a segment value of, say, 000A, it gives you a segment that starts at physical address 000A0, while a segment value of 000B starts 16 bytes higher in memory, at 000B0, and ends 16 bytes after the end of the first segment.

The offset register holds an address within a given segment, and if the segment register is constant can be used just as if it were simply an address in a 16-bit address space. To get the physical address from the segment and offset, you would have to add them together with the segment address value multiplied by 16 (that is, shifted by 4 bits to the left), so for the first segment value given above and an offset of 00A4, you would get an address of

Code: Select all

000A0
 00A4
-------
00144

Note that the segments overlap; if we had a segment value of 0010 and an offset of 0044, it would map to the same location in physical memory:

Code: Select all

00100
 0044
-------
00144

This means that you have to be careful when you have segments that overlap.
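The same arithmetic can be written as a small routine. This is only a sketch for illustration: the name seg_off_to_linear and the register assignments are assumptions, not something from the code earlier in the thread. It uses only 8086 instructions.

```nasm
[BITS 16]

; Hypothetical helper: convert segment:offset to a physical address.
; In:  BX = segment value, SI = offset
; Out: DX:AX = segment * 16 + offset (DX holds the bits above bit 15)
; Clobbers: CL
seg_off_to_linear:
	mov ax, bx
	mov dx, bx
	mov cl, 4
	shl ax, cl			; AX = low 16 bits of segment * 16
	mov cl, 12
	shr dx, cl			; DX = top 4 bits of segment * 16
	add ax, si			; add the offset
	adc dx, 0			; carry into the high part
	ret
```

With BX = 0x000A and SI = 0x00A4 this gives DX:AX = 0000:0144, and with BX = 0x0010 and SI = 0x0044 it gives the same 0000:0144, matching the two sums above.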

Now, the IP register is in fact a 16-bit register, which holds an offset. The corresponding segment register is CS, the Code Segment. Thus, changing CS would in effect jump you to the same offset but in a different segment, which is why the MOV instruction cannot load CS at all. Only a far control transfer, such as a FAR JMP or FAR CALL, can change CS (the conditional jump instructions can only use relative jumps - that is to say, a SHORT Jxx instruction can move IP by -128 to +127 bytes, while a NEAR Jxx instruction, available on the 386 and later, can move IP by -32,768 to +32,767 bytes).

The SS is, of course, the Stack Segment, and by default the stack grows down. The Stack Pointer, SP, is an offset from SS, so each time a value is pushed onto the stack, SP is decremented by 2 (4 in 32-bit modes on later processor models), while a pop increments it by 2 (or 4). Thus, if you set SP to 0000, then push a value, it will be inserted into memory at SS:FFFE.
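A short sketch of that behaviour, using the 0x0000:0x7C00 stack discussed earlier in the thread (the particular values pushed are arbitrary):

```nasm
[BITS 16]

	xor ax, ax
	mov ss, ax			; SS = 0x0000
	mov sp, 0x7C00		; set SP right after SS; stack grows down from 0x7C00

	mov ax, 0x1234
	push ax				; SP = 0x7BFE, word 0x1234 stored at SS:7BFE
	push ax				; SP = 0x7BFC, second copy stored at SS:7BFC
	pop bx				; BX = 0x1234, SP = 0x7BFE
	pop bx				; BX = 0x1234, SP = 0x7C00 again
```

Starting from SP = 0x0000 instead, the first push would wrap around and store its word at SS:FFFE, as described above.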

The Data Segment (DS) is the default segment for data. In small programs - such as a boot loader - it is often set to map to the same segment as the code segment, as a way of saving memory and simplifying addressing. Whenever you access a memory location by a data instruction, such as

Code: Select all

MOV AX, [8]          ; move data from DS:0008 to AX

if there is no explicit segment override, then DS is assumed. The Extra Segment (ES) is used as an additional data segment, but has to be named explicitly:

Code: Select all

MOV AX, [ES:8]       ; move data from ES:0008 to AX

Ready4Dis · Post by **Ready4Dis** » Fri Oct 16, 2015 4:12 am

sebihepp wrote:I found another bug in your code. You are jumping to 0x0000:0x7E00 - that's correct so far. But in your second stage you do a "ret" - you can't return from a jump.

This is not correct if he's trying to conform to the FAT standard. It must be a short jump followed by a NOP, so that it ends up 3 bytes long. If you use a far jump, it does not fit in 3 bytes and your BPB and FAT information don't end up in the correct spot. Anything attempting to use either of these will then fail. This shouldn't make a difference for the boot loader as it's not using the FAT table or anything, but it will matter once he gets further along.

First, did you verify it's actually loading the second sector? A memory dump, or display some contents to screen, something to know for sure it's being loaded? Second, are you positive it's not making it to the second stage, or is it actually making it there and then failing? If you're running Bochs or QEMU, I suggest setting up some breakpoints and seeing what's actually happening. It makes troubleshooting so much easier than what I started with, lol. Put floppy in, reboot. Triple fault, crap. Take floppy out, reboot into OS, print random values to screen and infinite loop, reboot. Dang, triple fault must have happened before infinite loop, reboot, move inf loop sooner, reboot, yay, I see info. Crap, not the info I needed, it doesn't show me anything that could be wrong, reboot, display more info, reboot.

You get the point, there are better tools available, learn how to use them.
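As one concrete way to do this in Bochs, here is a minimal sketch. It assumes a Bochs build with the internal debugger enabled and the line magic_break: enabled=1 in the bochsrc file; with that set, Bochs stops in the debugger when it executes xchg bx, bx. The label name is made up for illustration.

```nasm
[BITS 16]
[ORG 0x7C00]

before_handoff:				; hypothetical label, for illustration only
	xchg bx, bx				; Bochs "magic breakpoint"; changes nothing on real hardware
	jmp 0x0000:0x7E00		; far jump to where the second stage was loaded
```

Once Bochs stops there, a debugger command along the lines of xp /16bx 0x7E00 should show whether the second sector really landed in memory before the jump.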

sebihepp · Post by **sebihepp** » Fri Oct 16, 2015 5:13 am

Ready4Dis wrote: This is not correct if he's trying to conform to the FAT standard. It must be a short jump followed by a NOP, so that it ends up 3 bytes long. If you use a far jump, it does not fit in 3 bytes and your BPB and FAT information don't end up in the correct spot.

I wasn't speaking of the jump right at the beginning of the boot sector. I meant the jump after he loaded the 2nd stage.

iansjack wrote: To be strictly accurate, you can return from a "jmp" (or from no call at all), by setting up the stack appropriately. It's a technique often used to start new tasks or to switch tasks.

Okay, okay, right. But if he does a jump, he needs to push the return address onto the stack manually. As I didn't see such code, I mentioned it.
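To illustrate what pushing the return address manually could look like, here is a small sketch. The label names are invented for the example, and it assumes the second stage sits at 0x7E00 and ends with a near ret; neither is true of the Boot2 code posted above, which halts instead.

```nasm
[BITS 16]
[ORG 0x7C00]

call_stage2_by_hand:		; hypothetical label
	mov ax, stage2_returned
	push ax					; same thing a near "call" would have pushed
	jmp 0x7E00				; near jump into the loaded second stage

stage2_returned:
	; a near "ret" in the second stage pops the address above and lands here
	cli
.hang:
	hlt
	jmp .hang
```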

Ready4Dis · Post by **Ready4Dis** » Fri Oct 16, 2015 5:23 am

One thing I noticed: you use a HLT instruction. Use a jmp $ instead, so that it loops forever. The reason is that hlt only stops the CPU until there is an interrupt. If interrupts are enabled, execution will continue after the interrupt. If you are going to use a hlt, you must disable interrupts (cli) first. I normally just use a jmp $, as this is much simpler to put in and remove (without having to worry about the state of interrupts).

sebihepp wrote: I wasn't speaking of the jump right at the beginning of the boot sector. I meant the jump after he loaded the 2nd stage.

You are of course correct, sorry about that, I was looking at the problem jmp and didn't realize you were talking about another. He only does a jmp 0x7c00 + 512, where you said 0x0000:0x7e00, so when I saw the jmp at the top with the seg:offset I thought that was the one you were speaking of.

Antti · Post by **Antti** » Fri Oct 16, 2015 6:38 am

A simple question (but it may be tricky) while we are on this topic. If we run 16-bit code, i.e. at CS:IP, is it guaranteed that the upper word of EIP is zero?

Octocontrabass · Post by **Octocontrabass** » Fri Oct 16, 2015 7:17 am

If you're not doing any ridiculous hacks, you'll never see the high bits of EIP while you're in 16-bit mode, so you don't have to worry about that.

Intel says they're zero.

lesniakbj · Post by **lesniakbj** » Fri Oct 16, 2015 8:40 am

Ready4Dis wrote:
sebihepp wrote:I found another bug in your code. You are jumping to 0x0000:0x7E00 - that's correct so far. But in your second stage you do a "ret" - you can't return from a jump.
This is not correct if he's trying to conform to the FAT standard. It must be a short jump followed by a NOP, so that it ends up 3 bytes long. If you use a far jump, it does not fit in 3 bytes and your BPB and FAT information don't end up in the correct spot. Anything attempting to use either of these will then fail. This shouldn't make a difference for the boot loader as it's not using the FAT table or anything, but it will matter once he gets further along.

First, did you verify it's actually loading the second sector? A memory dump, or display some contents to screen, something to know for sure it's being loaded? Second, are you positive it's not making it to the second stage, or is it actually making it there and then failing? If you're running Bochs or QEMU, I suggest setting up some breakpoints and seeing what's actually happening. It makes troubleshooting so much easier than what I started with, lol. Put floppy in, reboot. Triple fault, crap. Take floppy out, reboot into OS, print random values to screen and infinite loop, reboot. Dang, triple fault must have happened before infinite loop, reboot, move inf loop sooner, reboot, yay, I see info. Crap, not the info I needed, it doesn't show me anything that could be wrong, reboot, display more info, reboot. You get the point, there are better tools available, learn how to use them.

I did manage to get it to load, I was trying to read the wrong sector!

Real quick question on the beginning jump. Will

Code: Select all

jmp short <label>

reset CS:IP correctly? The only reason I did the far jump was to reset the segment. And your debugging process sounds exactly like mine, I'm just trying to pick the brains of knowledgeable people along the way.

sebihepp wrote: I wasn't speaking of the jump right at the beginning of the boot sector. I meant the jump after he loaded the 2nd stage.
iansjack wrote: To be strictly accurate, you can return from a "jmp" (or from no call at all), by setting up the stack appropriately. It's a technique often used to start new tasks or to switch tasks.
Okay, okay, right. But if he does a jump, he needs to push the return address onto the stack manually. As I didn't see such code, I mentioned it.

I still don't see the ret that is the problem in the second stage. I jump, have a cli hlt after the jmp, the 2nd stage prints a string, moves a value into ax, cli then hlt. I don't see a return there anywhere except the string function.

Sorry if I seem to be asking a lot of questions, I want to legitimately learn this stuff and understand it.

Ready4Dis · Post by **Ready4Dis** » Fri Oct 16, 2015 9:56 am

lesniakbj wrote: Real quick question on the beginning jump. Will
Code: Select all
jmp short <label>
reset CS:IP correctly? The only reason I did the long jump was to reset the segment. And your debugging process sounds exactly like mine, I'm just trying to pick the brains of knowledgeable people along the way

No, it won't do anything to CS, but it also doesn't depend on CS: a short jump is relative, so it works the same whatever value CS holds. You don't need to bother with CS until after that jump. I wouldn't even bother until the end, where you jump to your second stage:

jmp 0x0000:0x7e00
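Put together, the start of the boot sector could look roughly like the sketch below. This is just one possible layout, not a drop-in fix: the far jump right after the BPB is optional (the far jump to the second stage would normalize CS anyway), and the .flush_cs label is a made-up name.

```nasm
[BITS 16]
[ORG 0x7C00]

	jmp short boot1_start		; 2 bytes, relative, works for any CS value
	nop							; 1 byte, so the OEM name starts at offset 3

oemIdentifier	db 'BrenOS  '
	; ... the rest of the BPB and EBPB fields go here, as in Boot1 ...

boot1_start:
	jmp 0x0000:.flush_cs		; optional far jump: CS = 0x0000, IP = .flush_cs
.flush_cs:
	; segment and stack setup continues here
```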

lesniakbj wrote: I still don't see the ret that is the problem in the second stage. I jump, have a cli hlt after the jmp, the 2nd stage prints a string, moves a value into ax, cli then hlt. I don't see a return there anywhere except the string function.

Sorry if I seem to be asking a lot of questions, I want to legitimately learn this stuff and understand it.

I'm not sure what they're looking at; it looks perfectly acceptable (almost). Honestly, don't rely on a bare cli/hlt. A cli does NOT disable an NMI (non-maskable interrupt), so the CPU can still wake from the hlt and end up executing whatever follows. The better way is:

halt_label:
cli
hlt
jmp halt_label

This way, if you get an NMI for some reason, it will just hlt again, so your CPU isn't running at 100%.

Antti · Post by **Antti** » Sun Oct 18, 2015 12:30 am

Octocontrabass wrote:you'll never see the high bits of EIP while you're in 16-bit mode, so you don't have to worry about that

Agreed. However, would it do any harm to do something like this anyway?

Code: Select all

        BITS 16

        ; Tested that 32-bit instructions are supported before executing this

        mov ebx, .EIP           ; ebx = jump address
        jmp ebx                 ; absolute near jump (eip = ebx)
.EIP:   ; nop

        and esp, 0x0000FFFF     ; clear upper bits of esp
        ; and so on...

The question is no longer whether it is needed or not. The question is whether this itself may cause problems. I have not read all the CPU errata...

Ready4Dis · Post by **Ready4Dis** » Sun Oct 18, 2015 7:13 am

That just adds confusion for no reason, and I'm not even sure if it's supported in 16-bit mode? Since you can't do 32-bit jumps in 16-bit mode, I am going to guess it either does a 16-bit jump anyways, or won't even compile, but I have not tried it personally as there is absolutely no benefit or use for it. Just use the 16-bit registers for the 16-bit portions, and set the 32-bit registers once you go into pmode where they are used. Why do this work in the very limited boot loader instead of a 2nd stage loader?
