
How is MOVSD supposed to work?

Posted: Wed Jun 08, 2016 2:30 pm
by RobbieE
I have been following the BrokenThorn OS Development Series but have hit a snag in Chapter 14.

Initially, when I ran the code, I kept getting triple faults after jumping to the kernel code. After a lot of tracing and stepping through the code line by line, I found the root cause of my problem: the line where stage 2 copies the loaded kernel to the 1MB mark.

(snippet from: Demo6\SysBoot\Stage2\stage2.asm)

Code:

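CopyImage:
	mov	eax, dword [ImageSize]		; image size, in sectors
	movzx	ebx, word [bpbBytesPerSector]	; bytes per sector
	mul	ebx				; EDX:EAX = image size in bytes
	mov	ebx, 4
	div	ebx				; EAX = image size in dwords
	cld					; DF = 0: ESI/EDI count upwards
	mov	esi, IMAGE_RMODE_BASE		; source: real mode load address
	mov	edi, IMAGE_PMODE_BASE		; destination: protected mode address (1MB)
	mov	ecx, eax			; REP count
	rep	movsd				; copy image to its protected mode address

The line causing the problem is the last one: "rep movsd".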

My problem is as follows:

1. The kernel is successfully loaded from the floppy image to the 0x3000 mark.
2. Gate A20 is enabled, the GDT is set up and then we switch to Protected Mode.
3. Now that up to 4GB of memory is accessible, we copy the kernel to the 1MB mark, then jump to the kernel at 0x100000.
4. The first thing the kernel must do is set up the segment registers, but this is the point where the entire system crashes.

When execution reached the point where the kernel is copied, the GDT was as follows:
<bochs:27>info gdt 0 4
Global Descriptor Table (base=0x000000000000060e, limit=23)
GDT[0x00]=??? descriptor hi=0x00000000, lo=0x00000000
GDT[0x01]=Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Non-conforming, Accessed, 32-bit
GDT[0x02]=Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed

After allowing the "rep movsd" to complete, the GDT was as follows:
Global Descriptor Table (base=0x000000000000060e, limit=23)
GDT[0x00]=??? descriptor hi=0x418b0000, lo=0x008be900
GDT[0x01]=16-Bit TSS (Busy) at 0x0414247c, length 0x18d04
GDT[0x02]=32-Bit TSS (Available) at 0x24001424, length 0xc44c6

By stepping through the code and examining the various memory locations, I found that the kernel is definitely being copied from 0x3000 to 0x100000, but the same data is also being copied over the bootloader, and since the GDT is located at 0x60e, it is getting overwritten.
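Bochs decodes each GDT entry from its two raw 32-bit halves (the dh/dl values it prints for segment registers). A minimal C sketch of that decoding, assuming the standard legacy descriptor layout from the Intel SDM; decode_descriptor and the descriptor struct are hypothetical names. It shows why bytes copied over the table can be reported as TSS entries: the S bit and type field in the access byte are just data, so whatever lands there is interpreted as some descriptor type.

```c
#include <stdint.h>
#include <stdbool.h>

/* Fields of a legacy x86 segment descriptor, decoded from the high dword
 * ("dh" in Bochs output) and the low dword ("dl"). */
typedef struct {
    uint32_t base;      /* 32-bit linear base address */
    uint32_t limit;     /* effective limit in bytes, after applying G */
    uint8_t  access;    /* P, DPL, S and type bits */
    bool     is_system; /* S == 0: TSS, LDT, gates, ... */
} descriptor;

descriptor decode_descriptor(uint32_t dh, uint32_t dl)
{
    descriptor d;
    d.base = (dl >> 16)                   /* base bits 15..0  */
           | ((dh & 0x000000FFu) << 16)   /* base bits 23..16 */
           | (dh & 0xFF000000u);          /* base bits 31..24 */
    uint32_t raw_limit = (dl & 0x0000FFFFu) | (dh & 0x000F0000u);
    bool granularity = (dh >> 23) & 1u;   /* G = 1: limit counts 4 KiB pages */
    d.limit = granularity ? (raw_limit << 12) | 0xFFFu : raw_limit;
    d.access = (uint8_t)(dh >> 8);        /* access byte is bits 15..8 of dh */
    d.is_system = ((d.access >> 4) & 1u) == 0; /* S bit clear */
    return d;
}
```

Fed the ds/cs/tr values from a Bochs register dump, this yields a flat 4 GiB data segment (access 0x93), a flat code segment (0x9b) and a system descriptor for TR (0x8b, a busy 32-bit TSS type).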

The questions are thus:
1. If MOVSD is supposed to copy data from [DS:ESI] to [ES:EDI], why is it simultaneously copying from [DS:SI] to [ES:DI] as well? (I have gone through the code multiple times, and both destinations are definitely being written.)
2. How do I copy the kernel to the 1MB mark without overwriting my GDT?

* I am using Bochs 2.6.8 as my emulator.

Re: How is MOVSD supposed to work?

Posted: Wed Jun 08, 2016 3:48 pm
by iansjack
Have you tried single-stepping through the code in your debugger to see exactly what is happening?

You don't say what you have set the DS and ES registers to. Presumably 0x10?

At a quick glance, I think you are moving far too much data (ImageSize * BytesPerSector / 4). Or is ImageSize measured in sectors?

Re: How is MOVSD supposed to work?

Posted: Wed Jun 08, 2016 3:50 pm
by Combuster
MOVSD uses SI/DI or ESI/EDI depending on the address size, so you must actually be running in 32-bit mode for this to work as written. You can override it by using "a32 movsd" or "a16 movsd", the former being the one typically found in unreal mode code.
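Combuster's point about address size can be illustrated with a toy C model of REP MOVSD. It assumes DF = 0, flat segments with base 0 (as in the tutorial's GDT) and memory as a plain byte array; regs and rep_movsd are hypothetical names, not an emulator API. With a 16-bit address size only SI/DI/CX are used and updated, so a destination of 0x100000 is truncated to 0x0000; with a 32-bit address size the full EDI is used.

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

/* Register state consumed by REP MOVSD. */
typedef struct { uint32_t esi, edi, ecx; } regs;

/* Copies dwords until the count register reaches 0. With addr32 == false
 * only the low 16 bits of ESI/EDI/ECX take part and are updated (SI/DI/CX),
 * as with an a16 prefix. The caller must size mem to cover every address. */
void rep_movsd(uint8_t *mem, regs *r, bool addr32)
{
    uint32_t mask = addr32 ? 0xFFFFFFFFu : 0x0000FFFFu;
    while ((r->ecx & mask) != 0) {
        uint32_t src = r->esi & mask, dst = r->edi & mask;
        memmove(mem + dst, mem + src, 4);                 /* move one dword */
        r->esi = (r->esi & ~mask) | ((src + 4) & mask);   /* advance source */
        r->edi = (r->edi & ~mask) | ((dst + 4) & mask);   /* advance destination */
        r->ecx = (r->ecx & ~mask) | (((r->ecx & mask) - 1) & mask);
    }
}
```

Under this model a 16-bit address size writes only to low memory and never to 0x100000 as well, so on its own it cannot explain both locations changing.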

It is also possible that A20 is not actually enabled. I can verify neither scenario because you haven't posted enough CPU state.

Re: How is MOVSD supposed to work?

Posted: Wed Jun 08, 2016 4:05 pm
by neon
Hello,

Make sure that your kernel project has /align:512 in the linker settings. The boot loader assumes this in order to simplify loading and copying the image, and will crash without it. Note that 512 is not arbitrary; it was selected to match the sector alignment on the disk.

Had a quick look at what ImageSize was since it has been a while. It is indeed the number of sectors that were loaded for the file. Don't quite know why we did that. So the total number of bytes to copy is (ImageSize * SectorSize) bytes. We divide this by 4 to get the number of dwords to use with rep movsd. We use movsd over movsb for performance. So, assuming the file loading code works without error and returns the correct size (you can verify this by making sure that ImageSize = size_of_your_kernel_file / 512), this part should work fine.
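The size arithmetic above can be written out in C as a quick sanity check. image_dword_count and expected_image_sectors are hypothetical helpers, and rounding up in the second one is an assumption for files that are not a whole number of sectors long. For the 4096-byte kernel mentioned later in the thread, 8 sectors of 512 bytes give 0x400 dwords.

```c
#include <stdint.h>

/* Dwords REP MOVSD must copy for an image occupying image_sectors whole
 * sectors: (sectors * bytes_per_sector) / 4. bytes_per_sector is assumed
 * to be a multiple of 4 (512 on a floppy), so the division is exact. */
uint32_t image_dword_count(uint32_t image_sectors, uint16_t bytes_per_sector)
{
    uint64_t bytes = (uint64_t)image_sectors * bytes_per_sector; /* no overflow */
    return (uint32_t)(bytes / 4);
}

/* Sectors a file of file_size bytes should occupy (assumption: a partial
 * final sector is still loaded as a whole sector). */
uint32_t expected_image_sectors(uint32_t file_size, uint16_t bytes_per_sector)
{
    return (file_size + bytes_per_sector - 1) / bytes_per_sector;
}
```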

If you have not verified the actual value of ImageSize with respect to the size of your kernel file on disk, please do so. Also, what is the size of the kernel file you are trying to load?

Assuming the original poster is using the boot loader unmodified, the provided code does indeed run in 32-bit protected mode. The boot loader loads it to 0:3000h in real mode and copies it to 100000h in protected mode. All boot loader data should be below this address.

Re: How is MOVSD supposed to work?

Posted: Wed Jun 08, 2016 4:20 pm
by alexfru
Is edx zero before div?
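The question matters because DIV r/m32 divides the 64-bit value EDX:EAX, not just EAX. A toy C model (edx_eax, mul32 and div32 are hypothetical names) shows that in CopyImage the preceding MUL already leaves the high half of the product in EDX, which is 0 for any image smaller than 4 GiB, whereas a stale EDX going into DIV produces either a huge count or a divide error (#DE).

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t eax, edx; } edx_eax;

/* MUL r/m32: EDX:EAX = EAX * src. */
edx_eax mul32(uint32_t eax, uint32_t src)
{
    uint64_t product = (uint64_t)eax * src;
    edx_eax r = { (uint32_t)product, (uint32_t)(product >> 32) };
    return r;
}

/* DIV r/m32: EAX = EDX:EAX / divisor, EDX = remainder.
 * Returns false where the CPU would raise #DE (zero divisor, or a
 * quotient that does not fit in 32 bits). */
bool div32(edx_eax *r, uint32_t divisor)
{
    uint64_t dividend = ((uint64_t)r->edx << 32) | r->eax;
    if (divisor == 0 || dividend / divisor > 0xFFFFFFFFu)
        return false;
    r->eax = (uint32_t)(dividend / divisor);
    r->edx = (uint32_t)(dividend % divisor);
    return true;
}
```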

Re: How is MOVSD supposed to work?

Posted: Thu Jun 09, 2016 11:38 am
by RobbieE
@iansjack:

It is because of single-stepping that I know that it is copying to both EDI and DI. I ran the following in Bochs before and after several iterations.

Code:

x /1dx 0x3000
x /1dx 0x0000
x /1dx 0x100000

BOTH 0x0000 and 0x100000 are being changed.

Also, comparing against the image on the virtual floppy drive, which I have open in a hex editor, I know that the kernel is definitely loaded correctly at 0x3000 and is then being copied.

I originally wanted to include all the state data in my original post, but it seemed onerous given that I can't copy and paste from the Bochs debug window, which means I have to type it all by hand. I include it now:

BEFORE copying the kernel to 1MB:

Code:

rax: 00000000_00000400 rcx: 00000000_00000400
rdx: 00000000_00000000 rbx: 00000000_00000004
rsp: 00000000_00090000 rbp: 00000000_00003000
rsi: 00000000_00003000 rdi: 00000000_00100000
r8 : 00000000_00000000 r9 : 00000000_00000000
r10: 00000000_00000000 r11: 00000000_00000000
r12: 00000000_00000000 r13: 00000000_00000000
r14: 00000000_00000000 r15: 00000000_00000000
rip: 00000000_0000098a 
eflags 0x00000046: id vip vif ac vm rf nt IOPL=0 of df if tf sf ZF af PF cf

es:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
        Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
cs:0x0008, dh=0x00cf9b00, dl=0x0000ffff, valid=1
        Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Non-conforming, Accessed, 32-bit
ss:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
        Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
ds:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
        Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
fs:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
        Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
gs:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
        Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
ldtr:0x0000, dh=0x00008200, dl=0x0000ffff, valid=1
tr:0x0000, dh=0x00008b00, dl=0x0000ffff, valid=1
gdtr:base=0x000000000000060e, limit=0x17
idtr:base=0x0000000000000000, limit=0x3ff

AFTER copying the kernel to 1MB:

Code:

rax: 00000000_00000400 rcx: 00000000_0000019d
rdx: 00000000_00000000 rbx: 00000000_00000004
rsp: 00000000_00090000 rbp: 00000000_00003000
rsi: 00000000_0000398c rdi: 00000000_0010098c
r8 : 00000000_00000000 r9 : 00000000_00000000
r10: 00000000_00000000 r11: 00000000_00000000
r12: 00000000_00000000 r13: 00000000_00000000
r14: 00000000_00000000 r15: 00000000_00000000
rip: 00000000_0000098c 
eflags 0x00000006: id vip vif ac vm rf nt IOPL=0 of df if tf sf zf af PF cf

es:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
        Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
cs:0x0008, dh=0x00cf9b00, dl=0x0000ffff, valid=1
        Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Non-conforming, Accessed, 32-bit
ss:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
        Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
ds:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
        Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
fs:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
        Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
gs:0x0010, dh=0x00cf9300, dl=0x0000ffff, valid=31
        Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
ldtr:0x0000, dh=0x00008200, dl=0x0000ffff, valid=1
tr:0x0000, dh=0x00008b00, dl=0x0000ffff, valid=1
gdtr:base=0x000000000000060e, limit=0x17
idtr:base=0x0000000000000000, limit=0x3ff

As @neon stated, ImageSize is exactly as in the tutorial: it holds the number of sectors that were loaded to 0x3000. The calculation before the CLD instruction determines the number of iterations.
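The two register dumps can be cross-checked with a little arithmetic: with DF = 0, REP MOVSD advances ESI and EDI by 4 per dword and decrements ECX by 1. A small C sketch (movsd_regs, bytes_copied and dumps_consistent are hypothetical names) confirms the dumps agree with each other: 0x98c bytes (0x263 dwords) had been copied, and ECX still held 0x19d of the original 0x400 when the AFTER dump was taken.

```c
#include <stdint.h>

typedef struct { uint32_t esi, edi, ecx; } movsd_regs;

/* Bytes moved between two snapshots of a forward (DF = 0) REP MOVSD. */
uint32_t bytes_copied(movsd_regs before, movsd_regs after)
{
    return after.esi - before.esi;
}

/* 1 if "after" is a state that REP MOVSD can reach from "before":
 * a whole number of dwords, ESI and EDI advanced by the same amount,
 * and ECX reduced by the number of dwords moved. */
int dumps_consistent(movsd_regs before, movsd_regs after)
{
    uint32_t n = bytes_copied(before, after);
    return n % 4 == 0
        && after.edi - before.edi == n
        && before.ecx - after.ecx == n / 4;
}
```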

@Combuster:
I'm definitely in 32-bit mode at the point the code snippet executes; further up, execution passes through the following two sections:

Code:

		;------------------------------------------;
		;     Install the GDT                      ;
		;------------------------------------------;

		call	InstallGDT

		;------------------------------------------;
		;     Enable Gate A20                      ;
		;------------------------------------------;

		call	EnableA20_KKbrd

Code:

		;------------------------------------------;
		;     Go into protected mode               ;
		;------------------------------------------;

EnterStage3:

		cli								; turn off all interrupts
		mov		eax, cr0				; set bit 0 in cr0 --> enter pmode
		or		eax, 1
		mov		cr0, eax

		jmp		08h:Stage3				; far jump to fix CS. Remember that the code selector is 0x8

		; Note: Do NOT re-enable the interrupts once in pmode. This will cause a triple-fault!

The memory address of the REP MOVSD instruction is reported by Bochs as 0008:000000000000098a.

I read about "a32" on another website and tried it but, alas, it made no difference. Data is still being copied to both EDI and DI.

@neon:
I've never had to program at such a low level before, so I went through all the steps on the MSVS page very slowly; there are many settings I don't fully understand. I have double-checked that my linker does have /ALIGN:512. It might be worth noting, though, that I'm using MSVS-2015, but since the problem is in stage 2, I don't yet know whether this is significant.

The kernel is completely identical to that of Demo6_1 and compiles to 4096 bytes.

@alexfru:
EDX is definitely 0x0 before the DIV, in fact the entire RDX is 0x0. I learnt that lesson a few chapters back. :lol:

Re: How is MOVSD supposed to work?

Posted: Thu Jun 09, 2016 12:00 pm
by Octocontrabass
Your code to enable A20 doesn't work.

Re: How is MOVSD supposed to work?

Posted: Thu Jun 09, 2016 1:10 pm
by iansjack
If A20 isn't enabled, then (presumably; I'm not familiar with Bochs) when the OP thinks they are inspecting 0x100000, they are actually inspecting 0x0. This would make sense, since the instruction can't possibly copy to two locations at the same time. Hence 0x100000 is never changed; it just seems that it is.
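This explanation comes down to one bit: with the A20 gate disabled, address line 20 is held low, so bit 20 of every physical address is effectively cleared. A one-function C sketch (a20_physical is a hypothetical name; it models the address bus, not how the Bochs debugger resolves addresses): both the copy destination 0x10098c and an inspection of 0x100000 land in low memory, at 0x98c and 0x0.

```c
#include <stdint.h>
#include <stdbool.h>

/* Physical address actually driven onto the bus for a 32-bit address.
 * With A20 disabled, bit 20 is forced to 0; every other bit passes through. */
uint32_t a20_physical(uint32_t addr, bool a20_enabled)
{
    return a20_enabled ? addr : (addr & ~(UINT32_C(1) << 20));
}
```

Note that only bit 20 is masked, so with A20 off every odd megabyte aliases the even megabyte below it (0x300000 maps to 0x200000, for example).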

Re: How is MOVSD supposed to work?

Posted: Thu Jun 09, 2016 3:53 pm
by neon
It does sound like a possible A20 problem tbh. If you email me, or upload a floppy disk image of your system and post a link to it, I can test it over the weekend and perhaps see what is going on. The demos presented are compatible with all versions of MSVC with minor changes.

Assuming that the A20 line is the problem, you should be able to test it under Bochs by replacing call EnableA20_KKbrd with:

Code:

mov ax, 0x2401
int 0x15

Re: How is MOVSD supposed to work?

Posted: Fri Jun 10, 2016 12:51 am
by RobbieE
@Everyone:

Based on the overwhelming consensus, this might be because my call to EnableA20_KKbrd did not result in Gate A20 being enabled. How did you all figure this out from what I posted? (Perhaps this knowledge could be added to the tutorial.)

Chapter 9 of the tutorial series gives a whole range of functions, in an included file, that enable the gate; I just used the one that was in the main demo:

Code:

;*************************************************
;	A20.inc
;		- Various Routines that enable Gate A20
;
;	OS Development Series
;   http://www.brokenthorn.com/Resources/OSDevIndex.html
;*************************************************

%ifndef __A20_INC__
%define __A20_INC__

bits	16

;----------------------------------------------
; Enables a20 line through keyboard controller
;----------------------------------------------

EnableA20_KKbrd:

		cli
		push	ax
		mov	al, 0xdd					; send enable a20 address line command to controller (non-standard; not every controller supports it)
		out	0x64, al
		pop	ax
		ret

;--------------------------------------------
; Enables a20 line through output port
;--------------------------------------------

EnableA20_KKbrd_Out:

		cli
		pusha

        call    wait_input
        mov     al,0xAD
        out     0x64,al					; disable keyboard
        call    wait_input

        mov     al,0xD0
        out     0x64,al					; tell controller to read output port
        call    wait_output

        in      al,0x60
        push    eax						; get output port data and store it
        call    wait_input

        mov     al,0xD1
        out     0x64,al					; tell controller to write output port
        call    wait_input

        pop     eax
        or      al,2					; set bit 1 (enable a20)
        out     0x60,al					; write out data back to the output port

        call    wait_input
        mov     al,0xAE					; enable keyboard
        out     0x64,al

        call    wait_input
		popa
        sti
        ret

	; wait for input buffer to be clear

wait_input:
        in      al,0x64
        test    al,2
        jnz     wait_input
        ret

	; wait for output buffer to be full (data ready to read)

wait_output:
        in      al,0x64
        test    al,1
        jz      wait_output
        ret

;--------------------------------------
; Enables a20 line through bios
;--------------------------------------

EnableA20_Bios:
		pusha
		mov	ax, 0x2401
		int	0x15
		popa
		ret

;-------------------------------------------------
; Enables a20 line through system control port A
;-------------------------------------------------

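EnableA20_SysControlA:
		push	ax
		in	al, 0x92				; read the current value
		or	al, 2					; set bit 1 (enable a20)
		and	al, 0xfe				; keep bit 0 clear (writing 1 triggers a fast reset)
		out	0x92, al				; write back, preserving the other bits
		pop	ax
		ret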

%endif ; __A20_INC__
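Writing a constant 2 to port 0x92 would also clear every other bit of System Control Port A, so the usual approach is read-modify-write. The bit arithmetic is sketched in C below; port92_enable_a20 is a hypothetical helper that only computes the value, since the actual in/out still has to be done in assembly. Bit 1 enables A20 and bit 0 must be written as 0, because writing 1 there triggers a fast reset on many chipsets.

```c
#include <stdint.h>

/* Value to write back to port 0x92, given the value just read from it:
 * set bit 1 (A20 enable), force bit 0 (fast reset) to 0, keep the rest. */
uint8_t port92_enable_a20(uint8_t current)
{
    return (uint8_t)((current | 0x02u) & ~0x01u);
}
```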

Re: How is MOVSD supposed to work?

Posted: Fri Jun 10, 2016 6:11 am
by neon
Hello,

Did you try what was suggested yet? I am not entirely convinced the problem is with the A20 gate; however, what you described (where it appears to be addressing the same data at 1MB and at 0) certainly makes it seem so. This would be due to address wrap-around. (That said, this same effect can be achieved without A20 at the 1MB+16k mark due to real mode addressing limitations.)
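The real mode limits come from segment:offset arithmetic: the linear address is segment * 16 + offset, so the highest reachable address is FFFF:FFFF = 0x10FFEF, just under 1MB + 64KB. A small C sketch (real_mode_linear is a hypothetical name) of where such addresses land with and without A20; with A20 disabled, anything at or above 1MB wraps back into the first 64KB.

```c
#include <stdint.h>
#include <stdbool.h>

/* Real mode linear address for seg:off. With A20 disabled only 20 address
 * bits reach memory, so 0x100000..0x10FFEF wrap to 0x00000..0x0FFEF. */
uint32_t real_mode_linear(uint16_t seg, uint16_t off, bool a20_enabled)
{
    uint32_t linear = ((uint32_t)seg << 4) + off;
    return a20_enabled ? linear : (linear & 0xFFFFFu);
}
```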

I still want to take a look at the resulting image file if possible. I also want you to try the EnableA20_Bios method as a fallback.

Re: How is MOVSD supposed to work?

Posted: Fri Jun 10, 2016 9:28 am
by RobbieE
@neon

Changing to the EnableA20_Bios function call did indeed solve the problem, so all of you who said it was a Gate A20 problem were correct. I'd like to ask all of you to please explain how you knew that. I feel so stupid, because I was looking in completely the wrong place and pulling my hair out.

Re: How is MOVSD supposed to work?

Posted: Fri Jun 10, 2016 10:46 am
by kzinti
There are very few things that could explain what you were describing. Virtual memory mapping (paging) didn't apply, multiple overlapping segments didn't apply, and the offset between the two locations you mentioned was exactly the one you would get with A20 disabled (0x100000 = bit 20 set).
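This check can be expressed as a tiny heuristic: two addresses that appear to change together and differ only in bit 20 (their XOR is exactly 0x100000) point strongly at a disabled A20 gate. A C sketch, with looks_like_a20_alias as a hypothetical name:

```c
#include <stdint.h>
#include <stdbool.h>

/* True if a and b differ in bit 20 and in no other bit. */
bool looks_like_a20_alias(uint32_t a, uint32_t b)
{
    return (a ^ b) == (UINT32_C(1) << 20);
}
```

A pair such as 0x0 and 0x10000 fails this check, since those addresses differ in bit 16 instead.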

Re: How is MOVSD supposed to work?

Posted: Fri Jun 10, 2016 11:07 am
by neon
This is why we have multiple functions to enable A20 in different ways. In a real boot loader, you would use one, test if it worked, and if not, try another way. Different environments support different methods for enabling A20 - there is no real standard.
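The "use one, test if it worked, try another" strategy can be sketched as a C simulation. Everything here is hypothetical: the machine struct stands in for RAM behind an A20 gate, the probe address 0x7DFE is an arbitrary choice, and the two method functions stand in for routines such as EnableA20_KKbrd and EnableA20_Bios. Real boot code would do the same wrap-around test in assembly with segment:offset pairs (for example 0000:7DFE and FFFF:7E0E, which both map to 0x7DFE when A20 is off).

```c
#include <stdint.h>
#include <stdbool.h>

#define RAM_SIZE 0x200000u  /* 2 MiB toy machine */

typedef struct {
    uint8_t ram[RAM_SIZE];
    bool a20;               /* state of the A20 gate */
} machine;

/* Every access goes through the gate: with A20 off, bit 20 is dropped. */
static uint32_t phys(const machine *m, uint32_t addr)
{
    return (m->a20 ? addr : addr & ~(UINT32_C(1) << 20)) % RAM_SIZE;
}

/* Wrap-around test: write different bytes below 1 MiB and at the same
 * address with bit 20 set. If the second write clobbers the first, the two
 * addresses alias and A20 is off. Original contents are restored. */
bool a20_is_enabled(machine *m)
{
    const uint32_t lo = 0x7DFE, hi = lo | 0x100000;  /* hypothetical probe */
    uint8_t saved_lo = m->ram[phys(m, lo)];
    uint8_t saved_hi = m->ram[phys(m, hi)];
    m->ram[phys(m, lo)] = 0x00;
    m->ram[phys(m, hi)] = 0xFF;
    bool enabled = m->ram[phys(m, lo)] == 0x00;
    m->ram[phys(m, hi)] = saved_hi;
    m->ram[phys(m, lo)] = saved_lo;
    return enabled;
}

typedef void (*a20_method)(machine *);

/* Hypothetical stand-ins for real enable routines. */
static void method_that_fails(machine *m) { (void)m; }
static void method_that_works(machine *m) { m->a20 = true; }

/* Try each method in turn until the test passes. Returns true on success;
 * *used is the index of the method that worked, or -1 if A20 was already on. */
bool enable_a20(machine *m, const a20_method *methods, int count, int *used)
{
    *used = -1;
    if (a20_is_enabled(m))
        return true;
    for (int i = 0; i < count; i++) {
        methods[i](m);
        if (a20_is_enabled(m)) {
            *used = i;
            return true;
        }
    }
    return false;
}
```

Testing before trying anything matters too: some BIOSes and emulators start with A20 already enabled, and then no method needs to run at all.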

I suspected A20 because of the address wrap effect: "BOTH 0x0000 and 0x100000 are being changed" is a pretty big giveaway.

Re: How is MOVSD supposed to work?

Posted: Fri Jun 10, 2016 8:45 pm
by SpyderTL
This is a hard lesson for software developers to learn, even professional ones.

Your computer is a machine. It is essentially a calculator that can be pre-programmed to calculate the same results, over and over, without fail. If you discover your program behaving in a way that defies logic, then you should immediately stop and verify each and every one of your assumptions. The odds are overwhelming that the machine is working correctly and that you have made an incorrect assumption somewhere. Either the code that you typed is not the code that you meant to type, or the system does not work the way that you think it does.

Probably 98% of the users on this forum have made the exact same mistake. And a few months from now, you'll see someone ask the same question about why writing to 0x100000 overwrites memory at 0x0. It's so common that it should probably be added to the Beginner Mistakes wiki page, but I doubt anyone would read it. :)