Introduction of nop 'fixes' rep movsb

Posted: Thu Oct 02, 2014 4:16 pm
by MrJonParr
Hi all,

I'm writing a bit of code in my bootloader that parses an ELF file (my kernel stub), loads it from disk and passes execution to its entry point.

To do this, I have written this snippet of code:

Code: Select all

   mov   edi, KERNEL_FILE_ADDR_ABS
   mov   eax, DWORD [di+28] ; e_phoff field
   add   edi, eax

   mov   eax, DWORD [di + 8] ; p_vaddr
   mov   ecx, DWORD [di + 16] ; p_filesz
   mov   ebx, DWORD [di + 20] ; p_memsz
   mov   edx, DWORD [di + 4] ; p_offset
 ; Idea is we now copy p_filesz bytes from p_offset to p_vaddr
 ; copy from ds:si to es:di ecx bytes
   cld
   nop   ; <---------------- This NOP here
   mov   esi, KERNEL_FILE_ADDR_ABS ; base of file
   add   esi, edx ; p_offset (from above)
   mov   edi, eax ; p_vaddr (from above)
   mov   ecx, ecx ; p_filesz (from above)
   rep   movsb ; copy the bytes
The issue I have is that when the above nop is not present and I step past 'rep movsb' in the debugger, bochs complains of "00016043556d[MEM0 ] Write outside the limits of physical memory (0x0000ca00bfa4) (ignore)" (note this only shows up when I raise the debug level in bochs itself) and appears to carry on with garbage registers.

If the nop is there, everything sails along happily and execution jumps to the kernel stub at the 1MB mark. If I remove it (a harmless nop, eh?), then my entry point gets corrupted and ultimately bad things happen.

Does anyone have any advice on debugging this? I've increased the debug level, but the physical write is what's confusing me: I dumped the registers (below) and they are identical between the two runs (with and without the physical address error). Is there some alignment restriction on rep movsb that I'm missing? From what I've read about x86 alignment, it only helps optimise loads and is not a strict requirement. What worries me is that I'm corrupting memory somewhere, or that the working version is just as corrupt as the other and only works by luck.

P.S. Not looking for someone to debug my problems, but some pointers on debugging via bochs or where to look would be most appreciated :) I've spent the best part of 2 hours adding and removing a nop to see any possible differences...

Registers at the point just before the rep movsb:

Code: Select all

eax: 0x00100000 1048576
ecx: 0x000009c4 2500
edx: 0x00001000 4096
ebx: 0x00001206 4614
esp: 0x00090000 589824
ebp: 0x00000000 0
esi: 0x0000da00 55808
edi: 0x00100000 1048576
eip: 0x0000089d

Re: Introduction of nop 'fixes' rep movsb

Posted: Thu Oct 02, 2014 4:44 pm
by Gigasoft
To catch memory corruption, you can use the watch command. Type help watch for details. This takes a physical address, so use the page command to look it up if necessary.

Re: Introduction of nop 'fixes' rep movsb

Posted: Fri Oct 03, 2014 2:25 am
by embryo
MrJonParr wrote:

Code: Select all

   mov   edi, KERNEL_FILE_ADDR_ABS
   mov   eax, DWORD [di+28] ; e_phoff field
   add   edi, eax
Why do you use DI instead of EDI? It is very error prone at least and at most it can be (as far as I can remember) not supported by the hardware.
MrJonParr wrote:

Code: Select all

   mov   ecx, ecx ; p_filesz (from above)
What is the meaning of such move?

And about your issue. It can be an assembly translator problem when you use indirect addressing with 16-bit registers in protected mode. You can check whether all instructions are exactly the same when debugging in bochs. If any instruction is different, you need to change your translator or just use ordinary 32-bit indirect addressing.

Re: Introduction of nop 'fixes' rep movsb

Posted: Fri Oct 03, 2014 2:35 am
by Combuster
Problems like these make me suspect you're writing code with the wrong BITS settings. How does your assembly file start, how do you compile, and in what processor mode/memory location is this supposed to run?

embryo wrote:it can be (as far as I can remember) not supported by the hardware.
Asian wisdom has it that not knowing that you know is best. Thinking that you know while you don't know is a disease. Go read the manual since this is wrong and very, very basic assembly.

Re: Introduction of nop 'fixes' rep movsb

Posted: Fri Oct 03, 2014 5:13 am
by MrJonParr
Hi all,

Thanks for the responses, I will attempt to debug this further in my lunch break and tonight but here's a bit more information.
Combuster wrote:Problems like these make me suspect you're writing code with the wrong BITS settings. How does your assembly file start, how do you compile, and in what processor mode/memory location is this supposed to run?
I've placed the entire loader below, but to explain it briefly: it's loaded by stage1 at 0x0:0x0500 and assembled with "nasm -f bin stage2.asm -o stage2.o". I also produce the listing file, but the bytes produced for the rep instructions appear to be identical between the two versions, except for addresses.

I'm in work now, but I will try the watchpoint on the physical address tonight, I hadn't thought of that at all.
embryo wrote:Why do you use DI instead of EDI? It is very error prone at least and at most it can be (as far as I can remember) not supported by the hardware.
Thanks for noticing that! It's been a while since I've done asm, this was simply an oversight. I've corrected all cases where DI should have been EDI, sadly however the problem still persists. I did note that the bytes produced were different though:

Code: Select all

<    511 0000037B 678B4508                   mov   eax, DWORD [di + 8] ; p_vaddr
<    512 0000037F 678B4D10                   mov   ecx, DWORD [di + 16] ; p_filesz
<    513 00000383 678B5D14                   mov   ebx, DWORD [di + 20] ; p_memsz
<    514 00000387 678B5504                   mov   edx, DWORD [di + 4] ; p_offset
---
>    511 0000037B 8B4708                     mov   eax, DWORD [edi + 8] ; p_vaddr
>    512 0000037E 8B4F10                     mov   ecx, DWORD [edi + 16] ; p_filesz
>    513 00000381 8B5F14                     mov   ebx, DWORD [edi + 20] ; p_memsz
>    514 00000384 8B5704                     mov   edx, DWORD [edi + 4] ; p_offset
So I'm positive that's a step forward at least.
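
The extra 0x67 byte in the first listing is the address-size override prefix. A minimal sketch of what the assembler does with the two spellings under `bits 32`, with the encodings taken from the listing above (the remark about only the low half of EDI being used is general x86 behaviour, not something stated in the thread):

```nasm
bits 32

    ; 16-bit addressing inside 32-bit code: NASM emits the 0x67 prefix and
    ; the effective address is formed from DI only (the low 16 bits of EDI).
    mov   eax, dword [di + 8]     ; 67 8B 45 08

    ; 32-bit addressing: no prefix, the full EDI is used.
    mov   eax, dword [edi + 8]    ; 8B 47 08
```

As long as EDI holds a value below 0x10000, as it does here with the file at 0xCA00, both forms read the same location, which would be consistent with the problem persisting after the change.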

Here is stage2.asm.

Code: Select all

   ;; Stage 2 Bootloader
   ;; We are loaded at 0x0050:0x0000 == 0x500
   org   0x500
   bits  16

   %define ROOT_TABLE_ADDR 0xA000 ;; 0000:A000 == 0050:9B00 == 07C0:2400
   %define FAT_ADDR  0xBC00 ;; 0000:BC00 == 0050:B700 == 07C0:4000
   ;; This address is only when in real-mode
   %define KERNEL_FILE_ADDR 0xC500 ;; Overwrite the root table
   ;; This address is only when in protected mode
   %define KERNEL_FILE_ADDR_ABS 0xCA00

   ;; To break here, b at 0x0050:0000
   jmp   start

   ;;
   ;; BIOS parameter block, local variables etc. all
   ;; used for filesystem/util.asm. This certainly
   ;; has room for improvement, but for the moment it works.
   ;;
   bpbBytesPerSector:	DW 512
   bpbSectorsPerCluster: 	DB 1
   bpbReservedSectors: 	DW 1
   bpbNumberOfFATs: 	DB 2
   bpbRootEntries: 	DW 224
   bpbTotalSectors: 	DW 2880
   bpbMedia: 	        DB 0xF0
   bpbSectorsPerFAT: 	DW 9
   bpbSectorsPerTrack: 	DW 18
   bpbHeadsPerCylinder: 	DW 2
   bsDriveNumber: 	        DB 0
   absoluteSector db 0x00
   absoluteHead   db 0x00
   absoluteNumber db 0x00
   absoluteTrack  db 0x00
   msgProgress  db ".", 0

   datasector     dw 0x0000

   %include "filesystem/util.asm"

   ;; String data
   welcome_msg db "Welcome to Stage 2 :)", 13,10, 0
   loadded_gdt_msg db "GDT has been loaded.", 13, 10, 0
   kernel_searching db "Searching for kernel...", 13, 10, 0
   kernel_loading db "Kernel found, Loading...", 13, 10, 0
   kernel_not_loading db "Kernel not found.", 13, 10, 0
   KernelName db "KERNEL  BIN", 13, 10, 0
   elf_magic_not_found_msg db "ELF magic not found in kernel.", 13, 10, 0
   elf_pf_x_msg db "PF_X ", 0
   elf_pf_w_msg db "PF_W ", 0
   elf_pf_r_msg db "PF_R ", 0

   elf_wrong_type_msg db "ELF not ET_EXEC.", 13, 10, 0
   elf_magic_found_msg db "ELF magic found in kernel.", 13, 10, 0
   error_occurred db "An error has occurred.", 13, 10, 0

   ;; Define the GDT
   ;;
   ;; Definition taken from http://www.brokenthorn.com/Resources/OSDev8.html
   ;;
   ;;    Bits 56-63: Bits 24-31 of the base address
   ;;    Bit 55: Granularity
   ;;        0: None
   ;;        1: Limit gets multiplied by 4K
   ;;    Bit 54: Segment type
   ;;        0: 16 bit
   ;;        1: 32 bit
   ;;    Bit 53: Reserved-Should be zero
   ;;    Bits 52: Reserved for OS use
   ;;    Bits 48-51: Bits 16-19 of the segment limit
   ;;    Bit 47 Segment is in memory (Used with Virtual Memory)
   ;;    Bits 45-46: Descriptor Privilege Level
   ;;        0: (Ring 0) Highest
   ;;        3: (Ring 3) Lowest
   ;;    Bit 44: Descriptor Bit
   ;;        0: System Descriptor
   ;;        1: Code or Data Descriptor
   ;;    Bits 41-43: Descriptor Type
   ;;        Bit 43: Executable segment
   ;;            0: Data Segment
   ;;            1: Code Segment
   ;;        Bit 42: Expansion direction (Data segments), conforming (Code Segments)
   ;;        Bit 41: Readable and Writable
   ;;            0: Read only (Data Segments); Execute only (Code Segments)
   ;;            1: Read and write (Data Segments); Read and Execute (Code Segments)
   ;;    Bit 40: Access bit (Used with Virtual Memory)
   ;;    Bits 16-39: Bits 0-23 of the Base Address
   ;;    Bits 0-15: Bits 0-15 of the Segment Limit

gdt_null_descriptor:
   dd    0
   dd    0
gdt_code_descriptor:
   dw    0FFFFh                 ; limit low
   dw    0                      ; base low
   db    0                      ; base middle
   ;;           0  - Access bit (virtual memory usage)
   ;;          1 - Readable/Writeable + Executable
   ;;         0 - Expansion direction
   ;;        1 - Code descriptor (1=Code, 0=Data)
   ;;       1 - System / Code/Data descriptor (set so code)
   ;;     00 - Ring 0 / Ring 3
   ;;    1 - Segment is in memory
   db    10011010b              ; access
   ;;        1111 - Bits 16-19 of Segment limit
   ;;       0 - Reserved for our OS
   ;;      0 - Reserved, must be zero.
   ;;     1 - 16/32bit, 1 = 32bit
   ;;    1 - 1 = 4KB, 0 = no multiplier
   db    11001111b              ; granularity
   db    0                      ; base high
gdt_data_descriptor:
   dw    0FFFFh                 ; limit low
   dw    0                      ; base low
   db    0                      ; base middle
   ;;           0  - Access bit (virtual memory usage)
   ;;          1 - Readable/Writeable + Executable
   ;;         0 - Expansion direction
   ;;        0 - Data descriptor (1=Code, 0=Data)
   ;;       1 - System / Code/Data descriptor (set so code)
   ;;     00 - Ring 0 / Ring 3
   ;;    1 - Segment is in memory
   db    10010010b              ; access
   ;;        1111 - Bits 16-19 of Segment limit
   ;;       0 - Reserved for our OS
   ;;      0 - Reserved, must be zero.
   ;;     1 - 16/32bit, 1 = 32bit
   ;;    1 - 1 = 4KB, 0 = no multiplier
   db    11001111b              ; granularity
   db    0                      ; base high
gdt_end_of_gdt:
gdt_description:
   dw    gdt_end_of_gdt - gdt_null_descriptor - 1 ;; Size of GDT
   dd    gdt_null_descriptor    ;; Base of GDT (note, ORG must be set to correct origin otherwise this pointer isn't adjusted correctly)

start:
   cli
   xor   ax,ax
   mov   ds,ax
   mov   es,ax
   mov   ax, 09000h
   mov   ss, ax
   mov   sp, 0FFFFh
   sti

   mov   si, welcome_msg
   call  printstring

   ;; This is where we begin to enter protected
   ;; mode and enable larger memory addressing via A20.

   ;; Clear the interrupts
   cli
   ;; Load the GDT
   lgdt [gdt_description]
   sti

   ;; Enable A20 address line
   cli
   mov   al, 0xdd	            ; send enable a20 address line command to controller
   out   0x64, al

   ;; Search for kernel file on file system
   mov si, kernel_searching
   call printstring
   mov si, KernelName
   call printstring

   mov   cx, WORD [bpbRootEntries]
   mov   di, ROOT_TABLE_ADDR
   mov   si, KernelName
   call  fat12_find_file
   test  di, di                 ;; returns 0 di if not found.
   jnz    image_file_found
   jmp    image_file_not_found
image_file_not_found:
   mov si, kernel_not_loading
   call printstring
   cli
   hlt
   jmp $
   ;; Load the found kernel
image_file_found:
   mov si, kernel_loading
   call printstring

   ; Make sure [datasector] is setup for fat12_load_file.
   xor   cx, cx
   xor   dx, dx
   mov   ax, 0x0020             ; 32 byte directory entry
   mul   WORD [bpbRootEntries]  ; total size of directory
   div   WORD [bpbBytesPerSector] ; sectors used by directory
   xchg  ax, cx
   mov   al, BYTE [bpbNumberOfFATs] ; number of FATs
   mul   WORD [bpbSectorsPerFAT] ; sectors used by FATs
   add   ax, WORD [bpbReservedSectors] ; adjust for bootsector
   mov   WORD [datasector], ax  ; base of root directory
   add   WORD [datasector], cx

   ;; Now we've found our kernel, we have to load it into memory
   ;; and decode the ELF headers enough to execute it.
   mov ax, 0x0050
   mov es, ax
   mov ax, WORD [di + 0x1A] ; first cluster number
   mov bx, FAT_ADDR
   mov di, KERNEL_FILE_ADDR
   ;; Load kernel file at [es]0x0050:[di]KERNEL_FILE_ADDR
   call fat12_load_file

elf_test_magic:
   mov di, KERNEL_FILE_ADDR_ABS
   mov al, 0x7f
   cmp al, BYTE [di]
   jne elf_magic_not_found
   mov al, 'E'
   cmp al, BYTE [di+1]
   jne elf_magic_not_found
   mov al, 'L'
   cmp al, BYTE [di+2]
   jne elf_magic_not_found
   mov al, 'F'
   cmp al, BYTE [di+3]
   jne elf_magic_not_found
elf_test_type:
   mov ax, 0x0002 ; ET_EXEC
   cmp   ax, WORD [di+16] ; e_type is offset 0x10 in header
   jne elf_wrong_type
 ; get the e_phoff field for program header offset
 ; ident (16) + type (2) + machine (2) + version (4)
 ; + entry (4) 28
   mov   eax, DWORD [di+28] ; e_phoff field
   mov   edi, KERNEL_FILE_ADDR_ABS
   add   edi, eax
   ;; edi now points to program header
   ;; check it's PT_LOAD type
   mov   eax, 0x00000001 ; PT_LOAD=0x00000001
   cmp   eax, DWORD [di]
   jne   error
   mov   eax, DWORD [di + 8] ; p_vaddr
   mov   ebx, DWORD [di + 16] ; p_filesz
   mov   ecx, DWORD [di + 20] ; p_memsz
   mov   edx, DWORD [di + 24] ; p_flags

elf_test_executable:
   test  edx, 1
   jz    elf_test_writeable
   mov   si, elf_pf_x_msg
   call  printstring
elf_test_writeable:
   test  edx, 2
   jz    elf_test_readable
   mov   si, elf_pf_w_msg
   call  printstring
elf_test_readable:
   test  edx, 4
   jz    elf_magic_found
   mov   si, elf_pf_r_msg
   call  printstring

elf_magic_found:
   mov si, elf_magic_found_msg
   call printstring

elf_load_kernel:
   call  execute_kernel

elf_magic_not_found:
   mov si, elf_magic_not_found_msg
   call printstring
   jmp error
elf_wrong_type:
   mov si, elf_wrong_type_msg
   call printstring
   jmp error

error:
   mov si, error_occurred
   call printstring
   cli
   hlt
   jmp $

execute_kernel:

   jmp pmode_enable

pmode_enable:
   ;; Enable Protected Mode
   ;; by setting bit 0 in cr0
   mov   eax, cr0
   or    eax, 1
   mov   cr0, eax

   ;; Jump to protected mode execution section
   ;; using newly defined GDT code descriptor.
   jmp 0x8:protected_mode

   ;; HALT forever
   cli
   hlt
   jmp $

printstring:
   lodsb
   or    al, al
   jz    printdone
   mov   ah, 0x0E
   int   0x10
   jmp   printstring
printdone:
   ret

bits  32                     ;; 32-bit mode is active from here on out.

protected_mode:
   mov   eax, 0x10               ;; Setup data selectors
   mov   edx, eax
   mov   ds, eax
   mov   ss, ax
   mov   es, ax
   mov   fs, ax
   mov   gs, ax
   mov   esp, 0x90000

 ; lookup p_vaddr location to load kernel (may not be entry point)
 ; lookup p_filesz size of file on disk (in memory)
 ; lookup p_memsz if larger than p_filesz excess should be zeroed
 ; lookup p_flags, unused at this point in time (should be PF_X+PF_R)
   mov   edi, KERNEL_FILE_ADDR_ABS
   mov   eax, DWORD [edi+28] ; e_phoff field
   add   edi, eax

   mov   eax, DWORD [edi + 8] ; p_vaddr
   mov   ecx, DWORD [edi + 16] ; p_filesz
   mov   ebx, DWORD [edi + 20] ; p_memsz
   mov   edx, DWORD [edi + 4] ; p_offset

 ; Idea is we now copy p_filesz bytes from p_offset to p_vaddr
 ; copy from ds:si to es:di ecx bytes
 ; We do this here (in protected mode) and not when we had loaded
 ; the image as we cannot access above 64K at that point.
   mov   esi, KERNEL_FILE_ADDR_ABS ; base of file
   add   esi, edx ; p_offset (from above)
   mov   edi, eax ; p_vaddr (from above)
;   mov   ecx, ecx ; p_filesz (from above)
   rep   movsb ; copy the bytes

   ;; Copy the previously loaded kernel to the 1MB address
   ;; and begin its execution (now we're in protected mode
   ;; with A20 gate enabled).
   ;; 1MB address is 0x100000
   mov   edi, KERNEL_FILE_ADDR_ABS
   mov   eax, DWORD [edi+24] ; e_entry field
   jmp   eax ; jump to elf entry point

halt:
   cli
   hlt
   jmp $
Thanks,
Jon.

Re: Introduction of nop 'fixes' rep movsb

Posted: Fri Oct 03, 2014 5:42 am
by Combuster

Code: Select all

   mov   ax, 09000h
   mov   ss, ax
   mov   sp, 0FFFFh
Nope.

Re: Introduction of nop 'fixes' rep movsb

Posted: Fri Oct 03, 2014 7:34 am
by MrJonParr
Combuster wrote:

Code: Select all

   mov   ax, 09000h
   mov   ss, ax
   mov   sp, 0FFFFh
Nope.
Ok thanks for that, I now zero SS to match the other segment registers. I believe that's now correct.

Thanks,
Jon.

Re: Introduction of nop 'fixes' rep movsb

Posted: Fri Oct 03, 2014 7:42 am
by Octocontrabass
MrJonParr wrote:Ok thanks for that, I now zero SS to match the other segment registers.
Why? What problem does this change solve?
MrJonParr wrote:I believe that's now correct.
I don't.

Re: Introduction of nop 'fixes' rep movsb

Posted: Fri Oct 03, 2014 9:42 am
by Gigasoft
You should place the stack where it won't overlap with anything else, such as your code, the loaded image or the EBDA.

Re: Introduction of nop 'fixes' rep movsb

Posted: Fri Oct 03, 2014 10:32 am
by iansjack
You also need to align it properly. Why do so many people set SP to the obviously incorrect value 0xFFFF?
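
A minimal sketch of a real-mode stack setup that addresses both the placement and the alignment objections. The choice of 0x0000:0x7C00 is an assumption (a common convention, and only valid if nothing the loader still needs lives just below that address); `STACK_TOP` is a hypothetical name, not something from the posted source.

```nasm
bits 16

STACK_TOP equ 0x7C00       ; hypothetical: an even offset, stack grows down from here

    cli                    ; no interrupts while SS:SP is only half updated
    xor   ax, ax
    mov   ss, ax           ; SS = 0x0000
    mov   sp, STACK_TOP    ; SP is even, so every 16-bit push is word-aligned
    sti
```

With SS = 0x9000 and SP = 0xFFFF, by contrast, the first push stores a word at the odd linear address 0x9FFFD, in the last kilobyte below 0xA0000, where an EBDA is often located.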

Re: Introduction of nop 'fixes' rep movsb

Posted: Fri Oct 03, 2014 11:10 am
by Antti
iansjack wrote:You also need to align it properly. Why do so many people set SP to the obviously incorrect value 0xFFFF?
That is a good question. It works but is like putting shoes on wrong feet.

In general, another simple thing is the direction flag. Even Linux 0.0.1 had it undefined in its boot sector. iansjack, you should fix that in your boot sector. Fortunately, BIOS is more likely to leave it cleared.

Re: Introduction of nop 'fixes' rep movsb

Posted: Fri Oct 03, 2014 12:47 pm
by MrJonParr
Antti wrote:
iansjack wrote:You also need to align it properly. Why do so many people set SP to the obviously incorrect value 0xFFFF?
That is a good question. It works but is like putting shoes on wrong feet.

In general, another simple thing is the direction flag. Even Linux 0.0.1 had it undefined in its boot sector. iansjack, you should fix that in your boot sector. Fortunately, BIOS is more likely to leave it cleared.
I can't comment on others, but personally I was following the tutorial series at brokenthorn http://www.brokenthorn.com/Resources/OSDev6.html, which uses the value 0xFFFF for SP in its demo source code.

I've added the cld instruction prior to the rep movsb earlier, and cleaned up the spurious sti/cli code a little since the version posted here.
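
A sketch of the copy with the direction flag set explicitly, assuming the register assignments used in stage2.asm (ESI = source, EDI = p_vaddr, ECX = p_filesz, EBX = p_memsz) and a flat ES with base 0. Zeroing the p_memsz - p_filesz tail is what the comment in the loader says should happen; the label `copy_segment` is hypothetical.

```nasm
bits 32

; In:  ESI = source, EDI = destination, ECX = p_filesz, EBX = p_memsz
; Assumes p_memsz >= p_filesz, as the ELF format requires for PT_LOAD.
copy_segment:
    cld                    ; DF = 0, so ESI/EDI count upwards
    sub   ebx, ecx         ; EBX = bytes to zero after the copied data
    rep   movsb            ; copy p_filesz bytes; leaves ECX = 0
    mov   ecx, ebx
    xor   eax, eax
    rep   stosb            ; EDI already points just past the copy
    ret
```

With the register dump from the first post (ECX = 0x9C4, EBX = 0x1206), this would copy 0x9C4 bytes and then zero a further 0x842.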

My understanding is now this: SS is the segment register used together with the SP register by instructions such as push, pop and ret. So I need to decide on a free region of memory where SS:SP should point (which will represent the top of the stack) and set the registers appropriately, ensuring alignment is met. Reading about the EBDA, I can query its size via int 0x12, or assume it's at most 128k before the video memory, but I'll look into that more later. I think I'll try to clean up these issues and see if it helps with debugging this problem further.
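
A sketch of the INT 0x12 query for real mode. INT 0x12 returns in AX the amount of contiguous conventional memory in KiB; where an EBDA exists it usually begins at that boundary, so the call gives an upper limit for the stack rather than the EBDA size directly. The label `get_low_mem_top` is hypothetical.

```nasm
bits 16

; Out: AX = first segment above usable conventional memory
get_low_mem_top:
    int   0x12             ; AX = KiB of conventional memory starting at 0
    shl   ax, 6            ; KiB -> segment: (AX * 1024) / 16 = AX * 64
    ret
```

For example, a BIOS reporting 639 KiB yields segment 0x9FC0, so a stack kept entirely below linear address 0x9FC00 would stay clear of that region.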

Thanks again all,
Jon.

Re: Introduction of nop 'fixes' rep movsb

Posted: Fri Oct 03, 2014 3:05 pm
by MrJonParr
Well, I'm happy to say that I know why the program was corrupting so horribly, and that my suspicion that it was broken both with and without the nop was correct.

Something Gigasoft (ty) mentioned about memory overlapping seemed promising. After painfully stepping through the rep movsb in bochs several times, I realised that the 'corruption' occurred when the copy was ~0x800 bytes in, which was interestingly close to eip at that stage: once the write that 'matched' eip had occurred, the next instruction was overwritten with the copied data. It was at this point I realised that memory locations 0x0010088C and 0x0000088C were aliasing, which I confirmed via writepmem in bochs.

So it turns out my code to enable the A20 address line had a typo. After fixing that, everything seems to be much smoother; even my interrupt code, which is in its initial stages, managed to execute its first software interrupt routine :).
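
The thread does not show the corrected code. For comparison, here is a sketch of one widely used alternative, the "fast A20" gate on system control port A (0x92). This is not the keyboard-controller method used in stage2.asm, it is not available on every machine, and the label `enable_a20_fast` is hypothetical.

```nasm
bits 16

enable_a20_fast:
    in    al, 0x92
    test  al, 0x02         ; bit 1 set means A20 is already enabled
    jnz   .done
    or    al, 0x02         ; set the A20 gate bit
    and   al, 0xFE         ; keep bit 0 clear: writing 1 there resets the CPU
    out   0x92, al
.done:
    ret
```

Whichever method is used, it is worth verifying the result afterwards rather than assuming the write took effect.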

I'm still going to go through the rest of the memory layout and ensure it's all correct. I may even write a test for whether A20 is enabled, after having had plenty of debug time to appreciate its importance...
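
A minimal sketch of such a test for 32-bit protected mode, assuming flat segments with base 0 and an `org` that makes labels absolute addresses, as in stage2.asm. It writes different values to two addresses exactly 1 MiB apart and checks whether the second write clobbered the first. `check_a20` and `a20_scratch` are hypothetical names.

```nasm
bits 32

; Out: EAX = 1 if A20 is enabled, 0 if addresses 1 MiB apart alias.
check_a20:
    push  ebx
    mov   ebx, [a20_scratch + 0x100000]        ; save whatever lives 1 MiB up
    mov   dword [a20_scratch], 0x12345678
    mov   dword [a20_scratch + 0x100000], 0x87654321
    cmp   dword [a20_scratch], 0x12345678      ; changed => the two addresses alias
    mov   [a20_scratch + 0x100000], ebx        ; restore (MOV leaves the flags alone)
    sete  al                                   ; AL = 1 if the low copy survived
    movzx eax, al
    pop   ebx
    ret

a20_scratch: dd 0
```

Called right after the jump to protected_mode and before the rep movsb, a result of 0 would flag the kind of aliasing seen here between 0x0010088C and 0x0000088C.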

Cheers,
Jon.