Problem with rep movsb to >1mb, unreal mode, bochs

Post by theseankelly »

Hey All

I'm seeing a problem that's specific to Bochs (VirtualBox executes my code and the kernel executes just fine) but I'm fairly convinced there's an error in my code, and I'm just lucky VirtualBox works.

So, I have a loadkernel function that loads the kernel from a CD through the int13 extensions to somewhere below 1 MB, and then uses a rep movsb to move it all up to 1 MB. Naturally, the function requires that I be in unreal mode to execute properly. However, when I step through my code, the "rep movsb" line causes Bochs to complain with "write_virtual_checks(): write beyond limit, r/w" for every repetition.

I've done my homework on this error, and all I've found is that it could have to do with a messed-up GDT or an overgrown stack. I'm not sure how to verify whether my GDT is set up properly, but I can jump to the kernel just fine using a data segment in VirtualBox, or in Bochs if I change the rep movsb to move the kernel somewhere else in lower memory (e.g., from 0x1000 to 0x8000). As for the stack, my stage2 is loaded at 0x600 and my stack also starts at 0x600. At this point in stepping through my code SP is 0x5fc, so I should be fine on that front.

The fact that rep movsb from 0x1000 to 0x8000 works suggests that something is wrong with how I've entered unreal mode... but then again, why would VirtualBox work if I'm doing that wrong?

Anyway, here's the code for the loadkernel routine:

Code:

; loadkernel
; 16 bit, real mode
; loads kernel to specified location(s) KERNEL_LOC and KERNEL
; KERNEL_LOC is the place in < 1mb of memory to cache kernel
; KERNEL is the place in upper memory where the kernel will live
; (BIOS can't write above 1mb, so we have to write to KERNEL_LOC
; and then move it up to KERNEL - works because we're in unreal mode)
; assumes:    We're in unreal mode, kernel can fit in lower memory
; input:      none
; output:     ax = 0 on success, nonzero on fail
; destroyed:  eax, bx, cx, esi, edi
loadkernel:
            mov    eax, [kernsize]             ; set up DAP
            mov    word [numsect], ax
            mov    eax, [kernloc]
            mov    dword [lbanum], eax
            mov    word [destoff], KERNEL_LOC  ; load to < 1mb
            mov    word [destseg], ds
            mov    si, DAP
            mov    ah, 0x42
            int    0x13            ; do the read
            jc    .done
                                  ; now we have to move to > 1mb
            mov   esi,KERNEL_LOC  ; place we're moving from
            mov   edi,KERNEL      ; place we're moving to

            mov   cx, [sectsize]  ; number of bytes to transfer (1 sector)
            mov   bx, [kernsize]  ; loop variable - # of sectors

.loop:      xchg  bx, bx
            a32   rep movsb       ; copies # of bytes in cx, increments esi and edi
                                  ; 'a32' tells it to use esi/edi instead of si/di
            dec   bx              ; i--
            jnz   .loop

            mov   ax,0            ; error code = success
.done:      ret

I can verify in the debugger that all variables (sectsize, kernsize, KERNEL, etc.) are being set and interpreted correctly. To put this in context, here's my entire stage2.S, which sets up the GDT, enters unreal mode, and calls this function:

Code:

; Stage2 Bootloader
; FS Agnostic
; Expects a stage1 or install utility to fill in kernloc, kernsize, sectsize
; 
; Loads kernel to KERNEL and jumps into it

[bits 16]                 ; using 16 bit assembly
[org 0x0600]    

;--------------------------------------------------------
; Environment Variables
;--------------------------------------------------------
KERNEL_LOC    equ  0x1000
KERNEL        equ 0x100000

start: 
      jmp    real_start

      times 8-($-$$) db 0

kernloc    dd  0      ; LBA offset
kernsize  dd  0      ; size in SECTORS
sectsize  dd   0      ; size of a sector on drive in dl

real_start:  

      mov    si, st_loaded
      call  putstr
      
      mov    si, st_chkkern
      call  putstr
      cmp    dword [kernloc], 0
      jz    .fail
      cmp    dword [kernsize], 0
      jz    .fail
      mov    si, success
      call  putstr

      ; Enable A20 so we can access > 1mb of ram
      mov    si, st_a20        
      call  putstr
      call  enablea20     ; call the a20 enabler
      cmp   ax,0          ; did we succeed?
      jnz   .fail          ; if ax is nonzero, no
       mov    si, success 
      call  putstr
        
      ; Load GDT  
      mov    si, st_gdt 
      call  putstr
      xor   ax, ax
      mov   ds, ax
      lgdt  [gdt_desc]    ; load the gdt
      mov    si, success
      call  putstr

      push  ds            ; switch to unreal mode
      mov   eax, cr0       ; because we need to put kernel at 1mb 
      or    al,1          ; basically, switch to pmode 
      mov   cr0, eax
      mov   bx, 0x08      ;load a selector 
      mov   ds, bx   
      and   al, 0xFE
      mov   cr0, eax      ; switch back to "unreal"
      pop   ds            ; restore segment

      ; load kernel into memory
       mov    si, st_ldkern 
      call  putstr
      call  loadkernel
      cmp   ax, 0
      jnz   .fail
      mov    si, success
      call  putstr
  
      ; Switch to pmode
       mov    si, st_pmode
      call  putstr
      mov   eax, cr0      ; get current val
      or    al, 1         ; set that bit
      mov   cr0, eax      ; write it back
      
      jmp   0x10:start_pmode ; jump to clear pipeline of non-32b inst
      jmp .end 
.fail:mov    si, fail
      call  putstr 
.end:  sti
      hlt                 ; Something foul happened :(
      jmp   .end          ; just in case we get woken up
     

;------------------------------------------------------------
; Subroutines 
;------------------------------------------------------------
      
; enablea20
; 16bit, real mode
; function to enable the a20 gate on a processor
; (allows access to greater range of memory)
; assumes:    none
; input:      none
; output:     ax=0 on success, nonzero on failure
; destroyed:  ax
enablea20:
          call  wait_for_kbd_in   ; wait for kbd to clear

          mov   al, 0xD0          ; command to read status
          out   0x64, al

          call  wait_for_kbd_out  ; wait for kbd to have data

          xor   ax, ax            ; clear ax 
          in    al, 0x60          ; get data from kbd
          push  ax                ; save value
          
          call  wait_for_kbd_in   ; wait for keyboard to clear
          mov   al, 0xD1          ; command to write status
          out   0x64, al
          call  wait_for_kbd_in   ; wait for keyboard to clear
          pop   ax                ; get the old value 
          or    al, 00000010b     ; set A20 bit
          out   0x60, al          ; write it back

          call  wait_for_kbd_in   ; double check that it worked
          mov   al, 0xD0          ; same process as above to read 
          out   0x64, al
          
          call  wait_for_kbd_out
          xor   ax,ax
          in    al, 0x60
          bt    ax, 1             ; is the A20 bit enabled?
          jc    .success
          
          mov   ax, 1             ; code that we failed
          jmp   .return 
         
.success: mov   ax, 0             ; code that we succeeded
.return:  ret


; wait_for_kbd_in
; 16 bit, real mode
; checks to see whether keyboard controller can be written to
; assumes:    none
; input:      none
; output:     none
; destroyed:  ax
wait_for_kbd_in:
          in    al, 0x64         ; read the port
          bt    ax, 1            ; see if bit 1 is 0 or not
          jc    wait_for_kbd_in  ; if it isn't, loop
          ret   


; wait_for_kbd_out
; 16 bit, real mode
; checks to see whether keyboard controller has data to read
; assumes:    none
; input:      none
; output:     none
; destroyed:  ax
wait_for_kbd_out:
          in    al, 0x64         ; read the port
          bt    ax, 0            ; see if bit 0 is 1 or not
          jnc   wait_for_kbd_out ; if it isn't, loop
          ret  


; loadkernel
; 16 bit, real mode
; loads kernel to specified location(s) KERNEL_LOC and KERNEL
; KERNEL_LOC is the place in < 1mb of memory to cache kernel
; KERNEL is the place in upper memory where the kernel will live
; (BIOS can't write above 1mb, so we have to write to KERNEL_LOC
; and then move it up to KERNEL - works because we're in unreal mode)
; assumes:    We're in unreal mode, kernel can fit in lower memory
; input:      none
; output:     ax = 0 on success, nonzero on fail    
; destroyed:  eax, bx, cx, esi, edi 
loadkernel:
            mov    eax, [kernsize]             ; set up DAP
            mov    word [numsect], ax
            mov    eax, [kernloc]
            mov    dword [lbanum], eax
            mov    word [destoff], KERNEL_LOC  ; load to < 1mb
            mov    word [destseg], ds  
            mov    si, DAP
            mov    ah, 0x42
            int    0x13            ; do the read
            jc    .done
                                  ; now we have to move to > 1mb 
            mov   esi,KERNEL_LOC  ; place we're moving from
            mov   edi,KERNEL      ; place we're moving to
 
            mov   cx, [sectsize]  ; number of bytes to transfer (1 sector)
            mov   bx, [kernsize]  ; loop variable - # of sectors
            
.loop:      xchg  bx, bx 
            a32   rep movsb       ; copies # of bytes in cx, increments esi and edi
                                  ; 'a32' tells it to use esi/edi instead of si/di
            dec   bx              ; i-- 
            jnz   .loop
            
            mov   ax,0            ; error code = success
.done:      ret


; putstr
;  16 bit, real mode
;  Prints a null terminated string to screen
; input:       string address to be in si
; output:     none
; destroyed:  ax, bx
putstr:
        mov    ah, 0x0E    ; function for printing
        mov    bh,  0x00    ; page number
        mov    bl, 0x07    ; color  
        
.ldchr:  lodsb              ; put a byte of the string into al
        cmp    al, 0
        je    .done       ; if it's null/zero, all done
        int    0x10        ; do the print
        jmp    .ldchr      ; go to next char  
  
.done:  ret


; start_pmode
; label in 32bit assembly used in the far jump to clear pipeline for switching from
; real16bit to protected32bit mode
[BITS 32]
start_pmode:
            mov ax, 0x08        ; need to load data segment into ds/ss
            mov ds, ax
            mov ss, ax
            mov esp, 0x090000   ; move stack pointer to 090000h, gives us a 65kb stack
                                ; this is probably a weird place for a stack long term
            jmp 0x10:KERNEL     ; jmp to kernel!

;----------------------------------------------------------
; Data
;----------------------------------------------------------

; Disk Address Packet
; (data structure used by int13 ah=42)
DAP:
          db    0x10      ; size of this packet
          db    0          ; always zero
numsect   dw    0          ; number of sectors to transfer
destoff   dw    0          ; segment and offset in mem
destseg   dw    0
lbanum    dd    0          ; lba to read
lbanum2   dd    0          ; extra space for lba offset


; GDT
gdt:      dq   0                           ; need a null segment
          dw  0xFFFF, 0, 0x9200, 0x00CF  ; data
          dw  0xFFFF, 0, 0x9A00, 0x00CF  ; code
gdt_end:
gdt_desc:
          dw  gdt_end - gdt - 1       ; first word is expected to be size of gdt-1
          dd  gdt                     ; then the gdt address

; Strings for printing status
st_loaded          db  'seanOS 2nd Stage Bootloader',13,10,0
st_chkkern        db  'Checking Kernel Info................',0
st_a20            db  'Enabling A20........................',0
st_gdt             db  'Loading GDT.........................',0
st_ldkern         db  'Loading Kernel......................',0
st_pmode          db  'Switching to PMode..................',0
; Success and fail strings;
success            db  'Done',13,10,0
fail              db  'Fail',13,10,0

Also, another red flag is that loading the kernel to 0x1000 when this second stage is loaded at 0x0600 is potentially cutting it close, but that's not the problem right now: changing 0x1000 to 0x8000 doesn't help.

Thanks so much in advance.
Sean

Post by Brendan »

Hi,

There are multiple problems here.

First, the destination for "rep movsb" is always ES:EDI, and your code to enable unreal mode only changes the DS segment limit. ES therefore still has a 64 KiB segment limit, and the "rep movsb" should cause a general protection fault as soon as it writes past that limit.
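A minimal sketch of that fix, assuming the same flat data descriptor at selector 0x08 and the same load address as the stage2 listing. The routine name `enter_unreal` is made up for illustration, and the `pushf`/`cli`/`popf` bracket is an addition that the original sequence does not have. The essential change is that ES is loaded while protection is enabled, so its cached limit becomes 4 GiB as well:

```nasm
[bits 16]
[org 0x0600]

; enter_unreal (hypothetical name)
; assumes:    DS = 0, A20 enabled, selector 0x08 is a flat 4 GiB data segment
; destroyed:  eax, bx
enter_unreal:
            pushf                 ; remember the caller's interrupt flag
            cli                   ; no interrupts while PE is set (no pmode IDT)
            push  ds
            push  es
            lgdt  [gdt_desc]      ; (re)load GDTR in case something changed it
            mov   eax, cr0
            or    al, 1           ; set PE
            mov   cr0, eax
            mov   bx, 0x08        ; flat data selector
            mov   ds, bx          ; caches a 4 GiB limit for DS
            mov   es, bx          ; ... and for ES, the destination of movs
            and   al, 0xFE        ; clear PE
            mov   cr0, eax
            pop   es              ; real-mode segment values again;
            pop   ds              ; the cached limits stay in place
            popf
            ret

gdt:        dq    0                           ; null descriptor
            dw    0xFFFF, 0, 0x9200, 0x00CF   ; flat data
gdt_end:
gdt_desc:   dw    gdt_end - gdt - 1
            dd    gdt
```

Reloading DS and ES in real mode only changes their base, which is why the pops at the end do not undo the larger limits.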

When the address size for "rep movsb" is 32-bit it copies bytes until ECX=0. You only load CX. This means that if ECX happened to contain the value 0x87654321 beforehand, then you'd load the sector size 0x0200 into CX without clearing the highest bits of ECX, and ECX would end up being equal to 0x87650200. That's a lot more bytes than you intend to copy. The easiest way to fix this would be "movzx ecx,word [sectsize]".

Also, you only load CX once. You copy the first sector with "rep movsb" which leaves ECX = 0; then when you try to copy the second sector (and third sector, fourth, etc) ECX is still zero so you copy nothing. The easy way to fix this would be to load ECX just before doing the "rep movsb" (inside the loop, not outside the loop). A better idea might be to copy all of the sectors with one "rep movsb" and get rid of the "while(bx > 0)" loop.
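Putting the last two points together, here is a sketch of a single copy that loads all 32 bits of ECX and needs no outer loop. It assumes DS and ES are both 0 with unreal-mode limits, and that the total size is a multiple of 4 (true for 512- and 2048-byte sectors). `copy_kernel` is a hypothetical name; the other names come from the stage2 listing:

```nasm
[bits 16]
[org 0x0600]

KERNEL_LOC  equ   0x1000
KERNEL      equ   0x100000

; copy_kernel (hypothetical name)
; assumes:    DS = ES = 0, both with unreal-mode (4 GiB) limits
; destroyed:  ecx, esi, edi
copy_kernel:
            mov   ecx, [sectsize]     ; full 32-bit load, no stale upper half
            imul  ecx, [kernsize]     ; ecx = total bytes to copy
            shr   ecx, 2              ; ... expressed as dwords
            mov   esi, KERNEL_LOC     ; source, below 1 MiB
            mov   edi, KERNEL         ; destination, at 1 MiB
            cld                       ; make sure esi/edi count upwards
            a32   rep movsd           ; one copy, runs until ECX = 0
            ret

kernsize    dd    0                   ; size in sectors, filled in by stage1
sectsize    dd    0                   ; bytes per sector, filled in by stage1
```

Because `sectsize` and `kernsize` are declared with `dd` in the listing, a plain 32-bit `mov` works here; `movzx ecx, word [sectsize]` is the equivalent if only the low word is trusted.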

I personally wouldn't assume that the "extended read" BIOS function you're using won't change to protected mode (and screw up your GDT and "unreal mode" segment limits). For EDD 3.0 the BIOS may support loading data to a 64-bit address, and (for simplicity) it might switch to protected mode to load sectors even when the data is being loaded below 1 MiB. Instead, I'd have a "switch to unreal mode" routine; and call the "extended read" BIOS function, then call the "switch to unreal mode" routine, then copy the data.

Then there's loading all sectors in one go. There's no real restriction on the size of the EBDA. If the EBDA is 64 KiB (from 0x00090000 to 0x000A0000) and you start loading at 0x00010000 then if the kernel is larger than 512 KiB you start trashing the EBDA. Monolithic kernels tend to grow to be several MiB (e.g. Linux on my machine is about 6 MiB), so you can probably expect your kernel to (eventually) be so large that your boot loader trashes the entire EBDA, corrupts video display memory, then attempts to write to the legacy ROM space. Also there may be limits on how many sectors the BIOS function can handle. For example, for the function you're using, Ralf Brown's Interrupt List says "number of blocks to transfer (max 007Fh for Phoenix EDD)". For 512-byte sectors, this means it would be a bad idea to try to load more than 63.5 KiB (0x7F * 512 bytes) in a single read, because any more than that might cause problems on some systems.

I also wouldn't give up as soon as the "extended read" BIOS function returns an error. I'd probably have at least 3 retries, and I'd decrease the number of sectors you're attempting to read for each retry, because for things like intermittent/random read errors trying to read fewer sectors at a time improves the chance of success. For example, if there's a 50% chance that loading one sector will fail, then there'd be a 75% chance that reading 2 sectors at a time will fail, an 87.5% chance that reading 3 sectors at a time will fail, a 93.75% chance that 4 sectors at a time will fail, etc.
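Those figures follow if sector failures are treated as independent, which is an assumption of the example and not a property of real drives. With a per-sector failure probability p, an n-sector read fails unless all n sectors succeed:

```latex
P_{\text{fail}}(n) = 1 - (1 - p)^{n}, \qquad
p = 0.5:\quad
P_{\text{fail}}(1) = 0.5,\;
P_{\text{fail}}(2) = 0.75,\;
P_{\text{fail}}(3) = 0.875,\;
P_{\text{fail}}(4) = 0.9375
```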

Basically what you'd want is a loop that:
  • set the "number of retries so far" to zero
  • determine the number of sectors to read (e.g. "if(remaining_sectors < 0x7F) { sectors = remaining_sectors; } else { sectors = 0x7F; }")
  • use the "extended read" BIOS function to try to load those sectors
  • If there's an error:
    • increase "retries" by one, and give up if you've retried too much already
    • otherwise, do "sectors = (sectors + 1) / 2" to reduce the number of sectors you're attempting to read and try again
  • call the "switch to unreal mode" routine
  • calculate the number of dwords to copy (e.g. "dwords = sectors * bytes_per_sector / 4; ")
  • copy those dwords (e.g. "a32 rep movsd")
  • update some variables (e.g. "remaining_sectors -= sectors;")
  • loop back to the start if there are any sectors left to read
Note: for 2048-byte sectors (e.g. CDs), the "if(remaining_sectors < 0x7F)" part limits the size of the read to 254 KiB. If your buffer may be smaller than that, then you might need an additional check at the start - e.g. "if(sectors > buffer_size/bytes_per_sector) sectors = buffer_size/bytes_per_sector;".
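
The loop above could be sketched roughly as follows. This is an untested illustration, not a drop-in replacement. The names `load_chunks`, `enter_unreal`, `drive`, `remaining`, `cur_lba`, `dest`, `BUF_OFF`, `MAX_CHUNK` and `MAX_RETRY` are all hypothetical, the 32 KiB bounce buffer at 0000:8000 is an assumed layout, and `remaining` is assumed to be nonzero on entry:

```nasm
[bits 16]
[org 0x0600]

BUF_OFF     equ   0x8000          ; assumed 32 KiB bounce buffer at 0000:8000
MAX_CHUNK   equ   16              ; <= 0x7F, and 16 * 2048 bytes fits the buffer
MAX_RETRY   equ   3

; load_chunks (hypothetical name)
; assumes:    DS = ES = 0; drive, remaining (> 0), cur_lba, dest, sectsize set
; output:     ax = 0 on success, nonzero on fail
load_chunks:
.next:      mov   byte [retries], 0
            mov   ax, [remaining]       ; sectors = min(remaining, MAX_CHUNK)
            cmp   ax, MAX_CHUNK
            jbe   .try
            mov   ax, MAX_CHUNK
.try:       mov   [sectors], ax
            mov   [numsect], ax         ; refill the DAP on every attempt
            mov   word [destoff], BUF_OFF
            mov   word [destseg], 0
            mov   eax, [cur_lba]
            mov   [lbanum], eax
            mov   dl, [drive]
            mov   si, DAP
            mov   ah, 0x42
            int   0x13                  ; extended read
            jnc   .copy
            inc   byte [retries]
            cmp   byte [retries], MAX_RETRY
            ja    .fail                 ; too many retries, give up
            mov   ax, [sectors]
            inc   ax
            shr   ax, 1                 ; sectors = (sectors + 1) / 2
            jmp   .try
.copy:      call  enter_unreal          ; the BIOS may have reset the limits
            movzx ecx, word [sectors]
            imul  ecx, [sectsize]
            shr   ecx, 2                ; dwords = sectors * bytes_per_sector / 4
            mov   esi, BUF_OFF
            mov   edi, [dest]
            cld
            a32   rep movsd
            mov   [dest], edi           ; next chunk continues where this ended
            movzx eax, word [sectors]
            add   [cur_lba], eax
            sub   [remaining], ax       ; remaining_sectors -= sectors
            jnz   .next
            xor   ax, ax                ; success
            ret
.fail:      mov   ax, 1
            ret

enter_unreal:                           ; gives DS and ES 4 GiB limits
            pushf
            cli
            push  ds
            push  es
            lgdt  [gdt_desc]
            mov   eax, cr0
            or    al, 1
            mov   cr0, eax
            mov   bx, 0x08
            mov   ds, bx
            mov   es, bx
            and   al, 0xFE
            mov   cr0, eax
            pop   es
            pop   ds
            popf
            ret

drive       db    0
retries     db    0
sectors     dw    0
remaining   dw    0
cur_lba     dd    0
dest        dd    0x100000
sectsize    dd    2048
DAP:        db    0x10, 0
numsect     dw    0
destoff     dw    0
destseg     dw    0
lbanum      dd    0, 0
gdt:        dq    0
            dw    0xFFFF, 0, 0x9200, 0x00CF
gdt_end:
gdt_desc:   dw    gdt_end - gdt - 1
            dd    gdt
```

Fixing `MAX_CHUNK` at assembly time stands in for the run-time "buffer_size/bytes_per_sector" check in the note. A loader that has to handle several sector sizes would compute that limit from `sectsize` instead.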


Cheers,

Brendan

Post by theseankelly »

Hey Brendan

Thanks for taking the time for an exhaustive reply, and for even going beyond my immediate problem.

It sure sounds like the two things you pointed out about the segment and cx are going to solve two of my issues (another problem was that even though this works in VirtualBox, it can only load one sector). And I guess my understanding of unreal mode was a bit shaky -- I thought loading a selector into only one segment register was enough. I'll make that adjustment.

Thanks for pointing out the other flaws -- I was definitely aware of the potential to overwrite the EBDA, but decided to ignore it for now in favor of getting a base case working first, since my kernel is currently tiny. Your solution will be noted when I get around to fixing that! And I suppose emulators have thus far shielded me from the possibility of int13 ever failing :)

Sean