I have been working on this for about two days without any resolution in sight. The problem does not appear to be the software itself. It shows up in the following code section from our FAT16 VBR:
;
; push disk address packet on stack
;
push word 0 ; hi 32 bits lba
push word 0
push eax ; low 32 bits lba
push es ; segment
push bx ; offset
push word 1 ; number of sectors
push word 16 ; size of address packet
mov si, sp ; si points to packet
;
; load sector
;
mov ah, 0x42
mov dl, [bootDrive]
int 0x13 ; 0x7cb2
This code section is innocent enough and works well--most of the time. It builds the address packet on the stack to conserve image space and issues int 13h function 42h. This function was chosen over function 2 because of concerns about large hard disk LBAs.
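For reference, this is a sketch of the 16-byte packet those pushes leave in memory, lowest address first (the push order is reversed because the stack grows down). The label dap and the zero placeholders are illustrative only; in the real code the values come from BX, ES and EAX.

```nasm
; Disk address packet as int 13h function 42h expects it at DS:SI.
; Illustrative layout only; the VBR builds this on the stack instead.
dap:
    db 16        ; +0  packet size (low byte of "push word 16")
    db 0         ; +1  reserved    (high byte of "push word 16")
    dw 1         ; +2  number of sectors to read
    dw 0         ; +4  buffer offset  (BX in the code above)
    dw 0         ; +6  buffer segment (ES in the code above)
    dq 0         ; +8  64-bit LBA: EAX in the low dword, zero in the high dword
```

Note that the BIOS reads the packet through DS:SI, so setting SI from SP only works while DS and SS hold the same segment.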
The problem occurs when the VBR attempts to call the BIOS with ES:BX=0x1000:0x7a00, which results in a crash in the Bochs BIOS ROM (>>PANIC<< prefetch: RIP > CS.limit). Nothing at all is located at this address -- does anyone know what can cause the BIOS to crash like this? The software is not able to continue as the int 0x13 call never returns. I can also hardcode the segment:offset buffer with this address and still get a crash. (Interestingly enough, the load itself is successful.)
I assume that there is sufficient stack space, since the crash only occurs when ES:BX reaches a certain point. That is, when ES:BX points to lower addresses the function works fine. The stack is located at 0x7c0:0 growing down; the function calls are at most 2 to 3 levels deep and thus can never reach the stack limit. Most of the time when it crashes it yields invalid opcodes, although it does eventually run out of stack space. This isn't the software though, as it never returns from the BIOS call.
I have read that BIOS calls should be given a stack of at least 1024 bytes (another source says 4096, but that seems excessive), whereas you have only 512 bytes minus whatever is used before the interrupt. The symptoms certainly look like a classic case of the stack being overwritten.
Is it possible to try with a larger stack? That would at least rule out (or prove) stack collision as the cause.
It occurred to me a few minutes after I headed off for the night that it could indeed have been a stack issue. After a few changes, the stack has been relocated to 0:0x2000 and the software works without error. The end result is a little ugly because SS no longer shares the same segment as DS, but it is running.
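For anyone who hits the same thing, here is a minimal sketch of what the change might look like. It is not the exact code from the VBR: the label read_sector, the use of BP to drop the packet, and the temporary DS swap are all assumptions, and it presumes nothing else lives just below 0:0x2000.

```nasm
bits 16

; --- early in the VBR: move the stack to 0:0x2000 (address from the post) ---
    cli                     ; keep interrupts out while SS:SP is half-updated
    xor ax, ax
    mov ss, ax              ; SS = 0
    mov sp, 0x2000          ; stack grows down from 0:0x2000
    sti

; --- hypothetical read routine: EAX = LBA, ES:BX = destination buffer ---
; Returns with CF as set by the BIOS. Clobbers AH, DL, SI, BP.
read_sector:
    mov dl, [bootDrive]     ; fetch the drive while DS still addresses our data
    push ds                 ; save the data segment
    mov bp, sp              ; remember SP so the packet can be dropped later
    push word 0             ; hi 32 bits lba
    push word 0
    push eax                ; low 32 bits lba
    push es                 ; segment
    push bx                 ; offset
    push word 1             ; number of sectors
    push word 16            ; size of address packet
    mov si, sp              ; SI = offset of the packet within SS
    push ss
    pop ds                  ; DS = SS, so DS:SI really points at the packet
    mov ah, 0x42
    int 0x13
    mov sp, bp              ; discard the packet; MOV leaves CF untouched
    pop ds                  ; restore the data segment; POP leaves CF untouched
    ret
```

The DS swap is the ugly part: function 42h reads the packet through DS:SI, so once the stack sits in a different segment SI alone is no longer enough. Setting DS to 0 for the whole VBR (with org 0x7c00) would avoid the swap, at the cost of changing every data reference.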