
Calling int 13h seems to be corrupting memory

Posted: Fri May 12, 2017 10:21 am
by kurmasz
Using BIOS call int 13h function 8 to query for drive parameters appears to be corrupting memory.

I'm betting this is a stack or segment pointer problem, because if I move my stack the problem goes away. (Is it possible for my code to corrupt data and/or the stack used by the BIOS?)

Big picture:

My very simple OS has its own very simple boot loader. It switches to protected mode and runs 32-bit C code, then switches back to real mode and uses the BIOS to dump a data buffer back to the boot disk.

The real-mode code sets the stack pointer to 0x7c00 (this is safe, provided I don't overflow, right? http://wiki.osdev.org/Memory_Map_(x86)).

If I dump memory immediately after setting up the stack, everything looks as expected. If I dump memory immediately after calling int 13h, I can see the corruption. (A few bytes in the data buffer I intend to dump are modified.)

Moving the stack from 0x7c00 to 0x6000 appears to fix the problem; but I'm curious what I'm doing wrong. (And I want to make sure I've really fixed the problem, as opposed to having pushed the corruption somewhere it hasn't caused problems --- yet.)

I wonder if I'm somehow trashing memory that is used by BIOS.

Here is the final real-mode code:

Code:

/*******************************************************************
 *	
 * Return to real mode and dump the output data
 *
 * Code for return to real mode obtained here:
 * http://www.rohitab.com/discuss/topic/35103-switch-between-real-mode-and-protected-mode/
 ******************************************************************/
	
move_to_real:

	cli
	lgdt gdt16_descriptor
	ljmp $CODE_SEG_16, $p_mode16
	
.code16	
p_mode16:	 
	mov $DATA_SEG_16, %ax
	mov %ax, %ds
	mov %ax, %es
	mov %ax, %fs
	mov %ax, %gs
	mov %ax, %ss
	mov %ax, %di
	
	/* Clear the PE bit of CR0 to drop back out of protected mode. */
	mov %cr0, %eax
	and $~1, %al
	mov %eax, %cr0
	
	ljmp $0, $real   /* far jump to reload CS with a real-mode segment */

real:	

	/* Sets %ax to 0.  (I'm not sure why we're not using mov.) */
	xor %ax, %ax

	/* Set all the segment registers to 0. */
	mov %ax, %ds
	mov %ax, %es
	mov %ax, %fs
	mov %ax, %gs

	mov %ax, %ss
//	mov $0x7bfc, %sp // $0x7c00 - $4
	mov $0x7000, %sp   // 0x6000 works.  0x7000 and 0x7c00 don't
	mov %sp, %bp 


	// Print debugging message indicating we are back in real mode.
	mov $0x0E, %ah
	mov $0x52, %al // 'R'
	int $0x10

	mov $0x4D, %al // 'M'
	int $0x10

	mov $0x20, %al // ' '
	int $0x10
	
	
/*******************************************************************
 *	
 * Dump the data buffer to the boot device.
 *
 ******************************************************************/

.set drive_type,              -4
.set max_sector_num,          -8
.set max_cylinder_num,       -12
.set max_head_num,           -16
.set num_drives,             -20
.set debug_test,             -24
.set total_sectors_written,   -28

	// Make room on the stack for some local variables.
	add total_sectors_written, %sp  
	movw $0, total_sectors_written(%bp)
	
       // If I dump data here, everything looks normal:  data buffer is untouched.


	/////////////////////////////////////
	//
	// First, gather data about the drive
	//
	/////////////////////////////////////
	mov $0x08, %ah
	mov initial_dl, %dl
	int $0x13 
	
      // If I dump data here, the data buffer has about 80 bytes changed.  I can't tell what that data
      // represents.  It's not obviously output from the interrupt.


	// Save the returned values before we mess them up.  (These
	// values probably need only be 16-bit; but, better safe than
	// sorry.)	
	mov %bx, drive_type(%bp)       // Drive type
	mov %cx, max_sector_num(%bp)   // max sector number
	mov %cx, max_cylinder_num(%bp) // max cylinder number
	mov %dx, max_head_num(%bp)     // max head number
	mov %dx, num_drives(%bp)       // num drives
	movl $0x41424344, debug_test(%bp)
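
	// (Note how INT 13h, AH=08h packs its results: CH = low 8 bits of the
	// maximum cylinder, CL bits 6-7 = high 2 bits of the cylinder, CL bits
	// 0-5 = maximum sector, DH = maximum head, DL = number of drives ---
	// presumably why %cx and %dx are each saved twice above, to be
	// unpacked later.)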

     // continue on writing sectors to disk.


Re: Calling int 13h seems to be corrupting memory

Posted: Sat May 13, 2017 9:55 am
by BenLunt
kurmasz wrote:

Code:

	// Make room on the stack for some local variables.
	add total_sectors_written, %sp  
	movw $0, total_sectors_written(%bp)
Just a note, the stack grows down, so shouldn't this be 'sub' instead of 'add'? I am not too familiar with the syntax of this assembler, so you may be doing something else here.

However, by adding to the stack, are you now overwriting a previously pushed value?

One more thing, when calling this function, IIRC, you need to preserve the ES:DI register pair, and possibly set them to zero. (I don't have my notes with me at the moment).
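
Something like this right before the call ought to do it (just an untested sketch):

Code:

	xorw %di, %di        // zero ES:DI to guard against buggy BIOSes
	movw %di, %es        // before calling INT 13h, AH=08h
	movb $0x08, %ah
	movb initial_dl, %dl // drive number, as in your code
	int  $0x13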

Ben.

Re: Calling int 13h seems to be corrupting memory

Posted: Sat May 13, 2017 1:09 pm
by DavidCooper
Does it behave the same way on different machines/emulators?

Re: Calling int 13h seems to be corrupting memory

Posted: Sat May 13, 2017 9:51 pm
by MichaelPetch
What is the address of the buffer you are trying to dump? You say about 80 bytes change, but it's unclear to me what the address of the buffer you are looking at is.

Secondly, I can't tell from the code posted whether you are leaving protected mode with a protected-mode IVT still in place. If you deal with interrupts in protected mode and used LIDT to set up an interrupt vector table, you will need to restore the real-mode IVT (limit 0x03ff, base 0x00000000) when switching back into real mode.
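
For what it's worth, restoring it looks something like this (a sketch; the label name is mine):

Code:

/* IDT register image describing the real-mode IVT:
   256 four-byte vectors starting at physical address 0. */
rm_ivt:
	.word 0x03ff        /* limit */
	.long 0x00000000    /* base  */

	/* ... then, just before (or right after) clearing the PE bit: */
	lidt rm_ivt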

Re: Calling int 13h seems to be corrupting memory

Posted: Sun May 14, 2017 12:29 am
by MichaelPetch
I just noticed that this code is likely not doing what you expected:

Code:

.set total_sectors_written,   -28

 // Make room on the stack for some local variables.
 add total_sectors_written, %sp  
 movw $0, total_sectors_written(%bp)
You define total_sectors_written as -28, but with the instruction add total_sectors_written, %sp you are using -28 as a memory operand, not an immediate value. In your case you are adding the value at DS:[-28] to SP. I think you meant it to be:

Code:

add $total_sectors_written, %sp
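
A bare symbol in AT&T syntax is always a memory operand; the $ prefix is what makes it an immediate. And since the constant is -28, adding it is equivalent to the more conventional:

Code:

	sub $28, %sp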

Re: Calling int 13h seems to be corrupting memory

Posted: Mon May 15, 2017 9:03 am
by kurmasz
It turns out that @MichaelPetch was right: I was using total_sectors_written instead of $total_sectors_written. I knew I was messing up the stack somehow; I just couldn't find it. Annoyingly enough, this isn't the first time I've been confused about when to use '$' in assembly. Something in my mental model is broken; I just can't put my finger on it: http://stackoverflow.com/questions/4359 ... 6_43592684

It's somewhat moot now, but to answer the other questions:

@BenLunt: total_sectors_written is a constant set to -28, so using add does move the stack down.

@BenLunt: Setting %es and %di to 0 doesn't change the behavior.

@DavidCooper: Yes, the behavior is consistent across machines/emulators. To be specific, every simulator/machine results in memory corruption; however, the precise location of the corruption changes between machines.

@MichaelPetch: The buffer is at the bottom of the data section: 0x9c20. When running VMware on Linux, the corruption is from 0x9fd2 to 0xa051.

@MichaelPetch: I don't touch interrupts. I run cli at the beginning to turn them off and leave them off. If I'm messing with either interrupt table, it's entirely by accident.

Re: Calling int 13h seems to be corrupting memory

Posted: Mon May 15, 2017 11:29 am
by MichaelPetch
Regarding this:
kurmasz wrote: @MichaelPetch: I don't touch interrupts. I run cli at the beginning to turn them off and leave them off. If I'm messing with either interrupt table, it's entirely by accident.
You may not be using externally generated interrupts, but you do use a software interrupt: in particular, BIOS interrupt int 13h. This requires a proper real-mode IVT to be in place in the lowest 0x400 bytes (0x00000 to 0x003ff), still containing the original real-mode BIOS vectors. Int 13h itself may temporarily enable interrupts to complete the operation as well.

I assume from the second part of that comment that you have not changed the IVT yourself in protected mode; that is why it works. In the future, though, if you ever use LIDT to change the IVT in protected mode, then the concern I raised will be a potential issue when switching back to real mode.

Re: Calling int 13h seems to be corrupting memory

Posted: Mon May 15, 2017 12:40 pm
by kurmasz
Hmm. It appears I spoke too soon. Fixing my incorrect stack-pointer increment did make the immediate problem go away, but a similar problem popped up in another place. This time, however, it only happens on real hardware --- never on the VM. (It is worth noting that the VM boots from a floppy image, whereas I'm booting from a USB "hard drive" when testing on real machines.)

I have a large data buffer. My linker script places it after all other sections (including .bss), and the code limits the buffer to address 0x9fc00 (as suggested by http://wiki.osdev.org/Memory_Map_(x86) -- if I'm reading it correctly).

If I fill the buffer, then writes fail with error code 0xE. However, if I move the end of the buffer back to 0x9ec00, the problem goes away. I'm thinking I must still be stepping on the BIOS code or data somehow --- this time the code/data for hard disk access (as opposed to floppy).

Re: Calling int 13h seems to be corrupting memory

Posted: Mon May 15, 2017 1:37 pm
by MichaelPetch
kurmasz wrote: However, if I move the end of the buffer back to 0x9ec00, the problem goes away. I'm thinking I must still be stepping on the BIOS code or data somehow --- this time the code/data for hard disk access (as opposed to floppy).
Without seeing all your code it is hard to tell whether there is another software bug or something unusual about the hardware you are trying to run on. I can make one general observation: the EBDA doesn't necessarily have to be just the last 1k below 0xA0000 (it can be larger). Have you tried querying the 16-bit word at 0x0040:0x000e (physical address 0x0040e) in the BDA? On many modern BIOSes that word is the base address of the EBDA shifted right by 4, so taking the WORD at 0x0040e and shifting it left by 4 should give you the base of the EBDA. For instance, the value 0x9dc0 represents a base address of 0x9dc00. Some antiquated BIOSes may misreport this value. It would be curious to know what it is on your hardware.
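
Something along these lines would read it (a real-mode sketch; 386+ so the 20-bit result fits in a register):

Code:

	xorw %ax, %ax
	movw %ax, %ds
	movzwl 0x040e, %eax   /* WORD at 0x0040:0x000e of the BDA */
	shll $4, %eax         /* EAX = physical base of the EBDA  */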

Re: Calling int 13h seems to be corrupting memory

Posted: Mon May 15, 2017 2:35 pm
by kurmasz
Hard-coding the upper limit of the buffer at 0x9d800 fixed the problem on an i7-4790 (which returned 0x9d80) and an Intel Core Duo (which returned 0x9ec0), but not on an older quad-core AMD Phenom 9600B running BIOS 2.2.3 (which returns 0x9f00).

I'm not surprised by the partial success because, if stepping on the EBDA were the only problem, I would have had this problem a long time ago. Interestingly enough, the Phenom writes a few tracks before the writes fail, which gives me a few places in my code to double-check.

I'm not asking you to debug the project for me; but, if you are curious about what I'm up to, the code is here: https://github.com/kurmasz/ICOS

Re: Calling int 13h seems to be corrupting memory

Posted: Tue May 16, 2017 10:25 am
by LtG
Assuming your issues really are caused by overwriting memory you're not supposed to touch, I have to ask: have you acquired the memory map from the BIOS (or GRUB, or some other Multiboot-compliant bootloader)? If not, I'd suggest you do that; otherwise you're just guessing what memory is OK to use, or even exists.

Re: Calling int 13h seems to be corrupting memory

Posted: Wed May 17, 2017 10:11 am
by kurmasz
When I run lsmmap in GRUB (on the Phenom), it lists 0x0 through 0x9f000 as "available RAM". That is consistent with what the BIOS reports for the EBDA. No other low addresses are listed --- although I know some addresses below 0x500 are also off limits. Is there anything else I should be looking for with respect to RAM that should be off limits?

Re: Calling int 13h seems to be corrupting memory

Posted: Wed May 17, 2017 12:36 pm
by LtG
kurmasz wrote: When I run lsmmap in GRUB (on the Phenom), it lists 0x0 through 0x9f000 as "available RAM". That is consistent with what the BIOS reports for the EBDA. No other low addresses are listed --- although I know some addresses below 0x500 are also off limits. Is there anything else I should be looking for with respect to RAM that should be off limits?
http://wiki.osdev.org/Memory_Map_(x86)

Also within that page is a link to the INT 15h, EAX=0xE820 BIOS function, which gives you that info as well. The 0xE820 map may of course vary from PC to PC, and even on the same PC depending on the amount of RAM installed, so you pretty much always have to use it --- or use GRUB, which uses 0xE820 itself (so you're still using it, just indirectly).
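
A minimal query loop looks roughly like this (a real-mode sketch; the labels are mine and error handling is omitted):

Code:

	xorl %ebx, %ebx          /* continuation value; must start at 0   */
	movw $e820_buf, %di      /* ES:DI -> 24-byte buffer for one entry */
e820_loop:
	movl $0x534d4150, %edx   /* 'SMAP' signature                      */
	movl $0xe820, %eax       /* function number (all of EAX)          */
	movl $24, %ecx           /* buffer size; the BIOS clobbers ECX    */
	int  $0x15
	jc   e820_done           /* CF set: unsupported, or end of list   */
	/* ... record the base/length/type entry now at ES:DI ... */
	testl %ebx, %ebx         /* EBX == 0 means that was the last one  */
	jnz  e820_loop
e820_done:

e820_buf:
	.space 24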

Re: Calling int 13h seems to be corrupting memory

Posted: Thu May 18, 2017 6:11 am
by kurmasz
Right, that page indicates that 0x7c00 through 0x9fc00 should be usable. Calling lsmmap in GRUB showed a similar range of unrestricted memory (just with a slightly different beginning to the EBDA). Thus, if I don't see anything listed as "off limits" by either the OSDev Memory Map page or the map generated by the lsmmap GRUB command (which queries the BIOS), then I should be covered, right?