QEMU 0.12.3 suddenly executes random code

mich
Posts: 18
Joined: Sun Nov 11, 2012 5:36 pm

QEMU 0.12.3 suddenly executes random code

Post by mich »

Hi,

First off, I'm not new to OS development, though I guess it's never too late to get stuck on a project.

I'm trying to get a basic file system working and would like to use the BIOS INT 0x13 functionality for this. But that is only available in real mode. So I wrote a little routine that drops to real mode; the plan is to write the file system code in C and then call a write_sector_with_bios() function ... you get the idea.

The code is working fine on hardware (Asus Eee PC 1000H Netbook) but doesn't in QEMU 0.12.3. This makes me sad.

So my question: would you be so kind as to look over my code and share your wisdom on whether this is a potential QEMU 0.12.3 issue or a bug in my code?

Technical details are contained in the files below:
Program versions: see Makefile
Detailed problem statement: see start.S

What I want to do is write an 'A' on screen via BIOS INT 0x10.
What happens is that QEMU 0.12.3 basically gets out of control and starts executing random code at a random address. In version 1.2.0, though, it prints the 'A' on screen just fine, as on hardware.

Anyway, I'd like to know whether it's QEMU or me before building upon broken code.

start.S:

Code:

#define MULTIBOOT_HEADER_MAGIC                  0x1BADB002
#define MULTIBOOT_PAGE_ALIGN                    0x00000001

#define MULTIBOOT_HEADER_FLAGS MULTIBOOT_PAGE_ALIGN

#define STK_SZ 0x800000 /* 8MiB */

#define REAL_ADDR 0x0500
#define REAL_STK  0x1000

.section .bss

	.comm stk, STK_SZ

.section .text

	.global start, _start
	.extern cpy_real_code

start:
_start:
	cli
	jmp boot

	.align 4
	.long MULTIBOOT_HEADER_MAGIC
	.long MULTIBOOT_HEADER_FLAGS
	.long -(MULTIBOOT_HEADER_MAGIC + MULTIBOOT_HEADER_FLAGS)

boot:
	movl $(stk + STK_SZ), %esp

	pushl $0
	popf

	cld

	call cpy_real_code

	jmp drop_to_real_mode

halt:
	hlt
	jmp halt


drop_to_real_mode:
	cli

	/* load a GDT with 16bit code and data segments */
	lgdtl gdtd
	
	/* first init 32bit data segment and load 32bit code segment */
	mov $0x20, %ax
	mov %eax, %ss
	mov %eax, %ds
	mov %eax, %es
	mov %eax, %fs
	mov %eax, %gs
	ljmpl $0x18, $reload_cs
reload_cs:

	/* jmp to code < 1 MiB */
	/* FIXME: make this a /direct/ absolute far jmp ... but AT&T syntax is weird :/ */
	mov $REAL_ADDR, %eax
	jmp *%eax

.section .real

real_start:
	/* 9.9.2 Switching Back to Real-Address Mode */
	/* 9.9.2:1. Disable interrupts. */
	cli
	
	/* 9.9.2:2. If paging is enabled, perform the following operations:
	 * - Transfer program control to linear addresses that are identity
	 * mapped to physical addresses (that is, linear addresses equal
	 * physical addresses).
	 * - Insure that the GDT and IDT are in identity mapped pages.
	 * - Clear the PG bit in the CR0 register.
	 * - Move 0H into the CR3 register to flush the TLB.
	 */
	// NOTE: nop because paging not enabled
	
	/* 9.9.2:3. Transfer program control to a readable segment that has a
	 * limit of 64 KBytes (FFFFH). This operation loads the
	 * CS register with the segment limit required in real-address mode.
	 */
	ljmp $0x8, $(REAL_ADDR + L3 - real_start)
L3:
	.code16
	/*
	 * 9.9.2:4. Load segment registers SS, DS, ES, FS, and GS with a
	 * selector for a descriptor containing the following values,
	 * which are appropriate for real-address mode:
	 * - Limit = 64 KBytes (0FFFFH)
	 * - Byte granular (G = 0)
	 * - Expand up (E = 0)
	 * - Writable (W = 1)
	 * - Present (P = 1)
	 * - Base = any value
	 * The segment registers must be loaded with non-null segment
	 * selectors or the segment registers will be unusable in
	 * real-address mode. Note that if the segment registers are not
	 * reloaded, execution continues using the descriptor attributes
	 * loaded during protected mode.
	 */
	mov $0x10, %ax
	mov %eax, %ss
	mov %eax, %ds
	mov %eax, %es
	mov %eax, %fs
	mov %eax, %gs
	
	/* 9.9.2:5. Execute an LIDT instruction to point to a real-address
	 * mode interrupt table that is within the 1-MByte real-address
	 * mode address range.
	 */
	// NOTE: not necessary because we never set an IDT
	// NOTE: IVT already is base=0 limit=0x3ff
	// NOTE: setting IVT/IDT again to base=0 limit=0x3ff here doesn't make any difference
	
	/* 9.9.2:6. Clear the PE flag in the CR0 register to switch to
	 * real-address mode.
	 */
	mov %cr0, %eax
	and $~0x1, %eax
	mov %eax, %cr0
	
	/* 9.9.2:7. Execute a far JMP instruction to jump to a real-address
	 * mode program. This operation flushes the instruction queue
	 * and loads the appropriate base-address value in the CS register.
	 */
	ljmpw $0, $(REAL_ADDR + L7 - real_start)
L7:
	/* 9.9.2:8. Load the SS, DS, ES, FS, and GS registers as needed by the
	 * real-address mode code. If any of the registers are not going
	 * to be used in real-address mode, write 0s to them.
	 */
	mov $0, %ax
	mov %eax, %ds
	mov %eax, %ss
	mov %eax, %fs
	mov %eax, %gs
	mov %eax, %es
	mov $(REAL_STK), %sp

	mov $0x0, %bh
	mov $0x07, %bl
	mov $'A', %al
	mov $0x0e, %ah

	/**************************
	 * PROBLEM MANIFESTS HERE *
	 **************************/
	/* FIXME: if I hlt here everything is OK. But as soon as I allow
	 * interrupts execution jumps to a random address and QEMU eventually
	 * crashes with "execution outside RAM at 0x0000a0000" and/or
	 * runs until %ip hits 4GiB
	 */
	//hlt

	sti

	int $0x10
	
	/* this hlt never gets reached :( */
stop:
	cli
	hlt
	jmp stop

.align 4
idtd:
	.word 0x3ff
	.long 0

.align 4
gdt:
	// null
	.byte 0, 0, 0, 0, 0, 0, 0, 0

	// real mode
	// code 0x08
	.byte 0xff, 0xff, 0, 0, 0, 0x9a, 0, 0x00
	// data 0x10
	.byte 0xff, 0xff, 0, 0, 0, 0x92, 0, 0x00

	// protected mode
	// code 0x18
	.byte 0xff, 0xff, 0, 0, 0, 0x9a, 0xcf, 0x00
	// data 0x20
	.byte 0xff, 0xff, 0, 0, 0, 0x92, 0xcf, 0x00
gdt_end:

.align 4
gdtd:
	.word gdt_end - gdt - 1
	.long REAL_ADDR + gdt - real_start

Other files needed for a full working example of the problem are as follows.

cpy_real_code.c:

Code:

extern char REAL_START, REAL_END;

void cpy_real_code(void)
{
	volatile char *d = (volatile char *)0x500; /* REAL_ADDR in start.S */
	const char *s = &REAL_START;
	unsigned n = &REAL_END - &REAL_START;
	while(n--)
		d[n] = s[n];
}

linker.ld:

Code:

ENTRY(start)
SECTIONS
{
	. = 0x100000; /* 1MiB */

	.text ALIGN(0x1000) :
	{
		*(.text)
		*(.rodata)
		. = ALIGN(4096);
	}

	.rodata ALIGN(0x1000) :
	{
		*(.rodata*)
		. = ALIGN(4096);
	}

	.data ALIGN(0x1000) :
	{
		*(.data)
		. = ALIGN(4096);
	}

	/* FIXME: without the ALIGN directives .bss overlaps .real */

	.real ALIGN(0x1000) :
	{
	/* real mode code, which must be relocated when loaded with grub */
		REAL_START = .;
		*(".real")
		*(".real$")
		REAL_END = .;
		. = ALIGN(4096);
	}

	.bss ALIGN(0x1000) :
	{
		*(".bss")
		. = ALIGN(4096);
	}
}

Makefile:

Code:

# GNU Make 3.81

#
# Setting the tools and their flags.
#

# gcc (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3
CC = gcc

# gcc (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3
AS = gcc

# GNU ld (GNU Binutils for Ubuntu) 2.22
LD = ld

# QEMU PC emulator version 0.12.3 (qemu-kvm-0.12.3), Copyright (c) 2003-2008 Fabrice Bellard
# QEMU emulator version 1.2.0, Copyright (c) 2003-2008 Fabrice Bellard
QEMU = qemu

VERBOSE = @

WFLAGS = -W -Wpadded -Winline -Wstrict-overflow -Wundef -Wwrite-strings \
#	-Wall -Wextra -Wdisabled-optimization -Wstrict-aliasing -Wconversion -Wcast-qual \
#	-Werror
DFLAGS = -g3
OFLAGS = -O3 -s
CFLAGS = -std=gnu99 -pedantic -nostdlib -fno-builtin -nostartfiles \
	-nodefaultlibs $(OFLAGS) $(WFLAGS) $(DFLAGS)


LDFLAGS = -s


ASFLAGS = -nostdlib -fno-builtin -nostartfiles -nodefaultlibs


SRC = $(shell find . -name "*.c" -or -name "*.S" -and -not -name "start.S")
DEP = $(SRC) $(shell find . -name "*.h")
FIRST_OBJ = start.o
OBJ = $(addsuffix .o, $(notdir $(basename $(SRC))))

VPATH = $(dir $(SRC))

BIN = system.elf


all: $(BIN)

system.elf: $(FIRST_OBJ) $(OBJ)
	@echo "Linking" $@ "from" $^
	$(VERBOSE) $(LD) $(LDFLAGS) -T linker.ld $^ -o system.elf
	@echo ""

.PHONY: qemu
qemu: system.elf
	$(QEMU) -kernel system.elf

%.o: %.c
	@echo "Compiling" $<
	$(VERBOSE) $(CC) $(CFLAGS) -c $<
	@echo ""

%.o: %.S
	@echo "Compiling" $<
	$(VERBOSE) $(AS) $(ASFLAGS) -c $< -o $@
	@echo ""


.PHONY: clean
clean:
	@echo "Deleting files"
	$(VERBOSE) $(RM) $(FIRST_OBJ) $(OBJ) $(BIN)
	@echo ""


#
# generate the make dependencies
#
Makefile: $(DEP)
	@echo "Generating dependencies and updating Makefile"
	@sed '/[#] 9445baa814592c63c617be9eb40a39cee949719a/q' < Makefile > depend
	@echo "# Make may overwrite this line and everything below." >> depend
	$(VERBOSE) $(CC) $(CFLAGS) -MM $(SRC) >> depend
	@mv depend Makefile
	@echo ""

# Do not delete the next line, stupid! Our depend gen hack depends on it.
# 9445baa814592c63c617be9eb40a39cee949719a
# Make may overwrite this line and everything below.
cpy_real_code.o: cpy_real_code.c

I also noticed that the objdump output looks weird, especially for the far jmps ... though I've seen this before and it seems to be a bug in objdump. But in case it is not, here is the objdump output.

$ objdump -D system.elf:

Code:

system.elf:     file format elf32-i386


Disassembly of section .text:

00100000 <_start>:
  100000:	fa                   	cli    
  100001:	eb 0d                	jmp    100010 <_start+0x10>
  100003:	90                   	nop
  100004:	02 b0 ad 1b 01 00    	add    0x11bad(%eax),%dh
  10000a:	00 00                	add    %al,(%eax)
  10000c:	fd                   	std    
  10000d:	4f                   	dec    %edi
  10000e:	52                   	push   %edx
  10000f:	e4 bc                	in     $0xbc,%al
  100011:	00 20                	add    %ah,(%eax)
  100013:	90                   	nop
  100014:	00 6a 00             	add    %ch,0x0(%edx)
  100017:	9d                   	popf   
  100018:	fc                   	cld    
  100019:	e8 32 00 00 00       	call   100050 <cpy_real_code>
  10001e:	eb 03                	jmp    100023 <_start+0x23>
  100020:	f4                   	hlt    
  100021:	eb fd                	jmp    100020 <_start+0x20>
  100023:	fa                   	cli    
  100024:	0f 01 15 74 10 10 00 	lgdtl  0x101074
  10002b:	66 b8 20 00          	mov    $0x20,%ax
  10002f:	8e d0                	mov    %eax,%ss
  100031:	8e d8                	mov    %eax,%ds
  100033:	8e c0                	mov    %eax,%es
  100035:	8e e0                	mov    %eax,%fs
  100037:	8e e8                	mov    %eax,%gs
  100039:	ea 40 00 10 00 18 00 	ljmp   $0x18,$0x100040
  100040:	b8 00 05 00 00       	mov    $0x500,%eax
  100045:	ff e0                	jmp    *%eax
	...

00100050 <cpy_real_code>:
  100050:	b8 7a 10 10 00       	mov    $0x10107a,%eax
  100055:	55                   	push   %ebp
  100056:	2d 00 10 10 00       	sub    $0x101000,%eax
  10005b:	89 e5                	mov    %esp,%ebp
  10005d:	74 1b                	je     10007a <cpy_real_code+0x2a>
  10005f:	8d 90 00 10 10 00    	lea    0x101000(%eax),%edx
  100065:	8d 76 00             	lea    0x0(%esi),%esi
  100068:	0f b6 4a ff          	movzbl -0x1(%edx),%ecx
  10006c:	83 ea 01             	sub    $0x1,%edx
  10006f:	88 88 ff 04 00 00    	mov    %cl,0x4ff(%eax)
  100075:	83 e8 01             	sub    $0x1,%eax
  100078:	75 ee                	jne    100068 <cpy_real_code+0x18>
  10007a:	5d                   	pop    %ebp
  10007b:	c3                   	ret    
	...

Disassembly of section .real:

00101000 <REAL_START>:
  101000:	fa                   	cli    
  101001:	ea 08 05 00 00 08 00 	ljmp   $0x8,$0x508
  101008:	b8 10 00 8e d0       	mov    $0xd08e0010,%eax
  10100d:	8e d8                	mov    %eax,%ds
  10100f:	8e c0                	mov    %eax,%es
  101011:	8e e0                	mov    %eax,%fs
  101013:	8e e8                	mov    %eax,%gs
  101015:	0f 20 c0             	mov    %cr0,%eax
  101018:	66 83 e0 fe          	and    $0xfffe,%ax
  10101c:	0f 22 c0             	mov    %eax,%cr0
  10101f:	ea 24 05 00 00 b8 00 	ljmp   $0xb8,$0x524
  101026:	00 8e d8 8e d0 8e    	add    %cl,-0x712f7128(%esi)
  10102c:	e0 8e                	loopne 100fbc <cpy_real_code+0xf6c>
  10102e:	e8 8e c0 bc 00       	call   ccd0c1 <stk+0xbcb0c1>
  101033:	10 b7 00 b3 07 b0    	adc    %dh,-0x4ff84d00(%edi)
  101039:	41                   	inc    %ecx
  10103a:	b4 0e                	mov    $0xe,%ah
  10103c:	fb                   	sti    
  10103d:	f4                   	hlt    
  10103e:	cd 10                	int    $0x10
  101040:	f4                   	hlt    
  101041:	00 00                	add    %al,(%eax)
  101043:	00 ff                	add    %bh,%bh
  101045:	03 00                	add    (%eax),%eax
	...
  101053:	00 ff                	add    %bh,%bh
  101055:	ff 00                	incl   (%eax)
  101057:	00 00                	add    %al,(%eax)
  101059:	9a 00 00 ff ff 00 00 	lcall  $0x0,$0xffff0000
  101060:	00 92 00 00 ff ff    	add    %dl,-0x10000(%edx)
  101066:	00 00                	add    %al,(%eax)
  101068:	00 9a cf 00 ff ff    	add    %bl,-0xff31(%edx)
  10106e:	00 00                	add    %al,(%eax)
  101070:	00 92 cf 00 27 00    	add    %dl,0x2700cf(%edx)
  101076:	4c                   	dec    %esp
  101077:	05 00 00 00 00       	add    $0x0,%eax

0010107a <REAL_END>:
	...

Disassembly of section .bss:

00102000 <stk>:
	...

Disassembly of section .comment:

Thank you for your attention. I appreciate that you took the time to read this.

P.S. I very recently converted from NASM (Intel-Syntax) to GAS (AT&T-Syntax) ... in case anyone is wondering about the weird/bad code constructs (esp. jmps and data declarations). Any hints and tips are appreciated.

EDIT:
I hereby release everything above under the following license:

Code:

Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY, FITNESS AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
THIS SOFTWARE.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: QEMU 0.12.3 suddenly executes random code

Post by Brendan »

Hi,
mich wrote: So my question would you be so kind to look over my code and share your wisdom of whether this is a potential QEMU 0.12.3. issue or a bug in my code.
mich wrote: In version 1.2.0 it prints the A on screen just fine as on hardware, though.
I couldn't find any bugs in the code (which doesn't necessarily mean there are none).

I'd recommend trying it on more (real or emulated) computers. So far it fails on 33.333% of (real or emulated) computers. If you try it on 7 more computers then you might find out that it fails on 10% of computers or that it fails on 80% of computers (or anything between).

Also note that for most emulators you can configure them differently. For example, you could configure Bochs to emulate an 80486 with 8 MiB of RAM and very few devices, then reconfigure Bochs to emulate a quad-core "Core2" with 3 GiB of RAM and lots of devices. Sometimes subtle changes (like changing the amount of RAM, or changing the speed of CPUs) can make symptoms appear or disappear.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
mich
Posts: 18
Joined: Sun Nov 11, 2012 5:36 pm

Re: QEMU 0.12.3 suddenly executes random code

Post by mich »

Now 1.2.0 did it as well. Then by chance I found out that adding .align 4 before every function fixed the problem. It still won't work in 0.12.3 just like that, but when I make a disk image, install GRUB to it and boot from that (with the .align 4), it works in both 1.2.0 and 0.12.3.

It works in Bochs and on hardware, though. QEMU isn't exactly known for its stability anyway.

EDIT: Also thank you very much Brendan for looking over the code.
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: QEMU 0.12.3 suddenly executes random code

Post by bluemoon »

I suggest you step through the code with gdb or the Bochs debugger to verify that the offsets, data and code are indeed the expected values.
mich
Posts: 18
Joined: Sun Nov 11, 2012 5:36 pm

Re: QEMU 0.12.3 suddenly executes random code

Post by mich »

bluemoon wrote: I suggest you step into the code with gdb or bochs debugger, to verify that the offset or data and code are indeed expected values.
Done that, and on the systems that work everything is as expected. However, QEMU 0.12.3 now even ends up loading what seems to be an all-zero kernel, which I don't want to investigate any deeper. I'm calling it done with QEMU 0.12.3, because the hardware and emulator stats now say that my code works on 6 out of 7 systems, with the only oddball being QEMU 0.12.3.

It also seems I've had, and still have, issues with GRUB not zeroing BSS correctly, which caused some weird behaviour. With that sorted, though, QEMU 1.x is all good again.

Anyway, thank you all for taking the time.