[SOLVED] Partially written string literals

Posted: Sat Nov 07, 2015 10:58 am
by eisdt
Hello OSDev,

I've been struggling with a problem that shows up when I run some code on real hardware. I've tried many possible fixes, but I can't rule out that I'm missing something obvious.

SSCCE

Code: Select all

.code16
.global _start

.equ ORG_SEGMENT, 0x0

.text
_start:
		cli
		jmp $ORG_SEGMENT, $(asmain) # setting up CS:IP

asmain:
		
		xor %ax, %ax
		mov %ax, %ds # data-segment needs to be set up for data access
		mov %ax, %es

		mov $msg, %si # load msg
		mov $0xE, %ah # interrupt service
		xor %bx, %bx  # page num -- don't care
		jmp putstring

putstring:
.loop:
		lodsb # load at %ds:(%si)
		test %al, %al
		jz .done
		int $0x10
		jmp .loop
		
.done:
		cli
		hlt

msg: .asciz "You shall not see this message on real hardware."

.space 510 - (. - _start)
.word 0xaa55

The problem is that the BIOS won't print the whole string: it stops after a few characters, as though the loop had terminated. What's puzzling is that the string is printed correctly on QEMU, although I do understand the environments are not really analogous and fewer assumptions should be made on real hardware.

I've ruled out many possibilities with some tests, and it appears there may be something wrong with the addressing, even though I set up DS, because writing the same character constant in a loop works flawlessly. The issue also persists when writing directly to video memory at segment 0xB800.

I compile and link with:

Code: Select all

as test.S -o lol.o && ld --oformat binary -o main.img -Ttext 0x7C00 lol.o
Thanks for your attention.

----------------------
I managed to solve this by writing to the VGA display directly rather than relying on INT 0x10, which seems to cause some BIOSes to overwrite regions of the loaded MBR, as intx13's thorough investigation showed. More details follow in the thread.
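
As a rough illustration of what "writing to the VGA display directly" involves, the sketch below models the text buffer layout in Python. It assumes the standard 80x25 colour text mode, whose buffer starts at physical address 0xB8000 (segment 0xB800) and stores each cell as a character byte followed by an attribute byte. The names `vga_cells` and `cell_address` are made up for this example and do not come from the thread.

```python
VGA_TEXT_BASE = 0xB8000  # physical address of the colour text buffer (assumed mode)
COLUMNS = 80             # assumption: 80 columns per row


def cell_address(row, column):
    """Physical address of the character byte for a given screen cell."""
    return VGA_TEXT_BASE + (row * COLUMNS + column) * 2


def vga_cells(text, attribute=0x07):
    """Encode ASCII text as (character, attribute) byte pairs.

    0x07 is light grey on black; the attribute value is a free choice here.
    """
    out = bytearray()
    for ch in text.encode("ascii"):
        out += bytes((ch, attribute))
    return bytes(out)


# Bytes a boot sector would store starting at cell_address(0, 0)
print(vga_cells("Hi").hex())
print(hex(cell_address(1, 0)))
```

A boot sector doing this would copy each pair into the buffer itself instead of calling the BIOS, so nothing in the print loop depends on what the BIOS does with registers or memory.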

Re: Partially written string literals

Posted: Sat Nov 07, 2015 11:31 am
by Octocontrabass
eisdt wrote:although I do understand the environments are not really analogous and fewer assumptions should be made on real hardware.
What is the state of the D flag? What is the value of SS? What is the value of SP?

The BIOS is not guaranteed to leave any of these in a usable state, but you're using them.

Re: Partially written string literals

Posted: Sat Nov 07, 2015 11:54 am
by eisdt
Octocontrabass wrote: What is the state of the D flag? What is the value of SS? What is the value of SP?

The BIOS is not guaranteed to leave any of these in a usable state, but you're using them.
Good point about D, although it didn't solve the problem. BTW, what do you mean by "what is the value"? I can't write a string, let alone integer values.

Also, could you tell me where I'm using SS/SP? I can't see any use of them, considering I'm using basic jmps rather than call/ret and no push/pop. BIOS INTs don't need them either, at least 0x10.

Re: Partially written string literals

Posted: Sat Nov 07, 2015 12:06 pm
by Octocontrabass
eisdt wrote:Also, could you tell me where I'm using SS/SP?

Code: Select all

int $0x10
I suggest you do some research and learn exactly what this instruction is doing.
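
For readers following along, here is a small Python model of what a real-mode `int` instruction does before the handler runs: it pushes FLAGS, CS and IP onto the stack at SS:SP, clears the IF and TF flags, and loads CS:IP from the interrupt vector table at `vector * 4`. This is a simplified sketch, not an emulator; the register dictionary, the helper names and the handler address F000:1234 are hypothetical.

```python
IF_BIT = 1 << 9  # interrupt enable flag
TF_BIT = 1 << 8  # trap flag


def push16(mem, regs, value):
    """Push a 16-bit value at SS:SP (SP wraps within the 64 KiB segment)."""
    regs["sp"] = (regs["sp"] - 2) & 0xFFFF
    base = regs["ss"] << 4
    mem[base + regs["sp"]] = value & 0xFF
    mem[base + ((regs["sp"] + 1) & 0xFFFF)] = (value >> 8) & 0xFF


def software_interrupt(mem, regs, vector):
    """Model INT n; regs["ip"] must already point past the INT instruction."""
    push16(mem, regs, regs["flags"])
    regs["flags"] &= ~(IF_BIT | TF_BIT) & 0xFFFF
    push16(mem, regs, regs["cs"])
    push16(mem, regs, regs["ip"])
    entry = vector * 4  # IVT entry: offset word first, then segment word
    regs["ip"] = mem[entry] | (mem[entry + 1] << 8)
    regs["cs"] = mem[entry + 2] | (mem[entry + 3] << 8)


mem = bytearray(0x110000)
mem[0x10 * 4:0x10 * 4 + 4] = bytes((0x34, 0x12, 0x00, 0xF0))  # hypothetical F000:1234
regs = {"ss": 0x0000, "sp": 0x8E00, "cs": 0x0000, "ip": 0x7C20, "flags": 0x0202}
software_interrupt(mem, regs, 0x10)
print(hex(regs["sp"]), hex(regs["cs"]), hex(regs["ip"]))
```

Six bytes are written below the old SP on every `int`, and the handler itself may push more, so SS:SP has to point at memory that is safe to overwrite.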

Re: Partially written string literals

Posted: Sat Nov 07, 2015 12:10 pm
by iansjack
eisdt wrote:BIOS INTs don't need them either, at least 0x10.
That's an interesting point of view. I think you need to do a little more research on how the processor works.

Re: Partially written string literals

Posted: Sat Nov 07, 2015 3:46 pm
by Combuster

Re: Partially written string literals

Posted: Sun Nov 08, 2015 9:27 am
by eisdt
iansjack wrote:
eisdt wrote:BIOS INTs don't need them either, at least 0x10.
That's an interesting point of view. I think you need to do a little more research on how the processor works.
Yes, I did miss some things there; thanks for pointing that out. I didn't look into it because writing the same character in a loop works despite the absence of a valid stack; I guess that was "luck", with SS:SP happening to hold a usable leftover value.

Still, setting up a valid stack has not solved the problem; the region used is 0x8E00 - 0x7E00.

Code: Select all

.equ STACK_SIZE, 0x1000
.equ STACK_END, STACK_SIZE + 0x7E00

.text
_start:
+		mov $STACK_SIZE, %ax
+		mov %ax, %ss
+		mov $STACK_END, %sp
I still think there's something wrong with the addressing of the string, but I can't see what.

Re: Partially written string literals

Posted: Sun Nov 08, 2015 9:47 am
by Octocontrabass
eisdt wrote:Still, setting up a valid stack has not solved the problem; the region used is 0x8E00 - 0x7E00.
No it's not. You need to read more about how memory addressing works in real mode.

Re: Partially written string literals

Posted: Sun Nov 08, 2015 10:52 am
by eisdt
Octocontrabass wrote: No it's not. You need to read more about how memory addressing works in real mode.
Please try to be more verbose. If I understand correctly, the PA I'm actually referring to is not 0x8E00 but

Code: Select all

SP = 0x8E00
SS = 0x1000
PA = 0x8E000 + 0x1000 = 0x8F000
To actually use 0x8E00, I should have 0x7E0:0x1000 for SS/SP, respectively. Is that right?

Re: Partially written string literals

Posted: Sun Nov 08, 2015 11:12 am
by Octocontrabass
eisdt wrote:

Code: Select all

SP = 0x8E00
SS = 0x1000
PA = 0x8E000 + 0x1000 = 0x8F000
Close. You switched the segment and offset in the last part.

Code: Select all

PA = 0x10000 + 0x8E00 = 0x18E00
eisdt wrote:To actually use 0x8E00, I should have 0x7E0:0x1000 for SS/SP, respectively. Is that right?
That certainly works, but I prefer to keep all of the segment registers the same value (as much as possible). To do that in your code, use 0x0000:0x8E00 for SS:SP.
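
The segment:offset arithmetic from this exchange can be checked with a few lines of Python. This is only a sketch of the real-mode rule (physical address = segment * 16 + offset); the function name `physical_address` is made up for the example, and address wrap-around at 1 MiB is deliberately not modelled.

```python
def physical_address(segment, offset):
    """Real-mode translation: shift the segment left by 4 bits, add the offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


# The SS:SP pairs mentioned in the thread
for segment, offset in ((0x1000, 0x8E00), (0x07E0, 0x1000), (0x0000, 0x8E00)):
    print(f"{segment:04X}:{offset:04X} -> {physical_address(segment, offset):#07x}")
```

The first pair is the stack that was actually set up (SS = 0x1000), which lands at 0x18E00; the other two both reach the intended top of stack at 0x8E00.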

Re: Partially written string literals

Posted: Sun Nov 08, 2015 11:34 am
by eisdt
Very well.

Code: Select all

xor %ax, %ax
...
+ mov %ax, %ss
+ mov $0x8E00, %sp

But this has not fixed the primary issue. Any ideas? I've honestly run out of them. Let me know if you need any details I haven't provided.

Re: Partially written string literals

Posted: Sun Nov 08, 2015 12:02 pm
by Octocontrabass
I don't see any other problems with your code (although I'm not fluent in AT&T syntax, so I may have missed something), so the next thing to check is the data on the disk you're using to test. Read back a copy of that data and compare it to what you wrote.
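
One possible way to do that comparison on the development machine is sketched below in Python. It assumes the read-back copy has already been saved to a file; `readback.img` is a hypothetical name, while `main.img` is the image produced by the build command earlier in the thread. The helper names are illustrative only.

```python
SECTOR_SIZE = 512


def read_sector(path):
    """Return the first 512 bytes of a file (an image or a read-back copy)."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)


def has_boot_signature(sector):
    """True if bytes 510 and 511 are 0x55, 0xAA (.word 0xaa55, little-endian)."""
    return len(sector) >= SECTOR_SIZE and sector[510] == 0x55 and sector[511] == 0xAA


def first_difference(expected, actual):
    """Index of the first differing byte, or None if the two are identical."""
    for i, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


# Usage, assuming both files exist:
#   written = read_sector("main.img")
#   on_disk = read_sector("readback.img")   # hypothetical file name
#   print(has_boot_signature(on_disk), first_difference(written, on_disk))
```

A reported offset that falls inside the message would point at the write to disk rather than at the code.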

Re: Partially written string literals

Posted: Sun Nov 08, 2015 12:40 pm
by iansjack
You are assuming that register AH is preserved by the interrupt (and also SI). You may be right, but I can't find any reference that says this is guaranteed. (You also assume the same for BX, but this is guaranteed to be preserved by the interrupt.)

Re: Partially written string literals

Posted: Sun Nov 08, 2015 1:55 pm
by eisdt
Latest source code:

Code: Select all

.code16
.global _start

.equ STACK_SEGMENT, 0x8E00 # despite the name, this is loaded into SP as the stack offset
.equ ORG_SEGMENT, 0x0

.text
_start:
		cli
		jmp $ORG_SEGMENT, $(asmain)

asmain: 
		xor %ax, %ax
		mov %ax, %ds
		mov %ax, %es 
		mov %ax, %ss

		mov $STACK_SEGMENT, %sp

		mov $'!', %al # Notice
		mov $0xE, %ah
		int $0x10
		
		mov $msg, %si # load msg
		jmp putstring

putstring:
		cld 
		mov $0xE, %ah # interrupt service
		xor %bx, %bx  # page num -- don't care

.loop:
		lodsb # load at %ds:(%si)
		test %al, %al
		jz .done

		int $0x10
		mov $0xE, %ah
		jmp .loop
.done:
		cli
		hlt	
		jmp .done

msg: .asciz "You shall not see this message on real hardware."

.space 510 - (. - _start)
.word 0xaa55
You may notice that I print a '!' first to confirm the code is executed at all. It is: the '!' appears, but the message does not. It's as if SI pointed at a zero byte in memory, so the loop ends immediately; something I'm missing must get set up for me on QEMU but not on real hardware.
iansjack wrote:You are assuming that register AH is preserved by the interrupt (and also SI). You may be right, but I can't find any reference that says this is guaranteed. (You also assume the same for BX, but this is guaranteed to be preserved by the interrupt.)
AH is now restored after the INT, but why SI? It's not a parameter, and shouldn't be touched AFAIK.

There is one thing I'm wondering about: when using AS+LD, I tell LD where to put the text section. OTOH, in some boot sectors assembled with NASM (which produces a flat binary directly, with no linking) this is not done. Could the string be loaded incorrectly due to this?

Re: Partially written string literals

Posted: Sun Nov 08, 2015 5:19 pm
by intx13
eisdt wrote:There is one thing I'm wondering about: when using AS+LD, I tell LD where to put the text section. OTOH, in some boot sectors assembled with NASM (which produces a flat binary directly, with no linking) this is not done. Could the string be loaded incorrectly due to this?
In flat binary mode, nasm doesn't locate your sections at all. You have to tell nasm where that section will ultimately be loaded into memory. For example,

Code: Select all

section mycode vstart=0x7C00
xchg BX, BX
jmp $
This will assemble to a file containing just a few bytes that could be written into the MBR. The "vstart" argument is used to tell nasm that ultimately those few bytes will be located at address 0x7C00 (because that's where BIOS will load the MBR) and so nasm should keep that in mind when calculating addresses while assembling.

Are you still having the symptom of printing out some of the string and then hanging? If so, it's not an issue with the computed section address, since then you wouldn't be locating any of the string.
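
The reasoning in the last paragraph can be illustrated with a toy Python model. It assumes the BIOS copies the sector to 0x7C00 and that the message sits at file offset 0x40, which is a made-up value; memory outside the sector is zero-filled here, whereas on a real machine it holds whatever happens to be there.

```python
LOAD_ADDRESS = 0x7C00  # where the BIOS places the boot sector
MSG_OFFSET = 0x40      # hypothetical offset of the message within the sector
MESSAGE = b"You shall not see this message on real hardware.\x00"


def load_boot_sector(sector):
    """Return a 1 MiB memory image with the sector copied to 0x7C00."""
    mem = bytearray(0x100000)
    mem[LOAD_ADDRESS:LOAD_ADDRESS + len(sector)] = sector
    return mem


def label_address(origin, file_offset):
    """Address the assembler or linker computes for a label, given its origin."""
    return origin + file_offset


def read_c_string(mem, address):
    """Read bytes up to the first zero byte, as the lodsb loop does."""
    end = mem.index(0, address)
    return mem[address:end].decode("ascii")


sector = bytearray(512)
sector[MSG_OFFSET:MSG_OFFSET + len(MESSAGE)] = MESSAGE
sector[510:512] = b"\x55\xAA"
mem = load_boot_sector(bytes(sector))

print(repr(read_c_string(mem, label_address(0x7C00, MSG_OFFSET))))  # correct origin
print(repr(read_c_string(mem, label_address(0x0000, MSG_OFFSET))))  # origin left at 0
```

With the wrong origin the loop starts reading somewhere unrelated to the sector, so it would not print the beginning of the message and then stop; a partially printed string therefore suggests the origin is correct.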