Problem with binary-file of a compiled c-program

mattman · Post by **mattman** » Fri Aug 24, 2007 11:18 am

Hello,

I wrote a small C program to print 4 letters. I do not use any OS functions. The output is done with inline assembly using interrupt 10h.

I used Borland's tcc and tlink to create the binary output file:

tcc -c shell.c
tlink /t /n /m shell.obj, out.bin

The c-program is defined as follows:

void printLetter( char p_byte ){

    asm mov al, p_byte;
    asm mov ah, 09h;
    asm mov bh, 0;
    asm mov bx, 112;
    asm mov cx, 4;
    asm int 10h;
}

void printString( char p_val[] ){

    printLetter( p_val[1] );
}

int main( ) {

    char* txt = "12";
    printString( txt );
}

When I launch the program 4 letters are printed correctly, but
not the letter '2' as expected.

I tried to figure out what the problem is and disassembled out.bin:

printLetter:
00000000 55 push bp
00000001 8BEC mov bp,sp
00000003 8A4604 mov al,[bp+0x4]
00000006 B409 mov ah,0x9
00000008 B700 mov bh,0x0
0000000A BB7000 mov bx,0x70
0000000D B90400 mov cx,0x4
00000010 CD10 int 0x10
00000012 5D pop bp
00000013 C3 ret

printString:
00000014 55 push bp
00000015 8BEC mov bp,sp
00000017 8B5E04 mov bx,[bp+0x4]
0000001A FF7701 push word [bx+0x1]
0000001D E8E0FF call 0x0
00000020 59 pop cx
00000021 5D pop bp
00000022 C3 ret

main:
00000023 56 push si
00000024 BE0E00 mov si,0xe
00000027 56 push si
00000028 E8E9FF call 0x14
0000002B 59 pop cx
0000002C 5E pop si
0000002D C3 ret

datasegment:
0000002E 3132 xor [bp+si],si
00000030 00 db 0x00

printLetter and printString are OK. The call pushes the return address
(decreasing sp by a word), and bp is pushed onto the stack at the function's
beginning. Therefore [bp+0x4] points to the right content.

If I open out.bin in a text editor, the string "12" is the last two chars
in the file. This is consistent with the linker's map output, which locates the
datasegment at 2Eh, as shown in the disassembler's output above.

The thing I do not understand is what is done at address
00000024 BE0E00 mov si,0xe
00000027 56 push si
This is the value passed to printString as the address of parameter "txt".

What does 0xe mean? It neither can be the offset in the data-segment nor
is it a valid offset on stack. I think that is the reason why the printed
letter does not match the implemented one.

But what do I have to do to fix the bug ?

Can anybody help?

Thanks in advance.

Mathew

hailstorm · Post by **hailstorm** » Sat Aug 25, 2007 1:05 pm

Pretty strange indeed. But I think it has something to do with the memory model you are using. I am assuming you use the tiny model and there is a chance that ds has to be set to a correct value like cs+2...

Hope this helps.

mattman · Post by **mattman** » Sun Aug 26, 2007 10:14 am

Thank you for the reply.

I looked for some information about TINY-MODEL and FAR-addressing:

TINY_MODEL means the segment size is limited to 64 KB maximum.
The book says that the segment part of a FAR address always refers to a linear address divisible by 16, and therefore the start of the segment addressed through ds must be divisible by 16, too.

The last such address at or before 2Eh is 20h, assuming 0000h as the
code-segment content.
Therefore the remaining 0Eh (2Eh - 20h) is what is passed in si.

I think that's what I was looking for. Thanks a lot.

But now I faced a second problem:

The C binary is loaded to address 1000h by a small loader (loader.asm) in the MBR of a floppy disk; it is launched as follows:

; Code segment
mov ax, 1000h
push ax
; IP - set to "_MAIN"
mov ax, 0023h
push ax
; Use the last two stack entries (retf pops IP, then CS)
retf

But now I am not exactly sure how to initialize "ds":

The binary is loaded at 1000h. The string "12" is located
at 1000h + 2Eh = 102Eh then. The datasegment must be
divisible by 16. Therefore the last valid address for ds is 1020h
( 0xE is added as offset by si ).

My book says the ds-content is multiplied by 16 before adding the
offset in far-addressing. That means ds has to be initialized with
102h then:

mov ax, 102h
mov es, ax
mov ds, ax

According to my book the result would be built as follows:

102h is multiplied by 16:      1020h
The offset of text is added:  +  0Eh
-----------------------------------
                               102Eh

I tried this code but somehow it doesn't work. I am not exactly sure if there is a mistake in my ds initialization as described above or if there's another bug in my loader.asm.

Thanks a lot in advance.

Mathew.

jnc100 · Post by **jnc100** » Sun Aug 26, 2007 3:52 pm

The small memory model (under 16-bit DOS) specifies that there should be one segment for code and one for data, unlike the tiny one, which expects them both to be located in the same segment. In this case a segment is a 64 KiB region of memory.

Therefore, to use the small memory model, before executing your code you must choose a segment for both code and data. The loader will typically do this, and then set cs, ds, es, fs, gs and ss accordingly. E.g. your loader should do:

- choose some addresses, let's say 0x10000 for code and 0x20000 for data
- load the code section of the executable to 0x10000 and the data section to 0x20000
- pick somewhere for a stack, let's say 0x30000 - 0x40000
- now set the segment registers:
  * cs should be 0x1000 (such that 0x1000 * 16 = 0x10000, i.e. where we loaded the code), although you can't set it directly; more on this later
  * ds, es, fs, gs should be 0x2000
  * ss should be 0x3000
- set up the stack by setting sp to 0x0 (so that the first push will decrement it to 0xfffe, which in segment 0x3000 corresponds to linear address 0x3fffe)
- ip should be set to the offset of the entry point within the code segment (you could typically jump to it and set cs at the same time by pushing cs then ip and then far return to it)

How you actually determine where the code and data segments start/end in your executable file depends on the file format produced by tlink with the options you specified. I don't know what they do as I don't have tlink lying around anymore. Whatever file format is used, your loader will need to understand it. Similarly for determining the entry point.

Regards,
John.

hailstorm · Post by **hailstorm** » Mon Aug 27, 2007 12:58 am

I totally agree with you John, and though it is best to follow your advice, it can be hard to build a loader that does all these things (at first). So, let's keep it simple and see if we can help mattman with his problem, using his point of view.

Mattman, your assumptions about the segment values are partially right. But you make one mistake: you mix up the offset within the datasegment and the value of the datasegment register. You say that the file is loaded at 0x1000. The data thus also lies in this segment, but a few paragraphs further. The offset value of the string is 0x2e. This value should be shifted right by 4 (division by 16, so to speak). This leaves a value of 2, which can be added to the segment value (0x1000). So don't mangle the segment value itself!

mattman wrote:
"The binary is loaded at 1000h. The string "12" is located
at 1000h + 2Eh = 102Eh then. The datasegment must be
divisible by 16. Therefore the last valid address for ds is 1020h
( 0xE is added as offset by si )."

Your reasoning should be like this:

The binary is loaded at 1000h:0000h. The string "12" is located
at 1000h:0000h + 2Eh = 1000h:002Eh then. The datasegment must be
divisible by 16. Therefore the last valid value for ds is 1002h
( 0xE is added as offset by si ).

(1000h:002eh = 1002h:000eh)

I think you can work it out now, good luck!
