OSDev.org

Posted: **Wed Jul 31, 2002 4:34 pm**

You gave me some help with trying to create spaces in assembly for my bootsector. I took your example code and tried it out in a simple bootsector, but whenever I run it on my machine, it just locks up.

Here's the example bootsector:

; how to space lines in the bootsector

[bits 16]
[org 0]

start:

mov si,message
call message

call newline

mov si,msg1
call message

newline:

push ax ; pushes the stack onto the ax register
mov ah,0x0E ; set function to teletype mode
mov al,0x0D ; load a carriage return
int 0x10 ; and print it
mov al,0x0A ; load a line feed
int 0x10 ; and print it
ret ; return

message db 'It works!',13,10,0
msg1 db 'It really works!',13,10,0

times 510-($-$$) db 0
dw 0xAA55

If you could help me out I would appreciate it.

Posted: **Wed Jul 31, 2002 4:41 pm**

I forgot to add the call message part to the bootsector.

message:

lodsb
or al,al
jz done

mov ah,0eh
mov bx,7
int 10h
jmp message

done:

ret

I added this and then tried it again, and it still locked up my computer.

Posted: **Wed Jul 31, 2002 10:29 pm**

[attachment deleted by admin]

Posted: **Thu Aug 01, 2002 10:36 am**

Thanks for the help. I have another question about the example bootsector you provided. I know the difference between call and jump, but what I want to know is: when should you use call and when should you use jump in the bootsector? And what about labels that nothing jumps to or calls?

For example:

exit:

jmp $

newline:

Will the label still work even though you haven't jumped to it or called it? If so, how is this possible?

Posted: **Thu Aug 01, 2002 1:22 pm**

I'll address the first question last, since answering the second one first will help the rest make more sense.

The first thing you have to realize is that a label isn't really like a function name or a goto label in C or Pascal. A label is just a name for a particular address, which the assembler calculates during its first pass. As an example, I've assembled the printst.asm file so that it generated a list file (using the command "nasm printst.asm -l printst.lst"), which shows the details of assembly. Here's the part of it where the code generation begins:

    16                                  entry:
    17 00000000 EA[0500]0000                jmp base:start   ; make sure that the CS is 0000
    18                                  
    19                                  ; the real start of the code
    20                                  start:
    21 00000005 8CC8                        mov ax, cs
    22 00000007 8ED8                        mov ds, ax                  ; set DS == CS
    23 00000009 B80090                      mov ax, stackseg     ; set the stack to an arbitrary free area

If you look at the generated code on line 17, it shows that the actual output at address 00000000 is "EA[0500]0000", which is equivalent to

JMP 0000:0005

(Since the x86 is a little-endian processor, the assembler emits multi-byte values low byte first, and the offset before the segment, so 0000:0005 appears as 05 00 00 00.) Now if you look at address 00000005, it's at program line 21, which is the first code line immediately after the label 'start'. In other words, the value of 'start' was automatically computed to be 0005, and that is what the assembler put in its place.
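To watch a label turn into a plain number, a small file like the one below can be assembled with a listing (for example "nasm -f bin labels.asm -l labels.lst"). This is only a sketch: the file name and the labels 'first' and 'second' are made up for illustration and are not part of printst.asm.

```nasm
; labels.asm: hypothetical example, not part of printst.asm
[bits 16]
[org 0]

first:                  ; first  = 0x0000, the offset of the next instruction
    mov ax, first       ; emits B8 00 00: the label became an immediate value
    mov bx, second      ; emits BB 06 00: 'second' is 6 bytes into the file
second:                 ; second = 0x0006 (two 3-byte instructions precede it)
    jmp second          ; nothing ever jumps to 'first', yet it still has a value
```

Note that 'first' is never the target of a jump or call. It still "works" in the sense that the assembler knows its address and will substitute it wherever the name appears.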

When you use a D[B|W|D] or RES[B|W|D] directive, the 'variable names' are actually labels, no different from the ones in the code; indeed, you can have a DB statement without any label at all. All that RESB and its relatives do is set aside a certain number of memory locations; DB and so forth are the same, except that they also initialize the memory to whatever you happen to put in it. All the label does is give a name to the address of the first of those locations. The label and the storage are completely separate, even though they are logically a single unit.

BTW, these should not be confused with equates, which are names for the specific values they are assigned. The names created with the EQU directive do not label any storage, and they don't allocate memory. The same goes for macro and struc definitions as well (in fact, in NASM struc is itself implemented as a set of macros). Don't worry too much about this point for now; what you already need to know is confusing enough. ::)
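A rough sketch of the difference between a data label and an equate follows. The names 'msg', 'MSGLEN' and 'over' are invented for this example, and the offsets in the comments assume the file is assembled exactly as shown with "nasm -f bin".

```nasm
; data_labels.asm: hypothetical example
[bits 16]
[org 0]

    jmp short over      ; 2 bytes: skip the data so it is never executed
msg     db 'Hi', 0      ; msg = 0x0002, the address of the first of 3 bytes
        db 0xFF         ; no label at all: the byte is still emitted at 0x0005
MSGLEN  equ 3           ; an equate: MSGLEN is the number 3, no bytes emitted
over:                   ; over = 0x0006
    mov si, msg         ; SI = 0x0002 (an address)
    mov cx, MSGLEN      ; CX = 3 (a value that was never stored anywhere)
    jmp $
```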

The point I'm trying to make is that there is no direct connection between a label and how it is used. Any label can be used anywhere an address can go, or rather, anywhere a 16-bit (or 32-bit, in p-mode or unreal mode) immediate value can go. The assembler doesn't try to warn you or prevent you from doing that, or otherwise try to save you from yourself. The assembler assumes that if you want to multiply a 16-bit address by the first two bytes of a string variable, then, well, you're the human being, by gosh, and who is it to stand in your way? (This is what's called "giving you enough rope to hang yourself with.")

It usually only stops if there is something it simply cannot make sense of, such as trying to mov a 16-bit register value into an 8-bit one. It will often warn you of really foolish things, and you can set option switches to have it give you more or fewer warnings, but it won't give an error for anything that it can actually assemble.

One other thing you should understand is indirect addressing, which is more or less like pointer dereferencing. When an argument in NASM has square brackets around it, then the argument is treated as an address which is used to look up a value. So, for one example,

mov al, [es:di]

means "get the value at the address held in ES:DI and put it in AL". Keeping track of which values to use as immediate arguments and which to use as reference arguments can be a real pain.
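Here is a small sketch of the immediate versus bracketed forms side by side. The label 'value' is made up, and the code assumes it runs as a boot sector loaded at 0000:7C00, which is why DS is cleared first.

```nasm
; brackets.asm: hypothetical example
[bits 16]
[org 0x7C00]            ; assumption: loaded by the BIOS at 0000:7C00

    xor ax, ax
    mov ds, ax          ; DS = 0, so [value] refers to 0000:value
    mov ax, value       ; AX = the address of 'value' (an immediate)
    mov bx, [value]     ; BX = 0x1234, the word stored at that address
    mov si, value       ; SI = the address again
    mov cx, [si]        ; CX = 0x1234, this time fetched through a register
    jmp $

value   dw 0x1234
```

Forgetting the brackets (or adding them by mistake) assembles without complaint, which is exactly why this is easy to get wrong.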

Posted: **Thu Aug 01, 2002 1:55 pm**

Now that I've said that (whew!) I can explain about call and jump. All that a normal, or 'near', jump does is change the value of the instruction pointer to the value of the argument. A 'far' jump also changes the value of the Code Segment register. So, to use an example from the earlier message,

jmp 0000:start

sets the value of the IP to that of the label 'start', and the value of the CS to 0000. Since the next executed instruction is always the one pointed to by the combination CS:IP, this has the effect of causing execution to continue where it jumped to.

There are also what are called short jumps, which, like conditional jumps, add a signed 8-bit offset of between -128 and +127 bytes to the IP (counted from the instruction that follows the jump). These can be useful, as they use one byte less than a near jump, but aren't really necessary to use.

Now, a near call is just the same as a near jump, except that it does one other thing first: it pushes the address of the next instruction onto the stack (far calls first push the value of CS, so that the segment context is stored as well). This is the main difference between a jump and a subroutine call: the call knows where it came from, and knows how to get back to there.

Getting back is what the return instruction is for. A RET will pop the word (or dword, in p-mode) off of the top of the stack into the IP (or EIP), while RETF will do that and then pop the following word into CS. This causes execution to resume at the instruction following the CALL. Since the return addresses are pushed onto the stack, rather than stored into a fixed location, an arbitrary number of calls can be nested and returned from (up to the size of the stack segment, that is).

This is one of the reasons that stack discipline is crucial. If a value is pushed onto the stack in a called function, and not popped back off before the RET, then the RET will pop that stray value into IP instead of the return address, and the CPU will try to execute instructions at what is likely to be an invalid location. The same will happen if the return address is accidentally popped off before the RET executes.
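Applied to a routine like the 'newline' one posted earlier in the thread, the rule looks roughly like this. It is a sketch, not code from the thread: it assumes the BIOS loaded the sector at 0000:7C00 and that the stack the BIOS left behind is usable, and it borrows the "mov bx,7" setting from the message routine above.

```nasm
; newline.asm: illustrative sketch, not code from the thread
[bits 16]
[org 0x7C00]            ; assumption: the BIOS loaded this sector at 0000:7C00

start:
    call newline        ; pushes the address of the 'jmp $' below, then jumps
    jmp $               ; RET brings execution back here

newline:
    push ax             ; save the caller's AX ...
    push bx             ; ... and BX on the stack
    mov ah, 0x0E        ; BIOS teletype output function
    mov bx, 0x0007      ; BH = display page 0, BL = attribute 7
    mov al, 0x0D        ; carriage return
    int 0x10
    mov al, 0x0A        ; line feed
    int 0x10
    pop bx              ; restore in the reverse order of the pushes
    pop ax              ; now the return address is on top of the stack again
    ret                 ; pop it into IP

times 510-($-$$) db 0   ; pad to 510 bytes
dw 0xAA55               ; boot signature
```

Every push inside the routine has a matching pop before the RET. Drop either pop and RET would load a saved register value into IP instead of the return address.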

As with labels, the CPU does not differentiate data types, only data sizes. It is your responsibility to know what is an address, a string, an integer, a floating-point value, etc.

Finally, in the case of

jmp $

as may have been explained to you already, $ is a special NASM token meaning 'the address of this line'. Thus, this is the same as:

exit:
jmp exit

but slightly more compact and specific. This token, and a similar one, '$$' (meaning 'the address of the beginning of the current section'), are used in the line "times 510-($-$$) db 0" to compute how many padding bytes are needed between the end of the code and data and the boot signature at the end of the sector.
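The arithmetic can be followed on about the smallest boot sector possible. This is a sketch for illustration; the label 'hang' is made up, and the org value assumes the usual BIOS load address of 0000:7C00.

```nasm
; pad.asm: hypothetical minimal boot sector
[bits 16]
[org 0x7C00]            ; assumption: loaded by the BIOS at 0000:7C00

hang:
    jmp $               ; same two bytes as 'jmp hang' (EB FE)

; On the next line:
;   $$   = 0x7C00, where this section starts
;   $    = 0x7C02, the address of the line itself
;   $-$$ = 2, the number of bytes emitted so far
times 510-($-$$) db 0   ; emits 508 zero bytes, for 510 bytes in total
dw 0xAA55               ; boot signature in the last two bytes: 512 bytes
```

Assembling it with "nasm -f bin pad.asm -o pad.bin" should give a file of exactly 512 bytes, however much code is placed before the padding line (as long as it fits in 510 bytes).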

In closing, I repeat what I said yesterday: you really should see if you can pick up a copy of Assembly Language: Step by Step. It explains all of this material far better than I can, and is well worth the investment. It (along with Messmer's Indispensable PC Hardware Book) is one of the few programming books I can recommend unequivocally.

I need your help Schol-R-LEA
