ORG and near jumps in bootloader

Posted: Thu Nov 09, 2017 5:51 pm
by 4dr14n31t0r
I have trouble understanding the ORG preprocessor directive.

As I understand it, a near jump is the same as a far jump that uses the current value of the cs register as its segment:

Code: Select all

jmp near 0x1234 --> jmp far cs:0x1234
I know that the value of the cs register is undefined when the bootloader starts. However, I checked its value in QEMU and it is always zero.
Whenever I do a far jump the result is exactly the same with or without

Code: Select all

org 0x7c00
But if

Code: Select all

jmp near 0x1234
is the same as

Code: Select all

jmp cs:0x1234
and cs is set to zero, then

Code: Select all

jmp near 0x1234
is the same as

Code: Select all

jmp 0x0000:0x1234
If the destination of the far jump doesn't change with or without the org directive, and the cs register is always zero so that jmp near 0x1234 is the same as jmp 0x0000:0x1234, why does the result change when I use the near jump? What exactly does ORG do?
As Paul R says here: https://stackoverflow.com/a/3407190/5744858
ORG is used to set the assembler location counter
What is that location counter and how does it change the operand of the jmp instruction?

Note that actually I am not jumping to 0x1234. That location is just an example.

Re: ORG and near jumps in bootloader

Posted: Thu Nov 09, 2017 8:21 pm
by Brendan
Hi,
4dr14n31t0r wrote:I have trouble understanding the ORG preprocessor directive.
OK, let's start with a very simple example (with data and no code at all):

Code: Select all

    org 0x0000

    dw myThing              ;Store the address of the "myThing" label in the output file

myThing:
In this case, the assembler thinks that the output file will be loaded at offset 0x0000 within the segment, so it determines that "myThing" is 2 bytes after that at offset 0x0002, so the file will contain the data 0x0002 (or 0x02, 0x00 as bytes because 80x86 is "little endian").

What if we change the ORG?

Code: Select all

    org 0x1234

    dw myThing              ;Store the address of the "myThing" label in the output file

myThing:
In this case, the assembler thinks that the output file will be loaded at offset 0x1234 within the segment, so it determines that "myThing" is 2 bytes after that at offset 0x1236. The file will contain the data 0x1236 (or 0x36, 0x12 as bytes because 80x86 is "little endian").
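To make the "location counter" itself visible: NASM exposes it as $ (the address at the start of the current line), and the start of the current section as $$. The following is only a small sketch, assuming NASM's flat binary output (-f bin); the values in the comments are what the arithmetic above predicts.

```nasm
    bits 16
    org 0x1234

    dw $                ; 0x1234: location counter at the start of this line
    dw $                ; 0x1236: two bytes further on
    dw $ - $$           ; 0x0004: distance from the section start, independent of ORG
```

Changing the ORG should change the first two words but not the third, because $ and $$ both move by the same amount.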

Let's add a move instruction:

Code: Select all

    org 0x1234

    dw myThing              ;Store the address of the "myThing" label in the output file

myThing:
    mov si,myThing
Here, it's similar to before - the assembler thinks that the output file will be loaded at offset 0x1234 within the segment, so the value 0x1236 is moved into SI.

Let's try some jumps:

Code: Select all

    org 0x1234

variable:
    dw myThing              ;Store the address of the "myThing" label in the output file

myThing:
    mov si,myThing
    jmp 0x0000:myThing      ;Absolute far jump
    jmp word [variable]     ;Absolute indirect jump
In this case the first jump instruction will become "jmp 0x0000:0x1236". The second jump will become "jmp word [0x1234]", and because the value at offset 0x1234 is 0x1236 it'll end up jumping to offset 0x1236.

Every single thing I've mentioned so far depends on what ORG says; and if you change ORG everything else will change.

Let's try some cases where ORG is irrelevant:

Code: Select all

    org 0x0000

variable:
    dw 0x1236

myThing:
    mov si,0x1236
    jmp 0x0000:0x1236      ;Absolute far jump
    jmp word [0x1234]      ;Absolute indirect jump
    jmp myThing
For most of these cases the assembler uses the value you told it to use and doesn't calculate the value itself, so the ORG makes no difference. Of course if you add or remove anything you'll have to calculate all of the values yourself by hand (and if you get one wrong it will create bugs), so it's a massive code maintenance nightmare.

The last instruction ("jmp myThing") is a relative jump. For this the assembler determines the address of the target "myThing" (which depends on ORG and will be wrong if ORG is wrong) and then subtracts the address of the byte after the instruction (which also depends on ORG and will be wrong if ORG is wrong); but the ORG cancels out in this subtraction. Essentially it's like this:

Code: Select all

    (bytes_from_start_of_file_to_target + ORG) - (bytes_from_start_of_file_to_address_after_instruction + ORG)
Which is the same as this:

Code: Select all

    bytes_from_start_of_file_to_target - bytes_from_start_of_file_to_address_after_instruction
..which gives the same value regardless of ORG because the ORG cancels out.
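A small sketch of this cancellation, assuming NASM with -f bin (the labels "start" and "target" are made up for illustration). Assembling it with either ORG should give identical bytes:

```nasm
    bits 16
    org 0x7c00          ; try 0x0000 here as well: the output should not change

start:
    jmp near target     ; E9 02 00: displacement = target - (address after the jmp)
    nop                 ; file offset 3
    nop                 ; file offset 4
target:                 ; file offset 5, so the displacement is 5 - 3 = 2
    hlt
```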

However, for "jmp 0x1234" it'd be:

Code: Select all

    0x1234 - (bytes_from_start_of_file_to_address_after_instruction + ORG)
..which does depend on ORG because the ORG isn't cancelled out.
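As a worked example (again assuming NASM's -f bin output, forcing the 3-byte near form, and with the jump as the very first instruction in the file):

```nasm
    bits 16
    org 0x7c00

    jmp near 0x1234     ; address after this instruction = 0x7c00 + 3 = 0x7c03
                        ; displacement = 0x1234 - 0x7c03 = -0x69cf = 0x9631 (16-bit)
                        ; expected bytes: E9 31 96

; With "org 0x0000" instead, the same source line would give:
;   displacement = 0x1234 - 0x0003 = 0x1231
;   expected bytes: E9 31 12
```

In both cases the CPU adds the displacement to the real IP after the instruction, so the jump only lands on offset 0x1234 if the code is really running at the offset that ORG claimed.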

Now...

If you tell the assembler that the output file will be loaded at offset 0x1234 within a segment (by using "ORG 0x1234") but the file is actually loaded at offset 0x0000 within a segment, then the assembler will get everything that depended on ORG wrong. For normal code (that uses labels to avoid a code maintenance nightmare) this means that all of your code will be broken when ORG is wrong (except for things like relative jumps which don't depend on ORG).


Cheers,

Brendan

Re: ORG and near jumps in bootloader

Posted: Fri Nov 10, 2017 4:57 am
by 4dr14n31t0r
Conclusion: if $ is the address of the current instruction (its offset in the file plus the ORG value), then

Code: Select all

jmp near 0x1234
is the same as

Code: Select all

jmp short 0x1234 - $
The problem was that I believed that

Code: Select all

jmp near 0x1234
was the same as

Code: Select all

jmp cs:0x1234
Why do people say that near jumps are jumps within the same segment when the cs register is not even used?

Re: ORG and near jumps in bootloader

Posted: Fri Nov 10, 2017 6:10 am
by iansjack
Because the segment register isn't used. So they can only be a jump to a location in the current segment.

(Well, of course it is used - it just isn't changed.)

Re: ORG and near jumps in bootloader

Posted: Fri Nov 10, 2017 8:21 am
by Schol-R-LEA
Brendan wrote:The last instruction ("jmp myThing") is a relative jump. For this the assembler determines the address of the target "myThing" (which depends on ORG and will be wrong if ORG is wrong) and then subtracts the address of the byte after the instruction (which also depends on ORG and will be wrong if ORG is wrong); but the ORG cancels out in this subtraction. Essentially it's like this:

Code: Select all

    (bytes_from_start_of_file_to_target + ORG) - (bytes_from_start_of_file_to_address_after_instruction + ORG)
Which is the same as this:

Code: Select all

    bytes_from_start_of_file_to_target - bytes_from_start_of_file_to_address_after_instruction
..which gives the same value regardless of ORG because the ORG cancels out.
I am afraid that it was my previous statements which were misleading to the OP. I thought that for an unspecified

Code: Select all

    jmp myThing
NASM would assemble it to something like FF <16-bit absolute offset>, but if I understand what you are saying correctly, it is assembling to E9 <16-bit relative offset>.

Now then, looking at what this opcode reference says about the JMP instruction, I see that FF takes a mod r/m <size> argument, and none of the opcodes which JMP can assemble to are non-indexed 16-bit absolute addresses. The only code that would assemble to FF would be of the form JMP [<register> + <displacement>], unless I am still mistaken.

Which means that JMP <label> would, as you stated, assemble to E9 <16-bit relative offset>.

I am not sure where my confusion arose, though I have some ideas (I think it came from incorrect recollection of things I had read back in the 1990s, though why I am so damnably befuddled is anyone's guess).

So, I apologize for my incorrect statements in the previous thread, 4dr14n31t0r. I clearly am still failing to do enough due diligence in my answers here.

Re: ORG and near jumps in bootloader

Posted: Fri Nov 10, 2017 11:10 am
by 4dr14n31t0r
The key is that there is no direct absolute near jmp instruction:
http://x86.renejeschke.de/html/file_mod ... d_147.html
As you can see in that link, there are only 2 near jmp instructions that take a single constant operand:
E9 cw JMP rel16 Jump near, relative, displacement relative to next instruction.
E9 cd JMP rel32 Jump near, relative, displacement relative to next instruction.
However, both of them are relative jumps. If I want an absolute near jmp, it has to be indirect:
FF /4 JMP r/m16 Jump near, absolute indirect, address given in r/m16.
FF /4 JMP r/m32 Jump near, absolute indirect, address given in r/m32.
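A minimal sketch of the two indirect forms in 16-bit NASM syntax (the label "target_ptr" is hypothetical, and 0x1234 is still just an example destination):

```nasm
    bits 16

    ; Option 1: through a register
    mov ax, 0x1234
    jmp ax                  ; FF E0: IP = AX = 0x1234, CS unchanged

    ; Option 2: through memory (not reached here, shown for comparison)
    jmp word [target_ptr]   ; FF 26 disp16: IP = word stored at DS:target_ptr

target_ptr:
    dw 0x1234               ; hypothetical pointer holding the destination offset
```

Neither form encodes a displacement relative to the instruction, so the destination does not depend on ORG. The address of target_ptr in option 2 does, though, so that one only works if DS and ORG agree with where the code was really loaded.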

Re: ORG and near jumps in bootloader

Posted: Fri Nov 10, 2017 12:53 pm
by MichaelFarthing
This is fundamentally accurate, though it is actually possible to have a ModRM that consists of a displacement only,
i.e. [ void register ] + displacement.

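The (16 bit) encoding would be FF 26 34 12 (opcode FF, then ModRM 0x26 for /4 with a bare 16-bit displacement), which is jmp [0x1234]. Note that this is still an indirect jump: it loads the new IP from the word stored at DS:0x1234 and stays in the current cs segment; it does not jump to offset 0x1234 itself. (FF 16 34 12 would be the corresponding call, since ModRM 0x16 selects /2.)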

How this would actually be written in the source code depends on the assembler, and since I have never needed to do it, I don't know the syntax. Instinctively, however, it might have to be written jmp []+0x1234

Re: ORG and near jumps in bootloader

Posted: Sat Nov 11, 2017 7:09 am
by MichaelPetch
4dr14n31t0r wrote:I know that the value of the cs register is undefined when the bootloader starts. However, I checked its value in QEMU and it is always zero.
Just because CS appears to be 0 in one environment doesn't mean it will be in others. Back in the old days the El Torito specification suggested that the default segment used to transfer control to a bootloader was 0x07c0 (and not 0x0000). In some versions of Bochs, if you boot as a floppy or hard drive the segment is 0x0000, and if you boot from a CD-ROM it is 0x07c0. There are real world BIOSes (usually much older ones) where the segment may not be zero.

Although 0x07c0:0x0000 and 0x0000:0x7c00 point to the same physical address, a bootloader can be written in such a way that the code fails depending on which segment was used. I wrote about such a situation in this Stack Overflow question/answer. Effectively, you can either write your code to avoid the rarer situations where the actual value of CS matters, or you can have your bootloader do a FAR JMP to explicitly set CS. The worst thing you can do is copy the value of CS to DS, ES etc. I don't ever recommend doing this without a FAR JMP preceding it:

Code: Select all

mov ax, cs
mov ds, ax
mov es, ax 
If you do this then you propagate a potentially unwanted value from CS to the other segments (especially DS).
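For comparison, here is a minimal sketch of the safer pattern, assuming NASM with -f bin and a bootloader that wants to work with segment 0x0000 throughout (the label names are only for illustration):

```nasm
    bits 16
    org 0x7c00                  ; offsets are calculated for segment 0x0000

start:
    jmp 0x0000:.flush_cs        ; far jump: forces CS=0x0000, IP=offset of .flush_cs
.flush_cs:
    xor ax, ax                  ; use a known value instead of copying CS
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7c00              ; stack grows down from just below the boot sector

.hang:
    cli
    hlt
    jmp .hang

    times 510 - ($ - $$) db 0   ; pad to 510 bytes
    dw 0xaa55                   ; boot signature
```

The far jump is absolute, so it should behave the same whether the BIOS entered at 0x0000:0x7c00 or 0x07c0:0x0000; after it, CS and the data segments all match the ORG.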

Re: ORG and near jumps in bootloader

Posted: Sat Nov 11, 2017 4:49 pm
by Octocontrabass
MichaelFarthing wrote:How this would actually be written in the source code depends on the assembler, and since I have never needed to do it, I don't know the syntax. Instinctively, however, it might have to be written jmp []+0x1234
In NASM syntax, it's the same as any other effective address on any other instruction:

Code: Select all

mov ax, [0x1234]
jmp [0x1234]
I would expect other assemblers to also accept their usual syntax for effective addresses.

Re: ORG and near jumps in bootloader

Posted: Sun Nov 12, 2017 2:19 am
by MichaelFarthing
I was thinking in Intel manual notation at the time, where for some reason the displacement is written outside the square brackets.