Is xor ax,ax and mov ax,0 both setting the ax register to 0?

Posted: Tue Jun 12, 2018 10:07 pm
by thewebparadox
I've never seen xor before. I know that mov ex1,ex2 moves ex2 into ex1, but I noticed some code where somebody wrote xor ax,ax with the comment "; setting the ax register to 0".

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Tue Jun 12, 2018 11:30 pm
by BenLunt
This is an assembly question more than an os development question, but okay.

I would look up the instruction in the intel manual, or search for "xor instruction". Most any page will tell you exactly what this instruction does.

Code:

     10100110
 xor  00111011
     ---------
     10011101
The difference is that the "mov" instruction will not affect the flags register where the "xor" instruction will.

Ben
- http://www.fysnet.net/osdesign_book_series.htm

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Wed Jun 13, 2018 10:36 am
by Octocontrabass
There are also differences in size: XOR and SUB are one byte smaller than MOV for setting a 16-bit register to 0 (and three bytes smaller for a 32-bit register). This makes it a popular optimization for bootloaders, where space can be very expensive.
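
To make the byte counts above concrete, here is a minimal C sketch that stores one common encoding of each instruction and compares the lengths. The array and function names are made up for this illustration, and the byte values assume the instruction is assembled for its native mode (16-bit code for AX, 32-bit code for EAX), so no operand-size prefix appears. Some assemblers pick the alternative 33 C0 encoding for XOR, which has the same length.

```c
#include <stddef.h>
#include <stdint.h>

/* One common encoding of each instruction (assumption: no prefixes needed,
   i.e. 16-bit code for the AX forms and 32-bit code for the EAX forms). */
const uint8_t XOR_AX_AX[]   = { 0x31, 0xC0 };                   /* xor ax, ax   */
const uint8_t SUB_AX_AX[]   = { 0x29, 0xC0 };                   /* sub ax, ax   */
const uint8_t MOV_AX_0[]    = { 0xB8, 0x00, 0x00 };             /* mov ax, 0    */
const uint8_t XOR_EAX_EAX[] = { 0x31, 0xC0 };                   /* xor eax, eax */
const uint8_t MOV_EAX_0[]   = { 0xB8, 0x00, 0x00, 0x00, 0x00 }; /* mov eax, 0   */

/* Bytes saved by using XOR instead of MOV for a 16-bit register. */
size_t bytes_saved_16(void) { return sizeof MOV_AX_0 - sizeof XOR_AX_AX; }

/* Bytes saved by using SUB instead of MOV for a 16-bit register. */
size_t bytes_saved_16_sub(void) { return sizeof MOV_AX_0 - sizeof SUB_AX_AX; }

/* Bytes saved by using XOR instead of MOV for a 32-bit register. */
size_t bytes_saved_32(void) { return sizeof MOV_EAX_0 - sizeof XOR_EAX_EAX; }
```

The saving comes entirely from the immediate operand: MOV has to carry the zero as a 2-byte or 4-byte constant, while XOR and SUB only need an opcode and a ModRM byte.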

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Wed Jun 13, 2018 2:01 pm
by Schol-R-LEA
To expand on Ben's answer a bit (though, as Ben said, it really isn't an OS question so much as a general assembly programming question): the XOR instruction performs a bitwise logical operation known as 'exclusive or', in which the result is true if and only if exactly one of the two logical values it is applied to is true (in this case, the values are stored as individual bits, with false == 0 and true == 1).

XOR is one of several bitwise logical operations, the other common ones being AND, OR, and NOT. The general Boolean truth tables for the common logical operations are

Unary Not:

Code:

 x | NOT x
-----------
 F |   T
 T |   F

And:

Code:

   | F | T
-----------
 F | F | F
-----------
 T | F | T

Regular (Inclusive) Or:

Code:

   | F | T
-----------
 F | F | T
-----------
 T | T | T

Exclusive Or:

Code:

   | F | T
-----------
 F | F | T
-----------
 T | T | F

Since these instructions operate on the bits in a data word, they perform these operations on the individual bits, pairing up the corresponding bits of the two pieces of data. So, if you have one byte x that holds the binary value 1100 1001 (splitting the nybbles just to make it easier to read) and another byte y that holds the binary value 0001 1010, then the results would be

Code:

NOT x
    1100 1001
    -----------
    0011 0110

AND x, y
    1100 1001
    0001 1010
    -----------
    0000 1000

OR x, y
    1100 1001
    0001 1010
    -----------
    1101 1011

XOR x, y
    1100 1001
    0001 1010
    -----------
    1101 0011
Now, here's the trick being used: if you XOR any value against itself, all the set bits cancel, clearing the entire datum. In other words,

Code:

XOR n, n == 0, for all n
Note that these bitwise operators are not specific to assembly language; they are the same AND, OR, XOR, and NOT operations that are performed by the `&` (ampersand), `|` (vertical bar, or pipe), `^` (caret) and `~` (tilde) operators in C and related languages:

Code:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t  a, b, c, d, n, m, x, y;
        x = 0xC9;            //   == binary 11001001 == decimal 201
        y = 0x1A;            //   == binary 00011010 == decimal  26

        a = (uint8_t)~x;     //   == binary 00110110 == dec  54 == hex 36
        b = x & y;           //   == binary 00001000 == dec   8 == hex 08
        c = x | y;           //   == binary 11011011 == dec 219 == hex DB
        d = x ^ y;           //   == binary 11010011 == dec 211 == hex D3
        n = x ^ x;           //   == binary 00000000 == dec   0 == hex 00
        m = y ^ y;           //   == binary 00000000 == dec   0 == hex 00
        printf("%02X %02X %02X %02X %02X %02X\n", a, b, c, d, n, m);
        return 0;
    }
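
The "XOR n, n == 0, for all n" identity stated earlier can also be checked by brute force. The following is only a small sketch for 16-bit values (the width of AX); the function name is invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if n ^ n == 0 holds for every possible 16-bit value. */
bool xor_self_is_always_zero(void)
{
    for (uint32_t i = 0; i <= UINT16_MAX; i++) {
        uint16_t n = (uint16_t)i;
        if ((uint16_t)(n ^ n) != 0) {
            return false;   /* would indicate a counterexample */
        }
    }
    return true;
}
```

Each bit is XORed with an identical copy of itself, and both matching cases in the truth table (F with F, T with T) give F, so no counterexample can exist at any width.
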
I hope this helps, because to be honest, this is something you really ought to have down solid before jumping into OS-Dev.

As has already been said, the main reason this is sometimes used is that, on the x86 instruction set (and several others), the XOR operation (sometimes called EOR or something similar on other instruction set architectures) is encoded in fewer bytes than the MOV operation, and can also be faster in some implementations (for example, some of the early 8088 models). Also, as Ben mentioned, it changes the FLAGS register, which MOV doesn't: it clears the Overflow and Carry flags (and sets the Zero, Sign, and Parity flags according to the result), and that's sometimes useful when clearing a register.
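
The flag behaviour described above can be modelled in a few lines of C. This is a toy model, not an emulator: the struct and function names are hypothetical, only the five status flags relevant here are tracked, and the Auxiliary flag (which XOR leaves undefined) is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the status flags discussed in this thread. */
struct flags { bool cf, pf, zf, sf, of; };

/* 16-bit XOR: CF and OF are cleared; ZF, SF and PF follow the result. */
uint16_t model_xor16(uint16_t dst, uint16_t src, struct flags *f)
{
    uint16_t r = dst ^ src;

    /* Fold the low byte down to one bit: 1 means an odd number of set bits. */
    uint8_t low = (uint8_t)r;
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;

    f->cf = false;
    f->of = false;
    f->zf = (r == 0);
    f->sf = (r & 0x8000) != 0;
    f->pf = (low & 1) == 0;   /* PF is set when the low byte has even parity */
    return r;
}

/* MOV with an immediate: the flags are read-only here because MOV never changes them. */
uint16_t model_mov16(uint16_t imm, const struct flags *f)
{
    (void)f;
    return imm;
}
```

Calling model_xor16(ax, ax, &f) always leaves CF and OF clear and ZF set, whereas model_mov16(0, &f) returns the same zero with every flag exactly as it was.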

This isn't universal, however; for example, both the size and the number of cycles used by the equivalent instructions are the same either way on ARM CPUs, and in the MIPS CPU, the MOVE <regX>, <regY> pseudo-instruction is just an alias for the OR <regX>, $zero, <regY> instruction (the register $0 - also called $zero - is permanently set to zero), while CLEAR <regX> is just OR <regX>, $zero, $zero. The Flags/Status register issues are different, too; the CPSR (Current Processor Status Register) in the ARM design behaves differently from the x86 FLAGS register, while MIPS doesn't have a status register, period (at least not one used for this purpose).
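
As a rough illustration of the MIPS point above, here is a toy C model of a register file in which register 0 always reads as zero, with MOVE and CLEAR written purely in terms of OR. The type and function names are invented for this sketch and do not correspond to any real simulator.

```c
#include <stdint.h>

/* Toy register file: 32 registers, where register 0 ($zero) always reads as 0. */
struct mips_regs { uint32_t r[32]; };

uint32_t mips_read(const struct mips_regs *m, unsigned n)
{
    return n == 0 ? 0 : m->r[n];
}

/* or rd, rs, rt  (writes to $zero are discarded) */
void mips_or(struct mips_regs *m, unsigned rd, unsigned rs, unsigned rt)
{
    if (rd != 0) {
        m->r[rd] = mips_read(m, rs) | mips_read(m, rt);
    }
}

/* move rd, rs  is just  or rd, $zero, rs */
void mips_move(struct mips_regs *m, unsigned rd, unsigned rs)
{
    mips_or(m, rd, 0, rs);
}

/* clear rd  is just  or rd, $zero, $zero */
void mips_clear(struct mips_regs *m, unsigned rd)
{
    mips_or(m, rd, 0, 0);
}
```

Because both pseudo-instructions expand to the same single OR, there is no size or speed difference to exploit, which is why the XOR trick buys nothing there.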

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Thu Jun 14, 2018 6:47 am
by Brendan
Hi,
BenLunt wrote:The difference is that the "mov" instruction will not affect the flags register where the "xor" instruction will.
Octocontrabass wrote:There are also differences in size: XOR and SUB are one byte smaller than MOV for setting a 16-bit register to 0 (and three bytes smaller for a 32-bit register). This makes it a popular optimization for bootloaders, where space can be very expensive.
The other difference is whether or not the CPU (mistakenly) thinks that the instruction depends on the previous value of EAX.

For example, if you do an instruction like "div" (which is slow and updates EAX) followed by "mov eax,0" then all out-of-order 80x86 CPUs will know that the second instruction doesn't have to wait for the slow division to finish; and if you do "div" and then "xor eax,eax" some (older) CPUs will wait for the "div" to finish (and some newer CPUs won't).

Of course even on older CPUs (where there is a false dependency) "xor eax,eax" might still be faster in some cases (if the bottleneck is instruction fetch, and if EAX hasn't been changed for ages anyway).
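
The false dependency described above can be sketched as a very small timing model. Everything here is an assumption made for illustration: the function names are invented, and the cycle counts are made-up placeholders rather than figures for any real CPU.

```c
#include <stdbool.h>

/* Invented latencies, purely for illustration; real values vary by CPU. */
enum { DIV_LATENCY = 40, SIMPLE_LATENCY = 1 };

/* Cycle at which a zeroing instruction, issued right after a DIV that writes
   EAX, has its result ready. The DIV is assumed to start at cycle 0.
   reads_eax: true if the CPU treats the instruction as reading the old EAX. */
int zero_ready_cycle(bool reads_eax)
{
    int div_done = DIV_LATENCY;
    int start = reads_eax ? div_done : 0;   /* wait for DIV only if EAX is "read" */
    return start + SIMPLE_LATENCY;
}

/* mov eax,0 has no register source, so it never waits for the DIV. */
int mov_eax_0_ready(void)
{
    return zero_ready_cycle(false);
}

/* xor eax,eax nominally reads EAX; it waits unless the CPU recognises the idiom. */
int xor_eax_eax_ready(bool idiom_recognised)
{
    return zero_ready_cycle(!idiom_recognised);
}
```

With these placeholder numbers, the MOV and the recognised XOR are both ready after 1 cycle, while an unrecognised XOR is ready after 41, since it has to sit behind the division.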


Cheers,

Brendan

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Thu Jun 14, 2018 8:19 am
by Antti
Perhaps this is not very relevant but I used an "optimization" in my boot code if I wanted to set a register to zero without changing the flags. I was able to assume that a segment register was always zero so a "mov ?x, cs" instruction could be used. The instruction takes two bytes, like the xor instruction. This was mostly for return code paths, for example

Code:

        ...
        jc .Err1
        ...
        jc .Err2
        ...

.Err1:  mov ax, cs              ; (only two bytes)
        ret                     ; return cf = 1

.Err2:  mov ax, 0x0001          ; (something else)
        ret                     ; return cf = 1
Please note that this is not optimized for speed.

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Thu Jun 14, 2018 9:54 am
by Octocontrabass
Brendan wrote:Of course even on older CPUs (where there is a false dependency) "xor eax,eax" might still be faster in some cases (if the bottleneck is instruction fetch, and if EAX hasn't been changed for ages anyway).
It can also be faster where the CPU mistakenly thinks future instructions using AX, AH, or AL are dependent on the result of "mov eax,0" but not "xor eax,eax". In some cases, it's fastest to use "mov eax,0" followed immediately by "xor eax,eax" to prevent both past and future false dependencies.

Optimizations like this are too dependent on individual CPUs to make any general statements, so it's best to forget about speed unless you're optimizing for a specific CPU.
Antti wrote:Perhaps this is not very relevant but I used an "optimization" in my boot code if I wanted to set a register to zero without changing the flags. I was able to assume that a segment register was always zero so a "mov ?x, cs" instruction could be used.
Keep in mind that CS may not be 0 if you haven't used a far jump to explicitly set CS.
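
To illustrate why CS cannot be assumed to be zero, here is a small C sketch of real-mode address arithmetic. It assumes the familiar case of a boot sector loaded at physical address 0x7C00, which can be reached as either 0000:7C00 or 07C0:0000; the function names are invented for this example.

```c
#include <stdint.h>

/* Real-mode physical address: segment * 16 + offset (address wraparound ignored). */
uint32_t real_mode_phys(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* What "mov ax, cs" would leave in AX for a given CS value. */
uint16_t mov_ax_cs(uint16_t cs)
{
    return cs;
}
```

Both real_mode_phys(0x0000, 0x7C00) and real_mode_phys(0x07C0, 0x0000) give 0x7C00, so the code runs identically either way, but mov_ax_cs(0x07C0) yields 0x07C0 rather than 0. A far jump to a known segment at the start of the boot code removes the ambiguity.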

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Thu Jun 14, 2018 11:14 am
by Schol-R-LEA
Octocontrabass wrote:
Brendan wrote:Of course even on older CPUs (where there is a false dependency) "xor eax,eax" might still be faster in some cases (if the bottleneck is instruction fetch, and if EAX hasn't been changed for ages anyway).
It can also be faster where the CPU mistakenly thinks future instructions using AX, AH, or AL are dependent on the result of "mov eax,0" but not "xor eax,eax". In some cases, it's fastest to use "mov eax,0" followed immediately by "xor eax,eax" to prevent both past and future false dependencies.
That's a situation I hadn't heard of before. Can anyone else confirm, and more importantly, does anyone know which processor models might behave this way, and under what circumstances? I don't believe you are lying - such are the idiosyncrasies of model-specific optimization - but 'extraordinary claims demand extraordinary proof' and all that.

If so, it is rather amusing to me that there would be situations where the best speed performance would come from duplicating instructions with identical results, reached in different ways. That's definitely not something you'd normally expect.

It would be somewhat interesting to know just when and where that is likely to happen, and what the downsides of optimizing for that case would be (aside from the obvious use of two instructions, I mean). I doubt it would be a particularly relevant matter anyway, as I can't see a case where clearing a register would be a critical-path bottleneck, but you never know.

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Thu Jun 14, 2018 11:51 am
by simeonz
This SO post may be of interest. I haven't checked the Intel Pentium manuals yet, but the answer there claims that "xor reg, reg" has been a recognized "zeroing idiom" since the Pentium (correction: Pentium Pro) days.

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Thu Jun 14, 2018 12:01 pm
by Schol-R-LEA
simeonz wrote:This SO post may be of interest. I haven't checked the Intel Pentium manuals yet, but the answer there claims that "xor reg, reg" has been a recognized "zeroing idiom" since the Pentium (correction: Pentium Pro) days.
I don't know about 'recognized', but it is a lot older than that. I recall reading about it in an old 8088 assembly book (Peter Norton's, I think, or maybe Lafore's, both of which I tried and failed to get through back in the 1980s; I had somewhat better luck with The IBM Personal Computer from the Inside Out, but still didn't really get a good grasp of assembly until much later).

For that matter, I am pretty sure it got used in early PDP-11 Unix (both in assembly and in C), but I don't have a copy of Lions' Commentary on hand to check. EDIT: Hey, there's an authorized PDF version of both the commentary and the source code, nice! I will look to see if I was right. Further update: Nope, it isn't used there. Turns out that the PDP-11 had a dedicated Clear Register instruction, and XOR-clearing doesn't seem to have been used in the C code, either.

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Thu Jun 14, 2018 12:35 pm
by Octocontrabass
Schol-R-LEA wrote:
Octocontrabass wrote:In some cases, it's fastest to use "mov eax,0" followed immediately by "xor eax,eax" to prevent both past and future false dependencies.
That's a situation I hadn't heard of before. Can anyone else confirm, and more importantly, does anyone know which processor models might behave this way, and under what circumstances? I don't believe you are lying - such are the idiosyncrasies of model-specific optimization - but 'extraordinary claims demand extraordinary proof' and all that.
That example is straight out of Agner Fog's microarchitecture optimization guide. It's in the section that describes the Pentium Pro, Pentium II, and Pentium III. The example scenario is optimizing for a balance of reasonably high speed on a variety of older CPUs without using MOVZX, which is quite slow on some CPUs.

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Thu Jun 14, 2018 4:31 pm
by Sik
Schol-R-LEA wrote:
simeonz wrote:This SO post may be of interest. I haven't checked the Intel Pentium manuals yet, but the answer there claims that "xor reg, reg" has been a recognized "zeroing idiom" since the Pentium (correction: Pentium Pro) days.
I don't know about 'recognized', but it is a lot older than that. I recall reading about it in an old 8088 assembly book (Peter Norton's, I think, or maybe Lafore's, both of which I tried and failed to get through back in the 1980s; I had somewhat better luck with The IBM Personal Computer from the Inside Out, but still didn't really get a good grasp of assembly until much later).
I think it's "recognized" as in the instruction decoder handles it in a special way for the sake of optimization (i.e. the decoder recognizes it as a special case). That means that e.g. XOR EDX, EDX would immediately get rid of previous dependencies, but e.g. XOR EBX, ECX wouldn't, despite both being based on the same opcode.
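
The distinction above (same opcode, different treatment) comes down to the register fields in the ModRM byte. The following C sketch shows the kind of check involved. It is only an illustration with an invented function name, it handles just the unprefixed 31 /r and 33 /r forms of XOR, and it is not a description of how any actual decoder is built.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy check: is this a register-to-register XOR whose operands are the same register?
   Only the plain opcodes 0x31 and 0x33 are handled, with no prefixes. */
bool is_xor_zeroing_idiom(const uint8_t *code, size_t len)
{
    if (len < 2) {
        return false;
    }
    if (code[0] != 0x31 && code[0] != 0x33) {
        return false;
    }
    uint8_t modrm = code[1];
    uint8_t mod = modrm >> 6;          /* 3 means both operands are registers */
    uint8_t reg = (modrm >> 3) & 7;    /* register operand                    */
    uint8_t rm  = modrm & 7;           /* register/memory operand             */
    return mod == 3 && reg == rm;
}
```

For example, XOR EDX, EDX encodes as 31 D2 (mod 3, reg 2, rm 2) and passes the check, while XOR EBX, ECX encodes as 31 CB (mod 3, reg 1, rm 3) and does not.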

Re: Is xor ax,ax and mov ax,0 both setting the ax register t

Posted: Thu Jun 14, 2018 5:19 pm
by simeonz
Sik wrote:I think it's "recognized" as in the instruction decoder handles it in a special way for the sake of optimization (i.e. the decoder recognizes it as a special case). That means that e.g. XOR EDX, EDX would immediately get rid of previous dependencies, but e.g. XOR EBX, ECX wouldn't, despite both being based on the same opcode.
Yes, but it sounded like I was referring to the popularity of the assembly programming technique. The SO post is much more detailed about the benefits of the idiom from the hardware standpoint, in some places so much so that it is hard to digest.
In the end however, as per the Pentium manual:
Zero-Extension of Short Integers

The MOVZX instruction has a prefix and takes 3 cycles to execute (a total of 4 cycles). As with the Intel486 CPU, it is recommended to use the following sequence instead:

xor eax, eax
mov al, mem

If this occurs within a loop, it may be possible to pull the XOR out of the loop if the only assignment to EAX is the MOV AL, MEM. This has greater importance for the Pentium processor due to its concurrency of instruction execution.

Clearing a Register

The preferred sequence to move zero to a register is XOR REG, REG. This saves code space but sets the condition codes. In contexts where the condition codes must be preserved, use: MOV REG, 0.

The same is cited in the Pentium Pro manual. Now, that does not mean to say that Intel necessarily have the best understanding of the practical facets of their CPU usage, but probably the instruction was specifically catered for.
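
The zero-extension sequence quoted from the manual can be mimicked in C to show why the XOR is needed before the byte load. This is only a sketch with invented function names; it models EAX as a plain 32-bit integer and AL as its low 8 bits.

```c
#include <stdint.h>

/* Writing AL replaces only the low 8 bits of EAX; the upper 24 bits are kept. */
uint32_t write_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}

/* Models:  xor eax, eax
            mov al, mem   */
uint32_t zero_extend_via_xor(uint32_t stale_eax, uint8_t mem)
{
    uint32_t eax = stale_eax ^ stale_eax;   /* always 0, whatever EAX held before */
    return write_al(eax, mem);
}
```

With a stale EAX of 0xDEADBEEF and a byte of 0x7F, write_al alone gives 0xDEADBE7F, while zero_extend_via_xor gives 0x0000007F, the same value MOVZX would produce.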