Comparisons on Intel CPUs
On Intel CPUs and other x86-compatible CPUs, the cmp instruction compares two numbers:
cmp ax,bx
This code compares the 16-bit register AX to BX: it computes AX - BX, discards the result, and sets the CF, OF, SF, ZF, AF, and PF flags.
These instructions have been unchanged since the 8086 and 8088. The Intel CPU manual has more details (about 5000 pages):
Intel 64 and IA-32 Architectures Software Developer’s Manuals
The flags set by cmp are:
- CF carry flag
- OF overflow flag
- SF sign flag
- ZF zero flag
- PF parity flag
- AF auxiliary carry flag
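To make the flag definitions concrete, here is a minimal C sketch that models what cmp does to the flags for 16-bit operands. The Flags struct and the cmp16 function are hypothetical names made up for this illustration; the flag rules follow the usual x86 definitions for a subtraction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the flags written by cmp. */
typedef struct {
    bool cf, of, sf, zf, af, pf;
} Flags;

/* Model "cmp a,b" on 16-bit operands: compute a - b and keep only the flags. */
static Flags cmp16(uint16_t a, uint16_t b)
{
    uint16_t r = (uint16_t)(a - b);
    uint8_t low = (uint8_t)r;
    Flags f;

    f.cf = a < b;                                /* unsigned borrow */
    f.zf = r == 0;                               /* operands are equal */
    f.sf = (r & 0x8000u) != 0;                   /* top bit of the result */
    /* Signed overflow: a and b differ in sign and the result's sign differs from a's. */
    f.of = (((a ^ b) & (a ^ r)) & 0x8000u) != 0;
    f.af = ((a ^ b ^ r) & 0x10u) != 0;           /* borrow out of the low nibble */

    /* PF is set when the low byte of the result has an even number of 1 bits. */
    low ^= (uint8_t)(low >> 4);
    low ^= (uint8_t)(low >> 2);
    low ^= (uint8_t)(low >> 1);
    f.pf = (low & 1u) == 0;
    return f;
}
```

For example, cmp16(5, 5) reports ZF set and CF clear, while cmp16(0x0010, 0xFFFF) reports CF set because the unsigned subtraction borrows.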
After cmp, one or more conditional jump instructions complete the condition:
cmp ax,bx
ja above
; same as:
u16 a,b;
if (a > b) {}
else {}
; multiple jumps
cmp ax,bx
ja above
je equal
; same as:
u16 a,b;
if (a > b) {}
else if (a == b) {}
else {}
Some of the jump instructions are:
- ja,jnbe jump if above; jae,jnb jump if above or equal (unsigned ints)
- jb,jnae jump if below; jbe,jna jump if below or equal (unsigned ints)
- jc jump if carry
- je jump if equal
- jg,jnle jump if greater; jge,jnl jump if greater or equal (signed ints)
- jl,jnge jump if less; jle,jng jump if less or equal (signed ints)
- jnc jump if not carry
- jne jump if not equal
- jno jump if not overflow
- jns jump if not sign
- jo jump if overflow
- js jump if sign
- jz jump if zero (same instruction as je)
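ja is for unsigned ints and jg is for signed ints: 0xFFFF is above 0x0010 when both are read as unsigned (65535 > 16), but less than 0x0010 when both are read as signed (-1 < 16).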
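The difference between ja and jg can be sketched in C by evaluating the flag conditions each jump tests: ja is taken when CF and ZF are both clear, and jg is taken when ZF is clear and SF equals OF. The function names ja_taken and jg_taken are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Would "cmp a,b / ja target" jump? Unsigned comparison: CF = 0 and ZF = 0. */
static bool ja_taken(uint16_t a, uint16_t b)
{
    bool cf = a < b;    /* borrow from the unsigned subtraction a - b */
    bool zf = a == b;
    return !cf && !zf;
}

/* Would "cmp a,b / jg target" jump? Signed comparison: ZF = 0 and SF = OF. */
static bool jg_taken(uint16_t a, uint16_t b)
{
    uint16_t r = (uint16_t)(a - b);
    bool zf = r == 0;
    bool sf = (r & 0x8000u) != 0;
    bool of = (((a ^ b) & (a ^ r)) & 0x8000u) != 0;
    return !zf && sf == of;
}
```

With a = 0xFFFF and b = 0x0010, ja_taken returns true and jg_taken returns false, which matches the unsigned and signed readings of the same bits.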
It is not always necessary to use cmp to compare numbers. A loop with cmp can be written like this:
mov cx,10
again: ; "loop" is an instruction mnemonic, so the label needs another name
dec cx
cmp cx, 0
jne again
; same as
for(u16 c=10;c != 0;c--);
The same loop without cmp can be written like this:
mov cx,10
again:
dec cx
jcxz finish ; jumps when CX is 0, without looking at the flags
jmp again
finish:
dec and inc set the OF, SF, ZF, AF, and PF flags.
Any register can be used as a loop counter, because dec sets ZF when the result is 0:
mov bx,10
again:
dec bx
jnz again
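In C terms, a dec followed by jnz behaves like a do-while loop: the counter is decremented first and tested afterwards. The sketch below, with the hypothetical helper count_iterations, counts how often such a loop body runs.

```c
#include <stdint.h>

/* Counts how many times the body of a dec/jnz style loop runs. */
static unsigned count_iterations(uint16_t start)
{
    uint16_t c = start;
    unsigned iterations = 0;

    do {
        iterations++;       /* the loop body would go here */
    } while (--c != 0);     /* dec c, then jnz back to the top */

    return iterations;
}
```

Because the test comes after the decrement, starting with 0 does not skip the loop: the 16-bit counter wraps to 0xFFFF and the body runs 65536 times.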
Checking whether a value equals 0 can be done with cmp or with the or instruction:
cmp ax,0
jz axIsZero
; or:
or ax,ax ; shorter encoding than cmp ax,0, and faster on older CPUs
jz axIsZero
or sets the SF, ZF, and PF flags according to the result and clears CF and OF.
Setting a register to 0 quickly is done with xor, which also makes the code shorter:
xor ax,ax ; sets ax to 0
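add sets the OF, SF, ZF, AF, CF, and PF flags. OF signals a signed overflow and is handled with jo, jump if overflow (or jno). CF signals an unsigned overflow (a carry) and is handled with jc (or jnc):
mov eax, 7FFFFFFFh ; largest signed 32-bit value
add eax, 1 ; result is 80000000h: OF is set, CF is clear
jno done
; handle overflow
...
done: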
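Which flag an addition raises depends on how the operands are read. The following C sketch models CF and OF for a 32-bit add; add32_carry and add32_overflow are hypothetical helper names used only here.

```c
#include <stdbool.h>
#include <stdint.h>

/* CF after "add a,b" on 32 bits: the unsigned result wrapped around. */
static bool add32_carry(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b) < a;
}

/* OF after "add a,b" on 32 bits: both operands have the same sign
   and the result has the opposite sign. */
static bool add32_overflow(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    return ((~(a ^ b) & (a ^ r)) & 0x80000000u) != 0;
}
```

Adding 1 to 0xFFFFFFFF sets CF but not OF (as signed numbers, -1 + 1 = 0 is fine), while adding 1 to 0x7FFFFFFF sets OF but not CF.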
Before C23, which added ckd_add and friends in <stdckdint.h>, the C language had no standard support for integer overflow detection. When an overflow is possible, instead of writing:
unsigned int usum = ui_a + ui_b;
The program has to check the values before the operation to handle possible overflows:
#include <limits.h> /* UINT_MAX */

unsigned int usum;
if (UINT_MAX - ui_a < ui_b) {
    /* Handle error: ui_a + ui_b would wrap around */
} else {
    usum = ui_a + ui_b;
}
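The pre-check can be wrapped in a small helper. The sketch below shows two variants under hypothetical names: checked_uadd uses the portable pre-check, and checked_uadd_builtin assumes a GCC or Clang compiler, whose __builtin_add_overflow extension typically compiles down to an add followed by a flag test.

```c
#include <limits.h>
#include <stdbool.h>

/* Portable version: test before adding. Returns false if the sum would wrap;
   *sum is only written on success. */
static bool checked_uadd(unsigned int a, unsigned int b, unsigned int *sum)
{
    if (UINT_MAX - a < b) {
        return false;
    }
    *sum = a + b;
    return true;
}

/* Assumes GCC or Clang: __builtin_add_overflow returns true on overflow.
   Note that it stores the wrapped result in *sum even when it overflows. */
static bool checked_uadd_builtin(unsigned int a, unsigned int b, unsigned int *sum)
{
    return !__builtin_add_overflow(a, b, sum);
}
```

Both helpers return true when the addition is safe, so a caller can write if (!checked_uadd(ui_a, ui_b, &usum)) to reach the error path.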
On my 80486, I was using MCGA mode (320x200, 1 byte per pixel), and to compute the address of a pixel I avoided mul:
; ax is y, bx is x: (x,y) (bx,ax)
xchg ah,al ; AX := 256*y (AH is 0 beforehand because y < 200)
add bx,ax ; BX := 256*y + x
shr ax,2 ; AX/4 AX := 64*y
add bx,ax ; BX := 320*y + x
; BX is now the pixel address
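The trick relies on 320 = 256 + 64. A C sketch of the same arithmetic, under hypothetical function names, shows the straightforward multiply, the shift-and-add steps of the assembly above, and the alternative split 320 = 5 * 64. It assumes x < 320 and y < 200, so every intermediate value fits in 16 bits.

```c
#include <stdint.h>

/* Straightforward version: offset = 320*y + x. */
static uint16_t pixel_offset_mul(uint16_t x, uint16_t y)
{
    return (uint16_t)(320u * y + x);
}

/* Mirrors the assembly step by step: 320*y = 256*y + 64*y. */
static uint16_t pixel_offset_shift(uint16_t x, uint16_t y)
{
    uint16_t ax = (uint16_t)(y << 8);   /* xchg ah,al with AH == 0: 256*y */
    uint16_t bx = (uint16_t)(x + ax);   /* 256*y + x */
    ax >>= 2;                           /* 64*y */
    bx = (uint16_t)(bx + ax);           /* 320*y + x */
    return bx;
}

/* Alternative split: 320*y = (4*y + y) * 64. */
static uint16_t pixel_offset_5x64(uint16_t x, uint16_t y)
{
    return (uint16_t)((((y << 2) + y) << 6) + x);
}
```

All three functions return the same offset for any pixel on the 320x200 screen; for example, the pixel at x = 10, y = 1 is at offset 330.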
For the same computation written in C, gcc emits this at -O0:
movzx edx, WORD PTR [rbp-8]
mov eax, edx
shl eax, 2
add eax, edx
shl eax, 6
mov edx, eax
movzx eax, WORD PTR [rbp-4]
add eax, edx
At -O3, gcc and clang use lea. The original purpose of lea was to compute addresses in arrays from a base address, an index, and an element size.
; gcc
lea esi, [rsi+rsi*4]
shl esi, 6
lea eax, [rsi+rdi]
; clang
shl esi, 6
lea eax, [rsi + 4*rsi]
add eax, edi
lea is fast, but it doesn't set any flags, so overflows can't be detected with jo or jno.
On the 80486 and earlier CPUs, these optimisations were significant. The performance profile of instructions has changed since then: on today's CPUs, cache hits, speculative execution, and branch prediction to keep the pipeline full matter more. Dan Luu wrote a brief history of branch prediction on his website; I converted the page to gemtext:
A brief history of branch prediction
A brief history of branch prediction (http)
Related:
#Assembly