Comparisons on Intel CPUs

On Intel CPUs, as on many other CPUs, the cmp instruction compares two numbers:

  cmp ax,bx

This code compares the 16-bit register AX to BX by computing AX - BX without storing the result, and sets the CF, OF, SF, ZF, AF, and PF flags.

These instructions have been unchanged since the 8086 and 8088. The Intel CPU manual has more details (about 5000 pages):

Intel 64 and IA-32 Architectures Software Developer’s Manuals

The flags set by cmp are:

CF carry flag
OF overflow flag
SF sign flag
ZF zero flag
AF auxiliary carry flag
PF parity flag
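
To make the flags concrete, here is a small C sketch of what a 16-bit cmp computes. The names cmp_flags and cmp16 are made up for this example; the flag rules follow the usual x86 definitions for a subtraction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags produced by a 16-bit cmp, which computes a - b and discards the result. */
struct cmp_flags {
    bool cf; /* carry: unsigned borrow, set when a < b as unsigned values */
    bool of; /* overflow: the signed result does not fit in 16 bits */
    bool sf; /* sign: bit 15 of the result */
    bool zf; /* zero: the result is 0, so a == b */
    bool af; /* auxiliary carry: borrow out of the low 4 bits */
    bool pf; /* parity: the low byte of the result has an even number of 1 bits */
};

/* cmp16 is a hypothetical helper name, not part of any library. */
struct cmp_flags cmp16(uint16_t a, uint16_t b)
{
    uint16_t r = (uint16_t)(a - b);
    struct cmp_flags f;
    unsigned ones = 0;
    unsigned bit;

    f.cf = a < b;
    f.zf = r == 0;
    f.sf = (r & 0x8000u) != 0;
    /* Signed overflow: operands have different signs and the result's sign differs from a's. */
    f.of = (((a ^ b) & (a ^ r)) & 0x8000u) != 0;
    /* Bit 4 of a ^ b ^ r is the borrow that crossed from bit 3 into bit 4. */
    f.af = ((a ^ b ^ r) & 0x10u) != 0;
    for (bit = 0; bit < 8; bit++)
        ones += (r >> bit) & 1u;
    f.pf = (ones % 2) == 0;
    return f;
}
```

For example, cmp16(5, 5) sets ZF and clears CF, and cmp16(1, 2) sets CF and SF because the subtraction borrows and wraps to 0xFFFF.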

After cmp, one or more conditional jump instructions complete the condition:

  cmp ax,bx
  ja  above

; same as:

u16 a,b;
  if (a > b) {}
  else {}

; multiple jumps
  cmp ax,bx
  ja  above
  je  equal

; same as:

u16 a,b;
  if (a > b) {}
  else if (a == b) {}
  else {}

Some of the jump instructions are:

ja,jnbe jump if above (unsigned ints)
jae,jnb jump if above or equal (unsigned ints)
jb,jnae jump if below (unsigned ints)
jbe,jna jump if below or equal (unsigned ints)
jc jump if carry
je jump if equal
jg,jnle jump if greater for signed ints
jge,jnl jump if greater or equal for signed ints
jl,jnge jump if less for signed ints
jle,jng jump if less or equal for signed ints
jnc jump if not carry
jne jump if not equal
jno jump if not overflow
jns jump if not sign
jo jump if overflow
js jump if sign
jz jump if zero

ja is for unsigned ints and jg is for signed ints: as unsigned values 0xFFFF (65535) > 0x0010 (16), but as signed values 0xFFFF (-1) < 0x0010 (16).
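
The same 16 bits give different answers depending on the interpretation. A minimal C sketch of the two comparisons (to_signed16, is_above and is_greater are names invented for this example):

```c
#include <stdint.h>

/* Interpret 16 bits as a two's-complement signed value without relying on casts. */
int to_signed16(uint16_t bits)
{
    return bits >= 0x8000u ? (int)bits - 0x10000 : (int)bits;
}

/* What ja tests after cmp a,b: a > b as unsigned values. */
int is_above(uint16_t a, uint16_t b)
{
    return a > b;
}

/* What jg tests after cmp a,b: a > b as signed values. */
int is_greater(uint16_t a, uint16_t b)
{
    return to_signed16(a) > to_signed16(b);
}
```

With a = 0xFFFF and b = 0x0010, is_above returns 1 (ja would jump) and is_greater returns 0 (jg would not jump).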

It is not always necessary to use cmp to compare numbers. A loop with cmp can be written like this:

  mov cx,10
again:          ; "loop" is an instruction mnemonic, so the label has another name
  dec cx
  cmp cx, 0
  jne again

; same as
  for(u16 c=10;c != 0;c--);

The same loop without cmp can be written like this:

  mov cx,10
again:
  dec cx
  jcxz finish   ; jump if CX is 0
  jmp again
finish:

dec and inc set the OF, SF, ZF, AF, and PF flags (CF is left unchanged), so a conditional jump can follow dec directly. Any register can be used as the loop counter:

  mov bx,10
again:
  dec bx
  jnz again

Checking whether a value is equal to 0 can be done with cmp or with the or instruction:

  cmp ax,0
  jz  axIsZero

; or:
  or ax,ax     ; shorter than cmp ax,0, and faster on older CPUs
  jz  axIsZero

or sets the SF, ZF, and PF flags and clears OF and CF.

Setting a register to 0 is done quickly with xor, which also makes the code shorter than mov ax,0:

  xor ax,ax ; sets ax to 0

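add sets the OF, SF, ZF, AF, CF, and PF flags. OF signals a signed overflow, which is handled with jo (jump if overflow) or jno. An unsigned overflow sets CF instead and is handled with jc or jnc:

  mov eax, 7FFFFFFFh ; largest signed 32-bit value
  add eax, 1         ; result is 80000000h, so OF is set
  jno done
  ; handle overflow
  ...
done: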

The C language gives no direct access to these flags and, before C23, had no standard overflow check. When an overflow is possible, instead of writing:

unsigned int usum = ui_a + ui_b;

The program has to check the values before the operation to handle a possible overflow:

  unsigned int usum;
  if (UINT_MAX - ui_a < ui_b) {
    /* Handle error */
  } else {
    usum = ui_a + ui_b;
  }
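
The same pre-check can be wrapped in a function, and the signed case (the one OF reports in assembly) needs a slightly different test. This is only a sketch; add_unsigned_checked and add_signed_checked are hypothetical names. GCC and Clang also provide __builtin_add_overflow as a compiler extension, which is not used here to keep the code portable.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned add: returns false instead of wrapping around (the CF case). */
bool add_unsigned_checked(unsigned int a, unsigned int b, unsigned int *sum)
{
    if (UINT_MAX - a < b)
        return false;
    *sum = a + b;
    return true;
}

/* Signed add: returns false instead of overflowing (the OF case).
   The check is done before the addition because signed overflow is
   undefined behaviour in C. */
bool add_signed_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

Calling add_unsigned_checked(UINT_MAX, 1, &s) or add_signed_checked(INT_MAX, 1, &t) returns false and leaves the output untouched.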

On my 80486, I was using MCGA mode (320x200, 1 byte per pixel), and I computed the address of a pixel without using mul:

  ; ax is y, bx is x: (x,y) (bx,ax)
  xchg ah,al ; AX := 256*y (y < 200, so AH was 0)
  add bx,ax  ; BX := 256*y + x
  shr ax,2   ; AX/4 AX := 64*y
  add bx,ax  ; BX := 320*y + x
  ; BX is now the pixel address
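
In C, the computation looks roughly like this. It is a sketch with function names of my choosing, assuming 16-bit x and y with x < 320 and y < 200. The second function spells out the 256*y + 64*y decomposition used in the assembly above.

```c
#include <stdint.h>

/* Offset of pixel (x, y) in a 320x200 frame buffer with 1 byte per pixel. */
uint16_t pixel_offset(uint16_t x, uint16_t y)
{
    return (uint16_t)(y * 320u + x);
}

/* Same result with shifts only: 320*y = 256*y + 64*y. */
uint16_t pixel_offset_shift(uint16_t x, uint16_t y)
{
    return (uint16_t)((y << 8) + (y << 6) + x);
}
```

The largest offset, for the pixel (319, 199), is 63999, so the result fits in 16 bits.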

For the same computation written in C, gcc emits this at -O0:

  movzx   edx, WORD PTR [rbp-8]
  mov     eax, edx
  shl     eax, 2
  add     eax, edx
  shl     eax, 6
  mov     edx, eax
  movzx   eax, WORD PTR [rbp-4]
  add     eax, edx

At -O3, gcc and clang use lea. The original purpose of lea was to compute addresses in arrays from a base address, an index, and an element size.

; gcc
  lea     esi, [rsi+rsi*4]
  shl     esi, 6
  lea     eax, [rsi+rdi]

; clang
  shl     esi, 6
  lea     eax, [rsi + 4*rsi]
  add     eax, edi

lea is fast, but it doesn't set any flags, so overflows can't be detected with jo or jno.

On the 80486 and earlier CPUs, these optimisations were significant because execution time depended directly on the cost of each instruction. On today's CPUs, cache hits, speculative execution, and branch prediction that keeps the pipeline full matter more. Dan Luu wrote a brief history of branch prediction on his website; I converted the page to gemtext:

A brief history of branch prediction

A brief history of branch prediction (http)

Coding in assembly in linux

#Assembly