The C16 CPU has 32 general purpose registers, each 16, 32 or 64 bits wide depending on the version.

I don't use the version with 16-bit registers because its address space is too small: it needs RAM banking to address more than 64 KB.

The ISAs for the 32-bit and 64-bit versions are identical. A 64-bit program can run on the 32-bit CPU with small changes, mainly changing the addresses to 32 bits.

This CPU is called C16 because all the opcodes are 16-bit words. This keeps the instruction decoder simple: C16 could read 128 bytes per clock cycle and decode 64 instructions.
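
The fixed 16-bit opcode width means a fetched line can be cut into instructions without looking at their contents. The sketch below only illustrates that arithmetic; the little-endian byte order and the name `split_fetch_line` are assumptions, not part of the C16 specification.

```python
import struct

WORD_BYTES = 2      # every C16 opcode is one 16-bit word
FETCH_BYTES = 128   # bytes read per clock cycle, from the text above

def split_fetch_line(line: bytes) -> list[int]:
    """Split a fetched line into 16-bit opcode words.

    Little-endian byte order is an assumption made for this sketch.
    """
    if len(line) % WORD_BYTES:
        raise ValueError("line length must be a multiple of 2 bytes")
    count = len(line) // WORD_BYTES
    return list(struct.unpack(f"<{count}H", line))

# Dummy line content, just to count the words.
words = split_fetch_line(bytes(range(FETCH_BYTES)))
print(len(words))  # 64 opcode words in one 128-byte line
```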

The goal of the ISA is to have as many bits as possible for the jump (jmp) instruction and as many instructions with 2 register operands as possible.

  • With 64 registers, it takes 12 bits to select 2 registers, which does not leave enough opcode space for instructions with 2 operands.
  • The jmp instruction has a 14-bit signed immediate value, because with 15 bits there is not enough opcode space left for instructions with 2 operands.
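
The trade-off in the two points above can be checked with a rough count of the 16-bit encoding space. This is only an upper bound under a simplifying assumption: jmp is treated as the only other instruction, while the real ISA also has to fit conditional jumps, loadi and the rest. The function name is hypothetical.

```python
OPCODE_BITS = 16

def max_two_operand_instructions(register_count: int, jmp_imm_bits: int) -> int:
    """Upper bound on 2-register instructions that fit beside jmp.

    Assumes jmp is the only other instruction (a simplification).
    """
    reg_bits = (register_count - 1).bit_length()  # bits to select one register
    total = 1 << OPCODE_BITS                      # all 16-bit encodings
    jmp_space = 1 << jmp_imm_bits                 # encodings taken by jmp
    per_instruction = 1 << (2 * reg_bits)         # encodings per 2-register op
    return (total - jmp_space) // per_instruction

print(max_two_operand_instructions(32, 14))  # 48
print(max_two_operand_instructions(32, 15))  # 32
print(max_two_operand_instructions(64, 14))  # 12
```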

There are 2 stacks: one for data and one for the return addresses of functions. Accessing data on the data stack cannot change the return address of a function, because the return address is not stored on the data stack.
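
A small model of the two stacks, following the rsp/dsp updates given in the instruction descriptions (call: rsp-4, ret: rsp+4, push: dsp-4). The class name, the initial stack addresses, the pre-decrement order and pop moving dsp back up by 4 are assumptions made for illustration.

```python
class TwoStackModel:
    """Toy model: data stack (dsp) and return address stack (rsp)."""

    def __init__(self, dsp=0x8000, rsp=0x4000):  # start addresses are arbitrary
        self.mem = {}
        self.dsp = dsp
        self.rsp = rsp
        self.pc = 0

    def push(self, value):
        self.dsp -= 4                       # assumed: decrement, then write
        self.mem[self.dsp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.dsp]
        self.dsp += 4                       # assumed: pop undoes push
        return value

    def call(self, target):
        self.rsp -= 4
        self.mem[self.rsp] = self.pc + 2    # return address goes to the rsp stack
        self.pc = target

    def ret(self):
        self.pc = self.mem[self.rsp]
        self.rsp += 4

cpu = TwoStackModel()
cpu.pc = 0x100
cpu.call(0x200)
cpu.push(0xDEAD)      # data stack traffic inside the function
cpu.push(0xBEEF)
cpu.pop()
cpu.pop()
cpu.ret()
print(hex(cpu.pc))    # 0x102
```

In this model push and pop only ever touch addresses reached through dsp, so the saved return address is never overwritten by data stack traffic.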

The loadi instruction loads 8 bits into register r0. It is the only instruction for loading immediate data into registers.
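
A sketch of how larger constants can be built one byte at a time with loadi n, byte (n is the byte index, 0 to 3). It assumes that index 0 is the least significant byte, which matches the assembler example where loadi 1, 0x4 after xor r0, r0 gives 0x400, and that the other bytes of r0 are left unchanged. The helper `load_constant` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def loadi(r0: int, n: int, byte: int) -> int:
    """Model of 'loadi n, byte': replace byte n of r0 (assumed semantics)."""
    if not 0 <= n <= 3:
        raise ValueError("byte index must be 0 to 3")
    if not 0 <= byte <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    shift = 8 * n
    cleared = r0 & ~(0xFF << shift) & MASK32   # keep the other three bytes
    return cleared | (byte << shift)

def load_constant(value: int) -> int:
    """Build a 32-bit constant in r0: xor r0, r0 then one loadi per nonzero byte."""
    r0 = 0                                     # xor r0, r0
    for n in range(4):
        byte = (value >> (8 * n)) & 0xFF
        if byte:
            r0 = loadi(r0, n, byte)
    return r0

print(hex(loadi(0, 1, 0x4)))           # 0x400
print(hex(load_constant(0x12345678)))  # 0x12345678
```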

The conditional jump instructions have 10-bit signed immediate values and can jump about ±512 instructions.
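
The reach of a jump follows from the two's complement range of its immediate. The helpers below are a generic sketch, not taken from the C16 sources; they show the exact range for the 10-bit conditional jumps and the 14-bit jmp.

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of a two's complement field."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low 'bits' bits of value as a signed integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

print(signed_range(10))         # (-512, 511)
print(signed_range(14))         # (-8192, 8191)
print(sign_extend(0x3FF, 10))   # -1
```

Since every instruction is 2 bytes, a reach of 512 instructions corresponds to roughly 1 KB in each direction.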

Some instructions can be fused and executed as if they were one instruction. Loading the value 0xff000000 to r0 and r1 can be executed as:

  • write 0xff000000 to r0 and r1, one fused instruction instead of 3.
  xor   r0, r0
  loadi 3, 0xff
  mv    r1, r0

The ISA can be extended to support many features:

  • Variable load/store size (from 8bit access)
  • Variable compute size (from 8 bit)
  • Floating point
  • Vector load/store (multiple registers)
  • Vector compute
  • Inference and matrix operation accelerators
  • Multiple execution modes (machine, hypervisor, supervisor and application modes)
  • Pipelining with branch prediction
  • Superscalar architecture
  • Out of order execution
  • Virtual memory
  • Multi level cache
  • Multi core
  • Virtual machine

The performance will be poor if all these features are implemented, because opcodes will be reused for integers, floats and vectors, and execution needs to be in-order when switching mode. If execution is out-of-order and the program changes from integer to float, the CPU might execute instructions in the wrong mode. To support these features well, the opcodes need to be longer than 16 bits or variable length.

The ISA specification is parsed to generate the assembler, the decoder for the virtual CPU and for the hardware CPU.

In the current implementation, instructions execute in 2 to 5 clock cycles. The design is not pipelined and runs at 50 MHz or 100 MHz.
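
These figures give a rough bound on instruction throughput. The calculation below is back-of-the-envelope arithmetic from the numbers above, not a measurement; the function name is hypothetical.

```python
def mips_range(clock_mhz: float, min_cycles: int = 2, max_cycles: int = 5):
    """Worst-case and best-case million instructions per second."""
    return clock_mhz / max_cycles, clock_mhz / min_cycles

print(mips_range(50))    # (10.0, 25.0)
print(mips_range(100))   # (20.0, 50.0)
```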

This page is about the 32-bit C16 ISA.

Registers

r0 to r31 are the general purpose registers. flags, pc (program counter), rsp (return address stack pointer), dsp (data stack pointer), itb (interrupt table address), clkcnt (clock counter) and icnt (instruction counter) are address mapped, which saves bits in the opcode space.

Address map for the 32-bit C16:

  • 0xFFFFFFF8 flags
  • 0xFFFFFFF0 pc
  • 0xFFFFFFE8 rsp return address sp
  • 0xFFFFFFE0 dsp data address sp
  • 0xFFFFFFD8 itb interrupt table address
  • 0xFFFFFFD0 RESERVED
  • 0xFFFFFFC8 clkcnt clock counter
  • 0xFFFFFFC0 icnt instruction counter
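
An emulator can treat this map as a lookup table consulted on every load and store. The sketch below only encodes the addresses listed above; the function name and the idea of returning None for ordinary RAM are assumptions.

```python
# Address-mapped registers of the 32-bit C16, copied from the list above.
MAPPED_REGISTERS = {
    0xFFFFFFF8: "flags",
    0xFFFFFFF0: "pc",
    0xFFFFFFE8: "rsp",
    0xFFFFFFE0: "dsp",
    0xFFFFFFD8: "itb",
    0xFFFFFFD0: "reserved",
    0xFFFFFFC8: "clkcnt",
    0xFFFFFFC0: "icnt",
}

def mapped_register(address: int):
    """Return the register name at this address, or None for ordinary memory."""
    return MAPPED_REGISTERS.get(address & 0xFFFFFFFF)

print(mapped_register(0xFFFFFFF0))  # pc
print(mapped_register(0x400))       # None
```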

Flags register

The meaning of the bits in the flags register is:

  • 0 RESERVED
  • 1 RESERVED
  • 2 RESERVED
  • 3 RESERVED
  • 4 RESERVED
  • 5 RESERVED
  • 6 RESERVED
  • 7 RESERVED
  • 8 RESERVED
  • 9 RESERVED
  • 10 RESERVED
  • 11 RESERVED
  • 12 RESERVED
  • 13 RESERVED
  • 14 Clock counter available
  • 15 RESERVED
  • 16 RESERVED
  • 17 RESERVED

State flags:

  • 18 C Carry
  • 19 Z Zero
  • 20 S Sign
  • 21 O Overflow
  • 22 I IntEnable Interrupt enable
  • 23 SF SetFlags Set flags in instructions enabled/disabled
  • 24 RESERVED
  • 25 RESERVED
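
The bit positions above translate directly into masks. The helper names below are hypothetical; only the bit numbers come from the lists.

```python
# Bit positions taken from the lists above.
FLAG_BITS = {"C": 18, "Z": 19, "S": 20, "O": 21, "I": 22, "SF": 23}
CLKCNT_AVAILABLE_BIT = 14

def decode_flags(flags: int) -> dict:
    """Return the state flags as booleans."""
    return {name: bool((flags >> bit) & 1) for name, bit in FLAG_BITS.items()}

def set_flag(flags: int, name: str, on: bool) -> int:
    """Return flags with one state flag set or cleared."""
    mask = 1 << FLAG_BITS[name]
    return (flags | mask) if on else (flags & ~mask & 0xFFFFFFFF)

value = set_flag(set_flag(0, "C", True), "Z", True)
print(hex(value))                 # 0xc0000
print(decode_flags(value)["Z"])   # True
```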

There are conditional jump instructions to check the carry, zero, overflow and sign flags.

ja (jump if above), jae (jump if above or equal, same instruction as jnc), jb (jump if below, same instruction as jc), jbe (jump if below or equal) are for unsigned integer comparisons.

jg (jump if greater), jge (jump if greater or equal), jl (jump if less), jle (jump if less or equal) are for signed integer comparisons.
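
The sketch below shows how these jumps could be evaluated from the C, Z, S and O flags after a cmp. The text only states that jae is jnc and jb is jc; the remaining conditions, and the flag computation for cmp as a 32-bit subtraction with C acting as borrow, follow the usual x86 conventions and are assumptions here.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(rd: int, rs: int) -> dict:
    """Flags after 'cmp rd, rs', modelled as rd - rs (assumed semantics)."""
    rd &= MASK32
    rs &= MASK32
    result = (rd - rs) & MASK32
    return {
        "C": rd < rs,                                     # borrow
        "Z": result == 0,
        "S": bool(result >> 31),
        "O": bool(((rd ^ rs) & (rd ^ result)) >> 31),     # signed overflow
    }

# Jump conditions; unsigned first, then signed.
CONDITIONS = {
    "ja":  lambda f: not f["C"] and not f["Z"],
    "jae": lambda f: not f["C"],
    "jb":  lambda f: f["C"],
    "jbe": lambda f: f["C"] or f["Z"],
    "jg":  lambda f: not f["Z"] and f["S"] == f["O"],
    "jge": lambda f: f["S"] == f["O"],
    "jl":  lambda f: f["S"] != f["O"],
    "jle": lambda f: f["Z"] or f["S"] != f["O"],
}

def taken(mnemonic: str, rd: int, rs: int) -> bool:
    """Would the jump be taken after 'cmp rd, rs'?"""
    return CONDITIONS[mnemonic](cmp_flags(rd, rs))

# 0xFFFFFFFF is large when unsigned but -1 when signed.
print(taken("ja", 0xFFFFFFFF, 1))  # True
print(taken("jl", 0xFFFFFFFF, 1))  # True
```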

Instructions

Some instructions use fixed registers:

  • loadi loads data to r0
  • mula (multiply and add) uses r1 as index and r2 as base address
  • test, bis, bic, bii and bext use r3 as index

Instruction descriptions

  • jmp is an unconditional jump with a 14-bit immediate interpreted as a signed integer. It can jump about ±8192 instructions.

  • je, jz, jne, jnz, ja, jae, jb, jbe, jc, jnc, jo, jno, js, jns, jg, jge, jl, jle are conditional jumps with a 10-bit immediate interpreted as a signed integer. They can jump about ±512 instructions. e is equal and z is zero (both test the Z flag), n is not, a is above, b is below, c is carry, o is overflow, s is sign.

  • pushpc stores pc+2 to the stack, pushpc followed by jmp is a function call (short range).

  • test (test rd, rs) performs a bitwise AND operation on rd and rs shifted by r3.

  • bis (bis rd, rs - bit set) performs a bitwise OR operation on rd and rs shifted by r3.

  • bic (bic rd, rs - bit clear) clears bits in rd according to the pattern in rs shifted by r3.

  • bii (bii rd, rs - bit invert) r3[5:0] is the shift index, r3[13:8] is bitfield length, rs is the bit pattern. bii inverts bits in rd according to the pattern in rs shifted by r3[5:0] with bitfield length of r3[13:8].

  • bext (bext rd, rs - bitfield extract) extracts the bits between msb r3[13:8] and lsb r3[5:0] to rd.

  • loadi (loadi n, byte - load immediate) loads a byte to r0 at byte index n (0 to 3).

  • xchg (xchg rd, rs - exchange) swaps rd with rs.

  • add, sub, mul, and, or, xor (add rd, rs) are addition, subtraction, multiplication, bitwise and, bitwise or, bitwise exclusive or. They affect the C, Z, S, O flags.

  • nop is no operation.

  • load (load rd, rs) reads the value at address rs to rd.

  • store (store rd, rs) writes rs to address rd.

  • cmp (cmp rd, rs - compare) compares rd to rs and sets the flags.

  • inc (inc rs - increment) adds 1 to rs. The C, Z, S, O flags are affected.

  • dec (dec rs - decrement) subtracts 1 from rs. The C, Z, S, O flags are affected.

  • br (br rs - branch) sets pc to rs.

  • call (call rs - function call) stores pc+2 to the return address stack, sets pc to rs and updates rsp to rsp-4.

  • push (push rs) writes rs to the data stack and updates dsp to dsp-4.

  • pop (pop rs) reads the data stack, stores the value in rs and updates dsp to dsp+4.

  • ldsp, lrsp (ldsp rs - load dsp/rsp) stores dsp/rsp to rs.

  • sdsp (sdsp rs - store dsp) stores rs to dsp.

  • mula (mula rd, v - multiply and add) shifts r1 left by v, adds r2 and writes the result to rd (rd = (r1 << v) + r2). v is an immediate value between 0 and 15. mula computes the address of an array element from an index and the address of the first element of the array.

  • shl, shr (shl rd, rs) shifts left/right rd by rs.

  • asr (asr rd, rs) is arithmetic shift right and fills the higher bits with the original MSB.

  • shli and shri (shli rd, 1 - shift left or right) have a 3 bit immediate value and shift from 1 to 8 bits.

  • asri (asri rd, 2) is arithmetic shift right with a 3 bit immediate value and fills the higher bits with the original MSB.

  • cli, sti mask/unmask interrupts (software interrupts are not affected).

  • clf clears the C, Z, O, S flags.

  • int (int n) is a software interrupt. It stores pc+2 to the return address stack, reads the address for interrupt n from the itb interrupt table, sets pc to it and updates rsp to rsp-4.

  • pushf (push flags) writes flags to the data stack and updates dsp to dsp-4.

  • popf (pop flags) reads the data stack, stores the value in flags and updates dsp to dsp+4.

  • clsf, stsf (clear/set the SetFlags bit) disable/enable changing the C, Z, O, S flags in instructions that affect flags.

  • ret is function return. It reads the value at address rsp to pc and updates rsp to rsp+4.

  • iret is interrupt return. It reads the value at address rsp to pc and updates rsp to rsp+4.

  • not (not rs) inverts bits in rs.

  • div and sdiv are unsigned and signed divisions.

  • div10 divides by 10.

  • lzcnt counts the leading zeros in a register.

  • neg is arithmetic negation: rn = -rn.
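
Two of the less common instructions above, mula and bext, can be modelled in a few lines. This is a sketch under assumptions: results wrap to 32 bits, bext reads its source from rs, the msb and lsb positions are inclusive, and the extracted field is right-aligned in rd.

```python
MASK32 = 0xFFFFFFFF

def mula(r1: int, r2: int, v: int) -> int:
    """rd = (r1 << v) + r2, with r1 the index and r2 the base address."""
    if not 0 <= v <= 15:
        raise ValueError("v must be between 0 and 15")
    return ((r1 << v) + r2) & MASK32

def bext(rs: int, r3: int) -> int:
    """Extract bits lsb..msb of rs, where lsb = r3[5:0] and msb = r3[13:8]."""
    lsb = r3 & 0x3F
    msb = (r3 >> 8) & 0x3F
    if msb < lsb:
        raise ValueError("msb must not be below lsb")
    width = msb - lsb + 1
    return (rs >> lsb) & ((1 << width) - 1)

# Address of element 3 in an array of 4-byte words starting at 0x1000.
print(hex(mula(3, 0x1000, 2)))              # 0x100c
# Bits 15..8 of 0xABCD1234.
print(hex(bext(0xABCD1234, (15 << 8) | 8))) # 0x12
```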

Interrupts

The interrupt table stores the addresses of the interrupt handlers. It has 32 entries and is located at address itb. Interrupt 8 is the hardware interrupt, and interrupts 16 to 31 are free for software to use. Interrupts 0 to 15 are usable, but using them is not future proof because future versions of the CPU will assign them (syscall, divide error, exceptions, timer, ...).

  • 0 RESERVED
  • 1 RESERVED
  • 2 RESERVED
  • 3 RESERVED
  • 4 RESERVED
  • 5 RESERVED
  • 6 RESERVED
  • 7 RESERVED
  • 8 hw interrupt
  • 9 RESERVED
  • 10 RESERVED
  • 11 RESERVED
  • 12 RESERVED
  • 13 RESERVED
  • 14 RESERVED
  • 15 RESERVED
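
The dispatch described for int n can be sketched as a table lookup. The 4-byte entry size (one 32-bit address per entry) and the function names are assumptions; the document only states that the table has 32 entries and starts at itb.

```python
MASK32 = 0xFFFFFFFF
ENTRY_BYTES = 4    # assumption: one 32-bit handler address per entry
ENTRY_COUNT = 32   # from the text above

def handler_slot(itb: int, n: int) -> int:
    """Address of the table entry for interrupt n."""
    if not 0 <= n < ENTRY_COUNT:
        raise ValueError("interrupt number must be 0 to 31")
    return (itb + n * ENTRY_BYTES) & MASK32

def software_interrupt(state: dict, mem: dict, n: int) -> None:
    """Model of 'int n': save pc+2 on the return stack, jump to the handler."""
    state["rsp"] -= 4
    mem[state["rsp"]] = state["pc"] + 2
    state["pc"] = mem[handler_slot(state["itb"], n)]

mem = {handler_slot(0x1000, 16): 0x2000}   # handler for interrupt 16
state = {"pc": 0x100, "rsp": 0x4000, "itb": 0x1000}
software_interrupt(state, mem, 16)
print(hex(state["pc"]))                    # 0x2000
```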

Assembler

The assembler is called casm and takes one argument, the file to assemble.

The code is saved in c16.out (the emulator executes c16.out).

casm is a 2-pass assembler. In the first pass it places the code, data and labels; in the second pass it writes the code and the addresses of data and labels.
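
The two passes can be illustrated with a toy model that handles only labels and instructions. It is not casm's implementation: it emits no opcodes, ignores data directives, and assumes that jump operands are encoded as an offset in instructions relative to the jump's own address.

```python
def two_pass(lines):
    """Return (labels, resolved) for a list of source lines."""
    labels = {}
    placed = []
    address = 0
    # Pass 1: place instructions and record label addresses.
    for raw in lines:
        text = raw.split(";", 1)[0].strip()   # drop comments
        if not text:
            continue
        if text.endswith(":"):
            labels[text[:-1]] = address
            continue
        placed.append((address, text))
        address += 2                          # every opcode is one 16-bit word
    # Pass 2: replace label operands with relative offsets (in instructions).
    resolved = []
    for address, text in placed:
        mnemonic, _, operand = text.partition(" ")
        operand = operand.strip()
        if operand in labels:
            operand = (labels[operand] - address) // 2
        resolved.append((address, mnemonic, operand))
    return labels, resolved

SOURCE = """
start:
  inc   r3
  js    alabel
  pushpc
  jmp   print
alabel:
  jmp   start
print:
  ret
""".splitlines()

labels, resolved = two_pass(SOURCE)
print(labels)    # {'start': 0, 'alabel': 8, 'print': 10}
```

The forward reference to alabel is why two passes are needed: its address is not known when js alabel is first read.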

Example source code:

  ; this is a comment
start:
  inc   r3
  js    alabel
  ; r4 = 0x400
  xor   r0,r0
  loadi 1, 0x4
  mv    r4, r0
  ; r0 = string length
  loadi len(string)
  ; short range function call
  pushpc
  jmp   print
alabel:
  jmp   start

; print function
print:
  ret

; string at address 0x400
ds.0x400 string "Hello C16"

casm doesn't support declaring variables. Just choose addresses in RAM and document them with comments in the code.

It would be great to have a C compiler for this ISA.