My system, which I call the nova system, has 2 devices: the 32-bit c16 CPU and Gpu1.

The c16 is connected to Gpu1 by a 32-bit point-to-point bus, and an interrupt line runs from the GPU to the CPU.

The GPU triggers an interrupt (interrupt 8 in the CPU) at the end of each frame, 60 times per second.

           ┏━━━━━━━━━┓     ┏━━━━━━┓    ┏━━━━━━┓
Buttons━━━▶┃ c16 CPU ┃◀━━━▶┃ Gpu1 ┃━━━▶┃ LCD  ┃
           ┃ RAM     ┃ 32b ┃ RAM  ┃ 18b┃      ┃
           ┗━━━━━━━━━┛     ┗━━━━━━┛    ┗━━━━━━┛

Address map from the CPU, starting from higher addresses:

  • 0xFFFFFFF8 CPU flags
  • 0xFFFFFFF0 pc
  • 0xFFFFFFE8 rsp return address stack pointer
  • 0xFFFFFFE0 dsp data stack pointer
  • 0xFFFFFFD8 itb interrupt table address
  • 0xFFFFFFD0 RESERVED
  • 0xFFFFFFC8 clkcnt clock counter
  • 0xFFFFFFC0 icnt instruction counter
  • 0xE8000000 GPU default color, 128 sprites, 16 palettes, sprite pixel data
  • 0xE0060060 division check device in testbench (simulator only)
  • 0xE0060040 simulation controller
  • 0xE0060000 button states
      • bit 11: a
      • bit 10: b
      • bit 9: x
      • bit 8: y
      • bit 7: up
      • bit 6: down
      • bit 5: left
      • bit 4: right
      • bit 3: l
      • bit 2: r
      • bit 1: start
      • bit 0: select
  • 0xE0000010 Print to stdout (emulator only)
  • 0xE0000000 Boot ROM (emulator only)
  • 0x00000000 RAM
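
As an illustration of the button register layout, here is a minimal Python sketch that decodes a raw value read from 0xE0060000. The bit positions come from the list above. The function name `decode_buttons` and the `active_high` parameter are hypothetical, and the polarity (whether a pressed button reads as 1) is an assumption the address map does not settle.

```python
# Bit positions taken from the address map; the names are illustrative.
BUTTON_BITS = {
    "a": 11, "b": 10, "x": 9, "y": 8,
    "up": 7, "down": 6, "left": 5, "right": 4,
    "l": 3, "r": 2, "start": 1, "select": 0,
}


def decode_buttons(word, active_high=True):
    """Return the set of pressed button names for a raw register value.

    active_high is an assumption: the address map does not say whether
    a pressed button reads as 1 or as 0.
    """
    if not active_high:
        word = ~word
    return {name for name, bit in BUTTON_BITS.items() if (word >> bit) & 1}


pressed = decode_buttons(0x802)  # bits 11 and 1 are set
print(sorted(pressed))  # ['a', 'start']
```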

The system runs at 100 MHz on an Artix-7 FPGA. 512 KB of RAM is divided between the CPU and the GPU, and the LCD resolution is 720x480 pixels.
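
These figures give a rough per-frame budget. The sketch below only does the arithmetic with the numbers stated in this text (100 MHz clock, 60 frame interrupts per second, 720x480 pixels). It assumes the clock counter advances once per clock cycle and ignores any blanking interval, so treat the result as an estimate.

```python
CLOCK_HZ = 100_000_000    # 100 MHz system clock (from the text)
FRAMES_PER_SECOND = 60    # rate of the GPU end-of-frame interrupt (from the text)
WIDTH, HEIGHT = 720, 480  # LCD resolution (from the text)

cycles_per_frame = CLOCK_HZ // FRAMES_PER_SECOND
pixels_per_frame = WIDTH * HEIGHT

print(cycles_per_frame)  # 1666666
print(pixels_per_frame)  # 345600
# Clock cycles available per displayed pixel, a little under 5.
print(round(cycles_per_frame / pixels_per_frame, 2))
```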

I created an emulator of this system; it uses SDL to display the images from the GPU.

There are 3 demos, each with sprites on the 4 planes. The first one shows stars scrolling vertically.

In the second demo, the T-Rex from the offline Chrome game is animated on screen.

The third demo shows the word NOVA and a big star scrolling down.

Each sprite has 2 colors and there are 16 palettes, so the GPU can display 32 colors simultaneously.

Programming this system

In the hardware implementation, the CPU boots at address 0x0. In the emulator, it boots at address 0xE0000000 and the boot ROM jumps to address 0x0.

Usually my programs start with a jump instruction, and the handler for the GPU interrupt is at address 0x2:

  jmp programStart

; HW INTERRUPT int 8
  iret
; END HW INTERRUPT

programStart:

Then the stacks are set up:

  ; setup rsp
  ; r0 = 0xffffffe8, the address of the rsp register
  xor   r0, r0
  dec   r0
  loadi 0, 0xe8
  mv    r1, r0
  ; rsp 0x1e00
  xor   r0, r0
  loadi 1, 0x1e
  store r1, r0
  ; setup dsp
  ; dsp 0x1d00
  xor   r0, r0
  loadi 1, 0x1d
  sdsp  r0
  ; end setup rsp and dsp
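
To make the register values in this listing easier to follow, here is a small Python model of how the constants are built. It assumes that `loadi n, v` replaces byte `n` of r0 with `v` and leaves the other bytes unchanged. That reading is inferred from the comments in the listings (for example `loadi 1, 0x1e` producing 0x1e00) and is not taken from an ISA reference. The Python function `loadi` is only a stand-in for the instruction.

```python
MASK32 = 0xFFFFFFFF


def loadi(r0, byte_index, value):
    """Assumed semantics: replace byte `byte_index` of r0 with `value`."""
    shift = 8 * byte_index
    return (r0 & ~(0xFF << shift) & MASK32) | ((value & 0xFF) << shift)


# xor r0, r0 ; dec r0 ; loadi 0, 0xe8  -> address of the rsp register
r0 = (0 - 1) & MASK32
rsp_register = loadi(r0, 0, 0xE8)

# xor r0, r0 ; loadi 1, 0x1e  -> initial return stack pointer
rsp_value = loadi(0, 1, 0x1E)

# xor r0, r0 ; loadi 1, 0x1d  -> initial data stack pointer
dsp_value = loadi(0, 1, 0x1D)

print(hex(rsp_register), hex(rsp_value), hex(dsp_value))
# 0xffffffe8 0x1e00 0x1d00
```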

Then, the interrupt handler is set up in the interrupt table:

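  ; setup address for interrupt table (itb) and hw interrupt (int 8)
  ; setup itb
  ; r1 still holds the rsp address 0xffffffe8; change the low byte to get itb (0xffffffd8)
  mv    r0, r1
  loadi 0, 0xd8
  mv    r1, r0
  ; itb 0x1f00
  xor   r0, r0
  loadi 1, 0x1f
  store r1, r0
  ; setup int 8: table entry at 0x1f00 + 8*4 = 0x1f20
  ; r0 still holds 0x1f00, so only the low byte changes
  loadi 0, 0x20
  mv    r1, r0
  ; int 8 handler is at address 0x2
  xor   r0, r0
  loadi 0, 0x2
  store r1, r0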
  ; end setup hw interrupt

This setup results in the following memory layout:

  • 0x1F00 interrupt table, int 8 address 0x1F20, int 8 handler is at address 0x2
  • 0x1E00 return stack (64 call depth)
  • 0x1D00 data stack (256 bytes)
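
The table slot for an interrupt follows from the base address and the 4-byte entry size used in the `8*4 = 0x20` comment. The sketch below reproduces that calculation. `vector_address` is a hypothetical helper name, and the stack-size check assumes each return-stack slot holds one 4-byte address.

```python
ITB_BASE = 0x1F00  # interrupt table base written to itb (from the text)
ENTRY_SIZE = 4     # bytes per table entry, from the "8*4 = 0x20" comment


def vector_address(interrupt_number, itb=ITB_BASE):
    """Address of the table slot that holds the handler address."""
    return itb + interrupt_number * ENTRY_SIZE


print(hex(vector_address(8)))  # 0x1f20

# Assumption: each return-stack slot holds one 4-byte address,
# so a call depth of 64 needs 256 bytes.
RETURN_STACK_BYTES = 64 * 4
print(RETURN_STACK_BYTES)  # 256
```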

In Verilator simulations and in the emulator, a simulation is stopped by writing any value to the simulation controller (address 0xE0060040):

  ; stop simulation
  xor   r0, r0
  loadi 3, 0xE0
  loadi 2, 0x6
  loadi 0, 0x40
  mv    r5, r0
  xor   r0, r0
  loadi 0, 0x1
  store r5, r0
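
On the emulator side, this kind of stop can be modeled as a check in the memory-write path. The following Python sketch is hypothetical and is not the actual emulator code: the `Bus` class, its `store` method, and the dictionary-backed RAM are illustrative names and choices. Only the address 0xE0060040 and the "any write stops the run" behavior come from the text.

```python
SIM_CONTROL_ADDR = 0xE0060040  # simulation controller, from the address map


class Bus:
    """Hypothetical emulator bus; not the actual emulator's code."""

    def __init__(self):
        self.running = True
        self.ram = {}

    def store(self, address, value):
        if address == SIM_CONTROL_ADDR:
            # Any written value stops the run, as described in the text.
            self.running = False
        else:
            self.ram[address] = value & 0xFFFFFFFF


bus = Bus()
bus.store(0x1F20, 0x2)          # ordinary RAM write
bus.store(SIM_CONTROL_ADDR, 1)  # same effect as the assembly listing
print(bus.running)  # False
```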