My system with the c16 CPU and Gpu1
My system, which I call the nova system, has 2 devices: the 32-bit c16 CPU and Gpu1.
The c16 is connected to Gpu1 with a 32-bit point-to-point bus, and there is an interrupt line from the GPU to the CPU.
The GPU triggers an interrupt (interrupt 8 in the CPU) at the end of each frame, 60 times per second.
           ┏━━━━━━━━━┓     ┏━━━━━━┓    ┏━━━━━━┓
Buttons━━━▶┃ c16 CPU ┃◀━━━▶┃ Gpu1 ┃━━━▶┃ LCD  ┃
           ┃   RAM   ┃ 32b ┃ RAM  ┃ 18b┃      ┃
           ┗━━━━━━━━━┛     ┗━━━━━━┛    ┗━━━━━━┛
Address map from the CPU, starting from higher addresses:
- 0xFFFFFFF8 CPU flags
- 0xFFFFFFF0 pc
- 0xFFFFFFE8 rsp (return stack pointer)
- 0xFFFFFFE0 dsp (data stack pointer)
- 0xFFFFFFD8 itb (interrupt table address)
- 0xFFFFFFD0 RESERVED
- 0xFFFFFFC8 clkcnt (clock counter)
- 0xFFFFFFC0 icnt (instruction counter)
- 0xE8000000 GPU default color, 128 sprites, 16 palettes, sprite pixel data
- 0xE0060060 division check device in testbench (simulator only)
- 0xE0060040 simulation controller
- 0xE0060000 button states
  - bit 11: a
  - bit 10: b
  - bit 9: x
  - bit 8: y
  - bit 7: up
  - bit 6: down
  - bit 5: left
  - bit 4: right
  - bit 3: l
  - bit 2: r
  - bit 1: start
  - bit 0: select
- 0xE0000010 Print to stdout (emulator only)
- 0xE0000000 Boot ROM (emulator only)
- 0x00000000 RAM
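As a sketch of how the button bits listed above could be decoded, for example in a host-side tool or a test, the Python below maps a 12-bit button word to button names. The helper name `pressed_buttons` is hypothetical, and the code assumes that a set bit means "pressed", which the address map does not state.

```python
# Bit positions copied from the button-state entry of the address map.
BUTTON_BITS = {
    "a": 11, "b": 10, "x": 9, "y": 8,
    "up": 7, "down": 6, "left": 5, "right": 4,
    "l": 3, "r": 2, "start": 1, "select": 0,
}


def pressed_buttons(state, active_high=True):
    """Return the names of the pressed buttons, from bit 11 down to bit 0.

    Assumption: active_high=True treats a 1 bit as pressed. Pass
    active_high=False if the hardware reports pressed buttons as 0.
    """
    if not active_high:
        state = ~state
    state &= 0xFFF  # only the low 12 bits carry button information
    return [name for name, bit in BUTTON_BITS.items() if state & (1 << bit)]


if __name__ == "__main__":
    # Example word with bit 11 (a) and bit 0 (select) set.
    print(pressed_buttons(0x801))
```

If the real hardware turns out to be active-low, only the `active_high` argument needs to change.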
The system runs at 100 MHz on an Artix-7 FPGA. 512 KB of RAM is divided between the CPU and the GPU, and the LCD resolution is 720x480 pixels.
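For a rough sense of the time budget between two frame interrupts, the sketch below divides the 100 MHz clock by the 60 Hz frame rate. It assumes the GPU interrupt fires exactly 60 times per second and that the CPU is clocked at the full 100 MHz; the constant names are illustrative only.

```python
CLOCK_HZ = 100_000_000  # system clock: 100 MHz
FRAME_RATE_HZ = 60      # GPU end-of-frame interrupts per second

# Whole clock cycles available between two frame interrupts.
cycles_per_frame = CLOCK_HZ // FRAME_RATE_HZ

# Duration of one frame in milliseconds.
frame_time_ms = 1000 / FRAME_RATE_HZ

if __name__ == "__main__":
    print(cycles_per_frame, round(frame_time_ms, 2))
```

That is about 1.67 million clock cycles, or roughly 16.7 ms, per frame. How many instructions fit in that budget depends on the cycles per instruction, which is not covered here.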
I created an emulator of this system; it uses SDL to display the images from the GPU.
There are 3 demos, all with sprites on the 4 planes. The first one is stars scrolling vertically.

In the second demo, the T-Rex from the offline Chrome game is animated on screen.

The third demo is the word NOVA and a big star scrolling down.

Each sprite has 2 colors and there are 16 palettes, so the GPU can display 32 colors simultaneously.
Programming this system
In the hardware implementation, the CPU boots at address 0x0. In the emulator, it boots at address 0xE0000000, and the boot ROM jumps to address 0x0.
Usually my programs start with a jump instruction, and the handler for the GPU interrupt is at address 0x2.
jmp programStart
; HW INTERRUPT int 8
iret
; END HW INTERRUPT
programStart:
Then the stacks are set up:
; set up rsp
; r1 = 0xffffffe8, the address of the rsp register
xor r0, r0
dec r0
loadi 0, 0xe8
mv r1, r0
; rsp 0x1e00
xor r0, r0
loadi 1, 0x1e
store r1, r0
; set up dsp
; dsp 0x1d00
xor r0, r0
loadi 1, 0x1d
sdsp r0
; end set up rsp and dsp
Then the interrupt handler is registered in the interrupt table:
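; set up address for interrupt table (itb) and hw interrupt (int 8)
; set up itb
; r1 still holds 0xffffffe8; replace the low byte to get the itb address 0xffffffd8
mv r0, r1
loadi 0, 0xd8
mv r1, r0
; itb 0x1f00
xor r0, r0
loadi 1, 0x1f
store r1, r0
; set up int 8: table entry at itb + 8*4 = 0x1f00 + 0x20 = 0x1f20
; r0 still holds 0x1f00, so setting the low byte gives 0x1f20
loadi 0, 0x20
mv r1, r0
; the int 8 handler is at address 0x2
xor r0, r0
loadi 0, 0x2
store r1, r0
; end set up hw interrupt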
With this setup, low memory is laid out as follows:
- 0x1F00 interrupt table; the int 8 entry is at 0x1F20 and points to the handler at address 0x2
- 0x1E00 return stack (64 call depth)
- 0x1D00 data stack (256 bytes)
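The numbers in this layout can be checked with a little arithmetic. The Python sketch below does so, assuming 4-byte interrupt table entries (taken from the "8*4 = 0x20" comment in the assembly) and 4-byte return addresses (an assumption, chosen because it makes 64 calls fill exactly 256 bytes). The function and constant names are hypothetical.

```python
ITB_BASE = 0x1F00           # interrupt table base written to itb
RETURN_STACK_BASE = 0x1E00  # value written to rsp
DATA_STACK_BASE = 0x1D00    # value written to dsp

ENTRY_SIZE = 4              # bytes per interrupt table entry (from "8*4 = 0x20")
RETURN_ADDRESS_SIZE = 4     # assumption: one 32-bit return address per call
RETURN_STACK_DEPTH = 64     # call depth given in the layout


def interrupt_entry_address(number):
    """Address of the table entry holding the handler address for an interrupt."""
    return ITB_BASE + number * ENTRY_SIZE


# Bytes needed by a return stack that is 64 calls deep.
RETURN_STACK_BYTES = RETURN_STACK_DEPTH * RETURN_ADDRESS_SIZE

if __name__ == "__main__":
    print(hex(interrupt_entry_address(8)))  # entry for the GPU interrupt
    print(RETURN_STACK_BYTES)               # size of the return stack region
```

Under these assumptions each region is 0x100 bytes wide: 0x1D00 to 0x1DFF, 0x1E00 to 0x1EFF, and the table from 0x1F00 upward. Which direction each stack grows is not stated here, so the sketch does not model it.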
In Verilator simulations and in the emulator, a run is stopped by writing any value to the simulation controller (address 0xE0060040):
; stop simulation
; r5 = 0xe0060040, the simulation controller address
xor r0, r0
loadi 3, 0xE0
loadi 2, 0x6
loadi 0, 0x40
mv r5, r0
; write any value (here 1) to stop
xor r0, r0
loadi 0, 0x1
store r5, r0
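The listings above build 32-bit constants one byte at a time. The Python sketch below models that pattern under the assumption that `loadi N, V` replaces byte N of r0 (byte 0 being the least significant) with V and leaves the other bytes unchanged. This reading is inferred from the listings and the address map, not from an instruction set reference, and the Python function is only an illustration.

```python
MASK32 = 0xFFFFFFFF


def loadi(r0, byte_index, value):
    """Model of `loadi byte_index, value`: overwrite one byte of r0.

    Assumption: byte 0 is the least significant byte of the 32-bit register.
    """
    shift = 8 * byte_index
    cleared = r0 & ~(0xFF << shift) & MASK32  # zero the target byte
    return cleared | ((value & 0xFF) << shift)


def simulation_controller_address():
    """Replay the 'stop simulation' sequence that computes the address in r5."""
    r0 = 0                    # xor r0, r0
    r0 = loadi(r0, 3, 0xE0)   # loadi 3, 0xE0
    r0 = loadi(r0, 2, 0x06)   # loadi 2, 0x6
    r0 = loadi(r0, 0, 0x40)   # loadi 0, 0x40
    return r0                 # mv r5, r0


if __name__ == "__main__":
    print(hex(simulation_controller_address()))
```

The same model accounts for the stack setup: starting from 0xFFFFFFFF (`xor` followed by `dec`), `loadi 0, 0xe8` yields 0xFFFFFFE8, the address of rsp.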