Using Verilator and other EDA tools
The EDA tools I'm using are verilator, gtkwave and vivado:
verilator is a verilog hardware simulator
gtkwave is a signal viewer for the signals generated by verilator
vivado is the Xilinx/AMD tool to compile verilog code and configure FPGAs
verilator and gtkwave are open source and vivado is free for small FPGAs.
These tools are not well integrated with each other, but they are free and good enough for small designs.
Related: System with the c16 CPU and Gpu1, C16 CPU, Gpu1
Install
gtkwave is available in apt:
apt-get install gtkwave
verilator is also available in apt but it is better to compile it from source and use the latest version to avoid issues.
apt-get install verilator
I compile verilator with these commands:
git clone https://github.com/verilator/verilator
sudo apt install autoconf flex help2man
cd verilator/
autoconf
./configure
make -j `nproc`
cd bin
./verilator --version
cd -
sudo make install
vivado needs a specific version of a linux distribution, so I created a virtual machine with Ubuntu 24.04 LTS.
I installed vivado in my home directory. I don't update the system, because after each update there is a risk that vivado is no longer compatible. Here is the compatibility table:
OS | Versions | 2025.1 | 2025.2 | 2026.1
Ubuntu Linux 24 | 24.04 LTS | Yes | Yes | Yes
Ubuntu Linux 24 | 24.04.1 LTS | Yes | Yes | Yes
Ubuntu Linux 24 | 24.04.2 LTS | No | Yes | Yes
Ubuntu Linux 24 | 24.04.3 LTS | No | No | Yes
I gave the virtual machine 32GB of RAM, which is enough for small designs: compiling my design with vivado takes less than 16GB of RAM.
If the design is too big for the target FPGA, vivado doesn't stop. It keeps trying to fit the design, and it was using more than 50GB of RAM after running for 3 days.
vivado compiles small designs in less than 10 minutes using 8 cores in some steps.
Compiling verilog code
I usually compile the design with both verilator and vivado because they don't report the same errors.
When vivado generates a bitstream, the design is ok.
I simulate the verilog code with verilator by compiling it to an executable and then running the executable:
verilator --binary -j 0 -Wall -Wno-BLKSEQ --timing --trace-fst tb.v
./obj_dir/Vtb
# top is tb module
I prefer having a testbench written in verilog, with the design under test instantiated inside the testbench.
I use --trace-fst to save the signals in fst format instead of vcd. With fst, the traces are
smaller because fst is binary and compressed whereas vcd is text and not compressed.
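As an illustration, here is a minimal sketch of what such a tb.v could look like. The counter is only a stand-in for a real design under test, and the trace file name tb.fst, the clock period and the run length are assumptions for the example, not taken from my actual testbench. It relies on $dumpfile and $dumpvars to enable the trace, which verilator writes as fst when --trace-fst is given.

```verilog
`timescale 1ns/1ps

// Hypothetical testbench: the module name matches the "tb" top used above.
module tb;
    reg       clk = 1'b0;
    reg       rst = 1'b1;
    reg [7:0] count;

    // Free-running clock with a 10 ns period (needs --timing).
    always #5 clk = ~clk;

    // Stand-in for the design under test: an 8-bit counter.
    always @(posedge clk) begin
        if (rst)
            count <= 8'd0;
        else
            count <= count + 8'd1;
    end

    initial begin
        // Select the trace file and dump every signal below tb.
        $dumpfile("tb.fst");
        $dumpvars(0, tb);

        #20  rst = 1'b0;   // release reset between two rising edges
        #200 $finish;      // stop the simulation
    end
endmodule
```

After running ./obj_dir/Vtb, the trace can be opened with gtkwave tb.fst.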
I compile the design with vivado using batch mode:
source Xilinx/2025.1/Vivado/settings64.sh
vivado -mode batch -script flow.tcl
flow.tcl looks like this:
set outputDir ./out
file mkdir $outputDir
set_param general.maxThreads 8
read_verilog systop.v
read_xdc io.xdc
synth_design -top systop -part xc7a100tcsg324-2
write_checkpoint -force $outputDir/post_synth.dcp
report_utilization -file $outputDir/synth_report.txt
opt_design
place_design
write_checkpoint -force $outputDir/post_place.dcp
route_design
write_checkpoint -force $outputDir/post_route.dcp
write_bitstream -force $outputDir/stream.bit
quit
The compilation result is ./out/stream.bit which should be uploaded to the FPGA.
io.xdc looks like:
set_property BITSTREAM.GENERAL.COMPRESS True [current_design]
set_property -dict { PACKAGE_PIN E3 IOSTANDARD LVCMOS33 } [get_ports { clk }];
#100mhz
create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports { clk }];
set_property -dict { PACKAGE_PIN K1 IOSTANDARD LVCMOS33 } [get_ports { rst }];
Use latest verilator from git repo
When I was using verilator 5.006 from apt, I encountered an error initializing an array with this code:
reg [15:0] mem[0:127];
always @ (posedge clk) begin
    if (rst) begin
        integer i;
        for (i = 0 ; i < 128 ; i = i+1) begin
            mem[i] <= 'hD800;
        end
    end
end
%Error-BLKLOOPINIT: fetchmem.v:25:16: Unsupported: Delayed assignment to array inside for loops (non-delayed is ok - see docs)
25 | mem[i] <= 'hD800;
| ^~
fetchmem_tb.v:2:1: ... note: In file included from 'fetchmem_tb.v'
... For error description see https://verilator.org/warn/BLKLOOPINIT?v=5.020
%Error: Exiting due to 1 error(s)
... See the manual at https://verilator.org/verilator_doc.html for more assistance.
This error is fixed in verilator 5.032 2025-01-01.
Writing verilog code
With -Wall, verilator warns about unused signals, so they have to be used somehow:
Signal is not used: 'rdata1'
: ... note: In instance 'tb'
11 | reg [63:0] rdata1;
| ^~~~~~
... For warning description see https://verilator.org/warn/UNUSEDSIGNAL?v=5.020
... Use "/* verilator lint_off UNUSEDSIGNAL */" and lint_on around source to disable this message.
At the end of the tb file, add:
wire _unused_ok = &{1'b0,
                    rdata1,
                    1'b0};
The signals connected to module outputs have to be of type wire, vivado requires it.
regs have to be declared outside always blocks, vivado requires it.
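A minimal sketch of both rules, assuming the first one refers to the signal that the parent module connects to an instance output. The modules inverter and parent and all signal names are hypothetical and only serve the illustration.

```verilog
// Hypothetical child module.
module inverter(
    input  wire a,
    output wire y
);
    assign y = ~a;
endmodule

// Hypothetical parent module.
module parent(
    input  wire clk,
    input  wire a,
    output reg  q
);
    // Driven by an instance output, so it is declared as a wire.
    wire y;

    // Declared at module level, not inside the always block.
    reg y_d;

    inverter inv0 (.a(a), .y(y));

    always @(posedge clk) begin
        y_d <= y;
        q   <= y_d;
    end
endmodule
```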
I use the -Wno-BLKSEQ option for verilator because I'm ok with this:
https://verilator.org/guide/latest/warnings.html
BLKSEQ
This indicates that a blocking assignment (=) is used in a sequential block. Generally, non-blocking/delayed assignments (<=) are used in sequential blocks, to avoid the possibility of simulator races. It can be reasonable to do this if the generated signal is used ONLY later in the same block; however, this style is generally discouraged as it is error prone.
Big arrays take a lot of resources in the FPGA. They should be accessed through a RAM interface so that they are mapped to block RAM.
vivado doesn't support $fopen and $fread to load data, use $readmemh("data.mem", mem); instead.
data.mem is a text file with the data in hexadecimal, one memory address per line.
I convert binary files to a mem text file with hexdump, one 16-bit word per line (hexdump reads each word in the host byte order):
hexdump -v -e '1/2 "%04x\n"' c16.out > c16.mem
blog post about converting binary files
vivado maps modules like this to block ram:
module blockram #(parameter WIDTH = 32, parameter ADDR_SIZE = 10/*bits*/)(
    input clk,
    input [ADDR_SIZE-1:0] addr,
    input [WIDTH-1:0] wdata,
    input wen,
    output reg [WIDTH-1:0] rdata
);
    reg [WIDTH-1:0] mem[0:(1<<ADDR_SIZE)-1];

    // Initial memory content, one hexadecimal value per line
    initial begin
        $readmemh("data.mem", mem);
    end

    // When the read is asynchronous like these:
    //   assign rdata = mem[addr];
    //   always @*
    //       rdata = mem[addr];
    //
    // Vivado doesn't use block ram to model the module
    // It uses LUTs

    // Synchronous write and synchronous read
    always @ (posedge clk) begin
        if (wen) begin
            mem[addr] <= wdata;
            rdata <= wdata;  // the written data is also returned on rdata
        end
        else begin
            rdata <= mem[addr];
        end
    end
endmodule
The read data is available 1 clock cycle after the address is presented.
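The one-cycle latency can be observed with a small testbench like the sketch below. The name blockram_tb, the addresses and the data values are made up for the example, and it assumes a data.mem file exists next to the simulation (otherwise $readmemh is expected to complain when the simulation starts).

```verilog
`timescale 1ns/1ps

// Hypothetical testbench for the blockram module above.
module blockram_tb;
    reg         clk   = 1'b0;
    reg  [9:0]  addr  = 10'd0;
    reg  [31:0] wdata = 32'd0;
    reg         wen   = 1'b0;
    wire [31:0] rdata;

    blockram #(.WIDTH(32), .ADDR_SIZE(10)) dut (
        .clk(clk), .addr(addr), .wdata(wdata), .wen(wen), .rdata(rdata)
    );

    always #5 clk = ~clk;

    initial begin
        // Inputs change on the falling edge, the RAM samples them
        // on the rising edge.
        @(negedge clk);
        addr = 10'd3; wdata = 32'hCAFE0001; wen = 1'b1;  // write address 3

        @(negedge clk);
        addr = 10'd4; wdata = 32'hCAFE0002;              // write address 4

        @(negedge clk);
        wen = 1'b0; addr = 10'd3;                        // present read address 3
        // No rising edge has seen the new address yet: rdata still
        // holds the value from the previous cycle (the write to 4).
        $display("address just presented: rdata=%h", rdata);

        @(negedge clk);
        // One rising edge later, rdata holds the content of address 3.
        $display("one cycle later:        rdata=%h", rdata);

        $finish;
    end
endmodule
```

The first line should show the data of the last write and the second line the value stored at address 3, which is the 1 clock cycle delay between address and data.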