开源仿真工具Icarus Verilog中的verilog parser

前面提到用flex和bison开处理命令行参数，回顾一下：开源仿真工具Icarus Verilog中的命令行参数处理方法。

那么Verilog的parser又是怎么实现的呢？简单地说，与做命令行参数的parser方法相同，只是规则复杂很多。

Verilog源码解析

gpref定义关键字

我们可以看到，在Icarus Verilog的根目录有以下几个文件：

lexor_keyword.gpref
lexor_keyword.h
lexor.lex
parse_api.h
parse_misc.cc
parse_misc.h
parse.y

lex我们知道是flex的输入文件，那grepf是什么呢？我们先来看一下Makefile：

main.o: main.cc version_tag.h

lexor.o: lexor.cc parse.h

parse.o: parse.cc

parse.cc: $(srcdir)/parse.y
    $(YACC) --verbose -t -p VL -d -o $@ $<
parse.h: parse.cc
    mv parse.cc.h $@ 2>/dev/null || mv parse.hh $@

lexor.cc: $(srcdir)/lexor.lex
    $(LEX) -s -t $< > $@

lexor_keyword.o: lexor_keyword.cc parse.h

lexor_keyword.cc: $(srcdir)/lexor_keyword.gperf
    gperf -o -i 7 -C -k 1-4,6,9,$$ -H keyword_hash -N check_identifier -t $(srcdir)/lexor_keyword.gperf > lexor_keyword.cc || (rm -f lexor_keyword.cc ; false)

从makefile里，我们可以看出lexor_keyword.gpref是gpref的输入文件，目标是产生了lexor_keyword.cc。

那gpref是什么？我们来学习一下。通过搜索，很容易找到了官网https://www.gnu.org/software/gpref。

GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.
GNU gperf is highly customizable. There are options for generating C or C++ code, for emitting switch statements or nested ifs instead of a hash table, and for tuning the algorithm employed by gperf.

原来gpref是一个哈希功能生成器。给定一个字符串列表，产生C或C++格式的哈希表，供调用时查询。所以lexor_keyword.gpref就是这个字符串列表。下图是gpref输入文件截图，可以看到里面定义了verilog关键字、版本、类型的表。

用flex和bison来分析Verilog语法

再来看看lexor.lex和parse.y，前者是词法分析，后者是语法分析。

在词法分析lexor.lex里，解析了Verilog注释（包括注释里的synthesis translate_on/off）、操作符、运算符、各种进制的数字、timescale、UDP、always、initial、敏感列表、Verilog系统函数等，当然还有刚才gpref里定义的关键字。

在语法分析parse.y里，完成了高级的语法解析。类似把单词（关键字）组成段落（always块），段落组成文章（module）。那如何组成文章，这就是开源仿真工具Icarus Verilog的工作原理里面提到的pform。来看看pform的定义：

# include  "netlist.h"
# include  "HName.h"
# include  "named.h"
# include  "Module.h"
# include  "Statement.h"
# include  "AStatement.h"
# include  "PGate.h"
# include  "PExpr.h"
# include  "PTask.h"
# include  "PUdp.h"
# include  "PWire.h"
# include  "verinum.h"
# include  "discipline.h"
# include  <iostream>
# include  <string>
# include  <list>
# include  <memory>
# include  <cstdio>

/*
 * These classes implement the parsed form (P-form for short) of the
 * original Verilog source. the parser generates the pform for the
 * convenience of later processing steps.
 */

class PGate;
class PExpr;
class PPackage;
class PSpecPath;
class PClass;
class PPackage;
struct vlltype;

看注释可以知道pform（parsed form的简称）就是经过解析后的verilog文件。具体的来说，就是用一堆的C++语言的class来封装verilog。这些class包括基本门电路、表达式、任务、函数、UDP、wire、verilog class、generate、package、modport、scope等。这些class在源码里以大写P开头的头文件中定义。

要组成pform，顶层的module的class不能少，定义在Module.h中。

总结

经过gpref、flex、bison处理后，parse.cc的源文件就有了。实际上compile的过程主要就是调用parser的过程。

另外，从源码中我们可以看到，目前Icarus Verilog已经开始支持部分SystemVerilog的语法，特别是在gpref中已经定义了IEEE Std 1800-2005的所有关键字，包括class、interface、coverage。在实现时pform中，已经看到了对SystemVerilog class和interface的建模，但coverage还没有。