A Simple C To Verilog Compilation Procedure For Hardware .

3m ago
288.14 KB
8 Pages

A Simple C to Verilog Compilation Procedure forHardware/Software VerificationJiang Long and Robert BraytonBerkeley Verification and Synthesis Research Center (BVSRC)Department of EECSUniversity of California at Berkeley{jlong, brayton}@eecs.berkeley.eduABSTRACTThe objective of this work is two-fold: (1) to build a simple trusted translator from C programs to a hardware description language (in this case Verilog) and (2) to illustrateits application to the formal verification of hardware andsoftware systems using highly developed hardware modelchecking methods. To achieve the first goal, we used theLLVM compiler infrastructure to compile the C programinto LLVM bytecode, and used a simple and straightforwardmethod to translate this into a Verilog circuit. The currentimplementation is able to handle arbitrary loops and staticarray access. In the experimental section, equivalence checking was done comparing OpenCores VHDL hardware modelsof FPUs against C-coded SoftFloat versions. This revealedseveral bugs in the OpenCores versions. For model checkingsafety properties, we experimented with bit-vector benchmarks from the 2015 Software Verification Competition andchecked SVA assertions derived from VERIFIER error callsin the C program. This also revealed discrepancies.be trusted. We use the LLVM compiler infrastructure and astraight-forward translation of its produced LLVM bytecodeto create a Verilog program from a single thread C program.At this point, an equivalence-checking Verilog (miter) modelcan be created using the actual hardware model. Figure 1shows the high-level C-to-Verilog translation flow and howit fits into a C-to-RTL equivalence checking methodologyusing hardware verification. Such a flow can also be usedin the software verification of C programs where assertionsare translated into SVA and compiled into logic using, forexample Verific[5].Verilog ModelC ModelLLVM BytecodeTrace replayand debugVerilogModel¿C1.INTRODUCTIONC RTL ?There has been an increasing interest in reasoning about andproving properties of C software programs. In the hardwaredomain, there is an increasing interest in using C as a hardware specification language. While the actual hardware istypically described in Verilog and synthesized from that, theC model represents the golden model which is used to compare behaviors. In both cases, a highly trusted translationfrom C to Verilog is desired.In this work, we focus on creating a translation tool thatis as simple as possible using trusted and well establishedintermediate tools to accomplish the task. We eschew optimization which can add complications and therefore increased potential for introducing bugs. This allows existinghardware formal verification flows such as equivalence andproperty checking of safety and liveness properties to be applied readily, with high confidence that the translation process has not introduced bugs and therefore the results canSequential equivalencecheckingFigure 1: C vs RTL equivalence checking1.1MotivationA language front-end to compile C programs into finitestate-machines can be a complex and delicate matter. Therehas always been concerns that the compilation proceduremay incorrectly reproduce the original C program semantics. This is a real concern as illustrated in the C programin Figure 2. It is one of the test cases in the Bit-Vectorcategory of the 2015 Software Verification Competition [8].In this category, only integer arithmetic and control flowstructures in the C language are allowed.This test is officially classified as “unreach”, i.e. the assertion is considered proven true by the competition committee. It is one of the initial regression tests imposed on anyentry in the competition. However, the assertion fails whenn 65536, in which case, n (n 1) overflows while theaddition inside the loop does not. Four, [20][25][28][27], out99

extern unsigned int VERIFIER nondet uint();int main() {unsigned int i, n VERIFIER nondet uint(), sn 0;for(i 0; i n; i ) {sn sn i;}VERIFIER assert(sn (n*(n 1))/2 sn 0);}Figure 2: sum02 ker[10]Beagle 1.1[27]Cascade 2.0[28]Seahorn[16]Ultimate Automizer [18]Ultimate ggyBuggyBuggyBuggyBuggy#Errors11111 1 1 1Table 1: 2015 Software Verification Competition:Bit-Vector categoryof eight of the competition participants produced the incorrect true result (Participants ([20][25] were recognized as thetop two winners in this category). The other four participants, [16] [15][18][10], had one or more incorrect results onother test cases. Thus none of participants was one hundredpercent correct.It is not known where the bugs got introduced into the various tool flows, but for any such tool, the very first step isto compile the C program into an internal representation ofthe C execution semantics. To many, this is a complex andtime-consuming, yet preliminary step.We believe simplicity results in fewer bugs. In this work, wefocus on a procedure to translate C to a circuit model thatis simple and less error-prone by construction. Our simple/trusted C-to-Verilog compilation should take the translation procedure out of the verification equation, and reliably bridge the gap between software and hardware. This allows existing hardware synthesis and verification techniquesto be applied reliably to the software domain.1.2Why VerilogWe chose Verilog as the intermediate representation for thecircuit model for the following reasons: It is almost the de facto standard for hardware description from which hardware is synthesized. Verilog’s always ff and always comb blocks can be usedto express a finite state machine model as sequentialand combination logic respectively. Verilog is defined through IEEE standards. The welldefined syntax and semantics make it easier to mapthe C semantics into Verilog constructs. Synthesis and verification tool flows for Verilog can beutilized readily after the translation.100 Validation of the translation procedure can be conducted by simulating the Verilog model and comparing against the C-code program using random inputvectors. In our case, we use Verilator[26] to compilethe generated Verilog code back to C . Verilog has enough constructs to capture the originalC constructs, which is important to retain the controlflow and word-level operators after the translation.In Verilog, computation resources must be statically resolvable at synthesis time. Thus, Verilog does not support nonconstant loop controls and infinite loops at the languagelevel. In contrast, C programs support more flexible control flows with non-constant loop control variables and dynamic resource management such as memory allocation/deallocation, run-time function call-stacks and run-time parallel threads. In this work, we limit the scope of the controlflow structure of C to single-thread C programs with arbitrary control flows but statically resolvable resources.1.3Contributions1. We document a simple procedure, based on LLVM, bywhich a subset of C (dynamic memory management,recursive functions and parallelism are excluded), canbe mapped into a Verilog module using the always ffand always comb blocks.2. We provide experiments for C-to-RTL equivalence checking and software property checking, showing that sucha flow allows existing hardware verification tool chainsto be used, producing promising results.1.4OrganizationIn Section 2, we review the circuit computation model, thelanguage characteristics of C, the LLVM[21] Bytecode, andVerilog. In Section 3, the overall tool flow and translationprocedure is described. Experimental results are presentedin Section 4 followed by discussions of related work in Section 5 and conclusions in Section 6.2. BACKGROUND2.1 Logic network and the CircuitA logic network N is a directed graph (V, E) with node setV and edge set E. Each node is labeled with a node typeand a width. A node type is one of {input, output, operator, flip-flop}. An input node has zero incoming edges; anoutput has zero outgoing edges, and a flip-flop has a singleincoming edge. An operator node is an arithmetic functionof its immediate fanin nodes.outputinputCombologicFFclkFigure 3: A single-clock synchronous circuit

Such a logic network can express the functionality of a synchronous sequential circuit shown in Figure 3, which contains inputs, outputs, combinational logic, flip-flops and asingle clock input.It has the following characteristics: The logic network topology is statically determinedand does not change during evaluation of the network. The circuit evaluates at every clock cycle and infinitelyover time.The goal is to map a C program into a logic network suchthat an evaluation of it produces the same input/outputbehavior as the original C program.2.2In Section 3, we demonstrate a method for mapping a Cprogram into a circuit model if the program does not haveany of the dynamic features, namely, dynamic memories,pthreads and recursive functions. Mapping into Verilog thenbecomes almost verbatim.Verilog HDLVerilog is an almost de facto standard used by industryfor specifying hardware designs. Table 2 summarizes itslanguage constructs. The always ff block is used to describe sequential logic where flip-flops are inferred, whilealways comb is for combinational logic. According to Verilog’s event-driven semantics, for the circuit in Figure 3, thealways blocks are evaluated every clock cycle, and the computation is infinite over time.Language elementsvariablescontrol flowsequential blockcombo blockoperatorsVerilog Constructswire /regif/else/casealways ff @(posedge clk)always comb , , , /, &, , etc.Table 2: Verilog language elements2.3resource allocation involved in their use, and thus not a fundamental barrier in translating to a circuit model. On theother hand, run-time function calls dynamically allocate resources for the stack space to hold local variables. Nestednon-recursive function calls can be eliminated at compiletime by function inlining. Dynamic threads also require runtime acquisition of local variable space and operator nodes.Although the behavior of loops can vary at run time, thecomputational resources for variable space can be staticallydetermined during compilation. Therefore if the resourcesrequired for a C program to execute can be statically determined, a corresponding circuit model for it exists.The C Programming LanguageThe C language has two major features. One is the control flow structure that defines the order of computation;the other is resource allocation, which can be dynamic atrun-time. A C program can be viewed as a circuit-like computation model with dynamic resource allocation. Conceptually, resources refer to those computation elements thateither hold the value of a variable, or implement some arithmetic function. Static resources are allocated at compiletime, while dynamic resources can be acquired and releasedduring run-time. Table 3 lists the major C features and theircategory, static or dynamic.Language Elementsvariablecontrol flowmemoryparallelismcall stackpointersC pthreadfunction callspointer caliasing2.4LLVM BytecodeC-to-Verilog is done in two steps, the first using the LLVMcompiler infrastructure to create a LLVM bytecode representation. The second is an almost verbatim translationfrom this to Verilog. The LLVM bytecode has a static single assignment (SSA) [14] form, in which each variable isassigned exactly once and is defined before it is used. InSSA form, use-def chains are explicit and each contains asingle element. A use is a location where a variable value isreferenced; while a def is the location where a variable valueis assigned. A use-def chain is used for data-flow analysis.A basic block is defined as a maximum sequence of instructions that has no branching or return statement, exceptthe last one. Such a SSA program is viewed as a controlflow graph (CFG) with basic-blocks as nodes, while edgesbetween basic-blocks are derived from branching/return instructions.int factorial(int n ) {int ret 1;for ( int i 1;i n;i )ret ret * n;return ret;}Figure 4: C functionFigure 5 is the control flow graph derived from compilingthe C function in Figure 4. The top block labeled entryis the starting point of the function execution. The f orloop in Figure 4 is converted to a loop of three basic blocksin Figure 5, labeled: f or.cond, f or.body, and f or.inc. Allloop constructs, f or, while, do while, are rewritten usingbranching instructions for uniform treatment. Such a LLVMbytecode program has the following characteristics:Table 3: C language elements Every symbol in the bytecode instructions is of anLLVM Value type, referred to as an SSA variable inthe rest of the paper.Pointers are simply aliases for objects in the C program.Although they can be complex to reason about, there is no Each basic-block is uniquely identified by an LLVMLabel, which is also an LLVM Value type.101

entry:br label %for.cond3.1for.cond:%ret.0 phi i32 [ 1, %entry ], [ %mul, %for.inc ]%i.0 phi i32 [ 1, %entry ], [ %inc, %for.inc ]%cmp icmp sle i32 %i.0, %nbr i1 %cmp, label %for.body, label %for.endTfor.body:%mul mul nsw i32 %ret.0, %nbr label %for.incFfor.end:ret i32 %ret.0for.inc:%inc add nsw i32 %i.0, 1br label %for.condLoop Free BytecodeFor loop free Bytecode, the CFG is acyclic. An evaluation ofthe function annotates the CFG graph with values at eachnode. This is the same as evaluating each node in topologicalorder from the entry block. Such a CFG can be viewed asa logic network, which maps directly into a Verilog moduleconsisting of a single always comb block.Figure 6 shows the skeleton of such a Verilog module, whichimplements the function y f (x). For each SSA variable, aVerilog bit-vector is declared, and for each LLVM Label, asingle bit is declared. For example, the 32-bit integer %re