# Timing-Abstract Circuit Design in Transaction-Level Verilog

Steve Hoover
Founder, Redwood EDA
steve.hoover@redwoodeda.com

## Agenda

- Motivation
- The complexity crisis
- How we manage complexity today
- What's not working
- Timing-Abstract Design in TL-Verilog
- Results

# Verilog

### Verilog was born of a different era...

| Year | Processor | Clock | Transistors | HDL | IDE |
|------|-----------|-------|-------------|-----|-----|
| 1985 | i386 | 33 MHz | 275K | Verilog<br>(to verify) | Emacs/<br>vi |
| 2017 | AMD Epyc | 3.0 GHz<br>(~100x) | 19.2B<br>(~70,000x) | Verilog | XEmacs/<br>Vim |

We can't continue designing this way!

## SoC Methodology

Manage complexity through modularity and reuse of IP building blocks.

- IP is used in different contexts, with different constraints for:
  - area
  - power
  - performance
  - test/debug infrastructure
  - clock frequency
- RTL expresses *an* implementation, with particular constraints

RTL is <u>not</u> good for IP!

## High-Level Synthesis

Design at the algorithm level and let tools generate RTL under given physical constraints.

- **Fantastic** for <u>some</u> designs.
- For others, SystemC becomes RTL.

Many designs require RTL details!

## The Need

We need to be able to model cycle-level interactions in a way that is easier to manage.

## A Simple Pipeline

- Let's compute Pythagoras's Theorem in hardware.
- We distribute the calculation over three cycles.

## A Simple Pipeline - Timing-Abstract

#### RTL:

#### Timing-abstract:

→ Flip-flops and staged signals are implied from context.

## A Simple Pipeline - TL-Verilog

#### TL-Verilog

```
|calc
   @1
      $aa_sq[31:0] = $aa * $aa;
      $bb_sq[31:0] = $bb * $bb;
   @2
      $cc_sq[31:0] = $aa_sq + $bb_sq;
   @3
      $cc[31:0] = sqrt($cc_sq);
```

## SystemVerilog vs. TL-Verilog

#### SystemVerilog

```
// Calc Pipeline
logic [31:0] a_C1;
logic [31:0] b_C1;
logic [31:0] a_sq_C1, a_sq_C2;
logic [31:0] b_sq_C1, b_sq_C2;
logic [31:0] c_sq_C2, c_sq_C3;
logic [31:0] c_C3;
always_ff @(posedge clk) a_sq_C2 <= a_sq_C1;
always_ff @(posedge clk) b_sq_C2 <= b_sq_C1;
always_ff @(posedge clk) c_sq_C3 <= c_sq_C2;
// Stage 1
assign a_sq_C1 = a_C1 * a_C1;
assign b_sq_C1 = b_C1 * b_C1;
// Stage 2
assign c_sq_C2 = a_sq_C2 + b_sq_C2;
// Stage 3
assign c_C3 = sqrt(c_sq_C3);
```

## Retiming -- Easy and Safe

```
|calc
   @1
      $aa_sq[31:0] = $aa * $aa;
      $bb_sq[31:0] = $bb * $bb;
   @2
      $cc_sq[31:0] = $aa_sq + $bb_sq;
   @3
      $cc[31:0] = sqrt($cc_sq);
```

Staging is a physical attribute. No impact on behavior.

## Retiming in SystemVerilog

```
// Calc Pipeline
logic [31:0] a_C1;
logic [31:0] b_C1;
logic [31:0] a_sq_C0, a_sq_C1, a_sq_C2;
logic [31:0] b_sq_C1, b_sq_C2;
logic [31:0] c_sq_C2, c_sq_C3, c_sq_C4;
logic [31:0] c_C3;
always_ff @(posedge clk) a_sq_C2 <= a_sq_C1;
always_ff @(posedge clk) b_sq_C2 <= b_sq_C1;
always_ff @(posedge clk) c_sq_C3 <= c_sq_C2;
always_ff @(posedge clk) c_sq_C4 <= c_sq_C3;
// Stage 1
assign a_sq_C1 = a_C1 * a_C1;
assign b_sq_C1 = b_C1 * b_C1;
// Stage 2
assign c_sq_C2 = a_sq_C2 + b_sq_C2;
// Stage 3
assign c_C3 = sqrt(c_sq_C3);
```

VERY BUG-PRONE!

## Operand Mux

## Operand Mux Retimed

## Code Size Results from Industry Examples

## Benefits of TL-Verilog

Less code, fewer bugs!
Less code *change*, fewer bugs!

Typically:

- 1/2 the code
- ~1/4 the change for reuse
- ~1/6 the code for HLM

In certain real-world cases:

- 1/200 the code change!

## More to TL-Verilog

- Hierarchy
- State
- Validity
- Clock gating
- Transactions!

(and more in proposal phase)

## makerchip.com

# Be a part of it!

#### Reach out to me at:

steve.hoover@redwoodeda.com

#### Learn more at:

makerchip.com
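
## Backup: A Software Sketch of Retiming

The retiming slides claim that staging is a physical attribute with no impact on behavior. The following is a minimal Python sketch of that idea, not TL-Verilog semantics or tool output. It models the calc pipeline cycle by cycle, with inputs arriving at stage 1, and moves the logic between stages. The names `DelayLine` and `run_pipeline` are hypothetical, and an integer square root (`math.isqrt`) is assumed because the slides do not specify how `sqrt` is implemented.

```python
import math
from collections import deque

MASK32 = 0xFFFFFFFF  # signals on the slides are declared [31:0]


class DelayLine:
    """A chain of `depth` flip-flops; depth 0 behaves like a wire."""

    def __init__(self, depth):
        self.regs = deque([None] * depth)

    def step(self, value):
        # One clock cycle: capture `value`, return what was captured
        # `depth` cycles earlier (None while the chain is still filling).
        self.regs.append(value)
        return self.regs.popleft()


def run_pipeline(inputs, sq_stage, sum_stage, sqrt_stage, out_stage):
    """Return the value observed at `out_stage` on every cycle.

    The *_stage arguments say in which pipeline stage each piece of
    logic sits; the flip-flops between stages are derived from them.
    """
    assert 1 <= sq_stage <= sum_stage <= sqrt_stage <= out_stage
    ab_flops = DelayLine(sq_stage - 1)            # stage inputs to the multipliers
    sq_flops = DelayLine(sum_stage - sq_stage)    # stage squares to the adder
    cc_sq_flops = DelayLine(sqrt_stage - sum_stage)
    cc_flops = DelayLine(out_stage - sqrt_stage)

    observed = []
    for ab in inputs:  # one loop iteration per clock cycle
        ab = ab_flops.step(ab)
        sq = None if ab is None else ((ab[0] * ab[0]) & MASK32,
                                      (ab[1] * ab[1]) & MASK32)
        sq = sq_flops.step(sq)
        cc_sq = None if sq is None else (sq[0] + sq[1]) & MASK32
        cc_sq = cc_sq_flops.step(cc_sq)
        cc = None if cc_sq is None else math.isqrt(cc_sq)
        observed.append(cc_flops.step(cc))
    return observed


# Five transactions followed by three empty cycles to flush the pipeline.
stimulus = [(3, 4), (5, 12), (8, 15), (7, 24), (20, 21)] + [None] * 3
original = run_pipeline(stimulus, sq_stage=1, sum_stage=2, sqrt_stage=3, out_stage=4)
retimed = run_pipeline(stimulus, sq_stage=1, sum_stage=1, sqrt_stage=4, out_stage=4)
print(original == retimed)  # prints True
```

Only the stage arguments change between the two runs, much as only the `@N` labels change in TL-Verilog. The flip-flop chains are recomputed from those arguments rather than edited by hand, which is the step the SystemVerilog version leaves to the designer.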