

Improving RTL Quality and Reducing Backend Cycles Using Synopsys RTL-Architect at the Chip-Top and Subsystem Level

Micro-Architecture Exploration and Refinement for Optimal PPA

Mohammad Javed (ST Microelectronics), Penugonda MAHESH (ST Microelectronics), Ashi Goyal (Synopsys)



# **Problem Statement**

Design Challenge



| FE: Architecture Exploration<br>(Synopsys RTL-Architect) | Primary Exploration Targets | BE: PPA<br>(Synopsys Fusion Compiler) |
|----------------------------------------------|--------------------------------------------|-----------------------------|
| Wait State                                   | 0/1 Wait State                             | Wait State                  |
| Performance                                  | > 1.0 GHz                                  | Performance                 |
| Memory Selection<br>(access time/power/area) | Low Latency<br>Low Power                   |                             |
| Area<br>(Relative comparison)                | x.yz mm²                                   | Area<br>(Final)             |
| Power<br>(Relative comparison)               | Xyz mW<br>@0.825v_FF_125c                  | Leakage Power               |
|                                              | Leakage Profiling<br>all frequency targets | Leakage Profiling           |

| Category          | Design Challenge             | Detail                                                |
|-------------------|------------------------------|-------------------------------------------------------|
| Performance       | Target Frequency             | > 1 GHz                                               |
|                   | Low Memory Latency           | 0/1 Wait State at > 1 GHz Clock                       |
|                   | Level 2 Cache                | To reduce miss latency of DDR4 access                 |
| Functional Safety | ASIL-D/ASIL-B Support        | Replication as well as ECC/Parity support             |
| Power             | Power Budget of 500 mW       | Select Low-Power SRAM<br>Apply Clock Gating           |
| Area              | x.yz um2                     | Area-Optimized SRAM Selection                         |
| Security          | Native Encryption/Decryption | For inter-chip communication and                      |
| Technology        | TSMC-N7                      | Safety Architecture for TSMC Memories and validation  |

# Traditional Flow $\rightarrow$ Recommended Flow





Traditional flow: iterative, putting schedule and quality at risk.



Recommended flow: predictable, convergent, and scalable.

# Synopsys RTL-Architect Features





# Subsystem Overview

- 10 Real-Time Cores

- 5x Lockstep / 4x Lockstep + 2-Split
- Clock frequency above 1.0 GHz
- Low-latency SRAM
- Level 2 Cache.
- Dedicated AES Engine.
- Process/Clock Monitors
- Shared SRAM for Real-Time Code.
- Functional Safety
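The two core configurations in the list above can be sanity-checked against the count of 10 cores with a short sketch. It assumes each lockstep group is a dual-core pair, which the slide does not state; the class and constant names are hypothetical.

```python
from dataclasses import dataclass

# Assumption (not stated in the slides): a lockstep group is a dual-core pair.
CORES_PER_LOCKSTEP_GROUP = 2


@dataclass(frozen=True)
class CoreConfig:
    """One way of arranging the real-time cores (hypothetical helper)."""
    lockstep_groups: int  # groups running in lockstep
    split_cores: int      # cores running independently ("split")

    @property
    def physical_cores(self) -> int:
        # Every lockstep group consumes two physical cores under the assumption above.
        return self.lockstep_groups * CORES_PER_LOCKSTEP_GROUP + self.split_cores

    @property
    def logical_cores(self) -> int:
        # A lockstep group is seen by software as a single core.
        return self.lockstep_groups + self.split_cores


FULL_LOCKSTEP = CoreConfig(lockstep_groups=5, split_cores=0)  # "5x Lockstep"
MIXED = CoreConfig(lockstep_groups=4, split_cores=2)          # "4x Lockstep + 2-Split"

if __name__ == "__main__":
    for cfg in (FULL_LOCKSTEP, MIXED):
        print(cfg, cfg.physical_cores, cfg.logical_cores)
```

Under that assumption both configurations use all 10 physical cores; they differ only in how many independent cores software sees.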



## Real Time Subsystem

Challenges and Opportunities



- SoC partitioning into realizable hard macros, to enable parallel execution and reduce turnaround time (TAT)
- Shift-left methodology, to perform the essential design implementation steps early
- Parallel exploration of various design parameters and updated design options
- Power analysis using RTL-Architect and PrimePower for static and dynamic power

## **Activities Performed**





Memory Selection and Validation – **Done using Ad hoc Excel** 
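The trade-off behind memory selection (access time against power and area) could also be scripted instead of kept in a spreadsheet. The sketch below is one possible approach, not the flow the authors used: it keeps only SRAM options that meet an access-time limit, then picks the one with the lowest weighted leakage/area cost. All option names and numbers are placeholders, not data from the project.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SramOption:
    """Hypothetical memory-compiler option; every value here is a placeholder."""
    name: str
    access_time_ns: float
    leakage_mw: float
    area_um2: float


def select_sram(options: Sequence[SramOption],
                max_access_time_ns: float,
                w_leak: float = 0.5,
                w_area: float = 0.5) -> Optional[SramOption]:
    """Return the feasible option with the lowest normalized cost, or None."""
    # Step 1: drop options too slow for the wait-state target.
    feasible = [o for o in options if o.access_time_ns <= max_access_time_ns]
    if not feasible:
        return None
    # Step 2: normalize leakage and area against the worst feasible option.
    max_leak = max(o.leakage_mw for o in feasible)
    max_area = max(o.area_um2 for o in feasible)

    def cost(o: SramOption) -> float:
        return w_leak * o.leakage_mw / max_leak + w_area * o.area_um2 / max_area

    return min(feasible, key=cost)


# Placeholder candidates, for illustration only.
CANDIDATES = [
    SramOption("opt_fast", access_time_ns=0.45, leakage_mw=3.0, area_um2=12000.0),
    SramOption("opt_low_leak", access_time_ns=0.70, leakage_mw=1.2, area_um2=10000.0),
    SramOption("opt_dense", access_time_ns=0.95, leakage_mw=1.0, area_um2=8000.0),
]

if __name__ == "__main__":
    print(select_sram(CANDIDATES, max_access_time_ns=0.8))
```

Tightening `max_access_time_ns` models a move from a 1-wait-state to a 0-wait-state target: fewer options stay feasible, and the choice shifts toward faster, leakier memories.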



Micro-Architecture Refinement of Interconnect and Other Critical IP(s) – **More than 50 iterations tried**



Fusion Compiler/RTL-Architect File-Set and Design Constraint Validation – **No issues reported from BE**



Timing Constraint and Exception Validation – **No constraint-related issues at BE**



Hold Fix analysis with Latch Insertion on critical hold path(s)



Internal IP Design update for Hold Fix.


## Results

## FE PPA Phase: 16 WW

| Design Version                   | TNS<br>(ns) | WNS<br>(ns) | NUM   | Memory<br>Leakage*<br>(mW) | Total<br>Area<br>(um2) | Macro<br>Area<br>(um2) | Low VT<br>Usage |
|----------------------------------|-------------|-------------|-------|----------------------------|------------------------|------------------------|-----------------|
| v01.05-1200<br>2022-WW47-1.x GHz | -4020       | -0.707      | 29486 | 393                        | 2.57                   | 2.28                   | 4.7%            |
| v01.06-1200<br>2023-WW12-1.x GHz | -126        | -0.174      | 4609  | 245                        | 2.65                   | 2.29                   | 4.6%            |
| v01.06-1000<br>2023-WW12-1.x GHz | -0.04       | -0.04       | 1     | 245                        | 2.62                   | 2.29                   | 4.1%            |
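The improvement between the two "-1200" design versions can be worked out directly from the table. The sketch below uses only the table values; the helper names are hypothetical, and NUM is assumed to be the number of violating endpoints.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a magnitude shrank from `before` to `after`."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * (abs(before) - abs(after)) / abs(before)


# Values copied from the FE PPA table (v01.05-1200 and v01.06-1200 rows).
V01_05 = {"tns_ns": -4020.0, "wns_ns": -0.707, "num": 29486, "mem_leak_mw": 393.0}
V01_06 = {"tns_ns": -126.0, "wns_ns": -0.174, "num": 4609, "mem_leak_mw": 245.0}


def compare(old: dict, new: dict) -> dict:
    """Per-metric reduction in percent, rounded to one decimal place."""
    return {key: round(percent_reduction(old[key], new[key]), 1) for key in old}


if __name__ == "__main__":
    for metric, pct in compare(V01_05, V01_06).items():
        print(f"{metric}: {pct}% lower")
```

With these inputs, TNS shrinks by about 96.9%, WNS by about 75.4%, the violation count by about 84.4%, and memory leakage by about 37.7%.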

Implementation Phase (Fusion Compiler): Status

- BE implementation is ongoing, with a target frequency above 1 GHz.
- No show-stopper requiring a major design update has been found in BE for the Real-Time Cluster.



| Clusters          | Placeable<br>Instances | Memory<br>Leakage*<br>(mW) | Total<br>AREA<br>(um2) | Macro<br>Area<br>(um2) | Low VT<br>Usage |
|-------------------|------------------------|----------------------------|------------------------|------------------------|-----------------|
| Cluster0 (1.x GHz) | 4.5M                  | 294                        | 2.63                   | 1.98                   | 4.8%            |
| Cluster1 (1.x GHz) | 4.5M                  | 294                        | 2.61                   | 1.98                   | 4.8%            |
| Cluster2 (1.x GHz) | 3.5M                  | 294                        | 2.50                   | 1.98                   | 4.74%           |
| Cluster3 (1.x GHz) | 1.2M                  | 325                        | 4.08                   | 3.92                   | 4.90%           |

\* Leakage power values are taken from the .lib files, without an uplift factor.



# Correlation (Fusion Compiler : RTL-Architect) at PPA Phase

#### RTL-Architect Max Timing Summary [1 violation]

|     | Total | Reg->Reg | In->Reg | Reg->Out | In->Out |
|-----|-------|----------|---------|----------|---------|
| WNS | -0.04 | -0.04    | 0.00    | 0.00     | 0.00    |
| TNS | -0.04 | -0.04    | 0.00    | 0.00     | 0.00    |
| NUM | 1     | 1        | 0       | 0        | 0       |

**RTL-Architect runtime is almost 3 times faster than Fusion Compiler init opto and 5 times faster than Fusion Compiler final opto.**



Figure: runtime comparison, in hours (h).




#### Fusion Compiler Init Opto Max Timing Summary [98 violations]

|     | Total | Reg->Reg | In->Reg | Reg->Out | In->Out |
|-----|-------|----------|---------|----------|---------|
| WNS | -0.14 | -0.08    | -0.14   | -0.06    | 0.00    |
| TNS | -1.45 | -0.93    | -0.41   | -0.11    | 0.00    |
| NUM | 98    | 84       | 12      | 2        | 0       |

#### Fusion Compiler Final Opto Max Timing Summary [211 violations]

|     | Total | Reg->Reg | In->Reg | Reg->Out | In->Out |
|-----|-------|----------|---------|----------|---------|
| WNS | -0.12 | -0.12    | -0.07   | -0.09    | 0.00    |
| TNS | -7.41 | -6.82    | -0.43   | -0.16    | 0.00    |
| NUM | 211   | 183      | 25      | 3        | 0       |
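The Total column of such a summary can be cross-checked against the per-path-group columns. The sketch below assumes the usual convention that total WNS is the worst (minimum) group WNS, while total TNS and NUM are sums over the groups. The Fusion Compiler final opto numbers above are used as input, and the function name is hypothetical.

```python
import math
from typing import Dict, Tuple

# Per path group: (WNS in ns, TNS in ns, number of violations),
# copied from the Fusion Compiler final opto summary.
FC_FINAL_OPTO: Dict[str, Tuple[float, float, int]] = {
    "Reg->Reg": (-0.12, -6.82, 183),
    "In->Reg": (-0.07, -0.43, 25),
    "Reg->Out": (-0.09, -0.16, 3),
    "In->Out": (0.00, 0.00, 0),
}


def summarize(groups: Dict[str, Tuple[float, float, int]]) -> Tuple[float, float, int]:
    """Derive the Total column: worst WNS, summed TNS, summed violation count."""
    wns = min(g[0] for g in groups.values())
    tns = sum(g[1] for g in groups.values())
    num = sum(g[2] for g in groups.values())
    return wns, tns, num


if __name__ == "__main__":
    wns, tns, num = summarize(FC_FINAL_OPTO)
    print(f"WNS={wns:.2f} ns, TNS={tns:.2f} ns, NUM={num}")
```

For this table the derived totals match the reported Total column (WNS -0.12 ns, TNS -7.41 ns, 211 violations).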

#### Timing Summary [RTL-Architect - Fusion Compiler Final Opto] (211 violations)

| Range (ns)       | Num Violations |
|------------------|----------------|
| -0.010 to 0.000  | 121            |
| -0.120 to -0.010 | 90             |

Of the 211 violations, only 90 fall between -0.120 ns and -0.010 ns; the other 121 are smaller than 0.010 ns in magnitude.
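A split like the one above can be produced from a list of endpoint slacks. The sketch below is a minimal illustration with made-up slack values. The function name is hypothetical, and counting a slack of exactly -0.010 ns in the larger bucket is an assumption, since the slide does not say how the boundary is handled.

```python
from typing import Dict, Iterable


def bucket_violations(slacks_ns: Iterable[float],
                      threshold_ns: float = -0.010) -> Dict[str, int]:
    """Split negative slacks into 'small' (above threshold) and 'large' buckets."""
    violations = [s for s in slacks_ns if s < 0.0]  # only negative slack violates
    # Assumption: a slack exactly at the threshold counts as 'large'.
    large = sum(1 for s in violations if s <= threshold_ns)
    return {"small": len(violations) - large, "large": large}


# Made-up slack values, for illustration only.
EXAMPLE_SLACKS_NS = [-0.120, -0.050, -0.010, -0.009, -0.001, 0.020]

if __name__ == "__main__":
    print(bucket_violations(EXAMPLE_SLACKS_NS))
```

Separating marginal violations from the larger ones shows how much of the violation count is within reach of routine backend optimization.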

## Correlation (Fusion Compiler : RTL-Architect) at PPA Phase

- **Timing:** 85% of endpoints within 5% of Fusion Compiler initial\_opto endpoint slack
- **Area:** Total area: ~5%
- **Run Time:** 3x faster than Fusion Compiler init\_opto, 5x faster than Fusion Compiler final\_opto
- **Power:** Total power: ~10%; individual components (leakage, internal, switching, glitch): ~15%
- **Congestion:** Similar congestion and placement hierarchy
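A figure such as "85% of endpoints within 5%" can be computed by comparing per-endpoint slack from the two tools. The slide does not define what the 5% is measured against, so the sketch below assumes a tolerance of 5% of the clock period. The function name, endpoint names, and slack values are hypothetical.

```python
from typing import Mapping


def fraction_within(rtla_slack_ns: Mapping[str, float],
                    fc_slack_ns: Mapping[str, float],
                    clock_period_ns: float,
                    tolerance: float = 0.05) -> float:
    """Fraction of shared endpoints whose slack differs by at most tolerance * period."""
    common = rtla_slack_ns.keys() & fc_slack_ns.keys()
    if not common:
        raise ValueError("no common endpoints to compare")
    # Assumption: the tolerance is relative to the clock period.
    limit_ns = tolerance * clock_period_ns
    hits = sum(1 for ep in common
               if abs(rtla_slack_ns[ep] - fc_slack_ns[ep]) <= limit_ns)
    return hits / len(common)


# Made-up per-endpoint slacks (ns), for illustration only.
RTLA = {"ep_a": 0.10, "ep_b": -0.04, "ep_c": 0.30, "ep_d": 0.00}
FC = {"ep_a": 0.12, "ep_b": -0.12, "ep_c": 0.28, "ep_d": 0.01}

if __name__ == "__main__":
    print(fraction_within(RTLA, FC, clock_period_ns=1.0))
```

In this made-up data, three of the four endpoints agree within 0.05 ns, giving a correlation of 75%.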







## Conclusion



Synopsys RTL-Architect gives designers key technology for improving RTL quality and the other collateral needed for BE implementation.





- Adopting RTL-Architect improved the design process by identifying challenges early and providing solutions, which streamlined the flow, improved efficiency, and enabled timely delivery of high-quality design collateral.

- The RTL-Architect timing, area, and power results correlate well with Fusion Compiler, which supports using the tool for early design decisions.



# THANK YOU
