

## PrimePower: Smart Pruner & Smart Partition of Concurrent-CAPP

MediaTek: Angel Wang, Yu-Shuan Liao; Synopsys: Marty Huang

# Agenda



- Background
- Power Profiling by CAPP to find Critical Window
- Pain Point
- Concurrent-CAPP Engine in PrimePower
- Smart Pruner on Concurrent-CAPP by Distribution System
- Smart Partition + Smart Pruner on Concurrent-CAPP by Distribution System
- The Power Profile
- Conclusion
- Future work

# Background



### **Power Sign-Off: A Critical Step in Chip Design**

As chip designs become more complex and power-hungry, ensuring reliable operation within specified power budgets is a paramount concern. Power sign-off is a crucial step in the design flow, where the peak power consumption of the chip is analyzed and verified against the available power delivery capabilities.

Finding the Peak Power Window:

- Identifies the most power-intensive operating scenarios
- Accounts for simultaneous switching of multiple blocks
- Considers impact of power-aware techniques
- Enables accurate analysis of power grid integrity

Accurate peak power analysis is essential for:

- Robust power delivery network design
- Reliable operation within thermal constraints
- Meeting power budgets and energy efficiency targets

# Power Profiling by CAPP to find Critical Window



- With the input RTL waveform and netlist, the PrimePower CAPP Distribution flow can estimate power consumption and generate a power profile.
- The power profile helps users identify the Critical Window within the pattern.
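As an illustration of what locating the Critical Window can involve, the sketch below scans a sampled power profile for the fixed-length window with the highest average power. It is not PrimePower's algorithm: the function name `find_critical_window`, the list-of-samples input, and the sample values are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def find_critical_window(power: Sequence[float], window: int) -> Tuple[int, float]:
    """Return (start_index, average_power) of the highest-average window.

    Hypothetical helper: `power` holds equally spaced power samples and
    `window` is the window length in samples.
    """
    if window <= 0 or window > len(power):
        raise ValueError("window must be between 1 and len(power)")
    # Running sum: slide the window one sample at a time.
    current = sum(power[:window])
    best_sum, best_start = current, 0
    for start in range(1, len(power) - window + 1):
        current += power[start + window - 1] - power[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start, best_sum / window


# Illustrative samples only, not data from the slides.
profile = [0.2, 0.3, 0.9, 1.4, 1.1, 0.4, 0.3]
start, avg = find_critical_window(profile, window=3)
print(f"critical window starts at sample {start}, average power {avg:.3f}")
```

The running sum keeps the scan linear in the number of samples, which matters when the pattern is long.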



# Problem Statement: Longer Pattern, Longer Run Time



- The RTL patterns used to identify critical windows are usually very long.
- **Design scale**: recent designs contain more than one million instances.
- The large amount of input data makes CAPP analysis take a long time to complete.



# Concurrent-CAPP Engine in PrimePower Elite



### Old CAPP





#### New orchestration of the Power Signoff Engines

- Multi-cycle concurrent event propagation
- Concurrent propagation and power computation
- No change in accuracy
- Supports distributed power analysis
- Glitch support: planned for 2024.09








## The Performance improvement of Concurrent-CAPP + Smart Pruner

- Compared with the old method, run time dropped by more than 12X with Concurrent-CAPP + Smart Pruner.
- A closer look at the per-partition run times showed that there was still room for improvement.

#### Run time of old/new method



#### Run time of each partition in new method



Although run time improved by 12X, there is room for further gains: run time is unbalanced across the 10 partitions, and partition9 dominates.
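To make the imbalance concrete: assuming all partitions run in parallel, the wall-clock time of a distributed run is set by the slowest partition. The sketch below computes that wall time and a simple imbalance ratio (slowest partition over mean partition). The runtimes are hypothetical placeholders, not the measured values from the chart, and `imbalance_ratio` is a name invented for this example.

```python
from statistics import mean
from typing import Sequence


def imbalance_ratio(runtimes: Sequence[float]) -> float:
    """Slowest partition runtime divided by the mean partition runtime.

    A ratio of 1.0 means perfectly balanced partitions; larger values mean
    one partition holds the whole run back.
    """
    return max(runtimes) / mean(runtimes)


# Hypothetical runtimes (arbitrary units) for 10 partitions; the last one
# dominates, in the spirit of partition9 on the slide.
runtimes = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 11.0]

wall_time = max(runtimes)          # parallel run finishes with the slowest partition
ratio = imbalance_ratio(runtimes)  # also the speedup still available from perfect balance
print(f"wall time = {wall_time}, imbalance ratio = {ratio:.2f}")
```

Under these assumptions the ratio also bounds the extra speedup that better balancing alone could deliver.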


## The Activity Statistics of the Partitions

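The slides do not spell out how partition boundaries are chosen. One plausible approach, shown here purely as an assumption, is to cut the pattern so that each partition holds a similar share of total activity rather than a similar share of time. The function names, the per-interval activity list, and the numbers below are hypothetical.

```python
from typing import List, Sequence


def balanced_boundaries(activity: Sequence[float], parts: int) -> List[int]:
    """Greedy split: return the start index of each partition.

    A new partition opens once the running activity reaches the next
    equal-share target. May return fewer than `parts` starts if the
    activity is concentrated at the very end of the pattern.
    """
    if parts <= 0 or parts > len(activity):
        raise ValueError("parts must be between 1 and len(activity)")
    total = sum(activity)
    starts = [0]
    running = 0.0
    for i, a in enumerate(activity):
        target = total * len(starts) / parts
        if len(starts) < parts and running >= target and i > starts[-1]:
            starts.append(i)
        running += a
    return starts


def partition_sums(activity: Sequence[float], starts: List[int]) -> List[float]:
    """Total activity inside each partition defined by `starts`."""
    ends = starts[1:] + [len(activity)]
    return [sum(activity[s:e]) for s, e in zip(starts, ends)]


# Hypothetical activity per time interval: a busy stretch in the middle.
activity = [1, 1, 1, 1, 4, 4, 4, 4, 1, 1, 1, 1]
starts = balanced_boundaries(activity, parts=3)
sums = partition_sums(activity, starts)
equal_time_sums = partition_sums(activity, [0, 4, 8])
print(starts, sums, equal_time_sums)
```

In this toy case an equal-time split leaves one partition with four times the activity of the others, while the activity-based split evens them out.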



# Smart Partition + Smart Pruner on Concurrent-CAPP by Distribution System


## The Performance improvement of Smart Partition





[Figure: bar chart with one entry per partition, Partition0 to Partition9; y-axis ticks at 0, 5, and 10]



## The Power Profile : The Output (15us)

Waveform viewer readout at 1,719,840,300 ps:

| Run             | Signal               | Power   |
|-----------------|----------------------|---------|
| Concurrent-CAPP | Pc(hunterelp_core_m) | 99.1 mW |
| Old CAPP        | Pc(hunterelp_core_m) | 98.0 mW |

#### Left report

| Power Group   | Internal Power | Switching Power | Leakage Power | Total Power | (%)      | Attrs |
|---------------|----------------|-----------------|---------------|-------------|----------|-------|
| clock_network | 7.360e-03      |                 |               |             |          | i     |
| register      | 0.0240         |                 | 0.0130        | 0.0477      |          |       |
| combinational | 0.1499         | 0.4789          | 0.0221        | 0.6509      |          |       |
| sequential    | 3.801e-06      | 0.0000          | 3.920e-05     | 4.300e-05   |          |       |
| memory        | 0.1098         | 1.316e-03       | 9.539e-03     | 0.1207      | (14.49%) |       |
| io_pad        | 0.0000         | 0.0000          |               | 0.0000      | (0.00%)  |       |
| black_box     | 8.311e-08      |                 |               |             |          |       |

#### Right report (Old CAPP)

| Power Group   | Internal Power | Switching Power | Leakage Power | Total Power | (%)      | Attrs |
|---------------|----------------|-----------------|---------------|-------------|----------|-------|
| clock_network |                |                 |               |             |          | i     |
| register      | 0.0240         |                 | 0.0130        | 0.0478      |          |       |
| combinational | 0.1480         | 0.4789          | 0.0221        | 0.6491      |          |       |
| sequential    | 5.681e-06      | 0.0000          | 3.923e-05     | 4.491e-05   |          |       |
| memory        |                | 1.316e-03       | 9.539e-03     | 0.1207      | (14.52%) |       |
| io_pad        |                | 0.0000          | 0.0000        |             | (0.00%)  |       |
| black_box     |                |                 |               |             |          |       |

#### Summary

| Metric                  | Left report      | Right report (Old CAPP) |
|-------------------------|------------------|-------------------------|
| Net Switching Power     | 0.4958 (59.53%)  | 0.4959 (59.67%)         |
| Cell Internal Power     | 0.2910 (34.94%)  | 0.2892 (34.80%)         |
| Cell Leakage Power      | 0.0460 (5.52%)   | 0.0460 (5.53%)          |
| Total Power             | 0.8328 (100.00%) | 0.8310 (100.00%)        |
| X Transition Power      |                  | 4.413e-04               |
| CAPP Estimated Glitchin |                  | 0.0000                  |
| Peak Power              | 2.4798           | 2.4752                  |
| Peak Time               | 1734046.8        | 1734046.8               |
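The two reports above can be compared numerically. The short sketch below takes the Total Power and Peak Power figures from the left report and from the right report (the one labeled Old CAPP) and expresses the gap as a percentage of the Old CAPP value. The helper name `relative_diff_percent` and the dictionary layout are assumptions made for this example; the numbers are copied from the reports.

```python
def relative_diff_percent(value: float, reference: float) -> float:
    """Difference between value and reference, as a percentage of reference."""
    return (value - reference) / reference * 100.0


# Figures copied from the two reports on this slide.
left_report = {"total_power": 0.8328, "peak_power": 2.4798}
right_report_old_capp = {"total_power": 0.8310, "peak_power": 2.4752}

diffs = {
    key: relative_diff_percent(left_report[key], right_report_old_capp[key])
    for key in left_report
}
for key, pct in diffs.items():
    print(f"{key}: {pct:+.3f}% relative to Old CAPP")
```

Both gaps come out well under 1%, and the two reports give the same Peak Time (1734046.8).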

SNUG TAIWAN 2024 13


## The Power Profile : Power Waveform Comparison (15us)






# Conclusions



- Distribution mode and Concurrent-CAPP suit long patterns; if the pattern is too short, partitioning it may bring no run-time benefit.
- Power profiles for large-scale designs and very long patterns become attainable, enabling comprehensive analysis and optimization.
- The Concurrent-CAPP + Smart Pruner + Smart Partition flow can deliver a performance improvement of over 20X, substantially reducing execution time and boosting productivity.
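The over-20X figure is consistent with the two gains reported in these slides, under the assumption that they compound multiplicatively: more than 12X from Concurrent-CAPP + Smart Pruner, followed by a further 1.72X from Smart Partition.

$$
S_{\text{total}} = S_{\text{pruner}} \times S_{\text{partition}} > 12 \times 1.72 = 20.64
$$

Here $S_{\text{pruner}}$ and $S_{\text{partition}}$ are labels introduced for this note, not terms from the slides.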

# Future Work



- Glitch power is not yet supported
  - Synopsys plans to support glitch power in the 2024.09 release.
- Review the flow to achieve better-balanced partitioning for further performance improvement
  - Smart Partition improved run time by 1.72X, but the partitions remain unbalanced; the key factor that should drive partitioning still needs to be identified to get better performance.



# THANK YOU
