

## **Power Optimization Flows in Deep Sub-Micron Technologies for Computer Vision Systems**

A Comprehensive Framework for Energy-Efficient Design

Ajay Mehta (Staff Engineer), Bhanusekhar Kalaga (Senior Staff Engineer), Vamsi Chakradhar (Staff Engineer)

MediaTek Bangalore Pvt Ltd

## AGENDA

- INTRODUCTION
- FLOW PRACTICES
- SAIF BASED POWER OPTIMIZATION
- IN-DESIGN PRIME POWER
- RESULTS
- CONCLUSION
- **Q&A**









### **In-Design Prime Power**

Dynamic Power Optimization using In-Design Prime Power


SOLUTIONS







### **Synthesis**

- Vectorization
- Ungroup Hierarchies
- Boundary Optimization

- Clock Gating
- Logic Restructuring
- Concurrent Clock & Data Optimization
- Clock Slew Setting

### **Physical Design**

- Enhanced Low Power Placement
- Other Fusion Compiler App Options





### Vectorization

- Reduces clock tree cell count and net length
- Reduces register area
- <u>App Option:</u> compile.flow.enable\_multibit true
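The clock-tree benefit of multibit banking can be illustrated with a minimal sketch. It assumes the best case, in which every multibit cell is completely filled; the function name and the register counts are hypothetical and are not taken from the design in this presentation.

```python
import math


def clock_sinks_after_banking(num_flops: int, bits_per_cell: int) -> int:
    """Return the number of clock pins left after banking single-bit flops.

    Assumes every bank is filled before a new one is started (best case).
    A real tool banks only compatible, physically close registers, so the
    achieved reduction is normally smaller.
    """
    if num_flops < 0 or bits_per_cell < 1:
        raise ValueError("num_flops must be >= 0 and bits_per_cell >= 1")
    return math.ceil(num_flops / bits_per_cell)


# Hypothetical block: 1000 single-bit flops banked into 4-bit cells.
before = clock_sinks_after_banking(1000, 1)
after = clock_sinks_after_banking(1000, 4)
print(f"clock sinks: {before} -> {after}")
```

Fewer clock sinks mean fewer clock buffers and shorter clock nets, which is where the clock power saving comes from.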

### Ungroup Hierarchies





- Regrouping at flat or hierarchical level
- <u>App Option:</u> compile.flow.autoungroup true

### **Boundary Optimization**

#### Propagation of constant:



#### Propagation of equal/opposite:



#### Phase inversion:



- Propagate unconnected port information.
- <u>App Option:</u> compile.flow.boundary\_optimization true



### **Clock Gating**



- Cut off high-frequency functional clocks during the idle state.
- PnR controls the placement of the clock-gating cells.
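As a rough illustration of why gating idle clocks saves power, the sketch below uses the first-order switching-power model P = duty · C · V² · f for a clock net. The capacitance, voltage, frequency and enable duty cycle are hypothetical values, and the model ignores the overhead of the clock-gating cell itself.

```python
def clock_net_power(cap_farads: float, vdd_volts: float, freq_hz: float,
                    enable_duty: float = 1.0) -> float:
    """First-order switching power of a clock net in watts.

    The net charges and discharges once per cycle, so the ungated power is
    C * V^2 * f. With clock gating the net toggles only while the enable is
    active, which scales the power by enable_duty (0.0 to 1.0).
    """
    if not 0.0 <= enable_duty <= 1.0:
        raise ValueError("enable_duty must be between 0.0 and 1.0")
    return enable_duty * cap_farads * vdd_volts ** 2 * freq_hz


# Hypothetical clock branch: 2 pF load, 0.8 V supply, 1 GHz clock.
ungated = clock_net_power(2e-12, 0.8, 1e9)
gated = clock_net_power(2e-12, 0.8, 1e9, enable_duty=0.25)
saving_percent = 100.0 * (ungated - gated) / ungated
print(f"ungated {ungated * 1e3:.2f} mW, gated {gated * 1e3:.2f} mW, "
      f"saving {saving_percent:.0f}%")
```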

### Logic Restructuring

Composition: Absorb in complex gate.



Decomposition: Driven by smaller gate



Rewiring: Feed at last stage



- <u>App Option:</u> opt.common.advanced\_logic\_restructuring\_mode -value <power or timing\_power>

### Concurrent Clock & Data Optimization



- Fix timing violation with useful skew
- <u>App Option:</u>
  - compile/clock\_opt/route\_opt.flow.enable\_ccd true
  - clock\_opt.flow.enable\_clock\_power\_recovery -value auto
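Useful skew can be shown with a small worked example. The sketch below uses the standard setup-slack relation (period plus capture latency, minus launch latency, data delay and setup time) on a hypothetical three-register chain A -> B -> C; all delays are invented for illustration.

```python
def setup_slack(period: float, launch_latency: float, capture_latency: float,
                data_delay: float, setup_time: float) -> float:
    """Setup slack of one register-to-register path (all values in ns)."""
    return period + capture_latency - launch_latency - data_delay - setup_time


PERIOD, SETUP = 1.0, 0.05  # hypothetical clock period and setup time

# Balanced clock tree: path A->B violates, path B->C has spare margin.
before_ab = setup_slack(PERIOD, 0.0, 0.0, data_delay=1.05, setup_time=SETUP)
before_bc = setup_slack(PERIOD, 0.0, 0.0, data_delay=0.70, setup_time=SETUP)

# Useful skew: delay the clock of register B by 0.10 ns. B captures later
# (helps A->B) and launches later (costs B->C the same amount).
SKEW_B = 0.10
after_ab = setup_slack(PERIOD, 0.0, SKEW_B, data_delay=1.05, setup_time=SETUP)
after_bc = setup_slack(PERIOD, SKEW_B, 0.0, data_delay=0.70, setup_time=SETUP)

print(f"A->B: {before_ab:+.2f} -> {after_ab:+.2f} ns")
print(f"B->C: {before_bc:+.2f} -> {after_bc:+.2f} ns")
```

The violation on A->B is closed by borrowing margin from B->C, without upsizing any data-path cell.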


### **Clock Slew Setting**

- Need to set an optimum value
- Too relaxed: slow clock edges increase short-circuit power
- Too tight: the clock tree is over-split and more cells are added

<u>App Option:</u> set\_max\_transition -clock\_path <slope> [get\_clocks \*] -scenarios [get\_scenarios

### Enhanced Low Power Placement (eLPP)

- Apply net weights to direct the placer to shorten nets of high activity
- Allow placement to get better balance between power and timing

<u>App Option:</u> place.coarse.enhance\_low\_power\_effort -value <low/medium/high>
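The idea of activity-driven net weighting can be sketched as follows. This is only an illustration of the principle, not the algorithm Fusion Compiler uses: the linear weighting rule, the `max_extra_weight` parameter, the net names and the toggle rates are all assumptions.

```python
def activity_net_weights(toggle_rates: dict, max_extra_weight: float = 4.0) -> dict:
    """Map each net's toggle rate to a placement weight.

    The most active net gets weight 1 + max_extra_weight, an idle net gets
    weight 1, and everything else is scaled linearly in between. A placer
    that minimizes weighted wirelength will then shorten active nets first.
    """
    peak = max(toggle_rates.values(), default=0.0)
    if peak <= 0.0:
        return {net: 1.0 for net in toggle_rates}
    return {net: 1.0 + max_extra_weight * rate / peak
            for net, rate in toggle_rates.items()}


# Hypothetical toggle rates (transitions per clock cycle) from a SAIF file.
toggles = {"clk_en": 0.50, "data_bus[0]": 0.20, "cfg_static": 0.0}
weights = activity_net_weights(toggles)
for net, weight in weights.items():
    print(f"{net}: weight {weight:.1f}")
```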

### Other Fusion Compiler App Options

- <u>Enable power optimization:</u> place\_opt/clock\_opt/route\_opt.flow.enable\_power -value true
- <u>Enable power recovery:</u> clock\_opt/route\_opt.flow.enable\_clock\_power\_recovery -value auto
- <u>Configure app options and tool flow:</u> set\_qor\_strategy -stage pnr -metric total\_power -mode extreme\_power



## SAIF BASED POWER OPTIMIZATION

## **RTL SAIF Flow**





- With the RTL SAIF flow, Fusion Compiler optimizes dynamic power more accurately than it does without SAIF.
- <u>Challenge</u>: RTL SAIF cannot cover the netlist changes made throughout the optimization steps. Because the in-tool power report is not enough to finalize a trial before PTPX signoff, multiple PnR experiments have to be released for PTPX.

## **RTL SAIF Setup**





## **RTL SAIF Sanity Check**



### Check coverage:

report\_activity -driver > activity.rpt

Essential activity is complete

| Activity Type | primary-input | seq-pin         | icg-pin       | comb-pin         | no-func       | total   |
|---------------|---------------|-----------------|---------------|------------------|---------------|---------|
| simulated     | 817 (99.9%)   | 325614 (93.2%)  | 7422 (100.0%) | 2386664 (65.6%)  | 8066 (81.2%)  | 2728583 |
| annotated     | 0 (0.0%)      | 0 (0.0%)        | 0 (0.0%)      | 0 (0.0%)         | 0 (0.0%)      | 0       |
| inferred      | 0 (0.0%)      | 0 (0.0%)        | 0 (0.0%)      | 0 (0.0%)         | 0 (0.0%)      | 0       |
| derived       | 1 (0.1%)      | 0 (0.0%)        | 0 (0.0%)      | 83136 (2.3%)     | 0 (0.0%)      | 83137   |
| calculated    | 0 (0.0%)      | 0 (0.0%)        | 0 (0.0%)      | 0 (0.0%)         | 0 (0.0%)      | 0       |
| default       | 0 (0.0%)      | 23764 (6.8%)    | 0 (0.0%)      | 1167750 (32.1%)  | 1865 (18.8%)  | 1193379 |
| total         | 818 (100.0%)  | 349378 (100.0%) | 7422 (100.0%) | 3637550 (100.0%) | 9931 (100.0%) | 4005099 |

The better the coverage, the better the power optimization by the tool.

Missing objects: report\_activity -rtl -print\_objects {default {seq-cell tri-cell}} > saif\_missing.rpt

The missing objects usually include DFT/TMBIST registers added in synthesis.
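The coverage figures in the report can be cross-checked with a few lines of arithmetic. The sketch below uses the seq-pin counts from the activity report above; treating only the `simulated` type as covered is an assumption made for this check, and the helper name is hypothetical.

```python
# seq-pin counts per activity type, copied from the activity report.
SEQ_PIN_COUNTS = {
    "simulated": 325614,
    "annotated": 0,
    "inferred": 0,
    "derived": 0,
    "calculated": 0,
    "default": 23764,
}


def coverage_percent(counts: dict, covered_types=("simulated",)) -> float:
    """Share of pins whose activity comes from the given activity types."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pins in the report")
    covered = sum(counts[activity_type] for activity_type in covered_types)
    return 100.0 * covered / total


seq_coverage = coverage_percent(SEQ_PIN_COUNTS)
seq_default = coverage_percent(SEQ_PIN_COUNTS, covered_types=("default",))
print(f"seq-pin simulated: {seq_coverage:.1f}%, default: {seq_default:.1f}%")
```

The pins left at `default` activity are the ones to look for in the missing-objects report.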



## IN-DESIGN PRIME POWER

## In-Design Prime Power (IDPP)



### Need:

The netlist changes during the PnR optimization stages, so Fusion Compiler cannot optimize dynamic power accurately on the basis of the RTL SAIF flow alone.





## In-Design Prime Power Setup








### Results

### Seq-pin Coverage



### Power Comparison





### Partition1 QoR Summary (APR result):

| Strategy | Std Cell Count | Congestion (H / V) | Setup (WNS/TNS/NVP) | Hold (WNS/TNS/NVP) |
|----------|----------------|--------------------|---------------------|--------------------|
| RUN_IDPP | 3962254        | 0.298% / 0.458%    | -0.033/-3.97/1560   | -0.079/-15.47/5696 |
| RUN_SAIF | 3958351        | 0.175% / 0.382%    | -0.038/-3.67/1526   | -0.105/-20.54/7627 |

### Partition2 QoR Summary (APR result):

| Strategy | Std Cell Count | Congestion (H / V) | Setup (WNS/TNS/NVP) | Hold (WNS/TNS/NVP)  |
|----------|----------------|--------------------|---------------------|---------------------|
| RUN_IDPP | 1920688        | 0.11% / 0.13%      | -0.151/-49.79/1940  | -0.063/-38.353/3001 |
| RUN_SAIF | 1906054        | 0.11% / 0.17%      | -0.151/-28.064/1246 | -0.077/-21.150/2734 |
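One way to quantify how close the two runs are is to express the standard-cell counts from the QoR summaries as a relative difference. The sketch below does this with RUN_SAIF as the baseline; the helper name is hypothetical and the same calculation can be applied to any other column.

```python
def percent_delta(candidate: float, baseline: float) -> float:
    """Relative difference of candidate versus baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (candidate - baseline) / baseline


# Standard-cell counts copied from the two QoR summary tables.
STD_CELL_COUNT = {
    "Partition1": {"RUN_IDPP": 3962254, "RUN_SAIF": 3958351},
    "Partition2": {"RUN_IDPP": 1920688, "RUN_SAIF": 1906054},
}

cell_deltas = {
    partition: percent_delta(runs["RUN_IDPP"], runs["RUN_SAIF"])
    for partition, runs in STD_CELL_COUNT.items()
}
for partition, delta in cell_deltas.items():
    print(f"{partition}: {delta:+.2f}% std cells (RUN_IDPP vs RUN_SAIF)")
```

Both partitions stay below a 1% difference in cell count.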





### Conclusion





For runs based on the In-Design Prime Power flow, dynamic and total power improved by 10-20%.



QoR is comparable between the SAIF-based and In-Design Prime Power runs.



So, for power-critical designs, the In-Design Prime Power based power optimization flow is recommended.







## THANK YOU
