

## Improve CTS QoR with H-tree-only (Non-Mesh) Regular MSCTS for designs with complex, notched floorplans

Presenters: Luan Pham, Ngoc Le

Authors: Luan Pham, Ngoc Le, Phuong Le

Quest Global Vietnam









## **About Quest Global**



## We are Quest Global

## We strive to be the most trusted partner for solving the world's hardest engineering problems

## Who we serve





- Aerospace and Defense
- Automotive
- Communications
- Energy
- Hi-Tech
- MedTech and Healthcare
- Rail
- Semiconductors

## Semiconductors





Silicon Engineering and Platform Engineering

#### **Highlights**






## End-to-end Semiconductor Capabilities Out








## Abstract



In VLSI (Very Large Scale Integration) design, the clock plays a crucial role in synchronizing the operations of the components within a digital circuit. Various clock structures and methodologies are used to implement the clock scheme efficiently.

This paper introduces H-tree-only Regular Multisource Clock Tree Synthesis (MSCTS) with htree\_sections, a new feature of Fusion Compiler (FC) aimed at simplifying implementation. H-tree-only MSCTS offers a streamlined approach to clock distribution, particularly in complex floorplans with notches, where traditional methods may encounter challenges. In addition, the htree\_sections feature, available from version U-2022.12-SP3, provides a framework for automating the generation of H-tree structures and multiple clock sources tailored to diverse design requirements. This feature ensures efficient clock signal distribution while minimizing clock skew and increasing CRPR, especially in highly complex floorplans. Through testing and validation, this paper demonstrates the effectiveness and ease of implementation of H-tree-only Regular MSCTS using the htree\_sections feature, offering a robust solution for clock distribution in a high-performance-computing CPU design with a complex floorplan and high density, operating at 1.5 GHz.



### Problem statement

## **Challenging Points**




#### Many designs have complex, notched floorplans, so designers struggle to improve Quality of Results (QoR) at CTS

Designing clock networks manually in complex floorplans is challenging because of layout constraints such as routing congestion, timing closure, and signal integrity issues.

#### Improving QoR at CTS requires experienced engineers

Improving QoR at CTS (Clock Tree Synthesis) requires a deep understanding of clock tree synthesis techniques, timing closure, and physical design challenges.

#### Manual clock tree implementation consumes a significant amount of time

Building a clock tree manually is time-consuming, especially when it is crafted from scratch in detail. From planning the structure to placing and connecting its components, it requires precision and patience.

## Using H-tree-only (Non-Mesh) Regular MSCTS




#### This approach can improve the Quality of Results (QoR) in CTS

With this feature, users can build multiple global trees in different parts of the floorplan and achieve better clock QoR, especially latency and skew. FC generates the initial clock tree structures based on the floorplan constraints, which helps achieve timing closure and optimize the clock distribution network.

#### This feature saves engineering time and human resources

This feature saves engineers time and effort and helps ensure optimal performance of the clock distribution network.

Its design simplicity and potential for automation also reduce the engineering resources required.

#### This technique can be automated to build clocks efficiently

Automatic insertion of tap drivers in H-tree-only Regular MSCTS streamlines clock tree synthesis by balancing timing, power, and area while adhering to design constraints.



## Algorithm of htree\_sections in Fusion Compiler U-2022

## Introduction to Multisource Clock Tree Synthesis (MSCTS)

Multisource CTS (Multisource Clock Tree Synthesis) is an alternative approach to clock distribution. Traditionally, clock distribution in integrated circuits has relied on a single source that propagates the clock signal throughout the chip. However, as chip designs become more complex and demand higher performance, conventional clock distribution methods run into problems with clock skew, jitter, and power consumption.

Multisource CTS addresses these challenges by introducing multiple sources for clock distribution strategically placed across the chip. By distributing the clock signal from multiple points, Multisource CTS mitigates issues related to skew and jitter, resulting in improved timing, reduced power consumption, and enhanced performance scalability.
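As a rough illustration of why several sources help, the sketch below compares the worst-case distance from any sink to a single central source with the distance to the nearest of four sources. The sink grid, the source positions, and the use of Manhattan distance as a stand-in for clock latency are all assumptions made for this example; none of them come from the design discussed in this presentation.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def worst_case_distance(sinks, sources):
    """Largest distance from any sink to its nearest source."""
    return max(min(manhattan(sink, src) for src in sources) for sink in sinks)


# Hypothetical sinks on a regular grid covering a 100 x 100 die.
sinks = [(x, y) for x in range(0, 101, 10) for y in range(0, 101, 10)]

# One central source versus four sources, one per quadrant (assumed positions).
single_source = [(50, 50)]
multi_source = [(25, 25), (25, 75), (75, 25), (75, 75)]

single_worst = worst_case_distance(sinks, single_source)
multi_worst = worst_case_distance(sinks, multi_source)
print(f"single source: {single_worst}, four sources: {multi_worst}")
```

In this toy setup the worst-case distance is halved, which is the intuition behind the lower latency and skew that multisource distribution targets.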





### Flow of H-tree-only (Non-Mesh) **Regular MSCTS**

#### **Tap Drivers**

These are dedicated clock buffers placed at the end points of each H-tree, between the global tree and the local subtrees. Their primary function is to isolate the H-tree from variations in the subtree loads they drive.

#### Build global clock tree (H-tree)

Once the tap drivers are inserted, the next step is to build and route the H-tree from the clock root to the tap drivers.

#### Perform tap assignment

Assign each clock sink to one of the inserted tap drivers. The tap assignment algorithm aims to optimize clock skew, minimize insertion delay, and balance the load across the clock network.
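The idea of tap assignment can be sketched as follows. This is a minimal nearest-tap heuristic using Manhattan distance, not the algorithm Fusion Compiler actually runs; the tap names, sink names, and coordinates are hypothetical, and a real tool would also balance load and account for skew and insertion delay.

```python
from collections import defaultdict


def assign_sinks_to_taps(sinks, taps):
    """Assign each sink to its nearest tap driver (Manhattan distance).

    sinks and taps map a name to an (x, y) location.
    Returns a dict mapping each tap name to the list of sinks it drives.
    """
    assignment = defaultdict(list)
    for sink_name, (sx, sy) in sinks.items():
        nearest = min(
            taps,
            key=lambda tap: abs(taps[tap][0] - sx) + abs(taps[tap][1] - sy),
        )
        assignment[nearest].append(sink_name)
    return dict(assignment)


# Hypothetical tap driver and flip-flop locations.
taps = {"tap_A": (25.0, 25.0), "tap_B": (75.0, 75.0)}
sinks = {
    "ff1": (10.0, 20.0),
    "ff2": (30.0, 40.0),
    "ff3": (80.0, 70.0),
    "ff4": (60.0, 90.0),
}

assignment = assign_sinks_to_taps(sinks, taps)
for tap_name in sorted(assignment):
    print(tap_name, assignment[tap_name])
```

A pure nearest-neighbor rule can overload one tap when sinks cluster, which is why load balancing is listed among the goals of the real assignment step.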



Figure: clock root, global H-tree, tap drivers, and subtrees.



## Algorithm of htree\_sections in FC version U-2022



#### set\_regular\_multisource\_clock\_tree\_options -htree\_sections

Users can provide either tap configurations together with a section boundary, or tap locations directly. The H-tree structure is divided into sections, for example to optimize the clock tree for specific regions of the chip or to address routing constraints more effectively.

#### synthesize\_regular\_multisource\_clock\_trees

- Derives the boundary based on the sink distribution and inserts the tap drivers based on the inputs
- Explores different tap driver locations for H-tree compatibility and inserts them
- Builds a symmetric H-tree
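To show what "symmetric" means for an H-tree, the sketch below generates the leaf points of an idealized H-tree by recursive quadrant splitting and records the wire length from the root to each leaf. It is a geometric illustration under assumed dimensions, not a model of how Fusion Compiler places or routes the tree; the function name and the 16 x 16 region are hypothetical.

```python
def htree_leaves(cx, cy, half_w, half_h, levels, length=0.0):
    """Return (x, y, wire_length) for every leaf of an idealized H-tree.

    The tree is centered at (cx, cy) and spans +/-half_w by +/-half_h.
    Each level adds one "H": a horizontal run of half_w / 2 followed by
    a vertical run of half_h / 2 toward each of the four quadrants.
    """
    if levels == 0:
        return [(cx, cy, length)]
    step_x, step_y = half_w / 2, half_h / 2
    leaves = []
    for dx in (-1, 1):
        for dy in (-1, 1):
            leaves.extend(
                htree_leaves(
                    cx + dx * step_x,
                    cy + dy * step_y,
                    step_x,
                    step_y,
                    levels - 1,
                    length + step_x + step_y,
                )
            )
    return leaves


# Two-level H-tree over an assumed 16 x 16 region centered at the origin.
leaves = htree_leaves(0.0, 0.0, 8.0, 8.0, 2)
lengths = {leaf[2] for leaf in leaves}
print(len(leaves), "leaves, root-to-leaf lengths:", lengths)
```

Every leaf ends up at the same wire distance from the root, which is the property that keeps skew low between the tap drivers at the H-tree end points.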







#### Compare QoR **CTS** result: Normal CTS vs. H-tree-only RMSCTS using **htree\_sections**

| | Hold TNS (ns) | Setup TNS (ns) | Runtime (h) | Total Power (W) |
|---|---|---|---|---|
| Normal CTS | 8.1 | 101 | 5.825 | 22.4 |
| RMSCTS | 5.1 | 74 | 5.81 | 23.2 |
| %diff | -37% | -27% | 0% | 4% |

**Comment**: Setup and hold timing improve significantly:

- Setup TNS: -27%
- Hold TNS: -37%
- Skew and latency are reduced by about 6%.
- Total power is nearly the same (+4%).
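The percentage differences can be reproduced from the reported Normal CTS and RMSCTS values with a short calculation. The sketch below uses the CTS-stage figures from this slide; the helper name `pct_diff` is introduced here for illustration only.

```python
def pct_diff(baseline, new):
    """Relative change of `new` against `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# (Normal CTS, RMSCTS) values at the CTS stage, as reported on this slide.
metrics = {
    "Hold TNS (ns)": (8.1, 5.1),
    "Setup TNS (ns)": (101.0, 74.0),
    "Runtime (h)": (5.825, 5.81),
    "Total Power (W)": (22.4, 23.2),
}

# Round to whole percent, matching the precision used on the slide.
diffs = {name: round(pct_diff(base, new)) for name, (base, new) in metrics.items()}
for name, diff in diffs.items():
    print(f"{name}: {diff:+d}%")
```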







#### Compare QoR **ROUTE\_OPT** result: Normal CTS vs. H-tree-only RMSCTS using **htree\_sections**





- Setup: -22%
- Hold: -98%
- Skew and latency are reduced by about 1%.
- Runtime is reduced by 3% after PnR.
- Total power is the same.





## Define regular MSCTS settings

set net CKnet\_name ; **# net on which to build the H-tree-only Regular MSCTS**

set clk\_name clka ; **# main clock for which to build the H-tree-only Regular MSCTS**

#### **#Define regular MSCTS settings**

set\_regular\_multisource\_clock\_tree\_options \
 -clock \$clk\_name \
 -topology htree\_only \
 -prefix MSCTS \
 -net [get\_nets \$net] \
 -tap\_lib\_cells \$CKbuf \
 -htree\_routing\_rule \$htree\_ndr \
 -htree\_lib\_cells \$CKbuf \
 -htree\_layers "M12 M13" \
 -htree\_sections [list \
 [list -section\_name LuanP\_A -prefix MSCTS\_htree -tap\_locations {{525.1 552.5}}] \
 [list -section\_name LuanP\_B -prefix MSCTS\_htree -tap\_locations {{511.2 964.5}}] \
 [list -section\_name LuanP\_C -prefix MSCTS\_htree -tap\_locations {{905.3 1035.5}}] \
 [list -section\_name LuanP\_D -prefix MSCTS\_htree -tap\_locations {{905.3 1035.5}}] \
 ... ]

report\_regular\_multisource\_clock\_tree\_options







The FC tool automatically inserts the tap drivers, builds the H-tree, and performs tap assignment.

#### #Insert tap drivers, build H-tree, perform tap assignment

synthesize\_regular\_multisource\_clock\_trees -from tap\_synthesis -to htree\_synthesis

**# Highlight sink distribution from tap assignment**

source highlight\_multisource\_clock\_subtrees.tcl

highlight\_multisource\_clock\_subtrees -clock clka







## Conclusions



- Overall, the new **-htree\_sections** feature of RMSCTS, available from version U-2022 of Fusion Compiler, is very useful for us.
- Advantages:
  - QoR improved:
    - Latency and skew at the CTS step improved by 6%.
    - Timing violations reduced by 37% (hold TNS at CTS).
    - Runtime reduced by 3% after PnR.
    - Power is almost the same.
  - The feature is most effective for complex floorplans because the sinks are divided into sections.
  - Building a clock tree manually requires an experienced engineer and takes time. With the new technique, the engineer only needs to provide a set of three commands, and FC handles the rest.
- Disadvantage:
  - Several trials are needed to define a good section boundary. To help with this, we share one lesson learned:
    - Each branch of the H-tree drives a specific number of endpoints, so the boundary of the corresponding section must cover the whole area of its target endpoints (FFs).
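One way to shorten the trial-and-error loop is to check a candidate section boundary against the target endpoint locations before running synthesis. The sketch below is a standalone helper under assumed data: the flip-flop names, coordinates, and rectangle format `(llx, lly, urx, ury)` are hypothetical and are not read from the tool.

```python
def bounding_box(points):
    """Smallest axis-aligned rectangle (llx, lly, urx, ury) covering all points."""
    points = list(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def uncovered_sinks(boundary, sinks):
    """Names of the sinks that fall outside the section boundary."""
    llx, lly, urx, ury = boundary
    return [
        name
        for name, (x, y) in sinks.items()
        if not (llx <= x <= urx and lly <= y <= ury)
    ]


# Hypothetical target endpoints (FFs) for one section, in microns.
sinks = {
    "ff_a": (510.0, 530.0),
    "ff_b": (540.0, 580.0),
    "ff_c": (575.0, 560.0),
}

# A first-guess boundary that misses one endpoint.
trial_boundary = (500.0, 500.0, 560.0, 600.0)
missed = uncovered_sinks(trial_boundary, sinks)
print("missed by trial boundary:", missed)

# A boundary derived from the endpoints themselves covers all of them.
fitted_boundary = bounding_box(sinks.values())
print("fitted boundary:", fitted_boundary)
```

Starting from the fitted bounding box and then adjusting it for notches or blockages may take fewer iterations than guessing the boundary by eye.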

## Reference

[1] IC\_Compiler\_II\_MCTS\_overview\_2018.06

[2] Fusion Compiler Tool Commands [U-2022]





## THANK YOU
