

# Application-level Hybrid Emulation for Software-Defined-Systems

Tim Kogel, Sr. Director, Technical Product Management

Malte Dörper, Principal, Product Management

Leonard Drucker, Scientist, Engineering

Synopsys, Inc.



#### SoC Design and Verification Flow

[Figure: shift-left flow with two tracks, from pre-RTL to post-silicon]

- **Performance & Power**: Architecture Analysis, Optimization & Verification
  - Platforms: Platform Architect, ZeBu, HAPS, Post-Silicon
  - Stages: Architecture Analysis & Optimization, SoC PPA Verification, System PPA Validation, PPA Sign-Off
- **Software Enablement**: Bring-up, Validation & Optimization
  - Platforms: Virtualizer (Virtual Host), ZeBu, HAPS, Post-Silicon
  - Stages: Pre-RTL SW Development, HW/SW Verification, Validation & Tuning, SW Sign-Off





### Synopsys HW-Assisted Verification Use Cases






### Hybrid Emulation and Prototyping Use-Cases



#### ZeBu – Virtualizer Hybrid

- Early Driver/Firmware/Application development
  - Complete SoC model using VDK and synthesizable RTL IPs
  - Power and performance validation over billions of application cycles
- Software-driven System Validation with Real-world IO
  - Early Driver/Firmware/Application development
  - Modular and scalable validation from IP to system level
- Architecture and Performance Analysis
  - Efficient architecture analysis and smart performance monitoring
  - Offline and online analysis of performance data collected on ZeBu
  - Faster PPA optimization, helping to shift the verification cycle left



#### ZeBu – Platform Architect Hybrid




### Virtualizer Overview

### **VDK – Virtualizer Development Kit**

- Pre-RTL virtual model of hardware board
- Starting-point VDKs available for Armv8 and Armv9
- Fast and efficient debug and test

### Virtualizer Studio

- Efficient model and VDK creation
- Largest model library

[Screenshot: Virtualizer Studio with the ARMDemo2 VDK open (build: success), showing the design hierarchy (CPU_SS, Periphs, RAM, DRAM, VIRTIO, SYSRST, SYSCLK), the model library, and the memory map editor with address ranges for RAM, UART_0, FileIO, VIRTIO, GIC, and DRAM]



### Synopsys Hybrid Technologies

integrated with Virtual, Emulation & Prototyping tools

- Complete tool stack, integration between ZeBu, HAPS-100, Platform Architect & Virtualizer
- Advanced technologies
  - FastMem server, Checkpoint / Restore
- **High productivity** and **performance** for authoring & runtime
- Large set of pre-integrated models (Arm FastModels, ARC, Tensilica, CEVA & other 3rd party models)
- Integrated System Level Debug with Software and Hardware debuggers





## Fast Creation of Hybrid Emulation Setups

#### Authoring Automation



- Faster Hybrid setup
- Configure both hybrid adaptors & ZeBu testbenches
- Correct by construction

### **Accelerated Memory Execution**

Advanced memory sharing technology





- Accelerates sharing of hybrid memory regions
  - Memory segmented into pages, with page ownership tracking
  - Automatic coherency of shared memory regions
  - Efficient data movement
  - Speedup: ~5-10x
- Supports address interleaving & multi-bank
- Supports multiple use-cases
  - FastMem
  - Distributed FastMem
  - Extra large FastMemX
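
The page-ownership idea in the list above can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not the FastMem implementation: the names `SharedRegion`, `Side`, and `PAGE_SIZE`, the 4 KiB page size, and the single-owner policy are all hypothetical. It only shows why tracking ownership per page avoids a cross-link transfer on every access: a page moves once on an ownership miss, and later accesses from the same side are local.

```python
from enum import Enum

PAGE_SIZE = 4096  # hypothetical page size in bytes


class Side(Enum):
    VIRTUAL = "virtual"    # virtual prototype (host) side
    EMULATOR = "emulator"  # emulated RTL side


class SharedRegion:
    """Toy model of one hybrid memory region split into owned pages."""

    def __init__(self, size):
        num_pages = (size + PAGE_SIZE - 1) // PAGE_SIZE
        # Every page starts out owned by, and stored on, the virtual side.
        self.owner = {p: Side.VIRTUAL for p in range(num_pages)}
        self.store = {
            Side.VIRTUAL: {p: bytearray(PAGE_SIZE) for p in range(num_pages)},
            Side.EMULATOR: {},
        }
        self.transfers = 0  # pages moved between the two sides

    def _acquire(self, side, addr, length):
        page, offset = divmod(addr, PAGE_SIZE)
        if offset + length > PAGE_SIZE:
            raise ValueError("sketch does not handle page-crossing accesses")
        current = self.owner[page]
        if current is not side:
            # Ownership miss: move the page so only one valid copy exists.
            self.store[side][page] = self.store[current].pop(page)
            self.owner[page] = side
            self.transfers += 1
        return self.store[side][page], offset

    def write(self, side, addr, data):
        buf, offset = self._acquire(side, addr, len(data))
        buf[offset:offset + len(data)] = data

    def read(self, side, addr, length):
        buf, offset = self._acquire(side, addr, length)
        return bytes(buf[offset:offset + length])


region = SharedRegion(size=4 * PAGE_SIZE)
region.write(Side.VIRTUAL, 0x10, b"\xaa\xbb")  # local hit, no transfer
first = region.read(Side.EMULATOR, 0x10, 2)    # ownership miss, one transfer
second = region.read(Side.EMULATOR, 0x11, 1)   # local hit, no transfer
print(first.hex(), second.hex(), region.transfers)
```

In this toy model coherency follows from there being exactly one valid copy of each page. A real implementation would also have to deal with concurrent access, address interleaving, and multi-bank layouts, which the sketch leaves out.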

### Higher Hybrid Debug Productivity

#### ZeBu views in Virtualizer Studio



- RTL browser
  - Access to all DUT signals
- Runtime monitor
  - System Clock Status
  - XTORs Readiness Status
- Runtime control
  - Waveform dump (FWC, QiWC, Dynamic Probe)
  - Triggers
  - Forces


### Hybrid Emulation for faster SW Development

Presented by NXP and Synopsys at SNUG World 2021



- Bring up SW on application cores before RTL is finalized and stable
- Identify early SW bugs
- Identify early bugs in HW RTL by being able to execute more realistic tests
- Software engineers are very satisfied with the speed improvement




- Increasing demand for software bring-up on emulators for complex embedded multicore SoCs
- Hybrid emulation technology provides a shift-left approach and helps to reduce risk and cost
- Fast setup with large library of models and transactors in Virtualizer and ZeBu
- Hybrid FastMem technology enables high simulation speed for shared memory access

### Automotive Supply Chain Enablement

Presented by BOSCH at Synopsys VP-Day 2023





- Enabled testing of all critical IPs coming from Tier-1 and Tier-2
- Verify critical functionality of the HW, timing properties, and performance on IP and system level
- Ensuring correct system integration and sanity
- Avoided costly HW fixes and late SW taskforces
- Early understanding of the specification
- Flexible support for regression and interactive use globally


### State-Of-The-Art Hybrid

Key Use-cases deployed over 10+ Years

| Use-case                   | Scope        | ZeBu                          | VDK                      | Empower               |
|----------------------------|--------------|-------------------------------|--------------------------|-----------------------|
| SW Bring-up and Development | IP/Subsystem | IP RTL                        | Core + System Components |                       |
| System Bring-up            | SoC          | System Components RTL         | Core                     |                       |
| Performance Benchmarking   | IP/Subsystem | IP RTL + Core RTL             | Core + System Components |                       |
| Power Profiling            | IP/Subsystem | IP RTL + Power Intent (UPF)   | Core + System Components | Dynamic Power Profile |









## New Requirement: Pre-silicon Application Benchmarking



State-of-the-art Hybrid: OS & driver bring-up, some power monitors

New Hybrid Requirements:

- "I want to be able to run my application SW benchmarks"
- "I need pre-silicon analysis of SW-driven power & performance"



### Leverage Available Software before Silicon

Industry changes supporting Shift-Left

- Common use of open-source for OS stacks
  - Products use open-source operating systems and software stacks
  - Linux is prevalent; Android is prevalent in automotive, mobile, and consumer
  - Early access to SW stacks → earlier system validation
- Freely available application Software
  - Apps and benchmarks are established in many markets
  - Easily added to pre-silicon environments
- Multitude of product configurations
  - Products designed for reconfigurability to adjust to customer needs
  - Many SKUs defined by SW
  - Pre-silicon testing needs to include the ability to change configurations





AUTOSAR







#### Software Defined Systems → Hardware managed by Software






### Emerging "Application-Level" Hybrid Emulation

### The Transition to "Software Defined Systems"

And its impact on Hybrid Emulation



- The state-of-the-art hybrid (shown as **purple boxes**) needs to be enhanced with SW (shown as **yellow boxes**).
- For the hybrid to be useful for benchmarking software-defined systems, all the relevant SW needs to be available.
- The latest performance advances in virtual prototyping and emulation enable execution of application-level workloads.
- Observability capabilities in Virtual and Hybrid enable correlation of SW to power and performance bottlenecks.





### Achieve Speed and Insight



Application-Level Hybrid Emulation with Synopsys Virtualizer and ZeBu

|             | Speed                                                               | Insight                                                                                                  |
|-------------|---------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| Virtualizer | Execution speed: 100s to 1000s of MIPS                              | Capture function traces across end-user apps to identify SW functions with adverse impact on the system. |
| ZeBu        | Execution speed: 3 MHz to 10s of MHz, from emulation to prototyping | Capture power across billions of cycles to find "anomalies", e.g. high power with low performance.       |
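
The "anomaly" mentioned in the table (high power with low performance) can be made concrete with a small filter over per-window samples. The following is a minimal sketch, not a Synopsys tool flow: the function name `find_anomalies`, the tuple layout, the median-based thresholds, and all sample values are hypothetical and chosen only for illustration.

```python
from statistics import median


def find_anomalies(samples, power_factor=1.5, perf_factor=0.5):
    """Return the window indices that look like 'high power, low performance'.

    samples: list of (window, power, performance) tuples, one per
    observation window. A window is flagged when its power is above
    power_factor * median power while its performance is below
    perf_factor * median performance. Both factors are arbitrary
    illustration values.
    """
    power_median = median(power for _, power, _ in samples)
    perf_median = median(perf for _, _, perf in samples)
    return [
        window
        for window, power, perf in samples
        if power > power_factor * power_median and perf < perf_factor * perf_median
    ]


# Hypothetical samples: (window index, power in mW, frames per second).
samples = [
    (0, 100, 60),
    (1, 110, 58),
    (2, 105, 61),
    (3, 240, 20),  # power spike while performance drops
    (4, 95, 59),
    (5, 102, 60),
]
anomalies = find_anomalies(samples)
print(anomalies)  # → [3]
```

The flagged windows are only starting points: correlating them with the function traces captured on the virtual side is what ties an anomaly back to the software that caused it.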











[Figure: application-level hybrid setup; recoverable labels: billions-of-cycles power profile, DUT, ZeBu, DDR Controller]




### Using Hybrid Platform to study AI Architecture







- Passing the image to MobileNetV2 returns the maximum probability, 0.878826, at the 945<sup>th</sup> neuron
- The ImageNet label corresponding to the 945<sup>th</sup> neuron is "Bell Peppers"

root@genericarmv8:/mnt/dropbox# ./arm\_files/ExecuteNetwork -c CpuAcc -f armnn-binary -m /mnt/dropbox/MobileNetV2/MobileNet.a
armnn -d ./MobileNetV2/img.txt
Warning: DEPRECATED: The program option 'model-format' is deprecated and will be removed soon. The model-format is now autom
atically set.
Info: ArmNN v33.0.0
Couldn't find any of the following OpenCL library: libOpenCL.so libGLES\_mali.so libmali.so
Info: Initialization time: 46.47 ms.
Info: Optimization time: 653.14 ms
===== Network Info =====
Inputs in order:
Inputs in order:
InputLayer, [1,3,224,224], Float32
OutPutLayer, [1,1000], Float32

0.000005 0.000041 0.000039 0.000099 0.000027 0.000023 0.000007 0.000008 0.000011 0.000029 0.000006 0.000008 0.000002 0.000 05 0.000071 0.000701 0.000105 0.000022 0.000041 0.000020 0.000003 0.000027 0.000102 0.000125 0.000051 0.000029 0.000102 0.0 0068 0.000250 0.000059 0.000657 0.001603 0.000160 0.000516 0.000584 0.041188 0.000147 0.878826 0.000010 0.000020 0.010310 0 000077 0.001368 0.002429 0.000028 0.000164 0.002593 0.000043 0.000107 0.000090 0.000029 0.000027 0.000002 0.000012 0.000336 0.000074 0.000099 0.000206 0.000023 0.000007 0.000019 0.000026 0.000019 0.000004 0.000028 0.000022 0.000013 0.000013 0.0000 5 0.000007 0.000005 0.000014 0.000013 0.000016 0.000017 0.000009 0.000009 0.000028 0.000168 0.002334 0.000034 0.000391 0.00 011 0.000005 0.000029 0.000014 0.000010 0.000017 0.000021 0.000006 0.002488 0.000003

Info: Inference time: 5932.09 ms
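
Reading the classification result out of such an output vector is a top-1 (argmax) selection. A minimal sketch follows; the helper name `top1` is hypothetical, and the input is only a short run of consecutive values copied from the log above, so the printed index is the position within that excerpt, not the neuron index in the full 1000-entry output.

```python
def top1(probabilities):
    """Return (zero-based index, value) of the largest probability."""
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return index, probabilities[index]


# Seven consecutive values from the log excerpt, including the maximum.
excerpt = [0.000584, 0.041188, 0.000147, 0.878826, 0.000010, 0.000020, 0.010310]
index, probability = top1(excerpt)
print(index, probability)  # → 3 0.878826
```

Applied to the full output vector, the returned index is what gets looked up in the ImageNet label list. Whether that list is indexed from zero or one depends on the label file used, which the slide does not state.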

### **GPU Benchmark Case Study**

Example: GFXBench Aztec Ruins

- Hybrid execution performance requirements
  - 1 minute of real time execution
  - Relevant observation interval starts 20 seconds into the benchmark
- Observations
  - High-power sometimes not correlated with high performance
  - High-performance sometimes not correlated with high power
- Root-cause examples
  - Power bugs, where certain GPU domains do not get disabled
  - Power optimizations causing high performance under lower power



#### SNUG EUROPE 2024

### Shift Focus to Accelerate System Success

- Hybrid well established for SW bring-up
- State-of-the-art Hybrid Emulation
  - A faster platform for HW prototyping
- Focus shifting to "Software-Defined Systems"
  - System = software + hardware
  - Bring-up and optimization of user applications






