You are optimizing the power, performance, and area (PPA) of the HighTide2 design at designs/$0. The goals are:

- Maximize cell utilization and clock frequency.
- Maintain a clean flow (no DRC violations).
Important: HighTide is a benchmark suite — the RTL is a fixed input. Never suggest modifying the upstream Verilog/RTL. All optimizations must be scoped to flow parameters (config.mk), timing constraints (constraint.sdc), physical design files (io.tcl, pdn.tcl), and FakeRAM configuration.
Key lesson: utilization and clock period are coupled. Tighter clock constraints cause synthesis and repair_timing to insert more buffers, increasing the effective cell area. A design that fits at 80% utilization with a relaxed clock may overflow at 80% with an aggressive clock. Optimize utilization first at the current clock, then tighten the clock and re-check utilization.
First, gather the current metrics from the most recent build.
Find the metrics report (check bazel-bin/designs/$0/ then artifacts/$0/):
cat bazel-bin/designs/$0/logs/*/base/6_report.json 2>/dev/null
cat artifacts/$0/logs/*/base/6_report.json 2>/dev/null
If no artifacts exist locally, fetch from the Nautilus PVC:
./tools/fetch_artifacts.sh --keep <platform> <design>
If no build exists at all, run the flow:
bazel build //designs/<platform>/<design>:<design>_final
Record baseline metrics:
- report_clock_min_period — the true minimum achievable clock period (more reliable than computing from WNS)

Read the design configuration:

- designs/<platform>/<design>/config.mk
- designs/<platform>/<design>/constraint.sdc
- designs/<platform>/<design>/BUILD.bazel (if it exists)
- designs/<platform>/<design>/pdn.tcl (if it exists)
- designs/<platform>/<design>/io.tcl (if it exists)

Artifact path convention: Throughout this skill, paths like logs/*/base/ refer to whichever artifact location has the files. Check bazel-bin/designs/$0/ first, then artifacts/$0/.
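The baseline numbers can be pulled out of 6_report.json with plain grep. A minimal sketch, assuming the OpenROAD flow's stage__category__name metric convention; key names other than finish__power__total (which this skill references later) are assumptions to verify against your actual report:

```shell
# Sketch: extract headline metrics from whichever 6_report.json exists.
# Key names other than finish__power__total are assumptions; verify against your JSON.
report_metric() {  # usage: report_metric <json-file> <metric-key>
  grep -o "\"$2\"[^,}]*" "$1"
}
# my_design is a placeholder for your bazel-bin/designs/$0 or artifacts/$0 path
report=$(ls bazel-bin/designs/my_design/logs/*/base/6_report.json 2>/dev/null | head -1)
if [ -n "$report" ]; then
  for key in finish__power__total finish__timing__setup__ws finish__design__instance__utilization; do
    report_metric "$report" "$key"
  done
fi
```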
The goal is to increase CORE_UTILIZATION as high as possible while maintaining a routable design with zero DRC violations.
Compare the target utilization (from config.mk) to the achieved utilization (from the metrics report). If there is a large gap, the die is oversized.
Also check the placement density — if PLACE_DENSITY is much lower than the target utilization, placement may be too spread out.
Raise CORE_UTILIZATION in steps (e.g., +5% at a time). For each step:
- Raise CORE_UTILIZATION in BUILD.bazel (or config.mk)
- Set PLACE_DENSITY to match — it should be slightly above the utilization fraction (e.g., if utilization is 60%, density ~0.65-0.70)
- Rebuild: bazel build //designs/<platform>/<design>:<design>_final

As utilization increases, congestion will become the bottleneck. See .claude/skills/shared/congestion-analysis.md for the fix priority and diagnostic approach.
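The stepping loop can be sketched as a small shell driver. The sed and bazel lines are placeholders to adapt to your design path; the +0.05 density offset encodes the rule of thumb above:

```shell
# Sketch: step utilization in +5% increments; edit/build commands are placeholders.
place_density() { awk -v u="$1" 'BEGIN { printf "%.2f", u / 100 + 0.05 }'; }  # slightly above util fraction
util=55
while [ "$util" -le 70 ]; do
  echo "trying CORE_UTILIZATION=$util PLACE_DENSITY=$(place_density "$util")"
  # sed -i "s/CORE_UTILIZATION=.*/CORE_UTILIZATION=$util/" designs/<platform>/<design>/config.mk
  # bazel build //designs/<platform>/<design>:<design>_final || break
  util=$((util + 5))
done
```

After each step, check the congestion report before committing to the next increment.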
After each build, check the global routing congestion report — this is the most reliable numeric indicator of whether utilization can be pushed further:
grep -A 15 "Final congestion report" logs/*/base/5_1_grt.log 2>/dev/null
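To turn that report into numbers you can track across builds, a hedged sketch; the exact "overflow" wording varies between OpenROAD versions, so treat the pattern as an assumption:

```shell
# Sketch: pull overflow lines from a GRT log; the pattern is an assumption, adjust as needed.
grt_overflow() {
  grep -A 15 "Final congestion report" "$1" 2>/dev/null | grep -i "overflow"
}
# hypothetical path; substitute your logs/*/base/ location
grt_overflow logs/route/base/5_1_grt.log || echo "no GRT log found"
```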
Consult .claude/skills/shared/congestion-analysis.md before increasing utilization further.

A significant increase in flow runtime compared to the baseline is a strong indicator that the design is over-constrained — the tools are spending excessive time on repair_timing iterations, detailed routing retries, or placement optimization that fails to converge.
How to detect runtime blowup: compare the wall-clock time of the current run against the baseline runtime you recorded; a sustained 2-3x increase signals an over-constrained run.
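One simple detection sketch, assuming you record the baseline runtime in seconds (e.g., from an earlier comparison table); the 2x threshold mirrors the guidance later in this skill:

```shell
# Sketch: flag a runtime blowup relative to a recorded baseline (both in seconds).
is_blowup() { [ "$1" -gt $(( $2 * 2 )) ]; }   # args: elapsed, baseline; 2x threshold
BASELINE=255
start=$(date +%s)
# bazel build //designs/<platform>/<design>:<design>_final
elapsed=$(( $(date +%s) - start ))
if is_blowup "$elapsed" "$BASELINE"; then
  echo "runtime blowup: ${elapsed}s vs baseline ${BASELINE}s - likely over-constrained"
fi
```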
What to do when runtime blows up: kill the run, revert the most recent constraint change (clock period or utilization), and rebuild from the last known-good configuration.
When congestion blocks further utilization increases, generate heatmaps to identify specific problem areas. See .claude/skills/shared/image-generation.md for Tcl scripts and Docker commands. The most useful heatmaps:
The goal is to find the highest Fmax by tightening the clock period until timing violations appear, then backing off slightly.
Read the timing report and extract report_clock_min_period:
grep "period_min\|fmax" reports/*/base/6_finish.rpt 2>/dev/null
This gives the true minimum period directly — use it as the starting point for clock tightening instead of computing from WNS.
Also determine where timing margin exists:
Clock tree insertion delay can cause misleading timing results when IO constraints assume ideal clocks.
Find clock insertion delay from the CTS log or finish report (report_clock_skew section)
Compare to IO delay budget: IO delays are typically clk_period * clk_io_pct. If clock insertion delay is a significant fraction of this budget, IO paths have unrealistic constraints that will limit apparent Fmax.
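The budget check is simple arithmetic; a sketch with illustrative numbers (a 1.0 ns clock, clk_io_pct of 0.2, and a 0.15 ns insertion delay are made-up example values, not from any real report):

```shell
# Sketch: compare clock insertion delay against the IO delay budget.
io_budget() { awk -v p="$1" -v pct="$2" 'BEGIN { printf "%.3f", p * pct }'; }
budget=$(io_budget 1.0 0.2)    # clk_period=1.0 ns, clk_io_pct=0.2 -> 0.200 ns budget
insertion=0.15                 # clock insertion delay from report_clock_skew (example value)
awk -v b="$budget" -v i="$insertion" \
  'BEGIN { if (i > 0.5 * b) print "insertion delay dominates IO budget - constraints likely unrealistic" }'
```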
Separate IO timing from core timing: For benchmarking, the core register-to-register Fmax is what matters. If IO paths are the bottleneck:
- Adjust clk_io_pct (e.g., 0.3–0.8) to give IO paths more slack
- Use set_clock_uncertainty to model expected skew

Use report_clock_min_period from the finish report as the starting target, then binary search:
- Edit clk_period in constraint.sdc
- Step clk_period toward the period_min value (minus a small margin)

Set TNS_END_PERCENT = 100 in config.mk to ensure the flow tries hard to close timing.
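The binary search can be sketched as follows. timing_ok here is a stub standing in for "edit constraint.sdc, rebuild, and check that WNS >= 0 in the finish report", and the 0.80 ns "true minimum" is a pretend value for illustration only:

```shell
# Sketch: binary search for the minimum passing clock period.
# timing_ok is a stub; the real check edits constraint.sdc, rebuilds,
# and verifies WNS >= 0 in the finish report.
ACHIEVABLE=0.80                            # pretend true minimum period (ns), for illustration
timing_ok() { awk -v p="$1" -v a="$ACHIEVABLE" 'BEGIN { exit !(p >= a) }'; }
lo=0.50; hi=1.20                           # lo: optimistic bound, hi: known-passing period
for _ in 1 2 3 4 5 6; do
  mid=$(awk -v l="$lo" -v h="$hi" 'BEGIN { printf "%.3f", (l + h) / 2 }')
  if timing_ok "$mid"; then hi=$mid; else lo=$mid; fi
done
echo "achievable clock period ~ $hi ns"
```

Because hi is only ever lowered to a passing period, the loop converges from above onto the true minimum.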
Remember the util/clock coupling: after tightening the clock significantly, re-check that the design still fits. CTS repair_timing inserts buffers that increase cell area. You may need to lower utilization when the clock gets aggressive.
As the clock period gets tighter, the CTS and routing stages will spend increasingly more time on repair_timing iterations. If a run is taking significantly longer than the baseline (2-3x+), the clock target is likely too aggressive — kill it, back off to the previous period, and declare that as the achievable Fmax.
Power is generally a secondary concern for benchmarking, but some quick wins:
Check for IR drop issues — Generate an IR drop heatmap (see .claude/skills/shared/image-generation.md). If there are hotspots, create or adjust pdn.tcl to add power stripes (see designs/asap7/gemmini/pdn.tcl).
ABC area optimization (ABC_AREA = 1) reduces cell count, which also reduces dynamic power.
Review power report in the JSON metrics (finish__power__total). Power will naturally decrease as area decreases (higher utilization = smaller die = shorter wires = less capacitance).
For visual diagnosis, generate images using OpenROAD's save_image command.
See .claude/skills/shared/image-generation.md for the full Docker/Xvfb setup, Tcl scripts, and heatmap variants (routing congestion, placement density, RUDY, IR drop).
After each round of changes, re-run the flow and compare metrics:
bazel build //designs/<platform>/<design>:<design>_final
Present results as a comparison table:
| Metric | Baseline | Current | Change |
|--------------|----------|----------|---------|
| Utilization | 35.0% | 55.0% | +20% |
| Die Area | 12500 | 8200 | -34% |
| Fmax (GHz) | 1.25 | 1.42 | +14% |
| WNS (ns) | 0.05 | 0.01 | -0.04 |
| Power (mW) | 45.2 | 38.7 | -14% |
| DRC errors | 0 | 0 | clean |
| GRT overflow | 0 | 0 | clean |
| Runtime (s) | 255 | 260 | +2% |
Continue iterating until either: