Profile GPU kernels using rocprofv3 to collect ATT instruction-level traces, then analyze the trace data using hotspot_analyzer.py to identify the top-K stall hotspots (VMEM-load, VMEM-wait, LDS/SMEM-wait, barrier, and MFMA stalls) mapped back to source lines, and produce an actionable optimization plan.

Usage: /kernel-trace-analysis <cmd>

An existing dispatch directory can also be analyzed directly: /kernel-trace-analysis --dir <path>
Profile and analyze GPU kernel ATT traces to identify stall hotspots and produce an optimization plan.
| Argument | Description |
|---|---|
| `<CMD>` | Command to profile. Example: `python bench_pa.py --batch 32` |
| `--dir <path>` | Skip collection; analyze an existing `ui_output_agent_*_dispatch_*` directory |
| `--topk N` | Show top-N hotspots (default: 15) |
The hotspot analyzer is located at scripts/hotspot_analyzer.py.
It reads a ui_output_agent_*_dispatch_* directory and reports top-K stall hotspots.
If the user provides --dir <path> or already has a ui_output_agent_*_dispatch_* directory:
# Copy scripts/hotspot_analyzer.py to /tmp, then:
python /tmp/hotspot_analyzer.py <dispatch_dir> --topk 15 --mode both
python /tmp/hotspot_analyzer.py <dispatch_dir> --topk 5 --mode src --detail --context 4
Skip to Step 4: Interpret Results.
touch /tmp/trace_ts
rocprofv3 --stats --kernel-trace -f csv -- <CMD> 2>&1
find . -maxdepth 3 -name "*stats*" -newer /tmp/trace_ts -type f 2>/dev/null
Parse the stats CSV and present a kernel table:
| Rank | Kernel Name | Calls | Total (us) | Avg (us) | % GPU Time |
|---|---|---|---|---|---|
Ask the user which kernel to trace if not obvious.
Prefer results.db if available — use sqlite3 for structured queries:
sqlite3 results.db "
SELECT ks.KernelName, COUNT(*) calls,
ROUND(AVG(kd.end-kd.start)/1000.0,1) avg_us
FROM rocpd_kernel_dispatch kd
JOIN rocpd_info_kernel_symbol ks ON kd.kernel_symbol_id=ks.id
GROUP BY ks.KernelName ORDER BY avg_us DESC LIMIT 20;"
cp ~/Documents/input.yaml /tmp/trace_input.yaml
Edit /tmp/trace_input.yaml: