Use this skill when diagnosing what a trading strategy is good at, where it loses money, which regime it fits, why sibling variants differ, or what targeted improvement to test next. Use it for general strategy review requests that need regime slicing, post-decision drift, trace comparison, behavior analysis, or head-to-head analysis of completed backtest artifacts under artifacts/runs/.
Use this skill to diagnose strategy behavior from completed backtest artifacts in this repo.
Preferred inputs:
artifacts/runs/
Preferred artifact sources:
artifacts/runs/<run_id>/report/report.json
artifacts/runs/<run_id>/report/summary.json
artifacts/runs/<run_id>/logs/trace.jsonl
artifacts/runs/<run_id>/logs/default.jsonl
Follow this order. Do not jump to strategy changes before the diagnosis steps are done.
Answer these independently:
If a user asks a vague question like “is this strategy good,” translate it into those four questions first.
Use completed full runs when possible. Slice report.json and trace.jsonl into windows/regimes instead of launching many narrow backtests.
Reason:
Use scripts/slice_backtest_windows.py for calendar slicing.
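To illustrate what calendar slicing of a single completed run looks like, here is a minimal sketch. It is not the implementation of scripts/slice_backtest_windows.py; the record fields `ts` and `step_return` are hypothetical and may not match the actual layout of report.json.

```python
from collections import defaultdict
from datetime import datetime
from math import prod


def slice_by_month(steps):
    """Group per-step returns by calendar month and compound them per window."""
    windows = defaultdict(list)
    for step in steps:
        ts = datetime.fromisoformat(step["ts"])  # hypothetical field name
        windows[ts.strftime("%Y-%m")].append(step["step_return"])  # hypothetical field name
    return {
        month: {
            "n_steps": len(rets),
            "compound_return": prod(1.0 + r for r in rets) - 1.0,
        }
        for month, rets in sorted(windows.items())
    }


# Synthetic records, for illustration only.
steps = [
    {"ts": "2024-01-30T00:00:00", "step_return": 0.01},
    {"ts": "2024-01-31T00:00:00", "step_return": -0.02},
    {"ts": "2024-02-01T00:00:00", "step_return": 0.03},
]
slices = slice_by_month(steps)
```

One full run yields every monthly window at once, which is the point of slicing instead of launching a separate narrow backtest per window.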
At minimum classify:
Use scripts/classify_regimes.py.
Read references/regime_definitions.md before changing thresholds or definitions.
If the strategy does not expose ADX/DMI-style features, adapt the regime classifier inputs instead of forcing the current defaults.
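As one possible way to adapt the classifier inputs when no ADX/DMI-style feature is available, the sketch below labels bars from close prices alone using Kaufman's efficiency ratio. This is a swapped-in technique, not what scripts/classify_regimes.py does; the regime names, the window, and the threshold are all hypothetical and should be reconciled with references/regime_definitions.md.

```python
def efficiency_ratio(closes, window):
    """Kaufman efficiency ratio: |net move| / sum of |bar-to-bar moves| over the window."""
    out = [None] * len(closes)
    for i in range(window, len(closes)):
        net = abs(closes[i] - closes[i - window])
        path = sum(abs(closes[j] - closes[j - 1]) for j in range(i - window + 1, i + 1))
        out[i] = net / path if path > 0 else 0.0
    return out


def label_regimes(closes, window=4, trend_threshold=0.6):
    """Label each bar; names and threshold are illustrative assumptions."""
    labels = []
    for i, er in enumerate(efficiency_ratio(closes, window)):
        if er is None:
            labels.append("unknown")  # not enough history yet
        elif er >= trend_threshold:
            labels.append("trend_up" if closes[i] > closes[i - window] else "trend_down")
        else:
            labels.append("range")
    return labels


# Synthetic closes: a clean rise followed by chop.
closes = [100, 101, 102, 103, 104, 103, 104, 103, 104]
labels = label_regimes(closes)
```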
For each regime or slice, compare:
Do not treat raw sell fill count as strategy frequency. Prefer:
Use:
scripts/summarize_trade_behavior.py
scripts/slice_backtest_windows.py
scripts/build_regime_matrix.py
scripts/analyze_trade_clustering.py when cooldown-style churn is a live hypothesis
Read references/metric_definitions.md if the user asks why a metric was chosen.
For each weak regime, inspect:
Compare actual decision changes from trace logs, not just final backtest summaries.
Use scripts/compare_variants.py to locate timestamps where sibling variants diverge.
For important events, measure forward return after:
Default horizons:
1, 3, 6, 12, 24 bars
Use scripts/compute_post_decision_drift.py.
This is the main guardrail against telling superficial stories from aggregate PnL alone.
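The drift measurement can be sketched as follows, using the default horizons of 1, 3, 6, 12, and 24 bars. This is an illustration under assumptions, not the logic of scripts/compute_post_decision_drift.py: it assumes a plain list of close prices and events identified by bar index, and the function names are hypothetical.

```python
HORIZONS = (1, 3, 6, 12, 24)


def forward_returns(closes, event_index, horizons=HORIZONS):
    """Return {horizon: forward return} from the close at event_index; None if out of range."""
    base = closes[event_index]
    result = {}
    for h in horizons:
        j = event_index + h
        result[h] = (closes[j] / base - 1.0) if j < len(closes) else None
    return result


def mean_drift(closes, event_indices, horizons=HORIZONS):
    """Average forward return per horizon across events, skipping truncated horizons."""
    drift = {}
    for h in horizons:
        vals = [forward_returns(closes, i, (h,))[h] for i in event_indices]
        vals = [v for v in vals if v is not None]
        drift[h] = sum(vals) / len(vals) if vals else None
    return drift


closes = [100.0 + i for i in range(30)]  # synthetic, steadily rising series
drift = mean_drift(closes, event_indices=[0, 10])
```

Events too close to the end of the series contribute nothing to the longer horizons, so the sample size per horizon should be reported alongside the mean.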
If the likely issue is repeated re-entry into unresolved noise, run a clustering pass before proposing a cooldown.
At minimum compute for completed trades in the target regime:
Use scripts/analyze_trade_clustering.py.
Default:
cluster_window_bars=20
gap buckets: 0-5, 6-10, 11-20, 21+
If losses are concentrated in second/third-or-later re-entries or in short-gap buckets, say that explicitly. That is the evidence needed before recommending a cooldown as the first experiment.
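A clustering pass of this kind can be sketched as below. It is not the implementation of scripts/analyze_trade_clustering.py. It assumes a cluster window of 20 bars and gap buckets of 0-5, 6-10, 11-20, and 21+, and it assumes that the gap is measured from the previous trade's exit to the next entry; the trade fields `entry_bar`, `exit_bar`, and `pnl` are hypothetical.

```python
from collections import defaultdict


def gap_bucket(gap):
    """Map a gap in bars to one of the assumed default buckets."""
    if gap <= 5:
        return "0-5"
    if gap <= 10:
        return "6-10"
    if gap <= 20:
        return "11-20"
    return "21+"


def cluster_trades(trades, cluster_window_bars=20):
    """Annotate trades (sorted by entry) with re-entry rank and gap bucket."""
    annotated = []
    rank, prev_exit = 0, None
    for t in sorted(trades, key=lambda t: t["entry_bar"]):
        gap = None if prev_exit is None else t["entry_bar"] - prev_exit
        # A trade continues the cluster if it re-enters within the window.
        rank = rank + 1 if gap is not None and gap <= cluster_window_bars else 1
        bucket = None if gap is None else gap_bucket(gap)
        annotated.append({**t, "rank": rank, "gap_bucket": bucket})
        prev_exit = t["exit_bar"]
    return annotated


def pnl_by(annotated, key):
    """Sum PnL per value of the given annotation key."""
    totals = defaultdict(float)
    for t in annotated:
        totals[t[key]] += t["pnl"]
    return dict(totals)


# Synthetic trades, for illustration only.
trades = [
    {"entry_bar": 0, "exit_bar": 10, "pnl": 5.0},
    {"entry_bar": 13, "exit_bar": 20, "pnl": -2.0},
    {"entry_bar": 28, "exit_bar": 35, "pnl": -3.0},
    {"entry_bar": 100, "exit_bar": 110, "pnl": 4.0},
]
annotated = cluster_trades(trades)
by_rank = pnl_by(annotated, "rank")
by_gap = pnl_by(annotated, "gap_bucket")
```

In this synthetic data the first entries are profitable and every loss sits in a rank-2 or rank-3 re-entry, which is the pattern that would justify testing a cooldown first.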
If the user has sibling strategies, compare them directly:
Use scripts/compare_variants.py.
If two variants differ at only a handful of timestamps, say so explicitly. Do not overstate the significance of performance deltas when decision deltas are sparse.
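To show why the number of decision deltas matters, the sketch below diffs two decision traces by timestamp. It is only an illustration, not the behavior of scripts/compare_variants.py; it assumes each trace has already been reduced to a hypothetical timestamp-to-decision mapping, whereas the real fields are described in references/trace_fields.md.

```python
def decision_deltas(trace_a, trace_b):
    """Return timestamps where two variants' decisions differ, with both decisions."""
    deltas = []
    for ts in sorted(set(trace_a) | set(trace_b)):
        a, b = trace_a.get(ts), trace_b.get(ts)  # None if a variant has no record
        if a != b:
            deltas.append({"ts": ts, "a": a, "b": b})
    return deltas


# Synthetic traces, for illustration only.
trace_a = {"2024-01-01T00:00": "hold", "2024-01-01T01:00": "buy", "2024-01-01T02:00": "hold"}
trace_b = {"2024-01-01T00:00": "hold", "2024-01-01T01:00": "hold", "2024-01-01T02:00": "hold"}
deltas = decision_deltas(trace_a, trace_b)
divergence_rate = len(deltas) / len(set(trace_a) | set(trace_b))
```

Reporting the divergence count next to the performance delta makes it clear when a headline difference rests on only a few decisions.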
Final diagnosis should include a table with:
Use scripts/build_regime_matrix.py to join regime labels with realized step returns.
Use scripts/build_diagnosis_report.py to assemble a markdown summary after the intermediate CSV/JSON artifacts exist.
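Joining regime labels with realized step returns can be sketched as follows. This is a simplified illustration rather than the logic of scripts/build_regime_matrix.py; the timestamp-keyed inputs, the regime names, and the aggregate columns are assumptions.

```python
from collections import defaultdict


def build_regime_matrix(labels, step_returns):
    """Join labels and returns on timestamp, then aggregate per regime."""
    grouped = defaultdict(list)
    for ts, ret in step_returns.items():
        if ts in labels:  # inner join: drop steps without a regime label
            grouped[labels[ts]].append(ret)
    return {
        regime: {
            "n_steps": len(rets),
            "sum_return": sum(rets),
            "hit_rate": sum(r > 0 for r in rets) / len(rets),
        }
        for regime, rets in grouped.items()
    }


# Synthetic inputs; "t4" has no label and is dropped by the join.
labels = {"t1": "trend", "t2": "trend", "t3": "range"}
step_returns = {"t1": 0.02, "t2": -0.01, "t3": -0.005, "t4": 0.01}
matrix = build_regime_matrix(labels, step_returns)
```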
Map observed failure modes to targeted next experiments:
Prefer one or two concrete next experiments, not a large idea list.
When possible, produce:
diagnosis_summary.json
regime_slices.csv
regime_labels.csv
regime_matrix.csv
decision_drift.csv
variant_deltas.csv
trade_behavior.csv
trade_clustering.csv
trade_clustering_rank_summary.csv
trade_clustering_gap_summary.csv
trade_clustering_summary.json
trade_clustering_snippet.md
diagnosis_report.md
Place outputs under a strategy-local docs folder or a temporary analysis folder under artifacts/.
/Users/zhaoyub/Documents/Tradings/SoionLab
PYTHONPATH=src for Python commands
src/analyze/backtest/reporter.py
scripts/classify_regimes.py
scripts/slice_backtest_windows.py
scripts/compute_post_decision_drift.py
scripts/compare_variants.py
scripts/summarize_trade_behavior.py
scripts/build_regime_matrix.py
scripts/analyze_trade_clustering.py
scripts/build_diagnosis_report.py
Read only what you need:
references/regime_definitions.md
references/metric_definitions.md
references/trace_fields.md