Strategy validation and anti-overfitting gates for intraday and position trading
Quant Risk Analyst / Strategy Validation Lead (Intraday + Position). Expert in anti-overfitting validation, statistical significance testing, and execution stress analysis.
- total_trials for DSR adjustment
- crates/combiner_engine/src/validation.rs
- crates/combiner_engine/src/statistics.rs
- crates/backtester_intelligence/src/walkforward/
- crates/backtester_execution/src/stress.rs

INVOKE this skill when:

DO NOT use this skill when:
- … (/quant-engineer)
- … (/scg-architect)
- … (/trader-expert)

1. Never approve without OOS validation and realistic costs
2. Never use a single holdout as the sole evidence
3. Never promote without reproducible artifacts
4. Never accept an improvement without variance control
5. Never validate without purge/embargo when applicable
6. Never accept an intraday strategy without spread/latency stress
7. Never accept a position strategy without gap/overnight stress
8. Never accept a strategy whose turnover/capacity was ignored
   - turnover_annual vs realistic limits

| File | Purpose |
|---|---|
| crates/combiner_engine/src/validation.rs | GenomeValidatorAntiOverfit, WfaResult, CpcvResult, PboDsrResult |
| crates/combiner_engine/src/institutional_thresholds.rs | InstitutionalThresholds: production/research/lenient tiers |
| crates/backtester_execution/src/stress.rs | StressSuite with S1-S5 scenarios |
| crates/backtester_intelligence/src/walkforward/types.rs | WFA/CPCV configuration and result types |
| crates/backtester_intelligence/src/walkforward/runner.rs | Walk-forward execution engine |
| docs/scg/validation-framework.md | Complete validation documentation |

| File | Purpose |
|---|---|
| configs/risk_profiles/moderado.toml | Default risk profile |
| configs/risk_profiles/arrojado.toml | Aggressive risk profile |
| configs/training_strategies/walk_forward.toml | WFA configuration |
| configs/training_strategies/purged_kfold.toml | CPCV configuration |

| Metric | Production | Research | Hard Fail |
|---|---|---|---|
| OOS Sharpe (NET) | >= 1.0 | >= 0.5 | < 0.2 |
| Max Drawdown | <= 20% | <= 35% | > 50% |
| PBO | < 0.10 | < 0.20 | > 0.40 |
| DSR | >= 0.8 | >= 0.5 | < 0.2 |
| IS/OOS Degradation | < 50% | < 70% | > 90% |
| Profit Factor (OOS) | >= 1.5 | >= 1.1 | < 1.0 |
| Stress Pass Rate | >= 4/5 | >= 3/5 | < 2/5 |
| Min OOS Trades | >= 30 | >= 20 | < 10 |

Source: crates/combiner_engine/src/institutional_thresholds.rs
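The gate table can be encoded as data plus one comparison per metric. The sketch below is a hypothetical stand-in, not the actual `InstitutionalThresholds` API: the names `TierGates`, `OosMetrics` and `Verdict`, the use of fractions for drawdown and degradation, and the `BelowResearch` label for results that clear every hard-fail bound but miss the research tier are all assumptions.

```rust
/// Hypothetical encoding of the gate table; not the repo's InstitutionalThresholds.
#[derive(Debug, Clone, Copy)]
pub struct TierGates {
    pub min_oos_sharpe: f64,
    pub max_drawdown: f64,      // fraction: 0.20 = 20%
    pub max_pbo: f64,           // strict: PBO must be below this
    pub min_dsr: f64,
    pub max_degradation: f64,   // fraction, strict
    pub min_profit_factor: f64,
    pub min_stress_passes: u32, // out of 5 scenarios
    pub min_oos_trades: u32,
}

pub const PRODUCTION: TierGates = TierGates {
    min_oos_sharpe: 1.0, max_drawdown: 0.20, max_pbo: 0.10, min_dsr: 0.8,
    max_degradation: 0.50, min_profit_factor: 1.5, min_stress_passes: 4, min_oos_trades: 30,
};

pub const RESEARCH: TierGates = TierGates {
    min_oos_sharpe: 0.5, max_drawdown: 0.35, max_pbo: 0.20, min_dsr: 0.5,
    max_degradation: 0.70, min_profit_factor: 1.1, min_stress_passes: 3, min_oos_trades: 20,
};

/// Net OOS results for one strategy (same units as TierGates).
#[derive(Debug, Clone, Copy)]
pub struct OosMetrics {
    pub sharpe: f64,
    pub max_drawdown: f64,
    pub pbo: f64,
    pub dsr: f64,
    pub degradation: f64,
    pub profit_factor: f64,
    pub stress_passes: u32,
    pub oos_trades: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict { Production, Research, BelowResearch, HardFail }

impl TierGates {
    /// True only if every gate of this tier is met.
    pub fn passes(&self, m: &OosMetrics) -> bool {
        m.sharpe >= self.min_oos_sharpe
            && m.max_drawdown <= self.max_drawdown
            && m.pbo < self.max_pbo
            && m.dsr >= self.min_dsr
            && m.degradation < self.max_degradation
            && m.profit_factor >= self.min_profit_factor
            && m.stress_passes >= self.min_stress_passes
            && m.oos_trades >= self.min_oos_trades
    }
}

/// Any single breach of the "Hard Fail" column.
pub fn hard_fail(m: &OosMetrics) -> bool {
    m.sharpe < 0.2
        || m.max_drawdown > 0.50
        || m.pbo > 0.40
        || m.dsr < 0.2
        || m.degradation > 0.90
        || m.profit_factor < 1.0
        || m.stress_passes < 2
        || m.oos_trades < 10
}

pub fn classify(m: &OosMetrics) -> Verdict {
    if hard_fail(m) {
        Verdict::HardFail
    } else if PRODUCTION.passes(m) {
        Verdict::Production
    } else if RESEARCH.passes(m) {
        Verdict::Research
    } else {
        Verdict::BelowResearch
    }
}
```

Checking the hard-fail bounds first means a single breach (for example fewer than 10 OOS trades) overrides every other gate, however strong the remaining metrics are.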
| Metric | Threshold | Rationale |
|---|---|---|
| S1 (costs_2x) Sharpe | >= 0.3 | Survives cost spikes |
| S2 (delay+1) Sharpe | >= 0.5 | Not latency-dependent |
| Turnover Annual | < 50x | Practical execution limit |
| Avg Trade Duration | >= 5 bars | Not noise trading |

| Metric | Threshold | Rationale |
|---|---|---|
| S5 (combined) Sharpe | >= 0.0 | Survives adverse conditions |
| S5 Max Drawdown | <= 30% | Tolerable stress DD |
| Overnight Exposure Check | Documented | Gaps modeled or excluded |
| Corporate Actions | Handled | Dividends in data |

| ID | Name | Transform | Pass Criteria |
|---|---|---|---|
| S1 | costs_2x | 2x slippage + fees | Sharpe >= 0.3 |
| S2 | delay_plus1 | +1 bar execution delay | Sharpe >= 0.5 |
| S3 | spread_widen_vol | 3x slippage in high vol | Sharpe >= 0.2 |
| S4 | capacity_constraint | 1% max participation | Fill rate >= 80% |
| S5 | combined_adverse | 2x costs + 1 bar delay | Sharpe >= 0, DD <= 30% |

Implementation: StressSuite::default_institutional() in stress.rs
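S1, S2 and S5 are pure transforms of the cost and timing model, so they can be illustrated on a bar-level position series. This is a minimal sketch, not `StressSuite::default_institutional()`: it assumes linear costs proportional to the absolute change in position, positions held for whole bars, and an annualized Sharpe computed from per-bar returns. All function names are hypothetical. S3 and S4 need a volatility-regime classifier and a participation/fill model, so they are omitted.

```rust
/// Result of one stress scenario.
#[derive(Debug)]
pub struct StressOutcome {
    pub name: &'static str,
    pub sharpe: f64,
    pub max_dd: f64,
    pub passed: bool,
}

/// Net per-bar returns: `held[t]` is the position held over bar t, `asset_ret[t]`
/// the asset return over bar t. Costs are linear in |change in position|.
fn net_returns(held: &[f64], asset_ret: &[f64], cost_per_unit: f64, cost_mult: f64) -> Vec<f64> {
    let mut prev = 0.0;
    held.iter()
        .zip(asset_ret)
        .map(|(&q, &r)| {
            let turnover = (q - prev).abs();
            prev = q;
            q * r - cost_mult * cost_per_unit * turnover
        })
        .collect()
}

/// Execution delay: the position intended for bar t is only held from bar t + bars.
fn delay(held: &[f64], bars: usize) -> Vec<f64> {
    (0..held.len())
        .map(|t| if t >= bars { held[t - bars] } else { 0.0 })
        .collect()
}

fn annualized_sharpe(r: &[f64], periods_per_year: f64) -> f64 {
    if r.len() < 2 {
        return 0.0;
    }
    let n = r.len() as f64;
    let mean = r.iter().sum::<f64>() / n;
    let var = r.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    if var <= 0.0 { 0.0 } else { mean / var.sqrt() * periods_per_year.sqrt() }
}

/// Peak-to-trough drawdown of the compounded equity curve, as a fraction.
fn max_drawdown(r: &[f64]) -> f64 {
    let (mut equity, mut peak, mut mdd) = (1.0_f64, 1.0_f64, 0.0_f64);
    for x in r {
        equity *= 1.0 + x;
        peak = peak.max(equity);
        mdd = mdd.max(1.0 - equity / peak);
    }
    mdd
}

fn outcome(name: &'static str, r: &[f64], ppy: f64, min_sharpe: f64, max_dd: f64) -> StressOutcome {
    let sharpe = annualized_sharpe(r, ppy);
    let dd = max_drawdown(r);
    StressOutcome { name, sharpe, max_dd: dd, passed: sharpe >= min_sharpe && dd <= max_dd }
}

/// S1 (costs_2x), S2 (delay_plus1) and S5 (combined_adverse) with the table's pass criteria.
pub fn run_s1_s2_s5(held: &[f64], asset_ret: &[f64], cost_per_unit: f64, ppy: f64) -> Vec<StressOutcome> {
    let delayed = delay(held, 1);
    vec![
        outcome("costs_2x", &net_returns(held, asset_ret, cost_per_unit, 2.0), ppy, 0.3, f64::INFINITY),
        outcome("delay_plus1", &net_returns(&delayed, asset_ret, cost_per_unit, 1.0), ppy, 0.5, f64::INFINITY),
        outcome("combined_adverse", &net_returns(&delayed, asset_ret, cost_per_unit, 2.0), ppy, 0.0, 0.30),
    ]
}
```

Because each scenario only re-prices the same position series, any strategy whose edge comes from instant execution or near-zero costs shows up here as a Sharpe collapse rather than as a code change.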
| Scenario | How to Simulate | Pass Criteria |
|---|---|---|
| Gap Shock | Inject 5% overnight spike in raw_close | DD <= 25% in event window |
| Liquidity Drought | Use S4 with 0.5% participation | Fill rate >= 60% |
| Vol Regime Shift | Backtest on 2008/2020 vol periods | Sharpe >= 0.3 |
| Borrow Cost Spike | Add 5% annual borrow cost | Still profitable NET |

| Scenario | How to Simulate | Pass Criteria |
|---|---|---|
| Spread Blowout | 5x normal spread for 10% of bars | Sharpe >= 0.1 |
| Partial Fills | 50% fill rate assumption | Strategy still viable |
| Latency Spike | +3 bars delay on 5% of trades | Sharpe remains positive |

- Milestone 1: Seeds and Determinism
- Milestone 2: Period/Calendar/Universe
- Milestone 3: Data Integrity
- Milestone 4: Costs and Execution Realism
  - … /trader-expert for review
- Milestone 5: Validation (WFA/CPCV + PBO/DSR)
- Milestone 6: Artifacts
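The `total_trials` count feeds the Deflated Sharpe Ratio: the observed Sharpe is tested against the Sharpe expected from the best of `total_trials` zero-skill trials. Below is a sketch of the Bailey and López de Prado formulation, assuming per-period (non-annualized) Sharpe inputs, raw (non-excess) kurtosis, and that `sharpe_variance` is the variance of Sharpe estimates across all trials. To keep the block dependency-free, the normal CDF uses the Abramowitz and Stegun 7.1.26 approximation and the inverse CDF uses bisection; statistics.rs may compute this differently, and the function names are hypothetical.

```rust
use std::f64::consts::{E, SQRT_2};

const EULER_GAMMA: f64 = 0.577_215_664_901_532_9;

/// erf via Abramowitz & Stegun 7.1.26 (absolute error below 1.5e-7).
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.327_591_1 * x);
    let poly = t * (0.254_829_592
        + t * (-0.284_496_736 + t * (1.421_413_741 + t * (-1.453_152_027 + t * 1.061_405_429))));
    sign * (1.0 - poly * (-x * x).exp())
}

pub fn norm_cdf(z: f64) -> f64 {
    0.5 * (1.0 + erf(z / SQRT_2))
}

/// Inverse normal CDF by bisection on [-10, 10]; slow but adequate here.
pub fn norm_ppf(p: f64) -> f64 {
    let (mut lo, mut hi) = (-10.0_f64, 10.0_f64);
    for _ in 0..100 {
        let mid = 0.5 * (lo + hi);
        if norm_cdf(mid) < p { lo = mid; } else { hi = mid; }
    }
    0.5 * (lo + hi)
}

/// Expected maximum Sharpe among `n_trials` zero-skill trials (the DSR benchmark).
pub fn expected_max_sharpe(sharpe_variance: f64, n_trials: usize) -> f64 {
    if n_trials < 2 {
        return 0.0; // no selection across trials, so no inflation to correct
    }
    let n = n_trials as f64;
    sharpe_variance.sqrt()
        * ((1.0 - EULER_GAMMA) * norm_ppf(1.0 - 1.0 / n)
            + EULER_GAMMA * norm_ppf(1.0 - 1.0 / (n * E)))
}

/// Probabilistic Sharpe Ratio: P(true Sharpe > benchmark) from a per-period Sharpe
/// estimated on `n_obs` returns; `kurtosis` is raw (3.0 for normal returns).
pub fn probabilistic_sharpe(sr: f64, benchmark: f64, n_obs: usize, skew: f64, kurtosis: f64) -> f64 {
    let var_term = (1.0 - skew * sr + (kurtosis - 1.0) / 4.0 * sr * sr).max(1e-12);
    norm_cdf((sr - benchmark) * ((n_obs as f64) - 1.0).sqrt() / var_term.sqrt())
}

/// DSR = PSR evaluated against the expected maximum Sharpe of `total_trials` trials.
pub fn deflated_sharpe(
    sr: f64,
    n_obs: usize,
    skew: f64,
    kurtosis: f64,
    sharpe_variance: f64,
    total_trials: usize,
) -> f64 {
    let benchmark = expected_max_sharpe(sharpe_variance, total_trials);
    probabilistic_sharpe(sr, benchmark, n_obs, skew, kurtosis)
}
```

The same observed Sharpe deflates sharply as `total_trials` grows, which is why under-reporting the number of configurations tried makes the DSR gate meaningless.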
## Validation Report
**Strategy ID:** {genome_id}

**Date:** YYYY-MM-DD

**Validator:** risk-analyst

**Tier:** production | research
### Summary
| Metric | Value | Threshold | Status |
|--------|-------|-----------|--------|
| OOS Sharpe (NET) | X.XX | >= Y.Y | PASS/FAIL |
| Max Drawdown | X.X% | <= Y% | PASS/FAIL |
| PBO | X.XX | < Y.YY | PASS/FAIL |
| DSR | X.XX | >= Y.Y | PASS/FAIL |
| Degradation | X.X% | < Y% | PASS/FAIL |
| Stress Pass | X/5 | >= Y/5 | PASS/FAIL |
### Recommendation
- [ ] PROMOTE to Hall of Fame
- [ ] REVISE and resubmit
- [ ] REJECT - {reason}
### Artifacts
- run_id: {uuid}
- config: {path}
- git_commit: {hash}
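The Summary rows and the Recommendation checkbox can be filled mechanically once each gate has been evaluated. The sketch below shows one possible mapping; the policy itself is an assumption (any breach of the Hard Fail column means REJECT; all gates passing plus verified reproducibility and complete artifacts means PROMOTE; anything else means REVISE), and `GateRow`, `gate_min`, `gate_below` and `recommend` are hypothetical names.

```rust
/// One row of the report's Summary table.
pub struct GateRow {
    pub metric: &'static str,
    pub value: String,
    pub threshold: String,
    pub pass: bool,
}

impl GateRow {
    /// Render as a Markdown table row matching the template's columns.
    pub fn to_markdown(&self) -> String {
        let status = if self.pass { "PASS" } else { "FAIL" };
        format!("| {} | {} | {} | {} |", self.metric, self.value, self.threshold, status)
    }
}

/// Gate of the form `value >= min` (OOS Sharpe, DSR, ...).
pub fn gate_min(metric: &'static str, value: f64, min: f64) -> GateRow {
    GateRow { metric, value: format!("{value:.2}"), threshold: format!(">= {min:.2}"), pass: value >= min }
}

/// Gate of the form `value < max` (PBO, degradation, ...).
pub fn gate_below(metric: &'static str, value: f64, max: f64) -> GateRow {
    GateRow { metric, value: format!("{value:.2}"), threshold: format!("< {max:.2}"), pass: value < max }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recommendation { Promote, Revise, Reject }

/// Assumed policy: hard fail => REJECT; everything green => PROMOTE; otherwise REVISE.
pub fn recommend(
    rows: &[GateRow],
    any_hard_fail: bool,
    reproducible: bool,
    artifacts_complete: bool,
) -> Recommendation {
    if any_hard_fail {
        Recommendation::Reject
    } else if rows.iter().all(|r| r.pass) && reproducible && artifacts_complete {
        Recommendation::Promote
    } else {
        Recommendation::Revise
    }
}
```

Treating missing artifacts or a non-reproducible run as REVISE rather than REJECT reflects that both are fixable without changing the strategy; a stricter desk could map them to REJECT instead.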
## Fold Stability Analysis
| Fold | IS Sharpe | OOS Sharpe | Degradation | PBO | Pass |
|------|-----------|------------|-------------|-----|------|
| 1 | X.XX | X.XX | X.X% | X.XX | Y/N |
| 2 | X.XX | X.XX | X.X% | X.XX | Y/N |
| ... | ... | ... | ... | ... | ... |
| **Mean** | X.XX | X.XX | X.X% | X.XX | |
| **Std** | X.XX | X.XX | X.X% | X.XX | |
### Interpretation
- Stability Score: {mean/std ratio}
- Worst Fold: {index} with OOS Sharpe {value}
- Best Fold: {index} with OOS Sharpe {value}
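The Mean, Std and Interpretation rows follow directly from the per-fold IS/OOS Sharpe values. A minimal sketch, assuming degradation is defined as 1 - OOS/IS (left undefined when IS Sharpe <= 0), a sample standard deviation, and a Stability Score computed on OOS Sharpe; the `Fold` and `FoldSummary` types are hypothetical.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Fold {
    pub is_sharpe: f64,
    pub oos_sharpe: f64,
}

#[derive(Debug)]
pub struct FoldSummary {
    pub mean_oos: f64,
    pub std_oos: f64,
    pub stability: f64,      // mean_oos / std_oos
    pub worst: (usize, f64), // (1-based fold index, OOS Sharpe)
    pub best: (usize, f64),
    pub mean_degradation: Option<f64>,
}

/// 1 - OOS/IS; undefined (None) when IS Sharpe <= 0.
pub fn degradation(f: &Fold) -> Option<f64> {
    (f.is_sharpe > 0.0).then(|| 1.0 - f.oos_sharpe / f.is_sharpe)
}

pub fn summarize(folds: &[Fold]) -> Option<FoldSummary> {
    if folds.len() < 2 {
        return None; // a sample std needs at least two folds
    }
    let oos: Vec<f64> = folds.iter().map(|f| f.oos_sharpe).collect();
    let n = oos.len() as f64;
    let mean_oos = oos.iter().sum::<f64>() / n;
    let std_oos = (oos.iter().map(|x| (x - mean_oos).powi(2)).sum::<f64>() / (n - 1.0)).sqrt();
    let stability = if std_oos > 0.0 { mean_oos / std_oos } else { f64::INFINITY };

    // Order (index, OOS Sharpe) pairs by Sharpe; total_cmp gives a total order on f64.
    let by_oos = |a: &(usize, f64), b: &(usize, f64)| a.1.total_cmp(&b.1);
    let worst = oos.iter().copied().enumerate().min_by(by_oos)?;
    let best = oos.iter().copied().enumerate().max_by(by_oos)?;

    let degs: Vec<f64> = folds.iter().filter_map(degradation).collect();
    let mean_degradation = (!degs.is_empty()).then(|| degs.iter().sum::<f64>() / degs.len() as f64);

    Some(FoldSummary {
        mean_oos,
        std_oos,
        stability,
        worst: (worst.0 + 1, worst.1),
        best: (best.0 + 1, best.1),
        mean_degradation,
    })
}
```

The inverse ratio `std_oos / mean_oos` is the quantity the Overfitting Checklist compares against 0.3 (green) and 0.5 (red).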
## Overfitting Checklist
### Red Flags (any = investigate)
- [ ] Sharpe IS > 2.0 with Sharpe OOS < 0.5
- [ ] PBO > 0.20
- [ ] DSR < 0.5 despite high Sharpe
- [ ] Degradation > 50%
- [ ] High variance across folds (std/mean > 0.5)
- [ ] Few trades (< 30 OOS)
- [ ] Concentrated in single asset/period
- [ ] Many parameters (> 10 tuned)
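PBO is commonly estimated with combinatorially symmetric cross-validation (CSCV): split the time axis into S blocks, use every half of the blocks as in-sample, pick the best trial in-sample, and record where it ranks out-of-sample; PBO is the fraction of splits in which that winner lands at or below the OOS median. The sketch below follows that procedure with one simplification: trials are ranked by summed return instead of Sharpe. The function name, the input layout (`returns[t][n]`) and the 16-block cap are assumptions, not the behavior of `PboDsrResult` in validation.rs.

```rust
/// PBO via CSCV. `returns[t][n]` = return of trial n at time t; `s` = number of
/// blocks (even, 2..=16). Returns None on invalid input.
pub fn pbo_cscv(returns: &[Vec<f64>], s: usize) -> Option<f64> {
    let n = returns.first()?.len();
    if s < 2 || s % 2 != 0 || s > 16 || returns.len() < s || n < 2 {
        return None;
    }
    let block_len = returns.len() / s; // trailing rows that do not fill a block are dropped
    // Sum of returns per (block, trial).
    let block_sums: Vec<Vec<f64>> = (0..s)
        .map(|b| {
            let mut sums = vec![0.0; n];
            for row in &returns[b * block_len..(b + 1) * block_len] {
                for (acc, x) in sums.iter_mut().zip(row) {
                    *acc += x;
                }
            }
            sums
        })
        .collect();

    let (mut overfit, mut total) = (0usize, 0usize);
    // Every mask with s/2 bits set selects the in-sample blocks; the rest are OOS.
    for mask in 0u32..(1u32 << s) {
        if mask.count_ones() as usize != s / 2 {
            continue;
        }
        let (mut is_perf, mut oos_perf) = (vec![0.0; n], vec![0.0; n]);
        for (b, sums) in block_sums.iter().enumerate() {
            let target = if mask & (1 << b) != 0 { &mut is_perf } else { &mut oos_perf };
            for (acc, x) in target.iter_mut().zip(sums) {
                *acc += x;
            }
        }
        // Both halves have the same row count, so sums rank exactly like means.
        let best = (0..n).max_by(|&a, &b| is_perf[a].total_cmp(&is_perf[b]))?;
        let rank = 1 + (0..n).filter(|&j| oos_perf[j] < oos_perf[best]).count();
        let omega = rank as f64 / (n as f64 + 1.0);
        if (omega / (1.0 - omega)).ln() <= 0.0 {
            overfit += 1; // logit <= 0: IS winner at or below the OOS median
        }
        total += 1;
    }
    Some(overfit as f64 / total as f64)
}
```

With S = 16 the loop visits 12,870 splits, which is cheap; the cap simply keeps the bitmask enumeration bounded.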
### Green Flags (build confidence)
- [ ] PBO < 0.10
- [ ] DSR > 0.8
- [ ] Consistent across folds (std/mean < 0.3)
- [ ] Survives all stress tests
- [ ] Reasonable turnover (< 12x annual)
- [ ] Edge explained by economic rationale
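Most checklist items are threshold tests that can be evaluated automatically; concentration in a single asset/period and the economic rationale still need human judgment and are not covered. A sketch with hypothetical field names, assuming "high Sharpe" in the DSR red flag means an OOS Sharpe of at least 1.0 (the production gate) and that `fold_cv` is the std/mean ratio of OOS Sharpe from the Fold Stability Analysis.

```rust
/// Inputs for the automated part of the checklist (hypothetical field names).
pub struct FlagInputs {
    pub is_sharpe: f64,
    pub oos_sharpe: f64,
    pub pbo: f64,
    pub dsr: f64,
    pub degradation: f64,     // fraction: 0.5 = 50%
    pub fold_cv: f64,         // std/mean of OOS Sharpe across folds
    pub oos_trades: u32,
    pub tuned_params: u32,
    pub all_stress_passed: bool,
    pub turnover_annual: f64, // traded notional / capital, per year
}

pub fn red_flags(x: &FlagInputs) -> Vec<&'static str> {
    let checks = [
        (x.is_sharpe > 2.0 && x.oos_sharpe < 0.5, "Sharpe IS > 2.0 with Sharpe OOS < 0.5"),
        (x.pbo > 0.20, "PBO > 0.20"),
        // Assumption: "high Sharpe" = OOS Sharpe >= 1.0 (the production gate).
        (x.dsr < 0.5 && x.oos_sharpe >= 1.0, "DSR < 0.5 despite high Sharpe"),
        (x.degradation > 0.50, "Degradation > 50%"),
        (x.fold_cv > 0.5, "High variance across folds (std/mean > 0.5)"),
        (x.oos_trades < 30, "Few trades (< 30 OOS)"),
        (x.tuned_params > 10, "Many parameters (> 10 tuned)"),
    ];
    checks.iter().filter(|(hit, _)| *hit).map(|&(_, label)| label).collect()
}

pub fn green_flags(x: &FlagInputs) -> Vec<&'static str> {
    let checks = [
        (x.pbo < 0.10, "PBO < 0.10"),
        (x.dsr > 0.8, "DSR > 0.8"),
        (x.fold_cv < 0.3, "Consistent across folds (std/mean < 0.3)"),
        (x.all_stress_passed, "Survives all stress tests"),
        (x.turnover_annual < 12.0, "Reasonable turnover (< 12x annual)"),
    ];
    checks.iter().filter(|(hit, _)| *hit).map(|&(_, label)| label).collect()
}
```

Returning the checklist labels verbatim lets the output be pasted straight into the report as ticked items.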
## Promotion Memo: Strategy → Hall of Fame
**Strategy ID:** {genome_id}

**Submitted by:** {researcher}

**Reviewed by:** risk-analyst

**Date:** YYYY-MM-DD
### Executive Summary
{2-3 sentences on strategy edge and validation outcome}
### Validation Results
| Gate | Value | Threshold | Status |
|------|-------|-----------|--------|
| OOS Sharpe | ... | ... | ... |
| PBO | ... | ... | ... |
| DSR | ... | ... | ... |
| Stress | ... | ... | ... |
### Audit Trail
- run_id: {uuid}
- git_commit: {hash}
- WFA folds: {n}
- Determinism: verified (3 runs)
### Recommendation
**APPROVED** for Hall of Fame promotion.
### Conditions (if any)
- {condition 1}
- {condition 2}
### Signatures
- [ ] Risk Analyst: ___________
- [ ] Trader Expert (execution): ___________

| Criterion | Pass | Fail |
|---|---|---|
| OOS Sharpe NET | >= tier threshold | < tier threshold |
| PBO | < tier threshold | >= tier threshold |
| DSR | >= tier threshold | < tier threshold |
| Stress tests | >= 4/5 pass | < 3/5 pass |
| Degradation | < 50% | > 70% |
| Reproducibility | 3 identical runs | Any variation |
| Artifacts | All present | Any missing |

| Criterion | Pass | Fail |
|---|---|---|
| Seeds documented | Yes | No |
| Config snapshot | Present | Missing |
| Git commit | Recorded | Missing |
| Data integrity | Verified | Unverified |
| Costs modeled | Realistic | Ignored |

- High Sharpe with few trades
- Overnight gaps ignored
- Leakage through overlap
- Costs ignored or underestimated
- Non-stationary strategy
- Concentrated bets
- Turnover kills in reality
- IS presented as OOS
- Low DSR despite high Sharpe
- PBO ignored

/trader-expert

After validation passes, the trader expert must verify:
## Handoff: risk-analyst → trader-expert
**Strategy ID:** {genome_id}

**Validation Status:** PASSED

**Requires execution review:**
- [ ] Slippage model appropriate for {market}
- [ ] Fill rate assumptions realistic
- [ ] Turnover ({value}x annual) executable
- [ ] Latency assumptions verified

**Files:**
- Validation report: {path}
- Trades CSV: {path}

/data-engineer

If data integrity fails:
## Handoff: risk-analyst → data-engineer
**Issue:** Data integrity check failed

**Problem:**
- {description of data issue}

**Affected:**
- Strategy: {genome_id}
- Period: {start} to {end}
- Asset(s): {list}

**Required action:**
- [ ] Investigate data source
- [ ] Verify corporate actions
- [ ] Check for gaps/survivorship

/quant-engineer

If instrumentation is needed:
## Handoff: risk-analyst → quant-engineer
**Request:** Metric instrumentation

**Needed:**
- {specific metric or check}

**Purpose:**
- Enable validation of {use case}

**Priority:** {high/medium/low}

When receiving a validation request:
1. Receive request with run_id
2. Verify artifacts exist
3. Load config and metrics
4. Run WFA/CPCV analysis
5. Calculate PBO/DSR
6. Execute stress suite
7. Check gates by tier
8. Generate report
9. Recommend: PROMOTE / REVISE / REJECT

| Metric | Value |
|---|---|
| min_oos_sharpe | 1.0 |
| max_pbo | 0.10 |
| min_dsr | 0.8 |
| max_degradation | 50% |
| max_drawdown | 20% |
| min_profit_factor | 1.5 |
| min_stress_pass | 4/5 |

| Metric | Value |
|---|---|
| min_oos_sharpe | 0.5 |
| max_pbo | 0.20 |
| min_dsr | 0.5 |
| max_degradation | 70% |
| max_drawdown | 35% |
| min_profit_factor | 1.1 |
| min_stress_pass | 3/5 |