Documents the 9 learning feedback loops, SpreadBandit Thompson Sampling, adaptive ensemble, confidence tracking, and baseline tracker. Use when debugging learning behavior, tuning reward attribution, investigating model weight decay, or understanding how fills translate into parameter updates.
Every fill is labeled training data. The system has 5 core learning components that all receive 1:1 reward attribution from each fill event:
FILL EVENT
|
+-> SpreadBandit.update_from_pending(reward) # Spread selection learning
+-> AdaptiveEnsemble.update_performance(ir,brier) # Model weight learning
+-> KappaEstimator.on_own_fill(ts, price) # Fill intensity learning
+-> ReconcileOutcomeTracker.record_fill(oid,edge) # Reconcile action learning
+-> PreFillClassifier.record_outcome(adverse,mag) # Adverse selection learning
| # | Loop | Trigger | Updates | File |
|---|---|---|---|---|
| 1 | Kappa from own fills | on_own_fill() | Fill intensity (Hawkes) | estimator/mod.rs |
| 2 | AS markout queue | Fill → pending outcomes | Pending fill list | orchestrator/handlers.rs |
| 3 | AS outcome feedback | Markout resolved | record_outcome() on classifier | orchestrator/handlers.rs |
| 4 | Calibration progress | Periodic | calibration_controller update | orchestrator/handlers.rs |
| 5 | Sigma update | Trade/L2 events | Realized volatility | orchestrator/handlers.rs |
| 6 | Regime update | Trade/L2 events | HMM belief state | orchestrator/handlers.rs |
| 7 | Quote outcome tracking | Fill + 30s expiry | Fill rate bins, edge estimation | learning/quote_outcome.rs |
| 8 | Spread bandit update | Fill event | Context-arm posterior | learning/spread_bandit.rs |
| 9 | Ensemble weight update | Fill event | IR-based model weights | learning/adaptive_ensemble.rs |
See references/feedback-loops.md for detailed loop descriptions.
SpreadBandit runs Thompson Sampling over 81 contexts (3 regimes x 3 positions x 3 vols x 3 flows) x 8 arms (spread multipliers from 0.85 to 1.40), giving 648 context-arm cells.

- Forgetting factor 0.995 (half-life ~138 observations), applied only when n >= 10
- max_obs < 3
- Reward: baseline_adjusted_edge_bps (actual edge minus EWMA baseline)

Key methods: select_arm(context), update_from_pending(reward), best_arm(context)
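The SpreadBandit dimensions and the quoted half-life can be checked with a short sketch. It assumes a mixed-radix flattening of the four 3-level context features and a row-major (context, arm) cell layout; the dimension order and the names `context_index`, `cell_index`, and `half_life` are hypothetical and not taken from learning/spread_bandit.rs.

```rust
/// Levels per context dimension (regime, position, vol, flow).
pub const LEVELS: usize = 3;
/// 3 x 3 x 3 x 3 = 81 contexts.
pub const N_CONTEXTS: usize = LEVELS * LEVELS * LEVELS * LEVELS;
/// 8 spread-multiplier arms per context.
pub const N_ARMS: usize = 8;

/// Flatten four 3-level features into one context index in 0..81.
/// The ordering (regime most significant) is an assumption.
pub fn context_index(regime: usize, position: usize, vol: usize, flow: usize) -> usize {
    assert!(regime < LEVELS && position < LEVELS && vol < LEVELS && flow < LEVELS);
    ((regime * LEVELS + position) * LEVELS + vol) * LEVELS + flow
}

/// Flat index of a (context, arm) cell in 0..648 (assumed row-major layout).
pub fn cell_index(context: usize, arm: usize) -> usize {
    assert!(context < N_CONTEXTS && arm < N_ARMS);
    context * N_ARMS + arm
}

/// Number of observations after which a geometric forgetting factor
/// has halved the weight of an old observation: ln(0.5) / ln(factor).
pub fn half_life(factor: f64) -> f64 {
    assert!(factor > 0.0 && factor < 1.0);
    0.5f64.ln() / factor.ln()
}
```

With factor = 0.995, `half_life` gives roughly 138.3 observations, which matches the ~138 figure above.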
QuoteOutcomeTracker solves survivorship bias by tracking ALL quotes (filled AND unfilled).

- P(fill) = alpha / (alpha + beta)
- E[edge] = P(fill) x E[edge|fill]
- argmax(expected_edge) via grid search
- epnl_at_registration

Key methods: register_quote(), on_fill(), expire_old_quotes(), optimal_spread_bps()
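The fill-probability and expected-edge formulas for QuoteOutcomeTracker can be sketched over spread bins as follows. This is a minimal illustration, not the code in learning/quote_outcome.rs: the uniform Beta(1, 1) prior, the `mean_edge_given_fill_bps` field standing in for E[edge|fill], and the choice of bin midpoints as the search grid are all assumptions.

```rust
/// One spread bin. The first four fields follow the checkpoint tuple
/// (lo_bps, hi_bps, observed_fills, observed_total);
/// `mean_edge_given_fill_bps` is a hypothetical stand-in for E[edge|fill].
#[derive(Debug, Clone)]
pub struct SpreadBin {
    pub lo_bps: f64,
    pub hi_bps: f64,
    pub observed_fills: u64,
    pub observed_total: u64,
    pub mean_edge_given_fill_bps: f64,
}

impl SpreadBin {
    /// Posterior mean P(fill) = alpha / (alpha + beta),
    /// assuming a uniform Beta(1, 1) prior (an assumption).
    pub fn p_fill(&self) -> f64 {
        let unfilled = self.observed_total.saturating_sub(self.observed_fills);
        let alpha = 1.0 + self.observed_fills as f64;
        let beta = 1.0 + unfilled as f64;
        alpha / (alpha + beta)
    }

    /// E[edge] = P(fill) x E[edge|fill].
    pub fn expected_edge_bps(&self) -> f64 {
        self.p_fill() * self.mean_edge_given_fill_bps
    }

    /// Midpoint of the bin, used here as the candidate spread.
    pub fn mid_bps(&self) -> f64 {
        0.5 * (self.lo_bps + self.hi_bps)
    }
}

/// Grid search over bins: the midpoint of the bin with the highest
/// expected edge, or None when there are no bins.
pub fn optimal_spread_bps(bins: &[SpreadBin]) -> Option<f64> {
    bins.iter()
        .max_by(|a, b| a.expected_edge_bps().total_cmp(&b.expected_edge_bps()))
        .map(SpreadBin::mid_bps)
}
```

Because unfilled quotes raise beta, a wide bin with a large edge per fill but few fills can lose to a tighter bin, which is the survivorship correction described above.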
AdaptiveEnsemble weights models by a softmax over Information Ratio (IR) with a water-filling floor.

- w[i] = exp(IR[i] / T) / sum(exp(IR[j] / T))
- min_weight via iterative water-filling
- ir_new = 0.995 * ir_old + 0.005 * ir_measured
- min_predictions_for_weight = 20

Key methods: update_performance(), compute_weights(), weighted_average(), summary()
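The AdaptiveEnsemble weighting rules can be sketched as three small functions. This is an illustration under assumptions rather than the code in learning/adaptive_ensemble.rs: the function names are hypothetical, the water-filling loop is one common way to enforce a floor (pin weights below the floor, rescale the rest, repeat), and the min_predictions_for_weight gate is not modeled.

```rust
/// Softmax over Information Ratios with temperature `t`:
/// w[i] = exp(IR[i] / t) / sum_j exp(IR[j] / t).
pub fn softmax_weights(ir: &[f64], t: f64) -> Vec<f64> {
    assert!(!ir.is_empty() && t > 0.0);
    // Subtracting the max leaves the result unchanged but avoids overflow.
    let max_ir = ir.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = ir.iter().map(|&x| ((x - max_ir) / t).exp()).collect();
    let total: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / total).collect()
}

/// Iterative water-filling floor (assumed variant): weights that fall below
/// `min_weight` are pinned to it, and the remaining mass is shared among the
/// unpinned weights in proportion to their original values.
pub fn apply_min_weight(weights: &[f64], min_weight: f64) -> Vec<f64> {
    let n = weights.len();
    assert!(n > 0 && min_weight >= 0.0 && min_weight * (n as f64) < 1.0);
    let mut pinned = vec![false; n];
    loop {
        let n_pinned = pinned.iter().filter(|&&p| p).count();
        let free_mass = 1.0 - n_pinned as f64 * min_weight;
        let free_sum: f64 = (0..n).filter(|&i| !pinned[i]).map(|i| weights[i]).sum();
        let out: Vec<f64> = (0..n)
            .map(|i| if pinned[i] { min_weight } else { weights[i] / free_sum * free_mass })
            .collect();
        // Pin any unpinned weight that the rescaling pushed below the floor.
        let mut changed = false;
        for (i, &w) in out.iter().enumerate() {
            if !pinned[i] && w < min_weight {
                pinned[i] = true;
                changed = true;
            }
        }
        if !changed {
            return out;
        }
    }
}

/// EWMA update of a model's IR: ir_new = 0.995 * ir_old + 0.005 * ir_measured.
pub fn update_ir(ir_old: f64, ir_measured: f64) -> f64 {
    0.995 * ir_old + 0.005 * ir_measured
}
```

For IRs of 1.0, 0.0, and -3.0 at T = 1 with a floor of 0.05, the weakest model is pinned at 0.05 and the other two share the remaining 0.95 in their original proportion, so no model is ever starved to zero weight.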
BaselineTracker: EWMA baseline subtraction centers rewards around zero for the RL/bandit updates.

- ewma = 0.99 * ewma + 0.01 * reward
- counterfactual = actual - baseline (or actual if not warmed up)
- min_observations = 10

EdgeBiasTracker: detects systematic edge miscalibration.

- Tracks (predicted_edge_bps, realized_edge_bps) pairs
- should_recalibrate() when |ewma_bias| > 1.5 bps

QUOTE CYCLE:
SpreadBandit.select_arm(context) → pending selection
QuoteOutcomeTracker.register_quote() → pending quote
Quote published with spread_bps
FILL EVENT (handlers.rs):
QuoteOutcomeTracker.on_fill() → resolve pending, update fill rate bins
SpreadBandit.update_from_pending(reward) → update cell posterior
AdaptiveEnsemble.update_performance() → update IR, recompute weights
KappaEstimator.on_own_fill() → update Hawkes intensity
ReconcileOutcomeTracker.record_fill() → update action EV estimates
EXPIRY (30s timeout):
QuoteOutcomeTracker.expire_old_quotes() → mark as Expired, update bins
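The reward passed to SpreadBandit.update_from_pending(reward) in the fill path above is baseline-adjusted, and edge predictions are monitored for bias alongside it. A minimal sketch of both trackers follows. Several details are assumptions: "warmed up" is read as n_observations >= min_observations, both EWMAs start at zero, the bias is taken as predicted minus realized, and the bias smoothing weight `BIAS_ALPHA` is hypothetical because the source does not state it.

```rust
pub const MIN_OBSERVATIONS: u64 = 10;
pub const RECALIBRATE_THRESHOLD_BPS: f64 = 1.5;
/// Hypothetical smoothing weight for the bias EWMA; not given in the source.
pub const BIAS_ALPHA: f64 = 0.05;

/// EWMA reward baseline, fields as in the checkpoint (ewma_reward, n_observations).
#[derive(Debug, Default, Clone)]
pub struct BaselineTracker {
    pub ewma_reward: f64,
    pub n_observations: u64,
}

impl BaselineTracker {
    /// ewma = 0.99 * ewma + 0.01 * reward
    pub fn observe(&mut self, reward: f64) {
        self.ewma_reward = 0.99 * self.ewma_reward + 0.01 * reward;
        self.n_observations += 1;
    }

    /// counterfactual = actual - baseline, or actual if not warmed up.
    /// The >= comparison for warm-up is an assumption.
    pub fn counterfactual(&self, actual: f64) -> f64 {
        if self.n_observations >= MIN_OBSERVATIONS {
            actual - self.ewma_reward
        } else {
            actual
        }
    }
}

/// EWMA of the edge prediction error in bps.
#[derive(Debug, Default, Clone)]
pub struct EdgeBiasTracker {
    pub ewma_bias: f64,
}

impl EdgeBiasTracker {
    /// Record one (predicted_edge_bps, realized_edge_bps) pair.
    /// The sign convention (predicted minus realized) is an assumption.
    pub fn record(&mut self, predicted_edge_bps: f64, realized_edge_bps: f64) {
        let error = predicted_edge_bps - realized_edge_bps;
        self.ewma_bias = (1.0 - BIAS_ALPHA) * self.ewma_bias + BIAS_ALPHA * error;
    }

    /// True when |ewma_bias| > 1.5 bps.
    pub fn should_recalibrate(&self) -> bool {
        self.ewma_bias.abs() > RECALIBRATE_THRESHOLD_BPS
    }
}
```

Returning the raw reward before warm-up avoids subtracting a baseline that still sits near its zero starting value, at the cost of uncentered rewards for the first few fills.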
All components persist via #[serde(default)]:

- SpreadBanditCheckpoint: cells with (context_idx, arm_idx, mu_n, kappa_n, alpha, beta, n)
- QuoteOutcomeCheckpoint: bins with (lo_bps, hi_bps, observed_fills, observed_total)
- BaselineTracker: (ewma_reward, n_observations)
- AdaptiveEnsemble: HashMap of ModelPerformance (IR, Brier, n_predictions, weight)

| Component | File |
|---|---|
| SpreadBandit | learning/spread_bandit.rs |
| QuoteOutcomeTracker | learning/quote_outcome.rs |
| BaselineTracker | learning/baseline_tracker.rs |
| AdaptiveEnsemble | learning/adaptive_ensemble.rs |
| EdgeBiasTracker | learning/confidence.rs |
| DecisionEngine | learning/decision.rs |
| CompetitorModel | learning/competitor_model.rs |
| Fill integration | orchestrator/handlers.rs |