Systematically compare theoretical predictions with experimental or observational data
Codex shell compatibility: `gpd` is on PATH. Invoke as `GPD_ACTIVE_RUNTIME=codex uv run gpd ...`.
</codex_runtime_notes><codex_questioning>
Why a dedicated command: Theory-experiment comparison is not just plotting two curves on the same axes. It requires rigorous treatment of units, uncertainties, systematic effects, and statistical significance. A "good agreement" by eye may be a 3-sigma discrepancy when uncertainties are properly accounted for. Conversely, a visually poor fit may be statistically acceptable when systematic uncertainties are included.
<context> Comparison target: $ARGUMENTS </context>
<!-- [included: compare-experiment.md] -->
<purpose>
Systematically compare theoretical predictions with experimental or observational data. Handles unit conversion, uncertainty propagation, statistical testing, and discrepancy analysis.
<process>
The principle: agreement between theory and experiment must be quantified. "Looks about right" is not physics. The comparison must state: (1) what was predicted, (2) what was measured, (3) what the uncertainties are on both sides, (4) whether the agreement is statistically significant, and (5) if not, what the discrepancy tells us.
For contract-backed work, the comparison must also state which decisive output or contract target is being tested and emit an explicit verdict ledger keyed by subject_id / reference_id, not just a prose comparison.
</objective>
Interpretation:
Load theoretical predictions:
cat .gpd/research-map/ARCHITECTURE.md 2>/dev/null | grep -A 20 "Predictions"
find artifacts/ results/ data/ figures/ simulations/ paper/ -maxdepth 4 \
\( -name "*.json" -o -name "*.csv" -o -name "*.dat" -o -name "*.h5" \) 2>/dev/null | \
grep -i "result\|predict\|spectrum\|observable" | head -20
Treat .gpd/** as internal provenance only. Discover predictions and reusable comparison inputs from stable workspace directories such as artifacts/, results/, data/, figures/, simulations/, or paper/.
<execution_context>
Called from $gpd-compare-experiment command. Produces COMPARISON.md with quantified agreement metrics.
</purpose>
<required_reading> Read these files using the read_file tool:
Load project state and conventions to ensure correct unit systems and sign conventions:
INIT=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local init progress --include state)
if [ $? -ne 0 ]; then
echo "ERROR: gpd initialization failed: $INIT"
# STOP — display the error to the user and do not proceed.
fi
commit_docs, state_exists, project_exists, current_phase

If state_exists is true: read .gpd/state.json to extract convention_lock for unit system, metric signature, and Fourier conventions. Extract active approximations and their validity ranges from state. Load intermediate_results from state for any previously computed quantities.

If state_exists is false (standalone usage): proceed with explicit convention declarations required from the user via ask_user (unit system, sign conventions, normalization).

Convention context is critical for theory-experiment comparison: unit mismatches and convention mismatches are the two most common sources of discrepancy.
Convention verification (if project exists):
CONV_CHECK=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local --raw convention check 2>/dev/null)
if [ $? -ne 0 ]; then
echo "WARNING: Convention verification failed — unit mismatches between theory and experiment are the #1 source of false discrepancies"
echo "$CONV_CHECK"
fi
If the project is contract-backed, first resolve the comparison target against the approved contract:
- subject_id
- subject_kind (claim, deliverable, acceptance_test, or artifact)
- subject_role (decisive, supporting, supplemental)
- reference_id for the benchmark / prior-work / data anchor

Do not write a generic comparison report without this mapping when a decisive comparison target exists.
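As a sketch of what the verdict ledger might hold, the entry below uses the keys named above; the specific ids and field values are placeholders, not part of any real contract:

```python
# Hypothetical verdict ledger entry; only the keys follow the mapping above,
# the id strings and values are illustrative placeholders.
ledger_entry = {
    "subject_id": "claim-007",
    "subject_kind": "claim",            # claim / deliverable / acceptance_test / artifact
    "subject_role": "decisive",         # decisive / supporting / supplemental
    "reference_id": "benchmark-dataset-A",
    "verdict": "pass",                  # pass / fail / inconclusive
    "tension_sigma": 1.4,               # quantified agreement, not prose
}
print(ledger_entry["subject_id"], ledger_entry["verdict"])
```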
For each prediction, establish:
For each measurement, establish:
If the data source is ambiguous, ask the user once, using a single compact prompt block:
This step catches the most common theory-experiment comparison errors.
Ensure theory and experiment use the same units:
# Common unit conversion pitfalls in physics:
# - eV vs Kelvin: 1 eV = 11604.5 K
# - eV vs cm^{-1}: 1 eV = 8065.54 cm^{-1}
# - Gaussian vs SI electromagnetism: factor of 4*pi*epsilon_0
# - Natural units to SI: restore factors of hbar, c, k_B
# - Energy vs frequency: E = hbar * omega = h * nu (don't mix angular frequency omega with ordinary frequency nu)
# - Cross-section: barn (1e-24 cm^2) vs fm^2 (1e-26 cm^2) vs GeV^{-2}
# - Decay width vs lifetime: Gamma = hbar / tau
Document every conversion applied.
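As a minimal sketch, a few of the conversions above can be pinned down as named constants (CODATA-rounded values; the helper names are illustrative, not part of gpd):

```python
# Conversion constants (CODATA values, rounded as in the list above)
EV_TO_K = 11604.5              # 1 eV in kelvin (E = k_B * T)
EV_TO_INV_CM = 8065.54         # 1 eV in cm^-1
HBAR_EV_S = 6.582119569e-16    # hbar in eV * s

def ev_to_kelvin(e_ev):
    """Energy in eV -> equivalent temperature in K."""
    return e_ev * EV_TO_K

def width_to_lifetime(gamma_ev):
    """Decay width Gamma (eV) -> lifetime tau (s) via Gamma = hbar / tau."""
    return HBAR_EV_S / gamma_ev

print(ev_to_kelvin(0.025))     # ~290 K, roughly room temperature
```

Keeping the constants named and printed makes each applied conversion trivially documentable in COMPARISON.md.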
Check that theory and experiment use the same conventions:
| Convention | Theory | Experiment | Conversion needed? |
|---|---|---|---|
| Normalization | {convention} | {convention} | {yes/no} |
| Phase convention | {convention} | {convention} | {yes/no} |
| Fourier transform | {convention} | {convention} | {yes/no} |
| Cross-section definition | {convention} | {convention} | {yes/no} |
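Convention mismatches typically surface as constant multiplicative factors. A self-contained illustration with Fourier conventions: numpy's default forward FFT carries no prefactor, while the "ortho" convention puts 1/sqrt(N) on both directions, so the two differ by exactly sqrt(N) in every bin:

```python
import numpy as np

x = np.random.default_rng(1).normal(size=64)
F_default = np.fft.fft(x)              # forward transform, no prefactor
F_ortho = np.fft.fft(x, norm="ortho")  # 1/sqrt(N) on forward and inverse
ratio = F_default / F_ortho            # constant across all bins
print(ratio[0].real)                   # sqrt(64) = 8.0
```

A theory/experiment ratio that is flat like this is the fingerprint of a convention or normalization mismatch, not new physics.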
Experimental data often includes detector acceptance and efficiency:
import numpy as np

# For each data point:
for i in range(len(data)):
    x_exp = data[i]['x']
    y_exp = data[i]['y']
    dy_exp = data[i]['uncertainty']  # total: stat + syst combined
    y_theory = theory_prediction(x_exp)
    dy_theory = theory_uncertainty(x_exp)
    # Pull: (theory - experiment) / combined_uncertainty
    dy_combined = np.sqrt(dy_exp**2 + dy_theory**2)
    pull = (y_theory - y_exp) / dy_combined
    print(f"x={x_exp:.4f} exp={y_exp:.6f}+/-{dy_exp:.6f} "
          f"theory={y_theory:.6f}+/-{dy_theory:.6f} pull={pull:.2f}")
# Chi-squared test
import scipy.stats

chi2 = 0.0
for i in range(len(data)):
    dy_combined = np.sqrt(data[i]['uncertainty']**2 + theory_uncertainty(data[i]['x'])**2)
    chi2 += ((theory_prediction(data[i]['x']) - data[i]['y']) / dy_combined)**2
ndof = len(data) - n_free_params  # n_free_params: parameters fitted to this data
chi2_reduced = chi2 / ndof
p_value = scipy.stats.chi2.sf(chi2, ndof)  # sf = 1 - cdf, numerically stable
print(f"chi2/ndof = {chi2:.1f}/{ndof} = {chi2_reduced:.2f}")
print(f"p-value = {p_value:.4f}")
| chi2/ndof | p-value | Interpretation |
|---|---|---|
| ~ 1 | > 0.05 | Good agreement |
| >> 1 | < 0.01 | Significant discrepancy |
| << 1 | > 0.99 | Overfitting or overestimated uncertainties |
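A self-contained sanity check of the recipe above, using synthetic pseudo-data generated from the "true" model so that chi2/ndof should come out near 1 (model, noise level, and seed are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y_true = 2.0 * x + 1.0                 # theory taken as exact here
sigma = 0.1
y_obs = y_true + rng.normal(0.0, sigma, size=x.size)  # synthetic pseudo-data

chi2 = float(np.sum(((y_true - y_obs) / sigma) ** 2))
ndof = x.size                          # no parameters were fitted to this data
p_value = stats.chi2.sf(chi2, ndof)    # survival function = 1 - cdf
print(f"chi2/ndof = {chi2:.1f}/{ndof}, p = {p_value:.3f}")
```

If the same script is run with sigma deliberately halved, chi2/ndof jumps to ~4 and the p-value collapses, reproducing the ">> 1" row of the table.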
# Check for systematic patterns in residuals:
# - Trending residuals suggest missing physics (wrong functional form)
# - Oscillating residuals suggest discretization artifacts
# - Constant offset suggests normalization error
# - Growing residuals suggest wrong scaling / missing terms
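The first two patterns above can be screened for mechanically by fitting a line to the pulls: a slope far from zero flags a trend (missing physics), while a sizable intercept with near-zero slope flags a constant offset (normalization or units). A minimal sketch with hypothetical pulls:

```python
import numpy as np

def residual_trend(x, pulls):
    """Least-squares line through pulls vs x.
    Large |slope| -> trending residuals; large |intercept| with
    slope ~ 0 -> constant offset."""
    slope, intercept = np.polyfit(x, pulls, 1)
    return slope, intercept

# Hypothetical pulls with a clean linear trend (illustration only):
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
pulls = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
slope, intercept = residual_trend(x, pulls)
print(f"slope={slope:.1f}, intercept={intercept:.1f}")  # slope=10.0, intercept=-3.0
```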
If the comparison shows significant disagreement:
| Type | Signature | Likely Cause |
|---|---|---|
| Constant offset | All pulls have same sign | Normalization error, missing constant term, unit conversion error |
| Constant factor | Ratio theory/experiment is constant | Missing factor of 2, pi, or convention mismatch |
| Wrong slope | Discrepancy grows with parameter | Wrong power law, missing logarithmic corrections |
| Wrong curvature | Discrepancy is quadratic | Missing next-order correction |
| Localized | Discrepancy only in one region | Approximation breakdown, phase transition, resonance |
| Oscillatory | Periodic discrepancy pattern | Interference effect, aliasing, finite-size oscillation |
| Statistical scatter | Random, no pattern | Underestimated uncertainties |
# Number of sigma tension:
tension_sigma = abs(theory_central - exp_central) / np.sqrt(theory_unc**2 + exp_unc**2)
# Interpretation:
# < 1 sigma: consistent
# 1-2 sigma: mild tension, not significant
# 2-3 sigma: interesting tension, warrants investigation
# 3-5 sigma: significant discrepancy
# > 5 sigma: discovery-level discrepancy
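The tension formula above, packaged as a small helper and run on hypothetical numbers (the values are illustrative only):

```python
import numpy as np

def tension_sigma(theory, d_theory, exp, d_exp):
    """Number-of-sigma tension, uncertainties combined in quadrature."""
    return abs(theory - exp) / np.hypot(d_theory, d_exp)

# Hypothetical: theory 1.00 +/- 0.05 vs measurement 1.20 +/- 0.08
t = tension_sigma(1.00, 0.05, 1.20, 0.08)
print(f"{t:.2f} sigma")  # 2.12 sigma -> "interesting tension" band
```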
Write COMPARISON.md:
---