Step-by-step workflow to identify fallback-driven regressions and replace them with strict source-level validation and fail-closed transitions.
Use this skill to execute the fix (not just policy) when fallback behavior causes invalid outputs to pass.
./scripts/run_integration_tests.sh for integration behavior.dataset/evals/run_evals.py with explicit agent/task scope.PLANNED, COMPLETED, etc.).fallback, default, heuristic string checks).benchmark_plan.md / engineering_plan.md / todo.md / YAML for recovery.