Operational playbook for the DispatchPulse Meta PyTorch OpenEnv Hackathon submission. Load this when working in /Users/arunsanjay/Documents/Projects/DispatchPulse or when Arun mentions the hackathon, DispatchPulse, Round 1/Round 2, HF Space Arun-Sanjay/dispatchpulse, or the GitHub repo Arun-Sanjay/DispatchPulse. Covers how to test, debug, deploy, and respond to Phase 1/Phase 2 validator failures without breaking the existing passing submission.
Scope: This skill is the how-to companion to CLAUDE.md. CLAUDE.md tells you WHAT the project is. This skill tells you WHAT TO DO — step-by-step playbooks for every common scenario.
First action in any session: read CLAUDE.md at the project root. Then find your scenario in section 2 below and follow it. Do not improvise — every command here is tested.
cd /Users/arunsanjay/Documents/Projects/DispatchPulse
# Sanity check: working tree clean, on main, venv exists
git status
git log --oneline | head -5
git remote -v
ls .venv/bin/python
Expected state:
- On branch main, nothing to commit, working tree clean
- git log head: 82ce364 Fix Phase 2: add GET /tasks and POST /grader endpoints (or whatever the user has pushed since)
- Two remotes: origin (HF Space) and github (GitHub)
- .venv/bin/python exists and is Python 3.11

If any of those is wrong, stop and ask Arun what state he's expecting.
Quick smoke test that nothing is broken:
.venv/bin/python tests/test_reward.py && .venv/bin/python tests/test_simulation.py
# Expected: "All reward tests passed!" and "All simulation tests passed!"
If tests don't pass, do NOT start work. Ask Arun what changed.
Actions:
First step: get the exact failure reason from Arun. The Scaler email names one of these 5 checks:
Do NOT speculatively fix multiple checks at once. Fix exactly the one that failed, verify locally, push, resubmit. If a second check fails after the first is fixed, handle it then.
Most likely cause: the grader's docker build . ran out of time, couldn't pull the base image, or hit a pip install error with the git-installed openenv-core.
Debug checklist:
# 1. Does the Dockerfile still exist at the repo root?
ls -la Dockerfile
# 2. Is the base image pullable from GHCR?
# (only do this if Arun has Docker — skip otherwise)
docker pull ghcr.io/meta-pytorch/openenv-base:latest 2>&1 | tail -5
# 3. Is pyproject.toml's git-based openenv-core dep still working?
# Check by letting uv re-resolve:
.venv/bin/uv lock --upgrade-package openenv-core 2>&1 | tail -20
Possible fixes (in order of safety):
- Pin openenv-core to a different tag in pyproject.toml if the grader can't reach @v0.2.3. Look up the latest stable tag on GitHub first.
- Add a .dockerignore to exclude .venv/, __pycache__/, .git/, tests/ from the Docker context — faster builds, smaller context.
- Change the RUN uv sync step to use --frozen to force uv to use uv.lock exactly without re-resolving.
- Do NOT change the base image. Do NOT switch to a plain python:3.11-slim base — the grader expects openenv-base.
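If you go with the .dockerignore fix, a minimal version could look like this (a sketch — extend the entries to whatever else lives in the repo):

```
# .dockerignore — keep the Docker build context small
.venv/
.git/
tests/
**/__pycache__/
*.pyc
```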
Most likely causes:
- the from_docker_image() call hangs or errors out
- no [END] emission

Debug checklist:
# 1. Does inference.py run locally in the in-process fallback?
DISPATCHPULSE_TASK=easy .venv/bin/python inference.py 2>&1 | head -20
# Must print [START], [STEP]s, [END] with valid format
# 2. Does it handle missing HF_TOKEN gracefully?
unset HF_TOKEN API_KEY
DISPATCHPULSE_TASK=easy .venv/bin/python inference.py 2>&1 | grep -E '^\[(START|STEP|END)\]' | tail -5
# 3. Does it still emit [END] on exception?
DISPATCHPULSE_TASK=easy MODEL_NAME="definitely-not-a-real-model" .venv/bin/python inference.py 2>&1 | grep -E '^\[END\]'
Possible fixes:
- Wrap the env.step() call using asyncio.wait_for(..., timeout=30)
- Make sure the [END] line always fires (the current code already does this via finally — verify it's still intact)
- Lower MAX_STEPS from 60 to 40 to ensure the script always finishes in under 20 min even with slow LLMs
- For from_docker_image() specifically: look at the openenv-core LocalDockerProvider source in .venv/lib/python3.11/site-packages/openenv/core/containers/runtime/providers/ and mimic what a passing submission does

The grader couldn't parse our stdout. Check every byte:
# Capture a real run and inspect line by line
DISPATCHPULSE_TASK=easy .venv/bin/python inference.py > /tmp/out.log 2>&1
grep -E '^\[(START|STEP|END)\]' /tmp/out.log | cat -vet # shows hidden chars (BSD/macOS-safe; same idea as GNU cat -A)
Common format bugs:
- stray whitespace or extra text on a [STEP] line (shouldn't matter but sometimes graders are strict)
- True/False instead of true/false
- reward=0.5 instead of reward=0.50 (missing 2nd decimal)
- score=0.5 instead of score=0.500 (missing 3rd decimal)
- no [END] on exception paths

Check the inference.py log_start / log_step / log_end functions. Keep them simple f-strings.
Grader couldn't find 3 graded tasks. Our submission exposes them THREE ways:
- GET /tasks HTTP endpoint (in server/app.py)
- POST /grader HTTP endpoint (in server/app.py)
- tasks: list in the openenv.yaml manifest

Debug checklist:
# 1. Does /tasks return 3 tasks?
curl -sf https://arun-sanjay-dispatchpulse.hf.space/tasks | python3 -m json.tool | grep has_grader
# 2. Does /grader work for each task?
for t in easy medium hard; do
curl -sf -X POST https://arun-sanjay-dispatchpulse.hf.space/grader \
-H "Content-Type: application/json" -d "{\"task_id\":\"$t\",\"seed\":42}" \
| python3 -c "import sys,json; d=json.load(sys.stdin); print(f'{d[\"task_id\"]}: score={d[\"score\"]:.3f} passed={d[\"passed\"]}')"
done
# 3. Does openenv.yaml declare 3 tasks with has_grader: true?
grep -c "has_grader" openenv.yaml # should output 3
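For reference, the manifest shape that satisfies that grep count is roughly this. A sketch only — every field name here except tasks and has_grader is an assumption about the openenv.yaml schema, so compare against the real file before editing:

```yaml
tasks:
  - id: easy
    has_grader: true
  - id: medium
    has_grader: true
  - id: hard
    has_grader: true
```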
Fallback fix: create task_definitions.py at project root with a Python-level TASKS dict that mirrors the Calendar Scheduling passing submission pattern:
# task_definitions.py
from dataclasses import dataclass
@dataclass(frozen=True)
class TaskDefinition:
task_id: str
name: str
difficulty: str # "easy" | "medium" | "hard"
description: str
max_steps: int
TASKS = {
"easy": TaskDefinition(
task_id="easy", name="easy", difficulty="easy",
description="...", max_steps=30,
),
"medium": TaskDefinition(
task_id="medium", name="medium", difficulty="medium",
description="...", max_steps=45,
),
"hard": TaskDefinition(
task_id="hard", name="hard", difficulty="hard",
description="...", max_steps=60,
),
}
Then import it in server/app.py so static analyzers can trace the symbol.
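A sketch of that wiring — TaskDefinition is abbreviated here, and the real /tasks endpoint already exists in server/app.py, so this only illustrates deriving the payload from one importable symbol:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class TaskDefinition:  # abbreviated stand-in for the real class
    task_id: str
    difficulty: str
    max_steps: int

TASKS = {
    "easy": TaskDefinition("easy", "easy", 30),
    "medium": TaskDefinition("medium", "medium", 45),
    "hard": TaskDefinition("hard", "hard", 60),
}

def tasks_payload() -> dict:
    # every task advertises a grader, matching has_grader: true in openenv.yaml
    return {"tasks": [asdict(t) | {"has_grader": True} for t in TASKS.values()]}
```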
This is the vaguest failure mode. It usually means the model's output could not be turned into valid actions: the parser in inference.py::parse_action_text rejected what the LLM produced.

Fix priority:
- Make parse_action_text maximally lenient: strip "Action: ..." prefixes, tolerate trailing periods, accept "dispatch(CALL-001, ALS-1)" function-call syntax, and regex-match common variants. Here's a tested version:

import re
def parse_action_text(text: str) -> DispatchPulseAction:
"""Lenient parser: tolerates markdown, prefixes, function call syntax."""
text = (text or "").strip()
# Strip markdown code fences
text = re.sub(r"^```\w*\n?", "", text)
text = re.sub(r"\n?```$", "", text)
text = text.strip()
# Take first non-empty line
for line in text.splitlines():
line = line.strip().strip("`").strip()
if line:
text = line
break
# Strip common prefixes
for prefix in ("Action:", "action:", "ACTION:", "Response:", "> "):
if text.startswith(prefix):
text = text[len(prefix):].strip()
# Strip trailing period / quotes
text = text.rstrip(".\"' ")
# Try function-call syntax: dispatch(CALL-001, ALS-1)
match = re.match(r"(\w+)\s*\((.*)\)$", text)
if match:
fn = match.group(1).lower()
args = [a.strip().strip("'\"") for a in match.group(2).split(",")]
text = f"{fn} " + " ".join(args)
# Now use the existing space-separated parser
parts = text.split(maxsplit=4)
# ... rest of existing logic
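The pre-parse cleanup above can be exercised in isolation. This is a self-contained sketch that reproduces only the normalization steps and stops before constructing the project's DispatchPulseAction:

```python
import re

def normalize_action_text(text: str) -> str:
    """Lenient cleanup: fences, first line, prefixes, punctuation, call syntax."""
    text = (text or "").strip()
    text = re.sub(r"^```\w*\n?", "", text)   # strip opening code fence
    text = re.sub(r"\n?```$", "", text)      # strip closing code fence
    for line in text.splitlines():           # take first non-empty line
        line = line.strip().strip("`").strip()
        if line:
            text = line
            break
    for prefix in ("Action:", "action:", "ACTION:", "Response:", "> "):
        if text.startswith(prefix):
            text = text[len(prefix):].strip()
    text = text.rstrip(".\"' ")              # trailing period / quotes
    m = re.match(r"(\w+)\s*\((.*)\)$", text) # dispatch(CALL-001, ALS-1)
    if m:
        args = [a.strip().strip("'\"") for a in m.group(2).split(",")]
        text = f"{m.group(1).lower()} " + " ".join(args)
    return text
```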
- Set temperature to 0.0 (already done in current inference.py — verify).

Round 1 is submitted and frozen. For Round 2 prep:
git checkout -b round2-experiments
# Make experimental changes here
# Never merge back to main unless we know Round 1 is done with
Tell Arun he should not touch main until the Round 2 finale is over.
If Round 1 is passing and Arun asks for upgrades, the rules:
Treat these as load-bearing: inference.py, models.py, server/, simulation.py, reward.py, grader.py.

For risky changes: make them on a branch, run the full test suite, run the validator, run inference.py with the in-process fallback, AND open a new HF Space with a different name to test the Docker build in isolation. Only merge to main if Arun explicitly says so.
.venv/bin/python tests/test_reward.py
# Expected: "All reward tests passed!"
.venv/bin/python tests/test_simulation.py
# Expected: "All simulation tests passed!"
These are NOT pytest — they're module-level __main__ scripts. Do not run pytest on them.
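Their shape is roughly this (an illustrative sketch, not the real test content):

```python
# tests/test_reward.py-style layout: plain asserts, run as a script
def test_reward_is_non_negative():
    assert max(0.0, -1.0) == 0.0  # placeholder assertion

if __name__ == "__main__":
    test_reward_is_non_negative()
    print("All reward tests passed!")
```

Running them with pytest changes collection and output semantics, which is why the expected "All ... tests passed!" lines only appear when run as plain scripts.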
# Start server in background
ENABLE_WEB_INTERFACE=true .venv/bin/uvicorn server.app:app --host 127.0.0.1 --port 8765 > /tmp/dp.log 2>&1 &
SERVER_PID=$!
sleep 4
# Exercise every endpoint
curl -sf http://127.0.0.1:8765/health # {"status":"healthy"}
curl -sf http://127.0.0.1:8765/tasks | python3 -m json.tool
curl -sf http://127.0.0.1:8765/tasks/easy | python3 -m json.tool
curl -sf -X POST http://127.0.0.1:8765/reset \
-H "Content-Type: application/json" \
-d '{"task_name":"easy","seed":42}' | python3 -m json.tool | head -20
curl -sf -X POST http://127.0.0.1:8765/step \
-H "Content-Type: application/json" \
-d '{"action":{"action_type":"wait","minutes":2,"text":"wait 2","metadata":{}}}' | python3 -m json.tool | head -20
curl -sf -X POST http://127.0.0.1:8765/grader \
-H "Content-Type: application/json" \
-d '{"task_id":"easy","seed":42}' | python3 -m json.tool
# Cleanup
kill $SERVER_PID 2>/dev/null
BASE=https://arun-sanjay-dispatchpulse.hf.space
curl -sf $BASE/health
curl -sf $BASE/tasks | python3 -m json.tool | head -30
curl -sf -X POST $BASE/reset -H "Content-Type: application/json" -d '{"task_name":"easy","seed":42}' | head -c 300
curl -sf -X POST $BASE/grader -H "Content-Type: application/json" -d '{"task_id":"easy","seed":42}' | python3 -m json.tool
./scripts/validate-submission.sh https://arun-sanjay-dispatchpulse.hf.space .
All 3 checks must pass (or skip cleanly with WARN for docker/openenv if those CLIs aren't installed locally).
DISPATCHPULSE_TASK=easy .venv/bin/python inference.py 2>&1 | grep -E '^\[(START|STEP|END)\]'
Must produce exactly one [START] line, one or more [STEP] lines, and exactly one [END] line — all format-compliant.
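A small self-check over a captured log can enforce that invariant (a sketch; it validates only the tag sequence, not the field formats):

```python
import re

def check_episode_log(lines):
    """True iff exactly one [START], >=1 [STEP], exactly one [END], in order."""
    tags = [m.group(1) for line in lines
            if (m := re.match(r"^\[(START|STEP|END)\]", line))]
    return (tags.count("START") == 1
            and tags.count("END") == 1
            and "STEP" in tags
            and tags[0] == "START"
            and tags[-1] == "END")
```

For example, run it over the capture from earlier: check_episode_log(open('/tmp/out.log').read().splitlines()).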
curl -sf https://huggingface.co/api/spaces/Arun-Sanjay/dispatchpulse \
  | python3 -c "import sys,json; d=json.load(sys.stdin); rt=d.get('runtime',{}); print('stage:', rt.get('stage'), 'sha:', (rt.get('sha') or '')[:7], 'lastModified:', d.get('lastModified'))"
stage: RUNNING is what you want. RUNNING_APP_STARTING means a rebuild is in progress.
Small, logical chunks. Example:
git add <specific files>
git commit -m "<imperative verb> <thing>: <why>"
.venv/bin/python tests/test_reward.py && .venv/bin/python tests/test_simulation.py
./scripts/validate-submission.sh https://arun-sanjay-dispatchpulse.hf.space .
Both must pass before handing off push commands.
Never run these yourself. Give him the exact commands, with PASTE_HF_TOKEN_HERE and PASTE_GH_TOKEN_HERE placeholders:
# Push to HF Space (triggers rebuild)
git push https://Arun-Sanjay:[email protected]/spaces/Arun-Sanjay/dispatchpulse main
# Push to GitHub
git push https://Arun-Sanjay:[email protected]/Arun-Sanjay/DispatchPulse.git main
Remind him where to create fresh tokens:
Remind him to revoke both tokens immediately after the push. Tokens in command lines are in shell history and should be treated as burned.
If Arun accidentally pastes a token in chat: stop, warn him, tell him to invalidate it, refuse to use it.
After Arun confirms the pushes landed:
# Poll the HF Space until stage=RUNNING with the new commit sha
for i in 1 2 3 4 5; do
sleep 45
curl -sf https://huggingface.co/api/spaces/Arun-Sanjay/dispatchpulse \
  | python3 -c "import sys,json; d=json.load(sys.stdin); rt=d.get('runtime',{}); print(f'poll $i: stage={rt.get(\"stage\")} sha={rt.get(\"sha\",\"\")[:7]}')"
STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://arun-sanjay-dispatchpulse.hf.space/tasks)
if [ "$STATUS" = "200" ]; then
echo "LIVE"; break
fi
done
Give him the two URLs to paste:
- https://github.com/Arun-Sanjay/DispatchPulse
- https://huggingface.co/spaces/Arun-Sanjay/dispatchpulse

Never do:
- Hand-edit uv.lock, inference.py, openenv.yaml, or Dockerfile casually. All are load-bearing.
- Edit tasks/*.yaml. The scenarios are locked in for this submission.
- Rewrite the tests — they are __main__-style scripts.
- git push --force unless explicitly justified (the GitHub repo was force-pushed once to overwrite an auto-generated README — it should never happen again).

Gotchas:
- create_app vs create_fastapi_app: create_app is the right choice — it serves the Gradio UI at / when ENABLE_WEB_INTERFACE=true. create_fastapi_app is API-only and causes the "details not found" 404 at the root URL.
- The project builds on both the Environment base class + EnvClient base class. See commit 64d56f9.
- Missing uv.lock: the Phase 1 validator fails with "Missing uv.lock - run 'uv lock' to generate it". Always run .venv/bin/python -m pip install uv && .venv/bin/uv lock if uv.lock is missing.
- Set the ENABLE_WEB_INTERFACE=true env var when testing locally. Without it, create_app drops the Gradio routes.
- You'll be tempted to run the tests with pytest. Don't. The tests use if __name__ == "__main__": blocks and run as plain Python scripts.

If Arun has just opened a new chat and wants you to pick up where the previous one left off, tell him to paste this into the new chat:
I'm working on DispatchPulse at /Users/arunsanjay/Documents/Projects/DispatchPulse for the Meta PyTorch OpenEnv Hackathon India 2026. Round 1 submission is already in flight.

Please read these two files before doing anything:
- CLAUDE.md at the project root — full project context
- .claude/skills/dispatchpulse/SKILL.md — operational playbook

Then check the current state with `git log --oneline | head -5`, `git status`, and `curl -sf https://arun-sanjay-dispatchpulse.hf.space/health`.

Here's the current situation: [briefly describe — e.g. "Phase 2 email came back with X failure" or "Phase 2 passed, I want to prep Round 2" or "I want to add LLM baseline scores to README"]
What do you recommend as the next step?
That prompt plus the two files will get a new Claude session fully up to speed in under 30 seconds of reading.
End of SKILL.md. Load this alongside CLAUDE.md at the start of every session working on DispatchPulse.