Babysit a GitHub pull request after creation by continuously polling review comments, CI checks/workflow runs, and mergeability state until the PR is merged/closed or user help is required. Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues when appropriate, and keep watching open PRs so fresh review feedback is surfaced promptly. Use when the user asks Codex to monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR.
Babysit a PR persistently until one of these terminal outcomes occurs: the PR is merged or closed, or a blocker is reached that requires user help.
Do not stop merely because a single snapshot returns idle while checks are still pending.
Accept any of the following:
No PR argument: let the watcher infer the PR from the current branch (--pr auto).
A PR number or URL (--pr <number-or-url>).
Workflow:
When asked to monitor a PR, start the watcher in continuous mode (--watch) unless you are intentionally doing a one-shot diagnostic snapshot.
Read the current PR state from the watcher (a --once snapshot, or each snapshot streamed by --watch).
Inspect the actions list in the JSON response.
If diagnose_ci_failure is present, inspect failed run logs and classify the failure.
If process_review_comment is present, inspect surfaced review items and decide whether to address them.
If you reply on GitHub, prefix the reply with [codex] so it is clear the response is automated. If the watcher later surfaces your own reply, treat that self-authored item as already handled and do not reply again.
If retry_failed_checks is present, rerun failed jobs with --retry-failed-now.
If actionable review feedback and retry_failed_checks are both present, prioritize the review feedback. A new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
Check mergeability and merge-conflict status (for example via gh pr view) alongside CI.
If you were running --watch before pausing to patch/commit/push, relaunch --watch yourself in the same turn immediately after the push (do not wait for the user to re-invoke the skill).
Keep polling until stop_pr_closed appears or a user-help-required blocker is reached. A green, review-clean, mergeable PR is a progress milestone, not a reason to stop the watcher while the PR is still open.
Do not leave a detached --watch process running and then end the turn as if monitoring were complete.
Commands:
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --once
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --retry-failed-now
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr <number-or-url> --once
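The action-handling rules above (diagnose, review, retry, and review-before-retry prioritization) can be sketched as a small dispatcher. This is only an illustration. It assumes each snapshot is a dict with an "actions" list of action-name strings. The function name next_steps, the step names it returns, and the exact JSON shape are hypothetical and not taken from gh_pr_watch.py.

```python
# Hypothetical dispatcher for one watcher snapshot; not part of gh_pr_watch.py.
# Assumes the snapshot JSON has an "actions" list of action-name strings.
MAX_FLAKY_RETRIES = 3  # "retry likely flaky failures up to 3 times"


def next_steps(snapshot: dict, has_actionable_review: bool, flaky_retries_used: int) -> list:
    actions = set(snapshot.get("actions", []))
    if "stop_pr_closed" in actions:
        return ["stop"]  # strict stop: the PR was merged or closed
    steps = []
    if "process_review_comment" in actions:
        steps.append("triage_review_comments")
    if "diagnose_ci_failure" in actions:
        steps.append("diagnose_ci_logs")
    if "retry_failed_checks" in actions:
        if has_actionable_review:
            # A review-fix commit will retrigger CI, so do not rerun on the old SHA.
            steps.append("defer_retry_until_new_sha")
        elif flaky_retries_used >= MAX_FLAKY_RETRIES:
            # Retry budget exhausted: treat it as a user-help-required blocker.
            return steps + ["ask_user"]
        else:
            steps.append("retry_failed_now")  # run the watcher with --retry-failed-now
    steps.append("keep_polling")  # also covers "idle" while checks are pending
    return steps
```

In this sketch, has_actionable_review would come from triaging the surfaced review items, and flaky_retries_used feeds the "Flaky retry cycles used" line of the final summary.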
Use gh commands to inspect failed runs before deciding to rerun.
gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha
gh run view <run-id> --log-failed
Prefer treating failures as branch-related when logs point to changed code (compile/test/lint/typecheck/snapshot/static-analysis failures in touched areas).
Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).
If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.
Read .codex/skills/babysit-pr/references/heuristics.md for a concise checklist.
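The branch-versus-flaky split described above can be sketched as a keyword classifier over a failed job's log (for example, the output of gh run view <run-id> --log-failed). The pattern lists below are illustrative assumptions. They are not the contents of heuristics.md, and classify_failure is a hypothetical name. Anything the sketch cannot place confidently falls through to "ambiguous", which corresponds to the one manual diagnosis attempt before a rerun.

```python
import re

# Illustrative patterns only; they are assumptions, not the contents of
# references/heuristics.md. Tune them against real CI logs.
FLAKY_PATTERNS = [  # matched case-insensitively
    r"timed? ?out",
    r"ETIMEDOUT|ECONNRESET|ECONNREFUSED",
    r"rate limit",
    r"could not resolve host",
    r"lost communication with the server",
]
BRANCH_PATTERNS = [  # matched case-sensitively
    r"\berror(\[E\d+\])?:",          # gcc/clang/rustc-style compiler errors
    r"AssertionError|assertion failed",
    r"\bFAILED\b",                   # pytest / cargo test failure lines
]


def classify_failure(log_text: str, touched_paths: list) -> str:
    """Return 'branch', 'flaky', or 'ambiguous' for one failed job's log."""
    in_touched_code = any(path in log_text for path in touched_paths)
    branch_hit = any(re.search(p, log_text) for p in BRANCH_PATTERNS)
    flaky_hit = any(re.search(p, log_text, re.IGNORECASE) for p in FLAKY_PATTERNS)
    if branch_hit and in_touched_code:
        return "branch"     # fix on the PR branch, commit, push
    if flaky_hit and not branch_hit:
        return "flaky"      # candidate for --retry-failed-now
    return "ambiguous"      # do one manual diagnosis pass before rerunning
```

Requiring a touched path before answering "branch" mirrors the "in touched areas" qualifier. A test failure in untouched code stays ambiguous rather than being auto-fixed.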
The watcher surfaces review comments and reviews on the PR.
It intentionally surfaces Codex reviewer bot feedback (for example comments/reviews from chatgpt-codex-connector[bot]) in addition to human reviewer feedback. Most unrelated bot noise should still be ignored.
For safety, the watcher only auto-surfaces trusted human review authors (for example repo OWNER/MEMBER/COLLABORATOR, plus the authenticated operator) and approved review bots such as Codex.
On a fresh watcher state file, existing pending review feedback may be surfaced immediately (not only comments that arrive after monitoring starts). This is intentional so already-open review comments are not missed.
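The trust rules above can be sketched as a filter over normalized review items. GitHub's REST API does report an author_association field (OWNER, MEMBER, COLLABORATOR, CONTRIBUTOR, NONE, and others) on comments and reviews. However, the item dict shape used here ("author", "author_association", "is_bot") and the helper names are hypothetical, not the watcher's actual data model.

```python
# Sketch of the trust filter; the item shape is a hypothetical normalization.
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}
APPROVED_REVIEW_BOTS = {"chatgpt-codex-connector[bot]"}


def is_trusted(item: dict, operator_login: str) -> bool:
    author = item.get("author", "")
    if author in APPROVED_REVIEW_BOTS:
        return True   # Codex reviewer feedback is surfaced on purpose
    if item.get("is_bot"):
        return False  # other bot noise is ignored
    if author == operator_login:
        return True   # the authenticated operator is trusted
    return item.get("author_association") in TRUSTED_ASSOCIATIONS


def surface(items: list, operator_login: str) -> list:
    """Keep only items from trusted humans, the operator, or approved bots."""
    return [item for item in items if is_trusted(item, operator_login)]
```

Checking the approved-bot list before the generic bot check keeps Codex reviews visible while still dropping unrelated bots.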
When you agree with a comment and it is actionable:
Commit the fix with the message codex: address PR review feedback (#<n>).
If you were monitoring in --watch mode, restart --watch immediately after the push in the same turn; do not wait for the user to ask again.
If you disagree or the comment is non-actionable/already addressed, reply once directly on the GitHub comment/thread so the reviewer gets an explicit answer, then continue the watcher loop. Prefix any GitHub reply to a code review comment/thread with [codex] so it is clear the response is automated and not from the human user. If the watcher later surfaces your own reply because the authenticated operator is treated as a trusted review author, treat that self-authored item as already handled and do not reply again.
If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
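The per-item decision above (ignore resolved threads, skip your own [codex] replies, otherwise decide whether to fix or reply once) can be sketched as follows. The keys "author", "body", and "is_resolved" are hypothetical, as are the function names. The prefix check exists because the authenticated operator may also be the human user, whose own unprefixed comments still deserve attention.

```python
REPLY_PREFIX = "[codex]"


def triage(item: dict, operator_login: str) -> str:
    """Return 'ignore', 'handled', or 'needs_decision' for one surfaced item.

    Assumes hypothetical keys: "author", "body", and "is_resolved".
    """
    if item.get("is_resolved"):
        return "ignore"      # resolved threads are non-actionable
    if item.get("author") == operator_login and item.get("body", "").startswith(REPLY_PREFIX):
        return "handled"     # our own earlier automated reply; never answer it
    return "needs_decision"  # agree: fix, commit, push; disagree: reply once


def format_reply(text: str) -> str:
    # Every automated reply on a review comment/thread carries the [codex] prefix.
    return text if text.startswith(REPLY_PREFIX) else f"{REPLY_PREFIX} {text}"
```
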
Push fixes with git push, then re-run the watcher.
If you interrupted an active --watch session to make the fix, restart --watch immediately after the push in the same turn.
Do not run multiple concurrent --watch processes for the same PR/state file; keep one watcher session active and reuse it until it stops or you intentionally restart it.
Commit message defaults:
codex: fix CI failure on PR #<n>
codex: address PR review feedback (#<n>)
Use this loop in a live Codex session:
Run the watcher with --once.
Read the actions list.
Use --retry-failed-now only when retry_failed_checks is present and you are not about to replace the current SHA with a review/CI fix commit.
Repeat, or resume live monitoring (--watch), in the same turn unless a strict stop condition has already been reached.
When the user explicitly asks to monitor/watch/babysit a PR, prefer --watch so polling continues autonomously in one command. Use repeated --once snapshots only for debugging, local testing, or when the user explicitly asks for a one-shot check.
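The fix cycle described in this section (commit with a default message, git push, then relaunch the watcher in the same turn) can be sketched as a builder that returns argv lists without running them. The helper name and the COMMIT_MESSAGES mapping are hypothetical. The sketch also assumes the fix is already staged.

```python
# Hypothetical helper: build (but do not run) the commands for one fix cycle.
WATCHER = ".codex/skills/babysit-pr/scripts/gh_pr_watch.py"

COMMIT_MESSAGES = {
    "ci": "codex: fix CI failure on PR #{n}",
    "review": "codex: address PR review feedback (#{n})",
}


def fix_cycle_commands(kind: str, pr_number: int) -> list:
    message = COMMIT_MESSAGES[kind].format(n=pr_number)
    return [
        ["git", "commit", "-m", message],  # assumes the fix is already staged
        ["git", "push"],                   # push the PR branch
        # Relaunch the single watcher session right away, in the same turn.
        ["python3", WATCHER, "--pr", str(pr_number), "--watch"],
    ]
```

Each list could be passed to subprocess.run(argv, check=True). The last one is long-running, so its output must be consumed until a strict stop condition rather than abandoned.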
Do not stop to ask the user whether to continue polling; continue autonomously until a strict stop condition is met or the user explicitly interrupts.
Do not hand control back to the user after a review-fix push just because a new SHA was created; restarting the watcher and re-entering the poll loop is part of the same babysitting task.
If a --watch process is still running and no strict stop condition has been reached, the babysitting task is still in progress; keep streaming/consuming watcher output instead of ending the turn.
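Consuming a running watcher until a strict stop condition can be sketched as a generator over its output lines. This assumes --watch prints one JSON object per line with an "actions" list. That output format is an assumption, and consume_watch_stream is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def consume_watch_stream(lines: Iterable[str]) -> Iterator[dict]:
    """Yield watcher snapshots until one reports stop_pr_closed.

    Assumes --watch prints one JSON object per line; anything else is skipped.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            snapshot = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore non-JSON log noise
        if not isinstance(snapshot, dict):
            continue
        yield snapshot
        if "stop_pr_closed" in snapshot.get("actions", []):
            return  # strict stop condition reached; only now may the turn end
```

In a live session, the generator could read proc.stdout from subprocess.Popen(["python3", ".codex/skills/babysit-pr/scripts/gh_pr_watch.py", "--pr", "auto", "--watch"], stdout=subprocess.PIPE, text=True). Taking any iterable of lines keeps the loop testable without a real PR.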
Keep review polling aggressive, and continue monitoring even after CI turns green.
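Stop only when one of the following is true:
The watcher reports stop_pr_closed (the PR was merged or closed).
A blocker is reached that requires user help.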
Keep polling when:
actions contains only idle but checks are still pending.
The PR is still awaiting review approval (REVIEW_REQUIRED / similar); continue polling at the base cadence and surface any new review comments without asking for confirmation to keep watching.
Provide concise progress updates while monitoring and a final summary that includes:
Final PR SHA
CI status summary
Mergeability / conflict status
Fixes pushed
Flaky retry cycles used
Remaining unresolved failures or review comments
During long unchanged monitoring periods, avoid emitting a full update on every poll; summarize only status changes plus occasional heartbeat updates.
Treat push confirmations, intermediate CI snapshots, ready-to-merge snapshots, and review-action updates as progress updates only; do not emit the final summary or end the babysitting session unless a strict stop condition is met.
A user request to "monitor" is not satisfied by a couple of sample polls; remain in the loop until a strict stop condition or an explicit user interruption.
A review-fix commit + push is not a completion event; immediately resume live monitoring (--watch) in the same turn and continue reporting progress updates.
When CI first transitions to all green for the current SHA, emit a one-time celebratory progress update (do not repeat it on every green poll). Preferred style: 🚀 CI is all green! 33/33 passed. Still on watch for review approval.
Do not send the final summary while a watcher terminal is still running unless the watcher has emitted/confirmed a strict stop condition; otherwise continue with progress updates.
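The update-throttling guidance (report status changes, send occasional heartbeats, and celebrate all-green only once per SHA) can be sketched as a small stateful reporter. The snapshot keys ("head_sha", "checks_passed", "checks_total", "status"), the class name, and the 600-second heartbeat default are hypothetical choices for illustration.

```python
import time
from typing import Optional


class ProgressReporter:
    """Emit updates only on status changes, plus occasional heartbeats.

    Hypothetical snapshot keys: head_sha, checks_passed, checks_total, status.
    """

    def __init__(self, heartbeat_seconds: float = 600.0):
        self.heartbeat_seconds = heartbeat_seconds  # arbitrary example interval
        self.last_status: Optional[str] = None
        self.last_emit = 0.0
        self.celebrated_shas: set = set()

    def update(self, snap: dict, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        sha = snap["head_sha"]
        passed, total = snap["checks_passed"], snap["checks_total"]
        if total and passed == total and sha not in self.celebrated_shas:
            # First all-green poll for this SHA: celebrate exactly once.
            self.celebrated_shas.add(sha)
            self.last_status, self.last_emit = snap["status"], now
            return f"🚀 CI is all green! {passed}/{total} passed. Still on watch for review approval."
        if snap["status"] != self.last_status:
            self.last_status, self.last_emit = snap["status"], now
            return f"Status on {sha[:7]}: {snap['status']}"
        if now - self.last_emit >= self.heartbeat_seconds:
            self.last_emit = now  # heartbeat during long unchanged periods
            return f"Still watching {sha[:7]}: {snap['status']}"
        return None  # unchanged and no heartbeat due: stay quiet
```

Keying the celebration on the SHA means a new push that turns green again earns one fresh update, while repeated green polls on the same commit stay silent.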
References:
.codex/skills/babysit-pr/references/heuristics.md
.codex/skills/babysit-pr/references/github-api-notes.md