Implement Hybrid live-review runner, artifact, session-targeting, and mismatch-truth fixes with bridge-first validation discipline.
NOTE: Startup and cleanup are handled by worker-base. This skill defines the work procedure.
Use for features that primarily change:
packages/sdk/tests/word-benchmark/**
Do not use for core runtime/hook/verifier execution-contract features unless they are strictly needed to keep live classifications truthful.
Test-Driven Development: invoke before code changes; add or update failing tests first.
Systematic Debugging: invoke if runner behavior, receipt classification, or session routing is unclear.
bridge-monitoring: invoke when a Hybrid session is available; use bridge-first checks and explicit session/document routing.
mission.md, mission AGENTS.md, and .factory/library/user-testing.md plus .factory/library/live-review.md.
packages/sdk/tests/word-agent-benchmark-suite.test.ts plus any directly relevant helper tests.
state, events, metadata, summary, and artifacts
pnpm typecheck if production code changed
{
  "salientSummary": "Hardened Hybrid live-review session routing and reviewer-only classification so the runner fails closed on wrong targets and no longer treats reviewer passes as mutation success. Targeted benchmark and bridge tests passed; live verification was deferred because no Hybrid session was connected.",
  "whatWasImplemented": "Updated the live-review runner to keep metadata/state/events tied to the same resolved Hybrid session, tightened reviewer receipt classification, and corrected mismatch logic so reviewer-only success is treated as non-mutation evidence. Added focused tests around session selection, artifact expectations, and classification branches.",
  "whatWasLeftUndone": "A real Hybrid live run is still needed to confirm the runner behavior against an actual pane/session on 4018.",
  "verification": {
    "commandsRun": [
      {
        "command": "pnpm --filter @office-agents/sdk exec vitest run tests/word-agent-benchmark-suite.test.ts",
        "exitCode": 0,
        "observation": "Live-review contract tests passed."
      },
      {
        "command": "pnpm --filter @office-agents/bridge exec vitest run tests/session-selection.test.ts tests/cli-commands.test.ts",
        "exitCode": 0,
        "observation": "Bridge selection and CLI tests passed."
      },
      {
        "command": "pnpm typecheck",
        "exitCode": 0,
        "observation": "Typecheck passed for the touched code."
      }
    ],
    "interactiveChecks": [
      {
        "action": "Attempted bridge-first Hybrid validation on https://localhost:4018",
        "observed": "No connected Hybrid session was available, so live runner verification was deferred."
      }
    ]
  },
  "tests": {
    "added": [
      {
        "file": "packages/sdk/tests/word-agent-benchmark-suite.test.ts",
        "cases": [
          {
            "name": "reviewer-only success is non-mutation evidence",
            "verifies": "Mismatch and issue logic no longer treat reviewer-only pass as live mutation success."
          }
        ]
      }
    ]
  },
  "discoveredIssues": [
    {
      "severity": "medium",
      "description": "Hybrid session availability remains an external dependency for end-to-end live validation."
    }
  ]
}
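
The two behaviors named in the example handoff, failing closed on a wrong session target and treating a reviewer-only pass as non-mutation evidence, could look roughly like the sketch below. Every name here (`Session`, `Receipt`, `resolveSession`, `classifyReceipts`, the `role` values) is hypothetical and chosen for illustration; the actual runner types and helpers in the repository may differ.

```typescript
// Hypothetical shapes for illustration only; not the real runner types.
interface Session {
  id: string;
  documentId: string;
  connected: boolean;
}

interface Receipt {
  role: "reviewer" | "mutator";
  passed: boolean;
}

type Classification = "mutation-success" | "reviewer-only" | "failure";

// Fail closed: succeed only when exactly one connected session matches
// the explicitly requested session id and document id.
function resolveSession(
  sessions: Session[],
  target: { sessionId: string; documentId: string },
): Session {
  const matches = sessions.filter(
    (s) =>
      s.connected &&
      s.id === target.sessionId &&
      s.documentId === target.documentId,
  );
  const [only] = matches;
  if (matches.length !== 1 || only === undefined) {
    throw new Error(
      `Expected exactly one matching session, found ${matches.length}`,
    );
  }
  return only;
}

// A reviewer pass alone never counts as mutation success.
function classifyReceipts(receipts: Receipt[]): Classification {
  const mutators = receipts.filter((r) => r.role === "mutator");
  if (mutators.some((r) => !r.passed)) return "failure";
  if (mutators.length > 0) return "mutation-success";
  const reviewerPassed = receipts.some(
    (r) => r.role === "reviewer" && r.passed,
  );
  return reviewerPassed ? "reviewer-only" : "failure";
}
```

In this sketch a missing, disconnected, or ambiguous target throws instead of falling back to another session, and mismatch logic can key off `"reviewer-only"` without confusing it with a live mutation result.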