Deploy agent-services to the infra VM. Use when shipping new code, restarting services, or rolling back a broken deploy. Covers pre-deploy snapshot, pull/build/restart, smoke tests, and rollback.
Step-by-step procedure for deploying agent-services to the infra VM. Follow this exactly — skipping steps has caused real incidents (unauthenticated servers, unrecoverable state).
You need:
- VERS_AUTH_TOKEN value (check with the orchestrator; if lost, all agents need reconfiguration)
- SSH access to the infra VM (root@{INFRA_VM_ID}.vm.vers.sh)
Do this FIRST. Every time. No exceptions.
Before touching anything, snapshot the current state so you can roll back:
# From the orchestrator or any agent with vers tools
vers_vm_commit --vmId {INFRA_VM_ID}
# → returns a commit ID, e.g., "commit-abc123"
Record the commit in the ledger (if the service is still running):
curl -X POST "http://{INFRA_VM_ID}.vm.vers.sh:3000/commits" \
-H "Authorization: Bearer $VERS_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"commitId": "commit-abc123",
"vmId": "{INFRA_VM_ID}",
"label": "pre-deploy-backup",
"agent": "your-agent-name",
"tags": ["backup", "infra", "pre-deploy"]
}'
Save the commit ID. You will need it if the deploy goes wrong.
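The ledger request above can be generated from the commit ID instead of pasting it into the JSON by hand. This is a minimal sketch, not part of the service: ledger_payload is a hypothetical helper name, the field names are copied from the request above, and it assumes the arguments contain no double quotes or backslashes, since no JSON escaping is done.

```shell
# Hypothetical helper: build the pre-deploy ledger payload shown above.
# Assumption: arguments contain no double quotes or backslashes (no JSON escaping).
ledger_payload() {
  local commit_id="$1"
  local vm_id="$2"
  local agent="$3"
  printf '{"commitId":"%s","vmId":"%s","label":"pre-deploy-backup","agent":"%s","tags":["backup","infra","pre-deploy"]}\n' \
    "$commit_id" "$vm_id" "$agent"
}

# Usage (curl reads the request body from stdin with -d @-):
#   ledger_payload commit-abc123 {INFRA_VM_ID} your-agent-name | curl -X POST ... -d @-
```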
ssh -o StrictHostKeyChecking=no root@{INFRA_VM_ID}.vm.vers.sh
cd /root/workspace/vers-agent-services
git fetch origin
git checkout main
git reset --hard origin/main
Using reset --hard forces the working tree to match origin/main: no merge conflicts, no stale local edits to tracked files. Untracked files are left in place.
npm install
npm run build
Watch for errors. If npm run build (TypeScript compilation) fails, do not proceed — the deploy will serve stale code from the old dist/.
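One way to make "do not proceed" mechanical is to gate the restart on the exit status of the build. The sketch below assumes only that npm install and npm run build exit non-zero on failure; build_or_abort is a hypothetical name.

```shell
# Hypothetical helper: stop the deploy if install or build fails,
# so a stale dist/ is never restarted by accident.
build_or_abort() {
  npm install || { echo "npm install failed; not restarting" >&2; return 1; }
  npm run build || { echo "npm run build failed; not restarting" >&2; return 1; }
}

# Usage: build_or_abort && echo "build ok, safe to restart"
```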
Kill the old process and start a new one with the required env vars:
# Stop the running server
pkill -f 'node dist/server.js' || true
sleep 1
# Start with required env vars (replace {VERS_AUTH_TOKEN_VALUE} with the real token)
export VERS_AUTH_TOKEN='{VERS_AUTH_TOKEN_VALUE}'
nohup env VERS_AUTH_TOKEN="$VERS_AUTH_TOKEN" node dist/server.js > /tmp/agent-services.log 2>&1 &
# Verify it started
sleep 2
cat /tmp/agent-services.log
You should see: vers-agent-services running on :3000
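A fixed sleep can be too short on a slow start. As an alternative, the sketch below polls the log for the startup line until a timeout. wait_for_log_line is a hypothetical helper; the log path and message are the ones shown above.

```shell
# Hypothetical helper: poll a log file for a fixed string, once per second.
# Returns 0 as soon as the string appears, 1 after roughly <timeout> seconds.
wait_for_log_line() {
  local file="$1"
  local pattern="$2"
  local timeout="${3:-10}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if grep -qF -- "$pattern" "$file" 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Usage: wait_for_log_line /tmp/agent-services.log 'running on :3000' 15
```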
curl -s http://localhost:3000/health
Expected: {"status":"ok","uptime":...}
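The expected response can be checked by a script rather than by eye. This sketch assumes the body is JSON with a top-level status field as shown above, and uses a simple pattern match instead of a real JSON parser; health_ok is a hypothetical name.

```shell
# Hypothetical helper: succeed only if the response body contains "status":"ok".
# Assumption: pattern matching is good enough here; this is not a JSON parser.
health_ok() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage: health_ok "$(curl -s http://localhost:3000/health)" || echo "health check failed"
```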
# Board
curl -s -H "Authorization: Bearer $VERS_AUTH_TOKEN" http://localhost:3000/board/tasks | head -c 200
# Feed
curl -s -H "Authorization: Bearer $VERS_AUTH_TOKEN" http://localhost:3000/feed/events | head -c 200
# Registry
curl -s -H "Authorization: Bearer $VERS_AUTH_TOKEN" http://localhost:3000/registry/vms | head -c 200
# Commits
curl -s -H "Authorization: Bearer $VERS_AUTH_TOKEN" http://localhost:3000/commits | head -c 200
Each should return JSON with the expected structure. If any returns 401, the token you sent is wrong or missing (the auth middleware itself is working). If any returns nothing or errors, check the logs:
tail -50 /tmp/agent-services.log
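The four endpoint checks above can be run as one loop that reports HTTP status codes. This sketch assumes a healthy endpoint answers with HTTP 200, which the text above does not state explicitly; smoke_test is a hypothetical name, and the paths are the ones listed above.

```shell
# Hypothetical helper: hit each authenticated endpoint and report its status code.
# Assumption: a healthy endpoint returns HTTP 200.
smoke_test() {
  local base="${1:-http://localhost:3000}"
  local failed=0
  local path code
  for path in /board/tasks /feed/events /registry/vms /commits; do
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      -H "Authorization: Bearer ${VERS_AUTH_TOKEN:-}" "$base$path")
    if [ "$code" = "200" ]; then
      echo "ok   $path"
    else
      echo "FAIL $path (HTTP $code)"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: smoke_test http://localhost:3000 || tail -50 /tmp/agent-services.log
```

A 401 in the output points at the token, while 000 means curl could not connect at all.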
From the orchestrator or another agent, test the external URL:
curl -s -H "Authorization: Bearer $VERS_AUTH_TOKEN" \
http://{INFRA_VM_ID}.vm.vers.sh:3000/health
After a successful deploy, take a new snapshot (vers_vm_commit) and record it with a post-deploy tag.
If the deploy is broken:
# From the orchestrator (not from the broken VM)
vers_vm_restore --commitId {PRE_DEPLOY_COMMIT_ID}
# → creates a new VM with the old state
# Update DNS/references to point to the new VM ID
This gives you a brand new VM with the exact pre-deploy state. The old broken VM can be deleted.
Alternatively, if the VM itself is healthy and only the code is bad, roll back in place with git:
ssh -o StrictHostKeyChecking=no root@{INFRA_VM_ID}.vm.vers.sh
cd /root/workspace/vers-agent-services
git log --oneline -5 # find the last known good commit
git reset --hard {GOOD_COMMIT_SHA}
npm install && npm run build
pkill -f 'node dist/server.js' || true
sleep 1
# VERS_AUTH_TOKEN must already be exported in this shell, or the server starts unauthenticated
nohup env VERS_AUTH_TOKEN="$VERS_AUTH_TOKEN" node dist/server.js > /tmp/agent-services.log 2>&1 &
What happens: The server starts but with no auth — all endpoints are open to anyone. You'll see this warning in the logs:
⚠️ VERS_AUTH_TOKEN is not set — all endpoints are unauthenticated.
Fix: Kill the server, set the env var, restart.
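A small guard before the start command helps catch this before the server is exposed. The sketch below only checks that the variable is non-empty in the current shell; require_token is a hypothetical name.

```shell
# Hypothetical guard: refuse to continue when VERS_AUTH_TOKEN is unset or empty.
require_token() {
  if [ -z "${VERS_AUTH_TOKEN:-}" ]; then
    echo "VERS_AUTH_TOKEN is empty; refusing to start the server" >&2
    return 1
  fi
}

# Usage: require_token && nohup env VERS_AUTH_TOKEN="$VERS_AUTH_TOKEN" node dist/server.js > /tmp/agent-services.log 2>&1 &
```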
What happens: If the deploy breaks, you have no rollback point. You're stuck debugging a broken VM under pressure.
Fix: Always snapshot before deploying. Make it muscle memory. The commit ledger exists precisely for this.
What happens: npm run build fails but you restart anyway. The server runs old compiled code from dist/, which may not match the new source. Subtle bugs ensue.
Fix: Never restart after a failed build. Fix the build error first, or roll back.
What happens: pkill didn't fully kill the old process. The new server fails to bind to port 3000.
Fix:
# Find what's using port 3000
lsof -i :3000
# Force kill
kill -9 $(lsof -t -i :3000)
sleep 1
# Restart
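Rather than relying on a fixed sleep after the kill, the restart can wait until nothing is listening on the port. This sketch relies on lsof exiting non-zero when it finds no matching process; wait_for_port_free is a hypothetical name.

```shell
# Hypothetical helper: wait until no process holds the given port.
# Returns 0 once the port is free, 1 if it is still in use after <timeout> seconds.
wait_for_port_free() {
  local port="$1"
  local timeout="${2:-10}"
  local waited=0
  while lsof -t -i ":$port" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage: wait_for_port_free 3000 10 || echo "port 3000 still in use"
```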
What happens: You push untested code and deploy it. The build passes but runtime errors break endpoints.
Fix: Always run npm test locally (or on a branch VM) before deploying. The infra VM is shared — breaking it affects all agents.
What happens: The server crashes on first write because data/ doesn't exist or has wrong permissions.
Fix: The stores auto-create directories, but if you see file permission errors:
mkdir -p /root/workspace/vers-agent-services/data
chmod 755 /root/workspace/vers-agent-services/data
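The same fix can be wrapped in a check that confirms the directory is writable afterwards. This is a sketch under the assumption that the server runs as the same user who runs the check; ensure_data_dir is a hypothetical name.

```shell
# Hypothetical helper: create the data directory, set its mode, confirm it is writable.
ensure_data_dir() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  chmod 755 "$dir" || return 1
  [ -w "$dir" ]
}

# Usage: ensure_data_dir /root/workspace/vers-agent-services/data || echo "data dir not writable"
```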