Pre-launch, staged rollout, rollback.
This skill closes the last mile of the SDD pipeline: getting verified code to production safely. The core principle is that every launch should be reversible, observable, and incremental. A launch is not the end of the cycle — it is the beginning of the observability phase.
This skill activates after sdd-verify reports a green Pyramid (L1-L3 pass) and works alongside ci-cd-pipeline, which handles deployment mechanics. shipping-and-launch handles strategy: when to advance, when to hold, when to roll back, and what to monitor in the first hour.
Run this checklist in full before cutting any production release. Each category must be green before advancing to the next deployment stage.
- Code Quality
- Security
- Performance
- Infrastructure
- Documentation
Deploy code behind a flag so you can enable the feature independently from deploying the code. Think of it as separating the act of "putting the code in production" from the act of "turning the feature on." This gives you a kill switch with zero downtime.
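A minimal sketch of such a kill switch, assuming a hypothetical `FEATURE_*` environment-variable convention (a dedicated flag service works the same way):

```python
import os

def feature_enabled(name: str) -> bool:
    # The code ships either way; this lookup is the switch.
    # Flipping the env var (or a flag-service value) needs no redeploy,
    # which is what makes it a zero-downtime kill switch.
    return os.environ.get(f"FEATURE_{name.upper()}", "off") == "on"

if feature_enabled("new_checkout"):  # hypothetical feature name
    pass  # new code path
else:
    pass  # old code path, still deployed and reachable
```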
ROLLOUT SEQUENCE
────────────────
1. Deploy with flag OFF (code is live but feature is invisible)
2. Enable for internal team (validate in real production environment)
3. Enable for 5% of users — canary (watch error rates for 30 min)
4. Advance to 25% → monitor for 2 hours
5. Advance to 50% → monitor for 4 hours
6. Advance to 100% → remove flag from code in next sprint
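The percentage stages above only work if user bucketing is stable: a user admitted at 5% must stay admitted at 25%. One common sketch, hashing a hypothetical user id with the feature name:

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    # Hash to a stable bucket in 0..99; the same user always lands in
    # the same bucket, so raising `percent` only ever adds users --
    # nobody flips in and out between stages.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 100 < percent

# Stage 3 of the sequence: a 5% canary check (illustrative ids)
canary = in_rollout("user-42", "new-checkout", 5)
```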
Flag ownership rules:
Before advancing to the next percentage stage, measure against these thresholds. Do not proceed on schedule — proceed on signal.
| Signal | Threshold | Action |
|---|---|---|
| Error rate | Within 20% of pre-launch baseline | Advance |
| Error rate | 20-100% above baseline | HOLD — investigate before advancing |
| Error rate | More than 2x baseline | ROLLBACK immediately |
| P95 latency | Within 20% of baseline | Advance |
| P95 latency | 20-50% above baseline | HOLD — investigate before advancing |
| P95 latency | More than 50% above baseline | ROLLBACK immediately |
| Key conversion metric | Within 10% of baseline | Advance |
| Key conversion metric | More than 20% drop | HOLD — investigate |
HOLD means: freeze the rollout at current percentage, open an incident, investigate root cause. Do not roll back unless thresholds cross into the ROLLBACK zone — a hold preserves the 5% canary while you gather signal.
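The table collapses into one decision function. A sketch, assuming metrics are expressed as ratios against the pre-launch baseline and treating the latency band between Advance and ROLLBACK as HOLD (the conversion metric would follow the same pattern):

```python
def rollout_decision(err_ratio: float, p95_ratio: float) -> str:
    # Ratios are current / pre-launch baseline, e.g. 1.2 == 20% above.
    if err_ratio > 2.0 or p95_ratio > 1.5:
        return "ROLLBACK"  # hard thresholds crossed: revert now
    if err_ratio > 1.2 or p95_ratio > 1.2:
        return "HOLD"      # freeze at current percentage, investigate
    return "ADVANCE"       # within 20% of baseline on both signals
```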
A rollback plan written during an incident is a plan written under panic. Write it before you deploy. The plan must answer four questions: what triggers a rollback, what the exact steps are, whether the database migrations can come back with it, and who executes it and how fast:
ROLLBACK PLAN TEMPLATE
──────────────────────
Trigger conditions:
- Error rate exceeds [X]% for [Y] minutes
- Latency P95 exceeds [Z]ms sustained for [W] minutes
- [Business metric] drops below [threshold]
Rollback steps:
1. Disable feature flag OR redeploy previous image tag [specify which]
2. [Any manual steps, e.g., flush a cache, revert a config value]
3. Confirm health endpoint returns 200 post-rollback
4. Notify team in #incidents channel
Database considerations:
- Are the migrations applied in this release reversible? [YES / NO]
- If NO: what is the data mitigation plan? [describe]
- If additive columns: rollback is safe — old code ignores new columns
- If destructive migrations: rollback requires a separate forward migration
Estimated rollback time: [X] minutes
Who executes: [role or person on-call]
If the migration is NOT reversible, that must be discovered BEFORE the deployment window — not during an incident.
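Steps 1 and 3 of the template can be scripted ahead of the deployment window. A sketch with the flag-disable and health-check calls injected as parameters, since the real endpoints are deployment-specific:

```python
import time

def execute_rollback(disable_flag, check_health,
                     timeout_s: int = 120, poll_s: int = 5,
                     clock=time.monotonic, sleep=time.sleep) -> bool:
    # Step 1: kill the feature flag (or trigger the image redeploy).
    disable_flag()
    # Step 3: poll the health endpoint until it reports healthy
    # or the estimated rollback window expires.
    deadline = clock() + timeout_s
    while clock() < deadline:
        if check_health():
            return True
        sleep(poll_s)
    return False
```

Rehearsing this against staging turns the "Estimated rollback time" field into a measurement rather than a guess.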
Deploy, then stay present. An unattended deployment is a deployment that fails silently.
FIRST-HOUR CHECKLIST
─────────────────────
T+0 min Health endpoint returns 200 — confirm the process is up
T+2 min Error dashboard shows no spike since deploy
T+5 min Latency dashboards within baseline
T+10 min Manually walk the critical user flow (sign in → main action → confirm)
T+15 min Confirm structured logs are flowing and contain expected fields
T+20 min Confirm rollback mechanism is accessible (flag UI, deploy rollback command)
T+30 min If canary: review error rate before advancing to next stage
T+60 min Summarize status to the team: green / hold / rolled back
Do not close the deployment session until you have completed the T+60 summary.
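The timings above can be encoded so no checkpoint is skipped. A sketch that returns the next check due at a given minute mark:

```python
FIRST_HOUR = [  # (minutes after deploy, check)
    (0,  "health endpoint returns 200"),
    (2,  "error dashboard shows no spike"),
    (5,  "latency dashboards within baseline"),
    (10, "manually walk the critical user flow"),
    (15, "structured logs flowing with expected fields"),
    (20, "rollback mechanism accessible"),
    (30, "canary error rate reviewed before advancing"),
    (60, "status summary to team: green / hold / rolled back"),
]

def next_check(elapsed_min: int):
    # First checkpoint at or after the current minute; None once the
    # T+60 summary is done and the deployment session can close.
    for t, desc in FIRST_HOUR:
        if t >= elapsed_min:
            return t, desc
    return None
```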
sdd-explore → sdd-propose → sdd-spec → sdd-design → sdd-tasks
→ sdd-apply → sdd-verify → [shipping-and-launch] → sdd-archive
Pre-condition: sdd-verify must report Pyramid L1-L3 green before this skill activates. A verify report with any RED layer is a hard block on shipping — fix the layer, re-verify, then ship.
pipeline-agent: Ship is a CONDITIONAL phase. Not every change requires a full staged rollout — pipeline-agent evaluates scope. A documentation-only change or a dev-tool fix does not trigger this skill. A user-facing feature always does.
ci-cd-pipeline skill: Handles the deployment MECHANICS — Dockerfile builds, CI/CD pipeline steps, Coolify deployment targets. shipping-and-launch handles the STRATEGY: when to advance stages, what thresholds trigger a rollback, and what to verify in the first hour after going live. Both skills are complementary — use ci-cd-pipeline for "how to deploy" and this skill for "whether to advance."
| Rationalization | Counter |
|---|---|
| "Staging passed, so production will be fine" | Production differs in data volume, traffic patterns, third-party integrations, and edge cases that staging never generates. Staging success is necessary but not sufficient. |
| "Feature flags add unnecessary complexity" | Every feature benefits from a kill switch. The complexity of a flag is a few lines of configuration. The complexity of an emergency hotfix is hours of firefighting under pressure. |
| "We can add monitoring after launch" | You cannot debug what you cannot see. Monitoring must precede the launch — not follow it. Configuring dashboards during an incident is the worst possible time to learn a tool. |
| "It's a small change, a rollback plan is overkill" | Small changes cause outages too. A 2-minute rollback plan saves hours of coordination during an incident. The cost is negligible; the insurance is real. |
| "We'll fix issues as they come up" | Reactive firefighting costs roughly 10x more in team time than proactive verification. The first-hour checklist exists precisely to catch issues before they reach 100% of users. |
| "We've shipped this type of change before without problems" | Each release is different. Prior success is not a substitute for current verification. |
For non-technical readers: Shipping a feature is like opening a new store location. You would not unlock the doors on day one without first making sure the lights work, the registers are functional, and the staff knows what to do if something breaks. Shipping software is the same: before we "open the doors" to users, we run a checklist across five areas (code quality, security, performance, infrastructure, and documentation). Then we let a small percentage of users in first — like a soft opening — and only expand to everyone once we confirm nothing is broken. We also write down the "fire drill plan" before we open, so that if something does go wrong, we already know the steps to take rather than figuring it out under pressure.