Railway deployment, CI/CD pipelines, health checks, error monitoring (Sentry), uptime alerting, zero-downtime deploys, and go-live operations for the H&J Platform. Covers the full path from git push to production with rollback procedures. Use when deploying, setting up monitoring, debugging production issues, or preparing for August 2026 go-live.
You are a DevOps/SRE engineer preparing a critical internal business application for production go-live. The app currently has ZERO customers on the new platform (they're on Jobber), but August 2026 is the switchover. Your job: make sure this thing is rock-solid BEFORE a business depends on it.
This is Mission-Critical Operations Software — not a SaaS startup. If the system goes down, H&J can't take orders, build products, or process payments. Think "hospital EHR systems" not "social media app."
┌──────────────────────────────────────────────────────────┐
│                        THE STACK                         │
│                                                          │
│  Git Push → Railway (auto-deploy) → Express + Vite SSR   │
│                                                          │
│  ┌─────────────┐    ┌──────────────┐   ┌──────────────┐  │
│  │ Railway     │    │ Neon         │   │ Cloudflare   │  │
│  │ (Hosting)   │←──→│ PostgreSQL   │   │ R2 (Storage) │  │
│  │ Express.js  │    │ (SHARED DB)  │   │ (Files/Imgs) │  │
│  └─────────────┘    └──────────────┘   └──────────────┘  │
│         ↕                  ↕                             │
│  ┌─────────────┐    ┌──────────────┐   ┌──────────────┐  │
│  │ Stripe      │    │ Twilio       │   │ Google       │  │
│  │ (Payments)  │    │ (SMS/Voice)  │   │ (Maps, AI)   │  │
│  └─────────────┘    └──────────────┘   └──────────────┘  │
│                                                          │
│  ┌─────────────┐    ┌──────────────┐                     │
│  │ Sentry      │    │ Nodemailer   │                     │
│  │ (Errors)    │    │ (Email/SMTP) │                     │
│  └─────────────┘    └──────────────┘                     │
└──────────────────────────────────────────────────────────┘
┌──────────┬─────────────────┬─────────────────┬──────────────┐
│ Branch   │ Purpose         │ Deployment      │ Database     │
├──────────┼─────────────────┼─────────────────┼──────────────┤
│ update2  │ PRODUCTION      │ Railway auto    │ Shared Neon  │
│ update3  │ Development     │ NOT deployed    │ Shared Neon  │
│ feature  │ Feature branches│ NOT deployed    │ Shared Neon  │
└──────────┴─────────────────┴─────────────────┴──────────────┘
⚠️ CRITICAL: All branches share the SAME database.
See FLIGHT_RULES.md for implications.
# Standard deploy to production
git push origin update2
# Railway watches update2 and auto-deploys
# Build: vite build && esbuild server/index.ts
# Start: NODE_ENV=production node dist/index.js
# Check deploy status
# → Railway dashboard: https://railway.app/project/[project-id]
# Development run
npm run dev # → http://localhost:3000
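Because a push to update2 goes straight to production, it can help to confirm the new build is healthy before treating the deploy as finished. The sketch below shows one possible way to turn a series of `/api/health` probe results into a verdict. `ProbeResult`, `decideDeploy`, and the default thresholds are illustrative assumptions, not part of the existing codebase.

```typescript
// Hypothetical post-deploy verdict helper (names and thresholds are assumptions).
// Each probe represents one GET to /api/health made after the push.
interface ProbeResult {
  httpStatus: number; // 200 when healthy, 503 when degraded, 0 if the request itself failed
}

type Verdict = 'healthy' | 'unhealthy' | 'pending';

function decideDeploy(
  probes: ProbeResult[],
  requiredConsecutiveOk = 3,
  maxProbes = 10,
): Verdict {
  // Count the unbroken run of 200 responses at the end of the probe history.
  let streak = 0;
  for (let i = probes.length - 1; i >= 0 && probes[i]?.httpStatus === 200; i--) {
    streak++;
  }
  if (streak >= requiredConsecutiveOk) return 'healthy';
  // Probe budget used up without a stable run: time to consider a rollback.
  if (probes.length >= maxProbes) return 'unhealthy';
  return 'pending';
}
```

A deploy script could poll the endpoint every few seconds, append each result, and stop as soon as the verdict is no longer `pending`.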
// server/routes/health.routes.ts
// Assumes `app` (Express), `db`, and `sql` are already in scope in this module.
app.get('/api/health', async (req, res) => {
  const checks = {
    status: 'ok',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    // npm_package_version is only set when the process is started through an npm script
    version: process.env.npm_package_version || 'unknown',
    environment: process.env.NODE_ENV,
    checks: {} as Record<string, any>,
  };

  // Database check
  try {
    const start = Date.now();
    await db.execute(sql`SELECT 1`);
    checks.checks.database = {
      status: 'ok',
      latencyMs: Date.now() - start,
    };
  } catch (error) {
    // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
    const message = error instanceof Error ? error.message : String(error);
    checks.checks.database = { status: 'error', message };
    checks.status = 'degraded';
  }

  // Memory check (heapTotal grows on demand, so this ratio is only a rough signal)
  const mem = process.memoryUsage();
  checks.checks.memory = {
    heapUsedMB: Math.round(mem.heapUsed / 1024 / 1024),
    heapTotalMB: Math.round(mem.heapTotal / 1024 / 1024),
    rssMB: Math.round(mem.rss / 1024 / 1024),
    status: mem.heapUsed / mem.heapTotal > 0.9 ? 'warning' : 'ok',
  };

  // Only a failed database check changes the HTTP status; a memory warning still returns 200
  const statusCode = checks.status === 'ok' ? 200 : 503;
  res.status(statusCode).json(checks);
});

// Lightweight ping (for uptime monitors)
app.get('/api/ping', (req, res) => {
  res.status(200).json({ pong: true, timestamp: Date.now() });
});
TIER 1 — MUST HAVE BEFORE GO-LIVE:
□ Health check endpoint (/api/health)
□ Uptime monitor (ping every 60 seconds)
□ Database connectivity check
□ Error rate monitoring (Sentry)
□ SSL certificate expiry alert
TIER 2 — SHOULD HAVE:
□ Response time monitoring (P95 < 500ms)
□ Memory usage alerts (> 80% heap)
□ Database connection pool usage
□ Stripe webhook delivery health
□ Twilio API health
TIER 3 — NICE TO HAVE:
□ Custom business metric monitoring (orders/hour)
□ CPU usage alerts
□ Disk space monitoring
□ Log aggregation
□ APM (Application Performance Monitoring)
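The Tier 2 targets above are thresholds applied to samples. As a minimal sketch, the P95 check can be computed with the nearest-rank method and the heap check as a simple ratio. The function names and the idea of passing raw samples are assumptions for illustration; in practice these numbers would more likely come from a metrics or APM tool.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of all samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1]!;
}

// Tier 2 target from the checklist: P95 response time under 500 ms.
function p95WithinTarget(latenciesMs: number[], targetMs = 500): boolean {
  return percentile(latenciesMs, 95) < targetMs;
}

// Tier 2 target from the checklist: alert when heap usage exceeds 80%.
// `heapLimitBytes` is whatever ceiling the deployment treats as 100% (an assumption here).
function heapAlert(heapUsedBytes: number, heapLimitBytes: number, threshold = 0.8): boolean {
  return heapUsedBytes / heapLimitBytes > threshold;
}
```

With only a handful of samples a single slow request dominates the P95, so the check is more meaningful over a window of at least a few dozen requests.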
H&J already has Sentry installed:
- Package: @sentry/node (^10.36.0)
- DSN: configured in .env (SENTRY_DSN)
- Captures: unhandled exceptions, unhandled rejections
// server/index.ts: Sentry initialization
import * as Sentry from "@sentry/node";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV || 'development',
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
  // Don't send errors in development
  enabled: process.env.NODE_ENV === 'production',
  // Filter out noisy errors
  ignoreErrors: [
    'CSRF token missing',  // Expected for unauthenticated requests
    'Rate limit exceeded', // Expected behavior
  ],
  beforeSend(event) {
    // Strip PII from error reports
    if (event.user) {
      delete event.user.email;
      delete event.user.ip_address;
    }
    return event;
  },
});

// After all routes, add Sentry error handler
app.use(Sentry.expressErrorHandler());
// In route handlers: capture errors with context
try {
  await processOrder(orderId);
} catch (error) {
  Sentry.captureException(error, {
    tags: {
      feature: 'order-processing',
      orderId: orderId,
    },
    extra: {
      contractorId: req.user?.contractorId,
      // Caution: req.body can carry customer PII, and the beforeSend hook only strips event.user
      requestBody: req.body,
    },
  });
  logError('Order processing failed', error as Error);
  res.status(500).json({ message: 'Order processing failed' });
}

// For business logic warnings (not crashes, but important)
Sentry.captureMessage('Inventory below safety stock', {
  level: 'warning',
  tags: { feature: 'inventory' },
  extra: { product: 'White Privacy Panel', onHand: 45, safetyStock: 83 },
});
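The `requestBody` extra in the order-processing example is sent to Sentry as-is, so any customer details in the request would travel with the error report. One possible mitigation, sketched here under assumptions, is to redact sensitive keys before attaching the payload. The `redact` helper and its key list are hypothetical and would need to match the fields H&J actually accepts.

```typescript
// Hypothetical redaction helper; the key list is an assumption, extend it to match real payloads.
const SENSITIVE_KEYS = ['email', 'phone', 'address', 'password', 'token', 'cardnumber'];

function redact(value: unknown): unknown {
  // Arrays: redact each element recursively.
  if (Array.isArray(value)) return value.map((item) => redact(item));
  // Plain objects: replace sensitive keys, recurse into everything else.
  if (value !== null && typeof value === 'object') {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      const sensitive = SENSITIVE_KEYS.indexOf(key.toLowerCase()) !== -1;
      out[key] = sensitive ? '[redacted]' : redact(source[key]);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

In the route handler this would be used as `extra: { requestBody: redact(req.body) }`, leaving the rest of the capture call unchanged.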
ALERT RULES TO CONFIGURE:

1. ERROR SPIKE
   Trigger: > 10 errors in 5 minutes
   Action: SMS to Joe + email
   Why: Something is systematically broken

2. NEW ERROR TYPE
   Trigger: First occurrence of a new error
   Action: Email to Joe
   Why: Catch new bugs immediately

3. UNHANDLED REJECTION
   Trigger: Any unhandled promise rejection
   Action: Sentry issue + email
   Why: Node.js 15+ terminates the process on an unhandled rejection by default

4. DATABASE CONNECTION FAILURE
   Trigger: Error contains "connection" or "ECONNREFUSED"
   Action: SMS to Joe immediately
   Why: Database down = everything down

5. PAYMENT PROCESSING ERROR
   Trigger: Error in Stripe/payment routes
   Action: SMS to Joe immediately
   Why: Can't take money = can't do business
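Most of these rules can be expressed directly in Sentry's alert configuration without code. For illustration only, the logic behind rule 1 (a sliding-window count) and rule 4 (a message match) can be sketched as below. `ErrorSpikeDetector` and `isDatabaseConnectionError` are hypothetical names, the state is in-memory only, and the case-insensitive match is an assumption.

```typescript
// Sliding-window counter for the "> 10 errors in 5 minutes" rule (illustrative, in-memory only).
class ErrorSpikeDetector {
  private timestamps: number[] = [];
  private threshold: number;
  private windowMs: number;

  constructor(threshold = 10, windowMs = 5 * 60 * 1000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Record one error at `nowMs`; returns true when the window holds more than `threshold` errors.
  record(nowMs: number): boolean {
    this.timestamps.push(nowMs);
    // Drop entries that have aged out of the window.
    const cutoff = nowMs - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0]! <= cutoff) {
      this.timestamps.shift();
    }
    return this.timestamps.length > this.threshold;
  }
}

// Rule 4 matcher: keys on "connection" or "ECONNREFUSED" in the error message.
function isDatabaseConnectionError(message: string): boolean {
  return /connection|ECONNREFUSED/i.test(message);
}
```

Matching on the word "connection" alone is broad and may also catch non-database errors, so a production rule would probably narrow it to the database client's error types.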
CURRENT: Push to update2 → Railway auto-deploys → Done
No tests run, no linting, no security checks in CI.
RECOMMENDED GITHUB ACTIONS PIPELINE:
# .github/workflows/ci.yml