Name: Production Readiness Check
Author: sumrae412

Production Readiness Check | Skills Pool

git diff origin/main...HEAD --name-only

# General credential assignments (catches custom keys, passwords, secrets)
git diff origin/main...HEAD -U0 | grep -iE "(api_key|api_secret|password|secret_key|private_key|token)\s*=\s*['\"][^'\"]{8,}"

# Known-format tokens (AWS, Stripe, GitHub, GitLab)
git diff origin/main...HEAD -U0 | grep -E 'AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9\-]{20,}|pk_live_|ghp_[a-zA-Z0-9]{36}|glpat-[a-zA-Z0-9\-]{20}'

git diff origin/main...HEAD --name-only | grep -i '\.env'

git diff origin/main...HEAD | grep -E 'http://' | grep -vE 'localhost|127\.0\.0\.1|schemas|w3\.org|example\.com'

git diff origin/main...HEAD --name-only | xargs grep -l 'Strict-Transport-Security' /dev/null 2>/dev/null

grep -riE 'Strict-Transport-Security' --include='*.py' --include='*.js' --include='*.ts' --include='*.yaml' --include='*.yml' 2>/dev/null

# Find new endpoints
git diff origin/main...HEAD | grep -E '^\+.*(\.route|\.get|\.post|\.put|\.patch|\.delete|@app\.|@router\.|app\.(use|all))\b'

# Cross-reference against logging in the same files
git diff origin/main...HEAD --name-only | xargs grep -lE '(logger\.|logging\.|console\.log|audit_log|log\.info|log\.warn|log\.error)' 2>/dev/null

# For each workflow, find deterministic-install commands and verify lockfile sibling
for wf in .github/workflows/*.yml .github/workflows/*.yaml; do
  [ -f "$wf" ] || continue
  grep -qE 'npm ci|yarn install --frozen|pnpm install --frozen|poetry install --no-update|pip install -r [^ ]*\.lock' "$wf" || continue
  wd=$(grep -oE 'working-directory:\s*\S+' "$wf" | awk '{print $2}' | head -1)
  wd="${wd:-.}"
  found=0
  for lock in package-lock.json yarn.lock pnpm-lock.yaml poetry.lock; do
    [ -f "$wd/$lock" ] && found=1 && break
  done
  [ "$found" -eq 0 ] && echo "FAIL: $wf runs deterministic install in $wd but no lock file found"
done

File Pattern	Expanded Section
`auth/`, `login`, `session`, `jwt`, `token`, `password`, `oauth`, `middleware/auth`	Authentication
`models/`, `migrations/`, `schema`, `encrypt`, `pii`, `personal`, `backup`, `gdpr`	Data Protection
`logging`, `monitor`, `alert`, `metric`, `sentry`, `datadog`, `grafana`, `prometheus`	Monitoring
`routes/`, `api/`, `endpoints/`, `middleware/`, `cors`, `rate_limit`, `limiter`	Security Extended
`models/`, `migrations/`, `alembic/`, `schema`, `database`, `db/`, `*.sql`	Database
`Dockerfile`, `docker-compose`, `.env.example`, `railway.`, `terraform/`, `deploy/`, `pm2.`, `ecosystem.`, `.github/workflows/`, `systemd/`	Deployment
`*/.js`, `*/.ts`, `*/.jsx`, `*/.tsx`, `*/.py`, `templates/`, `static/`, `package.json`, `uv.lock`, `poetry.lock`	Code Hygiene

ID	Check	Type	Details
A1	MFA Available	infra-confirm	Ask: "Is MFA enabled for user-facing auth (e.g., TOTP, WebAuthn)?" If unconfirmed, mark UNCONFIRMED and provide IaC snippet.
A2	Password Policy	code-check	Grep for password length/complexity enforcement. Look for `minlength`, `MIN_PASSWORD_LENGTH`, `passwordStrength`, `zxcvbn`. Score: 75 if no policy found.
A3	Session Management	code-check	Check for `httpOnly`, `secure`, `sameSite` on cookies. Check session expiry / max-age configuration. Score: 75 per missing attribute.
A4	JWT Secured	code-check	Verify JWT algorithm is pinned (not `none`). Check secret is from env var. Check for expiry (`exp` claim). Score: 100 for hardcoded secret, Score: 85 for algorithm `none` allowed, Score: 75 for missing `exp` claim.

ID	Check	Type	Details
D1	Encryption at Rest	infra-confirm	Ask: "Is encryption at rest enabled for databases and object stores?" If unconfirmed, mark UNCONFIRMED and provide IaC snippets (RDS + S3).
D2	TLS in Transit	covered-by-core	Covered by C2 above.
D3	PII Anonymization	code-check	Grep for PII field names (`email`, `phone`, `ssn`, `address`, `date_of_birth`) in logs, error messages, and API responses. Score: 70 for PII in logs, Score: 80 for PII in API error responses.
D4	Backup & Recovery	infra-confirm	Ask: "Are automated backups configured with tested restore procedures?" If unconfirmed, mark UNCONFIRMED and provide IaC snippet.

ID	Check	Type	Details
M1	Security Logging	covered-by-core	Covered by C3 above.
M2	Anomaly Detection	infra-confirm	Ask: "Are anomaly detection alerts configured (e.g., unusual login patterns, traffic spikes)?" If unconfirmed, mark UNCONFIRMED and provide IaC snippet.
M3	Incident Response Plan	infra-confirm	Check for `docs/incident-response.md`. If missing, mark UNCONFIRMED and provide doc stub.
M4	Security Audit Schedule	infra-confirm	Check for `docs/security-audit-schedule.md`. If missing, mark UNCONFIRMED and provide doc stub.

ID	Check	Type	Details
S1	Route auth audit	code-check	For every new route definition in the diff (`@app.route`, `@router.get\|post\|put\|patch\|delete`, `app.get\|post`, etc.), cross-reference against an auth decorator (`@login_required`, `@jwt_required`, `@require_auth`, `Depends(get_current_user)`, `authenticate`). Score: 90 per unauthenticated route. Exempt routes explicitly tagged public (`/health`, `/metrics`, `/public/`, `@public`, `@health_check`).
S2	CORS lockdown	code-check	Grep for CORS wildcards: `allow_origins=[""]`, `Access-Control-Allow-Origin: `, `origins=[""]`, `cors(origin: '')`. Score: 85 per match — wildcard CORS is prod-unsafe.
S3	Server-side input validation	user-judgment	Ask: "Are user-supplied inputs in changed endpoints validated server-side (schemas, length limits, type coercion)?" UNCONFIRMED if not inspected. Client-side validation is UX, not security.
S4	Rate limiting on auth/sensitive endpoints	code-check	If auth routes (`/login`, `/register`, `/reset-password`, `/api/token`) exist in the diff, grep for rate-limit decorators or middleware (`@limiter.limit`, `slowapi`, `rate_limit`, `RateLimitMiddleware`, `express-rate-limit`). Score: 75 if auth routes present without rate limiting.
S5	Password hashing algorithm	code-check	If password storage code is touched, grep for strong hashes: `bcrypt`, `argon2`, `passlib`, `scrypt`. Flag weak hashes alone: `md5`, `sha1`, `sha256(password)`. Score: 100 for weak-hash match on password fields, Score: 70 if password storage touched without any strong hash import visible.
S6	Logout session invalidation	code-check	For logout endpoints, grep for server-side revocation (`revoke`, `blacklist`, `delete_session`, `session.pop`, `token_blacklist`). Score: 80 if logout only clears client cookie without server-side cleanup — stolen tokens remain valid.

ID	Check	Type	Details
DB1	Backup restore tested	infra-confirm	Ask: "Has the backup restore procedure been tested end-to-end in the last 90 days?" UNCONFIRMED otherwise. Backups without restore tests are untrusted.
DB2	Parameterized queries	code-check	Grep diff for SQL string concatenation: `execute(f"SELECT`, `execute(".* " +` , `query(f'`, `.format(...)` on SQL strings, `%s` fallback via `%` operator. Score: 100 per match — injection risk. Use parameterized queries / ORM binds.
DB3	Dev/prod DB separation	infra-confirm	Ask: "Are dev/staging and production databases distinct instances with separate credentials?" UNCONFIRMED if shared.
DB4	Connection pooling configured	code-check	If DB engine configuration is touched, grep for pool settings: `pool_size=`, `QueuePool`, `max_overflow=`, `poolclass=`, external pooler (`PgBouncer`, `RDS Proxy`). Score: 60 if DB engine configured without explicit pooling.
DB5	Migrations in version control	code-check	If schema-shaped changes appear in diff (`models/`, `schema.sql`, `CREATE TABLE`, `ALTER TABLE`, column adds/drops), verify a corresponding migration file exists in `alembic/versions/`, `migrations/`, or equivalent. Score: 85 per schema change without migration. No manual DB surgery.
DB6	Non-root DB user	infra-confirm	Ask: "Does the application connect with a non-superuser DB account (limited to the schemas/tables it needs)?" UNCONFIRMED if unsure. Superuser connections magnify SQLi blast radius.

ID	Check	Type	Details
DP1	Env vars declared for production	code-check	Grep changed code for `os.getenv`, `os.environ[`, `process.env.`. Cross-reference each referenced env var against `.env.example` (or equivalent) in the repo. Score: 70* per env var referenced in code but missing from `.env.example` — deployment-time footgun.
DP2	SSL cert installed and valid	infra-confirm	Ask: "Is the production SSL certificate installed, valid, not expiring in <30 days, and covering all deployed hostnames?" UNCONFIRMED otherwise.
DP3	Firewall (80/443 only public)	infra-confirm	Ask: "Does the production firewall expose only ports 80/443 publicly, with all other ports internal-only?" UNCONFIRMED otherwise.
DP4	Process manager running	infra-confirm	Ask: "Is the app running under a process manager (systemd, PM2, Docker restart policies, Kubernetes) that auto-restarts on crash?" UNCONFIRMED otherwise.
DP5	Rollback plan exists	code-check	Check for `docs/runbooks/rollback.md`, `docs/rollback.md`, `docs/runbooks/`, or equivalent rollback documentation in the repo. Score: 60 if no rollback runbook exists — every prod system needs a known rollback path.
DP6	Staging test passed	user-judgment	Ask: "Was this change deployed and smoke-tested on staging before production?" UNCONFIRMED if skipped.

ID	Check	Type	Details
CD1	No debug logging in prod build	code-check	Grep diff for leftover debug calls: `console.log(`, `console.debug(`, bare `print(` in non-test Python files, `debugger;` in JS. Exclude framework loggers (`logger.info`, `log.debug`, `structlog`). Score: 60 per leftover debug call.
CD2	Error handling on async operations	code-check (heuristic)	Heuristic pre-filter — grep diff for `await` and `async def` / `async function`; for each hit, check whether the enclosing block contains `try`/`except`/`catch`. Grep cannot verify scope accurately, so treat matches as low-confidence and confirm each flagged location with the user before marking FAIL (e.g., "Function `<name>` contains `await` — is error handling present?"). Score: 70 per confirmed unguarded await in non-test code. Unhandled promise rejections crash Node processes.
CD3	UI loading and error states	code-check (heuristic)	Heuristic pre-filter — for each component/template that fetches data (contains `fetch(`, `axios`, `useQuery`, `useEffect.fetch`, `{% ... %}` template calls), grep for loading/error branches (`isLoading`, `error`, `{#if error}`, `x-if=`, `{% if error %}`, `<ErrorBoundary>`). Grep cannot verify component structure, so flag matches as low-confidence and confirm with the user (e.g., "`<Component>` fetches data — does it render both loading and error states?"). Score: 65* per confirmed component missing error or loading branch.
CD4	Pagination on list endpoints	code-check	Grep new list endpoints (routes returning arrays/collections, e.g. `def list_`, `def get_all_`, `return items`, `return query.all()`) for pagination params (`limit`, `offset`, `page`, `cursor`, `page_size`). Score: 70 per list endpoint without pagination — unbounded queries break under scale.
CD5	Dependency vulnerability audit	code-check	If `package.json` / `package-lock.json` changed: run `npm audit --production --audit-level=high` and report unresolved critical/high issues. If `pyproject.toml` / `uv.lock` / `poetry.lock` changed: run `pip-audit` or `uv pip check`. Score: 80 per unresolved critical vulnerability.

### Production Readiness Check

**Triggered sections:** [list which sections ran, e.g. "Authentication, Monitoring (Data Protection skipped — no matching files)"]

| # | Item | Status | Score | Action |
|---|------|--------|-------|--------|
| 1 | JWT secret from env | PASS | — | — |
| 2 | Session https_only | FAIL | 75 | Fix: set `https_only: True` in session config |
| 3 | MFA enabled | UNCONFIRMED | — | User confirmation needed |
| 4 | Security event logging | FAIL | 80 | Fix: add audit log to new `/api/users` endpoint |
| 5 | Anomaly detection | UNCONFIRMED | — | Generate CloudWatch alarm config? |

**Proposed fix plan:**
1. Set `https_only: True` in `app/config/session.py:14`
2. Add `logger.info("user_action", extra={...})` to `app/routes/users.py:45`
3. Generate `terraform/modules/monitoring/alarms.tf` with anomaly detection config

### Ship Readiness Summary

- **Checks run:** <total items evaluated across triggered sections>
- **PASS:** <n> / **FAIL:** <n> / **UNCONFIRMED:** <n>
- **Critical (score 100):** <n>
- **High (score 75–99):** <n>
- **Medium (score 60–74):** <n>

> Advisory mode — any unresolved FAIL or UNCONFIRMED item is on you. "Can't check every box? You're not ready to ship" is a judgment call, not a hard gate.

Proceed with fixes?

resource "aws_cognito_user_pool" "main" {
  name = var.user_pool_name

  mfa_configuration = "ON"

  software_token_mfa_configuration {
    enabled = true
  }

  account_recovery_setting {
    recovery_mechanism {
      name     = "verified_email"
      priority = 1
    }
  }
}

resource "aws_db_instance" "main" {
  storage_encrypted = true
  kms_key_id        = var.rds_kms_key_arn

  # ... other configuration
}

resource "aws_kms_key" "rds" {
  description             = "KMS key for RDS encryption at rest"
  deletion_window_in_days = 30
  enable_key_rotation     = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "main" {
  bucket = aws_s3_bucket.main.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.s3.arn
    }
    bucket_key_enabled = true
  }
}

resource "aws_kms_key" "s3" {
  description             = "KMS key for S3 encryption at rest"
  deletion_window_in_days = 30
  enable_key_rotation     = true
}

resource "aws_db_instance" "main" {
  backup_retention_period = 30
  backup_window           = "03:00-04:00"
  copy_tags_to_snapshot   = true
  deletion_protection     = true

  # ... other configuration
}

resource "aws_cloudwatch_metric_alarm" "failed_login_spike" {
  alarm_name          = "${var.app_name}-failed-login-spike"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "FailedLoginAttempts"
  namespace           = var.app_name
  period              = 300
  statistic           = "Sum"
  threshold           = var.failed_login_threshold
  alarm_description   = "Alert on unusual number of failed login attempts"
  alarm_actions       = [var.sns_topic_arn]
}

resource "aws_cloudwatch_metric_alarm" "traffic_anomaly" {
  alarm_name          = "${var.app_name}-traffic-anomaly"
  comparison_operator = "GreaterThanUpperThreshold"
  evaluation_periods  = 3
  threshold_metric_id = "ad1"
  alarm_description   = "Alert on anomalous traffic patterns"

  metric_query {
    id          = "m1"
    metric {
      metric_name = "RequestCount"
      namespace   = "AWS/ApplicationELB"
      period      = 300
      stat        = "Sum"
    }
  }

  metric_query {
    id          = "ad1"
    expression  = "ANOMALY_DETECTION_BAND(m1, 2)"
    label       = "RequestCount (expected)"
    return_data = true
  }

  alarm_actions = [var.sns_topic_arn]
}

# Incident Response Plan

## Severity Levels

| Level | Name | Description | Response Time | Examples |
|---|---|---|---|---|
| SEV-1 | Critical | Service fully down, data breach | 15 min | Production outage, credential leak |
| SEV-2 | Major | Significant degradation | 30 min | Major feature broken, partial outage |
| SEV-3 | Minor | Limited impact | 4 hours | Minor bug in production, single user affected |
| SEV-4 | Low | Minimal impact | 24 hours | Cosmetic issue, non-critical alert |

## Response Procedure

1. **Detect** — alert fires or report received
2. **Triage** — assess severity, assign incident commander
3. **Communicate** — notify stakeholders via #incidents channel
4. **Mitigate** — apply immediate fix or rollback
5. **Resolve** — confirm service restored, root cause identified
6. **Postmortem** — blameless review within 48 hours (SEV-1/2)

## Contacts

| Role | Name | Contact |
|---|---|---|
| Incident Commander | TBD | TBD |
| Engineering Lead | TBD | TBD |
| Communications | TBD | TBD |

## Runbooks

- [ ] Database failover: `docs/runbooks/db-failover.md`
- [ ] Rollback deployment: `docs/runbooks/rollback.md`
- [ ] Credential rotation: `docs/runbooks/credential-rotation.md`

# Security Audit Schedule

## Recurring Audits

| Audit | Frequency | Owner | Last Completed | Next Due |
|---|---|---|---|---|
| Dependency vulnerability scan | Weekly (automated) | CI/CD | TBD | TBD |
| OWASP Top 10 review | Quarterly | Security Lead | TBD | TBD |
| Infrastructure access review | Quarterly | Platform Team | TBD | TBD |
| Penetration test | Annually | External Vendor | TBD | TBD |
| SOC 2 / compliance audit | Annually | Compliance Team | TBD | TBD |

## Ad-Hoc Triggers

Run an unscheduled security review when any of these occur:

- Major infrastructure change (new cloud provider, region, service)
- Authentication or authorization system changes
- New third-party integration with data access
- Security incident or near-miss (post-incident action item)
- Significant codebase refactor affecting security boundaries

Production Readiness Check

When to Use

Production Readiness Check

When to Use

Check Types

Trigger System

Step 1: Get the Diff

Step 2: Run Minimal Core (Always)

C1: Secrets in Code

C2: HTTPS / TLS Enforcement

C3: Security Event Logging

C4: CI Workflow Companion Artifacts

Step 3: Match Deep-Dive Triggers

Step 4: Run Expanded Checks

Authentication Deep-Dive

Data Protection Deep-Dive

Monitoring Deep-Dive

Security Extended Deep-Dive

Database Deep-Dive

Deployment Deep-Dive

Code Hygiene Deep-Dive

Step 5: Report Findings

Step 6: Fix Iteration

IaC Snippet Templates

MFA — AWS Cognito (Terraform)

Encryption at Rest — AWS RDS (Terraform)

Encryption at Rest — S3 (Terraform)

Automated Backups — RDS (Terraform)

Anomaly Detection — CloudWatch (Terraform)

Incident Response Plan Stub

Security Audit Schedule Stub

Next Steps

Github

Openclaw Parallels Smoke

Update Screenshots

Azure Pipelines

Deployment Patterns

Deployment Patterns