Operational traps for Terraform provisioners, multi-environment isolation, and zero-to-deployment reliability. Covers provisioner timing races, SSH connection conflicts, DNS record duplication, volume permissions, database bootstrap gaps, snapshot cross-contamination, Cloudflare credential format errors, hardcoded domains in Caddyfiles/compose, and init-data-only-on-first-boot pitfalls. Activate when writing null_resource provisioners, creating multi-environment Terraform setups, debugging containers that are Restarting/unhealthy after terraform apply, setting up fresh instances with cloud-init, or any IaC code that SSHs into remote hosts. Also activate when the user mentions terraform plan/apply errors, provisioner failures, infrastructure drift, TLS certificate errors, or Caddy/gateway configuration.
Failure patterns from real deployments. Every item caused an incident. Organized as: exact error → root cause → copy-paste fix.
docker: not found in remote-exec
cloud-init is still installing Docker when the provisioner SSHs in. Wait for cloud-init to finish, then fail loudly if Docker is still missing:
provisioner "remote-exec" {
inline = [
"cloud-init status --wait || true",
"which docker || { echo 'FATAL: Docker not ready'; exit 1; }",
]
}
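On images where cloud-init is absent, or where Docker arrives through some other mechanism, a bounded polling loop can serve as the same gate. The following is a minimal sketch; the helper name wait_for_command and its default timings are hypothetical, not part of any tool.

```sh
# wait_for_command NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls until NAME resolves on PATH, or gives up after TIMEOUT_SECONDS.
# Hypothetical helper: the name and defaults are illustrative.
wait_for_command() {
  name=$1
  timeout=${2:-120}
  interval=${3:-5}
  elapsed=0
  while ! command -v "$name" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "FATAL: $name not ready after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

A provisioner script could call wait_for_command docker 300 5 || exit 1 before its first docker command. The bounded timeout matters: an unbounded loop would turn a failed install into a hung terraform apply.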
rsync: connection unexpectedly closed in local-exec
Terraform holds its SSH connection open; the rsync started by local-exec opens a second one, which gets rejected. Never use local-exec for file transfer to a remote host. Use a tarball plus the file provisioner:
provisioner "local-exec" {
command = "tar czf /tmp/src.tar.gz --exclude=node_modules --exclude=.git -C ${path.module}/../../.. myproject"
}
provisioner "file" {
source = "/tmp/src.tar.gz"
destination = "/tmp/src.tar.gz"
}
provisioner "remote-exec" {
inline = ["tar xzf /tmp/src.tar.gz -C /data/ && rm -f /tmp/src.tar.gz"]
}
macOS BSD tar: --exclude must come BEFORE the source argument.
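Because a misplaced --exclude silently ships node_modules and .git to the server, it may be worth checking the archive before upload. A small sketch follows; both function names are hypothetical, and the exclude list simply mirrors the command above.

```sh
# make_src_tarball PARENT_DIR PROJECT_NAME OUT_FILE
# Archives PARENT_DIR/PROJECT_NAME, skipping node_modules and .git.
# The --exclude flags precede the source so BSD tar (macOS) and GNU tar agree.
make_src_tarball() {
  parent=$1
  project=$2
  out=$3
  tar czf "$out" --exclude=node_modules --exclude=.git -C "$parent" "$project"
}

# tarball_is_clean ARCHIVE
# Fails if any node_modules or .git entry slipped into the archive.
tarball_is_clean() {
  if tar tzf "$1" | grep -Eq '(^|/)(node_modules|\.git)(/|$)'; then
    echo "FATAL: excluded paths present in $1" >&2
    return 1
  fi
}
```

Running tarball_is_clean /tmp/src.tar.gz in the local-exec step, right after the archive is built, fails the apply early instead of uploading a bloated tarball.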
cloud-init status shows "running" forever
apt-get -y does not suppress debconf dialogs. Packages like iptables-persistent block on a TTY prompt that never gets answered. Preseed the answers and set DEBIAN_FRONTEND=noninteractive:
- |
echo iptables-persistent iptables-persistent/autosave_v4 boolean true | debconf-set-selections
echo iptables-persistent iptables-persistent/autosave_v6 boolean true | debconf-set-selections
DEBIAN_FRONTEND=noninteractive apt-get install -y iptables-persistent
Known offenders: iptables-persistent, postfix, mysql-server, wireshark-common.
EACCES: permission denied in container logs, container Restarting
Host volume dirs are root-owned; the container runs as non-root (uid 1001). Fix ownership before docker compose up:
mkdir -p /data/myapp/data /data/myapp/logs
chown -R 1001:1001 /data/myapp/data /data/myapp/logs
Find UID: grep adduser.*-u or USER in Dockerfile.
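The UID lookup can be scripted so the chown value is not copied by hand. This is a rough sketch: find_container_uid is a hypothetical helper, and it only recognizes a numeric -u/--uid passed to adduser or useradd, or a numeric USER instruction.

```sh
# find_container_uid DOCKERFILE
# Prints the numeric UID the image is likely to run as; fails if none is found.
# Hypothetical helper: a heuristic, not a Dockerfile parser.
find_container_uid() {
  dockerfile=$1
  # Prefer an explicit numeric UID passed to adduser/useradd via -u or --uid.
  uid=$(grep -Eo '(adduser|useradd)[^#]*(-u|--uid)[ =]+[0-9]+' "$dockerfile" \
    | grep -Eo '[0-9]+$' | tail -n 1)
  if [ -z "$uid" ]; then
    # Fall back to a numeric USER instruction, e.g. "USER 1001" or "USER 1001:1001".
    uid=$(grep -E '^[[:space:]]*USER[[:space:]]+[0-9]+' "$dockerfile" \
      | tail -n 1 | grep -Eo '[0-9]+' | head -n 1)
  fi
  [ -n "$uid" ] && printf '%s\n' "$uid"
}
```

When the Dockerfile names the user symbolically (for example USER node), the function prints nothing; in that case the UID has to be resolved inside the built image instead.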
set -e exits on first error, hiding subsequent docker logs output. Use set -u without -e, and put one verification gate at the end. Match the exact (healthy) marker, because a bare grep for healthy also matches (unhealthy):
provisioner "remote-exec" {
inline = [
"set -u",
"docker compose up -d",
"sleep 15",
"docker logs myapp --tail 20 2>&1 || true",
"docker ps --format 'table {{.Names}}\\t{{.Status}}' || true",
"docker ps --filter name=myapp --format '{{.Status}}' | grep -q '(healthy)' || exit 1",
]
}
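A fixed sleep is either too short for a slow boot or wasted time for a fast one, and a loose match on the word healthy also accepts a status of (unhealthy). One possible shape for a stricter, polling gate is sketched below; both function names and the 3-second interval are hypothetical.

```sh
# status_is_healthy STATUS
# True only for a docker ps Status string carrying the exact "(healthy)" marker.
# "(unhealthy)" and "(health: starting)" are rejected.
status_is_healthy() {
  case "$1" in
    *"(healthy)"*) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_for_healthy CONTAINER [TIMEOUT_SECONDS]
# Hypothetical helper: polls docker ps every 3 seconds until healthy or timeout.
wait_for_healthy() {
  container=$1
  timeout=${2:-60}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    status=$(docker ps --filter "name=$container" --format '{{.Status}}' 2>/dev/null)
    status_is_healthy "$status" && return 0
    sleep 3
    elapsed=$((elapsed + 3))
  done
  return 1
}
```

The gate still belongs at the very end of the script, after the docker logs and docker ps diagnostics, so that a failure leaves the evidence in the Terraform output.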
Restarting: database tables missing
DB migrations are not in the provisioner. PostgreSQL's docker-entrypoint-initdb.d scripts only run when the data dir is empty. Explicitly create the DB and run the migrations:
# After postgres healthy:
docker exec pg psql -U postgres -tc "SELECT 1 FROM pg_database WHERE datname='mydb'" | grep -q 1 \
|| docker exec pg psql -U postgres -c "CREATE DATABASE mydb;"
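# Idempotent migrations. Assumes $PSQL is your psql command (for example
# "docker exec -i pg psql -U postgres -d mydb -v ON_ERROR_STOP=1") and that
# the schema_migrations table already exists.
for f in migrations/*.sql; do
VER=$(basename "$f")
APPLIED=$($PSQL -tAc "SELECT 1 FROM schema_migrations WHERE version='$VER'" | tr -d ' ')
[ "$APPLIED" = "1" ] && continue
# With ON_ERROR_STOP set, a failed migration aborts here and is not recorded as applied
{ echo 'BEGIN;'; cat "$f"; echo 'COMMIT;'; } | $PSQL || exit 1
$PSQL -tAc "INSERT INTO schema_migrations(version) VALUES ('$VER') ON CONFLICT DO NOTHING"
done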
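To decide up front whether the init scripts will fire on a given host, the data directory can be inspected before the container starts. The sketch below assumes the official postgres image and a bind-mounted data directory; as far as I know, its entrypoint treats a non-empty PG_VERSION file as an existing cluster. The function name is hypothetical.

```sh
# initdb_will_run DATA_DIR
# Succeeds when DATA_DIR holds no initialized cluster (no non-empty PG_VERSION),
# which is the only case where docker-entrypoint-initdb.d scripts are expected to run.
# Assumption: official postgres image semantics; other images may differ.
initdb_will_run() {
  [ ! -s "$1/PG_VERSION" ]
}
```

If initdb_will_run fails for a restored snapshot or a reused volume, the explicit CREATE DATABASE and migration steps are the only path that will produce the tables.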
docker compose build ignores env var override
Compose reads build args from the .env file, not the shell env. VAR=x docker compose build does NOT work.
# WRONG
DOCKER_WITH_PROXY_MODE=disabled docker compose build
# RIGHT
grep -q DOCKER_WITH_PROXY_MODE .env || echo 'DOCKER_WITH_PROXY_MODE=disabled' >> .env
docker compose build
Invalid format for Authorization header
Caddy DNS-01 ACME needs a Cloudflare API Token (cfut_ prefix, 40+ chars, Bearer auth). A Global API Key (37 hex chars, X-Auth-Key auth) causes HTTP 400 Code:6003. Production may appear to work because it has cached certificates; fresh environments fail on the first cert request.
# Verify token format before deploy (anchored match, and abort on failure):
TOKEN=$(grep '^CLOUDFLARE_API_TOKEN=' .env | cut -d= -f2-)
echo "$TOKEN" | grep -q "^cfut_" || { echo "FATAL: needs API Token, not Global Key"; exit 1; }
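A slightly more informative check can name which kind of credential was found, using only the shapes described above (cfut_ prefix for an API Token, 37 hex characters for a Global API Key). The function name is hypothetical, and tokens in any other format are reported as unknown rather than guessed at.

```sh
# cf_credential_kind VALUE
# Prints "api-token", "global-key", or "unknown" based on the shapes described above.
# Hypothetical helper: classifies by format only, it does not call Cloudflare.
cf_credential_kind() {
  value=$1
  if printf '%s' "$value" | grep -Eq '^cfut_.+'; then
    echo "api-token"
  elif printf '%s' "$value" | grep -Eq '^[0-9a-fA-F]{37}$'; then
    echo "global-key"
  else
    echo "unknown"
  fi
}
```

A deploy script could refuse to continue unless cf_credential_kind "$TOKEN" prints api-token, and print the detected kind in the error so the mix-up is obvious from the log.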
Create scoped token via API:
curl -s "https://api.cloudflare.com/client/v4/user/tokens" -X POST \
-H "X-Auth-Email: $CF_EMAIL" -H "X-Auth-Key: $CF_GLOBAL_KEY" \
-H "Content-Type: application/json" \
-d '{"name":"caddy-dns-acme","policies":[{"effect":"allow",
"resources":{"com.cloudflare.api.account.zone.<ZONE_ID>":"*"},
"permission_groups":[
{"id":"4755a26eedb94da69e1066d98aa820be","name":"DNS Write"},
{"id":"c8fed203ed3043cba015a93ad1616f1f","name":"Zone Read"}]}]}'
Caddyfile or compose has literal domain names. Staging Caddy loads production config, tries to get certs for domains it doesn't own → ACME fails.
Caddyfile: Use {$VAR}. Caddy substitutes these env vars when it loads the config at startup.
# WRONG
gpt-6.pro { tls { dns cloudflare {env.CLOUDFLARE_API_TOKEN} } }
# RIGHT
{$LOBEHUB_DOMAIN} { tls { dns cloudflare {env.CLOUDFLARE_API_TOKEN} } }
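Hardcoded site addresses can be caught before deploy with a simple scan. The following is a heuristic sketch, not a Caddyfile parser: the function name is hypothetical, and it only flags lines that begin with something shaped like a hostname followed by an opening brace.

```sh
# literal_site_addresses CADDYFILE
# Prints (with line numbers) site-block lines whose address looks like a
# hardcoded hostname rather than a {$VAR} substitution. Heuristic only.
literal_site_addresses() {
  grep -En '^[[:space:]]*[A-Za-z0-9*][A-Za-z0-9.*-]*\.[A-Za-z]{2,}(:[0-9]+)?[[:space:]]*\{' "$1"
}
```

The function exits non-zero when nothing is found, so a pre-deploy step can be written as: if literal_site_addresses Caddyfile; then exit 1; fi.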
Compose: Use ${VAR:?required} to fail fast if the variable is unset.
# WRONG
- APP_URL=https://gpt-6.pro
# RIGHT
- APP_URL=${APP_URL:?APP_URL is required}
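The same fail-fast idea can run in the provisioner script itself, so that all missing variables are reported together before docker compose is invoked. A minimal sketch, with a hypothetical function name; the variable names passed in are whatever the compose file requires.

```sh
# require_env NAME...
# Fails, naming every unset or empty variable, before docker compose runs.
# Hypothetical helper; NAME arguments must be trusted shell identifiers.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name; empty if unset.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "FATAL: $name is required" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For instance, require_env APP_URL LOBEHUB_DOMAIN CLOUDFLARE_API_TOKEN || exit 1 surfaces every gap in one pass, whereas Compose stops at the first unset variable it encounters.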
Pass the env var to the gateway container so Caddy can read it: