Mandatory deep-trace verification protocol before declaring ANY system operational. Prevents surface-level diagnostics from masking hidden failures.
Surface-level diagnostics lie. A health endpoint returning "ok" does NOT mean the pipeline works. An API key existing does NOT mean it's loaded into the provider chain. A config default does NOT mean the runtime uses that default.
This protocol exists because of a pattern that has killed pipeline runs repeatedly: code documents one thing, runtime does another, and surface checks can't tell the difference.
MANDATORY before:
The Rule: Never trust that a key exists. Verify the key is LOADED and ACTIVE in the runtime chain.
Case study: GROQ_API_KEY existed in Railway. VidRush status confirmed groq_whisper: true. But Groq was never initialized as an LLM provider because LLM_FAILOVER_ORDER env var in Railway didn't include it. buildTeamLLM(["groq", ...]) silently skipped Groq because it wasn't in providersByName.
The Check:
Hit /api/content-engine/diag and confirm llm_chain lists the expected providers in the expected order.

Implementation: The diag endpoint MUST report llm_chain: ["groq", "gemini", "anthropic", "openai"], the actual instantiated provider list, not the config default.
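The silent-skip failure mode can be sketched in a few lines. buildTeamLLM and providersByName are stand-in names for the real implementation, which may differ:

```typescript
// Hypothetical sketch of the failure mode above: an unknown provider name
// is dropped by filter() with no error, warning, or log line.
type LLMProvider = { name: string; invoke: (prompt: string) => Promise<string> };

const providersByName: Record<string, LLMProvider> = {
  gemini: { name: "gemini", invoke: async () => "..." },
  anthropic: { name: "anthropic", invoke: async () => "..." },
  openai: { name: "openai", invoke: async () => "..." },
  // "groq" was never registered here, so it silently vanishes below
};

function buildTeamLLM(order: string[]): LLMProvider[] {
  // The filter hides the gap: names missing from providersByName disappear
  return order
    .map((n) => providersByName[n])
    .filter((p): p is LLMProvider => Boolean(p));
}

const chain = buildTeamLLM(["groq", "gemini", "anthropic", "openai"]);
console.log(chain.map((p) => p.name)); // ["gemini", "anthropic", "openai"]: groq is gone, nothing warned us
```

This is why the diag endpoint has to report the instantiated chain: the config said four providers, the runtime built three.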
The Rule: Any function that reads env vars with a fallback default (envList, envInt, process.env.X || "default") can be silently overridden by a Railway env var set in a previous session.
Case study: config.ts has failoverOrder: envList("LLM_FAILOVER_ORDER", ["anthropic", "gemini", "groq", "openai", "deepseek"]). Default includes Groq. But if LLM_FAILOVER_ORDER was set in Railway before Groq was added, it overrides the default and excludes Groq permanently.
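A minimal sketch of the override behavior, assuming envList works roughly like this:

```typescript
// Assumed shape of envList: return the env var as a list if set,
// otherwise fall back to the in-code default.
function envList(name: string, fallback: string[]): string[] {
  const raw = process.env[name];
  return raw ? raw.split(",").map((s) => s.trim()).filter(Boolean) : fallback;
}

// The default in code includes groq:
const DEFAULT = ["anthropic", "gemini", "groq", "openai", "deepseek"];

// But a stale Railway value set before groq existed wins forever:
process.env.LLM_FAILOVER_ORDER = "anthropic,gemini,openai";
const failoverOrder = envList("LLM_FAILOVER_ORDER", DEFAULT);
console.log(failoverOrder); // ["anthropic", "gemini", "openai"]: groq excluded by the override
```

The code change that added groq to the default never took effect, because the env var shadows the default on every boot.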
The Check:
Treat every envList() / envInt() call as a potential override site.

The Rule: Read error messages as data, not just descriptions. Count the providers listed and compare against what should be there. The provider missing from an error is more informative than the providers it lists.
Case study: Error said "All LLM providers failed: 1. gemini... 2. anthropic... 3. openai..." — three providers. The pipelineLLM was built with ["groq", "gemini", "anthropic", "openai"] — four providers. The missing provider IS the bug.
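The count-and-compare step can be mechanized. This is an illustrative helper, not the real pipeline code; it assumes the numbered error format shown above:

```typescript
// Pull provider names out of an error like
// "All LLM providers failed: 1. gemini... 2. anthropic... 3. openai..."
function parseFailedProviders(message: string): string[] {
  const out: string[] = [];
  const re = /\d+\.\s*(\w+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(message)) !== null) out.push(m[1]);
  return out;
}

// Diff the expected chain against what the error actually mentions.
function missingFromError(expected: string[], message: string): string[] {
  const seen = new Set(parseFailedProviders(message));
  return expected.filter((p) => !seen.has(p));
}

const err = "All LLM providers failed: 1. gemini... 2. anthropic... 3. openai...";
const expected = ["groq", "gemini", "anthropic", "openai"];
console.log(missingFromError(expected, err)); // ["groq"]: the missing provider IS the bug
```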
The Check:
Diff the providers listed in the error against the chain the code was built with; whatever is absent from the error is the bug.
The Rule: Code existing is not the same as code executing. A function being defined is not the same as it being called. A config value being written is not the same as it being read at runtime.
This is the "THEORY to PRACTICAL FLOWS" pattern from the master reference. It has caught us before (Buffer enum, Make.com scenarios, content engine distribution). It will catch us again if we don't verify.
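One way to turn the theory/practice gap into a hard failure is a boot-time assertion. A hedged sketch, with function and log names assumed:

```typescript
// Log each instantiated provider at boot and fail fast if the runtime
// chain diverges from what the code expects. Passive logging alone lets
// the gap hide; throwing at startup makes it impossible to miss.
function assertChainMatches(expected: string[], actual: string[]): void {
  for (const name of actual) {
    console.log(`Active model: ${name}`); // the only lines that count as reality
  }
  const missing = expected.filter((n) => !actual.includes(n));
  if (missing.length > 0) {
    throw new Error(`LLM chain missing providers at boot: ${missing.join(", ")}`);
  }
}

// Would have thrown at startup instead of failing mid-pipeline:
// assertChainMatches(["groq", "gemini", "anthropic", "openai"],
//                    ["gemini", "anthropic", "openai"]);
```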
The Check:
The "Active model:" boot-log lines tell you which providers exist. If a provider isn't in the boot log, it doesn't exist at runtime, period.

The Rule: Never send the Architect into a live pipeline run without a smoke test that exercises the EXACT path the pipeline will take.
The Check:
- /test_tts: verifies the TTS chain fires (which provider, success/failure).
- /api/content-engine/diag: verifies the image gen chain AND LLM chain AND TTS chain with actual runtime data.
- /dryrun <url>: verifies the full pipeline logic without burning real resources.

When a pipeline run fails, before attempting any fix:
1. WHAT FAILED: [exact error message, verbatim]
2. WHAT'S MISSING: [compare error against expected chain — what's absent?]
3. WHY IT'S MISSING: [trace from config -> env var -> provider init -> runtime chain]
4. WHERE IT DIVERGES: [the exact line where code expectation != runtime reality]
5. WHAT MASKED IT: [which surface diagnostic gave a false green light?]
6. THE FIX: [code change, env var change, or both]
7. THE GUARD: [what diagnostic/check prevents this class of failure from recurring?]
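The seven steps can be captured as a first-class record rather than ad-hoc notes. The field names below are ours, mapped one-to-one to the steps above, with the Groq incident filled in as the worked example:

```typescript
// Hypothetical structure for a failure trace; one field per protocol step.
interface FailureTrace {
  whatFailed: string;      // exact error message, verbatim
  whatsMissing: string;    // absent from the expected chain
  whyItsMissing: string;   // config -> env var -> provider init -> runtime chain
  whereItDiverges: string; // the exact spot where code expectation != runtime reality
  whatMaskedIt: string;    // the surface diagnostic that gave a false green light
  theFix: string;          // code change, env var change, or both
  theGuard: string;        // check that prevents this class of failure recurring
}

const groqIncident: FailureTrace = {
  whatFailed: "All LLM providers failed: 1. gemini... 2. anthropic... 3. openai...",
  whatsMissing: "groq (expected first in the pipeline chain)",
  whyItsMissing: "LLM_FAILOVER_ORDER in Railway predates groq and overrides the config default",
  whereItDiverges: "buildTeamLLM silently skips names absent from providersByName",
  whatMaskedIt: "status showed groq_whisper: true, so the key looked live",
  theFix: "update LLM_FAILOVER_ORDER in Railway to include groq",
  theGuard: "diag endpoint reports the actual llm_chain, not the config default",
};
```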
The /api/content-engine/diag endpoint MUST report:
- llm_chain: Array of provider names in actual failover order (e.g. ["groq", "gemini", "anthropic", "openai"])
- llm_chain_count: Number of active providers
- pipeline_llm_chain: The pipeline-specific LLM chain (should always have groq first)
- pollinations_ok: Boolean
- elevenlabs_status: Status code or "not_configured"
- edge_tts_available: Boolean (package importable)
- tts_chain: Array of TTS providers in fallback order

If pipeline_llm_chain doesn't include "groq" as the first entry, the system is NOT ready.
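A sketch of that contract as a TypeScript shape, plus the readiness check. Field names come from the list above; the sample values and helper are illustrative:

```typescript
// Expected shape of the /api/content-engine/diag payload.
interface DiagResponse {
  llm_chain: string[];
  llm_chain_count: number;
  pipeline_llm_chain: string[];
  pollinations_ok: boolean;
  elevenlabs_status: number | "not_configured";
  edge_tts_available: boolean;
  tts_chain: string[];
}

// Not ready unless groq is the FIRST pipeline provider.
function isReady(diag: DiagResponse): boolean {
  return diag.pipeline_llm_chain[0] === "groq";
}

const sample: DiagResponse = {
  llm_chain: ["groq", "gemini", "anthropic", "openai"],
  llm_chain_count: 4,
  pipeline_llm_chain: ["groq", "gemini", "anthropic", "openai"],
  pollinations_ok: true,
  elevenlabs_status: "not_configured",
  edge_tts_available: true,
  tts_chain: ["elevenlabs", "edge_tts"],
};
console.log(isReady(sample)); // true
```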
"It's not real until I see it in the runtime output."
Not the config file. Not the code. Not the env var dashboard. The RUNTIME OUTPUT. Boot logs, diag endpoint responses, actual error messages. That's reality. Everything else is theory.