Render a spoken-word MP3 podcast from wiki pages — single-host by default or two-voice dialogue. Piper TTS default (local, free); falls back to ElevenLabs / OpenAI TTS when their API keys are present. Used by /generate podcast. Not user-invocable directly — go through /generate.
Produce a 3–10 minute MP3 explainer from wiki pages. The LLM writes a spoken-word narrative, TTS renders each line, ffmpeg concatenates into a single MP3.
Artifact-first — output lands in vaults/<vault>/artifacts/podcast/.
/generate podcast <topic> [--vault <name>] [--length short|medium|long] [--two-voice] [--voice <name>]
- `--length` — short (~3 min), medium (~6 min, default), long (~10 min).
- `--two-voice` — dialogue between two hosts instead of a monologue.
- `--voice` — override the default Piper voice. Ignored when the ElevenLabs / OpenAI fallback kicks in.

Same topic resolution as sibling handlers — reuses `.claude/skills/generate/lib/select-pages.sh`.
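For example, a short two-voice episode (the topic name here is illustrative):

```
/generate podcast embeddings --length short --two-voice
```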
wiki pages → LLM script writer → script.md → TTS per line → ffmpeg concat → podcast.mp3
Keep script.md alongside the MP3 — it's diffable, re-renderable, and the honest primary artifact.
HAS_FFMPEG=0; HAS_PIPER=0
command -v ffmpeg >/dev/null 2>&1 && HAS_FFMPEG=1
command -v piper  >/dev/null 2>&1 && HAS_PIPER=1
if [ "$HAS_FFMPEG" = "0" ]; then
echo "ffmpeg missing. Installing via Homebrew…"
brew install ffmpeg
fi
# Piper is optional if ELEVENLABS_API_KEY or OPENAI_API_KEY is set.
if [ "$HAS_PIPER" = "0" ] && [ -z "$ELEVENLABS_API_KEY" ] && [ -z "$OPENAI_API_KEY" ]; then
echo "Piper not found and no cloud TTS key present."
echo "Installing Piper (local, free, robotic-but-serviceable)…"
brew install piper-tts 2>/dev/null || {
echo "Homebrew install failed. See https://github.com/rhasspy/piper for manual install."
exit 1
}
fi
# mapfile over a process substitution would discard the helper's exit status,
# so run the helper directly and check it:
PAGE_LIST=$(.claude/skills/generate/lib/select-pages.sh "$VAULT_DIR" "$TOPIC") || exit 1
mapfile -t PAGES <<< "$PAGE_LIST"

Exit 1 from the helper means no pages matched; its error message is surfaced verbatim.
HASH=$(.claude/skills/generate/lib/source-hash.sh "${PAGES[@]}")
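The hash makes a skip-if-unchanged gate possible. A minimal sketch, assuming the sidecar records a `source_hash:` field (a schema assumption, not confirmed above):

```shell
# Demo fallbacks so the sketch runs standalone; the real handler has both set.
VAULT_DIR=${VAULT_DIR:-$(mktemp -d)}
HASH=${HASH:-abc123}
mkdir -p "$VAULT_DIR/artifacts/podcast"

# Newest sidecar for this artifact type, if any.
LATEST=$(ls "$VAULT_DIR/artifacts/podcast/"*.meta.yaml 2>/dev/null | sort | tail -1)
if [ -n "$LATEST" ] && grep -q "^source_hash: $HASH$" "$LATEST"; then
  SKIP=1   # in the real handler: echo a notice and exit 0
else
  SKIP=0
fi
echo "skip=$SKIP"
```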
The invoking LLM reads the selected pages and writes a narrative script.md. Two shapes supported:
# Podcast: {{topic}}
_Length target: {{length}} (~{{minutes}} min)._
[HOST]: Welcome. Today we're talking about {{topic}}. Here's why that matters…
[HOST]: First, the basics. According to {{cite: wiki/concepts/attention.md}}, attention is…
[HOST]: …
With `--two-voice`:

# Podcast: {{topic}}
[A]: Alright, let's get into {{topic}}.
[B]: Why this, why now?
[A]: Because {{cite: wiki/concepts/rag.md}}…
[B]: Huh. I thought…
[A]: Right, but here's the nuance…
Script-writing rules the LLM follows:
- `{{cite: path}}` markers are preprocessed to *pagename* before TTS sees them.
- Templates live at `.claude/skills/generate-podcast/templates/{single-host,two-voice}.md` and give the LLM a starting shape.
Priority order:
| Priority | Backend | Trigger | Cost | Quality |
|---|---|---|---|---|
| 1 | ElevenLabs | ELEVENLABS_API_KEY set | ~$0.30 per 1k chars | Studio-grade |
| 2 | OpenAI TTS | OPENAI_API_KEY set | ~$0.015 per 1k chars | Very good |
| 3 | Piper (local) | always available once installed | free | Robotic but clean |
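The per-1k-character rates in the table make a rough pre-flight cost estimate cheap to compute. A sketch (the `SCRIPT` fallback is a demo stand-in for the real script.md):

```shell
# Count characters in the script; an empty temp file stands in for demo runs.
SCRIPT=${SCRIPT:-$(mktemp)}
CHARS=$(wc -c < "$SCRIPT")

# Rates from the table above: ElevenLabs ~$0.30/1k chars, OpenAI ~$0.015/1k.
awk -v c="$CHARS" 'BEGIN{printf "ElevenLabs ≈ $%.2f, OpenAI TTS ≈ $%.3f\n", c/1000*0.30, c/1000*0.015}'
```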
if [ -n "$ELEVENLABS_API_KEY" ]; then
TTS_BACKEND="elevenlabs"
elif [ -n "$OPENAI_API_KEY" ]; then
TTS_BACKEND="openai"
else
TTS_BACKEND="piper"
fi
- Piper: `en_US-lessac-medium` for [HOST] / [A]; `en_GB-alan-medium` for [B]. Override with `--voice <model>`.
- OpenAI TTS: `alloy` for HOST/A, `onyx` for B.
- ElevenLabs: `Rachel` for HOST/A, `Adam` for B (override with `ELEVENLABS_VOICE_A` / `ELEVENLABS_VOICE_B` env vars).

Walk the script and split it by the [HOST] / [A] / [B] tags. For each line:
# Piper example
echo "$LINE_TEXT" | piper \
--model "$VOICE_MODEL" \
--output_file "/tmp/podcast_${i}.wav"
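The tag split itself needs no external tools — plain parameter expansion covers it. A sketch over a two-line demo script:

```shell
# Demo script; the real handler reads the generated script.md.
SCRIPT=$(mktemp)
printf '[A]: Alright, into it.\n[B]: Why now?\n' > "$SCRIPT"

while IFS= read -r line; do
  # Only speaker-tagged lines reach TTS; headers and blanks are skipped.
  case "$line" in \[HOST\]:*|\[A\]:*|\[B\]:*) ;; *) continue ;; esac
  SPEAKER=${line%%]:*}; SPEAKER=${SPEAKER#\[}   # HOST, A, or B
  TEXT=${line#*]: }                             # spoken text after the tag
  echo "$SPEAKER|$TEXT"
done < "$SCRIPT"
```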
Replace {{cite: path}} with the page's title (or filename stem) before TTS — the listener hears "as attention explains", not the raw path.
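The stem replacement is one substitution. A sketch assuming GNU sed's `-E` mode (the example line is illustrative):

```shell
LINE_TEXT='According to {{cite: wiki/concepts/attention.md}}, attention is the core idea.'
# Capture the filename stem of the cited page and drop the rest of the marker.
LINE_TEXT=$(printf '%s' "$LINE_TEXT" | sed -E 's/\{\{cite: ([^}]*\/)?([^}\/]+)\.md\}\}/\2/g')
echo "$LINE_TEXT"
# → According to attention, attention is the core idea.
```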
Insert a short 250 ms silence between lines, and a longer 600 ms silence when the speaker changes in two-voice mode.
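The gap files can be generated once up front with ffmpeg's `anullsrc` silence source. A sketch — 22050 Hz mono matches Piper's default output, an assumption worth checking per voice model:

```shell
# Guarded so the sketch is a no-op where ffmpeg isn't installed.
if command -v ffmpeg >/dev/null 2>&1; then
  # 250 ms gap between lines; 600 ms gap on a speaker change.
  ffmpeg -loglevel error -y -f lavfi -i anullsrc=r=22050:cl=mono -t 0.25 /tmp/gap_line.wav
  ffmpeg -loglevel error -y -f lavfi -i anullsrc=r=22050:cl=mono -t 0.60 /tmp/gap_speaker.wav
fi
```

Interleave `/tmp/gap_line.wav` (or `gap_speaker.wav` on a speaker change) between entries in the concat list.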
# build a concat list
for w in /tmp/podcast_*.wav; do echo "file '$w'" >> /tmp/podcast_list.txt; done
# render MP3
ffmpeg -f concat -safe 0 -i /tmp/podcast_list.txt \
-codec:a libmp3lame -qscale:a 2 \
"$VAULT_DIR/artifacts/podcast/<slug>-<date>.mp3"
LAME VBR `-qscale:a 2` is the right quality for voice — bigger files aren't audibly better, smaller ones are noticeably worse.
Before writing the sidecar, check for an existing artifact of the same type and topic:
ARTIFACT_TYPE="podcast"
EXISTING=$(ls "$VAULT_DIR/artifacts/$ARTIFACT_TYPE/"*"$TOPIC_SLUG"*.meta.yaml 2>/dev/null | sort | tail -1)
if [ -n "$EXISTING" ]; then
PREV_VERSION=$(grep '^version:' "$EXISTING" | awk '{print $2}')
PREV_VERSION=${PREV_VERSION:-1}
VERSION=$((PREV_VERSION + 1))
PREV_SLUG=$(basename "$EXISTING" .meta.yaml)
else
VERSION=1
PREV_SLUG=""
fi
The old artifact stays in place — not deleted, not overwritten. Multiple files of the same type + topic = version history. The portal discovers and displays these automatically.
Small fixes (CSS tweaks, typo corrections) should update the file in-place without incrementing the version — use judgement based on whether the content meaningfully changed.
META="${MP3_OUT%.mp3}.meta.yaml"
cat > "$META" <<EOF