Generate a podcast-style in-depth scientific interview that introduces an academic paper. Uses multi-agent analysis (field expert, methods specialist, context historian, critical reviewer, accessibility translator, impact assessor) to prepare rich source material, then an editor agent curates the narrative, and a writer agent composes the final interview between a professional science interviewer and the paper's author. The user drops a PDF of the paper; all supplementary context is gathered via web search and PubMed/bioRxiv. MANDATORY TRIGGERS: "paper interview", "podcast interview for paper", "introduce this paper", "generate interview for this paper", "paper podcast", "deep dive interview", "interview about this paper", "논문 인터뷰", "논문 소개 인터뷰", "paper introduction interview". Also trigger when the user uploads a PDF and asks for a podcast, interview, deep dive, or accessible introduction of a scientific paper.
Generate a dense, engaging, podcast-style interview that introduces an academic paper to scientists in the same field. The interview should provide enough depth for specialists while remaining compelling throughout.
At the very beginning of the workflow, ask the user to choose the output language:
en (English) — Interview text, UI labels, and PDF in English. Uses the Pretendard font.
ko (한국어) — Interview text, UI labels, and PDF in Korean. Uses the Pretendard font (which has full Korean glyph coverage).
Store the choice as a variable (e.g., LANG="en" or LANG="ko") and pass it to both
generate_interview.py --language $LANG and typeset_interview.py --language $LANG.
If the user does not express a preference, default to English.
The pipeline has 6 stages:
Read the uploaded PDF. Use pdftotext first; if it produces garbled output, fall back to pypdf:
from pypdf import PdfReader

reader = PdfReader("<paper_pdf_path>")
text = "\n".join(page.extract_text() or "" for page in reader.pages)
# Save the raw text — the Stage 3 script reads paper_text.txt
with open("paper_text.txt", "w", encoding="utf-8") as f:
    f.write(text)
Then parse the text to identify these structural elements and save as JSON (paper_structure.json):
{
"title": "...",
"authors": ["..."],
"abstract": "...",
"introduction": "...",
"methods_summary": "...", # first ~2000 chars of methods
"results_summary": "...", # first ~3000 chars of results
"discussion_summary": "...", # first ~2000 chars of discussion
"figures_tables": ["captions"], # extracted figure/table captions (see below)
"has_graphical_abstract": true, # whether the paper has a graphical abstract
"references_sample": ["..."], # first 20 references
"keywords": ["..."]
}
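The "first ~N chars" summaries can be produced with a small helper. This is a sketch, not the skill's mandated code: it assumes the section texts have already been split out of the extracted full text (the `sections` dict and its keys are illustrative).

```python
import json

def section_summary(text: str, limit: int) -> str:
    """Trim a section to roughly `limit` characters, preferring a
    sentence boundary so the summary does not end mid-word."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    end = cut.rfind(". ")
    return cut[: end + 1] if end > 0 else cut

def build_structure(meta: dict, sections: dict) -> dict:
    """Assemble the paper_structure.json fields from parsed pieces.
    `meta` carries title/authors/abstract/etc.; `sections` holds the
    full section texts (hypothetical keys)."""
    return {
        **meta,
        "methods_summary": section_summary(sections.get("methods", ""), 2000),
        "results_summary": section_summary(sections.get("results", ""), 3000),
        "discussion_summary": section_summary(sections.get("discussion", ""), 2000),
    }

# Typical use:
# structure = build_structure(meta, sections)
# with open("paper_structure.json", "w", encoding="utf-8") as f:
#     json.dump(structure, f, ensure_ascii=False, indent=2)
```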
Pay special attention to figures_tables. For each figure and table, extract the figure/table number, the caption text, and the visual type (e.g., bar chart, schematic, micrograph) inferred from the caption.
Format each entry as: "Figure 1: [caption] [type: bar chart]"
This metadata feeds into the editor's Visual Plan, which decides where to reference original figures and where to generate explanatory diagrams in the interview.
If the paper has a graphical abstract (common in Cell Press, other Elsevier, and Nature-family journals),
set has_graphical_abstract: true and include it as the first entry in figures_tables.
Extraction does NOT need to be perfect. Capture the gist of each section; the sub-agents work well even with imperfect extraction. Spend at most 2 tool calls on this step.
Use web_search and PubMed tools to gather context. This material will be injected into each sub-agent so they can make informed, grounded analyses.
Search for the paper by title to find:
Based on the paper's topic and references, run 3-5 targeted searches:
Use the PubMed tool to find:
Save all research findings to background_research.md as structured notes.
Aim for 1-2 pages of relevant context. Do NOT over-research — 5-8 search calls total is
enough. The sub-agents need context, not exhaustive literature review.
This is the heart of the skill. Run the orchestration script that calls the Anthropic API to get diverse analytical perspectives on the paper.
pip install anthropic --break-system-packages -q
python <skill_dir>/scripts/generate_interview.py \
--paper paper_text.txt \
--structure paper_structure.json \
--background background_research.md \
--output-dir agent_outputs/ \
--language $LANG # "en" or "ko"
The --model flag defaults to claude-sonnet-4-20250514 but you can pass any Anthropic
model ID (e.g. --model claude-opus-4-6).
The script will call 6 specialist agents, 1 editor agent, and 1 writer agent.
Read references/agent_prompts.md for the detailed role definitions.
If the script fails (API key not available, network issue, etc.), fall back to the manual agent simulation described below.
If the Python script cannot run, simulate the sub-agents yourself. For each of the 6
specialist roles defined in references/agent_prompts.md, write a focused analysis
(300-500 words each) from that agent's perspective. Save each to a separate file in
the agent_outputs/ directory. Then proceed to Stage 4 yourself.
The 6 specialist roles are: field expert, methods specialist, context historian, critical reviewer, accessibility translator, and impact assessor.
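A minimal sketch of the manual fallback's file layout. The role slugs and placeholder body here are assumptions — the canonical role definitions live in references/agent_prompts.md.

```python
import os

# The six specialist perspectives named in the skill description.
ROLES = [
    "field_expert",
    "methods_specialist",
    "context_historian",
    "critical_reviewer",
    "accessibility_translator",
    "impact_assessor",
]

os.makedirs("agent_outputs", exist_ok=True)
for role in ROLES:
    path = os.path.join("agent_outputs", f"{role}.md")
    with open(path, "w", encoding="utf-8") as f:
        # Placeholder body; the real analysis is a focused 300-500 words
        # written from this role's perspective.
        f.write(f"# {role.replace('_', ' ').title()} analysis\n\nTODO\n")
```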
Whether sub-agent outputs came from the script or manual simulation, now act as the
Editor Agent. Read all 6 analyses and the figure inventory, then produce an
editorial plan. Follow the detailed Editor prompt in references/agent_prompts.md
(Agent 7). The editorial plan must include:
- An opening hook and overall narrative arc for the interview
- The Must-Include Topics distilled from the six analyses
- A Visual Plan: 3-6 visual elements, each typed (diagram for concepts, figure_ref for original data). Space them out — consecutive visuals without dialogue between them break reading rhythm and feel like a slideshow rather than a conversation.

Save the plan to editorial_plan.md.
Act as the Writer Agent defined in references/agent_prompts.md (Agent 8).
Using the editorial plan and all agent outputs, compose the final interview. This is the
most important stage — the output quality depends entirely on the writing here.
Key principles (see the full Writer prompt in agent_prompts.md for details):
Insert <diagram> and <figure_ref> blocks between dialogue turns as specified in the Visual Plan. Diagrams illustrate concepts (like whiteboard sketches); figure refs point to data in the original paper.

Save the final interview to interview_final.md and present it to the user.
The final stage converts the markdown interview into a professionally typeset PDF that resembles the layout of high-quality science journals (Nature, Science, Cell). This stage has four sub-steps handled by the typesetting script.
pip install pymupdf typst --break-system-packages -q
npm install @mermaid-js/mermaid-cli # provides mmdc
Parse the markdown interview for <figure_ref> blocks, extract the figure_id from each
(e.g., "Figure 3B", "Graphical Abstract"), and use PyMuPDF to locate and render the
corresponding page from the original paper PDF as a high-resolution PNG (288 DPI).
Strategy for locating figures: search the PDF's text layer for the caption string (e.g., "Figure 3"), then render the page (or page region) where that caption appears.
The extracted PNGs are saved to the working directory as fig_<sanitized_id>.png.
Not every figure extraction will succeed (e.g., multi-panel figures spanning pages, or figures with no text-searchable caption). When extraction fails, the typesetter falls back to a styled reference callout box (gold accent bar) that directs the reader to the original paper — this ensures the visual plan is always represented even on extraction failure.
Parse the markdown for <diagram> blocks, extract the Mermaid code from each, and render
to PNG using mermaid-cli (mmdc):
mmdc -i diagram.mmd -o diagram.png -b white -s 2 -w 1200
The -s 2 flag produces 2× scale for crisp diagrams. If mermaid-cli is unavailable or
rendering fails, the typesetter falls back to a monospace code block showing the Mermaid
source with caption.
The build script converts the interview markdown to Typst markup:
**Host**: text... → red-accented host label with body text
**[Author Name]**: text... → dark author label with body text
<diagram> blocks → #figure(image("diagram_N.png"), caption: [...]) if a rendered PNG is available, otherwise a styled code block
<figure_ref> blocks → #figure(image("fig_X.png"), caption: [...]) with a gold-accent annotation box if an extracted PNG is available, otherwise a gold-accent callout box citing the original figure
**bold**, *italic*, `code` → Typst equivalents

The converted markup is injected into the Typst template at templates/interview.typ, which defines:
Pretendard fonts (auto-downloaded via ensure_pretendard_fonts() in typeset_interview.py)
host labels in red (#c0392b), author labels in dark navy (#1a1a2e)
#figure() with captions
callout boxes (#fff8e1 fill, #f9a825 left bar)

Compilation uses the typst Python package:
import typst
pdf_bytes = typst.compile("interview.typ")
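The speaker-label conversion above can be sketched with a small regex pass. This is an illustration only — the Typst function names (host_turn, author_turn) are assumptions; the real mapping is whatever templates/interview.typ defines.

```python
import re

def convert_dialogue_line(line: str) -> str:
    """Map a markdown dialogue line to a Typst call (sketch).
    host_turn / author_turn are hypothetical template functions."""
    m = re.match(r"\*\*Host\*\*:\s*(.*)", line)
    if m:
        return f"#host_turn[{m.group(1)}]"
    m = re.match(r"\*\*\[(.+?)\]\*\*:\s*(.*)", line)
    if m:
        return f'#author_turn(name: "{m.group(1)}")[{m.group(2)}]'
    return line  # non-dialogue lines pass through unchanged
```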
python <skill_dir>/scripts/typeset_interview.py \
--interview interview_final.md \
--paper <paper_pdf_path> \
--structure paper_structure.json \
--template <skill_dir>/templates/interview.typ \
--output-dir typeset_output/ \
--output interview.pdf \
--mmdc ./node_modules/.bin/mmdc \
--language $LANG # "en" or "ko" — must match generate_interview.py
If the script fails, you can manually perform each sub-step:
Extract figures: Use PyMuPDF interactively:
import fitz
doc = fitz.open("paper.pdf")
page = doc[3] # page containing Figure 1
pix = page.get_pixmap(matrix=fitz.Matrix(4, 4)) # 4× the 72 DPI base = 288 DPI, matching the script
pix.save("fig_figure_1.png")
Render diagrams: Save each Mermaid block to a .mmd file and run:
./node_modules/.bin/mmdc -i diagram.mmd -o diagram.png -b white -s 2
Build Typst file: Copy the template, replace PARAM_* placeholders with actual
metadata, replace the INTERVIEW_CONTENT_START/END block with hand-converted Typst
markup, and compile:
import typst
pdf_bytes = typst.compile("interview.typ")
open("interview.pdf", "wb").write(pdf_bytes)
Before presenting the final output, verify these key areas:
Content quality — Opening hook is compelling (not generic); all Must-Include Topics are covered; technical claims match the paper (no hallucinated results); dialogue feels natural with varied exchange lengths; host asks at least 2 challenging questions; author acknowledges at least 1 limitation; length is 3000-5000 words.
Visuals — 3-6 visual elements embedded (at least 1 diagram + 1 figure ref); all Mermaid diagrams use valid syntax; figure refs cite correct IDs from the paper; no two visuals appear back-to-back without dialogue between them.
PDF output — Compiles without Typst errors; figures and diagrams appear at correct positions; metadata (title, authors, journal, DOI) is accurate in the header; layout has clean typography and spacing.
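The checklist's mechanical items (word count, visual density, no back-to-back visuals) can be spot-checked with a short script. A sketch under the assumption that visual blocks use the <diagram>/<figure_ref> tags described above; the subjective items (hook quality, natural dialogue) still need human judgment.

```python
import re

# Matches a whole visual block, including any attributes on the opening tag.
VISUAL = re.compile(r"<(diagram|figure_ref)[^>]*>.*?</\1>", re.S)

def qa_report(interview_md: str) -> dict:
    """Check word count (3000-5000), visual count (3-6), and that no two
    visual blocks appear back-to-back without dialogue between them."""
    visuals = list(VISUAL.finditer(interview_md))
    prose = VISUAL.sub(" ", interview_md)  # exclude visual markup from word count
    words = len(prose.split())
    back_to_back = any(
        interview_md[a.end():b.start()].strip() == ""
        for a, b in zip(visuals, visuals[1:])
    )
    return {
        "word_count_ok": 3000 <= words <= 5000,
        "visual_count_ok": 3 <= len(visuals) <= 6,
        "no_back_to_back": not back_to_back,
    }
```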
The language flag (--language en or --language ko) controls: (a) the editor
and writer agent prompts (Korean output when ko), (b) PDF UI labels and footer
text, (c) the Typst lang attribute, and (d) font priority order. The Pretendard
font family supports both Latin and Korean glyphs and is auto-downloaded by the
typeset script if not already cached.