Automated scoping literature review generator for Health Informatics. Given a topic, an agent with web_search executes 8 sequential stages (query → search → screen → snowball → extract → rank → synthesize → write) to produce an 8-paper scoping review markdown draft aligned to the HINF 5016 grading rubric. Use when the user asks for a literature review draft, scoping review, or HINF 5016 review on a Health Informatics topic.
You are the executor of an 8-stage scoping literature review pipeline. The user supplies a Health Informatics topic; you produce a Markdown draft covering exactly 8 peer-reviewed papers, structured to match the HINF 5016 grading rubric (§6.3).
You must use your own native web_search tool for every stage that needs the open web. This skill ships no code, no API client, no scraper. All discovery, snowball, and full-text retrieval go through the agent harness's built-in web search.
In scope:
- runs/&lt;timestamp&gt;/
- runs/&lt;timestamp&gt;/08_literature_review.md

Out of scope:
Ask the user (or read from their request):
| Argument | Required | Description |
|---|---|---|
| Topic | yes | Free-form Health Informatics topic, e.g. "FHIR-based interoperability for EHR data exchange". |
| Title | no | Defaults to Scoping Review: {topic}. |
| Output dir | no | Defaults to runs/<UTC-timestamp>/. Create the directory before stage 1. |
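A minimal sketch of the default output-directory convention. The exact timestamp format is an assumption; any filesystem-safe UTC stamp satisfies the spec above:

```python
from datetime import datetime, timezone
from pathlib import Path

def make_run_dir(base: str = "runs") -> Path:
    """Create runs/<UTC-timestamp>/ before stage 1 and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = Path(base) / stamp
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```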
The paper count is fixed at 8 to match the HINF 5016 assignment. The ranking, writer, and verification stages are all hard-coded around N=8; do not accept a user override.
If any of these are missing and the topic is unclear, ask one consolidated question, then proceed.
Run the stages sequentially. Each stage reads its prompt from prompts/, fills the placeholders from the previous stages' outputs, and writes its result to a numbered file under the run directory. Do not skip stages. Do not parallelize.
| # | Stage | Prompt | Output file | web_search? |
|---|---|---|---|---|
| 1 | Query builder | prompts/01_query_builder.md | 01_search_plan.json | no |
| 2 | Search (PubMed + Scholar) | prompts/02_search.md | 02_candidates.json | yes |
| 3 | Title/abstract screening | prompts/03_screen.md | 03_screening.json | no |
| 4 | One-round snowball | prompts/04_snowball.md | 04_snowball.json | yes |
| 5 | 4-field extraction | prompts/05_extract.md | 05_extractions.jsonl (one line per paper) | yes |
| 6 | Rank to exactly 8 | prompts/06_rank.md | 06_ranking.json | no |
| 7 | Thematic synthesis | prompts/07_synthesize.md | 07_synthesis.json | no |
| 8 | Review writer | prompts/08_write_review.md | 08_literature_review.md | no |
Stage details:
Each prompt uses mustache-style placeholders like {{TOPIC}} or {{CANDIDATES_JSON}}. Replace them with the relevant content (the user's topic, or a JSON dump of the previous stage's output) before reading the prompt as your instructions.
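The substitution step can be sketched as a plain string replacement; non-string values are serialized to JSON first (the helper name is illustrative, not part of the skill):

```python
import json

def fill_prompt(template: str, values: dict) -> str:
    """Replace {{PLACEHOLDER}} markers with strings or JSON-dumped values."""
    out = template
    for key, val in values.items():
        if not isinstance(val, str):
            val = json.dumps(val, indent=2)  # e.g. {{CANDIDATES_JSON}}
        out = out.replace("{{" + key + "}}", val)
    return out
```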
Stage 1 — Query builder. Read prompts/01_query_builder.md. Substitute {{TOPIC}}. Produce a search plan as JSON: keywords, MeSH terms, boolean query, inclusion/exclusion bullets, year range, language. Save to 01_search_plan.json.
Stage 2 — Search. Read prompts/02_search.md. Substitute {{SEARCH_PLAN_JSON}} with stage 1's content. Run 3–6 distinct web_search queries spanning PubMed and Google Scholar; vary keyword combinations for recall. Target 20–40 unique candidates. Record every candidate's title, authors, year, venue, DOI, URL, abstract, and source_db. Also record the exact query strings you sent in queries_used. Save to 02_candidates.json.
Stage 3 — Screen. Read prompts/03_screen.md. Substitute {{TOPIC}}, {{INCLUSION_BULLETS}}, {{EXCLUSION_BULLETS}}, {{CANDIDATES_JSON}}. For every candidate output decision (include/exclude/maybe), reason, relevance_score. Save to 03_screening.json.
Stage 4 — Snowball. Pick the highest-relevance included paper as seed. Read prompts/04_snowball.md. Substitute {{SEED_JSON}} and {{TOPIC}}. Run web_search to find forward citations (papers citing the seed) and backward references (papers the seed cites). One round only; 10–20 new candidates total. Save to 04_snowball.json.
Stage 5 — Extract. Build the extraction pool deterministically (no subjective re-ranking): all candidates with decision == "include", sorted by relevance_score descending.

Then for each paper in the pool, read prompts/05_extract.md, substitute {{PAPER_JSON}}, and use web_search to retrieve full text from PMC / publisher / arXiv. Output the four fields with verbatim quote_span excerpts AND persist the actual retrieved text (trimmed) into retrieved_text so the verifier can check spans from disk. If full text is paywalled, set source_type = "abstract_only" and extract from the abstract. Append one JSON object per paper as a line to 05_extractions.jsonl.
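The deterministic pool build can be sketched as below. The DOI/URL/title dedup key and the title tie-break are assumptions added so the ordering is fully reproducible:

```python
def build_pool(screening: list[dict], snowball: list[dict]) -> list[dict]:
    """Merge screened and snowball candidates, keep includes, sort deterministically."""
    merged, seen = [], set()
    for cand in screening + snowball:
        key = (cand.get("doi") or cand.get("url") or cand["title"]).lower()
        if key in seen:
            continue  # dedup on DOI, falling back to URL, then title
        seen.add(key)
        merged.append(cand)
    pool = [c for c in merged if c.get("decision") == "include"]
    # relevance_score descending; title breaks ties so reruns agree
    return sorted(pool, key=lambda c: (-c["relevance_score"], c["title"]))
```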
Stage 6 — Rank. Read prompts/06_rank.md. Substitute {{TOPIC}} and {{EXTRACTIONS_JSON}}. Pick exactly 8 papers honoring the diversity constraints (no shared first author; ≤2 papers per dataset; ≤3 papers per method family; ≥3 distinct theme tags). Output selected (with rank 1..8 and theme_tag), rejected (with one-sentence reasons), and diversity_notes. Save to 06_ranking.json.
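A sketch of a checker for the diversity constraints above. The field names (first_author, dataset, method_family, theme_tag) are assumptions about the 06_ranking.json record layout:

```python
from collections import Counter

def check_diversity(selected: list[dict]) -> list[str]:
    """Return the list of violated diversity constraints (empty means pass)."""
    problems = []
    if len(selected) != 8:
        problems.append(f"expected 8 papers, got {len(selected)}")
    if any(n > 1 for n in Counter(p["first_author"] for p in selected).values()):
        problems.append("shared first author")
    datasets = Counter(p["dataset"] for p in selected if p.get("dataset"))
    if any(n > 2 for n in datasets.values()):
        problems.append("more than 2 papers per dataset")
    methods = Counter(p["method_family"] for p in selected if p.get("method_family"))
    if any(n > 3 for n in methods.values()):
        problems.append("more than 3 papers per method family")
    if len({p["theme_tag"] for p in selected}) < 3:
        problems.append("fewer than 3 distinct theme tags")
    return problems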
Stage 7 — Synthesize. Read prompts/07_synthesize.md. Substitute {{EXTRACTIONS_AND_RANK_JSON}}. Produce themes (2–4 groups covering all 8 papers), similarity matrix (6–12 informative pairs), gaps (3–6), and cross-paper observations (3–6 sentences). Save to 07_synthesis.json.
Stage 8 — Write. Read prompts/08_write_review.md. Substitute {{TOPIC}}, {{TITLE}}, {{INCLUSION_BULLETS}}, {{EXCLUSION_BULLETS}}, {{PRISMA_JSON}}, {{EXTRACTIONS_JSON}}, {{SYNTHESIS_JSON}}. Produce the full Markdown manuscript. Use the section structure verbatim. Vancouver references numbered by first appearance. Save to 08_literature_review.md.
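One way to number Vancouver references by first appearance is shown below, assuming the draft body uses [@paper_id] markers before numbering (the marker syntax is a hypothetical convention, not mandated by the skill):

```python
import re

def number_by_first_appearance(body: str) -> tuple[str, list[str]]:
    """Replace [@id] markers with [n], numbered by first appearance in the body."""
    order: list[str] = []

    def repl(m: re.Match) -> str:
        pid = m.group(1)
        if pid not in order:
            order.append(pid)  # first sighting fixes the number
        return f"[{order.index(pid) + 1}]"

    return re.sub(r"\[@([^\]]+)\]", repl, body), order
```

The returned order gives the sequence in which to emit the ## References entries.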
Alongside the 8 numbered files, also write to the run directory:
- 00_input.json — the user's topic, title, output dir, and the start timestamp.
- prisma_flow.json — counts: identified (stage 2 + 4), screened (stage 3 total), eligible (stage 5 attempted), included (stage 6 = 8); plus quote_span_verified / quote_span_total after the verification step.
- search_log.json — every web_search query string you ran across stages 2, 4, and 5, with the stage label and result count.

05_extractions.jsonl is itself part of the audit bundle and must contain a retrieved_text field on every line — the verifier reads from this file alone, so quote-span checks remain reproducible after the agent session ends.
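A sketch of writing prisma_flow.json with the counts named above; the two quote-span keys are only added once the verification step has run:

```python
import json
from pathlib import Path

def write_prisma_flow(run_dir, identified, screened, eligible, included=8,
                      verified=None, total=None):
    """Write prisma_flow.json; quote-span counts are filled in after verification."""
    flow = {
        "identified": identified,  # stage 2 + stage 4 candidates
        "screened": screened,      # stage 3 total
        "eligible": eligible,      # stage 5 attempted
        "included": included,      # stage 6, fixed at 8
    }
    if verified is not None:
        flow["quote_span_verified"] = verified
        flow["quote_span_total"] = total
    (Path(run_dir) / "prisma_flow.json").write_text(json.dumps(flow, indent=2))
```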
You do not need a token_usage.json since the agent harness owns token counting; if the harness exposes a way to read usage, write it, otherwise omit.
Verification checks:
- 06_ranking.json.selected has length 8.
- 08_literature_review.md has eight ### 3.1 … ### 3.8 subsections.
- 08_literature_review.md headings appear in this order: # {title} → ## Abstract → ## 1. Introduction → ## 2. Methods → ### 2.1 Eligibility criteria → ### 2.2 Information sources → ### 2.3 Study selection → ## 3. Results → ### 3.1 … ### 3.8 → ## 4. Discussion → ## 5. Conclusions → ## References.
- [n] numbers in ## References are in order of first appearance in the body.
- For each of the 8 selected papers in 05_extractions.jsonl, read its retrieved_text field from the file and check that each of the four quote_span values is a substring of that retrieved_text after this normalization: lowercase both sides, collapse runs of whitespace to a single space, strip leading/trailing whitespace. Do not rely on agent memory — the verifier must be reproducible from disk alone. Record pass/fail counts (8 papers × 4 fields = 32 spans) in prisma_flow.json as quote_span_verified / quote_span_total. If any span fails, list the failing paper_id and field name in your final summary so the user knows which to manually check.
- 06_ranking.json.diversity_notes should explain how the constraints were applied. Spot-check that no two of the 8 papers share a first author.

After verification, report back in 4–6 lines:
- the path to 08_literature_review.md.

Anti-hallucination rules:
- Every paper in 05_extractions.jsonl must trace back to a real web_search hit. If the search returned nothing usable, report that and stop — do not invent.
- Every quote_span is a verbatim string from text the agent actually saw. The verification step exists precisely to catch this.
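The disk-only quote-span check can be sketched as below. The nested rec[field]["quote_span"] layout is an assumption about how each JSONL line stores its four fields:

```python
import json
import re

def normalize(s: str) -> str:
    """Lowercase, collapse whitespace runs to one space, strip ends."""
    return re.sub(r"\s+", " ", s.lower()).strip()

def verify_spans(jsonl_path, selected_ids, fields):
    """Check each quote_span is a substring of retrieved_text, reading disk only."""
    passed, failures = 0, []
    with open(jsonl_path) as fh:
        for line in fh:
            rec = json.loads(line)
            if rec["paper_id"] not in selected_ids:
                continue
            haystack = normalize(rec["retrieved_text"])
            for field in fields:
                if normalize(rec[field]["quote_span"]) in haystack:
                    passed += 1
                else:
                    failures.append((rec["paper_id"], field))
    return passed, failures
```

Because both sides pass through the same normalize(), the check survives case and whitespace differences but still rejects paraphrased spans.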