Build PRISMA 2020-compliant systematic literature review systems with RAG-powered analysis in VS Code. Use when researcher needs automated paper retrieval (Semantic Scholar, OpenAlex, arXiv), AI-assisted PRISMA screening (50% or 90% threshold), vector database creation (ChromaDB), or research conversation interface. Supports knowledge_repository (comprehensive, 15K+ papers, teaching/exploration) and systematic_review (publication-quality, 50-300 papers, meta-analysis) modes. Conversation-first workflow with 7 stages.
For: Claude Code (AI assistant in VS Code)
Purpose: Guide researchers through PRISMA 2020 systematic literature review + RAG-powered analysis
`python scholarag_cli.py init`

When researcher provides a ScholaRAG prompt:
1. Parse the metadata block (`<!-- METADATA ... -->` at top of prompt)
2. Identify the current stage (`stage` field)
3. Follow the scripted dialogue (`conversation_flow`)
4. Validate researcher responses (`validation_rules`)
5. Run scripts automatically when flagged (`auto_execute: true`)
6. Update `.claude/context.json` (track progress)
7. Advance to the next prompt (`next_stage.prompt_file`)

Researcher should NEVER touch the terminal. You execute all scripts automatically.
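The prompt-handling loop above can be sketched in Python. This is a minimal illustration under assumptions, not ScholaRAG's actual implementation: the metadata is assumed to be simple `key: value` lines inside the HTML comment, and `read_metadata` / `update_context` are hypothetical helper names.

```python
import json
import re
from pathlib import Path

def read_metadata(prompt_text: str) -> dict:
    """Extract the <!-- METADATA ... --> block and parse key: value pairs."""
    match = re.search(r"<!--\s*METADATA(.*?)-->", prompt_text, re.DOTALL)
    if not match:
        return {}
    meta = {}
    for line in match.group(1).strip().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def update_context(stage: str, context_path: str = ".claude/context.json") -> None:
    """Record the researcher's current stage so progress survives restarts."""
    path = Path(context_path)
    context = json.loads(path.read_text()) if path.exists() else {}
    context["current_stage"] = stage
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(context, indent=2))
```

With `meta = read_metadata(prompt)`, checking `meta.get("auto_execute") == "true"` decides whether to run the stage's scripts without asking.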
| Stage | Name | Read This File | Duration | Auto-Execute |
|---|---|---|---|---|
| 1 | Research Setup | skills/claude_only/stage1_research_setup.md | 15-20 min | ✅ scholarag init |
| 2 | Query Strategy | skills/claude_only/stage2_query_strategy.md | 15-25 min | ❌ Manual review |
| 3 | PRISMA Config | skills/claude_only/stage3_prisma_config.md | 20-30 min | ❌ Manual review |
| 4 | RAG Design | skills/claude_only/stage4_rag_design.md | 10-15 min | ❌ Manual review |
| 5 | Execution | skills/claude_only/stage5_execution.md | 2-4 hours | ✅ Run all 5 scripts |
| 6 | Research Conversation | skills/claude_only/stage6_research_conversation.md | Ongoing | ❌ Interactive |
| 7 | Documentation | skills/claude_only/stage7_documentation.md | 30-60 min | ✅ Generate PRISMA |
Progressive Disclosure: Load stage file only when researcher enters that stage. Don't preload all 7 stages (token waste).
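The progressive-disclosure rule can be sketched as a simple lookup, using the stage-file paths from the table above (the `load_stage` helper itself is hypothetical):

```python
from pathlib import Path

# Stage -> instruction file, as listed in the stage table.
STAGE_FILES = {
    1: "skills/claude_only/stage1_research_setup.md",
    2: "skills/claude_only/stage2_query_strategy.md",
    3: "skills/claude_only/stage3_prisma_config.md",
    4: "skills/claude_only/stage4_rag_design.md",
    5: "skills/claude_only/stage5_execution.md",
    6: "skills/claude_only/stage6_research_conversation.md",
    7: "skills/claude_only/stage7_documentation.md",
}

def load_stage(stage: int) -> str:
    """Load ONLY the current stage's file; never preload all seven."""
    return Path(STAGE_FILES[stage]).read_text()
```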
Two modes available:
| Mode | Threshold | Output | Best For |
|---|---|---|---|
| `knowledge_repository` | 50% (lenient) | 15K-20K papers | Teaching, AI assistant, exploration |
| `systematic_review` | 90% (strict) | 50-300 papers | Meta-analysis, publication |
Quick decision:
- Publishing a meta-analysis or PRISMA-compliant review? → `systematic_review` ✅
- Building a broad corpus for teaching, an AI assistant, or exploration? → `knowledge_repository` ✅

Detailed decision tree: `skills/reference/project_type_decision_tree.md`
When to read the decision tree: whenever the researcher is unsure which mode fits their goal.
Stage 6 branches into 7 specialized conversation scenarios:
When to read: Stage 6 entry (researcher asks "What can I query?")
When errors occur: skills/reference/error_recovery.md
Quick fixes (common issues):
| Error | Quick Fix | Detailed Guide |
|---|---|---|
| Too many papers (>30K) | Refine query in Stage 2, re-run fetch | error_recovery.md §2.1 |
| API key missing | Add ANTHROPIC_API_KEY to .env | error_recovery.md §3.1 |
| Low PDF success (<30%) | Filter for open_access in Stage 1 | error_recovery.md §4.1 |
| All papers excluded (0 papers) | Lower threshold or broaden query | error_recovery.md §3.2 |
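For the API-key row, a fail-fast check is a natural guard to run before Stage 5. This is a sketch; the `require_api_key` helper name is an assumption, not part of ScholaRAG:

```python
import os

def require_api_key() -> str:
    """Fail fast with the quick fix from the table if the key is absent."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY missing - add it to .env (error_recovery.md §3.1)"
        )
    return key
```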
Progressive disclosure: Don't preload these. Read only when researcher asks specific questions.
| Topic | File | When to Read |
|---|---|---|
| API endpoints | skills/reference/api_reference.md | Researcher asks about Semantic Scholar, OpenAlex, arXiv |
| Config schema | skills/reference/config_schema.md | Researcher asks "What fields are in config.yaml?" |
| PRISMA checklist | skills/reference/prisma_guidelines.md | Researcher asks about PRISMA 2020 compliance |
| Troubleshooting | skills/reference/troubleshooting.md | Researcher reports errors not in Quick Fixes |
File dependencies: https://www.scholarag.com/codebook/architecture
Key principle: Scripts read from config.yaml (single source of truth), never hardcode values.
Critical scripts (read project_type from config):
- `03_screen_papers.py`: Sets threshold (50% or 90%)
- `07_generate_prisma.py`: Changes diagram title ("Knowledge Repository" vs "Systematic Review")

If researcher is using OpenAI Codex instead of Claude Code:
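The branching both scripts perform can be illustrated as follows. This is a sketch under assumptions: the real scripts read `project_type` from `config.yaml`, and `MODE_SETTINGS` / `settings_for` are hypothetical names:

```python
# project_type -> (screening threshold, PRISMA diagram title)
MODE_SETTINGS = {
    "knowledge_repository": (0.50, "Knowledge Repository"),
    "systematic_review": (0.90, "Systematic Review"),
}

def settings_for(project_type: str) -> tuple:
    """Resolve screening threshold and diagram title from config's project_type."""
    try:
        return MODE_SETTINGS[project_type]
    except KeyError:
        raise ValueError(f"Unknown project_type: {project_type!r}")
```

Keeping both values keyed on one `project_type` field is what makes `config.yaml` the single source of truth: neither script hardcodes its own copy of the mode.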
See AGENTS.md for bash-based task workflows; the Codex workflow differs from the conversation-first flow described here.
Universal reference files (Claude + Codex both use):
- `skills/reference/project_type_decision_tree.md`
- `skills/reference/api_reference.md`
- `skills/reference/config_schema.md`

This file: ~400 lines (loaded once per conversation)
Stage-specific files: ~300-500 lines each (loaded on-demand)
Total per conversation: ~700 lines (this file + current stage file)
Previous approach: ~2,000 lines (all context upfront)
Token reduction: 65% ✅
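The arithmetic behind that figure, using the line counts stated above:

```python
previous = 2000        # all context loaded upfront
current = 400 + 300    # this file + one stage file
reduction = (previous - current) / previous
print(f"{reduction:.0%}")  # -> 65%
```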
How it works:
1. Researcher enters Stage 1 → load `stage1_research_setup.md`
2. Researcher advances to Stage 2 → load `stage2_query_strategy.md` (Stage 1 file unloaded)

All prompts in `prompts/*.md` contain HTML comment metadata at top:
<!-- METADATA