Conduct comprehensive, systematic literature reviews using multiple academic databases (PubMed, arXiv, bioRxiv, Semantic Scholar, etc.). Use this skill for systematic literature reviews, meta-analyses, research synthesis, or comprehensive literature searches across biomedical, scientific, and technical domains. It creates professionally formatted markdown documents and PDFs with verified citations in multiple citation styles (APA, Nature, Vancouver, etc.), and supports both skimming (search + abstracts) and deep reading (full PDF analysis).
Conduct systematic, comprehensive literature reviews following rigorous academic methodology. Search multiple literature databases, synthesize findings thematically, verify all citations for accuracy, and generate professional output documents in markdown and PDF formats.
This skill integrates with multiple scientific skills for database access (gget, bioservices, datacommons-client) and provides specialized tools for citation verification, result aggregation, and document generation.
NEW: Continuous Literature Search & Deep Reading
Use this skill when:
⚠️ MANDATORY: Every literature review MUST include at least 1-2 AI-generated figures using the scientific-schematics skill.
This is not optional. Literature reviews without visual elements are incomplete. Before finalizing any document:
How to generate figures and schematics:
python scripts/generate_schematic.py "your diagram description" -o figures/output.png
The AI will automatically:
When to add schematics:
For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.
Literature reviews follow a structured, multi-phase workflow:
Define Research Question: Use PICO framework (Population, Intervention, Comparison, Outcome) for clinical/biomedical reviews
Establish Scope and Objectives:
Develop Search Strategy:
Set Inclusion/Exclusion Criteria:
Multi-Database Search with paper_search.py:
Use the integrated paper_search.py script to search multiple databases simultaneously:
# Basic search across all sources
python scripts/paper_search.py "CRISPR gene editing" --max 20 --dedupe
# Search with year filter (Semantic Scholar)
python scripts/paper_search.py "machine learning" --year "2020-2024" --max 15
# Only show papers with PDF links
python scripts/paper_search.py "deep learning" --pdf-only --format markdown
# Search specific sources
python scripts/paper_search.py "neural networks" --sources semantic,arxiv --output results.md
# JSON output for further processing
python scripts/paper_search.py "quantum computing" --format json --output papers.json
Options:
- `--sources LIST`: Comma-separated sources (semantic, arxiv, crossref)
- `--max N`: Max results per source (default: 10)
- `--year YEAR`: Year filter for Semantic Scholar (e.g., '2020', '2020-2024')
- `--format FORMAT`: Output format (markdown, json)
- `--output FILE`: Save results to file
- `--dedupe`: Remove duplicate papers
- `--pdf-only`: Only show papers with PDF links

Supported databases: Semantic Scholar, arXiv, and CrossRef.
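When results are exported with `--format json`, they can be post-processed directly. The following is a minimal sketch, assuming the JSON file holds a list of paper records with an optional `doi` field (these field names are an assumption for illustration, not the script's documented schema):

```python
import json

def dedupe_by_doi(papers):
    """Keep the first occurrence of each DOI; papers without a DOI are kept as-is."""
    seen, unique = set(), []
    for p in papers:
        doi = (p.get("doi") or "").lower()  # DOIs are case-insensitive
        if doi and doi in seen:
            continue
        if doi:
            seen.add(doi)
        unique.append(p)
    return unique

# Inline sample standing in for json.load(open("papers.json"))
papers = [
    {"title": "Paper A", "doi": "10.1000/a1"},
    {"title": "Paper A (duplicate)", "doi": "10.1000/A1"},
    {"title": "Paper without DOI"},
]
print(len(dedupe_by_doi(papers)))  # 2
```

Note that `paper_search.py --dedupe` already deduplicates at search time; a sketch like this is only needed when merging JSON exports from separate runs.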
Individual Database Search:
Select databases appropriate for the domain:
Biomedical & Life Sciences:
- gget skill: `gget search pubmed "search terms"` for PubMed/PMC
- gget skill: `gget search biorxiv "search terms"` for preprints
- bioservices skill for ChEMBL, KEGG, UniProt, etc.

General Scientific Literature:
Specialized Databases:
- `gget alphafold` for protein structures
- `gget cosmic` for cancer genomics
- datacommons-client for demographic/statistical data

Document Search Parameters:
## Search Strategy
### Database: PubMed
- **Date searched**: 2024-10-25
- **Date range**: 2015-01-01 to 2024-10-25
- **Search string**:
("CRISPR"[Title] OR "Cas9"[Title]) AND ("sickle cell"[MeSH] OR "SCD"[Title/Abstract]) AND 2015:2024[Publication Date]
- **Results**: 247 articles
Repeat for each database searched.
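To keep search strings consistent across databases, the field-tagged query in the example above can be assembled programmatically. A sketch of one such helper (the function name and grouping are illustrative, not part of this skill's scripts):

```python
def build_pubmed_query(title_terms, mesh_terms, year_start, year_end):
    """Combine field-tagged term groups with AND, mirroring the example above."""
    title = " OR ".join(f'"{t}"[Title]' for t in title_terms)
    mesh = " OR ".join(f'"{m}"[MeSH]' for m in mesh_terms)
    dates = f"{year_start}:{year_end}[Publication Date]"
    return f"({title}) AND ({mesh}) AND {dates}"

query = build_pubmed_query(["CRISPR", "Cas9"], ["sickle cell"], 2015, 2024)
print(query)
```

Logging the generated string verbatim into the Search Strategy section makes the search reproducible.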
Export and Aggregate Results:
Use scripts/search_databases.py for post-processing:
python search_databases.py combined_results.json \
--deduplicate \
--format markdown \
--output aggregated_results.md
Deduplication:
python search_databases.py results.json --deduplicate --output unique_results.json
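One common deduplication strategy, when DOIs are missing, is matching on normalized titles. A sketch of the idea, assuming records are dicts with a `title` key (the actual logic inside search_databases.py may differ):

```python
import re

def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records):
    """Keep the first record for each normalized title."""
    seen, unique = set(), []
    for r in records:
        key = normalize_title(r["title"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

records = [
    {"title": "CRISPR-Cas9 in Sickle Cell Disease"},
    {"title": "CRISPR Cas9 in sickle cell disease."},
]
print(len(deduplicate(records)))  # 1
```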
Title Screening:
Abstract Screening:
Full-Text Screening:
Create PRISMA Flow Diagram:
Initial search: n = X
├─ After deduplication: n = Y
├─ After title screening: n = Z
├─ After abstract screening: n = A
└─ Included in review: n = B
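The flow diagram above can be generated from the screening counts so it always stays consistent with the numbers reported in the methodology. A small illustrative helper (not one of this skill's scripts):

```python
def prisma_flow(initial, after_dedupe, after_title, after_abstract, included):
    """Render the screening counts as the tree shown above."""
    return "\n".join([
        f"Initial search: n = {initial}",
        f"├─ After deduplication: n = {after_dedupe}",
        f"├─ After title screening: n = {after_title}",
        f"├─ After abstract screening: n = {after_abstract}",
        f"└─ Included in review: n = {included}",
    ])

# Example counts (hypothetical)
print(prisma_flow(247, 198, 85, 42, 23))
```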
Extract Key Data from each included study:
Assess Study Quality:
Organize by Themes:
Create Review Document from template:
cp assets/review_template.md my_literature_review.md
Write Thematic Synthesis (NOT study-by-study summaries):
Example structure:
#### 3.3.1 Theme: CRISPR Delivery Methods
Multiple delivery approaches have been investigated for therapeutic
gene editing. Viral vectors (AAV) were used in 15 studies^1-15^ and
showed high transduction efficiency (65-85%) but raised immunogenicity
concerns^3,7,12^. In contrast, lipid nanoparticles demonstrated lower
efficiency (40-60%) but improved safety profiles^16-23^.
Critical Analysis:
Write Discussion:
CRITICAL: All citations must be verified for accuracy before final submission.
Verify All DOIs:
python scripts/verify_citations.py my_literature_review.md
This script:
Review Verification Report:
Format Citations Consistently:
(see references/citation_styles.md)

Generate PDF:
python scripts/generate_pdf.py my_literature_review.md \
--citation-style apa \
--output my_review.pdf
Options:
- `--citation-style`: apa, nature, chicago, vancouver, ieee
- `--no-toc`: Disable table of contents
- `--no-numbers`: Disable section numbering
- `--check-deps`: Check if pandoc/xelatex are installed

Review Final Output:
Quality Checklist:
Access via gget skill:
# Search PubMed
gget search pubmed "CRISPR gene editing" -l 100
# Search with filters
# Use PubMed Advanced Search Builder to construct complex queries
# Then execute via gget or direct Entrez API
Search tips:
"sickle cell disease"[MeSH][Title], [Title/Abstract], [Author]2020:2024[Publication Date]Access via gget skill:
gget search biorxiv "CRISPR sickle cell" -l 50
Important considerations:
Access via direct API or WebFetch:
# Example search categories:
# q-bio.QM (Quantitative Methods)
# q-bio.GN (Genomics)
# q-bio.MN (Molecular Networks)
# cs.LG (Machine Learning)
# stat.ML (Machine Learning Statistics)
# Search format: category AND terms
search_query = "cat:q-bio.QM AND ti:\"single cell sequencing\""
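The arXiv API accepts that query syntax via its public `query` endpoint. A sketch of building the request URL with the standard library (no network call is made here; `start` and `max_results` are standard arXiv API parameters):

```python
from urllib.parse import urlencode

def arxiv_query_url(search_query, start=0, max_results=10):
    """Build an arXiv API request URL for the given search expression."""
    params = urlencode({
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    })
    return f"http://export.arxiv.org/api/query?{params}"

url = arxiv_query_url('cat:q-bio.QM AND ti:"single cell sequencing"')
print(url)
```

The response is an Atom XML feed, which can be parsed with `xml.etree.ElementTree` or `feedparser`.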
Access via direct API (requires API key, or use free tier):
Use appropriate skills:
- bioservices skill for chemical bioactivity
- gget or bioservices skill for protein information
- bioservices skill for pathways and genes
- gget skill for cancer mutations
- `gget alphafold` for protein structures
- gget or direct API for experimental structures

Expand search via citation networks:
Forward citations (papers citing key papers):
Backward citations (references from key papers):
Detailed formatting guidelines are in references/citation_styles.md. Quick reference:
Always verify citations with verify_citations.py before finalizing.
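A cheap syntactic pre-check can catch malformed DOIs before the full verification pass. A sketch based on Crossref's recommended pattern for modern DOIs (this is not what verify_citations.py does internally, which presumably also resolves each DOI):

```python
import re

# Pattern adapted from Crossref's recommended modern-DOI regex
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$", re.IGNORECASE)

def looks_like_doi(doi):
    """Syntactic check only; does not confirm the DOI actually resolves."""
    return bool(DOI_RE.match(doi.strip()))

print(looks_like_doi("10.1038/s41586-020-2008-3"))  # True
print(looks_like_doi("doi:10.1038/xyz"))            # False (strip the prefix first)
```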
Complete workflow for a biomedical literature review:
# 1. Create review document from template
cp assets/review_template.md crispr_sickle_cell_review.md
# 2. Search multiple databases using appropriate skills
# - Use gget skill for PubMed, bioRxiv
# - Use direct API access for arXiv, Semantic Scholar
# - Export results in JSON format
# 3. Aggregate and process results
python scripts/search_databases.py combined_results.json \
--deduplicate \
--rank citations \
--year-start 2015 \
--year-end 2024 \
--format markdown \
--output search_results.md \
--summary
# 4. Screen results and extract data
# - Manually screen titles, abstracts, full texts
# - Extract key data into the review document
# - Organize by themes
# 5. Write the review following template structure
# - Introduction with clear objectives
# - Detailed methodology section
# - Results organized thematically
# - Critical discussion
# - Clear conclusions
# 6. Verify all citations
python scripts/verify_citations.py crispr_sickle_cell_review.md
# Review the citation report
cat crispr_sickle_cell_review_citation_report.json
# Fix any failed citations and re-verify
python scripts/verify_citations.py crispr_sickle_cell_review.md
# 7. Generate professional PDF
python scripts/generate_pdf.py crispr_sickle_cell_review.md \
--citation-style nature \
--output crispr_sickle_cell_review.pdf
# 8. Review final PDF and markdown outputs
This skill works seamlessly with other scientific skills:
Scripts:
- `scripts/paper_search.py`: Multi-database search (Semantic Scholar, arXiv, CrossRef) with PDF links. NEW: Full abstracts by default (no truncation)
- `scripts/deep_read.py`: NEW: Full PDF text extraction and structured deep reading notes
- `scripts/verify_citations.py`: Verify DOIs and generate formatted citations
- `scripts/generate_pdf.py`: Convert markdown to professional PDF
- `scripts/search_databases.py`: Process, deduplicate, and format search results

References:
- `references/citation_styles.md`: Detailed citation formatting guide (APA, Nature, Vancouver, Chicago, IEEE)
- `references/database_strategies.md`: Comprehensive database search strategies

Assets:
- `assets/review_template.md`: Complete literature review template with all sections

Guidelines:
Tools:
Citation Styles:
pip install requests # For citation verification
# For PDF generation
brew install pandoc # macOS
apt-get install pandoc # Linux
# For LaTeX (PDF generation)
brew install --cask mactex # macOS
apt-get install texlive-xetex # Linux
Check dependencies:
python scripts/generate_pdf.py --check-deps
Literature retrieval runs throughout all research phases — not just Phase 1:
# Phase 1 broad search - full abstracts, no truncation
python scripts/paper_search.py "your research topic" --max 30 --dedupe --output phase1_results.md
# Filter for papers with PDF links only
python scripts/paper_search.py "topic" --pdf-only --max 20
# Verify specific hypothesis
python scripts/paper_search.py "specific hypothesis OR claim" --year "2022-2024" --max 10
python scripts/paper_search.py "method OR technique name" --sources semantic,arxiv --max 15
python scripts/paper_search.py "unexpected result explanation" --max 10
# Search for specific claims
python scripts/paper_search.py "specific claim keywords" --year "2020-2024" --max 10
The system NEVER fabricates citations. When retrieval fails:
- Papers are tagged with their access status: `abstract_only`, `download_failed`, or `user_provided`

Literature retrieval report:
Full text obtained: K papers ✅
Abstract only / download failed: M papers ⚠️
Papers I could not access:
1. [Author et al., Year] "Title" — URL: [link]
2. ...
To help me read these, you can:
A) Download PDFs manually
B) Provide key findings summary
C) Tell me which to skip
Go beyond skimming abstracts to full paper comprehension.
# Full paper deep reading with structured notes
python scripts/deep_read.py paper.pdf --output reading_notes.md
# Extract only (no analysis)
python scripts/deep_read.py paper.pdf --extract-only
# Extract specific section
python scripts/deep_read.py paper.pdf --section abstract
python scripts/deep_read.py paper.pdf --section methods
python scripts/deep_read.py paper.pdf --section results
# JSON format output
python scripts/deep_read.py paper.pdf --format json --output paper_analysis.json
The deep_read.py script generates structured notes:
# Deep Reading Notes
## Paper Information
- **Title**: [extracted title]
- **DOI**: [verified DOI link]
- **Year**: [publication year]
- **Keywords**: [extracted keywords]
## Paper Structure
- **Sections**: [list of detected sections]
- **Figures**: [figure references]
- **Tables**: [table references]
## Abstract
[Full abstract text - NO TRUNCATION]
## Full Text Content
[Complete extracted text from PDF]
## Analysis Prompts
1. **Main Contribution**: What is the primary contribution?
2. **Methodology**: What methods are used?
3. **Key Findings**: Main results and support?
4. **Limitations**: Acknowledged and identified?
5. **Relevance**: How does this relate to your research?
6. **Future Work**: Suggested directions?
7. **Questions**: What questions does this raise?
## Citation
[Formatted citation placeholder]
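Section detection in extracted PDF text is typically heuristic. A sketch of one approach using heading-line regexes (the actual heuristics inside deep_read.py are assumed, not documented here):

```python
import re

# Common top-level headings in scientific papers, matched on their own lines
SECTION_RE = re.compile(
    r"^(abstract|introduction|methods?|materials and methods|results|"
    r"discussion|conclusions?|references)\s*$",
    re.IGNORECASE | re.MULTILINE,
)

def detect_sections(text):
    """Return section headings found on their own lines in extracted text."""
    return [m.group(1).lower() for m in SECTION_RE.finditer(text)]

sample = "Abstract\nWe study...\nMethods\nWe used...\nResults\nWe found..."
print(detect_sections(sample))  # ['abstract', 'methods', 'results']
```

Real PDFs often break headings with numbering ("2. Methods") or all-caps styling, so production code would need looser patterns and layout cues.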
| Task | Tool | Use Case |
|---|---|---|
| Skim | paper_search.py | Quick overview, find relevant papers |
| Deep Read | deep_read.py | Full comprehension, detailed analysis |
# Workflow: Skim → Select → Deep Read
# 1. Skim: Find relevant papers with full abstracts
python scripts/paper_search.py "topic" --max 30 --output skim_results.md
# 2. Select: Identify top 5-10 most relevant papers
# 3. Deep Read: For each selected paper
python scripts/deep_read.py paper1.pdf --output paper1_notes.md
python scripts/deep_read.py paper2.pdf --output paper2_notes.md
# ...
For deep reading, install one of:
pip install PyPDF2 # Basic extraction
pip install PyMuPDF # Better quality (recommended)
This literature-review skill provides:
| Task | Command |
|---|---|
| Skim (search) | python scripts/paper_search.py "query" --max 20 |
| Deep read (PDF) | python scripts/deep_read.py paper.pdf --output notes.md |
| Verify citations | python scripts/verify_citations.py review.md |
| Generate PDF | python scripts/generate_pdf.py review.md --output review.pdf |
Conduct thorough, rigorous literature reviews that meet academic standards and provide comprehensive synthesis of current knowledge in any domain.