Comprehensive citation management for academic research. Search Google Scholar and PubMed for papers, extract accurate metadata, validate citations, and generate properly formatted BibTeX entries. This skill should be used when you need to find papers, verify citation information, convert DOIs to BibTeX, or ensure reference accuracy in scientific writing.
Manage citations systematically throughout the research and writing process. This skill provides tools and strategies for searching academic databases (Google Scholar, PubMed), extracting accurate metadata from multiple sources (CrossRef, PubMed, arXiv), validating citation information, and generating properly formatted BibTeX entries.
Critical for maintaining citation accuracy, avoiding reference errors, and ensuring reproducible research. Integrates seamlessly with the literature-review skill for comprehensive research workflows.
Use this skill when:
When creating documents with this skill, always consider adding scientific diagrams and schematics to enhance visual communication.
If your document does not already contain schematics or diagrams:
For new documents: Scientific schematics should be generated by default to visually represent key concepts, workflows, architectures, or relationships described in the text.
How to generate schematics:
python scripts/generate_schematic.py "your diagram description" -o figures/output.png
The AI will automatically:
When to add schematics:
For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.
Citation management follows a systematic process:
Goal: Find relevant papers using academic search engines.
Google Scholar provides the most comprehensive coverage across disciplines.
Basic Search:
# Search for papers on a topic
python scripts/search_google_scholar.py "CRISPR gene editing" \
--limit 50 \
--output results.json
# Search with year filter
python scripts/search_google_scholar.py "machine learning protein folding" \
--year-start 2020 \
--year-end 2024 \
--limit 100 \
--output ml_proteins.json
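The JSON written by these search scripts can be post-filtered before metadata extraction. A minimal sketch, assuming each record carries `title`, `year`, and `doi` keys (the actual schema of `results.json` may differ):

```python
def filter_results(records, year_start=None, required=("title", "year")):
    """Keep records that have the essential fields and fall inside the year range."""
    kept = []
    for rec in records:
        if any(not rec.get(f) for f in required):
            continue  # drop records missing title or year
        if year_start is not None and int(rec["year"]) < year_start:
            continue
        kept.append(rec)
    return kept

# Inline sample standing in for the contents of results.json
sample = [
    {"title": "AlphaFold2", "year": 2021, "doi": "10.1038/s41586-021-03819-2"},
    {"title": "Older survey", "year": 2015, "doi": None},
    {"title": "Incomplete record", "year": None},
]
recent = filter_results(sample, year_start=2020)
print([r["title"] for r in recent])  # ['AlphaFold2']
```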
Advanced Search Strategies (see references/google_scholar_search.md):
"deep learning"author:LeCunintitle:"neural networks"machine learning -surveyBest Practices:
PubMed specializes in biomedical and life sciences literature (35+ million citations).
Basic Search:
# Search PubMed
python scripts/search_pubmed.py "Alzheimer's disease treatment" \
--limit 100 \
--output alzheimers.json
# Search with MeSH terms and filters
python scripts/search_pubmed.py \
--query '"Alzheimer Disease"[MeSH] AND "Drug Therapy"[MeSH]' \
--date-start 2020 \
--date-end 2024 \
--publication-types "Clinical Trial,Review" \
--output alzheimers_trials.json
Advanced PubMed Queries (see references/pubmed_search.md):
"Diabetes Mellitus"[MeSH]"cancer"[Title], "Smith J"[Author]AND, OR, NOT2020:2024[Publication Date]"Review"[Publication Type]Best Practices:
Goal: Convert paper identifiers (DOI, PMID, arXiv ID) to complete, accurate metadata.
For single DOIs, use the quick conversion tool:
# Convert single DOI
python scripts/doi_to_bibtex.py 10.1038/s41586-021-03819-2
# Convert multiple DOIs from a file
python scripts/doi_to_bibtex.py --input dois.txt --output references.bib
# Different output formats
python scripts/doi_to_bibtex.py 10.1038/nature12345 --format json
For DOIs, PMIDs, arXiv IDs, or URLs:
# Extract from DOI
python scripts/extract_metadata.py --doi 10.1038/s41586-021-03819-2
# Extract from PMID
python scripts/extract_metadata.py --pmid 34265844
# Extract from arXiv ID
python scripts/extract_metadata.py --arxiv 2103.14030
# Extract from URL
python scripts/extract_metadata.py --url "https://www.nature.com/articles/s41586-021-03819-2"
# Batch extraction from file (mixed identifiers)
python scripts/extract_metadata.py --input identifiers.txt --output citations.bib
Metadata Sources (see references/metadata_extraction.md):
CrossRef API: Primary source for DOIs
PubMed E-utilities: Biomedical literature
arXiv API: Preprints in physics, math, CS, q-bio
DataCite API: Research datasets, software, other resources
What Gets Extracted:
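As an illustration of the CrossRef source, a works response already carries most of these fields. A minimal sketch that maps a hand-written sample response (shaped like the public CrossRef schema) onto BibTeX fields:

```python
def crossref_to_bibtex_fields(msg):
    """Map a CrossRef works 'message' object onto common BibTeX fields."""
    authors = " and ".join(
        f"{a['family']}, {a['given']}" for a in msg.get("author", [])
    )
    return {
        "author": authors,
        "title": msg["title"][0],
        "journal": msg.get("container-title", [""])[0],
        "year": str(msg["issued"]["date-parts"][0][0]),
        "volume": msg.get("volume", ""),
        "pages": msg.get("page", ""),
        "doi": msg.get("DOI", ""),
    }

# Hand-written stand-in for a real API response
sample = {
    "author": [{"family": "Jumper", "given": "John"}],
    "title": ["Highly accurate protein structure prediction"],
    "container-title": ["Nature"],
    "issued": {"date-parts": [[2021]]},
    "volume": "596",
    "page": "583-589",
    "DOI": "10.1038/s41586-021-03819-2",
}
fields = crossref_to_bibtex_fields(sample)
print(fields["year"], fields["doi"])
```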
Goal: Detect and fill in any missing metadata fields using web search. This phase runs AFTER extraction and BEFORE formatting to ensure every BibTeX entry is complete.
Why This Is Critical: Metadata extraction from APIs (CrossRef, PubMed, arXiv) sometimes returns incomplete records — missing volume, pages, issue number, or DOI. These gaps must be filled before the bibliography is considered ready.
After extracting metadata, scan the BibTeX file for entries missing key fields:
Fields to check per entry type:
| Entry Type | Must Have | Should Have |
|---|---|---|
| @article | author, title, journal, year | volume, pages, number, doi |
| @inproceedings | author, title, booktitle, year | pages, doi |
| @book | author/editor, title, publisher, year | isbn, doi |
| @misc | author, title, year | doi or url |
Any @article entry missing volume, pages, or doi is considered incomplete and must be enriched.
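The completeness check can be sketched as a simple field scan over each entry; this regex-based helper is an illustration, not the skill's actual implementation:

```python
import re

REQUIRED = {"author", "title", "journal", "year"}   # must-have for @article
SHOULD = {"volume", "pages", "doi"}                 # should-have for @article

def missing_fields(entry_text):
    """Return the must-have and should-have fields absent from one @article entry."""
    present = set(re.findall(r"^\s*(\w+)\s*=", entry_text, re.MULTILINE))
    return sorted(REQUIRED - present), sorted(SHOULD - present)

entry = """@article{Smith2023,
  author  = {Smith, Jane},
  title   = {An Example},
  journal = {Nature},
  year    = {2023}
}"""
must, should = missing_fields(entry)
print(must, should)  # [] ['doi', 'pages', 'volume']
```

Here the entry satisfies the must-have set but needs enrichment for all three should-have fields.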
For each incomplete entry, search for the missing information:
Option A — Search by title and author (best for finding DOI):
python scripts/parallel_web.py search \
"FIRST_AUTHOR TITLE JOURNAL_NAME volume pages DOI" \
-o sources/search_YYYYMMDD_HHMMSS_citation_CITATIONKEY.md
Option B — Extract from DOI page (best when DOI is known but volume/pages missing):
python scripts/parallel_web.py extract \
"https://doi.org/10.XXXX/YYYY" \
--objective "extract complete citation metadata: volume, issue, pages, publication date" \
-o sources/extract_YYYYMMDD_HHMMSS_doi_CITATIONKEY.md
Option C — Search CrossRef API directly (programmatic, fast):
python scripts/parallel_web.py search \
"crossref DOI metadata FIRST_AUTHOR TITLE" \
-o sources/search_YYYYMMDD_HHMMSS_crossref_CITATIONKEY.md
Option D — Search Google Scholar (fallback for hard-to-find papers):
python scripts/parallel_web.py search \
"google scholar FIRST_AUTHOR TITLE YEAR complete citation" \
-o sources/search_YYYYMMDD_HHMMSS_scholar_CITATIONKEY.md
After finding the missing metadata:
Update the entry in references.bib, then log the change:

[HH:MM:SS] METADATA ENRICHED: [CitationKey] - added volume={X}, pages={Y--Z}, doi={10.XXX/YYY} ✅
If metadata genuinely cannot be found after web search (very old paper, obscure conference, etc.):
Add a note field to the BibTeX entry explaining the gap:
note = {Volume and pages not available — published online only}
[HH:MM:SS] METADATA INCOMPLETE: [CitationKey] - pages unavailable (online-only publication) ⚠️
| Missing Field | Best Search Strategy |
|---|---|
| DOI | Search "AUTHOR TITLE DOI" via parallel_web.py |
| Volume | Extract from DOI page or search "JOURNAL YEAR TITLE volume" |
| Pages | Extract from DOI page or search publisher website |
| Issue/Number | Extract from DOI page or CrossRef |
| Publisher | Search "JOURNAL publisher" or check journal website |
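Whichever strategy finds the values, they should be merged without clobbering data that is already present. A hypothetical `enrich` helper sketching that rule:

```python
def enrich(entry, found):
    """Fill in fields discovered by web search without overwriting existing data."""
    added = {}
    for field, value in found.items():
        if value and not entry.get(field):
            entry[field] = value
            added[field] = value
    return added

entry = {"author": "Smith, Jane", "title": "An Example", "year": "2023", "doi": ""}
added = enrich(entry, {"doi": "10.1234/example", "volume": "10", "year": "2022"})
print(sorted(added))  # ['doi', 'volume'] -- the existing year is kept
```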
Goal: Generate clean, properly formatted BibTeX entries.
See references/bibtex_formatting.md for complete guide.
Common Entry Types:
@article: Journal articles (most common)
@book: Books
@inproceedings: Conference papers
@incollection: Book chapters
@phdthesis: Dissertations
@misc: Preprints, software, datasets

Required Fields by Type:
@article{citationkey,
author = {Last1, First1 and Last2, First2},
title = {Article Title},
journal = {Journal Name},
year = {2024},
volume = {10},
number = {3},
pages = {123--145},
doi = {10.1234/example}
}
@inproceedings{citationkey,
author = {Last, First},
title = {Paper Title},
booktitle = {Conference Name},
year = {2024},
pages = {1--10}
}
@book{citationkey,
author = {Last, First},
title = {Book Title},
publisher = {Publisher Name},
year = {2024}
}
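The required-field sets above can be enforced mechanically. A small hypothetical helper (not part of the skill's scripts) that refuses to emit an entry missing a required field:

```python
REQUIRED = {
    "article": ["author", "title", "journal", "year"],
    "inproceedings": ["author", "title", "booktitle", "year"],
    "book": ["author", "title", "publisher", "year"],
}

def make_entry(entry_type, key, **fields):
    """Render a BibTeX entry, raising if a required field is absent."""
    missing = [f for f in REQUIRED[entry_type] if f not in fields]
    if missing:
        raise ValueError(f"{key}: missing required fields {missing}")
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"

bib = make_entry("article", "Smith2024",
                 author="Smith, Jane", title="An Example",
                 journal="Nature", year=2024, doi="10.1234/example")
print(bib.splitlines()[0])  # @article{Smith2024,
```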
Use the formatter to standardize BibTeX files:
# Format and clean BibTeX file
python scripts/format_bibtex.py references.bib \
--output formatted_references.bib
# Sort entries by citation key
python scripts/format_bibtex.py references.bib \
--sort key \
--output sorted_references.bib
# Sort by year (newest first)
python scripts/format_bibtex.py references.bib \
--sort year \
--descending \
--output sorted_references.bib
# Remove duplicates
python scripts/format_bibtex.py references.bib \
--deduplicate \
--output clean_references.bib
# Validate and report issues
python scripts/format_bibtex.py references.bib \
--validate \
--report validation_report.txt
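The --deduplicate pass can be approximated by fingerprinting each entry on its DOI (case-insensitive), falling back to the whitespace-normalized title. A sketch of that idea, not format_bibtex.py's actual algorithm:

```python
def dedupe(entries):
    """Drop entries whose DOI (case-insensitive) or normalized title was seen before."""
    seen, unique = set(), []
    for e in entries:
        fingerprint = (e.get("doi") or "").lower() or " ".join(e["title"].lower().split())
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        unique.append(e)
    return unique

entries = [
    {"title": "An Example", "doi": "10.1234/EXAMPLE"},
    {"title": "An  example", "doi": "10.1234/example"},  # same DOI, different case
    {"title": "Another Paper", "doi": ""},
]
print(len(dedupe(entries)))  # 2
```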
Formatting Operations:
Goal: Verify all citations are accurate and complete.
# Validate BibTeX file
python scripts/validate_citations.py references.bib
# Validate and fix common issues
python scripts/validate_citations.py references.bib \
--auto-fix \
--output validated_references.bib
# Generate detailed validation report
python scripts/validate_citations.py references.bib \
--report validation_report.json \
--verbose
Validation Checks (see references/citation_validation.md):
DOI Verification:
Required Fields:
Data Consistency:
Duplicate Detection:
Format Compliance:
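A syntax check is the cheapest of these DOI verifications and needs no network. A sketch using a common DOI pattern (prefix 10. plus a 4-9 digit registrant code); full validation would still resolve the DOI via doi.org:

```python
import re

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(doi):
    """Cheap syntax check; a real validator would also resolve the DOI via doi.org."""
    return bool(DOI_RE.match(doi.strip()))

print(looks_like_doi("10.1038/s41586-021-03819-2"))  # True
print(looks_like_doi("doi:10.1038/nature12345"))     # False (strip the prefix first)
```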
Validation Output:
{
"total_entries": 150,
"valid_entries": 145,
"errors": [
{
"citation_key": "Smith2023",
"error_type": "missing_field",
"field": "journal",
"severity": "high"
},
{
"citation_key": "Jones2022",
"error_type": "invalid_doi",
"doi": "10.1234/broken",
"severity": "high"
}
],
"warnings": [
{
"citation_key": "Brown2021",
"warning_type": "possible_duplicate",
"duplicate_of": "Brown2021a",
"severity": "medium"
}
]
}
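A report in this shape is easy to triage programmatically. A sketch that tallies issues by severity, using the field names from the example above:

```python
import json

# Abbreviated stand-in for a real validation_report.json
report_json = """{
  "errors":   [{"citation_key": "Smith2023", "severity": "high"},
               {"citation_key": "Jones2022", "severity": "high"}],
  "warnings": [{"citation_key": "Brown2021", "severity": "medium"}]
}"""

report = json.loads(report_json)
by_severity = {}
for issue in report["errors"] + report["warnings"]:
    by_severity[issue["severity"]] = by_severity.get(issue["severity"], 0) + 1
print(by_severity)  # {'high': 2, 'medium': 1}
```

High-severity issues (missing fields, broken DOIs) should be fixed before the bibliography is used.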
Complete workflow for creating a bibliography:
# 1. Search for papers on your topic
python scripts/search_pubmed.py \
'"CRISPR-Cas Systems"[MeSH] AND "Gene Editing"[MeSH]' \
--date-start 2020 \
--limit 200 \
--output crispr_papers.json
# 2. Extract DOIs from search results and convert to BibTeX
python scripts/extract_metadata.py \
--input crispr_papers.json \
--output crispr_refs.bib
# 3. Add specific papers by DOI
python scripts/doi_to_bibtex.py 10.1038/nature12345 >> crispr_refs.bib
python scripts/doi_to_bibtex.py 10.1126/science.abcd1234 >> crispr_refs.bib
# 4. Format and clean the BibTeX file
python scripts/format_bibtex.py crispr_refs.bib \
--deduplicate \
--sort year \
--descending \
--output references.bib
# 5. Validate all citations
python scripts/validate_citations.py references.bib \
--auto-fix \
--report validation.json \
--output final_references.bib
# 6. Review validation report and fix any remaining issues
cat validation.json
# 7. Use in your LaTeX document
# \bibliography{final_references}
This skill complements the literature-review skill:
Literature Review Skill → Systematic search and synthesis
Citation Management Skill → Technical citation handling
Combined Workflow:
1. Use literature-review for comprehensive multi-database search
2. Use citation-management to extract and validate all citations
3. Use literature-review to synthesize findings thematically
4. Use citation-management to verify final bibliography accuracy

# After completing literature review
# Verify all citations in the review document
python scripts/validate_citations.py my_review_references.bib --report review_validation.json
# Format for specific citation style if needed
python scripts/format_bibtex.py my_review_references.bib \
--style nature \
--output formatted_refs.bib
Finding Seminal and High-Impact Papers (CRITICAL):
Always prioritize papers based on citation count, venue quality, and author reputation:
Citation Count Thresholds:
| Paper Age | Citations | Classification |
|---|---|---|
| 0-3 years | 20+ | Noteworthy |
| 0-3 years | 100+ | Highly Influential |
| 3-7 years | 100+ | Significant |
| 3-7 years | 500+ | Landmark Paper |
| 7+ years | 500+ | Seminal Work |
| 7+ years | 1000+ | Foundational |
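The thresholds in this table translate directly into a classification helper for triaging search results; a hypothetical sketch, not one of the skill's scripts:

```python
def classify(age_years, citations):
    """Classify a paper using the age/citation thresholds from the table above."""
    if age_years >= 7:
        if citations >= 1000: return "Foundational"
        if citations >= 500:  return "Seminal Work"
    elif age_years >= 3:
        if citations >= 500:  return "Landmark Paper"
        if citations >= 100:  return "Significant"
    else:
        if citations >= 100:  return "Highly Influential"
        if citations >= 20:   return "Noteworthy"
    return "Unclassified"

print(classify(10, 1500))  # Foundational
print(classify(2, 50))     # Noteworthy
```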
Venue Quality Tiers:
Author Reputation Indicators:
Search Strategies for High-Impact Papers:
source:Nature or source:Science    # Restrict to a high-impact venue
author:LastName                    # Search by a known leading author

Advanced Operators (full list in references/google_scholar_search.md):
"exact phrase" # Exact phrase matching