Forward and backward citation chaining techniques for literature search
Master forward and backward citation chaining to systematically discover relevant literature by following the threads of scholarly communication.
Citation chaining (also called citation tracking, pearl growing, or snowball searching) exploits the connections between papers through their references and citations. Starting from one or more "seed" papers, you trace connections in two directions:
This approach is especially powerful when keyword searches fail (e.g., when terminology varies across subfields or when concepts predate standardized vocabulary).
Select 3-5 highly relevant papers that are central to your research question. Good seed papers are:
Examine the reference list of each seed paper and identify which cited works are relevant.
import requests
HEADERS = {"User-Agent": "ResearchPlugins/1.0 (https://wentor.ai)"}
def get_references(work_id):
"""Get all references of a paper via OpenAlex."""
url = f"https://api.openalex.org/works/{work_id}"
response = requests.get(url, headers=HEADERS)
paper = response.json()
ref_ids = paper.get("referenced_works", [])
references = []
for ref_id in ref_ids:
ref = requests.get(f"https://api.openalex.org/works/{ref_id.split('/')[-1]}", headers=HEADERS).json()
if ref.get("title"):
references.append(ref)
return references
# Get references of a seed paper
seed_id = "W2741809807"
references = get_references(seed_id)
# Sort by citation count to find the most influential foundations
references.sort(key=lambda p: p.get("cited_by_count", 0), reverse=True)
for ref in references[:15]:
print(f"[{ref.get('publication_year', '?')}] {ref['title']} ({ref.get('cited_by_count', 0)} citations)")
Find all papers that have cited your seed paper.
def get_citations(work_id, limit=200):
"""Get papers citing a given paper via OpenAlex."""
all_citations = []
page = 1
while len(all_citations) < limit:
response = requests.get(
"https://api.openalex.org/works",
params={
"filter": f"cites:{work_id}",
"sort": "cited_by_count:desc",
"per_page": min(200, limit - len(all_citations)),
"page": page
},
headers=HEADERS
)
results = response.json().get("results", [])
if not results:
break
all_citations.extend(results)
page += 1
return all_citations
citations = get_citations(seed_id)
# Filter for recent, well-cited papers
recent_impactful = [c for c in citations if c.get("publication_year", 0) >= 2022 and c.get("cited_by_count", 0) >= 5]
recent_impactful.sort(key=lambda p: p.get("cited_by_count", 0), reverse=True)
Two advanced techniques extend basic citation chaining:
| Technique | Definition | What It Reveals |
|---|---|---|
| Co-citation | Two papers are frequently cited together by the same set of subsequent papers | Conceptual proximity: these works form a shared intellectual foundation |
| Bibliographic coupling | Two papers share many of the same references | Methodological or topical similarity at the time of writing |
def find_co_cited_papers(paper_ids, min_co_citation_count=3):
"""Find papers frequently co-cited with the given papers."""
from collections import Counter
reference_counts = Counter()
for pid in paper_ids:
refs = get_references(pid)
for ref in refs:
ref_id = ref.get("paperId")
if ref_id and ref_id not in paper_ids:
reference_counts[ref_id] += 1
# Papers cited by multiple seeds are co-cited candidates
co_cited = [(pid, count) for pid, count in reference_counts.items()
if count >= min_co_citation_count]
co_cited.sort(key=lambda x: x[1], reverse=True)
return co_cited
Repeat the process with the most relevant papers discovered in each round:
| Tool | Method | Cost |
|---|---|---|
| Google Scholar "Cited by" | Forward chaining | Free |
| Web of Science "Cited References" / "Times Cited" | Both directions | Subscription |
| Scopus "References" / "Cited by" | Both directions | Subscription |
| OpenAlex API | Programmatic, both directions | Free |
| Connected Papers (connectedpapers.com) | Visual co-citation graph | Free (limited) |
| Litmaps (litmaps.com) | Visual citation network | Free tier |
| CoCites (cocites.com) | Co-citation analysis | Free |
| Citation Gecko | Seed-based discovery | Free |