Extract bibliography/references from PDF files using GROBID and return a Collection of Notes (one per reference).
Extract bibliography/references from PDF files using GROBID. Returns a Collection of Notes, where each Note contains structured metadata for one reference (compatible with format-citation).
- path: PDF file path (absolute) or Note ID containing PDF URL/metadata (required)
- grobid_url: Optional GROBID server URL (from world_config)

Success (status: "success"):
- resource_id: Collection ID containing Notes (one Note per reference)
- data: Structured reference metadata (title, authors, year, venue, doi, url)
- metadata: Source PDF, reference index, raw citation text

- References are read from <bibl> elements in the GROBID output
- Output is compatible with the format-citation tool
- When path is a Note ID (for example, from semantic-scholar), the tool looks up pdf_url from the Note's tool_metadata automatically; no manual metadata extraction is needed

This is the right tool for bibliography/reference extraction from papers. It uses GROBID structural parsing (deterministic, fast) rather than LLM extraction (slow, lossy). Prefer this over extract or map(extract) with citation-related instructions.
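To illustrate why structural parsing is deterministic, here is a minimal sketch of turning TEI-style bibliography XML into the kind of structured record described above. It is not this tool's implementation: the sample XML is hand-written (not real GROBID output), the function name parse_references is hypothetical, and the use of biblStruct elements in the TEI namespace is an assumption about the layout GROBID typically emits.

```python
import xml.etree.ElementTree as ET

# TEI namespace; assumed to be the one used by the GROBID output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written fragment for illustration only; not real GROBID output.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><back><div><listBibl>
    <biblStruct xml:id="b0">
      <analytic>
        <title level="a" type="main">An Example Paper</title>
        <author><persName><forename>Ada</forename><surname>Example</surname></persName></author>
      </analytic>
      <monogr>
        <title level="j">Journal of Examples</title>
        <imprint><date type="published" when="2020"/></imprint>
      </monogr>
    </biblStruct>
  </listBibl></div></back></text>
</TEI>"""


def parse_references(tei_xml):
    """Return one dict per bibliography entry (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    refs = []
    for bibl in root.iterfind(".//tei:biblStruct", TEI_NS):
        title = bibl.find("./tei:analytic/tei:title", TEI_NS)
        venue = bibl.find("./tei:monogr/tei:title", TEI_NS)
        date = bibl.find(".//tei:imprint/tei:date", TEI_NS)

        # Join forename and surname, skipping whichever part is missing.
        authors = []
        for pers in bibl.iterfind("./tei:analytic/tei:author/tei:persName", TEI_NS):
            parts = [
                pers.findtext("tei:forename", default="", namespaces=TEI_NS),
                pers.findtext("tei:surname", default="", namespaces=TEI_NS),
            ]
            authors.append(" ".join(p for p in parts if p))

        # The "when" attribute may hold a full date; keep only the year.
        year = None
        if date is not None and date.get("when"):
            year = int(date.get("when")[:4])

        refs.append({
            "title": title.text if title is not None else None,
            "authors": authors,
            "year": year,
            "venue": venue.text if venue is not None else None,
        })
    return refs


refs = parse_references(SAMPLE_TEI)
print(refs)
```

The same input always yields the same records, which is the property that makes this approach preferable to LLM extraction for reference lists.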
Common workflow with semantic-scholar:
1. semantic-scholar → $paper Collection
2. get_items("$paper")[0] → Note ID (contains pdf_url in tool_metadata)
3. extract-references(path=note_id) → $refs Collection of structured citation Notes
4. Each Note in $refs holds JSON content: {"title": "...", "authors": [...], "year": 2020, "venue": "..."}
5. Read it with get_text(note_id) + json.loads() in Python

Do NOT:
- Pass a $binding string as path; pass the actual Note ID from get_items()
- Use pluck(field="text") on result Notes; their content is JSON, not plain text
- Use extract or map(extract) for reference lists; this tool is faster, structured, and deterministic

{"type":"extract-references","path":"/path/to/paper.pdf","out":"$refs"}
{"type":"extract-references","path":"Note_1234","out":"$refs"}
{"type":"format-citation","target":"$refs","format":"bibtex","out":"$bibtex"}
Full semantic-scholar pipeline:
{"type":"semantic-scholar","query":"attention is all you need","limit":1,"out":"$paper"}
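# get_items returns the Note IDs in the Collection; the first is the paper's Note
items = get_items("$paper")
# Pass the Note ID itself as path, not the "$paper" binding string
r = tool("extract-references", path=items[0], out="$refs")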
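Because each Note in $refs holds JSON rather than plain text, its content has to be decoded before use. The sketch below shows one way to do that in Python. The note_text value stands in for what get_text(note_id) would return; its values are made up, only the field names come from the description above, and summarize_reference is a hypothetical helper.

```python
import json

# Illustrative stand-in for the string returned by get_text(note_id).
# Values are invented; field names follow the documented Note content.
note_text = (
    '{"title": "An Example Paper", '
    '"authors": ["A. Example", "B. Sample"], '
    '"year": 2020, "venue": "Journal of Examples"}'
)


def summarize_reference(raw):
    """Decode one reference Note and build a short one-line summary."""
    ref = json.loads(raw)
    # Fall back to placeholders when a field is missing or empty.
    authors = ", ".join(ref.get("authors") or []) or "Unknown authors"
    year = ref.get("year") or "n.d."
    title = ref.get("title") or "Untitled"
    venue = ref.get("venue")

    line = f"{authors} ({year}). {title}."
    if venue:
        line += f" {venue}."
    return line


summary = summarize_reference(note_text)
print(summary)
```

For properly formatted output such as BibTeX, the format-citation tool remains the better choice; this sketch only shows how to reach the fields.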
Edit PDFs with natural-language instructions using the nano-pdf CLI.