Use when parsing, extracting, or converting XML data from NCBI Entrez or other bioinformatics sources into tab-delimited tables. Use for selecting specific elements, filtering records, and restructuring hierarchical XML into flat formats for downstream analysis.
xtract
/home/vimalinx/miniforge3/envs/bio/bin/xtract
See references/help.md for the complete argument list and examples.
Input: efetch, esummary, or elink XML output.
# 1) Turn document summaries into a simple two-column table
esearch -db assembly -query 'GCF_000001405.40[accn]' | \
efetch -format docsum | \
xtract -pattern DocumentSummary -element Id AssemblyAccession
# 2) Extract PubMed summary IDs and titles
esearch -db pubmed -query 'ebola virus[Title/Abstract]' | \
efetch -format docsum | \
xtract -pattern DocumentSummary -element Id Title
# 3) Report the highest Score among linked records
elink -db pubmed -id 20210808 -cmd score | \
xtract -pattern LinkSet -max Link/Score
Provide XML on stdin or via -input; xtract is not useful without structured input.
Pick the element that marks one output row with -pattern.
Add -element fields incrementally before reaching for -group, -block, or -if.

xtract requires XML input from stdin or -input; otherwise it errors immediately.
-pattern defines row boundaries; choosing the wrong pattern is the fastest way to get nonsense tables.
Use -strict when embedded HTML or MathML is polluting the text you want to extract.
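
The incremental approach and the row-boundary pitfall can be tried offline against a small local file. The following is a minimal sketch, assuming xtract is on PATH; the two-record XML fixture (IDs 101 and 102 and their titles) is made up for illustration and does not come from a real Entrez query.

```shell
# Hypothetical fixture: two DocumentSummary records with invented IDs and titles.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<DocumentSummarySet>
  <DocumentSummary>
    <Id>101</Id>
    <Title>First example title</Title>
  </DocumentSummary>
  <DocumentSummary>
    <Id>102</Id>
    <Title>Second example title</Title>
  </DocumentSummary>
</DocumentSummarySet>
EOF

if command -v xtract >/dev/null 2>&1; then
  # Step 1: confirm the row boundary with a single field.
  xtract -input "$xml" -pattern DocumentSummary -element Id

  # Step 2: add the next column once the rows look right.
  rows=$(xtract -input "$xml" -pattern DocumentSummary -element Id Title)
  printf '%s\n' "$rows"

  # The same extraction with the XML supplied on stdin instead of -input.
  xtract -pattern DocumentSummary -element Id Title < "$xml"

  # Pitfall: using the enclosing set as the pattern makes the whole set one row.
  xtract -input "$xml" -pattern DocumentSummarySet -element Id Title
else
  echo "xtract not found on PATH; install EDirect to run this sketch" >&2
fi
```

With -pattern DocumentSummary, each record should come out as its own tab-delimited row. With -pattern DocumentSummarySet, the values from both records should collapse into a single row, which is the kind of nonsense table the note above warns about.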