Use when converting PubMed Central article XML into BioC collection XML for downstream text-mining or annotation pipelines.
EDirect shell pipeline that converts PMC <article> XML into BioC-style collection / document / passage XML. It lifts front-matter metadata into the first passage, then emits abstract, title, and body passages for downstream text-mining workflows.
pmc2bioc
/home/vimalinx/miniforge3/envs/bio/bin/pmc2bioc
Add /home/vimalinx/miniforge3/envs/bio/bin to PATH so xtract, transmute, and related EDirect helpers are available.
# 1) Convert a live PMC article fetch into BioC XML
export PATH=/home/vimalinx/miniforge3/envs/bio/bin:$PATH
efetch -db pmc -id 6260607 -format xml | pmc2bioc > article.bioc.xml
# 2) Convert XML streamed from a PMC OA tarball
export PATH=/home/vimalinx/miniforge3/envs/bio/bin:$PATH
tar -xOzf oa_bundle.tar.gz | pmc2bioc > batch.bioc.xml
# 3) Inspect the first BioC passages before batch processing
export PATH=/home/vimalinx/miniforge3/envs/bio/bin:$PATH
efetch -db pmc -id 6260607 -format xml | pmc2bioc | sed -n '1,40p'
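Beyond eyeballing the first 40 lines, a quick passage count can confirm that a conversion produced more than an empty shell. The following is a minimal sketch, not part of pmc2bioc itself: count_passages is a hypothetical helper, and it assumes the passage elements are written as plain <passage> opening tags with no attributes.

```shell
#!/bin/sh
# Hypothetical helper: count <passage> opening tags in BioC XML.
# Reads the named files, or stdin when no file is given.
# Assumption: passage elements appear as plain "<passage>" with no attributes.
count_passages() {
    grep -o '<passage>' "$@" | wc -l | tr -d '[:space:]'
}

# Usage sketch (needs network access and EDirect on PATH, so not run here):
#   efetch -db pmc -id 6260607 -format xml | pmc2bioc | count_passages
```

A count of 0 suggests the input was not PMC <article> XML or that the pipeline failed upstream.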
Add the EDirect bin directory to PATH so sibling utilities resolve.
Inspect the first passage blocks to confirm the BioC shape your downstream tool expects.
The converter is marked WORK IN PROGRESS, so treat the emitted BioC schema as practical but not fully stable.
pmc2bioc -help still runs the conversion pipeline and only errors because no XML input was supplied.
Without the bin directory on PATH, the live failure is xtract: command not found plus transmute: command not found.
With empty input, the error is No data supplied to xtract from stdin or file.
The tool takes PMC <article> XML and emits BioC XML; it is not a generic PMC metadata lister or PubMed citation converter.
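The two failure modes noted above (helpers missing from PATH, and no XML on stdin) can be caught before the pipeline runs. The sketch below is one possible preflight check, not something pmc2bioc ships: missing_tools and has_input are hypothetical helper names, and article.xml is a placeholder for a locally saved PMC article.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that does not resolve on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical helper: succeed only if the file exists and is non-empty.
has_input() {
    [ -s "$1" ]
}

# Usage sketch (article.xml is a placeholder file name, not run here):
#   export PATH=/home/vimalinx/miniforge3/envs/bio/bin:$PATH
#   missing=$(missing_tools pmc2bioc xtract transmute)
#   [ -z "$missing" ] || { echo "missing from PATH: $missing" >&2; exit 1; }
#   has_input article.xml && pmc2bioc < article.xml > article.bioc.xml
```

Checking up front turns the command not found and No data supplied errors into a single readable message.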