Transform an NIH Biographical Sketch PDF into SciENcv-compatible XML (sciencv.1.3 schema). Use this skill whenever the user wants to convert an NIH biosketch PDF to XML, mentions SciENcv biosketch XML in an NIH context, asks about converting an NIH biographical sketch to XML, or has an NIH biosketch PDF they need in XML format. Also trigger when the user mentions "NIH biosketch" or "NIH biographical sketch" together with "XML" or "SciENcv", or when they have a biosketch with sections A through D or a Common Form biosketch with a Products section.
Convert an NIH Biographical Sketch PDF into the XML format defined by the SciENcv
1.3 schema (sciencv.1.3.xsd) for import into the NIH SciENcv system.
The user has an NIH Biographical Sketch PDF (generated from SciENcv or filled out manually) and needs an XML version. The PDF may be in either the new Common Form format (effective 01/25/2026) or the legacy A-D section format.
The SciENcv system uses the sciencv.1.3.xsd schema for CV/biosketch XML. This
is the same schema used for DOE biosketches, but the NIH biosketch PDF has a
different section layout.
The reference file references/xml-structure.md in this skill documents the
complete element hierarchy, so you typically won't need to fetch the XSD.
Important: The XML output is identical regardless of whether the input PDF uses the new or old format — the same sciencv.1.3 schema applies in both cases. The difference is only in where data appears in the PDF.
Before parsing, determine which PDF format you're working with:
New Common Form (2026+) indicators:
Legacy format (pre-2026) indicators:
Both formats produce identical XML output — the mapping instructions below cover both layouts.
The new format consists of two combined parts in a single PDF:
Part 1 (Biosketch Common Form):
Part 2 (NIH Biographical Sketch Supplement):
5. Personal Statement: Narrative text only (3,500 character limit), no citations
6. Honors: Up to 15 entries
7. Contributions to Science: Up to 5 narrative descriptions (2,000 characters each), no embedded citations
8. Research Support: Ongoing and completed (optional)
Certification page: Final page with certification statement
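The Part 2 limits listed above can double as a sanity check on extracted text, since an over-long field often signals that the PDF parse merged two sections. The following is a minimal sketch, not part of the skill itself: the function and constant names are hypothetical, and it assumes the limits count every character, including whitespace.

```python
# Limits quoted from the Part 2 description; all names here are hypothetical.
PERSONAL_STATEMENT_MAX_CHARS = 3500
CONTRIBUTION_MAX_CHARS = 2000
MAX_HONORS = 15
MAX_CONTRIBUTIONS = 5


def check_supplement_limits(personal_statement, honors, contributions):
    """Return a list of warning strings; an empty list means all limits hold.

    Assumption: limits count every character, including whitespace.
    """
    warnings = []
    if len(personal_statement) > PERSONAL_STATEMENT_MAX_CHARS:
        warnings.append(
            f"Personal Statement has {len(personal_statement)} characters "
            f"(limit {PERSONAL_STATEMENT_MAX_CHARS})"
        )
    if len(honors) > MAX_HONORS:
        warnings.append(f"{len(honors)} honors found (limit {MAX_HONORS})")
    if len(contributions) > MAX_CONTRIBUTIONS:
        warnings.append(
            f"{len(contributions)} contributions found (limit {MAX_CONTRIBUTIONS})"
        )
    # Check each contribution narrative individually.
    for number, text in enumerate(contributions, start=1):
        if len(text) > CONTRIBUTION_MAX_CHARS:
            warnings.append(
                f"Contribution {number} has {len(text)} characters "
                f"(limit {CONTRIBUTION_MAX_CHARS})"
            )
    return warnings
```

A non-empty result is a prompt to re-read the affected pages, not necessarily an error in the biosketch.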
Look in the current working directory for a PDF file. If there are multiple PDFs, ask the user which one is the NIH biosketch.
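As an illustration of this lookup, here is a minimal sketch using only the standard library. The function names are hypothetical; the sketch only lists candidates and leaves the choice to the user when more than one PDF is present.

```python
from pathlib import Path


def find_pdf_candidates(directory="."):
    """Return the PDF files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def pick_biosketch_pdf(directory="."):
    """Return (chosen, candidates).

    `chosen` is the single PDF when exactly one exists, otherwise None,
    which signals that the user must be asked (or that no PDF was found).
    """
    candidates = find_pdf_candidates(directory)
    chosen = candidates[0] if len(candidates) == 1 else None
    return chosen, candidates


if __name__ == "__main__":
    chosen, candidates = pick_biosketch_pdf()
    if chosen is not None:
        print(f"Using {chosen}")
    elif candidates:
        print("Multiple PDFs found; ask the user which is the NIH biosketch:")
        for path in candidates:
            print(f"  {path.name}")
    else:
        print("No PDF found in the current working directory.")
```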
NIH biosketches are limited to 5 pages (plus a certification page in the new format). Read all pages.
Use the indicators above to determine whether the PDF uses the new or the legacy format. This affects where you look for citations and products, but not the XML output.
Read references/xml-structure.md for the complete XML element hierarchy.
Here are the key mappings:
Identification maps to <identification>:
- <name current="yes"> with <givennames> and <surname>
- givennames/surname, NOT firstname/lastname
- current attribute is required and should be "yes"
- <id idtype="orcid"> with the ORCID as text
- <account accounttype="era"> with the username as text
Education / Professional Preparation maps to <education>:
- <degree> element
- degreetype attribute is required (e.g., "PhD", "BS", "MS", "BA", "MD")
- Date complex type: <year>YYYY</year> (with optional <month> and <day> children), NOT date strings
- <major>
Personal Statement maps to <statements>:
- <statement statementtype="personalstatement"> with the text in <annotation>
- <citation> elements within the same <statement>
Positions / Appointments maps to <employment>:
- <position current="yes/no"> element
- current attribute is required
- Date type with <year> children
Honors maps to <distinctions>:
- <distinction> element with <description>, optional <organization>, and optional <date>
Products (new format) / Citations within contributions (legacy format) map to <contributions>:
- <citations group="..."> block:
  - <citation type="journal"> element
- <citations group="Contribution N"> blocks with <annotation> but no <citation> children (since citations are in Products)
- <citations group="Contribution N"> block with <annotation> for the narrative and <citation> elements for the embedded citations
Research Support maps to <funding>:
- <award> element
- <funding/> as a self-closing tag
Assemble the XML following the schema's required element order:
<profile xmlns="http://www.ncbi.nlm.nih.gov/sciencv">
  <identification>...</identification>
  <education>...</education>
  <employment>...</employment>
  <funding>...</funding>
  <distinctions>...</distinctions>
  <contributions>...</contributions>
  <statements>...</statements>
</profile>
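One way to guarantee the element order shown above is to create the seven sections up front and fill them in afterwards. The sketch below uses Python's standard xml.etree.ElementTree; the helper names (qname, build_profile_skeleton, add_name) and the sample person are hypothetical, and it covers only the skeleton plus the <name> mapping, not the full schema.

```python
import xml.etree.ElementTree as ET

NS = "http://www.ncbi.nlm.nih.gov/sciencv"

# Section order copied from the skeleton above.
SECTION_ORDER = (
    "identification", "education", "employment", "funding",
    "distinctions", "contributions", "statements",
)


def qname(tag):
    """Qualify a tag with the SciENcv namespace ({uri}tag form)."""
    return f"{{{NS}}}{tag}"


def build_profile_skeleton():
    """Create <profile> with all seven sections, empty, in schema order."""
    ET.register_namespace("", NS)  # serialize as a default xmlns, no prefix
    profile = ET.Element(qname("profile"))
    for section in SECTION_ORDER:
        ET.SubElement(profile, qname(section))
    return profile


def add_name(profile, given, surname):
    """Add <name current="yes"> with <givennames> and <surname>."""
    identification = profile.find(qname("identification"))
    name = ET.SubElement(identification, qname("name"), {"current": "yes"})
    ET.SubElement(name, qname("givennames")).text = given
    ET.SubElement(name, qname("surname")).text = surname
    return name


profile = build_profile_skeleton()
add_name(profile, "Jane", "Smith")  # placeholder person, not real data
xml_text = ET.tostring(profile, encoding="unicode")
```

Sections left empty serialize as self-closing tags (ElementTree writes them as `<funding />`), which matches the self-closing convention described for missing research support.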
Save the XML file with the same base name as the input PDF but with a .xml
extension. Tell the user:
Publication parsing applies to both formats. In the new format, citations appear in the Products section; in the legacy format, they appear under contributions and the personal statement.
Common citation formats:
Smith J, Jones A, Lee B. Title of the paper. Journal Name. 2024;15(3):123-145.
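A citation in this shape can be split into fields with a regular expression. The following is a rough sketch for this one pattern only; the function name and the returned dictionary keys are hypothetical and are not schema element names. It assumes ". " separates authors, title, and journal, so a title that itself contains ". " will be split in the wrong place and should be checked by hand.

```python
import re

# Matches: Authors. Title. Journal. YYYY;Vol(Issue):Pages.
CITATION_RE = re.compile(
    r"""
    ^(?P<authors>.+?)\.\s+      # author list, up to the first period + space
    (?P<title>.+?)\.\s+         # article title
    (?P<journal>.+?)\.\s+       # journal name
    (?P<year>\d{4});            # publication year
    (?P<volume>\d+)             # volume
    (?:\((?P<issue>[^)]+)\))?   # optional issue in parentheses
    :(?P<pages>[\w-]+)\.?$      # page range, optional trailing period
    """,
    re.VERBOSE,
)


def parse_citation(text):
    """Return a dict of citation fields, or None if the text does not match."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Split "Smith J, Jones A" into individual author strings.
    fields["authors"] = [a.strip() for a in fields["authors"].split(",")]
    return fields


sample = "Smith J, Jones A, Lee B. Title of the paper. Journal Name. 2024;15(3):123-145."
parsed = parse_citation(sample)
```

Returning None for unmatched text keeps unparsed citations visible, so they can be reviewed manually instead of being written to the XML with guessed fields.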