Process genome assembly datasets for VEuPathDB resources
This skill guides processing of genome assembly datasets for VEuPathDB resources.
This workflow requires the following repositories in veupathdb-repos/: ApiCommonPresenters and EbrcModelCommon.
First, run the repository status check to verify repositories are present:
Note: this script is located in the skill directory
bash scripts/check-repos.sh ApiCommonPresenters EbrcModelCommon
If repositories are missing, the script will provide clone instructions.
Branch Confirmation: After verifying repositories exist, check their current branches and status using git -C <path>, then confirm with the user before proceeding. Users typically create dataset-specific branches (see curator branching guidelines).
Example:
git -C veupathdb-repos/ApiCommonPresenters branch --show-current
git -C veupathdb-repos/ApiCommonPresenters status -sb
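The same branch and status check can be scripted for several repositories at once. The following is a minimal Node.js sketch, not part of the skill's scripts: the helper names `gitArgs` and `repoState` are hypothetical, and it assumes `git` is on the PATH and recent enough to support `branch --show-current`.

```javascript
const { execFileSync } = require('child_process');

// Build the argument list for `git -C <repo> ...`, so no `cd` is needed.
function gitArgs(repo, ...args) {
  return ['-C', repo, ...args];
}

// Hypothetical helper: return the current branch and short status of a repo.
function repoState(repo) {
  const run = (...args) =>
    execFileSync('git', gitArgs(repo, ...args), { encoding: 'utf8' }).trim();
  return {
    branch: run('branch', '--show-current'),
    status: run('status', '-sb'),
  };
}

// Usage, run from the curation workspace directory:
// for (const name of ['ApiCommonPresenters', 'EbrcModelCommon']) {
//   console.log(name, repoState(`veupathdb-repos/${name}`));
// }
```

The output is meant to be shown to the user for confirmation, not to decide on its own whether to proceed.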
IMPORTANT: All commands in this workflow must be run from your curation workspace directory (the directory that contains veupathdb-repos/ as a subdirectory).
For Claude Code:
- Do not use cd commands to change into veupathdb-repos/ subdirectories.
- Use git -C <path> for git operations in subdirectories.
- For example, run git -C veupathdb-repos/ApiCommonPresenters status instead of cd veupathdb-repos/ApiCommonPresenters && git status.

The workflow will create a tmp/ subdirectory in the curation workspace directory for intermediate files.
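The working-directory requirement can be checked up front. This is a minimal Node.js sketch under the layout described above (a workspace that contains veupathdb-repos/); the function name `prepareWorkspace` is hypothetical and is not one of the skill's scripts.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: confirm that `workspace` contains veupathdb-repos/
// and make sure tmp/ exists for intermediate files. Returns the tmp/ path.
function prepareWorkspace(workspace = process.cwd()) {
  const repos = path.join(workspace, 'veupathdb-repos');
  if (!fs.existsSync(repos) || !fs.statSync(repos).isDirectory()) {
    throw new Error(`${workspace} does not contain veupathdb-repos/`);
  }
  const tmp = path.join(workspace, 'tmp');
  fs.mkdirSync(tmp, { recursive: true }); // no error if tmp/ already exists
  return tmp;
}
```

Calling `prepareWorkspace()` with no argument checks the current directory, which is how the commands in this workflow are expected to be run.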
Gather the following before starting:
- GenBank assembly accession (e.g. GCA_000988875.2, including version)

Fetch assembly metadata from NCBI using the GenBank accession.
Command:
curl -X GET "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/accession/<ASSEMBLY_ACCESSION>/dataset_report" \
-H "Accept: application/json" > tmp/<ASSEMBLY_ACCESSION>_dataset_report.json
Detailed instructions: Step 1 - Fetch NCBI Metadata
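Because the request URL embeds the accession directly, it can help to validate the accession before fetching. The sketch below is illustrative only: the helper name `datasetReportUrl` is hypothetical, and the accepted pattern (GCA_, nine digits, a dot, and a version number) is an assumption based on the example accession GCA_000988875.2.

```javascript
// Assumed pattern: GCA_ + nine digits + "." + version, e.g. GCA_000988875.2.
// RefSeq (GCF_) accessions are rejected on purpose, since this workflow
// asks for the GenBank accession.
const GENBANK_ASSEMBLY = /^GCA_\d{9}\.\d+$/;

// Hypothetical helper: build the dataset_report URL used by the curl command.
function datasetReportUrl(accession) {
  if (!GENBANK_ASSEMBLY.test(accession)) {
    throw new Error(
      `Expected a versioned GenBank accession, got "${accession}"`
    );
  }
  return (
    'https://api.ncbi.nlm.nih.gov/datasets/v2/genome/accession/' +
    `${accession}/dataset_report`
  );
}
```

An accession without its version suffix (for example GCA_000988875) fails the check, which catches the most common input mistake early.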
Extract the BioProject accession from the assembly report and fetch additional details.
Command:
node scripts/fetch-bioproject.js <BIOPROJECT_ACCESSION>
This retrieves the BioProject title and description, saved to tmp/<BIOPROJECT>_bioproject.json.
Detailed instructions: Step 2 - Fetch BioProject
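The extraction of the BioProject accession from the saved assembly report can be sketched as follows. This is a hedged example, not the skill's own code: the helper names are hypothetical, and the field path `reports[0].assembly_info.bioproject_accession` is an assumption about the dataset report layout that should be verified against a real report in tmp/.

```javascript
const fs = require('fs');

// Assumed field path: reports[0].assembly_info.bioproject_accession.
// Verify this against an actual dataset report before relying on it.
function bioprojectFromReport(report) {
  const first =
    report && Array.isArray(report.reports) ? report.reports[0] : undefined;
  const accession =
    first && first.assembly_info && first.assembly_info.bioproject_accession;
  if (!accession) {
    throw new Error('No BioProject accession found in dataset report');
  }
  return accession;
}

// Hypothetical wrapper: read the JSON file written by the curl command.
function bioprojectFromFile(file) {
  return bioprojectFromReport(JSON.parse(fs.readFileSync(file, 'utf8')));
}
```

The returned value is what would be passed as <BIOPROJECT_ACCESSION> to scripts/fetch-bioproject.js.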
Find and fetch publications for the genome assembly.
Command:
node scripts/fetch-pubmed.js <ASSEMBLY_ACCESSION>
Results saved to tmp/<ASSEMBLY_ACCESSION>_pubmed.json.
Detailed instructions: Step 3 - Fetch PubMed
Identify and curate contact entries for the genome submission.
Contact identification priority:
Actions:
Contacts file: veupathdb-repos/EbrcModelCommon/Model/lib/xml/datasetPresenters/contacts/allContacts.xml
Detailed instructions: Step 4 - Curate Contacts
Generate the datasetPresenter XML and insert it into the appropriate presenter file.
Command:
node scripts/generate-presenter-xml.js <ASSEMBLY_ACCESSION> <PROJECT> <PRIMARY_CONTACT_ID> [ADDITIONAL_CONTACT_IDS...]
Target file: veupathdb-repos/ApiCommonPresenters/Model/lib/xml/datasetPresenters/<PROJECT>.xml
Detailed instructions: Step 5 - Update Presenter Files
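How the command arguments and the target file relate to the project name can be sketched as below. This is an illustration under the path pattern and argument order given above; the helper names `presenterFile` and `presenterArgv` are hypothetical, and the contact IDs in the usage comment are placeholders.

```javascript
const path = require('path');

// Target path pattern taken from the text above; the project name is
// substituted as-is. path.posix keeps forward slashes on every platform.
function presenterFile(project) {
  return path.posix.join(
    'veupathdb-repos', 'ApiCommonPresenters', 'Model', 'lib', 'xml',
    'datasetPresenters', `${project}.xml`
  );
}

// Arguments for:
//   node scripts/generate-presenter-xml.js <ASSEMBLY_ACCESSION> <PROJECT>
//        <PRIMARY_CONTACT_ID> [ADDITIONAL_CONTACT_IDS...]
function presenterArgv(accession, project, primaryContactId, ...additionalContactIds) {
  if (!accession || !project || !primaryContactId) {
    throw new Error('accession, project and primary contact ID are required');
  }
  return [
    'scripts/generate-presenter-xml.js',
    accession, project, primaryContactId, ...additionalContactIds,
  ];
}

// Usage (placeholder contact IDs):
// presenterArgv('GCA_000988875.2', 'FungiDB', 'primary.contact', 'second.contact')
```

Keeping the required arguments explicit makes a missing primary contact fail before the generator script runs.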
After completing this workflow:

Scripts:
- scripts/fetch-bioproject.js - Fetches BioProject metadata from NCBI (esearch + esummary)
- scripts/fetch-pubmed.js - Fetches PubMed records linked to a BioProject (elink + esummary)
- scripts/generate-presenter-xml.js - Generates datasetPresenter XML from fetched metadata
- scripts/check-repos.sh - Validates veupathdb-repos/ repository setup (synced from shared/)