If given a bare accession (SRR/SRX/SRS/SRP/PRJNA/SAMN), map it to the correct NCBI database:
SRR, SRX, SRS, SRP → sra
PRJNA → bioproject, then link to sra
SAMN → biosample, then link to sra
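A minimal sketch of the routing above, assuming a bare accession is one of the six listed prefixes followed by digits only. The names route_accession and ACCESSION_ROUTES are hypothetical, not part of any NCBI tool.

```python
import re

# Prefix -> (database to search first, whether a link hop to sra is needed).
# Only the prefixes listed in the step above are covered.
ACCESSION_ROUTES = {
    "SRR": ("sra", False),
    "SRX": ("sra", False),
    "SRS": ("sra", False),
    "SRP": ("sra", False),
    "PRJNA": ("bioproject", True),
    "SAMN": ("biosample", True),
}

# Assumption: a bare accession is a known prefix followed by digits only.
_ACCESSION_RE = re.compile(r"^(SRR|SRX|SRS|SRP|PRJNA|SAMN)\d+$")


def route_accession(text):
    """Return (db, needs_link_to_sra) for a bare accession, or None for free text."""
    match = _ACCESSION_RE.match(text.strip().upper())
    if match is None:
        return None
    return ACCESSION_ROUTES[match.group(1)]
```

A None result means the input should be treated as a free-text query.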
If given a free-text query, use esearch -db sra with structured field tags when helpful:
organism: [Organism]
library strategy: [Strategy]
platform: [Platform]
layout: [Layout]
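The field tags above can be combined into one search term. The sketch below assumes values are joined with AND and wrapped in double quotes so that multi-word values stay together; build_term and FIELD_TAGS are hypothetical names used for illustration.

```python
# Keyword -> Entrez field tag, limited to the four tags listed above.
FIELD_TAGS = {
    "organism": "Organism",
    "strategy": "Strategy",
    "platform": "Platform",
    "layout": "Layout",
}


def build_term(free_text="", **fields):
    """Join free text and tagged values into one AND-ed search term."""
    unknown = set(fields) - set(FIELD_TAGS)
    if unknown:
        raise ValueError(f"unsupported field(s): {sorted(unknown)}")
    parts = []
    if free_text.strip():
        parts.append(free_text.strip())
    # Iterate FIELD_TAGS (not the kwargs) so the output order is stable.
    for key, tag in FIELD_TAGS.items():
        value = fields.get(key)
        if value:
            parts.append(f'"{value}"[{tag}]')
    return " AND ".join(parts)
```

For example, build_term(organism="Homo sapiens", strategy="RNA-Seq") yields "Homo sapiens"[Organism] AND "RNA-Seq"[Strategy] (with the inner double quotes included in the term).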
Run esearch -db sra -query "<term>" to retrieve UIDs.
Pipe to esummary -db sra -format json or efetch -db sra -format runinfo for metadata.
For BioProject-level queries, use esearch -db bioproject -query "<PRJNA accession>" | elink -target sra | efetch -format runinfo to get the full run table.
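One way to assemble the pipelines described in the steps above is to build the command line as a string and shell-quote the search term. This is a sketch only: runinfo_pipeline is a hypothetical helper, and it uses just the EDirect commands and flags already named in this document.

```python
import shlex


def runinfo_pipeline(term, db="sra"):
    """Return an EDirect command line that ends in a runinfo table.

    When db is not "sra" (for example bioproject or biosample), an
    elink hop to sra is inserted before efetch.
    """
    stages = [f"esearch -db {db} -query {shlex.quote(term)}"]
    if db != "sra":
        stages.append("elink -target sra")
    stages.append("efetch -format runinfo")
    return " | ".join(stages)
```

Quoting with shlex.quote keeps brackets and embedded double quotes in the term from being interpreted by the shell. The returned string is also the exact command to report as the search query used.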
Parse the runinfo CSV (or the esummary JSON, whose field names differ) to extract: Run, SampleName, BioSample, BioProject, Experiment, LibraryStrategy, LibraryLayout, Platform, spots, bases, size_MB, ReleaseDate.
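A sketch of the CSV side of this step, using only the standard library. parse_runinfo is a hypothetical helper. The column names to keep are passed in by the caller, and the skipping of blank lines and repeated header rows is a defensive assumption about concatenated output, not documented behaviour.

```python
import csv
import io


def parse_runinfo(csv_text, columns):
    """Return one dict per run, restricted to the requested columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        run = (record.get("Run") or "").strip()
        # Skip empty rows and header rows repeated in the middle of the data.
        if not run or run == "Run":
            continue
        # Missing columns become empty strings instead of raising KeyError.
        rows.append({name: (record.get(name) or "") for name in columns})
    return rows
```

All values are returned as strings; numeric columns such as spots and bases should be converted before sorting or summing.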
Report a summary table of matching runs and key metadata.
If the user wants to download, provide the prefetch or fasterq-dump command with the accession list.
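The download step can be sketched as one command per accession, which avoids relying on any batch option of the SRA Toolkit. download_commands is a hypothetical helper; it only formats the command strings and does not run them.

```python
import shlex


def download_commands(accessions, use_prefetch=True):
    """Return one prefetch or fasterq-dump command per unique accession."""
    tool = "prefetch" if use_prefetch else "fasterq-dump"
    # dict.fromkeys removes duplicates while keeping the original order.
    unique = list(dict.fromkeys(acc.strip() for acc in accessions if acc.strip()))
    return [f"{tool} {shlex.quote(acc)}" for acc in unique]
```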
Output Contract
Search query used (exact E-utilities command)
Number of hits found
Summary table with columns: Run (SRR), Sample, BioProject, Strategy, Platform, Layout, Spots, Bases, Size (MB), Date
Full accession list suitable for use with prefetch or fasterq-dump
Saved output files when an output directory is provided:
esearch.json
runinfo.csv (when efetch -format runinfo is used)
accessions.txt (one SRR per line)
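A minimal sketch of writing the saved files listed above, assuming the caller already holds the raw esearch JSON and runinfo CSV as text. save_outputs is a hypothetical helper; the file names are the ones from this contract.

```python
from pathlib import Path


def save_outputs(out_dir, runs, runinfo_csv=None, esearch_json=None):
    """Write the contract files into out_dir and return the names written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    if esearch_json is not None:
        (out / "esearch.json").write_text(esearch_json)
        written.append("esearch.json")
    if runinfo_csv is not None:
        # Only present when efetch -format runinfo was used.
        (out / "runinfo.csv").write_text(runinfo_csv)
        written.append("runinfo.csv")
    # One SRR per line, each line newline-terminated.
    (out / "accessions.txt").write_text("".join(f"{run}\n" for run in runs))
    written.append("accessions.txt")
    return written
```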
Limits
This skill depends on live access to https://eutils.ncbi.nlm.nih.gov/entrez/eutils/.
NCBI rate limit: 3 requests/second without API key, 10/second with API key. Always pass --email for repeated queries.
esummary JSON field names for SRA differ from the runinfo CSV columns; prefer efetch -format runinfo for tabular metadata.
Private or access-controlled SRA datasets (dbGaP) cannot be retrieved without an authorized token.
Very large BioProjects (>10 000 runs) require paging with WebEnv + query_key.
Common failure cases:
confusing SRA database UIDs (internal numeric identifiers) with run accessions such as SRR numbers (they are not the same)
querying bioproject instead of sra when an SRR is needed
omitting the [Organism] field tag, so the organism name matches free text in other fields
requesting retmax > 10 000 without using history server
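The paging and rate limits above can be sketched as follows. The helper names are hypothetical, and the URL parameters (rettype=runinfo, retmode=text, retstart, retmax, WebEnv, query_key) are an assumption about how the runinfo table is requested over plain HTTP; check them against the E-utilities documentation before relying on this.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def min_delay(has_api_key):
    """Seconds to wait between requests: 10/s with an API key, 3/s without."""
    return 1 / 10 if has_api_key else 1 / 3


def page_offsets(total, page_size=10_000):
    """Return (retstart, retmax) pairs that cover `total` records."""
    return [(start, min(page_size, total - start))
            for start in range(0, total, page_size)]


def efetch_urls(web_env, query_key, total, page_size=10_000):
    """Build one efetch URL per page of a history-server result set."""
    urls = []
    for retstart, retmax in page_offsets(total, page_size):
        params = {
            "db": "sra",
            "WebEnv": web_env,        # returned by a search that used the history server
            "query_key": query_key,
            "rettype": "runinfo",     # assumption, see the note above
            "retmode": "text",
            "retstart": retstart,
            "retmax": retmax,
        }
        urls.append(EUTILS + "efetch.fcgi?" + urlencode(params))
    return urls
```

A caller would fetch the URLs in order and sleep for min_delay(...) seconds between requests.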