Find and retrieve proteomics datasets from public repositories including MassIVE and ProteomeXchange (which aggregates PRIDE, PeptideAtlas, jPOST, and iProX). Search by species, keyword, or accession. Get detailed dataset metadata including instruments, publications, species, modifications, and file counts. Use when asked to find proteomics datasets, search for mass spectrometry data, look up ProteomeXchange or MassIVE accessions, or discover publicly available proteomics experiments for a given organism or topic.
Find and retrieve metadata for publicly available proteomics datasets from MassIVE and ProteomeXchange repositories. Supports searching by species, keyword, or accession, and returns detailed dataset metadata including instruments, publications, species, and post-translational modifications.
Triggers:
Use Cases:
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Dataset quality depends on instrument, sample preparation, and quantification method. TMT/iTRAQ (isobaric labeling) datasets have ratio compression and co-isolation interference biases that differ from label-free quantification (LFQ). DIA datasets require different analysis pipelines than DDA. Check the original publication for methods before reusing data in a meta-analysis or cross-study comparison. Instrument resolution (Orbitrap > ion trap) and acquisition mode (DIA > DDA for completeness) directly affect how many proteins are quantified and at what confidence.
| Repository | Coverage | Strengths |
|---|---|---|
| MassIVE | 10,000+ datasets | Rich metadata (summaries, keywords, modifications, contacts), species filtering by taxonomy ID |
| ProteomeXchange | Aggregates PRIDE, MassIVE, PeptideAtlas, jPOST, iProX | Broadest coverage, standardized PXD accessions |
Query (keyword / species / accession)
|
+-- PHASE 0: Input Resolution
|   Determine search type: keyword, species, or accession lookup
|
+-- PHASE 1: Repository Search
|   Search MassIVE and/or ProteomeXchange based on query type
|
+-- PHASE 2: Dataset Detail Retrieval
|   Get full metadata for promising hits
|
+-- PHASE 3: Result Synthesis
    Compile datasets with metadata, publications, and relevance assessment
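
The routing implied by Phases 0 and 1 can be sketched as a small planner that maps a raw query to the tool calls to issue. This is a minimal sketch, not the skill's implementation: `classify_query` and `plan_calls` are hypothetical helper names, the accession patterns (PXD plus 6 digits, MSV plus 9 digits) are inferred from the example accessions in this document, and treating a bare integer as a taxonomy ID is an assumption. Only the tool names and parameter names come from the document.

```python
import re

# Assumption: accession shapes inferred from the examples PXD000001 / MSV000079514.
PXD_RE = re.compile(r"^PXD\d{6}$", re.IGNORECASE)
MSV_RE = re.compile(r"^MSV\d{9}$", re.IGNORECASE)


def classify_query(query: str) -> str:
    """Return 'pxd', 'msv', 'species', or 'keyword' for a raw user query."""
    q = query.strip()
    if PXD_RE.match(q):
        return "pxd"
    if MSV_RE.match(q):
        return "msv"
    if q.isdigit():
        # Assumption: a bare integer is an NCBI taxonomy ID such as 9606.
        return "species"
    return "keyword"


def plan_calls(query: str, max_results: int = 20) -> list[tuple[str, dict]]:
    """Map a query to an ordered list of (tool_name, arguments) pairs."""
    q = query.strip()
    kind = classify_query(q)
    if kind == "pxd":
        # PXD accessions go to ProteomeXchange first, optionally MassIVE second.
        return [
            ("ProteomeXchange_get_dataset", {"px_id": q.upper()}),
            ("MassIVE_get_dataset", {"accession": q.upper()}),
        ]
    if kind == "msv":
        return [("MassIVE_get_dataset", {"accession": q.upper()})]
    if kind == "species":
        # The species filter is passed as a string, not an integer.
        return [("MassIVE_search_datasets", {"page_size": max_results, "species": q})]
    return [("ProteomeXchange_search_datasets", {"query": q, "limit": max_results})]
```

For example, `plan_calls("9606")` yields a single species-filtered `MassIVE_search_datasets` call, while `plan_calls("PXD000001")` yields the two detail lookups in order. Returning a plan rather than executing calls keeps the routing testable without network access.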
Objective: Determine the query type and prepare appropriate search parameters.
If an accession is provided (e.g., PXD000001, MSV000079514):
- PXD accession: ProteomeXchange_get_dataset and optionally MassIVE_get_dataset
- MSV accession: MassIVE_get_dataset

Otherwise:
- Species: MassIVE_search_datasets with species filter
- Keyword: ProteomeXchange_search_datasets with query parameter

Objective: Find relevant datasets across repositories.
MassIVE_search_datasets:
- page_size: Number of results to return (integer, max 100, default 10)
- species: NCBI taxonomy ID string to filter by species (e.g., "9606" for human)
- Returns: accessions (array), title, summary, species, instruments, keywords

ProteomeXchange_search_datasets:
- query: Optional search filter -- keyword or dataset accession (e.g., "phosphoproteomics", "PXD")
- limit: Max results (1-50, default 10)
- Returns: {data: [{accession, title, species}], metadata: {source, total_returned, query}}

For species-specific search:
- MassIVE_search_datasets(page_size=20, species="9606") for species-filtered results
- ProteomeXchange_search_datasets(limit=20) for broader listing

For keyword search:
- ProteomeXchange_search_datasets(query="keyword", limit=20)

For comprehensive discovery:
- {data: ...} wrapper)
- {data: [...], metadata: {...}}

Objective: Get full metadata for datasets of interest.
MassIVE_get_dataset:
- accession: Dataset accession -- accepts both MSV and PXD formats (e.g., "MSV000079514", "PXD003971")
- Returns: accessions, title, summary, species, instruments, keywords, contacts, publications, modifications

ProteomeXchange_get_dataset:
- px_id: ProteomeXchange identifier in PXD format (e.g., "PXD000001")
- Returns: {data: {px_id, title, species, identifiers, instruments, publications, file_count}, metadata: {...}}

Use ProteomeXchange_get_dataset for file count; use MassIVE_get_dataset for richer summary/keywords.

Objective: Compile and present dataset results in a structured format.
# Proteomics Dataset Search Results
**Query**: [original query]
**Date**: YYYY-MM-DD
**Repositories searched**: MassIVE, ProteomeXchange
## Summary
Found N datasets matching [criteria].
## Datasets
### 1. [Title]
- **Accession**: PXD/MSV number
- **Species**: [organism]
- **Instruments**: [MS platforms]
- **Publications**: [PubMed IDs / DOIs]
- **Modifications**: [PTMs if available]
- **Files**: [count if available]
- **Summary**: [brief description]
### 2. [Title]
...
## Data Gaps
[Note any limitations in search coverage]
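
A report in the layout above can be assembled mechanically once dataset records have been collected. The following is a minimal sketch under stated assumptions: `render_report` and `_fmt` are hypothetical helpers, and the record keys (`accession`, `title`, `species`, `instruments`, `publications`, `modifications`, `file_count`, `summary`) are an assumed normalized shape, not the raw response of either repository tool.

```python
from datetime import date

# (label in the report, key in the assumed normalized record)
FIELDS = [
    ("Accession", "accession"),
    ("Species", "species"),
    ("Instruments", "instruments"),
    ("Publications", "publications"),
    ("Modifications", "modifications"),
    ("Files", "file_count"),
    ("Summary", "summary"),
]


def _fmt(value):
    """Render a field, using a placeholder when it is missing or empty."""
    if value is None or (hasattr(value, "__len__") and len(value) == 0):
        return "not available"
    if isinstance(value, (list, tuple)):
        return ", ".join(str(v) for v in value)
    return str(value)


def render_report(query, datasets, gaps=(), today=None,
                  repositories=("MassIVE", "ProteomeXchange")):
    """Build the Markdown report text for a list of dataset records."""
    today = today or date.today().isoformat()
    lines = [
        "# Proteomics Dataset Search Results",
        f"**Query**: {query}",
        f"**Date**: {today}",
        f"**Repositories searched**: {', '.join(repositories)}",
        "## Summary",
        f"Found {len(datasets)} datasets matching {query!r}.",
        "## Datasets",
    ]
    for i, ds in enumerate(datasets, start=1):
        lines.append(f"### {i}. {_fmt(ds.get('title'))}")
        for label, key in FIELDS:
            lines.append(f"- **{label}**: {_fmt(ds.get(key))}")
    lines.append("## Data Gaps")
    lines.extend(gaps or ["None noted."])
    return "\n".join(lines)
```

Missing fields are printed as "not available" rather than dropped, so a reader can tell an absent annotation from an omitted one.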
| Tool | Parameter | Notes |
|---|---|---|
| MassIVE_search_datasets | page_size | Integer, max 100. Default 10 |
| MassIVE_search_datasets | species | NCBI taxonomy ID as string (e.g., "9606" not 9606) |
| MassIVE_get_dataset | accession | Accepts both MSV and PXD formats |
| ProteomeXchange_search_datasets | query | Optional keyword or accession filter |
| ProteomeXchange_search_datasets | limit | Integer, 1-50 |
| ProteomeXchange_get_dataset | px_id | PXD format only (e.g., "PXD000001") |
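
The constraints in this table are easy to violate silently, for instance by passing `9606` as an integer. A minimal sketch of argument builders that enforce them is shown below. The function names are hypothetical, the lower bound of 1 for `page_size` is an assumption (the table states only the maximum), and the PXD check accepts any digit count because the table gives only an example.

```python
import re


def massive_search_args(page_size=10, species=None):
    """Build arguments for MassIVE_search_datasets."""
    # Assumption: page_size must be at least 1; the table only states max 100.
    if not isinstance(page_size, int) or not 1 <= page_size <= 100:
        raise ValueError("page_size must be an integer between 1 and 100")
    args = {"page_size": page_size}
    if species is not None:
        # The tool expects the taxonomy ID as a string ("9606", not 9606).
        args["species"] = str(species)
    return args


def px_search_args(query=None, limit=10):
    """Build arguments for ProteomeXchange_search_datasets."""
    if not isinstance(limit, int) or not 1 <= limit <= 50:
        raise ValueError("limit must be an integer between 1 and 50")
    args = {"limit": limit}
    if query:
        args["query"] = query
    return args


def px_get_args(px_id):
    """Build arguments for ProteomeXchange_get_dataset (PXD format only)."""
    if not re.fullmatch(r"PXD\d+", px_id):
        raise ValueError(f"not a PXD accession: {px_id!r}")
    return {"px_id": px_id}
```

Failing early with a `ValueError` keeps a malformed request from being mistaken for an empty search result.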
Response Format Notes:
- ProteomeXchange_search_datasets: {data: [...], metadata: {...}}
- ProteomeXchange_get_dataset: {data: {...}, metadata: {...}}

| Situation | Fallback |
|---|---|
| MassIVE search returns empty | Use ProteomeXchange search (broader coverage) |
| ProteomeXchange search returns empty | Try broader/simpler query terms |
| MassIVE_get_dataset fails for PXD accession | Use ProteomeXchange_get_dataset instead |
| Species taxonomy ID unknown | Search ProteomeXchange by keyword (organism name) |
| No keyword search results | Try individual terms instead of multi-word queries |
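
The fallbacks in this table amount to trying an ordered list of searches until one returns results. A minimal sketch follows, assuming a caller-supplied `px_search(query)` callable that wraps `ProteomeXchange_search_datasets` and returns a list of hits. That wrapper and both function names here are hypothetical.

```python
def search_with_fallback(steps):
    """Run (label, zero-argument callable) steps until one returns results.

    Returns (label, results, attempts), where attempts records each step that
    was tried and came back empty or raised.
    """
    attempts = []
    for label, call in steps:
        try:
            results = call()
        except Exception as exc:  # a failed call falls through to the next step
            attempts.append((label, f"error: {exc}"))
            continue
        if results:
            return label, results, attempts
        attempts.append((label, "empty"))
    return None, [], attempts


def keyword_steps(query, px_search):
    """Try the full query first, then each individual term."""
    steps = [(f"ProteomeXchange: {query!r}", lambda: px_search(query))]
    terms = query.split()
    if len(terms) > 1:
        for term in terms:
            # Bind term as a default argument so each lambda keeps its own value.
            steps.append((f"ProteomeXchange: {term!r}", lambda t=term: px_search(t)))
    return steps
```

The `attempts` list is worth keeping: it is the raw material for the report's Data Gaps section.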
| Species | Taxonomy ID |
|---|---|
| Human | 9606 |
| Mouse | 10090 |
| Rat | 10116 |
| Zebrafish | 7955 |
| Fruit fly | 7227 |
| C. elegans | 6239 |
| S. cerevisiae | 559292 |
| A. thaliana | 3702 |
| E. coli | 562 |
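
The table above can be carried as a lookup so that a common name resolves to the string form the `species` parameter expects. This is a small sketch: `species_to_taxid` is a hypothetical helper, and it covers only the organisms listed in the table.

```python
# Taxonomy IDs copied from the table above, stored as strings because
# MassIVE_search_datasets expects the species filter as a string.
TAXONOMY_IDS = {
    "human": "9606",
    "mouse": "10090",
    "rat": "10116",
    "zebrafish": "7955",
    "fruit fly": "7227",
    "c. elegans": "6239",
    "s. cerevisiae": "559292",
    "a. thaliana": "3702",
    "e. coli": "562",
}


def species_to_taxid(name):
    """Return the taxonomy ID string for a listed organism, else None."""
    key = " ".join(name.lower().split())  # normalize case and whitespace
    return TAXONOMY_IDS.get(key)
```

A `None` result corresponds to the "Species taxonomy ID unknown" situation, where searching ProteomeXchange by organism name is the documented fallback.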
| Quality Indicator | Good | Acceptable | Caution |
|---|---|---|---|
| Instrument | Orbitrap Exploris/Eclipse, timsTOF | Q Exactive, TripleTOF 6600 | Older LTQ, ion trap only |
| Publication | Peer-reviewed with PubMed ID | Preprint or DOI only | No associated publication |
| Metadata completeness | Species + instrument + PTMs + summary | Species + instrument only | Title only, no annotations |
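
The three indicators in this table can be applied to a dataset record with simple rules. The sketch below is illustrative only: the function names and the record keys are hypothetical, instrument matching is a plain substring check against the names in the table, and any record that does not meet the "Good" or "Acceptable" metadata criteria is treated as "caution", which is an assumption.

```python
# Substrings taken from the table above; matching is case-insensitive.
GOOD = ("orbitrap exploris", "eclipse", "timstof")
ACCEPTABLE = ("q exactive", "tripletof 6600")
CAUTION = ("ltq", "ion trap")


def grade_instrument(instruments):
    """Return the best grade among the listed instruments, or 'unknown'."""
    names = [name.lower() for name in instruments]
    for grade, patterns in (("good", GOOD), ("acceptable", ACCEPTABLE), ("caution", CAUTION)):
        if any(p in name for name in names for p in patterns):
            return grade
    return "unknown"


def grade_publication(pubmed_ids=(), dois=()):
    """PubMed ID present: good. DOI only: acceptable. Neither: caution."""
    if pubmed_ids:
        return "good"
    if dois:
        return "acceptable"
    return "caution"


def grade_metadata(record):
    """Grade annotation completeness from an assumed normalized record."""
    has = lambda key: bool(record.get(key))
    if has("species") and has("instruments") and has("modifications") and has("summary"):
        return "good"
    if has("species") and has("instruments"):
        return "acceptable"
    return "caution"
```

Grading by the best instrument in the list follows the "ion trap only" wording: a dataset that also used a newer platform is not penalized for an older one.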
Interpreting dataset search results:
Synthesis questions to address in the report:
- species parameter
- Dataverse_get_dataset
- page_size/limit reasonable

| Skill | Relationship |
|---|---|
| tooluniverse-proteomics-analysis | Use retrieved datasets as input for MS data analysis |
| tooluniverse-protein-modification-analysis | Find PTM-specific datasets to complement iPTMnet annotations |
| tooluniverse-multi-omics-integration | Discover proteomics datasets for cross-omics integration |