Comprehensive metabolomics research skill for identifying metabolites, analyzing studies, and searching metabolomics databases. Integrates HMDB (220k+ metabolites), MetaboLights, Metabolomics Workbench, and PubChem. Use when asked to identify or annotate metabolites (HMDB IDs, chemical properties, pathways), retrieve metabolomics study information from MetaboLights (MTBLS*) or Metabolomics Workbench (ST*), search for studies by keywords or disease, or generate comprehensive metabolomics research reports.
mims-harvard · 1,271 stars · 29.03.2026
Comprehensive metabolomics research skill that identifies metabolites, analyzes studies, and searches metabolomics databases. Generates structured research reports with annotated metabolite information, study details, and database statistics.
Use Case
Use this skill when asked to:
Identify or annotate metabolites (HMDB IDs, chemical properties, pathways)
Retrieve metabolomics study information from MetaboLights or Metabolomics Workbench
Search for metabolomics studies by keywords or disease
Analyze metabolite profiles or datasets
Generate comprehensive metabolomics research reports
Example queries:
"What is the HMDB ID and pathway information for glucose?"
"Get study details for MTBLS1"
"Find metabolomics studies related to diabetes"
"Analyze these metabolites: glucose, lactate, pyruvate"
Databases Covered
Primary metabolite databases:
HMDB (Human Metabolome Database): 220,000+ metabolites with structures, pathways, and biological roles
MetaboLights: Public metabolomics repository with thousands of studies
Metabolomics Workbench: NIH Common Fund metabolomics data repository
PubChem: Chemical properties and bioactivity data (fallback)
Research Workflow
The skill executes a 4-phase analysis pipeline:
Phase 1: Metabolite Identification & Annotation
For each metabolite in the input list:
Search HMDB by metabolite name
Retrieve HMDB ID, chemical formula, molecular weight
Get detailed metabolite information (description, pathways)
Fall back to PubChem for the CID and chemical properties if HMDB returns no match or is unavailable
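The HMDB-first, PubChem-second order of Phase 1 can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `annotate_metabolite` and the two lookup callables are hypothetical names, and each lookup is assumed to take a metabolite name and return a dict of annotations, or `None` when nothing is found.

```python
from typing import Callable, Optional

# Hypothetical lookup signature: metabolite name in, annotation dict (or None) out.
Lookup = Callable[[str], Optional[dict]]


def annotate_metabolite(name: str, hmdb_lookup: Lookup, pubchem_lookup: Lookup) -> dict:
    """Try HMDB first, then PubChem; always return a record with a 'source' field."""
    for source, lookup in (("HMDB", hmdb_lookup), ("PubChem", pubchem_lookup)):
        try:
            record = lookup(name)
        except Exception:
            # A failed lookup is treated like an empty result so the
            # pipeline can continue with the next source.
            record = None
        if record:
            return {"name": name, "source": source, **record}
    # Neither database produced a match.
    return {"name": name, "source": None}
```

Recording the `source` on every record makes it easy to report later whether an annotation came from HMDB or only from the PubChem fallback.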
Phase 2: Study Details Retrieval
For provided study IDs:
Detect database type (MTBLS = MetaboLights, ST = Metabolomics Workbench)
Retrieve study metadata (title, description, organism, status)
Extract experimental design and data availability
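The prefix-based database detection in Phase 2 could look like the sketch below. The function name `detect_study_database` is hypothetical, and the patterns assume that a study ID is the prefix followed only by digits, which the text above does not state explicitly.

```python
import re
from typing import Optional

# Assumed ID shapes: "MTBLS" or "ST" followed by digits only.
_STUDY_ID_PATTERNS = {
    "MetaboLights": re.compile(r"^MTBLS\d+$"),
    "Metabolomics Workbench": re.compile(r"^ST\d+$"),
}


def detect_study_database(study_id: str) -> Optional[str]:
    """Return the repository name for a study ID, or None if the format is unknown."""
    normalized = study_id.strip().upper()
    for database, pattern in _STUDY_ID_PATTERNS.items():
        if pattern.match(normalized):
            return database
    return None
```

Returning `None` for an unrecognized ID lets the caller report a format problem before any API request is made.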
Phase 3: Study Search
For keyword searches:
Search MetaboLights studies by query term
Return matching study IDs with preview information
PubChem_get_compound_properties_by_CID: Get chemical properties
No manual tool configuration is required; all tools are loaded automatically.
Common Issues
Issue: HMDB returns "Error querying HMDB: 0"
Cause: The HMDB search returned empty results, or an index error occurred when accessing the first result
Solution: This is expected for uncommon metabolites; PubChem fallback will be attempted
Issue: Study details show "N/A" for all fields
Cause: Study ID not found or API unavailable
Solution: Verify study ID format (MTBLS* or ST*), check if study is public
Issue: Tool not found errors
Cause: Missing API keys for some databases
Solution: Check .env.template, add required API keys to .env file (most metabolomics tools work without keys)
Issue: Large metabolite lists cause slow execution
Cause: Pipeline queries each metabolite individually
Solution: Reports are limited to the first 10 metabolites; consider batching lists of more than 20 metabolites
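One way to batch a long metabolite list before running the pipeline is sketched below. The helper name `batched` and the default batch size of 10 are illustrative assumptions, chosen to match the report limit mentioned above.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each batch can then be passed to the pipeline as its own run, so that no single report exceeds the limit.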
Summary
The Metabolomics Research skill provides comprehensive metabolomics analysis through a 4-phase pipeline that:
Identifies metabolites using HMDB (primary) and PubChem (fallback) databases
Retrieves study details from MetaboLights and Metabolomics Workbench repositories
Searches studies by keywords across metabolomics databases
Generates structured reports with all findings in readable markdown format
✅ Graceful error handling (continues if one phase fails)
✅ Progressive report writing (memory-efficient)
✅ Implementation-agnostic documentation (works with Python SDK and MCP)
Best for:
Metabolite annotation and pathway analysis
Study discovery and data retrieval
Comprehensive metabolomics research reports
Multi-database metabolomics queries
Reasoning Framework
Starting Point: Mass Spectrum Analysis
Metabolite identification starts with the mass spectrum. LOOK UP DON'T GUESS — always search HMDB/PubChem with the calculated neutral mass rather than guessing identity from m/z alone.
Step 2 — Search databases: Query HMDB by mass (±5 ppm for Orbitrap/Q-TOF, ±0.5 Da for unit-resolution). Multiple adduct hypotheses yield different neutral masses — check all plausible adducts before concluding.
Step 3 — Resolve ambiguity: Exact mass alone often matches 5-20 candidates. Use isotope pattern (M+1/M+2 ratios indicate element composition — e.g., high M+2 suggests S or Cl), retention time, and MS/MS fragmentation to narrow down. A single mass match is L3 confidence; MS/MS match to reference spectrum is required for L2/L1.
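The neutral-mass calculation and search windows referred to in these steps can be sketched as follows. The adduct table is a small illustrative subset of singly charged adducts, not a complete list, and the function names are hypothetical.

```python
from typing import Dict, Tuple

# Monoisotopic mass shifts (Da) for a few common singly charged adducts.
# Illustrative subset only; extend it for the ionization method in use.
ADDUCT_SHIFTS = {
    "positive": {"[M+H]+": 1.007276, "[M+Na]+": 22.989218},
    "negative": {"[M-H]-": -1.007276},
}


def candidate_neutral_masses(mz: float, polarity: str) -> Dict[str, float]:
    """Return one neutral-mass hypothesis per plausible adduct for an observed m/z."""
    return {adduct: mz - shift for adduct, shift in ADDUCT_SHIFTS[polarity].items()}


def ppm_window(neutral_mass: float, ppm: float = 5.0) -> Tuple[float, float]:
    """Search window for high-resolution data (e.g. 5 ppm)."""
    tolerance = neutral_mass * ppm / 1e6
    return neutral_mass - tolerance, neutral_mass + tolerance


def dalton_window(neutral_mass: float, da: float = 0.5) -> Tuple[float, float]:
    """Search window for unit-resolution data (e.g. 0.5 Da)."""
    return neutral_mass - da, neutral_mass + da
```

Each entry returned by `candidate_neutral_masses` is a separate hypothesis and should get its own database search before any identity is concluded.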
L1 - Confirmed: HMDB ID + retention time + MS/MS match to reference standard
L2 - Probable: HMDB match by exact mass + MS/MS similarity (cosine > 0.7), no standard
L3 - Tentative: Matched by exact mass and molecular formula only; structural isomers unresolved
L4 - Unknown: Detected m/z with no database match; PubChem fallback may provide candidates
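The four levels above can be expressed as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and it assumes the same cosine threshold of 0.7 is used when comparing against a reference standard, which the level definitions do not specify.

```python
from typing import Optional

COSINE_THRESHOLD = 0.7  # taken from the L2 definition above


def confidence_level(
    has_database_match: bool,
    msms_cosine: Optional[float] = None,
    retention_time_matches: bool = False,
    compared_to_standard: bool = False,
) -> str:
    """Map the available evidence for one feature to a level from L1 to L4."""
    if not has_database_match:
        return "L4"  # detected m/z with no database match
    msms_ok = msms_cosine is not None and msms_cosine > COSINE_THRESHOLD
    if msms_ok and retention_time_matches and compared_to_standard:
        return "L1"  # confirmed against a reference standard
    if msms_ok:
        return "L2"  # spectral similarity only, no standard
    return "L3"  # exact mass and formula only
```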
Interpretation Guidance
Metabolite identification: HMDB IDs provide the strongest annotation when paired with experimental validation. A PubChem-only match (fallback) indicates the metabolite is chemically characterized but may lack biological context (pathways, disease associations). Always report the identification confidence level.
Pathway enrichment strategy: When multiple metabolites map to the same KEGG or HMDB pathway, enrichment is meaningful only if the input list is unbiased (not pre-selected for that pathway). Report hits vs. pathway size (3/5 detected is more informative than 3/500). LOOK UP DON'T GUESS — use HMDB_get_metabolite to get pathway annotations for each metabolite rather than assuming pathway membership from names alone.
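To make the "hits vs. pathway size" point concrete, the sketch below computes pathway coverage and a one-sided hypergeometric tail probability. The hypergeometric test is a common choice for over-representation analysis but is an assumption here, not something the skill is stated to use; the background size (the number of metabolites that could have been detected) must be supplied by the analyst.

```python
from math import comb


def pathway_coverage(hits: int, pathway_size: int) -> float:
    """Fraction of the pathway's metabolites present in the input list."""
    return hits / pathway_size


def hypergeom_tail(hits: int, pathway_size: int, list_size: int, background_size: int) -> float:
    """P(X >= hits) when drawing list_size metabolites at random from the background."""
    upper = min(pathway_size, list_size)
    favourable = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background_size, list_size)
```

With a list of 10 metabolites and a background of 1,000, three hits in a 5-member pathway give a far smaller tail probability than three hits in a 500-member pathway, which is the intuition behind reporting 3/5 rather than just "3 hits".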
Biomarker discovery reasoning: A candidate biomarker should show: (1) consistent direction of change across samples (fold-change > 1.5), (2) statistical significance (FDR-adjusted p < 0.05), (3) biological plausibility — LOOK UP the metabolite's known disease associations via HMDB, and (4) reproducibility in an independent cohort. Single-study HMDB associations are hypothesis-generating, not confirmatory. Check MetaboLights/Metabolomics Workbench for independent validation datasets.
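The two quantitative criteria (fold-change and FDR-adjusted significance) can be screened with a short script like the one below. It is a sketch, not the skill's own code: the Benjamini-Hochberg procedure is assumed as the FDR method, fold-change is treated symmetrically (a ratio of 0.5 counts the same as 2.0), and the function and field names are hypothetical. Plausibility and independent replication still require the database lookups described above.

```python
from typing import Dict, List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def screen_candidates(rows: List[Dict], min_fold: float = 1.5, alpha: float = 0.05) -> List[str]:
    """Names of metabolites passing the fold-change and adjusted p-value cut-offs."""
    adjusted = benjamini_hochberg([row["p_value"] for row in rows])
    passing = []
    for row, q in zip(rows, adjusted):
        fc = row["fold_change"]  # ratio of group means, must be positive
        magnitude = max(fc, 1.0 / fc)  # symmetric: 0.5 is treated like 2.0
        if magnitude > min_fold and q < alpha:
            passing.append(row["name"])
    return passing
```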
Synthesis Questions
A complete metabolomics report should answer:
What is the identification confidence level for each metabolite (L1-L4)?
Which biological pathways are enriched among the identified metabolites?
Do any metabolites meet biomarker criteria (fold-change, significance, plausibility)?
Are there relevant metabolomics studies (MTBLS/ST) for the disease or condition of interest?
What cross-database evidence supports the biological relevance of key findings (HMDB pathways, PubChem bioactivity)?
Limitations:
HMDB may not have all metabolites (fallback to PubChem)
Some studies require authentication or are not public
Large metabolite lists (>10) are automatically truncated in reports
API rate limits may affect large-scale queries
See QUICK_START.md for Python SDK examples, MCP integration, and step-by-step tutorials.