Process, clean, and compare tandem mass spectrometry (MS/MS) spectra with matchms. Use this skill when you need reproducible spectral filtering and similarity scoring for metabolomics workflows.
- scripts/similarity_pipeline.py is the most direct path to complete the request.
- You need matchms package behavior rather than a generic answer.
- Packaged script: scripts/similarity_pipeline.py.
- See references/ for task-specific guidance.

- Python: 3.10+. Repository baseline for current packaged skills.
- Third-party packages: not explicitly version-pinned in this skill package. Add pinned versions if this skill needs stricter environment control.

To check that the script compiles and to list its options:

    cd "20260316/scientific-skills/Data Analytics/matchms"
    python -m py_compile scripts/similarity_pipeline.py
    python scripts/similarity_pipeline.py --help

Example run plan:
- Edit the CONFIG block or documented parameters if the script uses fixed settings.
- Run python scripts/similarity_pipeline.py with the validated inputs.

- Implementation: scripts/similarity_pipeline.py.
- references/ contains supporting rules, prompts, or checklists.

Use this skill when you need to:

Additional reference material (if present in the repository):
- references/filtering.md
- references/similarity.md
- references/workflows.md

Dependencies:

- matchms (version depends on your environment; pin in your project, e.g., matchms>=0.20,<1.0)
- numpy (e.g., numpy>=1.20)
- scipy (e.g., scipy>=1.7)
- rdkit (optional; required for chemistry/fingerprint-related functionality; version varies by distribution)

A minimal, runnable example that loads spectra from an MGF file and computes pairwise cosine scores:

    from matchms import calculate_scores
    from matchms.importing import load_from_mgf
    from matchms.similarity import CosineGreedy


    def main():
        # Load spectra from an MGF file
        spectra = list(load_from_mgf("data.mgf"))

        # Compute similarity scores (all-vs-all)
        scores = calculate_scores(
            references=spectra,
            queries=spectra,
            similarity_function=CosineGreedy(),
        )

        # Iterate over computed scores. Assumes a recent matchms release
        # (0.20 or later), where each item is (reference, query, score):
        # two Spectrum objects plus a structured value holding the cosine
        # score and the number of matched peaks.
        for reference, query, score in scores:
            print(
                f"ref={reference.get('compound_name')} "
                f"query={query.get('compound_name')} "
                f"cosine={score[0]:.4f} matches={score[1]}"
            )


    if __name__ == "__main__":
        main()

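To make the cosine and matches values in the example above more concrete, here is a simplified pure-Python sketch of greedy cosine matching. It is an illustration under stated assumptions, not the matchms implementation: the function name `greedy_cosine`, the `(mz, intensity)` tuple representation, and the sample peaks are all hypothetical, and the library's own `CosineGreedy` has further options (such as m/z and intensity weighting) that this sketch ignores.

```python
from math import sqrt


def greedy_cosine(peaks_a, peaks_b, tolerance=0.1):
    """Return (score, n_matches) for two lists of (mz, intensity) pairs.

    Hypothetical helper for illustration only.
    """
    # Collect every candidate pair whose m/z values agree within the tolerance.
    candidates = [
        (int_a * int_b, i, j)
        for i, (mz_a, int_a) in enumerate(peaks_a)
        for j, (mz_b, int_b) in enumerate(peaks_b)
        if abs(mz_a - mz_b) <= tolerance
    ]
    # Greedy step: accept the largest intensity products first,
    # using each peak at most once.
    candidates.sort(reverse=True)
    used_a, used_b, total = set(), set(), 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += product
    # Normalize by the intensity vector norms of both spectra.
    norm_a = sqrt(sum(intensity ** 2 for _, intensity in peaks_a))
    norm_b = sqrt(sum(intensity ** 2 for _, intensity in peaks_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    return total / (norm_a * norm_b), len(used_a)


# Made-up peak lists: two peaks agree within 0.1 m/z, one does not.
spectrum_a = [(100.0, 0.5), (150.0, 1.0), (200.0, 0.2)]
spectrum_b = [(100.05, 0.4), (150.0, 1.0), (300.0, 0.3)]

score, n_matches = greedy_cosine(spectrum_a, spectrum_b)
print(f"cosine={score:.4f} matches={n_matches}")
```

The greedy choice keeps the sketch short; it does not guarantee the best possible peak assignment when several peaks compete within the tolerance window.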
- Spectra are represented as Spectrum objects containing peak m/z and intensity arrays plus metadata (e.g., precursor m/z, charge, compound name/identifier).
- See references/filtering.md for filter patterns and recommended sequences.
- See references/workflows.md for pipeline organization guidance.