Map unstructured biomedical text to standardized ontologies (SNOMED CT.
- scripts/main.py.
- references/ for task-specific guidance.
- Python: 3.10+. Repository baseline for current packaged skills.
- dataclasses: unspecified. Declared in requirements.txt.
- difflib: unspecified. Declared in requirements.txt.

cd "20260318/scientific-skills/Evidence Insight/bio-ontology-mapper"
python -m py_compile scripts/main.py
python scripts/main.py --help
Example run plan:
- CONFIG block or documented parameters if the script uses fixed settings.
- python scripts/main.py with the validated inputs.

See ## Workflow above for related details.

- scripts/main.py.
- references/ contains supporting rules, prompts, or checklists.

Use this command to verify that the packaged script entry point can be parsed before deeper execution.
python -m py_compile scripts/main.py
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
python -m py_compile scripts/main.py
python scripts/main.py --help
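The same parse check can also be run from Python, for example inside a test suite. This is a minimal sketch built on the standard-library `py_compile` module; the helper name `can_compile` is hypothetical and is not part of the packaged scripts.

```python
import py_compile
from pathlib import Path


def can_compile(script_path: str) -> bool:
    """Return True if the file at script_path parses as valid Python."""
    path = Path(script_path)
    if not path.is_file():
        return False
    try:
        # doraise=True raises PyCompileError on a syntax error instead of
        # printing the error to stderr.
        py_compile.compile(str(path), doraise=True)
    except py_compile.PyCompileError:
        return False
    return True
```

Calling `can_compile("scripts/main.py")` from the skill directory mirrors `python -m py_compile scripts/main.py`, returning a boolean rather than an exit code.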
Biomedical terminology normalization tool that maps free-text clinical and scientific concepts to standardized ontologies for semantic interoperability and data harmonization.
Key Capabilities:
Extract and map biomedical entities to ontologies:
from scripts.mapper import BioOntologyMapper

mapper = BioOntologyMapper()

# Map clinical text
result = mapper.map_text(
    text="Patient has diabetes and hypertension, taking metformin",
    ontologies=["snomed", "mesh", "rxnorm"],
    confidence_threshold=0.7
)

for entity in result.entities:
    print(f"{entity.text} → {entity.concept_id} ({entity.ontology})")
    print(f"  Preferred: {entity.preferred_term}")
    print(f"  Confidence: {entity.confidence:.2f}")
Supported Ontologies:
| Ontology | Domain | Use Case |
|---|---|---|
| SNOMED CT | Clinical | EHR interoperability |
| MeSH | Literature | PubMed indexing |
| ICD-10 | Billing | Diagnosis codes |
| LOINC | Labs | Test result standardization |
| RxNorm | Drugs | Medication normalization |
| HGNC | Genes | Gene name standardization |
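To make the mapping step more concrete, the sketch below matches a free-text term against a tiny in-memory lexicon with `difflib` and returns a dataclass result. It does not reproduce `BioOntologyMapper`: the `LEXICON` contents, the `MappedEntity` fields, and `map_term` are illustrative assumptions, and a real deployment would load terms from licensed ontology releases.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# Illustrative lexicon only: (ontology, concept ID, preferred term).
# Verify every ID against the ontology release you actually use.
LEXICON = [
    ("snomed", "73211009", "Diabetes mellitus"),
    ("snomed", "38341003", "Hypertensive disorder"),
    ("snomed", "22298006", "Myocardial infarction"),
    ("rxnorm", "6809", "Metformin"),
]


@dataclass
class MappedEntity:
    text: str
    concept_id: str
    ontology: str
    preferred_term: str
    confidence: float


def map_term(term: str, threshold: float = 0.7) -> Optional[MappedEntity]:
    """Return the best lexicon match for term, or None if below threshold."""
    best = None
    for ontology, concept_id, preferred in LEXICON:
        # Case-insensitive character-level similarity in [0, 1].
        score = SequenceMatcher(None, term.lower(), preferred.lower()).ratio()
        if best is None or score > best.confidence:
            best = MappedEntity(term, concept_id, ontology, preferred, score)
    if best is None or best.confidence < threshold:
        return None
    return best
```

Pure string similarity tolerates misspellings but cannot connect lay synonyms such as "heart attack" to "Myocardial infarction", so a production mapper would typically add synonym tables or context on top of this kind of matching.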
Map concepts between different ontologies:
# Cross-map SNOMED to ICD-10
translation = mapper.cross_map(
    source_id="22298006",  # SNOMED: Myocardial infarction
    source_ontology="snomed",
    target_ontology="icd10"
)

print(f"ICD-10: {translation.target_id} - {translation.target_term}")
# Output: ICD-10: I21.9 - Acute myocardial infarction, unspecified
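At its simplest, cross-mapping is a keyed lookup in a crosswalk table. The sketch below shows that idea only; `CROSSWALK`, `Translation`, and this standalone `cross_map` function are hypothetical and are not the implementation in `cross_mapper.py`. The single entry reuses the SNOMED CT to ICD-10 pair from the example above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical crosswalk keyed by (source ontology, source ID, target ontology).
# Real crosswalks come from published map files, not hand-written dicts.
CROSSWALK = {
    ("snomed", "22298006", "icd10"): (
        "I21.9",
        "Acute myocardial infarction, unspecified",
    ),
}


@dataclass
class Translation:
    source_id: str
    target_id: str
    target_term: str


def cross_map(
    source_id: str, source_ontology: str, target_ontology: str
) -> Optional[Translation]:
    """Look up a target concept; return None when no mapping is recorded."""
    key = (source_ontology.lower(), source_id, target_ontology.lower())
    hit = CROSSWALK.get(key)
    if hit is None:
        return None
    target_id, target_term = hit
    return Translation(source_id, target_id, target_term)
```

Returning `None` for an unmapped concept, rather than guessing a nearby code, keeps gaps in crosswalk coverage visible to the caller.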
Cross-Mapping Coverage:
Process large datasets:
# Batch process CSV
results = mapper.batch_map(
    input_file="clinical_terms.csv",
    text_column="diagnosis_description",
    ontologies=["snomed", "icd10"],
    output_format="csv",
    max_workers=4
)
# Results include:
# - Original term
# - Mapped concept ID
# - Confidence score
# - Alternative mappings (if ambiguous)
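One way such a batch step could be structured is with `csv` and a thread pool. This is a sketch under assumptions, not the code in `batch_processor.py`: the function names are hypothetical, and `map_fn` stands in for any callable that takes a term and returns a dict with the same keys for every row.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def batch_map(
    rows: Iterable[dict],
    text_column: str,
    map_fn: Callable[[str], dict],
    max_workers: int = 4,
) -> list[dict]:
    """Apply map_fn to each row's text column in parallel, keeping row order."""
    rows = list(rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as its inputs.
        mapped = list(pool.map(map_fn, (row[text_column] for row in rows)))
    # Merge each original row with its mapping result.
    return [{**row, **result} for row, result in zip(rows, mapped)]


def batch_map_csv(input_file, output_file, text_column, map_fn, max_workers=4):
    """Read a CSV, map one column, write merged rows; return the row count."""
    with open(input_file, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
    results = batch_map(rows, text_column, map_fn, max_workers)
    if not results:
        return 0
    with open(output_file, "w", newline="", encoding="utf-8") as dst:
        # Column order follows the first merged row.
        writer = csv.DictWriter(dst, fieldnames=list(results[0].keys()))
        writer.writeheader()
        writer.writerows(results)
    return len(results)
```

Threads suit this shape when `map_fn` spends most of its time waiting on I/O such as API calls; for CPU-bound matching, a process pool may be a better fit.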
Performance:
Assess mapping reliability:
scoring = mapper.score_mapping(
    term="heart attack",
    candidate="22298006",  # Myocardial infarction
    factors=["string_similarity", "context_match", "frequency"]
)
print(f"Overall confidence: {scoring.confidence:.2f}")
print(f"Breakdown: {scoring.factors}")
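The source does not state how the individual factors are combined into one confidence value. The sketch below assumes a weighted mean; the weights in `DEFAULT_WEIGHTS` and the helper names are hypothetical and are not taken from `scorer.py`.

```python
from difflib import SequenceMatcher

# Hypothetical weights, chosen only for illustration.
DEFAULT_WEIGHTS = {
    "string_similarity": 0.5,
    "context_match": 0.3,
    "frequency": 0.2,
}


def string_similarity(term: str, preferred_term: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, term.lower(), preferred_term.lower()).ratio()


def combine_factors(
    factors: dict[str, float],
    weights: dict[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Weighted mean over the factors present, each clamped to [0, 1]."""
    used = {name: w for name, w in weights.items() if name in factors}
    total = sum(used.values())
    if total == 0:
        return 0.0
    weighted = sum(min(max(factors[n], 0.0), 1.0) * w for n, w in used.items())
    return weighted / total
```

"heart attack" and "Myocardial infarction" share little surface text, so string similarity alone would score this correct pair poorly; that is the case the context and frequency factors are meant to cover.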
Scoring Factors:
Pre-Mapping:
During Mapping:
Post-Mapping:
Before Production:
Mapping Errors:
❌ Abbreviation ambiguity → "MI" = Myocardial infarction OR Michigan
❌ Outdated terms → Old terminology not in current ontology
❌ False confidence → High score for wrong concept
Technical Issues:
❌ API failures → No local fallback
❌ Version mismatches → Different ontology versions
❌ PHI exposure → Sending patient data to external APIs
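The "no local fallback" failure mode can be softened by consulting a local cache whenever the remote service fails. This is a sketch of that pattern only: `lookup_with_fallback`, `remote_lookup`, and `local_cache` are hypothetical names, not the interface of `caching.py`.

```python
from typing import Callable, Optional


def lookup_with_fallback(
    term: str,
    remote_lookup: Callable[[str], Optional[str]],
    local_cache: dict[str, str],
) -> Optional[str]:
    """Try the remote service first; on failure, fall back to the cache."""
    key = term.strip().lower()
    try:
        concept_id = remote_lookup(term)
    except Exception:
        # Network errors, timeouts, and HTTP failures all land here.
        concept_id = None
    if concept_id is not None:
        local_cache[key] = concept_id  # remember successful lookups
        return concept_id
    # Remote failed or had no answer: use the cached value if one exists.
    return local_cache.get(key)
```

The sketch covers availability only. Terms passed to `remote_lookup` still leave the machine, so the PHI concern above applies before any remote call is made.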
Available in references/ directory:
- snomed_ct_guide.md - SNOMED CT hierarchy and relationships
- mesh_structure.md - MeSH tree structure and qualifiers
- ontology_mappings.md - Crosswalks between systems
- nlp_best_practices.md - Biomedical text processing
- api_documentation.md - External service integration
- validation_datasets.md - Gold standard test sets

Located in scripts/ directory:

- main.py - CLI interface for mapping
- mapper.py - Core ontology mapping engine
- extractor.py - Named entity recognition
- cross_mapper.py - Ontology-to-ontology translation
- scorer.py - Confidence calculation
- batch_processor.py - Large dataset handling
- validator.py - Mapping quality checks
- caching.py - Local storage for frequent lookups

⚠️ Critical: Ontology mapping is for research and data integration, not clinical decision-making. Always validate mappings with domain experts before use in patient care contexts. Never process PHI without appropriate de-identification and compliance measures.
| Parameter | Type | Default | Description |
|---|---|---|---|
| --term | str | Required | Single term to map |
| --input | str | Required | Input file path |
| --output | str | Required | Output file path |
| --ontology | str | 'both' | |
| --threshold | float | 0.7 | |
| --format | str | 'json' | |
| --use-api | str | Required | Use UMLS/MeSH APIs |
| --api-key | str | Required | |
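For orientation, the parameters above could be declared with `argparse` roughly as follows. This is a sketch, not the parser in `scripts/main.py`, and it deviates from the table in three labeled ways: `--term` and `--input` are treated as alternatives, `--use-api` is modelled as a boolean switch, and `--output` and `--api-key` are left optional. Run `python scripts/main.py --help` for the actual behaviour.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (see assumptions)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Map biomedical terms to standardized ontologies.",
    )
    # Assumption: a run supplies either a single term or an input file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--term", type=str, help="Single term to map")
    source.add_argument("--input", type=str, help="Input file path")
    # Assumption: output path is optional in this sketch.
    parser.add_argument("--output", type=str, help="Output file path")
    # Defaults below are taken from the parameter table.
    parser.add_argument("--ontology", type=str, default="both")
    parser.add_argument("--threshold", type=float, default=0.7)
    parser.add_argument("--format", type=str, default="json")
    # Assumption: modelled as an on/off switch rather than a string value.
    parser.add_argument("--use-api", action="store_true",
                        help="Use UMLS/MeSH APIs")
    # Assumption: only needed when --use-api is set.
    parser.add_argument("--api-key", type=str, default=None)
    return parser
```

`build_parser().parse_args(["--term", "heart attack"])` yields a namespace carrying the table's defaults for `--ontology`, `--threshold`, and `--format`.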
Every final response should make these items explicit when they are relevant:
- If scripts/main.py fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.

This skill accepts requests that match the documented purpose of bio-ontology-mapper and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
bio-ontology-mapper only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
Use the following fixed structure for non-trivial requests:
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.