Automatically detect and de-identify PII (Personally Identifiable Information) and PHI (Protected Health Information) in clinical/medical text to support HIPAA compliance. Trigger when processing medical records, patient data, clinical notes, insurance information, or any healthcare-related text containing potential patient identifiers.
A clinical-grade PII/PHI detection and de-identification tool for healthcare text data.
This skill analyzes text for HIPAA-protected identifiers and automatically redacts or anonymizes them. It combines regex patterns, NLP entity recognition, and contextual analysis to identify the 18 identifier categories listed in the HIPAA Safe Harbor method.
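To illustrate only the regex layer of the approach described above, here is a minimal sketch. The patterns and the `redact` helper are hypothetical and written for this example; they are not the contents of references/pii_patterns.json, and the NLP and contextual stages are not covered.

```python
import re

# Hypothetical patterns for illustration only; the real definitions
# live in references/pii_patterns.json and may differ.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text):
    """Replace each regex match with a numbered tag such as [SSN_1]."""
    matches = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), label))
    matches.sort()  # process matches left to right

    counters = {}
    pieces = []
    cursor = 0
    for start, end, label in matches:
        if start < cursor:
            continue  # skip a match that overlaps one already replaced
        counters[label] = counters.get(label, 0) + 1
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}_{counters[label]}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


print(redact("SSN 123-45-6789, admitted 2024-01-15, call 555-123-4567"))
# prints: SSN [SSN_1], admitted [DATE_1], call [PHONE_1]
```

In this sketch the tag number counts occurrences per category, so a value that appears twice receives two different tags. Names and addresses are deliberately absent because regex alone is a poor fit for them, which is where entity recognition and context would come in.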
Semantic tag replacement (e.g., [PATIENT_NAME], [DATE_1])

python scripts/main.py --input "patient_text.txt" --output "deidentified.txt"
python scripts/main.py --text "Patient John Doe, SSN 123-45-6789..." --audit-log audit.json
from scripts.main import HIPAAAuditor
auditor = HIPAAAuditor()
result = auditor.deidentify("Patient John Doe was admitted on 2024-01-15...")
print(result.cleaned_text) # De-identified output
print(result.detected_pii) # List of found PII entities
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| --input, -i | string | - | No | Path to input text file |
| --text | string | - | No | Direct text input (alternative to file) |
| --output, -o | string | - | No | Path for de-identified output file |
| --audit-log | string | - | No | Path for JSON audit log |
| --confidence | float | 0.7 | No | Minimum confidence threshold (0.0-1.0) |
| --preserve-structure | bool | true | No | Maintain document structure |
| --custom-patterns | string | - | No | Path to custom regex patterns JSON |
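The parameter table can be mirrored with `argparse`, as in the sketch below. This is not the parser in scripts/main.py: the helper names (`str_to_bool`, `confidence_value`, `build_parser`), the accepted spellings of boolean values, and the choice to make `--input` and `--text` mutually exclusive are all assumptions made for illustration.

```python
import argparse


def str_to_bool(value):
    """Parse 'true'/'false' style values (accepted spellings are an assumption)."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def confidence_value(value):
    """Parse a float and enforce the documented 0.0-1.0 range."""
    number = float(value)
    if not 0.0 <= number <= 1.0:
        raise argparse.ArgumentTypeError("confidence must be between 0.0 and 1.0")
    return number


def build_parser():
    parser = argparse.ArgumentParser(description="De-identify PII/PHI in clinical text")
    # Assumption: file input and direct text input cannot be combined.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--input", "-i", help="Path to input text file")
    source.add_argument("--text", help="Direct text input (alternative to file)")
    parser.add_argument("--output", "-o", help="Path for de-identified output file")
    parser.add_argument("--audit-log", help="Path for JSON audit log")
    parser.add_argument("--confidence", type=confidence_value, default=0.7,
                        help="Minimum confidence threshold (0.0-1.0)")
    parser.add_argument("--preserve-structure", type=str_to_bool, default=True,
                        help="Maintain document structure")
    parser.add_argument("--custom-patterns", help="Path to custom regex patterns JSON")
    return parser


# Parse an explicit argument list rather than sys.argv so the sketch is easy to try.
args = build_parser().parse_args(["--text", "Patient John Doe", "--confidence", "0.85"])
print(args.confidence, args.preserve_structure)
```

Validating the confidence range at parse time means an out-of-range value fails with a usage error before any text is processed.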
Original identifiers replaced with semantic tags:

- [PATIENT_NAME_1], [PATIENT_NAME_2] ...
- [DATE_1], [DATE_2] ...
- [SSN_1]
- [PHONE_1], [PHONE_2] ...
- [EMAIL_1]
- [MRN_1] (Medical Record Number)
- [ADDRESS_1]

{
"timestamp": "2024-01-15T10:30:00Z",
"input_hash": "sha256:abc123...",
"detections": [
{
"type": "PATIENT_NAME",
"position": [10, 18],
"confidence": 0.95,
"replacement": "[PATIENT_NAME_1]",
"original_length": 8
}
],
"statistics": {
"total_pii_found": 5,
"categories_detected": ["NAME", "DATE", "PHONE", "SSN"]
}
}
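An entry with the shape shown above could be assembled roughly as follows. This is a sketch under assumptions, not the tool's implementation: `build_audit_entry` and the input dictionary keys (`start`, `end`, and so on) are hypothetical, and `categories_detected` simply reuses each detection's `type`, whereas the sample log uses shorter category names such as `NAME`.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_entry(original_text, detections):
    """Build an audit record from detections.

    detections: list of dicts with the hypothetical keys
    type, start, end, confidence, replacement.
    """
    digest = hashlib.sha256(original_text.encode("utf-8")).hexdigest()
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "input_hash": f"sha256:{digest}",
        "detections": [
            {
                "type": d["type"],
                "position": [d["start"], d["end"]],
                "confidence": d["confidence"],
                "replacement": d["replacement"],
                # Only the length is recorded, never the original value.
                "original_length": d["end"] - d["start"],
            }
            for d in detections
        ],
        "statistics": {
            "total_pii_found": len(detections),
            "categories_detected": sorted({d["type"] for d in detections}),
        },
    }


entry = build_audit_entry(
    "Patient John Doe",
    [{"type": "PATIENT_NAME", "start": 8, "end": 16,
      "confidence": 0.95, "replacement": "[PATIENT_NAME_1]"}],
)
print(json.dumps(entry, indent=2))
```

As in the sample log, the record carries a hash of the input plus positions and lengths, so the audit file itself does not contain the identifiers it describes.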
See references/requirements.txt for full dependency list.
⚠️ CRITICAL: This tool is designed as a helper, not a replacement for human review.
- references/hipaa_safe_harbor_guide.pdf - HIPAA Safe Harbor de-identification standards
- references/pii_patterns.json - Complete regex pattern definitions
- references/test_cases/ - Sample clinical texts with expected outputs
- references/requirements.txt - Python dependencies

Complex NLP pipelines, contextual disambiguation, regulatory compliance requirements.
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
# Python dependencies
pip install -r references/requirements.txt