Manage Unity Catalog metadata tags for data governance and classification. Use when applying tags to tables and columns, classifying data sensitivity (PII, PHI), marking data quality attributes, or when the user mentions Unity Catalog tagging, metadata management, data governance, or compliance workflows.
This skill enables AI agents to manage Unity Catalog metadata tags for data governance, classification, and discovery. Tags can be applied to tables, columns, and other catalog objects to indicate data types, sensitivity levels, quality metrics, and compliance requirements.
Use this skill when you need to:
Apply tags to specific columns to indicate their purpose, sensitivity, or quality.
Common Tag Types:
Sensitivity: PII, PHI, CONFIDENTIAL, PUBLIC, INTERNAL
Data type: EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, UUID
Quality: VALIDATED, REQUIRES_REVIEW, LOW_QUALITY, HIGH_QUALITY
Domain: CUSTOMER_DATA, FINANCIAL, MARKETING, OPERATIONAL
Compliance: GDPR, HIPAA, SOX, PCI_DSS
Example: Tag email column as PII
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# Apply PII tag to email column
w.catalog.update_column(
    full_name="main.bronze.customers.email",
    comment="Customer email address",
    tags={"sensitivity": "PII", "data_type": "EMAIL", "quality": "VALIDATED"}
)
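If the `update_column` call shown above is not available in your SDK version, column tags can also be set with the Databricks SQL statement `ALTER TABLE ... ALTER COLUMN ... SET TAGS`. The sketch below only builds that statement as a string. The helper names `_literal` and `column_tags_sql` are hypothetical, and the code assumes the table and column names are plain identifiers that need no backtick quoting.

```python
def _literal(value: str) -> str:
    """Wrap text in single quotes, rejecting characters that would need escaping."""
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in tag text: {value!r}")
    return f"'{value}'"


def column_tags_sql(table_full_name: str, column: str, tags: dict[str, str]) -> str:
    """Build an ALTER TABLE ... ALTER COLUMN ... SET TAGS statement (hypothetical helper)."""
    if not tags:
        raise ValueError("tags must not be empty")
    pairs = ", ".join(f"{_literal(k)} = {_literal(v)}" for k, v in tags.items())
    return f"ALTER TABLE {table_full_name} ALTER COLUMN {column} SET TAGS ({pairs})"


statement = column_tags_sql(
    "main.bronze.customers",
    "email",
    {"sensitivity": "PII", "data_type": "EMAIL"},
)
print(statement)
```

The resulting string can then be run wherever SQL is executed against the workspace, for example with `spark.sql(statement)` in a notebook.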
Apply tags to entire tables for high-level classification.
Example: Tag table as containing PII
# Apply tags to table
w.catalog.update_table(
    full_name="main.bronze.customers",
    comment="Customer master data",
    tags={
        "contains_pii": "true",
        "domain": "customer",
        "owner": "data_platform_team",
        "quality_score": "95",
        "last_validated": "2025-12-17"
    }
)
Tag multiple columns at once based on profiling results.
Example: Tag multiple columns after profiling
# Profile results indicate which columns need tags
profile_results = {
    "email": {"pattern": "EMAIL", "contains_pii": True},
    "phone": {"pattern": "PHONE", "contains_pii": True},
    "customer_id": {"pattern": "UUID", "is_key": True},
    "purchase_amount": {"data_type": "CURRENCY", "quality": "high"}
}
# Apply tags based on profiling
for column_name, tags_info in profile_results.items():
    tags = {}
    if tags_info.get("contains_pii"):
        tags["sensitivity"] = "PII"
    if "pattern" in tags_info:
        tags["data_type"] = tags_info["pattern"]
    if tags_info.get("is_key"):
        tags["column_role"] = "PRIMARY_KEY"
    if not tags:
        # No rule matched (e.g. purchase_amount), so there is nothing to apply.
        continue
    w.catalog.update_column(
        full_name=f"main.bronze.customers.{column_name}",
        tags=tags
    )
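Before applying tags in bulk, it can help to compute them in a dry run and see which columns would receive nothing. The following sketch pulls the three rules from the loop above into a pure function. The function name `tags_from_profile` and the `planned`, `to_apply`, and `skipped` variables are hypothetical, and no catalog call is made.

```python
def tags_from_profile(info: dict) -> dict[str, str]:
    """Derive tags from one column's profiling result (hypothetical helper)."""
    tags: dict[str, str] = {}
    if info.get("contains_pii"):
        tags["sensitivity"] = "PII"
    if "pattern" in info:
        tags["data_type"] = info["pattern"]
    if info.get("is_key"):
        tags["column_role"] = "PRIMARY_KEY"
    return tags


profile_results = {
    "email": {"pattern": "EMAIL", "contains_pii": True},
    "phone": {"pattern": "PHONE", "contains_pii": True},
    "customer_id": {"pattern": "UUID", "is_key": True},
    "purchase_amount": {"data_type": "CURRENCY", "quality": "high"},
}

# Plan the tags for every column without touching the catalog.
planned = {col: tags_from_profile(info) for col, info in profile_results.items()}
to_apply = {col: tags for col, tags in planned.items() if tags}
skipped = sorted(set(planned) - set(to_apply))

print(to_apply)
print(skipped)
```

With the sample profile, `purchase_amount` ends up in `skipped` because none of the three rules reads its `data_type` or `quality` keys.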
Add descriptive comments to columns for documentation.
Example: Document column purpose
w.catalog.update_column(
    full_name="main.bronze.customers.customer_id",
    comment="Unique identifier for customer records. UUID format. Primary key.",
    tags={
        "data_type": "UUID",
        "column_role": "PRIMARY_KEY",
        "required": "true"
    }
)
Apply organizational tags to schemas.
Example: Tag schema for domain ownership
w.catalog.update_schema(
    full_name="main.bronze",
    comment="Bronze layer: Raw ingested data",
    tags={
        "layer": "bronze",
        "domain": "ingestion",
        "owner": "data_engineering",
        "retention_days": "90"
    }
)
The Data Quality Agent uses this skill to automatically tag columns based on validation results:
# After running data quality tests
validation_results = {
    "email": {
        "test": "email_format_validation",
        "passed": True,
        "quality_score": 0.98
    },
    "phone": {
        "test": "phone_format_validation",
        "passed": True,
        "quality_score": 0.95
    },
    "age": {
        "test": "range_validation",
        "passed": False,
        "quality_score": 0.85,
        "issues": "5% of values outside valid range"
    }
}
# Apply quality tags
for column, result in validation_results.items():
    tags = {
        "quality_tested": "true",
        "last_test_date": "2025-12-17"
    }
    if result["passed"] and result["quality_score"] > 0.95:
        tags["quality_status"] = "HIGH_QUALITY"
        tags["validated"] = "true"
    elif result["quality_score"] > 0.85:
        tags["quality_status"] = "ACCEPTABLE"
    else:
        tags["quality_status"] = "REQUIRES_REVIEW"
        tags["quality_issues"] = result.get("issues", "Unknown")
    w.catalog.update_column(
        full_name=f"main.bronze.customers.{column}",
        tags=tags
    )
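The threshold logic in the loop above is easier to check when it is separated from the catalog call. This sketch reproduces the same branches as a pure function and runs it on the three sample results. The function name `quality_tags` and its `test_date` parameter are hypothetical.

```python
def quality_tags(result: dict, test_date: str) -> dict[str, str]:
    """Map one validation result to quality tags, using the thresholds shown above."""
    tags = {"quality_tested": "true", "last_test_date": test_date}
    score = result["quality_score"]
    if result["passed"] and score > 0.95:
        tags["quality_status"] = "HIGH_QUALITY"
        tags["validated"] = "true"
    elif score > 0.85:
        tags["quality_status"] = "ACCEPTABLE"
    else:
        tags["quality_status"] = "REQUIRES_REVIEW"
        tags["quality_issues"] = result.get("issues", "Unknown")
    return tags


samples = {
    "email": {"passed": True, "quality_score": 0.98},
    "phone": {"passed": True, "quality_score": 0.95},
    "age": {"passed": False, "quality_score": 0.85,
            "issues": "5% of values outside valid range"},
}

statuses = {
    col: quality_tags(res, "2025-12-17")["quality_status"]
    for col, res in samples.items()
}
print(statuses)
```

Because both comparisons are strict, a score of exactly 0.95 is tagged `ACCEPTABLE` and a score of exactly 0.85 is tagged `REQUIRES_REVIEW`. The `elif` branch ignores `passed`, so a failed test with a score above 0.85 is still tagged `ACCEPTABLE`.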
# Detect PII patterns from data profiler
table_full_name = "main.bronze.customers"  # example target table
pii_patterns = {
    "email": "EMAIL",
    "phone": "PHONE",
    "ssn": "SSN",
    "credit_card": "CREDIT_CARD"
}
for column, pattern in pii_patterns.items():
    w.catalog.update_column(
        full_name=f"{table_full_name}.{column}",
        tags={
            "sensitivity": "PII",
            "data_type": pattern,
            "requires_masking": "true",
            "compliance": "GDPR"
        }
    )
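# Apply quality score tags after validation
from datetime import datetime

# run_data_quality_tests is assumed to be defined elsewhere and to return a
# non-empty list of dicts with "score" (0-1) and "passed" (bool) keys.
table_name = "main.bronze.customers"  # example target table
quality_results = run_data_quality_tests(table_name)
table_quality_score = sum(r["score"] for r in quality_results) / len(quality_results)
w.catalog.update_table(
    full_name=table_name,
    tags={
        "quality_score": str(int(table_quality_score * 100)),
        "quality_tests_run": str(len(quality_results)),
        "quality_tests_passed": str(sum(1 for r in quality_results if r["passed"])),
        "last_quality_check": datetime.now().isoformat()
    }
)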
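The table-level summary above divides by the number of results, so it fails on an empty list. A minimal sketch of the same aggregation as a standalone function follows. It assumes each result is a dict with `score` and `passed` keys. The function name `quality_summary_tags` and its `checked_at` parameter are hypothetical. Unlike the snippet above, it uses `round` instead of `int`, so that a value such as 28.999999 is not truncated to 28.

```python
from datetime import datetime, timezone
from typing import Optional


def quality_summary_tags(results: list[dict], checked_at: Optional[datetime] = None) -> dict[str, str]:
    """Aggregate per-test results into table-level quality tags (hypothetical helper)."""
    if not results:
        raise ValueError("no quality results to summarize")
    checked_at = checked_at or datetime.now(timezone.utc)
    mean_score = sum(r["score"] for r in results) / len(results)
    return {
        "quality_score": str(round(mean_score * 100)),
        "quality_tests_run": str(len(results)),
        "quality_tests_passed": str(sum(1 for r in results if r["passed"])),
        "last_quality_check": checked_at.isoformat(),
    }


summary = quality_summary_tags(
    [{"score": 1.0, "passed": True}, {"score": 0.5, "passed": False}],
    checked_at=datetime(2025, 12, 17, tzinfo=timezone.utc),
)
print(summary)
```

Passing `checked_at` explicitly keeps the output reproducible in tests, and the default falls back to the current UTC time.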
# Tag columns by business domain
business_domains = {
    "customer_id": "customer",
    "order_id": "sales",
    "product_id": "product",
    "payment_method": "finance"
}
for column, domain in business_domains.items():
    w.catalog.update_column(
        full_name=f"{table_full_name}.{column}",
        tags={
            "business_domain": domain,
            "owner": f"{domain}_team"
        }
    )
# Tag for regulatory compliance
compliance_columns = {
    "email": ["GDPR", "CCPA"],
    "health_info": ["HIPAA"],
    "financial_data": ["SOX", "PCI_DSS"]
}
for column, regulations in compliance_columns.items():
    w.catalog.update_column(
        full_name=f"{table_full_name}.{column}",
        tags={
            "compliance_required": ",".join(regulations),
            "requires_audit": "true",
            "retention_required": "true"
        }
    )
Sensitivity Levels:
├── PUBLIC (no restrictions)
├── INTERNAL (company-only)
├── CONFIDENTIAL (restricted access)
├── PII (personal information)
└── PHI (protected health information)
Quality Levels:
├── HIGH_QUALITY (>95% validation pass)
├── ACCEPTABLE (85-95% validation pass)
├── REQUIRES_REVIEW (70-85% validation pass)
└── LOW_QUALITY (<70% validation pass)
Data Types:
├── Identifiers: UUID, ID, KEY
├── Contact: EMAIL, PHONE, ADDRESS
├── Financial: CURRENCY, CREDIT_CARD, ACCOUNT_NUMBER
├── Personal: SSN, DOB, NAME
└── Technical: IP_ADDRESS, URL, TIMESTAMP
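The three taxonomies above can serve as an allow-list so that misspelled or ad hoc values are caught before they reach the catalog. The sketch below assumes the tag keys `sensitivity`, `quality_status`, and `data_type` used in the earlier examples. The names `ALLOWED_VALUES` and `invalid_tags` are hypothetical, and keys outside the taxonomy (such as `owner`) are deliberately left unchecked.

```python
# Allowed values per tag key, copied from the taxonomy listed above.
ALLOWED_VALUES = {
    "sensitivity": {"PUBLIC", "INTERNAL", "CONFIDENTIAL", "PII", "PHI"},
    "quality_status": {"HIGH_QUALITY", "ACCEPTABLE", "REQUIRES_REVIEW", "LOW_QUALITY"},
    "data_type": {
        "UUID", "ID", "KEY",
        "EMAIL", "PHONE", "ADDRESS",
        "CURRENCY", "CREDIT_CARD", "ACCOUNT_NUMBER",
        "SSN", "DOB", "NAME",
        "IP_ADDRESS", "URL", "TIMESTAMP",
    },
}


def invalid_tags(tags: dict[str, str]) -> dict[str, str]:
    """Return the subset of tags whose value is not in the taxonomy for its key."""
    return {
        key: value
        for key, value in tags.items()
        if key in ALLOWED_VALUES and value not in ALLOWED_VALUES[key]
    }


good = invalid_tags({"sensitivity": "PII", "data_type": "EMAIL", "owner": "data_platform_team"})
bad = invalid_tags({"sensitivity": "SECRET", "data_type": "EMAIL"})
print(good)
print(bad)
```

An empty result means every checked tag is valid. A non-empty result names the offending key and value so the caller can reject or correct it.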
from databricks.sdk.errors import ResourceDoesNotExist, PermissionDenied
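One way to use these exception classes in bulk tagging is to record failures per target instead of stopping at the first missing object or permission problem. The sketch below is a generic pattern rather than SDK code. The function `apply_to_all` and its parameters are hypothetical, and the built-in `LookupError` and `PermissionError` stand in for the SDK errors so the example runs without a workspace.

```python
from typing import Callable, Iterable


def apply_to_all(
    targets: Iterable[str],
    apply: Callable[[str], None],
    expected_errors: tuple = (LookupError, PermissionError),
) -> dict[str, str]:
    """Call apply(target) for each target and collect the expected failures by target."""
    failures: dict[str, str] = {}
    for target in targets:
        try:
            apply(target)
        except expected_errors as exc:
            failures[target] = f"{type(exc).__name__}: {exc}"
    return failures


existing_columns = {"main.bronze.customers.email", "main.bronze.customers.phone"}


def fake_apply(full_name: str) -> None:
    """Stand-in for a tagging call; raises when the column is unknown."""
    if full_name not in existing_columns:
        raise LookupError("no such column")


failures = apply_to_all(
    ["main.bronze.customers.email", "main.bronze.customers.ssn"],
    fake_apply,
)
print(failures)
```

In real use, `expected_errors=(ResourceDoesNotExist, PermissionDenied)` would be passed, and `apply` would wrap the actual tagging call. Any other exception type still propagates, which keeps unexpected failures visible.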