Execute comprehensive platform migrations to Databricks from legacy systems. Use when migrating from on-premises Hadoop, other cloud platforms, or legacy data warehouses to Databricks. Trigger with phrases like "migrate to databricks", "hadoop migration", "snowflake to databricks", "legacy migration", "data warehouse migration".
Comprehensive migration strategies for moving to Databricks from Hadoop, Snowflake, Redshift, Synapse, or legacy data warehouses.
| Source | Pattern | Complexity | Timeline |
|---|---|---|---|
| On-prem Hadoop | Lift-and-shift + modernize | High | 6-12 months |
| Snowflake | Parallel run + cutover | Medium | 3-6 months |
| AWS Redshift | ETL rewrite + data copy | Medium | 3-6 months |
| Legacy DW (Oracle/Teradata) | Full rebuild | High | 12-18 months |
Inventory all source tables with metadata (size, partitions, dependencies, data classification). Generate a prioritized migration plan with wave assignments.
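Wave assignment boils down to dependency-ordered grouping: a table can only migrate after everything it depends on has landed. A minimal sketch using the standard library's `graphlib` (the `inventory` contents here are hypothetical):

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: table -> (size_gb, upstream dependencies).
inventory = {
    "dim_customer": (12, []),
    "dim_product": (8, []),
    "fact_sales": (900, ["dim_customer", "dim_product"]),
    "agg_daily_sales": (40, ["fact_sales"]),
}

def plan_waves(inventory):
    """Group tables into waves so every table's dependencies
    land in an earlier wave."""
    ts = TopologicalSorter({t: deps for t, (_, deps) in inventory.items()})
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())   # all tables whose deps are done
        waves.append(ready)
        ts.done(*ready)
    return waves

print(plan_waves(inventory))
# → [['dim_customer', 'dim_product'], ['fact_sales'], ['agg_daily_sales']]
```

Within a wave, tables can migrate in parallel; size and classification metadata can then be used to prioritize inside each wave.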
Convert source schemas to Delta Lake-compatible types. Handle type conversions (char->string, tinyint->int). Enable auto-optimize on target tables.
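A type-conversion pass can be sketched as a lookup table over base type names. The `char -> string` and `tinyint -> int` mappings come from the text above; the other entries are illustrative assumptions, not a complete mapping:

```python
# Source type -> Delta-compatible type. First two rows follow the text;
# the rest are illustrative assumptions for a real mapping table.
TYPE_MAP = {
    "char": "string",
    "varchar": "string",
    "tinyint": "int",
    "smallint": "int",
    "number": "decimal(38,18)",  # assumed default precision/scale
}

def convert_column(name, source_type):
    """Map a source column type to its Delta target type,
    ignoring length/precision suffixes like CHAR(1)."""
    base = source_type.split("(")[0].strip().lower()
    return name, TYPE_MAP.get(base, source_type.lower())

print(convert_column("status", "CHAR(1)"))   # → ('status', 'string')
print(convert_column("qty", "TINYINT"))      # → ('qty', 'int')
```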
Batch large tables by partition. Validate that row counts and schemas match after each table migration.
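The batch-then-validate loop can be sketched with in-memory stand-ins for the real Spark reads and writes (the partition data here is fabricated for illustration):

```python
# Stand-ins for the real source and target tables; each key is a
# partition value, each list a batch of rows.
source = {"2024-01": [1, 2, 3], "2024-02": [4, 5]}
target = {}

def migrate_table(src, dst):
    """Copy one partition per batch, then compare total row counts
    before declaring the table migrated."""
    for part, rows in src.items():
        dst[part] = list(rows)           # one batch per partition
    src_count = sum(len(r) for r in src.values())
    dst_count = sum(len(r) for r in dst.values())
    if src_count != dst_count:
        raise ValueError(f"count mismatch: {src_count} vs {dst_count}")
    return dst_count

print(migrate_table(source, target))  # → 5
```

In a real migration the per-partition copy would be a Spark read/write and the counts would come from `COUNT(*)` queries on both systems.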
Convert spark-submit/Oozie jobs to Databricks jobs. Update paths, remove Hive metastore references, and adapt them for Unity Catalog.
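Path and metastore-reference updates can be sketched as regex rewrites over job scripts. The rewrite rules below are assumptions, not a complete converter: the HDFS-to-DBFS mapping and the target catalog name `main` depend entirely on your environment:

```python
import re

# Illustrative rewrite rules; real mappings depend on your storage
# layout and Unity Catalog naming.
REWRITES = [
    (r"hdfs://[^/]+/", "dbfs:/"),        # HDFS URIs -> DBFS paths (assumed)
    (r"\bhive_metastore\.", "main."),    # assumed UC catalog name "main"
]

def adapt_job(script):
    """Apply each rewrite rule to a job script in order."""
    for pattern, repl in REWRITES:
        script = re.sub(pattern, repl, script)
    return script

job = "spark.read.parquet('hdfs://nn1/data/sales'); spark.table('hive_metastore.db.t')"
print(adapt_job(job))
# → spark.read.parquet('dbfs:/data/sales'); spark.table('main.db.t')
```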
Execute the 6-step cutover: validate -> disable source -> final sync -> enable Databricks -> update apps -> monitor. Each step has a rollback procedure.
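The cutover sequence with per-step rollback can be sketched as a runner that, on any failure, unwinds the completed steps in reverse order (the step actions here are no-op placeholders):

```python
def run_cutover(steps):
    """Run (name, action, rollback) steps in order; on failure,
    run the rollbacks of completed steps in reverse and re-raise."""
    completed = []
    try:
        for name, action, rollback in steps:
            action()
            completed.append((name, rollback))
    except Exception:
        for name, rollback in reversed(completed):
            rollback()
        raise
    return [name for name, _ in completed]

# Placeholder actions standing in for the real cutover operations.
log = []
steps = [
    (n, (lambda n=n: log.append(n)), (lambda n=n: log.append(f"rollback:{n}")))
    for n in ["validate", "disable_source", "final_sync",
              "enable_databricks", "update_apps", "monitor"]
]
print(run_cutover(steps))
# → ['validate', 'disable_source', 'final_sync', 'enable_databricks', 'update_apps', 'monitor']
```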
See detailed implementation for assessment scripts, schema conversion, data migration with batching, ETL conversion, and cutover plan generation.
| Error | Cause | Solution |
|---|---|---|
| Schema incompatibility | Unsupported types | Use type conversion mappings |
| Data loss | Truncation during migration | Validate counts at each step |
| Performance issues | Large tables | Use partitioned migration |
| Dependency conflicts | Wrong migration order | Analyze dependencies first |
```sql
SELECT 'source' AS system, COUNT(*) AS row_count FROM hive_metastore.db.table
UNION ALL
SELECT 'target' AS system, COUNT(*) AS row_count FROM migrated.db.table;
```
Provides coverage for Databricks platform migrations.