Configure Databricks local development with Databricks Connect, Asset Bundles, and an IDE. Use when setting up a local dev environment, configuring test workflows, or establishing a fast iteration cycle with Databricks. Trigger with phrases like "databricks dev setup", "databricks local", "databricks IDE", "develop with databricks", "databricks connect".
Set up a fast local development workflow using Databricks Connect v2, Asset Bundles, and VS Code. Databricks Connect lets you write and run PySpark code from your local machine while the Spark work executes on a remote Databricks cluster. You get IDE debugging, fast iteration, and proper test isolation.
Prerequisite: completed databricks-install-auth setup.
my-databricks-project/
├── src/
│ ├── __init__.py
│ ├── pipelines/
│ │ ├── __init__.py
│ │ ├── bronze.py # Raw ingestion
│ │ ├── silver.py # Cleansing transforms
│ │ └── gold.py # Business aggregations
│ └── utils/
│ ├── __init__.py
│ ├── helpers.py
│ └── spark_session.py # DatabricksSession factory
├── tests/
│ ├── conftest.py # Spark fixtures
│ ├── unit/
│ │ └── test_transforms.py # Local Spark tests
│ └── integration/
│ └── test_pipeline.py # Databricks Connect tests
├── notebooks/
│ └── exploration.py
├── resources/
│ └── daily_etl.yml # Job resource definitions
├── databricks.yml # Asset Bundle root config
├── pyproject.toml
└── requirements.txt
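If you want to create the layout above by hand rather than from a template, the following is a minimal sketch that does it with `pathlib` alone. The `scaffold` function and the `PACKAGES`/`FILES` lists are hypothetical illustrations, not part of any Databricks tool. It creates empty placeholder files only (plus `src/utils/spark_session.py` for the session helper shown later) and never overwrites an existing file.

```python
"""Hypothetical scaffolding helper: create the project layout as empty files."""
from pathlib import Path

# Python packages in the layout; each gets an empty __init__.py.
PACKAGES = ["src", "src/pipelines", "src/utils"]

# Remaining files from the layout, created empty.
FILES = [
    "src/pipelines/bronze.py",
    "src/pipelines/silver.py",
    "src/pipelines/gold.py",
    "src/utils/helpers.py",
    "src/utils/spark_session.py",
    "tests/conftest.py",
    "tests/unit/test_transforms.py",
    "tests/integration/test_pipeline.py",
    "notebooks/exploration.py",
    "resources/daily_etl.yml",
    "databricks.yml",
    "pyproject.toml",
    "requirements.txt",
]


def scaffold(root: Path) -> list[Path]:
    """Create missing placeholder files under root and return the ones created."""
    created = []
    for rel in [f"{pkg}/__init__.py" for pkg in PACKAGES] + FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)  # make intermediate dirs
        if not path.exists():  # leave existing files untouched
            path.touch()
            created.append(path)
    return created


if __name__ == "__main__":
    for p in scaffold(Path("my-databricks-project")):
        print(p)
```

Running it a second time is a no-op, so it is safe to re-run after adding files of your own.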
set -euo pipefail
# Create virtual environment
python -m venv .venv && source .venv/bin/activate
# Databricks Connect v2 — version MUST match cluster DBR
pip install "databricks-connect==14.3.*"
# Databricks SDK for Python (the Databricks CLI, needed for Asset Bundles, is installed separately)
pip install databricks-sdk
# Testing
pip install pytest pytest-cov
# Validate the setup; this also checks credentials and the cluster, so configure auth (below) first
databricks-connect test
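Because the install comment requires the `databricks-connect` version to match the cluster's Databricks Runtime, a quick local check can catch a mismatch before you debug confusing errors. This is a minimal sketch, assuming the rule is "same major.minor" and that you pass the cluster's runtime string (for example `14.3.x-scala2.12`) yourself; `major_minor` and `connect_matches_cluster` are hypothetical helpers. Only `importlib.metadata.version` is a real standard-library call.

```python
"""Hypothetical check: does databricks-connect match the cluster's DBR major.minor?"""
import re
import sys
from importlib.metadata import PackageNotFoundError, version


def major_minor(v: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '14.3.1' or '14.3.x-scala2.12'."""
    m = re.match(r"(\d+)\.(\d+)", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return int(m.group(1)), int(m.group(2))


def connect_matches_cluster(connect_version: str, cluster_runtime: str) -> bool:
    """True when the package and the cluster runtime share major.minor."""
    return major_minor(connect_version) == major_minor(cluster_runtime)


if __name__ == "__main__":
    # Cluster runtime string, e.g. as shown in the cluster configuration.
    cluster = sys.argv[1] if len(sys.argv) > 1 else "14.3.x-scala2.12"
    try:
        installed = version("databricks-connect")
    except PackageNotFoundError:
        sys.exit("databricks-connect is not installed in this environment")
    status = "OK" if connect_matches_cluster(installed, cluster) else "MISMATCH"
    print(f"{status}: databricks-connect {installed} vs cluster {cluster}")
```

If the check reports a mismatch, reinstall with the matching pin (for example `databricks-connect==15.1.*` for a 15.1 cluster) rather than upgrading the cluster to fit the client.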
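Databricks Connect v2 uses the standard Databricks SDK authentication: environment variables or a profile in ~/.databrickscfg. The cluster it runs on is selected separately, with DATABRICKS_CLUSTER_ID (or a cluster_id entry in the profile).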
# Workspace URL, personal access token, and the cluster Connect should use
export DATABRICKS_HOST="https://adb-1234567890123456.7.azuredatabricks.net"
export DATABRICKS_TOKEN="dapi..."
export DATABRICKS_CLUSTER_ID="0123-456789-abcde123"
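To see which of the two sources supplies each setting, here is a simplified model in Python: environment variables win, and anything they leave unset is read from a ~/.databrickscfg profile. This is an assumption-laden sketch, not the SDK's real resolution logic, which supports more auth types and has extra rules. `resolve_connect_settings` and `missing_settings` are hypothetical names. The config file is INI-style, so the standard `configparser` module can read it. A dummy `default_section` keeps `[DEFAULT]` from leaking its keys into other profiles.

```python
"""Simplified model of where Databricks Connect settings come from (not the SDK's logic)."""
import configparser
import os
from collections.abc import Mapping

# Environment variable -> ~/.databrickscfg key, for the three settings above.
SETTINGS = {
    "DATABRICKS_HOST": "host",
    "DATABRICKS_TOKEN": "token",
    "DATABRICKS_CLUSTER_ID": "cluster_id",
}


def resolve_connect_settings(
    env: Mapping[str, str], cfg_text: str = "", profile: str = "DEFAULT"
) -> dict[str, str]:
    """Return host/token/cluster_id, preferring env vars over the config profile."""
    # Treat [DEFAULT] as an ordinary profile and disable %-interpolation.
    parser = configparser.ConfigParser(default_section="__unused__", interpolation=None)
    parser.read_string(cfg_text)
    section = parser[profile] if parser.has_section(profile) else {}
    resolved = {}
    for env_var, key in SETTINGS.items():
        value = env.get(env_var) or section.get(key)
        if value:
            resolved[key] = value
    return resolved


def missing_settings(resolved: Mapping[str, str]) -> list[str]:
    """List the settings that neither source provided."""
    return [key for key in SETTINGS.values() if key not in resolved]


if __name__ == "__main__":
    path = os.path.expanduser("~/.databrickscfg")
    text = ""
    if os.path.exists(path):
        with open(path) as fh:
            text = fh.read()
    missing = missing_settings(resolve_connect_settings(os.environ, text))
    print("missing: " + ", ".join(missing) if missing else "host, token, cluster_id all set")
```

Keeping the token in a profile rather than exported in your shell history is usually the safer choice for day-to-day work.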
# src/utils/spark_session.py
from databricks.connect import DatabricksSession
def get_spark():
    """Return a DatabricksSession; Spark work runs on the remote cluster."""
    return DatabricksSession.builder.getOrCreate()
if __name__ == "__main__":
    # Usage: DataFrame operations execute on the remote cluster
    spark = get_spark()
    df = spark.sql("SELECT current_timestamp() AS now")
    df.show()  # results are fetched back and printed locally
# databricks.yml