Interactive walkthrough for setting up a DataSurface Yellow environment on Docker Desktop. Use this skill to guide users through the complete installation process step-by-step.
This skill guides you through setting up a DataSurface Yellow environment on Docker Desktop with Kubernetes. Follow each step in order and verify completion before proceeding.
Before starting, verify the user has:
- kubectl CLI installed and configured
- helm CLI installed

Ask the user for these environment variables if not already set:
NAMESPACE # Kubernetes namespace (default: demo1)
GITHUB_USERNAME # GitHub username
GITHUB_TOKEN # GitHub Personal Access Token
GITLAB_CUSTOMER_USER # GitLab deploy token username
GITLAB_CUSTOMER_TOKEN # GitLab deploy token
DATASURFACE_VERSION # DataSurface version (default: 1.1.0)
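Once collected, these values can be exported in the shell so the later commands pick them up. A minimal sketch; the defaults for NAMESPACE and DATASURFACE_VERSION come from the list above, and the remaining values are placeholders to replace:

```shell
# Apply the documented defaults only when a variable is not already set
export NAMESPACE="${NAMESPACE:-demo1}"
export DATASURFACE_VERSION="${DATASURFACE_VERSION:-1.1.0}"

# No defaults exist for these; the values below are placeholders to replace
export GITHUB_USERNAME="${GITHUB_USERNAME:-replace-with-github-username}"
export GITHUB_TOKEN="${GITHUB_TOKEN:-replace-with-github-pat}"
export GITLAB_CUSTOMER_USER="${GITLAB_CUSTOMER_USER:-replace-with-deploy-token-user}"
export GITLAB_CUSTOMER_TOKEN="${GITLAB_CUSTOMER_TOKEN:-replace-with-deploy-token}"
```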
Also ask for the target repository names:
- Model repository (e.g., yourorg/demo1_actual)
- Airflow DAG repository (e.g., yourorg/demo1_airflow)

Detect the Kubernetes storage class (varies between Docker Desktop installations):
kubectl get storageclass
Common values:
- standard (some Docker Desktop versions)
- hostpath (other Docker Desktop versions)

Save this value; you'll need it in Step 2.
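Rather than copying the value by hand, the default class can be parsed out of the `kubectl get storageclass` output, where the default is marked "(default)" after its name. A sketch, assuming the class you want is the cluster default:

```shell
# Extract the default StorageClass name from `kubectl get storageclass` output.
# The default class shows "(default)" in the second column, e.g.:
#   hostpath (default)   docker.io/hostpath   Delete   ...
default_storage_class() {
  awk 'NR > 1 && $2 == "(default)" { print $1; exit }'
}
# Usage: STORAGE_CLASS=$(kubectl get storageclass | default_storage_class)
```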
IMPORTANT: Always run this step, even for "fresh" installations. Docker volumes persist across container deletions, so old Airflow DAG run history and merge data can survive even after removing containers. This causes scheduling issues where the scheduler sees stale runs.
# Check for existing namespace
kubectl get namespace $NAMESPACE
If the namespace exists:
# Uninstall Airflow
helm uninstall airflow -n $NAMESPACE
# Delete namespace
kubectl delete namespace $NAMESPACE
# If namespace is stuck in Terminating state (wait 30 seconds, then check):
kubectl get namespace $NAMESPACE -o json | jq '.spec.finalizers = []' | \
kubectl replace --raw "/api/v1/namespaces/$NAMESPACE/finalize" -f -
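The teardown steps above can be gathered into one guarded function. A sketch, assuming kubectl, helm, and jq are on PATH and NAMESPACE is exported; the finalizer-clearing fallback only runs when the namespace is still stuck after deletion:

```shell
# Tear down an existing environment; safe to run when nothing is installed.
teardown_namespace() {
  # Nothing to do if the namespace does not exist
  kubectl get namespace "$NAMESPACE" >/dev/null 2>&1 || return 0
  helm uninstall airflow -n "$NAMESPACE" || true   # ignore "release not found"
  kubectl delete namespace "$NAMESPACE" --wait=false
  sleep 30
  # If the namespace is stuck in Terminating, clear its finalizers
  if kubectl get namespace "$NAMESPACE" >/dev/null 2>&1; then
    kubectl get namespace "$NAMESPACE" -o json \
      | jq '.spec.finalizers = []' \
      | kubectl replace --raw "/api/v1/namespaces/$NAMESPACE/finalize" -f -
  fi
}
# Usage: teardown_namespace
```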
Always reset the databases to ensure clean state. Even if you deleted the container, the Docker volume persists with old data.
cd docker/postgres
docker compose down -v
The -v flag removes the named volume (datasurface-postgres-data), ensuring all old Airflow metadata, DAG run history, and merge data are deleted.
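The reset can be verified programmatically before moving on. A sketch, assuming kubectl and docker are on PATH and NAMESPACE is exported:

```shell
# Verify clean state: namespace gone and no stale postgres volume.
check_clean_state() {
  if kubectl get namespace "$NAMESPACE" >/dev/null 2>&1; then
    echo "FAIL: namespace $NAMESPACE still exists"; return 1
  fi
  if docker volume ls -q | grep -q datasurface-postgres; then
    echo "FAIL: stale datasurface-postgres volume found"; return 1
  fi
  echo "OK: clean state"
}
# Usage: check_clean_state
```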
Checkpoint:
- kubectl get namespace $NAMESPACE should return "not found"
- docker volume ls | grep datasurface-postgres should return nothing

cd docker/postgres
# Verify no stale volume exists (should return nothing)
docker volume ls | grep datasurface-postgres
# Start fresh PostgreSQL
docker compose up -d
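After `docker compose up -d` returns, PostgreSQL may take a few seconds to start accepting connections. A polling sketch; the container name datasurface-postgres and the postgres user are assumptions based on the checks in this guide, so adjust them to match your compose file:

```shell
# Poll pg_isready inside the container until PostgreSQL accepts connections.
wait_for_postgres() {
  for _ in $(seq 1 30); do
    if docker exec datasurface-postgres pg_isready -U postgres >/dev/null 2>&1; then
      echo "PostgreSQL is ready"; return 0
    fi
    sleep 2
  done
  echo "Timed out waiting for PostgreSQL"; return 1
}
# Usage: wait_for_postgres
```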
Checkpoint:
- docker ps | grep datasurface-postgres - container should be running
- docker volume ls | grep datasurface-postgres - should show exactly one volume (newly created)

Edit three files to configure for the user's environment:
eco.py

Update the repository owner and name:
GIT_REPO_OWNER: str = "<user's github org or username>"
GIT_REPO_NAME: str = "<model repo name, e.g., demo1_actual>"
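If you prefer to script this edit, a sed sketch that rewrites the two assignments in place; it assumes they appear at the start of a line exactly as shown above, and "yourorg" / "demo1_actual" are example values to substitute:

```shell
# Patch the two assignments in eco.py in place (a .bak backup is kept).
# "yourorg" and "demo1_actual" are example values; substitute your own.
patch_eco() {
  sed -i.bak \
    -e 's|^GIT_REPO_OWNER: str = .*|GIT_REPO_OWNER: str = "yourorg"|' \
    -e 's|^GIT_REPO_NAME: str = .*|GIT_REPO_NAME: str = "demo1_actual"|' \
    "$1"
}
# Usage: patch_eco eco.py
```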
helm/airflow-values.yaml

Update the DAG sync repository URL: