Validate a completed PySpark to Snowpark Connect (SCOS) migration by running the migrated workload end-to-end with synthetic data. Use when: verifying migration correctness, smoke-testing migrated code, checking SCOS runtime compatibility. Triggers: validate migration, verify scos, test migration, smoke test, run migrated workload.
Run the migrated _scos workload by importing its real functions/classes and executing them with synthetic data.
[snowpark-connect] Intent Detection: After user indicates validation intent (validate, verify, check, test, review migration).
$ARGUMENTS - Path to migrated _scos script, notebook, or directory.

Derived names used throughout:
- <workload> = basename of $ARGUMENTS (e.g., if $ARGUMENTS is /path/to/my_pipeline_scos.py, then <workload> = my_pipeline_scos)
- <workload>_test/ = test directory created alongside the workload

Create <workload>_test/ and add a single entrypoint.py file that triggers the main execution flow.

Mock Category A sources (spark.table()) by checking if tables exist and creating them if permitted. Mock Category B sources (spark.read.* with cloud/local paths) by uploading synthetic files to a Snowflake stage and replacing paths in the workload copy (see Phases 1.3 and 2.1).

Import the workload's real code (e.g., from modeling_library import model, load_data) and call it. Do NOT rewrite or duplicate workload logic — no independent test cases like "Test window functions" or "Test joins".

# Check 1: uv installed
uv --version || echo "PREREQ_FAIL: uv not installed"
# Check 2: Snowflake connection
uv run --project <SKILL_DIRECTORY> \
python -c "from snowflake import snowpark_connect; spark = snowpark_connect.init_spark_session(); print('OK')" \
|| echo "PREREQ_FAIL: Snowflake connection failed"
# Check 3 (notebook workloads only): jupyter nbconvert
uv run --project <SKILL_DIRECTORY> \
jupyter nbconvert --version \
|| echo "PREREQ_FAIL: jupyter nbconvert not installed"
You MUST perform the phases below in order.
test -e "$ARGUMENTS" || echo "ABORT: Migrated workload not found"
Find all external data access in the workload: spark.read.*, spark.table(), spark.sql("SELECT ... FROM ..."), boto3/S3.
For directory workloads, scan ALL .py files in the directory — not just the main entrypoint. Data reads may occur in any module (e.g., a loader.py or data_access.py).
For single .py files, search the source directly. For .ipynb notebooks, search within the source arrays of code cells in the notebook JSON.
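A minimal sketch of the notebook scan described above, assuming the standard .ipynb JSON layout; the regex set and helper name are illustrative, not part of the skill:

```python
import json
import re

# External data access patterns to flag; illustrative, extend as needed.
READ_PATTERN = re.compile(
    r"spark\.read\.\w+\(|spark\.table\(|spark\.sql\(|boto3\."
)

def find_reads_in_notebook(nb_json: str) -> list[str]:
    """Return lines from code cells that contain external data access."""
    nb = json.loads(nb_json)
    hits = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown cells cannot read data
        for line in cell.get("source", []):
            if READ_PATTERN.search(line):
                hits.append(line.strip())
    return hits

# Example: a tiny two-cell notebook with one code cell.
nb = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# notes\n"]},
        {"cell_type": "code", "source": [
            "df = spark.read.csv('s3://bucket/events/')\n",
            "ref = spark.table('db.schema.users')\n",
            "x = 1\n",
        ]},
    ]
})
print(find_reads_in_notebook(nb))
# → ["df = spark.read.csv('s3://bucket/events/')", "ref = spark.table('db.schema.users')"]
```

The same helper works for plain .py files by wrapping the file's lines in a single synthetic code cell, so one scanner covers both workload shapes.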
Classify each data source into one of two categories:
Category A — Table-mockable (handled via Snowflake tables):
- spark.table("table_name") calls
- spark.sql("SELECT ... FROM table_name") calls

For each, determine:
- Fully qualified table name (database.schema.table_name)

Category B — Stage-mockable (requires Snowflake stage with synthetic files):
- spark.read.csv("s3://..."), spark.read.parquet("gs://..."), spark.read.json("/mnt/..."), or any spark.read.* call with an external cloud path (s3://, s3a://, gs://, abfs://, wasb://, adl://) or a local/mounted path
- spark.read.format(...).load("path") variants

For each file read, capture:
- Format: csv, parquet, json, text, or format(...).load(...)
- Path (e.g., s3://analytics-lake/raw/events/2024/)
- Read options (e.g., header=True, inferSchema=True, delimiter=",")

If any Category B (stage-mockable) reads were found, you MUST pause and alert the user:
⚠️ External cloud reads detected in the migrated workload:
1. spark.read.<method>("<path>")
2. ...
These paths reference external cloud storage (S3, GCS, Azure Blob, etc.).
Snowflake recommends creating an external stage that points to these
cloud locations for production use.
For validation, I need a Snowflake stage to upload synthetic test data.
Do you already have an external stage for these locations?
- If YES: provide the stage name and I'll upload mock data files to it.
- If NO: I'll create an internal stage (SCOS_VALIDATION_<workload>)
and upload synthetic files there for testing.
Wait for the user to respond before proceeding.
- If the user provides an existing stage: use it as <STAGE_NAME>. Skip stage creation in Phase 1.3.1 (stage already exists). Proceed to Phase 1.3.2 to generate and upload mock data.
- If the user has no existing stage: use SCOS_VALIDATION_<workload> as <STAGE_NAME>. Create it as an internal stage in Phase 1.3.1.
- If the user requests a specific new stage name: use it as <STAGE_NAME>. Create it in Phase 1.3.1.

Use a single stage for ALL mock files. Do NOT create multiple stages — store all synthetic data files in one stage, organized by subpath (e.g., @<STAGE_NAME>/events/, @<STAGE_NAME>/users/).
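A sketch of the Category B path rewrite applied to the workload copy, assuming <STAGE_NAME> has already been resolved; the stage name, path map entries, and helper name are hypothetical examples, and the subpaths follow the one-stage-many-subpaths rule:

```python
# Assumption: stage resolved during the Category B dialogue above.
STAGE_NAME = "SCOS_VALIDATION_my_pipeline_scos"

# Map each original external path to a subpath in the single validation stage.
PATH_MAP = {
    "s3://analytics-lake/raw/events/2024/": f"@{STAGE_NAME}/events/",
    "gs://warehouse/users.parquet": f"@{STAGE_NAME}/users/users.parquet",
}

def rewrite_paths(source: str) -> str:
    """Replace external cloud paths with stage paths in a workload copy."""
    for old, new in PATH_MAP.items():
        source = source.replace(old, new)
    return source

code = 'df = spark.read.csv("s3://analytics-lake/raw/events/2024/", header=True)'
print(rewrite_paths(code))
# → df = spark.read.csv("@SCOS_VALIDATION_my_pipeline_scos/events/", header=True)
```

Plain string replacement on exact captured paths is deliberately conservative: it only touches the literals found in Phase 1, so it cannot accidentally rewrite unrelated strings the way a broad regex could.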
If any Category A (table-mockable) reads were found, you MUST pause and verify table existence in Snowflake before proceeding:
Verify each table via SHOW TABLES or SELECT COUNT(*):

uv run --project <SKILL_DIRECTORY> python -c "
from snowflake.snowpark import Session
session = Session.builder.config('connection_name', 'default').create()