**[REQUIRED]** For **ALL** data science and machine learning tasks. This skill should ALWAYS be loaded in even if only a portion of the workflow is related to machine learning. Use when: analyzing data, training models, deploying models to Snowflake, registering models, working with ML workflows, running ML jobs on Snowflake compute, model registry, model service, model inference, log model, deploy pickle file, experiment tracking, model monitoring, ML observability, tracking drift, model performance analysis, distributed training, XGBoost, LightGBM, PyTorch, DPF, distributed partition function, many model training, hyperparameter tuning, HPO, compute pools, train at scale, pipeline orchestration, DAG, task graph, schedule training. Routes to specialized sub-skills.
This skill routes to specialized sub-skills for data science, machine learning, and MLOps tasks, and provides guidance across all of them. It MUST be loaded if any part of the user query relates to these topics❗❗❗
⚠️ CRITICAL: Before routing to any sub-skill, you MUST load the environment guide for your surface.
Your system prompt indicates which surface you are operating on. Load the matching guide:
| Surface | Condition | Guide to Load |
|---|---|---|
| Snowsight | You are operating inside the Snowflake Snowsight web interface | guides/snowsight-environment.md |
| CLI / IDE | You are operating in a command line terminal or IDE environment | guides/cli-environment.md |
The environment guide provides surface-specific instructions for session setup, package management, and code execution that apply to ALL sub-skills below. Sub-skills will reference these patterns rather than repeating them.
⚠️ CRITICAL: Route AUTOMATICALLY based on the user's request. Do NOT ask the user which sub-skill to use or how they want to deploy.
When a user asks to "train a model", "build a model", or inquires about a similar task: load ml-development/SKILL.md and start working.

⚠️ CRITICAL: When a user mentions a service name, check whether it is a model inference service:

Run `DESCRIBE SERVICE <DB>.<SCHEMA>.<SERVICE_NAME>`. If `managing_object_domain = 'Model'` → Route to spcs-inference/SKILL.md.

This applies to ANY task involving the service (testing, REST API calls, latency profiling, benchmarking, debugging, management).
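As a sketch of that check, the `DESCRIBE SERVICE` rows can be scanned for the managing domain. The row/column shape below is an assumption for illustration (this doc does not specify the exact output columns):

```python
def is_model_inference_service(describe_rows: list) -> bool:
    """Return True if DESCRIBE SERVICE output indicates a model-managed service.

    `describe_rows` is assumed to be the result rows as dicts, e.g. from
    session.sql("DESCRIBE SERVICE DB.SCHEMA.SVC").collect(); the column
    name `managing_object_domain` follows the check described above.
    """
    for row in describe_rows:
        if row.get("managing_object_domain") == "Model":
            return True
    return False

# Hypothetical row shape:
rows = [{"name": "MY_SVC", "managing_object_domain": "Model"}]
print(is_model_inference_service(rows))  # → True
```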
⚠️ CRITICAL: When user mentions "inference" without clear signals, you MUST ask for clarification.
A decision matrix is available in the public docs: https://docs.snowflake.com/en/developer-guide/snowflake-ml/inference/inference-overview
Inference Disambiguation Workflow:
When user says something like "run inference on my model" or "inference" without batch/online signals:
I can help you run inference on your model. There are three approaches:
1. **Native Batch Inference (SQL)** - Embed inference in SQL pipelines
- <add decision points from docs matrix here>
2. **Job-Based Batch (SPCS)** - Run large-scale inference jobs
- <add decision points from docs matrix here>
3. **Real-Time Inference (SPCS)** - Deploy a REST endpoint
- <add decision points from docs matrix here>
Which approach fits your use case?
⚠️ STOP: Wait for user response before routing.
⚠️ CRITICAL: These two skills are commonly confused. Use this logic:
| User Intent | Key Signals | Route To |
|---|---|---|
| Run inference on a registered model | "model registry" + ("inference", "predictions", "scoring", "run()", "run_batch()") | batch-inference-jobs/SKILL.md |
| Run a Python script on Snowflake compute | "script", "submit", "file", "directory", training code | ml-jobs/SKILL.md |
Decision tree:
- Inference on a registered model → batch-inference-jobs/SKILL.md (covers both mv.run() and mv.run_batch())
- Run a script or file on Snowflake compute → ml-jobs/SKILL.md (uses submit_file() or submit_directory())
- Multi-step or scheduled workflow → ml-pipeline-orchestration/SKILL.md

| User Intent | Key Signals | Route To |
|---|---|---|
| Run a single ML job | "submit", "run script", "compute pool", one-off execution | ml-jobs/SKILL.md |
| Orchestrate multiple steps on a schedule | "pipeline", "DAG", "schedule", "automate", "task graph", multi-step workflow | ml-pipeline-orchestration/SKILL.md |
⚠️ CRITICAL: When user says "partitioned modeling", "partitioned model", "model per partition", or similar ambiguous phrases, clarify their stage in the workflow:
Partitioned Modeling Workflow:
[1] Train models per partition → [2] Register → [3] Run partitioned inference
(MMT) (Registry) (@partitioned_inference_api)
Decision tree:
- Train models per partition → distributed-training/SKILL.md (Many Model Training section)
- Run partitioned inference → model-registry/partitioned-inference/SKILL.md
- Full workflow → distributed-training/SKILL.md first, then model-registry/partitioned-inference/SKILL.md

Clarification prompt (when ambiguous):
I can help with partitioned modeling. Where are you in the workflow?
1. **Train models per partition** - Use Many Model Training (MMT) to train one model per partition (store, region, etc.)
2. **Run partitioned inference** - Already have trained models, need to run predictions per partition
3. **Full workflow** - Train per partition → Register → Run partitioned inference
Which step do you need help with?
⚠️ STOP: Wait for user response before routing.
| User Says | Route To | Action |
|---|---|---|
| "analyze data", "train model", "build model", "feature engineering", "predict", "classify", "regression" | ml-development/SKILL.md | Load immediately, start training |
| "register model", "model registry", "log model", "pickle to snowflake", "save model to snowflake", "upload model", ".pkl file", ".ubj file" | model-registry/SKILL.md | Load immediately, start registration (Workflow A) |
| "deploy model", "deploy model for inference", "deploy for inference" | model-registry/SKILL.md | Load immediately, ask deployment target (Workflow B) |
| "create inference service", "inference endpoint", "serve model", "snowpark container services", "model endpoint", "deploy in container", "deploy model service", "real-time inference", "online inference" | spcs-inference/SKILL.md | Load immediately, create SPCS service |
| "batch inference", "bulk predictions", "run_batch", "run()", "offline scoring", "score dataset", "batch predictions", "inference on registered model", "run predictions on registry model", "score with registered model", "offline inference", "SQL inference", "dbt inference", "dynamic table inference" | batch-inference-jobs/SKILL.md | Load immediately, set up batch inference |
| "inference", "run inference" (ambiguous, no batch/online signals) | ASK USER | Use disambiguation workflow above to clarify batch vs online |
| "ml job", "ml jobs", "run on snowflake compute", "submit job", "submit script", "submit file", "remote execution", "GPU training", "run python script on snowflake" | ml-jobs/SKILL.md | Load immediately, set up job |
| "pipeline", "DAG", "task graph", "schedule training", "schedule inference", "orchestrate", "productionize", "automate retraining", "convert notebook to pipeline" | ml-pipeline-orchestration/SKILL.md | Load immediately, set up DAG |
| "experiment tracking", "track experiment", "log metrics", "log parameters", "autolog", "training callback", "XGBoost callback", "LightGBM callback" | experiment-tracking/SKILL.md | Load immediately, set up experiment tracking |
| "model monitor", "monitor model", "add monitoring", "enable monitoring", "ML observability", "track drift", "model performance", "monitor predictions", "observability" | model-monitor/SKILL.md | Load immediately, set up monitoring |
| "inference logs", "inference table", "captured inference", "autocapture data", "view inference history", "INFERENCE_TABLE", "inference requests", "inference responses", "view captured predictions" | inference-logs/SKILL.md | Load immediately, query inference data |
| "inference error", "mv.run fails", "service failing", "OOM", "debug inference", "inference not working" | debug-inference/SKILL.md | Load immediately, diagnose issue |
| "distributed training", "distributed XGBoost", "distributed LightGBM", "XGBEstimator", "LightGBMEstimator", "PyTorchDistributor", "multi-node training", "multi-GPU training", "train at scale", "DPF", "distributed partition function", "many model training", "MMT", "train per partition", "ManyModelTraining", "partition by", "hyperparameter tuning", "hyperparameter optimization", "HPO", "Tuner", "TunerConfig", "search space", "grid search", "random search", "bayesian optimization", "tune model", "tune hyperparameters", "num_trials", "search_alg" | distributed-training/SKILL.md | Load immediately, distributed training/processing/tuning |
| "partitioned inference", "@partitioned_api", "inference per partition", "CustomModel partition" | model-registry/partitioned-inference/SKILL.md | Load immediately, partitioned inference |
| "partitioned modeling", "partitioned model", "model per partition", "per-partition models" (ambiguous) | ASK USER | Use partitioned modeling disambiguation above |
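The first-match flavor of the table above can be sketched as a keyword lookup. The sub-skill paths come from the table; the signal sets are abbreviated and the matching function itself is a hypothetical illustration (in practice the agent routes by reading the table):

```python
# Abbreviated signal sets from the routing table; order matters
# (more specific signals are checked before ambiguous ones).
ROUTES = [
    ({"register model", "log model", ".pkl file"}, "model-registry/SKILL.md"),
    ({"batch inference", "run_batch", "offline scoring"}, "batch-inference-jobs/SKILL.md"),
    ({"real-time inference", "inference endpoint", "serve model"}, "spcs-inference/SKILL.md"),
    ({"train model", "build model", "feature engineering"}, "ml-development/SKILL.md"),
]
AMBIGUOUS = {"inference", "run inference", "partitioned model"}

def route(request: str) -> str:
    """Return the first matching sub-skill path, or ASK USER when ambiguous."""
    text = request.lower()
    for signals, skill in ROUTES:
        if any(s in text for s in signals):
            return skill
    if any(s in text for s in AMBIGUOUS):
        return "ASK USER"  # use the disambiguation workflows above
    return "ASK USER"

print(route("set up batch inference for my table"))  # → batch-inference-jobs/SKILL.md
print(route("run inference"))                        # → ASK USER
```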
Sub-skill path aliases (for routing resolution):
- ml-job → ml-jobs/SKILL.md (singular form routes to plural directory)
- ml-jobs → ml-jobs/SKILL.md
- mljob → ml-jobs/SKILL.md
- mljobs → ml-jobs/SKILL.md

Overall routing flow:

User Request → Load Environment Guide → Detect Intent → Load appropriate sub-skill → Execute
Examples:
- "Train a classifier" → Load ml-development → Train locally → Done
- "Deploy my model.pkl" → Load model-registry → Register to Snowflake → Done
- "Train AND deploy" → Load ml-development → Train → Save model → Report artifacts → Ask about deployment → If yes, load model-registry WITH CONTEXT (file path, framework, schema)
Key principle: Complete ONE task at a time. Only ask about the next step after the current step is done.
⚠️ CRITICAL: When transitioning from ml-development to model-registry:
Information to preserve and pass along: the saved model file path, the framework (e.g. sklearn), and the model's input/output schema.
Why this matters: with this context, the model-registry sub-skill can skip redundant questions and proceed straight to registration.
How to do it: load model-registry/SKILL.md and include the preserved context in the handoff.
Example handoff:
ml-development: "Model saved to /path/to/model.pkl (sklearn). Would you like to register it?"
User: "Yes"
[Load model-registry with context: path=/path/to/model.pkl, framework=sklearn, schema=[...]]
model-registry: "I see you just trained a sklearn model. What should I call it in Snowflake?"
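The handoff context above can be sketched as a plain dict. The field names (`model_path`, `framework`, `signature_schema`) are illustrative, not a defined interface:

```python
def build_handoff_context(model_path, framework, schema):
    """Assemble the context to carry from ml-development into model-registry.

    Field names here are illustrative only, not a defined interface.
    """
    return {
        "model_path": model_path,      # where the trained artifact was saved
        "framework": framework,        # e.g. "sklearn", guides registration
        "signature_schema": schema,    # input/output columns from training
    }

ctx = build_handoff_context("/path/to/model.pkl", "sklearn", ["f1", "f2", "label"])
print(ctx["framework"])  # → sklearn
```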
**ml-development/SKILL.md**: Data exploration, statistical analysis, model training, and evaluation. Covers the full ML development workflow from data loading to model evaluation.
**model-registry/SKILL.md**: Deploy serialized models to the Snowflake Model Registry. Supports various model formats (.pkl, .ubj, .json, .pt, etc.) depending on framework. Routes to the spcs-inference sub-skill for inference service creation. Includes the partitioned-inference sub-skill for partition-aware model deployment.
**experiment-tracking/SKILL.md**: Track model training experiments using Snowflake's experiment tracking framework.
**spcs-inference/SKILL.md**: Deploy registered models to Snowpark Container Services for real-time inference. Handles compute pool selection, GPU/CPU configuration, num_workers, and service creation.
**batch-inference-jobs/SKILL.md**: Run batch inference on models already registered in the Snowflake Model Registry. Covers two approaches:
- Native batch inference (mv.run()): Warehouse-based, integrates with SQL pipelines
- Job-based batch (mv.run_batch()): SPCS compute pools, for large-scale and unstructured data

**ml-jobs/SKILL.md**: Transform local Python scripts into Snowflake ML Jobs that run on Snowflake compute pools. Uses submit_file() or submit_directory(). Also includes a compute pool reference (instance families, sizing).
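The run()-vs-run_batch() choice described above can be sketched as a simple heuristic. The threshold and helper below are illustrative only (the authoritative guidance is the docs decision matrix), and the commented calls echo just the method names, not full signatures:

```python
def pick_batch_approach(row_count: int, has_unstructured_data: bool) -> str:
    """Illustrative heuristic: warehouse-based run() for SQL-pipeline-sized
    workloads, job-based run_batch() for large-scale or unstructured data.
    The 10M-row cutoff is an assumption, not an official threshold."""
    if has_unstructured_data or row_count > 10_000_000:
        return "run_batch"  # job-based batch on SPCS compute pools
    return "run"            # warehouse-based, embeds in SQL pipelines

# With a registered ModelVersion `mv` (not shown here):
#   mv.run(input_df)    # native batch inference on a warehouse
#   mv.run_batch(...)   # job-based batch on an SPCS compute pool
print(pick_batch_approach(1_000, False))  # → run
```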
**ml-pipeline-orchestration/SKILL.md**: Orchestrate multi-step ML workflows using Snowflake Task Graphs (DAGs) with the Python DAG API. Covers DAG creation, scheduling (cron/timedelta), inter-task data passing, and notebook-to-pipeline conversion. Uses @remote for ML tasks on compute pools and warehouse tasks for orchestration.
**model-monitor/SKILL.md**: Set up ML Observability for models in the Snowflake Model Registry. Track drift, performance metrics, and prediction statistics over time.
**distributed-training/SKILL.md**: Consolidated skill covering all distributed ML training, processing, and tuning:
- Distributed training: XGBEstimator, LightGBMEstimator, PyTorchDistributor for training one large model across nodes/GPUs
- Many Model Training (MMT): train one model per partition; retrieve trained models with get_model()

Note: These APIs run server-side — either inside ML Jobs (submitted via CLI) or in Snowflake Notebooks with Container Runtime (Snowsight). For CLI submission, see ml-jobs.
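The "one model per partition" idea can be illustrated locally with plain grouping. This is not the server-side ManyModelTraining API; `train_fn` and the mean-as-model stand-in are purely illustrative:

```python
from collections import defaultdict

def train_per_partition(rows, partition_key, train_fn):
    """Group rows by partition and train one model per group.

    A local illustration of the Many Model Training idea only; the
    real API runs server-side on Snowflake compute.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    return {part: train_fn(part_rows) for part, part_rows in groups.items()}

rows = [{"store": "A", "y": 1}, {"store": "B", "y": 2}, {"store": "A", "y": 3}]
# "Model" here is just the mean of y, standing in for a real estimator:
models = train_per_partition(rows, "store", lambda rs: sum(r["y"] for r in rs) / len(rs))
print(models)  # → {'A': 2.0, 'B': 2.0}
```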
**model-registry/partitioned-inference/SKILL.md**: Partitioned inference in the Model Registry using the @partitioned_api decorator. Run inference with different submodels per data partition.
**inference-logs/SKILL.md**: Query and analyze captured inference data from model services with Auto-Capture enabled. View historical request/response data logged via INFERENCE_TABLE(). Useful for debugging unexpected predictions, building retraining datasets, and A/B testing model versions.
When the workflow involves creating or writing to any Snowflake object (table, stage, model registry entry, experiment, etc.), never silently pick a database/schema. Always confirm with the user first.
If a DATABASE.SCHEMA has already been used in this session, offer it as the default:
I'll need to create [object] in Snowflake. I see we've been working with `<DATABASE>.<SCHEMA>`.
Should I use that, or would you prefer a different database/schema?
Otherwise, ask:

Which database and schema should I use for [object]? (format: DATABASE.SCHEMA)
Personal databases (e.g. USER$VINAY) are not supported for ML workflows. If the user picks a personal database, warn them:
Personal databases like `USER$<USERNAME>` don't support creating tables, model registry operations, or inference services. Please provide a standard database/schema instead.
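Following the naming pattern in the warning above, the personal-database check can be sketched as (illustrative only):

```python
def is_personal_database(db_name: str) -> bool:
    """Personal databases follow the USER$<USERNAME> naming pattern."""
    return db_name.upper().startswith("USER$")

print(is_personal_database("USER$VINAY"))  # → True
print(is_personal_database("ML_PROD"))     # → False
```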