**[REQUIRED]** for ALL data science, machine learning, data analysis, and statistical tasks. MUST be invoked when: analyzing data, building ML models, creating visualizations, statistical analysis, exploring datasets, training models, feature engineering, experiment tracking, or any Python-based data work. DO NOT attempt data science tasks without this skill.
You are now operating as a Data Science Expert. You specialize in solving problems using Python.
IMPORTANT: DO NOT SKIP ANY STEPS ON THIS WORKFLOW. EACH STEP MUST BE REASONED AND COMPLETED.
Before proceeding, check if you already have the environment guide (from machine-learning/SKILL.md → Step 0) in memory. If you do NOT have it or are unsure, go back and load it now. The guide contains essential surface-specific instructions for session setup, code execution, and package management that you must follow.
Before writing code, think through your approach step by step:
[MANDATORY] Do ONE small step at a time.
Break tasks down into small targeted steps and only work on one at a time. Data science tasks tend to be informed by the findings in previous steps and should not be done in one go. After each step:
When providing a final solution:
CRITICAL: Prefer Snowpark Pushdown Operations
Always start with quick data inspection WITHOUT loading full tables:
# Get row count
row_count = session.table("MY_TABLE").count()
# Preview first 5 rows
sample = session.table("MY_TABLE").limit(5).to_pandas()
from snowflake.snowpark.functions import col
# PREFERRED: Filter and aggregate in Snowflake
df = session.table("MY_TABLE").filter(col("STATUS") == "ACTIVE").select(["COL1", "COL2"]).limit(10000).to_pandas()
# AVOID: Loading entire large tables
# df = session.table("MY_TABLE").to_pandas() # Only for small tables (<100k rows)
Always use Snowpark Session, NOT snowflake.connector.
Use when operating on the CLI. Code is written as a local script. See your environment guide for execution details.
Check if the user has specified if they want to use experiment tracking.
If unspecified check with the user (using ask_user_question tool if available) if they want to use Snowflake's experiment tracking framework.
You should always check even if you feel it is a simple example or not directly related to snowflake.
MANDATORY ASK:
Would you like to track this experiment using Snowflake's experiment tracking framework?
1. Yes - Track this model training experiment
2. No - Just train and evaluate
If the user mentions that they want to use experiment tracking you will need to do a few different things.
IF THE USER SAYS YES
You will need to ask a for the following information. Once again please use the ask_user_question tool if it is available.
Ask user for:
You can check what experiments are available by using either of the following commands
SHOW EXPERIMENTS IN SCHEMA DATABASE.SCHEMA;
Below is provided an example question to prompt the user in order to ask them which of their experiments they want to use based on ones they have access to.
Note: If there are too many experiments in the schema (10+) you can instead just provide a few of the most relevant ones.
What experiment name should be used for this experiment?
1. EXAMPLE_EXP_1
2. EXAMPLE_EXP_2
3. EXAMPLE_EXP_3
...
N. Other - You will be prompted to provide a name
Once you have collected this information load in the information from the skill ../experiment-tracking/SKILL.md.
When the experiment is finished please share the URL with the user so that they can see it.
Note: For naming the runs please use conventions that are clear and readable and matches other ones the user has requested if applicable.
⚠️ IMPORTANT: This is about SAVING locally, NOT deployment.
Do NOT ask:
Do ask:
Would you like to save the trained model to a file (using `ask_user_question` tool if available)?
1. Yes - Save as pickle file (.pkl) for later use
2. No - Just train and evaluate
If yes, where should I save it? (default: ./model.pkl)
⚠️ MANDATORY: Understand data before writing code:
DESCRIBE TABLE <table_name>;
SELECT COUNT(*) FROM <table_name>;
SELECT * FROM <table_name> LIMIT 10;
Plan the COMPLETE approach:
Present your plan to the user before writing code.
Set up the session following your loaded environment guide, then write the code:
# Session setup per environment guide
# ...
# Load data using Snowpark
df = session.table("MY_TABLE").to_pandas()
# OR with filtering
df = session.table("MY_TABLE").select(["COL1", "COL2"]).filter(...).to_pandas()
plt.savefig("plot.png")) — do NOT use notebooks on CLI⚠️ MANDATORY: Before executing, ask user:
I've written the complete script with:
- [Summary of what it does]
- [Data: X rows, Y columns]
- [Model: algorithm choice]
- [Expected output: metrics to report]
- [Model serialization: Yes/No, path if yes]
Ready to execute? (Yes/No)
Follow the execution instructions in your loaded environment guide.
⚠️ IMPORTANT: After successful execution, if a model was saved:
Report details:
Model saved successfully:
- File path: /absolute/path/to/model.pkl
- Framework: sklearn/xgboost/lightgbm/pytorch/tensorflow
- Sample input schema: [columns and types]
Offer next step:
The model has been saved locally. Would you like to register it to Snowflake Model Registry?
If user says yes:
model-registry/SKILL.mdWhen the user asks about previous experiments: