Use when writing Snowflake SQL, building data pipelines with Dynamic Tables or Streams/Tasks, using Cortex AI functions, creating Cortex Agents, writing Snowpark Python, configuring dbt for Snowflake, or troubleshooting Snowflake errors.
Snowflake SQL, data pipelines, Cortex AI, and Snowpark Python development. Covers the colon-prefix rule, semi-structured data, MERGE upserts, Dynamic Tables, Streams+Tasks, Cortex AI functions, agent specs, performance tuning, and security hardening.
Originally contributed by James Cha-Earley — enhanced and integrated by the claude-skills team.
# Generate a MERGE upsert template
python scripts/snowflake_query_helper.py merge --target customers --source staging_customers --key customer_id --columns name,email,updated_at
# Generate a Dynamic Table template
python scripts/snowflake_query_helper.py dynamic-table --name cleaned_events --warehouse transform_wh --lag "5 minutes"
# Generate RBAC grant statements
python scripts/snowflake_query_helper.py grant --role analyst_role --database analytics --schemas public,staging --privileges SELECT,USAGE
SQL style rules:
- Use snake_case for all identifiers. Avoid double-quoted identifiers -- they force case-sensitive names that require constant quoting.
- Prefer CTEs (WITH clauses) over nested subqueries.
- Use CREATE OR REPLACE for idempotent DDL.
- Avoid SELECT * in production. Snowflake's columnar storage scans only referenced columns, so explicit column lists reduce I/O.

In SQL stored procedures (BEGIN...END blocks), variables and parameters must use the colon (:) prefix inside SQL statements. Without it, Snowflake treats them as column identifiers and raises "invalid identifier" errors.
-- WRONG: missing colon prefix
SELECT name INTO result FROM users WHERE id = p_id;
-- CORRECT: colon prefix on both variable and parameter
SELECT name INTO :result FROM users WHERE id = :p_id;
This applies to DECLARE variables, LET variables, and procedure parameters when used inside SELECT, INSERT, UPDATE, DELETE, or MERGE.
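The rule above can be mechanically checked. Below is a toy Python lint (a hypothetical helper, not part of this skill's scripts) that flags declared variable names appearing in a SQL statement without the colon prefix:

```python
import re

def missing_colon_prefixes(sql: str, variables: list[str]) -> list[str]:
    """Return declared variable names that appear in a SQL statement
    without the required ':' prefix. Toy word-boundary check -- it does
    not parse SQL, so quoted strings and comments will confuse it."""
    flagged = []
    for var in variables:
        # bare name not preceded by ':' (and not part of a longer identifier)
        if re.search(rf"(?<![:\w]){re.escape(var)}\b", sql, re.IGNORECASE):
            flagged.append(var)
    return flagged

print(missing_colon_prefixes(
    "SELECT name INTO result FROM users WHERE id = p_id",
    ["result", "p_id"]))   # ['result', 'p_id'] -- both missing the prefix
print(missing_colon_prefixes(
    "SELECT name INTO :result FROM users WHERE id = :p_id",
    ["result", "p_id"]))   # [] -- both correctly prefixed
```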
Semi-structured data (VARIANT):
- Access nested fields with colon paths, e.g. src:customer.name::STRING. Always cast with ::TYPE.
- A JSON null is stored as a VARIANT null (displayed as "null"), distinct from SQL NULL. Use STRIP_NULL_VALUE = TRUE on load.
- Flatten arrays with LATERAL FLATTEN: SELECT f.value:name::STRING FROM my_table, LATERAL FLATTEN(input => src:items) f;

MERGE upsert template:
MERGE INTO target t USING source s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.name = s.name, t.updated_at = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN INSERT (id, name, updated_at) VALUES (s.id, s.name, CURRENT_TIMESTAMP());
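The MATCHED / NOT MATCHED semantics of the template above can be sketched in plain Python (a hypothetical simulation, not Snowflake code):

```python
from datetime import datetime, timezone

def merge_upsert(target: dict, source_rows: list[dict], key: str) -> dict:
    """Simulate MERGE: source rows whose key matches update the target row
    (WHEN MATCHED); unmatched rows are inserted (WHEN NOT MATCHED).
    `target` maps key value -> row dict and is modified in place."""
    now = datetime.now(timezone.utc).isoformat()
    for row in source_rows:
        k = row[key]
        if k in target:
            target[k].update(row)        # WHEN MATCHED THEN UPDATE
        else:
            target[k] = dict(row)        # WHEN NOT MATCHED THEN INSERT
        target[k]["updated_at"] = now    # CURRENT_TIMESTAMP()
    return target

table = {1: {"id": 1, "name": "old"}}
merge_upsert(table, [{"id": 1, "name": "new"}, {"id": 2, "name": "fresh"}], "id")
# row 1 is updated to 'new'; row 2 is inserted
```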
See references/snowflake_sql_and_pipelines.md for deeper SQL patterns and anti-patterns.
| Approach | When to Use |
|---|---|
| Dynamic Tables | Declarative transformations. Default choice. Define the query, Snowflake handles refresh. |
| Streams + Tasks | Imperative CDC. Use for procedural logic, stored procedure calls, complex branching. |
| Snowpipe | Continuous file loading from cloud storage (S3, GCS, Azure). |
CREATE OR REPLACE DYNAMIC TABLE cleaned_events
TARGET_LAG = '5 minutes'
WAREHOUSE = transform_wh
AS
SELECT event_id, event_type, user_id, event_timestamp
FROM raw_events
WHERE event_type IS NOT NULL;
Key rules:
- Set TARGET_LAG progressively: tighter at the top of the DAG, looser downstream.
- SELECT * breaks on upstream schema changes -- use explicit column lists.

Streams + Tasks example:
CREATE OR REPLACE STREAM raw_stream ON TABLE raw_events;
CREATE OR REPLACE TASK process_events
WAREHOUSE = transform_wh
SCHEDULE = 'USING CRON 0 */1 * * * America/Los_Angeles'
WHEN SYSTEM$STREAM_HAS_DATA('raw_stream')
AS INSERT INTO cleaned_events SELECT ... FROM raw_stream;
-- Tasks start SUSPENDED. You MUST resume them.
ALTER TASK process_events RESUME;
See references/snowflake_sql_and_pipelines.md for DT debugging queries and Snowpipe patterns.
| Function | Purpose |
|---|---|
| AI_COMPLETE | LLM completion (text, images, documents) |
| AI_CLASSIFY | Classify text into categories (up to 500 labels) |
| AI_FILTER | Boolean filter on text or images |
| AI_EXTRACT | Structured extraction from text/images/documents |
| AI_SENTIMENT | Sentiment score (-1 to 1) |
| AI_PARSE_DOCUMENT | OCR or layout extraction from documents |
| AI_REDACT | PII removal from text |
Deprecated names (do NOT use): COMPLETE, CLASSIFY_TEXT, EXTRACT_ANSWER, PARSE_DOCUMENT, SUMMARIZE, TRANSLATE, SENTIMENT, EMBED_TEXT_768.
Stage path and filename are separate arguments:
-- WRONG: single combined argument
TO_FILE('@stage/file.pdf')
-- CORRECT: two arguments
TO_FILE('@db.schema.mystage', 'invoice.pdf')
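When generating these calls programmatically, a combined reference has to be split before it reaches TO_FILE. A small hypothetical Python helper (splitting at the first '/'):

```python
def split_stage_path(combined: str) -> tuple[str, str]:
    """Split a combined '@stage/dir/file.pdf' reference into the two
    separate arguments TO_FILE expects: (stage, relative file path).
    Toy helper -- assumes the stage name contains no '/'."""
    stage, _, relative = combined.partition("/")
    return stage, relative

print(split_stage_path("@db.schema.mystage/invoice.pdf"))
# ('@db.schema.mystage', 'invoice.pdf')
```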
Agent specs use a JSON structure with top-level keys: models, instructions, tools, tool_resources.
Agent spec rules:
- Wrap the spec in $spec$ delimiters (not $$).
- models must be an object, not an array.
- tool_resources is a separate top-level key, not nested inside tools.

See references/cortex_ai_and_agents.md for full agent spec examples and Cortex Search patterns.
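The structural rules above lend themselves to a pre-flight check before submitting a spec. A toy Python validator (the key names follow the rules stated above; this is not an official Snowflake validator):

```python
import json

def agent_spec_problems(spec_json: str) -> list[str]:
    """Return a list of structural problems in an agent spec JSON string,
    based on the rules above: models must be an object, and tool_resources
    must live at the top level rather than inside individual tools."""
    spec = json.loads(spec_json)
    problems = []
    if "models" in spec and not isinstance(spec["models"], dict):
        problems.append("models must be an object, not an array")
    for tool in spec.get("tools", []):
        if isinstance(tool, dict) and "tool_resources" in tool:
            problems.append("tool_resources must be top-level, not inside tools")
    return problems

bad = '{"models": ["m"], "tools": [{"tool_resources": {}}]}'
print(agent_spec_problems(bad))   # reports both violations
```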
from snowflake.snowpark import Session
import os
session = Session.builder.configs({
"account": os.environ["SNOWFLAKE_ACCOUNT"],
"user": os.environ["SNOWFLAKE_USER"],
"password": os.environ["SNOWFLAKE_PASSWORD"],
"role": "my_role", "warehouse": "my_wh",
"database": "my_db", "schema": "my_schema"
}).create()
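A missing credential env var otherwise surfaces as a confusing error deep inside the connector. One way to fail fast is a small config-builder sketch (a hypothetical helper; the role/warehouse/database values mirror the placeholders above):

```python
import os

REQUIRED_VARS = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")

def session_config(env=None) -> dict:
    """Build the Session.builder config dict, raising a clear error when a
    credential env var is missing. Sketch only -- extend as needed."""
    env = os.environ if env is None else env
    missing = [v for v in REQUIRED_VARS if v not in env]
    if missing:
        raise RuntimeError(f"missing environment variables: {missing}")
    return {
        "account": env["SNOWFLAKE_ACCOUNT"],
        "user": env["SNOWFLAKE_USER"],
        "password": env["SNOWFLAKE_PASSWORD"],
        "role": "my_role", "warehouse": "my_wh",
        "database": "my_db", "schema": "my_schema",
    }
```

Usage: `Session.builder.configs(session_config()).create()`.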
Snowpark rules:
- Preview results with collect() / show().
- Avoid collect() on large DataFrames -- it pulls the full result set to the client. Process server-side with DataFrame operations.

dbt for Snowflake:
-- Dynamic table materialization (streaming/near-real-time marts):
{{ config(materialized='dynamic_table', snowflake_warehouse='transforming', target_lag='1 hour') }}
-- Incremental materialization (large fact tables):
{{ config(materialized='incremental', unique_key='event_id') }}
-- Snowflake-specific configs (combine with any materialization):
{{ config(transient=true, copy_grants=true, query_tag='team_daily') }}
dbt rules:
- Never reference {{ this }} without an {% if is_incremental() %} guard.
- Use the dynamic_table materialization for streaming or near-real-time marts.

Performance and security:
- Add search optimization for point-lookup columns: ALTER TABLE t ADD SEARCH OPTIMIZATION ON EQUALITY(col);
- Configure warehouses with AUTO_SUSPEND = 60, AUTO_RESUME = TRUE.
- Audit privileged access: SHOW GRANTS OF ROLE ACCOUNTADMIN;

Surface these issues without being asked when you notice them in context:
- SELECT * in Dynamic Tables -- flag as a schema-change time bomb.
- Deprecated Cortex function names (CLASSIFY_TEXT, SUMMARIZE, etc.) -- suggest the current AI_* equivalents.

| Error | Cause | Fix |
|---|---|---|
| "Object does not exist" | Wrong database/schema context or missing grants | Fully qualify names (db.schema.table), check grants |
| "Invalid identifier" in procedure | Missing colon prefix on variable | Use :variable_name inside SQL statements |
| "Numeric value not recognized" | VARIANT field not cast | Cast explicitly: src:field::NUMBER(10,2) |
| Task not running | Forgot to resume after creation | ALTER TASK task_name RESUME; |
| DT refresh failing | Schema change upstream or tracking disabled | Use explicit columns, verify change tracking |
| TO_FILE error | Combined path as single argument | Split into two args: TO_FILE('@stage', 'file.pdf') |
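A table like this can back a small triage helper. The sketch below (hypothetical; the patterns are paraphrased from the rows above) maps an error message to a suggested fix:

```python
import re

# (pattern, suggested fix) pairs drawn from the troubleshooting table above
KNOWN_FIXES = [
    (r"does not exist", "Fully qualify names (db.schema.table) and check grants"),
    (r"invalid identifier", "In procedures, prefix variables with ':'"),
    (r"numeric value .* not recognized", "Cast VARIANT fields explicitly, e.g. src:field::NUMBER(10,2)"),
]

def suggest_fix(error_message: str):
    """Return the first matching suggested fix, or None if unrecognized."""
    for pattern, fix in KNOWN_FIXES:
        if re.search(pattern, error_message, re.IGNORECASE):
            return fix
    return None

print(suggest_fix("SQL compilation error: invalid identifier 'P_ID'"))
# "In procedures, prefix variables with ':'"
```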
Example pipeline steps:
- Build a Dynamic Table with TARGET_LAG = '5 minutes' that filters nulls, casts types, deduplicates
- Expose a SECURE VIEW for the BI tool / API layer
- Use snowflake_query_helper.py grant to generate RBAC statements

Cortex quick start:
- Test on a sample: SELECT AI_CLASSIFY(text_col, ['bug', 'feature', 'question']) FROM table LIMIT 10;
- Schedule a task to run AI_CLASSIFY on new rows automatically

Debugging queries:
- SELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY()) WHERE STATE = 'FAILED' ORDER BY SCHEDULED_TIME DESC;
- SELECT * FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY('my_dt')) ORDER BY REFRESH_END_TIME DESC;
- SHOW STREAMS; -- check stale_after column
- See references/troubleshooting.md for error-specific fixes

| Anti-Pattern | Why It Fails | Better Approach |
|---|---|---|
| SELECT * in Dynamic Tables | Schema changes upstream break the DT silently | Use explicit column lists |
| Missing colon prefix in procedures | "Invalid identifier" runtime error | Always use :variable_name in SQL blocks |
| Single warehouse for all workloads | Contention between load, transform, and query | Separate warehouses per workload type |
| Hardcoded credentials in Snowpark | Security risk, breaks in CI/CD | Use os.environ[] or key pair auth |
| collect() on large DataFrames | Pulls entire result set to client memory | Process server-side with DataFrame operations |
| Nested subqueries instead of CTEs | Unreadable, hard to debug, Snowflake optimizes CTEs better | Use WITH clauses |
| Using deprecated Cortex functions | CLASSIFY_TEXT, SUMMARIZE etc. will be removed | Use AI_CLASSIFY, AI_COMPLETE etc. |
| Tasks without WHEN SYSTEM$STREAM_HAS_DATA | Task runs on schedule even with no new data, wasting credits | Add the WHEN clause for stream-driven tasks |
| Double-quoted identifiers | Forces case-sensitive names across all queries | Use snake_case unquoted identifiers |
| Skill | Relationship |
|---|---|
| engineering/sql-database-assistant | General SQL patterns — use for non-Snowflake databases |
| engineering/database-designer | Schema design — use for data modeling before Snowflake implementation |
| engineering-team/senior-data-engineer | Broader data engineering — pipelines, Spark, Airflow, data quality |
| engineering-team/senior-data-scientist | Analytics and ML — use alongside Snowpark for feature engineering |
| engineering-team/senior-devops | CI/CD for Snowflake deployments (Terraform, GitHub Actions) |
| Document | Contents |
|---|---|
| references/snowflake_sql_and_pipelines.md | SQL patterns, MERGE templates, Dynamic Table debugging, Snowpipe, anti-patterns |
| references/cortex_ai_and_agents.md | Cortex AI functions, agent spec structure, Cortex Search, Snowpark |
| references/troubleshooting.md | Error reference, debugging queries, common fixes |