Expert guidance for generating accurate, performant, and secure SQL queries. Focuses on schema-first prompting, index awareness, and handling complex joins/aggregations.
This skill provides professional mental models and patterns for Text-to-SQL generation. It ensures that AI agents generate SQL that is not only syntactically correct but also performant, secure, and aligned with the actual database schema.
| Feature |
|---|
| Natural Language |
|---|
| Relational Logic (SQL) |
|---|
| Ambiguity | High ("Get the best users") | Zero (Specific columns, specific criteria) |
| Context | Implicit | Explicit (Must know table/column names) |
| Operations | Descriptive | Set-based (Joins, Unions, Intersections) |
| Order | Chronological | Logical (FROM -> WHERE -> GROUP BY -> SELECT) |
An agent cannot guess your schema. You must provide it explicitly.
Pattern: Always include a compact schema definition in the prompt.
Example Prompt Fragment:
Context:
You are an expert SQL agent. Use the following schema:
- users (id INT PK, email VARCHAR, created_at TIMESTAMP)
- orders (id INT PK, user_id INT FK, total DECIMAL, status VARCHAR)
- order_items (id INT PK, order_id INT FK, product_id INT)
Prevent common LLM mistakes by setting strict constraints.
Pattern:
Agents often write "correct" SQL that kills database performance.
Guidelines:
SELECT *: Always specify columns. SELECT * breaks index coverage and wastes bandwidth.WHERE clauses can use indexes.
WHERE YEAR(created_at) = 2023 (Function on column prevents index use)WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01'INNER JOIN over LEFT JOIN unless missing data is explicitly required.For complex logic, avoid deep nesting of subqueries.
Pattern: Use Common Table Expressions (CTEs) for readability.
Bad (Nested):
SELECT * FROM (SELECT user_id, count(*) as c FROM orders GROUP BY user_id) WHERE c > 5
Good (CTE):
WITH user_order_counts AS (
SELECT user_id, count(*) as order_count
FROM orders
GROUP BY user_id
)
SELECT * FROM user_order_counts WHERE order_count > 5
If an executed query fails, the agent must treat the error message as feedback.
Pattern:
Before executing generated SQL:
GROUP BY include all non-aggregated columns?❌ Ambiguous Joins: Joining tables without specifying the ON clause (implicit cross join).
❌ Recursive CTEs without Limit: Causing infinite loops.
❌ Mixing Dialects: Using ILIKE (Postgres) on MySQL (requires LIKE).