Use this skill when the user uploads Excel (.xlsx/.xls) or CSV files and wants to perform data analysis, generate statistics, create summaries, pivot tables, SQL queries, or any form of structured data exploration. Supports multi-sheet Excel workbooks, aggregation, filtering, joins, and exporting results to CSV/JSON/Markdown.
This skill analyzes user-uploaded Excel/CSV files using DuckDB — an in-process analytical SQL engine. It supports schema inspection, SQL-based querying, statistical summaries, and result export, all through a single Python script.
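The core idea, loading tabular data into an in-process SQL engine and answering questions with plain SQL, can be sketched with the standard library. A hedge up front: `sqlite3` stands in for DuckDB here (DuckDB may not be installed everywhere), and `query_csv` is a hypothetical helper for illustration, not part of the skill's script:

```python
import csv
import io
import sqlite3

def query_csv(csv_text, table, sql):
    """Load CSV text into an in-memory SQL table and run a query against it.

    Illustrative only: sqlite3 stands in for DuckDB, and all values are
    loaded as text (DuckDB infers column types; this sketch does not).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    cols = list(rows[0].keys())
    con = sqlite3.connect(":memory:")
    # Register the CSV contents as a queryable table.
    con.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    con.executemany(
        f"INSERT INTO {table} VALUES ({', '.join('?' * len(cols))})",
        [tuple(r[c] for c in cols) for r in rows],
    )
    # Hand the caller's SQL straight to the engine.
    return con.execute(sql).fetchall()
```

The real script presumably follows the same shape: register each sheet or CSV file as a table, then execute the user's SQL against the engine.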
When a user uploads data files and requests analysis, identify the uploaded file paths (under /mnt/user-data/uploads/) and the questions the user wants answered.

First, inspect the uploaded file to understand its schema:
```shell
python /mnt/skills/public/data-analysis/scripts/analyze.py \
  --files /mnt/user-data/uploads/data.xlsx \
  --action inspect
```
This returns the schema of each table: sheet/table names, column names, and column types.
Based on the schema, construct SQL queries to answer the user's questions.
```shell
python /mnt/skills/public/data-analysis/scripts/analyze.py \
  --files /mnt/user-data/uploads/data.xlsx \
  --action query \
  --sql "SELECT category, COUNT(*) as count, AVG(amount) as avg_amount FROM Sheet1 GROUP BY category ORDER BY count DESC"
```
```shell
python /mnt/skills/public/data-analysis/scripts/analyze.py \
  --files /mnt/user-data/uploads/data.xlsx \
  --action summary \
  --table Sheet1
```
For each numeric column, this returns: count, mean, std, min, 25%, 50%, 75%, max, and null_count. For each string column: count, unique, top value, frequency, and null_count.
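These per-column statistics can be reproduced with the standard library alone. The sketch below (a hypothetical `numeric_summary` helper, not the script's actual implementation) mirrors the numeric-column output:

```python
import statistics

def numeric_summary(values):
    """Compute stats like the summary action's numeric-column output:
    count, mean, std, min, quartiles, max, and null_count."""
    nulls = sum(1 for v in values if v is None)
    nums = [v for v in values if v is not None]
    q1, q2, q3 = statistics.quantiles(nums, n=4)  # 25%, 50%, 75%
    return {
        "count": len(nums),
        "mean": statistics.fmean(nums),
        "std": statistics.stdev(nums),  # sample standard deviation
        "min": min(nums),
        "25%": q1, "50%": q2, "75%": q3,
        "max": max(nums),
        "null_count": nulls,
    }
```

Note that nulls are excluded from the numeric statistics and reported separately, matching the description above.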
```shell
python /mnt/skills/public/data-analysis/scripts/analyze.py \
  --files /mnt/user-data/uploads/data.xlsx \
  --action query \
  --sql "SELECT * FROM Sheet1 WHERE amount > 1000" \
  --output-file /mnt/user-data/outputs/filtered-results.csv
```
Supported output formats (auto-detected from extension):
- `.csv` — Comma-separated values
- `.json` — JSON array of records
- `.md` — Markdown table

| Parameter | Required | Description |
|---|---|---|
| `--files` | Yes | Space-separated paths to Excel/CSV files |
| `--action` | Yes | One of: `inspect`, `query`, `summary` |
| `--sql` | For `query` | SQL query to execute |
| `--table` | For `summary` | Table/sheet name to summarize |
| `--output-file` | No | Path to export results (CSV/JSON/MD) |
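Extension-based format selection can be sketched as follows. This is an illustration of the export behavior, not the actual script's code, and `export_rows` is a hypothetical helper:

```python
import csv
import json

def export_rows(rows, path):
    """Write query results in a format inferred from the file extension,
    mirroring the --output-file behavior described above (assumed)."""
    ext = path.rsplit(".", 1)[-1].lower()
    headers = list(rows[0].keys())
    if ext == "csv":
        with open(path, "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=headers)
            w.writeheader()
            w.writerows(rows)
    elif ext == "json":
        with open(path, "w") as f:
            json.dump(rows, f, indent=2)  # JSON array of records
    elif ext == "md":
        lines = ["| " + " | ".join(headers) + " |",
                 "|" + "---|" * len(headers)]
        lines += ["| " + " | ".join(str(r[h]) for h in headers) + " |"
                  for r in rows]
        with open(path, "w") as f:
            f.write("\n".join(lines))
    else:
        raise ValueError(f"unsupported extension: .{ext}")
```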
> [!NOTE]
> Do NOT read the Python file; just call it with the parameters.
Excel sheet names become table names (e.g., Sheet1, Sales, Revenue); CSV files become tables named after the file (data.csv → data); names that contain special characters or start with a digit must be double-quoted (e.g., "2024_Sales").

```sql
-- Row count
SELECT COUNT(*) FROM Sheet1

-- Distinct values in a column
SELECT DISTINCT category FROM Sheet1

-- Value distribution
SELECT category, COUNT(*) as cnt FROM Sheet1 GROUP BY category ORDER BY cnt DESC

-- Date range
SELECT MIN(date_col), MAX(date_col) FROM Sheet1

-- Revenue by category and month
SELECT category, DATE_TRUNC('month', order_date) as month,
       SUM(revenue) as total_revenue
FROM Sales
GROUP BY category, month
ORDER BY month, total_revenue DESC

-- Top 10 customers by spend
SELECT customer_name, SUM(amount) as total_spend
FROM Orders GROUP BY customer_name
ORDER BY total_spend DESC LIMIT 10

-- Join sales with customer info from different files
SELECT s.order_id, s.amount, c.customer_name, c.region
FROM sales s
JOIN customers c ON s.customer_id = c.id
WHERE s.amount > 500

-- Running total and rank
SELECT order_date, amount,
       SUM(amount) OVER (ORDER BY order_date) as running_total,
       RANK() OVER (ORDER BY amount DESC) as amount_rank
FROM Sales

-- Pivot: monthly revenue by category
SELECT category,
       SUM(CASE WHEN MONTH(date) = 1 THEN revenue END) as Jan,
       SUM(CASE WHEN MONTH(date) = 2 THEN revenue END) as Feb,
       SUM(CASE WHEN MONTH(date) = 3 THEN revenue END) as Mar
FROM Sales
GROUP BY category
```
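Hand-writing one CASE WHEN per month gets tedious for a full year, so a small generator can emit the pivot query instead. `pivot_sql` is a hypothetical helper for illustration, not part of the skill:

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def pivot_sql(table, date_col, value_col, group_col):
    """Generate a 12-month pivot query in the CASE WHEN style shown above."""
    cols = ",\n  ".join(
        f"SUM(CASE WHEN MONTH({date_col}) = {i} THEN {value_col} END) as {m}"
        for i, m in enumerate(MONTHS, start=1)
    )
    return (f"SELECT {group_col},\n  {cols}\n"
            f"FROM {table}\nGROUP BY {group_col}")
```

The generated string can be passed directly as the `--sql` argument.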
User uploads sales_2024.xlsx (with sheets: Orders, Products, Customers) and asks: "Analyze my sales data — show top products by revenue and monthly trends."
```shell
python /mnt/skills/public/data-analysis/scripts/analyze.py \
  --files /mnt/user-data/uploads/sales_2024.xlsx \
  --action inspect
```
```shell
python /mnt/skills/public/data-analysis/scripts/analyze.py \
  --files /mnt/user-data/uploads/sales_2024.xlsx \
  --action query \
  --sql "SELECT p.product_name, SUM(o.quantity * o.unit_price) as total_revenue, SUM(o.quantity) as total_units FROM Orders o JOIN Products p ON o.product_id = p.id GROUP BY p.product_name ORDER BY total_revenue DESC LIMIT 10"
```
```shell
python /mnt/skills/public/data-analysis/scripts/analyze.py \
  --files /mnt/user-data/uploads/sales_2024.xlsx \
  --action query \
  --sql "SELECT DATE_TRUNC('month', order_date) as month, SUM(quantity * unit_price) as revenue FROM Orders GROUP BY month ORDER BY month" \
  --output-file /mnt/user-data/outputs/monthly-trends.csv
```
```shell
python /mnt/skills/public/data-analysis/scripts/analyze.py \
  --files /mnt/user-data/uploads/sales_2024.xlsx \
  --action summary \
  --table Orders
```
Present results to the user with clear explanations of findings, trends, and actionable insights.
User uploads orders.csv and customers.xlsx and asks: "Which region has the highest average order value?"
```shell
python /mnt/skills/public/data-analysis/scripts/analyze.py \
  --files /mnt/user-data/uploads/orders.csv /mnt/user-data/uploads/customers.xlsx \
  --action query \
  --sql "SELECT c.region, AVG(o.amount) as avg_order_value, COUNT(*) as order_count FROM orders o JOIN Customers c ON o.customer_id = c.id GROUP BY c.region ORDER BY avg_order_value DESC"
```
After analysis, share any exported output files with the user via the `present_files` tool.

The script automatically caches loaded data to avoid re-parsing files on every call. Cached data is stored under `/mnt/user-data/workspace/.data-analysis-cache/`. This is especially useful when running multiple queries against the same data files (inspect → query → summary).
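A common way to key such a cache is to hash the file's path, size, and modification time, so cached parses are invalidated whenever the source file changes. The script's actual scheme is not documented here; this is an illustrative sketch with a hypothetical `cache_key` helper:

```python
import hashlib
import os

def cache_key(path):
    """Derive a cache key that changes whenever the source file changes.

    Illustrative only: combines absolute path, size, and mtime so a
    re-uploaded or edited file never hits a stale cache entry.
    """
    st = os.stat(path)
    raw = f"{os.path.abspath(path)}:{st.st_size}:{st.st_mtime_ns}"
    return hashlib.sha256(raw.encode()).hexdigest()
```

The key can then name a file in the cache directory holding the parsed data.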
- DuckDB supports standard SQL date functions (`DATE_TRUNC`, `EXTRACT`, etc.).
- Quote column names that contain spaces or special characters with double quotes (e.g., `"Column Name"`).