Answer questions from Excel files using natural language. Handles files with unknown schemas, headers on any row, and large datasets (100k+ rows). Use when the user provides an Excel file path, sheet number, and a question to answer. Supports lookups ("What is the value for X?") and aggregations ("Total sales by region"). Returns text summary answers.
Answer natural language questions from Excel files by discovering schema, writing pandas queries, and summarizing results.
Run the schema discovery script to understand the Excel structure:
python3 {SKILL_PATH}/scripts/discover_schema.py "<file_path>" --sheet <sheet_number>
This returns:
Based on the schema and user question, write pandas code to answer the question.
Key considerations:
header=HEADER_ROW - 1 (convert to 0-indexed).loc[] with conditions.sum(), .mean(), .groupby().str.lower() or case=FalseSee references/query-patterns.md for common patterns.
Execute the pandas code and format the result as a clear text summary.
Example outputs:
User: "What is the total revenue for Q1 2024?"
Process:
python3 scripts/discover_schema.py "sales.xlsx" --sheet 0
Output shows:
import pandas as pd
df = pd.read_excel("sales.xlsx", sheet_name=0, header=2) # Row 3 = index 2
q1_revenue = df.loc[df['Quarter'] == 'Q1 2024', 'Revenue'].sum()
print(f"${q1_revenue:,.2f}")
If the user's question mentions a column that doesn't exist, check for similar column names in the schema and clarify with the user if needed.
For queries that return many rows, summarize the top results:
If a lookup returns empty results, state this clearly:
If numeric columns show as text type with currency/percentage symbols, clean them:
df['Amount'] = df['Amount'].replace('[\$,]', '', regex=True).astype(float)