Professional Excel file creation, editing, analysis, and automation. Use this skill any time a spreadsheet file is the primary input or output. Covers: creating new .xlsx files from scratch or from data; reading, editing, or fixing existing .xlsx, .xlsm, .csv, .tsv files; adding columns, formulas, formatting, charts, conditional formatting; cleaning messy tabular data; building financial models, dashboards, KPI trackers, and reports; converting between tabular file formats. Trigger keywords: xlsx, spreadsheet, Excel, csv, tsv, workbook, pivot table, chart, formula, financial model, dashboard, data analysis, openpyxl, pandas. Do NOT trigger when the primary deliverable is a Word document, PDF, HTML report, standalone Python script, or database pipeline — even if tabular data is involved.
This skill teaches the agent how to create, edit, analyze, and automate Excel files (.xlsx) using Python with openpyxl and pandas. It includes a formula recalculation pipeline powered by LibreOffice.
pip install openpyxl pandas
LibreOffice must be installed for formula recalculation. The included
scripts/recalc.py handles macro setup automatically on first run.
# Ubuntu/Debian
sudo apt-get install -y libreoffice
# macOS
brew install --cask libreoffice
Unless otherwise stated by the user or an existing template:
| Color | RGB | Usage |
|---|---|---|
| Blue text | 0,0,255 | Hardcoded inputs / assumptions |
| Black text | 0,0,0 | ALL formulas and calculations |
| Green text | 0,128,0 | Cross-sheet links within same workbook |
| Red text | 255,0,0 | External links to other files |
| Yellow background | 255,255,0 | Key assumptions needing attention |
$#,##0;($#,##0);"-"(123) not minus -123=B5*(1+$B$6)=B5*1.05Always use Excel formulas instead of calculating values in Python and hardcoding them. This ensures the spreadsheet remains dynamic and updateable.
# Bad: Calculating in Python and hardcoding result
total = df['Sales'].sum()
sheet['B10'] = total # Hardcodes 5000
# Bad: Computing growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth # Hardcodes 0.15
# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'
# Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'
# Good: Average using Excel function
sheet['D20'] = '=AVERAGE(D2:D19)'
This applies to ALL calculations — totals, percentages, ratios, differences, etc.
python3 {baseDir}/scripts/recalc.py output.xlsx
status is errors_found, check error_summary for specific error
types and locations.#REF! — Invalid cell references#DIV/0! — Division by zero#VALUE! — Wrong data type in formula#NAME? — Unrecognized formula nameimport pandas as pd
# Read Excel
df = pd.read_excel('file.xlsx') # First sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict
# Analyze
df.head() # Preview
df.info() # Column info
df.describe() # Statistics
# Write Excel
df.to_excel('output.xlsx', index=False)
df = pd.read_excel('file.xlsx', dtype={'id': str})
df = pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])
df = pd.read_excel('file.xlsx', parse_dates=['date_column'])
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
wb = Workbook()
sheet = wb.active
# Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append(['Row', 'of', 'data'])
# Add formula
sheet['B2'] = '=SUM(A1:A10)'
# Formatting
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')
# Column width
sheet.column_dimensions['A'].width = 20
wb.save('output.xlsx')
from openpyxl import load_workbook
wb = load_workbook('existing.xlsx')
sheet = wb.active # or wb['SheetName']
# Working with multiple sheets
for sheet_name in wb.sheetnames:
sheet = wb[sheet_name]
# Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2)
sheet.delete_cols(3)
# Add new sheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'
wb.save('modified.xlsx')
Excel files created or modified by openpyxl contain formulas as strings but
not calculated values. Use the provided scripts/recalc.py to recalculate:
python3 {baseDir}/scripts/recalc.py <excel_file> [timeout_seconds]
Example:
python3 {baseDir}/scripts/recalc.py output.xlsx 30
The script:
{
"status": "success",
"total_errors": 0,
"total_formulas": 42,
"error_summary": {}
}
If errors are found:
{
"status": "errors_found",
"total_errors": 2,
"total_formulas": 42,
"error_summary": {
"#REF!": {
"count": 2,
"locations": ["Sheet1!B5", "Sheet1!C10"]
}
}
}
from openpyxl.chart import BarChart, LineChart, PieChart, Reference
# Bar Chart
chart = BarChart()
chart.title = "Sales by Quarter"
chart.y_axis.title = "Revenue ($)"
chart.x_axis.title = "Quarter"
data = Reference(sheet, min_col=2, min_row=1, max_col=5, max_row=4)
cats = Reference(sheet, min_col=1, min_row=2, max_row=4)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
chart.shape = 4
sheet.add_chart(chart, "G2")
# Line Chart
line = LineChart()
line.title = "Trend"
line.add_data(data, titles_from_data=True)
line.set_categories(cats)
sheet.add_chart(line, "G18")
from openpyxl.formatting.rule import (
CellIsRule, DataBarRule, ColorScaleRule, FormulaRule
)
# Highlight cells > 100
sheet.conditional_formatting.add(
'B2:B20',
CellIsRule(operator='greaterThan', formula=['100'],
fill=PatternFill(start_color='00B050'))
)
# Data bars
sheet.conditional_formatting.add(
'C2:C20',
DataBarRule(start_type='min', end_type='max',
color='638EC6', showValue=True)
)
# Color scale (red-yellow-green)
sheet.conditional_formatting.add(
'D2:D20',
ColorScaleRule(start_type='min', start_color='FF0000',
mid_type='percentile', mid_value=50, mid_color='FFFF00',
end_type='max', end_color='00B050')
)
from openpyxl.workbook.defined_name import DefinedName
from openpyxl.worksheet.datavalidation import DataValidation
# Named Range
ref = DefinedName('SalesData', attr_text="Sheet1!$A$1:$D$100")
wb.defined_names.add(ref)
# Data Validation — dropdown list
dv = DataValidation(type="list", formula1='"Yes,No,Maybe"', allow_blank=True)
dv.error = "Please select from the dropdown"
dv.errorTitle = "Invalid Entry"
sheet.add_data_validation(dv)
dv.add('E2:E100')
from openpyxl.worksheet.table import Table, TableStyleInfo
from openpyxl.utils import get_column_letter
table = Table(
displayName="SalesTable",
ref=f"A1:{get_column_letter(last_col)}{last_row}"
)
table.tableStyleInfo = TableStyleInfo(
name="TableStyleMedium9",
showFirstColumn=False,
showLastColumn=False,
showRowStripes=True,
showColumnStripes=False
)
sheet.add_table(table)
# Set Right-to-Left layout for Arabic sheets
def set_rtl(ws):
ws.sheet_view.rightToLeft = True
for sheet in wb.worksheets:
set_rtl(sheet)
COLORS = {
'input': '0000FF', # Blue — manual inputs
'formula': '000000', # Black — formulas
'crossref': '008000', # Green — cross-sheet links
'warning': 'FFFF00', # Yellow background — attention
'header_bg': '1F3864', # Dark blue — table header background
'header_fg': 'FFFFFF', # White — header text
'subheader': 'D6E4F0', # Light blue — sub-header
'total': 'E8F4FD', # Pale blue — totals row
'positive': '00B050', # Green — positive values
'negative': 'FF0000', # Red — negative values / warnings
}
pd.notna()/ in formulasSheet1!A1data_only=True to read calculated values: load_workbook('f.xlsx', data_only=True)data_only=True and saved, formulas are
permanently replaced with cached values.read_only=True for reading, write_only=True for writing.scripts/recalc.py.