Audits, designs, and restructures spreadsheet workbooks — validates data quality, traces and fixes formulas, designs pivot tables, and documents cross-sheet architecture. Trigger: "audit this spreadsheet", "fix this formula", "pivot table design", "data validation", "spreadsheet architecture", "formula error", "variance analysis", "what-if scenario", "clean up this workbook". Do NOT use for generating .xlsx files from scratch (use xlsx-generation), for database queries (use sqlite/postgresql), or for Python/pandas data analysis (use python).
merceralex397-collab0 星標2026年3月17日
職業
分類
教育
技能內容
Provides a structured methodology for auditing, analyzing, and improving spreadsheet
workbooks. Covers data quality validation, formula auditing and error correction,
pivot table design, cross-sheet architecture, and common analytical patterns so that
workbook deliverables are reliable, maintainable, and well-documented.
When to use this skill
User asks to audit, review, or debug an existing spreadsheet or workbook
Task requires formula tracing, error detection, or fixing circular references
User needs help designing pivot tables, dashboards, or summary views
Workbook architecture needs restructuring: naming conventions, sheet layout, data flow
User requests variance analysis, trend analysis, cohort analysis, or what-if modeling
Data quality concerns: duplicates, missing values, inconsistent formats, validation rules
User asks "why is this formula returning #REF!" or "this spreadsheet is a mess"
Do not use this skill when
The task is to generate a new .xlsx file from structured data — use xlsx-generation
The user needs to build a financial tracker from scratch — use
相關技能
financial-tracker-ops
The analysis belongs in Python/pandas or SQL, not a spreadsheet — use python or sqlite
The task is data visualization only (chart design without spreadsheet context) — use spreadsheet for basic charting
The user wants to extract data from a PDF table — use table-extraction first, then this skill
Operating procedure
Step 1 — Workbook intake and orientation
Identify all sheets, their purpose, and the data flow between them (which sheets feed which).
Document the workbook's intended audience and decision it supports.
Note the file format (xlsx, Google Sheets, CSV collection) and any external data connections.
Count rows, columns, and approximate formula density per sheet.
Identify the "source of truth" sheets (raw data) vs. derived sheets (calculations, summaries).
Step 2 — Data quality audit
Run these checks against every data sheet:
Check
Method
Pass criteria
Type consistency
Verify each column has a single data type (text, number, date, boolean)
No mixed types in a column
Range validation
Check numeric columns for values outside expected bounds
No out-of-range values, or flagged with explanation
Duplicate detection
Check for duplicate rows on the primary key columns
Zero exact duplicates, near-duplicates flagged
Completeness
Calculate % populated for each column
Critical columns ≥ 95% populated
Date format consistency
Verify all dates use a single format (ISO 8601 preferred)
One format per column
Trailing whitespace / invisible chars
TRIM and CLEAN checks
No hidden characters affecting lookups
Referential integrity
VLOOKUP/INDEX-MATCH keys exist in source tables
Zero orphaned references
Produce a Data Quality Report summarizing findings per sheet.
Step 3 — Formula auditing
For each sheet with formulas:
Precedent tracing — map which cells feed into each formula; document the dependency chain.
Dependent tracing — identify all cells that depend on a given input; flag single-point-of-failure inputs.
Error detection — scan for:
#REF! — broken references from deleted rows/columns/sheets
#N/A — failed lookups; check for type mismatches or trailing whitespace
#VALUE! — type errors in arithmetic
#DIV/0! — division by zero without IFERROR wrapper
#NAME? — misspelled function names or missing named ranges
Circular references — trace the loop and propose a resolution
Volatile function identification — flag NOW(), TODAY(), RAND(), INDIRECT(), OFFSET() usage; these recalculate on every edit and slow large workbooks.
Hardcoded values in formulas — identify magic numbers embedded in formulas that should be named ranges or input cells.
Inconsistent formulas in ranges — check that a formula copied down a column is consistent (no skipped rows or manually overridden cells).
Produce a Formula Audit Report with severity (Critical / Warning / Info) for each finding.
Step 4 — Pivot table and summary design
When the user needs aggregated views:
Dimension selection — identify the categorical fields to group by (rows and columns).
Aggregation choice — select appropriate aggregation: SUM for amounts, COUNT for occurrences, AVERAGE for rates, MEDIAN for skewed distributions.
Red/amber/green for KPI thresholds (define the thresholds explicitly)
Data bars for quick magnitude comparison
Icon sets sparingly (only when 3-state status adds clarity)
Chart type selection guide:
Bar/Column → comparing categories (e.g., sales by region)
Line → trends over time (e.g., monthly revenue)
Scatter → correlation between two variables (e.g., spend vs. conversion)
Pie → avoid; use stacked bar if part-of-whole is needed
Waterfall → showing incremental changes (e.g., bridge from budget to actual)
Combo → dual-axis for different scales (e.g., revenue bars + margin % line)
Formatting hygiene: consistent number formats, aligned decimal points, frozen header rows, print area defined.
Decision rules
Mixed data types in a column → always flag and fix before any analysis; a single text value in a numeric column breaks SUM/AVERAGE silently.
Formula error count > 0 → resolve all Critical errors before delivering; Warning-level errors must be documented if not fixed.
Hardcoded values in formulas → extract to named input cells in 100% of cases; zero tolerance for magic numbers.
Volatile functions → replace with static alternatives where possible (e.g., paste-as-values for timestamps); if volatile is required, document the performance impact.
Merged cells detected → unmerge and use "Center Across Selection" or reformat; merged cells break sorting, filtering, and formulas.
No data validation on input cells → add validation rules (dropdowns, numeric ranges, date constraints) for every user-editable input.
Circular reference → must be resolved or explicitly justified with iterative calculation settings documented.
Pivot table source is a fixed range → convert to a structured Table so the pivot auto-expands with new data.
Charts without titles or axis labels → always add; a chart without context is noise.
Output structure
Deliver these sections as applicable:
1. Data Quality Report
Per-sheet summary table of validation checks with pass/fail status, row counts affected,
and recommended fixes.
2. Formula Audit Report
Table of findings: cell reference, issue type, severity, current formula, recommended fix.
3. Analysis Workbook Design
Architecture diagram (text-based sheet map) showing data flow, named ranges index,
and input/output cell inventory.
4. Pivot Table Specifications
For each pivot: source table, row/column dimensions, aggregations, filters, calculated fields.
5. Results Summary
Narrative summary of analytical findings with supporting data tables and chart recommendations.
6. Formatting Guide
Conditional formatting rules, chart specifications, and print layout settings for the final workbook.
Anti-patterns
Hardcoded values in formulas — embedding 1.08 for tax rate inside =B2*1.08 instead of =B2*TaxRate; makes updates error-prone and auditing impossible.
Merged cells — break sorting, filtering, VLOOKUP, and pivot tables; the single most destructive formatting choice in spreadsheets.
Inconsistent date formats — mixing MM/DD/YYYY and DD/MM/YYYY in the same workbook causes silent misinterpretation; standardize to ISO 8601 or locale-consistent format.
No data validation on inputs — allowing free-text entry in cells that should be constrained leads to garbage-in-garbage-out.
Fixed-range pivot sources — using A1:Z100 instead of a structured Table means new data rows are silently excluded.
VLOOKUP instead of INDEX-MATCH — VLOOKUP breaks when columns are inserted; INDEX-MATCH is column-order independent.
Pie charts for more than 5 categories — becomes unreadable; use a sorted bar chart instead.
No documentation — a workbook without a README sheet, named ranges, or comments becomes unmaintainable within weeks.
Copy-paste instead of formulas — static values that look like live calculations; breaks on data refresh.
One massive sheet — cramming data, calculations, and presentation onto a single sheet; separate concerns into distinct sheets.
Related skills
xlsx-generation — creating new .xlsx files programmatically from structured data
spreadsheet — basic spreadsheet operations and simpler tasks
financial-tracker-ops — building and maintaining financial tracking workbooks
table-extraction — extracting tabular data from PDFs or images to feed into spreadsheet analysis
csv-ready — preparing clean CSV data for import into spreadsheets
document-to-structured-data — converting unstructured documents into tabular format
Failure handling
If the user provides no file or data, ask for the workbook or a representative sample before proceeding.
If the file format is unsupported or corrupted, state this clearly and suggest export to .xlsx or CSV.
If the workbook is too large to analyze fully (>50 sheets, >100K rows), prioritize: audit the summary/output sheets first, then trace back to source data on demand.
If formulas use platform-specific features (Google Sheets QUERY, Excel Power Query), note compatibility limitations and suggest portable alternatives.
If the analysis request is ambiguous ("make this spreadsheet better"), run Step 1–3 (intake, data quality, formula audit) as a baseline and present findings before deeper work.